A Comparison of Three Algorithms for Approximating the Distance Distribution in Real-World Graphs / P. Crescenzi; R. Grossi; L. Lanzi; A. Marino. - ELECTRONIC. - (2011), pp. 92-103. (Paper presented at the First International ICST Conference on Theory and Practice of Algorithms in (Computer) Systems, held in Rome, 18-20 April 2011) [10.1007/978-3-642-19754-3_11].
A Comparison of Three Algorithms for Approximating the Distance Distribution in Real-World Graphs
CRESCENZI, PIERLUIGI;LANZI, LEONARDO;MARINO, ANDREA
2011
Abstract
The distance between a pair of vertices in a graph G is the length of a shortest path between them. The distance distribution of G specifies how many vertex pairs are at distance h, for all feasible values of h. We study three fast randomized algorithms for approximating the distance distribution in large graphs. The Eppstein-Wang (EW) algorithm exploits sampling through a limited (logarithmic) number of Breadth-First Searches (BFSes). The Size-Estimation Framework (SEF) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (ANF) algorithm by Palmer, Gibbons, and Faloutsos uses the probabilistic counting technique introduced by Flajolet and Martin to estimate the number of distinct elements in a large multiset. We investigate how good the approximation of the distance distribution is when the three algorithms are run in similar settings. The analysis of ANF derives from the results on the probabilistic counting method, while that of SEF is given by Cohen. As for EW (originally designed for another problem), we extend its simple analysis to bound its error with high probability and to show its convergence. We then perform an experimental study on 30 real-world graphs, showing that our implementation of EW combines the accuracy of SEF with the performance of ANF.
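The sampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the adjacency-dictionary graph representation, the sample size `k`, and the choice to count ordered pairs of distinct vertices are all assumptions made for illustration. The sketch runs a BFS from each of `k` uniformly sampled vertices, tallies how many vertices lie at each distance h, and scales the tallies by n/k to estimate the number of pairs at distance h.

```python
import random
from collections import Counter, deque


def bfs_distance_counts(adj, source):
    """Return a Counter mapping h -> number of vertices at distance h from source."""
    dist = {source: 0}
    queue = deque([source])
    counts = Counter()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                counts[dist[v]] += 1
                queue.append(v)
    return counts


def estimate_distance_distribution(adj, k, seed=None):
    """Estimate, for each h, the number of ordered vertex pairs at distance h.

    adj  : dict mapping each vertex to an iterable of its neighbours (assumed format)
    k    : number of BFS sources to sample (hypothetical parameter)
    seed : optional seed for reproducible sampling
    """
    rng = random.Random(seed)
    vertices = list(adj)
    n = len(vertices)
    k = min(k, n)
    sample = rng.sample(vertices, k)  # k distinct sources, chosen uniformly
    total = Counter()
    for s in sample:
        total.update(bfs_distance_counts(adj, s))
    # Scale the sampled tallies up to the whole vertex set.
    return {h: total[h] * n / k for h in sorted(total)}


# Path graph 0 - 1 - 2 - 3; with k equal to the number of vertices,
# every vertex is a source and the estimate coincides with the exact counts.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(estimate_distance_distribution(path, k=4, seed=0))
```

With `k` much smaller than the number of vertices, the result is only an estimate whose quality depends on the sample; the abstract states that a logarithmic number of BFSes is used, but the exact sample size is not given here, so `k` is left as a free parameter.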