This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.
D2K: Scalable Community Detection in Massive Networks via Small-Diameter K-Plexes / Alessio Conte; Tiziano De Matteis; Daniele De Sensi; Roberto Grossi; Andrea Marino; Luca Versari. - STAMPA. - (2018), pp. 1272-1281. (Intervento presentato al convegno Proceedings of the 24th ACM SIGKDD International Conference onKnowledge Discovery & Data Mining, KDD 2018) [10.1145/3219819.3220093].
D2K: Scalable Community Detection in Massive Networks via Small-Diameter K-Plexes
Roberto Grossi;Andrea Marino;
2018
Abstract
This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.