D2K: Scalable Community Detection in Massive Networks via Small-Diameter K-Plexes

Conte, Alessio; De Matteis, Tiziano; De Sensi, Daniele; Grossi, Roberto; Marino, Andrea; Versari, Luca

doi:10.1145/3219819.3220093

This paper studies k-plexes, a well-known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links to the other nodes of the k-plex. Our goal is to detect large communities in today's real-world graphs, which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, and finding cliques is itself a well-known hard problem. We present D2K, which is the first algorithm able to find large k-plexes in very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering, and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.
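
The k-plex condition described in the abstract can be illustrated with a minimal Python sketch. It is not the D2K algorithm and is not taken from the paper: it only checks whether a given node set satisfies the definition. The helper names `build_adj` and `is_k_plex` are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set of neighbours} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_k_plex(adj, nodes, k):
    """Return True if `nodes` induces a k-plex in the graph `adj`.

    Assumes no self-loops. Each member must be adjacent to at least
    |S| - k members of S, i.e. it may miss at most k - 1 links
    to the other members.
    """
    s = set(nodes)
    return all(len(adj.get(v, set()) & s) >= len(s) - k for v in s)


# A 4-cycle: every node misses exactly one link to another member.
cycle = build_adj([(1, 2), (2, 3), (3, 4), (4, 1)])
print(is_k_plex(cycle, {1, 2, 3, 4}, 1))  # a 1-plex is a clique
print(is_k_plex(cycle, {1, 2, 3, 4}, 2))
```

With k = 1 the check reduces to the ordinary clique test, which is why the 4-cycle fails it but passes for k = 2.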

D2K: Scalable Community Detection in Massive Networks via Small-Diameter K-Plexes / Alessio Conte, Tiziano De Matteis, Daniele De Sensi, Roberto Grossi, Andrea Marino, Luca Versari. - Print. - (2018), pp. 1272-1281. (Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018) [10.1145/3219819.3220093].