Abstract

With the help of information technologies, we have access to very large networks, even with billions of nodes. This large size has limited our ability to perform analysis and provide theoretical compelling explanation on the whole network. One solution is to extract connected subgraphs and analyze them as subpopulations. We propose a method for extracting such subpopulation archiving two desirable properties: 1) be effective, resulting in subpopulations with more ties within them than to the external network; and 2) be fast, so that it scales well to large networks. We develop a method called the “Transitive Clustering and Pruning” (T-CLAP) algorithm. We compare the speed and effectiveness of this algorithm to two other popularly community detection algorithms - Newman's and Clauset’s algorithms. We find that T-CLAP is orders of magnitudes faster than Newman's algorithm; and is superior to Clauset's algorithm in terms of returning effective subpopulations that are useful.

Share

COinS
 

An effective and efficient subpopulation extraction method in very large social networks

With the help of information technologies, we have access to very large networks, even with billions of nodes. This large size has limited our ability to perform analysis and provide theoretical compelling explanation on the whole network. One solution is to extract connected subgraphs and analyze them as subpopulations. We propose a method for extracting such subpopulation archiving two desirable properties: 1) be effective, resulting in subpopulations with more ties within them than to the external network; and 2) be fast, so that it scales well to large networks. We develop a method called the “Transitive Clustering and Pruning” (T-CLAP) algorithm. We compare the speed and effectiveness of this algorithm to two other popularly community detection algorithms - Newman's and Clauset’s algorithms. We find that T-CLAP is orders of magnitudes faster than Newman's algorithm; and is superior to Clauset's algorithm in terms of returning effective subpopulations that are useful.