Comps readings: community detection

By cpikas on June 2, 2009.

In the last set of comps readings, I talked about sense of community:
belonging, having influence, fulfillment of needs, and emotional
support. Now, let's talk about the physics version of
"community" - cohesive subgroups. These are
groups of nodes in a graph that are more connected to each other than
to other parts of the graph. Clumpy spots. If you read old
Wasserman and Faust (http://www.worldcat.org/oclc/30594217),
you'll probably think of cliques, cores, and lambda
sets... somehow these didn't do it for me - literally, when I was
trying to locate communities in science blog networks
(http://terpconnect.umd.edu/%7Ecpikas/ScienceBlogging/PikasEScience08.pdf),
it didn't work.
If you have a computer science or maybe even sociology
background, you'll probably
just look at some sort of clustering (agglomerative or divisive).
The hot thing for the
past few years comes from physicists, and that's what's covered here.
I did other posts on SNA
articles, so those are mostly
elsewhere. (BTW - if you ever take stats for the social sciences and
can substitute R for Stata, do so and take the time to learn it. The
igraph package for R has all of the coolest community detection
thingies in it.) (Note, too, that these readings are not necessarily for
the dabbler in bibliometrics or social network analysis!)

Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating
community structure in networks. Physical Review E (Statistical,
Nonlinear, and Soft Matter Physics), 69(2), 026113.

This article, like the ones from Barabasi, sort of kicked off this
flurry of research. They use a divisive clustering technique
- so they start with the whole network and break the connections with
the highest betweenness. See figure.

See how, if you remove
that one line, you completely break up the thing? That line has
high betweenness. So they calculate that for all of the lines using
whatever method, then take out the line with the highest betweenness, then
re-calculate and remove, again and again. They then go on to talk about the
actual algorithm to use to efficiently do all of this betweenness
calculating and give some examples. There's a lot in this
article, though, because they next talk about how to figure out when
you're done and if you've got decent communities. This measure is
modularity, Q (see the article for the definition), but basically it's 0
if the split is no better than random and 1 is the maximum. If you calculate Q at each step, then
you can stop when it's highest. Note that any given node can only be in
one community, unfortunately. (In real life, people are nearly always
in multiple communities.)
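
To make the remove-and-recalculate loop concrete, here's a rough Python sketch (not the authors' code, and not tuned for speed). It assumes an undirected, unweighted graph with no self-loops, uses a standard breadth-first shortest-path count for the betweenness, and uses the usual "fraction of edges inside minus what the degrees would predict" form of Q. The function names and the toy graph (two triangles joined by one line) are made up for illustration.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Adjacency sets for an undirected graph given as (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (unweighted, undirected)."""
    bet = defaultdict(float)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        dist, sigma, preds, order = {s: 0}, defaultdict(float), defaultdict(list), []
        sigma[s] = 1.0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += credit
                delta[v] += credit
    # Every path was counted once from each of its two endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}


def components(adj):
    """Connected components, each returned as a set of nodes."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def modularity(adj, comms):
    """Q for a partition, always measured against the ORIGINAL graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2.0
    q = 0.0
    for c in comms:
        inside = sum(1 for v in c for w in adj[v] if w in c) / 2.0
        degree = sum(len(adj[v]) for v in c)
        q += inside / m - (degree / (2.0 * m)) ** 2
    return q


def girvan_newman(edges):
    """Remove the highest-betweenness edge repeatedly; keep the best-Q split."""
    original = build_graph(edges)
    work = {v: set(nbrs) for v, nbrs in original.items()}
    best = components(work)
    best_q = modularity(original, best)
    while any(work.values()):
        bet = edge_betweenness(work)  # recalculated after every removal
        a, b = max(bet, key=bet.get)
        work[a].discard(b)
        work[b].discard(a)
        comms = components(work)
        q = modularity(original, comms)
        if q > best_q:
            best, best_q = comms, q
    return best, best_q


# Toy graph: two triangles joined by the single line 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
communities, q = girvan_newman(edges)
print(communities, q)
```

On the toy graph the 2-3 line carries every shortest path between the two triangles, so it goes first, and that split is the one with the highest Q. Because each node ends up in exactly one connected component, the one-community-per-node limitation is built right in.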

Reichardt, J., & Bornholdt, S. (2006). When are networks truly
modular? Physica D, 224(1-2), 20-26. doi: 10.1016/j.physd.2006.09.009

They review Newman and Girvan and suggest a new way that groups
connected nodes together and separates non-connected
nodes. They go through a process and end up
with an equation that's apparently like a Hamiltonian
for a q-state Potts spin glass (dunno, ask a physicist if you need more
info on that!). This method allows for overlapping
communities because there could be times when you could move a node
from one community to the next without increasing the energy.
They compared it on some standard graphs and it did better
than N-G. Instead of just stopping when modularity is maximized, they
compare the modularity to that of a random graph with the same degree
distribution.
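
Here's a small Python sketch of the "move a node without increasing the energy" idea. The post doesn't give the equation, so I'm assuming a commonly cited form of this kind of energy: add up, over pairs of nodes in the same group, whether they're actually linked minus gamma times the link probability you'd expect from their degrees (k_i * k_j / 2m), and flip the sign. The function name, the gamma parameter, and the toy graph are all illustrative, not taken from the paper.

```python
from itertools import combinations


def potts_energy(edges, assignment, gamma=1.0):
    """Energy of a group assignment; lower is better.

    ASSUMED form: H = -sum over same-group pairs i<j of
    (A_ij - gamma * k_i * k_j / (2m)). Expects each undirected edge
    to be listed once, with no self-loops.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    m = len(edges)
    energy = 0.0
    for i, j in combinations(sorted(neighbors), 2):
        if assignment[i] != assignment[j]:
            continue  # only pairs in the same group contribute
        a_ij = 1.0 if j in neighbors[i] else 0.0
        p_ij = len(neighbors[i]) * len(neighbors[j]) / (2.0 * m)
        energy -= a_ij - gamma * p_ij
    return energy


# Two triangles with node 3 sitting exactly between them (2-3 and 3-4).
edges = [(0, 1), (1, 2), (0, 2), (4, 5), (5, 6), (4, 6), (2, 3), (3, 4)]
node3_left = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
node3_right = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
one_big_group = {n: "A" for n in range(7)}

e_left = potts_energy(edges, node3_left)
e_right = potts_energy(edges, node3_right)
e_all = potts_energy(edges, one_big_group)
print(e_left, e_right, e_all)
```

Node 3 is tied equally to both triangles, so putting it in either group gives the same energy (up to floating-point rounding), and both are lower than lumping everything together. A node like that is a natural candidate for belonging to both communities.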

Reichardt, J., & Bornholdt, S. (2007). Clustering of sparse
data via network communities-a prototype study of a large online
market. Journal of Statistical Mechanics: Theory and
Experiment, P06016. doi:10.1088/1742-5468/2007/06/P06016

In this one they test the spin glass community detection method against
the German version of eBay to look for market segmentation. The network
has bidders as nodes, and if they bid on the same item there is an
edge. The spin glass method was successful at pulling out
clusters, and using odds ratios, the authors showed that these clusters
corresponded to groupings of subject categories. The Q was much higher
than it would be for a random graph.
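
The network construction is simple enough to sketch in a few lines of Python. This is only an illustration of "bidders are nodes, shared items make edges"; the bid records, the names, and the function are made up, and the paper's actual data handling may well differ.

```python
from collections import defaultdict
from itertools import combinations


def bidder_network(bids):
    """Turn (bidder, item) records into a set of bidder-bidder edges.

    Two bidders get an edge if they bid on at least one common item.
    """
    bidders_by_item = defaultdict(set)
    for bidder, item in bids:
        bidders_by_item[item].add(bidder)
    edges = set()
    for bidders in bidders_by_item.values():
        # Every pair of bidders on the same item gets linked.
        for a, b in combinations(sorted(bidders), 2):
            edges.add((a, b))
    return edges


# Hypothetical bid records, purely for illustration.
bids = [
    ("ann", "lamp"),
    ("bob", "lamp"),
    ("bob", "stamp"),
    ("cat", "stamp"),
    ("dan", "vase"),
]
network = bidder_network(bids)
print(sorted(network))
```

Note that a bidder who never shares an item with anyone (dan here) ends up with no edges at all, which hints at why this kind of data is so sparse.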
