Computer Science, Web of Science, Scopus, conferences, citations, oh my!

The standard commercial library citation tools, Web of Science (including their newish Proceedings product) and Scopus, have always been a bit iffy for computer science. That's mostly because computer science scholarship is largely conference-based rather than journal-based, and those tools have tended to massively privilege the journal literature over conferences.

Of course, these citation tools are problematic at best for judging scholarly impact in any field; using them for CS is even more so. The flaws are really amplified.

A recent article in the Communications of the ACM goes through the problems in a bit more detail: Invisible Work in Standard Bibliometric Evaluation of Computer Science by Jacques Wainer, Cleo Billa and Siome Goldenstein.

A bit about why they did the research:

Multidisciplinary committees routinely make strategic decisions, rule on subjects ranging from faculty promotion to grant awards, and rank and compare scientists. Though they may use different criteria for evaluations in subjects as disparate as history and medicine, it seems logical for academic institutions to group together mathematics, computer science, and electrical engineering for comparative evaluation by these committees.

*snip*

Computer scientists have an intuitive understanding that these assessment criteria are unfair to CS as a whole. Here, we provide some quantitative evidence of such unfairness.

A bit about what they did:

We define researchers' invisible work as an estimation of all their scientific publications not indexed by WoS or Scopus. Thus, the work is not counted as part of scientists' standard bibliometric evaluations. To compare CS invisible work to that of physics, mathematics, and electrical engineering, we generated a controlled sample of 50 scientists from each of these fields from top U.S. universities and focused on the distribution of invisible work rate for each of them using statistical tests.

We defined invisible work as the difference between number of publications scientists themselves list on their personal Web pages and/or publicly available curriculum vitae (we call their "listed production") and number of publications listed for the same scientists in WoS and Scopus. The invisible work rate is the invisible work divided by number of listed production. Note that our evaluation of invisible work rate is an approximation of the true invisible work rate because the listed production of particular scientists may not include all of their publications.
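To make that definition concrete, here's a tiny sketch of the arithmetic (this is just my own illustration with invented numbers, not data or code from the article):

```python
# Hypothetical illustration of the "invisible work rate" defined above.
# The numbers are made up for the example, not taken from the article.

listed_production = 120   # publications the researcher lists on their CV / web page
indexed_in_wos = 40       # of those, how many the indexing service (e.g. WoS) actually covers

invisible_work = listed_production - indexed_in_wos
invisible_work_rate = invisible_work / listed_production

print(f"Invisible work: {invisible_work} papers")         # 80 papers
print(f"Invisible work rate: {invisible_work_rate:.0%}")  # 67%
```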

A bit about what they found:

When CS is classified as a science (as it was in the U.S. News & World Report survey), the standard bibliometric evaluations are unfair to CS as a whole. On average, 66% of the published work of a computer scientist is not accounted for in the standard WoS indexing service, a much higher rate than for scientists in math and physics. Using the new conference-proceedings service from WoS, the average invisible work rate for CS is 46%, which is higher than for the other areas of scientific research. Using Scopus, the average rate is 33%, which is higher than for both EE and physics.

CS researchers' practice of publishing in conference proceedings is an important aspect of the invisible work rate of CS. On average, 82% of conference publications are not indexed in WoS compared to 47% not indexed in WoS-P and 32% not indexed in Scopus.

And a bit about what they suggest:

Faced with multidisciplinary evaluation criteria, computer scientists should lobby for WoS-P, or better, Scopus. Understanding the limitations of the bibliometric services will help a multidisciplinary committee better evaluate CS researchers.

There's quite a bit more in the original article about possible sample biases, some other potential citation services, and other issues.

What do I take away from this? Using citation metrics as a measure of scientific impact is suspect at best. In particular (and the authors make this point), trying to use one measure or kind of measure across different disciplines is even more problematic.

Let's just start from scratch. But more on that in another post.
