NSF Workshop on Scholarly Evaluation Metrics – Morning 1

I attended this one-day workshop in DC on Wednesday, December 16, 2009. These are stream of consciousness notes.

Herbert Van de Sompel (LANL) - intro - Lots of metrics exist: some are accepted in some areas and not others, and some are widely available on information-industry platforms while others are not. How are these metrics selected? Why are some more attractive than others?

Two other points: informal science communication on the web is being adopted rapidly - scholars immediately reap the benefits. It comes with lots of metrics: views, downloads, "favorites", followers. So our current metrics are impoverished (not his word), looking only at citations. Also, what about data sets? By relying on citations of data sets in the literature, are we measuring the right thing? How do we measure reuse so we can consider their impact? What about data sets used in workflow/workbench tools like Kepler or Trident (or Taverna?) - these create provenance records that could be shared and then used to measure research.

And the nano-citations that Jan Velterop deals with (see the afternoon post).

Metrics measure, but they are also performative - they influence the evolution of the system. If metrics create an incentive for data sharing, that could encourage open sharing of data sets.

 

Johan Bollen (now at Indiana) - about MESUR. Two paradigm shifts they are trying to bring about: usage-based and network-based metrics. Network-based vs. rate-based (counting citations).

Usage data: not just journals, but anything available online, so it can reflect all participants, not just those who publish in the same journals. Immediacy. Challenges: privacy, aggregating across various systems/algorithms, dealing with bots.

Networks: these methods are everywhere and are quite successful in many domains, so why are they less frequently used in scholarly assessment? (He suggested an old edition of Wasserman as background reading, but he was talking about metrics more like Bonacich's.) Example: ranking journals using the IF vs. PageRank - you get a better picture because PageRank takes the entire network into account. He gave examples of network methods - closeness (Newman), random walk (PageRank, Eigenfactor), entropy...?
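A minimal sketch of the contrast he was drawing, assuming a toy citation graph (the journals and edges below are invented): raw in-degree stands in for rate-based citation counting, while PageRank and betweenness use the structure of the whole network.

```python
# Toy comparison of a rate-style measure (in-degree, i.e. citations received)
# with network measures (PageRank, betweenness) on an invented journal graph.
import networkx as nx

G = nx.DiGraph()
# edge (A, B) means "journal A cites journal B"
G.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("D", "C"), ("C", "E"), ("E", "B"),
])

in_degree = dict(G.in_degree())             # rate-style: who gets cited most
pagerank = nx.pagerank(G, alpha=0.85)       # takes the whole network into account
betweenness = nx.betweenness_centrality(G)  # how often a node sits on shortest paths

for j in sorted(G.nodes()):
    print(f"{j}: cites_received={in_degree[j]}, "
          f"pagerank={pagerank[j]:.3f}, betweenness={betweenness[j]:.3f}")
```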

Their project: a billion usage events from universities and vendors. The data set is covered by lots of confidentiality agreements. Article-level usage events, unique session ID, date/time.... Their data set was more representative of the proportions of science, social science, and humanities (JCR is something like 92% science, 8% social science). There is a PLoS ONE article with a map. The Journal of Nursing is evidence for the utility of this method: it is highly linked and lots of people read it, but it wouldn't show up on a citation map. Journals that don't have a lot of use but are critical for the connectivity of the graph are important to communication in that field.
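My rough understanding of how session-level usage events become a journal network (the clickstream idea behind that map): within a session, consecutively accessed journals get linked. The field names and events below are made up, and this is only a sketch of the general approach, not MESUR's actual pipeline.

```python
# Turn article-level usage events (session ID, timestamp, journal) into a
# weighted journal-to-journal network by linking consecutive accesses
# within each session. All data here is invented.
from collections import Counter, defaultdict

events = [
    # (session_id, timestamp, journal)
    ("s1", 1, "J Nursing"), ("s1", 2, "Nursing Research"), ("s1", 3, "J Nursing"),
    ("s2", 1, "Phys Rev D"), ("s2", 2, "ApJ"),
]

by_session = defaultdict(list)
for sid, ts, journal in events:
    by_session[sid].append((ts, journal))

edge_weights = Counter()
for sid, accesses in by_session.items():
    accesses.sort()                              # order by time within the session
    journals = [j for _, j in accesses]
    for a, b in zip(journals, journals[1:]):     # consecutive pairs become usage links
        if a != b:
            edge_weights[(a, b)] += 1

for (a, b), w in edge_weights.items():
    print(f"{a} -> {b}: {w}")
```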

Metric map: metric 1 vs. metric 2. Correlation of IF and usage. Map the metrics according to their correlation coefficients (like MDS?). That's the orangey-red map from the PLoS article. Rate metrics like the JIF clump together. Total cites clump. PageRank and betweenness each clump. Usage metrics clump, and betweenness sits much closer to the usage cluster than any of the rate metrics do.
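A sketch of that metric-map idea as I understood it: compute pairwise correlations between metrics across journals, turn them into distances, and lay the metrics out in 2D with MDS. The data below is random and the metric names are just placeholders; this is not the MESUR analysis itself.

```python
# Correlation-based "metric map": metrics that behave similarly across
# journals end up close together in the 2D embedding.
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_journals = 200
metrics = ["JIF", "total_cites", "pagerank", "betweenness", "usage"]
X = rng.random((n_journals, len(metrics)))   # stand-in metric values per journal

corr, _ = spearmanr(X)                       # metric-by-metric rank correlations
dist = 1 - np.abs(corr)                      # similar metrics -> small distance

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
for name, (x, y) in zip(metrics, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```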

They're now looking at bursts and temporal effects. Contagion.

Future: sustainability, more research on how the metrics can be gamed (and how gaming is detected), handing the data off to a disinterested third party and then allowing use through visiting scientists.

audience q: how do you get institutions interested in this if they are bound to citation information?

audience q: conceptually, usage isn't really that different from citation, so maybe it would be interesting if you could tell from the usage how often people are following citations (like chaining from one article to the next)

audience q: what's that one out in left field? (#38, journal use probability - very quick, but it doesn't correlate well with the others at all)

audience q: talk more about privacy and what would need to happen so this data set can be shared now or by this third party in the future?

audience q: humanities - citation reasons are different; this needs a qualitative look....

audience q: filter out pay vs. free access to see the difference

audience q: gaming

audience q: a person from a publisher asked about data quality. They changed vendors and changed their way of collecting data - it is a concern; they looked for discontinuities in the data to try to find issues.

 

Michael J. Kurtz (Harvard-Smithsonian CfA) - (see also my earlier post on his work - this is a very similar talk). Making decisions. If you're picking a school, is the one US News & World Report ranks 17th really better than the one ranked 35th? When librarians are deciding on subscriptions, any local measure is more helpful than any global measure.

Measurements: sky surveys - Palomar, Sloan, Deep Lens Survey. Measuring light from these sources, as an example of how measurement has improved over time.

Three types of study:

  • inferior or poorly understood data - webometrics. Example: reads from Google vs. reads by astronomers - no statistically significant correlation between the two. Springer reports some huge percentage.
  • improvements and adjustments to established measures - different fields, researchers vs. practitioners, age differences (quadratic with age), different eras (citation rates grow roughly as c ≈ c0·e^(0.04T)), multiple authors (how do you attribute the citation of a multiple-author paper? If you have a collection, divide by N, but that doesn't take into account 1st vs. nth author - see the sketch after this list)
  • new data (look for Bollen & Kurtz paper on usage bibliometrics).
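A quick sketch of that attribution point, assuming two simple schemes: equal fractional credit (divide by N) and harmonic counting, which is one common position-weighted alternative - not necessarily what Kurtz himself uses.

```python
# Multi-author attribution: equal split vs. a position-weighted (harmonic) split.
def equal_credit(n_authors):
    """Each author gets 1/N of the citation."""
    return [1 / n_authors] * n_authors

def harmonic_credit(n_authors):
    """The k-th author gets a share proportional to 1/k, so first > nth."""
    raw = [1 / k for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [r / total for r in raw]

print(equal_credit(4))     # [0.25, 0.25, 0.25, 0.25]
print(harmonic_credit(4))  # ~[0.48, 0.24, 0.16, 0.12]
```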

(New papers are read more; old papers are cited more.) The range of productivity of professional astronomers is ~10.

measuring non-articles: information services, data services, processes/workflows

Measuring ADS: the utility cost saved by using ADS - measure that gain by assigning a time value, e.g. 15 minutes to walk to a library and photocopy an article. Adding all of that up, ADS saved about 736 FTE of research time in 2002 (if I got that right).
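A back-of-envelope check of that figure using the 15-minutes-per-use assumption from the talk. The hours in an FTE year and the implied number of uses are my own assumptions, not Kurtz's numbers.

```python
# Convert "minutes saved per use" into FTE years of research time.
MINUTES_SAVED_PER_USE = 15
HOURS_PER_FTE_YEAR = 2000          # assumed working hours in one FTE year

def fte_saved(uses_per_year):
    hours_saved = uses_per_year * MINUTES_SAVED_PER_USE / 60
    return hours_saved / HOURS_PER_FTE_YEAR

# Working backwards, ~736 FTE corresponds to roughly 5.9 million uses per year
# under these assumptions.
print(fte_saved(5_900_000))        # ~737.5
```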

Jian Zhang, Chaomei Chen and ... - visualizing papers per area of the sky.

Workflows - VisTrails. Carole Goble's example of the reuse of a workflow - how does the originator of the module get credit?

audience q: time lag for the GDP graph

audience q: are scientists willing to work with engineers to improve systems?

my q: more on users from Google vs. astronomers

audience q: how do you define a read - length of time on the screen?
