Wednesday, May 23, 2012

Thinking about the value of libraries

This news story caught my attention recently: D.C. to cut 34 school librarians as they are a poor investment.  The news post is a brief summary of the story in the Washington Post and includes this statement from the DC Public Schools Chancellor:
“We have invested in full-time librarians for the last three or four years and we haven't seen the kind of payoff we'd like.”  While noting that she is not disparaging librarians she said, "We have pulled away from programs where we haven't received a return on our investment.” 
The article implies that the ROI would be measured as improved test scores.  Similar cuts in other prominent school districts were also mentioned, including the LA district cutting school nurses.  This last point made me think about the potential value of such services that are more or less indirectly involved in education.

While there have been studies that have demonstrated direct correlations of quality libraries (measured by number of books per student) or quality librarians (measured by direct student-contact activities) and student achievement, the DC Chancellor did not see this association in her district.  Was the association too oblique or indirect? Was the data not available?  Did she even look at any data that was available?  Were the librarians given a decent chance (enough hours, enough resources, enough time with students)?

This requirement of demonstrated value and "return on investment" is quite familiar to libraries of all types and levels.  Unfortunately, librarians have been behind the curve on understanding this and coming up with useful tools for demonstrating their value.  Conversely, administrators and purse-string-holders are also behind the curve in recognizing that value should be demonstrated in more ways than money or test scores.  They are using similar reasons to justify elimination of what had been standard support services such as school nursing, speech therapy, and counseling.

(On a side note, I find it terribly hypocritical that those who uphold the value of classroom teachers and who justify the cutting of such services by stating they need to add more teachers in the classroom are, in fact, merely shifting the responsibility for these services onto the already heavy load that teachers must bear.  Not only must they teach a more rigid curriculum, but they are bearing more responsibility for ensuring children take their medications (oh, that's right - that's the responsibility of the school admin assistants), helping the depressed child deal with the disrupted family, teaching a child to cope with stuttering, and teaching all the children how to find and evaluate information or find a good book.  Some help they are giving teachers.)

Given that the powers-that-be are not changing their approach, that they have demonstrated their will to eliminate what cannot be demonstrated as valuable, and that this trend is decades old and shows no sign of receding, I believe we must accept and move on.  As the ACRL has been advocating through the Value of Academic Libraries project, librarians need research to quantify the value that we have known all along that libraries provide.   The blog has some good ideas on the kinds of questions that need to be, and could be, answered, including:

  • How does use of print and digital collections correlate with course pass/fail rates, grades, or GPA?
  • What is the monetary value of providing the resources students use to learn course content and complete assignments over alternative sources?
  • Is there an association between grant funding success and access to library resources, particularly those used in the grant application?
This is a challenge to librarians and library administration.  But I don't think it is one that is insurmountable.  Nor do I believe the challenge can be ignored or considered contemptuously.  Recall Ranganathan's Fifth Law of Library Science - The Library is a Growing Organism.  I believe that libraries are not doomed, as long as they continue to evolve.  An organism that fails to grow or evolve fails to live.  Personally and professionally, I am excited to accept this challenge and prove the worthiness of the library and librarians (including myself) to the most skeptical.

Wednesday, May 16, 2012

When is a library no longer a library?

Here is an existential question - What happens when a library sells books?  Is it still a library or is it a bookseller?   This blog posting advocates having libraries sell ebooks directly to those who don't want to wait to read the latest bestseller.  One commenter noted that, "Then you're no longer a library if you start selling books in volume. Libraries loan books."  That raises the question - when is a library no longer a library?

Certainly, libraries have sold books before, usually used books pulled during weeding projects or donated by patrons and others who were cleaning out their own shelves.  This is legally possible due to the "first sale doctrine" for print materials, which does not apply to electronic resources.  Do these sales make the library a bookseller?  Do they mean it is no longer a library?

What about selling books by a visiting author?  I'm not sure if this still occurs in public libraries, but I recall this many times in our school libraries.  If so, how does that affect the library's role in the community?

I was at first repulsed by the idea of mixing commerce with the library's ideal of providing resources free of charge.  But thinking about it further, I'm not sure it really changes this ideal - the library would continue to lend, but under the restrictions enforced by publishers.  Providing an opportunity to own a book would enable the library user to continue to see the library as a source of information and reading material.

What is it that has kept libraries on the edges of the bookselling business?  Has it been the traditional separation of commerce and government?  That separation is blurring, with private/public partnerships being advocated by the fiscally-conservative politicians and activists.  Has it been the reluctance of librarians to view themselves as commercial?  Business owners are often both revered and reviled in large and small communities.  As public servants, librarians (including myself) like to consider themselves above the level of those who sell.  But don't we need to sell?  OK, usually we sell the need for free access to books, resources, and public space.  But how is selling ebooks any different from holding a book sale?

If the issue is the separation of commerce from government, then the Friends of the Library model should work as well for e-books as it would for used book sales.  Links in the catalog could direct users not to Amazon (directly or indirectly via Google Books), but instead to the FoL's ecommerce site where the ebook could be purchased.  FoL would continue to donate the money raised to the library.  Is this really inviolate?  Does this really change the identity of the library?  If the library still continues to loan material free of charge, isn't it still a library, even if it sells?  I'm not sure...

Tuesday, May 15, 2012

Library Data and issues of granularity versus privacy

A new posting on the Library Impact Data Project blog discusses an interesting dilemma that is all too common when analyzing raw data.  How detailed can and should you get when dissecting that data?  In this case, the problem is regarding user groups and the potential for revealing identities.  The author is concerned that presenting data on all possible user groups might inadvertently reveal individuals' identities due to the very small numbers in certain groups.  In this day and age of IRBs and academic privacy concerns, that would be "a BIG data protection no-no."  Then, the issue becomes how to aggregate the groups in a logical and useful manner.  Simply combining two groups (in the author's example, say, Black and Chinese ethnicities) would not necessarily be either logical or useful.  Indeed, the author argues that there should be some commonality among the groups combined.  In the public health setting where I came from, the ethnic groups were based largely on outcomes - the groups that had the worst outcomes would be compared with the groups that had the best outcomes.  Another issue to consider, which the author notes later in the posting, is the impact of the results - what could be done to change the outcomes?  While librarians cannot change a person's ethnicity, they could direct programs that work more effectively to these groups.  Qualitative research could be conducted to determine the reasons for differences in the measured factors that led to differences in outcomes.

In this case, the factors were the number of EZproxy logins and the number of downloads, and the outcomes were graduation levels (based on grades).  I am not sure whether they looked, but I could not find any publication that examined factors such as race and home country against the outcomes.  This would be important to consider for aggregation in addition to similarities of groups.  In this case, the White students used the electronic resources the least of all ethnic groups.  If this group of students had greater rates of higher graduation levels, then perhaps ethnicity is a confounder, changing the relationship between e-resource usage and grades.  Those from other countries in the European Union were using the e-resources the most - did they have a harder time using the resources?  Was there a language barrier?  Cultural differences in the organization of the libraries' resources?

The author does a good job of describing the issues associated with analyzing data - it's not simply comparing the averages of two groups.  You need to consider aspects of the data set (number in each group, number in the entire set, the distributions of the values, the kind of measurement used), the relationship of the variables and groups (are the groups independent, are the variables distinct, are there similarities of the groups), as well as the context (the environment, the language, the culture, the population, etc.).  In other words, you need to know your data well.
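
The small-group concern discussed in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold of 5, the group labels, the counts, and the merge map are all hypothetical and do not come from the Library Impact Data Project. The idea is to fold groups that are too small to report into analyst-chosen broader groups, and to flag anything that is still too small afterwards.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed disclosure threshold; not taken from the project


def aggregate_small_groups(counts, merge_map, min_size=MIN_CELL_SIZE):
    """Fold groups below min_size into analyst-chosen broader groups.

    Returns (reportable counts, sorted list of groups still too small).
    """
    merged = Counter()
    for group, n in counts.items():
        # Only small groups are redirected; large groups keep their own label.
        target = merge_map.get(group, group) if n < min_size else group
        merged[target] += n
    reportable = {g: n for g, n in merged.items() if n >= min_size}
    suppressed = sorted(g for g, n in merged.items() if n < min_size)
    return reportable, suppressed


# Hypothetical counts and a hypothetical, analyst-justified merge.
counts = {"Group A": 120, "Group B": 3, "Group C": 4, "Group D": 2}
merge_map = {"Group B": "Groups B+C", "Group C": "Groups B+C"}

reportable, suppressed = aggregate_small_groups(counts, merge_map)
print(reportable)
print(suppressed)
```

The merge map is deliberately supplied by the analyst rather than computed, since the point of the posting is that groups should only be combined when there is some commonality that makes the combination logical and useful.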

Thursday, May 10, 2012

The impact of Google on Library databases

A recent thread on the Summon discussion list has spurred further discussions elsewhere.  The thread started with a complaint about the lack of predictability among the results of boolean searches in Summon.  Essentially, it was hard to determine why particular results appeared in the list due to other "relevance factors" that "polluted" the ranking of the results.  The librarian mentioned having a preference for "the pure drop" - predictability.  From this initial comment, the discussion moved on to other discrepancies in the results of the same searches in different systems, notably relating to differences in processing boolean operators, stemming, and phrase searching.  This led to additional posts calling for "advanced" search functions, such as proximity operators, wildcards (or "hard stemming"), and search set manipulation, all of which could be described as "librariany".  The thread was pretty much ended by the comments from the other end of the spectrum - those who were concerned that such changes would revert the Discovery system back to a typical library database.  These posters pointed out that such systems were purposefully designed not to be like your typical library database - built for easy use by students and non-experts, not for the librarians.

This thread was picked up by Aaron Tay, Senior Librarian at National University of Singapore.  He started thinking about the effect Google and search engines in general have had on library databases, including catalogs and abstracts & indexes.  He considered how such systems have slowly adopted selected features and functions that users have come to expect in any search engine of any kind, notably ranking by relevancy, "implied AND", automatic stemming, and full-text searching (when possible).

As he discussed the implementations of these features, I noticed that predictability of search results is a common thread of concern among librarians.  We want to explain why each and every result is in that set.  We want to be sure we get "the pure drop".  This is an essential difference between librarians and the clients they serve.  Students do not appear bothered if there are a few irrelevant results - as long as they get more relevant ones than not.  They effectively "bleep" over them and move on.  Furthermore, for most students and public library users, the level of satisficing is lower than for librarians and some expert searchers.  These attributes may explain why our users (and many front-line reference librarians) love these Discovery services so much more than expert reference and research librarians.

In my view, Summon has done well to bring more resources to the attention of our users - full-text usage is up, as well as audiovisual downloads.  The expert searching features and functions should only be introduced if they do not interfere with the system's core feature - a simple keyword search interface.  Certainly, I agree that they should not "treat it like the 'normal'" library database.  There are different tools for different tasks - choose wisely.

Tuesday, May 8, 2012

Digging into Data Challenge - what's next...

While reviewing the ALA's summary of the ARL meeting in Chicago, my eyes locked onto the Digging into Data Challenge.  Sponsored by various humanities and social sciences organizations, this challenge offers prizes and prestige to those who develop innovative and interesting ways of gaining insight from untapped data resources.  The first round was in 2009, with recipients proposing ways of analyzing literature, music notations, letters, speeches, images and railways to understand the human condition.  The recipients for the second round were announced just this past January.  Key projects that especially caught my attention included:

  • An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
  • Electronic Locator of Vertical Interval Successions (ELVIS): The First Large Data-Driven Research Project on Musical Style
  • Imagery Lenses for Visualizing Text Corpora
  • Digging into Connected Repositories (DiggiCORE)
  • Integrating Data Mining and Data Management Technologies for Scholarly Inquiry
One that was particularly interesting to me was (emphases added):
Digging into Metadata: Enhancing Social Science and Humanities Research
Principal Investigators: Mick Khoo, Drexel University, IMLS; Diana Massam, University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: University of Glamorgan. 
Description: The project will automatically generate new forms of metadata tags from existing metadata records and associated resources that will support discovery across multiple repositories.  The project will utilize four repositories that vary in size, domain, metadata creation method and workflow, and quality.  PERTAINS, a tool developed by one of the partner schools, will be used to analyze the metadata records in each repository and then to generate Dewey Decimal Classification-based tags.  Clustering algorithms will be used to generate an index of similarity and match between resources in different repositories.  After conducting a search, the user will retrieve a list of resources from the different collections that have been tagged in similar ways. Visualization techniques will be used to display the results in ways that enhance the research process.
I look forward to seeing these completed projects in about 2 years.   

Friday, May 4, 2012

Data, data, data - on journal prices!

I was forwarded info about the latest release of the Journal Cost-Effectiveness project, put together by Ted Bergstrom and Preston McAfee.  They are researchers who have become quite heavily involved in the fight against overcharging by commercial publishers.  Because of their emphasis on continually rising subscription prices on top of very large profits, their data is divided into for-profit and non-profit entities.  Their methodology is well-described and transparent, and they provide the complete set of data for all years (2009-2011) for downloading.

The data set includes price (2010 prices) per article, price per citation (citations in 2009 papers to articles published 2004-2008), composite price index ("geometric mean of the Price Per Article and the Price Per Citation"), a relative price index (relative to mean of non-profit subscription price within same subject), and a ranked "value" based on the relative price index (<1.25 is "good", 1.25-2.00 is "medium", >=2.0 is "bad").
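
The measures described above can be sketched as follows. This is a minimal illustration, not the project's own code: the function names and the example prices are hypothetical, and the non-profit baseline is passed in as a plain number because the exact way the project computes it is not spelled out here. Only the geometric mean and the 1.25 / 2.0 cut-offs come from the description.

```python
import math


def composite_price_index(price_per_article, price_per_citation):
    """Geometric mean of price per article and price per citation."""
    return math.sqrt(price_per_article * price_per_citation)


def relative_price_index(cpi, nonprofit_baseline):
    """Composite index relative to a non-profit baseline for the same subject.

    nonprofit_baseline is an assumed input; how it is derived is not shown here.
    """
    return cpi / nonprofit_baseline


def value_category(rpi):
    """Map a relative price index onto the good / medium / bad categories."""
    if rpi < 1.25:
        return "good"
    if rpi < 2.0:
        return "medium"
    return "bad"  # >= 2.0


# Hypothetical journal: $4.00 per article, $9.00 per citation,
# and a hypothetical non-profit baseline of 4.0 for its subject.
cpi = composite_price_index(4.0, 9.0)   # sqrt(36) = 6.0
rpi = relative_price_index(cpi, 4.0)    # 6.0 / 4.0 = 1.5
print(cpi, rpi, value_category(rpi))
```

Treating 2.0 itself as "bad" follows the ">=2.0" wording, so the "medium" band is read as 1.25 up to but not including 2.0.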

I used SPSS to see the distribution of the measures by profit status (for- or non-profit):

Doing a t-test of the differences in the mean showed that the differences were, of course, all statistically significant.  But that doesn't do justice to the extreme differences between these two groups.  Also important is the amount of variation within each group.  This time I looked only at mean price per article for each of the 3 "value" categories.  Here's an error chart:
This demonstrates not only the extreme differences between the price per article for non- and for-profit publishers, but also the differences in variation.

Now, same data, only by value and clustered by profit status:

Notice that the variation in the "good" value category is the least (very tiny bars), is somewhat larger in the "medium" category, but much larger in the "bad" category.
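
For readers without SPSS, the same kind of comparison can be sketched with the Python standard library. The prices below are invented purely for illustration and are not from the Journal Cost-Effectiveness data set, and Welch's unequal-variance t statistic is used here as an assumption, since the groups clearly differ in spread; the variant reported by SPSS is not stated above. The mean and standard error per group are what an error chart would plot.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def mean_and_se(xs):
    """Group mean and standard error of the mean (the bar on an error chart)."""
    return statistics.mean(xs), statistics.stdev(xs) / math.sqrt(len(xs))


# Invented price-per-article values, for illustration only.
for_profit = [30.0, 45.0, 12.0, 60.0, 25.0]
non_profit = [5.0, 7.0, 6.0, 8.0, 4.0]

t, df = welch_t(for_profit, non_profit)
print("t =", round(t, 2), "df =", round(df, 1))
print("for-profit mean, SE:", mean_and_se(for_profit))
print("non-profit mean, SE:", mean_and_se(non_profit))
```

Reporting the standard error alongside the mean makes the point from the charts explicit: two groups can differ not only in their averages but also in how much they vary internally.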

This is a look at just one aspect of one measure of this very nice data set.  I'm going to have fun looking at the other aspects of this and the other measures.  But what I'd really like to do is merge this with our own list of journals and see the value we're getting.  Of course, the prices are based on the published subscription prices for libraries (for tiered prices, they used that for a large academic library), and may not match ours.  I am disappointed that they did not include all of the data, including prices, number of articles, and number of citations.  Maybe they'll provide it if I ask...

Wednesday, May 2, 2012

The Fifth Law

I find Ranganathan's Fifth Law of Library Science the most intriguing: The Library is a Growing Organism.  While the First Law can be considered a "Trivial Truism", which he compares to Newton's First Law of Motion, the Fifth Law is an abstraction of natural laws.  Essentially, "an organism which ceases to grow will petrify and perish."  Without this understanding, librarians and library administrators deny their charges the necessary nutrition (funding) and nurturing (planning), and these libraries will stagnate (lose their impact) and die (closed due to lack of interest or support).

He describes growth both in size (of collections, of readers, of staff) and in form (evolution), although he devotes most of this chapter to the former.  Ranganathan had already spent much of the first half of the book detailing the evolution of the laws of library science from "Books are for Preservation" to "Books are for use", and from "Books are for the Chosen Few" to "Every Reader His Book".  And while he "cannot anticipate fully...what further stages of evolution are in store for this Growing Organism -- the library...", his Fifth Law ensures the understanding that libraries will continue to grow and evolve, or they will perish.

Regarding the growth in size, Ranganathan laments the "modesty" with which library administrators underestimate the rate of growth of their charges. Even worse, he charges, is acting on this assumption of little change, and setting in motion "a faulty organisation obstructing the free development of a its full stature."

He then associates library collection sizes with publication rates in the world.  He lists the book production for 1927.  I added 2011 numbers for comparison:

Country            1927      2011   2011 as % of 1927
Russia           36,680   123,336    336%
Germany          31,026    93,124    300%
Japan            19,967    78,555    393%
India            17,120    82,537    482%
Great Britain    13,810   206,000   1492%
France           11,922    67,278    564%
United States    10,153   288,355   2840%

Ranganathan then lists the annual rates of accession in some of the largest libraries, listed here with the latest rates I could find:

Library                           1927      2011   Rate   Source
Library of Congress            202,111   454,212   225%   2010 Annual Report
Cambridge University Library    90,916   139,948   154%   2010 Annual Report
Birmingham Public Library       28,566   166,847   584%   Local Library Standards Audit 2010-2011
Imperial Library, Calcutta       7,832    62,773   801%   Calculated by comparing data from the Encyclopedia of Library Science [1] and the National Library's Web site

[1] Gaur, R. C., Jeyaraj, V., & Kumar, K. (2009). India: Libraries, archives and museums. Encyclopedia of library and information sciences, third edition (pp. 2291-2329). Taylor & Francis. doi:10.1081/E-ELIS3-120044942

I wanted to include others listed, but getting the acquisitions data on libraries is not easy.

It is interesting, though, that Ranganathan's attempt to associate publication rates with library collection size appears to have broken down over time. The LoC's annual acquisitions are only about two and a quarter times what they were over 80 years before, while the number of items published in the United States is about 28 times as large. This indicates that libraries (even the LoC) have ceased being the storehouses of all information. That merely changes the relationship of the publication rates with the collection growth rates.
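
The comparison above is simple arithmetic, and a short sketch makes it easy to repeat for other rows. The function name is only illustrative; the figures are the United States book production and Library of Congress accession numbers from the two tables.

```python
def percent_of_baseline(old, new):
    """Express the newer figure as a whole-number percentage of the older one."""
    return round(new / old * 100)


us_books = percent_of_baseline(10_153, 288_355)        # US book production, 1927 vs 2011
loc_accessions = percent_of_baseline(202_111, 454_212)  # Library of Congress accessions

print(us_books)        # 2840, i.e. about 28 times
print(loc_accessions)  # 225, i.e. about two and a quarter times
```

Note that these are ratios of the later figure to the earlier one, not percentage increases; an increase would be the same number minus 100.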

But Ranganathan moves on to describe how libraries should be able to accommodate the natural growth (which R. assumed would continue unabated). After addressing how to make the physical building and furniture flexible for this growth in size, he then addresses the need for the classification system to accommodate growth. Not surprisingly, he laments the Library of Congress Classification System for using a "primitive method of leaving gaps in the ordinary serial use of numbers," while promoting Dewey's system as "a demonstration of the immense potentiality of the decimal fraction." But most of all, he admonishes libraries to use standard classifications rather than tweaking systems or making their own. He does mention the Colon Classification system he was then developing (at the time of the second edition), noting that its mixed alphanumeric format and use of decimal formats are able to apply the three aspects of an analytico-synthetic scheme: phase, facet and zone.

Next, Ranganathan turns his attention to the growth in number of readers that will inevitably occur as both the population increases and the Second Law brings about changes to that population (greater availability of reading material, increased literacy, "open access" shelves). I worry a bit, though, that with the decreasing emphasis on reading print books, libraries have been moving their books into long-term storage. While there have been improvements that decrease the amount of time it takes to retrieve a book, this policy effectively closes the stacks again, reversing the efforts made over a century ago. While electronic books may eventually replace print books, the print-to-electronic ratio may take a while to reach 1:1. Will the shift of collections reduce the availability of materials to such an extent that it has a detrimental effect on access to information?

Finally, he addresses not only the growth in size of the library, but also in form. Here, however, his foresight is limited and he focuses on the progress that has already occurred. He discusses the gradual shift from keeping books chained like prisoners to "stock-taking" to limited "stock-use" to the "highly differentiated and complicated character of the organisation of the library to-day". He returns to the "growing organism" metaphor by celebrating the variety of libraries as "species" with their own "problems and peculiarities" but also with "common features".

He ends this chapter on his final law detailing "the vital principle of the library" -
"it [the library] is an instrument of universal education, and assembles together and freely distributes all the tools of education and disseminates knowledge with their aid."
There would be no better way for S.R. Ranganathan to end his treatise on library science and librarianship.