Wednesday, April 3, 2013

Dealing with the aftermath of cancellations

Due to factors beyond the immediate control of our library, we had to reduce our materials budget for this current fiscal year by about one-seventh.  This was my initiation into collections management: essentially, cutting journals and databases.  For journals, the key factors involved in deciding what to cut included:

  • Usage
  • Length of embargo from aggregator
  • Cost  (both total cost and cost-per-use)
Although we had worked with our subject liaisons and had communicated the cuts to the faculty through both the liaisons and our dean, it has only been since January that they've really taken notice.  This is because of the inevitable delays between the release of the final budget and the actual elimination of service.  One such incident provides a good example.

We had decided to cancel the subscription to Proceedings of the National Academy of Sciences (PNAS), primarily because our $3000+ subscription fee was paying for only six months' worth of content, and about 20-25% of those articles were Open Access.  The copyright fee for PNAS is $2.00 - a quick glance at the research articles in PNAS suggests an average article length of about 8-10 pages.  It would take, then, about 150 ILL requests to make up for the cost (OK, fewer than that if you factor in the indirect costs of ILL, but you get the point).  Indeed, there had only been one ILL request for a PNAS article this year.
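
For what it's worth, the rough break-even arithmetic (assuming the $2.00 copyright fee is per page and a typical research article runs about 10 pages - treat that reading as an assumption) works out like this:

    $2.00/page × ~10 pages/article ≈ $20/article in copyright fees
    $3,000 subscription ÷ $20/article ≈ 150 ILL requests to break even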

Yet our liaisons are hearing from (admittedly, a vocal few) irate faculty who are indignant that we no longer carry a current subscription.  I hope the liaisons inform these reasonable people of the factors involved.  I wish we could include this information in a Notes field in the MARC record -- maybe they'd read it, maybe not.
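
If we ever did, something like a local note (field 590 is the MARC 21 local-note field; the wording here is purely illustrative) might at least be there for the determined few:

    590 ## $a Current subscription cancelled FY2013 due to budget reductions; recent articles available via Interlibrary Loan or PNAS Open Access.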

Sunday, March 17, 2013

Research misconduct in LIS?

The editorial, Bogus Evidence by R. Laval Hunsucker in the latest Evidence-Based Library & Information Practice (EBLIP), discusses at great length the potential for research misconduct in the LIS field.  After setting the stage with the recent apparent rise in "questionable research practices" (QRP's) and outright fraud in the basic, medical, and social sciences, Laval brings our attention to our own profession, or rather, to the lack of attention that our profession has given this issue.  He questions whether we, as members of the LIS profession, should consider research misconduct to be more prevalent, less prevalent, or equally prevalent in LIS as in other research fields.  He admits that there is enough evidence to support any of these positions, yet not enough evidence to reach any conclusions.  And that, Laval asserts, is the crux of the problem.  Why should we assume that we are any better (or worse) than any other field?  And if we are neither better nor worse, shouldn't we be concerned that we are equally bad?

Laval brings up some very valid points, particularly regarding the difficulty of detecting research fraud.  Indeed, having been involved in a few research studies, I can imagine where fraud could occur, if so desired, particularly with the data.  Auditing the data collected is rarely done, and yet it is, I believe, the weakest (or easiest) point.  I have heard of large surveys where a single survey-taker fraudulently completed forms.  Proper follow-ups detected the problem, but not before so many forms had been submitted that the integrity of the entire study was threatened.  But that is an easier problem to detect, because the researchers themselves were conducting the study with integrity.  The more difficult cases occur when the researchers manufacture the data.  Only a full data audit could detect this, but as mentioned above, this is so rarely done because it is difficult and time-consuming (and thus, costs money).

So, like so many researchers, those conducting studies in LIS are trusted to collect, analyze, and report their data in an unbiased and appropriate manner with little oversight.  Is this trust justified?  It appears that in this day and age of competition for jobs, promotions, and respect, people are growing more deceitful.  But could we not say, too, that in this day and age of growing transparency with the Internet, people are growing more skeptical and distrustful?

Professionally, I'm less concerned with outright fraud in the LIS literature than with QRP's arising from poor training and limited knowledge.  This is particularly true of studies conducted by practicing librarians like myself, rather than by LIS research faculty, who have completed more formal training and apprenticeship in research in the form of the dissertation.  Most practicing physicians do not initiate and formally conduct clinical studies.  Many participate in research, but very few actually develop the proposals, gather the data, analyze the results, and write up their own studies.  Yet academic librarians are very often expected to do all of this themselves, often with less research training than physicians receive.  So why should we not expect QRP's to occur?  Laval himself notes that the "good news, and the other important difference, is that genuinely fraudulent research in LIS is almost certainly far less prevalent than sloppy research in LIS."

Actually, the apparent increase in research misconduct does need more study in order to establish the conditions under which our research can be trusted.  But to judge all studies with a jaundiced eye will make it more difficult to move ideas into practice.  Laval discusses the many proposed solutions, ranging from changes to rewards and incentives to formalized research integrity training.  Like any complex problem with multiple root causes, no solution addressing just one of them will work by itself.  But as they tell those struggling with mental health or addiction problems, simply recognizing that the problem exists is the first step.

Saturday, March 2, 2013

New results from impact studies

The ACRL Value of Academic Libraries site highlighted two articles recently published from the University of Minnesota, led by Shane Nackerud.  Both articles appear in the April 2013 issue of portal: Libraries and the Academy, although I'll be linking to their institutional repository.  The first article describes how the data were collected and analyzed and provides basic demographics of the users of the library's services.  The research questions in this paper were:
  • Do sufficient measures exist to determine what services individual library patrons use?
  • Do the Libraries reach the majority of students in some way?
  • Do students in different colleges use library materials and services in different ways?
  • How does undergraduate library use compare to that of graduate students?
With increasing use of methods that capture, essentially, who uses each service, they were able to link usage with demographic and academic data about each user.  The "access points" or services for which data were captured included:

  • material circulation
  • ILL requests
  • library workstation logins 
  • usage of electronic resources for those who were off-campus or those who logged into the library workstations
  • attendance at workshops
  • reference consultations
  • course-integrated instruction (through Blackboard)
While not a complete set of data from all library services, this set does represent much of what the library provides.  Missing are usage of electronic resources from those on-campus and not using library workstations, in-house usage of materials, brief reference transactions, and visits to the library.  The authors admit that such data, especially the last, would be essential for measurement of "library as place", but they express concerns about over-reaching to such an extent that it would affect usage of the very services they would be measuring.  

But it's still a big data set - over 1.5 million transactions from over 61,000 unique users.  They were able to work with their Office of Institutional Research (OIR) to get the demographic and academic data.  This step has been a common obstacle to such impact research for many libraries, whose own OIRs are reluctant to share the information.  The solution seems to be to split the work between the two campus units - the library gathers the user identifiers and the OIR provides the demographic/academic data, returning an anonymized data set to the library.  This way, neither party should have access to the entire data set, thus securing privacy that much more.
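
A minimal sketch of that division of labor, as it might run on the OIR side (hypothetical file and column names; pandas assumed - this is my guess at the workflow, not their actual code):

    import pandas as pd

    # The library supplies only (user_id, service, date); OIR holds the demographics.
    transactions = pd.read_csv("library_transactions.csv")   # user_id, service, date
    demographics = pd.read_csv("oir_demographics.csv")       # user_id, college, class_year, gpa, retained

    # OIR joins the two, then strips the identifier before returning the file,
    # so the library never holds identifiers and demographics together.
    merged = transactions.merge(demographics, on="user_id", how="left")
    anonymized = merged.drop(columns=["user_id"])
    anonymized.to_csv("anonymized_usage.csv", index=False)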

With the complete, anonymized data set, the librarians were able to run a correlation analysis to determine whether academic achievement was in any way associated with library usage of any kind.  Pretty basic...and, to no one's surprise, there was a significant correlation.  This is very much in line with other similar studies, such as the Library Impact Project.  I found this part interesting, though (emphasis added):
Already library staff have been able to share this data with University deans and administration and the feedback has been both positive and somewhat unexpected. For example, while University administrators have been enthused by the results, they are also not surprised. It seems intuitive that libraries should be able to demonstrate appropriate levels of usage, and that usage should result in increased academic success.
How frustrating!  University administration puts libraries (and others) under pressure to justify our value, our impact on student achievement, then says, "So?"!   Maybe this demonstrates a need to find out exactly what measures administrators expect.

In the second paper, the authors look at the subset of the population that could provide the clearest association between library usage and academic outcomes: first-year, non-transfer students.  This group has the fewest confounders, such as previous college experience, to muddy the results.  They looked at the effects of library usage on both student achievement (grades) and retention.  Their statistical analysis was more sophisticated than you typically see in LIS research, using not only t-tests and chi-squared tests to determine the significance of differences between groups, but also effect sizes (medium) and multiple linear and logistic regression.  They attribute the (relative) richness of this analysis to their own outreach to campus statisticians, and they recommend libraries not try to do it all themselves.  Hear, hear.  The value of doing this analysis is that the authors demonstrated not only the size of the relationships they had found (significantly positive), but also the limitations of those relationships (most correlations were small).
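
For librarians wanting to try something similar, a bare-bones version of those group comparisons (entirely made-up numbers; scipy and numpy assumed) might look like:

    import numpy as np
    from scipy import stats

    # Hypothetical GPAs for library users vs. non-users
    users     = np.array([3.2, 3.5, 2.9, 3.8, 3.1, 3.6])
    non_users = np.array([2.8, 3.0, 3.3, 2.7, 3.1, 2.9])

    t_stat, p_val = stats.ttest_ind(users, non_users)          # significance of the difference
    pooled_sd = np.sqrt((users.var(ddof=1) + non_users.var(ddof=1)) / 2)
    cohens_d = (users.mean() - non_users.mean()) / pooled_sd   # effect size

    # Chi-squared test for retention (rows: users / non-users; columns: retained / not retained)
    chi2, p_chi, dof, expected = stats.chi2_contingency([[520, 80], [400, 150]])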

However, with their models, they were able to demonstrate a significant effect of any library usage on GPA while controlling for demographic factors.  Essentially, users of any library service had a GPA 0.23 points higher than non-users, and usage of the library accounted for 12.4% of the difference between the groups.  Given how many factors went into the model (12), that's a bigger chunk than expected (1/12th, or 8.3%).  The second model broke the services out.  Not unexpectedly, these individual effects were much, much smaller.  The only services that showed statistically significant effects were database use, book loans, and workstation logins, and while the sizes were small, these were services that would be used repeatedly over the course of the semester.  The totality of the usage of services represented a larger share of the effect - 13.7%.
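
A toy version of that kind of model (statsmodels assumed; the demographic controls here are hypothetical stand-ins for whatever the authors actually used):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("anonymized_usage.csv")   # gpa, used_library (0/1), plus demographic controls

    # Linear model of GPA on any library use, controlling for a few demographics
    ols = smf.ols("gpa ~ used_library + C(college) + C(gender) + act_score", data=df).fit()
    print(ols.summary())                       # the used_library coefficient plays the role of the 0.23-point gap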

Finally, their logistic regression models were constructed to predict student retention based on either usage of any service or usage of specific services.  This kind of model demonstrates the strength of the relationships by showing how well library services can predict, or explain, outcomes.  This is a key aspect of research - can a variable predict a specific outcome?  If it can, then it can be used to change the outcome.  These models were both significant, even when adjusting for demographic factors.  Another important feature of logistic regression analysis is the calculation of the odds ratio (OR), which measures the size of the effect of the variable on the outcome.  In this case, students who used any of the library's services were 1.54 times more likely to continue to the next semester than those who did not.  Conversely, few of the individual services showed significant effects on retention, and those that did were likely artifacts of small sample sizes (few attendees).
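
And a matching sketch for the retention model, with the odds ratio pulled out of the fitted coefficients (again statsmodels, again hypothetical columns):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("anonymized_usage.csv")    # retained (0/1), used_library (0/1), demographics

    logit = smf.logit("retained ~ used_library + C(college) + act_score", data=df).fit()
    odds_ratios = np.exp(logit.params)          # exponentiated coefficients = odds ratios
    print(odds_ratios["used_library"])          # the paper reports roughly 1.54 for any library use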

So, what does this all mean?  Using these moderately sophisticated statistical analyses is very much like triangulation - analyzing data from different angles to see the true picture.  This picture shows that there appears to be a modest relationship between usage of any of the library's services and student achievement and retention.  However, picking out which services had the biggest effects is more difficult.  The linear model showed database logins, workstation logins, and materials circulation as having a small effect; this doesn't show up, though, in the logistic model.  More evidence, then, is needed.

It is somewhat disappointing that more interpersonal services, such as instruction and consultation, showed much lower effects.  This, I imagine, is due in no small part to the size of the data set.  Usage of these services was much lower compared to the more self-service, well, services.  This could hide any association for the less-used services because of higher standard errors.  Another problem with studying such low-usage services is selection bias.  If this can be controlled - for example, by randomly selecting classes to receive instruction - then any real effects should be easier to detect.

These articles, I think, are invaluable to the effort of demonstrating value.  Like all applied research, each is but a piece in the overall puzzle.  Neither is sufficient on its own, but with more such studies filling in the gaps in our knowledge, the picture becomes more and more clear.  It would be nice, however, if those who are the intended audience of such studies (presumably the campus decision-makers) would show more interest.

Tuesday, February 19, 2013

Collection Assessment: Going in the right direction


For the last six months or so, I've been trying to develop a more systematic method of evaluating our collections, incorporating different kinds of measures.  So it's nice to see examples from other libraries, as demonstrated by the slew of posters and presentations from the last Library Assessment Conference.  Here are highlights from a few that piqued my interest...

From UC Berkeley, Susan Edwards et al. describe their evaluation of the library's collections based on three types of measures: collection uniqueness (overlap with their closest peer), direct usage (cross-tab analysis of book usage by location and patron affiliation), and indirect usage (citation analysis of dissertations).  This is very much the direction I've been working in, evaluating the collection from different angles using these exact same measures (among others).  For collection uniqueness, they point out both that having a fair amount of overlap is appropriate and that there is no national benchmark for overlap percentages.  How unique should the collection be?  I'd be interested in perhaps collaborating with UC Berkeley to come up with that national benchmark for overlap.  But the citation analysis was most interesting, in part because they used a random selection of citations.  A major obstacle in conducting citation analyses is the time and labor necessary to gather and record each citation.  This is really not necessary if a random selection is used appropriately.  I'd really like to learn exactly how they did their selection.  From this analysis, they learned that their monograph collection did not meet the needs of the social welfare students as well as other collections did.  An interesting feature of their poster was an interactive slide on which visitors could add stickers indicating how well they estimated their own libraries met their users' needs.
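
I don't know exactly how they drew their sample, but the basic idea of random selection is simple enough; a minimal sketch (with a stand-in list of citations) would be:

    import random

    # all_citations: one entry per citation pulled from the dissertations (thousands, in reality)
    all_citations = ["citation A", "citation B", "citation C", "citation D"]

    random.seed(2013)                            # so the sample can be reproduced
    sample = random.sample(all_citations, k=2)   # k sized to give an acceptable margin of error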

From the University of Maryland University College Library, Lenore England and Barbara J. Mann describe their efforts to centralize the evaluation of electronic resources.  Their poster described the criteria included in the evaluations, as well as the methods of communicating with faculty and students about the review process.  What was most interesting was the use of a LibGuide to both document the process and communicate progress to those who may be most affected by the collection development decisions.  The LibGuide not only makes the process transparent, but also provides an opportunity for comments from stakeholders.  This may be a useful method to employ in our next go-around of budget cuts.

Alicia Estes and Samantha Guss from NYU described their methods for Data Gathering and Assessment for Strategic Planning.  This was accomplished using a team-based approach, with librarians from a wide range of divisions of the library.  The team gathered data to be used in the planning process, including summarizing recent library assessment activities, discovering and producing an inventory of data collected, and "identifying trends."  In addition to providing data for strategic planning, the poster listed some lessons learned from this project.  These included discovering a need for more training in gathering, analyzing and understanding statistics, the need for an individual explicitly responsible for gathering and managing data ("to 'own' assessment"), and most notably, the need for a "more uniform process for data collection."  Alicia and Samantha, I feel your pain.

But this is a good lead-in to a set of posters on developing such processes and repositories.  Joanne Leary and Linda Miller describe Cornell Library's implementation of LibPAS for their annual data collection.  This caught my eye because we, too, are implementing LibPAS as a central repository for our statistics.  Some of the challenges, opportunities, and "Conceptual Shifts" seemed quite familiar, including the "chance to review and rethink" data collection, the challenge of a large and complicated organization, and the shift to having standardized data that is immediately available.  Although it's a little late for us to learn from their efforts, it is good to know with whom we could collaborate or to whom we could go for ideas.  Nancy B. Turner, from Syracuse University, described their use of SharePoint for data collection.  Their document repository was most intriguing, with its "structured metadata for filtering results".  Finally, there is the poster from Kutztown University Library (you learn something new every day), which describes their efforts to combine their locally-grown data repository system (ROAR) with the university's TracDat system.  Again, this caught my eye because of our use of TracDat for campus assessment.

Of course, the latest efforts have been to associate usage of library resources and services with student outcomes, notably grades.  The poster from the University of Minnesota focused on using data already available to the library to make this connection.  This included circulation, computer workstation logins, e-resource logins (mostly from off-campus users), registration for library instruction, and individual consultations.  Despite certain limitations of these data, they were able to demonstrate clear quantitative associations between a number of these measures and student grades and re-enrollment.  They do not mention whether these associations were tested for statistical significance, but I am definitely interested in their methods.

Overall, I realized how much I missed from last year's Library Assessment Conference and what I hope to contribute this coming year.

Sunday, February 10, 2013

Favorite TEDTalk of the week

For the last few months, I've been trying to schedule a time on Sunday mornings to watch the latest TEDTalks posted during the week.  Today, I'm having to catch up after missing a couple of weeks for various reasons.  While most are quite interesting, I did want to highlight one that I found most intriguing and/or inspirational.

Tyler DeWitt talks about his efforts as a middle-school science teacher to explain science without the "tyranny of precision" and without conforming to the "cult of seriousness".  His key line is, "Let me tell you a story...", advocating the use of storytelling as a tool for engaging students.  Tyler comes across like a person who was born and raised in a cult and has just discovered, and is bewildered by, ideas such as speaking your mind and the freedom to believe in any (or no) religion.  He (metaphorically) wonders why others haven't thought of this before.  It's almost as if Tyler believes he's the first person to use storytelling to reach students in science.  And it's quite reasonable for him to believe this.  He, himself, has been indoctrinated into the "cult of seriousness" and has been complicit in the "tyranny of precision" through his own science education and his training to teach and write about science.  So we can forgive him his seeming egocentrism and examine the meat of his argument: make the teaching of science engaging and inspiring to students by using simpler language, metaphors that the students can relate to, and occasional "little lies".  He winces when people call it "dumbing down" (you'd be surprised at how many of the negative comments use this exact phrase).  The "little lies" are another aspect that some people take exception to.  But Tyler affirms that it is better for students to learn overall concepts that may not be 100% accurate than to not learn the concepts at all.

I always find it interesting to read the comments on TEDTalks to find out what others think.  Inevitably, there are the gushing "Right on!" and "Amen!" comments, but just as inevitably there are naysayers.  I think this exchange is good and right.  And I found it quite interesting that the bulk of the negative comments came from fellow science teachers who express the exact sentiments that Tyler advocates against: a tyranny of precision, objections to the idea of "little lies", and worries about the "dumbing down" of science.  I think the concerns about the "little lies" are due to a poor choice of words.  From what I understood, Tyler is not advocating lying per se, but rather not being 100% precise.  Using an example from his talk, by leaving out the fact that a few viruses use RNA instead of DNA, he avoids confusing students as they attempt to understand the basic concept of bacteriophage viruses.  The more exceptions you throw into an explanation, the harder it is to understand.  So, while Tyler is not telling the whole truth, he is, nonetheless, telling the truth.  The second issue, "dumbing down", is also about choice of words, in this case by the naysayers.  Tyler emphasizes "simplifying" the language, not "dumbing it down".  I believe the difference is based on your assumptions about how learning should happen.  If you believe that it is the individual student's responsibility to "do the work" and understand it on his or her own, then using simpler language is "dumbing down".  If you believe it is the responsibility of the teacher to explain and to, well, teach, then using simpler language is one tool of many.

Finally, the context of the class is key to the teaching methods used.  Tyler (like most science teachers in primary, secondary, and lower-level undergraduate education) is teaching students, many of whom will not even go to college, let alone become scientists or science teachers.  His goals are to have his students understand the basic concepts of science and the scientific method, and to inspire an interest in discovering the ways of nature.

So, what does this have to do with librarianship?  Well, librarianship is an extension of education, and we often use library-centric terms and teach searching skills using abstract concepts like "Boolean" and "relevance".  Just as with science teachers, there are some who think we should teach the terminology and not "dumb it down".  I believe there is a compromise of sorts - teaching the terminology by telling a story.  Similarly, by using metaphors and storytelling to explain the concepts of information literacy, our goal should not be to make our students professional searchers, but rather to help them find relevant and useful information and to evaluate the sources, and the potential biases, of that information.

Saturday, February 9, 2013

The data you need?

Walt Crawford, who has done some rather amazing analyses of library data, notably the freely-available data from IMLS (for public libraries) and NCES (for academic libraries), appears to be feeling a little, well, under-appreciated.  In his post, "The data you need? Musings on libraries and numbers" (which he admits to having edited "to reduce the whininess"), he expresses his concern that there appears to be a lot of data out there but nobody seems to care.  He cites a series of examples, including the low sales of his own work, the apparent demise of Tom Hennen's American Public Library Ratings (which, admittedly, I was previously unaware of), and the unfortunate circumstances that led a PhD colleague to pursue other avenues because there were no jobs analyzing data in libraries.  This last example hits home with me because I feel quite fortunate to hold the title of Collection Assessment Librarian.  Not only am I paid to analyze data about our collections, but that is my primary responsibility; it was not tacked onto the list of responsibilities of the Collection Development Librarian or the Reference Librarian or even the Dean.  This is what I do.  I do not mean to boast, but rather to express my appreciation.  I also hope to point out that while such data analysis work may be under-appreciated, I think there is interest, however scattered it may be.

But this is a general problem of the LIS field itself - are we a science or are we a profession?  Can we be both?  The science of LIS implies that data are analyzed to answer fundamental questions regarding the who, what, when, where, why, and how of libraries and information.  But the big questions are almost always asked by the academicians.  Those in the profession are generally more concerned with the little questions regarding their collections, their budgets, their users.  What I think is needed is a greater connection of the little questions to the bigger ones.  How is the collection at my library affected by the economic forces of scholarly communication?  In exactly what ways will the local, state, and national economies affect the services and collections of my library?

My concern is that we are not preparing professional librarians to make these connections.  While "research" is mentioned in 16 of the approximately 90 sections/standards in the 2008 Accreditation Standards, courses in research methods and data analysis are only haphazardly required of LIS graduates.  Of the 3 LIS schools in Texas, only UT requires a course in research methods, and even it does not require one in statistics.  While I don't think that a practicing librarian needs to have the same training and skills as an epidemiologist or social sciences researcher, I do believe they should be able to read and evaluate published research and apply it to their smaller questions.  I also think they should be able to conduct small-scale studies to answer their questions using methods that will provide valid answers.

Walt goes on to question the value of his own contributions, given the lack of response from the library community.  He is, essentially, taking a sounding, asking: Is anybody there?  Does anybody care?  Well, Walt, I think we do care, and some of us do read your results.  I think what is contributing to this apparent anomie is not disinterest, but perhaps a kind of paralysis - what do we do with this?  It is interesting that the libraries in my state have generally moderate circulation rates, or that circulation is correlated with expenditures.  But what can I do with that information?  While this may be taught in the core curriculum of MLS programs, it may be forgotten as graduates enter the workforce and get sucked into the drudgery of their everyday routines.

Trying to address these issues, he asks some questions which are quite familiar to me:
  • Am I asking the right questions?
  • Is there any analysis that is worth doing?
  • How can this information be made "meaningful and useful to librarians"?
  • Are librarians "willing to deal with data at all–to work with the results, to go beyond the level of analysis I can do and make it effective for local use"?
  • Can librarians "get" the differences in statistical measures, such as averages versus medians?
I don't believe the answer to these is clear.  It is probably something like: some of us do care but don't get it; some of us get it but don't care; some care and understand it, but don't have the time; and some of us are quite interested and can follow through.  And it's not clear whether this last group is growing in numbers or just staying on the fringes.
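
As a toy illustration of why that last question matters with skewed library data (numbers made up):

    import statistics

    circs = [0, 1, 1, 2, 3, 120]            # one heavily-used title skews the mean
    print(statistics.mean(circs))           # about 21.2
    print(statistics.median(circs))         # 1.5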

Finally, he asks the one question that got me to start this post in the first place: What is the data we need?  This struck me because I've been working on my first collection assessment using a more formal process, and I keep thinking of other measures to include (a rough sketch of the first of these follows the list):
  • Relative circulation rates compared with holdings rates
  • Distribution of age of books
  • Distribution of materials by type, age and usage
  • Comparisons of these against our peer institutions
  • Comparisons of databases against our peers
  • Spending on these subject areas compared with our peers
  • Acquisitions of recognized materials (highly-recommended, highly-cited, award-winning, etc.)
  • Coverage of resources in databases
  • Usage of all of our resources (notably electronic)
  • Publication of materials in this area, especially given changes to the ecology and economy of scholarly communications
  • Impact of primary research materials on the field and in the school
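
To make the first of those concrete, here is a rough sketch of a relative-use calculation (hypothetical per-subject counts; pandas assumed), where a ratio above 1 suggests a subject circulates more than its share of the holdings would predict:

    import pandas as pd

    df = pd.DataFrame({
        "subject":  ["Biology", "History", "Nursing"],
        "holdings": [12000, 30000, 4000],
        "circs":    [3000, 2500, 2200],
    })
    df["pct_holdings"] = df["holdings"] / df["holdings"].sum()
    df["pct_circs"]    = df["circs"] / df["circs"].sum()
    df["relative_use"] = df["pct_circs"] / df["pct_holdings"]
    print(df.sort_values("relative_use", ascending=False))
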
Some of this data we have or can start collecting.  We are paying dearly for the use of the WorldCat Collection Analysis System so that we can compare with our peers (I understand the risks and problems with this but we believe it can still provide valid trends and comparisons).  We have been working to standardize how circulation and in-house usage data is collected at the different collections or libraries within our system.  And we have been working to bring all the data into central repositories to make comparisons and analysis a little easier.

Other measures, notably those of usage and publication, are notoriously difficult to obtain.  It would be very useful to know whether our usage of selected databases differed significantly from usage of the same resources at our peer institutions.  Heck, even after nearly 10 years of the COUNTER standard, our own usage data is still quite difficult to compile and understand (for some vendors, the data is absolutely worthless because it is masked by queries through a common interface).
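
Even so, once the vendor files have been beaten into a common shape, the actual compilation is the easy part; a sketch (assuming each report has already been normalized to journal / year / total_requests columns, which is where all the real work lies) could be as simple as:

    import glob
    import pandas as pd

    # The hard part - reconciling each vendor's quirks into these three columns - happens before this.
    frames = [pd.read_csv(path) for path in glob.glob("counter_reports/*.csv")]
    usage = (pd.concat(frames)
               .groupby(["journal", "year"], as_index=False)["total_requests"]
               .sum())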

So, Mr. Crawford, I just wanted to say, I feel your pain.  It can seem lonely doing all this work without the formal recognition and the ultimate expression of value (money).  This is why I will start looking at the NCES data more carefully and try to think of how this information can be applied locally.  And I finally put my money where my mouth is...I purchased the book (print is still my preferred format) and downloaded the Graphing ebook - you now have one sale.

Saturday, February 2, 2013

What could I do with 20 extra hours a week?

This past week, I hired two library student assistants.  This is a first for me: being solely responsible for interviewing, hiring, and training anyone.  While each student has other responsibilities, about half of their time can be spent on tasks that I assign.  So, this adds up to one part-time worker!  Now, what could I do with essentially 20 extra hours a week?  Here are some ideas:

  • Compare our holdings in specific subject fields with those of other institutions.
  • Check the rate of ownership of award-winning books by year of award.
  • Analyze the distribution of circulation by subject area, year of publication and patron type and compare it with the distribution of usage of ebooks.
  • Analyze the life-cycle of materials by format (print book, ebook, media), determining the time to first use and the length of "time on shelf" between uses (a rough sketch follows this list).
  • Compare the items requested via ILL from previous years to determine how many we eventually gain access to.
  • Determine differences in MARC records and other metadata between items used and items not used.
  • Compare the usage of ebooks through different vendors to answer the question, Does platform matter?
  • Conduct a Brief Test of Collection Strength.
  • Conduct a basic descriptive assessment of our media collection.
  • ....I've only just begun....
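
As promised above, here is a rough sketch of how the life-cycle task might start (hypothetical item export; pandas assumed):

    import pandas as pd

    # One row per item: acquisition date and date of first checkout (blank if never used)
    items = pd.read_csv("items.csv", parse_dates=["acq_date", "first_checkout"])
    items["days_to_first_use"] = (items["first_checkout"] - items["acq_date"]).dt.days

    # Count, median, and mean days to first use, by format (print book, ebook, media)
    print(items.groupby("format")["days_to_first_use"].describe()[["count", "50%", "mean"]])
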
Admittedly, it will take me some time to develop the methods and document the procedures for the assistants to do these tasks.  But these are pretty smart students who have a desire to do well and gain experience.  I, too, look forward to developing some basic managerial and supervisory skills and experience.

Although you could say they are only students, I feel a responsibility to mentor each one and provide an introduction to professional librarianship in an academic setting.  Towards that end, I am considering providing a space for students on this blog, in which they could develop their writing skills.  I'd be interested in learning more from them.