By Andrew White, PhD (Director, Library Information Services, Rensselaer Polytechnic Institute)
Column Editors: Tamir Borensztajn (Vice President of SaaS Strategy, EBSCO Information Services)
and Kathleen McEvoy (Vice President of Communications, EBSCO Information Services)
Against the Grain V33#6
The collection and use of analytics by libraries is not a 21st-century innovation. Numbers associated with library operations and services, such as gate counts, circulation transactions, community programs, total bound volumes, individual book titles, reference questions answered, and linear feet of shelving, represent some of the types of analytical data libraries have been collecting for a long time. Which analytics should be, and are, collected can vary by library type. But regardless of whether the library supports elementary or secondary education, a public community, an institution of higher education, a hospital, a law firm, or a corporation, gathering analytics has been and remains part of library operations.
The aforementioned analytics examples address data about the physical library, in terms of both collections and services. But clearly, the rise and adoption of computer technologies and networks have shifted both the focus and the emphasis of the types of data libraries gather for analysis. Some fifteen years ago, I wrote about the then-evolving discipline of e-metrics for libraries. Certainly, much has changed since then, including the methods, variety, and detail of e-metrics and analytics. Many of these changes have been for the better. For example, Project COUNTER has greatly improved libraries’ ability to assess the usage and financial value of the digital information formats within their collections. Yet some of the challenges that existed even prior to the digital transformation still persist, while other potential analytics options have perhaps increased due to conditions catalyzed by the global COVID-19 pandemic.
But first, why do libraries gather and analyze data? Most likely, libraries use analytics to help understand and improve the utilization of services and collections. Such data is also likely used to demonstrate library impact, value, and return on investment to a parent organization or governing body. But with the majority of library analytics initiated, and evolving, as quantitative information for the purpose of developing operational excellence, how can these same analytics be leveraged to tell the story of library impact?
To put this question in context, let us compare pre-pandemic and current pandemic scenarios for a library. Under relatively stable pre-pandemic operating conditions, a library can provide physical and virtual access simultaneously to a spectrum of services and collections. Analytics for gate counts, book and media circulation, and reference transactions can serve as indicators of activity volume. By extension, the same data act as proxies illustrating some degree of library value, on the assumption that more activity indicates greater assigned value.
However, what if those same quantitative analytics were interpreted by the library’s parent organization or governing body under pandemic conditions? With pandemic requirements for social distancing or the quarantining of materials, along with various restrictions on the physical library, the analytics for physical services and collections would show a dramatic decrease. Meanwhile, analytics for current virtual library services, such as the number of virtual chat reference questions, library website sessions, and full-text downloads of online journal articles, would show an increase. Comparing these two different sets of service analytics, the disparity between the numbers could easily support administrative conclusions favoring cost savings or budget reallocations. With reduced activity in the physical library, why not reduce library funding, cut staffing through furloughs or layoffs, and shrink other services that appear non-critical in the analytics? The potential danger here is that quantitative library performance analytics, without broader context, can lead to detrimental decisions.
In the above pandemic scenario, supplemental non-library analytics are the missing elements for better administrative decisions. And additional non-library analytics are needed to paint a more comprehensive picture of library impact. For example, libraries gather quite a bit of quantitative information for collection development purposes as a way to maximize finances while supporting a diversity of topics, subjects, and information formats available and relevant to their respective user community. And even the previously noted examples of library service analytics, while excellent quantitative indicators of activity, fall short of providing concrete qualitative evidence of impact and value.
Numerous efforts to assemble library value and impact analytics are already underway. Return-on-investment calculators and large-scale user surveys are among the methods that have used analytics to demonstrate the value of academic, public, and corporate libraries. To illustrate the effect that library operations have on the outcomes or performance of the parent organization and its stakeholders, such methods require combining various analytics and data. At the University of Illinois Urbana-Champaign, for example, the university
“…examined the use of citations drawn from library resources in grant proposals, the success rate for proposals, and the average grant award. The university provided institutional data on the percent of faculty who are principal investigators, their success rate with grant proposals, the amount of university grants, and the library budget. A survey was conducted with UIUC faculty that validated assumptions in the model and provided measures that confirmed the importance and frequency of citations in grant proposals, and the likelihood that the citations used in grant proposals were drawn from library resources.”1
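The model described in the quotation can be illustrated with a back-of-the-envelope calculation. The function below is a hypothetical simplification in the spirit of that study, not the actual UIUC model, and every parameter name and figure in the example is an invented placeholder.

```python
# Hypothetical sketch of a grant-income ROI model in the spirit of the
# study quoted above; all names and numbers are invented placeholders,
# not UIUC data.

def library_grant_roi(num_pi_faculty: int,
                      proposal_success_rate: float,
                      avg_grant_award: float,
                      pct_citations_from_library: float,
                      library_budget: float) -> float:
    """Grant income attributable to library resources, per budget dollar."""
    grant_income = num_pi_faculty * proposal_success_rate * avg_grant_award
    attributable = grant_income * pct_citations_from_library
    return attributable / library_budget

# Made-up example: 500 PI faculty, 30% proposal success rate, $250k
# average award, half of proposal citations drawn from library
# resources, and a $10M library budget.
print(library_grant_roi(500, 0.3, 250_000, 0.5, 10_000_000))  # 1.875
```

Under these invented inputs, each library dollar would be associated with roughly $1.88 of grant income, which is the kind of ratio such calculators report.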
This academic library example shows how quantitative and qualitative analytics are not mutually exclusive and are useful in understanding both library operations and impact. It also demonstrates that, when it comes to showing library impact, data that evolved for collection development purposes must be re-contextualized to serve as proxies for qualitative conclusions when combined with other types of non-library analytics. Even in this example, we spot the well-established reporting habits in the larger information ecosystem that ultimately contribute to libraries’ continued reliance on quantitative data. The various citation impact metrics used by publishers and authors alike are a prime example of how quantitative analytics are applied as approximations for determining and assigning value. Generally speaking, from a library collection development perspective, and frequently from a promotion and tenure perspective, a higher citation count is assumed to equate to greater value and, in some instances, to justify higher costs of information access.
In public libraries, the library value calculations also tend to focus on the amount of money spent on library collections and services compared with the amount of money generated or saved in the library’s geographic service area. A review of the 2017-2018 Division of Library and Information Services Library and Data Statistics from the Florida Department of State illustrates such a foundational reliance on quantitative data. What follows is a portion of the information found in Table 22 — Collection Expenditures Per Capita — FY 2017-18. In this table the service area population data was provided by the Florida Estimates of Population, published by the University of Florida, Bureau of Economic and Business Research. This population data is then combined with library collections budget data from the Division of Library and Information Services.2
Taking these numbers at face value, the table shows that Miami-Dade serves more individuals than Palm Beach, and that Miami-Dade spends less on collections than Palm Beach. But does this comparison show that one library is more or less cost-effective for its service population? Collection Expenditures Per Capita was calculated by dividing each library’s Total Collection Expenditures by its Service Area Population. But can we be certain that every individual in a service area’s population is actually a user of the associated library system?
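For concreteness, the per-capita arithmetic can be sketched in a few lines. The two systems and all dollar and population figures below are invented placeholders, not the actual Florida numbers, but they illustrate how the population divisor dominates the comparison.

```python
# Minimal sketch of the Collection Expenditures Per Capita calculation:
# total collection expenditures divided by service area population.
# Both systems and all figures here are hypothetical.

def expenditure_per_capita(total_expenditure: float, population: int) -> float:
    return total_expenditure / population

systems = {
    "System A": (5_000_000, 2_500_000),  # (collection $, service population)
    "System B": (4_500_000, 1_000_000),
}
for name, (spend, pop) in systems.items():
    print(f"{name}: ${expenditure_per_capita(spend, pop):.2f} per capita")
# System A: $2.00 per capita
# System B: $4.50 per capita
```

System A spends more in total yet reports far less per capita, even though nothing in the formula establishes that every counted resident actually uses the library.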
When viewed as annual statistics over a period of years, such analytics have been very useful in identifying trends, such as library collection expenditures, increased dependency on digital content, and adaptations in staffing titles and duties. However, it is not uncommon to see analytics comparable to the above table used for rankings and peer comparisons. Perhaps you have been asked to provide analytics data intended for a national survey, only to see that same survey data cited later in an aspirational peer comparison. Such comparisons can serve administrative lobbying efforts for more financial resources, reallocations or reductions in staffing or space, or modifications to hours and services. In such cases, the standardization of analytics needed for national surveys obscures some of the variability and peculiarities inherent in library peer comparisons. In the previous Florida public libraries example, does the table indicate that Miami-Dade should have an increase to its collections budget because its population is almost 2.5 times that of Palm Beach’s? How comparable are the collections of libraries at small private universities when there is only moderate commonality in the academic programs offered and research conducted at the schools being compared?
Perhaps the problem is that various library analytics are being presented to two different audiences with the hope that these same analytics will serve two different purposes. And unlike the analytics and data standardization that exists in various national surveys, there appears to be less agreement on what is to be counted and why when it comes to collecting and reporting data intended to illustrate library value. So if the analytics needs, purpose, and application differ depending upon the consuming audience, should libraries be developing an analytics portfolio designed specifically for achieving operational excellence and another analytics set to address the value, impact, and return on investment narratives?
To date, most library service analytics lack the detail that could illustrate use and impact at the level of an individual user, customer, patron, or stakeholder. Website visits, gate counts, COUNTER usage statistics, reference transactions, and attendance at community programs and movie showings are examples that tend to report activity in terms of volume. The aforementioned UIUC example of assessing library value tries to supplement such quantitative activity-tracking analytics by correlating expenditures on library collections with faculty success, though it still relies on other homogeneous analytics to assert a positive library return on investment for the university.
Yet we are seeing some other citation initiatives, like alternative metrics to impact factors, that aim to assess societal impacts of research through mentions in the news, blogs, tweets, posts, or policy. I recall a conversation with one academic vice president and former editor of a prominent peer-reviewed journal, in which he challenged the various citation metrics as impact indicators. While he acknowledged that citation metrics provided some gauge of research impact, he indicated that he was more interested as an administrator in knowing which research outcome from a given author had the most significant contribution to innovation and new discovery, value factors not necessarily captured by counting the frequency of citations.
The increased reliance on networked access to cloud-based services and platforms can offer new opportunities for libraries to gather data that provide greater understanding of library impact. As part of the larger shift to remote work and education, evidence suggests that pandemic conditions have stimulated greater interest in the combination of necessarily stronger cybersecurity and simpler user authentication systems. Such a combination may offer user demographic analytics via library network authentication strategies now transitioning toward multi-factor and federated infrastructures. By contrast, a significant portion of Internet commerce and social media companies have long leveraged their ability to report activity at demographic levels, providing opportunities to analyze and understand value and impact. But as with any technological innovation, libraries have good reason to be concerned about the potential pitfalls of ensuring the privacy and security of such data. There are numerous examples of such user-level data being misused, compromised, or sold without user knowledge or approval. So as part of their continued commitment to and interest in patron privacy, libraries should have a seat at the table to shape and define the types of user-level information that are shared, gathered, and reported for the purposes of both service improvement and demonstrating value.
Still, libraries could begin rethinking the types of value-centric data that can be gathered and reported from existing systems and data sources. Even basic demographic-level service analytics drawn from patron categories, initially intended as evidence of value, could serve a secondary function by feeding back into service-improvement analyses. Over the course of the COVID-19 pandemic, our library has gathered and reported interlibrary loan activity at both demographic and material-format levels. Such analytics helped demonstrate library value and productivity during a period when the physical library was off-limits to all but library staff. The data also revealed trends in preferred and available information formats as well as subject interests, analytics that can contribute to improvements in collection development, interlibrary loan services, and consortially based reciprocal borrowing/lending agreements. Our hope is that reporting interlibrary loan activity at the demographic level helps to dispel some of the stereotypical notions of libraries and their staff as less productive when physically unavailable to the public.
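The kind of demographic- and format-level tallying described here can be sketched in a few lines. The record fields and category names below are assumptions for illustration, not our library's actual schema or data.

```python
from collections import Counter

# Hypothetical interlibrary-loan records; the field names and category
# values are invented for illustration.
ill_requests = [
    {"patron_category": "graduate",  "format": "e-article"},
    {"patron_category": "faculty",   "format": "e-book"},
    {"patron_category": "graduate",  "format": "e-article"},
    {"patron_category": "undergrad", "format": "print book"},
]

# Tally activity at the demographic and material-format levels rather
# than reporting a single volume figure.
by_demographic = Counter(r["patron_category"] for r in ill_requests)
by_format = Counter(r["format"] for r in ill_requests)

print(by_demographic.most_common())  # [('graduate', 2), ('faculty', 1), ('undergrad', 1)]
print(by_format.most_common())       # [('e-article', 2), ('e-book', 1), ('print book', 1)]
```

Even this simple cross-tabulation turns a single activity total into a picture of who is using which formats, which is exactly the secondary, service-improvement use described above.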
So if history is any indication, the advancement of both information technologies and information management will continue to increase our reliance on analytics as tools for data-driven decisions. And as information formats, methods of digital access, and standards for authentication evolve, libraries will need to re-imagine how best to develop new and existing analytics in order to more accurately demonstrate their continued importance and value to many facets of the global society and economy.