
The Increasingly Complicated World Of Citation Part 1: Changing Methods & Applications

Jun 2, 2022


By: Nancy K. Herther, writer, consultant and former librarian with the University of Minnesota Libraries

Throughout the long history of scientific research and reporting, theories have formed and continue to evolve about the influence and value of prior research in the building of knowledge and discovery. A key feature of this is the formal recognition, by citation, of key works that influence each new scientific contribution. In a 2021 editorial in Elements Magazine, French geologist Jannick Ingrin asserts that, “authors have all been affected by the change in value of publications, far from the genuine and simple goal of sharing scientific knowledge. We are now much more conditioned by citation indexes than we have ever been.”

Ingrin goes on to note that “citation indicators basically and simply assess the visibility of a paper, which is far from being a perfect evaluation of the paper’s influence or impact in the field. Notwithstanding, today more than ever, we all look at our paper citations as a form of quality indicator of our work. In addition, our careers are partly evaluated and fundamentally influenced by these citation indicators. Consequently, we allow citation indicators to influence the way authors write papers and the way reviewers evaluate them.”


In his key 1975 article, John Martyn argued that “citation in the primary literature expressly states a connection between two documents, one which cites and the other which is cited, whereas citation in other listings does not usually imply any connection between documents other than that affected by the indexing machinery.” In his 2005 book, The Hand of Science: Academic Writing and its Rewards, Blaise Cronin noted that “we are still left with a black-box explanation of citing behavior.”

UC-Berkeley’s Judy Bolstad, in her review of Cronin’s book, noted that “over the past several decades, academic writing and scholarly communication have changed dramatically due to various societal influences. Scholars and scientists have continuously shaped the concept of academic writing and how it is viewed, causing a shift in its purpose to include a more social perspective.” Citation, Bolstad noted, has also opened the door to more complex forms of reward, valuation and evaluation – which, along with the developing internet, has forever changed the perception and influence of the global research enterprise.


The use of patent citations for research and evaluation became popular in the 1980s, a practice that came to be known as patentometrics. Swedish researcher Björn Hammarfelt notes that “interest in patents – not only as legal and economic artifacts but also as scientific documents – became evident in the 1980s.”

The growing number of journals and other publication venues is another challenge that existing citation indexes have not kept up with. “The degree of internationalisation of general practice journals varied from 94.2% for family practice to 2.0% for primary care,” noted Taiwanese researchers in 2022. “There are wide disparities in internationalisation among different countries and general practice journals. There is much room for improvement in the internationalisation of general practice journals in the SCI database.”

“Researchers cannot keep up with the volume of articles being published each year. In order to develop adequate expertise in a given field of study,” noted Northeastern University researchers, “students and early career scientists must be strategic in what they decide to read.” However, guidance and standards are difficult to find, even if they exist. And beyond the problems with adequate access to the growing international literature is the issue of citation and indexing.


Traditional bibliographic indexes and databases continue to cover ‘core’ literature, and citation indexes continue to play a key role. However, web search engines and other web-based options are challenging the role of bibliometric tools as a type of quality control. A soon-to-be-published Scientometrics analysis of the three core citation databases – Web of Science, Dimensions and Scopus – finds that “the databases do present structurally different perspectives, although Scopus and Dimensions with their additional circle of applied research vary more from the more base research-focused WoS than they do from one another.”

Dimensions. Released in 2018, Dimensions “was introduced as an alternative bibliometric database to the well-established Web of Science (WoS) and Scopus, however all three databases have fundamental differences in coverage and content, resultant from their owners’ indexation philosophies,” noted a recent article by German researchers. Called “the world’s largest linked research information dataset,” Digital Science notes that “with Dimensions, you can search across multiple content types, ranging from publications to grants, clinical trials, patents, datasets and policy documents. The linking of all these data enables you to view the information in context, gain new insights from complex analytics, and trace the relationships between your results.”

Scopus. Elsevier’s abstract and citation database, Scopus, was launched in 2004 and now covers 36,377 titles from approximately 11,678 publishers, of which 34,346 are peer-reviewed journals in four top-level subject fields: life sciences, social sciences, physical sciences and health sciences.

The h-index, proposed by Jorge Eduardo Hirsch in 2005 and defined as the largest number h such that a researcher has h papers each cited at least h times, is considered a useful indicator of the scientific output of a researcher.
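Hirsch’s definition can be computed directly from a list of per-paper citation counts. A minimal sketch (the function name and the sample citation counts are illustrative, not from any real researcher’s record):

```python
def h_index(citations):
    """Return the largest h such that the author has h papers
    each cited at least h times (Hirsch's 2005 definition)."""
    # Sort citation counts in descending order, then find the last
    # rank i (1-based) at which the i-th paper still has >= i citations.
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4 papers each have >= 4 citations -> 4
print(h_index([25, 8, 5, 3, 3]))  # -> 3
```

Note how the second example shows the measure’s well-known insensitivity to a single highly cited paper: the 25-citation article raises h no more than one with 5 citations would.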

Founded in 2018, scite is a Brooklyn-based startup that “helps researchers better discover and understand research articles through Smart Citations – citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence.” Today, scite has one billion citation statements “extracted and analyzed from over 30 million full-text articles.”



Despite the key role that citation plays throughout the research process, from funding requests to publication and application of results, a 2021 post from Clarke and Esposito notes, “you might think that after 20 years of research and more than 130 studies on the subject, we’d have a clear picture of the effect that open access publishing has on an article’s citation performance.

Unfortunately, decades of poor studies and a mystifying unwillingness to perform experimental controls for confounding factors continues to muddy the waters around the alleged open access citation advantage (OACA).”

A growing literature is being published on citation that not only continues to explore options, but also questions various aspects of the reality and impact of citation. A cursory review of recent literature reveals many areas of concern and potential future research:

1. Citation & Retraction

“Scientific publications with compromised integrity should be retracted. Papers citing retracted publications might need correction if findings depend on the retracted publication,” notes a recent study. “While many studies have reported on post-retraction citations, few have focused on citations made before the retraction.”

2. Citation Mapping & Future Research Agendas

Increasingly, newer methods of analysis, such as citation mapping and graphical presentation, are being used to better analyze citation data and to identify “trends and potential research fields for the future.” Recent research now uses the bibliometrix package for R, VOSviewer and other software as ways to better “see” trends, such as mapping themes and co-occurrence.
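At the core of the theme maps produced by tools like VOSviewer is a simple counting step: how often two terms appear on the same paper. A pure-Python sketch of that step (the keyword lists below are invented for illustration):

```python
from itertools import combinations
from collections import Counter

# Each record is the set of keywords attached to one paper (illustrative data).
papers = [
    {"citation analysis", "bibliometrics", "h-index"},
    {"bibliometrics", "co-citation", "mapping"},
    {"citation analysis", "co-citation", "mapping"},
    {"h-index", "bibliometrics"},
]

# Count how often each pair of keywords co-occurs on the same paper.
# Mapping tools lay out exactly this kind of matrix as a visual network,
# with frequently co-occurring terms drawn close together.
cooccurrence = Counter()
for keywords in papers:
    for a, b in combinations(sorted(keywords), 2):
        cooccurrence[(a, b)] += 1

for pair, n in cooccurrence.most_common(3):
    print(pair, n)
```

The visualization layer (clustering, layout, coloring) is what the dedicated tools add on top of this matrix.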

3. Finding New Ways to Deal with Big Data Research

Big Data often means analyzing huge datasets – whether structured, semi-structured or unstructured. Recently, researchers have begun using citation and co-citation analysis to explore research articles. One report, which studied trends in Big Data itself, used citation and co-citation analysis to determine “the degree centrality and betweenness centrality” for identifying core papers. The researchers note that “this literature review is one of the first studies to examine the knowledge structure of BD research in the information systems discipline by using evidence-based analysis methods.”
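Of the two measures the study names, degree centrality is the simpler: a node’s degree divided by the maximum possible degree, n − 1. A sketch on a toy co-citation network (the papers and links are invented purely for illustration; betweenness centrality, which requires counting shortest paths, is omitted here):

```python
# Toy undirected co-citation network: paper -> set of linked papers.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree
    (n - 1), the usual normalization for degree centrality."""
    n = len(graph)
    return {node: len(neigh) / (n - 1) for node, neigh in graph.items()}

centrality = degree_centrality(graph)
# "C" is linked to every other paper, so it is the most central node --
# the kind of "core paper" the cited study was trying to surface.
print(max(centrality, key=centrality.get))  # -> C
```

At real scale, studies like this typically use a graph library (e.g. NetworkX’s `degree_centrality` and `betweenness_centrality`) rather than hand-rolled code.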

4.  Citation Bias – Is It Questionable Research or Scientific Misconduct? 

Danish researcher Peter C. Gøtzsche recently examined the concept of citation bias in the data analysis practices used by researchers. Focusing on “placebo-controlled trials,” he found examples that he believes “amount to scientific misconduct, as it seriously distorts readers’ perceptions of what the best available evidence tells them, or constitutes misleading justification for conducting additional placebo-controlled trials.” The importance of this, he argues, is that “more attention to citation bias is needed to reduce potentially harmful consequences for patients, both directly, by distortion of evidence, and indirectly, by eroding trust in science.”

5. Linking “Massive Scientific Productivity” to Creating the “Citation Elite”

Researchers have also looked at the dominance of researchers with the highest “scientist citation profiles” in the citation patterns for COVID research. The results show a significant advantage for some researchers over others: “For many scientists, citations to their COVID-19 work already accounted for more than half of their total career citation count. Overall, these data show a strong covidization of research citations across science with major impact on shaping the citation elite.”

6. Questioning the Value of the H-Index With Changing Authorship Patterns

In 2021, Intel researchers, after “analyzing millions of articles and hundreds of millions of citations across four scientific fields and two data platforms,” advised that the use of the “h-index in ranking scientists should be reconsidered.” Their research found “that fractional allocation measures such as h-frac provide more robust alternatives,” based on “an interactive exploration of the data.” The limits of the h-index would be mitigated by focusing on the “fractional allocation of citations among authors, which has been discussed in the literature but not implemented at scale.”
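One simple reading of fractional allocation is that each paper’s citations are divided among its coauthors before the usual h-index threshold is applied. The sketch below follows that simplified reading (the function name, the exact h-frac formula used in the cited paper, and the sample data are all assumptions for illustration):

```python
def h_frac(papers):
    """Simplified fractional h-index: each paper's citation count is
    divided by its number of authors, and the standard h-index rule
    is then applied to the fractional counts."""
    fractional = sorted((c / n_authors for c, n_authors in papers),
                        reverse=True)
    h = 0
    for i, c in enumerate(fractional, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# (citations, number_of_authors) per paper -- illustrative values.
solo = [(10, 1), (8, 1), (6, 1)]
team = [(10, 5), (8, 4), (6, 3)]
print(h_frac(solo))  # solo-authored work keeps full credit -> 3
print(h_frac(team))  # the same citations split among coauthors -> 2
```

The comparison shows why such measures are more robust to authorship inflation: identical citation counts yield a lower score once credit is shared.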

7. Using Citation Network Analysis

Dealing with issues of gender bias in publication, researchers from Northeastern University focused on helping novice academics, suggesting that “students and early career scientists must be strategic in what they decide to read.” Their suggested remedy focuses on “identifying key research communities” and using graph-theoretical approaches to “identify the most influential papers within each community and the ‘bridging’ articles that linked distinct communities to one another.” They conclude that, at least for the field of psychology, this method would provide “opportunities to increase gender equity in the field.”

8. Deciding which Research to Cite May Be Biased

Noting that “citations are made not only based on the pure scholarly contributions but also based on non-scholarly attributes, such as the affiliation or gender of authors,” researchers from Kyoto University studied whether “preprints are affected by citation bias with respect to the author affiliation.” Finding that “citation bias exists and that it is more severe in case of preprints,” their study suggests that “as preprints are on the rise, affiliation-based citation bias is, thus, an important topic not only for authors (e.g., when deciding what to cite), but also to people and institutions that use citations for scientific impact quantification (e.g., funding agencies deciding about funding based on citation counts).” 


Aug. 16, 2018; Hannah Rubin (Photo by Matt Cashore/University of Notre Dame)

In an excellent thought piece in a 2022 article in Philosophical Studies, University of Notre Dame philosopher Hannah Rubin notes that “in many fields, members of underrepresented or minority groups are less likely to be cited, leading to citation gaps. Though this empirical phenomenon has been well-studied, empirical work generally does not provide insight into the causes of citation gaps.” 

Rubin argues that “the social identity of a researcher can affect their position in a community, as well as the uptake of their ideas.” Her article seeks to go beyond this “empirical phenomenon” using mathematical models to provide better understanding of the causes of citation gaps. The finding of her study is that “citation gaps are likely due in part to the structure of academic communities. The existence of these ‘structural causes’ has implications for attempts to lessen citation gaps, and for proposals to make academic communities more efficient (e.g. by eliminating pre-publication peer review). These proposals have the potential to create feedback loops, amplifying current structural inequities.” 

Professor Rubin spoke with ATG in April about her research.

NKH:  We all know that knowledge evolves as research changes over time. The role of citation began in the 1960s with Gene Garfield’s interest and work to create citation indexing. Today it is used as a quality metric by users searching for ‘good’ information (based on the assumption that the more citations, the better the quality), administrators seeking to evaluate the quality of their staff, and individual researchers to justify the value of their own research to funding organizations and for promotion. In your field does this hold true? What made you want to do your recent research in this area?

HR:  I think in every field citations are used as a measure of quality, at least to some extent. Of course, they’re never going to be a perfect measure – there are a number of reasons why someone might cite a paper, even if it’s not very good – the question is whether in relying on citations (or the h-index, or i10-index) we’re making good decisions based on these measures. In that vein, I and many others talk about how these measures can be biased in such a way that overly relying on them can lead us to make bad decisions, decisions that ultimately disadvantage people who are already historically excluded or kept on the fringes of academia.

NKH: COVID has made access to information – and use of the internet for research – all the more critical and pervasive. Preprints and non-peer-reviewed pieces appear in results lists when searching the internet. Deciding what is ‘real,’ what has gone through some type of peer-review/oversight, is an issue that academic/research librarians face daily in their work. I really appreciate your recent article. In it you examined how “the social identity of a researcher can affect their position in a community, as well as the uptake of their ideas,” finding, using mathematical models, that “citation gaps are likely due in part to the structure of academic communities.”

This is a critical issue, especially for information professionals! Your article notes that proposals such as “eliminating pre-publication peer review” have “the potential to create feedback loops, amplifying current structural inequities.” What advice would you give to our community of librarians and information professionals related to the changes in knowledge with the rise of Big Data?

HR: One of the things that is important to keep in mind with the rise of Big Data is that the algorithms we use to process this data can unintentionally reinforce, or even amplify, inequities that are already out there. This is true for algorithms predicting recidivism or approving credit, and it’s true for various search algorithms. My work on citation gaps has shown one way this can play out in academic communities, where people from marginalized groups can be pushed further and further to the peripheries by search algorithms which reward those who are already socially well-positioned for their work to be highly cited and impactful in a community.

Altering certain aspects of how research gets produced and published, like eliminating pre-publication peer review, have the potential to create new feedback loops (or make existing feedback loops worse), increasing structural inequities over time.


“The COVID-19 pandemic has touched every corner of American society,” Harvard Medical School researchers note,  “including the lives of scientists. The past year has seen many researchers dramatically shift the focus of their work, as experts from across different disciplines came together to study this novel disease and develop potential therapies.” McGill University tuberculosis researcher Madhukar Pai, observes that COVID has had the effect of distorting how science is being funded, produced and reported on at the probable expense of other key research areas. “Humanity has endured many crises over centuries. The COVID-19 crisis will also pass. Crises will come and go, but we need a long-term vision and strategy for research and scholarship.”


In the second part of this report, we talk with the experts behind two of the most significant citation indexing systems and get their perspectives on the art and science of citation today.

Nancy K. Herther, writer, consultant and former librarian with the University of Minnesota Libraries

