Scite.AI Update- Part 1: Creating New Opportunities- An ATG Original

Feb 15, 2021

By: Nancy K. Herther, writer, consultant and former librarian 

In 2019 Charleston Hub first reported on the new, innovative scite, an effort to easily check how a scientific article has been cited and if it has been supported or disputed by others. Scite is a deep learning platform that evaluates the reliability of scientific claims by citation analysis and provides a far deeper understanding of the content of scholarly mentions/discussions of research from articles than ever before.  In the past 18 months, progress on the scite database has been very impressive in providing more depth of analysis and breadth of the literature.

Scite has also been noticed in the scientific community, receiving in 2019 alone: the NSF Phase I SBIR Award in July 2019; the 2019 People’s Choice Award for Most Innovative Idea from the International Society of Managing and Technical Editors (ISMTE); and the 2019 Award for Innovation in Publishing from the Association of Learned and Professional Society Publishers (ALPSP). Validation has also come from the hundreds of published scholarly research papers using scite data and analysis. A quick Google Scholar search on the company/database name, scite, limited to the year 2020, lists over 450 articles.

In his 2007 article in International Microbiology on the evolution of the Science Citation Index, Gene Garfield discussed key goals and needs for studying citedness and the issue of quality in terms of what and where the citations occur. At his core, Garfield was a scientist interested deeply in the progress of science. “By collecting all the relevant citing papers on a subject in a WoS search, the collective memory of the citing authors produces a visual description of the topical history.” By itself, this was a revolutionary way to observe the progress of science.

SMART CITATION FOR INTELLIGENT RESEARCH

“Using Smart Citations, easily check how an article has been cited and if it has been supported or disputed by others. Scite is a Brooklyn-based startup that helps researchers better discover and evaluate scientific articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contradicting evidence. Scite is used by researchers from dozens of countries and is funded in part by the National Science Foundation and the National Institute of Drug Abuse of the National Institutes of Health,” according to scite.ai.

Scite is a very different type of platform from most of the other systems out there.  It includes mentions and other key elements of value/success of some research; however, as I see it, scite follows a different type of structure and philosophy.

Covered by Against the Grain in a 2019 article, scite, in development since June 2018, provides an amazing look into the future promise of research analytics. The company describes scite not as a product or database, but as “a platform that uses deep learning, natural language processing, and a network of experts to identify and promote reliable research by evaluating the veracity of scientific claims.”

Over the past two years, scite has made incredible progress in database development, size and functionality. As the database is described on the scite pages: Over 23 million full-text articles have been processed and 813 million citation statements have been extracted and analyzed. 

For searchers, scite uses a system they call Smart Citations that allows “users to see how a scientific paper has been cited by providing the context of the citation and a classification describing whether it provides supporting or disputing evidence for the cited claim.” The speed of the database is impressive and the graphics are every bit as impressive as the information that it presents. The Reference Check allows you to quickly and easily upload your own research articles to see how the references you are citing have been cited by others. 

Rather than broadening the scope of references/citations,  scite employs “deep learning to show how an article has been cited and, specifically, if it has been supported or contradicted, where the citations appear in the citing paper, and if it is a self-cite or a citation from a review or article. In short, we want to make citations smart–citations that not merely tell how many times an article is cited, but also provide the context for each citation and the citation meaning, such as whether it provides supporting or contradicting evidence for the cited claim.”

In August 2020, scite released its Smart Citation Badges, which are now being used on many key academic databases and publishers’ sites. The database has grown dramatically as well, due to the artificial intelligence built into the database design and the active participation of a growing number of academic and research journal publishers.

In a further key enhancement, Citation Alerts have been added to scite’s services. Users can sign up to be notified via email whenever their articles or research receive a new citation. Alerts can also be synced to an ORCID profile, so users can automatically and seamlessly give anyone using ORCID access to their latest publications.

REVISITING SCITE IN 2021 WITH JOSH NICHOLSON

Josh Nicholson

Charleston Hub first published information on scite in 2019. Things are continuing to move quickly for this start-up, and the value of the approaches being taken has been lauded across the disciplines, by the global research community and by users as well. No one speaks more authoritatively – or with greater clarity – about scite than founder and CEO Josh Nicholson.

NKH: Scite is founded on using a deep learning platform to evaluate the reliability/credibility of scientific claims by using the well-established method of citation analysis. I know that the database has grown dramatically in the past two years. How have the database and your company grown in that time?


JN: We have more than doubled the number of citation statements in our database and are currently at 803M citation statements. Our growth comes from new indexing agreements with publishers such as Wiley, Cambridge University Press, and others that allow us to gain access to content that is not open access.

NKH: In a recent coauthored article, you described the scite database as “a massive database of qualitatively described citations, and machine learning algorithms.” Can you unpack that a bit for our readers?

JN: As mentioned, we have over 800 million citation statements in our database, which is a massive amount. In fact, it is the largest database of citation *statements* in the world. Other services like Web of Science, Scopus, and Google Scholar might have a higher citation count but we have more citation contexts and statements.

NKH: As you explained the core philosophy of scite in an article earlier this year, “if I cite an article because I have evidence that contradicts it, it is a citation. If I cite an article because I have evidence that supports it, it is a citation. In today’s world, we as researchers, administrators, and publishers treat all citations as equal even though they are not.” This differs massively from altmetrics and builds on the base established by Garfield’s citation indexes.

JN: Eugene Garfield suggested “citation markers” back in 1966, which is effectively what our citation classifications are. Despite the early suggestion, it has been technically very challenging to classify citations this way so no one has been able to do it. I’d say we’re pretty lucky to be tackling this problem when we are because the technology to do so now exists. We are distinctly different from Altmetrics but there are similarities. I think Altmetrics does a nice job of capturing the conversation in social media, whereas we want to capture the conversation in scientific articles.

NKH: I’m personally surprised that, in the work to include social media as an evaluative framework, scite isn’t being discussed as a key development in the corpus of citation analysis. To me, altmetrics is a way to broaden the analysis of impact to use newer social media as a potential indicator of quality. However, altmetrics has yet to prove its value beyond ‘popularity’ or information spread. Scite builds on the existing citation foundations.

JN: I fully agree and I think it is just a matter of time until scite and the smart citations we produce are embedded into scholarly communication more deeply. Altmetrics had been around for years before scite so it makes sense that it is better known. We have been working diligently with publishers though and will be going live on 103 journals published by Wiley in early February.

NKH: Can you describe some of the changes/enhancements that have been developed for scite in the past two years? What about future plans for scite?

Scite Visualization

JN: As we have improved our coverage and added more citation statements to our database we have shifted more of our focus to user-facing features. We introduced DIY dashboards and visualizations, which allow users to look at a group of articles more effectively. We have integrated with Zotero and Mendeley so users can see how articles in their reference library have been cited according to scite. Additionally, we have launched the scite Reference Check that allows authors, editors, and peer reviewers to check how reliable their references are by exposing references made to retracted articles or highly disputed articles. The Reference Check can be used by individual authors but we have also introduced it into Manuscript Manager and will be integrating it into Editorial Manager (Aries) and another system in early 2021.

Our plans are to help researchers do better research and to do that we think we need to more effectively integrate into their workflows. Our plans going forward will be to work with partners and further develop our browser extensions so that they can harness the power of scite anywhere they are looking at a scientific article online.

NKH: In a September 2020 article in Research Integrity and Peer Review, a group of researchers posit that “initiatives focused solely on citations (e.g., Scite.AI) that use artificial intelligence to clarify whether a citation provides supporting or contradicting evidence for the cited claim, might be very effective in detecting bibliographic errors. However, these tools too are not designed to detect incorrect citations that pertain to erroneous quotations or paraphrases and given semantic complexities of identifying these errors; it is not clear whether artificial intelligence should be used for this purpose at all.” Your reaction?

JN: It’s correct we don’t identify if a citation was made in error. We take what the authors of scientific papers have written and classify it as providing supporting or disputing evidence, or just mentioning it. If authors have cited the paper for the wrong reasons this is not something that we currently can identify. I think it’s an interesting problem that others have documented and I don’t think that it is intractable, just very difficult and not our focus right now.

NKH: Many journal publishers and platforms, such as Europe PMC, are now incorporating scite data into their websites, where it appears as results pages are displayed. How widespread is the acceptance by traditional publishers and journals today? I’d think it would be an easy sell!

JN: Journals are happy to display if work they have published has been supported but they might be hesitant to display a badge that shows an article has been disputed. Also, they might be hesitant if our numbers are too low. We have worked hard to improve our citation coverage and have dropped all costs associated with the scite badge (scite.ai/badge). As mentioned above, we have hundreds of journals confirmed to add our information to their articles in 2021, and I expect more to follow.

NKH: Could you describe the progress that has been made with the technology/algorithms/database over the past two years? How has the company grown as well? Last time we spoke, the company was still forming and working with a small group of staff. Can you update us with data on the growing size of the database and a description of the database as it is today?

JN: We have come a long way in terms of the deep learning model classifier and are actually close to submitting a paper that details this. Again, as mentioned, we are now at 803M citation statements, up from 236M in May 2019. The precision of our classifier has also more than doubled over the same period.

NKH: Many researchers are also picking up on the scite system and studying scite as a base for better science valuation. In one recent report, for example, the author created a “machine-learning model to classify each journal by subject according to its scite journal index (sji).” This article concludes with the hope that “a new, modern method of ranking is necessary, and scite provides a great step towards a future where rankings are controlled for scientific value and accuracy.” Are you seeing similar confirmation of the key role of scite in the evaluation mixture?

JN: I think it’s still a bit early to see this really happening beyond a few places because most people still don’t know about us. As we become more engrained into user workflows and more researchers learn about scite, I think it will be unavoidable to use scite for evaluation as it gives so much information. To this point, we have had multiple Provosts of Research reach out to us to use our data but this is not at scale yet.

NKH: In a just-published Nature article, the author noted that “So far, Scite.ai has analyzed more than 16 million full-text scientific articles from publishers such as BMJ Publishing Group in London and Karger in Basle, Switzerland. But that is just a fraction of the scientific literature.” How do you see the growth of the core database? This is clearly a huge project. Are you getting strong buy-in from publishers – I’d assume they’d be very excited about scite and its long-term value to science.

JN: We have done a good job of partnering with publishers but still have some large holdouts, like Elsevier. I think over time we will have more and more indexing agreements but it seems normal that some might be hesitant at first. Some of our partners are:

Karger
Wiley
BMJ
Sage
Frontiers
Future Science Group
Cambridge University Press
Microbiology Society
IOP
Thieme
Rockefeller University Press
International Union of Crystallography

NKH: Since citation counts have been the major measure of research influence, researchers are certainly taking a good look at scite today. As also quoted in the above article, “Giovanni Colavizza, an AI scientist at the University of Amsterdam, currently a visiting researcher at the Alan Turing Institute in London, says that “their results are sound and precise”, from what he can tell. “Most citations are classified as ‘mentions’, because the classifier is trained to be cautious, which is reasonable, too,” says Colavizza, who is a user of the platform and whose team has analyzed data from the start-up in the past.” Social media data is certainly easier to obtain and the added analysis that scite requires adds to the development cycle. How do you see scite today and in another 2 or 5 years?

JN: Scite has made massive amounts of progress since the idea was first conceived, going from analyzing hundreds of citations to hundreds of millions. I expect we will continue to grow the database and find new ways of getting that data into researchers’ workflows. It’s hard to tell where we will be in 2 or 5 years but our main focus is on building something sustainable so that we can continue to grow and innovate. It has cost hundreds of thousands of dollars to process all these articles so making sure we have a sustainable business model is important.

NKH: How do you see scite being used along with the more hyped social altmetrics?

JN: I think we capture a parallel conversation around research articles to Altmetrics. To me, it would be very interesting to classify Altmetrics in the way we do citation statements so you could see if an article was being shared on social media because it was receiving positive attention or negative attention but this is not something we have any plans to tackle.

NKH: Thank you for your time, your work and giving everyone an update on scite!

Scite is inspiring a new generation of citation analysis and research. In Part 2, ATG interviews Mason Hayes, who has not only been able to create his own analytic tools using scite – but now works part-time for the company even as he works on his doctorate.

Nancy K. Herther, writer, consultant and former librarian with the University of Minnesota Libraries
