By: Nancy K. Herther, writer, consultant and former librarian with the University of Minnesota Libraries
Yewno, founded in 2015, has worked to establish close connections with libraries, with the common goal of helping “the world to uncover the undiscovered through its new inference engine, which introduces an entirely new approach to knowledge discovery.” The company’s focus is on developing its inference engine, which incorporates “machine learning, cognitive science, neural networks, and computational linguistics into an intelligent framework to enhance human understanding by correlating concepts across vast volumes sources.”
Using its partnerships and private funding sources as a base, Yewno is working to create “numerous partnerships across the finance sector, top research universities, publishers and content aggregators worldwide” as well as to establish clear links with libraries and information professionals. Yewno’s COO Ruth Pickering explains this new challenge: “Libraries can help their communities navigate through the overwhelming sea of information to quickly identify what is most relevant and gain insights that using other technologies might not be apparent.”
Yewno’s recent efforts have included a host of significant partnerships with research institutions and content providers to integrate Yewno into the existing knowledge systems of key information providers. Last October, Yewno announced a partnership with Ex Libris Primo that enables Yewno Discover’s knowledge graphing capabilities to enhance the Ex Libris Cloud Apps open framework, better enabling academics/vendors to “create even more connected and adaptable discovery services for their users.” Also in October, Yewno announced an agreement with Citi, which will use Yewno’s AI solutions in its data science unit, Citi Global Data Insights (CGDI). Yewno has also actively sought beta test sites in academic libraries and their institutions as another way to assess user needs, tweak software and add to its growing base of queries and results.
YEWNO ISN’T ALONE IN SEEKING AI SOLUTIONS
With digitalization progressing at an increasingly fast pace, businesses are adopting machine learning and artificial intelligence in their everyday operations to streamline the search process and make information more malleable and quickly accessible. As technology has advanced over the past two years, a variety of machine learning frameworks have come to light. Examples include Amazon SageMaker, Databricks Unified Analytics Platform and Microsoft Azure Machine Learning Studio.
As a solution to the processing, dimensionality reduction, compression, and extraction of huge stores of Big Data, deep learning has become one of the most useful methods of making this information available and useful for research and learning. Rather than relying on existing assumptions of ‘connections’ between or among data, knowledge graphs, often rendered in 3-D, allow for free-form discovery.
OTHER NEW DISCOVERY SYSTEMS ON THE RISE
Yewno isn’t alone in establishing itself in this new era of search. Other newer AI-based discovery systems include:
Sciride allows for searching a subset of information records it calls ‘citation statements,’ which are “pieces of text from scientific literature supported by citing other peer-reviewed publications, carry significant amount of condensed information on prior art.” This unique approach allows researchers to “easily find information to build an evidence-based narrative for their own manuscripts.”
Scholix, called a framework for Scholarly Link eXchange, is a new protocol supported by major science giants such as Scopus. One goal for this project is “to enable an open information ecosystem to understand systematically what data underpins literature and what literature references data.”
Another interesting new approach is Open Knowledge Maps. Using BASE from Bielefeld University Library or PubMed, you can now do a search and the Open Knowledge Maps system will pull out the top 100 articles, clustering them based on the “similarity of metadata.”
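To give a rough sense of what clustering by “similarity of metadata” can mean, here is a minimal sketch. It is not the Open Knowledge Maps method, which the source does not describe; it assumes each article carries a set of keywords, measures overlap with Jaccard similarity, and groups articles greedily. The article titles, keywords, and the 0.5 threshold are all invented for illustration.

```python
# Hypothetical article metadata: title -> set of keywords (invented for this sketch).
ARTICLES = {
    "Article A": {"malaria", "vaccine", "trial"},
    "Article B": {"malaria", "vaccine", "efficacy"},
    "Article C": {"graph", "embedding", "search"},
    "Article D": {"graph", "search", "discovery"},
}


def jaccard(a, b):
    """Share of keywords two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(articles, threshold=0.5):
    """Greedily place each article in the first cluster that already holds
    a sufficiently similar article; otherwise start a new cluster."""
    clusters = []
    for title, keywords in articles.items():
        for group in clusters:
            if any(jaccard(keywords, articles[m]) >= threshold for m in group):
                group.append(title)
                break
        else:
            clusters.append([title])
    return clusters


if __name__ == "__main__":
    for group in cluster(ARTICLES):
        print(group)
```

A production system would likely draw on richer metadata (titles, abstracts, subject headings) and a more robust clustering algorithm; the point here is only that shared metadata, rather than a reader’s prior assumptions, determines the groupings.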
YEWNO’S NEW CHAPTER IN SEARCH
“Yewno’s mission is to extract and organize valuable insights from an overwhelming quantity of unstructured data. We are building the next generation Knowledge Graph which helps people to overcome the information overload problem, and to research and understand the world in a more natural manner,” the company explains on its website.
“In contrast to classical information-retrieval engines, rooted in theoretical computer science, our approach is inspired by the way humans process information from multiple sensorial channels and it leverages state-of-the-art Computational Linguistics, Network Theory, Machine Learning, as well as methods from the traditional Artificial Intelligence. At the core of our technology is the framework that extracts, processes, links and represents atomic units of knowledge – concepts – from heterogeneous data sources. A Deep Learning Network continuously “reads” high-quality sources and projects concepts into a multi-layered and multi-dimensional Embedding Space where similarity measures are used to group together related concepts along different dimensions (semantic, syntactic, just to name a few).”
The system ingests different types of data (e.g., news, statistics, data, patents, stock prices) and applies different types of analytical tools and measures; the result is the Yewno Knowledge Graph (YKG). However, perhaps the most interesting and important feature is that the representations are never static but change over time as new information is added to the system. This allows the system not just to reflect information at some static point but to change as the available information grows and changes.
The company has invested much of the past five years applying its AI systems to Open Access federal government documents in order to test both the ability to handle rapid change in data and the new potential analyses that arise. The company explains this project in this way: “Government information is vital for analysts, researchers, students, and citizens seeking to engage with their leadership, study decision making, and support accountability. While government information is largely publicly available through websites and repositories; it is a phenomenally fragmented and unstructured set of content. Access is rarely enough: the dense text and interwoven nature of government decision making presents a tremendous challenge for actual use. By interpreting chains of concepts and relationships, it draws inferences and unlocks insights that would otherwise not be readily apparent.”
A few years ago we had another startup, Knowtro, which promised to simplify the search process; however, the company wasn’t able to get the funding to make it to production. Of course, there have been many other key products over the past 20 years that have pushed the envelope for search: FirstSearch’s FactSearch, RDS TableBase, ProQuest’s Statistical Insight, and other databases. However, none of these has had the impact of Yewno Discover. Just as AI is making so many aspects of research and analysis easier, Yewno seems to be breaking new ground and to be a true game changer for the information industry.
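As a rough illustration of how similarity measures in an embedding space can group related concepts, consider the minimal sketch below. It is not Yewno’s implementation: the concept names, the three-dimensional vectors, and the 0.9 threshold are invented for this example, and a real system would learn far larger vectors from text.

```python
import math

# Hypothetical concept vectors (invented values, three dimensions for readability).
EMBEDDINGS = {
    "patents": [0.9, 0.1, 0.0],
    "intellectual property": [0.8, 0.2, 0.1],
    "stock prices": [0.0, 0.9, 0.3],
    "equity markets": [0.1, 0.8, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_concepts(concept, embeddings, threshold=0.9):
    """Return the names of other concepts whose vectors lie close to `concept`."""
    target = embeddings[concept]
    return [
        name
        for name, vector in embeddings.items()
        if name != concept and cosine_similarity(target, vector) >= threshold
    ]


if __name__ == "__main__":
    print(related_concepts("patents", EMBEDDINGS))
    print(related_concepts("stock prices", EMBEDDINGS))
```

Because the groupings are computed from the vectors rather than stored, adding a new entry to the hypothetical `EMBEDDINGS` table changes the results on the next query, which is one simple way to picture representations that shift as new information arrives.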
TALKING YEWNO WITH RUTH PICKERING
Ruth Pickering is both Co-Founder and Chief Strategy and Business Development Officer at Yewno, making her the perfect person to provide background on this highly innovative new approach to discovery. This interview was conducted in early June 2021.
NKH: We are all facing this avalanche of information. The mass digitization of content, together with internet search capabilities, has put an unprecedented amount of information at our fingertips. Yewno offers an AI engine that uses advanced technology to help find potential connections in discovery from a wide base of research information. Built with Amazon’s AWS, Yewno’s software is able to analyze millions of information sources in real time. Rather than simply hunting for keywords, the algorithms read/analyze text – and its context and meaning – and present the information/linkages graphically for users. Some of the librarians that I’ve spoken with see this as a great tool – however, many have expressed to me that the cost/emphasis of Yewno Discover, etc., is too tailored for an advanced audience. How do you see Yewno Discover’s role?
RP: Yewno Discover leverages AI technology to help researchers deal with information overload. Yewno’s artificial intelligence algorithms are not simply hunting for keywords, but can also read and understand what they’re reading. They can not only help researchers find the resources they are looking for, but take them to the exact section of text that is relevant. In addition, AI can infer connections, in the same way that humans can have intuition, but at a vastly different scale.
One of the really important things about Yewno’s technology is that it reads full text, so even if historic cataloguing and metadata has been inconsistent or has gaps, it doesn’t matter. However, ingesting huge quantities of full text does have processing costs associated with it. To keep the product unbiased we opted for a subscription model.
NKH: You have been involved with Yewno since its founding in 2015. How did you come to realize the importance/value/potential for this type of tool today?
RP: We all have different research styles, and we wanted to create a product that didn’t require training or specialist knowledge. The comment I hear overwhelmingly from researchers is that they find the visual graph intuitive and it suggests concepts and connections that they hadn’t thought to search for initially. Additionally, people often talk about bringing back serendipity and that the tool is ‘fun’. I think we’ve all had bad experiences of searching for something and after many hours finding ourselves frustrated and not completing the task we set out to do. The Yewno Discover interface is very engaging and helps people understand context which is so important. The tool helps remove some of the frustrations of traditional, list-based research and gets researchers to the exact section of text that they are looking for faster, giving them more time to spend on their research argument itself.
NKH: You’ve mentioned in interviews and articles that the key initial challenge was building the content base. How easy has it been to partner with major publishers, etc.? You’ve called Amazon’s AWS system (which, as with their other business areas, has created a major platform that handles massive amounts of data) “a critical element of Yewno’s success.” Could you elaborate on making this decision? What features about AWS (their clear dominance in the web environment, experience, etc.) have brought you to this choice?
RP: When we started approaching publishers about participating in Yewno Discover, AI technology, and in particular knowledge graphs, which we use at Yewno, were not that well known. Most publishers recognize that it’s a research ecosystem and they want their content to be found, especially where it’s licensed and universities are paying fees. I’ve really enjoyed working with the publishing industry and I’ve learnt a lot. I’ve also seen a lot of change in the past 5 years in terms of consolidation and the move to Open Access content.
We chose AWS as we needed a cloud-based, distributed environment which could scale infinitely. They’ve been a great partner to us.
NKH: At the same time, many people still have a need for simplicity. Burlington, NC-based Knowtro was an interesting startup that took increasing simplicity as its approach: “Knowtro turns high quality scientific research into simple statements of knowledge that anyone can use to inform their lives.” The company never received the first-round funding that it needed. However, the product was clearly looking at another key market segment that deserves consideration. As our knowledge base is growing exponentially, how do you see the need for simplicity as another approach to knowledge discovery? Less for researchers and perhaps more for the lay public – as well as those key decision-makers who need perhaps a lighter touch rather than a deep dive into some topic?
RP: I agree, whether you are in a full or part time course, we are all lifelong learners and at different times might want to research across any domain. One of the really useful things about Yewno Discover’s visual interface is that users can control how many concepts and connections they want to see – more or less – depending on their level of research and how much time they have available.
NKH: As I understand it, Yewno is really in two parts – a huge and still growing database of structured data/information and that AWS-based (or is compliant a better term?) sophisticated AI search engine. Another key aspect is the state-of-the-art presentation software. Would you call Yewno more of a state-of-the-art search/presentation system or a radically new approach to connecting/indexing the very dynamic corpus of research information today?
RP: Yewno Discover is a tool that enables users to explore through a graphic UI and also dig deep into the literature. When our AI technology ‘reads’ the content algorithmically, it creates incredibly rich, granular indexing of the content which it turns into a graph visualization. I’ve seen people use the tool differently. Some people spend a lot of time in the graph exploring, adding and removing concepts, exploring the connections and reconfiguring the graph. I’ve also seen people spend less time in the graph and then jump into the literature list. We all have different styles of research and how we use different tools depends on so many factors, like our subject level knowledge when we start, how long we have, what results we’re looking for, or what research output we are creating.
NKH: How many (or what percentage) of major publishers, research centers, and governmental data sources are covered in Yewno today? Is the focus on contemporary research or is Yewno working to broaden the scope to include more historic data and interpretation? What types of reports/research, as well as journals, books, government research, etc., are accessed in Yewno? How does the content accessed by Yewno break out by categories or types of information? How global is the data contained or accessed by Yewno? Are you actually ‘partnering’ with content sources or using the functionality of AWS to gather key data? Both? How much of the content is older information versus the currently produced research?
RP: We have a huge number of Content License agreements and we also have large Open Access (OA) resources. When we set about building Yewno Discover, we wanted a broad range of content giving great interdisciplinary coverage. When we work with publishers, we will go back as far as their collection goes. We continue to expand our content sources as further content becomes digitized and as new content is produced each day. We have comprehensive journal coverage through Microsoft Academic Graph along with many direct publisher relationships. We also include books, dissertations, reference, conference proceedings, repositories, and archives. Last year we added 41 news sources so that people can find recent information even when there hasn’t been time for content to be produced via peer review.
NKH: Yewno has been able to attract participation by some of the largest journal/indexing companies across the globe. Many of these huge companies have had their own indexing/access systems. How easy has it been to make the needed linkages between various resources?
RP: Philosophically we want to provide an environment where the researcher doesn’t have to know who the relevant publishers are in order to find great content. When you go to Yewno Discover you can access all the different content types from the spectrum of content owners and OA sources.
In addition, because the technology is reading the full text algorithmically and indexing at a much more granular level, it will help surface content that’s previously been overlooked because no one found it. That’s really exciting.
We don’t want content to be overlooked simply because it’s been poorly tagged historically. Equally, language evolves and the terms we use today may not have been in use when the content was originally indexed, so potentially valuable research can be overlooked. There’s also historic bias to contend with. Things like the United Nations Sustainable Development Goals use very specific wording, but, if you’re interested in zero poverty, you’re probably interested in poverty alleviation efforts, poverty eradication and so many other relevant terms that won’t have been tagged to these 17 goals. Equally, when looking at DEI – content may be subject to bias or simply poorly indexed, making it difficult for researchers to find using traditional tools. AI can help with this because it’s going back and reading the full text. At Yewno we use a concept-based approach so each term – word or group of words – has a definition, so if the term has changed over time or if multiple terms have the same meaning, relevant content can be found.
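The concept-based idea described here, where several terms resolve to one underlying concept, can be pictured with a minimal sketch. This is an illustrative assumption and not Yewno’s system: the term-to-concept table, the concept identifier `poverty_reduction`, and the sample documents are all hypothetical, and the matching is plain substring lookup on full text.

```python
# Hypothetical table mapping surface terms to one shared concept identifier.
TERM_TO_CONCEPT = {
    "zero poverty": "poverty_reduction",
    "poverty alleviation": "poverty_reduction",
    "poverty eradication": "poverty_reduction",
}

# Hypothetical full-text documents (invented for this sketch).
DOCUMENTS = {
    "doc1": "A study of poverty alleviation efforts in rural regions.",
    "doc2": "Poverty eradication programs since 1990.",
    "doc3": "Trends in stock prices over the last decade.",
}


def concepts_in(text):
    """Return the set of concept identifiers whose terms appear in the text."""
    lowered = text.lower()
    return {concept for term, concept in TERM_TO_CONCEPT.items() if term in lowered}


def search(query):
    """Return ids of documents that share at least one concept with the query."""
    wanted = concepts_in(query)
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items() if wanted & concepts_in(text)
    )


if __name__ == "__main__":
    # Neither matching document contains the literal phrase "zero poverty".
    print(search("zero poverty"))
```

A keyword search for “zero poverty” would miss both documents; resolving the query and the text to a shared concept is what lets differently worded content surface.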
NKH: I think your perspectives and experience are key to understanding how this ‘next generation’ of researchers sees where we are and where we need to go. Thank you for your insights and your time!
In the last part of this series, we look more deeply at library reactions and use of Yewno in order to bring the world of information into every library search box.
Nancy K. Herther is a research consultant and writer who recently retired from a 30-year career in academic libraries.