By Matthew Ismail, a Charleston Conference Director and Editor-in-Chief, Charleston Briefings
“We’re not doing Large Language Models, we’re doing Smart Language Models.”
It’s always fun to chat with Anita Schjøll Abildgaard about her company, iris.ai, because Anita brings a wonderful combination of tech savvy and storytelling to her business, making it easy for a non-techie like me both to follow what her business does and to enjoy the conversation.
Anita’s background tells its own story. In high school in Norway, Anita’s favorite subjects were–not surprisingly for a tech CEO–math, physics, and chemistry, but she also had a compelling interest in theater. While she had considered starting pre-med studies at university, Anita decided instead to do a BA in theater, enormously enjoying the blend of storytelling and performing. And while Anita was still considering medical school after her BA, a friend suggested that the two of them, then just 20 years old, start a consultancy business based on their theater experiences.
And lo and behold, Anita was struck by the entrepreneurship bug, which was surprisingly full of creative excitement. Establishing the business concept, starting a company, creating the marketing, telling the story of the business–all of this drew unexpectedly on her performing and storytelling background.
Then, while still young, Anita had the opportunity to go to Silicon Valley as a part of an entrepreneurship program and found herself swimming in the world hotbed of tech-enabled entrepreneurship. She spent a couple of years soaking up the excitement and endless innovation of a Silicon Valley startup, then returned to Europe determined to establish some companies of her own.
And this is where iris.ai was conceived. What problems does iris solve? Iris works with scientific texts in the early stages of a research project, a time when we are doing a broad and inclusive literature review and establishing a thesis based on sound research and data. We ask: What is the state of the art in this research area? Since science is increasingly carried out in interdisciplinary projects, even experienced researchers are regularly faced with new concepts and unfamiliar vocabulary.
This is where iris.ai comes in. At its core is Researcher Workspace, an about-to-launch flexible suite of machine learning-based tools that allows researchers to save 75% of their time on early literature searching, filtering, summarizing, data extraction, and analysis. Researcher Workspace lets researchers use their own natural-language description of a topic to search by concept across disciplines and technical vocabularies. Researchers can add any data set of articles or other documents – from just a handful to millions in a collection – and ask iris to narrow down the reading list by machine-identified concepts and topics, or even by context-matching descriptions written in their own words. Iris can also create summaries to help us sort our papers within our workflow. In addition, an extraction tool in Researcher Workspace lets researchers extract and systematize tables and other data from research articles, from PDF into CSV format.
The AI-powered basis of iris.ai brings tremendous advantages. For instance, when we use standard keyword searching to uncover research papers and data, we are searching for what we already know, using familiar vocabulary. Researcher Workspace, on the other hand, uses concept matching, so it finds papers that are conceptually related but that a keyword search would never have surfaced.
Iris.ai is a dynamic company that is in the midst of exciting changes. The legacy version of iris.ai is soon to be replaced with a new, more powerful, version of Researcher Workspace. “With this platform,” says Anita, “what we’re doing is providing the researchers, our users, with a variety of different tools that they can use for any research document collection. That can be open access, that can be patent collections, or that can even be your Zotero lists, which grew entirely out of proportion over the last ten years, downloaded pdfs, exports from your regular search tools with thousands of hits, any types of sources. If you want to find a specific bit of insight in that list but you’ve got thousands of references and you don’t even know where to start, then we have a range of smart tools that, each and every one of them, solves one part of the problems of this tedious process. What we’re doing is giving the user the tools to speed up their process themselves.”
For example, “Let’s say you have a collection of full-text pdf articles, and you want to run a statistical analysis on the data these researchers have collected on a certain topic. Now all that data is going to be in these tables, scattered across your documents, and you really just want that data. You can either sit down and work one-by-one through these tables, row by row, and copy and paste, number by number, and put that into a spreadsheet…And that’s not hard to do, but if you have twenty or thirty documents, and each of them has three tables, that is a couple of days worth of work, minimum. And we have this nifty tool within Researcher Workspace called Table Extractions where you just select those ten or twenty articles and tell it to extract all the tables and you have each of those tables in a csv file. You can open them in Excel and suddenly you have all the data in one place.” You can organize the data yourself, or you can use a more advanced Extract tool, available on a project basis (mostly for corporate R&D), that also organizes and systematizes all the data from text and tables for you into your preferred database layout.
This tool takes a task that is “ridiculously time-consuming and mind-numbing” and automates it–precisely why we most love AI. And creating this specific tool was not easy, because researchers use many very strange formats in their tables, and getting a machine to read those different tables was not exactly straightforward!
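The batch workflow Anita describes – many documents, each containing a few tables, all ending up as one CSV file per table – can be sketched in a few lines of Python. This is a hypothetical illustration, not iris.ai’s code: it assumes the hard part (reading the oddly formatted tables out of PDFs) has already been done, and shows only the export step, using the standard csv module.

```python
import csv
from pathlib import Path

def export_tables(tables_by_doc, out_dir):
    """Write every extracted table to its own CSV file.

    tables_by_doc: {document name: [table, ...]}, where each table is a
    list of rows and each row is a list of cell strings. (In practice
    the tables would come from a PDF table-extraction step, which is
    assumed here.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc, tables in tables_by_doc.items():
        for i, table in enumerate(tables, start=1):
            path = out_dir / f"{doc}_table{i}.csv"
            with open(path, "w", newline="") as f:
                csv.writer(f).writerows(table)
            written.append(path)
    return written

# Example: two documents, each with one small (invented) table.
files = export_tables(
    {
        "smith2021": [[["sample", "melting_point_C"], ["steel", "1370"]]],
        "lee2022": [[["alloy", "density"], ["Ti-6Al-4V", "4.43"]]],
    },
    "extracted_tables",
)
```

Even this toy version shows why the tool saves days of copy-and-paste: once the tables are machine-readable, fanning them out to spreadsheet-ready files is trivial.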
Reference lists are also a problem for many researchers. “Let’s say you have a Zotero list and you want to look for articles which, in their abstract, describe this particular use-case. I can’t put this use-case into one specific word. Maybe it’s a medical subject and I want the context for what they’re doing, not just a keyword. I don’t want to have to construct a complicated statement using boolean logic. So what Researcher Workspace has is a context filter, so you can upload your Zotero list, go to the filtering tool, then write in your own words–fifty or sixty words–a description of what you’re looking for. Then we do context matching of your filter and all of these abstracts. Then you can set the context score for yourself. So maybe you want a narrower context to get a really good match to your text, or you could even do it negatively, and remove all the abstracts that talk about this topic…That’s a way to go from a thousand articles to, say, the thirty that you want, just by explaining what you need as you would to a friend or colleague.” We could sit down and scan the titles and abstracts of a thousand papers, but Researcher Workspace would save us a lot of time.
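The context filter Anita describes – score every abstract against a free-text description, then keep only those above a threshold the researcher sets – can be illustrated with a toy scorer. Iris.ai’s actual matching works at the level of concepts, not words; the bag-of-words cosine similarity below is only a stand-in for the idea of ranking by a written description rather than by keywords or boolean logic.

```python
import math
from collections import Counter

def similarity(text_a, text_b):
    """Cosine similarity between simple bag-of-words vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def context_filter(description, abstracts, threshold=0.1):
    """Keep abstracts whose similarity to the description clears the
    threshold, highest scores first. Raising the threshold narrows the
    context; a variant could subtract matches to filter negatively."""
    scored = [(similarity(description, ab), ab) for ab in abstracts]
    return [ab for score, ab in sorted(scored, reverse=True)
            if score >= threshold]

# Hypothetical example: one relevant abstract, one irrelevant.
description = "patient recovery time after knee surgery in older adults"
abstracts = [
    "We measure recovery time in adults after knee surgery.",
    "A survey of compiler optimizations for embedded systems.",
]
matches = context_filter(description, abstracts)
```

A real system would use learned representations rather than word counts, but the shape of the workflow is the same: write fifty or sixty words as you would to a colleague, score, and cut the list down.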
Of course, given the storm of recent public interest in Large Language Models such as ChatGPT, I had to ask Anita how iris.ai is employing LLMs. She smiled at the question, since LLMs are nothing new to someone who works in AI. “The thing that can’t be done now by ChatGPT,” says Anita, “is facts. The models are great at rephrasing, but they will also rephrase facts.” An LLM may, for example, rephrase the content of an article clearly and effectively, but it may also “rephrase” the melting point of steel in the process, rendering the results useless. “ChatGPT also writes some brilliant reference lists–except that they’re fake. They don’t exist, right? But they sound like articles.”
So how is iris.ai using LLMs? Anita says that Researcher Workspace will shortly launch (in beta) a way to attach your result list to a chatbot. Unlike LLMs trained on public data, however, this LLM won’t make up answers from data harvested elsewhere (hallucinate) but will only answer questions related to the researcher’s own relevant documents. “We’re not doing Large Language Models, we’re doing Smart Language Models. The Large Language Models are great at formulating text, making it sound beautiful, but they’re not great at tracking facts. That’s what we are good at–tracking and tracing facts. Our summaries have very little hallucination. They’re not as fluent as the one you’d get in ChatGPT, but it won’t make up as many facts as ChatGPT will. You also have full control of your sources. An LLM you query on the internet? You don’t know where your sources are coming from – and if you’re working on something confidential, be careful as you don’t know who will get access to your inputted information.” Iris.ai will use LLMs only where it makes sense in the context of your specific workflows–only with text, not with numbers and tables that it’s important to get right.
LLM querying adds a level of interactivity to the research process, but iris.ai only wants to use LLMs in contexts that build trust in the tool. The company will be implementing a feature that relates the results of a query back to the articles they most likely came from. LLMs don’t quote, after all; they paraphrase, so even this smart tool will only point to a likely source, not a specific quote or citation.
I ask Anita, “How far are we from having a model of interaction with a computer that’s just chatting, just talking?” Anita says, “In some ways those exist already. Speech-to-text is not that far away, though it’s a little tricky with people’s accents, etc. But if you speak fluent English, French, German, Chinese, etc.–any of the major languages–they work well. We’re not too far from having machines list credible sources for what they’re saying and to tell you how trustworthy the data is. We’re still far from having a machine–five to ten years or more–with the intelligence to paraphrase existing knowledge and to draw conclusions from the data to suggest how novel those conclusions are.”
“We have a requirement in our company right now,” says Anita, “that everyone has to use, not necessarily ChatGPT, but generative AI tools for coding, for discussions, for marketing, for sales three or four hours a week minimum, because it’s changing everything.”
I ask: “What’s going on now that’s most exciting as far as the growth of the business is concerned?” Anita’s excitement about iris.ai is palpable. “That falls into two categories. The first one is that we’re launching the new Researcher Workspace, which is super exciting. We have tons of new features, a whole new way of working with research content. We’ve onboarded the first several clients already. Brand new ways of working with iris.ai tools. That will be scaled up in the summer and fall [of 2023].
“And the second one is the EIC [European Innovation Council] funding. It’s called the EIC Accelerator, a funding opportunity for which many people apply but only very few are funded. It’s a two-and-a-half-million-Euro grant plus a co-investment (subject to due diligence) of up to twelve million Euros from the EIC Fund. That means we will be raising capital fairly shortly, and if all goes well, we’ll have half of the round from the EIC Fund–which makes fundraising a lot easier!” The EIC specifically said that the technology developed by iris.ai is of “strategic interest and importance” to the European Union because the company is a fact-based, ethical, credible AI player dealing with science and research–one of the most valuable resources the EU has. “It’s pretty cool to be recognized in that context.”
Of course, my next question is: “What are you going to do with all that money?” Anita laughs. “The grant is very concrete. That’s to make the Researcher Workspace ‘enterprise ready.’ We’re going to take that platform from what it is now, which is a powerful platform that we’re very excited to be launching, to a go-to platform that you can’t live without in your daily research work over the next couple of years.”
“For the fundraising part of it: more sales, more support, more technical people, more research. Just expanding our operations and using that funding to take a market position and have stronger commercial growth.”
Be on the lookout for iris.ai! Researcher Workspace could absolutely transform your research process.
Anita Schjøll Abildgaard and Matthew Ismail, “A Conversation with Anita Schjøll Brede, CEO & Co-Founder at Iris.ai,” ATGthePodcast 160, May 31, 2022: 44:08 minutes.
Anita Schjøll Abildgaard and Matthew Ismail, Google Meet conversation, June 15, 2023.