AI and Science Publishing: “Cutting through the clutter has never been more important”

By: Saskia Hoving, Mon Aug 1 2022

There is more published research available than ever before. And while it’s a vital resource during times of crisis – like the Covid-19 pandemic – the sheer volume of information means it’s all too easy for important findings to be overlooked. In the second of two blogs on using AI to tackle information overload, we turn the focus on times of crisis and a unique solution developed to cut through the clutter.

The first wave of the Covid-19 pandemic in 2020 saw an explosion in published articles tackling the tricky topic of SARS-CoV-2. The number of articles published about Covid-19 grew from zero to 28,000 in just the first six months of the pandemic. In mid-May, nearly 3,000 papers were published in a single week. The Director-General of the World Health Organization (WHO), Tedros Adhanom Ghebreyesus, addressed this at the 2020 Munich Security Conference, stating, “We're not just fighting a pandemic; we're fighting an infodemic”. The vast quantities of research output were overwhelming to the research community. But more than that, the immediate and widespread sharing of medical and other scientific information outside of expert circles before it had been thoroughly vetted (for example, with the steady rise of preprints at this time) was dangerous for the public.

Even academics called for restraint at the time, with Science Business reporting that the COVID-19 pandemic was leading to a “flood of ‘useless’ science”. And while there were already some solutions to this issue available – such as Covid-19 Primer and COVIDScholar – unfortunately, researchers we spoke to were mostly unaware of them. So, the question for us as publishers was – is there something we can do about this? The answer lay in an innovative approach developed to produce the first machine-generated book, which we discussed in our first blog on this topic.

In a webinar, Markus Kaindl, Springer Nature’s Group Product Manager for Research Intelligence, explained the development of an app that supported researchers during the pandemic. Here, we take a look at what he covered and the developments that have followed.

Creating a simple overview using an AI-based report

In March 2020, we started by creating a simple overview of recent Springer Nature publications on Covid-19 using an automated report. A broad spectrum of 144 English-language publications that passed our critical filtering was used as the source material. This included original papers, news, snippets, editorial notes, and brief communications.

To make it as useful as possible, Markus explained that the team spoke to biologists and virologists to better understand the community's pain points. That enabled them to move fast and create an “early days” prototype for feedback as quickly as possible.

The team used technology similar to that used to produce the machine-generated book, which helped them group the pre-filtered content in a meaningful way around the outbreak. Because many of the publications had only just appeared, citations couldn’t be used to identify the most compelling content, so other metrics were used instead – such as platform downloads or digital media mentions.
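To make the idea of ranking without citations concrete, here is a minimal sketch in Python. It is not the team's actual method: the field names, the equal weighting and the min-max normalization are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    title: str
    downloads: int       # platform downloads (hypothetical field name)
    media_mentions: int  # digital media mentions (hypothetical field name)


def _normalize(values):
    """Scale values to the range 0..1 so the two metrics are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_by_attention(pubs, w_downloads=0.5, w_mentions=0.5):
    """Sort publications by a weighted mix of downloads and media mentions.

    The weights are illustrative assumptions, not values from the source.
    """
    if not pubs:
        return []
    downloads = _normalize([p.downloads for p in pubs])
    mentions = _normalize([p.media_mentions for p in pubs])
    scored = [
        (w_downloads * d + w_mentions * m, p)
        for d, m, p in zip(downloads, mentions, pubs)
    ]
    # Highest combined score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]
```

Normalizing first stops a metric with large raw numbers (downloads) from drowning out one with small numbers (media mentions).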

The result provided extractive summary snippets for a quick inspection, as well as a link directly through to the original publication.
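An extractive summary selects sentences verbatim from the source rather than writing new ones. The sketch below shows one simple, classic way to do that, using word-frequency scoring. It is an assumption for illustration only; the source does not say which technique the report used, and the stopword list here is deliberately tiny.

```python
import re
from collections import Counter

# A deliberately small stopword list, for illustration only.
STOPWORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are", "we", "our",
    "for", "on", "with", "that", "this", "between", "through",
}


def _tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9'-]+", text.lower())
            if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_snippet(text, max_sentences=2):
    """Return the highest-scoring sentences, verbatim and in original order."""
    sentences = split_sentences(text)
    freq = Counter(_tokens(text))

    def score(sentence):
        words = _tokens(sentence)
        if not words:
            return 0.0
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words)

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]),
                  reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(best)]
```

Because the snippet is copied word for word, a reader can always trace it back to the exact passage in the original publication.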

Creating something more personalized

The question remained: how could we make this tool findable and accessible for researchers, and ensure it didn’t become another one of those helpful apps that are never used?

"To us, it became clear, we needed to move from a dynamic, but still, somehow, static report that we generated using AI to an app,” explained Markus Kaindl in the webinar. “We are in the fortunate position to be able to leverage a unique combination of centuries-old brands, strong credibility within our communities, a good understanding of our user's needs, and technology solutions at hand that were developed for other products in-house."

So, after receiving positive feedback on the tool prototype, both externally and internally, the next step for the team was to experiment in various directions, creating a framework for personalized research exploration as part of an app.

Some of the key areas explored include:
●    Domain, persona and task-specific content recommendations
●    Content across all publishers using the public CORD-19 dataset
●    Reading lists, automatic summaries
●    Most prolific potential collaborators

"The beauty of this approach is that it will not only make a difference for Covid-19 research,” explained Markus. “We believe [it will be useful] for many other urgent areas of research like sustainability and climate change, for example."

Extreme summarization and TLDRs

There is so much potential in AI to support researchers and we’re only at the beginning of that journey. Another area explored by Markus in his part of the webinar was “TLDRs” (classically written TL;DR, which stands for ‘too long; didn’t read’).

TLDRs are a form of ‘extreme summarization’ and act as an alternative to abstracts. TLDRs of scientific papers leave out the non-essential background or methodological details and capture only the key aspects of the paper.

The challenge with creating them is that writing a TLDR of a scientific paper requires expert background knowledge and complex domain-specific language understanding. Both are needed to identify the salient aspects of the paper while staying faithful to the source and keeping the written summary correct. An initial pilot using AI to create TLDRs was run within the computer science subject area, with great success.

"We fed in the abstract, introduction, and conclusions of individual sample papers,” explained Markus. “And then asked the authors of those papers about the resulting TLDR. Not only were they judged as correct, but also as highly useful."

Natural language models and AI content generation

Another area Markus explored in the webinar was whether AI could be used to generate scientific content. To do this, he started by explaining natural language models – which most of us know best from conveniences like search autocomplete and typing suggestions, now omnipresent in online products.

Essentially, you give a language model a ‘primer’ (such as the start of a search query) and it will then suggest the rest. The question Markus posed was, “Can we use this to automate scientific research generation?”
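The primer-and-completion idea can be illustrated with a toy model. The Python sketch below counts which word most often follows each word in a small corpus and then extends a primer one word at a time. It is a simplified stand-in, assumed here for illustration: the language models discussed in the webinar are far more sophisticated, and the example corpus is invented.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    model = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            model[current][following] += 1
    return model


def complete(model, primer, max_new_words=5):
    """Extend the primer by repeatedly appending the most frequent next word."""
    tokens = primer.lower().split()
    for _ in range(max_new_words):
        if not tokens:
            break
        followers = model.get(tokens[-1])
        if not followers:
            break  # the model has never seen anything after this word
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


# Invented example corpus, standing in for real search queries.
corpus = [
    "how to cite a paper",
    "how to cite a preprint",
    "how to cite a paper in apa",
]
model = train_bigram_model(corpus)
suggestion = complete(model, "how to", max_new_words=3)
```

Here the primer "how to" is extended word by word using whatever the model saw most often in its training text, which is also why the quality of the training material matters so much.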

To test this, Markus’ team set out to train a language model with 20,000 paper introductions from the Association for Computational Linguistics. The result was impressive, with the program able to produce accurate, compelling text from a simple primer sentence. But how could this be used, particularly given the ethical questions involved in ensuring that fake science isn’t produced by such a program?

"It is clear that we cannot, and also do not want to, delete the human from the loop,” said Markus. “My hope, if we manage to master this as a science publisher, is that we will be able to support researchers when kickstarting the writing – helping them overcome writers’ block. The machine can just generate a suggestion and the human can edit it to its final perfection."

This approach could also work well in applications like science journalism or to create automatically generated and dynamically curated topical pages – for example on a topic like climate change.

What does this all mean for librarians?

One of the first questions asked during the webinar was where librarians fit into these discussions regarding AI.

"This means a paradigm shift,” answered Markus. “I think the focus will move from providing content to researchers to providing services that help cut through the clutter."

Markus went on to say that one of the challenges for librarians will be to train and educate younger authors and researchers about the options. Another is being aware of the risks – checking your sources are always right, actively addressing plagiarism and intellectual property questions, and so on.

"Librarians are ‘Knowledge Incubators’,” concluded Markus. “And they can be the research translators too. So the message is to embrace this technology and learn how to use it."

It’s only the start for AI in publishing

Building on the examples we’ve looked through here, most recently Nature has developed Research Intelligence – a new suite of AI-powered solutions that summarize research trends to allow organizations to quickly measure their success, uncover hidden connections, and guide their strategy. You can read all about it in this blog post.


Enjoyed this blog? Don’t forget to read our first blog on this topic, where we go into more detail on the role of publishers in managing information overload and look at the first machine-generated academic books.


Author: Saskia Hoving

In the Dordrecht office, Marketing Manager Saskia Hoving produces The Link Newsletter for research communities. Focusing on the evolving role of libraries regarding SDGs, Open Science, and researcher support, she explores academia's intersection with societal progress. With a lifelong passion for sports and recent exploration into "Women’s inclusion in today’s science", Saskia brings dynamic insights to her work.