Opening doors to discovery of content and data with AI

By: Henning Schoenenberger and Markus Kaindl, Thu Apr 16 2020

In January this year we announced a new partnership between Springer Nature and the non-profit open science organisation OpenAIRE to allow links from publications to research datasets and other scholarly outputs to be extracted using text and data mining. The goal of the partnership was to enable researchers to get a clearer and richer view of the research they are analysing, as well as help them extract the most relevant information from a large corpus of publications.

Similarly, in the summer of last year, we expanded our collaboration with the Allen Institute for Artificial Intelligence (AI2) so that metadata and full text from over 3.4 million content items could be delivered to them. This allowed AI2 to create a comprehensive literature graph for use in their discovery platform, Semantic Scholar. Using this search engine, scientists and scholars can navigate links between articles, authors and topics, stay up to date with the latest developments in their field, identify new areas for investigation, generate hypotheses and explore new research methodologies.

Abstracting and indexing collaborations like these are, of course, not new – we have been involved in such partnerships since the advent of digital publishing. But these collaborations are essential if we are to serve our communities now and in the future. Here is why:

Grounded in the needs of our communities

Partnering with organisations that offer services based on artificial intelligence has clear benefits for researchers: their work gets increased exposure and is easier to discover, author visibility and credit improve, and ultimately the impact of the research is greater. But it is not just the research community that benefits.

Often, the organisations that we partner with have missions and goals aimed at tackling some of the greatest challenges our society faces. For example, Springer Nature is currently piloting a partnership with the Chan Zuckerberg Initiative, in which we provide journal content for integration into their discovery and analysis tool, Meta. One of the core values of the Chan Zuckerberg Initiative is to “support the science and technology that will make it possible to cure, prevent, or manage all diseases by the end of this century”. By providing content to this partner, we are therefore supporting the researchers who develop the science and technology necessary to tackle disease, from prevention to cure.

Open science is powered by influential technology leaders

Publishing is integral to science and progress, and as a leading global publisher we strive to offer our authors and editors the best possible service for their publications. We must therefore ensure that content published with us, whether journal articles, books, research data, protocols or null results, is available and discoverable as widely as possible. By making it easier to discover content from all publishers at the same time, with maximal metadata enrichment, these organisations empower scientific discourse and ultimately foster open science through full transparency and by putting publications into perspective.

Reducing organisational and legal hurdles so that we can enter partnerships faster, and ensuring that we partner with influential technology thought leaders, are both essential to maximising the discoverability of our authors’ research.

The future is AI

As researchers are confronted with an ever-increasing information overload, the challenge of filtering out the relevant information will only become harder. In future, we will need technology to help us read and digest all of this content.

Technology is already embedded in every aspect of our business: AI is integrated into plagiarism checks, editing tools and even our content. In the not-too-distant future we will see forms of publishing that are less static than books and articles published as PDFs and EPUBs. Imagine I access a book on my mobile device. Could another reader, accessing the same book at the same time but with different experience and expectations, see something different?

What is clear is that, together with our partners, we will become much better at providing diverse, new, condensed forms of research alongside original publications, and at embracing flexible formats tailored to each individual reader.


Author: Henning Schoenenberger and Markus Kaindl

Henning Schoenenberger, Director Product Data and Metadata, runs a global Data Development department at Springer Nature, finding cutting-edge responses to key problems in the research and library communities, in areas such as access, discovery and metadata formats, as well as content and data delivery solutions. He conceived and product-managed the first machine-generated research book published at Springer Nature.

Markus Kaindl works as Senior Manager in the Product and Platform Group. He is responsible for Springer Nature's internal analytics solution SN Insights, based on Digital Science’s Dimensions. Previously he released the Linked Open Data knowledge graph SN SciGraph. His publishing engagement started in 2013 with big data consolidation for the Springer Book Archive, followed by major metadata migration projects due to mergers. He received an M.A. in Computational Linguistics from Ludwig-Maximilians-University Munich in 2010 and has also been involved in various text and data mining and natural language processing efforts dealing with document structuring, enrichment and classification.