At the beginning of April, Springer Nature published its first machine-generated book in chemistry. This book prototype provides an overview of the latest research in the growing field of lithium-ion batteries, based on a cross-corpus auto-summarization of the large number of current research articles in this discipline. Over the last 18 months, the algorithm has been developed in close collaboration with researchers from Goethe University Frankfurt, Germany.
We talked to Henning Schoenenberger, Springer Nature’s
A machine-generated book is a book that has been automatically generated by a computational process: an algorithm recombines or newly creates content based on existing content or data resources. The algorithm is governed by a set of parameters that determine the scope of the automatically generated book.
Natural Language Generation is advancing fast, and technology around Artificial Intelligence offers promising opportunities for generating scientific content automatically. To explore the current state of the technology, we started collaborating with scientists from the Applied Computational Linguistics lab of Goethe University Frankfurt/Main 18 months ago. With our first machine-generated publication, we have now reached a milestone in this field.
By introducing this prototype, we aim to explore the opportunities and limits of machine-generated research content alike. Our main objective is to initiate a public debate about opportunities, implications and potential risks of machine-generated content in academic publishing, as technology moves forward. As a global publisher, it is our responsibility to take potential implications into consideration and provide a framework for this new type of publication. And now is the perfect time to have this conversation.
Our prototype explores future ways of informing researchers, professionals and students. State-of-the-art computer algorithms were applied to select relevant sources from Springer Nature publications available on SpringerLink, arrange these in a topical order, and provide succinct summaries of these articles. The result is a cross-corpus auto-summarization of current texts, organized into coherent chapters and sections by means of a similarity-based clustering routine.
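To make the idea of similarity-based clustering followed by summarization more concrete, here is a minimal Python sketch. It is not the algorithm used for the book, whose details are not described here. It assumes a simple bag-of-words cosine similarity, a greedy single-pass clustering with an arbitrary threshold, and a frequency-based extractive summary. All function names, the stopword list, the threshold and the sample texts are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "for", "on", "with"}


def tokenize(text):
    """Return lowercase word tokens with the stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: a document joins the first cluster whose
    seed (first member) is similar enough, otherwise it starts a new cluster."""
    vectors = [Counter(tokenize(d)) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def summarize(text, n=1):
    """Extractive summary: keep the n sentences whose words are, on average,
    most frequent in the text, returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    # Made-up sample abstracts, for illustration only.
    docs = [
        "Lithium-ion battery anodes use graphite. Graphite anodes store lithium.",
        "Silicon anodes for lithium-ion battery cells expand during charging.",
        "Solid electrolytes improve safety. Polymer electrolytes are flexible.",
    ]
    for chapter, members in enumerate(cluster(docs), start=1):
        print(f"Chapter {chapter}")
        for i in members:
            print("  -", summarize(docs[i]))
```

In this toy run the two anode texts end up in one "chapter" and the electrolyte text in another. A production system would need far more robust similarity measures, summarization and quality control than this sketch suggests.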
Compared to a book written by a human being, the text generation process of a machine-generated book is much faster and can provide overviews, reviews and new insights at very short notice.
Serving as a structured excerpt from a huge set of papers, it aims to help researchers and students manage the information overload in this discipline efficiently. This is one of the main user problems we have to solve. Instead of reading through hundreds of published articles, they are able to rely on the solid summary provided by our book. At the same time, if needed, readers are always able to easily identify and check the underlying original source documents in order to further explore the topic. Instead of using results from search engines, whose quality may be hard to assess, readers can rely on qualified information published on Springer Nature’s content platform SpringerLink, which stands up to scientific scrutiny.
We are also exploring further applications of this new technology: As the amount of research and data grows rapidly, we might soon be able to help researchers filter the relevant research from a certain area faster and more thoroughly than a human being ever could.
What impact will machine-generated content have on the content creation process in scholarly publishing in the future?
We expect that in the future, we will see a wide range of options to create content, ranging from entirely human-created content to a variety of blended man-machine text generation to entirely machine-generated text. Researchers as authors will continue to play a crucial role in scholarly publishing, but their role could change substantially as more and more research content is created by algorithms. To a degree, this development would not be that different from automation in manufacturing over the past centuries, which has often resulted in a decrease in the number of manufacturers and an increase in the number of designers at the same time. Perhaps the future of scientific content creation will show a similar decrease in writers and an increase in text designers. But these developments are hard to predict at the moment.
At Springer Nature, we are planning to launch similar pilot projects in further disciplines based on our experience with this chemistry pilot. Our first machine-generated book will serve as a foundation for further development of machine-generated content and provide us with valuable learnings and feedback to help shape the future of scholarly publishing.