Discover how machine learning is empowering researchers to progress on the SDGs

The Source
By: Lucy Frisch, Thu May 21 2020

In 2019, Springer Nature partnered with the Association of Universities in the Netherlands (VSNU) and its sister organisation UKB to jointly explore the question “Is open research facilitating progress on the UN’s Sustainable Development Goals (SDGs)?” As a first step, the team developed a prototype for mapping scholarly content against five of the goals, with Digital Science chosen as a key technology partner. Initial results were released in December 2019. The goals represent some of the world’s most pressing challenges, from good health to peace, justice, and strong institutions. For more information on the partnership behind this project, please read: Advancing partnerships beyond publication to meet the Sustainable Development Goals, a perspective from our account development team.

In early 2020, Digital Science applied the resulting method and algorithm to the remaining 12 goals, releasing results for all 17 goals in April 2020 and making them permanently accessible via the free version of Dimensions.

This work brings together the content and technology expertise of all three players, resulting in an innovative new tool for the research community. In this interview we speak to Timon Oefelein (Senior Manager, Account Development and Strategic Partnerships, Springer Nature), Nicola Jones (Springer Nature’s Head of Publishing for the SDGs), and Jürgen Wastl (Director of Academic Relations and Consultancy, Digital Science) about the mapping project and the opportunities it brings for researchers.

For any researchers who may not be familiar with Dimensions, please describe what it is.

JW: Dimensions is an innovative research information platform developed to provide a broader view of the research process and its activities, beyond publications and citations alone. Besides 108 million publications with 1.2 billion citations, Dimensions contains $1.6 trillion of funded grants, 500,000 clinical trials, 36 million patents, 450,000 policy documents and 1.5 million datasets. These are combined into a single linked dataset that gives a deeper understanding of the research trajectory: researchers can discover and analyse the resources injected into the research system and how they translate into outputs and impact.

How is Springer Nature engaging with the SDGs?

NJ: With a portfolio of around 13,000 books, 340,000 articles and 3,000 journals in 2019, across all subject disciplines, we publish a vast amount of research that relates to the SDGs. In 2016 we set up a programme to coordinate our publishing on these interdisciplinary and multidisciplinary societal challenges. We now have a formal steering group for the programme, led by Sir Philip Campbell, Editor in Chief of Springer Nature, which ensures that our publishing activities on the SDGs are closely aligned with our responsible business activities. In terms of specific projects, we have a book series dedicated to the SDGs and publish the Encyclopedia of the UN Sustainable Development Goals - the first major reference work to directly focus on the Goals. We’ve also focussed a number of new journal launches in recent years around sustainability and sustainable development, for example Nature Sustainability and Sustainable Earth, and we’ve set up the Springer Nature Sustainability Community, where authors and other researchers working on sustainability can host blog content that contextualises their formal research publications. Through Nature and Scientific American we have been vocal in advocating for sustainability policy to be based on scientific evidence, and we have participated in global initiatives that support this, such as Covering Climate Now. We’ve also been working to bring research closer to policy and practice through events like Science and the Sustainable City in 2018; Science on the Hill, which has run four events in Washington DC; and SpotOn, which in 2020 took place in London and Cairo simultaneously, aiming to directly bridge the knowledge sharing gap between the Global North and Global South. Finally, we’ve also taken some really significant steps to manage our own business responsibly, particularly from the environmental point of view, and are committed to being carbon neutral by the end of 2020.

What in your view is so exciting about this project?

TO: There are many aspects of this project that make it exciting, and also a privilege to be part of. For one, we have developed one of the world’s first online SDG content classifiers that draws on both Machine Learning and Subject Matter Expertise. This has enabled us to make a strong contribution to existing pioneering initiatives in this area, e.g. the AURORA network. It has also been really fantastic and very inspiring to achieve all this together with VSNU, UKB, and Digital Science. At the end of the day, we are all serving the same mission, that is, to empower researchers and funders to advance discovery and progress on the SDGs. Thus, it makes perfect sense to join forces and work as a team.

How does a machine learn to map SDG content?

JW: Our automated approach to categorise scholarly articles into the goals employs supervised machine learning whereby curated training data – in this case, publications that are assigned to each of the individual Sustainable Development Goals – feed machine learning algorithms to automatically build a classification model that is then used to categorise new articles without human involvement.

Our workflow deviates from the classic approach in that the collection of the training data was carried out semi-automatically, meaning that the data is compiled from the search results of 17 Boolean queries, each corresponding to one of the goals.
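The workflow described above can be illustrated with a small sketch. It is not Digital Science's actual implementation: the two Boolean queries, the five-document corpus, and every function name below are hypothetical, and a simple Naive Bayes model stands in for whatever algorithm was really used. The sketch also assigns one goal per document, whereas a real publication may relate to several goals.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical Boolean queries, one per goal: every term in "all" must be
# present AND at least one term in "any" must be present.
QUERIES = {
    "SDG 3": {"all": {"health"}, "any": {"vaccine", "disease", "mortality"}},
    "SDG 7": {"all": {"energy"}, "any": {"solar", "renewable", "wind"}},
}

# Hypothetical corpus standing in for the publication database.
CORPUS = [
    "vaccine coverage and child health outcomes",
    "disease burden and public health mortality",
    "solar energy storage for rural grids",
    "renewable wind energy integration",
    "medieval poetry manuscripts",
]

def build_training_set(corpus, queries):
    """Semi-automatic labelling: a document gets a goal if the query matches."""
    labelled = []
    for text in corpus:
        tokens = set(tokenize(text))
        for goal, query in queries.items():
            if query["all"] <= tokens and query["any"] & tokens:
                labelled.append((text, goal))
    return labelled

def train_naive_bayes(labelled):
    """Count words per goal; these counts are the whole model."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, goal in labelled:
        doc_counts[goal] += 1
        for token in tokenize(text):
            word_counts[goal][token] += 1
            vocab.add(token)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the goal with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_goal, best_score = None, -math.inf
    for goal in doc_counts:
        score = math.log(doc_counts[goal] / total_docs)
        total_words = sum(word_counts[goal].values())
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[goal][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_goal, best_score = goal, score
    return best_goal

training = build_training_set(CORPUS, QUERIES)
model = train_naive_bayes(training)

# This title lacks the word "health", so the SDG 3 query itself would miss it,
# but the trained model can still assign it to a goal.
print(classify("new vaccine trial reduces mortality", model))
```

The point of the design is the last line: once a model has been trained on the query results, it can categorise articles that none of the original queries would have retrieved.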

What were the main challenges in determining the quality of the training sets and what role does this play in machine learning?

JW: This was both technically and conceptually challenging on a number of levels. With respect to the training set, we had to ask ourselves ‘what defines the quality of the training set, or what defines the gold standard?’, as no tagged SDG training set is available. One has to accept that in any case no 100% perfect training set is available. On top of that, the description of the individual SDGs (with a varying degree and number of descriptions of targets and indicators) was a challenge in itself. One strategy to alleviate the problem is to combine the experience of a large number of subject matter experts to create large, manually curated lists of publications for each goal – this comes at the expense of time and effort spent, which we wanted to avoid. So we set out in our methodology to create the best available search strings and use them to generate large training sets of publications automatically, without manual curation of the very long lists.

In addition, it became obvious at the start that our choice of creating and combining search phrases (to generate the best possible training set) would crucially hinge on the critical use of buzzwords (e.g. climate change, innovation, sustainability). These form part of the descriptions of the individual SDGs, but are used beyond the scope of the respective goal, so we often excluded them from the training set to minimise false positives.
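A toy example makes the buzzword trade-off concrete. The term list, the three document titles, and the helper names below are all hypothetical and chosen only for illustration; the actual search strings are not reproduced in this interview.

```python
# Buzzwords named in the interview; treating them as an exclusion list
# is an assumption made for this sketch.
BUZZWORDS = {"climate change", "innovation", "sustainability"}

# Hypothetical candidate search terms for a single goal.
CANDIDATE_TERMS = [
    "climate change",
    "greenhouse gas emissions",
    "carbon sequestration",
]

# Hypothetical titles: the last one only mentions the buzzword in passing.
CORPUS = [
    "Greenhouse gas emissions from rice paddies",
    "Carbon sequestration in peatland soils",
    "Brand marketing in the age of climate change",
]

def prune_terms(terms, buzzwords):
    """Drop search terms that are used well beyond the scope of the goal."""
    return [term for term in terms if term not in buzzwords]

def search(corpus, terms):
    """Return every document containing at least one of the terms."""
    return [doc for doc in corpus if any(term in doc.lower() for term in terms)]

broad = search(CORPUS, CANDIDATE_TERMS)
narrow = search(CORPUS, prune_terms(CANDIDATE_TERMS, BUZZWORDS))

print(len(broad), "documents with buzzwords,", len(narrow), "without")
```

In this sketch the pruned query no longer retrieves the marketing title. The cost is that genuinely relevant documents mentioning only the buzzword would be missed too, which is why such a training set is deliberately conservative.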

What was Springer Nature’s role in the mapping process?

TO: Our main role was two-fold. First, we undertook the overall coordination of the project team (or work stream, as we call it), which involved managing resources, timelines, and communications. Second, we contributed substantially to one of the key technical phases: the continual improvement and quality assurance of the actual SDG search strings (as initially authored by Jürgen Wastl). Here, Nicola Jones did a superb job in identifying the most suitable Springer Nature editorial colleagues to check individual searches – from thousands of editors across the world – and supported them in this absolutely critical task. The actual technical process was devised and spearheaded by Digital Science. Once finalized, the search strings were used to generate the initial training set before applying Machine Learning for the final results. Here, I’d especially like to thank Jürgen Wastl, Hélène Draux, and Mario Diwersy for their brilliant work.

NJ: As Timon says, I coordinated the checking from our internal editorial subject matter experts. These are busy people who were asked to take on this additional job at a particularly busy time during conference season last summer. Much to my surprise and delight, every single person I asked agreed to help. The level of engagement and enthusiasm for the project was far beyond my expectations and really shows how willing our editors are to help advance our understanding of the literature as a whole - not just the parts they work directly on themselves.

What have you learned so far through releasing this data?

JW: We created ‘conservative’ training sets, repeatedly applying QA processes based on manual inspection and cross-checks with other available classification systems in order to minimise false positives, and the results of the Machine Learning reflect that process. We are confident that we can further expand the SDG-tagged content while keeping false positives to a minimum, and that we can eliminate any new false positives reintroduced by the Machine Learning process in another round of Quality Assurance with Springer Nature.

What do you perceive as the main ways researchers and funders can utilise this data?

JW: I believe researchers, their institutions, and funders will use this new filter to contextualise SDG-related research. With this new filter, a thematically new and different lens on research and research outputs in all SDG-relevant areas has become available. It will help address users’ questions about research in an SDG context: who performs SDG research, and where does that research take place? Institutions and funders will gain insights into their positioning in addressing the SDG challenges, and the new filter will enable individuals to find new collaborators (institution-wide, nationally or internationally) to tackle the challenges in social, economic and environmental fields.

TO: The new tool allows impact evaluation officers and research directors at funders and universities to gain new insights about how their organization’s research contributes to helping solve the grand societal challenges. This in turn supports program planning, benchmarking, trend and gap analysis, and evaluation of individual units of research assessment. 

The data points have strong potential and the timing is perfect: research is becoming increasingly interdisciplinary, often making more traditional classification schemas problematic to use. Further, the research community at large is still – in my opinion – placing far too much weight on quantitative journal-level metrics for overall decision-making. Here, the new tool could play a valuable role improving decision-making. 

JW: I wholeheartedly agree with Timon’s take on it. I’d like to extend the user group to governments and NGOs too.

This mapping project is the first phase of this collaboration, so what can we expect next?

TO: There are two additional project teams working on further key insights and best-practice recommendations.

The project team led by Harald Wirsching, our VP for Strategy & Market Intelligence, is conducting extensive trend analysis – using quantitative metrics - of the main results of the SDG mapping, mainly focusing on the Dutch research outputs in the last ten years. Outputs include a report of the usage of research related to the SDGs outside of academia.

In parallel, another project team, led by Mithu Lucraft, our Director for Outreach and Open Research, is providing an overview of the existing tools and resources – as offered by both libraries and institutions – that researchers use to facilitate the impact of their work. Outputs include a set of best practices on the most effective ways that researchers can optimize impact.

In many ways, the SDG mapping is simply the first step. The results of these two additional project teams will provide the actual key insights and best-practice recommendations.

What else should researchers know about Springer Nature’s SDGs programme?

NJ: The ultimate aim of the Springer Nature SDG Publishing Programme is to highlight the best and most relevant SDG related research to those individuals and organizations that are best placed to put it into practice to facilitate the achievement of the Goals by 2030. The SDG mapping in Dimensions represents a significant step forward in terms of identifying published content, but we’re also working hard on developing new editorial activities and products, and working with partners to publish work that really advances our understanding of what is possible and where there are gaps in the evidence base that need to be filled. 

About Timon Oefelein

Timon Oefelein joined Springer Nature’s (SN) Marketing department in 2000 as the publisher’s main international Copywriter. Since then, several posts followed, including in 2007, Head of Global Copy and Product Data Management. In 2010, he co-developed and launched SN’s Account Development program, an innovative training and support service for libraries. In 2018, his role expanded to include Strategic Partnerships and Outreach activities. Here, he supports a number of key libraries in Europe, including the libraries of The European Commission, The European Parliament, and several consortia head offices. He is particularly interested in publishing innovation and research impact.

About Nicola Jones

Nicola Jones is Head of Publishing for the Springer Nature SDG Programme: Springer Nature’s response to the UN Sustainable Development Goals. She is passionate about the importance of interdisciplinary collaboration for solving complex global problems and the need for solid research evidence to inform policy and practice.

About Jürgen Wastl 

Jürgen Wastl leads the Digital Science consultancy portfolio, supporting research institutions, funding bodies, governments and other institutions with research capabilities to make better use of data to inform their strategies and decisions. A Molecular Biologist & Biochemist by training, Jürgen has held roles in project management and research strategy in industry and higher education.

About Lucy Frisch

Lucy Frisch is a Senior Marketing Manager leading the Content Marketing Programmes team, based in the New York office. She has a passion for storytelling and works to humanize the research published across Springer Nature with a focus on the researcher experience.