Bioinformatician Johannes Koester on embodying the spirit of open science

The Researcher's Source
By: Fernando Chirigati, Wed May 20 2026

Chief Editor at Nature Computational Science

This is the third blog post in a new series in which researchers share their experiences with open science practices, and reflect on the impact that sharing open data, code, and protocols can have.   

Johannes Köster is Professor for Bioinformatics and Computational Oncology at the Institute for AI in Medicine, University of Duisburg-Essen, Germany. As author of the open-source workflow management system Snakemake and founder of the Bioconda project, he has extensive experience with open science practices and publishing. His Nature Portfolio articles include "Applying the FAIR Principles to computational workflows" (Scientific Data), "Neoadjuvant nivolumab with or without relatlimab in resectable non-small-cell lung cancer: a randomized phase 2 trial" (Nature Medicine), and "Bioconda: sustainable and comprehensive software distribution for the life sciences" (Nature Methods).

We asked Professor Köster to share his perspective as an advocate of data sharing and embracing open science in every area of research practice.

How did you first become interested in open science practices? 

I think it was early during my PhD thesis when my group was approached by another that did research in reproducibility and workflow management. They asked, ‘How does this apply to Bioinformatics, actually?’ At that time, we were using a very old but relevant tool to ensure reproducibility of our data analysis. 

Upon this request, we sat together and drafted ideas on how we could make this more ergonomic and convenient for bioinformaticians, such as workflow management, reproducibility, and so on. This is, of course, only a part of open science, but it was the part that drove me to the topic. We brainstormed a bit; I came up with an idea for a syntax and the others did too. I then kind of converged those three ideas into a prototype. That was Snakemake. 

I implemented that over a couple of days and refined it for about one year. Suddenly, it was so useful for the entire group. When we published it, it was immediately very useful for many others. This became quite important for my thesis. In principle, this is exactly what brought me to open science: reproducibility and thinking, ‘How can we actually make our data analysis more accessible to others?’ It’s not only about automation but about making this accessible and understandable. That is, for me, what open science is or should be about. 

What open science practices do you use? 

Whatever I'm involved in, I ensure we publish all the code and data whenever possible. With data, it sometimes depends. When we can, we rely entirely on open data that is also accessible to everybody. Sometimes, like for example in the Nature Medicine paper we published last year, it's not possible because it's German patient data and we are not allowed to just disclose it for everybody, but at least it's uploaded to a repository where you can ask for access. It's a bit more cumbersome than just hitting a button and automatically downloading everything, but this is the best we can do in such a case.  

Protocols are, for me, inside the code. We try to write code in a way that’s human readable, not just dumping some stuff that we run quickly for publication but making sure that it's also reusable and readable by others. This is, I think, an important aspect that is often not fully recognised by reviewers or journal editors. It’s nice that journals sometimes try to get people to publish those things along with the paper, which is already an improvement compared to before, but the quality of what is published matters as well, I think. For me, this is particularly important and a mission. 

What motivated you to adopt these practices?  

“It’s very important that code is seen as a primary research output.” 

Johannes Köster, Institute for AI in Medicine, University of Duisburg-Essen, Germany

I think the true value, especially of the stuff that comes as auxiliary to the actual manuscript, lies in two things. First of all, I hope that it makes it easier for readers and reviewers to judge whether the work was valid, both from a methods perspective and a statistical perspective. Doing a data analysis is much more than what you usually read in the methods section of a manuscript, like ‘we applied a t-test’ [a statistical test that compares the means of two samples]. Often, it’s not even specified whether it was one-sided or two-sided, or what the exact thresholds were. For a t-test it’s easy, but there are much more complex methods, and all the parameters matter — how you filter the data matters, and so on. That level of detail you only see in the code. Therefore, I think it’s very important that code is seen as a primary research output. 

The other aspect is not about transparency or judgement, but I think people are reinventing the wheel all the time. Not so much in methods papers but in papers that analyse some big dataset. All the code is rewritten constantly because it wasn’t done in a way where you can actually reuse it and apply it to new datasets. I think this is the second big value of this approach: ideally, each publication is made of building blocks that can be used by others as well, at least in part. This is what we should all aim for because then people can concentrate on the actual big questions and don’t have to deal with all these technical difficulties every time. 
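As a minimal sketch of the point about analysis details living in the code (this is an editorial illustration, not code from Professor Köster's work), the Python below states every choice explicitly: the filtering threshold, the sidedness of the test, and the significance level. An exact permutation test stands in for the t-test so that the example needs only the standard library. The measurements, the names `MIN_VALUE` and `ALPHA`, and both functions are hypothetical.

```python
from itertools import combinations


def filter_values(values, min_value):
    """Filtering step: keep only measurements at or above min_value."""
    return [v for v in values if v >= min_value]


def permutation_test(group_a, group_b, alternative="two-sided"):
    """Exact permutation test on the difference of group means.

    alternative is "two-sided" or "greater" (mean of a above mean of b).
    Returns the share of all relabelings of the pooled data whose
    statistic is at least as extreme as the observed one.
    """
    if alternative not in ("two-sided", "greater"):
        raise ValueError("alternative must be 'two-sided' or 'greater'")

    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def diff_of_means(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = diff_of_means(sum(group_a))
    tolerance = 1e-12  # guards against floating-point ties
    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        diff = diff_of_means(sum(pooled[i] for i in indices))
        if alternative == "greater":
            is_extreme = diff >= observed - tolerance
        else:
            is_extreme = abs(diff) >= abs(observed) - tolerance
        extreme += is_extreme
        count += 1
    return extreme / count


# Hypothetical measurements and analysis parameters, all stated explicitly.
MIN_VALUE = 5   # values below this are treated as failed measurements
ALPHA = 0.02    # significance threshold

treated = filter_values([12, 15, 14, 16, 3], MIN_VALUE)
control = filter_values([9, 8, 11, 10, 2], MIN_VALUE)

p_greater = permutation_test(treated, control, alternative="greater")
p_two_sided = permutation_test(treated, control, alternative="two-sided")

print(f"one-sided p = {p_greater:.4f}, significant: {p_greater < ALPHA}")
print(f"two-sided p = {p_two_sided:.4f}, significant: {p_two_sided < ALPHA}")
```

With this toy data, the one-sided p-value is 1/70 and the two-sided p-value is 2/70, so the same measurements pass the 0.02 threshold under one choice and fail it under the other. A methods sentence that only names the test would not reveal which of the two was run, whereas the code does. Because the filtering and the test are plain functions with explicit parameters, they can also be applied to a new dataset without being rewritten.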

Have you ever faced any particular challenges or barriers in undertaking open science practices?  

There are quite a few barriers or challenges. One I mentioned already, and it applies especially to somebody who wants to use open data analyses by others: people don't take enough care in publishing and documenting those parts properly. In part, I think this is a matter of culture on the principal investigator (PI) level because they don't care so much how their postdocs or PhD students publish those parts, as long as they can ‘tick the box’.

Another is open access. The pricing is not always reasonable. It might be justified, but it's causing discrepancies or imbalances between countries and research groups. There are groups that have a lot of funding, and for them it's much easier to publish in high-impact journals because they can easily afford it. Also, there are countries where it's much easier to afford than in other countries. That causes science to be a hierarchical system. That's a bad side effect of open access, I think, because everybody, or at least most people, wants to publish open access nowadays. It means some people need to choose a journal by just the price, and that is not good.

Is there a particular success story or example you’re proud of that illustrates the benefits of open science?  

For me, personally, it's definitely the story about Snakemake and Bioconda because they enabled a lot of things for me. First of all, having these tools available, being able to use them, and showing others how to use them in unrelated publications made me quite well known in the community. This enabled a lot of career steps for me. For example, one year before the end of my PhD thesis, I was asked to be a postdoc in the lab of Shirley Liu at Harvard. I would never have thought of applying for a postdoc at Harvard if I had not been contacted by her. She only contacted me because she knew about my open science stuff, so that was a huge opener for me. Of course, by just having something you maintain for a long time, you get regular users and they tend to collaborate more with you. That opens a lot of doors.

How do you advise early career researchers on open science best practices? 

“When thinking about the code you publish along with your open science, make sure you use quality control tools.” 

  Johannes Köster, Institute for AI in Medicine, University of Duisburg-Essen, Germany

When thinking about the code you publish along with your open science, make sure you use quality control tools. There are lots of them available for virtually every programming language. The benefit is, of course, that maintainability improves and you less frequently run into issues, such as having a shortfall in your analysis that you don’t see until publication. 

Speaking about tooling, AI is also a tool one needs to mention. It can be very useful in terms of, let’s say, reviewing your code, but you have to be super careful with that. I would not use it in a generative way, especially not for science. You need to be very sure about every single method and parameter, and AI is not about being sure. What I’m really worried about is paper writing with AI. To me, writing a paper is the process of getting your research idea into a natural language. This process is part of the actual research because you need to think about everything again, and sometimes while doing that, you get new ideas or spot a mistake that you wouldn’t if you just prompted AI to write this paragraph for you. 

This process can be painful, of course, especially if you’re a beginner, but if we start to skip that, we will have a huge problem with the quality of research. It also applies to writing code, such as writing the analysis or engaging in the cognitive process of solving a problem in a scientific way. If you just delegate things like that to AI, we are lost, I think. With this I don’t want to say that AI is not a useful addition and assistance, but we should not let ourselves be seduced into using it as a shortcut.

Are there any gaps in support or resources about open science you’d like to see addressed? 

Open science gives you a lot, but it's very hard to get recognition for that (apart from personal relationships or career steps) in terms of, let's say, metrics, for how successful you are as a researcher. In terms of software, even if it's a small, niche project, you ideally still need to maintain it for a very long time. This is often not recognised in academia. People look at citations and probably download counts, but even for niche projects, it can still be a huge benefit for the scientific community if the person who developed it maintains it for a long time. I think we need to find metrics we can have for that as well, like continuous maintenance and proper user support, and so on. 

Another gap relates to how we handle peer review. So far, reviews are a service that researchers do for others and for the journals, of course, because it wouldn't work if everybody just published and nobody reviewed. It makes total sense, but it also means that it's often anonymous and the review is not open science, although there are exceptions to that. I like this idea of the review being a publication in and of itself because that's also science, and this is something that deserves recognition. Why not make this as open as a research artefact and let others read the review? 

Currently, we think once the paper is published, the review process is over, and then we have just the authors and the readers. There's usually no direct feedback mechanism. That's something journals could think about adopting as well, allowing reviews by readers. A platform that already explores this direction is Octopus, where peer reviews get their own DOI, becoming a citable and public entity.

What advice would you give to researchers considering adopting open science practices?  

“There's nothing better in training you as a scientist or a computer scientist than participating in open source and getting in contact with others.” 

  Johannes Köster, Institute for AI in Medicine, University of Duisburg-Essen, Germany

When I teach students, I tell them all the time: think about open-source projects you could contribute to, or come up with your own. There's nothing better in training you as a scientist or a computer scientist than participating in open source and getting in contact with others.

Regarding taking care of open science and reproducibility in the publication process, I would say just do it, even if your PI doesn't care. They will tell you, ‘We have to publish this paper next Friday,’ and then, of course, every PhD student will think, ‘Oh, I just need to get this done.’ In the end, it goes into review, but the reviewers want some additional work to be done or criticise a certain step of the analysis. Now they need to go back to the work they did in a rush and basically repeat it because they didn’t do it in a properly documented and automated way the first time. 

What I want to say is that it usually pays off if you spend this extra time thinking about being reproducible and open, in the sense that later on you have less work to do and can build upon your previous work. By doing that, you get automatic citations because if others use your code for their analysis, they have to cite you. I can definitely see from my own practices that it nearly always pays off in the end. 

Learn more about open science and sharing research data, code and protocols & methods openly 


Johannes Köster, PhD, Professor for Bioinformatics and Computational Oncology at the Institute for AI in Medicine, University of Duisburg-Essen, Germany

Johannes Köster is professor for Bioinformatics and Computational Oncology at the Institute for AI in Medicine, University of Duisburg-Essen, with a focus on algorithm engineering and data analysis. Johannes Köster studied computer science at the University of Dortmund, did his PhD at the TU Dortmund, and was a postdoc at the Dana-Farber Cancer Institute, Harvard University and Centrum Wiskunde & Informatica (CWI).

Johannes Köster is the author of the workflow management system Snakemake and the founder of the Bioconda project for sustainably distributing bioinformatics software as easily installable packages. He is also the author of the Rust-Bio library, and works in the field of Bayesian statistics in order to provide algorithms for analysis of high-throughput data while capturing and quantifying all known sources of uncertainty, thereby providing more reproducible predictions.

Related content:

  • Open science conversations:
    1. Building trust through transparency: An open science conversation with Geir Kjetil Sandve
    2. Open science, altruism and impact: An interview with clinical geneticist Zornitza Stark
  • Best practices for transparency and reuse:
    1. How to share your research protocols and methods openly
    2. How to share your research code openly
  • Supporting open science practices:
    1. Why share your research data?
    2. Why sharing protocols matters
    3. Why sharing your code matters

Fernando Chirigati is the Chief Editor at Nature Computational Science. He received his PhD in Computer Science from New York University in 2018, and also worked as a postdoctoral research associate at the same institution. He conducted research in various areas, including scientific data management, provenance management and analytics, large-scale data analytics, data mining, computational reproducibility, and data visualization.