This is the fourth blog post in our series in which researchers share their experiences with open science practices, and reflect on the impact that sharing open data, code, and protocols can have.
Professor Casey S. Greene is the founding Chair of and a Professor in the Department of Biomedical Informatics at the University of Colorado School of Medicine, as well as Director at the Colorado Center for Personalized Medicine. His lab applies computational methods to answer important questions in biology and medicine. He has extensive publishing experience, with articles published in Nature Portfolio journals, such as ‘Latent spaces for tumour transcriptomes’, ‘Responsible, practical genomic data sharing that accelerates research’ and ‘Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms’. As an advocate for high-quality science, Prof. Greene encourages his research group to approach open science practices in a deliberate and meaningful way.
We asked Professor Greene to share his perspective on how these practices enable better, more rigorous science that ultimately leads to “more tomorrows”, improving and extending people’s lives through research that others can build on.
I'm a computational biologist, and our lab focuses on developing machine learning methods for integrative genomics that help us make sense of mostly large-scale biological data, as well as some clinical data. We've worked in some foundational computational methods development and a lot on applications. We like to spend our time trying to ask the uncomfortable questions that folks sometimes don't ask about the limitations of the data, so there’s a lot of thought about how to design computational experiments.
“I would say that my interest is not really open science; my interest is high-quality, rigorous science, and I think often that happens to be open.”
Casey Greene, Founding Chair of and a Professor, Dept. of Biomedical Informatics, University of Colorado School of Medicine; Director, Colorado Center for Personalized Medicine.
I would say that my interest is not really open science; my interest is high-quality, rigorous science, and I think often that happens to be open. I’ve spent a bunch of time talking about the Open Science Industrial Complex, and I think there is an approach to open science which is open first, science second. That can lead us astray. I think if it is science first, rigorous is the goal and open is how we get there.
When I started my postdoc, part of that involved the opportunity of picking up some work that was in progress in the lab. What I would find is: I'd try to do something, and it would turn out that someone else has already implemented it. It's not in the core library that we're using; it's in the version on their computer, so now I have to get that from them and then I have to integrate it with this version of the library that I inherited. We spent time basically redoing potentially avoidable work. I guess that's how I came to it first, focusing on version control, provenance, et cetera. When I started my group, that was the goal, and we've improved our practices over time in terms of our expectations of lab members.
I also think I really like working with people, and there's a limit to the scale of who you can work with if you're going to be closed, even if you use good rigorous scientific practices. If you do this stuff in the open, there's just more opportunity for critique, feedback, and collaboration. I feel like you can do science better and faster if it's open. Not that it should be open just for the sake of it, where it looks open but you don't actually engage with anyone or even use rigorous practices. The rigour is more important than the open, but I think they go together.
Pretty much all our code is on GitHub. Some of that is on GitHub while we're doing it, but I don't require people to have their code public on GitHub at the time the work is being done. I do require the code to be public on GitHub at the time of publication. We use a pull request model. We also try to make sure the data is on an appropriate repository. For the work that we do, that's usually SRA or dbGaP, with controlled access depending on the human subject status of it. Sometimes there's not an appropriate repository, so we use things like Zenodo or Figshare.
Every once in a while, there have been projects that feel like they're beyond the scale of the lab, so we'll actually run the entire project in the open intentionally to advertise it and say, ‘Hey, come here and pitch in.’ I think the ‘Open Paediatric Brain Tumour Atlas’ is an example of that. ‘Opportunities and obstacles for deep learning in biology and medicine’ is an example of that. It started with a tweet. I tweeted, ‘I'm kind of interested in reading the deep learning literature. We're going to start a review on GitHub,’ and ended up with 40-odd people writing a paper.
What does open science look like in the day-to-day practices of your lab?
Do you know the article ‘The Mundanity of Excellence’? I think open science is not a trend, but these everyday, disciplined practices. These practices let you do science where you might fail at something, but if you fail, you fail because of a good reason and you're not going to make the same mistake twice. You'll make a lot of new mistakes. So many new mistakes, but you don't make the same mistake twice. That’s the type of stuff that separates really good science from great science, and those practices often tend to be open. I think just doing that consistently and well can make the entire ecosystem stronger so that we can return more yield in health sciences; we can return more tomorrows to more people, and ideally doing that is going to be the type of thing that convinces society that, ‘Maybe we should continue to invest in this’.
There are so many incredibly bright people in science. I am not going to be able to compete on that, but doing the simple stuff well? I can compete in that. You have to find what you're good at and be able to do small, frustrating things and make them not frustrating but a thing you enjoy doing; somehow that's my skill. The great thing about it is being able to find joy in the little stuff. This is why it was great to be able to post a preprint and have someone reach out and give you a comment because you can enjoy that. You can't control the outcomes, but you can control how you do it. That's how you do science that matters, science that other people can build on.
I think people like to make the argument for open science from a career progression point of view, but I think it's actually negative for science as a whole. If we look at preprints, for example, there were a whole bunch of people who decided to argue that we should preprint because it's good for people's careers. I think that's exactly the wrong reason to preprint. I view the posting of a preprint as, ‘Hey, I want feedback on this before it's finalized. I want to hear from people’. I see it as a complement to peer review.
At the end of the day, our goal as scientists is either to better understand the unknown or improve the lives of our fellow people. The more we bring the focus into ourselves and less about why we do what we do, the more we make bad decisions that are, at best, orthogonal to scientific progress and, at worst, directly contradictory to scientific progress. I would say focusing on career progression will almost always lead you astray. I think it's a lot easier to figure out what you believe and hope that there is an institution out there that values what you value.
I'm going to say this is a barrier, but I want to recognise it as a fully appropriate barrier. We do work on data that is from people, and I think there are real barriers to open science practices in the context of research on humans. I also think that those are completely appropriate barriers, so we have to figure out how to do that responsibly because with privacy, once it's lost, it's gone. So, to me, I think the biggest, true barrier to open science practices is that privacy is a non-renewable resource. That’s not a bad barrier, but it is something that we have to take exceedingly seriously.
For the deep learning review, we wrote it entirely on GitHub with this automatic rebuild process. We ended up writing a paper about the writing of the paper, and there was a lot of fun stuff we were able to do with that. We spun up the same thing in the context of COVID because there was a lot of research starting to come out, and we thought it'd be great if there was a place that could aggregate it.
There are people I've worked with who I've never met in person because they came across something on GitHub. I've co-authored with people and only met them years later because we interacted on GitHub and ended up writing a paper together. For the Open Paediatric Brain Tumour Atlas, all the analysis happened online in the open, so there's a tonne of people who contributed to this project that I've never met before, but we've done analysis together. It's kind of delightful sometimes.
It's pretty hard to do things if you're not funded. I do think funding for the sake of open is a little odd. We should fund practices that advance science most effectively and efficiently. I think that has to always be the North Star: what's the underlying goal?
Alex's Lemonade Stand Foundation, for example, have open as a review criterion. I like this policy very much, and I think the reason I like it is because it's very centred on the goal. The goal of resource sharing is to enable faster translation of research discoveries into cures for children with cancer. The goal is not open. The goal is not to help careers. The goal is to make sure fewer kids die of cancer, and then there’s the how. I like this type of approach. ‘We're doing this, here's why we're doing it’, but it's not centred on someone's career. It's centred on what helps kids with cancer not have cancer.
Also, every time the National Institutes of Health come out with a request for input on open science practices, I try to submit this, ‘You should do something like this.’ I think the more that it's box checking, the less effective it is. I think a focus on the why and then using the why to drive the how feels like the right way to go.
“Figure out what you want to do and what change you want to make in the world and then figure out what practices are best aligned with that, and I think you're going to find they're open.”
Casey Greene, Founding Chair of and a Professor, Dept. of Biomedical Informatics, University of Colorado School of Medicine; Director, Colorado Center for Personalized Medicine.
Figure out what you want to do and what change you want to make in the world and then figure out what practices are best aligned with that, and I think you're going to find they're open. For example, the last thing I want to do is have someone else start a project from ground zero if they could have started from some software that we wrote. I'd rather have them use that limited amount of scientific funding to make as much progress as they can make, and if they can start from something we did, they can get further faster. That is a win for everyone.
Science is generally not a zero-sum game. Society decides where society will invest based on the yield of that investment, and I think most people want to live. I was talking to my daughter yesterday. We were driving back from the airport after dropping a colleague off, and she was asking me a bunch of questions like, ‘Why do you do what you do?’ I was like, ‘The person who we just dropped off, the work that they do means fewer children will die of cancer. That is what will happen’. What we do in science is ideally about people having more tomorrows. That is an incredible privilege. It is an incredible opportunity, and so it is about keeping yourself grounded in whatever that is and then figuring out how to go from that to choosing the right actions for you in your group.
You’ve got to start with what gets you out of bed in the morning and why you do it because it is real hard sometimes to stay motivated, but if people are going to live happier, healthier lives, I think that has to be what guides you to decide the practices. There's just so much opportunity to make a positive difference in the world, and open science practices just follow from that.