Open science before it had a name: An interview with molecular biologist Steven Henikoff

The Researcher's Source

Author: Melanie Clyne

Chief Editor, Nature Protocols

This is the fifth blog post in our series in which researchers share their experiences with open science practices, and reflect on the impact that sharing open data, code, and protocols can have.  

Dr. Steven Henikoff is a molecular biologist and professor in the Basic Sciences Division, Fred Hutchinson Cancer Research Centre in Seattle, WA, USA. He is also a Howard Hughes Medical Institute (HHMI) investigator and a member of the National Academy of Sciences. One of the first scientists to realise that computing and the internet could revolutionise biological research, he is the recipient of the 55th Lewis S. Rosenstiel Award for Distinguished Work in Basic Medical Research.

His lab performs research in chromatin, nuclear dynamics, transcriptional regulation, chromosomes, centromeres and, most recently, cancer, developing experimental and computational tools for studying these processes. Actively involved in scientific research for over 50 years, Dr. Henikoff has been instrumental in shaping open science practices since before ‘open science’ had a name.

We asked Dr. Henikoff to share his experiences of these evolving practices and how they move the scientific enterprise forward. 

How did you first become interested in open science practices?  


I first heard the term open science in the early 2000s, but it was really something many of us were doing well before that. When you work on methods, they're only really successful if someone else uses them, so the focus for my own lab for a long time has been on methods. 

A few years after starting my lab at Fred Hutch, I came up with a method for making clones for DNA sequencing that I realised would be useful to many others; it was later dubbed the Erase-a-Base® System. In the early 1980s, my Erase-a-Base® method allowed someone to finish a project efficiently, whereas shotgun sequencing, which was the norm back then, always left holes that required wasteful redundancy or targeted methods to finish. This is where the open science of its day helped. My Erase-a-Base® method was commercialised as a kit. Kits for genomics were getting started back then, before anyone had ever heard about open science or online protocols, but the intention was similar. You make a protocol work in the hands of a novice user, that's the idea, and so that got me hooked on developing methods.

What I first used Erase-a-Base® for was to complete the sequence of a fruit fly gene I'd been working on, to understand its alternative RNA processing into two different protein isoforms, and I wondered why its first intron was so large. We’re talking about the early 1980s, and introns, or ‘genes in pieces’, had only been discovered maybe several years before, so it was all new. By searching the intron sequence in all six reading frames against the public protein sequence database, I discovered that there was a protein encoded on the opposite strand and determined that its sequence corresponded to a previously unknown Pupal Cuticle Protein gene.

Now in the mid-1980s, this was the first example of a gene nested within an intron of another gene, something that later turned out to be quite common, not only in fruit flies, but also in us. The experience encouraged me to use six-frame translation to search DNA sequence databases for protein sequence homologies, which led to further unexpected discoveries and my interest over the long term in developing computational tools for genomics. This depended on researchers depositing their sequences into the GenBank DNA sequence database to make them publicly searchable by anyone around the world using the BLAST server. Previously, I had to purchase floppy discs of GenBank, load them on my personal computer, and do the six-frame translation searches using a standalone program.

But with the introduction of the BLAST server, anyone could do it from anywhere, and that's really open science to me. So, I got started way back when at the beginning of what we call genomics, which has really flourished. 
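For readers unfamiliar with the six-frame translation mentioned above, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the program Dr. Henikoff used: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is reported as X. A DNA sequence is read in three frames on the forward strand and three on the reverse complement, and each resulting protein string could then be searched against a protein database.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a DNA sequence in all six reading frames (hypothetical helper)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])  # forward strand
        frames[f"-{offset + 1}"] = translate(rc[offset:])   # opposite strand
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

A gene nested on the opposite strand of an intron, like the one described above, would show up as an open reading frame in one of the three minus frames.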

What open science practices do you use? 

Open access, sharing data, code, protocols, methods; that's the norm nowadays. For example, HHMI makes co-submission to bioRxiv and public release of data and code mandatory, and other funders are doing that as well. 

Publishing open access is recent relative to the early days of open science. I'd like to emphasise that online searchable databases really drove genomics beginning in 1990. Back then, once you sequenced the gene, the next step was to see if it matched anything known. You'd find the open reading frame in your sequence, then you'd search it against the protein database, if you could get access to it, of course. Often, nothing would show up because the database was built on amino acid sequencing of proteins, and that was very low throughput. Sanger DNA sequencing, which began in the late 1970s and was later automated, changed all that, and GenBank expanded rapidly with predicted protein sequences curated from public sequences. That was a major effort of the National Center for Biotechnology Information (NCBI), and they really deserve credit for promoting open science as we know it. It was well before publishers realised that open access could be a viable business model. NCBI has really been pushing this since the late 80s, early 90s, so I’d like to give them a little plug because it's sort of forgotten.

My own work grew from efforts to identify evolutionary relationships between proteins. With my wife Jorja, I developed computational tools such as the Blocks Database and the BLOSUM substitution matrices, now standard in tools like BLAST and FASTA. We also created public tools including SIFT, CODEHOP, and SEACR.

Beyond methods, I’ve been involved in open access publishing, co-founding Epigenetics & Chromatin in 2008 and later serving as field chief editor for Frontiers in Epigenetics and Epigenomics. 

What motivated you to adopt these practices? 


The examples I gave sort of illustrate this; you enable other people to use a method you develop, and it moves the scientific enterprise forward. For the journal editing side, it keeps me up to date with developments in the broader field beyond my own research interests and expands my scientific horizons, and that's helped my research, as well as my trainees and my staff. I enjoy learning interesting new science, as we all do, that otherwise would be unfamiliar. So, those are the other motivations. 

Have you ever faced any particular challenges or barriers in undertaking open science practices?  

The biggest barrier that I've run into in recent years is inertia. Getting people to use our methods, and to use them correctly, can be hard. The examples I'm thinking of here are CUT&RUN, CUT&Tag, and RT&Tag; these methods bear no relationship to what the standard has been in the field for chromatin profiling. As a result, the way you evaluate the data can be very different from what the field is accustomed to.

Lessons we've learned from interacting with users have been incorporated into a Primer article invited by the editors of Nature Reviews Methods Primers and recently published. It's intended to be an authoritative guide for novice users and data analysis providers for this rapidly proliferating class of methods. 

Is there a particular success story or example you’re proud of that illustrates the benefits of open science?  

In the year 2000, we published a paper in Nature Biotechnology describing a scalable method to screen for point mutations in populations of chemically mutagenised individuals, using something called Denaturing High Performance Liquid Chromatography. We call the method TILLING, which is also an acronym for Targeting Induced Local Lesions in Genomes.

We first applied it to plants, hence the TILLING acronym. We detected missense mutations using this method, and they've been valuable for all kinds of studies. Our first application was for the model plant Arabidopsis thaliana, where we discovered induced mutations in DNA methyltransferase genes that we were studying. It worked so well that, together with my University of Washington (UW) colleague Luca Comai, we went on to grow many mutagenised Arabidopsis plants in a big UW greenhouse, and we sent the seeds to the Arabidopsis Biological Resource Center (ABRC). Researchers interested in particular genes would search our Codons Optimised to Discover Deleterious Lesions website, established by Fred Hutch computational biologist Elizabeth Green, and if they found likely deleterious mutations in any of the plants, they would order the seeds from ABRC and do the backcrossing themselves.

In those days, farmers were reluctant to plant GMO crops, especially in Europe, but mutagenising seeds with chemicals, called mutation breeding, had long been routine in agriculture, so this created a demand for applying our TILLING technology to crop plants. With NSF support, we established programmes with collaborators for maize at Purdue, soybeans at Southern Illinois University, and rice, sponsored by the International Rice Research Institute in the Philippines. We also helped set up TILLING for nematodes and fruit flies, and we ran workshops at UW to teach TILLING to all comers. That was my favourite example.

How does open science enable collaboration, feedback, and real-world impact in both research and education? 

With protocols.io, for example, if somebody asks a question, it gets us to do a little digging, so that’s kind of like a collaboration. I actually have one in my inbox right now asking for troubleshooting advice on our CUT&Tag-based modification for archival clinical samples. The person who asked the question includes a suggestion for protocol modification that I really hadn't thought of, so it is quite useful for getting us to think. 

There’s also something called the Science Education Partnership at Fred Hutch to teach high school teachers how to engage students in modern science. Students would do nanopore sequencing on microbes that they'd find in seawater, and DNA sequencing in high school is nothing special, but we want to know about genes, right? We want to know about information flows from DNA to RNA to protein, the central dogma of molecular biology that's been around for decades and decades, and what we find is that students don't really understand that. 

Our methods are based on looking at the transcriptional apparatus called RNA polymerase II, and as a result, by just doing the method, they learn about real information flows like that. The paper that we published is called “Chromatin profiling for everyone: FFPE-CUTAC for the theory and practice of modern molecular biology”. We’re training them in something that's going to be the future, not just DNA sequencing.

Are there any gaps in support or resources in open science you’d like to see addressed?  

I’m going to soapbox a little bit. Most of the attention in cancer research has been for prevention, of course, and therapeutics. I hear a lot about that; a new drug for this, that, and the other thing. I think the problem there is that really doing something in the laboratory that is going to be useful in the clinic for cancer research is going to take millions, and maybe investment by Pharma, to support the clinical trials needed. This is a huge, huge cost, and it’s a gamble because most drugs actually fail. I just think that we as a scientific enterprise need to pay more attention to the diagnostics and not just do what is already set up because it’s a heavily funded industry.

What advice would you give to researchers considering adopting open science practices? 

Look at what I've gotten out of it. It's been very useful and it's my job. Funders such as my employer, HHMI, promote that. Not all funders do that, but if you can find a funder who will give you the freedom to take advantage of open science, that's great.

Learn more about open science and sharing research data, code and protocols & methods openly. 


Steven Henikoff, PhD, Professor, Fred Hutchinson Cancer Research Centre, Seattle, WA, USA 

Dr. Steven Henikoff is a molecular biologist who studies the structure, function and evolution of our DNA molecules, or chromosomes. He also develops tools for comparing gene sequences, determining the arrangement of genes in living cells and understanding the biological functions of genes. Credited with helping build the infrastructure for analyzing the human genome, Dr. Henikoff was among the first to realize that computing and the internet could revolutionize biological research.  
He is also a Howard Hughes Medical Institute (HHMI) investigator and a member of the National Academy of Sciences, and the recipient of the 55th Lewis S. Rosenstiel Award for Distinguished Work in Basic Medical Research. 

Steven Henikoff has shared his own protocols on TILLING and Ecotilling in plants and animals, Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm, Cell type–specific gene expression and chromatin profiling in Arabidopsis thaliana, Targeted in situ genome-wide profiling with high efficiency for low cell numbers, Chromatin profiling with CUT&Tag, and Single-cell profiling of chromatin modifications with sciCUT&Tag by publishing them in Nature Protocols, and has posted various other protocols in the protocols.io repository.


Related content:

  • Open science conversations: 
  1. Building trust through transparency: An open science conversation with Geir Kjetil Sandve 
  2. Open science, altruism and impact: An interview with clinical geneticist Zornitza Stark
  3. Bioinformatician Johannes Koester on embodying the spirit of open science 
  4. How to do science that matters: Computational biologist Casey S. Greene on purpose-driven open science practices 
  • Best practices for transparency and reuse:
  1. How to share your research protocols and methods openly
  2. How to share your research code openly
  • Supporting open science practices:
  1. Why share your research data?
  2. Why sharing protocols matters
  3. Why sharing your code matters

Author: Melanie Clyne

Chief Editor, Nature Protocols

Mel Clyne joined Nature Protocols in June 2014 and leads the journal's editorial team. Mel completed her PhD at the Genome Damage and Stability Centre, University of Sussex, UK, where she studied cisplatin resistance in mismatch-repair-defective cells. She continued this research with a brief stint as a postdoctoral scientist before making the transition from laboratory-based research to medical publishing in 2009. She worked—first as an Editorial Assistant, and then as a Clinical Writer—for Map of Medicine, drafting peer-reviewed treatment guidelines for the NHS, and helping local healthcare communities to evaluate the impact of service redesign. Mel joined Nature Publishing Group in September 2011 as an Editor for Nature Reviews Urology, before taking up her current position at Nature Protocols.