This is the fifth blog post in our series in which researchers share their experiences with open science practices, and reflect on the impact that sharing open data, code, and protocols can have.
Dr. Steven Henikoff is a molecular biologist and professor in the Basic Sciences Division, Fred Hutchinson Cancer Research Centre in Seattle, WA, USA. He is also a Howard Hughes Medical Institute (HHMI) investigator and a member of the National Academy of Sciences. Among the first scientists to realise that computing and the internet could revolutionise biological research, he is the recipient of the 55th Lewis S. Rosenstiel Award for Distinguished Work in Basic Medical Research.
His lab performs research in chromatin, nuclear dynamics, transcriptional regulation, chromosomes, centromeres and, most recently, cancer, developing experimental and computational tools for studying these processes. Having been actively involved in scientific research for over 50 years, Dr. Henikoff has been instrumental in shaping open science practices before ‘open science’ had a name.
We asked Dr. Henikoff to share his experiences of these evolving practices and how they move the scientific enterprise forward.
“I first heard the term open science in the early 2000s, but it was really something many of us were doing well before that. When you work on methods, they're only really successful if someone else uses them.”
Steven Henikoff, PhD, Fred Hutchinson Cancer Research Centre, Seattle, WA, USA
I first heard the term open science in the early 2000s, but it was really something many of us were doing well before that. When you work on methods, they're only really successful if someone else uses them, so the focus for my own lab for a long time has been on methods.
A few years after starting my lab at Fred Hutch, I came up with a method for making clones for DNA sequencing that I realised would be useful to many others; it was later dubbed the Erase-a-Base® System. In the early 1980s, my Erase-a-Base® method would allow someone to finish a project efficiently, whereas shotgun sequencing, which was the norm back then, always left holes that required wasteful redundancy or targeted methods to finish. This is where the open science of its day helped. My Erase-a-Base® method was commercialised as a kit. Kits for genomics were getting started back then, before anyone had ever heard about open science or online protocols, but the intention was similar. You make a protocol work in the hands of a novice user, that's the idea, and so that got me hooked on developing methods.
What I first used Erase-a-Base® for was to complete the sequence of a fruit fly gene I'd been working on, to understand its alternative RNA processing into two different protein isoforms, and I wondered why its first intron was so large. We’re talking about the early 1980s, and introns, or "genes in pieces", had only been discovered maybe several years before, so it was all new. By searching the intron sequence in all six reading frames against the public protein sequence database, I discovered that there was a protein encoded on the opposite strand and determined that its sequence corresponded to a previously unknown Pupal Cuticle Protein gene.
Now, in the mid-1980s, this was the first example of a gene nested within an intron of another gene, something that later turned out to be quite common, not only in fruit flies, but also in us. The experience encouraged me to use six-frame translation to search DNA sequence databases for protein sequence homologies, which led to further unexpected discoveries and my interest over the long term in developing computational tools for genomics. This depended on researchers depositing their sequences into the GenBank DNA sequence database to make them publicly searchable by anyone around the world using the BLAST server. Previously, I had to purchase floppy discs of GenBank, load them on my personal computer, and do the six-frame translation searches using a standalone program.
But with the introduction of the BLAST server, anyone could do it from anywhere, and that's really open science to me. So, I got started way back when at the beginning of what we call genomics, which has really flourished.
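For readers unfamiliar with the idea, six-frame translation can be sketched in a few lines of Python. This is an illustrative sketch only, not the standalone program or the BLAST code mentioned above: the function names and the example sequence are made up for illustration, and it assumes the standard genetic code. A DNA sequence is translated in three reading frames on the given strand and three on its reverse complement, which is how a protein encoded on the opposite strand can show up in a search.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each position (the layout used in NCBI's translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; '*' is a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a DNA sequence in all six reading frames.

    Keys '+1'..'+3' are frames on the given strand and '-1'..'-3'
    are frames on the reverse-complement strand.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical toy sequence, chosen only to keep the output short.
frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

In a real search, each of the six translated sequences would then be compared against a protein database; that comparison step is not shown here.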
Open access, sharing data, code, protocols, methods; that's the norm nowadays. For example, HHMI makes co-submission to bioRxiv and public release of data and code mandatory, and other funders are doing that as well.
Publishing open access is recent relative to the early days of open science. I'd like to emphasise that online searchable databases really drove genomics beginning in 1990. Back then, once you sequenced the gene, the next step was to see if it matched anything known. You'd find the open reading frame in your sequence, then you'd search it against the protein database, if you could get access to it, of course. Often, nothing would show up because the database was built on amino acid sequencing of proteins, and that was very low throughput. Sanger DNA sequencing, which began in the late 1970s and was later automated, changed all that, and GenBank expanded rapidly with predicted protein sequences curated from public sequences. That was a major effort of the National Center for Biotechnology Information (NCBI), and they really deserve credit for promoting open science as we know it. It was well before publishers realised that open access could be a viable business model. NCBI had really been pushing this since the late 80s, early 90s, so I’d like to give them a little plug because it's sort of forgotten.
My own work grew from efforts to identify evolutionary relationships between proteins. With my wife Jorja, I developed computational tools such as the Blocks Database and the BLOSUM substitution matrices, now standard in tools like BLAST and FASTA. We also created public tools including SIFT, CODEHOP, and SEACR.
Beyond methods, I’ve been involved in open access publishing, co-founding Epigenetics & Chromatin in 2008 and later serving as field chief editor for Frontiers in Epigenetics and Epigenomics.
“You enable other people to use a method you develop, and it moves the scientific enterprise forward.”
Steven Henikoff, PhD, Fred Hutchinson Cancer Research Centre, Seattle, WA, USA
The examples I gave sort of illustrate this; you enable other people to use a method you develop, and it moves the scientific enterprise forward. On the journal editing side, it keeps me up to date with developments in the broader field beyond my own research interests and expands my scientific horizons, and that's helped my research, as well as my trainees and my staff. I enjoy learning interesting new science, as we all do, that otherwise would be unfamiliar. So, those are the other motivations.
The biggest barrier that I've run into in recent years is inertia. To try to get people to use our methods and use them correctly, that can be hard. The examples I'm thinking of here are CUT&RUN, CUT&Tag, and RT&Tag; these methods bear no relationship to what the standard has been in the field for chromatin profiling. As a result, the way you evaluate the data can be very different from what the field is accustomed to.
Lessons we've learned from interacting with users have been incorporated into a Primer article invited by the editors of Nature Reviews Methods Primers and recently published. It's intended to be an authoritative guide for novice users and data analysis providers for this rapidly proliferating class of methods.
In the year 2000, we published a paper in Nature Biotechnology describing a scalable method to screen for point mutations in populations of chemically mutagenised individuals, for which we used something called Denaturing High Performance Liquid Chromatography. We called the method TILLING, which is also an acronym for Targeting Induced Local Lesions in Genomes.
We first applied it to plants, hence the TILLING acronym. We detected missense mutations using this method, and they've been valuable for all kinds of studies. Our first application was for the model plant Arabidopsis thaliana, where we discovered induced mutations in DNA methyltransferase genes that we were studying. It worked so well that, together with my University of Washington (UW) colleague Luca Comai, we went on to grow many mutagenised Arabidopsis plants in a big UW greenhouse, and we sent the seeds to the Arabidopsis Biological Resource Center (ABRC). Researchers interested in particular genes would search our Codons Optimised to Discover Deleterious Lesions website, established by Fred Hutch computational biologist Elizabeth Green, and if they found likely deleterious mutations in any of the plants, they would order the seeds from ABRC and do the backcrossing and the genetics.
In those days, farmers were reluctant to plant GMO crops, especially in Europe, but mutagenising seeds with chemicals, called mutation breeding, had long been routine in agriculture, so this created a demand for applying our TILLING technology to crop plants. With NSF support, we established programmes with collaborators for maize at Purdue, soybeans at Southern Illinois University, and for rice that was sponsored by the International Rice Research Institute in the Philippines. We also helped set up TILLING for nematodes and fruit flies, and we ran workshops at UW to teach TILLING to all comers. That was my favourite example.
With protocols.io, for example, if somebody asks a question, it gets us to do a little digging, so that’s kind of like a collaboration. I actually have one in my inbox right now asking for troubleshooting advice on our CUT&Tag-based modification for archival clinical samples. The person who asked the question includes a suggestion for protocol modification that I really hadn't thought of, so it is quite useful for getting us to think.
There’s also something called the Science Education Partnership at Fred Hutch to teach high school teachers how to engage students in modern science. Students would do nanopore sequencing on microbes that they'd find in seawater, and DNA sequencing in high school is nothing special, but we want to know about genes, right? We want to know about information flows from DNA to RNA to protein, the central dogma of molecular biology that's been around for decades and decades, and what we find is that students don't really understand that.
Our methods are based on looking at the transcriptional apparatus called RNA polymerase II, and as a result, by just doing the method, they learn about real information flows like that. The paper that we published is called Chromatin profiling for everyone: FFPE-CUTAC for the theory and practice of modern molecular biology. We’re training them in something that's going to be the future, not just DNA sequencing.
I’m going to soapbox a little bit. Most of the attention in cancer research has been on prevention, of course, and therapeutics. I hear a lot about that; a new drug for this, that, and the other thing. I think the problem there is that to really do something in the laboratory that is going to be useful in the clinic for cancer research is going to take millions, and maybe investment by Pharma, to support the clinical trials needed. This is a huge, huge cost, and it’s a gamble because most drugs actually fail. I just think that we as a scientific enterprise need to pay more attention to the diagnostics and not just do what is already set up because it’s a heavily funded industry.
Look at what I've gotten out of it. It's been very useful and it's my job. Funders such as my employer, HHMI, promote that. Not all funders do that, but if you can find a funder who will give you the freedom to take advantage of open science, that's great.
Steven Henikoff, PhD, Professor, Fred Hutchinson Cancer Research Centre, Seattle, WA, USA
Dr. Steven Henikoff is a molecular biologist who studies the structure, function and evolution of our DNA molecules, or chromosomes. He also develops tools for comparing gene sequences, determining the arrangement of genes in living cells and understanding the biological functions of genes. Credited with helping build the infrastructure for analysing the human genome, Dr. Henikoff was among the first to realise that computing and the internet could revolutionise biological research.
He is also a Howard Hughes Medical Institute (HHMI) investigator and a member of the National Academy of Sciences, and the recipient of the 55th Lewis S. Rosenstiel Award for Distinguished Work in Basic Medical Research.
Steven Henikoff has published his own protocols in Nature Protocols, including TILLING and Ecotilling in plants and animals; Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm; Cell type–specific gene expression and chromatin profiling in Arabidopsis thaliana; Targeted in situ genome-wide profiling with high efficiency for low cell numbers; Chromatin profiling with CUT&Tag; and Single-cell profiling of chromatin modifications with sciCUT&Tag. He has also posted various other protocols in the protocols.io repository.