Continuing our theme, “Who’s afraid of statistics?”, this week we are talking to statistician and author Daniela Witten about her research and her approach to writing the accessible and well-loved statistics textbook An Introduction to Statistical Learning.
Can you introduce yourself and give an overview of your current research?
I work as a Professor of Statistics and Biostatistics, and the Dorothy Gilford Endowed Chair in Mathematical Statistics, at the University of Washington. My research focuses on developing, understanding, and applying statistical machine learning methods for large-scale and messy datasets. As the pace and scale of data collection continue to increase across so many fields, there's a growing need for statistical methods to make sense of the data. The methods that I develop aim to fill in this gap. I am particularly interested in methods for the analysis of data from genomics and neuroscience: those fields have seen an explosion of data in recent years, and there is a need for new statistical methods to fill the gap between the data that scientists are collecting and the questions that they want to answer using that data.
You co-authored the incredibly popular statistics textbook “An Introduction to Statistical Learning”. Why do you think the textbook is so popular? And what philosophy guided you and your co-authors when writing it?
In the past few decades, the field of statistical machine learning has produced a critical toolkit for analyzing large-scale, messy, and complicated data sets. Today, a data analyst in virtually any field needs to have a working understanding of the main ideas in statistical machine learning, as well as an ability to apply these key methods to their data.
However, ten years ago, when we developed the idea for our textbook, there were no resources available for data analysts who did not have extensive graduate-level training in statistics or a closely related field. Existing textbooks assumed a high level of background knowledge, and focused on technical details rather than the key ideas needed to apply statistical machine learning methods in practice. We set out to fill this gap by writing a textbook that is accessible to a broad audience. Our textbook assumes just a previous course or two in statistics or probability, and in particular does not require knowledge of matrix algebra. We use simple language to distill complicated ideas down to their essence. Instead of just starting off with the fanciest and shiniest statistical learning methods, we build up from the basics so that readers can understand the building blocks of the more advanced methods. We also include, in each chapter, a computing lab written in the very popular open-source statistical software environment R, so that readers can learn how to apply these methods in practice.
Our textbook has been very successful: it has been cited more than 10,000 times according to Google Scholar, and has been an Amazon bestseller since it was published in 2013, with over 1,000 reviews averaging 4.7/5.0 stars.
The 1st edition was published in 2013. What new features can readers expect in the 2nd edition?
The 2nd edition contains three new chapters: on deep learning and neural networks; multiple testing; and survival analysis. It also includes new sections on a variety of topics, including Bayesian additive regression trees (BART), naive Bayes, and generalized linear models.
In recent years, statistics has gone from being a slightly scary subject to being a vital skill across all disciplines, largely due to big data and computational power. What are the most interesting developments you have seen in the use of statistics? And what new trends can we expect to see going forward?
I am so impressed by the increased statistical sophistication across so many fields. It used to be that only "experts" knew the basics of statistics and statistical machine learning, but now an increasing number of people, both in and out of academia, are becoming proficient in these areas. I get a huge kick out of seeing my textbook on the bookshelves of my scientific collaborators, and on the desks of software engineers and data scientists at tech companies. I truly believe that moving forward, a solid grasp of the key ideas in statistical machine learning --- as well as an ability to apply these ideas to data --- will be viewed as a core competency for any data analyst.
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, deep learning, survival analysis, multiple testing, and more.
This Second Edition features new chapters on deep learning, survival analysis, and multiple testing, as well as expanded treatments of naïve Bayes, generalized linear models, Bayesian additive regression trees, and matrix completion. R code has been updated throughout to ensure compatibility.