There is no such thing as too esoteric for the Internet (and other reasons to publish your data)

The Source
By: Lucy Frisch, Tue Feb 12 2019

We’re recognizing Love Data Week (February 11-15) and this year’s theme is ‘data in everyday life.’ We’ve asked several researchers who participated in our Better Science Through Better Data event to reflect on the importance of data sharing in their own lives. We’ll be sharing their stories all week so keep checking back!

Written by Dr. James Avery

Every day I benefit from someone having taken the time to share something online. In my home life it could be a video recipe, a guide to replacing a float ball valve in the tank in the loft or how to take a buggy adaptor off a car seat (I was very close to losing my patience with this one!). The same is true for the technical aspects of my academic life: a decent software library, an open source hardware design or a guide for a chip which is better than the official documentation. Often these are not on monetised channels; they simply end with someone saying, “Anyway, I had trouble with this so I thought this might help somebody.” What a perfect distillation of the internet at its best.

It was always in the back of my mind that I was benefiting from others sharing but had no means to reciprocate. Whilst I hoped people might want to read my papers, my academic work always seemed too esoteric for anyone to be interested in seeing how it was made or reusing the data for themselves. This changed when I realised there is no such thing as too esoteric for the internet! I had always thought that if someone were interested in using anything I had done, then they would email me and I could share it somehow. Whilst I may have seen the light now, this attitude clearly still pervades academia.

A few years ago, I was involved in collecting a dataset in a hospital and I saw this as an opportunity to finally share something interesting with the wider research community. We were developing a potential method to differentiate ischaemic and haemorrhagic stroke without neuroimaging (CT/MRI) using a technique called Multi-Frequency Electrical Impedance Tomography (MF-EIT). Whilst the hardware for MF-EIT is portable, the image reconstructions are particularly sensitive to errors, and so far, none have translated successfully from simulations or phantoms to human studies. There is a plethora of interesting research into improved MF-EIT imaging methods, but most EIT researchers are in departments which do not have a simple pathway to collect patient data. Our connection to a specialised unit like the UCLH Hyper Acute Stroke Unit is an even rarer commodity. Unfortunately, as a result, many talented researchers are unable to test the robustness of their algorithms against anything representing realistic clinical data. Beyond the EIT clique, how could we expect to gather interest from machine learning researchers if there was no data for them to look at?

Figure 2, Scientific Data 5, Article number: 180112 (2018).

To try and address this problem, we published a rich neuroscientific dataset (Sci Data 5, 180112) collected from 23 patients at the UCLH HASU, including MRI/CT, EEG and MF-EIT data. All the hardware, firmware and software were also released under an open source licence. I tried to lower the barrier as much as possible for anyone interested in not just MF-EIT but stroke type differentiation in general. The data has so far been downloaded hundreds of times and has sparked some new collaborations to analyse it further. I feel these would not have been possible without an open attitude to data and methods.

Another benefit I found to sharing the data and code for this project came from the expectation that one day, someone might actually read this! This reminded me of rubber duck debugging, where you can discover the bug in your code by forcing yourself to explain it, line by line, to a duck. Imagining myself explaining the data and how to use it step by step to a fictitious researcher gave the documentation the necessary detail. Not only does it make the problem clearer in your mind, but the chances of someone using your data also increase massively when they see extensive documentation for it.

In preparing a talk on this dataset and data sharing I found a perfect example of the benefits of open data. I needed a high-resolution image of a British ambulance, which I found on a blog from nearly 10 years ago, with a caption like “Here is a high-resolution image of a British ambulance, in case it’s useful to somebody.” Eventually it was!

Watch James’ lightning talk at #SciData18 here.

Springer Nature is committed to supporting researchers in sharing research data and in receiving the credit you deserve.
Read more about our research data products and services.


About James Avery

James completed his PhD in Biomedical Engineering at University College London in 2015 and continued his work as an EPSRC Doctoral Research Fellow. There he developed electrical impedance tomography methods for brain imaging as part of Prof. David Holder’s Neurophysiology lab. Clinical studies during this time brought into sharp focus the benefits that good, open and reproducible engineering can have for patients and strengthened his desire to translate his work into clinical practice. Since 2018 he has worked as a postdoctoral researcher at the NIHR Imperial Biomedical Research Centre, seeking to develop new sensor technologies for surgery. Read more about James here.

Lucy Frisch

Lucy Frisch is a Senior Marketing Manager leading the Content Marketing Programmes team, based in the New York office. She has a passion for storytelling and works to humanize the research published across Springer Nature with a focus on the researcher experience.
