Lab Talk


Crowdsourcing, Citizen Science, and Data-sharing

The future of human neuroscience lies in crowdsourcing, citizen science, and data sharing, but the path is not without its minefields.

A recent Scientific American article by Daniel Goodwin, “Why Neuroscience Needs Hackers,” makes the case that neuroscience, like many fields today, is drowning in data and begging for the application of advances in computer science such as machine learning. Neuroscientists are able to gather reams of neural data, but often lack the big data mechanisms and frameworks to synthesize them.

The SA article describes the work of Sebastian Seung, a Princeton neuroscientist, who recently mapped neural connections of the retina from an “overwhelming mass” of electron microscopy data using state-of-the-art A.I. and massive crowdsourcing. Seung incorporated the A.I. into a game called “Eyewire,” in which thousands of volunteers scored points while improving the neural map. Although the article’s title emphasizes advanced A.I., Dr. Seung’s experiment points even more to crowdsourcing and open science, avenues for improving research that have suddenly become easy and powerful with today’s internet. Eyewire perhaps epitomizes successful crowdsourcing: it uses an application that gathers, represents, and analyzes data uniformly according to researchers’ needs.

Crowdsourcing is seductive in its potential but risky for those who aren’t sure how to control it to get what they want. For researchers who don’t want to become hackers themselves, turning the diverse data produced by a crowd into conclusive results might seem like too much of a headache to be worthwhile. This is probably why the SA article’s title says we need hackers. The crowd is there, but using it depends on innovative software engineering. Many researchers could really use software designed to flexibly support diverse kinds of crowdsourcing, along with big data tools and some A.I. to enable things like crowd validation.

The Potential

The SA article also points to Open BCI (brain-computer interface), mentioned here in other posts, as an example of how traditional divisions between institutional and amateur (or “citizen”) science are now crumbling. Open BCI is a community of professional and citizen scientists doing principled research with cheap, portable EEG headsets that produce research-quality data. In communities of “neuro-hackers” like NeurotechX, professional researchers, entrepreneurs, and citizen scientists are coming together to develop all kinds of applications, such as “telepathic” machine control, prostheses, and art. Other companies, like Neurosky, sell EEG headsets and biosensors for bio- and neurofeedback training and health monitoring at consumer-affordable prices. (Read more in Citizen Science and EEG)

Tan Le, whose company, Emotiv Lifesciences, also produces portable EEG headsets, says in an article in National Geographic that neuroscience needs “as much data as possible on as many brains as possible” to advance the diagnosis of conditions such as epilepsy and Alzheimer’s. Human neuroscience studies have typically consisted of 20 to 50 participants, an incredibly small sample of a humanity 7 billion strong. It is difficult for a single lab to collect larger datasets, yet with such diverse populations across the planet, real understanding may require data not from thousands of brains but from millions. With cheap mobile EEG headsets, open-source software, and online collaboration, the potential for anyone to participate in such data collection is immense, and the potential for crowdsourcing is unprecedented. There are, however, significant hurdles to overcome.

The Minefields

The main obstacles to data sharing and open science are poorly annotated data, gaps in metadata, and variation in data quality and formats, all of which threaten the clarity and reliability of results. Devices record data with different sampling and signal-filtering characteristics. Data formats include some information about a recording but leave out other details. Researchers collect different information on subjects and store their data in all manner of formats. Some researchers perform their recordings with exquisite attention to detail, while others are sloppy in execution. There are plenty of minefields. Anyone who has tried to analyze datasets from another research lab can testify to the great deal of back and forth that must take place before the data can be clearly understood and interpreted. However, sharing data is essential if the community is to combine forces to produce deeper insights on a larger scale and to validate results.

This requires going beyond simple data repositories to platforms with expert standards for experimental protocols and data formats, as well as easy tools for data discovery, extraction, and use. In addition, following the example of Sebastian Seung’s Eyewire, platforms will also need innovative ways to harness the diverse backgrounds and abilities of hacker communities whose members may have limited background in neuroscience. With these in place, the possibilities are tremendous.
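
To make the metadata problem concrete, here is a minimal Python sketch of the kind of check a sharing platform might run on incoming recordings. It is only an illustration: the field names (`device`, `sampling_rate_hz`, `filter_band_hz`, `channel_names`) and the record layout are assumptions made for this example, not part of any existing standard or tool.

```python
# Hypothetical minimal set of fields a shared EEG recording should carry.
REQUIRED_FIELDS = ("device", "sampling_rate_hz", "filter_band_hz", "channel_names")


def is_blank(value):
    """Treat None and empty strings, lists, or tuples as missing."""
    return value is None or (isinstance(value, (str, list, tuple)) and len(value) == 0)


def missing_metadata(record):
    """Return the required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if is_blank(record.get(name))]


def audit(records):
    """Map each incomplete record's id to the list of fields it lacks."""
    report = {}
    for record in records:
        gaps = missing_metadata(record)
        if gaps:
            report[record.get("id", "<no id>")] = gaps
    return report


def sampling_rates(records):
    """Collect the distinct sampling rates present, to flag mixed datasets."""
    return sorted(
        {r["sampling_rate_hz"] for r in records if not is_blank(r.get("sampling_rate_hz"))}
    )
```

Running `audit` over a batch of submissions would list which recordings need follow-up with the contributing lab, and more than one value from `sampling_rates` would signal that the data need resampling before they can be pooled.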
