Why Big Data is a Big Deal

Scientists can now manipulate data to see patterns and solutions that otherwise were undiscoverable

Everything we do leaves a digital trace. Websites capture our likes, purchases and searches. Sensor technologies like GPS, face recognition and wearable devices capture our location and activities. Organisational databases document our lives from blood tests to parking fines.

This has all been made possible by the exponential increase in computing power and storage. The humble floppy disk stored kilobytes of data, hard drives then grew from handling megabytes to gigabytes, and servers began dealing with terabytes. By 2008, when the petabyte (1,000,000,000,000,000 bytes) had emerged, Big Data and Big Analytics had truly arrived.

The world has moved from the humble floppy disk, which stored kilobytes, to super computers that crunch petabytes. Picture: Pixabay

The masses of data we can manipulate have created new opportunities for scientists to see patterns and solutions that were otherwise undiscoverable. Looking at Big Data is akin to looking at an abstract dot painting that reveals patterns you hadn’t at first noticed, suggesting new hypotheses to model and test.

For example, it gives scientists the power to detect undiscovered correlations in protein interactions, genome sequences and molecular arrangements. This is why science, especially the biological sciences, is undergoing another renaissance.

1. Big Data can be used for the common good, not just consumer goods

The commercial sector was quick to exploit massive data sets, analysing them for predictive trends to sharpen its competitive edge.

The availability of data for commercial use has raised legitimate concerns about privacy, banking security and hacking that require ongoing vigilance and debate.

But Big Data is now also used for planning social infrastructure, healthcare, utilities, water and traffic, for monitoring and acting on serious disease outbreaks, and for disaster management and logistics.

It is also changing the way we do research. It may just provide new global solutions to a host of problems ranging from healthcare to future environmental resilience.

Data visualisation of urban pollen distribution. Picture: Intel Free Press/Flickr

2. Big Data is allowing science to see patterns we didn’t know to look for

University of Melbourne Associate Professor Vicky Schneider, the new deputy director of the EMBL Australia Bioinformatics Resource (EMBL-ABR) hosted at the Victorian Life Sciences Computation Initiative (VLSCI), is part of this Big Data revolution in biological and life science research.

“There has been tremendous excitement about the tammar wallaby, dunnart and thylacine genomes because marsupials have a unique adaptation to develop outside the sterile confines of a uterus into a harsh pathogen-rich environment,” says Associate Professor Schneider.

“Biomedical scientists working in the field of antibiotic resistance see this data as having the potential to discover new drugs to manage growing antibiotic resistance in our community. Clues to old problems can be found in new unexpected places.”

Associate Professor Andrew Lonie, Director of the VLSCI and EMBL-ABR, agrees: “Publishing a new genome is like sending out a party invitation to your research colleagues all over the world – it’s just that a lot of the action takes place online.

Big data and the tammar wallaby have come together in genomic research. Picture: Arthur Chapman/Flickr

“Somewhere, someone will be interested in seeing your data and comparing it with theirs. The more unique the species whose genome has been sequenced, the more interest it creates.”

When US President Obama announced his commitment to finding new ways to treat cancer, bioinformaticians nodded in agreement. They appreciate how overlaying the genomes of people, their cancers and historical treatment records from hospitals has the potential to produce new insights into cancer and its treatment.

In 2015, VLSCI collaborators at the Peter MacCallum Cancer Centre revealed at least four key mechanisms by which initially vulnerable ovarian cancers can undergo genetic changes to become resistant to common chemotherapy – a finding that is now refining the way treatments are given to people with those particular ‘sub-types’ of cancer. Such insights can save both lives and dollars.

“Physicists, geologists and astronomers have been using data-driven technologies for a long time. The biological sciences were a bit slower in harnessing this approach,” says Associate Professor Schneider. “Data-driven science has the potential to see new solutions to health and disease management, environmental resilience and food security.”

3. Open Big Data is now taking science to a new level

Associate Professor Schneider can remember when she first realised how critical data analysis and storage have become to the sciences.

“I remember biologists often used off-the-shelf software to log and process their data. When I was in an ecology lab, I decided to take a closer look at the assumptions built into the common software our research team was using,” Associate Professor Schneider says.

“It turned out the software was not fit for our purposes. This was probably true for many labs, not just ours. It was a turning point for me and was just one example of what was occurring worldwide – better data management and bioinformatics are now critical to the translation and utility of biological research.

Section of a Circos plot, created using software for visualising complex data sets. Picture: University of Melbourne

“Data must have a level of integrity to be useful in large volumes. A new breed of biological computing experts – bioinformaticians – combine an understanding of biology with complex data analytical skills, figuring out how to use and combine different types of data like raw data and/or processed data. And it’s not just the data but the metadata that matters too.

“For these massive data sets to become useful knowledge and enable new approaches they must be globally available to all scientists. In other words, research results must become open data. Many funders are now insisting on data management and open access policies to data as part of the project proposals.”

Collaboration and cooperation

This open science movement towards collaboration, cooperation, openness and sharing regards traditional papers and research ideas as only partial solutions to problems in health, society and the environment. It is challenging some less agile aspects of current scientific practice.

Associate Professor Schneider’s work at EMBL-ABR involves strengthening ties to international open data initiatives and using them to facilitate both data sharing and skills development among our research communities.

Applying genomics and bioinformatics to advance plant, animal and microbial research, and so promote a sustainable bio-economy, is now critical.

As data analysis is refined and shared, so is the science, and that’s an exciting future for our health and the health of the environment.

VLSCI is funded by the Victorian Government and contributing institutions and is hosted by the University of Melbourne. This petascale facility delivers expertise and systems for life science computing. The EMBL Australia Bioinformatics Resource is hosted at VLSCI through a funding agreement between the University of Melbourne and Bioplatforms Australia.

Banner image: A tunnel of books. Picture: Pixabay