The credibility of research needs you

In what is the biggest effort yet to try to solve science’s replication crisis, researchers need thousands of volunteer experts

Critical to the credibility of any research is being able to replicate its findings. That is, other researchers need to be able to get the same results based on the data and methodologies used in the original research.

It’s a check on the science.

But the problem is, over the last decade researchers have found that in disciplines like social science and bioscience, significant numbers of published research findings can’t be replicated.

Researchers are too often finding that published research findings can’t be replicated based on the same data and methodology. Picture: mika-baumeister/Unsplash

In 2016, over 70 per cent of scientists surveyed by the scientific journal Nature reported being unable to reproduce another scientist’s research, and over half said they believed the problem was now a “crisis.”

For the end users of research, whether they be government, industry or humble PhD students, estimates that a high proportion of the world’s published research may contain critical errors in its methodology or analyses are a major issue.

“People need to have confidence in published research findings if the important research work being done in the social sciences is going to be picked up and make a difference,” says Associate Professor Fiona Fidler, a reproducibility expert at the University of Melbourne.

“Research that can’t be replicated doesn’t mean the research is necessarily wrong, but it does mean the findings need to be queried and perhaps tested in a different way.

“The problem is, we simply can’t afford to be testing and replicating every piece of research before it’s published – it would be too costly and time-consuming.”

For example, an effort to replicate 15 cancer biology studies took an average of seven months and US$27,000 per study.

By crowdsourcing judgements from experts, from professors to students, researchers hope to discover the most reliable reasoning methods for predicting whether a piece of research is likely to replicate. Picture: Getty Images

But what if artificial intelligence could be trained to assess the likely replicability of a piece of research at just the touch of a button?

In what is the largest and most ambitious response to the replication crisis yet, the US government has recruited research teams to help develop a computer program that could quickly and reliably assign confidence scores on research claims in the social and behavioural sciences. And key to the project will be harnessing the collective wisdom of thousands of expert research assessors for the computer to learn from.

Associate Professor Fidler is leading one of two competing research teams that the US government’s Defense Advanced Research Projects Agency has tasked with crowdsourcing expert judgements. Her University of Melbourne-based repliCATS (Collaborative Assessment for Trustworthy Science) team has been awarded up to US$6.5 million in funding to recruit experts and have them work together to assess how reliable thousands of published research claims are likely to be.

These expert assessments will then feed into a dataset that, if large enough, can be used to train and test the final computer algorithm. DARPA has undertaken to make the algorithm transparent so that it can be readily scrutinised by researchers.

The effort will build on the findings of previous smaller-scale research into crowdsourcing expert assessment.

In the repliCATS project, experts will work in small groups where they can benefit from each other’s reasoned assessments. Picture: Getty Images

Published research of course is already subjected to expert assessment through a peer review process, and in part a peer reviewer is making a judgement about the likely replicability of a research claim. But peer review based on the opinions of just a couple of individuals clearly isn’t a good enough guide as to how replicable research is.

The challenge, then, is to tap into the diverse expertise of a sufficiently large number of experts to identify the most reliable reasoning methods for predicting how replicable a piece of research is likely to be.

“For our project, we’re going to need thousands of experts across different fields, from undergraduate students to professors, to collaborate with us in assessing up to 3000 published research claims. The key question we want them to answer is whether the main finding will successfully replicate,” says repliCATS co-ordinator and research fellow in reproducibility and reasoning at the School of BioSciences, Dr Hannah Fraser.

“We are going to ask groups of five or six experts each to assess together the replicability of each research claim using the benefit of each other’s expertise.”

Each group will follow a discussion and behaviour protocol developed at the University of Melbourne called IDEA (Investigate, Discuss, Estimate, and Aggregate) that facilitates reasoned analysis. It does this by having discussions follow formal steps aimed at overcoming problems like bias and groupthink.

IDEA was designed to help make decisions in situations of uncertainty, when all the information on an issue isn’t available. Specifically, it was designed to support expert decision-making during fast-moving biosecurity crises.

Each expert group will use a custom online platform that runs the IDEA protocol, whether in face-to-face workshops or online as virtual communities.

Volunteer experts will also be able to work in “virtual” groups using a dedicated online platform. Picture: Pexels

Under the protocol each expert will make an independent, initial estimate on the likely replicability of a piece of research. They then have the benefit of looking at and discussing each other’s estimates, including their underlying reasoning, after which each expert has the opportunity to revise their original estimate. Their final estimates are then mathematically aggregated into a single score or rating of how replicable a piece of research is.
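To make that final aggregation step concrete, here is a minimal sketch in Python of how a group’s revised estimates might be combined into a single score. The article does not specify the mathematical aggregation method repliCATS uses, so the simple unweighted average below, along with the expert labels and numbers, is purely illustrative.

```python
# Illustrative sketch only: combining a group's final IDEA-round estimates.
# The unweighted mean stands in for whatever aggregation method the
# repliCATS team actually uses; the experts and figures are hypothetical.

from statistics import mean


def aggregate_replicability(final_estimates):
    """Combine each expert's revised probability (0-1) that a claim will
    replicate into a single group score."""
    if not final_estimates:
        raise ValueError("Need at least one estimate to aggregate")
    return mean(final_estimates.values())


# One hypothetical group of five experts, after the Discuss step has let
# them see each other's reasoning and revise their initial estimates.
revised_estimates = {
    "expert_1": 0.70,
    "expert_2": 0.55,
    "expert_3": 0.65,
    "expert_4": 0.60,
    "expert_5": 0.75,
}

group_score = aggregate_replicability(revised_estimates)
print(f"Group replicability score: {group_score:.2f}")  # 0.65
```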

DARPA has funded a third group – the US-based Center for Open Science – to replicate over a hundred published studies, providing the benchmark repliCATS will need to test how accurate its volunteer experts are. The team will then be able to identify which assessors are most accurate, and why.

“We expect our crowd will do better at identifying potential failures to replicate, compared to traditional peer review, because of the structured deliberation protocol and elicitation methods we are employing,” says Associate Professor Fidler.

“I’m also personally interested in the question of whether and how structured deliberation like this might be adapted to support a more efficient peer review process.”

Associate Professor Fidler says that if successful, the project could transform the way in which we understand the credibility of research, and influence how research is practiced and assessed for publication.

“Ultimately it could help make published research more reliable and therefore more useful. But our effort is entirely dependent on having enough dedicated experts volunteering their time and expertise in the name of science.”

Usually it is the experts who need public volunteers to participate in surveys and experiments. But this time it is the other way around. Public confidence is demanding the experts step up. Science needs you.

Information on the repliCATS project is available on the project website.

Banner image: Getty Images