We need to retain research integrity in the AI era
There’s no question that artificial intelligence technologies pose a profound challenge to our ability to ensure the integrity of research, particularly in PhD study
The emergence of digital assistance tools – often grouped together under the umbrella of artificial intelligence (AI) technologies – like ChatGPT and Stable Diffusion has led to widespread debate in the global academic community about how can they be used constructively when they are problematic – and even challenge the meaning of the word ‘plagiarism’.
Much of the discussion on conduct, or rather misconduct, has concerned the challenges arising from students using these tools to complete their coursework, but there are also profound challenges for research integrity.
In particular, we need to consider research integrity in relation to these tools for PhDs or research Masters students – known at the University of Melbourne as graduate researchers – what they’re using these tools for and how they work.
The appearance of slick knowledgeability
The broad category of digital assistance tools (DATs) – which include AI – has been developing for decades.
Many of us routinely and uncontroversially use a DAT – they correct our spelling in a document, recommend alternative queries for a search engine or suggest possible words in a text message.
The DATs that have provoked recent debate are in some ways not so much new as bringing together and scaling up long-standing technologies.
But they truly are remarkable, providing a mechanism for interacting with a vast collection of human discourse that – at this early stage – seems to be as disruptive as the emergence of the web, search engines and the smartphone.
For years, students have been using similar tools, based on similar technology, as they write essays and summarise collections of papers. Other established applications include language translation, clarification, rewording and reformulation of written expression.
In some respects, these most recent tools are not a fresh challenge. But they are, without question, much richer than their predecessors.
Above all, the text and images they generate are new, and their controlled use of randomness to make choices means that the repetition of a prompt never leads to identical outputs. This confounds our current approaches to the detection of breaches of research integrity.
But at their core, these tools are no more than pattern-matching machines.
Humans consider what knowledge they want to impart, then form statements to describe that knowledge – revising, editing and correlating to ensure that the content is correct and consistent.
DATs, by contrast, are concerned only with generating sequences of words that are suggested by the words they have already seen or generated. They aren’t using a knowledge bank or database that captures meaning, facts or concepts – but only associations that they have inferred from observed text.
That’s why they confabulate or appear to invent – there is no external validation.
Even if all of the input was ‘true’ or ‘correct’ the output would still be fallible, because there is no awareness of meaning and no sense in which semantic coherence is part of the generation process.
And yet the output they produce seems so very human.
On the one hand, there’s a disturbing dissonance between their lack of reasoning and their lack of any use of factual information. On the other, they create an appearance of confident, slick knowledgeability.
The superficial plausibility – and the impression that you are interacting with a cognitive entity – is psychologically persuasive but deeply misleading.
Profound policy challenges
Academic publishers have been responding to the advent of DATs.
For example, Nature and Elsevier have released policy statements that restrict the use of AI-generated text in publications, prohibit the inclusion of AIs as authors and set out guidelines for acknowledging where the text has come from.
The University of Melbourne has likewise made a statement on the use of these tools in research writing, which is closely aligned with those of the publishers mentioned above.
It explains how the use of DATs can breach our policies on research integrity. Succinctly, we require that material that has been generated or substantially altered by a DAT to be acknowledged as such, and that AIs cannot be listed as authors.
The absence of robust or agreed tools for detecting the use of DATs in thesis writing (and other materials that PhDs produce) does not mean that we should weaken our stance on what constitutes ethical practice.
Some uses of DATs will be detected by other means – for example, obvious variations in style, text that is semantically incoherent or inconsistency in knowledge between writing and conversation – and one would hope that an engaged thesis supervisor would eventually realise that a candidate is engaged in deception.
However, there is no question that these technologies are a profound challenge to our ability to ensure that work has been undertaken appropriately.
Understanding and communication
The advent of these new technologies has led to a questioning of assumptions about how we teach, the purposes of teaching, and whether and when the use of DATs is genuinely problematic.
In my view, the use of DATs by PhD candidates is indeed problematic, for a range of reasons.
The obvious one is the same as coursework students: an examination of a thesis is intended in part to assess the candidate’s ability to understand and communicate, and that’s why we expect them to provide their own text.
If they provide text from some other source, that assessment is undermined.
Today, we cannot with certainty identify which text might have been generated by a DAT, a difficulty that’s likely to worsen as DATs become more sophisticated.
There are stylistic indicators but these are just indicators, not the ironclad evidence that is required to prove that a tool was used.
That said, the unreliability of DAT-generated text means it would be very risky to include more than small fragments in a thesis.
However, there are a range of other concerns.
PhD candidates can be misled by confabulated or absurd summaries of topics of interest. They can conceal an inability to communicate clearly, or a lack of knowledge of basics, not just in the thesis but also in emails, proposals and progress reports.
Some candidates already use DATs for translation to understand material in other languages – tidying up the translated output with another DAT creates further opportunities for the garbling of content.
There are also legal issues.
One is ownership. Unacknowledged use of DAT-generated text sits uncomfortably with current copyright law.
Another is the disclosure of intellectual property (IP). If a prompt entered into a DAT is about an innovation, then the retention of the prompt by the DAT will mean that the IP is lost to the author.
Some people have speculated that these AIs herald a future in which human communication skills are no longer required.
But that future is not yet with us and, in my view, remains remote. Until it arrives, we will continue to expect our PhD candidates to speak in their own voices and will be concerned when they cannot do so without assistance – digital or otherwise.
Banner: Generated by MidJourney in response to the prompt ‘robot typing at a desk in a university library’