‘Generative AI is not thinking and knows nothing – it must be approached with caution’
Should we use artificial intelligence to assist both submission and panel assessments in REF 2029? What a question. The suggestion that we might, recently raised by a survey from the University of Bristol and Jisc, has caused quite the kerfuffle.
What has been exposed in the fuss is the extent to which academic perspectives on AI and its uses diverge, and how little contact or understanding there appears to be between those varying perspectives.
Of course, generative AI has been causing a stir since the first versions of what is now ChatGPT were launched into the world in late 2022. The large language models (LLMs) that lie behind the interfaces were launched amid a rapture of prophesying; this was a technology that was either going to propel a great leap forward for humanity or herald a new age of enslavement, with the machines finally pushing their former masters aside. Businesses were told to embrace the future or risk extinction, while even some of the researchers involved in creating the tools warned gravely of risks to the survival of civilisation as we know it. Tech companies themselves made ever grander claims for the transformational power of their products, wooing politicians desperate to bask in a warming blast of technological white heat, scooping up huge quantities of investors’ money, and carving out unique exemptions from such trifling matters as copyright law and government regulation. Meanwhile, larger and larger data centres spring up, and energy demands skyrocket as environmental concerns are lost in the rush.
Inevitably, GenAI has already had a big impact on university life, but the responses from different parts of academia have varied significantly. One of the most immediate reactions was a belief that all the safeguards against cheating in assessments put in place over the years, including plagiarism-detection software, were now basically useless: students would be able to type their essay questions directly into an AI interface and get back a fully formed assignment within seconds. Some initial studies suggested ChatGPT could easily ace all manner of assessments, provoking calls to abandon the reliance on essays established over previous decades and return to in-person, exam-based assessment.
Three years down the line, such fears look overblown – those headline-grabbing studies were not as robust as they first appeared, and frontline experience suggests that AI-written essays are bland, banal, repetitive and unoriginal, not as a result of current limitations of the technology but precisely because of its probabilistic workings. Hallucinations and absurdities abound because GenAI is not thinking and knows nothing, making it a resource to be approached with caution and care – a lesson that seems to have been learned by many students even as it continues to evade, say, some lawyers.
At the same time, there is a growing, but perhaps still furtive, sense that some GenAI tools and capabilities might be useful for research and teaching, even in the humanities. While many scholars, especially in such disciplines, wish to hold the line against any academic accommodation of AI – particularly because training LLMs has involved theft of copyright works on an epic scale and carries a potentially mind-boggling environmental price – there is every likelihood that the stable door is banging in the wind and its GPT Image 1-generated occupant has long since departed.
The important thing will be to remain alert to the ethical dangers and sceptical of potential uses, while acknowledging that the tech is going to be hard to avoid. Academic publishers are licensing their titles en masse to AI companies, and the capabilities are now embedded in much of the software academics and students use on a daily basis. A flat refusal to countenance AI, or a blanket prohibition on doing so, will not help lecturers or students work out how to engage with it safely.

This is all the more important because there are some areas of the university, and some disciplines, which have been less concerned about the advent of the AI age. Perhaps because the humanities, in particular, are focused on writing as a mode of thinking, and on intellectual originality as a core virtue and axis of assessment, machine writing appears an obvious threat. In areas with legitimately different disciplinary commitments, it can look much more like a potentially useful aid.
Such assumptions appear to underpin proposals for the deployment of AI tech not just in teaching or research but also in some of the key processes that govern university life. Hence the Bristol survey’s question of whether GenAI might be used not only in the writing of the narrative components of REF submissions, but also in the processes of panel assessment that will determine scores and results.
Nobody likes the REF – but it is the Procrustean bed we must apparently lie in if we are to have the QR funding on which so much research activity depends. Precisely because of the REF’s obvious limitations, reviewers and panellists try to bring both care and rigour to the impossible job of grading outputs, a process that leans heavily on the exercise of ineliminably human judgement.
To outsource any of that to AI would be to place far too great a trust in a technology that is not, in fact, intelligent, and whose workings remain in some ways opaque even to its creators. Here, at least, our way forward should be clear.
James Loxley is a professor of early modern literature at the University of Edinburgh.
‘A REF AI platform would free scholars for true intellectual appraisal’
The UK’s REF has become a monumental undertaking, requiring expert review of nearly 190,000 research outputs in the 2021 cycle at an estimated sector-wide cost running into the hundreds of millions of pounds. Unsurprisingly, the advent of generative AI has sparked considerable interest as institutions seek ways to reduce this burden.
Against this backdrop, a recent survey by the University of Bristol and Jisc asked, “Should generative AI tools be used for the REF?” We argue here that the question is already obsolete; AI use is widespread and accelerating. The real challenge is to ensure that GenAI is applied responsibly so that it strengthens the REF process and serves the research community.
Today GenAI permeates many aspects of research. Commercial language models and AI tools routinely summarise manuscripts, identify citations, draft narratives and develop research arguments. Many academics and students already use these tools. Attempting to ban them simply drives their use underground, suppressing open discussion and stifling the sharing of best practice.
Yet rejecting a ban is not the same as endorsing uncontrolled adoption. Careless use of AI poses significant risks, as demonstrated when US attorneys were sanctioned for filing a brief containing six non-existent ChatGPT-generated precedents. Without a common framework, researchers, departments and institutions will choose their own model, prompts, level of human review and disclosure. The result is a patchwork of practices, enabling well-resourced universities to establish responsible and technically advanced methods while less-funded peers fall behind, magnifying institutional inequities.
For REF 2029, the real choice facing the academic community is therefore this: rally behind the adoption of a commonly agreed platform, or accept the unchecked proliferation of ad hoc, opaque tools. The only credible response is to seize the initiative through strategic development of a custom AI system underpinned by the REF’s stated values: inclusion, equity and transparency. This way we can ensure that AI is used responsibly, with proper oversight and rigorous evaluation, and designed to complement human intelligence rather than replace it.
A purpose-built REF AI system would restrict algorithms to the tasks they perform reliably – ie, data aggregation, pattern detection and process assurance – while preserving core qualitative judgements for human experts. With repetitive, data-heavy tasks delegated to algorithms, REF panellists could focus on the nuanced assessments of originality, significance and rigour that define research excellence. Research excellence is multidimensional and contextual, and is best judged by experts who retain control over substantive evaluations while AI handles the clerical heavy lifting.

Unlike previous attempts to use AI to assign evaluation scores directly, the REF AI platform we propose is more technologically advanced, built on the most recent AI developments. It would deploy a sophisticated ecosystem of specialised AI agents with authenticated access to paywalled journals, citation databases and scholarly archives. These purpose-built agents would work collaboratively, each handling specific academic evaluation tasks while operating within a secure, auditable framework designed exclusively for research assessment. Everything would be constantly evaluated, and all aspects of the system, including its system prompts, would be exposed to users.
The integration of AI into the REF must adhere to strong governance principles, necessitating independent oversight by ethics and responsibility experts. The system would provide comprehensive uncertainty metrics and hallucination detection, with each automated operation recorded, bias-tested and fully traceable. A sandbox environment with a public API would permit external audits and let institutions test submissions ahead of time, removing the incentive to develop unverified in-house tools and reducing sector costs.
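To make this concrete, here is a minimal sketch, in Python, of what one audit-logged “clerical” step on such a platform might look like. It is illustrative only: every name in it (AuditRecord, run_clerical_task, the uncertainty threshold) is hypothetical, the model call is a stub, and it describes no actual REF system.

```python
# A minimal sketch of an audit-logged clerical step for a hypothetical
# REF AI platform. All names are invented for illustration; no real
# REF system, model or API is implied.
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    task: str           # e.g. "summarise" or "citation-check"
    model_id: str       # which model/version produced the result
    input_hash: str     # hash of the input, so every run is traceable
    output: str
    uncertainty: float  # 0.0 (confident) to 1.0 (very unsure)
    timestamp: str


def run_clerical_task(task, model_id, text, model_fn, audit_log):
    """Run one delegated task, record it, and flag uncertain results.

    `model_fn` stands in for whatever model the platform adopts; it is
    assumed to return an (output, uncertainty) pair.
    """
    output, uncertainty = model_fn(text)
    audit_log.append(AuditRecord(
        task=task,
        model_id=model_id,
        input_hash=hashlib.sha256(text.encode()).hexdigest(),
        output=output,
        uncertainty=uncertainty,
        timestamp=datetime.now(timezone.utc).isoformat(),
    ))
    # Uncertain results are flagged, not acted on: the substantive
    # judgement stays with the human panellist.
    if uncertainty > 0.5:
        return f"[FLAGGED FOR HUMAN REVIEW] {output}"
    return output


if __name__ == "__main__":
    log = []
    stub_model = lambda text: (f"summary of {len(text)} characters", 0.2)
    print(run_clerical_task("summarise", "demo-model-v1",
                            "the full text of an output…", stub_model, log))
    # The log can be serialised for the external audits described above.
    print(json.dumps([asdict(r) for r in log], indent=2))
```

The design choice doing the work here is the final branch: everything is recorded, and anything uncertain is deferred to a person rather than decided by the machine.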
Does this sound like wishful thinking or science fiction? We, the authors, have already built such regulated AI platforms for pharmaceutical research and regulatory dossiers, and they satisfied both auditors and scientists. The engineering components could be in place within months. The much greater challenge is the consultation that defines the detailed scope, risk tolerance, audit trails, access rights and performance metrics. Early engagement with users, independent regulators and data-governance teams would anchor the platform in policy and help it withstand scrutiny. If any AI platform is to have relevance for REF 2029, the time to act is now.
A sector-wide, audited AI platform would uphold REF 2029’s values, cut administrative costs and free scholars for true intellectual appraisal. The technology is ready; what remains is collective resolve and disciplined governance. As Christian Lous Lange warned in 1921: “Technology is a useful servant but a dangerous master.”
Caroline Clewley is an AI futurist at Imperial College London, advising on the integration of generative AI into education, and leads Imperial’s flagship I-Explore programme. Lee Clewley is vice-president of AI at eTherapeutics, a drug discovery company, and was formerly head of applied AI at GlaxoSmithKline and a postdoctoral researcher at the University of Oxford.
‘GenAI’s “black-box” nature stands in opposition to the transparency essential to legitimate academic evaluation’
As the UK higher education community looks ahead to REF 2029, discussion has perhaps inevitably turned to the possibility of incorporating generative AI tools into the process. While some commentators argue that using GenAI for the REF is a “no-brainer” that could help to reduce the substantial financial and labour burdens that submission currently imposes on the sector, this argument does not adequately address the significant concerns such use of AI raises.
The previous REF exercise cost universities an average of £3 million each on preparations. The investment in staff time was equally extensive – reviewing nearly 190,000 outputs demanded countless hours from academics and professional services staff.
With such intensive resource demands, it’s understandable that institutions preparing for REF 2029 might consider whether GenAI could streamline this process to reduce costs and staff time. Given the vast quantity of research on which such AI tools have already been trained, the need for a standardised, criteria-driven reviewing process applied consistently and free from subjectivity, not to mention the efficiency of AI compared with a human reviewer, it may indeed seem like a “no-brainer”.
However, the apparent benefits GenAI may bring to institutions in terms of speed, efficiency and cost reduction are overshadowed by potential harm to those already marginalised within institutions and the risk of entrenching existing biases.

The use of GenAI also raises profound questions about authenticity and the value of the human dimension of research evaluation. What happens to considerations of lived experience, positionality and self-awareness when AI becomes the evaluator? How do we account for the nuanced understanding of unconscious bias in both the conduct and review of research?
For GenAI to work effectively in the REF submission processes, several critical conditions would need to be met:
- Universities would need to develop purpose-built LLMs rather than relying on commercial GenAI tools, ensuring alignment with specific REF objectives and academic standards.
- Training data would require meticulous curation to provide diverse knowledge bases, acknowledging and adjusting for inconsistencies and bias in scholarly communication.
- Algorithms would need to be explicitly designed to account for systemic inequity in research funding, promotion decisions, seniority, workload allocations, and institutional non-research commitments, all of which influence output and hence whose work is submitted for the REF in the first place.
- Systems would need to recognise and account for documented biases in academic evaluation, including citation patterns favouring certain demographics and disciplines.
- Programming would need to disregard potentially biasing factors like metrics, publication venue, gender, departmental affiliation and academic position, and also prevent indirect inference through writing-style analysis, linguistic patterns or topic selection that might serve as proxy signals for demographic information (a minimal sketch of this blinding step follows the list).
- Design would need to avoid temporal bias, where topics with less historical representation would be assessed as less significant simply because they have fewer precedents in the literature, disadvantaging emerging topics or previously marginalised research areas.
- Assessment criteria would need to acknowledge the varying suitability of AI evaluation across disciplines – recognising that STEM research might be more straightforwardly assessed than humanities scholarship or interdisciplinary work requiring nuanced contextual understanding.
- Implementation would need to balance transparency with opacity to maintain trust while preventing “gaming” of the system.
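As a concrete illustration of the blinding condition above, here is a minimal sketch of stripping explicit metadata from a submission record before any model sees it. The field names are hypothetical, and note what it cannot do: the indirect stylistic proxies that the same condition demands be blocked are a far harder, unsolved problem.

```python
# Illustrative sketch of metadata blinding for a hypothetical submission
# record. Field names are invented; this removes only explicit metadata,
# not stylistic proxies, which the conditions above also require.
from typing import Any, Dict

# Fields an automated assessor should never see, per the list above.
BLOCKED_FIELDS = {
    "author_name", "gender", "department", "institution",
    "academic_position", "journal", "citation_count", "h_index",
}


def blind_submission(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a submission record with biasing metadata removed."""
    return {k: v for k, v in record.items() if k not in BLOCKED_FIELDS}


submission = {
    "title": "An example output",
    "abstract": "…",
    "full_text": "…",
    "journal": "A prestigious venue",  # removed: venue prestige is a bias
    "citation_count": 412,             # removed: a metric, not quality
    "author_name": "Dr A. N. Other",   # removed: demographic proxy
}

assert set(blind_submission(submission)) == {"title", "abstract", "full_text"}
```

Even this trivial step shows where the difficulty lies: blocking explicit fields is easy, but the writing itself still carries proxy signals that no field filter can remove.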
While theoretically possible, building such sophisticated AI systems for REF 2029 would present nearly insurmountable challenges. Developing bespoke, bias-mitigating AI tools would require immediate, substantial investment from universities. The financial resources needed would likely exceed the very cost savings that make GenAI initially appealing, even with sector-wide collaboration.
Using commercial AI tools trained on existing academic literature risks amplifying inequalities in academic evaluation and compromising the REF’s legitimacy as an academically driven process. This approach would effectively outsource academic responsibility to external corporations whose priorities diverge from the values underpinning research assessment.
There is a fundamental misalignment between commercial AI tools and academic assessment needs. Commercial tools prioritise user engagement, personalisation, generalisability and commercial application rather than academic rigour, disciplinary nuance and transparency. Their training data includes vast amounts of internet content that is unlikely to reflect academic standards, that carries its own biases and that may even have been obtained unlawfully; and their “black-box” nature stands in opposition to the transparency and accountability essential to legitimate academic evaluation.
While financial and time pressures make AI-based solutions appealing, the key elements required for an effective, fair GenAI implementation in REF processes are currently missing. Without considerable investment in purpose-built systems designed specifically to counteract academic biases, GenAI tools risk accelerating and entrenching a conservative system that will further privilege established research traditions while systematically disadvantaging innovation and diversity in academic enquiry – precisely the opposite of what research assessment should encourage.
Caroline Ball is academic librarian (business, law and social sciences) at the University of Derby.