
AI is improving our rankings, but human expertise will remain vital

Large language models are becoming a powerful complement to human judgement in the validation process for our sustainability-focused rankings, says Victor Melatti 

June 5, 2025
Illustration: people sitting on binary code, reading papers (Source: Getty Images montage)

At 51吃瓜, we take data integrity seriously because our university rankings hold considerable weight among governments, academic institutions, students and stakeholders worldwide.

With the Impact Rankings growing rapidly since their inception in 2019, we’ve seen a surge in both participation and the volume of data submitted. In 2024 alone, we received more than 270,000 evidence documents from 2,152 institutions across the globe. (When we ask about policies and initiatives – for example, the existence of mentoring programmes – we ask universities to provide the evidence to support their claims.) To put this into perspective, that’s the equivalent of reviewing 800 books, each 100 pages long.

Notably, about 50 per cent of the evidence submitted was found to be not relevant, highlighting a real struggle among universities to identify and submit appropriate supporting materials. A system capable of performing binary classification – deciding whether evidence is relevant or not – already represents a significant improvement to our evaluation, as it allows human validators to focus only on the evidence that meets a minimum threshold of relevance.
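The binary relevance decision described above can be sketched as follows. This is a minimal illustration, not THE’s actual system: the keyword-overlap scorer stands in for a real LLM call, and the function names, the 0.5 threshold and the example indicator are all hypothetical.

```python
# Sketch of binary relevance classification for submitted evidence.
# A production system would query an LLM here; this stand-in scorer
# simply measures keyword overlap with the ranking indicator.

def relevance_score(document: str, indicator: str) -> float:
    """Fraction of indicator keywords found in the document (stand-in for an LLM)."""
    keywords = indicator.lower().split()
    text = document.lower()
    hits = sum(1 for kw in keywords if kw in text)
    return hits / len(keywords) if keywords else 0.0

def is_relevant(document: str, indicator: str, threshold: float = 0.5) -> bool:
    """Binary decision: pass the document on to human validators or discard it."""
    return relevance_score(document, indicator) >= threshold

doc = "Our university runs a structured mentoring programme for first-year students."
print(is_relevant(doc, "mentoring programme"))                        # True
print(is_relevant("Campus parking policy update.", "mentoring programme"))  # False
```

The point of the binary gate is exactly what the paragraph describes: roughly half of submissions never reach a human validator at all.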

The scale of this task makes it nearly impossible to manage using traditional human validation methods alone, while ensuring consistency and accuracy. That’s why we began exploring how artificial intelligence, particularly large language models (LLMs), could support us. Our mission remains the same: to uphold the highest standards in data quality and validation, but we now have an opportunity to scale this effort in ways that were previously unthinkable.

Our approach to integrating AI into the validation process is both strategic and pragmatic. Rather than applying AI across all submissions, we use it specifically for validating evidence that is machine-readable – primarily HTML and well-structured PDFs. AI’s role is narrowly defined: it determines whether a document is relevant or not, based on the specific indicators of the Impact Rankings. This binary classification forms the core of our ensemble method. If the AI deems a document relevant, it is passed on to human validators who then assess whether the evidence is generic or specific. If the AI classifies it as non-relevant, the document is discarded.
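The routing logic in that paragraph reduces to a small decision tree. The sketch below is an assumption-laden illustration of the workflow as described, not THE’s implementation: the format labels, queue names and `Evidence` structure are all invented for clarity.

```python
from dataclasses import dataclass

# Hypothetical: only these formats are treated as machine-readable.
MACHINE_READABLE = {"html", "pdf"}

@dataclass
class Evidence:
    doc_id: str
    fmt: str           # e.g. "html", "pdf", "scan"
    ai_relevant: bool  # output of the binary classifier

def route(item: Evidence) -> str:
    """Route one piece of evidence through the ensemble pipeline."""
    if item.fmt not in MACHINE_READABLE:
        return "human_full_review"          # not machine-readable: humans handle it end to end
    if item.ai_relevant:
        return "human_generic_vs_specific"  # AI says relevant: humans grade the depth
    return "discard"                        # AI says not relevant

print(route(Evidence("a", "html", True)))   # human_generic_vs_specific
print(route(Evidence("b", "pdf", False)))   # discard
print(route(Evidence("c", "scan", True)))   # human_full_review
```

Note that the AI never makes the final quality judgement – it only decides which queue a document lands in.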

To maintain quality assurance and guard against hallucinations or misclassifications, a portion of the AI-rejected documents is manually reviewed by human validators. This approach allows us to combine the speed and scalability of AI with the nuanced judgement of human evaluators, creating a validation system that is efficient, scalable and reliable. By carefully limiting AI’s scope and introducing safeguards, we’ve developed a process that enhances our overall accuracy and ensures that human insight remains central to our decision-making.
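The manual review of AI-rejected documents amounts to an audit sample. A minimal sketch, assuming a fixed random sampling rate (the 5 per cent figure and the function name are illustrative assumptions – the article does not state the actual rate):

```python
import random

def audit_sample(rejected_ids, rate=0.05, seed=0):
    """Draw a fixed-rate random sample of AI-rejected documents for manual QA.

    The 5 per cent rate is a hypothetical choice for illustration.
    A fixed seed makes the audit draw reproducible.
    """
    rng = random.Random(seed)
    k = max(1, round(len(rejected_ids) * rate))  # always audit at least one document
    return rng.sample(list(rejected_ids), k)

rejected = [f"doc-{i}" for i in range(200)]
print(len(audit_sample(rejected)))  # 10
```

If auditors keep finding wrongly discarded documents in the sample, that is the signal to retrain or re-prompt the classifier before trusting it at scale.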

The primary strength of using AI for evidence validation lies in its scalability and efficiency. AI can process vast quantities of data at a speed that far exceeds human capabilities. In our 2024 tests, the AI system demonstrated accuracy on a par with human validators, making AI particularly effective at handling repetitive or high-volume tasks where consistency is key.

However, the technology is not without limitations. One of the main weaknesses is its reliance on clearly structured input data. Poorly formatted or ambiguous documents can reduce AI accuracy. Moreover, AI still struggles with understanding context or intent behind certain types of evidence, which humans can interpret more intuitively. Ethical concerns around fairness and bias also require careful monitoring and mitigation. While AI doesn’t get tired or distracted, it does need to be constantly updated and reviewed to ensure continued performance. In sum, AI is a powerful complement to – not a replacement for – human judgement in the validation process.

Integrating AI into the validation process was not without its surprises. One of the most unexpected challenges was the variability in document formats and submission quality. While HTML documents were relatively easy for the model to interpret, scanned PDFs or embedded image text posed significant problems.

Another unexpected hurdle was the alignment between AI-generated results and our existing quality assurance benchmarks. Early on, we found discrepancies where AI classified documents as relevant that human validators had overlooked, and vice versa. This raised questions not only about model performance but also about the subjectivity inherent in human validation. Moreover, we learned that implementing human-in-the-loop processes – while essential – added complexity to our workflows and demanded a balance between automation and oversight. These challenges reinforced the importance of continued training, feedback loops and iterative development to refine the system.
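Surfacing those discrepancies is essentially a two-rater agreement tally. A minimal sketch of how AI and human relevance decisions could be compared (the function name, labels and sample data are all hypothetical):

```python
from collections import Counter

def agreement_report(ai_labels, human_labels):
    """Tally AI vs human relevance decisions and surface both kinds of disagreement."""
    pairs = Counter(zip(ai_labels, human_labels))
    agree = pairs[(True, True)] + pairs[(False, False)]
    total = sum(pairs.values())
    return {
        "agreement_rate": agree / total,
        "ai_only_relevant": pairs[(True, False)],    # AI kept it, human would have discarded
        "human_only_relevant": pairs[(False, True)], # AI discarded it, human would have kept
    }

ai_decisions    = [True, True, False, False, True]
human_decisions = [True, False, False, True, True]
print(agreement_report(ai_decisions, human_decisions))
```

Separating the two disagreement directions matters: one reveals AI false positives, the other the silent losses that the audit sample of rejected documents is designed to catch.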

One of the most valuable outcomes of integrating AI into our validation process has been the generation of a set of best practices, something we had long struggled to establish manually. Over the years, human validators created numerous documents, checklists and notes in an effort to align their judgements on what qualifies as relevant evidence. However, the volume and inconsistency of these materials made it difficult to maintain clarity or coherence.?

By processing this disordered collection of guidelines, AI was able to distil patterns and insights at scale – something only possible through automation. This allowed us to create a small but powerful library of best practices that not only improves internal alignment among validators but also helps institutions understand what constitutes strong evidence before they submit. These guidelines have already started to make a difference, enhancing the transparency, consistency and quality of our validation process. Ultimately, AI didn’t just help us evaluate evidence – it helped us better define the rules by which that evidence should be judged.

Looking ahead, we’re optimistic about the potential for AI to become an integral part of our validation toolkit. The results so far have demonstrated that LLMs can match – and in some areas exceed – the performance of human validators. We’ve already made substantial progress in improving our models, expanding the scope of AI to cover a growing portion of machine-readable evidence. As more submissions fall into this category, AI will naturally play a larger role.

Our ultimate goal is to automate as much of the validation process as possible. Human expertise will always play a vital role in this exercise, but human validators will be increasingly focused on edge cases – such as documents that are not machine-readable because of formatting, privacy concerns or access restrictions – as well as on indicators that carry significant weight in the final score. In these instances, we are relying on a highly experienced team of validators who can apply expert judgement and perform quality assurance to ensure high standards are maintained. These expert validators will also play a critical role in monitoring and refining AI performance through regular QA checks.

In the future, we aim to expand this technology beyond internal use. We are developing tools and features that will help universities improve their submissions by providing clearer, AI-informed guidance and deeper insights into the ranking methodology. This will not only enhance the quality of the evidence received but also support institutions in understanding and engaging more effectively with our rankings framework.

Ultimately, our vision is to use AI not just as a helper, but as a collaborator – working alongside human experts to uphold the high standards that define THE’s rankings.

Victor Melatti is an AI scientist at 51吃瓜. The Impact Rankings 2025 will be published in late June.
