Times Higher Education

Data you can trust: inside THE's rankings validation process

Robust pipelines, smart validation and continuous observability have enabled us to improve data integrity even as our global rankings have grown, says Loubaba El Wazir

April 30, 2025
Image: aloe vera cactus patterns in nature. Source: iStock/randydellinger

An analysis of Times Higher Education's rankings tables reveals a striking expansion in the number of ranked institutions. Over only eight years, from 2017 to 2025, the number of institutions ranked in the World University Rankings has more than doubled, from 981 to 2,092 universities. However, while this growth has been consistent, the number of data issues has remained very low and has even fallen, an improvement that is far from incidental.

Over the past decade, the value of data has grown exponentially, accompanied by an even greater emphasis on its quality across industries. Nowhere is this more evident than at Times Higher Education, where rigorous data integrity standards underpin the credibility of its globally recognised rankings. As the only higher education data provider that collects up-to-date institutional data directly from universities on a global scale, THE is at the forefront of data provenance and lineage. However, we recognise that this is only the first step toward ensuring data quality and trust. That's why our data team has placed data observability and monitoring at the core of its strategy. With institutional data contributing at least 30 per cent of the score in most of our rankings, we have developed dedicated systems, pipelines and processes to uphold its integrity and compliance.

Natural data patterns

Just like patterns in nature, there are patterns in data that reflect how the world around us typically evolves. Decades of collecting higher education data have given THE a deep understanding of normal data patterns within the sector.

For instance, data related to people, such as staff, students and graduates, tends to remain relatively stable over time. Universities also tend to exhibit certain consistent characteristics, such as international outlook, research capacity and industry connections. Some of our most important validation tests evaluate year-on-year changes in these values and ratios.
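As a rough illustration of what a year-on-year test might look like, the Python sketch below flags any metric whose relative change between two submissions exceeds a threshold. The function name, the field names and the 25 per cent threshold are assumptions made for this example; they are not THE's actual rules.

```python
def flag_year_on_year(previous, current, max_relative_change=0.25):
    """Return metrics whose relative change exceeds the threshold.

    `previous` and `current` map metric names to numeric values for one
    institution. The 0.25 default is an illustrative assumption.
    """
    flagged = {}
    for metric, old in previous.items():
        new = current.get(metric)
        if new is None or old == 0:
            # No relative change can be computed; such cases would need
            # a separate completeness check.
            continue
        change = (new - old) / old
        if abs(change) > max_relative_change:
            flagged[metric] = round(change, 3)
    return flagged


# Hypothetical submissions from one institution in two consecutive years.
previous = {"students": 20000, "academic_staff": 1500, "international_students": 4000}
current = {"students": 20400, "academic_staff": 2400, "international_students": 4100}

flagged = flag_year_on_year(previous, current)
print(flagged)
```

Here only the staff figure, which jumps by 60 per cent, would be queried; the small movements in student numbers pass unremarked.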

Equally important are checks for duplication, both within a dataset and in comparison with previous years.
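A minimal sketch of both kinds of duplication check follows, assuming each year's submissions are held as a dictionary keyed by institution. The data layout, function names and example figures are hypothetical.

```python
def find_duplicates(submissions, fields):
    """Group institutions that submitted identical values for `fields`."""
    groups = {}
    for name, record in submissions.items():
        key = tuple(record[f] for f in fields)
        groups.setdefault(key, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def unchanged_since_last_year(previous, current, fields):
    """List institutions whose values for `fields` match last year's exactly."""
    return sorted(
        name
        for name, record in current.items()
        if name in previous
        and all(record[f] == previous[name][f] for f in fields)
    )


# Hypothetical data for two collection years.
fields = ["students", "staff"]
submissions_2024 = {
    "Univ A": {"students": 11800, "staff": 790},
    "Univ C": {"students": 9500, "staff": 610},
}
submissions_2025 = {
    "Univ A": {"students": 12000, "staff": 800},
    "Univ B": {"students": 12000, "staff": 800},
    "Univ C": {"students": 9500, "staff": 610},
}

within_year = find_duplicates(submissions_2025, fields)
across_years = unchanged_since_last_year(submissions_2024, submissions_2025, fields)
print(within_year, across_years)
```

An exact match is not proof of an error, since small institutions can legitimately report the same figures twice, so a result like this would prompt a query rather than a correction.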

Wherever possible, our data is normalised by staff or student numbers to account for institutional size. Additionally, for continuous data, we run z-tests to ensure our queries are triggered only when values deviate significantly from the norm in the sample tested.
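The sketch below shows one simple way to combine the two ideas: a value is first normalised by staff numbers, and a z-score is then computed against the sample so that only strong deviations raise a query. The threshold of 2.0, the function name and the figures are assumptions for illustration; the article does not specify the exact test THE runs.

```python
from statistics import mean, stdev


def flag_outliers(students, staff, z_threshold=2.0):
    """Flag institutions whose students-per-staff ratio is far from the sample mean.

    Returns a dict of institution -> z-score for |z| above the threshold.
    The threshold is an illustrative assumption.
    """
    # Normalise by staff numbers to account for institutional size.
    ratios = {
        name: students[name] / staff[name]
        for name in students
        if staff.get(name)  # skip missing or zero staff counts
    }
    values = list(ratios.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs two or more values
    if sigma == 0:
        return {}
    z_scores = {name: (ratio - mu) / sigma for name, ratio in ratios.items()}
    return {name: round(z, 2) for name, z in z_scores.items() if abs(z) > z_threshold}


# Hypothetical sample of eight institutions.
students = {"A": 15000, "B": 16000, "C": 14000, "D": 15500,
            "E": 16500, "F": 14500, "G": 15000, "H": 40000}
staff = {name: 1000 for name in students}

outliers = flag_outliers(students, staff)
print(outliers)
```

In this sample only institution H, with 40 students per staff member against roughly 15 elsewhere, crosses the threshold. With small samples a single extreme value inflates the standard deviation, so in practice the threshold and sample size would need to be chosen with care.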

Homegrown insights

Aware that we hold some of the most valuable, diverse and wide-ranging datasets in the higher education sector, THE's data team has made a concerted effort across different functions (ranking management, data governance and data science) to consolidate information from multiple sources. These include our global rankings collections, institutional surveys and trusted third-party providers, such as Elsevier.

Over recent years, we have taken significant steps to enhance the reliability and coherence of our data. These include:

  1. Reviewing, aligning and improving our definitions and methodologies across different rankings and collections to ensure clarity, consistency and comparability in how metrics are calculated and interpreted. Through our in-person and online webinars, masterclasses and structured communication, institutions are regularly updated on clarifications and changes, and are supported in submitting accurate, reliable data.
  2. Implementing systematic cross-checks across datasets to identify inconsistencies, spot anomalies and validate data points from multiple perspectives. Our trained support staff proactively engage universities to report identified issues and work collaboratively to resolve them efficiently.

By integrating and validating data in this way, we not only improve accuracy and trustworthiness but also create a more comprehensive and unified view of the global higher education landscape.

Data beyond our walls

We deeply appreciate the work of both governmental and non-governmental organisations that collect, curate and publish education data in their respective countries and regions. Our effort to compare institutional data with publicly available external sources began several years ago on an ad-hoc basis. As our rankings grew in scope and influence, we recognised that consistency, accuracy and fairness were essential. This led to a more systematic integration of external data into our validation process, allowing institutions ample time to review, clarify and correct submissions during the rankings cycle.

Today, we reference data from more than 74 external sources, ranging from government databases and education ministries to independent open data platforms, for quality assurance and verification purposes. These sources include, among others, the Integrated Postsecondary Education Data System (IPEDS) in the US, Chile's Consejo Nacional de Educación (CNED), and Japan's National Institution for Academic Degrees and Quality Enhancement for Higher Education. Each year, we review the methodologies, definitions and reporting standards used by these entities, carefully adjusting for differences with our own framework to ensure accurate alignment.
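A comparison of this kind can be pictured with the short Python sketch below, which raises a query when a submitted figure differs from an external reference by more than a tolerance. The 10 per cent tolerance, the function name and the institutions are hypothetical, and the sketch assumes that definitional differences between the two sources have already been adjusted for.

```python
def compare_with_external(submitted, external, tolerance=0.10):
    """Return (institution, submitted, reference) tuples that need a query.

    Both arguments map institution names to a single metric, such as total
    student numbers. The tolerance is an illustrative assumption.
    """
    queries = []
    for institution, value in submitted.items():
        reference = external.get(institution)
        if reference is None or reference == 0:
            continue  # no usable external figure for this institution
        gap = abs(value - reference) / reference
        if gap > tolerance:
            queries.append((institution, value, reference))
    return queries


# Hypothetical submitted values and external reference values.
submitted = {"Univ A": 10000, "Univ B": 5200, "Univ C": 7000}
external = {"Univ A": 9800, "Univ B": 4000}

queries = compare_with_external(submitted, external)
print(queries)
```

Univ A sits within tolerance, Univ C has no external figure and is skipped, and Univ B, whose submission is 30 per cent above the reference, would be sent back for review.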

The future

As the volume and complexity of institutional data continue to grow, so too must the tools and strategies we use to ensure its integrity. Looking ahead, THE is exploring the next frontier of data verification: introducing real-time monitoring systems and AI-powered anomaly detection. These innovations will not only allow us to catch errors earlier and more efficiently but will also help us proactively safeguard against quality degradation at scale. Our ambition is to move beyond retrospective validation to continuous data observability, where intelligent systems flag irregularities, assess data completeness and adapt to emerging patterns in real time. By embedding these capabilities into our infrastructure, we aim to further strengthen the reliability of our rankings and deepen trust with the global higher education community. As data becomes more dynamic and multidimensional, so too will our approach, ensuring that our rankings remain both robust and future-ready.

Loubaba El Wazir is data quality assurance manager at Times Higher Education.
