
Confused by research evaluation and its potential impact on career progression? Here are some pointers as to how things work in different parts of the world.
Steve Goodman had just completed a three-page referee’s letter for an academic colleague who was going for promotion in a law department, when he received a bundle of papers that surprised him. The package contained the recommendation letters of other referees. “They were eight to ten pages long,” he says, adding that he had never seen letters like these before.
Goodman, an epidemiologist at Stanford University in California, describes what he read as having been written by people who had digested the colleague’s books, engaged with their arguments and produced a commentary similar to something you’d publish. His letter, by contrast, took half a day to write. His legal academic colleagues told him it was standard for law researchers to set aside a week to write referee letters, and that they received credit towards their own career development for doing so.
Goodman was stunned. The idea that you would do a deep dive into someone’s scholarship to write a recommendation letter is “inconceivable in biomedicine”, he says, where it is standard to focus on a researcher’s publications and assess the importance and value of their contributions. Goodman did not know the scholar in question, but was asked to write the letter because their work bridged law and biomedicine.
Unbeknown to him, Goodman had stumbled across the patchwork nature of research assessment in the United States. Even within an institution, the process by which the work of researchers in different disciplines is evaluated can vary widely. It is a similar picture elsewhere.
Some countries run huge nationwide programmes to assess the quality of research, often drilling down to measure the success of individual departments or staff members so that they can be compared.
The exercises cost millions of dollars and require countless hours that, some argue, could otherwise be dedicated to research. But governments champion them because they help to ensure accountability in the distribution of taxpayers’ money. In other countries, the United States among them, there are no formal mechanisms to assess research nationwide, and evaluations are reserved for individuals’ grant applications or hiring and promotion decisions.
Researchers have mixed views on assessment. Some say it drives improvements in quality, but others argue that it has a negative impact on research culture and morale. And in some countries, such as Argentina, many scientists have earned themselves a promotion but are yet to see it because of budgetary constraints, fostering distrust of the evaluation process.
But across the diverse ecosystem of research assessment, some things remain constant. Assessors often use shortcuts, such as quantitative metrics based on citation data, to judge research quality. Although quicker, such metrics lack the nuance of more time-consuming qualitative peer review, causing tension for those being evaluated.
Many systems are now changing as the negative effects of assessment on research culture become more widely recognized, and some are grappling with how artificial intelligence could support decision-making.
History
Forty years ago, the United Kingdom began assessing research nationwide in search of a better way to distribute research funding. The current system, the Research Excellence Framework (REF), runs every six or seven years. The stakes are high because the results dictate how £2 billion (US$2.7 billion) in public research funding is distributed annually among the country’s more than 150 universities. The REF aims to capture a wider picture of research than publications alone, including its social, economic and political impact and how the public engages with it. The next exercise, expected to run in 2029, is likely to place greater emphasis on these elements and on the environment in which research takes place, in a bid to improve research culture.
The UK model inspired a wave of countries and territories to follow suit. In the early 2000s, for example, Hong Kong, New Zealand and Australia adopted regular nationwide exercises.
Many of these schemes focus on researchers’ outputs, which can include journal articles, data sets and contributions to conferences, and which are set before an expert panel for critical appraisal. At the nationwide level, these panels might convene world-leading authorities with specific criteria to consider; for other assessments, such as hiring decisions, they might be more-informal gatherings of departmental members who discuss applicants.
These deliberations take time, so assessors use shortcuts to gauge research quality, often relying on bibliometric data to summarize the significance of researchers’ publications. Metrics such as paper or citation counts can feed into discussions or be used on their own to determine research quality.
As research assessments rose in prominence, scientists became increasingly concerned about the limitations of the evaluation methods, and in particular about one metric — the journal impact factor, a measure of the average number of citations that articles published by a journal in the previous two years have received in the current year. It was originally designed to help librarians to decide which journals to subscribe to, but is often used as a proxy for the quality of individual journal papers, or even of their authors. Critics say that such a proxy loses the nuances of how research advances knowledge in a field, drives innovation or benefits society. “It’s when you are using the journal impact factor to determine whether someone should get a job that you are in very dodgy terrain,” says James Wilsdon, who studies research policy at University College London (UCL).
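For illustration, the standard two-year impact factor boils down to a simple ratio (this is the generic definition; individual index providers apply their own rules about what counts as a citable item):

\[
\mathrm{JIF}_{Y} = \frac{\text{citations received in year } Y \text{ to items the journal published in years } Y{-}1 \text{ and } Y{-}2}{\text{number of citable items the journal published in years } Y{-}1 \text{ and } Y{-}2}
\]

Because the numerator is pooled across the whole journal, a handful of highly cited papers can lift the score for every article published there, which is one reason the metric says little about the quality of any individual paper or author.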
During the 2010s, a wave of initiatives began to flag concerns about journal impact factors and other metrics in evaluations, and to suggest better ways to use these measures (see ‘Four initiatives that champion responsible research assessment’). These ideas have received widespread support, but metrics continue to be widely used. A global survey of almost 4,000 researchers by Springer Nature, Nature’s publisher, published in April, found that 55% of respondents are mostly or entirely assessed using metrics, with just 12% saying they were mostly or entirely assessed by qualitative means. (Nature is editorially independent of its publisher.) The full anonymized survey data are publicly available.
The survey found that metrics-based evaluations are often focused narrowly on journal articles, with little consideration of other research outputs, such as data sets. Most researchers also said they would like a balanced weighting of quantitative and qualitative factors, but have concerns about the subjectivity, bias and workload of the latter.
In some countries, metrics, although imperfect, can be helpful, argues Wilsdon. In systems in which nepotism or corruption are rife, using metrics to compare citation counts of researchers, for example, could be useful and corrective, he says.
But he adds that there has been a “growing clamour” to approach research assessment in a more sensitive and holistic way. Many research-assessment programmes look at past work. However, a report1 published in May on behalf of the Global Research Council (GRC), which represents more than 50 government research funders worldwide, found that evaluations over the past five years show a gradual shift towards more forward-looking elements. These components, known as formative assessments, can include giving weight to more than just the research output — the REF scores UK institutions on the impact of their work in society, for example. Researchers know that they can use these examples of impact in their CVs. “That’s quite an important axis of change in assessment, because it’s using assessment in a much more deliberate way to try and shape research systems,” says Wilsdon, who heads the Research on Research Institute (RORI), a non-profit organization based at UCL, which led the work on the report for the GRC.
Promotion policies
Funders are also experimenting with fresh ways to assess researchers. One method that is gaining traction is the narrative CV. These documents offer a structured, written description of a scientist’s broad contributions and achievements, in contrast to the conventional CV, which typically lists publication and employment histories, with little context.
Narrative CVs really tell you about the researcher, says Yensi Flores Bueso, a cancer researcher at University College Cork, Ireland. The format helps to level the playing field for researchers from different countries and backgrounds, who might not have access to the same resources.
In January, Bueso and her colleagues published an analysis2 of more than 300 promotion policies in more than 120 countries, covering how institutions and government agencies promote people to full professorships. The analysis broadly suggests that countries in the global south tend to use quantitative metrics, whereas high-income countries tend to prioritize qualitative aspects such as a researcher’s visibility and engagement.
Much of how research is assessed is cultural, says Bueso, adding that in some African countries, “the emphasis is on social commitments”. Promotion documents ask for more details about how a researcher has served civil society, which committees they sit on and their voluntary work, rather than their publication history, she adds. By contrast, some southeast Asian countries rely heavily on metrics and point scoring, with researchers being awarded points according to where in a paper’s author list their name appears or the journal’s impact factor, for example.
Bueso knows from her own experience about the flaws of relying on metrics such as paper counts or the prestige of journals in which work is published. Originally from Honduras, she started her career there by establishing a laboratory where her research fulfilled a social need: determining the rates of disease in certain populations and which diagnostic assays worked best. But the facility was poorly resourced. In 2020, she moved to the United States, where she worked in a series of large, well-funded labs and made contributions to big projects, which are likely to be published in high-impact journals.
“There is not a chance that my contributions from my work in Honduras will ever be seen in any peer-reviewed journal” owing to lack of funds, she says. “What matters at the end is the quality of the research and how you adapt to a team,” she adds.
In some countries, huge nationwide assessment programmes have driven changes to the research landscape. In the United Kingdom, for example, research assessment previously helped to concentrate research funding in a number of elite institutions. And in Australia, where administrators have been evaluating research nationwide since 2010, a 2023 review found that the exercise had negative effects on research culture. Policymakers were left questioning the benefit of the system, and the programme, called Excellence in Research for Australia (ERA), was halted, with researchers wondering what would come next.
Regional reform
The most recent ERA exercise, in 2018, saw institutions submit research outputs and data on funding and citations across eight subject areas. Expert committees then evaluated the work in various ways. The extent to which panels relied on bibliometrics depended on the discipline, with science and engineering subjects relying on metrics more heavily than did the humanities and social sciences, which leant more on peer review.
The results of the exercise — a score on a five-point scale, ranging from below to well above world standard — had only limited impact on institutional funding in some years, and were mostly used by policymakers to compare institutions across disciplines and internationally.
The original idea for ERA was prompted by concerns to do with quality and value for money in Australian research, says Jill Blackmore, an education researcher at Deakin University in Melbourne, Australia. It was introduced after a period in which many institutions amalgamated and policymakers felt that research capacity was lacking. “It was very much a focus of quantity, not quality,” she adds.
The 2023 ERA review by the Australian government found that the exercise pitted universities against each other and saw them poaching staff and duplicating expertise, rather than working together. The process was also onerous and costly. Evidence provided to the review by the University of Sydney, for example, found that its submission, due every three years in line with the assessment cycle, required more than 40,000 hours of staff time, costing in excess of Aus$2 million (US$1.3 million) in salaries alone.
Although researchers are now breathing a sigh of relief, they are awaiting what might come next, says Blackmore. Institutions have been doing pre-emptive work around assessment, just in case, she says, adding, “There is no magic bullet to this, because you’re trying to measure value for money.” She also argues that ERA has served its purpose. Now, in education, she says, “we have got high-quality research, and we didn’t for a while”. She doesn’t think ERA is necessary any longer. “Other countries produce quality research without it.”
As in Australia, policymakers in Japan began assessing research on a nationwide scale as part of broader higher-education reforms. Since 2004, Japan has run the National University Corporation Education and Research Evaluation programme every six years. However, the results have little bearing on funding; instead, they inform university planning and higher-level government policy.
But this has downsides, with many universities “feeling the process lacks tangible incentives or impact”, says Takayuki Hayashi, a research-policy scholar at the National Graduate Institute for Policy Studies in Tokyo. Unlike in the United Kingdom, where the national assessment programme carries huge weight, in Japan “many researchers view the current evaluation system with a sense of detachment or fatigue, largely because the results have minimal influence on funding or career advancement”.
For each faculty or school, universities must submit detailed information about research outputs, reports on the progress and achievement of mid-term objectives, and both qualitative and quantitative data about education and research activities. This includes data on student numbers and external research funding, as well as citation indicators.
For each institution, “the final judgements are made based on the expert panel’s qualitative assessment, taking the data into account but not relying solely on it”, says Hayashi. Each university, along with its faculties and graduate schools, is then publicly ranked using a tiered system. The outputs of individual researchers are scored on a three-point scale.
The results inform each institution’s plans for the next six years, and are used by Japanese policymakers to help spot systemic issues. But because researchers do not think the assessments have any impact, universities focus more attention on a separate Ministry of Finance scheme, which uses quantitative metrics to determine funding levels for part of institutions’ core budgets. The ministry has also been pushing for a more competitive process that would use citation data to allocate block grants for research, says Hayashi.
The system is different again in Argentina, where evaluation typically focuses on individuals rather than departments. Every year, the government funding agency, CONICET, is supposed to evaluate applicants who want to undertake PhDs, become postdoctoral researchers or take up permanent positions. The process involves three groups of people, including external evaluators and the board of CONICET, considering candidates’ research proposals, academic experience and background. Unlike in some countries, where researchers must wait for a position to become available, CONICET funds several new posts every year.
But recent changes in central funding for science have left many researchers frustrated. Research-funding cuts under libertarian President Javier Milei, who took office in 2023, have left Argentinian science in a precarious situation, say researchers.
Clara Giachetti, a biologist at the Marine Organisms Biology Institute in Puerto Madryn, Argentina, was promoted from postdoctoral researcher to a permanent position in 2022, and was due to take up that role in 2023, but so far has not. There are more than 800 researchers in Argentina in a similar position, according to Giachetti, who is frustrated that her 2022 evaluation went ahead despite the funding for her salary in her permanent role being unavailable. “It’s very disappointing. I play the game like they tell me I should, and I won the prize, but I can’t have it,” she adds. The 2023 evaluation process has been delayed.
AI-powered evaluation
As science researchers in some countries struggle to stay in academia, other nations are looking at how technology can support future assessments. Artificial intelligence is already being used by some funders to select or assign panel members for reviews, but many fewer are using it to organize proposals or research outputs for assessment, according to Wilsdon’s GRC report. Funders acknowledge the complexity of using AI but see it as an important tool.
“When the AI stuff first emerged, there was a tendency by many of those who’ve been involved in debates over metrics to pigeonhole AI approaches as inevitably susceptible to the same problems” of loss of nuance, says Wilsdon.
But with the advent of large language models, “there are some exciting methodological possibilities”, he says.
Back at Stanford, the spirit of the recommendation letters for the legal scholar is embodied in the internal reports that are used for promotion in the school of medicine. A key part of the promotion package is a five-page narrative report, written by the candidate’s committee, that summarizes their contributions, says Goodman. “You are required to do an in-depth analysis of one of the candidate’s most important publications, taking about one page.” Any mention of publication numbers, authorship position or journal impact factors is not allowed, he adds.
Stanford also now offers an optional rigour and reproducibility format for a CV and personal statement that gives researchers a chance to highlight their publications and describe how they conduct their research to enhance its validity and reproducibility. “That’s a step forward, but we are just starting down that road, and only time will tell how much traction it will get,” says Goodman.
Four initiatives that champion responsible research assessment
Over the past decade, several reports, statements, organizations, initiatives and buzzwords have emerged around the idea of responsibly assessing researchers. Here’s the low-down on who is doing what.
San Francisco Declaration on Research Assessment (DORA)
It all started with this declaration, inspired by discussions at the 2012 annual meeting of the American Society for Cell Biology in San Francisco, California. DORA enshrines the principle that journal-based metrics, such as impact factors, should not be used as a proxy for quality when making hiring, promotion and funding decisions. It currently has more than 26,000 signatories (individuals and organizations) in 166 countries. They include Springer Nature, Nature’s publisher.
Metric Tide report
This 2015 report, co-authored by James Wilsdon, a research-policy specialist at University College London, was commissioned as part of an independent review of the role of metrics in the research system by the higher-education funder Research England. The review found that although metrics are widely misunderstood and contested, they can be helpful when carefully selected. It proposes a framework for the responsible use of metrics and gives targeted advice for university leaders, research funders, publishers and individuals.
Leiden Manifesto for Research Metrics
A 2015 Nature Comment article3, written by five researchers and leaders in public policy and scientometrics at universities in the United States and Europe, expressed concerns that evaluation decisions were increasingly being driven by data instead of expert judgement. The article set out the Leiden Manifesto: ten principles for giving metrics a role in the assessment of research performance. They include using metrics alongside qualitative judgements, considering variations in publications and citations by research field, and regularly scrutinizing and updating the indicators used in evaluations.
Agreement on Reforming Research Assessment
Published in July 2022, this agreement has been signed by more than 770 organizations worldwide that are committed to reforming research assessment; its signatories form the Coalition for Advancing Research Assessment (CoARA). Coalition members include institutions, research funders, assessment authorities and professional societies, which have set a timeline for reforms.
Find the original post and more great content on the Nature Careers blog – https://doi.org/10.1038/d41586-025-02498-7
