
Department of Education


The education system is not the same as it was before the pandemic. We need policies that address the current situation if we are to improve standards. Professor Jo-Anne Baird (Director of the Oxford University Centre for Educational Assessment) speaks to the Today programme.


Jamie Stiff and colleagues have reviewed the 221 academic publications on the Progress in International Reading Literacy Study (PIRLS), the world's largest reading assessment. They found that:

  • Most articles used PIRLS data for secondary data analysis.
  • Research related to attainment gaps has increased since 2015.
  • 20% of articles were critiques of PIRLS constructs and/or procedures.
  • PIRLS remains underutilised for researching reading literacy.

The full article can be found here:

Stiff, J., Lenkeit, J., Hopfenbeck, T.N., Kayton, H. and McGrane, J.A. (2023) Research engagement in the Progress in International Reading Literacy Study: A systematic review. Educational Research Review, 40.


Strengthening the design and implementation of the standardised student assessment reform of the Flemish Community of Belgium

The Flemish Community of Belgium has been a strong performer on international tests; however, performance has been deteriorating, and more students are failing to reach basic proficiency levels than before. In response, a series of reforms has been initiated, including the introduction of standardised tests, initially in Dutch and mathematics, to evaluate student, school and national performance. Michelle Meadows worked with Marco Kools, Inés Sanguino Martínez and Claire Shewbridge from the OECD Implementing Education Policies team and Professor Inge de Wolf from Maastricht University to support the successful design and implementation of the tests.

The report presents the analysis, key findings and recommendations, including the need to clarify the main purpose of the tests. As the current policy gives equal weight to the use of the data at system, school, classroom and student level, the tests may not meet all the expectations for their intended use. For example, the design decisions taken so far best support system-level monitoring of student performance rather than the use of test data by teachers to improve their pedagogical practice.


The Independent Review of Qualifications and Assessment in Scotland, led by Professor Louise Hayward, recommended:

  • A Scottish Diploma of Achievement comprising Programmes of Learning, Project Learning and the Personal Pathway
  • Removal of external examinations in fourth year
  • A national strategy for standards
  • A cross-sector Commission on Artificial Intelligence
  • Cultural change processes, including professional development for teachers

Professors Jo-Anne Baird and Gordon Stobart were members of the Independent Review Group and led the Qualifications Community Consultation Group, composed of:

Dr Lena Gray (Qualifications and Assessment Consultant/Researcher) • Emeritus Professor Jenny Ozga (University of Oxford, Department of Education) • Professor Pasi Sahlberg (Southern Cross University, Education) • Emeritus Professor Dylan Wiliam (University of London, Assessment) • Professor Ewart Keep (University of Oxford, Education, Training and Skills) • Professor Anne Looney (Dublin City University, Executive Dean of the Institute of Education)

Some news coverage and reactions:

Hayward Review: Expert says blueprint to scrap S4 exams in Scotland will benefit pupils, teachers and Scottish society. The Scotsman

7 key messages as Hayward report on assessment unveiled, TES magazine

Hayward Review: Considering the future of qualifications and assessment. FE News

NASUWT comments on Hayward Review


Sara’s research interests are situated where education, policy, assessment and technology meet. She is passionate about equitable access to quality education for all.

Before joining the Department of Education, Sara studied for her PhD in Education at the University of Sydney, where she was awarded the inaugural NESA scholarship from the Centre for Educational Measurement and Assessment (CEMA) and the University of Sydney Doctoral Travel Scholarship, which enabled her to spend three terms as a Recognised Student in the Oxford University Centre for Educational Assessment (OUCEA). Her doctoral research focuses on national education policies and the international assessments used to measure their success.

Prior to her PhD studies, Sara gained an MEd (Leadership) from the University of New South Wales, where she wrote her thesis on the future of schooling and emerging technologies. She also holds a Graduate Certificate in Psychology from UNSW and a BEd in Primary Education from Edith Cowan University.

During her career, Sara has enjoyed many years as a classroom teacher, school leader, and lecturer in curriculum development. Most recently, she has specialised in assessment, working as a C-suite executive for a global EdTech company and on the OECD’s global PISA for Schools program.

Sara is a Research Officer with the Learning for Families through Technology (LiFT) project, a collaboration between Ferrero International and three research groups in the Department of Education: Applied Linguistics; Learning and New Technologies; and Families, Effective Learning, and Literacy. The project aims to examine key questions about children's learning with technology, with a focus on language and literacy skills. Sara's contribution to the project centres on the development of digital book platforms, including guidelines for the ethical and effective use of generative AI in education and literacy learning.

Sara is also a Department Associate with the Oxford University Centre for Educational Assessment.


What can the Covid exam crisis teach us about policy making?

Based upon the article, Knowledge, expertise and policy in the examinations crisis in England


A newly published report by the Department for Education has shown that the reading performance of year 5 pupils in England has remained consistent despite the COVID-19 pandemic. Findings from the 2021 Progress in International Reading Literacy Study (PIRLS) in England reveal that while most countries showed downward trends in pupils' reading achievement since 2016, England's results showed no statistically significant change and remained above the international average. This is despite the disruption the pandemic caused to teaching and learning.

The PIRLS study, which takes place every five years, provides an internationally comparative picture of reading literacy and shows trends in this area over time. In the PIRLS 2021 cycle, 57 countries participated, including England.

In England, the PIRLS 2021 research was conducted by the Oxford University Centre for Educational Assessment (OUCEA) based at the Department of Education, in collaboration with the research team at Pearson, who were responsible for the administration of the PIRLS assessments in schools under the leadership of National Research Co-ordinator, Associate Professor Grace Grima. The Oxford research team included Dr Ariel Lindorff, Jamie Stiff and Heather Kayton.

Dr Lindorff, Associate Professor at the University of Oxford’s Department of Education and primary author of the PIRLS 2021 National Report for England, said: “PIRLS was not designed to measure the impact of COVID-19 as such, so we can’t be sure of all the ways it affected teaching and learning, but the fact that we do not see a decline in pupils’ reading achievement in England since 2016 is encouraging.

“While we can’t link the results to any specific initiative, they do suggest that, at least to some extent, the combination of COVID-19 recovery efforts made in England have been successful in supporting pupils’ reading skills.

“People look to PIRLS and other international large-scale assessments for comparisons across countries, but we need to take into account that the pandemic affected countries differently. England appears among the highest in average reading performance of the countries participating in PIRLS 2021, and this is a very positive result. We need to avoid over-interpreting England’s specific ‘ranking’, though. England’s position in this cycle reflects declines elsewhere, as well as the exclusion of some countries from rankings if they tested a different age group, rather than a straightforward increase in England.”

In welcoming the study's completion, Pearson's Director of Research, Dr Grace Grima, said: "It is encouraging to see pupils in England continuing to perform well in reading after COVID. I would like to thank all the schools, teachers and pupils who helped make the study a success."

Results also showed that the gender gap, which had already narrowed between 2011 and 2016, was further reduced in PIRLS 2021.

Internationally, the study was led by the International Association for the Evaluation of Educational Achievement (IEA).

The full report can be accessed on the Government’s website.

When humans score responses to open-ended test items, the process tends to be slow and expensive, and the subjective nature of the decision making can introduce measurement error.

Automated scoring is an application of computer technologies developed to address these challenges by predicting human scores based on item structure algorithms or response features. Unfortunately, there is no guarantee that a particular attempt to develop an automated scoring model will be successful because features of the assessment design may reduce the automated “scorability” of examinee responses to test items. Our presentation begins with an overview of automated scoring and scorability. We then describe two applications of automated scoring: Pearson’s Intelligent Essay Assessor (IEA) and Math Reasoning Engine (MRE). We continue by illustrating the concept of automated scorability, identifying features of prompts and scoring rubrics that we have found to either improve or reduce the chances of being able to model human scores on test items. Finally, we provide guidelines for item developers and rubric writers that facilitate the automated scorability of examinee responses.
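The core idea above — predicting a human score from features of an examinee's response — can be sketched in miniature. The rubric keywords, responses, weights, and the 0–4 score scale below are all invented for illustration; production engines such as the Intelligent Essay Assessor use far richer semantic representations and trained statistical models rather than hand-set weights.

```python
# Minimal sketch of feature-based automated scoring (illustrative only).
# A real engine would learn weights from human-scored training responses.

RUBRIC_KEYWORDS = {"photosynthesis", "sunlight", "carbon", "dioxide", "oxygen"}

def features(response: str) -> dict:
    """Extract simple surface features from a constructed response."""
    words = response.lower().split()
    return {
        "length": len(words),
        "keyword_hits": sum(1 for w in words if w.strip(".,!?") in RUBRIC_KEYWORDS),
    }

def predict_score(response: str) -> int:
    """Map features to a score on a hypothetical 0-4 rubric scale."""
    f = features(response)
    # Hand-set linear weights standing in for a trained scoring model.
    raw = 0.05 * f["length"] + 0.8 * f["keyword_hits"]
    return max(0, min(4, round(raw)))  # clamp prediction to the rubric range

# Compare machine predictions with (invented) human scores.
responses = [
    ("Plants use sunlight for photosynthesis and release oxygen.", 3),
    ("I don't know.", 0),
]
for text, human in responses:
    print(f"human={human} machine={predict_score(text)}")
```

Scorability, in these terms, is how well such response features can recover the human score: prompts that elicit responses whose quality correlates with measurable features model well, while rubrics rewarding qualities the features cannot capture reduce scorability.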


Lisa Eggers-Robertson is responsible for product planning and execution of the automated math open-ended response scoring service. She has served as program manager supporting the College Board's Accuplacer and SAT programs and other state programs, as well as a product manager for an interim classroom assessment product. She has spent the last 9 years of her career within the automated scoring team. She is certified as a Project Management Professional (PMP) by the Project Management Institute (PMI) and holds a BBA in Management of Information Systems from the University of Iowa.

Gregory M. Jacobs serves as technical lead and senior member of Pearson’s automated scoring team that is responsible for product planning and execution of the Intelligent Essay Assessor™ scoring service. He works with scoring staff and assessment program teams to provide leadership and execution of the automated scoring process and has over 3 years of industry experience building machine learning models, with an emphasis on Natural Language Processing. He also holds a Juris Doctor from Catholic University and has over a decade of legal experience as a commercial litigator handling complex contractual language disputes.

Edward W. Wolfe is responsible for development and delivery of Pearson's automated writing and math scoring services, including the Intelligent Essay Assessor™, Continuous Flow, and Math Reasoning Engine. He works with scoring staff and assessment program teams to provide leadership and oversight of the automated scoring process on all programs. Dr. Wolfe holds a PhD in educational psychology from the University of California, Berkeley, and has authored nearly 100 peer-reviewed journal articles and book chapters regarding human and automated scoring and applied psychometrics.