Department of Education

Oxford University Centre for Educational Assessment (OUCEA)


Jo-Anne Baird, Louise Hayward, Michelle Meadows & Zhanxin Hao (2023) Assessment and learning loss in England: never let a good crisis go to waste, International Journal of Inclusive Education, DOI: 10.1080/13603116.2023.2274112

Hayward, L., Baird, J., Godfrey-Faussett, T., Randhawa, A., Allan, S., MacIntosh, E., Hutchinson, C., Spencer, E. and Wiseman-Orr, L. (2023) National Qualifications in Scotland: a lightning rod for public concern about equity in the pandemic. European Journal of Education: research, development and policy.

Michelle Meadows, Marco Kools, Inés Sanguino Martínez and Claire Shewbridge (2023) Strengthening the design and implementation of the standardised student assessment reform of the Flemish Community of Belgium. OECD.

Baird, J. (2023) Policy negotiation in ITE in England: A personal reflection. Chapter 4 in Ellis, V. (Ed.) Teacher Education in Crisis. The State, the Market and the Universities in England. 63 – 74. Open Access.

Lindorff, A., Stiff, J. & Kayton, H. (2023) PIRLS 2021: National Report for England. Department for Education.

Stiff, J., Lenkeit, J., Hopfenbeck, T.N., Kayton, H. and McGrane, J.A. (2023) Research engagement in the Progress in International Reading Literacy Study: A systematic review. Educational Research Review, 40.

Ozga, J., Baird, J., Saville, L., Arnott, M. and Hell, N. (2023) Knowledge, Expertise and Policy in the Examinations crisis in England. Oxford Review of Education.


Jo-Anne Baird, Thomas Godfrey-Faussett, Louise Hayward, Carolyn Hutchinson, Ashmita Randhawa, Ernest Spencer, Lesley Wiseman-Orr and SQA colleagues (2022) Overview of the Perceptions of Standards in Scotland Project. Report 1 of 4.

Jo-Anne Baird, Thomas Godfrey-Faussett, Louise Hayward, Carolyn Hutchinson, Ashmita Randhawa, Ernest Spencer, Lesley Wiseman-Orr and SQA colleagues (2022) Perceptions of Assessment Standards in Scotland: Focus Groups with Stakeholders. Report 2 of 4.

Jo-Anne Baird, Thomas Godfrey-Faussett, Ashmita Randhawa, Louise Hayward, Carolyn Hutchinson, Ernest Spencer, Lesley Wiseman-Orr and SQA colleagues (2022) Perceptions of Assessment Standards in Scotland: Questionnaire with Stakeholders. Report 3 of 4.

Ashmita Randhawa, Thomas Godfrey-Faussett, Jo-Anne Baird, Louise Hayward, Carolyn Hutchinson, Ellen Macintosh, Ernest Spencer, Lesley Wiseman-Orr and SQA colleagues (2022) Perceptions of Assessment Standards in Scotland: Employer Survey. Report 4 of 4.

Baird, J. (2022). On the use of cognitive science in teacher education in England. Vernon-Wall Lecture – Anniversary Edition.

Godfrey-Faussett, T. (2022). Participatory Research and the Ethics of Anonymisation. Education Sciences, 12(4), 260.

Hayward, L., Elmegri, S., Packman, K-J, Jolly, J., Bevin, R., Percival, M., Baird, J., Newton, O., Wyse, D., Peacock, A. and Lander, V. (2022).  Qualifications for a New ERA: Equitable, Reliable Assessment.  Final Report.  National Education Union Independent Assessment Commission.

Ho, P. J. (2022). Assessing the range of cognitive processes in the Hong Kong Diploma of Secondary Education Examination (HKDSE)’s English language reading literacy test. Language Testing in Asia, 12(18), 1–19.


Sibel Erduran, Olga Ioannidou & Jo-Anne Baird (2021) The impact of epistemic framing of teaching videos and summative assessments on students’ learning of scientific methods, International Journal of Science Education, 43:18, 2885-2910, DOI: 10.1080/09500693.2021.1998717


He, Q., Meadows, M.L. & Black, B. (2020) An introduction to statistical techniques used for detecting anomaly in test results, Research Papers in Education, DOI: 10.1080/02671522.2020.1812108

Dennis Opposs, Jo-Anne Baird, Maia Chankseliani, Gordon Stobart, Amit Kaushik, Hugh McManus & David Johnson (2020) Governance structure and standard setting in educational assessment, Assessment in Education: Principles, Policy & Practice, DOI: 10.1080/0969594X.2020.1730766

Childs, A. and Baird, J. (2020) General Certificate of Secondary Education (GCSE) and the assessment of science practical work: an historical review of assessment policy.  The Curriculum Journal, 31, 3, 357 – 378.

Deem, R. and Baird, J. (2020).  The English Teaching Excellence (and student outcomes) Framework: intelligent accountability in higher education?   Journal of Educational Change, 21, 215 – 243.

Pinot de Moira, A., Meadows, M. and Baird, J. (2020). The SES equity gap and the reform from modular to linear GCSE mathematics.  British Educational Research Journal, 46, 2, 421 – 436.

Elliott, V., Randhawa, A., Ingram, J., Nelson-Addy, L. & Baird, J. (2020) Feedback in action: A review of practice in English Schools.  Education Endowment Foundation.


Cuff, B.M., Meadows, M.L. & Black, B. (2019) An investigation into the Sawtooth Effect in secondary school assessments in England, Assessment in Education: Principles, Policy & Practice, 26, 3, 321-339.

Lenkeit, J., Caro, C., Ertl, H., Cheadle, S., Turner, G., Matthews, A., Khan, S. and Jo-Anne Baird. (2019) The impact of preparation on TSA and BMAT test results – an institutional case study at Oxford University. Oxford University Centre for Educational Assessment Report. OUCEA/19/3

Baird, J.-A., Caro, D., Elliott, V., El Masri, Y., Ingram, J., Isaacs, T., Pinot de Moira, A., Randhawa, A., Stobart, G., Meadows, M., Morin, C., Taylor, R. (2019). Examination Reform: Impact of Linear and Modular Examinations at GCSE. Oxford University Centre for Educational Assessment Report. OUCEA/19/2

Double, K.S., McGrane, J.A., Stiff, J.C., & Hopfenbeck, T.N. (2019). The importance of early phonics improvements for predicting later reading comprehension. British Educational Research Journal, 45, 6, 1220-1234.


Double, K.S., & Birney, D.P. (2018) Reactivity to Confidence Ratings in Older Individuals Performing the Latin Square Task. Metacognition and Learning, 13(3), 309-326. doi:10.1007/s11409-018-9186-5

Baird, J.-A., Isaacs, T., Opposs, D., Gray, L. (eds.) (2018) Examination Standards – How measures and meanings differ around the world. London: UCL IOE Press.

Lenkeit, J. & Schwippert, K. (eds.) (2018). The Assessment of Reading in International Studies. Special Issue of Assessment in Education: Principles, Policy and Practice, 25(1).

Hopfenbeck, T. N., & Lenkeit, J. (2018, January). PIRLS for Teachers: Making PIRLS results more useful for practitioners (Policy Brief No. 17). Amsterdam, The Netherlands: IEA


Caro, D.H., Kyriakides, L., Televantou, I. (2017). Addressing omitted prior achievement bias in international assessments: an applied example using PIRLS-NPD matched data. Assessment in Education: Principles, Policy and Practice.

Lenkeit, J., Schwippert, K., & Knigge, M. (2017). Configurations of multiple disparities in reading performance: longitudinal observations across countries. Assessment in Education: Principles, Policy & Practice. DOI: 10.1080/0969594X.2017.1309352

McGrane, J., Stiff, J., Lenkeit, J., Baird, J.-A. & Hopfenbeck, T. N. (2017). Progress in International Reading Literacy Study (PIRLS): National Report for England. London: Department for Education.

El Masri, Y., Rea-Dickins, P., Smith, R., Boggs, J. (2017). State of the Field Review. Towards a University Language Policy: The Case of the Aga Khan University. Oxford University Centre for Educational Assessment Report OUCEA/17/3

Gray, L. (2017). Overcoming political and organisational barriers to international practitioner collaboration on national examination research. Guidelines for insider researchers working in exam boards and other public organisations. Oxford University Centre for Educational Assessment Report OUCEA/17/2

Baird, J.-A., Andrich, D., Hopfenbeck, T.N., & Stobart, G. (2017) Assessment and learning: fields apart? Assessment in Education: Principles, Policy & Practice, 24(3).

Baird, J.-A., Andrich, D., Hopfenbeck, T.N., & Stobart, G. (2017) Metrology of education. Assessment in Education: Principles, Policy & Practice, 24(3).

Popat, S., Lenkeit, J., Hopfenbeck, T. N. (2017) PIRLS for Teachers – A review of practitioner engagement with international large-scale assessment results. Oxford University Centre for Educational Assessment Report OUCEA/17/1

Elwood, J., Hopfenbeck, T. & Baird, J. (2017) Predictability in high-stakes examinations: students’ perspectives on a perennial assessment dilemma. Research Papers in Education, 32 (1), 1-17.

Hopfenbeck, T.N., Lenkeit, J., El Masri, Y., Cantrell, K., Ryan, J. & Baird, J. (2017) Lessons Learned from PISA: A Systematic Review of Peer-Reviewed Articles on the Programme for International Student Assessment. Scandinavian Journal of Educational Research [online].


Lenkeit, J. (2016) Review of National Reports on PIRLS. Oxford University Centre for Educational Assessment Report OUCEA/16/1

Baird, J.-A., Caro, D.H. & Hopfenbeck, T.N. (2016) Student Perceptions of Predictability of Examination Requirements and Relationship with Outcomes in High-Stakes Tests in Ireland. Irish Educational Studies (online).

Baird, J. & Hopfenbeck, T.N. (2016) Curriculum in the Twenty-First Century and the Future of Examinations (Chapter 51), in: Wyse, D. Hayward, L. & Pandya, J. (Eds.) The Sage Handbook of Curriculum, Pedagogy and Assessment, Vol. 2.

Baird, J. & Gray, L. (2016) The meaning of curriculum-related examination standards in Scotland and England: a home–international comparison. Oxford Review of Education, 43 (2), 266-284.

Baird, J., Johnson, S., Hopfenbeck, T.N., Isaacs, T., Sprague, T., Stobart, G. & Yu, G. (2016) On the supranational spell of PISA in policy. Educational Research, 58 (2), 121-138. Special Issue: International Policy Borrowing and Evidence-based Educational Policy Making: Relationships and Tensions.

Caro, D.H. & Biecek, P. (2017) intsvy: An R Package for Analysing International Large-Scale Assessment Data, Journal of Statistical Software.

Caro, D.H., Lenkeit, J. & Kyriakides, L. (2016) Teaching strategies and differential effectiveness across learning contexts: Evidence from PISA 2012. Studies in Educational Evaluation, 49, 30-41.

Elliott, V., Baird, J., Hopfenbeck, T.N., Ingram, J., Thompson, I., Usher, N., Zantout, M., Richardson, J. & Coleman, R. (2016) A marked improvement? A review of the evidence on written marking. Education Endowment Foundation.

El Masri, Y.H., Baird, J. & Graesser, A. (2016) Language effects in international testing: the case of PISA 2006 science items. Assessment in Education: Principles, Policy & Practice [online].

El Masri, Y.H., Ferrara, S., Foltz, P.W. & Baird, J. (2016) Predicting item difficulty of science national curriculum tests: the case of key stage 2 assessments. The Curriculum Journal, 28 (1), 59-82.

Hopfenbeck, T.N. (2016) Å lykkes med elevvurdering (Succeeding with student assessment). Fagbokforlaget.

Newton, P.E. & Baird, J. (2016) The great validity debate (Editorial), Assessment in Education: Principles, Policy & Practice, 23 (2) – Special Issue on Validity.


Baird, J., Meadows, M., Leckie, G. & Caro, D. (2015) Rater accuracy and training group effects in Expert- and Supervisor-based monitoring systems. Assessment in Education: Principles, Policy & Practice [online].

Baird, J., Hopfenbeck, T.N., Elwood, J., Caro, D. & Ahmed, A. (2015) Predictability in the Irish Leaving Certificate, Report commissioned by the State Examinations Commission, Ireland.

Caro, D.H. (2015) Causal mediation in educational research: An illustration using international assessment data. Journal of Research on Educational Effectiveness, 8 (4), 577-597.

Elwood, J., Hopfenbeck, T.N. & Baird, J. (2015 online) Predictability in high-stakes examinations: students’ perspectives on a perennial assessment dilemma. Research Papers in Education.

Lenkeit, J., Caro, D.H. & Strand, S. (2015) Tackling the remaining attainment gap between students with and without immigrant background: an investigation into the equivalence of SES constructs. Educational Research and Evaluation, 21 (1), 60-83.

Lenkeit, J., Chan, J., Hopfenbeck, T.N., & Baird, J.-A. (2015) A review of the representation of PIRLS related research in scientific journals. Educational Research Review, 16, 102-115.


Baird, J. (2014) Teachers’ views on assessment practices (Editorial), Assessment in Education: Principles, Policy & Practice, 21 (4), 361-364.

Baird, J., Hopfenbeck, T.N., Newton, P., Stobart, G. & Steen-Utheim, A.T. (2014) State of the Field Review: Assessment and Learning. Report for the Norwegian Knowledge Centre for Education.

Baird, J. (2014) Editorial, Assessment in Education: Principles, Policy & Practice, 21 (1), 1-3.

Baird, J. (2014) Assessment and Attitude (Editorial), Assessment in Education: Principles, Policy & Practice, 21 (2), 129-132.

Caro, D.H., Cortina, K.S., & Eccles, J. (2014) Socioeconomic Background, Education, and Labor Force Outcomes: Evidence from a Regional U.S. Sample. British Journal of Sociology of Education [online].

Lenkeit, J., & Caro, D.H. (2014) Performance status and change – Measuring education system effectiveness with data from PISA 2000-2009. Educational Research and Evaluation, 20 (2), 146-174.

Mirazchiyski, P., Caro, D.H., Sandoval-Hernandez, A. (2014) Youth Future Civic Participation in Europe: Differences between the East and the Rest. Social Indicators Research, 115 (3), 1031–1055.

Nyhamn, F. & Hopfenbeck, T.N. (2014) (Eds) From Political Decisions to Change in the Classroom: Successful Implementation of Education Policy. CIDREE Yearbook 2014.


Baird, J. (2013) The currency of assessments (Editorial), Assessment in Education: Principles, Policy & Practice, 20 (2), 147-149.

Baird, J. (2013) Judging students’ performances (Editorial), Assessment in Education: Principles, Policy & Practice, 20 (3), 247-249.

Baird, J., Ahmed, A., Hopfenbeck, T.N., Brown, C. & Elliott, V. (2013) Research evidence relating to proposals for reform of the GCSE. OUCEA Report.

Baird, J. & Black, P. (2013) (Eds) The reliability of public examinations. Research Papers in Education, 28 (1), 1-4, Special Issue.

Baird, J. & Black, P. (2013) Test theories, educational priorities and reliability of public examinations in England. Research Papers in Education, 28 (1), 5-21.

Baird, J., Hayes, M., Johnson, R., Johnson, S. & Lamprianou, I. (2013) Marker effects and examination reliability: A comparative exploration from the perspectives of generalizability theory, Rasch modelling and multilevel modelling. Report commissioned by the Office of Qualifications and Examinations Regulation. Ofqual/13/5261.

Caro, D.H., Sandoval-Hernandez, A., & Lüdtke, O. (2013) Cultural, Social and Economic Capital Constructs in International Assessments: An Evaluation Using Exploratory Structural Equation Modeling. School Effectiveness and School Improvement [online].

Elwood, J. & Baird, J. (2013) (Eds) Students: researching voice, aspirations and perspectives in the context of educational policy change in the 14–19 phase. London Review of Education, 11 (2), Special Issue.

Lenkeit, J. (2013) Effectiveness measures for cross-sectional studies: A comparison of value-added models and contextualised attainment models, School Effectiveness and School Improvement, 24 (1), 39-63.

Rose, J. & Baird, J. (2013) Aspirations and an austerity state: young people’s hopes and goals for the future. London Review of Education, 11 (2), 157-153. Special Issue: Students: researching voice, aspirations and perspectives in the context of educational policy change in the 14–19 phase.

Simpson, L. & Baird, J. (2013) Perceptions of trust in public examinations. Oxford Review of Education, 39 (1), 17-35.


Baird, J. (2012) Editorial, Assessment in Education: Principles, Policy & Practice, 19 (4), 389-391.

Baird, J. (2012) Do we need marking at all? (Editorial), Assessment in Education: Principles, Policy & Practice, 19 (3), 277-279.

Baird, J. (2012) Science and misfits (Editorial), Assessment in Education: Principles, Policy & Practice, 19 (2), 141-145.

Baird, J., Elwood, J. & Isaacs, T. (2012) Written evidence submitted to the Education Select Committee’s Inquiry into the administration of examinations for 15-19 year olds in England.

Baird, J., Pillinger, R. & Steele, F. (2012) Use of the LEMMA Online Learning Materials, Report prepared for the LEMMA (Learning Environment for Multilevel Modelling and Applications) node, University of Bristol, January.

Baird, J., Rose, J. & McWhirter, A. (2012) So tell me what you want: a comparison of FE college and other post-16 students’ aspirations. Research in Post-Compulsory Education, 17 (3), 293-310.

Caro, D.H. (2012) Evidencia causal en estudios educativos con bases de datos observables. [Causal evidence in educational studies with observational data]. In E. Vásquez (Ed) Inversión social: indicadores, bases de datos e iniciativas. [Social investment: indicators, data sets and initiatives]. Lima: Universidad del Pacífico.

Caro, D.H. & Cortés, D. (2012) Measuring family socioeconomic status: An illustration using data from PIRLS 2006, IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 5, 9-33.

Caro, D.H. & Lenkeit, J. (2012) An analytical approach to study educational inequalities: 10 hypothesis tests in PIRLS 2006. International Journal of Research and Method in Education, 35 (1), 3-30.

Caro, D.H. & Mirazchiyski, P. (2012) Socioeconomic Gradients in Eastern European Countries: evidence from PIRLS 2006. European Educational Research Journal, 11 (1), 96-110.

Caro, D.H. & Schulz, W. (2012) Ten Hypotheses about Tolerance among Latin American Adolescents. Citizenship, Social and Economics Education, 11 (3), 213-234.

Daly, A., Baird, J., Chamberlain, S. & Meadows, M. (2012) Assessment reform: students’ and teachers’ responses to the introduction of stretch and challenge at A-level. Curriculum Journal, 23 (2), 173-187.


Hopfenbeck, T.N. (2012) The Role and Value of International Datasets and Comparisons in Education Research, Guest Editorial in International Datasets and Comparisons in Education. Research Intelligence, 119, British Educational Research Association, 7-8.


Baird, J. (2011) Does the learning happen inside the black box? (Editorial), Assessment in Education: Principles, Policy & Practice, 18 (4), 343-345.

Baird, J. (2011) Why do people appeal Higher Education grades and what can it tell us about the meaning of standards? Assessment in Education: Principles, Policy & Practice, 18 (1), 1-4.

Baird, J., Béguin, A., Black, P., Pollitt, A. & Stanley, G. (2011) The Reliability Programme: Final Report of the Technical Advisory Group. Coventry: Ofqual/11/4825. Chapter 20, in: Q. He, & D. Opposs (Eds) Ofqual’s Reliability Compendium. Office of Qualifications and Examinations Regulation, Ofqual/12/5117. ISBN 978-0-85743-016-8.

Baird, J., Elwood, J., Duffy, G., Feiler, A., O’Boyle, A., Rose, J. & Stobart, G (2011) 14-19 Centre Research Study: educational reforms in schools and colleges in England Annual Report. London: QCDA.

Baird, J., Isaacs, T., Johnson, S., Stobart, G., Yu, G., Sprague, T. & Daugherty, R. (2011) Policy Effects of PISA. Report commissioned by Pearson UK.

Leckie, G. & Baird, J. (2011) Rater Effects on Essay Scoring: A Multilevel Analysis of Severity Drift, Central Tendency, and Rater Experience. Journal of Educational Measurement, 48 (4), 399-418.


Baird, J. (2010) Construct‐irrelevant variance sometimes has consequential validity (Editorial), Assessment in Education: Principles, Policy & Practice, 17 (4), 339-343.

MacCann, R.G. & Stanley, G. (2010) Classification consistency when scores are converted to grades: examination marks versus moderated school assessments. Assessment in Education: Principles, Policy & Practice, 17 (3), 255-272.

MacCann, R.G. & Stanley, G. (2010) Extending participation in standard setting: an online judging proposal. Educational Assessment, Evaluation and Accountability, 22 (2), 139-157.

Stanley, G. & Lee, J.C. (2010) Future Educational Reform Policies and Measures, in: J.C. Lee & B. Caldwell (Eds) Changing Schools in an Era of Globalization. New York, Routledge.


MacCann, R.G. & Stanley, G. (2009) Item banking with embedded standards. Practical Assessment Research & Evaluation, 14(17).

MacCann, R.G. (2009) Standard setting with dichotomous and constructed response items: some Rasch model approaches. Journal of Applied Measurement, 10(4), 438-454.

Stanley, G. & MacCann, R.G. (2009) Incorporating industry specific training into school education: enrolment and performance trends in a senior secondary system. Journal of Vocational Education and Training, 61(4), 459-466.

Stanley, G., MacCann, R., Gardner, J., Reynolds, L. & Wild, I. (2009) Review of Teacher Assessment: Evidence of What Works Best and Issues for Development. Report commissioned by QCA.


Stanley, G. (2008) National Numeracy Review Report. Canberra: Council of Australian Governments. ISBN 0642 77735 7.

Stanley, G. & Tognolini, J. (2008) Performance with respect to standards in public examinations, Proceedings of the 34th IAEA Conference, Cambridge, UK.

Find us

OUCEA is part of the Department of Education, University of Oxford, 15 Norham Gardens OX2 6PY

OUCEA Office: tel 01865 284098

Visitor Reception: 15 Norham Gardens
