Department of Education

Oxford University Centre for Educational Assessment (OUCEA)




Lenkeit, J., Caro, C., Ertl, H., Cheadle, S., Turner, G., Matthews, A., Khan, S. & Baird, J.-A. (2019) The impact of preparation on TSA and BMAT test results – an institutional case study at Oxford University. Oxford University Centre for Educational Assessment Report. OUCEA/19/3

Baird, J.-A., Caro, D., Elliott, V., El Masri, Y., Ingram, J., Isaacs, T., Pinot de Moira, A., Randhawa, A., Stobart, G., Meadows, M., Morin, C. & Taylor, R. (2019) Examination Reform: Impact of Linear and Modular Examinations at GCSE. Oxford University Centre for Educational Assessment Report. OUCEA/19/2


Double, K.S., & Birney, D.P. (2018) Reactivity to Confidence Ratings in Older Individuals Performing the Latin Square Task. Metacognition and Learning, 13(3), 309-326. doi:10.1007/s11409-018-9186-5

Baird, J.-A., Isaacs, T., Opposs, D. & Gray, L. (Eds) (2018) Examination Standards: How measures and meanings differ around the world. London: UCL IOE Press.

Lenkeit, J. & Schwippert, K. (Eds) (2018) The Assessment of Reading in International Studies. Special Issue of Assessment in Education: Principles, Policy and Practice, 25(1).

McGrane, J. A. (2018). The bipolarity of attitudes: Unfolding the implications of ambivalence. Applied Psychological Measurement.

Hopfenbeck, T. N., & Lenkeit, J. (2018, January) PIRLS for Teachers: Making PIRLS results more useful for practitioners (Policy Brief No. 17). Amsterdam, The Netherlands: IEA.


McGrane, J., Chan, J., Boggs, J. & Hopfenbeck, T. (2017) The assessment and moderation of writing with primary-aged students in contexts where English is the primary language of instruction: A systematic review. Oxford University Press.

Caro, D.H., Kyriakides, L., Televantou, I. (2017). Addressing omitted prior achievement bias in international assessments: an applied example using PIRLS-NPD matched data. Assessment in Education: Principles, Policy and Practice.

Lenkeit, J., Schwippert, K., & Knigge, M. (2017). Configurations of multiple disparities in reading performance: longitudinal observations across countries. Assessment in Education: Principles, Policy & Practice. DOI: 10.1080/0969594X.2017.1309352

McGrane, J., Stiff, J., Lenkeit, J., Baird, J.-A. & Hopfenbeck, T. N. (2017). Progress in International Reading Literacy Study (PIRLS): National Report for England. London: Department for Education.

El Masri, Y., Rea-Dickins, P., Smith, R., Boggs, J. (2017). State of the Field Review. Towards a University Language Policy: The Case of the Aga Khan University. Oxford University Centre for Educational Assessment Report OUCEA/17/3

Gray, L. (2017). Overcoming political and organisational barriers to international practitioner collaboration on national examination research. GUIDELINES FOR INSIDER RESEARCHERS WORKING IN EXAM BOARDS AND OTHER PUBLIC ORGANISATIONS. Oxford University Centre for Educational Assessment Report OUCEA/17/2

Maul, A. & McGrane, J. (2017) As pragmatic as theft over honest toil: Disentangling pragmatism from operationalism. Measurement: Interdisciplinary Research and Perspectives.

Baird, J.-A., Andrich, D., Hopfenbeck, T.N., & Stobart, G. (2017) Assessment and learning: fields apart? Assessment in Education: Principles, Policy & Practice, 24(3).

Baird, J.-A., Andrich, D., Hopfenbeck, T.N., & Stobart, G. (2017) Metrology of education. Assessment in Education: Principles, Policy & Practice, 24(3).

Popat, S., Lenkeit, J., Hopfenbeck, T. N. (2017) PIRLS for Teachers – A review of practitioner engagement with international large-scale assessment results. Oxford University Centre for Educational Assessment Report OUCEA/17/1

Hopfenbeck, T.N. & Görgen, K. (2017) The politics of PISA: The media, policy and public responses in Norway and England. European Journal of Education, 52, 192-205.

Elwood, J., Hopfenbeck, T. & Baird, J. (2017) Predictability in high-stakes examinations: students’ perspectives on a perennial assessment dilemma. Research Papers in Education, 32 (1), 1-17.

Hopfenbeck, T.N., Lenkeit, J., El Masri, Y., Cantrell, K., Ryan, J. & Baird, J. (2017) Lessons Learned from PISA: A Systematic Review of Peer-Reviewed Articles on the Programme for International Student Assessment. Scandinavian Journal of Educational Research [online].


Lenkeit, J. (2016) Review of National Reports on PIRLS. Oxford University Centre for Educational Assessment Report OUCEA/16/1

Baird, J.-A., Caro, D.H. & Hopfenbeck, T.N. (2016) Student Perceptions of Predictability of Examination Requirements and Relationship with Outcomes in High-Stakes Tests in Ireland. Irish Educational Studies [online].

Baird, J. & Hopfenbeck, T.N. (2016) Curriculum in the Twenty-First Century and the Future of Examinations (Chapter 51), in: Wyse, D., Hayward, L. & Pandya, J. (Eds) The Sage Handbook of Curriculum, Pedagogy and Assessment, Vol. 2.

Baird, J. & Gray, L. (2016) The meaning of curriculum-related examination standards in Scotland and England: a home–international comparison. Oxford Review of Education, 43 (2), 266-284.

Baird, J., Johnson, S., Hopfenbeck, T.N., Isaacs, T., Sprague, T., Stobart, G. & Yu, G. (2016) On the supranational spell of PISA in policy. Educational Research, 58 (2), 121-138. Special Issue: International Policy Borrowing and Evidence-based Educational Policy Making: Relationships and Tensions.

Caro, D.H. & Biecek, P. (in press) intsvy: An R Package for Analysing International Large-Scale Assessment Data, Journal of Statistical Software.

Caro, D.H., Lenkeit, J. & Kyriakides, L. (2016) Teaching strategies and differential effectiveness across learning contexts: Evidence from PISA 2012. Studies in Educational Evaluation, 49, 30-41.

Elliott, V., Baird, J., Hopfenbeck, T.N., Ingram, J., Thompson, I., Usher, N., Zantout, M., Richardson, J. & Coleman, R. (2016) A marked improvement? A review of the evidence on written marking. Education Endowment Foundation.

El Masri, Y.H., Baird, J. & Graesser, A. (2016) Language effects in international testing: the case of PISA 2006 science items. Assessment in Education: Principles, Policy & Practice [online].

El Masri, Y.H., Ferrara, S., Foltz, P.W. & Baird, J. (2016) Predicting item difficulty of science national curriculum tests: the case of key stage 2 assessments. The Curriculum Journal, 28 (1), 59-82.

Hopfenbeck, T.N. (2016) Å lykkes med elevvurdering (Succeeding with student assessment). Fagbokforlaget.

Hopfenbeck, T.N. & Kjaernsli, M. (2016) Students’ test motivation in PISA: the case of Norway. The Curriculum Journal, 27 (3), 406-422.

Newton, P.E. & Baird, J. (2016) The great validity debate (Editorial), Assessment in Education: Principles, Policy & Practice, 23 (2). Special Issue on Validity.


Baird, J., Meadows, M., Leckie, G. & Caro, D. (2015) Rater accuracy and training group effects in Expert- and Supervisor-based monitoring systems. Assessment in Education: Principles, Policy & Practice [online].

Baird, J., Hopfenbeck, T.N., Elwood, J., Caro, D. & Ahmed, A. (2015) Predictability in the Irish Leaving Certificate, Report commissioned by the State Examinations Commission, Ireland.

Caro, D.H. (2015) Causal mediation in educational research: An illustration using international assessment data. Journal of Research on Educational Effectiveness, 8 (4), 577-597.

Elwood, J., Hopfenbeck, T.N. & Baird, J. (2015 online) Predictability in high-stakes examinations: students’ perspectives on a perennial assessment dilemma. Research Papers in Education.

Hopfenbeck, T.N. (2015) Lead Editors’ editorial introduction. Assessment in Education: Principles, Policy & Practice, 22 (2), 179-181. Special Issue: Sociocultural Theoretical Perspectives on Assessment: Exploring Links, Limitations and Emerging Considerations.

Hopfenbeck, T.N. (2015) Formative assessment, grading and teacher judgement in times of change (Editorial), Assessment in Education: Principles, Policy & Practice, 22 (3), 299-301.

Hopfenbeck, T.N. (2015) On test development and accuracy in self-assessment (Editorial), Assessment in Education: Principles, Policy & Practice, 22 (4), 393-396.

Hopfenbeck, T.N., Flórez Petour, M.T. & Tolo, A. (2015) Balancing tensions in educational policy reforms: large-scale implementation of Assessment for Learning in Norway. Assessment in Education: Principles, Policy & Practice, 22 (1), 44-60. Special Issue: Assessment for Learning: Lessons Learned from Large-Scale Evaluations of Implementations.

Hopfenbeck, T.N. & Stobart, G. (2015) Large-scale implementation of Assessment for Learning (Editorial), Assessment in Education: Principles, Policy & Practice, 22 (1), 1-2. Special Issue: Assessment for Learning: Lessons Learned from Large-Scale Evaluations of Implementations.

Lenkeit, J., Caro, D.H. & Strand, S. (2015) Tackling the remaining attainment gap between students with and without immigrant background: an investigation into the equivalence of SES constructs. Educational Research and Evaluation, 21 (1), 60-83.

Lenkeit, J., Chan, J., Hopfenbeck, T.N., & Baird, J.-A. (2015) A review of the representation of PIRLS related research in scientific journals. Educational Research Review, 16, 102-115.


Baird, J. (2014) Teachers’ views on assessment practices (Editorial), Assessment in Education: Principles, Policy & Practice, 21 (4), 361-364.

Baird, J., Hopfenbeck, T.N., Newton, P., Stobart, G. & Steen-Utheim, A.T. (2014) State of the Field Review: Assessment and Learning. Report for the Norwegian Knowledge Centre for Education.

Baird, J. (2014) Editorial. Assessment in Education: Principles, Policy & Practice, 21 (1), 1-3.

Baird, J. (2014) Assessment and Attitude (Editorial), Assessment in Education: Principles, Policy & Practice, 21 (2), 129-132.

Caro, D.H., Cortina, K.S., & Eccles, J. (2014) Socioeconomic Background, Education, and Labor Force Outcomes: Evidence from a Regional U.S. Sample. British Journal of Sociology of Education [online].

Hopfenbeck, T.N. (2014) Strategier for læring: Om selvregulering, vurdering og god undervisning (Strategies for Learning: self-regulation, assessment and good teaching). Oslo, Universitetsforlaget.

Hopfenbeck, T.N. (2014) Testing Times: Fra PISA til nasjonale prøver. Intensjoner, ansvar og anvendelse. (Testing Times: From PISA to national tests. Intentions, accountability and applications.) Chapter 23, 401–419, in: J.H. Stray & L. Wittek (Eds) Pedagogikk, en grunnbok. Cappelen Damm Akademisk, ISBN: 978-82-02-41424-5.

Lenkeit, J., & Caro, D.H. (2014) Performance status and change – Measuring education system effectiveness with data from PISA 2000-2009. Educational Research and Evaluation, 20 (2), 146-174.

Mirazchiyski, P., Caro, D.H., Sandoval-Hernandez, A. (2014) Youth Future Civic Participation in Europe: Differences between the East and the Rest. Social Indicators Research, 115 (3), 1031–1055.

Nyhamn, F. & Hopfenbeck, T.N. (2014) (Eds) From Political Decisions to Change in the Classroom: Successful Implementation of Education Policy. CIDREE Yearbook 2014.


Baird, J. (2013) The currency of assessments (Editorial), Assessment in Education: Principles, Policy & Practice, 20 (2), 147-149.

Baird, J. (2013) Judging students’ performances (Editorial), Assessment in Education: Principles, Policy & Practice, 20 (3), 247-249.

Baird, J., Ahmed, A., Hopfenbeck, T.N., Brown, C. & Elliott, V. (2013) Research evidence relating to proposals for reform of the GCSE. OUCEA Report.

Baird, J. & Black, P. (2013) (Eds) The reliability of public examinations. Research Papers in Education, 28 (1), 1-4, Special Issue.

Baird, J. & Black, P. (2013) Test theories, educational priorities and reliability of public examinations in England. Research Papers in Education, 28 (1), 5-21.

Baird, J., Hayes, M., Johnson, R., Johnson, S. & Lamprianou, I. (2013) Marker effects and examination reliability: A comparative exploration from the perspectives of generalizability theory, Rasch modelling and multilevel modelling. Report commissioned by the Office of Qualifications and Examinations Regulation. Ofqual/13/5261.

Caro, D.H., Sandoval-Hernandez, A., & Lüdtke, O. (2013) Cultural, Social and Economic Capital Constructs in International Assessments: An Evaluation Using Exploratory Structural Equation Modeling. School Effectiveness and School Improvement [online].

Elwood, J. & Baird, J. (2013) (Eds) Students: researching voice, aspirations and perspectives in the context of educational policy change in the 14–19 phase. London Review of Education, 11 (2), Special Issue.

Hopfenbeck, T.N. (2013) What did you learn in school today? In: J. Hattie, T.S. Wille, M. Hermansen, T.N. Hopfenbeck, C. Madsen, P. Kirkegaard, H. Bjerresgaard, C.E. Weinstein, I. Bråten & R. Andreassen (Eds) Feedback og vurdering for læring (Feedback and assessment for learning; in Danish). ISBN 978-87-7281-685-2.

Hopfenbeck, T.N. (2013) Students’ voice, aspirations, and perspectives: international reflections and comparisons. London Review of Education, 11 (2), 179-183.

Hopfenbeck, T.N., Tolo, A., Florez, T. & El Masri, Y. (2013) Balancing Trust and Accountability? The Assessment for Learning Programme in Norway. Report for OECD.

Lenkeit, J. (2013) Effectiveness measures for cross-sectional studies: A comparison of value-added models and contextualised attainment models, School Effectiveness and School Improvement, 24 (1), 39-63.

Lillejord, S. & Hopfenbeck, T.N. (2013) Vurdering og læring i skolen (Assessment and learning in school), in: Lillejord, S., Manger, T. & Nordahl, T. (Eds) Livet i skolen 2: Grunnbok i pedagogikk og elevkunnskap. Lærerprofesjonalitet, 231-259.

Rose, J. & Baird, J. (2013) Aspirations and an austerity state: young people’s hopes and goals for the future. London Review of Education, 11 (2), 157-153. Special Issue: Students: researching voice, aspirations and perspectives in the context of educational policy change in the 14–19 phase.

Simpson, L. & Baird, J. (2013) Perceptions of trust in public examinations. Oxford Review of Education, 39 (1), 17-35.


Baird, J. (2012) Editorial. Assessment in Education: Principles, Policy & Practice, 19 (4), 389-391.

Baird, J. (2012) Do we need marking at all? (Editorial), Assessment in Education: Principles, Policy & Practice, 19 (3), 277-279.

Baird, J. (2012) Science and misfits (Editorial), Assessment in Education: Principles, Policy & Practice, 19 (2), 141-145.

Baird, J., Elwood, J. & Isaacs, T. (2012) Written evidence submitted to the Education Select Committee’s Inquiry into the administration of examinations for 15-19 year olds in England.

Baird, J., Pillinger, R. & Steele, F. (2012) Use of the LEMMA Online Learning Materials, Report prepared for the LEMMA (Learning Environment for Multilevel Modelling and Applications) node, University of Bristol, January.

Baird, J., Rose, J. & McWhirter, A. (2012) So tell me what you want: a comparison of FE college and other post-16 students’ aspirations. Research in Post-Compulsory Education, 17 (3), 293-310.

Caro, D.H. (2012) Evidencia causal en estudios educativos con bases de datos observables. [Causal evidence in educational studies with observational data]. In E. Vásquez (Ed) Inversión social: indicadores, bases de datos e iniciativas. [Social investment: indicators, data sets and initiatives]. Lima: Universidad del Pacífico.

Caro, D.H. & Cortés, D. (2012) Measuring family socioeconomic status: An illustration using data from PIRLS 2006, IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 5, 9-33.

Caro, D.H. & Lenkeit, J. (2012) An analytical approach to study educational inequalities: 10 hypothesis tests in PIRLS 2006. International Journal of Research and Method in Education, 35 (1), 3-30.

Caro, D.H. & Mirazchiyski, P. (2012) Socioeconomic Gradients in Eastern European Countries: evidence from PIRLS 2006. European Educational Research Journal, 11 (1), 96-110.

Caro, D.H. & Schulz, W. (2012) Ten Hypotheses about Tolerance among Latin American Adolescents. Citizenship, Social and Economics Education, 11 (3), 213-234.

Daly, A., Baird, J., Chamberlain, S. & Meadows, M. (2012) Assessment reform: students’ and teachers’ responses to the introduction of stretch and challenge at A-level. Curriculum Journal, 23 (2), 173-187.

Eklöf, H., Hopfenbeck, T.N. & Kjaernsli, M. (2012) Hva vet vi om elevers testmotivasjon? Erfaringer fra internasjonale og nasjonale undersøkelser i Norge og Sverige (What do we know about students’ test motivation? Experiences of international and national tests in Norway and Sweden). Chapter 6, in: T.N. Hopfenbeck, M. Kjaernsli & R.V. Olsen (Eds) Kvalitet i norsk skole. Internasjonale og nasjonale undersøkelser av læringsutbytte og undervisning. (Quality in the Norwegian school. International and national tests of learning outcomes and teaching). Oslo: Universitetsforlaget. ISBN 978-82-15-02004-4.

Hellekjaer, G.O. & Hopfenbeck, T.N. (2012) CLIL og lesing. En sammenligning av Vg3-elevers leseferdigheter og lesestrategibruk i 2002 og 2011 (CLIL and reading: A comparison of Vg3 students’ reading skills and use of reading strategies in 2002 and 2011). Report to the Norwegian Centre for Foreign Languages in Education, investigating students’ reading comprehension at the age of 18, comparing IB, CLIL and ordinary ESL students in Upper Secondary Schools.

Hopfenbeck, T.N. (2012) Strategier for læring. Om selvregulering og strategimålinger i PISA (Strategies for learning: Self-regulation and strategy measurement in PISA). Chapter 5, in: T.N. Hopfenbeck, M. Kjaernsli & R.V. Olsen (Eds) – see above.

Hopfenbeck, T.N. (2012) The Role and Value of International Datasets and Comparisons in Education Research, Guest Editorial in International Datasets and Comparisons in Education. Research Intelligence, 119, British Educational Research Association, 7-8.

Hopfenbeck, T.N., Throndsen, I., Lie, S. & Dale, E.L. (2012) Assessment with distinctly defined criteria: A research study of a National project. Policy Futures in Education, 10 (4), 421-433.

Oancea, A. & Hopfenbeck, T.N. (2012) (Eds) International Datasets and Comparisons in Education. Research Intelligence, 119, British Educational Research Association.

Olsen, R.V., Hopfenbeck, T.N., Lillejord, S. & Roe, A. (2012) Elevenes læringssituasjon etter innføringen av ny reform (Students’ learning situation after the introduction of the new reform). Acta Didactica Oslo, 1/2012.


Baird, J. (2011) Does the learning happen inside the black box? (Editorial), Assessment in Education: Principles, Policy & Practice, 18 (4), 343-345.

Baird, J. (2011) Why do people appeal Higher Education grades and what can it tell us about the meaning of standards? Assessment in Education: Principles, Policy & Practice, 18 (1), 1-4.

Baird, J., Béguin, A., Black, P., Pollitt, A. & Stanley, G. (2011) The Reliability Programme: Final Report of the Technical Advisory Group. Coventry: Ofqual/11/4825. Chapter 20, in: Q. He, & D. Opposs (Eds) Ofqual’s Reliability Compendium. Office of Qualifications and Examinations Regulation, Ofqual/12/5117. ISBN 978-0-85743-016-8.

Baird, J., Elwood, J., Duffy, G., Feiler, A., O’Boyle, A., Rose, J. & Stobart, G. (2011) 14-19 Centre Research Study: educational reforms in schools and colleges in England Annual Report. London: QCDA.

Baird, J., Isaacs, T., Johnson, S., Stobart, G., Yu, G., Sprague, T. & Daugherty, R. (2011) Policy Effects of PISA. Report commissioned by Pearson UK.

Hopfenbeck, T.N. (2011) Fostering self-regulated learners in a community of quality assessment practices. CADMO, (1), 7-21.

Hopfenbeck, T.N. & Maul, A. (2011) Examining Evidence for the Validity of PISA Learning Strategy Scales Based on Student Response Processes. International Journal of Testing, 11 (2), 95-121.

Leckie, G. & Baird, J. (2011) Rater Effects on Essay Scoring: A Multilevel Analysis of Severity Drift, Central Tendency, and Rater Experience. Journal of Educational Measurement, 48 (4), 399-418.


Baird, J. (2010) Construct‐irrelevant variance sometimes has consequential validity (Editorial), Assessment in Education: Principles, Policy & Practice, 17 (4), 339-343.

MacCann, R.G. & Stanley, G. (2010) Classification consistency when scores are converted to grades: examination marks versus moderated school assessments. Assessment in Education: Principles, Policy & Practice, 17 (3), 255-272.

MacCann, R.G. & Stanley, G. (2010) Extending participation in standard setting: an online judging proposal. Educational Assessment, Evaluation and Accountability, 22 (2), 139-157.

Stanley, G. & Lee, J.C. (2010) Future Educational Reform Policies and Measures, in: J.C. Lee & B. Caldwell (Eds) Changing Schools in an Era of Globalization. New York, Routledge.


MacCann, R.G. & Stanley, G. (2009) Item banking with embedded standards. Practical Assessment Research & Evaluation, 14(17).

MacCann, R.G. (2009) Standard setting with dichotomous and constructed response items: some Rasch model approaches. Journal of Applied Measurement, 10(4), 438-454.

Stanley, G. & MacCann, R.G. (2009) Incorporating industry specific training into school education: enrolment and performance trends in a senior secondary system. Journal of Vocational Education and Training, 61(4), 459-466.

Stanley, G., MacCann, R., Gardner, J., Reynolds, L. & Wild, I. (2009) Review of Teacher Assessment: Evidence of What Works Best and Issues for Development. Report commissioned by QCA.


Stanley, G. (2008) National Numeracy Review Report. Canberra: Council of Australian Governments. ISBN 0642 77735 7.

Stanley, G. & Tognolini, J. (2008) Performance with respect to standards in public examinations, Proceedings of the 34th IAEA Conference, Cambridge, UK.

Find us

OUCEA is part of the Department of Education, University of Oxford, 15 Norham Gardens, Oxford OX2 6PY.

OUCEA Office: tel 01865 284098

Visitor Reception: 15 Norham Gardens

For general enquiries, please email us:

Follow @OUCEA_OX
