
Department of Education

Viewing archives for Researcher

Sølvi Lillejord has been professor and Head of Department at the University of Bergen, Norway, and the University of Oslo, Norway. At the University of Bergen she led a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package that supports its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), won the Division D Significant Contribution to Educational Measurement and Research Methodology award of the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” They included more than 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past: the pandemic, the war in Ukraine, and global economic challenges have created a sense of uncertainty in all spheres of life. In response, attendees took stock and reimagined assessment in a world where the certainties of past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of these topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018): Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309. 23

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD. is a Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book Computerized Adaptive and Multistage Testing with R (2017) and the R package to support the applications received the B. Hanson Award from National Council of Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011) were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse group included more than 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model, and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and the Journal of Educational Psychology (2009–2014), and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 



Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context,” “Assessment research: listening to students, looking at consequences,” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics above shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context,” “Assessment research: listening to students, looking at consequences,” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As you can see from the wide-ranging topics covered, OUCEA is engaging in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low-and-middle-income countries. Specifically, around examining the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk this poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package to support the applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk this poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.), Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. She and her team operate at the forefront of computational psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. Her co-authored book Computerized Adaptive and Multistage Testing with R (2017), together with the R package that supports its applications, received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), won the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational psychometrics approach to holistic learning and assessment systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme, “New Visions for Assessment in Uncertain Times.” They included more than 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past: the pandemic, the war in Ukraine, and global economic challenges have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of past decades have given way to a more uncertain environment.

Keynote speeches addressed topics including “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As these topics show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now on, as students, their families, educational institutions and workplaces try to work out next steps, should be to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in (post-)pandemic times.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package to support the applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of topics covered shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office for The Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been a professor and head of department (HoD) at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. http://www.instituteofplay.org/work/projects/glasslab-research/

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests.
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions.

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics above shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the current one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Dr Yasmine El Masri is Associate Director for Research at England’s Office for The Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She has also an experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications’ regulator in Wales, Qualification Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award of the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed topics including “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018): Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309. 23

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD. is a Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book Computerized Adaptive and Multistage Testing with R (2017) and the R package to support the applications received the B. Hanson Award from National Council of Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011) were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier AA., Deonovic, B., Yudelson, M., Polyak, ST., Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems .Frontiers in Education 4,

https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E.,  von Davier, A. A., & Hao, J. (2014), Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed topics including “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics above show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier AA., Deonovic, B., Yudelson, M., Polyak, ST., Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems .Frontiers in Education 4,

https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E.,  von Davier, A. A., & Hao, J. (2014), Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context,” “Assessment research: listening to students, looking at consequences,” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As you can see from the wide-ranging topics covered, OUCEA is engaging in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low-and-middle-income countries. Specifically, around examining the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the current one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of topics above shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian research council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education: Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education: Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of computational psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. Her co-authored book Computerized Adaptive and Multistage Testing with R (2017), together with the R package that supports its applications, received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume, Computerized Multistage Testing (2014), and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), won the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and Head of Department at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian research council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package that supports its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. http://www.instituteofplay.org/work/projects/glasslab-research/

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests.
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions.

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of topics covered shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson on committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been a professor and Head of Department at the University of Bergen, Norway, and the University of Oslo, Norway. At the University of Bergen she led a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education, 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. http://www.instituteofplay.org/work/projects/glasslab-research/

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason, presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed topics including “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of topics above shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office for The Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She has also an experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications’ regulator in Wales, Qualification Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package to support the applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. http://www.instituteofplay.org/work/projects/glasslab-research/

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.), Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included more than 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past: the pandemic, the war in Ukraine, and global economic challenges have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of past decades have given way to a more uncertain environment.

Keynote speeches addressed topics including “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the current one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD. is a Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book Computerized Adaptive and Multistage Testing with R (2017) and the R package to support the applications received the B. Hanson Award from National Council of Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011) were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier AA., Deonovic, B., Yudelson, M., Polyak, ST., Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems .Frontiers in Education 4,

https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E.,  von Davier, A. A., & Hao, J. (2014), Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context,” “Assessment research: listening to students, looking at consequences,” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As you can see from the wide-ranging topics covered, OUCEA is engaging in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low-and-middle-income countries. Specifically, around examining the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian research council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package that supports its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational psychometrics approach to holistic learning and assessment systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics above show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why? There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.), Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. She and her team operate at the forefront of computational psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics above show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of computational psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of these topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries, specifically examining the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has represented Pearson on committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of computational psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), won the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed topics including “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics above show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office for The Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She has also an experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications’ regulator in Wales, Qualification Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018): Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309. 23

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD. is a Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book Computerized Adaptive and Multistage Testing with R (2017) and the R package to support the applications received the B. Hanson Award from National Council of Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011) were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, the war in Ukraine, and global economic challenges have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the breadth of topics covered shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014), and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018): Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309. 23

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD. is a Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book Computerized Adaptive and Multistage Testing with R (2017) and the R package to support the applications received the B. Hanson Award from National Council of Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011) were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier AA., Deonovic, B., Yudelson, M., Polyak, ST., Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems .Frontiers in Education 4,

https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E.,  von Davier, A. A., & Hao, J. (2014), Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context,” “Assessment research: listening to students, looking at consequences,” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award of the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Candace’s research focuses on assessments of early-stage literacy in low-and-middle-income countries. Specifically, around examining the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

Sølvi Lillejord has been professor and Head of Department at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as reviewer for Assessment in Education: Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education: Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education, 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included more than 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past: the pandemic, the war in Ukraine, and global economic challenges have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics above show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and head of department at the University of Bergen, Norway, and the University of Oslo, Norway. At the University of Bergen she led a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package to support the applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award of the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

Sølvi Lillejord has been professor and head of department at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Ercikan, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award from the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A. A., Deonovic, B., Yudelson, M., Polyak, S. T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current DPhil student – presented on Exploring standards across assessments in different languages using comparative judgment
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics above show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries, specifically examining the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she led a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as a reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education, 1-19.

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21(1), 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21(1), 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation. Professional Development in Education, 46(2), 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice, 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package that supports its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao –  presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho  – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of its work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, it examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels in primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office for The Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She has also an experience in research management at one of England’s examination boards (AQA) and acted as an Assessment Expert Advisor for the qualifications’ regulator in Wales, Qualification Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package to support the applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award of the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse group included over 15 people affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the topics covered show, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why? There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

 

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school, funded by the Norwegian research council. From 2013 to 2018, she was Director for the Norwegian Knowledge Centre. In 2011 Lillejord was international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research)

From 1998 to 2011 she was leading the project Productive Learning Cultures in the Southern African region, supervising students, and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, policy & practice; Oxford Review of Education, Teaching and Teacher Education, Journal of Professional Development in Education; Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630- 5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change 21 (1) 83-107 7

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018): Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309. 23

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD. is a Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book Computerized Adaptive and Multistage Testing with R (2017) and the R package to support the applications received the B. Hanson Award from National Council of Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011) were selected as the winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award at American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award, EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier AA., Deonovic, B., Yudelson, M., Polyak, ST., Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems .Frontiers in Education 4,

https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E.,  von Davier, A. A., & Hao, J. (2014), Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.

Alumni of the Masters in Educational Assessment, Lorena Garelli and Kevin Mason presenting their dissertation research.

Trinity’s School of Education and the Educational Research Centre, Drumcondra, hosted AEA-Europe’s Annual Conference on 9-12 November in Dublin, Ireland.

Over 350 attendees from 37 countries reflected on the conference’s theme – “New Visions for Assessment in Uncertain Times.” This diverse range of attendees included over 15 folks affiliated with OUCEA. Throughout the conference, attendees explored possible directions for assessment policy and practice in schools, higher education, and vocational/workplace settings over the coming years. Much of the reflection centered on the instability of the recent past – the pandemic, war in Ukraine, and economic challenges globally have created a sense of uncertainty in all spheres of life. As a result, attendees took stock and reimagined assessment in a world where the certainties of the past decades have given way to a more uncertain environment.

Keynote speeches addressed such diverse topics as “Assessing learning in schools – Reflections on lessons and challenges in the Irish context” and “Assessment research: listening to students, looking at consequences.”

In addition to the keynotes, the conference hosted panel and poster presentation opportunities. Many members and associates of the OUCEA shared their research. For example:

Honorary Norham Fellow

  • Lena Gray – presented on assessment, policymakers, and communicative spaces – striving for impact at the research–policy interface

Honorary Research Associate

  • Yasmine El Masri – an OUCEA Research Associate – presented on Evaluating sources of differential item functioning in high-stakes assessments in England

Researcher

  • Samantha-Kaye Johnston – an OUCEA Research Officer – presented on Assessing creativity among primary school children through the snapshot method – an innovative approach in times of uncertainty.

Current doctoral students

  • Louise Badham – a current D.Phil Student – presented on Exploring standards across assessments in different languages using comparative judgment.
  • Zhanxin Hao – presented on The effects of using testing and restudy as test preparation strategies on educational tests
  • Jane Ho – presented on Validation of large-scale high-stakes tests for college admissions decisions

MSc in Educational Assessment graduates and students

  • Kevin Mason – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Lorena Garelli – presented on Assessment of Art and Design Courses using Comparative Judgment in Mexico and England
  • Joanne Malone – presented on Irish primary school teachers’ mindset and approaches to classroom assessment
  • Merlin Walters – presented on The comparability of grading standards in technical qualifications in England: how can we facilitate it in a post-pandemic world?

As the range of topics shows, OUCEA is engaged in wide-ranging research. The team looks forward to presenting more of our work at AEA-Europe’s 2023 conference in Malta.

Candace’s research focuses on assessments of early-stage literacy in low- and middle-income countries. Specifically, she examines the challenges and opportunities relating to the alignment, measurement and use of SDG 4.1.1 on proficiency levels at primary and lower secondary education.

Candace completed her Masters in Educational Assessment and has chosen to continue her studies within the OUCEA.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and the Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Dr Yasmine El Masri is Associate Director for Research at England’s Office of Qualifications and Examinations Regulation (Ofqual) and an Honorary Research Fellow at the Department of Education.

Yasmine has held various positions as a Research Fellow at Ofqual, OUCEA and Brasenose College. She also has experience in research management at one of England’s examination boards (AQA) and has acted as an Assessment Expert Advisor for the qualifications regulator in Wales, Qualifications Wales.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She also led a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. Moreover, she was a co-investigator on various projects funded by external organizations (e.g., Critical Thinking in the International Baccalaureate’s Diploma Programme and the impact of translation in the Diploma Programme science assessments).

Research Interests

Language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures, digitisation of assessments, onscreen (computer-based) assessments

Sølvi Lillejord has been professor and HoD at the University of Bergen, Norway and the University of Oslo, Norway. At the University of Bergen she was the leader of a research school funded by the Norwegian Research Council. From 2013 to 2018, she was Director of the Norwegian Knowledge Centre. In 2011 Lillejord was an international member of a panel reviewing the Department of Education at the University of Oxford. In 2015 she was appointed by the Dutch Research Council as a member of the international evaluation committee of TIER (Top Institute for Evidence Based Education Research).

From 1998 to 2011 she led the project Productive Learning Cultures in the Southern African region, supervising students and building research capacity. Lillejord acts as reviewer for Assessment in Education – Principles, Policy & Practice; Oxford Review of Education; Teaching and Teacher Education; Journal of Professional Development in Education; and Journal of Education and Work.

Publications

1.     Lillejord, S. (2023). A more intelligent accountability – future directions. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 13. Elsevier. https://dx.doi.org/10.1016/B978-0-12-818630-5.09068-0.

2.     Lillejord, S. (2023). Educating the teaching profession. In: Tierney, R.J., Rizvi, F., Erkican, K. (Eds.), International Encyclopedia of Education, vol. 5. Elsevier, pp. 368–374. https://dx.doi.org/10.1016/B978-0-12-818630-5.04049-5.

3.     Børte, K., Nesje, K., & Lillejord, S. (2020). Barriers to Student Active Learning in Higher Education. Teaching in Higher Education. 1-19

4.     Lillejord, S. (2020). From “unintelligent” to intelligent accountability. Journal of Educational Change, 21 (1) 1-18.

5.     Lillejord, S. & Børte, K. (2020). Middle leaders and the teaching profession: Building intelligent accountability from within. Journal of Educational Change, 21 (1) 83-107.

6.     Lillejord, S. & Børte, K. (2020). Trapped between accountability and professional learning? School leaders and teacher evaluation, Professional Development in Education 46:2, 274-291.

7.     Lillejord, S., Elstad, E., & Kavli, H. (2018). Teacher evaluation as a wicked policy problem. Assessment in Education – Principles, Policy & Practice. 25(3), 291-309.

8.     Lillejord, S. & Børte, K. (2016). Partnership in teacher education – a research mapping. European Journal of Teacher Education. 39(5), 550-563. Selected as one of 14 articles published 2006-2016 for a special issue (open access) to celebrate the journal’s 40th anniversary in 2017.

Alina von Davier, PhD, is Chief of Assessment at Duolingo and CEO and Founder of EdAstra Tech. She is a Senior Research Fellow at Carnegie Mellon University. Von Davier is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and in the assessment industry. Von Davier and her team operate at the forefront of Computational Psychometrics. Her current research interests involve developing psychometric methodologies in support of digital-first assessments. A co-authored book, Computerized Adaptive and Multistage Testing with R (2017), and the R package supporting its applications received the B. Hanson Award from the National Council on Measurement in Education (NCME) in 2022. Two other publications, a co-edited volume on Computerized Multistage Testing (2014) and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking (2011), were selected as winners of the Division D Significant Contribution to Educational Measurement and Research Methodology award of the American Educational Research Association (AERA). In 2020, von Davier received a Career Award from the Association of Test Publishers (ATP). In 2019, she was a finalist for the EdTech Visionary Award from EdTech Digest.

Website: https://en.wikipedia.org/wiki/Alina_von_Davier

Publications

von Davier, A.A., Mislevy, R.J., & Hao, J. (Eds.) (2021). Computational psychometrics: New methodologies for a new generation of digital learning and assessment. New York, NY: Springer. https://www.springer.com/gp/book/9783030743932

Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R: Using Packages catR and mstR. Springer.

von Davier, A.A. (2017). Measurement issues in collaborative learning and assessment. [Special Issue]. Journal of Educational Measurement, 54(1).

von Davier, A.A., Zhu, M., & Kyllonen, P.C. (2017). Innovative Assessment of Collaboration. Springer International Publishing.

von Davier, A.A., Deonovic, B., Yudelson, M., Polyak, S.T., & Woo, A. (2019). Computational Psychometrics Approach to Holistic Learning and Assessment Systems. Frontiers in Education, 4. https://www.frontiersin.org/article/10.3389/feduc.2019.00069

Mislevy, R. J., Corrigan, S., Oranje, A., Dicerbo, K., John, M., Bauer, M. I., Hoffman, E., von Davier, A. A., & Hao, J. (2014). Psychometrics and game-based assessments. Institute of Play. [http://www.instituteofplay.org/work/projects/glasslab-research/]

von Davier, A. A. (Ed.) (2011). Statistical models for test equating, scaling and linking. New York, NY: Springer-Verlag.
