Department of Education

Viewing archives for Researcher

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also a co-investigator and project manager on a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed at providing primary and secondary school students with the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha’s seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasing that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including, critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74-83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also co-investigator and project manager on a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Center for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also a co-investigator and project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which supports Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74-83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why? There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling arises, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring and provides operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed at providing primary and secondary school students with the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha’s seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasing that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including, critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74-83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment - Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model, and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now on, as students, their families, educational institutions and workplaces try to work out next steps, should be to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment – Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which supports Jamaican students with reading difficulties in attending university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014), and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also a co-investigator and project manager on a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie Scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has more than 10 years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the current one. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.), Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: Applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S.K., & Graesser, A.C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S.D., Bargagliotti, A.E., Graesser, A.C., Okwumabua, T., Anderson, C., Cheney, K.R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A.C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A.C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74-83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which supports Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which enables Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Center for Internet & Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling arises, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in (post-)pandemic times.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74-83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427-469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014), and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan, using psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also co-investigator and project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which supports Jamaican students with reading difficulties in attending university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central task from now on, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as it is every year, with avoiding grade inflation as its overarching objective? Checks were made on the inequalities in the model, and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in (post-)pandemic times.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which supports Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment - Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why? There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie Scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model, and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in (post-)pandemic times.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model, and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why? There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.


Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which supports Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also co-investigator and project manager on a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds university study for Jamaican students with reading difficulties. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, raises the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years, with little debate over whether the aims should have been different this year, is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed at providing primary and secondary school students with the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha’s seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasing that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including, critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996-2005) and Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment-Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine has led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon to produce open-source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation that aims to enhance summative assessments of practical science in England. She is also co-investigator and project manager on a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which funds Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, and community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results of hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector, such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. Extensive data use in education has highly significant ethical and social implications that shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and Journal of Educational Psychology (2009-2014) and as presidents of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalf, 2007)
  • Lifelong Learning at Work and at Home, Association of Psychological Sciences and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academy of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix.  Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.),Design Recommendations for Intelligent Tutoring  Systems: Adaptive Instructional Strategies (Vol.2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning.  Learning and Instruction. 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse.Elementary School Journal. 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education,24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues.  Teacher College Record.
  • Medimorecc, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the present one. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet, unlike other areas of the public sector such as health and policing, schools rarely feature centrally in policy discussions and reports on algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre and that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer-reviewed journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial boards of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer-reviewed publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics. Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support. Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: Applicability and proof of concept study. Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W. (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology at the Institute of Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journals Discourse Processes (1996–2005) and Journal of Educational Psychology (2009–2014) and as president of the Empirical Studies of Literature, Art, and Media (1989–1992), the Society for Text and Discourse (2007–2010), the International Society for Artificial Intelligence in Education (2007–2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012–2013). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)

Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, and Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies.  American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics.  Educational Researcher, 40, 223-234.
  • D’Mello, S. K. & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners.  Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual.  She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualification Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across UK, France and Jordan using different psychometric and statistical techniques including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received Kathleen Tattersall New Researcher Award from the Association for Education Assessment- Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed at providing primary and secondary school students with the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha’s seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasing that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including, critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stake exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the office for statistics regulation has already committed to a review of the models used for exam adjustment within well specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values come in to play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level.  This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum has become a given. In place of discussions about values, there has been an ever increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure student’s ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society have wanted to reduce inequalities in our society in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why?  There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months there needs to be a review that explicitly places values and ethical frameworks front and centre, that encourages a focus on the purposes of education, particularly in times of a (post-) pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W., & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.

Arthur Graesser is an Honorary Research Fellow at the University of Oxford, associated with the Oxford University Centre for Educational Assessment (OUCEA). He is a professor in the Department of Psychology and the Institute for Intelligent Systems at the University of Memphis.

He received his PhD in psychology from the University of California at San Diego. He has served as editor of the journal Discourse Processes (1996–2005) and the Journal of Educational Psychology (2009-2014) and as president of the Empirical Studies of Literature, Art, and Media (1989-1992), the Society for Text and Discourse (2007-2010), the International Society for Artificial Intelligence in Education (2007-2009), and the Federation for the Advancement of Brain and Behavioral Sciences Foundation (2012-13). He received the Distinguished Contributions of Applications of Psychology to Education and Training Award (American Psychological Association, 2011), the Distinguished Scientific Contribution Award (Society for Text and Discourse, 2010), and the title of Distinguished University Professor of Interdisciplinary Research at the University of Memphis (2014).

Panels

Art Graesser has served on recent panels to advance learning, education, and assessment:

  • Organizing Instruction and Study to Improve Student Learning, US Department of Education, Institute of Education Sciences (Pashler, Bain, Bottge, Graesser, Koedinger, McDaniel, & Metcalfe, 2007)
  • Lifelong Learning at Work and at Home, Association for Psychological Science and American Psychological Association (Graesser, Halpern, & Hakel, 2007)
  • Adolescent and Adult Literacy, National Academy of Sciences
  • Technology-based Learning, National Academy of Sciences
  • PIAAC Problem Solving in Technology-Rich Environments (OECD)
  • PISA Problem Solving 2012 (OECD)
  • PISA Collaborative Problem Solving 2015 (OECD)
  • Common Core Standards for Reading and Writing, National Governors Association, Gates Foundation, and Student Achievement Partners
  • How People Learn, edition 2 (National Academies of Sciences, Engineering, and Medicine)
Research

Art Graesser’s primary research interests are in cognitive science, discourse processing, and the learning sciences. More specific interests include knowledge representation, question asking and answering, tutoring, text comprehension, inference generation, conversation, reading, principles of learning, emotions, artificial intelligence, computational linguistics, and human-computer interaction.

Art Graesser and his colleagues have designed, developed, and tested software that integrates psychological sciences with learning, language, and discourse technologies, including AutoTutor, AutoTutor-lite, MetaTutor, GuruTutor, DeepTutor, ElectronixTutor, HURA Advisor, SEEK Web Tutor, Operation ARIES!, iSTART, Writing-Pal, AutoCommunicator, Point & Query, Question Understanding Aid (QUAID), QUEST, & Coh-Metrix.

Publications
Books
  • D’Mello, S. K., Graesser, A. C., Schuller, B., & Martin, J. (Eds.). (2011). Affective Computing and Intelligent Interaction. Berlin: Springer-Verlag.
  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  • Sottilare, R., Graesser, A.C., Hu, X., & Goldberg, B. (Eds.). Design Recommendations for Intelligent Tutoring Systems: Adaptive Instructional Strategies (Vol. 2). Orlando, FL: Army Research Laboratory.
Journal articles
  • Graesser, A.C. (2011). Learning, thinking, and emoting with discourse technologies. American Psychologist, 66, 743-757.
  • Graesser, A.C., & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398.
  • Graesser, A.C., McNamara, D.S., & Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23: 1-38.
  • Goldman, S.R., Braasch, J.L.G., Wiley, J., Graesser, A.C., & Brodowinska, K. (2012). Comprehending and learning from internet sources: Processing patterns of better and poorer learners. Reading Research Quarterly, 47, 356-381.
  • Halpern, D.F., Millis, K., Graesser, A.C., Butler, H., Forsyth, C., & Cai, Z. (2012). Operation ARA: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity, 7, 93-100.
  • Hu, X., Craig, S. D., Bargagliotti, A. E., Graesser, A. C., Okwumabua, T., Anderson, C., Cheney, K. R., & Sterbinsky, A. (2012). The effects of a traditional and technology-based after-school program on 6th grade students’ mathematics skills. Journal of Computers in Mathematics and Science Teaching, 31, 17-38.
  • Graesser, A.C. (2013). Evolution of advanced learning technologies in the 21st century. Theory Into Practice, 52, 93-101.
  • D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A.C. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.
  • Graesser, A. C., Li, H., & Forsyth, C. (2014). Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science, 23, 374-380.
  • Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210-229.
  • Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
  • Nye, B.D., Graesser, A.C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24, 427–469.
  • Graesser, A.C. (2015). Deeper learning with advances in discourse science and technology. Policy Insights from Behavioral and Brain Sciences, 2, 42-50.
  • Graesser, A.C., Forsyth, C., & Lehman, B. (in press). Two heads are better than one: Learning from agents in conversational trialogues. Teachers College Record.
  • Medimorec, M.A., Pavlik, P., Olney, A., Graesser, A.C., & Risko, E.F. (in press). The language of instruction: Compensating for challenge in lectures. Journal of Educational Psychology.

Yasmine El Masri is a Research Fellow at Ofqual. She was previously Research Manager at AQA, a Research Fellow at OUCEA and a Hulme Junior Research Fellow in Educational Assessment at Brasenose College. She has been appointed by Ofqual as an External Assessment Specialist and by Qualifications Wales as an Assessment Expert Advisor.

Yasmine completed her DPhil in Education at OUCEA in 2015. Her doctoral thesis examined the impact of language on the difficulty of PISA science tests across the UK, France and Jordan using different psychometric and statistical techniques, including Rasch modelling and differential item functioning (DIF). In 2014, Yasmine received the Kathleen Tattersall New Researcher Award from the Association for Educational Assessment - Europe (AEA-Europe).

Before coming to Oxford, Yasmine was a science teacher in secondary schools in Beirut and Abu Dhabi. In addition to her DPhil degree from Oxford, Yasmine holds a Master of Arts in Science Education, a Teaching Diploma for teaching science in secondary schools and a Bachelor of Science in Biology from the American University of Beirut (AUB).

Yasmine led various externally funded projects, including a one-year ESRC GCRF Fellowship in 2017, during which she collaborated with a local NGO in Lebanon producing open source interactive science tasks in multiple languages for underprivileged students in the country, including Syrian refugees. She is currently leading a study within Project Calibrate, a three-year project funded by the Wellcome Trust and the Gatsby Foundation aiming to enhance summative assessments of practical science in England. She is also a co-investigator and a project manager of a project funded by the International Baccalaureate Organization focused on Critical Thinking in the Diploma Programme.

Research Interests

Item difficulty and demands, language in assessment, science assessments, international large-scale assessments, critical thinking, comparability of assessments across cultures

 

Samantha-Kaye Johnston is a Research Officer at the Oxford University Centre for Educational Assessment (OUCEA).

Samantha-Kaye was formally educated in Jamaica, where she completed her Bachelor of Science in Psychology. In England, she received her Master of Arts in Education and then completed her Ph.D. in Psychology in Australia. Using a cognitive psychology lens, Samantha’s expertise and interest lie at the intersection of education and psychology. She aims to link these areas with evidence-based e-learning technologies to improve teaching, learning, and assessment outcomes.

Samantha has 10+ years of experience in the project management sector, where she has been actively involved in education development initiatives. In 2016, as part of her Project Capability, she founded the Marlon Christie scholarship, which provides a scholarship for Jamaican students with reading difficulties to attend university. As an extension of this project, Samantha founded Reading for Humanity, to elevate the science of reading, the science of learning, and the science of technology within the classroom. Her work is informed by her experience as an advocate and researcher in Jamaica, England, and Australia, primarily within the K-12 sector, as well as within non-governmental, private, community organisations, and United Nations bodies.

She has experience as a University Associate at Curtin University and Teaching Associate at Monash University, as part of their undergraduate and graduate psychology teaching teams. Within this space, she has been teaching and/or assessing various psychology units, including Introduction to Psychology, Developmental Psychology, Science and Professional Practice in Psychology, and Indigenous and Cross-Cultural Psychology.

During her time in the ed-tech sector, and in collaboration with UNESCO’s Future of Education Initiative, she conceptualised and spearheaded Project Seat-at-the-Table (Project SAT), an international qualitative research initiative that aimed to give primary and secondary school students the opportunity to provide their input on the future of technology in their education. As an affiliate at the Berkman Klein Centre for Internet and Society at Harvard University, Samantha seeks to strengthen internet governance within online learning. In particular, she is interested in ensuring that the rights of young students are protected while they interact within the digital space, including elevating the voices of students in decision-making processes.

Above all, Samantha believes that every child should have the same opportunity to shape their destiny, emphasising that we cannot always build the future for them, but we can build them for the future. Consequently, her goal is to ensure that teachers implement evidence-based pedagogical approaches that will strengthen 21st-century skills, including critical thinking and creativity, in all students.

By Dr Rebecca Eynon (Associate Professor between the Department of Education and the Oxford Internet Institute) & Professor Jo-Anne Baird (Director of the Department of Education)

Now that the infamous Ofqual algorithm for deciding the high-stakes exam results for hundreds of thousands of students has been resoundingly rejected, the focus turns to the importance of investigating what went wrong. Indeed, the Office for Statistics Regulation has already committed to a review of the models used for exam adjustment within well-specified terms, and other reviews are likely to follow shortly.

A central focus from now, as students, their families, educational institutions and workplaces try to work out next steps, is to interrogate the unspoken and implicit values that guided the creation, use and implementation of this particular statistical model.

As part of the avalanche of critique aimed at Ofqual and the government, the question of values comes into play. Why, many have asked, was Ofqual tasked, as they are every year, with avoiding grade inflation as their overarching objective? Checks were made on the inequalities in the model and they were consistent with the inequalities seen in examinations at a national level. This, though, begs the question of why these inequalities are accepted in a normal year.

These and other important arguments raised over the past week or so highlight questions about values. Specifically, they raise the fundamental question of why, aside from the debates in academia and some parts of the press, we have stopped discussing the purposes of education. Instead, a meritocratic view of education, promoted since the 1980s by governments on the right and left of the spectrum, has become a given. In place of discussions about values, there has been an ever-increasing focus on the collection and use of data to hold schools accountable for ‘delivering’ an efficient and effective education, to measure students’ ‘worth’ in ways that can easily be traded in the economy, and to water down ideas of social justice and draw attention away from wider inequalities in society.

Once debates about values are removed from our education and assessment systems, we are left with situations like the one now. The focus on creating a model that makes the data look like past years – with little debate over whether the aims should have been different this year – is a central example of this. Given the significant (and unequal) challenges young people have faced during this year, should we not, as a society, have wanted to reduce inequalities in any way possible?

The question of values also carries through into other discussions of the datafication of education, where the collection and analysis of digital trace data, i.e. data collected from the technologies that young people engage with for learning and education, is growing exponentially. Yet unlike other areas of the public sector like health and policing, schools rarely have a central feature in policy discussions and reports of algorithmic fairness. The question is why. There are highly significant ethical and social implications of extensive data use in education that significantly shape young people’s futures. These include issues of privacy, informed consent and data ownership (particularly due to the significant role of the commercial sector); the validity and integrity of the models produced; the nature of the decisions promoted by such systems; and questions of governance and accountability. This relative lack of policy interest in the implications of datafication for schooling is, we suggest, because governments take for granted the need for data of all kinds in education to support their meritocratic aims, and indeed see it as a central way to make education ‘fair’.

The Ofqual algorithm has brought to our attention the ethics of the datafication of education and the risk that it poses of compounding social inequalities. Every year there is not only injustice from the unequal starting points and the unequal opportunities young people have within our schools and in their everyday lives, but there is also injustice in the pretence that extensive use of data is somehow a neutral process.

In the important reflections and investigations that should now take place over the coming weeks and months, there needs to be a review that explicitly places values and ethical frameworks front and centre, and that encourages a focus on the purposes of education, particularly in times of a (post-)pandemic.

Edward W. Wolfe is a Principal Research Scientist in the Research and Innovations Network at Pearson.

In that position, he conducts research relating to human raters and automated scoring as well as providing operational support for Australia’s National Assessment Program – Literacy and Numeracy (NAPLAN) and the National Board for Professional Teaching Standards (NBPTS). Dr. Wolfe previously held academic appointments at the University of Florida, Michigan State University, and Virginia Polytechnic Institute and State University.

Research

Dr. Wolfe’s research interests include applications of latent trait models to detecting and correcting rater effects, modeling rater cognition, evaluating automated scoring, and applications of multidimensional and multifaceted latent trait models to instrument development. His research has recently been published in several notable peer review journals, including Educational and Psychological Measurement, the International Journal of Testing, and the Journal of Educational Measurement. He also serves on the editorial board of several journals, including the Journal of Applied Measurement and the Journal of Writing Assessment, and he has served as a representative of Pearson to committees sponsored by the National Assessment of Educational Progress (NAEP) and the Council of Chief State School Officers (CCSSO).

Publications
Peer review publications (2009-2013)
  • Song, T., & Wolfe, E.W. (2013). RaschFit.sas: A SAS macro for generating Rasch model expected values, residuals, and fit statistics, Applied Psychological Measurement, 37, 253-254.
  • Wolfe, E.W. (2013). A boostrap approach to evaluating person and item fit to the Rasch model, Journal of Applied Measurement, 14, 1-9.
  • Chow, T., Olsen, B., & Wolfe, E.W. (2012). Development, content validity and piloting of an instrument designed to measure managers’ attitude toward workplace breastfeeding support, Journal of the American Dietetic Association, 112, 1042-1047.
  • Dietrich, C.B., Wolfe, E.W., & Vanhoy, G.M. (2012). Cognitive radio testing using psychometric approaches: applicability and proof of concept study, Analog Integrated Circuits and Signal Processing, 72, 1-10.
  • He, W., & Wolfe, E.W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72, 808-826.
  • Lai, E.R., Auchter, J.E., & Wolfe, E.W., (2012). Confirmatory factor analysis of certification assessment scores from the National Board of Professional Teaching Standards. International Journal of Educational and Psychological Assessment, 9, 61-81.
  • Wolfe, E.W., & McGill, M.T. (2012). Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns. Journal of Applied Measurement, 12, 358-369.
  • Wolfe, E.W., & McVay, A. (2012). Applications of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practices, 31(4), 31-37.
  • Barnes, B.J., Chard, L.A., Wolfe, E.W., Stassen, M.L.A., & Williams, E.A. (2011). An evaluation of the psychometric properties of the graduate advising survey for doctoral students. International Journal of Doctoral Studies, 6, 1-17.
  • Wolfe, E.W., & Singh, K. (2011). A comparison of structural equation and multidimensional Rasch modeling approaches to confirmatory factor analysis. Journal of Applied Measurement, 12, 212-221.
  • Bodenhorn, N., Wolfe, E.W., & *Airens, O. (2010). School counselor program choice and self-efficacy: Relationship to achievement gap and equity. Professional School Counseling, 13, 165-174.
  • He, W., & Wolfe, E.W. (2010). Item equivalence in English and Chinese translations of a cognitive development test for preschoolers. International Journal of Testing, 10, 80-94.
  • Miyazaki, Y., Sugisawa, T., & Wolfe, E.W. (2010). Comparing the performance of different transformations in fixed-effects meta-analysis of reliability coefficient. Japanese Journal for Research on Testing, 16, 1-15.
  • Skaggs, G.E., & Wolfe, E.W. (2010). Equating applications via the Rasch model. Journal of Applied Measurement, 11, 182-195.
  • Wolfe, E.W., & Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment, 10, 1-21.
  • Wolfe, E.W. & VanDerLinden, K.E. (2010). Development of scales relating to professional development in community college administrators. Journal of Applied Measurement, 11, 142-157.
  • Myford, C.M., & Wolfe, E.W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371-389.
  • Wolfe, E.W., Hickey, D.T., & Kindfield, A.H.C. (2009). An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics. Journal of Applied Measurement, 10, 196-207.
  • Wolfe, E.W. (2009). Item and rater analysis of constructed response items via the multi-faceted Rasch model. Journal of Applied Measurement, 10, 335-347.
  • Wolfe, E.W., Converse, P.D., *Airens, O., & Bodenhorn, N. (2009). Unit and item non-responses and ancillary information in web- and paper-based questionnaires administered to school counselors. Measurement and Evaluation in Counseling and Development, 42, 92-103.
