12. Measurement of Outcomes and Selective Reporting

Now that you have watched the video, read the relevant sections of the papers by Hoferichter & Jentsch (2024) and Kisida et al. (2020), refer to the EEF’s guidance notes, and make a judgement about the level of threat posed by Measurement of Outcomes and Selective Reporting in these experiments. Record your judgement in the security rating template, and note down any supporting information.

Compare your judgement with that of an experienced rater in the answers below.

Answers

Hoferichter & Jentsch (2024)

Measurement of Outcomes: The relevant information can be found on pages 2447 and 2448.

The authors describe three different tests used to measure their primary outcomes. These are established instruments in the field and, as far as can be assessed, they have shown good or acceptable internal consistency. In the case of the General Self-Efficacy Scale, the authors state that it correlates well with other measures of mental health. The instruments rely on self-report, and thus on subjective judgement. The participants (in effect, the people who administered the tests) cannot have been blinded to group allocation, and the paper does not report whether the researchers (the people who analysed the tests) were blinded to group allocation.

Therefore, this study is considered to have a moderate risk of bias for Measurement of Outcomes.

Selective Reporting: While the authors have made data and syntax files available as supplementary materials on the Open Science Framework, there is no mention of a prospectively registered protocol. It is therefore impossible to assess whether they have reported everything they planned to.

Therefore, this study is considered to have a high risk of bias for Selective Reporting and Data Availability.

Kisida et al. (2020)

Measurement of Outcomes: The relevant information can be found on pages 5 and 6 in the section on Survey Measures and Data Collection and Analytic Approach.

The assessment of history knowledge is well aligned to the standards for the state curriculum. It was constructed without knowledge of the content of the performance that constituted the intervention, and its content was concealed from the programme operators to protect against ‘teaching to the test’. The attitudes questionnaire was judged to have good face validity and acceptable levels of internal consistency as measured using Cronbach’s alpha.
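For readers unfamiliar with Cronbach’s alpha, the sketch below shows how the statistic is usually computed from item-level responses. It is an illustration only: the function name `cronbach_alpha` and the sample responses are hypothetical, and nothing here reproduces the data or analysis code of Kisida et al. (2020). It uses the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a list of respondents.

    `responses` is a list of rows, one per respondent, each row holding
    that respondent's scores on the same k questionnaire items.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(k)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three Likert-type items.
sample = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(sample)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together. What counts as an acceptable level is a convention, and raters should rely on the thresholds given in the guidance they are applying rather than on this sketch.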

As assessments were administered by teachers in the participating schools, blinding to treatment was not achieved, but safeguards were put in place to ensure quality. No ceiling or floor effects were detected.

Therefore, this study is considered to have a moderate risk of bias for Measurement of Outcomes.

Selective Reporting: There is no mention of a prospectively registered protocol, so it is impossible to assess whether the authors have reported everything they planned to. Raw data have not been made available.

Therefore, this study is considered to have a high risk of bias for Selective Reporting and Data Availability.
