Appraising the trustworthiness of primary research in education is important in helping us to understand the extent to which that research can inform policy and practice decisions. The more trust we have in a piece of research, the more confidence we have that its findings offer meaningful contributions to our knowledge and understanding of the topic it addresses. Trustworthiness appraisal comes in many guises, and different types of appraisal are used for different kinds of research, each often having a specialised ‘tool’ to guide the process. When it comes to experimental designs, trustworthiness appraisal is typically thought of in terms of how well sources of potential bias have been minimised during the research process, and thus is often called Risk of Bias appraisal. There are a number of tools designed to guide this process. This training package focuses on the Education Endowment Foundation’s (EEF) Padlock tool. The EEF Padlock tool was designed to appraise the research the EEF commissions, and we have adapted it here to allow risk of bias appraisal of any reports of experimental research in education.
1. About the EEF Padlock Tool

Introduction
What is bias?
Bias, as it relates to research, can be defined as systematic deviation from the truth. For example, if, when selecting people to take part in a study of school leavers’ post-16 destinations, the researchers choose to study only people who scored 9 in all of their GCSEs, then we can say that this is a biased sample. That is, the sample deviates from the ‘truth’ that, in the population, GCSE results follow a much more varied distribution of grades. The deviation is systematic because higher grades are systematically associated with pupils going on to do A-levels rather than entering the workforce or studying for a technical/vocational qualification.
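To make this concrete, the short Python sketch below simulates this kind of selection bias. All of the numbers are invented for illustration: the probability of progressing to A-levels is an assumed function of grade, not real destinations data.

```python
import random

random.seed(42)

# Hypothetical population of school leavers with an average GCSE
# grade from 1 to 9 (all numbers here are invented for illustration).
population = [random.randint(1, 9) for _ in range(100_000)]

def goes_to_a_levels(grade):
    # Illustrative assumption: the chance of progressing to A-levels
    # rises with GCSE grade.
    return random.random() < grade / 10

destinations = [(grade, goes_to_a_levels(grade)) for grade in population]

# The 'truth': the A-level progression rate across the whole population.
population_rate = sum(went for _, went in destinations) / len(destinations)

# The biased sample: only pupils who scored 9 in all of their GCSEs.
grade_nine_only = [went for grade, went in destinations if grade == 9]
sample_rate = sum(grade_nine_only) / len(grade_nine_only)

print(f"Whole population A-level rate: {population_rate:.0%}")  # about 50%
print(f"Grade-9-only sample rate:      {sample_rate:.0%}")      # about 90%
```

Restricting the sample to the top grade systematically overstates the A-level progression rate relative to the population ‘truth’.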
Accounting for biases
Sometimes minimising sources of bias is in the control of the people conducting the research. For example, methods of allocating participants to comparison groups in an experiment can affect how likely those groups are to be similar to each other. A method of creating comparison groups that uses concealed random allocation, for example, minimises the risk that one group will be systematically different to the other at the start of an experiment. Therefore, if we observe any differences between the groups at the end of the experiment, we can be more confident this is a result of the differences in the ways that they were taught. By contrast, if comparison groups have been created by deliberately selecting people to be in one group or another, we cannot be confident that those groups are not systematically different to each other at the start of the experiment. Therefore, we cannot be confident that any differences observed at the end of the experiment are the result of the different ways in which they were taught. The method of allocation is a choice experimenters can make. The nature of that choice can influence how much trust we have in the results of their research.
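As a hypothetical illustration of why the method of allocation matters (the baseline scores and group sizes below are invented), the following sketch contrasts random allocation with deliberate selection:

```python
import random
import statistics

random.seed(1)

# 200 hypothetical participants with a baseline attainment score
# (mean 100, SD 15 -- invented for illustration).
participants = [random.gauss(100, 15) for _ in range(200)]

# Concealed random allocation: every participant has an equal,
# unpredictable chance of ending up in either group.
shuffled = participants[:]
random.shuffle(shuffled)
random_group_a, random_group_b = shuffled[:100], shuffled[100:]

# Deliberate selection: the highest scorers are placed in one group.
ranked = sorted(participants)
selected_group_a, selected_group_b = ranked[100:], ranked[:100]

print("Random allocation baseline means:    "
      f"{statistics.mean(random_group_a):.1f} vs "
      f"{statistics.mean(random_group_b):.1f}")
print("Deliberate selection baseline means: "
      f"{statistics.mean(selected_group_a):.1f} vs "
      f"{statistics.mean(selected_group_b):.1f}")
```

Random shuffling produces groups with very similar baseline means, whereas deliberate selection builds in a systematic difference before the intervention even starts.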
Sometimes sources of bias are not in the control of the people conducting the experiment. For example, if people who have been recruited to take part in an experiment stop taking part before it is finished, this can introduce what we call attrition bias. The risk associated with attrition bias is compounded if more people leave the experiment from one group than from the other. For example, if, in an experiment comparing the effects of a new way of teaching maths with an old way of teaching maths, lots of people being taught in the new way find it too difficult and leave the experiment in frustration, this might mean that only people who are already very good at maths are left in that group by the end. When the scores on maths tests are compared between groups at the end of the experiment, this gives a biased estimate of the effects of the new way of teaching, because the average characteristics of the people remaining in the ‘new way’ group have changed relative to those remaining in the ‘old way’ group. Attrition bias is not the fault of the experimenters, but it is important to know whether it has occurred so that we can assess the likelihood of its associated risk of bias.
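The sketch below simulates this scenario with invented numbers. The new method is assumed to have no true effect at all, yet differential attrition of weaker pupils from the ‘new way’ group inflates the estimated effect (the ability threshold and retention rate are illustrative assumptions):

```python
import random
import statistics

random.seed(7)

# Two groups of 100 pupils drawn from the same ability distribution;
# for illustration, assume the new teaching method has NO true effect,
# so an unbiased comparison should find a difference near zero.
new_method = [random.gauss(100, 15) for _ in range(100)]
old_method = [random.gauss(100, 15) for _ in range(100)]

# Differential attrition: pupils in the new-method group who find it
# too difficult (lower ability) are much more likely to drop out
# before the final test (the threshold and 0.2 retention rate are
# invented for illustration).
remaining_new = [score for score in new_method
                 if score > 95 or random.random() < 0.2]

estimate = statistics.mean(remaining_new) - statistics.mean(old_method)
print(f"{len(new_method) - len(remaining_new)} pupils left the new-method group")
print(f"Estimated effect: {estimate:+.1f} points (true effect is 0)")
```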
Biases addressed in the EEF Padlock tool
Different tools focus on different potential sources of bias. The EEF Padlock tool focuses on the following potential sources of bias:
- Design – the strength of the research design used, relative to its suitability for detecting causal relationships should they exist.
- Attrition – the extent to which people who started the experiment were still taking part at the end of it.
- Confounding – the extent to which potential confounders (such as age, gender, or motivation) have been adequately accounted for.
- Concurrent interventions – whether participants were also receiving other interventions that may be correlated with their attainment on the outcomes of the reported experiment.
- Experimental effects and contamination – whether participants or their teachers modified their other teaching or learning behaviours once they started participating, and whether the groups may have been exposed to the comparison condition (such as teachers in one group sharing resources with teachers in the other).
- Implementation fidelity – the extent to which the interventions being compared were delivered as intended.
- Missing data – the extent to which all data that the researchers intended to collect were actually collected, and how any missing data were accounted for in the analysis.
- Measurement of outcomes – the extent to which the measurement tools used were valid and reliable.
- Selective reporting – the extent to which all assessed outcomes were reported, and whether there is evidence of data dredging (e.g. unplanned sub-group analyses).
Assessing the extent to which these potential sources of bias have been addressed in reports of experimental research provides an estimate of the trustworthiness of that research. Each of these will be explained in more detail in the following pages of this training package.