Should we assess oracy, and can comparative judgement help?

| by Ayesha Ahmed |


As a low-stakes classroom activity, teacher assessment of oracy skills can have a positive effect on learning and teaching. Teachers using assessment for learning (AfL) strategies can get a picture of their students’ progress in the skills needed for tasks such as oral presentations and group discussions. Teachers can identify where individual students might need help, for example in improving their vocabulary, in turn-taking, or in listening actively. Students can gain an understanding of what good oracy skills are and how to improve their own skills. Teacher assessment as a progress check allows the teaching to be targeted towards the needs of the students. It doesn’t have to be formally labelled as assessment, or involve marks. The key is having some agreed criteria and being able to give specific, informative, useful feedback to students.

For high-stakes assessment, the answer to this question is much more complex. We know that when something is assessed in a way that has an impact on results, it is valued in schools, given curriculum time, and taken seriously by students, teachers and parents. Since Speaking and Listening no longer contributes towards GCSE English grades, it is perceived as less important by many who are under pressure to teach and learn what does count.

But there are many other challenges for assessing oracy in a high-stakes situation. The main ones are:

  • The ephemeral nature of the evidence
  • The time-consuming nature of collecting the evidence
  • The risk of narrowing oracy to what is in the assessment
  • The subjective nature of the judgements

This last point is the deal-breaker: the notorious unreliability of assigning marks or grades to performances (this also applies to art, music, drama, essays and any complex responses to open-ended tasks). This is the main reason that oracy is not included in high-stakes assessments. Such assessments have to satisfy some necessary conditions, mainly validity, reliability and fairness. We have to be able to infer that those who get higher marks are better at oracy. At the moment we can’t do that with enough confidence, and that’s not fair. But hang on a minute, we do it for art, for drama, for music, for foreign languages – what’s the difference? The difference is probably the first two bullet points, together with the perception of oracy as less important and therefore less worth the risk (and so a vicious circle).


So how can we deal with the issue of subjectivity of judgements? How can we get more reliable judgements of quality? There is an interesting way round this problem. It is called Comparative Judgement (CJ) and, like many ‘innovations’, it is not entirely new: it originates in Thurstone’s (1927) Law of Comparative Judgement. We are very bad at making absolute judgements, but we are naturals at making relative judgements. Imagine two different shades of blue. It’s quite easy to decide which is darker but very hard to describe how dark the darker one is (without resorting to comparative language). Most of us can say which of two musical notes is higher, but few of us can say one was an A and one an F sharp. The important point for assessment is that the vast majority would agree on which note is higher, or which shade of blue is darker, even if they would not agree on how dark a shade of blue is (Is it quite dark? Is it very dark? Is it a ‘1’, ‘2’ or ‘3’ in terms of darkness?). See for a more in-depth yet easy to follow explanation.

Comparative judgement is currently being trialled in a variety of assessment situations, notably recently with Key Stage 2 writing tasks. See and Michael Tidd’s TES article


Can we use CJ for oracy assessment? It would involve judging pairs of performances, for example, videos of oral presentations. For each pair, judges (teachers or examiners) must decide which is ‘better’. Each judge sees many pairs and makes many such judgements, after which all performances can be rank ordered and put on a scale. A score can then be generated if necessary. The idea of ranking in this way can rankle, but CJ is just a fairer, albeit more explicit, method for ordering performances.
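To make the step from many pairwise decisions to a single scale more concrete, here is a minimal sketch in Python. It assumes a Bradley-Terry model, which is one common way of turning ‘which is better?’ judgements into scores; it is a close relative of Thurstone’s approach rather than his original formulation, and it is not necessarily what any particular CJ trial uses. The performance labels, the judgement data and the function name `fit_bradley_terry` are all made up for illustration.

```python
import math
from collections import defaultdict


def fit_bradley_terry(judgements, iterations=200):
    """Estimate a score per performance from (winner, loser) judgements.

    Assumption: every performance wins at least once and loses at least
    once; otherwise its score would drift towards +/- infinity.
    """
    items = sorted({name for pair in judgements for name in pair})
    wins = defaultdict(int)      # number of times each performance was preferred
    compared = defaultdict(int)  # number of judgements per unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        compared[frozenset((winner, loser))] += 1

    strength = {name: 1.0 for name in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Standard iterative update for the Bradley-Terry model
            denom = sum(
                compared[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom
        # Rescale so the average strength stays at 1
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}

    # Log-strengths give a scale on which differences are meaningful
    return {i: math.log(s) for i, s in strength.items()}


# Hypothetical data: four videoed presentations, each pair judged twice.
# Each tuple is (preferred performance, other performance).
judgements = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("D", "A"),
    ("B", "C"), ("B", "C"),
    ("B", "D"), ("B", "D"),
    ("C", "D"), ("C", "D"),
]

scores = fit_bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Note that no judge ever assigns a mark here: the scale emerges from the pattern of wins and losses alone. A real trial would need far more judgements per performance than this toy example, along with checks on how consistent the judges are with one another.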

The advantages of this method are:

  • Reliability tends to be higher than when performances are marked in the traditional way.
  • It relies on judges being experts in the construct being assessed – they need to be able to recognise good oracy – but they don’t need to be able to define or describe it in advance. This resonates with many experts in disciplines where performance assessment is required.
  • There is no need to assign any marks or grades to performances. There is no need to do the difficult job of describing various levels of performance that should gain certain numbers of marks.

Some major challenges remain, such as collecting the evidence and the time and resources needed. But CJ is an interesting and promising approach to reducing the element of subjectivity in assessment. It is worth exploring as a way to assess oracy more fairly, and perhaps therefore to give us a chance of persuading policymakers to allow oracy assessments to count again in high-stakes qualifications.


Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.
