Understanding assessment: does it really have to be this difficult?

[Listening to ‘Cler Achel’ by Tinariwen; ‘Watch Me Disappear’ by Augie March; ‘All Around the World (or the Myth of Fingerprints)’ by Paul Simon; ‘Fight Them Back’ by Steve Mason]

Prologue

22 minutes into Cambridge English’s ‘Understanding assessment’ webinar, the presenters, Dr Evelina D. Galaczi and Dr Nahal Khabbazbashi, pose this question:

According to the poll results displayed, 95% of the 275 votes were for ‘gollining’, which the presenters say is “actually the correct answer.” It seems like a fairly straightforward test item “tapping into sentence-level knowledge of a language”.

However, it’s not at all as straightforward as it appears. How could we answer that and then claim that ‘gollining’ is ‘the correct answer’ without at least knowing what ‘flester’, ‘gollining’, ‘chandering’ and ‘rangeling’ mean? Perhaps, in this context, ‘gollining’, ‘chandering’ and/or ‘rangeling’ are (near-)synonyms and, therefore, there are two or more correct answers.

The fact that this is a possibility makes this an example of a bad test item, rather than a helpful illustration of testing “cognitive processes in practice”, as the presenters seem to intend it. It’s a minor misstep, perhaps, but, if two Drs – not to mention others from Cambridge English involved in the production of the webinar – can trip themselves up like this, what chance do the rest of us have of making sense of assessment?

Understanding dashed on the rocks of ‘validity’ and ‘reliability’ once again

Galaczi and Khabbazbashi talk at length about ‘six key assessment concepts’: purpose, test takers, construct, task, reliability and impact. Why isn’t ‘validity’ in this list? Because, according to Galaczi and Khabbazbashi, it is a “general overarching concept,” which “refers to whether the test is measuring what it aims to be measuring.” The “different elements” comprising ‘validity’ are the ‘six key concepts’ listed above.

This seems an uncontroversial definition of validity, but this is because it glosses over almost a century of debate about the concept. It also ignores the fundamental ontological question as to whether the thing we aim to measure even exists to begin with (Borsboom, Mellenbergh & Van Heerden, 2004).

The taxonomy is also a little confusing, especially if you’ve had other influential organisations like NEAS telling you that ‘validity’ is one of four ‘principles of assessment,’ alongside ‘reliability’, ‘flexibility’ and ‘fairness’. Cambridge says one thing, NEAS says another; where does that leave educators who rely on such organisations for leadership and, well, education?

‘Reliability’ is defined as

how dependable the scores from the test are or to put it differently how can we make sure that the scores that we give in a test reflect the learners’ actual ability and not whether for example the examiner happened to be in a bad mood that day and was particularly harsh in giving marks.

What are we to make of this? Firstly, they’ve kept with the frustrating tradition of using a synonym of ‘reliability’ to define it: how helpful is it really to say that ‘reliable scores are dependable’? Secondly, how are we to distinguish between ‘validity’ and ‘reliability’?

Validity: does the test measure what it purports to measure?
Reliability: do the test scores reflect the learners’ actual ability?

Is it me or, according to these definitions, do ‘reliability’ and ‘validity’ mean the same thing? Sadly, this failure to adequately define key terms and distinguish between them is, in my experience, very common in assessment discourse: read Bachman & Palmer (1996) – a canonical text – and then try to nail down a stable, practical, working distinction between ‘validity’, ‘reliability’ and ‘authenticity’.

It’s not just me and it’s not just you, this is all around the world

But I don’t think it is just me that finds this treatment of assessment (and, again, the problem is not just with this webinar or with Galaczi and Khabbazbashi – it’s endemic to the language assessment community, in my experience) confusing. After 17 minutes of failing to adequately distinguish between the key concepts that “every teacher should know”, Galaczi and Khabbazbashi summarise and, to check participants’ understanding, ask two questions.

Question 1: which assessment concept does this refer to: “the role the test has in influencing what tasks teachers will use in the classroom”? Judging by the poll results displayed on screen, participants were confused:

[Image: poll results for question 1]

Only 25% answered correctly – ‘test impact’.

Question 2: “What about when we try to make scores more dependable by using assessment scales?”

[Image: poll results for question 2]

The results are slightly better than for question 1 but hardly convincing.

Why are the results so poor? Perhaps the participants weren’t paying much attention; perhaps the webinar is a poor choice of medium; perhaps the questions are ambiguous. I’d say one important factor for sure is the conventional way the presenters have chosen to tackle the topic: by organising it around ‘key concepts’ which are difficult to define and distinguish between.

I think that this approach to ‘understanding assessment’ is counter-productive and the difficulty that the apparent experts have in even defining the key concepts is evidence of this. I think that the obsession with the concepts ‘validity’ and ‘reliability’, coupled with demonstrably ineffective efforts to educate others about them, has led, for many educators, to a paralysing lack of confidence in discussing or designing assessment. Many of us prefer to leave it to the experts or, as Fulcher (2015) puts it, the ‘adepts’ who are “steeped in the interpretations and practices” of the assessment ‘cult’; the rest of us “uninitiated are outsiders who can only begin to see the truth if [we] submit to the required initiation processes” (pp. 97-98) such as IELTS examiner training.

This is unfortunate and unnecessary. As a community of educators, we should be able to do a much better job of educating ourselves and others about such fundamentally important aspects of our profession. What’s the alternative? Well, when you want to understand something complex and confusing, what do you do?

Personally, I read as much as I can about it and then I talk as much as I can about what I’m reading with as many people as I can so I can test out some of the new ideas I’ve come across and also see how well I’m understanding things. Writing helps too. It all takes time and effort (and I’m fortunate enough to have access to a university library) but I don’t know any other way.

And I don’t think there’s any other aspect of education which requires – and rewards – as much time and effort as assessment does.
