Some people seem to think that there are no real limits to the competence of science, no limits to what can be achieved in the name of science. There is no area of human life to which science cannot successfully be applied. A scientific account of anything and everything constitutes the full story of the universe and its inhabitants. Or, if there are limits to the scientific enterprise, the idea is that, at least, science sets the boundaries for what we human beings can ever know about reality. This is the view of scientism. (Stenmark, 2013, p. 2103)
In my last post, I started what I hope will be a series of posts about assessment with a bit of a rant about the ‘Duolingo English Test’ (DET) and rhetoric like this (from the ‘How predictive are Duolingo’s test scores?’ page of their FAQ section):
Despite the liberal use of impressive-sounding terms like ‘precise and accurate’, ‘correlated’ and ‘validity’, the text does not actually explain ‘how predictive’ the DET is or what specifically it is supposed to predict. For now, though, I’m more interested in the claim that the test is ‘scientifically designed’.
What kind of ‘science’ is Duolingo referring to here? I think it’s the same kind of science that EL Thorndike, the ‘father of educational psychology’ and the man who “gave us the first standardized achievement test” (Berliner, 1993), wrote about in his highly influential 1912 paper, ‘The measurement of educational products’ (more on that soon).
Thorndike and his influential version of science
According to David C. Berliner (1993), in his account of seminal developments in educational psychology at the turn of the twentieth century,
Thorndike’s version of science and his vision of educational psychology has led us to a narrower conception of our field than would have been true had the views of [William James, G. Stanley Hall and John Dewey, the ‘grandfather and granduncles of educational psychology’] gained prominence.
Berliner (1993) argues that James, Hall and Dewey, the “three founders of general and educational psychology”
had no problem agreeing that psychology had to take a major interest in education and that it was destined to be the “master science” for pedagogy. There was still a question, however, about which view of science was to dominate. This was the context for the father of our field, Edward Lee Thorndike, whose views differed from these individuals in important ways … Thorndike’s views resulted in a major shift in psychology, and it had serious consequences for our discipline. From a field genuinely interested in issues of schooling, psychology became disdainful of school practice. Thorndike’s influence resulted in an arrogance on the part of educational psychologists, a close-mindedness about the complexities of the life of the teacher and the power of social and political influences on the process of schooling.
Thorndike believed that only empirical work should guide education. His faith in experimental psychological science and statistics was unshakable. In his Introduction to Teaching (E. Thorndike, 1906), he wrote that psychological science is to teaching as botany is to farming, mechanics is to architecture, and psychology and pathology are to the physician.
In his 1910 introduction to the very first issue of the Journal of Educational Psychology, Thorndike wrote:
A complete science of psychology would tell every fact about every one’s intellect and character and behavior, would tell the cause of every change in human nature, would tell the result which every educational force … would have. It would aid us to use human beings for the world’s welfare with the same surety of the result that we now have when we use falling bodies or chemical elements. In proportion as we get such a science we shall become masters of our own souls as we now are masters of heat and light. Progress toward such a science is being made.
In a similar comment in an earlier piece, Thorndike wrote that “man is free only in a world whose every event he can understand and foresee … We are captains of our own souls only in so far as … we can understand and foresee every response which we will make to every situation” (1909, reprinted in Joncich, 1962, p. 45, cited in Berliner, 1993).
In 1922, Thorndike argued for a
newer pedagogy of arithmetic … [which] scrutinizes every element of knowledge, every connection made in the mind of the learner, so as to choose those which provide the most instructive experiences, those which will grow together into an orderly, rational system of thinking about numbers and quantitative facts. (p. 74, cited in Berliner, 1993).
This reflected his belief that “whatever exists at all exists in some amount” and that “to know it thoroughly involves knowing its quantity as well as its quality” (1918, cited in Berliner, 1992).
To Thorndike, ‘knowing its quantity’ required the development of a scale. In ‘The measurement of educational products’, he wrote:
However it is defined, education concerns the production and prevention of changes in human beings; and a science of education must identify these changes, and relate them to their causes. To do this it must measure them. … There are peculiar difficulties in … measuring the changes which are the data for the science of education. The facts are extraordinarily complex, very widely variable, and do not at all readily suggest units, scales or graded standards by means of which they may be identified, compared, and related. So apparently simple an ability as ordinary addition of integers can be shown to require analysis into at least nine separate abilities, each of which probably requires further analysis, in one case, into perhaps ninety component ability-atoms. (p. 290)
Once we have identified these ‘ability-atoms’, Thorndike argued that we must then “get a series of perfectly defined points of the amount of some thing, so that all men may know what each man means by the statement he makes [for example, as to someone’s language proficiency], as all know it in the case of ‘one gram’ or ‘two grams’” (1912, p. 291). This would give us a scale of the kind associated with “the thermometer, spectroscope, and galvanometer” (p. 291), and we would be able to measure a person’s ability in a scientific, objective, precise and accurate way. Then,
If we get scale points defined, and their distances defined, and establish an absolute zero, there is no further difficulty in constructing a scale for achievements of human nature. Such scales have every logical qualification that any of the scales for physical measurement have. (p. 299)
There seems to me to be a very clear and direct lineage that we can trace from Thorndike’s passion for scales and his remarkable ‘ability-atoms’ concept to the ‘granular’ approach to learning and assessment taken today by influential companies like Knewton and Pearson. Knewton and Pearson have taken up Thorndike’s mission and used information technology in an effort to realise his vision more fully than he was able to.
Towards humility and tentativeness
This vision certainly meets Stenmark’s definition of ‘scientism’ above, but is there anything wrong with it? Berliner (1993) thinks there is, and I agree with him:
In the second century of educational psychology, our science probably needs to be more descriptive and participatory, in the style of Hall. It needs to be less strident about pronouncing, ex cathedra, its findings, a warning that was first given to us by James. Our science needs to be more tolerant of the teacher and the complexity of the social, moral and political world of classrooms and schools, as Dewey reminded us. … Science never was as neutral as Thorndike believed it to be, and to perpetuate that myth among the next generation is nonsensical.
Paraphrasing Stenmark, I believe that there are real limits to the competence of science and to what can be achieved in the name of science, and that educational psychology – and certainly language learning – is one area of human life to which science cannot successfully be applied, at least not in the ways that Thorndike and his intellectual heirs at Duolingo, Knewton, Pearson, ACARA and elsewhere believe it can. Because of these limits, claims about language assessment such as those from Duolingo above should be made with “humility and tentativeness, rather than surety and arrogance” (Berliner, 1993).
3 thoughts on “Scientism and language assessment”
Fantastically interesting and well-researched piece, thank you. I look forward to your further thoughts on this.
Looking at your last paragraph, I completely agree. One point you have implied but not fully developed here, and one that fascinates me, is the vast gap between the knowledge we do have (limited and imperfect as it is) about the order in which people learn languages, and so about what might be regarded as more or less ‘advanced’, and the order in which language assessments tend (fairly randomly) to place particular, isolated language features in terms of how ‘advanced’ they think they are.
Geoff Jordan’s recent list of questions to ask at IATEFL (https://criticalelt.wordpress.com/2017/01/21/questions-to-ask-at-the-iatefl-2017-conference/) raises many of these same issues nicely in his questions to, for example, Pearson:
“What theory or explanation of language learning informs the Global Scale of English (GSE)?
The nearly 2,000 can do statements that form the backbone of the GSE are based entirely on the intuitions of teachers: no empirical data have been gathered from learners’ experiences. How do you justify this?
How do you respond to the criticism that the GSE is an example of what Glenn Fulcher calls “Frankenstein scales”, which don’t relate to any specific communicative context, or give a good description of any particular communicative language ability?”
All of which I loved!
It is of course simply impossible for anyone to “get scale points defined, and their distances defined” as Thorndike put it in your quote above, when the items making up the scale are not agreed or discrete units anyway, and the scale has been created (https://www.english.com/gse) purely by asking teachers trained with specific assumptions to guess according to their existing assumptions. Of course, Pearson is not alone here, although more extreme in the granularity of its scale – it’s the same method by which the CEFR started out too.
Thanks for taking the time to read and respond with such a thoughtful comment.
Interesting point about what is regarded as ‘advanced’ – it reminded me of this excellent 2014 paper by Bulte and Housen, ‘Conceptualizing and measuring short-term changes in L2 writing complexity’ (http://www.sciencedirect.com/science/article/pii/S1060374314000666). In it they discuss the problems with defining and assessing ‘L2 complexity’ which “has been differentially characterized as “difficult to acquire or to produce,” “acquired late(r),” “developmentally advanced,” “more proficient,” “more mature,” “of high(er) quality,” or simply as “better”” (p. 45).
I agree re. Geoff Jordan’s list of questions to ask at IATEFL (thanks for sharing the link here) and re. the creation of the CEFR levels and descriptors. Have you read Glenn Fulcher’s ‘Deluded by Artifices’ paper (http://www.tandfonline.com/doi/abs/10.1207/s15434311laq0104_4?journalCode=hlaq20) on this? It includes a fascinating account of the development of the CEFR and raises issues that I hope to address in a post somewhere down the track.
Thank you for the link to the Bulte and Housen paper – looks very interesting. I had read the Fulcher paper before, and am grateful for the much-needed light he casts on language assessment.
Looking forward to your future posts.