Why Grades Suck – Part 1

Experiment: Bring some data theory into a world that is usually unquestioned.

I know, I know, complaining about grades probably just means I couldn’t get good grades. Okay, that’s not too far from the truth, but more than anything, I have ALWAYS been annoyed at how useless grades seemed. From the time I started getting grades all the way through having to assign numbers to students myself, I’ve hated it.

Part 1 will be some technical stuff about numbers, then we’ll get to the fun part later.

This post is heavily influenced by a Measurement Models class I attended with the lovely Cees van der Eijk at the University of Nottingham. According to Coombs’s (1964) theory of data, there are three ways that we assign numbers: by fiat, by representational measurement, or by measurement models. Measurement by fiat relies on the measurer’s judgment, representational measurement captures physical properties (like temperature, weight or distance), and measurement models are ways of measuring unobservable (latent) attributes from observable indicators. Grades are usually assigned by fiat, but they are used as if they came from representational measurement or a measurement model!

The term “by fiat” already sounds like something fishy is going on, but it has some good uses. Generally, it applies to anything that relies on the judgment of a person. For example, judging the Olympic event of figure skating relies on measurement by fiat. Although scores are based on which “required elements” are in a performance, the judges still make a judgment about how well each element was performed. This is a very different way to score compared to speed skating, where the measurement is the amount of time it takes to go a certain distance (that would be a representational measurement). Everyone seems to understand that there is something very different between these two ways of measuring. In theory, a speed skating race time could be used as evidence that a skater was better than everyone else in the world, even if they had never directly raced them. But figure skating scores cannot be used in the same way!

When I have discussed my misgivings about grades with administrators and teachers, the conversation usually degrades quickly. Most people don’t care about the differences between measurement by fiat, representational measurement, and measurement models. But it is a problem when we don’t see the difference. When grades are compiled into a “grade point average” (GPA), we are assuming the individual grades are additive and make up a single measure of a person. That is a characteristic of measurement models, and it does not even hold for every representational measurement (for example, two equally loud sources don’t add up to double the decibel level in a room). And are math grades and creative arts grades really additive, and is the average actually useful?! Is a student’s performance in music class really translatable into a set of numbers? Is there really a difference between a 3.7 and a 3.8?! (Yes, my HS grade point average was somewhere in that range . . . just in the right spot to be denied for some things.) There is probably little difference between those two as an indicator of future success, but because we treat GPA as if it came from a measurement model, it becomes an example of “garbage in, garbage out”.
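To make the additivity point concrete, here is a small sketch in Python. It is only an illustration under stated assumptions: the decibel part assumes uncorrelated sound sources whose power adds, and the two students and their grade points are made up for the example. The function names are mine, not anything from Coombs or Torgerson.

```python
import math


def combine_decibels(levels_db):
    """Combine sound levels by adding power, not decibel values.

    Assumes uncorrelated (incoherent) sources, a common simplification.
    """
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)


def grade_point_average(grade_points):
    """Plain arithmetic mean: exactly the additivity a GPA takes for granted."""
    return sum(grade_points) / len(grade_points)


# Two 60 dB sources together are about 63 dB, nowhere near 120 dB.
two_sources = combine_decibels([60, 60])
print(round(two_sources, 1))

# Hypothetical students: one lopsided profile, one flat profile.
specialist = grade_point_average([4.0, 4.0, 2.0, 2.0])
generalist = grade_point_average([3.0, 3.0, 3.0, 3.0])
print(specialist, generalist)  # same GPA, very different students
```

Even a physical quantity like loudness needs the right model before its numbers can be combined, while the GPA simply averages and ends up treating two very different students as identical.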

The reason that grades should not be used to make a GPA is that a measurement model requires a falsifiable model and a separation between observations and indicators. If grades were based on a measurement model, then it would be possible for a student to get bad grades and for that to mean that the grading system is incorrect. I don’t think schools tend to operate that way, so it is not a falsifiable model. The separation between observations and indicators means that we create measures that don’t rely “on presumed relationships between observations and the concept of interest” (Torgerson, 1958, p. 22). When observations and indicators are equated, we don’t really know if we measured the students or if we measured the item itself. A simplified example is a question like “Is the dress blue or white?” The question would seem to have a correct answer, but we don’t know if we are measuring the perception of the person or the characteristics of the object. Grades may be telling us more about the content or difficulty of the course than about an individual student’s achievement. And, in my opinion, that makes them suck.

So, there’s a little bit about the idiosyncrasy of numbers. The fun part is up next: some of the cultural and intellectual ramifications of grades.

Coombs, C. H. (1964). A theory of data.

Torgerson, W. S. (1958). Theory and methods of scaling.