Automated scoring – Humans have become as bad as robots at grading essays

April 28, 2014 6:00 pm
Published by Syngli

In an article in The Chronicle of Higher Education, author Steve Kolowich describes the Babel (Basic Automatic B.S. Essay Language) Generator, a satirical creation of Les Perelman and his students at MIT. The software generates essays from keywords alone. The article gives a sample from an essay generated from the single keyword “privacy”:

“Privateness has not been and undoubtedly never will be lauded, precarious, and decent… Humankind will always subjugate privateness.”

Perelman’s intent is to expose the limitations of automated essay scoring (AES) algorithms, which he argues don’t measure “any of the real constructs that have to do with writing”. Indeed, the “privacy” essay received a score of 5.4 out of 6 from the MY Access! online essay scoring software (also used as one of two readers by the GMAT), with its “language use and style” and “focus and meaning” both rated “advanced”.

Photo by Aneta Pawlik on Unsplash.

The article does note that AES algorithms, since their introduction in 1973 (Ajay et al., 1973), have advanced to the point where they can provide ratings similar to those of human raters. It cites work by Shermis, who recently published a study that found reliably similar scores awarded by humans and automated algorithms across 22,029 essays by junior-high and high-school students (Shermis, 2014, Assessing Writing). (Perelman published a critique of an earlier version of the Shermis study in 2013 in The Journal of Writing Assessment.)

This agreement would seem to imply either that automated scoring algorithms have become very good at understanding language, or that human scorers of standardized tests treat these essays in much the same way as a robot would, e.g. by identifying keywords, complex sentences, and passage length. There’s no doubt which interpretation Babel’s creator favors. Of the human graders of assessments like the SAT writing test, writes Kolowich, “Mr. Perelman says the rubric is so rigid, and time so short, that they may as well be robots”. Whether standardized exams can ever feasibly assess high-level writing abilities remains the elephant in the room.
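To make the "robot" reading concrete, here is a minimal sketch in Python of a scorer that looks only at the surface features named above: keywords, sentence complexity (approximated by sentence and word length), and passage length. This is not how MY Access! or any real AES engine works. The function names, the feature set, and the weights are all made up for illustration.

```python
import re


def surface_features(essay: str, keywords: set[str]) -> dict[str, float]:
    """Count surface features only; nothing here checks meaning."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = len(words)
    return {
        "length": float(n_words),
        "keyword_hits": float(sum(w in keywords for w in words)),
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
    }


def toy_score(essay: str, keywords: set[str]) -> float:
    """Combine the features with arbitrary (hypothetical) weights, clamped to 1-6."""
    f = surface_features(essay, keywords)
    raw = (
        0.01 * f["length"]
        + 0.5 * f["keyword_hits"]
        + 0.1 * f["avg_sentence_length"]
        + 0.3 * f["avg_word_length"]
    )
    return max(1.0, min(6.0, raw))


if __name__ == "__main__":
    keywords = {"privacy", "privateness"}
    babel = (
        "Privateness has not been and undoubtedly never will be lauded, "
        "precarious, and decent. Humankind will always subjugate privateness."
    )
    plain = "Privacy matters. People need it."
    print(toy_score(babel, keywords), toy_score(plain, keywords))
```

A scorer of this kind rewards the Babel sample over a short, plain statement because long words, long sentences, and repeated keywords are all it can see. This is the behaviour Perelman attributes to both the machines and the rushed human graders.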

Tags: graduate management admission test (GMAT), language, scholastic aptitude test (SAT), standardized testing
