Monday, April 7, 2014


Automated Scoring of Writing Quality

Machines and Human Writing


  I am sitting in a lab with students preparing for the TOEFL iBT exam. My left ear catches phrases like "In this set of materials...", "The listening passage discusses the difference between two types of bacteria...", while my right ear, to its great surprise, catches the second halves of the same sentences: "... the reading passage is a news bulletin on a job announcement, while the listening passage...", "... the reading passage casts doubt on the information in the listening passage". The same "automated phrases" are also used in TOEFL writing, as experience has shown me. With a faint smile, I lazily pity the poor person who checks those essays.
   
  However, a recent discovery of mine about TOEFL and other high-stakes tests is that the essays written by students are checked not only by human scorers but also by special automated scoring engines (the e-rater® in the case of TOEFL). The scores of the human rater and the program are compared, and a final score is then assigned.
  The advantages of an automated engine are obvious:
a) Objectivity: A computer program has neither interests nor judgments. There is no need to worry that it might be prejudiced against you just because you mentioned Justin Bieber as a person you thoroughly admire.

b) Low cost: Needless to say, a computer program is an ideal employee in terms of money. Once it is installed, it only requires careful maintenance. The rest is obedience and hard work.

 Unfortunately, automated essay scoring (AES) cannot be used as the sole tool for evaluating writing (in the case of high-stakes exams, at least). Although AES results have often correlated with human scores, it is almost impossible to imagine a computer program fairly evaluating the highly complex nature of human writing. Let's have a look at the criteria that AES engines take into account when evaluating writing:

  • errors in grammar (e.g., subject-verb agreement)
  • usage (e.g., preposition selection)
  • mechanics (e.g., capitalization) 
  • style (e.g., repetitious word use)  
  • discourse structure (e.g., presence of a thesis statement, main points) 
  • vocabulary usage (e.g., relative sophistication of vocabulary)
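
To make a few of these criteria concrete, here is a minimal Python sketch of how surface features of this kind could be counted. It is my own toy illustration, not how the e-rater® or any real engine works: the function name `surface_features`, the tiny stop-word list, and the choice of average word length as a stand-in for "vocabulary sophistication" are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to", "it", "this"}


def surface_features(essay: str) -> dict:
    """Count a few crude surface features of an essay (toy example only)."""
    # Split into sentences after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", essay.strip())
    sentences = [s.strip() for s in parts if s.strip()]

    # Lower-cased word tokens, with and without stop words.
    words = re.findall(r"[A-Za-z']+", essay.lower())
    content_words = [w for w in words if w not in STOP_WORDS]

    # Mechanics: sentences that start with a lower-case letter.
    uncapitalized = sum(
        1 for s in sentences if s[0].isalpha() and not s[0].isupper()
    )

    # Style: share of content words taken by the single most repeated word.
    counts = Counter(content_words)
    top_word, top_count = counts.most_common(1)[0] if counts else ("", 0)
    repetition = top_count / len(content_words) if content_words else 0.0

    # Vocabulary: average word length as a very rough sophistication proxy.
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0

    return {
        "sentence_count": len(sentences),
        "uncapitalized_sentences": uncapitalized,
        "most_repeated_word": top_word,
        "repetition_ratio": repetition,
        "average_word_length": avg_len,
    }


if __name__ == "__main__":
    sample = "The passage is good. the passage is nice. Passage passage!"
    print(surface_features(sample))
```

Counts like these are cheap to compute, which is probably why they are attractive for a machine, but nothing in the sketch looks at what the essay actually argues.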

 While aspects such as prepositions, subject-verb agreement and capitalization can plausibly be handled by an artificial intelligence, more complex parts of human language such as syntax, quality of argumentation, collocation, punctuation, appropriacy of vocabulary, etc. seem impossible to assess without human intervention. The quality of argumentation, for example, has little to do with the complexity of vocabulary.
