Thoughtful Consideration of Automated Essay Scoring

July 6, 2017

Automated Essay Scoring (AES) is an emerging area of assessment technology that is gaining the attention of Canadian educators and policy leaders. It involves training computer engines to rate essays by considering both the mechanics and the content of the writing. Although it is not currently practiced, or even tested on a wide scale, in Canadian classrooms, the scoring of essays by computers is fueling debate and creating a need for further independent research to inform decisions on how this technology should be handled.

However, independent research on automated essay scoring is hard to come by because much of the research is conducted by and for the companies that produce the systems. For that reason, the Society for the Advancement of Excellence in Education (SAEE), through the Technology Assisted Student Assessment Institute (TASA), commissioned Dr. Susan M. Phillips to scan and analyze the current research on this topic from a variety of disciplines, including writing instruction, computational linguistics, and computer science. The purpose of the report, Automated Essay Scoring: A Literature Review, is to communicate a balanced picture of the state of AES research and its implications for K-12 schools in Canada. The review is broad in scope and includes a wide range of perspectives, designed to be of interest to teachers, assessment specialists, developers of assessment technology, and educational policy makers.

Most AES systems were initially developed for summative writing assessments in large-scale, high-stakes situations such as graduate admissions tests (for example, the GMAT). However, the most recent developments have expanded the potential application of AES to formative assessment at the classroom level, where students can receive immediate, specific feedback on their writing while still being monitored and assisted by their teacher.

Numerous software companies have developed different techniques for predicting essay scores from correlations between intrinsic qualities of the writing and the scores that human raters assign. First, the system needs to be trained on what to look for. This is done by entering a number of essays written on the same prompt or question, together with the scores given by human raters. The trained system can then examine a new essay on the same prompt and predict the score that a human rater would give. Some programs claim to mark for both style and content, while others focus on one or the other.
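The train-then-predict cycle described above can be illustrated with a deliberately tiny sketch. It is not how any commercial AES engine works: it assumes a single hypothetical feature (word count) and fits a straight line to the human scores by ordinary least squares. The function names `word_count`, `train`, and `predict` are illustrative only and do not come from the report.

```python
from statistics import mean


def word_count(essay: str) -> int:
    """Hypothetical surface feature: number of whitespace-separated words."""
    return len(essay.split())


def train(essays, human_scores):
    """Fit score = slope * word_count + intercept by ordinary least squares.

    `essays` are texts written on one prompt; `human_scores` are the marks
    that human raters gave them. Assumes the essays differ in length.
    """
    xs = [word_count(e) for e in essays]
    x_bar, y_bar = mean(xs), mean(human_scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, human_scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, essay: str) -> float:
    """Predict the score a human rater might give a new essay on the same prompt."""
    slope, intercept = model
    return slope * word_count(essay) + intercept


# Toy training set: three essays on the same prompt with human-assigned scores.
training_essays = ["one two", "one two three four", "one two three four five six"]
training_scores = [1, 2, 3]

model = train(training_essays, training_scores)
new_essay_score = predict(model, "a b c d e f g h")
```

A model this crude rewards length regardless of meaning, so it would happily score a long nonsense essay highly. Real engines draw on many more features, but the sketch shows why the choice of features and training essays matters.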

On the question of reliability, Phillips (2007) cautions: "to date, there seems to be a dearth of independent comparative research on the effectiveness of the different AES engines for specific purposes, and for use with specific populations…While it would appear that one basis of comparison might be the degree of agreement of specific AES engines with human raters, this also needs to be scrutinized as different prompts, expertise of raters, and other factors can cause different levels of rater agreement."
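To make "degree of agreement" concrete, the sketch below computes three commonly used agreement statistics for two sets of scores on the same essays: exact agreement, adjacent agreement (scores within one point), and Cohen's kappa, which corrects exact agreement for chance. The function names and sample scores are invented for illustration; the report does not prescribe these particular measures.

```python
from collections import Counter


def exact_agreement(a, b):
    """Proportion of essays on which both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b, tolerance=1):
    """Proportion of essays on which the scores differ by at most `tolerance`."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Exact agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = exact_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same score at random,
    # given how often each rater uses each score.
    expected = sum(
        (counts_a[s] / n) * (counts_b[s] / n)
        for s in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        return 1.0  # both raters gave one identical score throughout
    return (observed - expected) / (1 - expected)


# Invented scores for four essays: human rater versus a hypothetical AES engine.
human_scores = [1, 2, 3, 4]
engine_scores = [1, 2, 4, 4]

exact = exact_agreement(human_scores, engine_scores)
adjacent = adjacent_agreement(human_scores, engine_scores)
kappa = cohen_kappa(human_scores, engine_scores)
```

The same pair of score lists can look strong on adjacent agreement and much weaker on kappa, which is one reason a single agreement figure needs the scrutiny Phillips calls for.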

AES has great potential. It can be more objective than human scoring because the computer does not suffer from fatigue or favoritism. Assessment criteria are applied in exactly the same way whether it is the first or the thousandth essay marked on the same prompt. The potential for immediate feedback is also viewed positively when AES is used as a formative assessment tool, because it allows students to work at their own level and at their own pace while receiving feedback on specific problem areas.

This rapid feedback also allows for more frequent testing, leading to greater learning opportunities for students. Using computers to grade essays reduces the marking load of teachers, creating more time for professional collaboration and student-specific instruction. Since computers are being used more often as a learning tool in the classroom, computer-based testing places assessment in the same milieu as learning and provides more accessible statistical data to inform instruction.

However, adopting AES in Canadian schools requires a careful investigation of the potential threats. Some say that it removes human interaction from the writing process. According to The National Council of Teachers of English, writing is a form of communication between an author and a specific audience, and using AES violates the social nature of writing (Phillips, 2007, p. 25). Other concerns relate to whether the systems can adequately detect copied or nonsense essays. Currently, systems need to be trained on specific prompts. This limits the ability of educators to modify or create their own essay questions, potentially creating greater separation between learning and assessment. Additionally, implementing AES in schools involves not only providing access to computers and software, likely purchased from private companies, but also technical support and professional development to sustain its use.

Phillips (2007) highlights important issues to weigh when considering whether AES would be beneficial to implement at the K-12 level in Canada, and concludes her review with eight recommendations in the areas of pedagogy, technology research and educational policy.

To order a copy of the full report or to request a presentation by Dr. Phillips, please contact the Society for the Advancement of Excellence in Education at http://www.saee.ca.
