How Getting Computers To Mark NAPLAN Tests Could Go Wrong


Early May would be incomplete without some NAPLAN controversy. This year’s comes from the announcement last week that the national exam sat by students across the country in Years 3, 5, 7 and 9 is to be marked by computers in 2017.


Part of the argument for moving to online marking is that it will decrease turnaround time from months to just weeks. While this is uncontroversial for multiple-choice-style tests, which have a correct answer, it is much more problematic when applied to creative writing.

Can computers mark creative writing?

The NAPLAN written task is usually a narrative or persuasive task and is an extended piece of prose. The marking criteria include audience, text structure, cohesion, vocabulary, paragraphing, sentence structure, punctuation and spelling.

When writing persuasive texts, the guide explains that:

students are required to write their opinion and to draw on personal knowledge and experience when responding to test topics.

The guide also explains that for narrative texts, there should be a:

growing understanding that the middle of the story needs to involve a problem or complication that introduces conflict, danger or tension that must be resolved. It is this uncertainty that draws the reader in and builds suspense.

The question is whether computers can appropriately mark students’ creative writing with this level of sophistication.

According to the Australian Curriculum, Assessment and Reporting Authority (ACARA), they can.

The approach being taken is one that uses supervised machine learning, where sample tests marked by humans are fed into an algorithm that learns how to recognise quality responses by reverse-engineering scoring decisions. Trials conducted by ACARA have demonstrated that:

artificial intelligence solutions perform as well, or even better, than the teachers involved.

One argument is that computer marking has less variability than human markers, although such claims about marker reliability are contested.

For example, what would happen if a student were to submit a nonsense piece that happened to meet the expectations of the algorithm?
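ACARA has not published its marking algorithm, but the supervised approach described above can be illustrated with a deliberately crude sketch: invented surface features (essay length, vocabulary richness, mean word length) and a nearest-neighbour scorer that simply copies the mark of the most similar human-marked sample. All feature choices and data here are made up for illustration; a real system would be far more elaborate.

```python
import math

def features(text):
    """Crude, invented surface features of an essay (for illustration only)."""
    words = text.split()
    n = max(len(words), 1)
    return (
        float(len(words)),                            # essay length
        len(set(w.lower() for w in words)) / n,       # vocabulary richness
        sum(len(w) for w in words) / n,               # mean word length
    )

def train(samples):
    """'Training' a nearest-neighbour scorer is just memorising
    the human-marked examples as (feature vector, score) pairs."""
    return [(features(text), score) for text, score in samples]

def predict(model, text):
    """Score a new essay by copying the mark of the nearest training essay
    in feature space. Note the model never reads for meaning: any text with
    similar surface statistics gets a similar score."""
    f = features(text)
    nearest = min(model, key=lambda item: math.dist(f, item[0]))
    return nearest[1]
```

The last comment is the point of the worry raised above: because the model only sees surface statistics, a nonsense essay that happens to match the features of a high-scoring sample would receive the same high mark.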

Automated marking is not a new thing. It has been particularly visible since the rise of MOOCs and the search for a cheap alternative for marking student papers.

The research literature provides a mixed picture of potential benefits and pitfalls, yet there has been vocal opposition to computer marking from academics and educationalists.

The rise of algorithms can be seen in many places, including chess-playing computers, self-driving cars, metadata analysis to predict behaviour, online advertising, speech-recognition software and auto-completing search engines. It seems only logical that algorithms would enter our classrooms.

What actually matters in education?

One thing that strikes me as ironic is that we would be using computers, which can’t actually read or write, to test the reading and writing of our students. Is the next step to replace our teachers with robot instructors who can provide standardised, objective and completely emotionless feedback in the classroom?

How can a computer assess creativity and flair? How would it recognise irony, wit and humour? What about writers who use unconventional approaches for effect?

While algorithms can easily process literal meaning, what happens with inferential meaning or drawing on rich contexts, background knowledge, prior learning, cultural and social discourses? These are all part of the complex tapestry of human meaning-making in reading and writing.

As one example, the NAPLAN marking guide refers to the use of classical rhetorical discourse in persuasive writing, including:

Pathos — appeal to emotion

Ethos — appeal to values

Logos — appeal to reason.

I have not yet come across a computer except in science-fiction films that has emotions or values that could be appealed to in any persuasive sense.

There are serious concerns that computer marking of the NAPLAN writing task will have unintended effects on teaching and learning, including online reading and writing strategies different to those of traditional print-based comprehension and composition.

A further concern is that computer marking will have a reductive effect on student writing, with “teaching to the test” becoming more of a problem than it already is.

Maybe it isn’t that far-fetched to imagine computers marking assignments and robots teaching classrooms. After all, there are predictions that we will reach the singularity, the point at which artificial intelligence overtakes humans, in 2029.

Wouldn’t ACARA be better off putting the money into something that has an impact on the quality of learning of students in Australian schools rather than conducting this particular experiment? To be focusing on test scoring that is faster and cheaper seems to be at odds with what actually matters in education.

Until we reach the singularity, perhaps we should focus on improving equity and access for students who are most disadvantaged in our education system, and leave the robots out of it.

Stewart Riddle is Senior Lecturer at University of Southern Queensland.

This article was originally published on The Conversation. Read the original article.


  • We were responding to exactly this in one of our meetings recently. This will fail, and it will fail *hard*. It will also be manipulated in a huge way to make it look like they’re getting the results they want initially, just like the Naplan results were manipulated to show preferential results in the first few years.

    A computer program cannot operate outside its initially programmed, preset boundaries. If a child’s creativity expands beyond those boundaries and shows true, genius-level approaches, you’re not going to get a computer system that comprehends that. What if the system marks that child in the bottom band before considering that the child may have just ‘shattered’ its algorithms? When you have millions of kids per year sitting this garbage test, it IS going to happen; it’s just a question of when. When will someone hit that magic combination that goes against the Naplan system’s preset algorithms?

    Like the article said, what if you get some child who DOESN’T comprehend? Who writes crap but somehow flukes that rubbish into the top band of literacy? Numeracy is easier to mark this way, but literacy? No.

    I’m pulling my child from all Naplan-oriented activities. He’s hitting grade 7 next year; he doesn’t need to be distracted by this absolute crap. All parents out there need to research this and ask themselves: ‘Is the distraction to your child’s education, and the potential loss to it, worth it so some beancounters can satisfy themselves with arbitrary statistics from a less-than-arbitrary standardised test?’

    It’s amazing how as teachers we’re constantly, CONSTANTLY told ‘Do not teach to the test’, yet the Naplan is 110% the antithesis of that. You are teaching exactly to the test with it. That’s precisely what it is.

    And that’s just part of the reason why it’s absolute horseshit.


    Just this week John Oliver addressed the problems of standardised testing in US schools. It was not a positive picture, and it felt like looking at the Australian system in about 5 years.

    All the language used in that piece is the same as what we’re hearing here in Australia: value-added analysis, accountability, and evaluating schools and teachers on test results.

    And where is all this technology coming from? A handful of companies who sell these tests, the marking systems, and the support materials. They’re here too.

    Test developers are engaged to develop questions that meet the endorsed test specifications. ACARA contracts out this part of the process to organisations that successfully demonstrate exceptional experience and competence in the area of test development.

    Personally, I’d like to see the Gonski Reforms implemented in full. There’s simply no point in any testing unless you’re also making sure that the schools which need more help are actually getting it, through a needs-based funding model.

    • Hopefully, hopefully it’ll be brought in next year when the next Government replaces this one. This one seems anti-education, anti-science, anti… anything that gives people any sort of information. Hopefully.
