Recent research indicates that although leading generative AI models are being tested for grading university essays, they do not yet match human graders. The models achieved accuracy comparable to human evaluators in only about 50% of cases.
Key Findings:
- The AI systems often misjudged both high- and low-quality submissions.
- Current AI technology tends to reward stylistic elements over substantive content.
- Further improvements are needed before AI can reliably assess academic writing.