"I Would Have Written My Code Differently": Beginners Struggle to Understand LLM-Generated Code

Yangtian Zi, Luisa Li, Arjun Guha, Carolyn Jane Anderson, and Molly Q Feldman, 2025

Large language models (LLMs) are increasingly being adopted for programming work. Prior work shows that while LLMs accelerate task completion for professional programmers, beginning programmers struggle to prompt models effectively. However, prompting is only half of the code generation process: once code is generated, it must be read, evaluated, and integrated (or rejected). How accessible are these tasks for beginning programmers?

This paper measures how well beginners comprehend LLM-generated code and explores the challenges students face in judging code correctness. We compare how well students understand natural language descriptions of functions versus LLM-generated implementations, studying 32 CS1 students on 160 task instances. Our results show a low per-task success rate of 32.5%, with struggles observed across all demographic groups. Key challenges include barriers for non-native English speakers, unfamiliarity with Python syntax, and automation bias. Our findings highlight the barrier that code comprehension presents to beginning programmers seeking to write code with LLMs.

@inproceedings{zi:reverse-charlie,
  title = "``I Would Have Written My Code Differently'': Beginners Struggle to Understand {LLM}-Generated Code",
  author = "Yangtian Zi and Luisa Li and Arjun Guha and Carolyn~Jane Anderson and Molly~Q Feldman",
  year = 2025,
  booktitle = "Human-Centered AI for Software Engineering (HumanAISE)"
}