Does Language Model Surprisal Measure Code Comprehension?

Abstract

Recognition of the similarities between programming and natural languages has led to a boom in the adoption of language modeling techniques in tools that assist developers. However, language model surprisal, which guides training and evaluation in many of these methods, has not been validated as a measure of cognitive difficulty for programming language comprehension as it has been for natural language. We perform a controlled experiment to evaluate human comprehension of source code fragments that are meaning-equivalent but differ in surprisal. We find that humans take longer to answer correctly on the more surprising versions of the code. We also provide practical guidelines for designing future studies of code comprehension and surprisal.

