Using Natural Language Processing Models to Evaluate STEM Book Coherence
- Hilary Miller, Department of Psychology, Emory University, Atlanta, Georgia, United States
- Phillip Wolff, Psychology / Linguistics, Emory University, Atlanta, Georgia, United States
Abstract

Learning in the STEM disciplines depends on high-quality STEM books, but choosing a textbook can be difficult in the absence of objective measures of text quality. Here we compared two natural language processing approaches for evaluating text cohesion. In Coh-Metrix (Graesser et al., 2004), text cohesion is indicated by the mean cosine of all possible pairs of sentence vectors, with sentence vectors based on LSA. We introduce a new method for measuring text coherence based on the deep learning language model RoBERTa (Liu et al., 2019). In this new approach, coherence is measured as the average predictability of the words in a text, with each word's predictability a function of its linguistic context. Coherence as measured by RoBERTa more closely matched the coherence ratings of human judges than did coherence as measured by Coh-Metrix. Implications for the assessment and categorization of STEM books are discussed.
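To make the RoBERTa-based idea concrete, the sketch below illustrates one way such an average-predictability score could be computed: each token is masked in turn and the model's log-probability for the original token, given its surrounding context, is averaged over the text. This is an illustrative assumption about the scoring procedure, not the authors' implementation; the model name (roberta-base) and the pseudo-log-likelihood formulation are choices made here for demonstration.

```python
# Illustrative sketch (not the authors' code): coherence as the average
# predictability of each word under RoBERTa, computed as a masked-LM
# pseudo-log-likelihood. Model choice and scoring details are assumptions.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

def average_word_predictability(text: str) -> float:
    """Mask each token in turn and average the log-probability that
    RoBERTa assigns to the original token given its context."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"][0]
    log_probs = []
    with torch.no_grad():
        # Skip the special <s> and </s> tokens at the ends.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs.append(
                torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
            )
    return sum(log_probs) / len(log_probs)

# Higher (less negative) values indicate more predictable, and by this
# measure more coherent, text.
print(average_word_predictability("Force equals mass times acceleration."))
```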