Using Natural Language Processing Models to Evaluate STEM Book Coherence

Abstract

Learning in the STEM disciplines depends on high-quality STEM books, but choosing a textbook can be difficult in the absence of objective measures of text quality. Here we compared two natural language processing approaches for evaluating text cohesion. In Coh-Metrix (Graesser et al., 2004), text cohesion is indicated by the mean cosine value of all possible pairs of sentence vectors, with sentence vectors derived from latent semantic analysis (LSA). We introduce a new method for measuring text coherence based on the deep learning language model RoBERTa (Liu et al., 2019). In this new approach, coherence is measured by computing the average predictability of all of the words in the text, with each word's predictability a function of its linguistic context. Coherence as measured by RoBERTa more closely matched the coherence ratings of human judges than did Coh-Metrix. Implications for the assessment and categorization of STEM books are discussed.
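As a rough illustration of the RoBERTa-based idea described above (a minimal sketch, not the authors' implementation; the choice of the "roberta-base" checkpoint, the masking scheme, and the averaging details are assumptions), one plausible way to score average word predictability is to mask each token in turn, record the probability the model assigns to the original token given its context, and average over the text:

import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

# Assumed checkpoint; the paper does not specify which RoBERTa variant was used.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

def average_predictability(text: str) -> float:
    """Mean probability RoBERTa assigns to each token when that token is masked."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"][0]
    probs = []
    with torch.no_grad():
        # Skip the <s> and </s> special tokens at the start and end.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            original_id = masked[i].item()
            masked[i] = tokenizer.mask_token_id
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
            probs.append(torch.softmax(logits, dim=-1)[original_id].item())
    return sum(probs) / len(probs) if probs else 0.0

print(average_predictability("Force equals mass times acceleration."))

In this sketch, a higher average probability means the words of the passage are more predictable from their surrounding context, which is the sense of "coherence" the abstract describes; a full pipeline would also handle texts longer than the model's context window, for example by scoring overlapping chunks.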

