Do we need neural models to explain human judgments of acceptability?

Abstract

Native speakers can judge whether a sentence is an acceptable instance of their language. Acceptability provides a means of evaluating whether computational language models are processing language in a human-like manner. We test the ability of language models, simple language features, and word embeddings to predict native speakers' judgments of acceptability on English essays written by non-native speakers. We find that much of the variance in sentence acceptability can be captured by a combination of misspellings, word order, and word similarity (r=0.494). While predictive neural models fit acceptability judgments well (r=0.527), we find that a 4-gram model is just as good (r=0.528). Thanks to incorporating misspellings, our 4-gram model surpasses both the previous unsupervised state of the art (r=0.472) and the average native speaker (r=0.46), demonstrating that acceptability is well captured by n-gram statistics and simple language features.
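The 4-gram approach referenced above can be illustrated with a minimal sketch: a count-based 4-gram model with add-one smoothing that scores sentences by log-probability, so that sentences resembling the training data score higher than scrambled ones. This is an illustrative toy, not the paper's implementation; the class and method names (`NgramModel`, `logprob`) are invented for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Pad with start/end symbols so every token has a full (n-1)-word context.
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

class NgramModel:
    """Toy count-based n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, n=4):
        self.n = n
        self.counts = Counter()          # n-gram counts
        self.context_counts = Counter()  # (n-1)-gram context counts
        self.vocab = set()

    def train(self, sentences):
        for toks in sentences:
            self.vocab.update(toks)
            for gram in ngrams(toks, self.n):
                self.counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def logprob(self, tokens):
        # Sum of smoothed log conditional probabilities; a crude
        # acceptability proxy (higher = more like the training data).
        V = len(self.vocab) + 2  # +2 for the <s> and </s> symbols
        lp = 0.0
        for gram in ngrams(tokens, self.n):
            num = self.counts[gram] + 1
            den = self.context_counts[gram[:-1]] + V
            lp += math.log(num / den)
        return lp

model = NgramModel(n=4)
model.train([
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
])
fluent = model.logprob(["the", "cat", "sat", "on", "the", "mat"])
scrambled = model.logprob(["mat", "the", "on", "sat", "cat", "the"])
```

In practice, scores are usually length-normalized before being correlated with human judgments, since longer sentences accumulate more negative log-probability mass regardless of acceptability.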
