Cornell Cognitive Studies Symposium

Statistical Learning across Cognition

Simplicity as a Principle for Language Acquisition

Nick Chater
University of Warwick
N.Chater@warwick.ac.uk

 

Viewing language learning as a statistical problem, the goal is primarily to learn the probabilistic structure of actual language use, rather than to find an abstract grammar that underpins grammaticality judgments. One way of measuring the effectiveness of this statistical analysis is to use prediction: the better the model of the language that the learner has uncovered, the better that learner should be able to predict future linguistic input. The prediction problem has been studied in a very general statistical setting by Solomonoff (1978), who shows that an optimal way to make predictions is, roughly, to find the 'algorithm' or 'program' that is the most compressed representation of the data encountered so far, and then to predict the next items according to their probabilities under that algorithm (probabilistic structure can be handled by allowing the program or algorithm to take random noise, such as coin flips, as an additional input). Moreover, Solomonoff shows that the expected sum of squared differences between the predictions made by this method and the true probabilities is finite, and bounded by the length of the shortest program for the data (multiplied by a small constant factor); a standard statement of this bound is sketched after the list below. This foundational result in statistics, rooted in the mathematical theory of Kolmogorov complexity, can be applied directly to the language learning case. Specifically, we show that, in a rigorous probabilistic sense,

i. language can be learned from positive evidence alone (where learning is judged by prediction)

ii. language production can, in principle, be learned from exposure to language, in a certain sense.

iii. accurate grammaticality judgments can be learned from exposure to language, at least on sentences that actually have a non-negligible probability of occurring in the language

iv. if the learning problem includes semantics, and the learner is trained on form-meaning mappings, these mappings can also be learned by induction
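For concreteness, the convergence result behind these claims can be stated roughly as follows. This is a sketch of the standard modern form of the bound for binary sequences; the notation, with mu the true computable distribution generating the input, M the predictions of the shortest-program method, and K(mu) the length of the shortest program computing mu, is introduced here for this summary rather than taken from the talk:

\[
\sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[ \bigl( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \bigr)^{2} \right] \;\le\; \frac{\ln 2}{2}\, K(\mu)
\]

Because the right-hand side is a fixed, finite quantity for any computable generating distribution, the squared prediction errors must converge to zero; this is the sense in which compression-based prediction can succeed from positive evidence alone.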

These results are theoretical and asymptotic: they take no account of computational limitations, or of the specific amount and quality of data available to the child. As a small step towards a more realistic statistical model of language acquisition, we

v. give some small-scale examples where apparently puzzling aspects of language acquisition, concerning 'alternations', can be solved by a learner using a 'scaled-down' version of the above approach (a toy illustration is sketched below).
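To illustrate the flavour of the scaled-down approach in (v), the following toy calculation, written here in Python, shows how a two-part 'shortest description' comparison lets a learner retreat from an overgeneral alternation hypothesis using positive evidence alone. It is a sketch under invented assumptions: the frame labels, bit costs, and probabilities are illustrative and are not taken from the actual models discussed in the talk.

import math

def code_length_bits(prob):
    # Ideal code length, in bits, for an event with the given probability.
    return -math.log2(prob)

def total_description_length(hypothesis_cost_bits, frame_probs, observed_frames):
    # Two-part code: bits to state the hypothesis, plus bits to encode the
    # observed frames under the probabilities the hypothesis assigns to them.
    data_bits = sum(code_length_bits(frame_probs[f]) for f in observed_frames)
    return hypothesis_cost_bits + data_bits

# Toy alternation problem (all numbers are illustrative assumptions): a verb
# has been heard 50 times, always in the prepositional-dative frame ('PD'),
# never in the double-object frame ('DO').
observations = ['PD'] * 50

# Hypothesis A: the verb alternates, using both frames equally often.  Cheap
# to state (no restriction need be recorded) but wasteful on the data.
alternating = {'PD': 0.5, 'DO': 0.5}

# Hypothesis B: the verb is restricted to 'PD'.  Stating the restriction costs
# a few extra bits (an assumed 10 here), but the data are then almost free.
# A small probability is reserved for 'DO' so the code remains well defined.
restricted = {'PD': 0.99, 'DO': 0.01}

dl_alternating = total_description_length(2.0, alternating, observations)
dl_restricted = total_description_length(10.0, restricted, observations)

print(f"alternating hypothesis: {dl_alternating:.1f} bits")
print(f"restricted hypothesis:  {dl_restricted:.1f} bits")

The point of the comparison is simply that once enough attested uses accumulate, the extra bits spent stating a restriction are repaid many times over by the shorter encoding of the data, so the more restrictive hypothesis yields the shorter total description and wins without any explicit negative evidence.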

 
