Cornell Cognitive Studies Symposium

Statistical Learning across Cognition

Sino-Japanese Character Recognition and Multiple Constraints on a Statistical Learner

Donald Thometz and Rick Dale
Cornell University
dpt4@cornell.edu    rad4@cornell.edu

We present a model of a statistical learner (a simple recurrent network; SRN) faced with the task of learning to read Sino-Japanese characters. Character recognition has been shown to be facilitated by multiple variables, including global shape, the central character region, and even initial stroke sequences. We summarize human experimental work supporting this claim. In one experiment, participants were primed with different sources of visual information before a character recognition task. Global shape (the overall shape of the character) and stroke sequence (presentation of the three initial strokes) both significantly enhanced performance, indicating that no single source of information underlies character recognition. Instead, we argue that recognition is governed by the integration of multiple sources of information. The neural network model was designed to implement this skill as a statistical learning task (i.e., a task requiring sensitivity to the statistical properties of the characters). We used the SRN architecture in order to model both the sequential and the visual information inherent in the task. Input to the networks was a visual representation (12x12 grid) of the individual strokes composing a character, presented in sequence. The networks were required to predict the correct character upon each stroke presentation, and were trained on a corpus of 76 characters (those learned by first-graders in Japan). In trained networks, character prediction from strokes was facilitated by global shape, and prediction from global shape was facilitated by initial stroke sequences. We discuss these results in view of a multiple-constraints approach to cognition in general, and consider their implications for statistical learning.
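The stroke-by-stroke prediction setup described above can be illustrated with a minimal, untrained sketch of an Elman-style SRN forward pass. Only the 12x12 input grid and the 76 output characters come from the abstract. The hidden-layer size, the tanh and softmax activations, the weight initialization, the context reset at the start of each character, and the helper names (SimpleRecurrentNetwork, predict_per_stroke, horizontal_stroke) are assumptions made for illustration and are not the authors' implementation. Training is omitted.

```python
import math
import random

GRID = 12 * 12        # 12x12 stroke grid from the abstract, flattened
N_CHARS = 76          # size of the training corpus in the abstract
N_HIDDEN = 30         # assumption: hidden-layer size is not given in the abstract


def make_matrix(rows, cols, rng):
    """Small random weights; the initialization scheme is an assumption."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    """Turn output activations into a distribution over characters."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRecurrentNetwork:
    """Elman-style SRN: the hidden state is copied back as context."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(N_HIDDEN, GRID, rng)       # stroke -> hidden
        self.w_ctx = make_matrix(N_HIDDEN, N_HIDDEN, rng)  # context -> hidden
        self.w_out = make_matrix(N_CHARS, N_HIDDEN, rng)   # hidden -> character

    def predict_per_stroke(self, strokes):
        """Return one distribution over the 76 characters after each stroke."""
        context = [0.0] * N_HIDDEN  # assumption: context is reset per character
        predictions = []
        for stroke in strokes:
            net_in = matvec(self.w_in, stroke)
            net_ctx = matvec(self.w_ctx, context)
            hidden = [math.tanh(a + b) for a, b in zip(net_in, net_ctx)]
            predictions.append(softmax(matvec(self.w_out, hidden)))
            context = hidden  # copy-back step that gives the SRN its memory
        return predictions


def horizontal_stroke(row):
    """Toy stroke: one filled row of the 12x12 grid (illustrative only)."""
    return [1.0 if i // 12 == row else 0.0 for i in range(GRID)]


if __name__ == "__main__":
    net = SimpleRecurrentNetwork(seed=0)
    toy_character = [horizontal_stroke(2), horizontal_stroke(6), horizontal_stroke(10)]
    for step, dist in enumerate(net.predict_per_stroke(toy_character), start=1):
        best = max(range(N_CHARS), key=lambda k: dist[k])
        print(f"after stroke {step}: most likely character index {best}")
```

Because the context layer carries the hidden state forward, the same stroke yields a different prediction depending on which strokes preceded it, which is the property that lets such a network exploit initial stroke sequences.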

 
