Corpus of English Children’s Literature
(COECL)
Would you like access to COECL for researching children’s literature? If so, please message me.
178 unabridged texts of English children’s literature
published between 1900 and 2020
7,603,947 words
includes many texts consistently present in primary schools and children’s libraries
John Newbery Medal winners and nominees
other texts written by John Newbery Medal winners and nominees
texts from New York Public Library’s 125 books recommended for children
COECL for Graphophonemic Analysis
(COECL-GPA)
This corpus for graphophonemic analysis takes the 5,000 most frequent words in COECL (~90% of the original corpus) and codes 8,478,734 vowels into graphotactics (spelling patterns), phonemic sequences (sound patterns), and their correspondences. It is the first corpus of its kind and a resource for studying the natural distribution of English phonics in authentic children’s literature.
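As a rough illustration of how a "most frequent words" cutoff relates to corpus coverage, the sketch below counts word tokens in a text and reports what share of all tokens the top N word types account for. The function name `top_n_coverage` and the simple tokenizer are assumptions made for this example; they are not the procedure used to build COECL-GPA.

```python
import re
from collections import Counter


def top_n_coverage(text: str, n: int) -> tuple[list[str], float]:
    """Return the n most frequent word types and the share of tokens they cover.

    Assumption: a word is a run of letters, optionally followed by an
    apostrophe and more letters (e.g. "don't"). A real corpus pipeline
    would likely use a more careful tokenizer.
    """
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    if not tokens:
        return [], 0.0

    counts = Counter(tokens)
    top = counts.most_common(n)

    # Coverage = tokens belonging to the top-n types / all tokens.
    covered = sum(count for _, count in top)
    return [word for word, _ in top], covered / len(tokens)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    words, coverage = top_n_coverage(sample, 2)
    print(words, coverage)
```

On the toy sentence, the two most frequent types ("the" and "and") cover 5 of 8 tokens. On a full corpus the same calculation would show how much running text a fixed word list accounts for.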
In January 2025, the graphophonemic coding for COECL-GPA was updated with more refined graphotactics. COECL-GPA has two forms: American English (COECL-GPA AE) and Singaporean English (COECL-GPA SE). Please see below for resources from either graphophonemic corpus.
American English
164 graphotactics
91 phonemic sequences
366 graphemic-phonemic correspondences
Singaporean English
164 graphotactics
78 phonemic sequences
348 graphemic-phonemic correspondences
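The counts above summarize distinct spelling patterns, sound patterns, and their pairings. As a minimal sketch of how a frequency distribution over such pairings could be tallied, the example below weights each hand-coded vowel correspondence by the token frequency of the word it occurs in. The four words, their frequencies, the simplified phoneme symbols, and the function name `correspondence_distribution` are all hypothetical and are not drawn from COECL-GPA.

```python
from collections import Counter

# Hypothetical entries: word -> (token frequency, [(vowel spelling, phoneme), ...]).
# Frequencies are invented for illustration only.
CODED_WORDS = {
    "see": (120, [("ee", "i")]),
    "tree": (40, [("ee", "i")]),
    "eat": (60, [("ea", "i")]),
    "bread": (30, [("ea", "ɛ")]),
}


def correspondence_distribution(coded_words):
    """Return the relative frequency of each (spelling, phoneme) pair.

    Each pair is counted once per token of the word that contains it,
    so frequent words contribute more than rare ones.
    """
    counts = Counter()
    for freq, pairs in coded_words.values():
        for pair in pairs:
            counts[pair] += freq

    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


if __name__ == "__main__":
    for pair, share in correspondence_distribution(CODED_WORDS).items():
        print(pair, round(share, 2))
```

In this toy data the spelling "ea" maps to two different phonemes, which is the kind of one-to-many relationship a correspondence count makes visible.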
Paquin, S. (2024). Frequency distribution of graphemic-phonemic correspondences of vowels in English children's literature (Publication No. 31640390) [Master's thesis, University of Massachusetts Boston]. ProQuest Dissertations & Theses Global.
CamTESOL 2025 Presentation
overview of findings
comparison of the natural graphophonemic distribution with phonics curricula
potential importance for phonics instruction in Inner Circle and World Englishes
explanation of how to use lexical sets to phonemically retune graphotactics to World Englishes