CEPLEXicon - A Lexicon of Child European Portuguese

CEPLEXicon is a lexicon based on two corpora of child speech: the Santos corpus (Santos, 2006; Santos et al., 2014) and the Freitas corpus (Freitas, 1997; Freitas et al., 2012) (part of AcEP). The lexicon results from the automatic tagging of the two corpora, using a tagger and the POS tag set produced in the research unit ANAGRAMA (Centro de Linguística da Universidade de Lisboa - CLUL) (Généreux, Hendrickx & Mendes, 2012). The automatic tagging was followed by a partial manual revision (as described in the manual).


This lexicon covers all the speech produced by seven monolingual Portuguese children aged 1;02.00 to 3;11.12, for a total of 114 files, each corresponding to 40-50 minutes of child-adult interaction in a naturalistic setting. The lexicon is presented in .xls format and includes 2201 lemmas, the number of occurrences of each lemma in three different age periods (<2 years of age; ≥ 2 and < 3 years of age; ≥ 3 years of age), the frequency of the lemma in each period, and the age of first occurrence for each child.
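
The three age periods can be illustrated with a minimal Python sketch. It assumes that ages such as 1;02.00 follow the years;months.days notation common in child language corpora; the function names and period labels are hypothetical and are not part of the lexicon itself. The sketch does not read the .xls file, since its column layout is not described here.

```python
import re

# Assumed notation: years;months.days (e.g. "1;02.00"). The days part is optional.
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})(?:\.(\d{1,2}))?$")


def parse_age(age):
    """Split an age string into (years, months, days) as integers."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"Unrecognised age format: {age!r}")
    years = int(match.group(1))
    months = int(match.group(2))
    days = int(match.group(3)) if match.group(3) else 0
    return years, months, days


def age_period(age):
    """Map an age string to one of the three age periods used in the lexicon."""
    years, _, _ = parse_age(age)
    if years < 2:
        return "<2"
    if years < 3:
        return ">=2 and <3"
    return ">=3"


if __name__ == "__main__":
    # The youngest and oldest ages mentioned in the description above.
    for sample in ("1;02.00", "3;11.12"):
        print(sample, parse_age(sample), age_period(sample))
```

Under these assumptions, the youngest age in the corpora (1;02.00) falls in the first period and the oldest (3;11.12) in the third.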

CEPLEXicon was developed at ANAGRAMA (CLUL, Faculdade de Letras da Universidade de Lisboa), under the project Complement Clauses in the Acquisition of Portuguese (PTDC/CLE-LIN/120897/2010), funded by Fundação para a Ciência e a Tecnologia.

The full reference to CEPLEXicon should be included in all types of work using it as a source of information, including books, papers, conference presentations or posters, evaluation tools or any other products.

How to cite CEPLEXicon (version 1.1):

Santos, Ana Lúcia, Maria João Freitas & Aida Cardoso (2014) CEPLEXicon - A Lexicon of Child European Portuguese. Lisboa: Anagrama (CLUL, FLUL). ISLRN: 408-817-203-152-3, ELRA ID: ELRA-L0094
Link to ELRA Catalogue:
 http://catalog.elra.info/product_info.php?products_id=1244