Santos - European Portuguese
Corpus of child and child-directed speech 


Santos - European Portuguese is a corpus of child and child-directed speech, transcribed according to the CHILDES (Child Language Data Exchange System) conventions using the CLAN software (MacWhinney, 2000). It comprises around 52 hours of child-adult interaction, with 27,595 child utterances and 70,736 adult utterances. The corpus is part of AcEP (for a full description, see Santos, 2006 and Santos et al., 2014) and is available in the CHILDES database, from this link. The corpus is annotated with a tagger developed at CLUL (Généreux, Hendrickx & Mendes, 2012); the POS tags used are presented here. This corpus is registered under the following ISLRN: 532-620-702-768-3.

The corpus includes data from three children, as described in the table below:

Child   Age                 MLUw            Number of files   Number of child’s utterances
INI     1;6.6 - 3;11.12     1.530 - 3.827   21                6,591
TOM     1;6.18 - 3;10.16    1.286 - 3.089   30                15,548
INM     1;5.9 - 2;9.3       1.345 - 2.834   16                5,456
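
The ages in the table use the CHILDES notation years;months.days (for example, 1;6.6 is one year, six months and six days). As an illustration only, the following minimal Python sketch parses that notation and checks that the per-child utterance counts add up to the corpus total given above. The names ROWS, parse_age and age_in_months are hypothetical helpers introduced for this example and are not part of CLAN or of the corpus distribution; the values are copied by hand from the table.

```python
import re

# Rows copied by hand from the table in this description:
# (child, first age, last age, number of files, number of child's utterances)
ROWS = [
    ("INI", "1;6.6", "3;11.12", 21, 6591),
    ("TOM", "1;6.18", "3;10.16", 30, 15548),
    ("INM", "1;5.9", "2;9.3", 16, 5456),
]

# CHILDES age notation: years;months.days
AGE_PATTERN = re.compile(r"^(\d+);(\d+)\.(\d+)$")


def parse_age(age):
    """Split a 'years;months.days' string into a (years, months, days) tuple."""
    match = AGE_PATTERN.match(age)
    if match is None:
        raise ValueError(f"unexpected age format: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_in_months(age):
    """Return the age in completed months (days are ignored, a simplification)."""
    years, months, _days = parse_age(age)
    return years * 12 + months


# Totals across the three children.
total_files = sum(row[3] for row in ROWS)
total_child_utterances = sum(row[4] for row in ROWS)

if __name__ == "__main__":
    for child, first, last, files, utterances in ROWS:
        span = age_in_months(last) - age_in_months(first)
        print(f"{child}: {files} files, {utterances} utterances, about {span} months covered")
    print(f"Total child utterances: {total_child_utterances}")
```

The summed count is 27,595, which matches the number of child utterances reported for the corpus as a whole.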


Any work that uses this corpus as a source of information should cite:

Santos, A. L. (2006). Minimal Answers. Ellipsis, Syntax and Discourse in the Acquisition of European Portuguese. Ph.D. Dissertation. Universidade de Lisboa. (Published 2009, Amsterdam / Philadelphia: John Benjamins).

Santos, A. L., M. Généreux, A. Cardoso, C. Agostinho, S. Abalada (2014). A corpus of European Portuguese child and child-directed speech. In Proceedings of the 9th Conference on Language Resources and Evaluation – LREC 2014. European Language Resources Association (ELRA).

   


This corpus (or one of its previous versions) has served as the basis for other databases:

Santos, Ana Lúcia, Maria João Freitas & Aida Cardoso (2014). CEPLEXicon - A Lexicon of Child European Portuguese. Lisboa: Anagrama (CLUL, FLUL). ISLRN: 408-817-203-152-3, ELRA ID: ELRA-L0094

CDS_EP - A lexicon of child directed speech for European Portuguese from the FrePOP database