CORDIAL-SIN - Syntax-oriented Corpus of Portuguese Dialects

Martins, A. M. (coord.) [2000- ]. CORDIAL-SIN: Corpus Dialectal para o Estudo da Sintaxe / Syntax-oriented Corpus of Portuguese Dialects. Lisboa, Centro de Linguística da Universidade de Lisboa. URL:
Financing institution
FCT – Fundação para a Ciência e a Tecnologia

The Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN) is a project devoted to the study of dialectal syntactic variation in European Portuguese, within a Principles and Parameters perspective, using a corpus markup methodology. Since 1999, the project has developed and enhanced research activities on Portuguese dialect syntax.

Main Goals: 

  1. Studying the syntax of European Portuguese dialects from a comparative perspective.

  2. Developing and enhancing research activity on syntactic dialect variation in Portugal and strengthening cooperation with international dialect syntax projects (CORDIAL-SIN participates in the networks Edisyn - European Dialect Syntax and Wedisyn - Dialect Syntax in Westmost Europe).

  3. Building up and making available online the Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN). This corpus is updated and improved on a regular basis. The existence of the corpus feeds the goals expressed in 1. and 2. above.

  4. Exploiting existing resources in order to make them available to the scientific community and relevant for the development of the field of comparative dialect syntax. A rich recorded speech collection owned by CLUL provides the 'raw material' for the constitution of the Syntax-oriented Corpus of Portuguese Dialects.


  • R&D Unit Pluriannual funding - UID/LIN/00214/2013
  • DUPLEX - PTDC/LIN/71559/2006
  • Sintaxe Dialectal - POCTI/LIN/46980/2002
  • CORDIAL-SIN, Phase 2 - POSI/PLP/33275/1999


Project description:

The Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN) is a project devoted to the study of dialectal syntactic variation in European Portuguese, within a Principles and Parameters perspective, using a corpus markup methodology. The project aims at developing and enhancing research activities on syntactic dialect variation, a field with no tradition in the Portuguese domain. It does so by processing and annotating existing recorded data. In its present state, CORDIAL-SIN is a 600,000-word corpus.

Over the past 30 years, the Dialectology team of Centro de Linguística da Universidade de Lisboa (CLUL) has built up a rich collection of recorded speech, about 4,500 hours of recordings, obtained from interviews in more than 200 localities across the Portuguese territory (with a view to producing linguistic atlases). CORDIAL-SIN is based on a geographically representative body of selected excerpts of spontaneous and semi-directed speech taken from the sound materials gathered within the scope of the following projects:

  • ALEPG Atlas Linguístico e Etnográfico de Portugal e da Galiza (Linguistic and Ethnographic Atlas of Portugal and Galicia)
  • ALLP Atlas Linguístico do Litoral Português (Linguistic Atlas of the Portuguese Coast)
  • ALEAç Atlas Linguístico e Etnográfico dos Açores (Linguistic and Ethnographic Atlas of the Azores)
  • BA Fronteira Dialectal do Barlavento Algarvio (Geographical Limits of the Dialect of Western Algarve)  
    [Luisa Segura da Cruz. 1987. A Fronteira Dialectal do Barlavento do Algarve. Assistant Research dissertation. Lisbon, Instituto Nacional de Investigação Científica.]  

The CORDIAL-SIN corpus is presented in four different forms: verbatim transcripts, normalized orthographic transcripts, POS-tagged transcripts, and syntactically annotated transcripts. The syntactic annotation is under development within the DUPLEX project.

The verbatim transcript contains not only the standard linguistic expressions but also annotations marking pauses, hesitations, abandoned starts, phonetic and morphological variants, repetitions, truncated words, speech overlaps, unclear productions, etc. These annotations follow the conventions established in Normas de Transcrição (Orthographic Transcription Conventions) and are afterwards automatically removed in order to produce the normalized orthographic transcript.
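The derivation of a normalized transcript from a verbatim one can be pictured with a short sketch. The marker syntax below ({pause}, {hes}, [abandoned material], variant=standard) is invented purely for illustration and is not the notation defined in Normas de Transcrição; the function name normalize is likewise hypothetical.

```python
import re

# Hypothetical markers, invented for this sketch (NOT the project's conventions):
#   {pause}, {hes}     pause / hesitation: removed entirely
#   [fu-]              abandoned start or truncated word: removed entirely
#   auga=água          dialect variant paired with its standard form: keep the standard form
CURLY = re.compile(r"\{[^}]*\}")
SQUARE = re.compile(r"\[[^\]]*\]")
VARIANT = re.compile(r"\S+=(\S+)")


def normalize(verbatim: str) -> str:
    """Strip the illustrative markers and return plain orthographic text."""
    text = CURLY.sub(" ", verbatim)      # drop pause/hesitation markers
    text = SQUARE.sub(" ", text)         # drop abandoned or truncated material
    text = VARIANT.sub(r"\1", text)      # replace variant=standard with the standard form
    return " ".join(text.split())        # collapse the leftover whitespace


example = "Eu {pause} [fu-] fui buscar auga=água {hes} à fonte."
print(normalize(example))  # Eu fui buscar água à fonte.
```

Keeping each kind of marker in its own pattern makes it easy to decide, marker by marker, what survives into the normalized version.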

The morphological annotation system is adapted from the Tycho Brahe project and is applied automatically using the tagger developed by the Tycho Brahe research team. Tycho Brahe's morphological tags have an internal structure and are made up of the following components: a part-of-speech component (i.e. the main part of the tag); inflectional components; diacritics and punctuation symbols (see POS Annotation Manual).
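As a rough illustration of what a tag with internal structure looks like to a program, the sketch below splits a tag into a main part-of-speech component and a list of inflectional components. It assumes, for illustration only, that tokens are written as word/TAG and that tag components are separated by hyphens; the example tags and the names Tag, parse_tag and parse_token are hypothetical, and the POS Annotation Manual remains the reference for the real tagset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tag:
    pos: str                 # main part-of-speech component
    inflections: List[str]   # inflectional components, in order of appearance


def parse_tag(tag: str) -> Tag:
    """Split a tag on hyphens (an assumed separator) into its components."""
    pos, *inflections = tag.split("-")
    return Tag(pos, inflections)


def parse_token(token: str) -> Tuple[str, Tag]:
    """Split an assumed word/TAG token at the last slash."""
    word, _, tag = token.rpartition("/")
    return word, parse_tag(tag)


# Illustrative tokens only; the tags are not quoted from the manual.
for token in ["casas/N-P", "fala/VB-P", "e/CONJ"]:
    print(parse_token(token))
```

Splitting at the last slash keeps words that themselves contain a slash intact.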

The syntactic annotation system has been largely inspired by the annotation system developed for the Penn Parsed Corpora of Historical English. The syntactic annotation is implemented over part-of-speech tagged texts and results in a tree representation in the form of labeled brackets, marking constituent boundaries, phrase and clause dependencies, sentence types, grammatical relations, null categories and certain transformational relations. Exhaustive automatic searching for predefined syntactic configurations is enabled by the search program CorpusSearch2, written by Beth Randall (open source software, downloadable from Sourceforge), which is compatible with syntactic annotations in the Penn Treebank format (see Syntactic Annotation System Manual).
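To make the labeled-bracket representation concrete, here is a minimal sketch that reads one bracketed tree and looks up constituents by label. It is not CorpusSearch2 and does not reproduce its query language; the example sentence, its labels, and the function names are assumptions chosen for illustration.

```python
def parse_brackets(text):
    """Parse one labeled-bracket tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1                              # consume the closing bracket
        return (label, children)

    return node()


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def find(tree, label):
    """Yield every subtree whose label equals `label`."""
    node_label, children = tree
    if node_label == label:
        yield tree
    for child in children:
        if not isinstance(child, str):
            yield from find(child, label)


# Hypothetical annotation of "o homem fala" ("the man speaks").
tree = parse_brackets("(IP (NP-SBJ (D o) (N homem)) (VB-P fala))")
print(leaves(tree))                                   # ['o', 'homem', 'fala']
print([leaves(t) for t in find(tree, "NP-SBJ")])      # [['o', 'homem']]
```

A dedicated search tool additionally supports structural relations between nodes (dominance, precedence and the like), which this sketch leaves out.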


CORDIAL-SIN is a dialect corpus of European Portuguese. The materials for this corpus were drawn from the recordings of dialect speech collected by the ATLAS team as fieldwork interviews for linguistic atlases between 1974 and 2004 in more than 200 locations in the Portuguese territory.

CORDIAL-SIN compiles a geographically representative body of selected excerpts of spontaneous and semi-directed speech from these interviews. The informants were elderly, had little formal schooling, lived in rural areas, and were born and raised in the location of the interview.
The corpus amounts to 600,000 words, collected from 42 locations within the continental territory of Portugal and the archipelagos of Madeira and the Azores.

The CORDIAL-SIN data are available online in written form, in the following formats: two kinds of orthographic transcripts (differing in how much detail they mark for spoken-language phenomena), a PoS-tagged corpus, and a syntactically annotated corpus.


CORDIAL-SIN is available for download as: 

CORDIAL-SIN is searchable online and interoperable with other dialect corpora through the Edisyn Search Engine.

Creative Commons License

CORDIAL-SIN by Centro de Linguística da Universidade de Lisboa is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.



Workshop "How spatial is dialect syntax?", November 25, 2011, CLUL.  
(within International Symposium on Limits and Areas in Dialectology, November 23-25, 2011)

Martins, A. M. (2012). Aparente variação na concordância sujeito-verbo no português europeu: ambiguidade quanto ao carácter singular ou plural do sujeito frásico. In Rosae: linguística histórica, história das línguas e outras histórias. Lobo et al. Salvador: EDUFBA.
Cardoso, A., & Magro, C. (2012). The syntax of naming constructions in European Portuguese dialects: variation and change. Journal of Portuguese Linguistics, 11.
Carrilho, E. (2012). Syntactic Microvariation in Westmost European Languages.
Carrilho, E., & Lobo, M. (2012). Contribution à l'étude de la variation syntaxique dans le domaine ibéro-roman. In La Leçon des dialectes. Hommages à Jean-Philippe Dalbera (pp. 323-336). M. Oliviéri, G. Brun-Trigaud & P. Del Giudice. Alessandria: Edizioni dell'Orso.
Costa, J., & Pereira, S. (2012). A gente: revisitando o estatuto pronominal e a concordância. In Por Amor à Lingüística. Miscelânea de Estudos Lingüísticos dedicados à Maria Denilda Moura (pp. 101-121). A. P. Sedrins, A. T. de Castilho, M. A. Sibaldo & R. B. de Lima. Maceió: Edufal.
Cardoso, A., Carrilho, E., & Pereira, S. (2011). On verbal agreement variation in European Portuguese: syntactic conditions for the 3SG/3PL alternation. Diacrítica, 25.
Martins, A. M. (2011). Clíticos na história do português à luz do teatro vicentino. Estudos de Lingüística Galega, 3, 55-83.
Carrilho, E., & Magro, C. (2011). CORDIAL-SIN Syntactic annotation system manual (updated edition).
Magro, C. (2010). When corpus analysis refutes common beliefs: the case of interpolation in European Portuguese dialects. Corpus, 9, 115-135.
Martins, A. M., & Nunes, J. (2010). Apparent Hyper-raising in Brazilian Portuguese: Agreement with Topics across a finite CP. In The Complementiser Phase: Subjects and Wh-dependencies (pp. 143-163). Oxford: Oxford University Press.
Costa, J., & Martins, A. M. (2010). Middle Scrambling with deictic locatives in European Portuguese. In Romance Languages and Linguistic Theory 2 (pp. 59-76). Bok-Bennema, B. Kampers-Manhe & B. Hollebrandse. Amsterdam/Philadelphia: John Benjamins.
Carrilho, E. (2010). Tools for dialect syntax: the case of CORDIAL-SIN (an annotated corpus of Portuguese dialects). In Tools for Linguistic Variation (pp. 57-70). Aurrekoetxea & J. L. Ormaetxea. Bilbao: Universidad del País Vasco.
Magro, C. (2010). Interpolação & Cia. nos dialectos do português europeu. Estudos de Lingüística Galega, 2, 97-119.
Carrilho, E. (2009). Sobre o expletivo ele em português europeu. Estudos de Lingüística Galega, 1, 7-26.
Martins, A. M. (2009). Subject doubling in European Portuguese dialects: The role of impersonal se. In Romance Languages and Linguistic Theory 2007 (pp. 179-200). Aboh, E. van der Linden, J. Quer & P. Sleeman. Amsterdam & Philadelphia: John Benjamins.
Martins, A. M., & Nunes, J. (2009). Syntactic change as chain reaction: The emergence of hyper-raising in Brazilian Portuguese. In Historical Syntax and Linguistic Theory (pp. 144-157). Crisma & G. Longobardi. Oxford/New York: Oxford University Press.
Carrilho, E. (2008). Beyond doubling: overt expletives in European Portuguese dialects. In Syntax and Semantics, Vol. 36: Microvariation in Syntactic Doubling (pp. 301-323).
Hornstein, N., Martins, A. M., & Nunes, J. (2008). Perception and Causative Structures in English and European Portuguese: Φ-feature Agreement and the Distribution of Bare and Prepositional Infinitives. Syntax, 11.
Martins, A. M. (2008). Investigating language change in a comparative setting. In Questions on Language Change (pp. 99-116). Almeida, B. Sieberg & A. M. Bernardo. Lisboa: Colibri/Centro de Estudos Alemães e Europeus.
Martins, A. M. (2007). Double realization of verbal copies in European Portuguese emphatic affirmation. In The Copy Theory of Movement (pp. 77-118). Corver & J. Nunes. Amsterdam/Philadelphia: John Benjamins.
Carrilho, E. (2007). Beyond Subject Doubling: expletive constructions in European Portuguese dialects. In Dialect Syntax Archive. Amsterdam: Edisyn Project.
Carrilho, E., Mota, M. A., & Saramago, J. (2006). Variação Regional e Social. In Mostra de Linguística. A Linguística em Portugal: estado da arte, projectos e produtos. S. Frota and M. Colaço. Lisboa: APL.
Martins, A. M. (2006). Aspects of infinitival constructions in the history of Portuguese. In Historical Romance Linguistics: Retrospective and Perspectives (pp. 327-355). Gess & D. Arteaga. Amsterdam/Philadelphia: John Benjamins.
Martins, A. M. (2006). Emphatic Affirmation and Polarity: Contrasting European Portuguese with Brazilian Portuguese, Spanish, Catalan and Galician. In Romance Languages and Linguistic Theory 2004 (pp. 197-223). Doetjes & P. Gonzalez. Amsterdam/Philadelphia: John Benjamins.
Martins, A. M., & Nunes, J. (2006). Raising issues in Brazilian and European Portuguese. Journal of Portuguese Linguistics, 4, 53-77.
Hornstein, N., Martins, A. M., & Nunes, J. (2006). Infinitival complements of perception and causative verbs: a case study on agreement and intervention effects in English and European Portuguese. University of Maryland Working Papers in Linguistics, 14, 81-110.
Martins, A. M. (2005). Clitic Placement, VP-ellipsis and scrambling in Romance. In Grammaticalization and Parametric Change (pp. 175-193). Batllori, M. -Ll. Hernanz, C. Picallo, & F. Roca. Oxford/New York: Oxford University Press.
Martins, A. M. (2005). Passive and impersonal se in the history of Portuguese. In Romance Corpus Linguistics II: Corpora and Diachronic Linguistics (pp. 411-430). Pusch, J. Kabatek & W. Raible. Tübingen: Gunter Narr Verlag.
Batllori, M., Iglésias, N., & Martins, A. M. (2005). Sintaxi dels clítics pronominals en català medieval. Caplletra, Revista Internacional de Filologia, 38, 137-177.
Costa, J., & Pereira, S. (2005). Phases and autonomous features: a case of mixed agreement in European Portuguese. In Perspectives on Phases. M.
Carrilho, E., Magro, C., & Pereira, S. (2004). Morphological Tagging and Syntactic Annotation of a Dialectal European Portuguese Corpus. In Language Technology for Portuguese: shallow processing tools and resources (pp. 73-87). Branco, A. Mendes & R. Ribeiro. Lisboa: Colibri.
Carrilho, E. (2003). Ainda a unidade e diversidade da língua portuguesa: a sintaxe. In Razões e Emoção. Miscelânea de Estudos em Homenagem a Maria Helena Mira Mateus. Vol. 2 (pp. 19-41). I. Castro & I. Duarte. Lisboa: Imprensa Nacional – Casa da Moeda.
Carrilho, E. (2003). Construções de expletivo visível em Português europeu (não-padrão). In Gramática e Léxico em Sincronia e Diacronia. Um contributo da Linguística portuguesa (pp. 29-38). A. Veiga. Santiago de Compostela: Universidade de Santiago de Compostela.
Martins, A. M. (2003). Construções com SE: mudança e variação no português europeu. In Razões e Emoção: Miscelânea de estudos em Homenagem a Maria Helena Mira Mateus. I, 2, 19-41.
Martins, A. M. (2003). From unity to diversity in Romance syntax: A diachronic perspective of clitic placement in Portuguese and Spanish. In Aspects of Multilingualism in European Language History (pp. 201-233). Braunmüller & G. Ferraresi. Amsterdam / Philadelphia: John Benjamins.
Protótipo de um Glossário dos Dialectos Portugueses com Anotação Sintáctica
Variation and Change in the Syntax of Relative Clauses: New Evidence from Portuguese
O Marcador de Negação Metalinguística Agora nos Dialectos do Português Europeu
Negação Metalinguística e Estruturas com Nada no Português Europeu
Como é que é com o “é que”? Análise de estruturas com “é que” em variedades não standard do português europeu
Clíticos: Variação sobre o Tema
Expletive Ele in European Portuguese Dialects
Gramática Comparada de a gente: Variação no Português Europeu