PREPLEXOS - Complex Predicates: typology and corpus annotation

Funding institution
FCT – Fundação para a Ciência e a Tecnologia
Project PI
Inês Duarte

Project summary:
This project aims to present a unified approach to complex predicate formation that accounts for the complex predicate constructions available in European Portuguese, in particular their lexico-syntactic properties and their interpretation. The project findings build on the analysis of authentic data, extracted from a 1-million-word corpus of written and spoken European Portuguese that has been tagged, lemmatised and manually revised. A new layer of information has been added to this corpus, following a typology of complex predicate constructions.

This project focuses on four specific types of complex predicate constructions. The first involves constructions with two main verbs (e.g. fazer rir x), which raise issues concerning the categorial nature of the embedded domain and the characterization of its functional heads, its Syntax-Semantics interface, the temporal impact of the infinitive predicate, and the role played by the inflected and the infinitive forms in the final aspectual type of the whole construction.
The second type of construction involves what is usually called a "light verb" followed by a derived noun (dar um passeio) or by a noun expressing an emotion, i.e., a psych-noun (ter medo). It is important to determine the exact nature of the semantic contribution of light verbs to the clause, as well as the similarities and differences between the light verb construction and its lexicalized verbal counterpart, if one exists. Possible aspectual selection restrictions in complex predicate constructions were observed, as well as relevant restrictions on tense selection.
The third type corresponds to cases of a light verb combined with a secondary predicate, either an adjective (tornar credível) or a prepositional phrase (fazer x em pedaços). The analysis of corpus occurrences of light verbs and adjectives will allow the identification of lexical and aspectual subclasses of adjectives that may occur as secondary predicates (e.g., eventive vs stative, stage-level vs individual-level).
Accounting for these different types of constructions will provide a basis for establishing the relevant and necessary conditions for defining light verbs and complex predicate constructions.
Finally, the fourth type consists of serial verb constructions with two concatenated verbs (O Pedro pegou e despediu-se). Their properties, namely the fact that the two verbs combine to form a single semantic predicate and convey a single event interpretation, raise the issue of whether this construction should be related to regular coordination or to a specific type of serial verb construction.
Crosslinguistic comparison of these constructions with other Romance languages and with English is also one of the aims of the project.

These complex predicates are analysed based on data extracted from an annotated corpus of 1 million words, the CINTIL corpus. It was compiled at the Centre of Linguistics of the University of Lisbon (CLUL) and is a subpart of a larger monitor corpus, the Reference Corpus of Contemporary Portuguese (CRPC), with 330M words. The CINTIL corpus consists of approximately 50% written and 50% spoken material (formal and informal). It was annotated with part-of-speech tags, lemmatised and manually revised under a partnership between the NLX - Natural Language and Speech Group of the Faculty of Sciences of the University of Lisbon and CLUL. The corpus can be queried online.
One of the outcomes of the project is the annotation of the CINTIL corpus with information on complex predicate constructions. This new layer of information on complex predicates can play an important part in future syntactic and semantic annotation.
The annotated corpus will be made available for query.

