This resource is the result of a Natural Language Processing study whose main goal was to develop a semantic taxonomy for classifying nominal Multiword Lexical Units (MLU) in European Portuguese (EP). Although they are built from single words, MLU do not have a compositional meaning and are subject to morphosyntactic restrictions. These units are so important in any text that identifying and classifying them is essential for information extraction and retrieval. We adapted a semantic taxonomy based on the Lancaster semantic lexicon[1] and applied it to a list of MLU extracted from CETEMPúblico[2].

          The MLU were extracted automatically from CETEMPúblico with Unitex[3]. The resulting list was then manually annotated in order to exclude non-nominal MLU, named entities and repetitions. The final list has 5068 nominal MLU.
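
The manual filtering step described above can be pictured with a minimal sketch. It assumes each extracted candidate already carries a hand-assigned label; the `Candidate` type, the label names, the case-insensitive duplicate check and the sample entries are all hypothetical illustrations, not the format or content of the actual resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    # Hypothetical manual annotation: "nominal", "non_nominal" or "named_entity".
    label: str


def filter_nominal(candidates):
    """Keep nominal candidates only, dropping repetitions (first occurrence wins)."""
    seen = set()
    kept = []
    for cand in candidates:
        key = cand.text.casefold()  # assumption: repetitions are compared ignoring case
        if cand.label != "nominal" or key in seen:
            continue
        seen.add(key)
        kept.append(cand.text)
    return kept


# Illustrative entries only; they are not taken from the published lists.
candidates = [
    Candidate("fim de semana", "nominal"),
    Candidate("Estados Unidos", "named_entity"),
    Candidate("fim de semana", "nominal"),
    Candidate("de acordo com", "non_nominal"),
    Candidate("caixa de correio", "nominal"),
]

nominal_mlu = filter_nominal(candidates)
print(nominal_mlu)
```

With these sample entries, only the two distinct nominal units remain, in the order in which they were first seen.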

          Therefore, this resource includes two lists: (i) List of Nominal MLU in EP; and (ii) List of Nominal MLU in EP Semantically Classified.

          The first list contains the nominal MLU; the second contains the nominal MLU with their semantic classification. The classified list results from applying our semantic taxonomy, adapted from the Lancaster semantic lexicon, to the list of nominal MLU.


List of Nominal MLU in EP

List of Nominal MLU in EP Semantically Classified

Semantic Taxonomy Applied to Nominal MLU for European Portuguese
 

Team:

Aida Cardoso

Silvana Abalada

Vera Cabarrão

 

Poster:

Abalada, Silvana, Vera Cabarrão & Aida Cardoso: "Proposta de Classificação Semântica de Unidades Lexicais Multipalavra Nominais". Poster presented at the XXV Encontro Nacional da Associação Portuguesa de Linguística, Lisbon, 22-24 October 2009.

Publication:

Abalada, Silvana, Vera Cabarrão & Aida Cardoso (2010): "Proposta de Classificação Semântica de Unidades Lexicais Multipalavra Nominais". In Ana Maria Brito, Fátima Silva, João Veloso & Alexandra Fiéis (orgs.): Textos Seleccionados do XXV Encontro Nacional da Associação Portuguesa de Linguística 2009. Oporto: Edições Colibri/APL. [PDF]


[1] Piao, Scott et al. (2005): "A Large Semantic Lexicon for Corpus Annotation". In Proceedings from the Corpus Linguistics Conference Series, Corpus Linguistics 2005. Birmingham.
[2] http://www.linguateca.pt/cetempublico/.
[3] http://www-igm.univ-mlv.fr/~unitex/.