SynExtract - automatic extraction of synonymy relations for a cost-effective acquisition of language resources

Funding institution
FCT – Fundação para a Ciência e a Tecnologia

General goals

The project aims to automatically extract synonymy relations from unstructured data, reducing human intervention in the acquisition of language resources without compromising their precision. Unlike other lexical-conceptual relations, synonymy has no clear text patterns that might work as cues for its identification. Given this, one of the challenges of this project consists in conceiving new approaches and methods:
        exploring different strategies for measuring semantic similarity between words in corpora;
        using lexical and syntactic patterns, even if these have a very low frequency, for identifying synonymy relations;
        filtering preliminary results to reduce the search space in the context of the identification of synonymy relations;
        evaluating the usability of the information obtained in the process of extracting synonymy relations and considering the possibility of integrating it into the WordNet.PT database.
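
As an illustration of the first and third points above, the following is a minimal sketch of one possible strategy, not the project's actual method: words are represented by counts of the words that occur near them, compared with cosine similarity, and pairs below a threshold are discarded to reduce the search space. The function names, the window size, the threshold and the toy corpus are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_pairs(vectors, threshold=0.8):
    """Keep only word pairs whose similarity reaches the (assumed) threshold."""
    words = sorted(vectors)
    pairs = []
    for idx, a in enumerate(words):
        for b in words[idx + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy corpus, already tokenized; a real corpus would be far larger.
toy_corpus = [
    "the car stopped at the light".split(),
    "the automobile stopped at the light".split(),
    "the dog barked at the cat".split(),
]

vectors = cooccurrence_vectors(toy_corpus)
for a, b, score in candidate_pairs(vectors):
    print(f"{a}\t{b}\t{score:.2f}")
```

In this toy corpus "car" and "automobile" share exactly the same contexts and therefore survive the filter, but so can pairs that are merely related rather than synonymous. Distributional similarity on its own does not separate synonymy from other relations, which is why the resulting candidates would still need further filtering and evaluation.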

CLUL - Centro de Linguística da Universidade de Lisboa
Universitat Pompeu Fabra