SynExtract - automatic extraction of synonymy relations for a cost-effective acquisition of language resources

Funding: FCT
Code: SFRH/BPD/79900/2011
Research Unit: UPF/CLUL
Period: 2012–2014

General goals

The project aims to extract synonymy relations automatically from unstructured data, reducing human intervention in the acquisition of language resources without compromising their precision. Unlike other lexical-conceptual relations, synonymy has no clear textual patterns that can serve as cues for identifying it. One of the project's challenges therefore consists in conceiving new approaches and methods:
        exploring different strategies for measuring semantic similarity between words in corpora;
        using lexical and syntactic patterns, even if these have a very low frequency, for identifying synonymy relations;
        filtering preliminary results to reduce the search space for identifying synonymy relations;
        evaluating the usability of the information obtained in the process of extracting synonymy relations and considering the possibility of integrating it into the WordNet.PT database.
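
As an illustration of the first and third goals, here is a minimal sketch of one common way to measure semantic similarity between words in a corpus: cosine similarity over co-occurrence counts, followed by a threshold filter. It is not the project's actual method. The function names, the window size, the threshold and the toy sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def candidate_pairs(vectors, threshold=0.5):
    """Keep only word pairs whose similarity reaches `threshold` (hypothetical filter)."""
    words = sorted(vectors)
    pairs = []
    for idx, a in enumerate(words):
        for b in words[idx + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])


# Toy corpus, invented for illustration only.
sentences = [
    "the car stopped at the light".split(),
    "the automobile stopped at the light".split(),
    "the cat slept on the sofa".split(),
]
vectors = context_vectors(sentences)
for a, b, score in candidate_pairs(vectors, threshold=0.9):
    print(f"{a}\t{b}\t{score:.2f}")
```

In this toy corpus, "car" and "automobile" occur in identical contexts and so pass the filter. Unrelated words that merely share frequent function-word contexts can also score highly, which is one reason the filtering and evaluation steps listed above matter.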