This project had two main goals:
a) to construct a database of recorded samples from the mass media, namely radio, TV and the written press, including: audio and video samples in analog format; audio samples in digital format (digitized from the original analog samples); samples in written (electronic) format (for the press materials); transcription and morpho-syntactic tagging of the data; and frequency and concordance information for all the data.
b) to describe European Portuguese as used in the media, providing information on its lexical, grammatical, semantic and stylistic properties. The final corpus has around 324,000 words: 108,000 for each medium (radio, TV and written press), divided among the following six fields: Economics, Opinion, Sports, Culture, Generic News and Science, with 54,000 words per field (18,000 words per field in each medium).
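The word-count design described above can be checked with a short sketch. The figures are the approximate targets stated in the description; the variable names are illustrative assumptions and do not come from the project itself.

```python
# Approximate word-count design of the corpus, as stated in the description.
# All names here are illustrative; they are not part of the REDIP project.
MEDIA = ["radio", "TV", "written press"]
FIELDS = ["Economics", "Opinion", "Sports", "Culture", "Generic News", "Science"]
WORDS_PER_CELL = 18_000  # approximate words per field within each medium

# Build the medium x field grid of target word counts.
design = {(medium, field): WORDS_PER_CELL for medium in MEDIA for field in FIELDS}

# Aggregate the grid by medium, by field, and overall.
words_per_medium = {
    medium: sum(n for (m, _), n in design.items() if m == medium) for medium in MEDIA
}
words_per_field = {
    field: sum(n for (_, f), n in design.items() if f == field) for field in FIELDS
}
total_words = sum(design.values())

print(words_per_medium)
print(words_per_field)
print(total_words)
```

Summing the grid by medium, by field, and overall reproduces the three totals quoted in the description (108,000, 54,000 and 324,000 words, respectively).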
A detailed description of the project, as well as related publications, is available on the Instituto de Linguística Teórica e Computacional (ILTEC) webpage (http://www.iltec.pt/projectos/concluidos/redip.html).
The REDIP corpus can be queried online at ILTEC's webpage (http://www.iltec.pt/?action=concord).