SUL – Summer University in Linguistics



This Summer University aims to put its attendees in contact with methodological aspects of work in various areas of linguistics: language acquisition and processing, geographical variation and textual variation. It complements the scientific training of postgraduate students from FLUL and of external students with practical training in software specially designed to work with different types of linguistic data.


The working language of the course is English.


Course date: July 15th-19th, 2019

Location: Room 6.1

Course duration: 1 week (6 hours per day)


Registration fee (includes certificate, coffee breaks and Friday's lunch):

80€ for participants from FLUL (students/teachers/researchers)

150€ for participants outside FLUL


Registration period: June 17th-28th, 2019

Registration procedures:

To register, please fill out the registration form and send it to our email address along with a copy of the proof of payment.

Payment details: 

Faculdade de Letras da Universidade de Lisboa (Alameda da Universidade, 1600-214 Lisboa, Portugal)

Caixa Geral de Depósitos, S.A. (Av. João XXI, 63, 1000-300 Lisboa, Portugal)

IBAN: PT50 0035 0824 00011450130 14


Description: CLUL - SUL - Name

Note: Due to logistical constraints, no refunds are possible.




Attendees will gain knowledge of various software tools and experimental methods that they can apply in their own research. Participation in this course will enable students to work autonomously with linguistic data and will give them a broader perspective on the different methods available.


Work plan:



Program summary:

Experimental methods in the acquisition of syntax: comprehension and production

Ana Lúcia Santos (FLUL/CLUL)

This module explores different offline methods for studying the development of syntax that have been used in research with preschool children. Different types of elicited production tasks will be explored, including semi-structured elicitation and different sentence-completion contexts. For comprehension, we will discuss different types of tasks and their suitability for the study of specific structures: picture selection and picture evaluation tasks, truth-value judgment tasks and reference judgment tasks. We will weigh their advantages and disadvantages in light of the target structure, the research questions and the time available to run the experiment.


L1 and L2 Phonological Acquisition

Maria João Freitas (FLUL/CLUL) & Chao Zhou (CLUL)

In this tutorial, we will provide background on past and current experimental methods in L1 phonological acquisition and in L2 speech learning research. Both perception and production studies will be taken into account. Types of samples, different techniques for data collection and tools for language assessment will be discussed in light of general research questions in L1 and L2 phonological acquisition.


Tools for the assessment of early language acquisition and development

Marina Vigário (FLUL/CLUL) & Marisa Cruz (CLUL)

In this module, we present a selection of tools for the assessment of infant and child language that were built for or adapted to the Portuguese language. The course is predominantly hands-on and is supported by materials that are made available to participants beforehand. We explore (i) frequency databases (FrePOP and child, adult and child-directed speech lexica) and applications that generate frequency resources (FreP and FreLex); (ii) tools for the assessment of prosody, such as Proso-Quest (parental report of infants' prosodic skills), Prova de Avaliação da Prosódia - Crianças em idade pré-escolar (tool for the assessment of prosody in pre-school children) and PEPS-C (Profiling Elements of Prosodic Systems-Children), as well as the Developmental Concern Questionnaire, Q-CHAT 10 (Quantitative Checklist for Autism in Toddlers) and the MacArthur-Bates Communicative Development Inventories (CDI) for European Portuguese (short forms).


Eyetracking paradigm for reading processing analysis

Isabel Falé (UAberta/CLUL), Paula Luegi (CLUL) & Armanda Costa (FLUL/CLUL)

In this seminar, we will show how the eyetracking paradigm can be used to analyze language processing during reading. We will demonstrate this with an iView X™ Hi-Speed 1250 SMI eyetracker. Participants will become familiar with the different software packages needed to measure eye movements while reading: iView, to properly calibrate the eye for eyetracking experiments (we will show how to use it, some problems that may arise and potential solutions for improving calibration); Experiment Center, the software that enables researchers to control the order and timing of stimulus presentation; and BeGaze, the analysis software, which allows the definition of regions of interest for subsequent analysis and also the visualization and extraction of scanpaths, heat maps and, most importantly, many different dependent variables that can be analyzed.

Participants in this seminar will be able to take part in the different steps of a typical eyetracking-while-reading experiment: creating the experiment, collecting the data and, at the end, visualizing the results of the collected data.


Portuguese parsed corpora: a room with a view to syntactic variation

Catarina Magro (CLUL)

At which point in the diachrony of Portuguese did ‘é que’-clefts emerge? And when was relative clause extraposition ruled out?

In which areas of Portugal do periphrastic gerunds occur? And where can interpolation phenomena be found nowadays?

Which syntactic contexts favor clitic climbing? And what kinds of sources display a higher rate of hanging topics?

This module aims to show how to use parsed corpora to easily get answers to questions of this kind. We will inspect four Portuguese corpora that are the outcome of several language variation projects developed at the Centro de Linguística da Universidade de Lisboa (CORDIAL-SIN; POST SCRIPTUM; WOChWEL) and at the Instituto de Estudos da Linguagem da Universidade de Campinas (TYCHO BRAHE). These corpora share a standardized annotation format (Penn treebank format) and follow unified annotation guidelines, which makes it possible to search them using an integrated methodology. As a whole, they constitute a valuable large-scale resource for investigating both synchronic and diachronic syntactic variation in Portuguese.

Throughout this module, participants will get hands-on experience with treebank mining techniques by using corpus search tools like CorpusSearch 2 (B. Randall, 2005-7) and TEITOK (M. Janssen, 2014) and by testing different query languages to search for null categories, word order issues, structural configurations, transformational relations and syntactic functions.

Note: Participants are expected to have a basic knowledge of Portuguese grammar and preferably previous experience of working with Unix-like operating systems.


Who's afraid of building a parsed corpus?

Catarina Magro (CLUL)

Building a parsed corpus is a time-consuming and error-prone process. In this module – directed at corpus creators – we will present a recent approach to syntactic annotation that aims at minimizing the time usually involved in parsed corpora construction while simultaneously improving annotation consistency.

In the proposed methodology, syntactic trees are built by alternating automated analysis and manual correction. We use CorpusSearch revision queries (B. Randall, 2005-7) for automatic rule-based parsing and the graphical user interface Annotald (J. Beck, A. Ecay, A. Ingason, 2011) for editing the output. Syntactic annotation is represented as labeled bracketing in the Penn treebank format.
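For readers unfamiliar with labeled bracketing, the following minimal Python sketch reads one bracketed string into a nested structure and lists its words. It is only an illustration: the function names (`parse_brackets`, `leaves`) are hypothetical, the labels (IP, NP, VP, D, N, V) are simplified and not taken from the corpora's annotation guidelines, and this is not how CorpusSearch or Annotald process files.

```python
def parse_brackets(text):
    """Parse one labeled-bracketing string into nested (label, children) tuples."""
    # Put spaces around brackets so that a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = tokens[pos]  # the first token after "(" is the node label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return parse_node()


def leaves(tree):
    """Return the words of a parsed tree in left-to-right order."""
    _label, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Illustrative sentence with simplified, hypothetical labels.
example = "(IP (NP (D a) (N menina)) (VP (V dorme)))"
tree = parse_brackets(example)
print(tree[0])       # label of the root node
print(leaves(tree))  # words of the sentence
```

Keeping the tree as plain nested tuples makes it easy to see that each bracket pair corresponds to one constituent, which is the property that query tools rely on when searching for structural configurations.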

In this module, participants will get acquainted with this annotation strategy by writing queries to parse a Portuguese sample text. Practical exercises will cover the annotation of constituent boundaries, empty categories and grammatical functions.

Note: Participants are expected to have a basic knowledge of Portuguese grammar and preferably previous experience of working with Unix-like operating systems.


Textual variation and digital scholarly editing

Elsa Pereira (CLUL)

This seminar presents some of the variation issues addressed by scholarly editing and textual criticism, while familiarising students with the possibilities and limitations of digital approaches to the editorial treatment of these occurrences.

The session will begin by identifying the most frequent textual problems raised by ancient and modern literary works, distinguishing intra- and interdocumentary variation. It will demonstrate the suitability of the electronic medium for representing these occurrences, while pointing out the advantages and limitations of the digital editorial paradigm at the transcription level, as well as in the processing and visualisation of the encodings. At the end, students will have the opportunity to explore and interact with renowned digital editions, which are based on an XML-TEI encoded apparatus or resort to semiautomatic collation software.


  1. intra- and interdocumentary variation;
  2. representation of textual variation in digital media;
  3. some resources and digital archives.


Workflow for processing linguistic fieldwork data

Hugo Cardoso (FLUL/CLUL) & Patrícia Costa (CLUL)

This module introduces participants to a common workflow for processing oral data collected during linguistic fieldwork, using tools which were developed precisely to assist the documentation of minority/endangered languages, although they have a wider range of applications. In this case, we will make use of tools and materials involved in the Documentation of Sri Lanka Portuguese project, developed at the Centro de Linguística da Universidade de Lisboa.

In this session, we will describe the procedures which follow the actual collection of oral data, demonstrating the software which allows us to complete several steps on the way to producing an annotated transcription of the recordings:

1) Preparation of multimedia files: Audacity and Avidemux;

2) Transcription and public sharing of materials: ELAN;

3) Morphosyntactic annotation: FLEx.

By the end of the sessions, the participants will have produced a brief annotated transcription of a recording provided by the instructors, in an easily archivable, shareable, and searchable format.

Note: Prior to the course, participants are requested to install the following freeware on their computer:




FLEx (FieldWorks, version 8)


Introducing Dialectometry

Fernando Brissos (CLUL)

Dialectometry is the quantitative study of dialects, typically resorting to linguistic atlases or similar-sized corpora. It is therefore a part of the broader field of Dialectology, and follows the paradigm of the so-called quantitative revolution that has been taking place in the social sciences and humanities over the last decades (in analogy to Psychometrics, Sociometry, Econometrics, etc.).

Since the quantitative study of linguistic phenomena still has a limited tradition, this module will focus on explaining what Dialectometry is, going through three main topics: definition; raison d’être; and the different stages of the workflow (from building the database to analyzing the results).


(i) students should bring their own laptops;

(ii) only basic knowledge (high-school level) of mathematics and statistics is required.


Doing Dialectometry

Fernando Brissos (CLUL)

Following the first module, we will have a group exercise on the two core (and at the same time most delicate) procedures of the dialectometrical method: building the database and analyzing the resulting data. We will (i) study examples of the most common problems posed by linguistic atlases and of the solutions that have proven the most robust in tackling them; and (ii) put into practice the quantitative parameters typically used by specialists, drawing on the most common framework, the so-called Salzburg school of dialectometry.

At the end of this module, students will have detailed knowledge of what Dialectometry is, its pros and cons, and its state of the art. They will thus be able to handle the specialized literature and to take the first steps in the quantitative study of linguistic variation data.
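As an illustration of what a quantitative parameter can look like, the sketch below computes a similarity percentage in the spirit of the Relative Identity Value associated with the Salzburg school: the share of atlas maps on which two localities show the same variant, counting only maps with data for both. The localities, variants and function name are invented for the example, and the module itself may use different parameters or software.

```python
def relative_identity_value(site_a, site_b):
    """Percentage of shared maps on which two localities have the same variant.

    Each argument is a list of variants, one per atlas map, in the same order.
    None marks a map with no answer for that locality; such maps are skipped.
    """
    shared = [(a, b) for a, b in zip(site_a, site_b)
              if a is not None and b is not None]
    if not shared:
        return None  # no map with data for both localities
    matches = sum(1 for a, b in shared if a == b)
    return 100.0 * matches / len(shared)


# Hypothetical database: three localities, four maps (variants coded as letters).
database = {
    "Locality A": ["x", "y", "z", None],
    "Locality B": ["x", "y", "w", "z"],
    "Locality C": ["u", "y", "w", "z"],
}

for first, second in [("Locality A", "Locality B"),
                      ("Locality B", "Locality C"),
                      ("Locality A", "Locality C")]:
    value = relative_identity_value(database[first], database[second])
    print(f"{first} vs {second}: {value:.1f}")
```

Skipping maps with missing answers, rather than counting them as mismatches, is one common way of handling gaps in atlas data; it is a choice made when building the database and it directly affects the results.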


(i) students should bring their own laptops;

(ii) only basic knowledge (high-school level) of mathematics and statistics is required;

(iii) participants should have attended module 9 (Introducing Dialectometry).



To be supplied by each teacher at the beginning of each module.


Places available: 10 to 20 attendees


Organizers and scientific coordinators: Amália Mendes, Nélia Alexandre and Fernando Brissos

