On anaphoric pronouns in simultaneous interpreting

By Ana Correia (University of Minho, Portugal)

Abstract & Keywords

The successful establishment of anaphoric links between pronouns and their antecedents is a basic condition to ensure that a text is both cohesive and coherent. This is sometimes difficult to achieve when dealing with spoken texts and severe temporal restrictions as is the case in simultaneous interpreting. The present study focuses on personal and demonstrative pronouns. It is based on a random sample of transcripts of speeches and interpretations delivered at plenary sessions of the European Parliament (EP), taken from a larger pool of data, which will be included in an interpreting corpus to be compiled at the University of Minho.

Keywords: simultaneous interpreting, cohesion, anaphora, personal pronouns, demonstrative pronouns

Research on anaphora has gained momentum in recent years due to the interest it has raised among scholars of Natural Language Processing and Artificial Intelligence, who became intent on finding solutions to resolve the ambiguities posed by this intriguing linguistic phenomenon. In the field of translation and interpreting (T&I) studies, anaphora is but a marginal topic of research. This may be because this phenomenon does not lend itself readily to observation, especially not in simultaneous interpreting (SI). Studying anaphora involves studying the relationship between two or more elements (i.e. the antecedent and the anaphor(s)), which are sometimes not easily identifiable in a text. This is all the more true if said text is translated simultaneously, in which case the likelihood increases that one of the elements in the anaphoric chain will be lost. Additionally, in simultaneous interpreting, the study of anaphora – and in fact of any other linguistic phenomenon – is further complicated by the need to transcribe the oral data beforehand. However, the developing paradigm of corpus-based interpreting studies may help to counter this tendency, shedding light on such phenomena, which are crucial to achieving a deeper understanding of the mechanics behind discourse production. In the first section of this paper, we will briefly trace the evolution of corpus-based interpreting studies and provide an overview of some of the interpreting corpora that are currently available online. This section further describes the general outline of a teaching experience conducted at the University of Minho in connection with the compilation of an interpreting corpus. The second section deals with the object of study, i.e. anaphoric pronouns, and its relevance for simultaneous interpreting, which we will attempt to demonstrate in the third section by transcribing and analyzing some examples taken from a small sample of speeches.
The fourth section provides some conclusions based on the findings discussed in the previous section.[1]

1. Corpus-based interpreting studies

Interpreting is a multi-faceted phenomenon which can – and no doubt must – be studied from a wide array of perspectives (Pöchhacker 2015). Indeed, since the 1950s, research on interpreting has been conducted under different paradigms. According to Moser-Mercer (1994), the fundamental distinction is that between the liberal arts paradigm and the natural science community. Most of the research conducted under the liberal arts paradigm was of a prescriptive and anecdotal nature, far removed from the realm of scientific experimentation. This paradigm corresponded to the first developmental stage of interpreting research. The first writings were essentially reports of personal experience and their main scope of application lay in training. Gradually, the scope broadened, as did the methods employed to carry out research (Hale and Napier 2013). There was a growing concern with quantification and measurements, connected in particular with the surge of interest in the definition of quality in interpreting. This evolution was also accompanied by an important shift from prescriptive to descriptive research, which is now a cornerstone principle of the so-called liberal arts paradigm. For these reasons, it can be argued that the line between the liberal arts paradigm and the natural science community is becoming increasingly blurred. One of the factors that contributed to this state of affairs was without a doubt the advent of corpus linguistics, with its focus on the systematic and rigorous description of authentic data. Corpus linguistics was first applied to translation, yielding very successful results, and it has been a popular research method among translation scholars ever since (Baker 1993, 1995; Laviosa 1998).
In 1995, Susan Armstrong pioneered the idea of corpus-based interpreting studies (Armstrong 1995), which was further developed by Miriam Shlesinger (1998) in her seminal paper on the challenges and opportunities of extending corpus linguistics to interpreting studies. Following Shlesinger’s call, many researchers did indeed venture into the compilation of interpreting corpora, some of which will be mentioned in the next subsection.

1.1. Interpreting corpora

In corpus linguistics, it is common to equate the notion of corpus with an electronic corpus available online through a dedicated search interface, which allows users to perform different types of queries. However, the word corpus can also be used to refer to any purposefully sampled body of texts, available either in paper or machine-readable form. This is indeed a broader conception of corpus, which is compatible with much of the work conducted in interpreting studies. In order to overcome problems of ecological validity, researchers in this field have been increasingly concerned with studying actual interpreting output and have therefore begun to build their own interpreting corpora. These corpora are generally compiled by single researchers in the scope of their Master’s or PhD projects, under many constraints. The compilation procedure begins with data collection, which could be a fairly simple retrieval of audiovisual files from a given online source or a more complex undertaking that requires the presence of the researcher at a conference to record the proceedings and gather informed consent from all the participants. The data collected must then be transcribed according to a predefined set of transcription conventions. Researchers may rely on the help of speech-recognition software to produce draft versions of the transcripts but ultimately, whether it is done from scratch or not, transcription involves a great deal of manual labor, which explains the limited size of many such corpora. Once data collection and transcription have been completed, the corpus is ready to be analyzed either manually or with the support of dedicated software. Not all of these ad hoc corpora result in electronic corpora. Nevertheless, both types of corpora are valuable sources of information for the study of interpreting and have contributed to achieving significant results in our field.
For example, Setton (1999) based his cognitive-pragmatic approach on the analysis of such a corpus of English, German and Chinese transcripts. Naturally, these corpora are more useful if they are in machine-readable form, in which case they are fit to be processed using corpus linguistics tools. Nowadays there are countless free web-based tools that allow researchers to exploit their corpora (monolingual or bilingual) in meaningful ways, such as concordancers, taggers, aligners and terminology extractors.

It is widely acknowledged that the process of building interpreting corpora is a highly time-consuming task, mainly due to transcription work (Bendazzoli 2010b; Shlesinger 1998). However, over the past ten years, some scholars have successfully taken up the challenge of compiling such corpora. Among the first interpreting corpora was the European Parliament Interpreting Corpus (EPIC), built between 2004 and 2006 by an interdisciplinary team of researchers from the University of Bologna. EPIC is an on-line, trilingual corpus made up of speeches delivered at plenary sittings of the European Parliament in Italian, Spanish and English plus the respective interpretations (Bendazzoli and Sandrelli 2005; Russo et al. 2012). Italy has in fact been an active center for corpus-based interpreting research. In addition to EPIC, other corpora are worth mentioning, such as CorIT (media interpreting – consecutive and simultaneous) (Falbo 2012), DIRSI-C (conference interpreting - simultaneous) (Bendazzoli 2010a), and FOOTIE (media interpreting - simultaneous) (Sandrelli 2012). At the Hamburg Center for Language Corpora, affiliated with the University of Hamburg, several interpreting corpora have been compiled as well, representing not only simultaneous but also consecutive modes of interpreting (Bührig et al. 2012; House, Meyer and Schmidt 2012). One particular feature that distinguishes the work of researchers at Hamburg is that their repository includes community interpreting corpora, reflecting for example interpreter-mediated interaction in hospitals and in courtrooms. All around the world, the growing interest in corpus linguistics has spurred the creation of all sorts of different corpora suited to the study of a wide gamut of linguistic phenomena. Thanks in part to the pioneering work developed by the Language Resource Center for Portuguese, known as Linguateca, Portugal is no exception. 
We now have at our disposal a number of monolingual and multilingual corpora featuring Portuguese as either a source or target language, such as Corpus de Referência do Português Contemporâneo, Corpus do Português, CETEMPúblico, Le Monde Diplomatique, COMPARA, Per-Fide and OPUS, to name but a few (for a comprehensive review of Portuguese corpora, see Berber Sardinha and Ferreira 2014). The last four examples are multilingual corpora with parallel alignment, hence particularly suited to research in translation studies. To our knowledge, the corpus mentioned in this study is an original attempt at building an interpreting corpus since, at present, there are no such corpora for (European) Portuguese. This is not surprising if we consider that spoken corpora in general are scarce and of limited size. However, members of the above-mentioned Hamburg Center for Language Corpora have created Dik - interpreting in hospitals corpus and CoSi - a corpus of consecutive and simultaneous interpreting, both of which include Brazilian Portuguese (Bührig et al. 2012; House, Meyer and Schmidt 2012). In Brazil, Luciana Ginezi is also compiling an interpreting learner corpus (Ginezi 2014). Due to the lack of interpreting corpora for European Portuguese, we decided to take up that challenge. We are currently involved in the compilation of the interPE corpus. It is a simultaneous interpreting multimedia corpus which includes Portuguese and English speeches delivered at the European Parliament plenary sittings. It will include not only the sentence-aligned transcripts of the original speeches and interpretations but also the corresponding audiovisual files. The interPE corpus will be composed of 20 Portuguese and 20 English speeches plus the respective interpretations, with each original speech lasting one and a half minutes on average. InterPE was envisaged as an open corpus, which means that more speeches will be added in the future.
While it falls outside the scope of this paper to describe the compilation stages of our corpus, we would nevertheless like to report the involvement of some students of the University of Minho in the transcription stage, highlighting the potential benefits of this kind of work for the students.

1.2. Transcription: a teaching experience at the University of Minho

The study presented in this paper is part of a doctoral project, which in turn is based on a corpus of speeches delivered at EP plenary sittings by English and Portuguese MEPs (Members of the European Parliament) as well as the respective interpretations, in simultaneous mode. This initiative, which began in 2013-14, has been developed in collaboration with students attending the course unit of Principles of Interpreting, from the 3rd year of the undergraduate degree in Applied Languages of the University of Minho (Braga, Portugal). The students transcribed and/or revised speeches using EXMARaLDA – Partitur. The speeches were orthographically transcribed. The decision was made not to transcribe paralinguistic features such as pauses, hesitations, vowel lengthenings or false starts (among others), as this fell outside the scope of our study, which was exclusively concerned with anaphoric relations. The students further aligned the speeches with the respective interpretations using the web-based aligner YouAlign[2]. Each student was then asked to produce an analysis of the interpretations they transcribed, based on a typology adapted from Falbo (1998). We implemented a two-stage approach which allowed us to successfully involve students not only in the actual compilation of the corpus but also in the analysis of the data, as illustrated in Figure 1 below:


Figure 1: Methodology used in class.

This teaching experience led us to believe that such an approach produces positive results as it encourages students to acquire technical (connected with the early stages of corpus compilation) and analytical (connected with reasoning abilities and linguistic analysis) skills. According to the students’ feedback, this exercise had a satisfactory outcome. In general, students claimed to have acquired a relevant and diverse set of skills that can actually help them in their future language-related careers. For example, among the benefits they mentioned were learning to use transcription and alignment software, and learning simple yet effective linguistic terminology to describe with scientific rigor some of the phenomena encountered in the speeches they transcribed. Incidentally, for the majority of students, who saw interpreting as a mission impossible, the analysis of authentic output helped to demystify the work of the interpreter. It helped students gain a better understanding of simultaneous interpreting in the context of EP plenary sittings, drawing their attention to the delivery rates as well as to the syntactic and semantic complexity of the speeches.

2. Anaphoric pronouns in simultaneous interpreting

Anaphora is one of many linguistic devices employed by speakers to create “texture” or “textuality”, that is, the property of “being a text”. According to de Beaugrande and Dressler (1981), texts are communicative occurrences that must comply with seven standards of textuality: cohesion, coherence, intentionality, acceptability, informativity, situationality and intertextuality. Textuality is the property that allows a text to be acknowledged as such, rather than as a heap of disconnected sentences. The first two standards are text-centered. Cohesion, in particular, concerns grammatical dependencies. Coherence, in turn, depends upon the semantic connection between the sentences that make up a text or, in other words, it depends on whether they conform to our view of the world and on their adequacy to the communicative context. Anaphora is located at the level of cohesion. It is a lexico-grammatical mechanism that enables the establishment of referential chains, presupposing the existence of a referentially dependent element (the anaphor) which can only be interpreted in connection with another item that is present in the cotext. An anaphora can be coreferential if both elements designate the same real-world entity, or non-coreferential if they have different referents. There are also various types of anaphora, depending on the grammatical nature of the elements involved: pronominal, nominal, verbal, and adverbial (Lopes and Carapinha 2013; Charolles 2002).

While anaphora has been studied in a wide range of disciplines, within different frameworks (Branco, McEnery and Mitkov 2005), we are particularly interested in anaphora as a discourse-level phenomenon, as conceived in text linguistics and discourse analysis, and the challenges it brings for simultaneous interpreters. In Translation and Interpreting Studies, many scholars have been concerned with the question of how translators are able to recreate cohesive and coherent texts in the target language (Baker 2011; Hatim and Mason 1997; Neubert and Shreve 1992, among others). This line of research was often connected with the quest for translation and interpreting universals (Blum-Kulka 1986). Anaphoric reference is generally addressed as a marginal topic subsumed under the broad umbrella of cohesion and coherence. This is for example the case of Shlesinger’s (1995) paper on cohesive shifts in SI, where anaphora is but one of the various devices dealt with by the author, and Gallina’s (1992) study on the cohesion of political speeches, which looks not only into reference but also ellipsis, conjunctions and lexical cohesion. Friedel Dubslaff’s (1993) paper on anaphoric retrieval in simultaneous interpreting is one of the few examples where anaphoric reference is regarded as a research topic that is worthy of interest on its own. Snelling (1992) also addresses a number of syntactic as well as semantic issues that should be taken into consideration when interpreting from Portuguese. With regard to syntax, one of the problems on which he focuses in more detail is choice of subject. Based on his corpus, Snelling found that most sentences in Portuguese did not begin with the subject. In such cases, he recommends that interpreters working into English always begin their sentences with a subject, following the linearity of the subject-verb-object structure. 
As we will see in the next section of this paper, the search for a subject is a frequent obstacle faced by interpreters working from Portuguese into English, who often use pronouns to fill that gap, at times generating ambiguities and even erroneous chains of reference. These authors share an interest in cohesive ties and acknowledge their relevance for interpreting (see for example Gumul 2012). In particular, anaphoric ties are a basic condition for the successful construction of any text, helping to ensure cohesion and coherence. Pronouns can be used to build anaphoric chains made up of not only intrasentential but also intersentential connections that can spread through an entire speech. Such a complex architecture may be costly in terms of processing requirements, and if anaphoric links are not properly established, that may well affect a text’s communicative intelligibility. The cognitive processing underlying the mechanisms of reference building becomes more complex when it is conducted only in the spoken mode and under severe temporal restrictions, as is the case in simultaneous interpreting. Thus, if we consider that a text results from the intersection of several anaphoric chains, it becomes clear that the study of anaphora is relevant for interpreting, which aims to ensure that a source text is rendered in the target language in a cohesive and coherent manner.

3. Empirical analysis of anaphoric pronouns in simultaneous interpreting

In this section, we present a small-scale exploratory study about anaphoric pronouns in simultaneous interpreting from Portuguese into English. We will begin by providing the frequency and distribution of the pronouns. This will be followed by a detailed analysis of examples taken from the corpus.

3.1. Frequency and distribution

For this empirical analysis, we selected a random sample of seven Portuguese speeches (plus English interpretations). The sample – to which we will refer as corpus throughout the remainder of this paper – was deliberately small in order to allow for a more in-depth qualitative analysis of the relevant examples. By randomly selecting the speeches, we sought to ensure that the sample would be unbiased by any speech- (for example, topic), speaker- (for instance, gender) or interpreter-related variables (for example, professional experience) and that it would be free from researcher bias. To ensure that the data was homogeneous in the first place, the following were excluded before we proceeded with the random sampling:

  • speakers who did not speak in their mother tongue (that is, Portuguese);
  • speeches by Commissioners and other non-MEP entities, which tend to be either long interventions that far outlast those of MEPs or very brief announcements of who has the floor.
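
As an illustration only, the exclusion-then-sampling procedure described above could be sketched in Python as follows. The metadata fields (speaker_role, speaks_mother_tongue), the toy pool of speeches and the fixed seed are assumptions introduced for this example; they are not taken from the actual data pool or from the tools used in the study.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Speech:
    # All fields are hypothetical metadata, not the corpus's actual schema.
    speech_id: str
    speaker_role: str           # e.g. "MEP" or "Commissioner"
    speaks_mother_tongue: bool  # True if the speaker uses his or her mother tongue


def is_eligible(speech: Speech) -> bool:
    """Apply the two exclusion criteria before any sampling takes place."""
    return speech.speaker_role == "MEP" and speech.speaks_mother_tongue


def draw_sample(pool, size, seed=0):
    """Filter the pool, then draw `size` speeches at random without replacement."""
    candidates = [s for s in pool if is_eligible(s)]
    if size > len(candidates):
        raise ValueError("not enough eligible speeches in the pool")
    # A seeded generator makes the draw reproducible (an assumption of this sketch).
    return random.Random(seed).sample(candidates, size)


# Toy pool: ten eligible MEP speeches plus two that must be excluded.
pool = [Speech(f"PT{i:02d}", "MEP", True) for i in range(1, 11)]
pool.append(Speech("PT11", "Commissioner", True))  # non-MEP speaker
pool.append(Speech("PT12", "MEP", False))          # not speaking mother tongue

sample = draw_sample(pool, size=7)
print(sorted(s.speech_id for s in sample))
```

Filtering before drawing, rather than re-drawing whenever an ineligible speech comes up, keeps every eligible speech equally likely to be selected.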

We then proceeded to the extraction of all the pronouns, in the originals as well as in the interpretations. Since the corpus had been previously annotated with part-of-speech tags, the extraction process was completed semi-automatically, only requiring manual verification in a few ambiguous cases. The pronouns were then organized and counted according to type. The vast majority of occurrences found in the corpus were of personal, relative and demonstrative pronouns, in that order. No possessive pronouns were found but we thought it relevant to extract all possessive determiners since these markers are often implicated in anaphoric relations, leading to ambiguous readings. Possessive determiners ranked third, after the personal and relative pronouns. The results are shown in Table 1 below:


Pronoun type              PT (original)    EN (interpretation)
Personal pronouns         [missing]        [missing]
Relative pronouns         [missing]        [missing]
Possessive determiners    [missing]        [missing]
Demonstrative pronouns    [missing]        [missing]

Table 1: Number of pronouns per language and type.
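
The semi-automatic extraction and counting step described above can be illustrated with a minimal Python sketch. It assumes that each transcript is available as a list of (token, tag) pairs; the tag labels (PRON_PERS, PRON_REL, DET_POSS, PRON_DEM) and the toy sentence are hypothetical and do not reproduce the tagset or the data actually used.

```python
from collections import Counter

# Hypothetical part-of-speech tags mapped to the categories of Table 1.
PRONOUN_TAGS = {
    "PRON_PERS": "Personal pronouns",
    "PRON_REL": "Relative pronouns",
    "DET_POSS": "Possessive determiners",
    "PRON_DEM": "Demonstrative pronouns",
}


def count_pronouns(tagged_tokens):
    """Count tokens per pronoun category, ignoring every other tag."""
    counts = Counter()
    for _token, tag in tagged_tokens:
        category = PRONOUN_TAGS.get(tag)
        if category is not None:
            counts[category] += 1
    return counts


# Toy input, invented for the example (not a corpus excerpt).
tagged = [
    ("isto", "PRON_DEM"), ("parece", "VERB"), ("me", "PRON_PERS"),
    ("o", "DET"), ("que", "PRON_REL"), ("ela", "PRON_PERS"),
    ("queria", "VERB"), ("os", "DET"), ("nossos", "DET_POSS"),
    ("irmãos", "NOUN"),
]

counts = count_pronouns(tagged)
for category in PRONOUN_TAGS.values():
    print(f"{category}: {counts[category]}")
```

A sketch of this kind only automates the tally; ambiguous cases would still need to be checked by hand, as noted above.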

This study focuses on personal and demonstrative pronouns. Personal pronouns were chosen because of the high number of total occurrences, which is more than twice the number of occurrences found for the second most frequent category of pronouns (namely, relative). Despite being the least frequent, demonstrative pronouns were chosen because of their potentially resumptive value, which allows these pronouns to select wider antecedents. After extracting the pronouns, it was necessary to mark those that were part of anaphoric chains (that is, the pronouns with anaphoric value). We found that 26 out of 36 personal pronouns were anaphoric, as were all 13 demonstrative pronouns. This quantitative extraction was carried out only in the original speeches. We then copied into a spreadsheet all the occurrences of anaphoric pronouns in Portuguese in their extended context along with the respective interpretations in English. This allowed us to comparatively analyze the originals and the interpretations, focusing on how the anaphoric chains were rendered in the interpretations.
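
For readers who prefer proportions, the counts reported above (26 of 36 personal pronouns and 13 of 13 demonstrative pronouns in the Portuguese originals) can be converted into percentages. The short Python sketch below is purely illustrative; the dictionary keys and the function name are our own choices for the example.

```python
# Counts reported in the text for the Portuguese originals:
# (anaphoric occurrences, total occurrences) per pronoun type.
reported = {
    "personal": (26, 36),
    "demonstrative": (13, 13),
}


def anaphoric_share(anaphoric, total):
    """Return the percentage of anaphoric occurrences, rounded to one decimal."""
    if total == 0:
        raise ValueError("total must be greater than zero")
    return round(100 * anaphoric / total, 1)


shares = {kind: anaphoric_share(a, t) for kind, (a, t) in reported.items()}
for kind, share in shares.items():
    print(f"{kind}: {share}% anaphoric")
```

In other words, roughly 72% of the personal pronouns and all of the demonstrative pronouns in the originals carried anaphoric value.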

3.2. Personal and demonstrative pronouns

In the following sections, we present a non-exhaustive selection of cases where the anaphoric chains present in the original speeches were omitted or reformulated in the interpretations. While such operations could affect coherence, it is not our goal here to assess the extent to which the interpreted speeches are affected by the omission and/or reformulation of the anaphoric chains. The examples were taken from five out of the seven speeches analyzed; the originals are preceded by the acronym OS and interpretations by the acronym INT, each accompanied by the speech number (according to the chronological order in which they were delivered).

3.2.1 Personal pronouns

Corpus analysis uncovered a clear asymmetry between English and Portuguese in terms of the number of pronouns, as can be seen from Table 1. This is especially visible in the case of personal pronouns, which may be explained by the fact that English, as opposed to Portuguese, does not accept null subjects. This can further be explained by the tendency observed in the English interpretations toward paratactic structures. Example (1a) is representative of the speeches delivered by Portuguese MEPs, who tend to include several embedded clauses:


(1a) Neste debate não podemos esquecer que existe uma proposta dum chamado pacto de competitividade
In this debate we cannot forget that there is a proposal of a so-called competitiveness pact

através do qual o diretório, comandado pela Alemanha, quer desferir novos ataques ao regime público
through which the directory, led by Germany, wants to launch new attacks against the solidary and

solidário e universal da segurança social, aumentar a idade da reforma e desvalorizar salários, tentando
universal public regime of social security, increase the age of retirement and devalue salaries, trying to

pôr fim à sua indexação à taxa de inflação apenas para beneficiar o setor financeiro, o qual pretende
put an end to their indexation to the inflation rate only to benefit the financial sector, which intends to

encontrar nas pensões novas formas de maiores ganhos especulativos.
find in pensions new ways to greater speculative gains.


(1b) We must remember in this debate that there is a proposal relating to the competitiveness pact. Germany in particular seems to be very ready to attack the system of public security by lowering salaries, by exacerbating inflation largely for the benefit of the financial sector and we know that the financial sector wants to continue to gamble through private financing of pensions.

The complexity of the original speech takes its toll on the interpreter, who attempts to chunk the incoming message into smaller, more manageable bits of information. These chunking operations result in the creation of shorter sentences, to which the interpreter must assign a subject, as required by English grammar. According to Gile (1994: 48), however, ‘a deviation from the source language structure may mean the interpreter is controlling the situation, whereas the selection of target language structures similar to source language structures indicates that the interpreter may be short of processing capacity’. As mentioned above, syntactic complexity is often the hallmark of Portuguese speeches. This factor is further compounded by the delivery rates, which in our corpus ranged between 140 and 181 words per minute (average = 156 wpm). These factors hinder the process of recognizing and assigning a syntactic subject to the new sentences. In such cases, our corpus analysis has shown that interpreters often resort to the generic personal pronoun “we”. In (1b), this pronoun provides the interpreter with a plausible subject for the independent clause he[3] creates in his rendition. The use of “we” also proved a particularly useful instrument when the interpreter struggled to identify the antecedent in an anaphoric chain. In (2a) we have a rather unclear anaphoric link between the pronoun ‘ela’ and the immediately preceding antecedent (‘Europa pós-queda do muro’):


(2a) Isto parece-me uma perversão fundamental dos princípios da Europa pós-89, da Europa pós-queda do
This seems like a fundamental perversion of the principles of Europe after 89, of Europe after the fall of

muro. O que ela queria dizer é que nós não abandonaríamos os nossos irmãos europeus de qualquer país à
the Wall. What she meant is that we would not abandon our European brothers of any country to

censura e à repressão à liberdade de expressão.
censorship and to repression of freedom of expression.


(2b) I think this runs completely counter to the principles of Europe, particularly after the fall of the Wall. We have said that we will not leave Europeans subject to censorship in any country.

Unable to identify the antecedent of ‘ela’, the interpreter has to look for an alternative that allows her to do away with the anaphoric chain without severely detracting from the coherence of her speech. In their efforts to segment the incoming speech into manageable chunks of information that allow them to stick to the canonical sentence order (subject-verb-object(s)), interpreters seem more prone to employ paratactic structures rather than hypotactic ones. In order to convert hypotaxis into parataxis, the interpreter is required to produce a syntactic subject. Since it is not always simple to come up with an appropriate subject, the use of the pronoun “we” acquires a strategic dimension of considerable usefulness. A great deal of attention has also been devoted to the study of “we” markers (we, us, our) in the specific context of European Parliament interpreting as a means of highlighting ideological assumptions (Beaton 2007; Dumara 2015). As we have seen in (2), the pronoun “we” can be used as an alternative to solve such anaphora-induced difficulties but there are other pronouns that can fulfill the same role, such as “they”:


(3a) O desaparecimento de Ai Weiwei tem de ser entendido no contexto do aumento desesperado da
The disappearance of Ai Weiwei has to be understood in the context of the desperate increase of

repressão política por parte das autoridades Chinesas. Tudo por medo de que o espírito revolucionário
political repression on the part of Chinese authorities. All out of fear that the revolutionary spirit

no mundo Árabe infete a sociedade chinesa.
in the Arab world might infect the Chinese society.


(3b) The disappearance of Ai Weiwei has to be understood in the context of the tightening up of political repression in China. They are afraid that the democratic spring in the Arab world might infect them.  

In (3b), the interpreter creates an anaphoric relation that did not exist in the original speech (‘they’ … ‘them’). In the first sentence, he replaces the agent (‘por parte das autoridades chinesas’) with a simple locative phrase (‘in China’). This phrase becomes the antecedent for the pronoun at the beginning of the following sentence. Owing to the vagueness of this antecedent, which could refer to the Chinese authorities (as in the original), to the population or any other Chinese entity, the interpreter opts for ‘they’ to fill in the subject role, possibly echoing the plural antecedent uttered in the original speech (‘autoridades chinesas’). This anaphoric chain has a third link (‘them’), which necessarily follows from the interpreter’s previous choice. It remains unclear though whether the two anaphors are coreferential. In any event, by using ‘they’, the interpreter leaves it up to his listeners to decide whether to interpret these two anaphors coreferentially and to determine what their referent(s) is (are).

3.2.2. Demonstrative pronouns

In addition to personal pronouns, demonstrative pronouns have also been found to serve as a strategic device when used resumptively. As already mentioned above, delivery rate is an essential variable in interpreting, which may lie at the origin of various comprehension and production problems encountered by interpreters, especially in simultaneous mode. In the specific context of the EP plenary sittings, which are notorious for their rigid floor allocation rules, delivery rates are often found to exceed the optimal threshold[4]. Because speaking time is so tightly constrained, most speakers prepare their speeches in advance, which means that the speeches are generally closer to the literate pole of the oral-literate continuum (Shlesinger 1989). Adding to an already tense situation, speeches are typically encumbered by intricate lines of reasoning that are often hard to follow even for native speakers, as is the case in the excerpt transcribed in example (4a):


(4a) Neste debate não podemos esquecer que existe uma proposta dum chamado pacto de competitividade
In this debate we cannot forget that there is a proposal of a so-called competitiveness pact

através do qual o diretório, comandado pela Alemanha, quer desferir novos ataques ao regime público
through which the directory, led by Germany, wants to launch new attacks against the solidary and

solidário e universal da segurança social, aumentar a idade da reforma e desvalorizar salários, tentando
universal public regime of social security, increase the age of retirement and devalue salaries, trying to

pôr fim à sua indexação à taxa de inflação apenas para beneficiar o setor financeiro, o qual pretende
put an end to their indexation to the inflation rate only to benefit the financial sector, which intends to

encontrar nas pensões novas formas de maiores ganhos especulativos. Queremos aqui manifestar a nossa
find in pensions new ways to greater speculative gains. We want here to manifest our

clara oposição a este caminho da integração europeia construído na base de políticas anti-sociais a que
clear opposition to this road of European integration built on the basis of antisocial policies to which

lamentavelmente este relatório dá cobertura ao apoiar o Livro Verde da Comissão Europeia, ao
this report regrettably gives credit by supporting the Green Book of the European Commission, by

admitir uma ligação da idade legal da reforma à esperança de vida e incentivar a permanência por um
allowing a connection between the age of retirement and life expectancy and encouraging permanence

período mais longo no mercado de trabalho, ao não excluir o apoio a sistemas de reformas privados
for a longer period of time in the job market, by not excluding support to private retirement systems

mesmo quando já se conhecem consequências graves da sua utilização especulativa por fundos e
even when there are already known serious consequences of their speculative use by private banks and funds

bancos privados que deixaram os idosos, designadamente mulheres idosas, na pobreza.
that have left the elderly, namely elderly women, in poverty.


(4b) We must remember in this debate that there is a proposal relating to the competitiveness pact. Germany in particular seems to be very ready to attack the system of public security by lowering salaries, by exacerbating inflation largely for the benefit of the financial sector and we know that the financial sector wants to continue to gamble through private financing of pensions. I think we have to be clear about the dangers of that. These are antisocial policies and I think it's crucial that we understand that. We have to take account of increasing life expectancy and the fact that in many instances people are working for much longer periods. The document also talks about the intervention of the private sector but there are serious speculative risks related to that because many older women are being driven into poverty by this combination of circumstances.

It would seem that the interpreter was not able to keep up with the original speech. In particular, after the first sentence he was thrown off track and forced to deploy ‘coping tactics’ (Gile 1995: 191). In this case, his tactics consisted in using the demonstrative pronoun ‘that’ as a resumptive, which is taken to refer back to the preceding clauses. This allowed him to save time, leaving it to the listeners to piece together the speaker’s intended meaning. Irrespective of all the compounding difficulties inherent in simultaneous interpreting, interpreters must see their renditions through, resorting to alternatives that do not always yield the best results. Although resumptive pronouns do offer a valid non-committal strategy, their intrinsic vagueness could impair the listeners’ understanding of the interpreter’s rendition. It can be argued, however, that the listeners, who are assumed to possess some degree of familiarity with the topics discussed at EP plenary sittings, may be able to fill in any semantic gaps on the basis of their previous knowledge. This speaks to the importance of extralinguistic factors in interpreting, namely, the listeners’ cognitive complements (Seleskovitch and Lederer 1984), which allow them to compensate for omissions or inaccuracies on the part of interpreters.

We have already mentioned that interpreting from Portuguese into English involves a recurrent use of parataxis to the detriment of hypotaxis, which forces interpreters to assign a syntactic subject to each new sentence or clause. When in doubt about an adequate subject, interpreters were often found to use the pronoun ‘we’ or ‘they’. However, other evidence from the corpus showed that demonstrative pronouns can also be used for the same purpose, as in (5b):


(5a) Obrigada, Senhora Presidente. Num tema necessariamente vasto, queria aqui deixar apenas dois
Thank you, Madam President. In a necessarily vast theme, I would like here to leave just two

breves apontamentos. O primeiro para chamar a atenção para os fatores de ameaça que hoje pesam sobre
brief notes. The first to draw attention to the factors of threat that today weigh over

inúmeros ecossistemas florestais.
numerous forest ecosystems.


(5b) Thank you, Madam President. This is a very vast topic but I'd simply like to make two points. The first is that I'd like to draw your attention to the threats for forestry resources and then the exotic species that escape forest fires.

This excerpt was taken from the beginning of the speech. The original sentence was converted into two coordinate clauses joined by an adversative conjunction (‘but’). Due to this segmentation, the interpreter had to find a subject for the first clause, namely the pronoun ‘this’. This transformation makes more explicit the restrictive relationship between the hypernym (‘vast topic’) and the hyponym (‘two points’). In this case, the pronoun ‘this’ takes on a cataphoric value, unlike the pronoun ‘that’ in the following example, which is both anaphoric and cataphoric:


(6a) É condição sine qua non que a Líbia permita que o Alto Comissariado das Nações Unidas para os
It is a sine qua non condition that Libya allows the United Nations High Commissioner for

Refugiados volte a operar no país com um mandato alargado. Atrevo-me a dizer claramente: sem
Refugees to once again operate in the country with an extended mandate. I dare say clearly: without

ACNUR não há acordo.
UNHCR there is no agreement.


(6b) The condition sine qua non is that Libya allows the UNHCR to come back to the country with an amplified agreement. I have to say that quite clearly: without UNHCR, no agreement.

In (6b) the pronoun ‘that’ resumes the idea conveyed in the previous sentence and, at the same time, it announces the reasoning that follows it. It would seem that, by placing the pronoun in a cataphoric position, the relationship between the anaphor and the postcedent becomes even more evident than in the original. In both (5b) and (6b), the interpretations have a higher degree of explicitness – the first due to the segmentation and the second due to the addition of the demonstrative.
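For readers who wish to look for comparable uses of demonstratives in their own transcripts, the following is a minimal sketch in Python. It is not the procedure followed in this study, where the examples were identified and analyzed manually. The list of forms, the function names (`tokenize`, `find_candidates`) and the context window are assumptions made purely for illustration. The sketch only retrieves candidate forms with a few words of context on either side; it cannot tell a demonstrative pronoun from a determiner or from the conjunction ‘that’, so every hit still has to be classified by hand.

```python
import re

# Candidate demonstrative forms in English (an illustrative assumption,
# not an annotation scheme taken from the study).
DEMONSTRATIVES = {"this", "that", "these", "those"}


def tokenize(text):
    """Split a transcript into lowercase word tokens (straight apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def find_candidates(text, window=3):
    """Return (token index, left context, form, right context) for each candidate."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok in DEMONSTRATIVES:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((i, left, tok, right))
    return hits


# Interpreter's rendition from example (6b).
rendition_6b = (
    "The condition sine qua non is that Libya allows the UNHCR to come back "
    "to the country with an amplified agreement. I have to say that quite "
    "clearly: without UNHCR, no agreement."
)

# Print one keyword-in-context line per candidate for manual classification.
for index, left, form, right in find_candidates(rendition_6b):
    print(f"{index:>3}  {left} [{form}] {right}")
```

Applied to (6b), the sketch retrieves two occurrences of ‘that’. Only the second (‘I have to say that quite clearly’) is the demonstrative pronoun discussed above, while the first introduces a subordinate clause, which illustrates why automatic retrieval can support but not replace manual analysis.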

4. Conclusions

Our corpus analysis has shown that English target speeches (the interpretations) globally have more pronouns than Portuguese source speeches (the originals). This is partly because the Portuguese speakers are more prone to use hypotactic structures. In order to deal with this, interpreters tend to segment the input into small chunks and in doing so they are left with coordinated clauses to which they must assign suitable subjects. It was found that interpreters resort to pronouns such as ‘we’ and ‘they’ as a means of fulfilling that grammatical requirement. Our corpus analysis also showed that demonstrative pronouns were employed in anaphoric relations with a resumptive function, referring back to strings of embedded clauses as in (4b). Demonstrative pronouns further helped to make explicit the semantic logic that was only implicit in the original speeches, as was the case in examples (5) and (6). These findings suggest that personal and demonstrative pronouns are strategically used by interpreters to meet a grammatical requirement of the target language as a result of chunking operations. Additionally, these pronouns provide a non-committal alternative which is valuable to interpreters in case of doubt. However, the drawbacks of pronoun use can quickly overshadow the benefits if the intrinsic vagueness of pronouns prevents listeners from identifying the antecedent in the anaphoric relationship of which they form part. Although listeners are assumed to bring their previous knowledge to the context of simultaneous interpreting, that may not always be sufficient to overcome the vagueness introduced by some pronominal anaphors.

This type of study is based on the premise that reflection on the practice of interpreting through the analysis of authentic data, that is, a corpus (electronic or not), can promote students’ metalinguistic awareness, helping them to develop anticipation and problem-solving strategies (Sandrelli 2010). We have seen that there are a few corpora of interpreting but not nearly as many as there are for (written) translation. However, it is fair to claim that corpus-based interpreting studies is gaining ground all around the world. Scholars engaged in this kind of research are well aware of the difficulties of creating interpreting corpora. In line with the rationale behind the 1st Forlì International Workshop on Corpus-based Interpreting Studies, it is therefore important that researchers join forces in the future to achieve greater standardization of interpreting corpora, thus contributing to significant increases in size and representativeness. To the best of our knowledge, the interPE corpus is an original attempt at a simultaneous interpreting corpus featuring European Portuguese and, although it was created with the aim of studying anaphoric relations in simultaneous interpreting, we hope that it will come to serve a wider range of research purposes.

References


Armstrong, Susan (1995) “Corpus-based Methods for NLP and Translation Studies”, Interpreting 2, no. 1/2: 141–62.

Baker, Mona (1993) “Corpus Linguistics and Translation Studies: Implications and Applications” in Text and Technology: In Honour of John Sinclair, Mona Baker, Gill Francis and Elena Tognini-Bonelli (eds), Amsterdam/Philadelphia, John Benjamins: 233–52.

---- (1995) “Corpora in Translation Studies: An Overview and Some Suggestions for Future Research”, Target 7, no. 2: 223–43.

---- (2011) In Other Words: A Coursebook on Translation, London/New York, Routledge.

Beaton, Morven (2007) “Interpreted Ideologies in Institutional Discourse. The Case of the European Parliament”, The Translator 13, no. 2: 271–96.

Bendazzoli, Claudio (2010a) Il Corpus DIRSI: Creazione e sviluppo di un corpus elettronico per lo studio della direzionalità in interpretazione simultanea, PhD diss., University of Bologna, Italy.

---- (2010b) Corpora e interpretazione simultanea, Bologna, Asterisco.

Bendazzoli, Claudio, and Annalisa Sandrelli (2005) “An Approach to Corpus-based Interpreting Studies: Developing EPIC (European Parliament Interpreting Corpus)” in MuTra – Challenges of Multidimensional Translation: Conference Proceedings, Heidrun Gerzymisch-Arbogast and Sandra Nauert (eds), Saarbrücken, 1–12.

Berber Sardinha, Tony, and Telma São Bento Ferreira (eds) (2014) Working with Portuguese Corpora, London/New York, Bloomsbury Academic.

Blum-Kulka, Shoshana (1986) “Shifts of Cohesion and Coherence in Translation” in Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies, Juliane House and Shoshana Blum-Kulka (eds), Tübingen, Narr: 17–35.

Branco, António, Tony McEnery, and Ruslan Mitkov (eds) (2005) Anaphora Processing. Linguistic, Cognitive and Computational Modelling, Amsterdam/Philadelphia, John Benjamins.

Bührig, Kristin, Ortrun Kliche, Bernd Meyer, and Birte Pawlack (2012) “The Corpus “Interpreting in Hospitals”: Possible Applications for Research and Communication Training” in Multilingual Corpora and Multilingual Corpus Analysis, Thomas Schmidt and Kai Wörner (eds), Amsterdam/Philadelphia, John Benjamins: 305–15.

Charolles, Michel (2002) La référence et les expressions référentielles en français, Paris, Ophrys.

de Beaugrande, Robert-Alain, and Wolfgang U. Dressler (1981) Introduction to Text Linguistics, London/New York, Longman.

Dubslaff, Friedel (1993) “Die Funktionen anaphorischer Proformen beim Simultandolmetschen aus dem Deutschen”, Hermes, Journal of Linguistics 11: 107–16.

Dumara, Barbara (2015) “How Can Interpreting Corpora Extend Our Knowledge on Intrusive ‘We’ in SI?”, Poster presented at the conference Corpus-based Interpreting Studies: The State of the Art. First Forlì International Workshop, 7-8 May 2015, University of Bologna at Forlì.

Falbo, Caterina (1998) “Analyse des Erreurs en Interprétation Simultanée”, The Interpreters’ Newsletter 8: 107–20.

---- (2012) “CorIT (Italian Television Interpreting Corpus): Classification Criteria” in Breaking Ground in Corpus-based Interpreting Studies, Francesco Straniero Sergio and Caterina Falbo (eds), Bern, Peter Lang: 157–85.

Gallina, Sandra (1992) “Cohesion and the Systemic-functional Approach to Text: Applications to Political Speeches and Significance for Simultaneous Interpretation”, The Interpreters’ Newsletter 4: 62–71.

Gerver, David (1969/2002) “The Effects of Source Language Presentation Rate on the Performance of Simultaneous Conference Interpreters” in The Interpreting Studies Reader, Franz Pöchhacker and Miriam Shlesinger (eds), London/New York, Psychology Press: 52–66.

Gile, Daniel (1994) “Methodological Aspects of Interpretation and Translation Research” in Bridging the Gap. Empirical Research in Simultaneous Interpretation, Sylvie Lambert and Barbara Moser‑Mercer (eds), Amsterdam/Philadelphia, John Benjamins: 39–56.

---- (1995) Basic Concepts and Models for Interpreter and Translator Training, Amsterdam/Philadelphia, John Benjamins.

Ginezi, Luciana Latarini (2014) “Desafios para a Construção de um Corpus de Aprendizes de Interpretação Simultânea”, TradTerm 23: 165–91.

Gumul, Ewa (2012) “Variability of Cohesive Patterns. Personal Reference Markers in Simultaneous and Consecutive Interpreting”, Linguistica Silesiana 33: 147–72.

Hale, Sandra, and Jemina Napier (2013) Research Methods in Interpreting, London/New York, Bloomsbury Academic.

Hatim, Basil, and Ian Mason (1997) The Translator as Communicator, London, Routledge.

House, Juliane, Bernd Meyer and Thomas Schmidt (2012) “CoSi - A Corpus of Consecutive and Simultaneous Interpreting” in Multilingual Corpora and Multilingual Corpus Analysis, Thomas Schmidt and Kai Wörner (eds), Amsterdam/Philadelphia, John Benjamins: 295–304.

Kleiber, Georges (1994) Anaphores et pronoms, Louvain-la-Neuve, Duculot.

Laviosa, Sara (1998) “The Corpus-based Approach: A New Paradigm in Translation Studies”, Meta 43, no. 4: 474–9.

Lopes, Ana C. M., and Conceição Carapinha (2013) Texto, Coesão e Coerência, Coimbra, Almedina.

Moser-Mercer, Barbara (1994) “Paradigms Gained or the Art of Productive Disagreement” in Bridging the Gap. Empirical Research in Simultaneous Interpretation, Sylvie Lambert and Barbara Moser‑Mercer (eds), Amsterdam/Philadelphia, John Benjamins: 17–23.

Neubert, Albrecht, and Gregory M. Shreve (1992) Translation as Text, Kent, OH, The Kent State University Press.

Pöchhacker, Franz (2015) “Interpreting” in Routledge Encyclopedia of Interpreting Studies, Franz Pöchhacker (ed.), London/New York, Routledge: 197–200.

Russo, Mariachiara, Claudio Bendazzoli, Annalisa Sandrelli, and Nicoletta Spinolo (2012) “The European Parliament Interpreting Corpus (EPIC): Implementation and Developments” in Breaking Ground in Corpus-based Interpreting Studies, Francesco Straniero Sergio and Caterina Falbo (eds), Bern, Peter Lang: 35–90.

Sandrelli, Annalisa (2010) “Corpus-Based Interpreting Studies and Interpreter Training: A Modest Proposal” in Translationswissenschaft: Stand und Perspektiven. Innsbrucker Ringvorlesungen zur Translationswissenschaft VI, Lew Zybatow (ed.), Peter Lang: 69–90.

---- (2012) “Interpreting Football Press Conferences: The FOOTIE Corpus” in Interpreting across Genres: Multiple Research Perspectives, Cynthia J. K. Bidoli (ed.), Trieste, Edizioni Università di Trieste: 78–101.

Seleskovitch, Danica, and Marianne Lederer (1984) Interpréter pour traduire, Paris, Didier Érudition.

Setton, Robin (1999) Simultaneous Interpretation: A Cognitive-pragmatic Analysis, Amsterdam/Philadelphia, John Benjamins.

Shlesinger, Miriam (1989) Simultaneous Interpretation as a Factor in Effecting Shifts in the Position of Texts on the Oral-Literate Continuum, M.A. diss., Tel Aviv University, Israel.

---- (1995) “Shifts in Cohesion in Simultaneous Interpreting”, The Translator 1, no. 2: 193–214. 

---- (1998) “Corpus-based Interpreting Studies as an Offshoot of Corpus-based Translation Studies”, Meta 43, no. 4: 486–93.

Snelling, David (1992) Strategies for Simultaneous Interpreting from Romance Languages into English, Udine, Campanotto.


[1] This study is part of a doctoral project supported by grant no. SFRH/BD/88142/2012, awarded by the Portuguese Foundation for Science and Technology under the Human Potential Operational Program. It is cofunded by the European Social Fund and the Portuguese Ministry of Education and Science.

[3] Thanks to the audio of the interpretations we were able to distinguish male from female interpreters, hence in this paper we use gender-marked pronouns to refer to the interpreters.

[4] According to Gerver (1969/2002), the optimal delivery rate would be in the range of 95 to 120 words per minute.

About the author(s)

Ana Correia holds an undergraduate degree in Applied Foreign Languages from the University of Minho (2006). She worked as a research assistant for the corpus compilation project “Per-Fide - Portuguese in parallel with six languages: Español, Russian, Français, Italiano, Deutsch, English” (ref. no. PTDC/CLE-LLI/108948/2008). She is currently a PhD student in Language Sciences, specializing in Applied Linguistics, at the same university. She has received a grant from the Portuguese Foundation for Science and Technology to conduct her PhD project, a corpus-based study of pronominal anaphora in simultaneous interpreting.

