Corpora and Literary Translation

By Titika Dimitroulia & Dionysis Goutsos (Aristotle University of Thessaloniki & National and Kapodistrian University of Athens, Greece)

The Digital Humanities are attracting increasing attention in the humanities, the social sciences and the arts, often leading to transnational, collaborative and interdisciplinary projects which are also increasingly based on corpora (Crompton, Lane and Siemens 2017; Bernard and Bohet 2017; Longhi 2017; Klein and Gold 2016; Jockers 2013; Warwick, Terras and Nyhan 2012). Digital Literary Studies, which use a wide range of methodologies, explore the latent hermeneutic potential of related fields by combining qualitative and quantitative methods which privilege corpora (Earhart 2015; Ganascia 2015; Hoover, Culpeper and O’Halloran 2014; Price and Siemens 2013; Schreibman and Siemens 2008; Marchionne, online). This preference is, in a way, reminiscent of the ‘prehistory’ of corpora, which was rooted in concordances of religious and sacred texts rather than texts with a practical purpose (Gigot 1910; Jones 2016).

The resistance of literary translation to integrating corpora in its practice, analysis and teaching is due both to the relative absence of literary parallel corpora (Olohan 2004) and to the reluctance of researchers in both literary and literary translation studies to investigate the quantitative analysis of literary texts, be they originals or translations (Porsdam 2011). Nevertheless, a corpus-based approach is gaining ground in literary translation studies (Zubillaga, Sanz and Uribarri 2015; Rybicki 2012; Ji 2012), after being explored at its inception by such researchers as Bernardo (1981), Baker (1996, 1999, 2000), Kenny (2000), Zanettin (2000, 2001) and Bosseaux (2004), among others. This development is also supported by new tools, such as CATMA, the cultural texts annotation tool; TRADUXIO (Goncharova and Lacour 2011), a participative platform for translators of cultural texts; and QU.IT (Zotti 2016), a parallel database useful both to translation practice and teaching. Thus, literary bi-text interfaces and tools can lead the way to complex approaches and the use of corpora in literary translation, while traditional concordances and parallel corpora continue to be of great help not only to translators, but also to researchers and translation teachers, as this issue highlights.

This is a period when textual material, particularly literary texts, is flourishing on the internet and can be used for the compilation of parallel corpora (Zanettin 2002). Tools are also increasingly simple and effective to use, ranging from OCR to alignment, concordancing, annotating and visualizing, while the new perspectives offered by interdisciplinary projects in Corpus-based Translation Studies make the use of corpora in literary translation practice, research and teaching both a choice and a necessity. Corpus-based and corpus-driven approaches (see below) to literary translation, beyond linguistic and stylistic research, are found in macro-level and/or diachronic analyses of, among others, norms, translation history and terminology management in literary or comparative studies, and can reveal the relation between original and translated literary production at many levels. Furthermore, they can contribute to research on literary and cultural transfer by doing justice to its complexity and its multiple relationships to culture and society in a global context.

The global context is also shaped by the discussion on World Literature or world literatures (Apter 2006, 2013; Damrosch 2006). In this discussion, firstly, fiction is created with regard to the place or even the possibility of translation in it (Apter 2013) and, secondly, the sociological translation perspective applied at a macro-level has to be complemented by a close analysis of the texts (Casanova 2004). Parallel and comparable literary corpora, explored from different points of view and, most importantly, including all genres of literary texts, can contribute to this discussion, dealing with the relation between language, nation, minorities and all kinds of collectivities, and with their linguistic and cultural universals and particularities. Research on literary translation based on or driven by big corpora is an unexplored domain, which can enrich Translation Studies in their relation to other disciplines (Ganascia 2015).

This issue aims to highlight the challenges of corpus-based literary translation, as well as its possible itineraries in the future, from a historical, theoretical or practical perspective. In the opening paper, Gerard Lynch sets the scene by placing the analysis of literary texts through corpora within the broader context of research involving the computational and statistical analysis of written texts. He starts from a historical overview, identifying an earlier phase that focused on so-called translationese, or translation’s interlanguage, following Baker’s (1993) pioneering work. The next phase includes, according to Lynch, stylometric studies on literary translation, which emphasize micro-features such as those used in author attribution or, more generally, stylistic profiling. Current work belongs to the third phase of research, which involves big data and machine learning methods. Lynch discusses relevant work on literary translation in each phase in order to point out important methodological and theoretical considerations, including the need for more widely accessible machine learning platforms, the restrictions imposed by the limited availability of relevant literary material and the necessity to combine macro-level approaches with more fine-grained analyses. In his view, the interaction of literary translation projects with other research in digital humanities has the potential to investigate “trends in stylistic variation that may transcend individual translator’s choices”.

The papers that follow offer four case studies in the analysis of literary translations. Mojca Schlamberger Brezar focuses the discussion on how parallel literary translation corpora can be exploited for a variety of purposes, by showing how corpora like FraSloK, a literary translation parallel corpus involving French and Slovene, are placed within the larger frame of language resources available for the two languages in question. She then moves on to a minute analysis of the frequency of connectives such as the French mais, cependant, etc., which have been extensively discussed in the linguistic literature under various names, such as discourse, functional or pragmatic markers (see e.g. Fedriani and Sansó 2017), along with their translation variants in the texts of the corpus. Her analysis shows how quantitative considerations can shed light on an author’s poetics and highlights the usefulness of literary corpora, which, because of their nature, include several registers and thus offer a wider spectrum of lexical choices both to native speakers and L2 learners of a language.

In her contribution, Adriana Mezeg singles out a specific construction, namely the use of past participles in sentence-initial non-finite clauses, in order to identify translation strategies with reference to the same parallel corpus. What is particularly important is the combination of automatic and semi-automatic methods for identifying the construction in question and its frequency with a qualitative functional analysis of its syntactic and semantic properties. In her findings, the translation process seems to favour explicitation strategies in terms of both syntax and semantic relations, the latter bringing implicit relations to the fore through translation. The implications are obvious for the linguistic analysis of each language and their contrastive relations, as well as for pedagogical and professional applications.

Elaine Ng’s contribution is a close analysis of four Chinese translations of Hemingway’s The Old Man and the Sea. She specifically employs Simpson’s (1993) model of point of view and focuses on the expression of modality through deontic, epistemic and other modals, as found in the original and the translated texts. Her meticulous analysis bears out the translation shifts occurring in the rendering of a literary text in Chinese and allows the identification of translation strategies, such as omission or modification, that may be crucial for the overall patterning of speech and thought presentation in the translated texts. Although, as pointed out, extra-textual information is needed in order to better understand the translators’ choices, quantitative corpus analyses such as this can reveal recurring patterns of translation activity and stylistic choice. Despite its small sample, this is a good example of how close textual analysis can draw on quantitative, and more specifically corpus-based, methods in order to reveal what takes place in the translation process, both at the level of individual translators and at that of a particular language.

Finally, Rudy Loock shifts the attention to a rather neglected type of parallel corpora, namely a learner corpus which involves translation tasks, in his case from English to French, performed by advanced students of English whose native language is French. He thus brings the discussion back to the question of translation’s interlanguage, or the first phase of relevant research in Lynch’s taxonomy, by carefully comparing student texts with English and French corpora as regards two specific linguistic features, namely derived adverbs in -ly vs. -ment and existential there vs. il y a constructions. The findings are also correlated with the assessment of translation quality by independent evaluators. This exploratory and thus highly original type of research is valuable not so much for definitive results as for the methodological and theoretical issues it raises, including the question of the correlation between intra-language differences as a whole and translation quality.

This special issue on the use of corpora in the study of literary translation concludes with a paper by Federico Zanettin, which aptly wraps up the issues involved. In particular, Zanettin places the corpus analysis of literary translation within the tradition of computer-assisted literary studies, pointing out the difference between qualitatively- and quantitatively-driven or stylometric approaches, in a way reminiscent of Tognini-Bonelli’s (2001) famous distinction between corpus-based and corpus-driven analysis. His extensive overview of corpus studies that focus on literary texts indicates the spectrum of methods employed, the range of linguistic features examined and the diverse findings produced. His paper draws the implications of these studies for exploring literary translators’ style by means of corpora and the question of translation universals (see Baker 1993), as well as translation criticism.


About the author(s)

Titika Dimitroulia is Associate Professor of Translation Studies in the School of French at Aristotle University of Thessaloniki and Director of the Greek National School of Public Administration and Local Government. She coordinates AUF (Agence Universitaire de la Francophonie) and is a member of the Hellenic Terminology Network and the CLARIN infrastructure (European Research Infrastructure for Language Resources and Technology). She is also Director of the Digital Humanities Laboratory, and is responsible for the translation curriculum and a tutor at the Training Programme for Greek-speaking translators of the Academy of Athens. She received the EKEMEL (European Translation Centre-Literature and Human Sciences) translation award in 2008 and has collaborated with several Greek newspapers, as well as print and electronic journals, as a literary critic. She has published widely on literature, translation and digital literary studies, as well as the following books, among others: Literary Translation. Theory and Practice (2015, in Greek); Digital Literary Studies (2015, in Greek).

Dionysis Goutsos is Professor of Text Linguistics at the National and Kapodistrian University of Athens. He has also taught at the University of Birmingham (UK) and the University of Cyprus. He has written several articles on text linguistics and discourse analysis, translation studies and corpus linguistics, as well as the following books, among others: Modeling Discourse Topic (1997), Discourse Analysis: An Introduction (1997/2004), The Discourse of Translation (2001, in Greek) and Language: Text, Variety, System (2012, in Greek). He has been research co-ordinator for the research projects leading to the compilation of the Corpus of Greek Texts and the Diachronic Corpus of Greek of the 20th Century.

