Translation Quality Assessment

Edited by Joss Moorkens, Sheila Castilho, Federico Gaspari, Stephen Doherty (2018)

Springer International Publishing, 287 pp., 114,39 € (hardcover), 93,08 € (ebook)

Reviewed by: Luisa Bentivogli

This book provides a comprehensive overview of Translation Quality Assessment (TQA), a multifaceted topic that, especially with the advent of new translation technologies, has increasingly gained the attention of the translation studies community. Its chapters address important questions, such as: how should the quality of a translation be defined and measured? What are the most appropriate evaluation methods for different domains, text types, workflows and end-users? Are quality requirements and measures the same for human and automatic translation? The authors shed light on these issues from a range of perspectives, addressing TQA research and practice in academic, institutional and industry contexts, and covering both human and machine translation evaluation.

The rationale of the book is given in the Introduction, where the editors explain that understanding translation technologies and their appropriate evaluation methods is crucial to succeeding in a competitive landscape like that of the language services industry, where different technologies, especially machine translation (MT), are increasingly integrated into the translation process.

The book is composed of 11 chapters grouped into three parts. As suggested by the subtitle of the volume, “From Principles to Practice”, the first part describes different scenarios for human and machine TQA, the second part explores real applications developed for TQA, and the third part presents empirical studies that employ novel applications of TQA. The first two parts discuss their subjects at a high level, offering a comprehensive overview that still allows interested readers to delve deeper into the themes they prefer. The third part is more specialised and includes technical details, but it succeeds in making the goals, findings and potential of the described research clear to non-experts as well.

The interdisciplinary nature of the book makes it a valuable tool for professionals as well as for students, teachers and researchers.

Part I, “Scenarios for Translation Quality Assessment”, examines the state of the art in TQA. After explaining the complexity of defining and measuring the concept of translation quality, the introductory chapter by Castilho et al., entitled “Approaches to Human and Machine Translation Quality Assessment”, reviews the main approaches to TQA in the context of both human translation and MT, highlighting similarities and differences between research, education and industry settings. Given that the boundaries between human and machine translation are increasingly blurred by the growing use of MT and post-editing, the overview of research on assessing post-editing is particularly relevant. Indeed, this avenue of research is crucial since it reflects the need to optimize translation processes and pricing decisions in order to make post-editing time- and cost-effective compared with translating from scratch.

In the second chapter, “Translation Quality, Quality Management and Agency: Principles and Practice in the European Union Institutions”, Drugan et al. focus on the many TQA methods used by the European Commission’s Directorate-General for Translation. This case study is particularly interesting since ensuring translation quality in this context is of the utmost importance for two main reasons: the translations of EU legal acts have legal effect, and translation is a key instrument in communicating the EU vision and goals to European citizens. Furthermore, the highest quality must be reached within a very complex workflow that comprises a large number of processes and tools, and involves thousands of translators and huge translation volumes. Quality management is therefore a crucial issue that presents diverse and important challenges. The chapter describes the different institutions, units and processes involved in the translation workflow, and also contains an interesting section on the impact of the translation quality management model on individual translators’ experience.

The third contribution, “Crowdsourcing and Translation Quality: Novel Approaches in the Language Industry and Translation Studies” by Jiménez-Crespo, deals with a novel and broadened application of TQA. The development of new platforms and technologies allowing large groups of people to cooperate has led to the rise of crowdsourcing, whereby some processes previously conducted by professionals are outsourced to groups. This new model required the creation of innovative workflows aimed at achieving the best possible results. The contribution presents an overview of the workflow practices that crowdsourcing platforms use to guarantee translation quality, and clearly outlines the impact of crowdsourcing on the notion of translation quality. In particular, three key changes are reviewed, namely (i) a move away from a static notion of quality towards a more dynamic “fitness for purpose” paradigm, (ii) the great importance given to the process in ensuring final quality, and (iii) the sharing of responsibility for quality among the various participants in the process.

In the fourth and final chapter of the first part, “On Education and Training in Translation Quality Assessment”, Doherty et al. highlight a crucial point for translators: especially with the advent of translation technologies, first and foremost MT, translation workflows have become more varied and complex. Thus, mastering TQA methods, practices and tools is of the utmost importance. However, within academia there has so far been a lack of education and training opportunities for translation students, which has negative effects both on their employability and on their long-term professional practice. The chapter provides a review of the literature on translation syllabi and programmes that include translation technology and TQA methods, as well as recommendations to help educators and translators acquire the skillset required by the translation market.

In Part II, “Developing Applications of Translation Quality Assessment”, attention shifts to new metrics and TQA methodologies, with a focus on expectations and methods for MT evaluation. Arle Lommel begins this part with the fifth chapter of the book, “Metrics for Translation Quality Assessment: A Case for Standardising Error Typologies”, which offers an overview of the history of TQA in the translation industry as well as in translation studies and MT research. Starting from a lack of standard methods, the situation evolved in all fields until 2012, when the Multidimensional Quality Metrics (MQM) and the Dynamic Quality Framework (DQF) projects independently started to address the need for such standard methods. The chapter then describes in detail the two TQA methods and their systematic harmonisation into a shared error typology, which provides the community with a common vocabulary for creating the most appropriate translation quality metrics for each translation framework.

In the sixth chapter, “Error Classification and Analysis for Machine Translation Quality Assessment”, Popović presents an overview of approaches and typologies for the classification and analysis of errors in MT output. The goal of error classification is to go beyond standard evaluations that only provide global scores of translation quality, and to allow for the collection of additional information on MT systems’ behaviour. Different error typologies are presented, and both manual and automatic error classification are described, together with the associated advantages, disadvantages and challenges. A particularly interesting type of error classification is the one carried out on post-editing data. Starting from the assumption that post-editing can be viewed as implicit error annotation, assigning an error category to each post-edit operation proves to be a useful TQA method that allows for the analysis of the post-editing process. Finally, the contribution discusses other methods for error analysis, which are typically aimed at evaluating specific linguistic phenomena.
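
The idea of treating post-edits as implicit error annotation can be illustrated with a short sketch. This is not the method described in the chapter, only a minimal illustration under stated assumptions: whitespace tokenisation, alignment with Python's standard `difflib.SequenceMatcher`, and a coarse, hypothetical set of error labels (`missing_word`, `extra_word`, `lexical_or_form_error`) that is far simpler than any real error typology.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from edit operations to coarse error labels.
# Real error typologies are much richer than this.
ERROR_LABELS = {
    "insert": "missing_word",            # post-editor added words the MT omitted
    "delete": "extra_word",              # post-editor removed words the MT added
    "replace": "lexical_or_form_error",  # post-editor substituted words
}


def classify_post_edits(mt_output, post_edited):
    """Return one (label, mt_words, post_edited_words) tuple per edit operation."""
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged words carry no error information
        errors.append((ERROR_LABELS[tag], mt_tokens[i1:i2], pe_tokens[j1:j2]))
    return errors


if __name__ == "__main__":
    for error in classify_post_edits("the cat sat on mat", "the cat sat on the mat"):
        print(error)
```

Counting the labels over a whole post-edited corpus would give the kind of per-category profile that a single global score cannot provide.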

Andy Way contributes the seventh chapter, “Quality Expectations of Machine Translation”, which is devoted to MT and addresses the important topic of what level of quality can be expected from it nowadays. It examines the main use cases where MT is being exploited, discussing the level of quality reached by MT and how this quality should be measured. Way argues that, in order to properly evaluate an MT system, it is necessary to abandon the idea of a single standard notion of quality and to consider other crucial factors, such as the context in which MT is to be used (its “fitness for purpose”, as discussed in Chapter 3) and the expected lifespan of the translation (i.e. for how long it will be consulted). The chapter then describes how MT quality has been evaluated over the years, addressing advantages and issues related to human and automatic evaluation, with a focus on the shortcomings of BLEU, the most widely used automatic metric. A particularly relevant aspect covered by this chapter is the MT post-editing use case, since it is crucial for companies to have a clearly defined evaluation process enabling them to assess whether or not MT should be introduced into their translation workflow.

The eighth chapter, “Assessing Quality in Human- and Machine-Generated Subtitles and Captions” by Doherty and Kruger, addresses the complex use case of audiovisual translation (AVT). Besides growing very fast, this field is becoming increasingly intertwined with novel technologies, making the assessment of AVT quality particularly challenging. The contribution focuses on intralingual captioning and interlingual subtitling. It starts by describing the current situation, in which quality assessment is largely based on several de facto industry guidelines. While typically varying by organisation, country, language etc., these guidelines share clear prescriptions for three main parameters: the accuracy of the caption/subtitle content, its presentation and its timing. The chapter then reviews the major empirical studies on AVT, which are based on the assumption that the measurement of quality is strictly related to how viewers process and receive AVT products. Finally, the focus shifts to the interaction of AVT and language technology, describing the various tools becoming available and discussing how these are challenging the traditional concept of quality in AVT.

Finally, Part III, “Translation Quality Assessment in Practice”, presents three research studies aimed at understanding the usability of MT in novel real-world application scenarios. The first study is presented in the ninth chapter, “Machine Translation Quality Estimation: Applications and Future Perspectives”, where Specia and Shah assess the efficacy of MT quality estimation (QE). QE is the task of automatically estimating how good or reliable an MT output is without access to human reference translations. This contribution reviews experiments for some of the most promising and practical applications of QE. The first is predicting the post-editing effort of MT outputs. This is the most widely studied task, which can allow the optimization of the human post-editing workflow by excluding low-quality segments that would require too much effort to correct. The second application consists of exploiting QE to select the best MT system from multiple options. A third application is strictly related to MT system development research, where QE is applied to select good MT outputs to be used as additional training data to improve the system's performance. The last application of QE aims to select the most representative samples of MT outputs for quality assurance by humans. While further investigation is still required, current findings are promising and show that QE has the potential to make MT more useful to different types of end users.
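
The first QE application mentioned above, excluding segments that would take too much effort to post-edit, can be sketched in a few lines. This is an illustrative sketch only, not the authors' system: the function `route_segments`, the threshold `max_effort` and the effort scores (assumed to lie between 0 and 1, with higher meaning more effort) are all hypothetical, and in practice the scores would come from a trained QE model.

```python
def route_segments(segments, predicted_effort, max_effort=0.4):
    """Split MT segments by predicted post-editing effort.

    `predicted_effort` holds one hypothetical QE score per segment
    (0 = no editing needed, 1 = rewrite from scratch). Segments at or
    below `max_effort` go to post-editing; the rest are set aside to
    be translated from scratch.
    """
    if len(segments) != len(predicted_effort):
        raise ValueError("one effort score is required per segment")
    to_post_edit, to_translate = [], []
    for segment, effort in zip(segments, predicted_effort):
        if effort <= max_effort:
            to_post_edit.append(segment)
        else:
            to_translate.append(segment)
    return to_post_edit, to_translate


if __name__ == "__main__":
    keep, redo = route_segments(["seg A", "seg B", "seg C"], [0.1, 0.9, 0.4])
    print("post-edit:", keep)
    print("translate from scratch:", redo)
```

The choice of threshold is a workflow decision rather than a property of QE itself, which is why it is exposed as a parameter here.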

The tenth chapter, “Machine Translation and Self-post-editing for Academic Writing Support: Quality Explorations” by O’Brien et al., presents an exploratory study assessing whether MT can be a useful aid for academic writing in English as a second language, and what impact it might have on the quality of the text produced. An extensive literature review lays the groundwork for the experiment, in which participants were first asked to draft an abstract in their field of expertise, half in English and half in their native language. The native-language text was then automatically translated into English and participants were required to review the full English abstract. Results showed that drafting times were not substantially different in the two conditions, while revision time and number of revisions were greater when working on the English automatic translation. Furthermore, a post-task survey showed that participants had mixed views on the use of MT and self-post-editing. Finally, corrections by a professional reviser and by an automatic language checker revealed that both parts of the abstracts were of comparable quality, confirming that MT does not have a negative impact on the final quality of the academic text produced. The chapter concludes with a number of interesting open questions that require further investigation.

Toral and Way contribute the eleventh and final chapter of the book, “What Level of Quality Can Neural Machine Translation Attain on Literary Text?”, which describes extensive experiments conducted to assess the performance of MT on the great challenge of translating literary text. A neural (NMT) and a phrase-based statistical (PBSMT) system were trained on English novels and their translations into Catalan, and then extensively evaluated and analysed. Automatic evaluation confirmed that NMT performs significantly better than PBSMT in the literary domain as well. A number of additional analyses focused on different characteristics of the source side of each novel, revealing interesting behaviours of the two MT systems. To gain further insights, human evaluation was carried out on subsets of three test novels. Two evaluators were asked to rank the two MT systems’ outputs and the human professional translations according to their quality. As regards the two MT systems, the better performance of NMT over PBSMT was confirmed. As regards the comparison between NMT outputs and the professional translations, the results are quite impressive: translations were judged to be of equal quality in up to one third of the inspected sentences.

On the whole, the great value of this book is its systematic interdisciplinarity: all the different topics presented in the book take into consideration both human and machine translation quality assessment and are analysed from a professional as well as an academic perspective. As such, each chapter can be used as a guide for the topic addressed.

To conclude, I recommend this book, which makes a broad and varied contribution to the field of translation quality assessment and will be illuminating for anyone who is, or will be, involved in translation work.

©inTRAlinea & Luisa Bentivogli (2020).
[Review] "Translation Quality Assessment", inTRAlinea Vol. 22
This review can be freely reproduced under Creative Commons License.
Stable URL:
