Electronic tools and resources for translating and writing in the digital age

By Federico Zanettin (University of Venice, Italy)

Abstract

This article reports on the use of online language tools by the students of a course of English for International Relations. The students carried out text transformation activities, including the translation from English into Italian of extracts from a textbook of international criminal law and summaries in English of articles on humanitarian issues from specialized publications. I introduced corpora, translation memories and language-oriented searches using search engines as supplementary linguistic resources to carry out these tasks, in conjunction with the free online bilingual dictionaries and machine translation systems most students resorted to when engaging in writing activities involving an L2. I illustrate how these resources were used by the students, and discuss findings with respect to data-driven learning research.

Keywords: translation and language teaching, electronic tools, corpus-based translation studies

©inTRAlinea & Federico Zanettin (2018).
"Electronic tools and resources for translating and writing in the digital age"
inTRAlinea Special Issue: Translation And Interpreting for Language Learners (TAIL)
Edited by: Laurie Anderson, Laura Gavioli and Federico Zanettin
This article can be freely reproduced under Creative Commons License.
Stable URL: https://www.intralinea.org/specials/article/2295

1. Introduction

When corpora were first introduced in the foreign language and translation classroom in the early 1990s (Aston, 2001), they were part of a wave of computer-aided tools which enriched the learning environment but also called for special facilities, involved purchasing and installing specific software, and required careful consideration of disk space and processing speed. Corpus-based activities also often required learners to build corpora and to receive training in specialized computer applications. While the effort proved rewarding for students in vocational courses, the average language learner or second language user might not always appreciate the additional learning curve. Today a variety of electronic tools are available to the average computer user as online services, and it is almost unthinkable to be anywhere without an Internet connection. In this changed technological environment, in which dictionaries, machine translation (MT) systems, translation memories (TMs), corpora and other language resources are available even as smartphone applications, and the whole textual universe can be looked up and brought to the ubiquitous screen, students and teachers alike may bring their laptops and tablets to the classroom and access online resources as they engage in writing activities.

In what follows, I discuss the use of online language tools and resources by a group of average language learners, the students of a course of English for International Relations at the Department of Political Sciences of the University of Perugia. Each using a computer, the students carried out text transformation activities, including translation from English into Italian of extracts from an international criminal law textbook and summaries in English of articles on humanitarian issues from a specialized Internet publication. They were introduced to corpora and databanks, and to techniques for exploiting search engines in order to tailor searches to their linguistic needs. These were discussed as supplementary linguistic resources, which students used in conjunction with the free online bilingual dictionaries and MT systems most students would resort to when engaging in writing activities involving an L2. Translation activities aimed at developing L2 reading comprehension skills, while the summarizing activities were meant to promote L2 writing skills.

In order to attend the course, students were required to have either a B2 certificate obtained from the University Language Centre (CLA) or an international accreditation such as a TOEFL or IELTS certificate. A couple of students were, however, allowed to attend the course even though they had not yet attained the specified level of proficiency, as they were concurrently attending a B2 level course at the Language Centre. Two students were native speakers of languages other than Italian (Albanian and Pashto), and four more were bilingual in Italian and another language (Albanian, Romanian, Spanish, Arabic). There were no native speakers of English. While none of the roughly 20 students had any previous formal education in languages or linguistics beyond the mandatory few credits in foreign languages obtained during their first degree course (Laurea di primo livello) and the B2 certificate, most of them were highly motivated to improve their writing and translation skills. Some of the students had attended, or were concurrently attending, a course on Human Rights and International Criminal Law, which comprised both a general introduction to Human Rights Law and Humanitarian Law and a discussion of related case studies. As part of the workload for the English course, students were also required to give brief presentations on a case study based on research articles drawn from law journals. Moreover, some class time was spent on vocabulary building activities selected from legal English course books such as Brown and Rice (2007) and Krois-Lindner and TransLegal (2011).

The main focus of the course, however, was on translation and writing activities. Students had to carry out several translation and other writing assignments at home and in class, some of which counted towards the final mark. Translation tasks consisted of producing an Italian version of extracts taken from an international criminal law textbook which had previously been read and discussed, while writing tasks in English consisted of summarizing media articles on conflict and peacebuilding[1] ranging in length from 1,500 to 3,500 words. Several classes were devoted to practical, hands-on sessions in which students wrote a text, either a translation or a summary, using any means available to them. These sessions were preceded by an introduction to and discussion of the various tools available to assist writing and translating, and followed by an analysis and discussion of the texts produced.

2. The tools

The first writing task was an assignment in which the students were asked to translate the first part of the table of contents and the first paragraph of the first chapter of An Introduction to International Criminal Law and Procedure (Cryer et al., 2010). They were told they could use any means they wanted, including their smartphones, to carry out the translation task. They could write their translation either on paper or on screen, as most students had with them a laptop computer or tablet connected to the Internet. After this first translation was handed in, we started to discuss how they had approached their assignment and what tools they had used. Typically, most students had looked up the translation of some words in a bilingual dictionary, in almost all cases the online English-Italian dictionary WordReference[2]. A few students had instead resorted to two different types of tools, namely the Reverso Context website and Google’s online machine translation service. Reverso.net is a language services portal which, like WordReference, offers part of its services for free[3]. However, unlike WordReference, Reverso is not based on dictionary entries looked up as individual words (though, like WordReference, it provides access to an online version of the Collins bilingual dictionaries), but rather on very large TMs derived from official documents (e.g. those culled from EU websites) and other multilingual sources such as film subtitles and commercial websites. Finally, Google translate is the free MT service based on neural networks (GNMT, Perez, 2017) provided by the Internet giant, and possibly the most used translation service worldwide[4].

Regardless of the different data sources, technologies and functionalities of these tools, students used them to find bilingual equivalents at the word level. This use was most obvious with WordReference, which presents itself as a bilingual dictionary, offering a set of “principal translations” for any given word entry, together with some grammatical abbreviations, usage comments, and model examples. However, students used Reverso and Google translate in much the same way as they used WordReference, sometimes indeed as an alternative to it: that is, they typed single lexical items in the input field, which led to the immediate appearance of one or more items in the target language. In relying on word-for-word replacement, they largely behaved like non-professional translators rather than professional ones, who have generally been found to process language units larger than individual words (Tirkkonen-Condit, 1992; Lörscher, 1991). Apparently, the students were not aware that they could use both Reverso and Google translate to produce translations of phrases, sentences, paragraphs and whole texts (though the output would differ depending on the type of tool used). Students who used Google translate as a primitive bilingual dictionary were also generally wary of using the MT system to translate whole texts, both because to some of them it felt like cheating, and because they were negatively biased against the output, not expecting it to be good and perhaps not feeling competent to evaluate its quality.

Having observed the students’ tendency to use the above resources essentially as word-level support, I decided to dedicate some class time to exploring the different options and features offered by dictionaries, TMs and MT systems. As concerns dictionaries, for instance, I illustrated how monolingual dictionaries in the source and target language[5] may sometimes prove more useful than bilingual ones, for comprehension and production respectively. The availability of other free bilingual dictionaries[6] as well as specialized bilingual legal dictionaries, glossaries and term banks[7] was also pointed out. As concerns TMs, I first showed how dictionaries based on TMs, as opposed to more traditional dictionaries like WordReference, propose translation equivalents mainly on the basis of statistical evidence, i.e. they present the most likely translation equivalent(s) extracted from the parallel corpora used as data (see Zanettin, 2012: 149–180 for details). One of the main advantages of websites based on translation memories such as Reverso.com, Bab.la and Linguee.com, moreover, lies in the possibility of looking up the translation(s) of groups of two or more words; such sites can also be used to retrieve several parallel contexts, i.e. paired instances of source and translated fragments containing (part of) the expression typed in a query. As concerns MT systems, the quality of Google translate and of a competing MT system, DeepL.com (see Heiss & Soffritti, this volume), was assessed against the translations independently produced by the students. This allowed the students to ascertain that single-word translations were often misleading and less informative than those found in dictionaries and TMs, and that the quality of the output varied with the type of text entered. In the case of specialized legal texts such as the one considered, however, the MT translations of full paragraphs were of a higher quality than expected.

While the translations produced by the two MT systems differed, and both contained inaccuracies and omissions, quite a few sentences or phrases were found to be of a quality comparable to, or even better than, those produced by at least some of the students, especially as concerned specialized terminology. This may not be unexpected, given that legal term banks and parallel corpora are among the most widely available language resources which can be exploited by MT systems. The problem lay, of course, in being able to judge which parts of the automatic translation fared well and which did not.

After discussing the various benefits and disadvantages of dictionaries, TMs and MT systems, I introduced two further tools that the students had never considered or even heard of before, namely search engines used as linguistic resources, and language corpora. As concerns the first, students eagerly welcomed the various “tricks”, i.e. syntax rules for performing advanced searches in search engines like Google.com, Yahoo.com or Duckduckgo.com, which they saw they could use to check whether a turn of phrase or an expression they would use in their Italian translation or in their English summary was attested, frequent and reliable. While a few students already knew they could use quotation marks to search for exact phrases, none of them was aware that inserting an asterisk within the quoted phrase allows for variation; for instance, a search for

(1) "di * violazioni dei diritti umani"

in Google would produce about half a million hits, with highlighted examples including “di gravi violazioni dei diritti umani”, “di una vasta gamma di violazioni dei diritti umani”, “di numerose violazioni dei diritti umani” and so on. These findings showed that the word “gravi” recurrently collocated with “violazioni dei diritti umani”. The search had been prompted by the need for a suitable translation of “gross” in the phrase “gross human rights violations”, after a WordReference lookup of this adjective had produced results which most students found unsatisfactory (e.g. lordo, evidente, generale, nauseabondo, denso, volgare, etc.). The further option of restricting the search to a single website using the appropriate operator (site:), as in a search for

(2) "di * violazioni dei diritti umani" site:diritto.it

was also appreciated, as it produced five instances of the phrase “di gravi violazioni dei diritti umani” out of seven overall results, which were deemed especially significant given the authoritativeness of the source. A search for the translation hypothesis that had emerged by this point, that is

(3) "gravi violazioni dei diritti umani"

confirmed that indeed this phrase was consistently and frequently used in Italian media and advocacy websites with a focus on human rights.
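The wildcard query in (1) can be loosely emulated offline. The Python sketch below is only an approximation under stated assumptions: search engines do not document exactly how many words their “*” may stand for, so the code assumes one to four whole words, and it runs over a handful of invented Italian sentences rather than the Web. It turns a quoted pattern into a regular expression and tallies the fillers found in the wildcard slot, which is the kind of evidence that pointed students towards “gravi”.

```python
import re
from collections import Counter

# Invented example sentences (NOT real search results).
SAMPLE = [
    "Il rapporto denuncia casi di gravi violazioni dei diritti umani nella regione.",
    "Si parla di numerose violazioni dei diritti umani.",
    "Accuse di una vasta gamma di violazioni dei diritti umani.",
    "Il tribunale ha esaminato altre questioni.",
]

# Assumption: '*' stands for one to four whole words (captured as a group).
FILLER = r"(\w+(?:\s+\w+){0,3})"


def wildcard_to_regex(query: str) -> re.Pattern:
    """Compile a quoted phrase containing '*' into a case-insensitive regex.

    Assumes the query neither starts nor ends with '*'.
    """
    parts = [p.split() for p in query.split("*")]
    # Literal words must match in order, separated by any whitespace.
    literal = [r"\s+".join(re.escape(w) for w in words) for words in parts]
    body = (r"\s+" + FILLER + r"\s+").join(literal)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def filler_counts(query: str, texts: list[str]) -> Counter:
    """Count the word sequences that fill the wildcard slot(s)."""
    pattern = wildcard_to_regex(query)
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            for group in match.groups():
                counts[group.lower()] += 1
    return counts


counts = filler_counts("di * violazioni dei diritti umani", SAMPLE)
print(counts.most_common())
```

On a larger collection, the same tally turns individual hits into frequency evidence; the regular expression only approximates a search engine’s matching and ignores restrictions such as site:.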

The option of limiting a search to specific domains such as .co.uk or .edu in order to filter results by geographical or institutional provenance, or to the Google Books (books.google.com) or Google Scholar (scholar.google.com) websites in order to include only results from published books and magazines or from scholarly literature respectively, was especially valued when it came to writing in English, as it allowed students to sift out potentially badly written or non-standard documents. Illustrating search engine syntax and operators[8], and showing students how “search engine quotations” (SEQs) could be used as a tool to revise their translation or summary, was also useful preparation for subsequently introducing corpora and corpus linguistics tools.

The use of search engines to retrieve examples, for instance, had a priming effect, as students learned to perform “pattern searches” (Pérez-Paredes et al., 2012: 494) as opposed to searching only for content. When they were then introduced to corpora, the students came to appreciate not only that the limitations of search engines lay in the unedited nature of the data and the instability of the results, but also that, when checking for errors or common usage, corpora allowed them to carry out more elaborate pattern searches: not only instance searches (i.e. using words as input), but also model searches (using as input categories of elements such as “Article + Noun”) and mixed searches (a combination of the previous two) (ibidem).

While a few different corpora of both Italian and English were briefly introduced, I chose to illustrate corpus use through Mark Davies’ query interface to the collection of corpora held at Brigham Young University, and more specifically to the Corpus of Contemporary American English (COCA, see Davies, 2009). The reasons for this were, first, that this very large and balanced corpus is currently freely accessible[9], and second, that the interface makes corpus data easily available and gives users the opportunity to become familiar with different ways of retrieving information. Given the short time available for training and the students’ general lack of previous education in languages and linguistics, COCA’s search syntax seemed suited to their needs, in that formal operators are kept to a minimum and are demonstrated through a hands-on approach which takes advantage of the online format. Example searches are interactive, i.e. they are links which, when clicked on, run the queries given as examples and allow students to see the format these take as well as the actual data produced by the search. Students appreciated not only the possibility of searching for patterns, but also that of looking up collocations and comparing the usage of word pairs, two of the main features of the COCA interface.

The next section provides some examples of how the different tools were used by the students.

3. Translating and summarizing

While translating extracts from Cryer et al. (2010), students stumbled across many words of Latinate origin for which Italian cognate words can be easily identified, but whose status as “true” or “false” friends was not immediately apparent. For instance, the first chapter of Cryer et al. (2010: 3) begins with the sentence:

(4) International law typically governs the rights and responsibilities of States

the verb “govern” elicited a number of dictionary and Web searches, as some students were not satisfied with the “literal” translation governa, which some found to be the translation proposed when typing the English verb into Google translate. Other students looked the word up in WordReference, where they found the entry in fig. 1.

Fig. 1. Screenshot from http://www.wordreference.com/enit/govern

WordReference presents four different senses of the verb, each illustrated by one or more English synonyms, one or more Italian equivalents and a translated example. Having discarded the first and second senses, both rendered by the Italian governare and respectively involving intransitive usage and requiring an animate subject, the students decided that the third (control/restrain, controllare/impedire) and fourth senses (influence/determine, controllare) were also not appropriate. In order to reach a better comprehension of the source text, some students thus turned to monolingual dictionaries, and found that the online Merriam-Webster Dictionary lists, among other senses, “to serve as a precedent or deciding principle for”, providing as an example the sentence “customs that govern human decisions”. The online Oxford English Dictionary also had “Serve to decide (a legal case)”, and provided several usage examples for this sense of the verb. While monolingual dictionaries did not provide students with translation solutions, they improved comprehension and prompted further searches. One student, for instance, noted that the WordReference English-Italian entry for “govern” also included links to pages containing that word in related dictionaries, i.e. an English monolingual and an Italian-English dictionary. He clicked on the link to regolare, the only word not already offered as a translation in the main entry, and saw that “govern” (and “regulate”) were proposed as translations of regolare, with governare and disciplinare also offered as synonyms.

Another student decided instead to use a TM, and typed

(5) international law governs

into Linguee.com’s search box. The results (see fig. 2) showed that disciplinare was the most common verb used to translate “govern” in contexts in which the subject of the verb was (an) (international, national, humanitarian, State, etc.) law/agreement/convention, etc., and that the sources of many relevant examples were EU legal documents; this information convinced many students they had found the correct translation.

Fig. 2: Screenshot from https://www.linguee.com/english-italian/search?source=auto&query=International+law+governs
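The statistical logic behind TM-based lookups such as the Linguee search in (5) can be sketched in a few lines of Python. This is a deliberately crude illustration, not how Linguee or Reverso actually work: it assumes a tiny invented set of aligned English-Italian segments and a hand-made, hypothetical lemma table for candidate Italian verbs, and it treats co-occurrence within an aligned segment as a proxy for translation equivalence instead of performing real word alignment.

```python
import re
from collections import Counter

# Invented aligned segments (source, target); NOT taken from Linguee.
PAIRS = [
    ("International law governs the use of force.",
     "Il diritto internazionale disciplina l'uso della forza."),
    ("This Regulation governs the procedure.",
     "Il presente regolamento disciplina la procedura."),
    ("National laws govern these contracts.",
     "Le leggi nazionali disciplinano questi contratti."),
    ("The convention governs relations between States.",
     "La convenzione regola i rapporti tra gli Stati."),
    ("The party that governs the country.",
     "Il partito che governa il paese."),
]

# Hypothetical lemma table for the candidate Italian verbs.
LEMMAS = {
    "disciplina": "disciplinare", "disciplinano": "disciplinare",
    "regola": "regolare", "regolano": "regolare",
    "governa": "governare", "governano": "governare",
}

# Matches govern, governs, governed, governing as whole words.
SOURCE_VERB = re.compile(r"\bgovern(?:s|ed|ing)?\b", re.IGNORECASE)


def candidate_equivalents(pairs):
    """Count candidate verb lemmas in targets whose source contains 'govern'.

    Co-occurrence in an aligned segment is used as a crude proxy for
    translation equivalence; no word-level alignment is performed.
    """
    counts = Counter()
    for source, target in pairs:
        if not SOURCE_VERB.search(source):
            continue
        for token in re.findall(r"\w+", target.lower()):
            if token in LEMMAS:
                counts[LEMMAS[token]] += 1
    return counts


equivalents = candidate_equivalents(PAIRS)
print(equivalents.most_common())
```

Because the sketch matches whole tokens only, regolamento is not mistaken for regola; it would, however, still count the noun regola (“rule”) as a verb, one reason why real systems rely on alignment and tagging rather than simple co-occurrence.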

Some students were still not fully satisfied, and searched for the hypothesized translation with a search engine, typing either “il diritto internazionale disciplina” or “il diritto internazionale regola”, or both. They found that the former phrase retrieved over two million hits, and the latter over 400,000 hits. The translation of the direct object of the verb was also the occasion for some discussion, and led to different solutions. Most students translated the binomial “rights and responsibilities” as diritti e responsabilità, a seemingly straightforward literal equivalent. However, a few used instead the expression diritti e doveri, based on their intuition as native speakers or on findings from a TM. In fact, while diritti and responsabilità were unequivocally identified as correct single word equivalents for the two English words respectively, the collocation diritti e doveri came to the mind of some students as more salient[10], and a search for “governs rights and responsibilities” in Linguee.com produced several instances of the latter binomial as a translation equivalent, along with the more frequent diritti e responsabilità. Some students also checked whether the expression was attested on the Web, by googling the exact phrase "disciplina diritti e doveri" as opposed to "disciplina diritti e responsabilità", or "regola diritti e doveri" as opposed to "regola diritti e responsabilità", depending on the verb they favored. Neither search yielded any results when the verb was used in conjunction with diritti e responsabilità. Both produced several results in conjunction with diritti e doveri, though the search containing disciplina retrieved only a fraction of those retrieved by that containing regola (536 hits as opposed to 17,200).

The translations finally produced by the students varied as regards the combination of the verb and the object binomial[11], with some students giving preference to disciplina over regola, as the former seemed to be a stronger collocate for diritto internazionale, and others favoring the latter, which some saw as a stronger collocate for diritti e doveri. Others still reckoned that diritti e responsabilità was a better translation for the binomial, since it was the most frequently attested in TMs.

As students went about translating in this fashion, the extent to which they resorted to dictionaries, TMs and SEQs varied depending on their proficiency and awareness of potential pitfalls in the source text. Some students also resorted more or less extensively to an MT system, and the translations produced by Google translate and DeepL were assessed and compared with those arrived at by students who had used other means and resources. While it was agreed that both systems generated translations of generally good quality, it was also observed that both had some “glitches” (for instance, both skipped sentences or fragments when fed a longer text) and both produced mistakes. In the case of the sentence discussed as an example, both MT systems generated the same target text, which was remarkably similar to those produced by the students, namely “Il diritto internazionale disciplina in genere i diritti e le responsabilità degli Stati”.

As concerns the summary writing tasks, students were told to use the same tools they had used when translating, with the addition of monolingual corpora, which had by then been introduced. After a brief introduction to the general procedure and the steps to be followed in writing a summary[12], and some guidelines concerning, for instance, the use of formulaic expressions and reporting verbs (e.g. “the author argues that”, “according to the author”), students engaged in reading and writing. In order to understand the text, some students relied extensively on dictionaries and TMs to check single words and expressions, while others used an MT system to generate a translation of the full text into their first language, using the parallel text display of the MT system to check their comprehension of the source text. In addition, some students resorted to the corpus to acquaint themselves with the usage of words or expressions.

Moving on to the writing stage, the students used the same tools they had used when translating into Italian, only this time they also had recourse to the COCA corpus. For instance, one student provisionally wrote the phrase “armed groups have state functions”. However, after searching for the exact phrase in Google and finding no results, she suspected something was not right. Thinking the problem might lie in the verb, she then searched for "armed groups * state functions", using the asterisk as a wildcard in the hope of finding a more suitable replacement for “have”. This turned out to be of no avail, as she got very few results, none of which seemed relevant. She thus ran a query in COCA in order to find verbs collocating in the immediate vicinity of “state functions”. The results of this “mixed” search (Pérez-Paredes et al., 2012: 494) were few and not altogether clear, so she carried out another search, this time discarding the word “state” and looking for verbs preceding the word “functions”, i.e. within two words to the left of the node (fig. 3).

Fig. 3: Screenshot of a collocate query in COCA (https://corpus.byu.edu/coca/)

The results (fig. 4) showed “perform” was the most frequent verb collocating with “function”, seemingly indicating that this might indeed be what she was looking for.

Fig. 4: Screenshot of results on the query in fig. 3

In order to obtain further confirmation the student looked at the actual concordance lines generated by the query (fig. 5), and noted various examples (e.g. “perform editorial functions”, “perform job functions”, “perform police functions”) conforming to the structure of the phrase she had hypothesized.

Fig. 5: Screenshot of a concordance for “functions” preceded by “perform”, from COCA

The student then ran a Google search for the exact phrase “perform state functions”, which produced about 500 results, and finally one for “armed groups perform state functions” which, while not retrieving any pages containing an exact match, produced a number of “fuzzy matches”, i.e. pages which contained those words but not in the same order or in a slightly different form, most of them from journal articles or books dealing with the same topic as the article to be summarized (fig. 6).

Fig. 6: Results of a Google search for “armed groups perform state functions”

Eventually, the student decided to incorporate this expression in modified form into a longer sentence: “It is not uncommon for armed groups to perform the typical state functions in the territories under their control”. While space does not allow a detailed account of all the moves that led to this choice, it is worth noting that the student’s final version also included expressions such as “it is not uncommon for”, encountered by chance while exploring the context surrounding the words or phrases which had prompted the SEQ search.

Towards the end of the course, students were administered two timed tests, each to be completed within two hours, in which they had to a) translate a 350-word extract into Italian, and b) summarize an article of about 1,500 words in a text of between 150 and 200 words. Unlike in previous translation, writing and reading tasks, in which they had been encouraged to raise doubts and discuss suggestions, as well as to try out all the different tools discussed, in these tests students were left to their own devices and, facing time-constrained tasks, they differed in the type of tools used and the extent to which they made use of them.

The tests took place in a computer lab equipped with desktop computers, but students could bring their own laptops, tablets and mobile phones. Most students used only the desktop provided, but a few also kept their laptop handy or used mobile versions (apps) of dictionaries or TMs. While some students diversified their use of resources, most relied mainly on the one or two resources they felt most comfortable with. When translating into Italian, some students favored online dictionaries while others favored TMs. WordReference and Reverso were still among the favorite tools, though students generally approached them differently than they had at the start of the course. A few students also used the MT systems to varying extents. One student in particular, whose first language was neither English nor Italian, produced the translation exclusively using MT systems, both Google translate and DeepL, switching back and forth between English, his native language and Italian. While the learning benefits of this exercise remain unclear, the quality of the target text eventually produced was acceptable, if lower than that of the translations produced by his course mates. All students used advanced search engine queries to revise their texts, though this tool appeared to be favored by the more advanced students, perhaps because they needed to spend less time looking up words and phrases in dictionaries and TMs.

To understand the text to be summarized, many students used an MT system (with DeepL seemingly preferred over Google translate); they resorted to dictionaries, TMs and (to a lesser extent) the English monolingual corpus (COCA) only to enhance their comprehension of specific passages or words. When writing the summary they mostly abandoned dictionaries and used TMs, SEQs and the corpus to check target language usage and revise their text. In order to test the acceptability of their tentative formulations, they searched for strings of words rather than individual words, to see whether the same or similar phrases would appear in the various textual resources. Furthermore, they used the MT systems as a production tool, a use I had not anticipated. Especially towards the end of the task, when they had to complete the writing assignment and time was running short, most students worked at finalizing their English text using the interface of an MT system as a writing tool. That is, they wrote and edited their summary in it and only at the end copied and pasted the summary into a word processor. In other words, students were checking whether what they were writing in English was acceptable against a simultaneous machine translation into Italian, revising the wording of the English source text whenever they were not convinced that the Italian target text produced by the MT system was correct.

4. Discussion

While this study does not attempt a rigorous analysis of the course experience described in terms of learning outcomes, skill progression or quality evaluation, some tentative considerations may be ventured regarding student performances and patterns of use of the different tools and resources. The examples used to illustrate student activities and the present discussion are based on participant observation, that is, I discussed issues with the group as well as with individual students, and I took notes after each class. I also collected students’ translations and summaries in order to be able to compare and relate them to points raised and to behaviour observed in class.

A first finding is that bilingual dictionaries continued to be seen as the main reference tool and were used by all students, while monolingual dictionaries were consulted only rarely and only by a few students. However, whereas at first students were looking for “the answer” to a translation or reading comprehension problem, i.e. for “the” correct translation equivalent, after using other tools they seemed to have become better dictionary users, in that they accepted that more often than not dictionaries offer several answers, or sometimes none at all. With regard to the use of bilingual parallel corpora in the form of TMs and of the Web as corpus (Kilgarriff and Grefenstette, 2003), students sometimes had to be reminded not only of the importance of considering frequency information (Geluso, 2013), but also of paying attention to the “quality” and reliability of the data, as indicated, for instance, by geographical provenance and domain name. They tended to use these resources as references in two different modes. Usually they conducted searches to test previously formulated hypotheses against evidence from real language use, what Kennedy and Miceli (2010: 31–34) term “pattern defining” searches. These, however, often turned into “pattern hunting” searches, in which the concordances and texts retrieved provided fodder for enriching the content and language of their writing. This is because both TM and SEQ queries sometimes resulted in “fuzzy searches”, which did not retrieve literal strings but rather “similar” contexts, i.e. bitextual segments (in the case of TMs) or Web pages (in the case of SEQs) which only partially matched the search pattern. The lack of precision, however, was often amply compensated for by the wealth of relevant examples.
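To make the notion of a fuzzy search more concrete, the following Python sketch ranks translation memory segments by their word-level similarity to a query. It is a hypothetical illustration only: the memory contents, the function name `fuzzy_matches` and the 0.6 threshold are assumptions introduced here, and the TM tools and search engines used in the course apply their own, generally undisclosed, matching criteria. Similarity is computed with the standard-library `difflib.SequenceMatcher` over lists of words rather than characters.

```python
from difflib import SequenceMatcher

# Hypothetical English-side segments of a small translation memory.
MEMORY = [
    "the rights and duties of states",
    "the rights and responsibilities of states",
    "international criminal law and procedure",
]


def fuzzy_matches(query, segments, threshold=0.6):
    """Return (score, segment) pairs whose word-level similarity to the
    query is at least `threshold`, best match first."""
    query_words = query.lower().split()
    scored = []
    for segment in segments:
        # ratio() = 2 * matching words / total words in both sequences
        score = SequenceMatcher(None, query_words, segment.lower().split()).ratio()
        if score >= threshold:
            scored.append((round(score, 2), segment))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, segment in fuzzy_matches("rights and duties of states", MEMORY):
        print(score, segment)
```

With the query “rights and duties of states”, the segment containing the whole string scores highest, the variant with “responsibilities” is still retrieved as a partial match, and the unrelated segment falls below the threshold: the kind of imprecise but productive retrieval described above.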

In the timed tests, the COCA corpus was used only when writing the summary, and only by the most proficient students. This is perhaps not surprising, given that, while corpora have been extensively used in language teaching and learning since Tim Johns first introduced the concept of data-driven learning (Johns, 1990; see Tribble, 2015 for an overview), the potential of corpora for learners with lower levels of proficiency, training or motivation continues to be controversial (e.g. Brodine, 2001; Boulton, 2017: 485). In a previous study (Zanettin, 2009) I discussed how Google’s advanced search features can be used to exploit the Web as a language rather than a content resource (see also Robb, 2003; Shei, 2008), and how corpora can be drawn on as resources for translation activities into the L2 in language learning settings[13]. There I argued that integrating corpus resources into second-language writing and translating makes it possible to supplement the traditional learning grammar of “dictionary items + combinatory rules” with a novel learning grammar of “corpora + rules for querying and analyzing them”. I suggested that the multifarious anarchy of the Web can be seen as complementary to well-constructed corpora which, while certainly more reliable than the Web as concerns core patterns of language use, cannot rival its lexical and phraseological richness. Here I suggest that, since corpora are not as intuitive as dictionaries, general search engines and other resources that are more familiar to the public, introducing advanced Web searches (SEQs) may be instrumental in making learners more sophisticated corpus users.

Corpus query interfaces were not initially designed for relatively unsophisticated users like non-vocational language learners, but rather with lexicographers and linguists in mind. Thus, while COCA’s relatively “user-friendly” query interface is certainly a step towards making corpora more accessible to this kind of user[14], mastering corpus consultation is a gradual, long-term process (Kennedy and Miceli, 2010: 29), and one which most students in the course described above perceived as too time-consuming and laborious, especially when time limits were factored in. As noted by Frankenberg-Garcia (2012: 476), “corpus skills that come as second nature to experts are not at all obvious to the untrained”. For many students, demonstrating how to use a corpus while discussing translation choices proved useful as a way of raising their awareness that language is to a large extent idiomatic, thereby helping them realize that, as John Sinclair famously put it, “a large number of semi-preconstructed phrases ... constitute single choices, even though they might appear to be analyzable into segments” (Sinclair, 1991: 110).

The observations of this study are in line with findings from other data-driven learning studies such as Kennedy and Miceli (2010), Pérez-Paredes et al. (2012), Conroy (2010) and Frankenberg-Garcia (2005), who describe learning experiments in which students used corpora as reference resources among others (see also Zanca, this volume). In Kennedy and Miceli (2010) students worked with dictionaries and corpora to revise their own creative writing, and the authors noted that different learners used different resources with different frequencies, depending on individual preferences, personal learning experiences, attitudes to grammar, etc., so that only some of them incorporated corpora into their language work, thereby extending the repertoire of resources at their disposal. Pérez-Paredes et al. (2012) describe how hands-on uses of corpora worked better when integrated with SEQs and bilingual dictionaries, while Conroy (2010) argues that corpus concordancing alongside advanced Google searches promotes more and better use of Google in error correction. Frankenberg-Garcia (2005), after analyzing students’ choices of reference resources for various types of queries, highlights the importance of presenting a corpus as an addition to the learners’ suite of resources rather than in isolation.

5. Conclusions

Multi-word expressions, ranging from collocations, binomials and multi-word verbs to idioms, proverbs, speech formulae, etc., have been estimated to account for between 20% and 50% of native speaker discourse (Siyanova-Chanturia and Martinez, 2015: 549–550). In this respect, the main learning outcome observed in this study was perhaps the fact that by the end of the course students had started to operate with linguistic units larger than orthographic words, adopting a phrasal perspective and chunking discourse into segments longer than single dictionary entries, thus improving their ability to analyze and process language.

Siyanova-Chanturia and Martinez (2015: 553) argue that the way statistical information is used during language processing to estimate the probability of appearance of certain words is “not unlike the predictive text algorithms designed to facilitate typing on smartphones and internet search engines”. While explicit linguistic knowledge allows learners to produce grammatical generalizations, the use of the Web as a corpus and of general TMs leans closer to acquisition than to learning, as it allows L2 learners to work with unanalyzed knowledge and retrieve extended segments of language independently of any awareness of grammatical or syntactic categories. As such, it is often a process that is “incomplete, fragmented and approximate, envisaging, or at least allowing, gradual revision and fine-tuning as learners come into contact with new data” (Ciliberti, 1994: 10, my translation).

Hands-on use of corpora such as COCA, as opposed to TMs and SEQs, allows learners to perform more precise and refined pattern searches; at the same time, in order to obtain relevant results, they are required to adopt analytical procedures more akin to those of linguists than to those they have become familiar with through search engines. Thus, corpus use may be seen as both demanding and fostering better analytic skills. Finally, I suggest that MT systems, which can be seen as automatically mimicking the procedures of searching, retrieving, selecting and combining segments from corpora that learners carry out manually, seem to have a place as reference resources for language learners[15], a use which may well deserve further exploration.

References

Aikawa T. (2014), Language Technology and Its Role in Language Teaching and Learning, in Proceedings of the 24th Conference of the Central Association of Teachers of Japanese, Eastern Michigan University, MI, http://commons.emich.edu/catj/1/

Aston G. (ed.) (2001), Learning with Corpora, CLUEB & Athelstan, Bologna & Houston.

Aston G. & Burnard L. (1998), The BNC Handbook. Exploring the British National Corpus with SARA, Edinburgh University Press, Edinburgh.

Bauer-Ramazini C. (2017), Guidelines for using in-text citations in a summary (or research paper), http://academics.smcvt.edu/cbauer-ramazani/AEP/EN104/summary.htm

Boulton A. (2017), Corpora in language teaching and learning, Language Teaching 50(4): 483–506.

Brodine R. (2001), Integrating corpus work into an academic reading course, in Aston G. (ed.), Learning with Corpora, CLUEB & Athelstan, Bologna & Houston: 138–176.

Brown G. D. & Rice S. (2007), Professional English in Use. Law, Cambridge University Press, Cambridge.

Ciliberti A. (1994), Manuale di glottodidattica, La Nuova Italia, Milano.

Conroy M. A. (2010), Internet tools for language learning: University students taking control of their writing, Australasian Journal of Educational Technology 26(6): 861–882.

Cryer R., Friman H., Robinson D. & Wilmshurst E. (2010), An Introduction to International Criminal Law and Procedure. Cambridge University Press, Cambridge.

Davies M. (2009), The 385+ million word Corpus of Contemporary American English (1990–2008+). Design, architecture, and linguistic insights, International Journal of Corpus Linguistics 14(2): 159–190.

Frankenberg-Garcia A. (2005), A peek into what today’s language learners as researchers actually do, International Journal of Lexicography 18(3): 335–355.

Frankenberg-Garcia A. (2012), Raising teachers’ awareness of corpora, Language Teaching 45(4): 475–489.

Geluso J. (2013), Phraseology and frequency of occurrence on the web: Native speakers’ perceptions of Google-informed second language writing, Computer Assisted Language Learning 26(2): 144–157.

Johns T. (1990), From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning, CALL Austria 10: 14–34.

Kennedy C. & Miceli T. (2010), Corpus-assisted creative writing: Introducing intermediate Italian learners to a corpus as a reference resource, Language Learning and Technology 14(1): 28–44, http://www.doaj.org/doaj?func=fulltext&aId=581752

Kilgarriff A. & Grefenstette G. (2003), Introduction to the Special Issue on the Web as Corpus, Computational Linguistics 29(3), http://www.mitpressjournals.org/doi/abs/10.1162/089120103322711569

Krois-Lindner A. & TransLegal (2011), International Legal English. A course for classroom or self-study use, Cambridge University Press, Cambridge.

Locklear S. (2018), How to Write a Summary - How to Write a Summary in 8 Easy Steps, eNotes Publishing, http://www.enotes.com/topics/how-write-summary#how-to-how-write-summary

Lörscher W. (1991), Translation Performance, Translation Process, and Translation Strategies. A Psycholinguistic Investigation, Narr, Tübingen.

Niño A. (2009), Machine translation in foreign language learning: language learners’ and tutors’ perceptions of its advantages and disadvantages, ReCALL European Association for Computer Assisted Language Learning 21(2): 241–258.

Och F. (2012), Breaking down the language barrier – six years in, Google Official Blog, http://googleblog.blogspot.it/2012/04/breaking-down-language-barriersix-years.html

Pérez-Paredes P., Sánchez-Tornel M. & Alcaraz Calero J. M. (2012), Learners’ search patterns during corpus-based focus-on-form activities, International Journal of Corpus Linguistics 17(4): 483–516.

Perez S. (2017), Google’s smarter, A.I.-powered translation system expands to more languages, TechCrunch.com, https://techcrunch.com/2017/03/06/googles-smarter-a-i-powered-translation-system-expands-to-more-languages/

Robb T. (2003), Google as a quick ’n dirty corpus tool, TESL-EJ, Teaching English as a Second or Foreign Language 7(2): 1–10, http://tesl-ej.org/ej26/int.html

Shei C. C. (2008), Discovering the hidden treasure on the Internet: Using Google to uncover the veil of phraseology, Computer Assisted Language Learning 21(1): 67–85.

Sinclair J. (1991), Corpus, Concordance, Collocation, Oxford University Press, Oxford.

Siyanova-Chanturia A. & Martinez R. (2015), The idiom principle revisited, Applied Linguistics 36(5): 549–569.

Tirkkonen-Condit S. (1992), The Interaction of World Knowledge and Linguistic Knowledge in the Processes of Translation: A Think-aloud Protocol Study, in Lewandowska-Tomaszczyk B. & Thelen M. (eds), Translation and Meaning, Part 2, Rijkshogeschool Maastricht, Maastricht: 433–440.

Tribble C. (2015), Teaching and language corpora: Perspectives from a personal journey, in Leńko-Szymańska A. & Boulton A. (eds), Multiple affordances of language corpora for data-driven learning, John Benjamins, Amsterdam/Philadelphia: 37–62.

Warner R. (2015), Google advanced search. A comprehensive list of Google search operators, Beyond, https://bynd.com/news-ideas/google-advanced-search-comprehensive-list-google-search-operators/

Zanettin F. (2009), Corpus-based Translation Activities for Language Learners, The Interpreter and Translator Trainer 3(2): 209–224.

Zanettin F. (2012), Translation-driven corpora. Corpus resources for descriptive and applied translation studies, St Jerome, Manchester.

Notes

[1] The articles were taken from media platforms and magazines such as OpenDemocracy (https://www.opendemocracy.net/) and Foreign Affairs (https://www.foreignaffairs.com).

[2] This is not surprising given that the website, which offers an interface to a number of free bilingual dictionaries, as of January 2018 ranked as the 76th most visited website in Italy and the 305th worldwide (https://www.alexa.com/siteinfo/wordreference.com).

[3] As of January 2018 Reverso.net ranked as the 101st most visited website in Italy and the 341st worldwide (https://www.alexa.com/siteinfo/reverso.net).

[4] In 2012 the Google MT system translated, every day and across 64 languages, the equivalent of one million books, or as much as all human translators translate in one year (Och, 2012).

[5] E.g. the Oxford Dictionary (https://en.oxforddictionaries.com/), the Cambridge Dictionary (https://dictionary.cambridge.org/) and the Merriam-Webster Dictionary (https://www.merriam-webster.com/) for English, Il Sabatini-Coletti (http://dizionari.corriere.it/dizionario_italiano/) and Hoepli Italiano (http://www.grandidizionari.it/Dizionario_Italiano.aspx) for Italian.

[6] E.g. Il Sansoni Inglese (http://dizionari.corriere.it/dizionario_inglese/).

[7] E.g. the MultiLex Dizionario Giuridico Generale Inglese-Italiano (http://multilex.it/wp-content/uploads/2014/07/Dizionario-Giuridico-Generale-ITA-ENG-Prima-Edizione.pdf) and the specialized legal term banks at the Interactive Terminology for Europe (IATE) website (http://iate.europa.eu/).

[8] See e.g. Warner (2015) for a comprehensive list of Google search operators, and Robb (2003) for the use of Google for linguistic research and language teaching and learning.

[9] There are, however, a number of restrictions, which in the case of unregistered students like those attending the course consisted mainly of obligatory “reboots” of the system after a few searches.

[10] A Google search for the two expressions confirms that diritti e doveri is indeed more frequent with over 500,000 occurrences, about twice as many as those of diritti e responsabilità.

[11] As well as with regard to the choice of adverb, since “typically” was variously translated as tipicamente, solitamente, and in genere.

[12] See e.g. Bauer-Ramazini (2017) and Locklear (2018).

[13] In the experiment reported there, a short text was translated from Italian into English by an MT system and then revised by the students using the Web as corpus and corpora such as the British National Corpus (BNC; see Aston and Burnard, 1998) and the COCA corpus.

[14] “The query interface is the means through which learners retrieve information from the corpus and, therefore, their success in obtaining relevant results will largely depend on its features, functionalities and ease of use” (Pérez-Paredes et al., 2012: 487).

[15] On the relationship between MT and language teaching and learning see Niño (2009) and Aikawa (2014).

About the author(s)

Federico Zanettin is Full professor of English Language and Translation at the University of Venice, Italy.
