Respeaking-based online subtitling in Denmark
By Inge Baaring (Copenhagen Business School)
Abstract & Keywords
Speech recognition is currently being used in Denmark in two ongoing projects, at the Parliament and at the public television broadcaster respectively; in both cases, training initiatives for the operators involved (parliamentary reporters and subtitlers) are also planned. Developing speech recognition software sufficiently reliable for this type of use has proved highly complex, and the difficulties of respeaking have not yet been fully explored either. This article analyses some peculiarities of this activity, with particular regard to simultaneous listening and speaking, spatio-temporal constraints and institutional norms. Training a skilled subtitler to become a skilled respeaker may be rather similar to training a skilled translator to become a skilled simultaneous interpreter - with all the problems that this entails.
Speech recognition is currently being applied in two different projects in major Danish institutions, the Danish Parliament and Danish Television. The Danish Parliament is planning to use speech recognition to produce the official reports of parliamentary proceedings, whereas Danish Television has begun using it to offer deaf viewers simultaneous subtitling for programmes that could not previously be subtitled, e.g. breaking news reports or live football commentary. In both cases, the idea is to use respeaking and to train some of the people already involved in producing reports and subtitles to perform these functions using the speech recognizer, which will be trained in the process to recognize these persons’ voices. The development of a speech recognizer that performs sufficiently reliably for these purposes has proved to be a complex technical and linguistic challenge. The challenges facing traditional subtitlers turned respeakers are only just being explored. This paper will address some specific skill requirements and task demands involved in this activity, with special emphasis on simultaneous listening and speaking, temporal and spatial constraints, and institutional norms. Training a skilled traditional subtitler to become a skilled respeak subtitler may not be very different from training a skilled translator to become a skilled simultaneous interpreter - and not altogether unproblematic.
Keywords: respeaking, online subtitling, simultaneous interpreting, shadowing, rispeakeraggio, live subtitling, sottotitolazione in diretta, interpretazione simultanea, interpreting studies
©inTRAlinea & Inge Baaring (2006).
"Respeaking-based online subtitling in Denmark"
inTRAlinea Special Issue: Respeaking
Edited by: Carlo Eugeni & Gabriele Mack
This article can be freely reproduced under Creative Commons License.
Permanent URL: http://www.intralinea.org/specials/article/1685
1. Subtitling in Denmark
Ever since the introduction of television broadcasting in Denmark in 1950, subtitling has been the standard procedure for mediating foreign-language television programmes. This applies to foreign films, news items and indeed any type of programme containing foreign-language speech. Dubbing with lip synchronization has been virtually non-existent, and voice-over and respeak or versioning based on script translation have been used mainly in programmes for children up to the age of about 8 to 10.
This means that viewers with hearing disabilities have been able to follow nearly all foreign-language programmes, whereas – at least formerly – they had limited access to programmes in Danish, especially news programmes. Now, a teletext page with subtitling for the hard-of-hearing is available from both of our two public service channels for nearly all programmes that are not broadcast live.
2. Online subtitling in Denmark – the current situation
The need for online subtitling has been fully recognized by both public service channels, however, and such subtitles are already being provided to some extent via the teletext page just mentioned. One channel uses a stenotype machine for speed typing, with two specially trained subtitlers typing alternate sentences when online subtitling is needed. The other channel supplies online subtitles based on respeaking and voice recognition, using the Philips SpeechMagic program. This method of providing subtitles based on respeaking and voice recognition will be the main focus of this paper.
At present, online respeaking-based subtitles are provided for one news broadcast every day. The system currently uses scroll mode, in which the text runs continuously across the screen. Traditionally, subtitles have always been displayed in block mode in two horizontal bars at the bottom of the screen, and this was therefore also the display mode initially tested. Block mode was rejected, however, mainly because experiments showed that it resulted in unacceptable delays: the system did not display a subtitle until both lines had been filled with text, and then displayed it for 5 seconds. For the purpose of online subtitling, therefore, scroll mode has been preferred.
3. Respeaking
Respeaking aims at producing a simultaneous spoken version of another speaker’s utterance, for the purpose of making the communication interpretable by a speech recognizer and representable in written subtitles with minimal delay. The need for respeaking arises from the current limitations of speech recognition technology: at present, speech recognizers are only capable of recognizing the speech of a person after considerable ‘training’.
3.1 Skill requirements
The element of simultaneous repetition makes respeaking similar in this respect to simultaneous interpreting. Both of these language activities are based on concurrent listening and speaking with minimal time delay. This means that a respeaker must be able to practise split attention, i.e. to pay attention simultaneously to the form and content of the source message and to the form and content of the target communication.
However, respeaking is different from simultaneous interpreting in not involving any translation from one language to another. Respeaking operates intralingually rather than interlingually. It would therefore appear to be a relatively simple skill to master, not requiring much training, especially if all it involves is a more or less mechanical repetition of the source communication.
3.2 Task demands
It is immediately obvious, however, that the kind of respeaking required for online subtitling is not a question of mechanical, simple repetition of the kind a talented parrot might perhaps be trained to perform. Even if respeaking is seen as an intralingual language activity only, the specific task requirements and the speech recognition software impose quite complex constraints on the activity if it is to succeed. For example, the software may require particularly careful articulation, mini pauses between words, and the like, all of which will require training. To this one must add further institutional requirements such as demands for coherent text with minimal information overlap with subsequent text, elimination of false starts, repair of incoherent passages, compression of unnecessarily padded text, addition of punctuation marks, etc. And everything has to be performed online with a delay never exceeding two or three seconds.
Such constraints imposed both by the specific requirements of the technology employed and by general communicative considerations impose a need for additional layers of text processing, all of which require a cognitive effort by the respeaker far beyond what is needed for mere mechanical repetition. In this perspective, the respeaker should perhaps rather be seen as a reformulator or online editor, whose communicative effort adds relevance and value to the original message for the respeaker’s target audience.
For all these reasons, the new phenomenon of respeaking is interesting both from a research perspective and from the point of view of training. For both purposes – research and the design of appropriate training programmes – the similarities and differences between respeaking and familiar tasks and skills such as shadowing, subtitling and simultaneous interpreting need to be explored.
3.3 Respeaking and shadowing
The surface resemblance between shadowing and respeaking is striking. Shadowing, as sometimes used as a preliminary exercise to help develop split attention in the training of simultaneous interpreters, is the simultaneous listening to a speaker’s speech and repetition of the speaker’s words with minimal delay. Or, in the words of Lambert,
Technically speaking, shadowing is a paced auditory tracking task which involves the immediate vocalization of auditorily presented stimuli, i.e. word-for-word repetition, in the same language, parrot-style, of a message presented through headphones (Lambert 1988: 381).
Respeaking, too, can accurately be described as a “paced auditory tracking task which involves immediate vocalization of auditorily presented stimuli” (Lambert 1988: 381), but Lambert’s description of shadowing also indicates where shadowing and respeaking differ. Respeaking is not always a straightforward word-for-word repetition, parrot-style. As already mentioned, the specific task requirements, including the constraints imposed by the speech recognition software, frequently force the respeaker to depart from straightforward word-for-word repetition.
Just how frequently this is the case remains, to the author’s knowledge, to be investigated empirically. What can be said with some certainty, however, is that persons who have been trained to develop split attention, by means of shadowing exercises or otherwise, possess a skill that is also required for respeaking and that must be acquired and developed by anyone aspiring to become a proficient respeaker.
At the Copenhagen Business School we have found shadowing useful as a means of developing the ability to split one’s attention. In our experience, shadowing exercises help develop the ability to attend at the same time both to information received through the headphones and to the language produced by one’s own voice.
The author is well aware that shadowing is not universally endorsed as a training method by teachers of simultaneous interpreting. Neuropsychologists and psychologists also have differing perceptions of shadowing. Kurz claims that
These neuropsychological findings should be taken into account in an assessment of the pros and cons of shadowing exercises. The advocates of shadowing should be aware that a crucial element is missing in those exercises: the active analysis of the speech input (Kurz 1992: 248).
On the other hand, Tonelli and Riccardi (1995) found experimental evidence that subjects in a shadowing task detect and correct errors whether they have not been informed in advance of the possible existence of such errors, have been instructed to correct any errors in the text, or have even been instructed not to correct possible errors. Their experiment indicates that shadowing involves both semantic analysis of source information and concurrent monitoring of the target communication for semantic and syntactic appropriacy.
Based on our experience with shadowing at the Copenhagen Business School, we can only conclude that if shadowing can be used to develop split attention in simultaneous interpreters, it will be equally useful for training respeakers.
3.4 Respeaking and simultaneous interpreting
Research into simultaneous interpreting by such authors as Hella Kirchhoff, Barbara Moser-Mercer, Daniel Gile and Robin Setton (cf. Pöchhacker & Shlesinger 2002) has demonstrated the complexity of this language activity and the need for managing the cognitive effort involved. The element of interlingual transfer alone means that simultaneous interpretation includes so many overlapping processes that its similarity to what might be called ‘simple respeaking’ is restricted to the element of concurrent listening and speaking and the time pressure imposed by the demand for simultaneity.
Again, however, we have to remind ourselves that respeaking for online subtitling is always more complex than simple respeaking. This makes it relevant to look more closely for parallels between simultaneous interpreting and respeaking for online subtitling, despite the fact that the kind of respeaking referred to here does not require practitioners to constantly cross between two languages.
Simultaneous interpreting is based on segmentation of the source text into units of meaning which can be rendered in the target language. The time delay involved depends on the length of these units of meaning and will be affected by a variety of factors such as the languages involved and how well the interpreter masters them, the speed at which the source text is delivered, its degree of complexity, and the like.
Production of a coherent target text depends on the simultaneous interpreter’s ability to attend to two concurrent language activities in two different languages – listening and speaking at the same time – while also monitoring both of those activities, and on the ability to perform them under the time pressure imposed by the speed of source-text delivery. A considerable portion of the interpreter’s cognitive working power is devoted to establishing the right balance between the amount of attention devoted to listening to the source text and the attention devoted to formulating content in the target text and monitoring this output – and to managing time pressure (cf. Gile 1988).
The external constraints on respeaking mentioned earlier must be assumed to impose a somewhat similar kind of cognitive load, though it is clearly also different and probably not quite as demanding. The need for voice control while listening is shared by simultaneous interpreting and respeaking, and the time pressure situation is very similar too. Empirical research comparing respeaking and simultaneous interpreting in controlled experiments is likely to throw more light on the relative cognitive load involved in these two activities. A plausible initial hypothesis might be that respeaking requires roughly the same amount of cognitive effort, minus that imposed by interlingual transfer.
3.5 Respeaking and subtitling
Though respeaking and subtitling are aimed at producing the same product, the language activities involved are by no means identical. The most obvious difference is the absence of the need for concurrent ear-voice control in traditional subtitling.
Traditional subtitling is similar to interpreting in frequently depending wholly on auditory input, and also with respect to interlinguality when target-language subtitles are added to foreign-language programmes. However, because the traditional subtitler works offline, there is much less strain on the listening activity: if the subtitler misses a bit of dialogue, there is always the chance to replay it, since the subtitler works from recorded speech and is free of the time pressure of simultaneous interpreting.
The similarity between respeaking and subtitling is most obvious with respect to subtitling for the deaf and hard-of-hearing. Both of these activities operate intralingually, but even here there are still considerable differences. Respeaking and subtitling for the hard-of-hearing may require similar comprehension skills, but the subtitler (and this holds for both traditional subtitling and subtitling for the deaf) has the advantage of working offline with recorded and therefore replayable auditory input, whereas the respeaker works online. Strong typing skills will benefit the subtitler but not the respeaker, who in turn will need to be able to control voice production online while listening online – a skill for which the traditional subtitler has no need.
The fact that subtitlers are used to working within demanding external constraints such as spatial limitations or display-time regulations may be an immediate advantage to them on the way to developing respeaking skills. Strong editing skills will be an important element in a respeaker’s portfolio of competences.
3.6 Specific-purpose respeaking
Respeaking for a specific purpose such as public service channel subtitling takes place in a specific cultural setting and employs a specific configuration of technological tools. Gradually, norms about what constitutes good professional practice in the field are beginning to develop. The effect of all this, in short, is that specific-purpose respeaking cannot be looked upon as mere mechanical word-for-word repetition. Verbatim repetition may certainly be part of specific-purpose respeaking, but such respeaking will necessarily involve at least an element of editorial monitoring for comprehensibility, linguistic appropriacy, and the like. This makes respeaking similar in several respects to all three of the language activities discussed above: shadowing, simultaneous interpreting and traditional subtitling.
The similarity with simultaneous interpretation appears to be most comprehensive in that they share all the following four features:
1. concurrent listening and speaking
2. concurrent semantically and syntactically correct oral rendition under strict external constraints
3. continuous online monitoring of output
4. mental strain imposed by time pressure and psychological stress
Re 1) The concurrent listening and speaking requirement makes it necessary for both the simultaneous interpreter and the respeaker to manage the distribution of attention to source text comprehension and target text production. Too much focus on source text comprehension creates production problems. Conversely, formulation problems may cause the interpreter and the respeaker to focus so much on production that source text content is missed and production becomes erroneous or otherwise inadequate (cf. Gile 1988).
Re 2) Simultaneous production of semantically and syntactically correct output is required of both the interpreter and the respeaker. This particular requirement may put the respeaker under very considerable strain since there will be no chance to correct any inaccuracy or error once it has been formulated.
Both activities depend on segmentation of the source text, with the difference that an interpreter looks for units of meaning for which an equivalent can be found in the target language, whereas segmentation in respeaking aims at formulating text that is both correct and screen-ready, i.e. a text from which redundant information and false starts have been eliminated, in which incoherence has been repaired, and to which text-external elements such as punctuation have been added. Among the choices an online respeaker has to make, one of the most important is ensuring that the online subtitling does not overlap too much with conventional subtitling. This also makes it necessary for the respeaker to strike the right balance between keeping the ear-voice span as short as possible and, on the other hand, eliminating false starts and compressing incoherent text.
The decision to use scroll mode rather than block mode for online subtitling removes some of the strain on the respeaker caused by time pressure and also does away with some of the spatial constraints. The voice recognition software may impose several specific constraints, however. Articulation has to be particularly careful, breathing must be controlled, mini pauses may have to be introduced between words, microphone volume and distance must be monitored, etc. Intonation, on the other hand, is only relevant for purposes such as generating italics for special emphasis.
Re 3) Online monitoring of output is carried out to ensure that the target text produced is semantically and syntactically correct, but in some instances it may constitute a barrier to attending to the other activities. If an interpreter or respeaker decides to make a correction, critical time may be lost, contact with the source text may be momentarily broken, and production may suffer.
Re 4) Finally, both the simultaneous interpreter and the respeaker are exposed to considerable mental strain imposed by the time pressure they are working under and the psychological stress this entails. The interpreter has limited possibilities of repairing infelicities; for the respeaker, that door seems to be completely closed. At present, respeakers in Denmark work shorter turns than interpreters do, so fatigue is not currently a problem. Of course this situation may change if online subtitling becomes more widely used.
4. Training of respeakers
Based on this analysis of respeaking, it would appear that a training programme for respeakers would include a number of the elements used in training simultaneous interpreters – but without the components aimed at developing fast interlingual communication skills. Shadowing would be a good starting point for training concurrent ear-voice management, and this would of course have to be followed up by course components aimed at developing the specific skills needed to meet institutional and technical demands, such as segmentation exercises, exercises aimed at improving time delay management, exercises in condensing information, memory training, etc. An already fully trained conference interpreter would appear to have a great advantage here, but even such a person would need training to meet, for example, the articulatory constraints imposed by the speech recognition software. A trained subtitler, especially one experienced in subtitling for the hard-of-hearing, would certainly have a head start over somebody without this background, being used to producing well-segmented, reader-friendly texts. But the ability to produce written text offline does not necessarily translate unproblematically into the ability to produce the kind of spoken text that will come out as a reader-friendly text after being processed by a speech recognizer.
The training of respeakers, in conclusion, is a genuinely interesting question, both from a research and a didactic point of view, and one that should be investigated further. So far, in Denmark, only subtitlers have had access to the new technology. The public service channels employ highly skilled subtitlers, so there is no doubt that they will attempt to live up to high quality requirements, including for online subtitling. And one very relevant skill they already possess is the ability to condense text. But the question to ask is: will they be able to perform at the same level as respeakers with a background in interpreting might? Subtitlers’ main experience and special skill lies in listening to speech in a foreign language and then writing a time-coded translation. They do not necessarily have strong speaking skills, which may well be the key competence in online subtitling based on speech recognition. Time, but hopefully also empirical experimentation and training experience, will give us the answer to that question.
References
Gile, D. (1988). “Le partage de l’attention et le ’modèle d’effort’ en interprétation simultanée”. The Interpreters’ Newsletter 1: 27-33.
Kurz, I. (1992). “Shadowing Exercises in Interpreter Training”, in Dollerup and Loddegaard (eds), Teaching Translation and Interpreting 1: Training, Talent and Experience. Amsterdam/Philadelphia: John Benjamins Publishing Company, 245-250.
Lambert, S. (1988). “A Human Information Processing and Cognitive Approach to the Training of Simultaneous Interpreters”, in Hammond (ed.), Languages at Crossroads. Proceedings of the 29th Annual Conference of the American Translators Association ATA. Medford, NJ: Learned Information Inc., 379-387.
Pöchhacker, F. & Shlesinger, M. (eds) (2002). The Interpreting Studies Reader. London/New York: Routledge.
Tonelli, L. & Riccardi, A. (1995). “Speech Errors, Shadowing and Simultaneous Interpretation”. The Interpreters’ Newsletter 6: 67-74.