Subtitling choices and visual attention: a viewer perspective

By Mikołaj Deckert and Patrycja Jaszczyk (University of Łódź, Poland)


This paper surveys and critically discusses empirical findings that shed light on the processing of audiovisual material in the context of translatorial decision-making and translator training. More specifically, we investigate how attention is allocated to a special kind of visually-coded language and how those instances are reasoned about by ‘non-translator’ viewers as well as by trainee translators – who produced subtitles for the material in addition to watching it.

Keywords: attention allocation, cognitive processing, trainee translators, audience, decision making, visual-verbal coding

©inTRAlinea & Mikołaj Deckert and Patrycja Jaszczyk (2019).
"Subtitling choices and visual attention: a viewer perspective"
inTRAlinea Special Issue: New Insights into Translator Training
Edited by: Paulina Pietrzak
This article can be freely reproduced under Creative Commons License.
Stable URL:

1. Introduction

Our starting point is that there are contexts where the translator’s decision-making centres on “what to translate” or “if to translate”, before it can be focused on “how to translate”. This approximates what Kruger (2012: 70) terms “relevant selection” in a discussion of audio description. We illustrate the problem with instances of audiovisual material featuring language that is represented visually. Drawing on work in cognitive psychology, we will examine cases where the prominence status of elements is not fully clear, which therefore – significantly for translator training – requires the translator to decide if an element is to be translated in the first place.

To that end, we will report experimental studies in which two pools of respondents – regular viewers and trainee translators – provided their feedback. As this paper goes on to show, the data display patterns that can be used to draw conclusions about the processing of visual-verbal representations and then about differences in how these elements, or by extension film material in general, are processed by different audiences, with special regard to the didactic dimension of translation.

2. The semiotic setup in AVT

The semiotic composition of audiovisual material can be construed as a 2x2 matrix comprising two dual sets: sound and vision and then verbal and non-verbal (cf. Delabastita 1989, Chaume 2000, 2004, Zabalbeascoa 2008). The visual layer can be further broken down into six codes: “iconographic”, “photographic”, “mobility”, “planning”, “graphic” and “syntactic” (Chaume 2001, 2004, Tamayo 2017). Drawing on that, this paper concentrates on the visual-verbal layer, or cases of what we will be referring to as “visual-verbal coding” (VVC), which are akin to what Matamala and Orero (2015) call “text on screen” in their discussion of audio description. These elements are then differentiated into “diegetic” and “non-diegetic” ones. The former are those visual-verbal inserts which are “part of the action” while the latter are introduced by filmmakers and superimposed in the “editing process” (Matamala and Orero 2015). This paper deals with the “diegetic” type[1]. Importantly, we further narrow down the scope of inquiry to instances of what we term “liminal ostensiveness”. That construct rests on the premise that stimuli, in this case of visual character, are differently prominent, and are embedded in the filmic material in such a way as to draw attention to themselves to a variable degree. We thus use the term “ostensive” as it came to be used to describe stimuli in communication[2]. Visually, the radial character of ostensiveness can be illustrated by contrasting the following two frames from the film “Less Than Human”. The film also served as material in the studies discussed further on in this contribution:

Figure 1. Prototypical ostensiveness

Figure 2. Liminal ostensiveness

The text “The Animation Workshop presents” in Figure 1 is a prototypical case of ostensiveness in that viewers are very likely to be led into believing that they are expected to read it. This results primarily from the non-diegetic character of the text. In the case of the second image (in Figure 2) and the “out of order” notice behind the reporter’s left shoulder, by contrast, the status is much more uncertain, i.e. whether viewers detect it and choose to invest processing effort in it is harder to predict. In other words, while in the former we can fairly safely assume the text is there for a reason, the matter gets more nuanced in the latter, where the communicator’s informative intention is less clear, which is what we set out to explore.

3. A psychological view

3.1. Selective attention

By drawing on the construct of attention – and specifically its visual facet – in this article we propose an integrated account demonstrating the cross-fertilisation of translation studies and cognitive psychology. Attention can be thought of as focusing consciousness on a specific stimulus whilst ignoring others. This skill is pivotal for dealing with the tremendous amount of information that people are confronted with. William James talks about attention in the following way:

Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others. (1890/1950: 403)

Zhang and Ling (2013: 3) highlight that attention is selective by pointing out that “(…) observers’ eyes only can focus their attention on a small area of the visual field at a given moment. Consequently, only this small area can be observed in detail”. Different models try to account for the mechanism. A notable one is Broadbent’s (1958) “filter model”, also known as the “early selection model”. It was experimentally supported by the “dichotic listening test”, which involves sending two different messages to the left and to the right ear simultaneously. Broadbent found that the stimuli are filtered at an early stage because of “limited information processing capacity” (van der Heijden 1992: 64). In this formulation, stimuli are argued to first reach the sensory buffer (Broadbent 1958). At this stage one of them is chosen and filtered on the grounds of physical characteristics, such as pitch or loudness (van der Heijden 1992: 43), and the filtering precedes the processing of information. Following from that work, Deutsch and Deutsch (1963) developed the “late selection model”. In this proposal, stimuli were argued not to get filtered before the analysis of meaning. Then, Treisman (1964) came up with her “attenuation model”, where the unattended message is merely weakened rather than eliminated.

3.2. Change blindness and inattentional blindness

A powerful cognitive mechanism that can be outlined to provide more background for the current investigation is “change blindness”, i.e. “the failure to visually experience changes that are easily seen once noticed” (Rensink 2009: 47). Research shows that change detection is not possible without focused attention. This phenomenon has been the subject of a large number of experiments (cf. e.g. Simons and Rensink 2005), many of which used a “flicker paradigm” (Rensink, O’Regan and Clark 1997). This technique consists in displaying the original and the altered image with a blank slide inserted between the two. The subjects’ attention is thus disrupted, which makes the change difficult to register (Noe, Pessoa and Thompson 2000: 94). Notably, even if observers are informed that a change will occur in some cases, they might not be able to detect it, despite the alteration being considerable and recurring (Simons and Ambinder 2005: 45). Given the confirmatory research done in the lab, Simons and Levin (1998) ran a study to establish if change blindness functioned outside the lab as well. In their experiment known as “the door study”, participants were approached by a stranger with a map asking for directions. At some point the conversation was briefly interrupted by two men carrying a door between the subject and the stranger. As the subject’s vision was blocked, the stranger swapped places with another person who then continued to talk to the subject. Surprisingly, more than 50 per cent of subjects failed to detect the change of interactant (Simons and Levin 1998: 646). This was aptly summarised by Simons and Ambinder (2005: 48) as “a striking phenomenon, one that reveals limits on conscious awareness and accentuates the discrepancy between what we see and what we think we see”.

A related phenomenon pertinent to the study reported further on in the paper is inattentional blindness, which Rensink (2009: 47) defines as “the failure to visually experience the appearance of an object or event that is easily seen once noticed. Attention (likely, diffuse attention) is thought to be necessary for such an experience”. The change is located in the field of sight; nevertheless, observers fail to notice it because their attention is directed elsewhere (Szymańska 2011). Research shows that participants may fail to discern a stimulus that appears unexpectedly (Chabris and Simons 2010: 7). One of the most famous experiments examining this phenomenon is the “invisible gorilla test” conducted by Chabris and Simons[3]. The subjects’ task was to count how many times a basketball was passed between players in white or in black, whilst ignoring others (Simons and Chabris 1999: 1066). Meanwhile, an individual dressed up as a gorilla walked across the screen, being visible for almost 6 seconds (Chabris and Simons 2010: 6). The experimenters established that 46 per cent of their subjects failed to notice the gorilla (Simons and Chabris 1999: 1068), which compellingly supports the mechanism of inattentional blindness.

4. Experimental evidence

The primary aim of the analysis is to inform the translator’s decisions by looking into:

  1. whether/how viewers allocate attention to processing instances of VVC where its status is variably liminal (i.e. VVC identification)
  2. whether viewers are convinced such elements need to be rendered (VVC’s translational status perception)
  3. whether VVC identification and its translational status perception vary between regular viewers and trainee translators who additionally engage in translation

Starting from these general research questions, explored in Study 1 and Study 2, we hope to come up with empirically-founded insights into the functioning of VVC that are ultimately intended to shed light on the construct of translation competence.

4.1. Study 1

4.1.1. Procedure, materials and participants

Data collection took place online with the use of Google Forms that participants[4] accessed individually. The conditions were therefore less controlled but they ensured privacy and therefore helped approximate a regular non-experimental screening, thus enhancing the studies’ ecological validity.

The survey was completed by a total of 32 individuals, 21 of whom were women, with a mean age of 27 (SD = 8.09). They were all native speakers of Polish with varied command of English. Their self-reported English proficiency was elementary (6 participants), intermediate (14 participants) or advanced (12 participants).

As a first step subjects were requested to download and watch a total of 5 English-language video clips with Polish subtitles. The subtitles were prepared specifically for the study and they consistently did not render the instances of VVC. The clips were extracted from the following films: “Monsters, Inc.”, “Charlie and the Chocolate Factory”, “Less Than Human”, “Shrek” and “The Simpsons”, with some of them serving as filler stimuli and some containing cases of VVC whose reported reception the study sought to examine. Each subject was instructed to anonymously provide answers to a total of 51 questions divided into 6 sections. The sense of anonymity was important so as to minimise the risk that the subjects’ self-reported answers would be distorted. The first section required them to provide demographic information. The remaining 5 sections corresponded to the 5 clips with questions eliciting feedback on parallel matters, with some questions being more directly linked to the construct of VVC and some serving as fillers preventing the subjects from identifying the aim of the study.

In this paper we will discuss in some detail samples used in the larger research project[5]. The two samples come from “The Simpsons” (a 2D animation created by Matt Groening, 1989[6]) and “Less Than Human” (a 3D animation created at the Animation Workshop[7], 2016) and accommodate a total of 3 cases of VVC that will be examined. These three cases have been selected as they differ in the degree of ostensiveness of VVC as evidenced by its visual prominence and its function.

4.1.2. Sample 1 – Case 1

The first fragment to be discussed comes from “The Simpsons”. The clip lasted 4 minutes 59 seconds. The relevant case of VVC[8] involves the truck of a contractor who arrives at the house of the eponymous family. The truck is shown in profile and bears the company’s name – “J&J Construction” – written across the truck’s white-coloured side in black block letters. The shot serves as an introduction to the scene that follows where the contractor talks to Marge Simpson inside the house that is first seen in the background against which the truck is presented. However, the case of VVC that is of particular interest here is the additional tagline placed on the truck under the company’s name, in a smaller font – “The Vague Answer People”. This “explicitating” input is not directly indispensable for the interpretation of the shot or the scene it precedes, and its function could be categorised as “humorous”. The table below traces the relevant fragment and provides some wider context.

The Simpsons

Source text:
-There it is. Nice and smooth. I’d like to see your boyfriend the contractor do a better job.
-I think you used too much plaster.
-Now you tell me?
-I never stopped telling you.
Target text:
-Oto jest, piękna i gładka. Chciałbym zobaczyć twojego fachowca jak robi to lepiej.
-Myślę, że użyłeś za dużo tynku.
-Teraz mi to mówisz?
-Cały czas Ci to mówię.
Scene: Homer is talking to Marge about the remodelling. They are in the kitchen. Homer is proud of his work. Marge has second thoughts about it.

Source text:
-So that’s what that white noise was…
Target text:
-To o to był ten jazgot…
Scene: Camera shows Homer.

Source text:
I’m calling a contractor.
Target text:
Dzwonię po fachowca.
Scene: Marge smashes through the plaster. She calls a contractor.

Source text: (none)
Target text: (none)
Scene: The contractor’s truck is parked in front of the Simpsons’ house.

Source text:
-Thanks for taking the job. I’m sorry my husband is being so difficult.
-Get lost, crook!
Target text:
-Dzięki za przyjęcie zlecenia i przepraszam za męża.
-Spadaj naciągaczu!
Scene: Marge is talking to the contractor. Homer is hiding behind the tree. He throws a pot of paint at the contractor. Homer is shouting. He is mad at the contractor.

Source text:
-That’s alright Mrs Simpson. Many husbands feel emasculated when their wife must turn to a professional to satisfy her remodeling needs.
-Why don’t you just kiss her?
-I’m gay. But I have a subcontractor that does this sort of thing for me.
-I like to kiss.
Target text:
-W porządku. Wielu mężów nie lubi gdy ich żony proszą fachowca żeby zaspokoił ich potrzeby.
-Jeszcze ją pocałuj.
-Jestem gejem. Podwykonawca załatwia takie sprawy.
-Lubię się całować.
Scene: A subcontractor appears on the screen. He kisses the air.

Source text:
-Now, don’t you worry. Your kitchen will be done in three weeks.
Target text:
-Proszę się nie martwić. Kuchnia będzie gotowa w trzy tygodnie.
Scene: The contractor assures Marge that the remodelling will not last long.

Source text: (none)
Target text: (none)
Scene: Camera approaches the Simpsons’ house. There is text on the screen indicating the amount of time that has passed.

Table 1


Within the pool of 32 subjects, 26 (81.25 per cent) declared that they had detected the inscription, and 20 subjects (62.5 per cent) reported that elements like this one should be translated. While the latter figure constitutes a majority, one could still argue it is unexpectedly low. In line with this, another intriguing finding is that within the pool of subjects who noticed the case of VVC, as many as 9 (34.62 per cent) argued such elements need not be translated. Conversely, 3 (50 per cent) of the subjects who did not notice the writing on the truck claimed such elements need to be translated.
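
The counts reported for Case 1 imply a complete cross-tabulation of detection against perceived translational status. The following is a minimal Python sketch of that arithmetic, assuming only the figures stated above; the function names `crosstab` and `per_cent` and the cell labels are illustrative and are not part of the study's materials.

```python
def crosstab(total, noticed, favour, noticed_and_favour):
    """Rebuild the 2x2 table (detection x translational status) from reported counts."""
    missed_and_favour = favour - noticed_and_favour
    return {
        ("noticed", "translate"): noticed_and_favour,
        ("noticed", "do not translate"): noticed - noticed_and_favour,
        ("missed", "translate"): missed_and_favour,
        ("missed", "do not translate"): total - noticed - missed_and_favour,
    }


def per_cent(part, whole):
    """Percentage rounded to two decimals, as in the figures quoted in the text."""
    return round(100 * part / whole, 2)


# Case 1 ("The Simpsons"): 32 viewers, 26 noticed the inscription, 20 favoured
# translation, and 9 of the 26 who noticed said it need not be translated.
case1 = crosstab(total=32, noticed=26, favour=20, noticed_and_favour=26 - 9)

for cell, count in case1.items():
    print(cell, count)

missed = 32 - 26
print(per_cent(26, 32))                                       # share who noticed
print(per_cent(case1[("noticed", "do not translate")], 26))   # noticers against translating
print(per_cent(case1[("missed", "translate")], missed))       # non-noticers for translating
```

Run as written, the sketch reproduces the 81.25, 34.62 and 50 per cent figures and shows that the four cells sum to the 32 respondents, which is a quick consistency check on the reported counts.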

4.1.3. Sample 2 – Case 2

The second animation used in the investigation was “Less Than Human”, lasting 6 minutes 9 seconds. A major advantage of using the film was that because it is relatively short, subjects could view the entire animation. This is a methodological advantage in the sense of ecological validity and vitally supplements Sample 1 where a necessarily decontextualised fragment of a longer film was used, even though in many cases for reasons of feasibility the latter will be the only realistic option[9].

It is also notable that the clip featured more than one case of VVC, two of which are zoomed in on as part of the current inquiry. The first case is shown in Figure 3 below. The poster with the title of the film “Night of the Living Dead” is seen on the wall in the apartment of the story’s protagonists. The poster’s textual component is fully visible for approximately 7 seconds and the poster is present on the screen at least partly in more than one shot. Analogously to Case 1 discussed earlier, this occurrence of VVC is not pivotal to following the animation’s plot and it is not coded in other semiotic layers of the film. Its function is also similar to that of Case 1 as it can be interpreted as a humorous, perhaps ironic, commentary on the main characters of the story who are in fact “the living dead” residing in “a quarantine facility” or “the camp”, as it is termed in the film. Also, the poster can be a way of tying the action of the animation to the extra-filmic reality known to viewers. Similarly to the location name of “Seattle” being mentioned by one of the characters, this serves as an ‘authenticating’ device to indicate that the world shown in the animation is not a fabricated one.

Figure 3. VVC – “Night of the Living Dead”

Table 2 below outlines the pertinent fragment.

Less Than Human (1)

Source text:
Oh, can I…
Can I get you something to drink?
Target text:
Może się czegoś napijesz?
Scene: Andy is standing in the kitchen. His voice is heard in the background.

Source text:
Oh, man. What’s that smell?
Target text:
Co to za zapach?
Scene: Camera approaches the kitchen window.

Source text:
Just look at me, not into the lens.
Target text:
Po prostu patrz na mnie, nie w obiektyw.
Scene: Andy is picking his teeth. The reporter is giving him instructions about the interview.

Source text:
-So, tell me. How long have you been in here?
-At the camp? Oh… Well, I think they placed us in here, I guess, it’s been about six years now.
-I see. And, do you…
Target text:
-Powiedz jak długo tutaj jesteście?
-W obozie? Myślę, że umieszczono nas tutaj jakoś tak sześć lat temu.
-Rozumiem. I czy wy...
Scene: Andy is talking with the reporter.

Source text:
Hey, what you were saying before?
Come on, you know it’s impossible for me to hear anything from… Um…
Target text:
Co mówiłeś wcześniej?
Wiesz, że nic nie słyszę z...
Scene: Don appears. He is astonished and annoyed.

Source text:
What’s this?
What’s going on here?
Target text:
Co to jest?
Co się tutaj dzieje?
Scene: Camera approaches Don’s intestines. He is angry.

Source text: (none)
Target text: (none)
Scene: The reporter looks surprised and terrified after he saw Don.

Table 2


As far as the detection of VVC goes, 13 subjects (40.62 per cent) claimed to have noticed it. Although most subjects failed to notice the title, the proportion of those who did is relatively high, so it could be seen as inconsistent that as many as 24 participants (75 per cent) responded that there was no need to translate elements like that one. It ought to be pointed out that among the 8 viewers who stated such elements should be rendered, 6 did not detect the case of VVC. It might therefore follow that whether viewers themselves notice VVC, or perhaps other types of elements for that matter, could be a significant factor influencing their opinion about the translational status of those elements. In Case 2 we find it to an extent counter-intuitive that the influence should be in this direction.

4.1.4. Sample 2 – Case 3

The third instantiation of VVC (in Figure 4 below) examined in this paper is functionally distinct from the two cases discussed above. This occurrence of VVC more explicitly contributes to the interpretation as it reinforces the story’s congruency by providing additional evidence of attitudes toward the “living dead”. This meaning-making facet is coupled with visual prominence. The VVC is first visible in the background but then the speaker gestures in its direction, after which the camera zooms in on the writing on the wall itself and it takes the central position on the screen. As has been mentioned, this serves a significant function because the VVC occurs when the reporter says “What we witnessed here today is a clear sign that reintegration is not an option”. The VVC thus serves to lend credence to the speaker’s statement via remarkable intermodal cross-feeding of inputs. What is more, just a few seconds later the reporter declares “The images speak for themselves” (see Table 3). Interestingly, while the speaker is probably primarily referring to the footage recorded throughout their stay at the facility, given the temporal proximity of the preceding VVC, the utterance could actually be taken to denote that literal “image” as well.

Figure 4. VVC – “Go back to your graves”

Less Than Human (2)

Source text: (none)
Target text: (none)
Scene: Don and Andy had an argument. Andy is crying in the room. Don is sitting at the table.

Source text:
Andy? Man, come on.
You, you still there man?
Don’t be like that, alright?
I’m sorry.
Target text:
Andy, no co ty.
Jesteś tam?
Nie bądź taki.
Scene: Don knocks on the door. He wants to apologise to Andy.

Source text: (none)
Target text: (none)
Scene: Don is playing a harmonica.

Source text: (none)
Target text: (none)
Scene: Andy leaves the room. He is listening to Don playing a harmonica.

Source text:
Are you coming or what?
You’ve got the car keys.
Target text:
Idziesz czy nie?
Masz kluczyki od samochodu.
Scene: The reporter opens the door. He is whispering.

Source text:
What we witnessed here today is a clear sign that reintegration is not an option.
Target text:
To czego doświadczyliśmy tu dzisiaj jest jasnym dowodem na to, że reintegracja nie jest możliwa.
Scene: The reporter is standing in front of the building. He is commenting on the situation.

Source text:
The danger, we thought we’d gotten rid of, is lurking just beneath the surface.
Target text:
Niebezpieczeństw, którego myśleliśmy, że się pozbyliśmy czai się tuż pod powierzchnią.
Scene: Camera approaches the writing on the wall.

Source text:
The images speak for themselves.
Target text:
Obrazy mówią same za siebie.
Scene: The speaker summarises the report.

Table 3


The argument about the more sanctioned prominence and ostensiveness status of this VVC occurrence is only partly corroborated by the findings, with 20 viewers (62.5 per cent) stating they had noticed the writing on the wall. This indicates that even in cases where VVC is salient – if only because for a few seconds it is the single stimulus standing out visually – viewers can fail to allocate their attentional resources to it. What we find even more unexpected is that fewer than half of our respondents, i.e. 15 individuals (46.88 per cent), agreed that elements like this one should be subtitled. Among the 15 subjects, 13 noticed the slogan, which is at odds with how the relationship worked in Case 2 between subjects’ viewing patterns and their opinion on whether to translate VVC. Nonetheless, this confirms our observation that the relationship can be hard to predict.
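
The contrast between Case 2 and Case 3 can be made concrete by asking, for each case, what share of the viewers who favoured translation had themselves noticed the VVC. The Python sketch below is illustrative only: it uses the counts reported for the two cases, and the dictionary keys and the helper `share_noticed` are hypothetical names introduced here.

```python
# Counts taken from the text: viewers who said the element should be translated,
# and how many of those viewers had reported noticing it.
cases = {
    # Case 2: 8 favoured translation, 6 of whom had not detected the poster.
    "Case 2 (poster)": {"favour_translation": 8, "of_whom_noticed": 8 - 6},
    # Case 3: 15 favoured translation, 13 of whom had noticed the slogan.
    "Case 3 (writing on the wall)": {"favour_translation": 15, "of_whom_noticed": 13},
}


def share_noticed(case):
    """Per cent of pro-translation viewers who had noticed the VVC."""
    return round(100 * case["of_whom_noticed"] / case["favour_translation"], 2)


for name, case in cases.items():
    print(f"{name}: {share_noticed(case)} per cent of pro-translation viewers noticed the VVC")
```

On these counts the share is 25 per cent for the poster and roughly 87 per cent for the writing on the wall, which is the reversal of direction described above.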

4.2. Study 2

4.2.1. Procedure, materials and participants

Analogously to Study 1, the data were collected with the use of a form that participants accessed online. A total of 47 subjects took part in the study, 9 male and 38 female, with a mean age of 21.6 (SD = 0.97). They were native speakers of Polish proficient in English, year-3 BA-level trainee translators at the Institute of English Studies, University of Łódź who participated in an introductory AVT course. The problem of VVC was not discussed with the participants as part of the course prior to the experiment.

The audiovisual material we used was the film “Less Than Human” from Study 1 which subjects were instructed to download individually. Differently from what was the case in Study 1, participants were not presented with the questionnaire from the outset as Study 2 comprised two stages. In stage 1 participants were required to produce Polish subtitles for two fragments of the clip, each featuring a case of VVC – corresponding to the instances discussed in 4.1.3. as well as 4.1.4. and to what is illustrated in Table 2 and Table 3 above. Notably, subjects were not required to spot the subtitles which was one measure to preclude the technical constraints of AVT from interfering with the content of the target text. This is especially important when it comes to VVC because the spatio-temporal limitations could make it impossible to render it in subtitles. In stage 2 participants were requested to fill out a questionnaire that elicited information on VVC identification and status perception. Additionally, open questions were used to get an insight into our subjects’ viewing experience and to learn about the rationale behind their choices.

4.2.2. Case 1


In the pool of 47 subjects, 9 participants (19.1 per cent) reported detecting the instance of VVC. This corroborates our observation that the detection rate for the analogous case in the non-translator pool (40.62 per cent) was unexpectedly high. One way to account for this disproportion between the results from the groups in Study 1 and Study 2 is to argue for overestimation of actual performance in the participants’ self-reporting – even though it was minimised in both studies by ensuring anonymity of responses. Another explanation is that translation takes up cognitive resources that in the case of regular viewers (Study 1) can be allocated exclusively to viewing.

When it comes to the translational status of VVC, as perceived by trainee translators, 38 participants (80.9 per cent) think it should not be translated. Strikingly, the proportion of subjects who noticed VVC is identical to the proportion of those who claim it should be translated. It is noteworthy that the perception of the translational status of VVC across groups from Study 1 and Study 2 is comparable, with 75 per cent in the former claiming this instance of VVC need not be rendered into Polish.

4.2.3. Case 2


With 40 subjects reporting that they had noticed the writing on the wall, the VVC identification rate reported in the study amounts to 85.1 per cent. This result does not support the hypothesis formulated above that it is the additional task of translating that makes trainee translators less likely to pay careful attention to the visual component. Rather, compared to what we found in Study 1 where the identification rate was unexpectedly low (62.5 per cent), the result indicates that trainee translators are more sensitive viewers than those from Study 1. However, the main difference comes to light when we compare the perception of VVC’s translational status across the studies. While in Study 1 the proportion of respondents who believed the inscription on the wall was ‘translation-worthy’ was markedly low (46.88 per cent), in Study 2 the proportion is as high as 97.9 per cent, with just one participant stating the case of VVC should not be subtitled.
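
The cross-study comparison for the two shared instances of VVC can be laid out as a short calculation. The Python sketch below is a descriptive illustration, not an inferential test; it assumes only the counts reported in 4.1.3, 4.1.4, 4.2.2 and 4.2.3, and the labels “poster” and “wall” as well as the helper names are introduced here for convenience.

```python
# Reported counts: (number of participants, reported noticing, favoured translation).
# "poster" = "Night of the Living Dead"; "wall" = the writing on the wall.
# Pro-translation counts for the poster are derived from the text:
# Study 1: 32 - 24 = 8; Study 2: 47 - 38 = 9.
reported = {
    "poster": {"Study 1": (32, 13, 8), "Study 2": (47, 9, 9)},
    "wall": {"Study 1": (32, 20, 15), "Study 2": (47, 40, 46)},
}


def rates(n, noticed, favour):
    """Return (identification rate, pro-translation rate) in per cent."""
    return 100 * noticed / n, 100 * favour / n


def gap(case):
    """Study 2 minus Study 1, in percentage points, for both rates."""
    study1 = rates(*reported[case]["Study 1"])
    study2 = rates(*reported[case]["Study 2"])
    return tuple(round(b - a, 1) for a, b in zip(study1, study2))


for case in reported:
    identification_gap, translation_gap = gap(case)
    print(f"{case}: identification {identification_gap:+} pp, "
          f"translation {translation_gap:+} pp")
```

For the writing on the wall the trainee group is about 23 percentage points higher on identification and about 51 points higher on perceived translation-worthiness, whereas for the poster identification is about 21 points lower and the pro-translation share differs by roughly 6 points.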

5. Discussion

5.1. Liminal VVC – guiding the target viewers

The liminality of VVC creates a special kind of meaning construction space for the viewer who is able to recognise and interpret an occurrence of liminal VVC, to be shared with the filmmakers and other – but not all – members of the audience. It is that ‘exclusive’ character of VVC that largely defines it. However, while the original gives leeway in that it is “up to the viewer” whether or not she detects those elements, in the case of the target audience the decision is taken in the subtitling process. That is to say, if the subtitler chooses not to translate an instance of VVC, the audience will not recover it[10]. On the other hand, if the subtitler does render a case of VVC, he explicitly tells the viewers the element is worth their effort. Remarkably, in those scenarios the target audience could de facto be in a cognitively privileged position as some of the work is done for them. While this is an asset in the cognitive sense of reducing effort, it need not be advantageous in the broader sense. After all, it could be postulated that the experience of uncovering a partly concealed meaning builds the appreciation of the work. In either translation scenario, as a result of a binary decision on the part of the translator, the viewing experience and meaning construction procedure of the target audience with respect to VVC only partly overlap with those of the source audience.

A fairly straightforward solution to the problem of the cognitive and translatorial status of VVC could be to manipulate the original image and replace the source VVC with its target version without disturbing the visual composition by adding a subtitle, which is now technically feasible. However, even in such a case, it would be hardly viable to argue that the instances of VVC would be analogously accessible to the source and target audiences, as the very fact of processing the ‘regular’ subtitles alters the viewing experience for the latter. With this evident difference in mind, and recognising the target audience’s extra processing effort as undesirable, one could argue that liminal VVC might be left untranslated. If such a conclusion is drawn, however, we are left with the question of why VVC was implemented in a particular way in the original. After all, the authors could have not added those liminal visual representations, or they could have made them more ostensive. To be fair, it needs to be noted at this point that animated productions are special when it comes to ascertaining the status of visual stimuli. This is arguably the case because each element that is visible on the screen had to be “placed” there consciously. In turn, in non-animated material it could be that an element appears on the screen by virtue of being a part of the natural setting where the film was shot, and it could possibly pass unnoticed in the shooting process.

5.2. Translatorial vs. viewer processing: translator training implications

Kruger (2008: 73) points out that “(i)n subtitling training, as in more generic training for the language professions, the development of analytical skills is vital, all the more so because of the multimodal nature of audiovisual texts.” A question that surfaces is to what extent the trainee’s, the translator’s or, by extension, a translation researcher’s processing of audiovisual stimuli is similar to how viewers engage with these stimuli[11]. A crucial consideration here is that – as indicated most clearly by the findings from Case 2 in Study 2 – the former could be processing input more deeply. This is in all probability the result of training-induced sensitisation coupled with the fact that trainees likely viewed the stretch of the film being subtitled more than once[12]. What we could be ultimately dealing with, however – at least in some cases – is “overattending” to stimuli and as a result “overanalysing”. Therefore, while it is a fairly safe argument to make that trainee subtitlers need to be (made) aware of the complexity of the multimodal material and that they should be sensitised to cross-modal meaning-making patterns, the cognitive mechanism at work here appears to be nuanced. Taking the ‘multimodal’ aspect of subtitling competence a step further, we can propose that in addition to having their semiotic awareness and sensitivity developed, subtitlers, or translators in general, should also be able to monitor and control those skills (cf. Tirkkonen-Condit 2005). They should realise that as a result of training and work experience they might be processing audiovisual material differently from most viewers. This difference should then be factored into the choices they make not to let their deeper processing negatively impinge on the target audience’s experience of engaging with the translated product.

If we think about subtitling competence(s) (cf. Di Giovanni 2016), or translation competence more broadly, the above discussion signals some issues with one of the oft-postulated requirements (e.g. Gouadec 2002: 33), i.e. to “fully understand” the source text. Leaving aside the fact that “full understanding” is a very elusive notion, it could be brought into question whether uncovering multiple layers of meaning – in our case paying attention to many potentially detectable instances of VVC – is invariably desirable. One might argue – if only for the purpose of a somewhat provocative exercise in reasoning – that by detecting those additional elements – which in itself can be labour- and time-intensive – the translator broadens the scope of decisions to be taken, which might be seen as disadvantageous in some respects. From the angle of the audience, a translator who has a very thorough and multi-layered understanding of the source text can guide the audience’s comprehension too narrowly, or can be overly inclined to impose on them the extra processing effort of attending to VVC where the interpretation recovered might not be worth it.

On another level, while there are clear differences between the translator and non-translator groups, a finding to be highlighted surfaces from the descriptive responses in Case 1 of Study 2, where participants were requested to support their statements, e.g. explain why they thought a given case of VVC should not be subtitled. What is thought-provoking here is that even within what is a highly homogeneous group of individuals, perceptions vary very significantly. This is evidenced by negative justifications such as “It’s only a poster”, “It is barely visible”, “Because it is irrelevant” that dominate but are partly counterbalanced by starkly contrastive statements like “It should be subtitled because it seems crucial”, “If it would be correctly subtitled it would make a lot of sense in the context”, and “Because it fits with the general theme of the short movie”.

6. Final remarks

6.1. Next research steps

With the study pointing to some inter-group variation in cognitive processing, these results can be complemented with the use of behavioural measures, above all eye-tracking (cf. e.g. Goldstein, Woods and Peli 2007; Kruger and Doherty 2016), and fine-tuned by testing the role of a range of variables. First of all, as has been shown, instantiations of VVC can differ qualitatively.

They are functionally distinct, to begin with. While an element’s function will be a conglomerate of several parameters, it could be argued that in Case 1[13] and 2 VVC served a primarily humorous function – with less ostensiveness – and in Case 3 it was employed more ostensibly. It is somewhat paradoxical that occurrences with lower ostensiveness – where the omission of VVC, either in the original or in translation, essentially would not prevent the audience from following the plot – are more problematic for the subtitler. If an occurrence of VVC is evidently relevant to the plot and is key to sense-making, it is not much of a translation dilemma.

Second, the interaction of VVC with input from other semiotic layers of filmic material will vary. In fact, where the interaction is salient, as would be the case if a character makes a reference to a poster, the status of VVC should be unproblematic. But this again would get fuzzier if the reference was suboptimally ostensible or was not temporally aligned with the occurrence of VVC. Along those lines, a critical manipulation to be examined is how the presence or absence of subtitles influences the processing of VVC, also across native and non-native subject pools.

A related variable is then how VVC is presented – for how long, from what angle, whether in the foreground/background, at what distance, and with what other competing stimuli for the conceptualiser to distribute attention among. To construe that in terms of effort, it will matter how costly the accessing and processing of VVC is. In our data, the stimulus in Case 1 (Study 1) was presented on the screen for a shorter time than in Case 2 and Case 3 but was easier for the viewers to identify because it was in the foreground and there were fewer other discernible elements on the screen to which to allocate attention.

Another important variable is material type, and following from it, audience profile. Hard as it typically is to ascertain the prospective target audience’s education, foreign language proficiency, social background or cultural awareness, a curious situation to consider is that – for example in a cartoon that will be watched by children accompanied by adults – VVC could be used for only a subset of audience members to decode.

6.2. Conclusions

First, the reported experimental findings suggest that individuals who approach audiovisual material as producers and/or analysts (trainee translators in Study 2) can be attending to elements of the image differently from end users of the product (viewers in Study 1). Second, the two groups appear to be reasoning about the status of VVC in only partly compatible ways, which compellingly suggests that – even at early stages of competence development (year 3 of the BA programme) – trainees engage with audiovisual material in a different manner. At the same time, we have observed clear variation in how attention is allocated within the ‘regular’ and ‘trainee translator’ groups of viewers. Finally, it is interesting to note, based on the findings from Study 1, that whether viewers themselves detect VVC can unexpectedly influence their choice as to whether such text on screen needs to be translated.

With the above in mind, liminally ostensive VVC – as outlined in this article – is a good example of a subtitling decision-making problem, if not a prototypical one, all the more so because it tends to be overlooked in subtitling guidelines. Vitally, the core of the problem is the cognitive and meaning-making status of VVC, which has to be considered before the subtitler chooses (not) to render it. By surveying and illustrating some of the considerations and implications behind the mechanisms involved in the processing of VVC, this paper has attempted to offer a preliminary account that could be of use to trainees, translators and translator trainers.


This work was supported by the Polish National Science Centre, grant number DEC-2017/01/X/HS1/00812, awarded to Mikołaj Deckert.


Broadbent, Donald (1958) Perception and communication, Oxford, Pergamon Press.

Chabris, Christopher F., and Daniel J. Simons (2010) The invisible gorilla, London, Crown Publishers.

Chaume, Frederic (2000) La traducción audiovisual: estudio descriptivo y modelo de análisis de los textos audiovisuales para su traducción, PhD diss., Jaume I University, Spain.

Chaume, Frederic (2001) “Más allá de la lingüística textual: cohesión y coherencia en los textos audiovisuales y sus implicaciones en traducción” in La traducción para el doblaje y la subtitulación, Miguel Duro (ed.), Madrid, Cátedra: 65–82.

Chaume, Frederic (2004) Cine y traducción, Madrid, Cátedra.

Delabastita, Dirk (1989) “Translation and mass-communication: Film and T.V. translation as evidence of cultural dynamics”, Babel 35, no. 4: 193–218.

Deutsch, J. Anthony, and Diana Deutsch (1963) “Attention: some theoretical considerations”, Psychological Review 70, no. 1: 51–60.

Di Giovanni, Elena (2016) “The layers of subtitling”, Cogent Arts & Humanities, Vol. 3, URL: (accessed February 2018).

Eagleman, David (2012) [untitled talk], The Up Experience, URL: (accessed September 2019).

Flaherty, Michael (1999) A watched pot: How we experience time, New York, New York University Press.

Goldstein, Robert B., Woods, Russell L., and Eli Peli (2007) “Where people look when watching movies: Do all viewers look at the same place?”, Computers in Biology and Medicine 37, no. 7: 957–64.

Gouadec, Daniel (2002) “Training translators: Certainties, uncertainties, dilemmas” in Training the Language Services Provider for the New Millennium: Proceedings of the III Encontros de Tradução de Astra-FLUP, Belinda Maia, Johann Haller, and Margherita Ulrych (eds), Porto, University of Porto: 31–41.

van der Heijden, Alexander H. C. (1992) Selective attention in vision, London, Routledge.

James, William (1950) The principles of psychology, New York, Dover Publications.

Kruger, Jan-Louis (2008) “Subtitler training as part of a general training programme in the language professions” in The Didactics of Audiovisual Translation, Jorge Díaz Cintas (ed.), Amsterdam and Philadelphia, John Benjamins: 71–87.

Kruger, Jan-Louis (2012) “Making meaning in AVT: eye tracking and viewer construction of narrative”, Perspectives 20, no. 1: 67–86.

Kruger, Jan-Louis, and Stephen Doherty (2016) “Measuring cognitive load in the presence of educational video: towards a multimodal methodology”, Australasian Journal of Educational Technology 32, no. 6: 19–31.

Matamala, Anna, and Pilar Orero (2015) “Text on screen” in Pictures painted in Words: ADLAB Audio Description guidelines, Aline Remael, Nina Reviers and Gert Vercauteren (eds), URL: (accessed May 2018).

Noe, Alva, Pessoa, Luiz, and Evan Thompson (2000) “Beyond the Grand Illusion: What Change Blindness Really Teaches Us About Vision”, Visual Cognition 7, no. 1-3: 93–106.

Rensink, Ronald A., O'Regan, Kevin J., and James Clark (1997) “To see or not to see: the need for attention to perceive changes in scenes”, Psychological Science 8, no. 4: 368–73.

Rensink, Ronald A. (2009) “Attention: Change Blindness and Inattentional Blindness” in Encyclopedia of Consciousness, William P. Banks (ed.), New York, Elsevier: 47–59.

Simons, Daniel J., and Ronald A. Rensink (2005) “Change blindness: past, present, and future”, Trends in Cognitive Sciences 9, no. 1: 16–20.

Simons, Daniel J., and Michael S. Ambinder (2005) “Change Blindness Theory and Consequences”, Current Directions in Psychological Science 14, no. 1: 44–8.

Simons, Daniel J., and Christopher F. Chabris (1999) “Gorillas in our midst: sustained inattentional blindness for dynamic events”, Perception 28, no. 9: 1059–74.

Simons, Daniel J., and Daniel T. Levin (1998) “Failure to detect changes to people during a real-world interaction”, Psychonomic Bulletin & Review 5, no. 4: 644–49.

Sperber, Dan, and Deirdre Wilson (1986/1995) Relevance: Communication and Cognition, Oxford, Blackwell.

Szymańska, Katarzyna (2011) “Czy hipoteza Wielkiej Iluzji jest problemem dla teorii percepcji?”, Analiza i Egzystencja 16: 27–40.

Tamayo, Ana (2017) “Signifying codes of audiovisual products: Implications in subtitling for the D/deaf and the hard of hearing”, inTRAlinea, Vol. 19, URL: (accessed May 2018).

Tirkkonen-Condit, Sonja (2005) “The monitor model revisited: evidence from process research”, Meta: Translators’ Journal 50, no. 2: 405–14.

Treisman, Anne (1964) “Selective attention in man”, British Medical Bulletin 20, no. 1: 12–16.

Zhang, Liming, and Weisi Lin (2013) Selective visual attention: computational models and applications, Singapore, Wiley.

Zabalbeascoa, Patrick (2008) “The nature of the audiovisual text and its parameters” in The Didactics of Audiovisual Translation, Jorge Díaz Cintas (ed.), Amsterdam and Philadelphia, John Benjamins: 21–37.


[1] With the non-diegetic type, the technique of inserting text itself signals the filmmakers’ intentionality more openly and therefore such text will not be liminally ostensive.

[2] One framework whose application of the term also fits our purposes is that of Relevance Theory (Sperber and Wilson 1995), where it is understood as creating expectations of stimuli’s relevance.

[3] The authors themselves point out it “has become one of the most widely demonstrated and discussed studies in all of psychology” (Chabris and Simons 2010: 8).

[4] Participants for this study were recruited with the use of social media and through personal channels, which was to ensure a balance between diversity in professional and educational background, on the one hand, and limited age variation, on the other.

[5] It was run by P.J. under the supervision of M.D. at the University of Łódź. Also, the authors wish to thank the participants of the Introduction to AVT course held at the Institute of English Studies in the Spring of 2017 for their input and for coming up with film samples, some of which were used for the purpose of the research reported here.

[6] The sample used in the study comes from season 16, episode 2 (2004).

[7] URL:

[8] Screenshots from “The Simpsons” could not be used as permissions from copyright owners required by the journal were not granted.

[9] What is more, the film is not very widely known, so the viewers were less likely to have watched it and to have preconceptions. This was largely confirmed in the questionnaire, as 87.1 per cent of subjects stated they had not seen the film before.

[10] It remains to be empirically tested whether for viewers whose command of the source language would enable them to understand an instance of VVC, the lack of translation could imply the element is not worth the processing effort they might have otherwise chosen to invest.

[11] From a methodological vantage point, watching a short clip and watching a 90-minute video will differ in how attention is allocated throughout. One way to see the difference is that when presented with a short clip in an overtly experimental setting, a viewer is more stimulated and could therefore recruit more resources, which in turn could make the viewer more likely to detect VVC. If that were true, we could expect the VVC identification rates to be lower in fully authentic viewing setups.

[12] That multiple viewing is the case was directly supported by respondent feedback in Study 2. At the same time, we should not overestimate the meaning-making import of repeated viewing. As psychological studies indicate (cf. for example Flaherty 1999 and Eagleman 2012), we pay attention differently when processing an event or scene – for instance on our way to a particular location – for the first time and in subsequent encounters. Due to the stimuli’s novelty we notice more elements the first time, which is a reason why we perceive that first journey as taking more time than subsequent ones. This suggests that the identification of secondary foci should not be incompatibly different between an individual seeing a clip once and someone who might (have to) watch a fragment repeatedly. Still, some incongruence is sure to exist.

[13] We are referring here to cases as they are numbered in Study 1.

About the author(s)

Mikołaj Deckert is an assistant professor at the Institute of English Studies, University of Łódź. In his current research he uses experimental and corpus methods to look into language and cognition as well as interlingual translation. He serves on the editorial board of the Journal of Specialised Translation, and is a founding member of the Intermedia AVT Research Group.

Patrycja Jaszczyk is an MA student at the Institute of English Studies, University of Łódź. Her primary research interests are audiovisual translation and cognitive psychology. In her work she has been especially concerned with attention phenomena in interlingual multimodal transfer.

