Audio introduction meets audio description:

an Italian experiment

By Elena Di Giovanni (University of Macerata, Italy)


Access services for the blind and visually impaired have been gaining momentum within the audiovisual translation studies community: research, international projects and experiments have been flourishing and, by so doing, they have contributed to enhancing service provision. However, while audio description has been the object of attention for several years now, audio introduction has hardly ever been considered by the academic community. Yet audio introduction was the first service to be provided for the blind and visually impaired in the early Eighties, for the enjoyment of live events and films. A project started in the UK in early 2012, and brought to Italy by the University of Macerata at the end of the same year, has aimed to bring audio introduction to the fore and to evaluate its function and relevance in conjunction with audio description. This paper reports on the complex experiment which led to testing the audio introduction and audio description of the Italian version of Slumdog Millionaire with a group of 20 blind individuals, in November 2012. The experiment, which started with the transcription, translation and adaptation of the English audio description available on DVD in the UK, and ended with the analysis of a set of three questionnaires administered to each blind participant, is the very first of its kind. This article provides a description of the whole experiment as well as a series of reflections on the results yielded by the questionnaires.

Keywords: audio description, reception, media accessibility, blindness, audio introduction

©inTRAlinea & Elena Di Giovanni (2014).
"Audio introduction meets audio description: an Italian experiment"
inTRAlinea Special Issue: Across Screens Across Boundaries
Edited by: Rosa Maria Bollettieri Bosinelli, Elena Di Giovanni & Linda Rossato
This article can be freely reproduced under Creative Commons License.
Stable URL:

1. Introduction

In the ‘game of turns’ translation studies has been experiencing over the past three decades (see Bassnett & Lefevere 1998; Bassnett & Trivedi 1999; Snell-Hornby 2006), we have seen an increasing specialization of each sub-field, and several of these sub-fields have expanded beyond our expectations. Among the latter are audiovisual translation studies (AVT), which, in themselves, have witnessed waves of intensive interest in different theoretical and empirical approaches, as well as the development of sub-specializations. The study of accessibility in audiovisual media has flourished in Europe in the past decade and, unlike in North America, it has its roots mainly within AVT. An extremely lively and dynamic research domain, AVT has provided fertile ground for research in media accessibility and, although the cognitive, clinical and technical aspects involved in its study are anything but secondary, the linguistic component is equally prominent and lends itself to diverse interdisciplinary analyses. As an additional, positive characteristic, European research in accessibility has been, from the very outset, committed to the enhancement of access services provision and, at the same time, interested in understanding their audiences to best cater for their needs.

Exploring audiences has led to important research cross-fertilization, but also to the unveiling of the great potential of most access services, whose enjoyment and benefits are not to be confined to specific types of impairments (see Udo 2009; Romero Fresco 2013). This latter aspect is perfectly embodied by audio introductions, which are still largely unexplored yet widely diffused, especially within certain settings.

Audio introductions (AI) originated in the Eighties in the USA and appeared in Europe, more precisely in the UK, at the beginning of the Nineties (York, 2007). Conceived for live performances in theatres and opera houses, the function of AIs was – and still is – to provide as much information as possible for visually impaired patrons wishing to fully enjoy a performance. And if they have since become increasingly widespread throughout Europe, their application to television and cinema has hardly ever been considered, let alone experimented with.

Audio description (AD) is the most popular and widespread technique to make film and television, but also live shows, accessible for the visually impaired. However, due to the fairly high production costs – especially if compared to those of subtitling for the deaf and hard of hearing – and the supposedly limited size of its target audience, audio description has found it hard to establish itself as a regular access service in a fairly large number of European countries. Experiments are being carried out to cut costs while keeping quality, especially through the use of artificial voices (see, for instance, Walczak & Szarkowska 2012). However, as shown by several reception studies (Fryer & Freeman 2012; Di Giovanni 2014), recourse to experienced voice talents remains the preferred option.

All in all, the promotion and increasing diffusion of audio description should not prevent us from exploring the application of audio introduction to cinema. As the experiment reported here shows, audio introduction can be a complement to, not a replacement for, cinema audio description: its features and functions are in effect complementary, not conflicting. Moreover, audio introduction is useful for the visually impaired but can also be appreciated by sighted viewers, as has been evidenced by a number of studies and experiences which will be detailed below.

This article reports on an experiment carried out in Italy throughout 2012, as a follow-up to a project which originated in the UK the year before (see Romero Fresco & Fryer 2013). Based on a film (Slumdog Millionaire) in its Italian dubbed version, the experiment aimed to test the reception and appreciation of AI and AD by blind individuals. The next sections will first offer a comparison between AI and AD, and then move on to a presentation and discussion of the experiment, also making reference to the results obtained in the UK. The final sections will reflect upon the potential and the future of AI and AD, as well as on desirable further developments for this project.

2. Audio introduction, audio description and Italy

2.1. Audio introduction

Audio introductions, also called introductory notes (Fryer & Romero-Fresco forthcoming; York 2007), are easily explained by their own name: their purpose is to introduce a work, be it a theatre play, an opera or a film. Their preliminary nature places them outside the work, or better, assigns to them the role of pathways towards the enjoyment of an audiovisual text. Coming before it, they hold several advantages over AD, but they also have additional duties to fulfil. As far as the advantages are concerned, AIs offer an opportunity to provide an external view of an audiovisual text, as well as to give voice to its creators, for instance by reporting the director’s or scriptwriter’s words or ideas. As Fryer and Romero Fresco (forthcoming) put it, AIs, as complements to ADs and by virtue of their external nature, can concentrate on the how of a film – how it is made, i.e. its visual style and structure – whereas AD normally focuses solely on the what, i.e. the actions and events as they are evoked visually.

Another advantage of AI over AD lies in its being a cohesive text, as opposed to the often succinct in-text descriptions. Cohesion – and coherence – originates out of unity: audio introduction is generally enjoyed as a whole, without interruptions, whereas audio description consists of a number of short sequences of words, appearing when no other significant auditory stimulus is present.

On the other hand, AIs have a number of extra duties to fulfil compared to ADs: they should not reveal too much about the play or film which they are presenting, and they have to provide information which functions, precisely, as a smooth accompaniment to, and not a substitute for, the text and its AD. AIs are, therefore, contextualizing narratives, whose benefit clearly goes beyond the needs of blind patrons.

As for AI content, Fryer and Romero Fresco (forthcoming) state that “Creating an AI is much like making your own jigsaw: shaping the pieces and fitting them together. However, there is no single template and there are many possible solutions to the puzzle”. Drawing inspiration from the practice of AI for theatre and opera across the UK, the introductions written for the experiment here reported followed this pattern: overall film presentation, genre and structure; synopsis; information about the visual style; characters; locations and, last but not least, cast and production details.

In his article on audio introductions, Greg York (2007), a long-standing writer and performer of AD and AI in the UK, defines the nature of AI by stating what it should contain, how it should be put together and finally delivered. Although referring specifically to opera and ballet AI, most of his remarks are important to better understand the role that AI can play in the enjoyment of other audiovisual texts. Amongst several important aspects, York points out that AI “doesn’t have to be delivered live. It can be pre-recorded and made available in advance” (2007: 31). This is indeed an essential feature, especially if we think of a possibly wide diffusion for film audio introductions. In a number of theatres across the UK and beyond, audio introductions are made available before the performance in a variety of formats: CDs, MP3 files, etc. Along these lines, the Italian Sferisterio Opera Festival (held in Macerata, Italy, in July and August, 2013) made its audio introductions in English available on MP3 players and for download from a server, upon request.[1] As for film AI, the only experience known so far was carried out in 2011 at the 69th Venice Film Festival (Mostra del Cinema di Venezia) in Italy. On that occasion, a UK-based company developed an application for iOS, which allowed viewers within the festival area to download the AI (in English and Italian) for a selection of 15 films. The AIs were based on materials released by the producers and distributors (synopsis, interviews, etc.) and were about 10 minutes long. Although it was widely appreciated by a fairly large and not exclusively blind audience, the experiment has unfortunately not been repeated. It does, however, constitute an inspirational example and starting point for further development.

2.2. Audio description

As opposed to audio introduction, audio description is born inside the audiovisual text, in our case a film. It unfolds in the interstices of filmic narration, appearing when there are no significant auditory stimuli, be they verbal or non-verbal. AD is therefore dynamic and gradual by nature: its usually brief descriptions are cohesive not necessarily within and among themselves, but possibly with the film. The nature of AD is different from, but complementary to, that of AI: while the latter, from outside, takes the audience towards the film, AD then takes over and accompanies them throughout the entertainment experience. Borrowing linguistic terminology, we could say that AD establishes a co-textual relationship with the film, whereas AI works as a context (Catford 1965).

Although widely studied nowadays, especially with reference to the guidelines developed in each country for its creation (Rai et al. 2010, Orero 2012), AD is nonetheless far from easy to classify. This is due to the variety of forms it can take, based on the type of audiovisual text to be described – live events and films imply different description techniques and strategies – and, within one type or category, depending on the genres, sub-genres and the unique genre incarnation of each text. Action films, with fast dialogues and numerous sound effects, may have little space for AD, whereas slower, more reflective films may have longer pauses from both dialogue and sound. Slumdog Millionaire, for instance, is a fast action film, whose narrative unfolds on different chronological levels and whose dialogues – in two languages – are very dense and very often complemented by meaningful sound effects.

Another universally-shared feature of AD is its culture-based nature: originating as the ‘worldview’ of a people and society, AD is generally considered untranslatable. Although there have been experiments in support of its translatability (see, for instance, Remael & Vercauteren 2010), they have hardly gone beyond the experimental phase and have not been matched by any systematic practice anywhere in Europe. Although indirectly, the experiment reported in the next sections raises this issue and, as we shall see, questions the untranslatability principle.

3. The Millionaire experiment

In 2011, Louise Fryer and Pablo Romero Fresco decided to test the appreciation and reception of audio introductions and audio descriptions for two films: Slumdog Millionaire (Colson, Boyle & Tandan 2008) and Man on Wire (Chinn & Marsh 2008). Inspired by Louise Fryer’s experience as writer and performer of AI and AD for London’s West End theatres, they aimed to explore the opportunities offered by AI for film, either in conjunction with AD or in its own right. The flexible nature of AI, its relatively simple and not too costly production, its stand-alone nature, which allows for fruition anytime before (or even after) a film is viewed, and, last but not least, its audience-reaching potential, seem to be strong enough reasons for putting AI to the test with a sample of end users.

Using the AD available on the two films’ DVDs, Fryer and Romero Fresco drafted an AI and developed a set of three questionnaires to be administered to a group of blind individuals. As a follow-up to, and an expansion of, this project, a team at the University of Macerata,[2] with the technical support of SubTi Ltd,[3] decided to set up a similar experiment in Italy. Since no Italian AD was available for either film used in the UK, and for the sake of homogeneity with the British experiment, the Italian researchers opted for a translation of the English AD and decided to work only with Slumdog Millionaire (released in Italy as The Millionaire in the same year). The choice of this film over Man on Wire was due to the latter’s documentary nature and dialogue/narration density, which leaves very little space for descriptions and makes the AD rather short and therefore not especially representative of film audio description and its reception.

4. Structure

Begun in January 2012, the Italian experiment involved steps which were not necessary for its English counterpart. First of all, the Italian dubbed version of Slumdog Millionaire was used and, with a view to mixing its soundtrack with an Italian version of the English AD, the latter was first fully transcribed and then translated into Italian. The first draft of the translation required thorough revision, generally in terms of deletion and reduction. The English AD, besides descriptions, contained the reading of the Hindi dialogues, each line being introduced, for the sake of clarity, by the describer’s voice. These inserts were irrelevant for the Italian version, as all dialogues from the original film had been translated and dubbed into Italian, and they were therefore deleted. As for reduction, its application to the translated AD was due to three factors: 1) Italian ADs are generally less dense and slower in pace than their English counterparts; 2) Italian blind viewers are less frequently exposed to AD, therefore their AD+film processing skills, although so far virtually never tested, are likely to be slower; 3) Italian words are generally longer than English words, and each syllable is clearly pronounced, thus requiring longer reading times.

First draft of AD translation (left column), 00:06:14,24 --> 00:07:16,21

Dei giovani trasandati e dai vestiti sudici giocano a cricket sulla pista di un aeroporto. Salim grida: "Jamal, prendila! Prendila Jamal! Jamal, è tua!" Il giovane Jamal inciampa e manca la palla. Gli altri ragazzi si picchiano in fronte. Salim grida: "Come hai fatto a mancare un lancio così facile?" Due guardie aeroportuali arrivano su dei motorini, gridando: "Suolo privato." Brandiscono dei manganelli. Il gruppo di ragazzini si sparpaglia improvvisamente in tutte le direzioni. Uno di loro grida: "Arrivano gli sbirri, correte!" Un altro ragazzino si volta per fare un gestaccio alle guardie. Una guardia grida: "Se non vi uccidono gli aerei, lo faremo noi!" Mentre i ragazzi scappano sulla pista, Salim grida a Jamal: "Ehi, fratello!", e gli passa un bastone. I due fanno batti cinque. I ragazzini si arrampicano su cumuli di rifiuti, saltano sopra tetti di lamiera ondulata e s'infilano nelle strette viuzze dei quartieri poveri della città. Le guardie scendono dai motorini e li inseguono a piedi.[4]

Final version of Italian AD (right column)

Dei giovani trasandati e dai vestiti sudici giocano a cricket sulla pista di un aeroporto.

Jamal è ora un ragazzino. Inciampa e manca la palla. I compagni si colpiscono la fronte.

Due guardie aeroportuali arrivano in motorino, brandendo dei manganelli.

Il gruppo di ragazzini si sparpaglia in tutte le direzioni. Un ragazzino si volta per fare un gestaccio alle guardie.

I ragazzini si arrampicano su cumuli di rifiuti, saltano sopra tetti di lamiera ondulata e s'infilano nelle strette viuzze dei quartieri poveri della città. Le guardie scendono dai motorini e li inseguono.[5]

Table 1: first and final draft of AD translation

Table 1 above shows the first draft of the translation on the left, and the final version with all due changes on the right. As can be noted through the series of split timecode sequences on the right column, as opposed to the macro sequence on the left, when adapting the AD according to the parameters above, it was decided that shorter sequences would be preferable, so as to provide short breaks and possibly make the overall reception of the AD – and the film – smoother.
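To make the timing constraint behind this adaptation concrete, the sketch below computes the length of the macro sequence from its timecode and defines a rough words-per-second measure. It is only an illustration, not part of the original study: the function names are hypothetical, and it assumes that the field after the comma is a frame count at 25 frames per second, which the timecode itself does not state.

```python
import re

FPS = 25  # assumption: PAL frame rate; the article does not state it


def to_seconds(timecode, fps=FPS):
    """Convert 'HH:MM:SS,FF' to seconds, reading the last field as frames."""
    hours, minutes, seconds, frames = (int(x) for x in re.split(r"[:,]", timecode))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def cue_duration(cue, fps=FPS):
    """Return the length in seconds of a 'start --> end' cue."""
    start, end = (part.strip() for part in cue.split("-->"))
    return to_seconds(end, fps) - to_seconds(start, fps)


def words_per_second(text, seconds):
    """Rough density measure: whitespace-separated words per second."""
    return len(text.split()) / seconds


macro = "00:06:14,24 --> 00:07:16,21"
print(round(cue_duration(macro), 2))  # 61.88
```

Under these assumptions the macro sequence lasts just under 62 seconds. Dividing the word count of a description by the time actually available for it would give a comparable density figure for each draft; no such figure is offered here, since the timecodes of the shorter final sequences are not reproduced above.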

The English audio introduction was also translated into Italian, the process being simpler this time, although it too implied a certain degree of adaptation: all-British references contained in the original AI were normally deleted, so that the Italian AI, when recorded, was on the whole some 70 seconds shorter than its English counterpart. The example below shows the type of reference that was eliminated in the Italian translation:

We encounter Latika at Mumbai central station, almost the spitting image of London’s St Pancras. Behind the neo-gothic façade, five long platforms are connected by a series of footbridges.

Incontriamo Latika nella stazione centrale di Mumbai. Al di là della facciata neogotica, cinque lunghi binari sono collegati da una serie di piccoli ponti per il passaggio pedonale.[6]

Finally, the Italian AI and AD were passed on to a recording and mixing studio through SubTi Ltd and the final product, namely the Italian dubbed version of Slumdog Millionaire (The Millionaire) mixed with the AD and preceded by the AI, was delivered in September 2012. Preparation of the materials to carry out the experiment culminated in the translation and adaptation of the three questionnaires which would be administered to the participants, as will be detailed in the following section.

5. Method

Once the materials to be used for the Italian experiment were ready, the research team met to define the methodology to be applied to the core section of the experiment, in terms of process and result analysis. Contrary to what had happened in the UK, where two screenings were organized at Roehampton University and other participants were involved from home,[7] all Italian participants were invited to one screening, held at the University of Macerata on 12 November, 2012, at 4 pm. The overall experiment required 3 hours, with 20 blind individuals, their assistants or partners, and 4 researchers engaged in the experiment organization and questionnaire administration. Participants had received an invitation by email, which simply asked them to join a university experiment involving the screening of an audio described film, never before made available in Italy with AD.

By way of introduction, the experiment was presented and explained. Participants were informed that they would be administered three questionnaires: one demographic questionnaire at the beginning, one questionnaire after hearing the AI, and one final questionnaire after the film screening. Questionnaires were on paper. They were completed with the support of the blind participants’ assistants and the four researchers.

All three questionnaires comprised mainly closed questions, with replies ranging across a five-point spectrum: “strongly disagree” corresponded to 1 and “strongly agree” to 5, with “disagree” (2), “neither disagree nor agree” (3) and “agree” (4) as intermediate options.
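As an illustration of how replies on such a scale can be coded and summarized, the following sketch maps each label to a score and computes the share of agreeing replies. It is a minimal example under stated assumptions, not the procedure used by the research team: it assumes the conventional ordering in which the score rises with agreement, the function name is hypothetical, and the replies are invented purely for demonstration.

```python
# Five-point agreement scale, assuming scores rise with agreement
# (1 = strongly disagree, 5 = strongly agree).
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither disagree nor agree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def agreement_share(replies):
    """Return the percentage of replies coded 4 or 5 (agree / strongly agree)."""
    scores = [LIKERT[reply.lower()] for reply in replies]
    agreeing = sum(1 for score in scores if score >= 4)
    return 100 * agreeing / len(scores)


# Hypothetical replies from 20 respondents, invented for illustration only.
example = (
    ["strongly agree"] * 5
    + ["agree"] * 4
    + ["neither disagree nor agree"] * 6
    + ["disagree"] * 5
)
print(agreement_share(example))  # 45.0
```

Grouping the two agreeing options in this way mirrors how results are reported later in the article, where “strongly agree” and “agree” replies are counted together.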

Information about the participants, as taken from the demographic questionnaire, is given in the next section.

6. Participants

The Italian experiment saw the enthusiastic participation of 20 blind individuals, as opposed to the overall 24 participants in the English project. All of them were members of the Italian association for the Blind (UICI), and therefore legally recognized as blind, although 25% (5) of them declared themselves to be affected by severe visual impairment rather than totally blind. 65% of them were female (13) and 35% were male (7), their age ranging from 22 to 81, with an average age of 54.28. The largest group of participants (45%) was in the 45-to-64 age range, which is similar to the UK sample, where the average age was 51.87.

As recorded through the two final classification questions in the demographic questionnaire, only 5% of the participants stated they were able to detect evident scene changes on the screen, whereas 94% stated they were totally unable to distinguish objects, faces and movements on screen. Finally, the last question of this first, demographic questionnaire enquired about the participants’ love of films and their familiarity with AD: 60% of them said they were film lovers, and only 30% went on to say they watched films with AD, which also reflects the scarcity of audio described films available in Italy. The next two sections present results for the questionnaires administered after delivery of the AI, and subsequently of the film with AD.

7. Post-AI questionnaire results

As stated by Fryer and Romero Fresco (2012, 2013), one of the main aims of this experiment was to test the reception and appreciation of information which is normally only hinted at, or utterly excluded from, audio description but which can be included in AI. Besides the overall introduction to the film and its synopsis, the AI created for this project featured, as is normally the case with AI for live performances, detailed information about characters, locations and the visual style of the film. With reference to the latter, it is generally assumed, and specified in the most diffused and quoted AD guidelines (see, for instance, ITC 2000), that the use of filmmaking terminology (camera angles, shots, etc.) is to be discouraged. However, as AI for live events generally includes technical information about stage direction and scene changes, it was decided that the AIs for this experiment would replicate this trend and include not only filmmaking terms, but also references to scene/shot changes, rhythm and other elements in relation to plot development. Moreover, as Romero Fresco and Fryer report with reference to their study (2013), only 29% of the UK participants expressed disagreement with the recourse to cinematic terms, and did so only by stating that they were too numerous in the AIs they had been exposed to, not that they should be excluded altogether.

Focusing on the Italian experiment, the questionnaire which was administered after AI delivery comprised two parts: section A featured eight closed questions which asked respondents to select a reply in the “strongly disagree” to “strongly agree” range, whereas section B included 5 closed questions with a choice of three replies: “too short/too little”, “about right”, “too long/too much”. Section A started with a question on the amount of information, which was stated to be excessive (through the strongly agree/agree reply options) by 45% of the participants. Such a high percentage may be ascribed to the total lack of previous exposure to AI for films, and to the respondents’ eagerness to move on to the film (as was declared by two participants). Nonetheless, 95% of them went on to strongly agree with the statement “I would like audio introductions to other films”; 80% stated that they appreciated the order in which the information was provided (only 5% disagreed with the chosen order) and, significantly, 85% declared that they would like to be able to download AIs from a website.

Section B in the post-AI questionnaire aimed to gauge the appreciation of the AI content, although without eliciting spontaneous responses (as happened with the third questionnaire whose results are reported in the next section). When participants were asked if they thought the AI was too short/about right/too long, 70% of them opted for “about right”, therefore confirming the AI’s overall appreciation. In the UK tests, the percentage rose to 83%, perhaps due to the English blind participants’ previous exposure to, and familiarity with, this type of supportive text. When asked to evaluate the amount of information provided for the four core elements of the AI (visual style, plot, characters, locations), interestingly enough the highest percentage of positive replies (“about right”) was recorded for the visual style, appreciated by 85% of the respondents. 80% of them went on to declare that both the plot and the characters were rightly described, whereas locations scored 70% of “about right” replies. The English questionnaires had similarly appreciative percentages, with locations and characters scoring 87.5%, plot 79% and visual style 70%.

8. Post-screening questionnaire results

The third and final questionnaire aimed to elicit more in-depth, spontaneous responses to the overall experiment. Following from the second questionnaire, the first section of this third questionnaire was named “C” and comprised 3 open questions, focusing on visual style, characters and locations respectively.

The first open question asked “What do you recall about the visual style of the film?” Participants provided vivid and often detailed descriptions in their replies; most interestingly, over 40% of these contained words and phrases that were almost unaltered repetitions of the information conveyed in the audio introduction. As the table below shows, four replies (left column) relayed technical information about the visual style, not only by replicating the wording of the AI (right column), but also by re-contextualizing it, confirming the overall understanding and reception of information provided in the AI, AD and the film itself.

Replies (left column)

“ricordo che le inquadrature a volte non sono dirette sui personaggi, ma oblique. mi sembra di ricordare anche che i passaggi da una scena all’altra siano rapidi.”

“i continui cambi di scena, segnati da tagli di ripresa molto netti, nonché alcune scene particolari le cui azioni venivano mosse al rallentatore.”

“riprese fatte in obliquo per mettere in risalto l'ambientazione.”

“da primi piani stretti a piani lunghi”.[9]

AI (right column)

“Il film è molto dinamico, con tagli netti e scene al rallentatore, velocizzate o invertite, organizzate in sequenze di montaggio sorprendenti. La varietà dei contrasti che caratterizzano gli affollati e poveri quartieri di Mumbai viene catturata attraverso rapidi passaggi da panoramiche a primi piani estremi; i visi sembrano quasi assumere rilievo oltre lo schermo, mentre i piani di ripresa inclinati attestano il rifiuto di inquadrature orizzontali ed evocano un forte dinamismo.” [8]


Table 2: excerpts from the respondents’ replies and from the audio introduction

With reference to Table 2 above, it is worth highlighting that some of the details recalled by the participants are strictly technical (slanted camera, slow motion), and their being evoked in the replies suggests that they have been both absorbed and understood. Further tests ought to be carried out with reference to the reception of similar terminology as used in AD (see, for instance, Freeman & Fryer), to gauge its reception when provided within the film. However, as far as AIs are concerned, our results confirm that describing the visual style of a film is both feasible and well-received.

The second open question in this final questionnaire enquired about characters’ descriptions (“What do you recall about the characters in the film?”). Once again, replies were lengthy and detailed, providing interesting, additional insights to those given for question one. As a matter of fact, unlike the visual style, which was only presented in the audio introduction, the participants were given descriptions of the characters in two ways: the audio introduction provided a contextualizing description of the main characters (Jamal, Salim, Latika, Jamal’s mother, the anchorman, the police officer) and, for the protagonists Latika, Jamal and Salim, it explained and illustrated their appearance in the film as children, teenagers and adults. Additionally, the audio description provided extra details as the film unfolded, in the form of co-textual elements related to the film text. On the whole, the replies conveyed a mixture of information obtained from the AI and AD, thus proving the overall reception of both and a positively complex integration of their insights. A fairly high number of participants (12 out of 20) reported details of the characters’ face and body appearance, a few using words provided in the AI: Latika’s bony structure as a child as well as her curly hair, Jamal’s short, dark hair and deep, dark eyes. And if some replies pointed to an interpretive reception of the descriptions (with Salim being remembered for his courage as a kid, and then for his evil character as an adult, Latika being recalled for her freshness and also her sad expressions when forced into prostitution) others conveyed elements which had solely been provided in the AI, such as the police officer’s sweaty shirt, or the anchorman’s unshaven beard and his blue tie. Moreover, two participants stated that “the audio introduction was useful to understand the characters’ appearance”.

The third and final question in section C asked participants to say what they recalled about the film’s locations. As with the characters and their descriptions, an appreciation of locations was openly linked by a few participants (3) to the input provided by the AI. For instance, a male participant aged 56 declared “I remembered the trains, and the places where the children grew up, from the audio introduction. It was easier to understand them during the film”. Nonetheless, another male participant, aged 65, stated “I would have liked to receive more information about the locations”, as if the details provided could have been further developed. Indeed, locations are described in less detail than characters in the AI, also because most locations are clearly depicted in the AD, so as to mark transitions between narrative levels and subsequent scenes. In some of the replies, a mixture of details from the AI and the AD was evoked: one participant best remembered the lush forests (AI), the chaos in Mumbai (AD and film) and a luxurious hotel (AD); another highlighted the slums of Mumbai (AI and AD), the steam train (AI), the Taj Mahal (AI and AD) and the game show hall (AI).

The final set of four, closed questions contained in section D of this third questionnaire aimed to encourage participants to openly comment on the overall experiment, and express their opinions about the usefulness of the AI. 55% of them declared that the AI had indeed contributed to “bringing the film to life”, with only 20% disagreeing with this statement. Moreover, 65% of the respondents stated that they would not have been able to retrieve the information supplied by the AI elsewhere, and 60% stated that the AI had made the film easier to follow (with only 15% clearly disagreeing with this). Additional, open comments provided by the respondents will be referred to in the next section.

9. Discussion

The third and final questionnaire closed on a broad, open question, allowing participants to provide any additional comments they wished. Notwithstanding the long hours spent in the screening room, this open question was seen by a fairly large number of participants as an opportunity to express their appreciation for the audio description. 40% of them openly did so, defining it as “very well-done”, “accurate” and “precise”, also adding specific references to scenes and excerpts which the AD had, in their opinion, appropriately underscored. Moreover, 4 participants appreciated the quality of the recording and sound mixing, and 6 of them enthusiastically recalled the voice of both AI and AD, stating that it was “particularly appropriate” and also suggesting that the name of the voice talent ought to appear in the credits provided at the end of the film. Against all odds, this translated AD was definitely well received, thus adding extra value to, and feedback for, this experiment.

As for the AI, the results reported above prove that it was also generally well received and appreciated. However, and perhaps most significantly, appropriate reception of the AI was recorded indirectly: recalling exact words and expressions from the AI when replying to questions at the very end of the film, i.e. over two hours after listening to it, the participants demonstrated that they had not only understood and retained it, but that its input had converged in their overall reception of the film. As a matter of fact, words from the AI were appropriately re-contextualized by the respondents, and when information about a character, or a location, had been provided both in the AI and the AD, excerpts from both were recalled and appropriately juxtaposed in the replies. Furthermore, overt appreciation of the voice talent, and a recommendation that his name be mentioned in the film credits, is further proof of the overall positive reception of the AI and the AD, and also a rare instance of requested visibility for what is normally perceived as a necessarily, or preferably, invisible service. On this subject, for instance, the California Audio Describers’ Alliance states that “Harmonious description renders the describer invisible and virtually indistinguishable from the event,”[10] whereas Canadian audio describer Joe Clark recommends that “You work for the production and the audience. A certain self-effacement is required.” (RNIB 2010: 68). In our experiment, it was perhaps the successful balance between self-effacement and the harmonious blend of AI and AD that ensured such a high percentage of positive feedback from the participants.

10. Conclusion

The main goal of this experiment was to test the reception and appreciation of audio introduction for films, in itself but also in its interaction with audio description. Although this was a small-scale project, which has no claim to quantitative representativeness, the goal was reached and the results were insightful and positive, both in the UK and in Italy. Focusing on the latter as the case in point in this essay, we can conclude that the AI was largely appreciated, that no overlaps with the AD were reported and that, on the whole, the AI was perceived as complementary to the AD. This leads us to assert that there certainly is room for development of AI for films, all the more so if we consider its stand-alone nature, its low production costs and its flexibility both in terms of delivery and fruition.

Moreover, the results discussed above point to a cross-fertilization of the research in, and practice of, both AD and AI. By analysing the reception of AI, for instance, we found an unexpectedly high percentage of positive feedback for the information about the film’s visual style, which was particularly appreciated for being provided before the film itself. As the use of cinematic terms in AD remains a rather controversial issue, especially since their correct interpretation by different types of visually impaired individuals cannot be taken for granted, this and further studies could also enhance the development of AI for films as the best tool for expression of the author’s intentions, views and camera movements, thus also reaching out to non-blind viewers.

Another element which this experiment has brought to the fore is the positive reception of translated AD. It goes without saying that this can only be considered a pilot study in this direction, but it certainly brings up interesting issues which deserve further testing. What translation strategies are required to transfer the AD to a new linguistic and cultural context? From what language into what language would translation be feasible and recommendable? What would be the benefits of establishing such a translational practice, in terms of delivery times, costs, increased standardization at national and/or international level, etc.? These and other questions need to be addressed by further research, which, as the results of this study suggest, is indeed to be encouraged.

On the whole, this study proves that more interdisciplinary research is needed in media access services for the blind, possibly considering all the relevant issues and variables at stake. The cognitive processes which are put in place during AI and AD reception, the long-term vs short-term memory mechanisms and their effects, the impact of technology on standardization and the potential of the internet as a dissemination tool, among others, certainly deserve systematic exploration.

Interdisciplinary empirical research of this kind, moreover, is undoubtedly one of the most powerful ways to increase awareness, provision of accessible media content and, ultimately, effective social inclusion.


Bassnett, Susan, and André Lefevere (eds) (1998) Constructing Cultures. Essays on Literary Translation, Clevedon, Multilingual Matters.

Bassnett, Susan, and Harish Trivedi (eds) (1999) Post-Colonial Translation. Theory and Practice, London/New York, Routledge.

Catford, John (1965) A Linguistic Theory of Translation, London, Oxford University Press.

Di Giovanni, Elena (2014) “Visual and Narrative Priorities of the Blind and Non-blind: Eye Tracking and Audio Description”, Perspectives. Studies in Translatology, 22:1, 136-153.

Fels, Debora I., Udo, John Patrick, Diamond, Jonas E., and Jeremy Diamond (2006) “A Comparison of Alternative Narrative Approaches to Video Description for Animated Comedy”, Journal of Visual Impairment & Blindness, 100:5, 295–305.

Fryer, Louise, and Jonathan Freeman (2012) “Cinematic Language and the Description of Film: Keeping AD Users in the Frame”, Perspectives. Studies in Translatology, DOI:10.1080/0907676X.2012.693108.

Fryer, Louise, and Pablo Romero Fresco (forthcoming) “Audio Introductions”, New Insights on Audio Description, Amsterdam, John Benjamins.

ITC (2000), Guidance on Standards for Audio Description, retrieved from (accessed 28 August 2014).

Orero, Pilar (2012) “Film Reading for Writing Audio Description: a Word is Worth a Thousand Images?” Elisa Perego (ed) Emerging Topics in Audiovisual Translation, Trieste, EUT, 13-26.

Rai, S., Greening, J., and Petre, L. (2010) “A Comparative Study of Audio Description Guidelines Prevalent in Different Countries”, London, RNIB.

Remael, Aline, and Gert Vercauteren (2010) “The Translation of Recorded Audio Description from English into Dutch”, Perspectives. Studies in Translatology, 18:3, 155-171.

Romero-Fresco, Pablo (2013) “Accessible Filmmaking: Joining the Dots Between Audiovisual Translation, Accessibility and Filmmaking”, JoSTrans, Journal of Specialized Translation, Issue 20, online at (accessed 28 August 2014).

Romero-Fresco, Pablo, and Louise Fryer (2013) “Could Audio-Described Films Benefit from Audio Introductions? An Audience Response Study”, Journal of Visual Impairment & Blindness, July-August, 287-295.

Snell-Hornby, Mary (2006) The Turns of Translation Studies, Amsterdam, John Benjamins.

York, Greg (2007) “Verdi Made Visible. Audio Introduction for Opera and Ballet”, Jorge Diaz Cintas, Pilar Orero, and Aline Remael (eds) Media for All. Subtitling for the Deaf, Audio Description and Sign Language, Amsterdam, Rodopi, 215–229.

Walczak, Agnieszka, and Agnieszka Szarkowska (2012) “Text-to-speech Audio Description of Educational Materials for Visually Impaired Children”, Silvia Bruti, Elena Di Giovanni (eds), Audiovisual Translation Across Europe: an Ever changing Landscape, Oxford, Peter Lang, 209-234.


[1] The Italian audio introduction, along with the in-text audio description, was delivered live and broadcast via infrared system. The English AI, provided for the first time on an experimental basis, was made available on MP3 players and for download.

[2] The Italian experiment reported here was carried out at the University of Macerata by Elena Di Giovanni, author of this paper, and Agnese Morettini, PhD student in accessibility at the same university.

[3] SubTi Ltd is a London-based company providing audiovisual translation services worldwide.

[4] [A group of scruffy young boys in filthy clothes play cricket on an airport runway. Salim shouts: “Jamal, catch it! Catch it, Jamal! Jamal it's yours!” Young Jamal trips and fails to catch the ball. The other boys slap their foreheads. Salim shouts, “How did you manage to drop a sitter like that?” Two airport security guards arrive on mopeds shouting, “Private land!” They brandish long truncheons. The gang of boys suddenly disperses in all directions. One boy shouts, “The dogs are coming, run!” Another boy turns to make a rude gesture at the security men. One guard shouts, “If the planes don't kill you, we will!” As the gang of boys escape along the runway, Salim shouts to Jamal, “Hey brother!” And hands him a stick. They clap a high-five. The boys clamber across a rubbish tip, up over corrugated roof tops and down into the narrow alleyways of the slum. The security men dismount and give chase on foot.]

[5] [A group of scruffy young boys in filthy clothes play cricket on an airport runway. //

Jamal, now a boy, trips and fails to catch the ball. The other boys slap their foreheads. //

Two airport security guards arrive on mopeds, brandishing long truncheons. //

The gang of boys suddenly disperses in all directions. One boy turns to make a rude gesture at the security men. //

The boys clamber across a rubbish tip, up over corrugated roof tops and down into the narrow alleyways of the slum. The security men dismount and give chase on foot.]

[6] [We encounter Latika at Mumbai central station. Behind the neo-gothic façade, five long platforms are connected by a series of small footbridges.]

[7] The UK experiment included a group of participants who did not wish to attend the screenings organized by the researchers but wanted to participate from home. They were sent a package containing the DVDs with the AI and AD and all the instructions to carry out the experiment in equal conditions.

[8] [The film is very dynamic, with harsh cuts, slow motion, fast and reverse shots and dazzling montage sequences. The contrasts which characterize Mumbai’s crowded slums are depicted through rapid shifts between pan shots and extreme close ups. Faces are cut off by the edge of the screen, and tilted shots and camera angles reinstate the rejection of horizontal shots and suggest a strong dynamism.]

[9] [“I recall that sometimes shots do not aim at the characters, they are tilted. I seem to remember that shifts from one scene to the next one are rapid.” // “there are continuous scene changes, marked by sharp shot changes. Some scenes in particular are in slow motion.” // “Tilted shots aim at enhancing the locations.” // “from close ups to long shots.”]

About the author(s)

Elena Di Giovanni is Associate Professor of English Translation at the University of Macerata (accredited for full professorship as of 2020). She was President of the European Association for Studies in Screen Translation. She is one of the founding members - and Editorial Board member - of the open access Journal of Audiovisual Translation. In 2019, she was Fulbright Distinguished Chair at the University of Pittsburgh and is now part of the international Fulbright evaluation team. She was Visiting Lecturer at Roehampton University, London, and since 2013 she has lectured on audiovisual translation and accessibility at the Venice Film Festival. She currently supervises many accessibility projects throughout Italy and, in 2021, she delivered a TEDx talk on accessibility and inclusion.

