Multisemiotic Transcriptions as Film Referencing Systems

By Anthony Baldry (University of Messina, Italy)

Abstract & Keywords

Film analysts often rely in their work on the transcripts that fans have produced for a TV series and made available online. However invaluable the labour of love of these dedicated aficionados may be, film analysts’ transcript requirements are not fully met by this type of transcript. Existing online transcriptions of The West Wing TV series are a good example of the difficulties that arise when using them, all of which raises questions about the imbalanced nature of the referencing systems that are used in TV series transcripts. Why, on the one hand, is referencing to characters so systematic and accurate, while reference to time, place and theme is at best sporadic? Can transcripts be made more suited to analysts’ needs? Can transcript culture be strengthened? The article investigates these issues, proposing new types of transcript that film analysts could usefully adopt, from both episode and series perspectives, in their investigations of TV series. The paper bases its arguments on detailed comparisons between TV series transcripts and other related genres and concludes that developing a better theoretical framework for the TV transcript genre than those currently available is an essential premise for its future development as a useful tool for film analysts.

Keywords: audiovisual translation, multimedia translation

©inTRAlinea & Anthony Baldry (2016).
"Multisemiotic Transcriptions as Film Referencing Systems"
inTRAlinea Special Issue: A Text of Many Colours – translating The West Wing
Edited by: Christopher Taylor
This article can be freely reproduced under Creative Commons License.
Permanent URL:

1. Introduction

Transcripts of TV series have seldom been the subject of critical analysis on the part of film analysts despite the use they make of them. Film analysts are, of course, a heterogeneous and constantly growing group, each with different transcript requirements. They include: film critics, as well as those who use TV transcripts for corpus analysis, subtitling, dubbing and audio description. To this, we need to add an increasing number of teachers (foreign language teachers in particular) who use film and TV series transcripts in their classroom teaching and for their students’ project work (Sindoni, 2011, Coccetta, 2016 in press).

However, by far the greatest contingent of film analysts are film buffs and TV series fans. In particular, those of them who write about a TV series or a film in their blogs, forums, wikis and fanzines all rely on transcripts in various forms whether pre-existing transcripts written by others for a specific TV series or simply their own notes containing dialogue snippets they themselves have transcribed. The need to refer accurately to what is said in an episode – often accompanied by admissions that certain details have not been checked – is a further indication of fandom’s understanding of the significance of transcription and, in the author’s opinion, a disguised plea for better transcription practices.

Certainly, fans’ TV episode recaps point to a strong awareness of transcripts and transcribing. Significantly, such recaps are a hybrid genre inspired by visual-verbal precursor genres from the pre-digital age. They owe much to fotoromanzi, the soap opera magazines that allowed you to while away the time during pre-digital train journeys and other forms of travel. However, revisiting a TV series is also a powerful, heart-tugging exercise more in tune with scrapbooks, and the ‘lost’ memories they evoke, when rediscovered after many years in a loft or garage cupboard. As we shall see below, fans' recaps have considerably affected the way online transcripts are evolving as a genre.

Given fandom’s intense activity, it is hardly surprising that the majority of post-airdate TV series transcripts, posted on the Internet, are produced by the fans themselves for use by other fans. The transcript of the ‘In Excelsis Deo’ episode described in this article[1], like other transcriptions of the episodes in The West Wing series, is no exception to this ‘rule’ and is a product of the digital world of virtual communities.

As well as containing the words actually uttered in a particular episode, these transcripts – referred to below as episode transcripts – contain a surprising amount of ‘extra’ information that makes them an invaluable tool for all film analysts: just as no telephone works without a telephone directory and no writer works without dictionaries or reference books, so no film analyst – whether a young kid working on a school media project or a journalist commenting on last night’s episode of a famous TV series – will want to be without a film transcript as a backup that, at the very least, allows significant lines from an episode to be cut-and-pasted rather than (re)transcribed.

Nevertheless, however praiseworthy they may be, fans’ episode transcripts could do a much better job. While their character-related referencing ‒ i.e. who says what ‒ may be systematic and meticulously accurate, other types of referencing are, at best, sporadic. Despite their merits, we should not be blind to episode transcripts’ limitations. While the growth of the TV transcript genre, as we shall see below, has caused it to distance itself from other types of transcription, such as radio broadcasts, the TV transcript genre has not yet entirely shaken off the conventions inherited from precursor forms of transcript. Three weaknesses, summarised in the next three subsections, stand out.

1.1  Timepoint referencing

While screenplays and film scripts cannot, by definition, pinpoint the exact moment when an exchange takes place, since they have not yet been turned into films, the mere fact of being post-airdate means that TV transcripts could include such timepointing. Invariably, they do not.

This first type of omission relates to the way in which what is written in a transcript links up with what is said in the TV episode – or rather the difficulties that the absence of any such links creates for analysts. An analyst might, for example, want to understand how things are said: how an actor’s skilled use of voice quality, facial expressions, eye movements and hand gestures interprets specific words. The potential for enlightenment, and fun, in doing so is, alas, killed off by the difficulties of tracking down where specific words in the transcript are actually uttered in the video. It takes, on average, eight minutes to find the exact location in the West Wing ‘Pilot’ video of the words “rode his bicycle into a tree” and to discover the emotional tones – irony, laughter, sarcasm – with which they are actually delivered in the video. No wonder fans’ recaps and podcasts about The West Wing contain apologies about not checking up on specific details. Unconvinced? Use the link to see how quickly you can find the exact point where these words are pronounced.

Now ask yourself, given that presents — goldfish and books in the ‘In Excelsis Deo’ episode— are used in The West Wing to construct affective relationships, how long would it take you to reconstruct this particular strategy when examining gender relationships in an entire episode or a whole series? Certainly, transcript tracking — the search-based study of the patterned nature of wordings explored in Part II of this article — requires little time or skill. Word searches, e.g. in the database of pdf transcripts on the West Wing Transcripts website, are all that is needed to identify the whereabouts of all mentions of, say, ‘goldfish’ in The West Wing series and to demonstrate that they are associated with the flirting that goes on between CJ and Danny. Goldfish, it turns out, are sexier than you might think.
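The kind of word search described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcripts are taken to be already available as plain text (extraction from the site's PDF files is not shown), the function name find_mentions is hypothetical, and the sample lines are invented placeholders rather than quotations from The West Wing.

```python
import re

def find_mentions(transcripts, term):
    """Return (episode, line_number, line) for every line that mentions term.

    transcripts: dict mapping an episode label to its transcript as plain text.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for episode, text in transcripts.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((episode, number, line.strip()))
    return hits

# Invented placeholder text, not taken from the actual transcripts.
sample = {
    "episode_a": "SPEAKER ONE\nA line that mentions a goldfish.\nSPEAKER TWO\nA reply.",
    "episode_b": "SPEAKER THREE\nA line about something else.",
}
hits = find_mentions(sample, "goldfish")
for episode, number, line in hits:
    print(episode, number, line)
```

A search of this kind locates wordings in the transcript only. It says nothing about where they occur in the video, which is precisely the gap that timepointing is meant to fill.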

While flirtatious overtures are by definition part of all TV soaps, political or otherwise, they are not merely linguistic in nature, involving instead, as already suggested, actors’ skilled use of voice quality, facial expressions and posture. Without a transcript that identifies the exact points where goldfish, and other such devices, occur in a video, video tracking of such overtures (again discussed below in Part II) takes a very long time. Still unconvinced?

Try reconstructing Danny and C.J.’s flirting in the ‘In Excelsis Deo’ episode using the following link:

Alas, the ties between transcripts and the videos they transcribe form a far from perfect marriage. Regardless of whether you are investigating The West Wing, Dr. House, Dr. Kildare or Dr. Who, or any other TV series, you will conclude that using a traditional post-airdate transcript to carry out comparative examination of episodes entails constant yet rather awkward switching between video and transcript. Even though in the goldfish case, only three episodes are involved in the entire series, the absence of timepointing – a vital resource when cross-referencing written and spoken forms of dialogue – hinders attempts to understand how verbal exchanges, gestures, mutual gaze, laughter and much more besides are cross-modally blended in a particular episode’s interactions.

1.2. Episode or series? The effects of a TV series’ cult status on transcripts

Fandom knows best. While a TV lecture makes its cultural impact in a single ‘go’, a TV soap (political or otherwise) does so over many years – in some cases over a lifetime. So, fandom means talking about an entire series – not just about individual episodes. This changes the focus from what people say, to what they stand for ‒ their short and long-term convictions and attitudes on particular social issues [2].

Transcripts need to take in the series perspective and, in part, already do so. The Raspberry Lime Ricki blog is a good example of the shift from word-based to theme-based transcription that this perspective entails. In keeping with its mission statement, it provides sweet, tart, and refreshing insights into sexism for the first sixteen episodes of The West Wing, using a misogyny meter to award plus and minus points. The verdict for the ‘In Excelsis Deo’ episode is:

“Total Misogyny Points: 37. A pretty heavily misogynistic episode”

This type of transcription targets episode meanings rather than character wordings. Within each episode recap, the goal is to track sexist/non-sexist perspectives on women by applying objective criteria, such as the Bechdel test, in a systematic way. The blog’s incorporation of a search mechanism allows Ricki’s individual recaps to be tied together, thus transcending the episode perspective and aligning transcription with the series perspective.

Ricki’s own dilemmas about her transcribing technique are eloquent testimony to the changes afoot in transcription practices and fandom’s awareness of this issue:

I started watching and taking notes. Then I went downstairs and watched while doing dishes, so I couldn’t take notes, so I tried to remember after the fact what I’d just watched, and then I started over, and then I took some notes in one notebook and some notes in another and I know a bunch of you are going, “Notes? She’s taking, like, pen and paper notes? For a blog post?” Yes. Hello. I’m a nerd. Welcome to my blog. […] And then I was like, there’s no way I’ll be able to fit the series in one blog post. How about I do season by season? And then I was like, that’s crazy, right?

1.3. Visual and verbal referencing in transcripts

New needs, new types of transcripts. Fans are game-changers in this respect as the third and final ‘issue’ regarding omissions in TV series episode transcripts underscores. Ask yourself: as a film analyst, do you think visually or linguistically about films? Traditionally the transcript genre is conceived of as discourse-oriented, despite the fact that, somewhat ironically, TV series transcripts relate to a story being told visually. As well as a paradox, this is a worrisome omission.

But not for fandom. Despite the risks of possible copyright infringement, fandom’s outlook on this matter can be summarised by the slogan Recap and screencap! Take, for example, the Persephone Magazine. Women. Pop culture. News. Unicorns site, which self-describes as “an online destination for bookish, clever women around the world”. Note the three-way division of the post entitled Ladyghosts: The West Wing, Season 1.10 “In Excelsis Deo”.

The first and third parts respectively contain ten and three photos – all presented as realistic moments of high tension and conflict, such as when Laurie shouts in Scene 17 at Josh and Sam to get out of her house.

The second part, on the other hand, relates to Scene 10 in which Mrs. Landingham tells Charlie about the death of her twin sons. This consists of a single still embedded in a (partial) written transcript of the scene. Both are tied together cross-modally by the writer’s comments: “Are we all feeling warm and fuzzy now? Very good. I’ve got you all warmed up so I can yank the rug out from under you” after which the recap focuses on the photo’s highlighting of Mrs. Landingham’s sad eyes – a detail that instigates a change in our perception of the depth of the episode’s (and recap’s) emotional force.

As the Persephone example shows, had the visual element been suppressed − regularly the case with episode transcripts − the desired points could not have been made. So recap and screencap posts come in ‘various shapes and sizes’ but invariably rely on an episode transcript both as a backup and as a source for quotations. Collectively, recap and screencap posts are also an admission of the need for visual-verbal transcripts. However minimally, by linking fragments of online transcripts to photos, they represent a start to the development of multisemiotic TV series transcripts. As such, they, too, are a genre-changing strategy. Further examples that pertain to the ‘In Excelsis Deo’ episode, such as posts that embed a video clip from it, are mentioned below in Part II.

1.4. What this article attempts to do

The TV series transcript is a much-neglected genre deserving more attention than has been the case. In particular, English linguistics needs to eat humble pie and learn from fandom’s example, which has sent out many signals as regards transcription practices – some clearly intentional, others perhaps less so. Collectively, fandom’s experimentation with new forms of transcript creates new perspectives on TV series and different ways of interpreting their social and cultural impact.

It also raises a basic question: whether – as teachers of English, as text and genre analysts, or ‘simply’ as inspired fans – film analysts can really rely on the partial, stop-gap solutions to transcription that, as indicated above, fandom has invented. Are not more thorough, more systematic solutions required?

Below, with reference to the ‘In Excelsis Deo’ episode, we attempt to address ways in which the transcript experiments described above can be turned into more systematic ‘solutions’ that benefit all film analysts. In other words, research into what TV transcripts are, and what they might be, is viewed as beneficial not only within the confines of specific academic disciplines, such as English linguistics or translation studies, but hopefully well beyond.

Accordingly, Part I of this article (Sections 2 and 3) is concerned with transcript design vis-à-vis individual episodes. In particular, it provides a detailed investigation of some of the possible additional transcripts that can be envisaged for this episode of The West Wing and, by extension, other episodes in this and other TV series.

Part II is concerned with the role that a specific episode plays within a series. All the transcripts proposed are envisaged as parts of series-oriented storyboards. A storyboard is an advanced form of transcript, a dynamic ‘product’ obtained when using simple software tools to interpret codings embedded in separate transcripts. Individual transcripts can thus be assembled, dynamically, in different combinations according to the perspective required, and are particularly useful when aggregating data from an entire TV series. Thus the second part of the paper (Sections 4 and 5) investigates the empowerment that series-level analysis brings, suggesting that this dictates the need for a further round of research to establish whether new forms of transcript can help consolidate fine-grained comparative studies of TV series.

Part III (Sections 6 and 7) contains a brief discussion of, and conclusions about, how a transcript needs to be defined in the Internet age and why transcript theory needs to be worked on more fully.

Part I: New types of transcript: episode level

2. Solving the timepointing issue

The introduction of systematic timepointing can perhaps be best illustrated in terms of three levels of analysis: phase, scene and mini-scene timepointing. Each of these levels reuses (and extends) information in existing episode transcripts, suggesting that, with a little more effort, much more could be achieved when using a transcript.

2.1. Phase timepointing

We may begin with an analysis of Figure 1, an example of a phase transcript. This transcript’s function is to provide an overview of an episode’s major phases that includes reference to the points in the episode video where these occur. The example shown in Figure 1 has added timepoints to information found ‒ though in a rather ‘hidden’ way ‒ in the episode transcript’s Teaser-and-Act referencing system. Specifically, the words in Column 2 have been ‘lifted’ directly from this transcript.

Figure 1 uses the term phases in keeping with phase theory, which points to the organisation of a film in terms of its meaning (Baldry and Thibault, 2001, Baldry 2004 [2015]). This type of transcript functions as a simple and initial overview, a very basic where-to-find-it guide. It might well contribute to series level studies in research and teaching projects as it allows overall episode structure to be compared in an entire series or across series, for example, the nature and incidence of night vs. day scenes, camera technique, lighting resources used and so on.

Figure 1: A phase timepoint transcript

However, this is not its main goal. Its real purpose, instead, is as an adjunct to an existing online transcript, a quick-and-dirty solution to avoid the classroom embarrassment and hassle often experienced, for example, by English language teachers when they need to find a specific point in a video during a classroom lesson. As such, it is easily and, above all, quickly constructed. It is a simple timepointing tool that reuses as much existing information as possible. It includes descriptions of settings, taken from the episode transcript, that teachers (and others) could re-use.

However, Figure 1 clearly illustrates the uneven nature of many online episode transcripts: not everything that should be transcribed is transcribed. In this case, the transcription of the Prologue is missing, i.e. the initial part of every episode in The West Wing series, where an off-screen speaker announces Previously on the West Wing and which proceeds with a 30-second visual-verbal recap of previous episodes.

Unlike the credits, containing only written discourse, the Prologue phase in The West Wing includes verbal exchanges – which a transcript ought, by definition and tradition, to transcribe. This omission is all the more surprising as prologues provide thematic continuities that are part of a TV soap’s raison d’être.

To highlight this problem, the first row in the transcript in Figure 1 has been intentionally left blank. Given their significance in the series level perspective, Prologues are further discussed in Part II (Section 5) as a way of filling this particular ‘blank’.

2.2. Scene timepointing

Even though such transcripts take longer to construct, the majority of film analysts will prefer to use a transcript that incorporates the more traditional notion of scenes. Figure 2 is thus a scene transcript that reconstructs this episode in this way, bearing in mind the need to provide precise timepointing – a must for many analysts. At the very least, a scene transcript provides a support for, if not an alternative to, film analysts’ use of descriptive labels such as the Feliz Navidad scene or the Bookstore scene to identify the various parts of this episode. While such labels are fine when discussing specific scenes in specific episodes, numbered scene references are more likely to prevail when scene types (such as those relating to Christmas festivities and buying presents) need to be referenced across an entire series.


Figure 2: A scene timepoint transcript with location references

When constructing scene timepoints (see Figure 2), however, we come up against another typical ‘omission’ of episode transcripts: the individual scenes are not indexed numerically. Yet “All is not lost” (Milton, Paradise Lost, Book 1, Line 106). Though not in a numerical form, referencing of each individual scene exists de facto in all the one hundred and fifty-odd episode transcripts on the West Wing Transcripts site, as they contain ‘CUT TO:’ and ‘FADE IN’ ‘markers’ that implicitly define where one scene ends and where a new one starts.

Meant to be a descriptive device, they are a convenient and time-saving metatextual shortcut when constructing an index for individual scenes in an episode. There are sixteen CUT TO ‘references’ in the transcript of this episode as well as five FADE IN ‘references’ whose existence makes it possible to number scenes using a simple search-and-mark procedure. On this basis, it also becomes possible to work out timepoints and construct a scene transcript such as the one shown in Figure 2 in a systematic way.
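A search-and-mark procedure of this kind might be sketched as follows. The sketch assumes that each marker opens a line of the plain-text transcript; the function name number_scenes and the sample text are hypothetical, and the timepoint slot is deliberately left empty because timepoints still have to be read off the video itself.

```python
import re

# Assumption: a scene marker opens a line of the plain-text transcript.
SCENE_MARKER = re.compile(r"^\s*(CUT TO:|FADE IN)")

def number_scenes(transcript_text):
    """Assign consecutive scene numbers to CUT TO / FADE IN markers."""
    scenes = []
    for line_no, line in enumerate(transcript_text.splitlines(), start=1):
        match = SCENE_MARKER.match(line)
        if match:
            scenes.append({
                "scene": len(scenes) + 1,               # numerical index
                "marker": match.group(1).rstrip(":"),   # CUT TO or FADE IN
                "line": line_no,                        # position in the transcript
                "timepoint": None,                      # to be filled in from the video
            })
    return scenes

# Invented placeholder text, not taken from the actual transcript.
sample = """FADE IN: EXT. A STREET - NIGHT
First speaker talks.
CUT TO: INT. AN OFFICE - DAY
Second speaker talks.
CUT TO: INT. A CORRIDOR - DAY
"""
scenes = number_scenes(sample)
for scene in scenes:
    print(scene["scene"], scene["marker"], "transcript line", scene["line"])
```

The numbering is only as reliable as the markers themselves: a scene change that the transcriber did not mark with CUT TO or FADE IN would go unnumbered.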

A brief comment on the omission of scene numbering in TV series episode transcripts is in order as it appears to be at odds with the tradition in the humanities, which uses scene numbering in printed plays and critical commentaries. No theatre critic would, for example, reference Out, damn’d spot! Out, I say! with a wording that ran along the lines of: ‘part of an imaginary conversation-cum-flashback scene towards the end of the play recalling the night Lady Macbeth and her husband conspired to murder King Duncan in which she incriminates herself to eavesdroppers and reveals the pangs of conscience she had ridiculed in her husband’. As a reference system Macbeth Act 5, Scene 1, Line 26 has stood the test of time in many literary forms including poetry as the reference to Milton above shows.

Bringing scene numbering and scene timepointing together in a single scene transcript makes life much easier though it is, alas, unlikely that fandom will adopt such indexing until the series perspective in transcribing is more fully established. This is despite the fact that we are merely taking a leaf out of film-making’s book, where script supervisors (a.k.a. continuity supervisors) are employed to do this housekeeping work:

“recording and accessing all information regarding the screenplay and any scenes which have already been shot.” [It is their job] “to keep running totals of scene timings and pages counts in order that the script runs approximately to overall time.”

A final comment on this issue. Inherent in the traditional notion of scene is the idea of location. CUT TO and FADE IN ‘references’ are immediately followed in the online transcript by locational co-texts, making it relatively easy to extract the information provided and re-present it systematically in the form of a Location Index.

One such index is embedded as a legend in Figure 2’s scene transcript. It uses abbreviations that facilitate digital searches across an entire series. As described in Section 4, these are useful for film analysts whose investigations into scene types in an entire TV series might well start with location patternings (e.g. day vs. night, indoors vs. outdoors and so on).
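By way of illustration, the extraction of a Location Index might be sketched as below. Several things here are assumptions rather than facts about the online transcripts or about Figure 2: that the locational co-text follows the marker on the same line in the form ‘INT./EXT. PLACE - DAY/NIGHT’, that codes such as LOC01 can stand in for the abbreviations used in the figure's legend, and the function name build_location_index.

```python
import re

# Assumed co-text format: "<marker> INT. PLACE - DAY" (or EXT. / NIGHT).
SLUG = re.compile(
    r"^\s*(?:CUT TO:|FADE IN:?)\s*(INT|EXT)\.\s*(.+?)\s*-\s*(DAY|NIGHT)\s*$"
)

def build_location_index(transcript_text):
    """Return (index, rows): index maps each place to a short code;
    rows list (scene_number, code, INT/EXT, DAY/NIGHT) for each scene."""
    index = {}
    rows = []
    for line in transcript_text.splitlines():
        match = SLUG.match(line)
        if not match:
            continue
        setting, place, time_of_day = match.groups()
        # Hypothetical codes; the figure's legend uses its own abbreviations.
        code = index.setdefault(place, "LOC%02d" % (len(index) + 1))
        rows.append((len(rows) + 1, code, setting, time_of_day))
    return index, rows

# Invented placeholder text, not taken from the actual transcript.
sample = """FADE IN: EXT. A STREET - NIGHT
Someone speaks.
CUT TO: INT. AN OFFICE - DAY
Someone else speaks.
CUT TO: INT. AN OFFICE - NIGHT
"""
index, rows = build_location_index(sample)
night_scenes = [row[0] for row in rows if row[3] == "NIGHT"]
print(index)
print(night_scenes)
```

Once locations are coded in this way, questions such as day vs. night or indoors vs. outdoors reduce to simple filters over the rows, and the same filters can be run over every episode in a series.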

Hence, potentially as a transcript model, the scene transcript goes beyond the mere emulation of the ‘textual housekeeping’ that characterises film-making and many literary publications and becomes, instead, part of the series level perspective further discussed below in Part II.

2.3. Mini-scene timepointing

When does a scene transcript stop being an adequate solution to the timepointing needs described in Section 1? When do we need a further support? An answer to this question is partly dependent on how deep you need to dig into the dialogue in a TV series. In this respect, the third and final timepointing referencing system proposed here relates to mini-scenes. That the mini-scene is central to The West Wing series is beyond dispute. How the term mini-scene is defined and who needs to analyse them — both essential premises for such transcriptions — is much less clear.

A mini-scene timepoint transcript is likely to be valuable for those researching into text and film theory as it involves a level of detail mostly designed for those concerned with the textual models that underlie the type of discourse used in a TV series. But beware: while it might be thought that only text theorists will explore these lofty heights, the level of detail reached in fans’ discourse analysis says otherwise and represents a constant surprise and example that theorists should heed.

For those unfamiliar with it, The West Wing TV series represents an innovation in filming, with scenes made up of a series of ‘mini-scenes’ that share the same location:

Historically, dramas devoted an entire scene to communicating a single new plot development, and thus the show moved forward one step at time, one scene at a time. Giving a whole scene to every significant character interchange would make it almost impossible for The West Wing to do multiple plotlines. Therefore Sorkin and Cleveland choose to break single scenes into separate dramatic ‘mini-scenes’ that are unrelated to each other narratively but which share the same time and space. Characters come and go in pairs, each couple intent on its own conversation but each occupying center stage for only a brief moment. (Smith 2003: 127-8)

In other words, The West Wing series is based on a ‘two’s company, three’s a crowd’ principle, likely to fascinate the fans – and not just those who apply the Bechdel test to find out whether an episode is ‘politically correct’ i.e. whether it contains at least one scene featuring at least two women talking to each other about something other than a man.

This organisational principle raises many questions: Are there never three interactants participating somewhat vocally in an exchange (e.g. bawling out at each other)? Are there never any introspective soliloquies? Is there a ‘three’s company’ rule for visual presence offsetting the ‘two’s company rule’ applied to verbal interaction? What about those mini-scenes where there is no interaction at all? 

Figure 3: A mini-scene timepoint transcript

Figure 3 is a mini-scene transcript that provides the detail needed to clear up these issues.

First, as we are dealing with people rather than places, this type of transcription re-uses data in the episode transcript to build up a Character Index (Figure 4) in the manner used to create a Location Index. Again, it uses initials rather than full names given the visual convenience that this brings when analysing such transcripts, dynamically, with software tools such as Microsoft Excel or, better still, Microsoft Access with its relational characteristics that accord well with the storyboard principle of assembling specialist transcript tables in diverse ways.

Figure 4 shows how interactions, and turn-taking patterns, can be expressed in terms of abbreviated formulas such as CJ+D, in this case a mini-scene consisting of exchanges between CJ Cregg and Danny, where the abbreviation before the plus sign indicates the first character to speak, while the abbreviation after the plus sign indicates who is addressed and, by implication, who normally continues or completes the verbal interaction. By exclusion, the notation means that the interactants identified in this way are the only ones to speak – though, of course, quite frequently they will not be the only characters present during a mini-scene.

Figure 4: Character Index

Already, the Character Index points to the need to envisage more complex dyadic structures than a simple ‘two’s company principle’ as individuals address groups. These interactions can, nevertheless, be transcribed in the same way – as CJ+R, B+CS and B+K where R refers to reporters, CS to carol singers and K to (school) kids and so on.

Additionally, thanks to the Number[Number] annotation, mini-scenes (and their turn-taking patterns) are associated in this type of transcript with scene structure. In this annotation, the number inside the square bracket refers to mini-scene numbering while the number outside relates to the numbering given in the scene transcript (Figure 2). This tells us, for example, that Scene 1 consists of 4 mini-scenes, numbered 1[1] CJ+DN; 1[2] S+LA; 1[3] L+B, 1[4] J+L. Using the Character Index, it is now easy to work out that this particular scene consists of four different pairs of speakers.
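As a rough sketch of how software might read this annotation, the following Python fragment splits an entry into scene number, mini-scene number and participants. The annotations used are those quoted in this article; the function name parse_annotation and the field names are hypothetical, and the treatment of [0] as non-interactional follows the labelling convention used for scene setters in Figure 3.

```python
import re
from collections import Counter

# Scene number, mini-scene number in square brackets, then the formula.
ANNOTATION = re.compile(r"^(\d+)\[(\d+)\]\s+(\S+)$")

def parse_annotation(text):
    """Split an annotation such as '1[1] CJ+DN' into its components."""
    match = ANNOTATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a Number[Number] annotation: {text!r}")
    scene, mini_scene, label = match.groups()
    return {
        "scene": int(scene),
        "mini_scene": int(mini_scene),
        # First speaker comes first, addressee(s) after the plus sign.
        "participants": label.split("+"),
        # Mini-scenes labelled [0] are non-interactional scene setters.
        "interactional": int(mini_scene) != 0,
    }

annotations = ["1[0] PLG", "1[1] CJ+DN", "1[2] S+LA", "1[3] L+B", "1[4] J+L"]
parsed = [parse_annotation(a) for a in annotations]
per_scene = Counter(p["scene"] for p in parsed if p["interactional"])
print(per_scene)
```

Counting the interactional entries per scene reproduces the observation that Scene 1 consists of four mini-scenes with four different pairs of speakers.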

Dyad types can also be transcribed. Orange background colouring has been used to indicate combinations of female speakers, which, as Ricki points out, allows this particular episode to pass the Bechdel test. She mentioned one of these – transcribed here as 2[1] MA+CJ –  where Mandy and CJ discuss Dickensian costumes. However, it is not the only one. There is just one other, 5[2] CJ+BR, where CJ and Bobbi discuss hate crime legislation.

A mini-scene transcript thus facilitates analysis of thorny social issues. For example, it helps make the point that while there are just two women-only mini-scenes in this episode, in contrast there are many men-only mini-scenes highlighted in bluish-grey. Surprised by this? Maybe not.

Note, however, how background colouring, undertaken in this case manually, but achievable automatically with the right software tools, makes hidden patterning clearly visible. In other words, the reference to ‘multisemiotic transcriptions’ in this article’s title relates, in particular, to the resources used, in addition to language, at the metatextual referencing level, an innovation in TV series transcription whose theoretical status is further discussed in Part III.
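How such colouring might be automated can be sketched in Python, under clearly stated assumptions. The gender table below is illustrative only: MA, CJ and BR are the codes used in this article for Mandy, CJ and Bobbi, while J and S are assumed here to stand for Josh and Sam. The authoritative source for codes is the Character Index in Figure 4. The function name dyad_colour is hypothetical, and the colour names follow the orange and bluish-grey conventions described above.

```python
# Illustrative gender table; the real codes belong to the Character Index.
GENDER = {
    "MA": "F", "CJ": "F", "BR": "F",   # Mandy, CJ, Bobbi
    "D": "M", "J": "M", "S": "M",      # Danny; J and S assumed: Josh, Sam
}

def dyad_colour(participants, gender=GENDER):
    """Return the highlight colour for a mini-scene, or None for no highlight."""
    genders = {gender.get(code) for code in participants}
    if None in genders:
        return None            # unknown code or a group such as R (reporters)
    if genders == {"F"}:
        return "orange"        # women-only combination
    if genders == {"M"}:
        return "bluish-grey"   # men-only combination
    return None                # mixed combination

for formula in ["MA+CJ", "CJ+BR", "CJ+D", "J+S", "CJ+R"]:
    print(formula, dyad_colour(formula.split("+")))
```

The point of the sketch is simply that the colouring rule is mechanical once a Character Index exists: the same lookup could be applied to every episode in a series without further manual work.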

Other examples of this form of referencing – a combination of annotational symbols, colouring and the resources provided by tables – are used in the mini-scene transcript shown in Figure 3. They allow us to see, at a glance, that not all mini-scenes in the episode are interactional.

These non-interactional mini-scenes – presented in Figure 3 with background highlighting in green and labelled as [0] for ease of identification in the episode’s overall mini-scene patterning – are scene setters. They frame a scene ensuring the subsequent interactional mini-scenes are bound to each other. Figure 5 gives the full list of types of scene-setters, the most frequent type being establishing shots.

Figure 5: Scene-setter index

The first non-interactional mini-scene, labelled 1[0] PLG, is a good example of the scene-setting function of this type of mini-scene. Intriguingly, the desired continuity in Scene 1 – where PLG stands for Prologue – comes not from identification with a specific place or specific time (the classic way of defining a scene) but instead from its recapping of previous episodes. Nevertheless, it is still a scene setter. It is also non-interactional in the sense that the wording Previously on the West Wing, which follows the combined audio and visual logos (drumbeat + US flag shown against the White House silhouette), is directed to viewers and is not part of an exchange involving the episode’s characters.

Highlighting scene setters visually is a device that allows us to further detect breaks in expected patterns. Indeed, we can see at a glance that six scenes – 5, 6, 8, 12, 19 and 20 – do not begin with a scene setter that ‘smooths’ our way in. Given that a mini-scene transcript, like the other transcripts presented so far, is also a timepoint-based transcript, it is now easier to work out why this is the case.

Take, for instance, Scene 6, which begins abruptly precisely because it is designed to highlight Bartlet’s irritation with Mandy’s insistence on PR work with journalists at a time when he is simply longing for Yuletide peace and quiet. In other words, the basic unmarked NIMS^IMS pattern (i.e. non-interactional mini-scenes followed by interactional mini-scenes) is broken up by a marked IMS^IMS pattern. The latter deliberately, and meaningfully, punctuates the former in a contrapuntal way – one of the dialogic ‘special effects’ used throughout The West Wing series.
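The search for scenes that open without a scene setter lends itself to automation. The following Python sketch assumes that mini-scene labels follow the scene[mini-scene] notation used in this article, with [0] marking a scene setter. The function name is hypothetical and the label list is an invented fragment for illustration, not the actual data in Figure 3.

```python
import re

# Labels follow the article's scene[mini-scene] notation; [0] marks a scene setter.
LABEL = re.compile(r"^(\d+)\[(\d+)\]")

def scenes_without_setter(labels):
    """Return scene numbers whose first listed mini-scene is not a [0] scene setter."""
    first_seen = {}
    for label in labels:
        match = LABEL.match(label)
        if match is None:
            continue  # skip anything that is not a mini-scene label
        scene, mini = int(match.group(1)), int(match.group(2))
        first_seen.setdefault(scene, mini)  # keep only each scene's opening mini-scene
    return sorted(scene for scene, mini in first_seen.items() if mini != 0)

# Invented fragment: Scenes 4 and 7 open with a setter, Scenes 5 and 6 do not.
sample = ["4[0]", "4[1]", "4[2]", "5[1]", "5[2]", "6[1]", "7[0]", "7[1]"]
print(scenes_without_setter(sample))  # → [5, 6]
```

Applied to a complete list of mini-scene labels in episode order, a check of this kind would flag every marked IMS^IMS opening without manual inspection.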

Similar marked vs. unmarked patterning applies to the small number of cases — sixteen out of over a hundred interactional mini-scenes — where the talk is between three rather than two people. Represented in Figure 3 with background pink highlighting, their distribution is no longer hidden and is instead clear at a glance. We immediately note that their highest incidence occurs — rather unsurprisingly — in Scene 17 where essentially Sam, Josh and Laurie take turns at shouting at each other. Even so, the calmer moments in this scene involve a single addressor and a single addressee, even though three characters are visually present throughout.

By contrast, in Scene 15, despite the large number of mini-scenes – eleven in all – there are no three-way exchanges even though three people are present for much of its duration. Figure 6 highlights the disjunctive nature (Baldry, 2000) of visual and linguistic interactions in this scene, one of the finest in the episode, and nicely illustrates a basic strategy in The West Wing’s structure: the more the verbal pulls people apart, dividing them into twos, the more the visual pulls them together, bonding them into groups.

The fine balancing act between disjunctive and conjunctive elements when analysing what goes on in the verbal and what goes on in the visual (Baldry 2000) may be further appreciated from the analysis given by Charles Papert on the website as regards the shot plans associated with Scene 2 in the In Excelsis Deo episode, usually referred to as the Feliz Navidad scene:

The roundy-round at the beginning required very specific timing to get certain lines delivered on-camera--you can see Rob Lowe “helping” clear himself twice. Once off and running through the lobby, it’s a dodging match of Xmas trees, ‘xtras and, uh, xylophones (?) until we get into the relative peace of the final hallway.
On a number of occasions such as this one I had to fly the rig sideways through doorways; although the doors were built slightly oversized from standard, this still left just a few inches on either side of mattebox and mag. With the rig in front of me I would take my best shot and close my eyes as we passed the threshold, hoping we wouldn't clip off any expensive Panabits! (Charles Papert “Feliz Navidad”, Retrieved 17.05.2016)

What this steadicam operator – but also experienced film analyst – calls roundy-round shooting functions to create a ‘dust cloud’ of characters visually twirling around each other and linguistically quibbling about relatively trivial themes. The mini-scenes in question, 2[1-5], involve both three-party and two-party interactions. The whirling and twirling comes to an abrupt end when the characters filing down the corridor as a single visual group make a sudden about-turn, again as a single visual group, when they learn that the Washington police are looking for Toby (mini-scene 2[6]). After this momentary excursion into something more serious, the subsequent mini-scenes 2[7-10] return to trivial themes that culminate in the ‘storm-in-a-teacup conflict’ created by secret service names.

Analysts exploring visual-verbal contrasts in The West Wing will notice the contrapuntal, disruptive rhythm of Toby’s constant and rather nervous head turning in many parts of Scene 15, e.g. in MS: 15[6-8] reproduced in Figure 6. Sometimes, this occurs in relation to the person being addressed. Sometimes his head turns towards the third person present. Most significantly, at other times he looks at neither, as if stressed out by the circumstances and not knowing who to look at or who to talk to next. In their podcast for this episode, Andrea Howat and Sallie Gregory rightly speak of Toby’s ‘visual stumbling’ in this scene, speculating that it contributed to bringing actor Richard Schiff an Emmy award for his interpretation of Toby in this episode.


Figure 6: A fragment from an episode transcript recast as a mini-scene transcript

Figure 6 shows how data from the various transcripts can now be combined and presented systematically in a way that enhances the usefulness of the original episode transcript. It includes scene and mini-scene numbering (MS), timepoint referencing (TP) and indications about the interactants. Note that the reference to George: (G) in a red font and round brackets indicates where George talks to himself, a break in the overall dialogic pattern in this episode that otherwise rigorously respects turn-taking rules. It is another dialogic special effect, in this case indicating the nature of George’s rambling mind and the tenderness and delicacy that Toby, usually aggressive and blunt in his interactions, needs to adopt when talking to him.

With its focus on the timepointing of mini-scenes, the mini-scene transcript has the merit of facilitating the analyst’s job of understanding the relationship between micro and macro levels in this episode, in other words how details fit into the overall scheme of things. Indeed, we might well conclude that the mini-scene transcript has systematically formalised critical insights that various analysts have perceived vis-à-vis the dialogic organisation of The West Wing series.

Overall, the dialogic structure is characterised by constant change in partners in the brief, short-lived encounters that make up The West Wing’s episodes:

The tracking shot in the West Wing often begins by focusing on a small action, often by a bit player (e.g., carrying a gift basket) or on an object (a small wall decoration). The camera almost immediately picks up one or two of the central characters moving through the White House office space. The camera follows as a couple of principal players march quickly through the hallways, discussing one or more topics. Then one of the characters forks off and is almost immediately replaced by another principal, who initiates another discussion. […] In this way the camera stages a series of pas de deux, with partners pairing off and then cutting in in one prolonged dance. (Smith, 2003:131)

This view tends to assume that the mini-scene is defined by changes in interactional partners. Alas, confounding this perspective is the fact that, with respect to the following or previous mini-scene, about twenty percent of the mini-scenes listed in Figure 3 do not involve such a change. Thanks to the use of yellow background highlighting and italicised purple fonts, this type of mini-scene, its distribution in the episode, and the theoretical inconsistencies this entails can now be spotted ‘a mile off’.

A definition of mini-scene based solely on changes in interactional partners thus runs counter to the overall dynamics of the discourse that pervades the entire episode and series. One example is Scene 7, which adopts the same interactional partners throughout. We might be tempted to argue that this is the one case in the episode where there are no mini-scenes. However, this does not stack up, as there are other scenes where exactly the same happens, e.g. Scene 10, and yet others, such as Scenes 13 and 16, where there are only minimal changes in interactional partners, usually a scene closure device at the tail-end of the overall scene.

However, the matter is cleared up if we accept that topic-changing by interactants also counts in the definition of a mini-scene – and that it is part of the ‘special dialogic effects’ deployed, as we have hinted at above, in this and other episodes in The West Wing series. On this point, Smith states:

While this reliance on mini-scene is characteristic of The West Wing it is not unique to the series. John Wells’s ER, for instance, also uses mini-scenes to break up the business of an individual scene into separate conversations. However, Sorkin’s dialogue in The West Wing and in his other show Sports Night (1998-2000) is distinctive in that it breaks up single conversation between two characters into multiple topics, thereby conveying information quickly while mirroring the complexity of the West Wing world. (Smith, 2003, p. 128, my emphasis)

This raises the question: Is a mini-scene a structural unit, based on who does the talking, or a functional unit based on what they say and mean? In the following section, we will argue that a mini-scene is usually both, but that occasionally it will be only one or the other. Specifically, we will argue that the mini-scenes in this episode can be classified into sub-types, according to the degree to which they are structural and/or functional. In other words, this presupposes a division into: interactions that involve only partner changing vis-à-vis the previous mini-scene; those that involve only topic changing vis-à-vis the previous mini-scene; and those (the vast majority) that involve both.

We will also argue that the more a mini-scene acts as a functional unit, the more it will focus on high drama i.e. unplanned moments of contrast and conflict that upset the daily routine in the White House, while the more it is a structural unit, the more it will help planned events to flow smoothly and happily along in the episode. However, in order to grasp this nettle fully, we need a map of how the themes that underlie events in this episode play out.
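The three-way division of mini-scenes proposed above can be expressed as a simple decision rule. The Python sketch below is an illustration under stated assumptions, not a tool used in this study: the MiniScene record, the classify function, the speaker names and the topic labels are all hypothetical, and partners and topics are assumed to have been annotated by hand beforehand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MiniScene:
    label: str           # mini-scene reference in scene[mini-scene] notation
    partners: frozenset  # the interactants in the exchange (hand-annotated)
    topic: str           # the topic under discussion (hand-annotated)

def classify(previous, current):
    """Classify a mini-scene by what changes vis-à-vis the previous mini-scene."""
    partner_change = previous.partners != current.partners
    topic_change = previous.topic != current.topic
    if partner_change and topic_change:
        return "both"
    if partner_change:
        return "partner change only"
    if topic_change:
        return "topic change only"
    return "no change"  # would signal a segmentation error under this definition

# Invented annotations: placeholder speakers A, B, C and placeholder topics.
a = MiniScene("x[1]", frozenset({"A", "B"}), "topic one")
b = MiniScene("x[2]", frozenset({"A", "B"}), "topic two")
c = MiniScene("x[3]", frozenset({"A", "C"}), "topic two")
d = MiniScene("x[4]", frozenset({"B", "C"}), "topic three")

for prev, curr in [(a, b), (b, c), (c, d)]:
    print(curr.label, classify(prev, curr))
```

The fourth outcome, "no change", is included only as a consistency check: if partner change and topic change are the defining criteria, two adjacent mini-scenes that differ in neither respect should not have been separated in the first place.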

3. Towards theme-based transcripts

The previous section called for a thematic episode transcript. However, before we discuss the uses to which such a transcript can be put, we need to explain how it is constructed. Figure 7 is a Theme index, providing a summary description of each theme used in Figure 8, which plots out the distribution of interactional mini-scenes in this episode in terms of nine event-related themes – a map of how the multiple plotlines play out in a dynamic storyboard that reveals the way plotlines in one episode interweave with those in the rest of the series.


Figure 7: Theme Index 

Figure 8 is such a theme transcript. Justifying the selection of themes and their distribution in an episode is never an easy task. What do we include and, more to the point, what do we exclude? Misogyny, for example, though clearly present, is not, as it were, a theme that was designed to be pursued in the episode. However, gender relationships and sexually oriented encounters/discourse were intended. Hard as it is to distinguish between intended and unintended, various clues show that such selection is likely to be more objective and less arbitrary than at first sight might seem to be the case. Again, the transcripts we have so far developed give a helping hand as they provide the necessary statistical basis for selection.

First, if we accept that the Prologue is indicative of series level thematics – i.e. beyond what will happen just in the current episode – then we have to include Themes 3 and 6. Similarly, we can use frequency as a criterion for selection. As the bottom line in Figure 8 suggests, Theme 1 accounts for a third of the overall mini-scenes. If we further take first mention into consideration, on the grounds that the first topic in a scene’s interactional encounters is likely to be most significant, we need to include Themes 2 and 4.

This leaves Themes 5, 7, 8 and 9. The discussions about secret service names (Theme 5) and the millennium (Theme 7) are significant precisely because of their triviality. They act, structurally, as dialogic counterfoils to moments that are more serious.


Figure 8: A theme transcript with mini-scene distribution
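The frequency and first-mention criteria used above to select themes are simple counts and could be computed directly from a theme transcript. The Python sketch below assumes a hypothetical data layout in which each mini-scene is recorded as a ((scene, mini-scene), theme) pair listed in episode order; the function names and the sample values are invented for illustration and do not reproduce the figures in Figure 8.

```python
from collections import Counter

def theme_shares(mini_scenes):
    """Map each theme number to its share of all listed mini-scenes."""
    counts = Counter(theme for _, theme in mini_scenes)
    total = sum(counts.values())
    return {theme: count / total for theme, count in counts.items()}

def first_mentions(mini_scenes):
    """Return the themes that open a scene (theme of each scene's first listed mini-scene)."""
    opening = {}
    for (scene, _), theme in mini_scenes:
        opening.setdefault(scene, theme)  # keep only the first theme per scene
    return set(opening.values())

# Invented sample: three scenes, six mini-scenes, themes numbered as in a theme index.
sample = [
    ((1, 1), 1), ((1, 2), 2),
    ((2, 1), 4), ((2, 2), 1), ((2, 3), 1),
    ((3, 1), 2),
]
print(theme_shares(sample))
print(first_mentions(sample))
```

In this invented sample the first theme accounts for half of the mini-scenes, while the themes opening the three scenes are those numbered 1, 2 and 4.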

Of all the themes, Theme 9 is the least frequent: just one mini-scene, 10[2], in which Mrs. Landingham talks about the death of her twin sons. We could, in theory, lump this together with other themes that deal with death, namely Themes 2 and 4. However, we have already noted one fan’s comments on the heart-tugging that goes on in this mini-scene. Another is Howat’s and Gregory’s description of it in their podcast as a ‘killer backstory’.

We have thus included it separately, as it is a very special ‘theme-cum-scene’, both structurally and functionally. It is the point that comes closest to a soliloquy in the entire episode, as Charlie is more an embarrassed listener than a true interactant. It also sets up the linkage with Themes 2 and 4 as Mrs. Landingham poignantly asks Toby to let her come with him to the war veteran’s funeral, the episode’s climax.

This leaves Theme 8, which we have called leave-taking, perhaps the hardest theme of all to justify. Note, however, that while the characters in The West Wing are portrayed as very powerful people, they constantly express a desire to get out of the White House and move into the real world. The stronger this desire, the harder it becomes – as the plot creates barriers that prevent them from doing so.

This is why Bartlet wants to sneak out to the bookstore, why Josh boasts he will be in Bermuda in twenty-four hours and why Toby, desperate to escape the triviality of the Xmas festivities in the White House, becomes impatient on the phone when information is not forthcoming about the more serious matters going on in the world outside.

In this episode, leave-taking is both a how-the-hell-do-we-get-out-of-here existential theme in its own right, as well as a structural device, an exeunt strategy, used by a long line of playwrights (Shakespeare instructs). A detailed mini-scene analysis helps us pursue thematic tracking and, in particular, helps us to understand how the dialogic structuring of mini-scenes interacts with the societal themes presented in the episode.

As Figure 9 (extracted from Figure 6) shows, Scene 9 uses all the different types of mini-scene we have identified above. The transitions from mini-scenes 9[1] to 9[2] and from 9[5] to 9[6] are distinguished from each other by the theme change criterion, while the transitions from 9[3] to 9[4] and then to 9[5] are distinguished from each other by the interactant partner change criterion. In this scene, only the transition from 9[2] to 9[3] is distinguished by both criteria.


Figure 9: Thematic tracking

However, this type of transcript helps us to understand that it is not by chance that, in this episode, the words “exit strategy” echo all the way down the White House corridors as the dialogue unfolds. The walk-and-talk action is imbued with leave-taking, which, in part, explains why mini-scenes function in the way that they do. Just as there are scene setters, there are scene closers based on the notion of escape as Figure 10 shows.



Figure 10: Mini-scene closure

Remarkably, there are also occasions in this episode where exit strategy is both part of the episode’s dialogic structure and its reflection on escapism. Take, for example, Scene 19, which begins abruptly (i.e. with no establishing shot) with a mini-scene involving CJ and the reporters.

Paradoxically, this scene opener is, in fact, a scene closure, a way of telling the reporters (and the audience) that the ‘lid is on’ and that the White House is shutting down for Christmas. However, note the hidden irony in the line shown in red characters, the subtlest of ways of pointing to the constant frustration of the President’s escapist desires and his permanent struggle with his staff in this respect.


Figure 11: Mini-scene exit strategy

These hidden patterns and their constant overlapping are the raison d’être of a post-airdate storyboard, which relies, however, on specialist transcripts that ultimately facilitate its identification of hidden meanings.

As mentioned above, such a storyboard is a dynamic structure which uses combinations of the various transcripts we have illustrated so far to assemble patterns in a dynamic way. It helps us grasp the episode’s overall organisation within a TV series. In particular, it can highlight the conflict between public and private persona, the leitmotiv of the episode and the entire series, that many analysts – whether fans, teachers, subtitlers or critics – have tracked down and will, in all probability, want to track further.

Already a theme transcript facilitates tracking, within a single episode, helping us to understand, for example, how information about the stoning and subsequent death of a gay kid in Minnesota, and related ‘hate-crime’ legislation, unfolds in four steps distributed over six mini-scenes:

1) news of the stoning reaches the White House (4[4]);
2) the White House confirms its knowledge to the Press and agrees that crime legislation needs to be revisited (5[2]);
3) news of the boy’s death reaches the White House (8[6]);
4) disputes between the White House staff on how to react emerge, with C.J. in conflict with others, including Danny, a reporter, on the matter (9[1], 18[1,3], 19[2]).
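Thematic tracking of this kind depends on the compact scene[mini-scene] references being machine-readable. As a minimal sketch, assuming only the notation used in this article (a scene number followed by bracketed mini-scene numbers, comma-separated lists such as 18[1,3], and hyphenated ranges such as 2[1-5]), the following Python function expands references into individual (scene, mini-scene) pairs. The function name is hypothetical.

```python
import re

# A scene number followed by bracketed mini-scene numbers, lists or ranges.
REF = re.compile(r"(\d+)\[([\d,\s-]+)\]")

def expand_refs(text):
    """Expand references such as '18[1,3]' or '2[1-5]' into (scene, mini-scene) pairs."""
    pairs = []
    for scene, body in REF.findall(text):
        for part in body.split(","):
            part = part.strip()
            if "-" in part:
                start, end = (int(n) for n in part.split("-"))
                pairs.extend((int(scene), n) for n in range(start, end + 1))
            else:
                pairs.append((int(scene), int(part)))
    return pairs

# The references given for the fourth step in the list above.
print(expand_refs("9[1], 18[1,3], 19[2]"))  # → [(9, 1), (18, 1), (18, 3), (19, 2)]
```

Once expanded in this way, each pair can serve as a key for looking up timepoints, interactants and themes in the other transcripts.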

A theme transcript also helps us understand how the three deaths described in the episode are ‘glued’ together in terms of three wars: the death on a park bench of a Korean War veteran; Mrs. Landingham’s twins killed in the Vietnam War and the death of a gay boy, a victim of what is presented as the USA’s domestic Crime War.

In other words, the theme transcript for this episode highlights public/private death and grieving contrasts and, in so doing, is a first step in understanding social hierarchies, a recurrent issue in the series. It does this by revealing how information management functions at both macro (thematic) and micro levels (mini-scene dialogues), as it records the structuring and sequencing of information within the episode’s entire framework.

Done manually this is a chore; however, experience shows that initial manual analysis quickly leads to computer-assisted shortcuts, making the task less daunting than might first appear to be the case. Indeed, a thematics-oriented post-airdate storyboard will be worth the candle if it helps us, for example, to pinpoint in what ways the thematic range of a TV series reflects the upheavals and changes in society in general.

Part II: New types of transcript: series level

In the first part of this paper, we analysed the benefits of introducing timepointing in transcripts that allow phase, scene, mini-scene and thematic perspectives to be explored.

Together they form a referencing system, which in its turn forms a possible basis for computer-based search systems that help analysts contextualize mini-scenes, and in particular specific types of mini-scenes across an entire TV series and thus within higher-order meaning-making units such as scenes, episodes and episode seasons.

When they are time-based, visually-oriented transcripts of this type will potentially help text analysts locate where a specific effect occurs (and, more significantly, where it recurs) in the overall structure of a film, where a specific transition pops up, how the scenes fit together and even how transcripts can help identify recurrent patterns in face-offs.

By face-offs, we mean confrontations involving real or potential face-threatening acts. As we can see from the dialogue snippets given above and below, face-offs are the hallmark of TV film series: from soap operas that portray everyday events and conflicts of family life to cartoon versions thereof, such as The Simpsons; from whodunit detective stories such as Columbo to science fiction series such as Star Trek and Dr. Who; and from medical dramas such as Dr. House and ER to political dramas such as The West Wing.

More complete referencing systems, and above all timepoints, represent a key move towards fine-grained cross-series comparisons that involve face-offs.

The referencing systems described in this article are, of course, designed to be digital and, therefore, accessible by Internet tools in ways that are compatible with the job that many film analysts have of comparing episodes within the same TV series, or with episodes from other TV series.

Many ideas for possible transcripts may well go beyond the digital resources currently available. As such, they have the status of blueprints for future tools. Even so, progress is being made in this respect. One such example is the MWS-Web tool being developed by a research team, including the author, which provides online support for film analysts through its capacity to concordance online episode transcripts directly (i.e. without the need to download them) in ways that, as illustrated below, home in on underlying discourse patterns[3].

4. Exploring digital transcripts

Currently, in order to obtain an overview of the thematics of a TV series it is necessary to summarize an episode’s thematics and then link the summaries of individual episodes around a specific theme. This is what many fan sites do, including the Raspberry Lime Ricki blog, The West Wing Wiki site, the IMDb and many others such as the West Wing Transcripts site.

The following summary is taken from the last of these sites:

As Christmas Eve approaches, President Bartlet (Martin Sheen) eagerly sneaks out of the White House for some last-minute Christmas shopping, while a haunted Toby (Richard Schiff) learns more about a forgotten Korean War hero who died alone on the district’s cold streets wearing a coat that Toby once donated to charity. In other hushed corridors, Sam (Rob Lowe) and Josh (Bradley Whitford) ignore Leo’s (John Spencer) advice and consult Sam’s call girl friend (Lisa Edelstein) concerning her confidential clientele when one political rival hints at exposing Leo’s previous drug problem. C.J. (Allison Janney) wonders aloud about the President’s public response to a notorious hate crime while her personal resolve weakens as persistent reporter, Danny (Timothy Busfield) continues to ask her out.

As Figures 8 and 12 show, this summary describes some of the thematics but not all. It fails in particular to describe their distribution over an episode or a series in relation to the characters who construct and interpret them. When reconstructing thematic patterning in an entire series, an alternative to these summaries is a post-airdate storyboard, a fragment of which is shown in Figure 12.

This shows a three-row representation of how the information given in Figure 8 might well be extracted by software to create a series-level thematic storyboard. Figure 12 gives only one such block, but when completed by similar blocks extracted, for example, from the twenty-two episodes that make up the West Wing’s first season, it becomes possible to grasp thematic patterning more fully.


Figure 12: A row from a series-level thematic storyboard

We thus have a start to a form of transcription that allows us to have the best of both worlds: thematic tracking at episode level but also at series level.

Even so, it is still hard to spot Charlie’s ‘bad news’ function in all the three scenes in which he speaks in this episode, namely: 1) when in 14[5], he tells the President it is time to leave and get back to duty; 2) when in 8[6] he informs the President about the gay boy’s death; 3) when in 10[2] he triggers Mrs. Landingham’s sad memories about her twin boys’ death on Xmas Eve 1970.

It is even harder to reconstruct the West Wing view of ethnic minorities, which we would expect to be significant in such a politically-oriented series. Charlie, of course, represents an ethnic minority. Apart from him, the only other coloured person with a speaking part in this episode, excluding seven-year-old Jeffrey’s one-line exchange with Bartlet in 8[3], is the officer-cum-detective in 3[1] investigating the war veteran’s death.

The association with death of both these characters is so specific in this episode as to raise suspicions as to whether such a correlation exists over the entire series. This is where a web concordancer ‘comes in handy’. Figure 13 reports part of a quick check-up search using MWS-Web’s concordancing functions that require no downloading of episode transcripts from the web. It thus exemplifies the possibility of associating two recurrent characters, in this case Bartlet and Charlie, with the death theme. It shows that in the entire West Wing series there are only three examples of this association for each of these two characters, suggesting, in other words, no racial bias at all.


Figure 13: Concordance comparison of co-texts in The West Wing series

At a series level, a storyboard such as the one illustrated in Figure 12, used in conjunction with other tools, allows patterns to be more easily spotted and checked out across an entire TV series. Unlike a transcription, multimodal or otherwise, which helps an analyst to explore a single text, a combined websearch and concordancing approach can help establish patterns that are common to a much larger set of multimodal texts (Baldry 2007:180).

This is just a first step towards understanding (inter)semiotic theme-based patternings in film texts. The construction of hypotheses is somewhat akin to the previsualizations mentioned below in Section 6. Armed with a post-airdate thematic transcript and a tool for searching and concordancing transcript archives, a film analyst now appears to be in a position to check for patterns that extend to the entire series and can think in terms of possible patternings.

This is a further confirmation of the validity of not discarding the current generation of post-airdate transcripts but, instead, of finding ways of using the information they contain to better effect. In this case, the existence of transcripts for entire TV series is essential.

However, this also raises the question about the role of post-airdate transcripts. Specifically, how can they become springboards for further analysis? One way of answering this question might be to explore the search functions of existing transcript archives, such as the West Wing Transcript resource, in terms of individual words or expressions. For example, we might investigate a prominent word in this episode, ‘flamingo’, using the West Wing Transcript’s internal search engine, which produces the result shown in Figure 14.


Figure 14: Websearch results compared: searching with site tools

While a significant and welcome development, this type of search inevitably confirms the limitations of such sites. For example, no co-text is explicitly provided. The consequence is that, in order to establish that ‘flamingo’ is not a reference to an animal, colour, or other contextually-determined meaning in any of these cases, but is, instead, always the US security service’s code name for one of the characters, C.J., the analyst has to open up each transcript individually and scroll down the script in a pdf document searching for this keyword.

While this is a possible, though awkward, procedure that establishes that all three occurrences of the word ‘flamingo’ do indeed refer to C.J., it becomes an unmanageable solution when using keyword searching to reconstruct thematic structures in The West Wing TV series ‒ the more sophisticated type of searching mentioned above. A search for the word ‘death’, for example, reveals its presence some 70-odd times in The West Wing series, far too many for the analyst to handle by opening up individual transcripts.

A much better way of using episode transcripts lies in reconstructing thematic patternings in combination with the use of an online concordancing tool, such as MWS-Web, designed expressly to search entire web archives and to report findings in a much more complete way. Thus, when we refine the search to ‘Flamingo is a’, the result in the centre of Figure 15 is (correctly) returned by MWS-Web in just one step, without the need to open up many different pdf documents.


Figure 15: Websearch results compared: searching with MWS-Web
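The difference between a bare keyword search and a concordance is the co-text returned with each hit. The following Python sketch of a keyword-in-context (KWIC) search is a generic illustration only: it is not MWS-Web’s implementation or interface, it works on texts already held in memory rather than on a web archive, and the function name, file names and sample wording are invented.

```python
import re

def kwic(transcripts, phrase, width=40):
    """Return (source, left co-text, match, right co-text) for each case-insensitive hit."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = []
    for name, text in transcripts.items():
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            hits.append((name, left, m.group(0), right))
    return hits

# Invented sample texts standing in for episode transcripts.
sample = {
    "episode_a.txt": "SPEAKER ONE: Is flamingo a bird or a code name?\nSPEAKER TWO: A code name.",
    "episode_b.txt": "SPEAKER ONE: Nothing relevant is said here.",
}
for name, left, hit, right in kwic(sample, "flamingo"):
    print(f"{name}: {left}[{hit}]{right}")
```

Because each hit arrives with its surrounding words, the analyst can judge the sense of the keyword without opening the individual transcripts, which is the practical advantage described above.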

Yet a further step is to compare this for completeness’ sake with a search for ‘Secret Service’ in other online post-airdate transcripts in order to track changing attitudes to Secret Services over time.

For example, an MWS-Web search of transcripts for the entire Dr. Who series, covering the fifty-odd years of its existence (1963-2013) and hundreds of episodes, reveals only four such references.

Similarly, in the 24-year period (1987-2010) of transcripts for The Simpsons, again a much longer airtime period than The West Wing (September 22, 1999, to May 14, 2006), there are only five references.

A further ‘twist’ is to compare these results for fictional stories with the real thing: the CNN’s Live Program, which comments on political events, much of it coming from the (real) White House. In this case there is a much higher incidence of results over a much shorter period, which closely reflects (and in part explains and justifies) the ironic comments made in this episode as regards code names used by the US Secret Service.


Figure 16: Thematic comparisons: Dr. Who, The Simpsons and CNN Live Event

We may conclude that this type of investigation rewards hours of patient transcription. The combination of transcripts and online concordancing tools allows patterns across TV series and news sites to be established very quickly. Without online post-airdate episode transcripts, none of this would be possible, which is why building on them, rather than replacing them, is so important. Creating perspectives that might otherwise have remained hidden is a significant result of this approach.

5. Face-offs: visual and verbal transcription

The combination of tools described in the previous section hints at the possibility of using visual-verbal transcripts to explore significant social and ethical problems, effectively turning computer-assisted transcript analysis into a support for sociolinguistic analysis. Let us explore this hypothesis a little further. Let us suppose that one of a film analyst’s needs is to reconstruct gender and power relationships in a TV series. How can we respond to the question: Men or women: who’s the boss in The West Wing?

On the upside, the resources available to film analysts on the Internet do not end with transcripts and video clips. Still images of face-to-face dyads appear on YouTube. They highlight the negotiation that takes place between two, rather than three or more, people in The West Wing series that has already been mentioned. They do so verbally and visually.

Web searches, such as the one reproduced in Figure 17, reveal the combined visual-verbal referencing widely used by the YouTube site – a key frame from a clip plus a ‘title’, in this case a quote. This may be a very small step towards combined visual-verbal referencing, but nevertheless one that is recognizable as a ‘step in the right direction’.



Figure 17: Multimodal referencing: verbal quotes + visual thumbnails

On the downside, such searches point to the imperfect and incomplete nature of the referencing that leaves much to be guessed[4]. In this case, for example, the question of who is being quoted, the man or the woman, remains teasingly ambiguous. The post-airdate transcript for The U.S. Poet Laureate episode (Third Season, Episode 61), like the YouTube clip itself, clears up who is doing the talking and who is ‘wearing the trousers’, as C.J.’s shouted reply is: So far up your ass!

Visual-verbal summaries are an emergent genre that focus on specific frames. We have already remarked on fans’ scrapbook-like selections of pin-ups with respect to the photo of Mrs. Landingham’s eyes. To this, we now need to add the audio pin-ups of podcasts and the thumbnails that appear in YouTube.

We argue that face shots are thus an emergent visual transcript, analogous to shot plans as regards the comments they draw. There is clearly a need to support the analysis of the In Excelsis Deo episode with a frame-based visual-verbal transcript – at least, for specific scenes.

In particular, within The West Wing series, Scene 1 functions as a summary of some of the conflicts that go beyond individual episodes. This points to the inherent duality that characterises TV series as referring simultaneously both to an entire series and to a single episode in a way that, with some exceptions, is not characteristic of feature films, lectures, documentaries and many other genres.

We thus suggest that Scene 1, the Prologue scene in every episode in The West Wing series, is a good candidate, already a professional summary in itself and thus ready for further development as a dedicated visual-verbal face-shot summary.

Figure 18 is a face-shot transcript for the ‘In Excelsis Deo’ Prologue. We note that it does indeed consist of close-ups on faces. In theory, as there are fifteen interactional turns (excluding the narrator), there ought to be a sequence of fifteen face shots, one for each speaker turn. This is not the case, as we see when we analyse the five main shots that actually occur, at the timepoints given in seconds in the bottom row of Figure 18.


Figure 18: Face-shot transcript of the ‘In Excelsis Deo’ Prologue

There are three types of exception: first, the camera tends to stay on the ‘victim’ of events and circumstances, in this case most clearly in the first and last frames, which allows the look of disappointment to be firmly fixed in the viewer’s mind. That is, the camera’s fixed position identifies with the victim whether or not s/he is the speaker. This happens, for example, in 1[1] and 1[4]. In the first case, the message is reinforced by Danny’s ludicrous holding of the goldfish bowl complete with goldfish.

Something different happens in Frames 3 and 4 (1[3]), providing a second type of exception. In this two-second mini-scene, the camera aligns with the listener (Bartlet) and looks over his shoulder at Leo speaking; it then switches to align with the listener (Leo) and looks over his shoulder at Bartlet, who says just one word because the visual and cognitive focus is still on Leo, one of the main victims in this episode, who continues to be so visually in 1[4].

Finally, at Second 18, another mini-scene (1[2]) starts, which further sets up the male-female power relationships aspect of the gender relations theme introduced in 1[1], but which uses a third type of exception: here Sam reaches out to grab Laurie’s sandwich, the camera’s focus being on the hand reaching out. This anticipates Sam’s and Josh’s failed attempts to recruit Laurie to their defence of Leo, a matter shown to be beyond them in Scene 17, mainly because Leo and Laurie, too, are portrayed as pawns in political manoeuvrings and hence victims, as well as promoters, of power games. They are like the goldfish in the bowl, looking for exit strategies but ultimately trapped.
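The mismatch between speaker turns and face shots described above could, in principle, be detected automatically once both are timepointed. What follows is only a minimal sketch of that idea: the `Turn` and `FaceShot` records, the `mismatched_shots` function and all the timings are hypothetical placeholders introduced for illustration, not measurements taken from the episode or from Figure 18.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the start of the scene
    end: float


@dataclass
class FaceShot:
    on_camera: str  # whose face the camera stays on
    start: float
    end: float


def mismatched_shots(turns, shots):
    """Return (shot, turn) pairs where the face on camera is not the speaker."""
    pairs = []
    for shot in shots:
        for turn in turns:
            overlaps = turn.start < shot.end and shot.start < turn.end
            if overlaps and turn.speaker != shot.on_camera:
                pairs.append((shot, turn))
    return pairs


# Placeholder data: speakers "A" and "B" and their timings are invented.
turns = [Turn("A", 0.0, 4.0), Turn("B", 4.0, 9.0), Turn("A", 9.0, 12.0)]
shots = [FaceShot("A", 0.0, 9.0), FaceShot("B", 9.0, 12.0)]

for shot, turn in mismatched_shots(turns, shots):
    print(f"{shot.start:.0f}-{shot.end:.0f}s: camera on {shot.on_camera}, "
          f"speaker is {turn.speaker}")
```

Each pair returned marks a stretch in which the camera stays on a listener rather than the speaker, which is the kind of exception discussed in this section; deciding whether that listener is a ‘victim’ remains the analyst’s task.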

Part III: Defining a TV series transcript: a question of perspective

Defining a TV series transcript ought to be a fairly straightforward matter. Most dictionary entries will typically explain that a transcript is “a written, printed, or typed copy of words that have been spoken” in radio programmes, court proceedings or political speeches. The rise of speech-to-text software has transformed the work of transcription in medicine and journalism in the Internet age but has not affected the underlying goal, to turn spoken words into written ones.

However, this type of definition stacks up less well with the digital transcripts of TV series whose referencing systems contain a more substantial reference-oriented metatextual level, not mentioned in dictionary definitions, which distinguishes them from many other types of transcript. The virtual communities of the Internet age have accelerated this process – for example in their promotion of visual as well as verbal referencing – placing TV series transcripts on a quite different evolutionary track as compared with other types of transcript.

Progress along this ‘new track’ is likely to grow in the digital age, further realising the TV transcript genre’s multisemiotic potential, at both textual and metatextual levels. In this part of the article, we explore this hypothesis, suggesting that pre-airdate genres’ high degree of visual-verbal specialisation could well be paralleled in the digital age by post-airdate genres.

6. Transcripts as a post-airdate genre

Film-making involves a cross-modal, verbal-to-visual transposition from a written to a visual story, a form of ‘translation’ whose infinite complexity is a good illustration of how Kress’s concept of transduction can be used, in social semiotic theory and beyond, to refer to the remaking of meaning across modes (Kress 1997: 41).

In fact, a TV series transcript is just one of the many text types that are part of the wider set of texts used by different communities in the creation, viewing and interpretation of a TV series. Each of these texts reworks and builds on the meaning of the previous step in the chain. The process of turning a screenwriter’s initial idea into a film is thus the first step in a well-established series of text types that exist in relation to each other in a step-like transition, each dependent on the previous stage in the chain for its existence and each an essential input for the subsequent stage.

The starting point is, of course, the screenplay. As Gurskis explains:

An exercise in visual storytelling […] isn’t simply a matter of shot selection and composition. […] Only the most inexperienced screenwriter includes camera directions in a screenplay because such things are the responsibility of the director and director of photography once the film is in production. […]  The fundamentally visual nature of film narrative has led to an interesting paradox. The screenwriter must fully imagine the film that he’s writing. But – and here’s where the paradox comes in – only a small part of what the screenwriter imagines should actually appear in the screenplay, which must evoke a sense of place and character rather than catalogue it down to the minutest detail. What’s more, only a small part of what appears in the screenplay will ever make it to the screen in anything like its original form. (Gurskis, 2007: Introduction, xiii)

The film industry uses a stage-by-stage process involving various visual genres, such as shot plans, storyboards, animatics and previsualizations, as well as verbal genres that go beyond screenplays. Thus, as well as being turned into the visual genres used mainly by the film crew, a screenplay will also be turned into a film script to be used by actors as a guide to the delivery of their lines.

The left-hand column of Figure 19 shows a small fragment from Sorkin’s final draft of the script for The West Wing ‘Pilot’. Even this is not the final version of the dialogue, as the post-airdate transcript in the right-hand column shows, with its record of what was actually said in the filmed episode.

The contrast exemplified in Figure 19 is particularly revealing. There are obvious similarities between the two texts. Yet despite this, a post-airdate transcript is rather different from a pre-airdate script, in both form and function. With its clear description of the movements of characters with walk-on parts, the pre-airdate script can certainly lay greater claim to being a this-is-what-the-actors-do-as-well-as-say representation than the post-airdate type shown on the right-hand side of Figure 19.

More significantly, this comparison shows that, while pre-airdate scripts and post-airdate transcripts may look the same, they do not do the same job. Nor do they have the same effect. As regards emotional knife-twisting, the final dialogue in the post-airdate transcript is far more biting.


Figure 19: Pre-airdate draft (left) vs. post-airdate transcript (right)

Even so, Figure 19 shows that beyond references to where it takes place, who is present and who walks down the corridor, few visual aspects of this scene are recorded in either the transcript or the script.

With the final draft script, this is not a problem as scripts are not the main source of visual representations of the episode being filmed, a role carried out instead by pre-airdate storyboards, a form of visual transcript that supports the process of turning screenplays into finished films.

Various types of storyboards were, in fact, adopted extensively in the filming of The West Wing. Simon reports:

As in films if a [TV series] production has stunts or FX /special effects, producers and directors may want boards to work them out. Sometimes the openings are boarded. I boarded the first opening for Nickelodeon’s Clarissa Explains It All to provide the technicians, who were doing the camera moves and special effects, with a visual of the creator’s idea. Amblin’s seaQuest DSV, The Cape, Star Trek, Babylon 5, and The West Wing all used storyboards for special effects and stunt scenes. (Simon 2007: 219)

Indeed, Simon’s account also includes examples of storyboards (not shown here for copyright reasons) used in The West Wing (Simon 2007: 28-9). Storyboards are a sequence of illustrations that guide film-makers when they shoot a scene, as they indicate the scene’s dynamics, its unfolding in time. In their turn, they incorporate shot plans, which illustrate the spatial disposition of actors and props (see, for example, Papert’s description of the shot plan for the Feliz Navidad scene given above in Section 2.3).

All this illustrates our observation that these visual forms of transcript, together with a discourse-oriented screenplay, constitute an indispensable set of pre-airdate intertexts collectively guiding the production of films. Though functioning, collectively, as a single, integrated multisemiotic transcript, with cross-referencing mechanisms, each contributes individually, with its own internal coherence, to specific functions in the film-construction process.

From a historical perspective, each also represents a step in the evolution of pre-airdate transcripts. As Simon further notes:

Previz, or previsualization, refers to the use of computer-generated sequences to replicate soon-to-be shot live-action sequences. They are the modern-day animatic. Most previz studios work from storyboards provided by a production, but they do have to produce boards on their own at times.  (Simon 2007: 219)[5]

All this throws light on the issue of whether TV series post-airdate transcripts should be defined as an independent genre or as part of a set of related genres. Of course, a TV series transcript can never be an entirely independent genre as, somewhat like a film subtitle, it is intrinsically tied to the film it transcribes. However, while the story of pre-airdate visual and verbal transcripts is one of increasing integration and computerization, including sophisticated simulations, the story of post-airdate transcripts appears to lag far behind.

However, appearances are also deceptive. Like pre-airdate texts, TV series transcripts exist in relationship with other texts. They are neither starting points, nor endpoints in the production of post-airdate texts. Alas, in contrast to the planned sequence of pre-airdate texts, with post-airdate texts this relationship is never explicit. It can, however, be detected.

This, however, requires careful detective work that sifts through the texts produced by TV series analysts, fans in particular. Andrea Howat’s and Sallie Gregory’s joint podcast recap of the ‘In Excelsis Deo’ episode, mentioned above in Section 2.3, is instructive in this sense.

Its scene-by-scene commentary on Sorkin’s knife-twisting of viewers’ emotions is peppered with ad lib remarks about the need to check up on details of what the dialogue actually says.

The ‘notes’ the podcasters refer to would appear to be rather more copious than perhaps they would like us to believe. The extensive and highly accurate quoting of Sorkin’s dialogue suggests their reliance on a written transcript (regardless of who actually wrote it).

The same goes for their reconstruction of the visual story. With great insight, they point out that Josh watches Donna read the note in the Christmas present he gives her and make the following comment:

And then Josh gives Donna the book, the Christmas book and he wrote a note inside. It’s just a wonderful a moment [….] but the best moment is after he gives it, when he leans around his door in his office and watches her read it again. That’s the real kicker of the moment. [..] I don’t know what the note says but watching him watching her that’s what does it. […] And you know that they never do those extra shots for no reason. Like time is money. It always tells part of the story. It’s definitely a building moment.  (Wingin It: The West Wing Podcast series. Podcast 80. Timepoint: 30.48)

Were all the eighty-nine podcasts in the Wingin It: The West Wing Podcast series, all equally insightful and detailed, really based on memories and recollections? The level of detail provided suggests otherwise and implies at the very least that the ‘note taking’ referred to in this podcast series, while not as complex a process as shooting a film, is nevertheless far from unsubstantial.

In keeping with what has been stated in multimodal research about transduction (Kress 1997) and transmedia (Lemke 2013), we note that, in contemporary society, many different text types are constantly being merged and brought together in our daily activities, as it were, ‘under one roof’. In this view, we can conceive of a transcript as a master document with a strong potential to interlock with other Internet genres.

8. End points or starting points?

Despite the fact that The West Wing series is dialogue rich, Scene 21, the final scene in the In Excelsis Deo episode, contains no dialogue. As further proof that TV episode transcripts go well beyond the word-only definitions of a transcript, the episode transcript includes the following summary of what is going on:

The episode ends with a montage of juxtaposing shots of the military funeral for Walter Hufnagle and the activity in THE MURAL ROOM. Throughout, we can hear the boys’ choir sing ‘Little Drummer Boy.’ The hearse arrives at ARLINGTON CEMETERY, SECTION 43. Toby, Mrs. Landingham, and George get out of the car. George is holding a bouquet of flowers. The honor guard carries the casket to the grave. They begin the ritual of folding the flag that covered the casket. THE MURAL ROOM. Sam and C.J. join Mandy and Bartlet. Then, Charlie and Leo join. ARLINGTON CEMETERY. The honor guard starts to shoot their rifles in salute. Toby flinches with the first shot. Mrs. Landingham with the second. THE MURAL ROOM. Donna and Josh join the group. ARLINGTON CEMETERY. The honor guard starts to hand the tightly folded flag to Toby who gestures uncomfortably to George, who is then presented with the flag. George gently places the flowers on the casket. They all stand to leave.

The scene lasts for just under 4 minutes, which raises questions about this episode’s suitability for disabled viewers, despite the praise usually given to The West Wing by the blind and partially sighted:

I grew up on the great classic comedies of the 1970s: “All in the Family,” “The Mary Tyler Moore Show,” and “M.A.S.H.” I spent far too many summer vacation hours lazily watching programs from “Love Boat” to gameshows. […] I was a pretty typical American TV watcher. Yet, there was always a disappointing aspect to TV programs […] There was always the question: “What’s going on?” And too often, there wasn't anyone willing or able to answer it for me. After all, as a blind person, I missed the visual information these programs presented: telltale facial expressions, audience laughter not triggered by dialogue, the silent entrance of a new character and, of course, the complete shift of setting. […] As a consequence, I have what may be unhealthy love for the work of Aaron Sorkin, the screenwriter whose shows from “Sports Night,” to “West Wing,” to “Studio 60” were heavily dialogue-driven. (Paul Schroeder, Watching TV Blind: A Love-Hate Relationship. Retrieved 05.05.2016)

There is a clear need, and ample time, for an AD (audio-description) voiceover to be incorporated as part of any AD adaptation of this episode. This will inevitably entail re-use both of the episode transcript (regardless of whether the descriptive summary of Scene 21 reported above is actually used) and, quite possibly, of the supplementary table transcripts we have provided.
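If a transcript were timepointed, the stretches in which an AD voiceover could be placed might be located mechanically. The sketch below is offered only as an illustration under stated assumptions: the function `silent_gaps`, the two-second threshold and all timings are hypothetical, and the figure of 235 seconds merely stands in for a scene of ‘just under 4 minutes’.

```python
def silent_gaps(turns, scene_start, scene_end, min_gap=2.0):
    """Return (start, end) stretches without dialogue lasting at least min_gap seconds.

    turns is a list of (start, end) timepoints, in seconds, for each spoken turn.
    """
    gaps = []
    cursor = scene_start
    for start, end in sorted(turns):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if scene_end - cursor >= min_gap:
        gaps.append((cursor, scene_end))
    return gaps


# A dialogue-free scene such as Scene 21: the whole scene is available for AD.
print(silent_gaps([], 0.0, 235.0))

# A hypothetical scene with two closely spaced turns.
print(silent_gaps([(3.0, 6.0), (6.5, 10.0)], 0.0, 20.0))
```

A real AD workflow would also need to take account of music, sound effects and the describer’s reading speed, none of which is modelled here.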

Although concerned with the very different genre of Internet lectures, the adaptation of TED Talks (Cámara and Espasa 2011) to the needs of AD is a good example of the fact that in today’s digital world, a transcript is no longer the final stage in the film-text production chain.

Hence, in their paper, Cámara and Espasa (2011) present what they call AD units which are, in fact, modified table-based versions of original TED Talk transcripts re-arranged in such a way as to recast and reconstrue them as intermediate texts in the transduction process.

Cámara and Espasa’s goal in their paper is to show how AD adaptation might be achieved. The examples they give are the basis for an analysis of the crucial problems that arise. Specifically, the result of their work is a reworked transcript divided into units, with each unit or step in the meaning-making process represented as a table consisting of rows some of which present what is said in the lecture and others what is shown. Subsequent rows assess the need for AD to describe those visuals not described by the speaker and propose the AD text to be added. 
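To make the row structure just described more concrete, a table of this kind can be sketched as a simple record. This is our own illustrative reconstruction, not Cámara and Espasa’s format: the class `ADUnit`, its field names and the `render` helper are assumptions, and the example content is drawn from the Scene 21 summary quoted above rather than from a TED Talk.

```python
from dataclasses import dataclass


@dataclass
class ADUnit:
    said: str                   # what is said at this step
    shown: str                  # what is shown at this step
    described_by_speaker: bool  # does the speech already describe the visuals?
    ad_text: str = ""           # proposed audio description, if any

    def needs_ad(self):
        """AD is needed when something is shown that the speech does not describe."""
        return bool(self.shown) and not self.described_by_speaker


def render(unit):
    """Lay the unit out as a four-row plain-text table."""
    rows = [
        ("Said", unit.said or "-"),
        ("Shown", unit.shown or "-"),
        ("AD needed", "yes" if unit.needs_ad() else "no"),
        ("Proposed AD", unit.ad_text or "-"),
    ]
    return "\n".join(f"{label:<12}| {value}" for label, value in rows)


unit = ADUnit(
    said="",  # Scene 21 contains no dialogue
    shown="The honor guard carries the casket to the grave.",
    described_by_speaker=False,
    ad_text="The honor guard carries the casket to the grave.",
)
print(render(unit))
```

Keeping ‘said’ and ‘shown’ as separate rows is what allows the need for AD to be assessed unit by unit, rather than for the transcript as a whole.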

In some ways, the tables that Cámara and Espasa (2011) provide are akin to the transcripts described above. Quite apart from raising questions about what AD adaptation this West Wing episode would require, they provide further evidence of film analysts’ redefinition of transcripts. Their evidence shows that film transcripts are neither solely end-products, nor solely associated with writing down what was said. They are much more than this.

7. Conclusions

The study of transcripts as a genre is in its infancy. With its rather tentative exploration and exemplification of nascent forms, such as manually produced multisemiotic transcripts and software-produced storyboards, this paper has suggested that transcript culture is changing.

While online repositories of digital videos such as YouTube, TED Talks and the various fan sites quoted all point to the need to take a new look at transcripts, changing transcript culture to accommodate specific desiderata, such as timepointing, is no easy task. Typically, the picture is uneven. While TED Talks transcripts are exemplary in this respect, those of TV series are not.

To encourage greater awareness of this unevenness, and the need for adjustments, this article has attempted to describe the wider picture, by taking a step backwards and providing a behind-the-scenes view of the complex world of film scripts and transcripts, as this helps to identify and clarify the gap between what most transcripts currently offer and what film analysts really need.

Reference systems used in scripts and transcripts will inevitably evolve and certainly need to do so if they are to meet the needs of today’s sophisticated text management society. This article has thus explored the characteristics of post-airdate TV transcripts in relation to the videos that they transcribe as well as in relation to the pre-airdate genres that guide the film production process.

This helps explain why cultural barriers, rather than technical ones, are the more significant consideration. In particular, the article has suggested the need for research that brings together examples of the awareness (among fans in particular) that the TV transcript genre is unlike many other types of transcript and needs to be rethought.

As well as redefining transcripts, no longer seen as an isolated text type but rather as one that interacts with other related genres, the article has proposed an integrational approach in which old and new work together. In this view, new types of tools, including new forms of transcripts but also computer-generated post-airdate storyboards, support existing post-airdate transcripts in a way that encourages a series-analysis perspective.

However, what matters in the end is whether the reader can reflect on the ideas about, and illustrations of, transcript analysis expressed in this article and relate them to the other articles in this Special Issue of inTRAlinea and, beyond that, to the wider issue of the ways in which our society re-contextualizes films and adapts them through the linguistic and multimodal ‘engineering’ called transcription.

Such reflection could very well extend to an understanding of the underlying processes governing society’s access to texts. This will involve a clear understanding of the functions of support texts (subtitling, audio description, annotating, referencing, storyboarding and so on). Ultimately, this article has attempted to encourage readers to reflect on how these textual processes interact and mutually affect each other from a variety of perspectives.


Francesca Coccetta (University of Venice), Deirdre Kantz (University of Pavia), Ivana Marenzi (Leibniz University of Hannover), Maria Grazia Sindoni (University of Messina) and Chris Taylor (University of Trieste) are thanked for comments on earlier drafts of this article. Any remaining shortcomings and oversights are to be attributed to the author.


Baldry, Anthony (2000) “English in a visual society: comparative and historical dimensions in multimodality and multimediality” in Multimodality and Multimediality in the distance learning age, Anthony Baldry (ed.), Campobasso, Palladino: 41-89.

Baldry, Anthony (2004, [2015]) “Phase and transition, type and instance: patterns in media texts as seen through a multimodal concordancer” in Multimodal Discourse Analysis, London and New York: Continuum, Kay O’Halloran (ed.): 83-108. Reprinted in Sigrid Norris (ed.) Multimodality: Volume II Multimodality – The Beginning of a New Area of Research: 2000–5, London and New York: Routledge.

Baldry, Anthony (2007) “The Role of Multimodal Concordances in Multimodal Corpus Linguistics” in New Directions in the Analysis of Multimodal Discourse, Terry D. Royce and Wendy L. Bowcher (eds), Mahwah, New Jersey, Laurence Erlbaum: 173-93.

Baldry, Anthony and Paul J. Thibault (2001) “Towards Multimodal Corpora” in Corpora in the description and teaching of English, Guy Aston and Lou Burnard (eds), Bologna: Cooperativa Libraria Universitaria Editrice.

Baldry, Anthony and Paul J. Thibault (2006) Multimodal transcription and text analysis, London, Equinox.

Cámara, Lidia, and Eva Espasa (2011) “The Audio Description of Scientific Multimedia”, The Translator 17, no. 2: 415-37.

Coccetta, Francesca (2016 in press) Access to Discourse in English through Text Analysis: A Preparatory Guide for Undergraduate Students. Como: Ibis.

Gurskis, Daniel (2007) Short Screenplay: Your Short Film from Concept to Production, Andover, Cengage Learning.

Kress, Gunther (1997) Before Writing: Rethinking the paths to literacy, London and New York, Routledge.

Lemke, Jay (2013) “Transmedia Traversals: Marketing meaning and identity” in Readings in Intersemiosis and Multimedia, Elena Montagna (ed.), Como, Ibis: 13-33.

Li, Stan Z. and Anil K. Jain (eds) (2011) Handbook of Face Recognition. Second Edition, New York, Springer-Verlag.

Lombardo, Linda (2001) Selling it and telling it: a functional approach to the discourse of print ads and TV news, Roma, LUISS Guido Carli.

McCabe, Janet (2013) The West Wing, Detroit (MI), Wayne State University Press.

Simon, Mark (2007) Storyboards: Motion in Art, Third Edition, Oxford and Burlington (MA), Focal Press.

Sindoni, Maria Grazia (2013) Spoken and Written Discourse in Online Interactions. A Multimodal Approach, Como, Ibis.

Smith, Greg M. (2003) “The Left Takes Back the Flag: The Steadicam, the Snippet and the Song in the West Wing’s In Excelsis Deo” in The West Wing: the American presidency as television drama, Peter C. Rollins and John E. O’Connor (eds), Syracuse, New York, Syracuse University Press: 125-35.

Snyder, Joel (2008) “The visual made verbal” in The Didactics of Audiovisual Translation, Jorge Diaz-Cintas (ed.), Amsterdam, John Benjamins: 191-98.

Taibi, Davide, Saniya Chawla, Stefan Dietze, Ivana Marenzi and Besnik Fetahu (2015) “Exploring TED Talks as linked data for education”, British Journal of Educational Technology, bjet.12283/abstract (last access 10.07.2015).

Taylor, Chris and Anthony Baldry (2001) “Computer assisted text analysis and translation: a functional approach in the analysis and translation of advertising texts” in Exploring Translation and Multilingual Text Production Beyond Content, Erich Steiner and Collin Yallop (eds), Berlin and New York, Mouton de Gruyter: 277-305.

Thibault, Paul (2000) “The multimodal transcription of a television advertisement: theory and practice” in Multimodality and Multimediality in the Distance Learning Age, Anthony Baldry (ed.), Campobasso, Palladino: 311-85.

Vasta, Nicoletta (2001) Rallying Voters: New Labour’s verbal-visual strategies, Padua, CEDAM.


[1] This chapter makes use of transcripts for Episode 1.10 In Excelsis Deo and The West Wing pilot, both of which can be found in the West Wing Searchable Episode Transcripts section of the West Wing Transcripts website, described there as ‘dedicated to providing a resource for loyal fans of NBC’s The West Wing’. The screenplay script for The West Wing pilot can be found at scripts/West_Wing_Pilot.pdf. Both sites last accessed 09/04/2016.

[2] By definition, TV series fans want to explore the characters’ changing relationships and beliefs over time and talk about them with other fans. In the case of The West Wing, this is transcendental in nature – beyond rather than over time ‒ as fans are still producing their own West Wing stories long after the end of the TV series. See the West Wing Fanfiction Central database for stories (fanon) written by fans about the characters appearing in The West Wing series that are extensions to the episodes (canon).

[3] Developed by a team involving the Consiglio Nazionale delle Ricerche, Istituto per le Tecnologie Didattiche (CNR-ITD) Palermo, the L3S Research Center, Leibniz University of Hannover, Dip DISGESI, University of Messina, MWS-Web is the online follow-up to the MWS and MWS-ACE tools developed in the Living Knowledge and Act projects. For further information contact Davide Taibi. Sites last accessed 10/07/2015.

[4] Both MWS-Web and West Wing Transcript search engines will locate this quote through the word ‘outrank’, which appears 4 times in the entire series. Though each search tool provides different and very useful information, they perform differently. Of the two, only MWS-Web can ignore the spelling mistake in the quote (‘technichally’) and find the exact line with a single search, that is without the need for secondary searches or query refinement.

[5] As Simon further explains: ‘Storyboards may also be used to test the viability of a finished commercial product without the great expense of shooting it. These drawings are shot on video and edited together just like a live shoot. This footage is then dubbed with music and voices. This is called an animatic […]. The animatic may then be shown to test groups around the country. Productions benefit from boards in many ways. They may cost money in postproduction, but that cost is much less than the hidden expenses caused by a lack of proper planning or any miscommunication’ (Simon, 2007:29). For a side-by-side comparison of an animatic and a finished cartoon and exemplification of the function of animatics as transcripts that define a film’s overall rhythms, in particular, spatial and temporal relationships in narrative sequences, see The Boondocks: The Complete Third Season Episode Clip - Fried Chicken Flu Animatic ( =sqARB4gNj3w). Site last accessed 29/04/2016.

About the author(s)

Anthony Baldry, who joined the University of Messina in 2008 as Full Professor in English language &  translation, has participated in many inter-university projects that have led to publications on: multimodality; multimodal corpus linguistics; digital genres & digital literacy, computer-based text analysis; Internet & its evolution; web-as-multimodal corpus software tools; captioning & subtitling tools; syllabus design; technology-enhanced learning; systemic-functional approaches to text analysis & transcription; testing including self-assessment & computer-managed testing; scientific & medical English.
