New(s) Readers: Multimodal Meaning-Making in AJ+ Captioned Video




Keywords: captions, multimodality, attention economy, online news video, Al Jazeera, Facebook

How to Cite

Burwell, C. (2017). New(s) Readers: Multimodal Meaning-Making in AJ+ Captioned Video. M/C Journal, 20(3).
Vol. 20 No. 3 (2017): caption
Published 2017-06-21


In 2013, Facebook introduced autoplay video into its newsfeed. In order not to produce sound disruptive to hearing users, videos were muted until a user clicked on them to enable audio. This move, recognised as a competitive response to the popularity of video-sharing sites like YouTube, has generated significant changes to the aesthetics, form, and modalities of online video. Many video producers have incorporated captions into their videos as a means of attracting and maintaining user attention. Of course, captions are not simply a replacement or translation of sound, but have instead added new layers of meaning and changed the way stories are told through video.

In this paper, I ask how the use of captions has altered the communication of messages conveyed through online video. In particular, I consider the role captions have played in news reporting, as online platforms like Facebook become increasingly significant sites for the consumption of news. One of the most successful producers of online news video has been Al Jazeera Plus (AJ+). I examine two recent AJ+ news videos to consider how meaning is generated when captions are integrated into the already multimodal form of the video—their online reporting of Australian versus US healthcare systems, and the history of the Black Panther movement. I analyse interactions amongst image, sound, language, and typography and consider the role of captions in audience engagement, branding, and profit-making. Sean Zdenek notes that captions have yet to be recognised “as a significant variable in multimodal analysis, on par with image, sound and video” (xiii). Here, I attempt to pay close attention to the representational, cultural and economic shifts that occur when captions become a central component of online news reporting. I end by briefly enquiring into the implications of captions for our understanding of literacy in an age of constantly shifting media.

Multimodality in Digital Media

Jeff Bezemer and Gunther Kress define a mode as a “socially and culturally shaped resource for meaning making” (171). Modes include meaning communicated through writing, sound, image, gesture, oral language, and the use of space. Of course, all meanings are conveyed through multiple modes. A page of written text, for example, requires us to make sense through the simultaneous interpretation of words, space, colour, and font. Media such as television and film have long been understood as multimodal; however, with the appearance of digital technologies, media’s multimodality has become increasingly complex. Video games, for example, demonstrate an extraordinary interplay between image, sound, oral language, written text, and interactive gestures, while technologies such as the mobile phone combine the capacity to produce meaning through speaking, writing, and image creation.

These multiple modes are not simply layered one on top of the other, but are instead “enmeshed through the complexity of interaction, representation and communication” (Jewitt 1). The rise of multimodal media—as well as the increasing interest in understanding multimodality—occurs against the backdrop of rapid technological, cultural, political, and economic change. These shifts include media convergence, political polarisation, and increased youth activism across the globe (Herrera), developments that are deeply intertwined with uses of digital media and technology. Indeed, theorists of multimodality like Jay Lemke challenge us to go beyond formalist readings of how multiple modes work together to create meaning, and to consider multimodality “within a political economy and a cultural ecology of identities, markets and values” (140).

Video’s long history as an inexpensive and portable way to produce media has made it an especially dynamic form of multimodal media. In 1974, avant-garde video artist Nam June Paik predicted that “new forms of video … will stimulate the whole society to find more imaginative ways of telecommunication” (45). Fast forward more than 40 years, and we find that video has indeed become an imaginative and accessible form of communication. The cultural influence of video is evident in the proliferation of video genres, including remix videos, fan videos, Let’s Play videos, video blogs, live stream video, short form video, and video documentary, many of which combine semiotic resources in novel ways. The economic power of video is evident in the profitability of video sharing sites—YouTube in particular—as well as the recent appearance of video on other social media platforms such as Instagram and Facebook.

These platforms constitute significant “sites of display.” As Rodney Jones notes, sites of display are not merely the material media through which information is displayed. Rather, they are complex spaces that organise social interactions—for example, between producers and users—and shape how meaning is made. Certainly we can see the influence of sites of display by considering Facebook’s 2013 introduction of autoplay into its newsfeed, a move that forced video producers to respond with new formats. As Edson Tandoc and Julian Maitra write, news organisations have been forced to “play by Facebook’s frequently modified rules and change accordingly when the algorithms governing the social platform change” (2). AJ+ has been considered one of the media companies that has most successfully adapted to these changes, an adaptation I examine below. I begin by taking up Lemke’s challenge to consider multimodality contextually, reading AJ+ videos through the conceptual lens of the “attention economy,” a lens that highlights the profitability of attention within digital cultures. I then follow with analyses of two short AJ+ videos to show captions’ central role, not only in conveying meaning, but also in creating markets, and communicating branded identities and ideologies.

AJ+, Facebook and the New Economies of Attention

The Al Jazeera news network was founded in 1996 to cover news of the Arab world, with a declared commitment to give “voice to the voiceless.” Since that time, the network has gained global influence, yet many of its attempts to break into the American market have been unsuccessful (Youmans). In 2013, the network acquired Current TV in an effort to move into cable television. While that effort ultimately failed, Al Jazeera’s purchase of the youth-oriented Current TV nonetheless led to another, surprisingly fruitful enterprise, the development of the digital media channel Al Jazeera Plus (AJ+). AJ+ content, which is made up almost entirely of video, is directed at 18- to 35-year-olds. As William Youmans notes, AJ+ videos are informal and opinionated, and, while staying consistent with Al Jazeera’s mission to “give voice to the voiceless,” they also take an openly activist stance (114). Another distinctive feature of AJ+ videos is the way they are tailored for specific platforms. From the beginning, AJ+ has had particular success on Facebook, a success that has been recognised in popular and trade publications. A 2015 profile on AJ+ videos in Variety (Roettgers) noted that AJ+ was the ninth biggest video publisher on the social network, while another story (Reid, “How AJ+ Reaches”) that same year commented on the remarkable extent to which Facebook audiences shared and interacted with AJ+ videos. These stories also note the distinctive video style that has become associated with the AJ+ brand—short, bold captions; striking images that include photos, maps, infographics, and animations; an effective opening hook; and a closing call to share the video.

AJ+ video producers were developing this unique style just as Facebook’s autoplay was being introduced into newsfeeds. Autoplay—a mechanism through which videos are played automatically, without action from a user—predates Facebook’s introduction of the feature. Indeed, autoplay on Internet sites had already begun to raise the ire of many users before its appearance on Facebook (Oremus, “In Defense of Autoplay”). By playing video automatically, autoplay wrests control away from users, and causes particular problems for users of assistive technologies. Reporting on Facebook’s decision to introduce autoplay, Josh Constine notes that the company was looking for a way to increase advertising revenues without increasing the number of actual ads. Encouraging users to upload and share video normalises the presence of video on Facebook, and opens up the door to the eventual addition of profitable video ads. Ensuring that video plays automatically gives video producers an opportunity to capture the attention of users without the need for them to actively click to start a video. Further, ensuring that the videos can be understood when played silently means that both deaf users and users who are situationally unable to hear the audio can also consume their content in any kind of setting.

While Facebook has promoted its introduction of autoplay as a benefit to users (Oremus, “Facebook”), it is perhaps more clearly an illustration of the carefully crafted production strategies used by digital platforms to capture, maintain, and control attention. Within digital capitalism, attention is a highly prized and scarce resource. Michael Goldhaber argues that once attention is given, it builds the potential for further attention in the future. He writes that “obtaining attention is obtaining a kind of enduring wealth, a form of wealth that puts you in a preferred position to get anything this new economy offers” (n.p.). In the case of Facebook, this offers video producers the opportunity to capture users’ attention quickly—in the time it takes them to scroll through their newsfeed. While this may equate to only a few seconds, those few seconds hold, as Goldhaber predicted, the potential to create further value and profit when videos are viewed, liked, shared, and commented on.

Interviews with AJ+ producers reveal that an understanding of the value of this attention drives the organisation’s production decisions, and shapes content, aesthetics, and modalities. They also make it clear that it is captions that are central in their efforts to engage audiences. Jigar Mehta, former head of engagement at AJ+, explains that “those first three to five seconds have become vital in grabbing the audience’s attention” (quoted in Reid, “How AJ+ Reaches”). While early videos began with the AJ+ logo, that was soon dropped in favour of a bold image and text, a decision that dramatically increased views (Reid, “How AJ+ Reaches”). Captions and titles are not only central to grabbing attention, but also to maintaining it, particularly as many audience members consume video on mobile devices without sound. Mehta tells an editor at the Nieman Journalism Lab:

we think a lot about whether a video works with the sound off. Do we have to subtitle it in order to keep the audience retention high? Do we need to use big fonts? Do we need to use color blocking in order to make words pop and make things stand out? (Mehta, qtd. in Ellis)

An AJ+ designer similarly suggests that the most important aspects of AJ+ videos are brand, aesthetic style, consistency, clarity, and legibility (Zou). While questions of brand, style, and clarity are not surprising elements to associate with online video, the matter of legibility is. And yet, in contexts where video is viewed on small, hand-held screens and sound is not an option, legibility—as it relates to the arrangement, size and colour of type—does indeed take on new importance to storytelling and sense-making.

While AJ+ producers frame the use of captions as an innovative response to Facebook’s modern algorithmic changes, it makes sense to also remember the significant histories of captioning that their videos ultimately draw upon. This lineage includes silent films of the early twentieth century, as well as the development of closed captions for deaf audiences later in that century. Just as he argues for the complexity, creativity, and transformative potential of captions themselves, Sean Zdenek also urges us to view the history of closed captioning not as a linear narrative moving inevitably towards progress, but as something far more complicated and marked by struggle, an important reminder of the fraught and human histories that are often overlooked in accounts of “new media.” Another important historical strand to consider is the centrality of the written word to digital media, and to the Internet in particular. As Carmen Lee writes, despite public anxieties and discussions over a perceived drop in time spent reading, digital media in fact “involve extensive use of the written word” (2). While this use takes myriad forms, many of these forms might be seen as connected to the production, consumption, and popularity of captions, including practices such as texting, tweeting, and adding titles and catchphrases to photos.

Captions, Capture, and Contrast in Australian vs. US Healthcare

On May 4, 2017, US President Donald Trump was scheduled to meet with Australian Prime Minister Malcolm Turnbull in New York City. Trump delayed the meeting, however, in order to await the results of a vote in the US House of Representatives to repeal the Affordable Care Act—commonly known as Obamacare. When he finally sat down with the Prime Minister later that day, Trump told him that Australia has “better health care” than the US, a statement that, in the words of a Guardian report, “triggered astonishment and glee” amongst Trump’s critics (Smith). In response to Trump’s surprising pronouncement, AJ+ produced a 1-minute video extending Trump’s initial comparison with a series of contrasts between Australian government-funded health care and American privatised health care (Facebook, “President Trump Says…”). The video provides an excellent example of the role captions play in both generating attention and creating the unique aesthetic that is crucial to the AJ+ brand.

The opening frame of the video begins with a shot of the two leaders seated in front of the US and Australian flags, a diplomatic scene familiar to anyone who follows politics. The colours of the picture are predominantly red, white and blue. Superimposed on top of the image is a textbox containing the words “How does Australia’s healthcare compare to the US?” The question appears in white capital letters on a black background, and the box itself is heavily outlined in yellow. The white and yellow AJ+ logo appears in the upper right corner of the frame. This opening frame poses a question to the viewer, encouraging a kind of rhetorical interactivity. Through the use of colour in and around the caption, it also quickly establishes the AJ+ brand. This opening scene also draws on the Internet’s history of humorous “image macros”—exemplified by the early LOL cat memes—that create comedy through the superimposition of captions on photographic images (Shifman).

Captions continue to play a central role in meaning-making once the video plays. In the next frame, Trump is shown speaking to Turnbull. As he speaks, his words—“We have a failing healthcare”—drop onto the screen (Image 1). The captions are an exact transcription of Trump’s awkward phrase and appear centred in caps, with the words “failing healthcare” emphasised in larger, yellow font. With or without sound, these bold captions are concise, easily read on a small screen, and visually dominate the frame. The next few seconds of the video complete the sequence, as Trump tells Turnbull, “I shouldn’t say this to our great gentleman, my friend from Australia, ’cause you have better healthcare than we do.” These words continue to appear over the image of the two men, still filling the screen. In essence, Trump’s verbal gaffe, transcribed word for word and appearing in AJ+’s characteristic white and yellow lettering, becomes the video’s hook, designed to visually call out to the Facebook user scrolling silently through their newsfeed.

Image 1: “We have a failing healthcare.”

The middle portion of the video answers the opening question, “How does Australia’s healthcare compare to the US?” There is no verbal language in this segment—the only sound is a simple synthesised soundtrack. Instead, captions, images, and spatial design, working in close cooperation, are used to draw five comparisons. Each of these comparisons uses the same format. A title appears at the top of the screen, with the remainder of the screen divided in two. The left side is labelled Australia, the right U.S. Underneath these headings, a representative image appears, followed by two statistics, one for each country. For example, the third comparison contrasts Australian and American infant mortality rates (Image 2). The left side of the screen shows a close-up of a mother kissing a baby, with the superimposed caption “3 per 1,000 births.” On the other side of the yellow border, the American infant mortality rate is illustrated with an image of a sleeping baby superimposed with a corresponding caption, “6 per 1,000 births.” Without voiceover, captions do much of the work of communicating the national differences. They are, however, complemented and made more quickly comprehensible through the video’s spatial design and its subtly contrasting images, which help to visually organise the written content.

Image 2: “Infant mortality rate”

The final 10 seconds of the video bring sound back into the picture. We once again see and hear Trump tell Turnbull, “You have better healthcare than we do.” This image transforms into another pair of male faces—liberal American commentator Chris Hayes and US Senator Bernie Sanders—taken from an MSNBC cable television broadcast. On one side, Hayes says “They do have, they have universal healthcare.” On the other, Sanders laughs uproariously in response. The only added caption for this segment is “Hahahaha!”, the simplicity of which suggests that the video’s target audience is assumed to have a context for understanding Sanders’s laughter. Here and throughout the video, autoplay leads to a far more visual style of relating information, one in which captions—working alongside images and layout—become, in Zdenek’s words, a sort of “textual performance” (6).

The Black Panther Party and the Textual Performance of Progressive Politics

Reports on police brutality and Black Lives Matter protests have been amongst AJ+’s most widely viewed and shared videos (Reid, “Beyond Websites”). Their 2-minute video (Facebook, “Black Panther”) commemorating the 50th anniversary of the Black Panther Party, viewed 9.5 million times, provides background to these contemporary events. As in the comparison of American and Australian healthcare, captions shape the video’s structure. But here, rather than using contrast as a means of quick visual communication, the video is structured as a list of five significant points about the Black Panther Party. Captions are used not only to itemise and simplify—and ultimately to reduce—the party’s complex history, but also, somewhat paradoxically, to promote the news organisation’s own progressive values.

After announcing the intent and structure of the video—“5 things you should know about the Black Panther Party”—in its first 3 seconds, the video quickly sets out to describe each item in turn. The themes themselves correspond with AJ+’s own interests in policing, community, and protest, while the language used to announce each theme is characteristically concise and colloquial:

  1. They wanted to end police brutality.
  2. They were all about the community.
  3. They made enemies in high places.
  4. Women were vocal and active panthers.
  5. The Black Panthers’ legacy is still alive today.

Each of these themes is represented using a combination of archival black and white news footage and photographs depicting Black Panther members, marches, and events. These still and moving images are accompanied by audio recordings from party members, explaining its origins, purposes, and influences. Captions are used throughout the video both to indicate the five themes and to transcribe the recordings. As the video moves from one theme to another, the corresponding number appears in the centre of the screen to indicate the transition, and then shrinks and moves to the upper left corner of the screen as a reminder for viewers. A musical soundtrack of strings and percussion, communicating a sense of urgency, underscores the full video.

While typographic features like font size, colour, and placement were significant in communicating meaning in AJ+’s healthcare video, there is an even broader range of experimentation here. The numbers 1 to 5 that appear in the centre of the screen to announce each new theme blink and flicker like the countdown at the beginning of bygone film reels, gesturing towards the historical topic and complementing the black and white footage. For those many viewers watching the video without sound, an audio waveform above the transcribed interviews provides a visual clue that the captions are transcriptions of recorded voices. Finally, the colour green, used infrequently in AJ+ videos, is chosen to emphasise a select number of key words and phrases within the short video. Significantly, all of these words are spoken by Black Panther members. For example, captions transcribing former Panther leader Ericka Huggins speaking about the party’s slogan—“All power to the people”—highlight the words “power” and “people” with large, lime green letters that stand out against the grainy black and white photos (Image 3). The captions quite literally highlight ideas about oppression, justice, and social change that are central to an understanding of the history of the Black Panther Party, but also to the communication of the AJ+ brand.

Image 3: “All power to the people”


Employing distinctive combinations of word and image, AJ+ videos are produced to call out to users through the crowded semiotic spaces of social media. But they also call out to scholars to think carefully about the new kinds of literacies associated with rapidly changing digital media formats. Captioned video makes clear the need to recognise how meaning is constructed through sophisticated interpretive strategies that draw together multiple modes. While captions are certainly not new, an analysis of AJ+ videos suggests the use of novel typographical experiments that sit “midway between language and image” (Stöckl 289). Discussions of literacy need to expand to recognise this experimentation and to account for the complex interactions between the verbal and visual that get lost when written text is understood to function similarly across multiple platforms. In his interpretation of closed captioning, Zdenek provides an insightful list of the ways that captions transform meaning, including their capacity to contextualise, clarify, formalise, linearise and distill (8–9). His list signals not only the need for a deeper understanding of the role of captions, but also for a broader and more vivid vocabulary to describe multimodal meaning-making. Indeed, as Allan Luke suggests, within the complex multimodal and multilingual contexts of contemporary global societies, literacy requires that we develop and nurture “languages to talk about language” (459).

Just as importantly, an analysis of captioned video that takes into account the economic reasons for captioning also reminds us of the need for critical media literacies. AJ+ videos reveal how the commercial goals of branding, promotion, and profit-making influence the shape and presentation of news. As meaning-makers and as citizens, we require the capacity to assess how we are being addressed by news organisations that are themselves responding to the interests of economic and cultural juggernauts such as Facebook. In schools, universities, and informal learning spaces, as well as through discourses circulated by research, media, and public policy, we might begin to generate more explicit and critical discussions of the ways that digital media—including texts that inform us and even those that exhort us towards more active forms of citizenship—simultaneously seek to manage, direct, and profit from our attention.


References

Bezemer, Jeff, and Gunther Kress. “Writing in Multimodal Texts: A Social Semiotic Account of Designs for Learning.” Written Communication 25.2 (2008): 166–195.

Constine, Josh. “Facebook Adds Automatic Subtitling for Page Videos.” TechCrunch 4 Jan. 2017. 1 May 2017.

Ellis, Justin. “How AJ+ Embraces Facebook, Autoplay, and Comments to Make Its Videos Stand Out.” Nieman Lab 3 Aug. 2015. 28 Apr. 2017.

Facebook. “President Trump Says…” Facebook, 2017.

Facebook. “Black Panther.” Facebook, 2017.

Goldhaber, Michael. “The Attention Economy and the Net.” First Monday 2.4 (1997). 9 June 2013.

Herrera, Linda. “Youth and Citizenship in the Digital Age: A View from Egypt.” Harvard Educational Review 82.3 (2012): 333–352.

Jewitt, Carey. “Introduction.” Routledge Handbook of Multimodal Analysis. Ed. Carey Jewitt. New York: Routledge, 2009. 1–8.

Jones, Rodney. “Technology and Sites of Display.” Routledge Handbook of Multimodal Analysis. Ed. Carey Jewitt. New York: Routledge, 2009. 114–126.

Lee, Carmen. “Micro-Blogging and Status Updates on Facebook: Texts and Practices.” Digital Discourse: Language in the New Media. Eds. Crispin Thurlow and Kristine Mroczek. Oxford Scholarship Online, 2011. DOI: 10.1093/acprof:oso/9780199795437.001.0001.

Lemke, Jay. “Multimodality, Identity, and Time.” Routledge Handbook of Multimodal Analysis. Ed. Carey Jewitt. New York: Routledge, 2009. 140–150.

Luke, Allan. “Critical Literacy in Australia: A Matter of Context and Standpoint.” Journal of Adolescent and Adult Literacy 43.5 (2000): 448–461.

Oremus, Will. “Facebook Is Eating the Media.” National Post 14 Jan. 2015. 15 June 2017.

———. “In Defense of Autoplay.” Slate 16 June 2015. 14 June 2017.

Paik, Nam June. “The Video Synthesizer and Beyond.” The New Television: A Public/Private Art. Eds. Douglas Davis and Allison Simmons. Cambridge, MA: MIT Press, 1977. 45.

Reid, Alistair. “Beyond Websites: How AJ+ Is Innovating in Digital Storytelling.” 17 Apr. 2015. 13 Feb. 2017.

———. “How AJ+ Reaches 600% of Its Audience on Facebook.” 5 Aug. 2015. 13 Feb. 2017.

Roettgers, Janko. “How Al Jazeera’s AJ+ Became One of the Biggest Video Publishers on Facebook.” Variety 30 July 2015. 1 May 2017.

Shifman, Limor. Memes in Digital Culture. Cambridge, MA: MIT Press, 2014.

Smith, David. “Trump Says ‘Everybody’, Not Just Australia, Has Better Healthcare than US.” The Guardian 5 May 2017. 5 May 2017.

Stöckl, Hartmut. “Typography: Visual Language and Multimodality.” Interactions, Images and Texts. Eds. Sigrid Norris and Carmen Daniela Maier. Amsterdam: De Gruyter, 2014. 283–293.

Tandoc, Edson, and Julian Maitra. “News Organizations’ Use of Native Videos on Facebook: Tweaking the Journalistic Field One Algorithm Change at a Time.” New Media & Society (2017). DOI: 10.1177/1461444817702398.

Youmans, William. An Unlikely Audience: Al Jazeera’s Struggle in America. New York: Oxford University Press, 2017.

Zdenek, Sean. Reading Sounds: Closed-Captioned Media and Popular Culture. Chicago: University of Chicago Press, 2015.

Zou, Yanni. “How AJ+ Applies User-Centered Design to Win Millennials.” Medium 16 Apr. 2016. 7 May 2017.

Author Biography

Catherine Burwell, University of Calgary

Catherine Burwell is an Assistant Professor in the Werklund School of Education at the University of Calgary. Her research areas include cultural studies, media education and digital literacy. She has a particular interest in the literacies and interpretive strategies connected with online video, and has published on video genres such as fan remixes and Let's Plays.