Basic structures of visual language

A basic task in researching the visual language used in comics is to identify the foundational components that go into our comprehension of sequential images. In Understanding Comics, McCloud implicitly broke down the medium into a few parts:

1. Graphic Style
2. Iconography and symbolism
3. Panel-to-panel relationships
4. Text-image relationships

These parts provided a nice initial foray into how the visual language of comics might be segmented. However, the crux of my research is that the structure of sequential images actually breaks down similarly to language, and can therefore be studied using similar tools. This gives us several components of the visual language of comics, many of which tie to McCloud’s:

1. Graphic Structure is how we understand the visual pieces of an image. Are certain junctions of lines more appropriate for certain parts of an image? How do we understand lines and shapes? This is the equivalent of studying the sound properties of language, only here in the visual-graphic form.

2. The Lexicon is the vocabulary of systematic pieces used to create images and sequential images. These might range from the morphology of visual conventions (like motion lines) to systematic full panels (like those from Wally Wood’s 22 Panels) or patterns of storytelling (like the set up-beat-punchline pattern). Basically, anything that is used as a pattern is a part of the lexicon of visual language.

2.2. Morphology is a particular part of the lexicon that deals with small components of meaning (like McCloud’s iconography and symbols). However, morphology also includes the principles for how these parts combine. Why do stars above heads mean one thing but mean something else when they replace eyes? Why can’t lightbulbs also replace eyes to mean inspiration, like they do floating above heads? Why do motion lines always trail behind objects, but not in front?

3. Event Structure is how people understand the nature of events, and in sequential images we may have to rely on knowledge about parts of an event to understand the whole. If an image shows a person punching another, we infer that the puncher reached back their arm first. We also need to be able to make sense of the connections in meaning between and across panels.

4. Spatial Structure has to do with the knowledge that panels convey information about a fictitious spatial location. Each panel only frames a glimpse of this location, and our minds build the overall space. If one panel shows the exterior of a house and the next shows someone sitting at a table, how do we know that they are inside that house without overt cues? If panels in a sequence only depict individual characters, how do we know they belong to the same broader environment?

5. Narrative Structure is how we make sense of the meaning of a sequence of images—its grammar. The event or spatial structures convey the meaning of a sequence, but this meaning is guided through its presentation in a narrative structure. Why delay the climax of a sequence until after several lead-in panels? Why show a scene where each panel shows individual characters instead of all characters in just one panel? These have to do with the presentation of meaning, not just the meaning itself.

6. Navigational Structure is the system used to move through a page layout. Why do people read from left-to-right in America instead of vertically down-then-up? What happens when layouts depart from simple grids? These issues go beyond just the meaningful connections between panels and have to do with a reader’s preferences for how to move from panel-to-panel on a page.

7. Multimodality is the phenomenon of getting information from different domains. In this case, we receive information from both text and image, and thus need to explore how these multiple signals cohere to form a single conception (or, in reverse for creation: how a single conception results in multiple signals).

These are the broad components at work in comprehending sequential images. Many questions have yet to be answered about their parts and their relationships. And, of course, we can also ask how these components might differ across cultures, how people learn these conventions, and how their understanding changes over development.

Importantly, when we look at these components through a linguistic or cognitive perspective, we can’t simply think about them in terms of the components of the medium. Rather, we must think about these components in terms of what authors or readers must know in order to create or understand this visual language.

In other words, it shifts the focus to what’s going on in people’s minds and brains. Because of this type of shift, we can then ask how this knowledge may be similar or different from what we know about other cognitive systems, spoken and signed languages in particular.


