Podcast: “Grammar” in visual language

I’ve done another podcast with the folks at VizThink, this time debating Yuri Engelhardt and Dave Gray on what constitutes a visual language and the nature of visual language grammar.

This new format allows you to skip around to different chapters to jump straight to parts of interest. (Please note, I object to the insinuation in the chapter title that “comics” can equal “visual language”):


I think there is something I strove to point out throughout the discussion but didn’t articulate well enough. To explain it, I’ll have to do a mini-linguistics lesson.

In the podcast, Yuri pointed out the view that language has two main parts: a set of units (lexicon) and a set of combinatorial rules (grammar). This view of two components is essentially Chomsky’s view of grammar, and organizationally looks something like the diagram to the side.

In this traditional view, syntax/grammar is the component from which meaning branches off, and only syntax has properties for combining elements together. I said that I agreed with this notion, but really I don’t. When I mentioned that I subscribe to a view from Chomsky’s student, Ray Jackendoff (my teacher), I should perhaps have elaborated more on the differences between those two views, because they are extremely important and can resolve some of the conflict in the debate.

Jackendoff’s view of grammar is different. His “Parallel Architecture” says that the mind has three main interfacing components: modality (auditory/manual/graphic), syntax, and conceptual structure (meaning). The “lexicon” is distributed across the interfaces between all three of these structures; it doesn’t have its own “place.” And, importantly, each of these structures has the capacity for infinite combinations, not just syntax. (Note the similarities to my listing of properties of Language.) This would look like this:

Much of our debate focused on whether or not single images (diagrams) have “grammar.” My objection is that whatever structure a single image has does not function like “syntax” does in a verbal grammar, though I acknowledge that there might be a hierarchy or a combinatorial system there. If you subscribe to the Chomskyan view of grammar, you’re forced to say that the combinatorial element “is syntax,” which is exactly what Yuri is doing:

If you follow the Parallel Architecture (as I do), syntax is not the only element that creates hierarchies. They all do. So, combinations within a single image or diagram are “grammar” only insofar as phonology is the “grammar” of sound. Essentially, Yuri’s “visual grammar” is the combination system within the graphic structures, which is why I kept prodding about the difference between it and just the system of perception (and why most of its “constraints” are based on iconicity). This instead looks like this:

In contrast, my grammar for visual language needs a combinatorial system for individual images and for combining them together, looking like this:

To the extent that the narrative structure takes concepts and a modality and orders them coherently, it functions the same as syntax in verbal language. This is “visual language grammar,” analogous to the way that syntax is verbal language grammar (nouns and verbs). But all three structures have combinatorial properties. Not all of them can reasonably be called “grammar” in the syntactic sense, even though they may be combinatorial.

(This is also why you can say that “gestures are to sign language what individual images are to visual language” in the context of sequential images, but not for individual images. There is no comparable developmental/fluency gap for “… visual objects are to individual images.” That is, people don’t learn how to draw simple graphic signs yet remain unable to put them into a diagrammatic arrangement.)

Making this shift in perception buys you a lot: It explains why single images may have hierarchy (like perception/phonology) but don’t have grammar (like syntax). It addresses why most of that structure is guided by iconic and indexical constraints. And it may also give you a leg up in describing combinatorial aspects of images beyond diagrams (which occur within panels).

Finally, it is worth noting that language is not the only part of our cognitive system with consistent patterned units that appear hierarchic in structure. Such units also appear in music, event structure, vision, social structure, and a myriad of other domains (discussed well here). But we don’t have to call them “languages” because of this broad similarity.

Suggested reading:
Foundations of Language and
Language, Consciousness, Culture, both by Ray Jackendoff


  • Excellent discussion here. As a grad student also doing work in this area of visual language, I’m extremely interested in this type of debate.

    It strikes me that perhaps some of the points that were brought up between you and Yuri might be (better) situated in a discussion of phonology and/or morphology; grammar may be too ‘generic’ for the elements which both of you brought up. For example, can the discussion of the red-outline-with-car-blue-background-with-bicycle schema be considered a phonological and/or morphological issue?

    Loved the dichotomy between Yuri’s “part of the grammar” point versus your “perception” point. Very important. Hope to perhaps email you privately about other questions. Thanks again; I hope VizThink continues to promote this kind of cross-disciplinary dialogue.

  • Thanks for the comments, Doug! I think you’re right that what Yuri was getting at there was more phonological/morphological, and it was again a failure on my part not to point that out. I think that’s partially why I wrote this long post about grammars to this extent.

    The stuff that Yuri talks about, and things within an individual image/panel, would be what I consider “photological”/morphological, while once they are put in sequence they become grammatical issues. But, importantly, all of those systems can potentially be generative.
