# Background

The dissertation from which this chapter is excerpted explores ocularities and imaginations in British literature from about 1880 to about 1940.

In Chapter 1, I introduce key terms in the theoretical framework I’m using, and the sense in which I use them, which is narrower than usual:

• imagination: a cognitive process that translates text to mental images
• image: a group of words that causes or prompts imagination, and therefore mental images

I argue that description may be understood through imagination, and that imagination may be understood by way of the eye.

Chapter 2 examines literary theory and science contemporary with this period, beginning to build the case that the eye is an apt metaphor for understanding this literature.

Chapter 3 deals with methodology, and answers questions such as:

• Why computational analysis?
• How does computational/quantitative analysis fit in with other critical traditions?
• Isn’t this all an oversimplification? (It isn’t.)

Chapters 4-6 each deal with an aspect of vision:

• Chapter 4 treats hue and photopic vision (retinal cones)
• Chapter 5 treats shape and scotopic vision (retinal rods)
• Chapter 6 treats space and proprioception (I might end up cutting this one)

This is Chapter 4.

# Introduction

Among the most striking ocularities of literary description, and more broadly of any narrative, are pauses to convey the hue of what is before the describer’s imagination. Pauses, because description is a dilation of extradiegetic time. Before, in that the described, if it has hue, is in focus, at the center of the describer’s field of vision. To describe is to settle one’s gaze in a field and, at the same time, to move it linearly and programmatically across that field. It is always selective, and it always refracts. Description of hue is among the most concentrated subroutines of that process, since the creation of textual color from a mental image involves digitization: approximative translation from an analog, linear system (a spectrum) to a digital, discrete system (a word). This process is so deeply involved with our language/thought apparatus, and so charged with epistemological problems, that it is a perennial subject of linguistics, neuroscience, and psychology. But rather than explore the phenomenon of textual color theoretically, I will take the opposite approach, and model the imaginative process in reverse, and in aggregate.

Modeling is a statistical mimesis. Given observable qualities of a subject (in our case, a text or a corpus of texts), a model imitates those qualities in order to study the subject’s behavior under unobserved circumstances. In modeling a hurricane, a meteorologist not only becomes capable of predicting that hurricane’s path, albeit with some margin of error, but, by studying the model’s error, learns further nuances of the system he or she is studying. To model literature is not to discover how Virginia Woolf would have written about the 2020 coronavirus pandemic (although that sort of prediction is possible); it is to understand the small swirls of rainwater that compose the greater phenomenon we know as the hurricane.

I am motivated here both by large questions about literary history and by smaller, more specialized ones about individual texts. The big ones include: Does literary writing get more colorful with time? (It does.) What are the dominant colors of this period’s literature? (White and black.) And which genres are the most colorful? (Love stories.) The smaller questions have to do with the way color words operate syntactically, how they operate within description, and how their semiologies warp the reader’s imagination. When Woolf describes a set of curtains as “mustard-coloured,” or when Joyce describes a man’s eyes as “nocoloured,” how and why do those color choices do more work than their superficial significations? How, precisely, are color words used differently in poetry and in fiction?

These questions are inseparable from the way they are modeled. In many cases, modeling them is what generates the questions. In others, the model is, at least partially, the answer to these questions. It is with that in mind that I invite you to join me in my process of creating this model of imagination—an imagination machine—where each decision in the algorithmic design, however mathematical, probes at the workings of color in text.

## A Critique of the Narration / Description Dichotomy

Before we begin our experiment, it is necessary to discuss the textual structure in which representations of hue are typically found: literary description. Although textual color has its own behaviors and properties, the conditions of description shape how color operates in text. By description I mean here a very particular writerly process, one which linearizes and discretizes visual information: it transforms imaginable material into words, and arranges those words into lines.

Exactly what may be identified as description, and what its role and import in literature may be, has been a matter of some debate. One of the more hotly contested works is an essay by Georg Lukács titled “Narrate or Describe?”, the central dichotomy of which is apparent from its title (lukacs2005?). Lukács contrasts writers such as Flaubert and Zola, whom he calls “descriptive,” with writers like Tolstoy, who, he claims, use a more “narrative,” action-oriented style. The descriptive style Lukács is quick to dismiss. Of descriptive details in Flaubert’s Madame Bovary, he writes:

> to the reader they seem undifferentiated, additional elements of the environment Flaubert is describing. They become dabs of colour in a painting which rises above a lifeless level only insofar as it is elevated to an ironic symbol of philistinism. The painting assumes an importance which does not arise out of the subjective importance of the events, to which it is scarcely related, but from the artifice in the formal stylization. (115)

He later asserts that description “lacks humanity,” in that “its transformation of men into still lives is only the artistic manifestation of its inhumanity” (140). I join the many later critics who have disagreed with Lukács’s essay, but since those critics have done so thoroughly, I won’t rehearse their arguments here (Love; marcus2016building?). My objection is more radical: I question the distinction between narration and description at the root of Lukács’s argument.

There do exist some aspects of fiction that have no descriptive function, of course: they cannot be imagined, and thus they convey no images. But nearly everything else in fiction does describe. The converse is nearly true as well: very few elements of a story are purely descriptive, serving no role in furthering the plot, fleshing out the characters, or providing a scene that is inextricable from, and indispensable to, plot and character.

In other words, to narrate is to describe. Any text may be description if it contains a visual component (strong description) or may be imagined (weak description), and this includes the “epic artistry” Lukács sees in Tolstoy, and his “recounting of the vicissitudes of human beings” (111). For is not visual experience one such vicissitude? This is about more than a stylistic distinction between pastoral and epic, though, in which description recounts in minute detail because it has the bourgeois leisure of a shepherd, and narration presents the facts practically, with military precision. Description is not essentially static, even though it often is. The proof is simply that action can be, and is, imagined in the same way as a still life. Furthermore, description’s linearity makes it a priori dynamic.

To explain this further, I’ll use a well-known metaphor from physics: the color spectrum. There are areas on a color spectrum where we could identify certain colors, spots at which one could point and nine out of ten English speakers would say “red.” But if these same people were asked to draw a line that definitively separates red from pink, or red from blue, there would be ten very different answers. The same is true of description and narration: they exist along a spectrum, and overlap considerably. It is in that sense that we could say the distinction isn’t real at all.

This is crucial context for understanding textual color. At first glance, and according to Lukács’s followers, color is probably the element of fiction most superfluous to the story, and thus to the work’s reason for being. But I argue that colors in text are not simply signifiers of their positions on the visible spectrum; they are the material out of which the text is created.

# Problems and First Considerations

The first of many color-related epistemological problems may be found among color metaphors like lemon-yellow. Lemons themselves are—paradoxically—not lemon-colored. But neither is lemon-yellow a Platonic ideal to which all lemons, or paintings of lemons, should aspire. Instead, the term exists somewhere between lemons, our memory of them, our visual experience of them, and what we read about them. This view is allied with Heather Love’s 2016 work, “Shimmering Description,” which characterizes literary description as oscillating, or shimmering, between its lexemic and communicated significance. Beginning with Love’s idea, I aim to find the dimensions in which this oscillation takes place.

Aloys Maerz and Morris Paul’s 1930 reference manual A Dictionary of Color, one of the more ambitious works of its kind, acknowledges this problem as one they hope to solve. They see it as part of the “material” and “intellectual” confusions of color names:

> the confused ideas on color nomenclature are found due to two factors, one material, the other intellectual. The first has been the ability of color makers, in the past, to produce color substances that were both brilliant and permanent … the second is the difference of opinion as to the exact color indicated by any name, and the lack of any authority by which an individual opinion can be upheld. … the name Lemon Yellow would seem sufficiently accurate as a descriptive term, yet the color of lemons varies slightly and the memory for exact color sensations, when the original is not at hand, is often faulty. (Maerz and Paul 1)

Readers of James Joyce’s Ulysses may recognize the color lemon-yellow here. Lemons and lemon-yellow are leitmotifs that recur at intervals in the novel. Lemon first appears in the “Telemachus” episode as the “Paris fad” for tea, which Buck Mulligan rejects in favor of “Sandycove milk” (Stephen has recently returned from Paris, and has acquired some of its habits). The color then appears in “Proteus,” as Stephen muses about the effects of sunlight on the color of the houses: “Gold light on sea, on sand, on boulders. The sun is there, the slender trees, the lemon houses. ¶ Paris rawly waking, crude sunlight on her lemon streets” (Joyce, Ulysses 10, 35). Neither the Sandymount houses nor the Paris cobblestones are painted lemon-yellow, of course, or appear so at other times of day; they look this way in the early morning light. Stephen, a poet, is more interested in the phenomenology of visual experience than in its lexicon, which would describe the houses by the name of their paint, or the stones as gray. Lemon-yellow, then, is the site at which the Aristotelian conception of color (the stones are gray) meets Newtonian color phenomenology.

Leopold Bloom, the hero of Ulysses, likewise imagines the skin of his naked body in the bath as “lemonyellow,” not because he is jaundiced, or of olive-toned Mediterranean complexion, but because he imagines the light catching his body, “oiled by scented melting soap,” the lemon-scented and lemon-colored soap he has just bought (Joyce, Ulysses 71). When Bloom later notices the scent of “citronlemon” in his handkerchief, he conflates the citron, an ancestor of the lemon (and the French word for lemon), with Israel Citron, a real Dubliner about whom he had been thinking two paragraphs earlier (Gifford 74, 133). Don Gifford suggests that Bloom “associates the soap with the citron (Ethrog) central in the ritual of the Jewish Feast of Tabernacles (Sukkoth)” (133). In the surreal dream of the “Circe” episode, this soap reifies, “diffusing light and perfume,” and speaks in terms of light and reflections: “we’re a capital couple are Bloom and I. He brightens the earth. I polish the sky” (Joyce, Ulysses 340). For Bloom, colors like lemonyellow are a crucible in which visual experience, other sensory experience, and memory are melted together.

Not only are these textual perceptions problematic but, as Maerz and Paul remind us, the color of lemons themselves varies. Lemons are green before they ripen, and remain green in certain varieties. In French, a language in which Stephen often daydreams, lemons and limes are most commonly citron and citron vert (“green lemon”), so that in that language’s taxonomy lemons can be either yellow or green. Yet the color lemon, in English and in French, invariably refers to a bright yellow, whatever the actual color of the fruit. This is a theoretical problem now, but it will become a practical one in the section below on modeling color categorization. Everyone knows that lemons are yellow, blood is red, and the sea is blue. But lemons are also green, blood is usually brownish, and the sea may appear purple, brown, or green. Description, then, whether literary or otherwise, is both a representation and a social contract.

## On the Impossibility of a Bluish Yellow

Ludwig Wittgenstein articulates a second, more troubling, and more deeply epistemological problem in his late work Remarks on Color. He asks, quite simply, whether it is possible to imagine a “bluish yellow”:

> If you call green an intermediary colour between blue and yellow, then you must also be able to say, for example, what a slightly bluish yellow is, or an only somewhat yellowish blue. And to me these expressions don’t mean anything at all. But mightn’t they mean something to someone else? (Wittgenstein and Anscombe 20e)

Wittgenstein then asks whether a “reddish green” or other color combinations might be difficult to imagine, and why. He posits that the category of green is what prevents him from imagining a “bluish yellow,” since, he says, “for me, green is one special way-station on the coloured path from blue to yellow…” (Wittgenstein and Anscombe 22e). This is an important question, with many implications. First, which colors have greater primacy among speakers of English? And more generally, why do linguistic categories (color words and their weights in our language) transform our imaginative processes?

I say “our” here with some hesitation: I suppose an affinity with others who might experience color terminology in the same way, but I recognize that a painter, with years of experience mixing colors, might imagine these terms differently, as would, most likely, a speaker of a language very different from English. A blind person would imagine these colors differently still.

This question of Wittgenstein’s is testable, to some degree, by examining patterns in literary data. To test it, I constructed a matrix of color expressions from the $$CM_X$$ color map (described in more detail below) in which one word ends in -ish, along with a few -y forms such as bluey. The resulting matrix is shown in fig. 1.

| color | purplish | greenish | bluish | greyish | tealish | reddish | pinkish | lightish | brownish | darkish | purpley | yellowy | bluey | yellowish | purpleish | orangish | light | orangeish |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| red | purplish red | | | | | | pinkish red | lightish red | brownish red | darkish red | | | | | | orangish red | | |
| blue | purplish blue | greenish blue | | greyish blue | | | | lightish blue | | darkish blue | purpley blue | | | | purpleish blue | | light greenish blue | |
| brown | purplish brown | greenish brown | | greyish brown | | reddish brown | pinkish brown | | | | | yellowy brown | | yellowish brown | | orangish brown | | |
| pink | purplish pink | | | greyish pink | | reddish pink | | | brownish pink | darkish pink | purpley pink | | | | purpleish pink | | | |
| grey | purplish grey | greenish grey | bluish grey | | | reddish grey | pinkish grey | | brownish grey | | purpley grey | | bluey grey | | | | | |
| yellow | | greenish yellow | | | | | | | brownish yellow | | | | | | | | | |
| teal | | greenish teal | | greyish teal | | | | | | | | | | | | | | |
| tan | | greenish tan | | | | | pinkish tan | | | | | | | yellowish tan | | | | |
| turquoise | | greenish turquoise | | | | | | | | | | | | | | | | |
| beige | | greenish beige | | | | | | | | | | | | | | | | |
| cyan | | greenish cyan | | | | | | | | | | | | | | | | |
| purple | | | bluish purple | greyish purple | | reddish purple | pinkish purple | lightish purple | brownish purple | darkish purple | | | bluey purple | | | | | |
| green | | | bluish green | greyish green | tealish green | | | lightish green | brownish green | darkish green | | yellowy green | bluey green | yellowish green | | | light bluish green | |
| orange | | | | | | reddish orange | pinkish orange | | brownish orange | | | | | yellowish orange | | | | |

Fig. 1: -ish color expressions in $$CM_X$$ (empty cells: no such entry in the map)

Not only are there no entries for bluish yellow or reddish green here, but a few other patterns are apparent. First, yellowish green is not mapped to the same color as greenish yellow, indicating that the order of the words dictates precedence. Second, the colors that take -ish modifiers are common colors. However common a color like maroon might be, reddish maroon does not appear in this list, perhaps because maroon is not considered a basic color that can be mixed. However, some colors that are common in marketing but less common in paint names, like beige and teal, are present here.

Note also that orangeish and orangish, variant spellings of the same word, have different average colors here, and that orangish is used as a modifier half as often as orange is modified by an -ish. We might say that orange can -ish, but it is not very -ishable. Greenish and brownish are much more versatile as modifiers than the others: they are good -ishers. And green is much more easily -ished than other colors. Does green, then, take first place in our cognitive pantheon, despite being a secondary color?
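The tallies behind these observations can be recomputed from fig. 1 itself. The sketch below does only that: it transcribes the non-empty cells of the matrix by hand into a dictionary (hue to the modifiers attested with it) and counts how many modifiers each hue accepts and how many hues each modifier attaches to. The names `ISH_MATRIX` and `ishability` are illustrative, not part of the original pipeline.

```python
from collections import Counter

# Non-empty cells of fig. 1, transcribed by hand: hue -> modifiers attested in CM_X.
# The "light" column holds "light greenish blue" and "light bluish green".
ISH_MATRIX = {
    "red": ["purplish", "pinkish", "lightish", "brownish", "darkish", "orangish"],
    "blue": ["purplish", "greenish", "greyish", "lightish", "darkish",
             "purpley", "purpleish", "light"],
    "brown": ["purplish", "greenish", "greyish", "reddish", "pinkish",
              "yellowy", "yellowish", "orangish"],
    "pink": ["purplish", "greyish", "reddish", "brownish", "darkish",
             "purpley", "purpleish"],
    "grey": ["purplish", "greenish", "bluish", "reddish", "pinkish",
             "brownish", "purpley", "bluey"],
    "yellow": ["greenish", "brownish"],
    "teal": ["greenish", "greyish"],
    "tan": ["greenish", "pinkish", "yellowish"],
    "turquoise": ["greenish"],
    "beige": ["greenish"],
    "cyan": ["greenish"],
    "purple": ["bluish", "greyish", "reddish", "pinkish", "lightish",
               "brownish", "darkish", "bluey"],
    "green": ["bluish", "greyish", "tealish", "lightish", "brownish",
              "darkish", "yellowy", "bluey", "yellowish", "light"],
    "orange": ["reddish", "pinkish", "brownish", "yellowish"],
}

def ishability(matrix):
    """Return (modifiers accepted per hue, hues reached per modifier)."""
    as_head = Counter({hue: len(mods) for hue, mods in matrix.items()})
    as_modifier = Counter(mod for mods in matrix.values() for mod in mods)
    return as_head, as_modifier

as_head, as_modifier = ishability(ISH_MATRIX)
print(as_head.most_common(3))
print(as_modifier.most_common(2))
# Wittgenstein's test cases: neither combination is attested.
print("bluish" in ISH_MATRIX["yellow"], "reddish" in ISH_MATRIX["green"])
```

On this transcription, green heads the first tally with ten modifiers, greenish and brownish lead the second with nine and seven hues, and orangish (two hues) reaches half as many hues as orange accepts modifiers (four).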

Pink has many variations here, despite being simply a shade of red. We don’t see these same patterns with analogues of pink in other hues, like light blue or light green. This suggests that pink’s monolexemic and monosyllabic advantage over analogues like light blue gives it more cognitive-categorical weight.

There are many, many more puzzles to be found by exploring the semi-permeable membrane of the color/word divide. But these should be enough for us to remember before proceeding to the algorithmic design of this computational model, where we will encounter more of these problems.

## Colors and their Object-Archetypes

Things and their colors define each other. In a metaphoric sense, things are colors: we use things to describe our visual experience. Conversely, our color categories are shaped by the kinds of things we often see, which seem to be illuminated by these colors. “The sky is blue” is a statement of such obvious fact that the phrase has come to emblematize obvious facts themselves. The sky is blue, leaves are green, roses are red, and violets are blue. Or are they violet?

Put differently: are violets so named because they are violet in color, or is the color word violet the name for the color of the flower? Lexical data from the OED show that violet the color appears about a century after violet the name of the flower, in 1430 and 1330 respectively (oed:violet?). Similarly, we might ask which came first, the color orange or the orange fruit. There, too, the name of the fruit, taken from the name of its tree, is ultimately descended from Sanskrit and is older than English itself, while the color sense appears only in the early 16th century (oed:orange?). In these cases, at least, the polysemy between colors and their associated objects arises from the practice of naming colors after common objects.

But in other cases, bigrams encode a magnetism between the object and its appearance. In designing the experiment below, I wanted to know which objects are most often described as blue, red, and so on; I wanted to quantify the gravitational pull between colors and their associated objects. To investigate this, I use data from the Google Books Ngram Viewer project, hereafter $$C_{NG}$$ (googleNgrams?). Google Books provides n-gram data (sequences of words of length $$n$$) for the books in its vast collection, and the most recent version, 20200217, provides n-grams tagged with their parts of speech. I use the English Fiction subset, which, although not restricted to the time and place I am studying here, is still useful for determining broad patterns. (I describe this corpus in greater detail in the appendix below.)

In it, I query for the pattern ADJ NOUN, where ADJ is a color word tagged as an adjective and NOUN is any noun that follows. I derive the list of color words from Berlin and Kay, but augment it with several auxiliary color words for comparison (Berlin and Kay).
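As a rough illustration of that query, the sketch below tallies color ADJ NOUN bigrams from lines of an Ngram export. It assumes (my reading of the 20200217 files, to be checked against the data itself) that each line holds a space-separated n-gram whose tokens carry tags such as `_ADJ` and `_NOUN`, followed by tab-separated `year,match_count,volume_count` fields. The color list is Berlin and Kay’s eleven basic terms, with both spellings of grey; the chapter’s auxiliary terms are not listed in the text and are omitted. The function names and the sample lines are hypothetical.

```python
from collections import Counter, defaultdict

# Berlin and Kay's eleven basic color terms (both spellings of grey).
BASIC_COLORS = {"white", "black", "red", "green", "yellow", "blue",
                "brown", "purple", "pink", "orange", "grey", "gray"}

def split_tag(token):
    """'red_ADJ' -> ('red', 'ADJ'); an untagged token gets tag None."""
    word, sep, tag = token.rpartition("_")
    return (word.lower(), tag) if sep else (token.lower(), None)

def color_noun_counts(lines, min_year=None, max_year=None):
    """Sum match counts of COLOR_ADJ NOUN bigrams, optionally within a year range.

    Assumed line format: 'w1_TAG w2_TAG<TAB>year,matches,volumes<TAB>...'
    """
    counts = defaultdict(Counter)
    for line in lines:
        ngram, *year_fields = line.rstrip("\n").split("\t")
        tokens = [split_tag(t) for t in ngram.split(" ")]
        if len(tokens) != 2:
            continue
        (adj, adj_tag), (noun, noun_tag) = tokens
        if adj_tag != "ADJ" or noun_tag != "NOUN" or adj not in BASIC_COLORS:
            continue
        for field in year_fields:
            year, matches, _volumes = field.split(",")
            if min_year is not None and int(year) < min_year:
                continue
            if max_year is not None and int(year) > max_year:
                continue
            counts[adj][noun] += int(matches)
    return counts

# Invented sample lines, for illustration only.
sample = [
    "red_ADJ hair_NOUN\t1890,10,5\t1950,7,3",
    "green_ADJ eyes_NOUN\t1900,4,2",
    "green_NOUN eyes_NOUN\t1900,99,9",  # color word tagged as a noun: skipped
]
period = color_noun_counts(sample, min_year=1880, max_year=1940)
print(period["red"].most_common(5))
```

`counts[color].most_common(n)` gives the ranked nouns that a word cloud like those in fig. 2 would size by frequency; the year filter is optional, since the English Fiction subset spans a wider range than this study.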

Fig. 2 shows word-cloud visualizations for each color word and its most commonly collocated nouns. Word clouds, or tag clouds, are a relatively recently popularized technique of textual data visualization that depicts the frequency of words through typeface size; for a history of the visualization that traces it to Soviet Constructivism, see (viegas2008timelines?).

Surprisingly, the most frequent collocations are not the most clichéd: green grass and green leaves are of course present, but are not as frequent as green eyes. Blue sky and red blood are present, but are subordinate to hair and eyes.
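Raw frequency rewards nouns that are frequent with any color, such as eyes and hair. One way to separate that from the gravitational pull of a particular color, offered here as a swapped-in technique rather than the method behind fig. 2, is pointwise mutual information over the color-by-noun counts, $$\mathrm{PMI}(c, w) = \log_2 \frac{p(c, w)}{p(c)\,p(w)}$$, with all probabilities estimated from the bigram table itself. A minimal sketch, assuming counts keyed by (color, noun) pairs:

```python
import math
from collections import Counter

def pmi_table(pair_counts):
    """PMI in bits for each (color, noun) pair, using marginals from the
    same table: log2(n * total / (color_total * noun_total))."""
    total = sum(pair_counts.values())
    color_totals, noun_totals = Counter(), Counter()
    for (color, noun), n in pair_counts.items():
        color_totals[color] += n
        noun_totals[noun] += n
    return {
        (color, noun): math.log2(n * total / (color_totals[color] * noun_totals[noun]))
        for (color, noun), n in pair_counts.items()
        if n > 0
    }

# Invented toy counts, for illustration only.
toy = {("green", "eyes"): 8, ("blue", "eyes"): 8, ("green", "grass"): 4}
scores = pmi_table(toy)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

In the toy table, green grass scores higher than green eyes although it is half as frequent, because eyes are just as often blue. PMI is known to overrate rare pairs, so in practice one would set a minimum count before ranking.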


Some overall trends are apparent in these words, and they may be illuminated by categorizing them. Using WordNet, the relational lexical database, I identify hypernyms for most of these words and group the words according to these hypernyms (miller1998wordnet?). The hypernym treemap for red, for instance, shows that a good proportion of the words are body parts (they have the hypernym synset body_part.n.01), or “coverings” such as hair (covering.n.01). A crucial similarity among these bodily descriptors, from red hair, the most frequent collocation, to red eyes, red lips, and red face, is that they describe exceptions, or aberrations, from their usual states. Red hair (actually orange, as I will argue below) is among the least common natural hair colors. Red eyes describe diseased or depigmented eyes, of humans or other animals. And red lips are lips that are unusually red, whether flushed with blood through vigor or excitement, or colored with cosmetics.

Artificial objects make up a second, equally large hypernym group, artifact.n.01. These are, with a few exceptions like brick, items that have been dyed red: silk, velvet, dress, shirt, tape, lipstick, carpet. If silk, or a dress, were always red, we might not need to describe it as such; it would be obvious. But since these are all items that are typically dyed, at least in modernity, they need to be described according to their dye. As with the red body parts above, the dye here is the difference, or the abnormality, that necessitates the color description.
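A sketch of this hypernym grouping, assuming NLTK’s WordNet interface (`wn.synsets`, `Synset.closure`, `Synset.name`) and its WordNet data are installed. It is deliberately crude: it pools every noun sense of a word, so a noun can be pulled into a category by an unintended sense, and the category list is only the three hypernyms named above, checked in order. The names `group_by_hypernym` and `wordnet_ancestors` are hypothetical, and the lookup is injectable so the grouping logic can run without WordNet.

```python
from collections import Counter, defaultdict

# The three hypernyms discussed in the text, checked in this order.
CATEGORIES = ("body_part.n.01", "covering.n.01", "artifact.n.01")

def wordnet_ancestors(noun):
    """Names of all hypernym ancestors of every noun sense of `noun`.
    Requires nltk and its WordNet data (nltk.download("wordnet"))."""
    from nltk.corpus import wordnet as wn
    names = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        names.update(s.name() for s in synset.closure(lambda s: s.hypernyms()))
    return names

def group_by_hypernym(noun_counts, ancestors=wordnet_ancestors, categories=CATEGORIES):
    """Bucket collocated nouns under the first listed category found among
    their ancestors; nouns matching none go under 'other'."""
    groups = defaultdict(Counter)
    for noun, count in noun_counts.items():
        found = ancestors(noun)
        label = next((c for c in categories if c in found), "other")
        groups[label][noun] += count
    return groups
```

Applied to a Counter of nouns collocated with red, the resulting groups can be summed per label to size the regions of a treemap.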

This leads me to a theory of color description: that color descriptions are color exceptions.

## Color descriptions are color exceptions

We are effectively blind to things that are neither important to us nor exceptional in some way. It is not that light at certain frequencies fails to reach our eyes, but that our brains do not process it in the same way. Thus what we describe, using color words and color expressions, is what we have noticed, or what we want to be noticed: something different, striking, or unusual. This is why red hair is a more frequent collocation than black hair or brown hair: the very rarity of the gene that causes red hair makes it worth describing.

At this point, you may fairly imagine a number of counterexamples, not the least of which are the clichés I’ve just outlined: the blue sky, or even the wine-dark sea. I would argue that, in most cases, these are a special kind of exception: one of magnitude, rather than of category. In other words, when a writer describes leaves as green, it is less often a pure cliché than a calculated underscoring of the visuality of the leaves: that they are unusually green, noticeably green, or a particular subcategory of green.

Here is a passage from Virginia Woolf’s novel Jacob’s Room, which, incidentally, I will later show to be among the most colorful in British literature of this period:

> The tree had fallen, though it was a windless night, and the lantern, stood upon the ground, had lit up the still green leaves and the dead beech leaves. It was a dry place. (woolf08_jacob?)

Here, green leaves describes an exception to the rule by which living leaves are green and dead leaves brown: these leaves are still green, though the tree has fallen. The theory holds up, moreover, on closer examination of some of the bigrams of fig. 2 as they appear in $$C_{PG}$$. There, the phrase green leaves is rarely unaccompanied by an additional modifier. Leaves are light green, emerald-green, or sea-green: specificities that take color knowledge, via visual aberration, into the realm of color description.
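This check amounts to a simple concordance: count occurrences of green leaves and record the word immediately before each one, noting whether it is joined by a hyphen. The sketch below does this with regular expressions; it cannot tell a color modifier (light) from an adverb (still), so its output is meant for inspection rather than as a final tally. The function name is hypothetical, and the sample text strings together phrases quoted in this section.

```python
import re
from collections import Counter

GREEN_LEAVES = re.compile(r"\bgreen\s+leaves\b", re.IGNORECASE)
# Word immediately before "green leaves", joined by a hyphen or by whitespace.
PRECEDED = re.compile(r"(\w+)(-|\s+)green\s+leaves\b", re.IGNORECASE)

def green_leaves_modifiers(text):
    """Return (total occurrences, Counter of preceding words); hyphenated
    compounds such as 'brown-green' are recorded as 'brown-'."""
    total = len(GREEN_LEAVES.findall(text))
    before = Counter(
        word.lower() + ("-" if sep == "-" else "")
        for word, sep in PRECEDED.findall(text)
    )
    return total, before

sample = (
    "had lit up the still green leaves and the dead beech leaves. "
    "were gay with little light green leaves and the sunlight slanted. "
    "thinly veiled from them by the crinkled brown-green leaves"
)
total, before = green_leaves_modifiers(sample)
print(total, before.most_common())
```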

In James Joyce’s 1914 short story collection Dubliners, the boy protagonist of “An Encounter” describes trees along the canal bridge:

> All the branches of the tall trees which lined the mall were gay with little light green leaves and the sunlight slanted through them on to the water. (dublinersGutenberg?)

And in John Galsworthy’s The Dark Flower, a colorful novel, as befits its narrative of a visual artist’s life, we read of “a blue sky thinly veiled from them by the crinkled brown-green leaves” (galsworthy1913dark?). Sometimes the word order is different, even though the same syntactic dependence remains: in May Sinclair’s Mary Olivier: A Life, one of the most colorful novels as measured in the analysis below, we see “green leaves” which “had the cold glitter of wet, pointed metal” [TODO: cite]. Sinclair is not content to present the leaves as green; she gives them alien qualities. These leaves appear so unusual, she implies, that the play of light on their surfaces makes them seem made of another material entirely.

This argument, that color descriptions are visual anomalies, may seem to deflate upon scrutiny: it is obvious, after all, that we need not say the obvious. But the theory will be borne out materially as we begin to reverse-engineer textual imagination.

# Imagining Words: Mapping Words to Colors

To model imagination, we start by working backwards, first creating an engine that generates a color from a word or phrase. Given a word or group of words, we infer a color code: the exact proportions of red, green, and blue needed to recreate it computationally. We can conceive of this process either as modeling the writer’s imagination in reverse, or as modeling a reader’s imagination. Of course, the question then becomes: which reader’s imagination do we attempt to model? One approach would be to model as many imaginations as we can, by averaging several mappings from several different sources. But these sources vary greatly, so the factors we would look for in a color/word mapping include:

• Consensus. Color names should not be too subjective, since we want language that can be evocative with some degree of reliability. To this end, word/color pairs that appear in more than one map should be weighted higher than those that only appear in one.
• Synchronicity. The color names should not be anachronistic to the texts we are trying to understand. So a color like cyberspace blue is largely irrelevant to an understanding of a Virginia Woolf novel. Yet there is a sense in which it is relevant: the imagination of a contemporary reader shapes his or her understanding of a literary work.
• Syntopicity. The color names should likewise not be foreign to the place of the texts. Army green and navy blue, for instance, refer to the uniforms of particular countries’ armed forces; but since these colors have proliferated across militaries, the difference here is small.
• Objectivity. We need to mitigate the influence of marketing on color naming. Paint manufacturers and similar organizations describe colors in ways meant to sell paint: they skew towards pleasant color names. Yet not all colors are pleasant ones. (Colors on the whole do skew towards pleasant ones, however. Since color is perceived chiefly in central rather than peripheral vision, color perception betrays attention.)

• Size. It would be best not to exclude colors simply because they don’t appear in a pack of Crayola crayons. Yet the more colors one includes, the more chances there are of metaphors that are more subjective, and farther afield.

A related issue is the algorithm by which we decide to collapse color word orthographies:

• Fuzziness. Blue-green and blue green should be categorized together as the same color. Yet blue! Green (that is, blue at the end of one sentence and green at the beginning of the next) should not be categorized together.
• Absinthe green should match absinthe as well as absinthe green.
• Green and greenness should be in the same family, but not necessarily as synonyms, since greenness connotes something more abstract.
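The three rules above can be sketched as a few string functions. This is a minimal illustration under stated assumptions, not the collapsing algorithm actually used: hyphens and whitespace are normalized away; an entry like absinthe green is also indexed under absinthe, except when the prefix is itself a hue word (bluish, purpley) or a generic modifier (light, dark), which are hypothetical guard lists of my own; greenness is mapped to a family key rather than made a synonym; and sentence punctuation is treated as a hard boundary when extracting candidate phrases.

```python
import re

BASIC_HUES = {"red", "orange", "yellow", "green", "blue", "purple",
              "pink", "brown", "grey", "gray", "black", "white"}
# Hypothetical guards for the "absinthe green" rule.
HUE_STEMS = ("red", "orang", "yellow", "green", "blu", "purpl",
             "pink", "brown", "grey", "gray", "black", "white")
GENERIC_MODIFIERS = {"light", "dark", "pale", "bright", "deep", "dull"}

def normalize(name):
    """Lowercase and collapse hyphens/whitespace: 'Blue-green' -> 'blue green'."""
    return re.sub(r"[\s\-]+", " ", name.strip().lower())

def aliases(name):
    """Lookup keys for one color-map entry: the full name and, for
    object-plus-hue names like 'absinthe green', the object word alone."""
    key = normalize(name)
    yield key
    *prefix, head = key.split()
    if prefix and head in BASIC_HUES and not any(
        word.startswith(HUE_STEMS) or word.endswith("ish") or word in GENERIC_MODIFIERS
        for word in prefix
    ):
        yield " ".join(prefix)

def family(word):
    """Family key: 'greenness' joins 'green' without becoming its synonym."""
    word = normalize(word)
    return word[: -len("ness")] if word.endswith("ness") else word

def sentence_phrases(text, max_words=3):
    """Candidate phrases of up to max_words words; sentence punctuation is a
    hard boundary, so 'blue! Green' never yields 'blue green'."""
    for sentence in re.split(r"[.!?;]+", text):
        found = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)
        words = normalize(" ".join(found)).split()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                yield " ".join(words[i:i + n])
```

The alias rule is the riskiest of the three: a name like sickly green would also be indexed under sickly, so in practice such aliases would need review or a lower weight.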

With these principles in mind, I chose several books, and several databases containing color/word mappings, and combined them into one master map, although I will occasionally use individual databases when appropriate.
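One simple way to combine maps while honoring the consensus principle is sketched below, under my own assumptions: each source is a `{name: hex}` dictionary, names are pooled case-insensitively, each merged name gets the mean RGB of its entries, and a support count records how many entries agreed on the name so that multi-source names can be weighted or filtered above single-source ones. `merge_maps` and the second map in the example are hypothetical; averaging in RGB is a simplification, and a perceptual space such as CIELAB would be better suited to averaging.

```python
from collections import defaultdict

def hex_to_rgb(code):
    """'#a6814c' -> (166, 129, 76)"""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb):
    return "#" + "".join(f"{round(channel):02x}" for channel in rgb)

def merge_maps(maps, min_support=1):
    """Combine {source: {name: hex}} maps into {name: (mean_hex, support)},
    where support is the number of entries pooled under that name."""
    pooled = defaultdict(list)
    for colors in maps.values():
        for name, code in colors.items():
            pooled[name.strip().lower()].append(hex_to_rgb(code))
    merged = {}
    for name, rgbs in pooled.items():
        if len(rgbs) < min_support:
            continue
        mean = [sum(channel) / len(rgbs) for channel in zip(*rgbs)]
        merged[name] = (rgb_to_hex(mean), len(rgbs))
    return merged

xkcd = {"coffee": "#a6814c", "melon": "#ff7855"}  # values from Table 1
other = {"Coffee": "#a6814c"}                     # hypothetical second map
master = merge_maps({"xkcd": xkcd, "other": other})
print(master)
```

Raising `min_support` to 2 keeps only names attested in more than one source, the strictest reading of the consensus principle.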

## Heuristic Maps

The breadth of the text/color translation problem is suggested even by a glance at its bibliography: dictionaries of color, or manuals of color nomenclature, were essential reference books for centuries, not only among visual artists, designers, and others who work with pigment, but among botanists, ornithologists, and anyone else in need of a standardized way to describe visual phenomena. These manuals invariably contained color plates (some even hand-painted) intended as concrete mappings between color words and their associated hues. I chose just a few of these, based on their proximity to the period, the availability of electronic editions, and the number of colors they contain.

### $$CM_R$$, Ridgway

Some of the most ambitious attempts at mapping colors to their names, or naming colors, came from the natural sciences. The American ornithologist Robert Ridgway (1850–1929), for example, authored two influential color-naming systems: A Nomenclature of Colors for Naturalists in 1886, and Color Standards and Color Nomenclature in 1912. In the preface to the earlier work, Ridgway identifies his problem: “the author has in collection considerably over three hundred water-colors, each bearing a different name” (Ridgway, A Nomenclature of Colors for Naturalists X). The later volume contains over a thousand colors, but its color names are less metaphorical (lemon-yellow) and more descriptive (bright yellow).

### $$CM_S$$, Saccardo

A continental work from the following decade is the ambitious, polyglot volume by the Italian botanist Pier Andrea Saccardo bearing the formidable Latin title Chromotaxia Seu Nomenclator Colorum, Polyglottus, Additis Speciminibus Coloratis ad Usum Botanicorum et Zoologorum (1894), roughly “Chromotaxia, or a Polyglot Nomenclator of Colors, with Colored Specimens Added, for the Use of Botanists and Zoologists.” Although it contains only fifty colors, it features an index of several hundred “synonyms” for these colors in Latin, Italian, French, English, and German. While some of these are recognizable to modern readers, others seem strangely specific, such as Murinus (“mousy”) or Fuligineus (“sooty”). Saccardo provides two supplementary colors: achrous, or “colorless,” “glassy”; and sordidus, or “sordid,” “dirty,” which he describes as a modifier rather than a color: “non est color definitus sed indicat inquinamentum aliorum colorum. Exempla: sordide albus, luride ruber” (“it is not a definite color, but indicates the contamination of other colors. Examples: dirty white, lurid red”) (Saccardo 16).

### $$CM_M$$, Maerz and Paul

Maerz and Paul’s 1930 A Dictionary of Color, itself meticulously compiled from a number of prior manuals, provides the largest number of colors of any source here: over three thousand.

### $$CM_P$$, Pantone

The Pantone set, one of the most common among designers and artists today, contains over two thousand colors, but its names are far more mercantile than those of the other maps. They skew toward food-related words, flower-related words, and anything else that might make a pleasant marketing term.

### $$CM_X$$, XKCD

The antidote to the Pantone set is one from Randall Munroe, an American author, former NASA engineer, and cartoonist best known for his webcomic XKCD. Munroe surveyed his wide readership, asking them to name colors shown to them at random on his website. He also collected demographic data, logged respondents’ locations via their computers’ IP addresses, and asked whether they were colorblind or used a cathode ray tube monitor. The survey results, which represent five million color mappings from 220,500 users, show a consensus for many color names, as shown in table 1. To account for color differences across monitors, pages, and other media, I display these colors with a script that renders each one in the browser with its hex value overlaid.

Table 1: XKCD Color Mappings

| Color Name | RGB Hex Value |
|---|---|
| purply blue | #661aee |
| silver | #c5c9c7 |
| sickly green | #94b21c |
| melon | #ff7855 |
| mocha | #9d7651 |
| coffee | #a6814c |
| canary yellow | #fffe40 |
| purpleish | #98568d |
| bluey purple | #6241c7 |

This mapping presents a useful counterpoint to commercial mappings such as Pantone’s, and to more systematic mappings like Ridgway’s. In the sample presented in [fig:xkcdBlocks], we see a mix of naming metaphors. The usual food metaphors (melon, mocha, coffee) appear next to animal metaphors (camel, canary yellow) and creative compounds indicating a small amount of one color mixed into another (purplish, bluey, purply, greyish). The informality of the “-ish” suffix suggests extemporaneous description, as if colors were mixing in the imaginations of these survey respondents in the absence of a ready-made metaphor. For comparison, greyish pink in this color map is blush in Pantone, and darkish green becomes online lime. And, as one would expect, sickly green would hardly be a marketable name for a commodity, especially a food, so in Pantone the color is lime green. If an exact match for a hex value does not exist in a color map, I find the closest color to it using the CIE76 color difference, $$\Delta E^{*}_{ab}$$, which is the Euclidean distance between two colors in CIELAB space. This is described in more detail in Categorization.
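The nearest-color lookup can be sketched as follows. This is a minimal illustration, not the chapter’s own script: it assumes that hex values are sRGB with a D65 white point, converts them to CIE L\*a\*b\* with the standard formulas, and takes $$\Delta E^{*}_{ab}$$ (CIE76) as the plain Euclidean distance between the two L\*a\*b\* triples. The function names are hypothetical.

```python
import math

# CIE XYZ coordinates of the D65 reference white (the sRGB white point).
D65_WHITE = (0.95047, 1.00000, 1.08883)


def hex_to_rgb(hex_value: str) -> tuple[float, ...]:
    """'#ff7855' -> red, green, blue channels scaled to [0, 1]."""
    h = hex_value.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer ("gamma") curve for one channel."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def hex_to_lab(hex_value: str) -> tuple[float, float, float]:
    """Convert an sRGB hex value to CIE L*a*b* (D65)."""
    r, g, b = (srgb_to_linear(c) for c in hex_to_rgb(hex_value))
    # Linear sRGB -> CIE XYZ.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e_76(hex_a: str, hex_b: str) -> float:
    """CIE76 color difference: Euclidean distance in L*a*b* space."""
    return math.dist(hex_to_lab(hex_a), hex_to_lab(hex_b))


def closest_color(hex_value: str, color_map: dict[str, str]) -> str:
    """Name in color_map whose hex value is nearest to hex_value under CIE76."""
    return min(color_map, key=lambda name: delta_e_76(hex_value, color_map[name]))


# The XKCD sample from table 1.
XKCD_SAMPLE = {
    "purply blue": "#661aee", "silver": "#c5c9c7", "sickly green": "#94b21c",
    "melon": "#ff7855", "mocha": "#9d7651", "coffee": "#a6814c",
    "canary yellow": "#fffe40", "purpleish": "#98568d", "bluey purple": "#6241c7",
}

print(closest_color("#fffe41", XKCD_SAMPLE))  # canary yellow
```

CIE76 is the simplest of the $$\Delta E$$ formulas; later variants (CIE94, CIEDE2000) correct for perceptual non-uniformities in L\*a\*b\*, but the lookup would keep the same shape with a different distance function.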

### Summary and Comparison

Table 2: Color mappings

| Abbreviation | Name | # Color/Word Pairs | Year | Weight |
|---|---|---|---|---|
| $$CM_S$$ | Saccardo, Chromotaxia Seu Nomenclator Colorum | 500 | 1894 | 2 |
| $$CM_R$$ | Ridgway, Color Standards and Color Nomenclature | 1113 | 1912 | 2 |
| $$CM_M$$ | Maerz and Paul, Dictionary of Color | 3224 | 1930 | 2 |
| $$CM_P$$ | Pantone Colors | 2310 | 2010? | 1 |
| $$CM_X$$ | XKCD Color Survey | 954 | 2012 | 3 |

To compare the tendencies, or biases, of these color maps, and so to decide how to balance them, I average the 300-dimensional GloVe vectors (Stanford’s Global Vectors for Word Representation, trained on English-language websites) of each map’s color names, and compute the cosine similarity between each average and the vectors of a number of seed words:

$$\text{similarity}(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{\|\vec{A}\| \times \|\vec{B}\|}$$

That is, the dot product of the two vectors divided by the product of their $$L_2$$ (Euclidean) norms.
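A minimal sketch of the averaging and similarity steps, assuming the embeddings sit in a GloVe plain-text file (one word per line followed by its space-separated components) and that multi-word names such as canary yellow contribute each of their tokens separately. The function names are hypothetical, and no particular GloVe file is assumed.

```python
import math

Vector = list[float]


def load_glove(path: str) -> dict[str, Vector]:
    """Read a GloVe text file: one word per line, followed by its components."""
    embeddings: dict[str, Vector] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, *values = line.rstrip().split(" ")
            embeddings[word] = [float(v) for v in values]
    return embeddings


def average_vector(color_names: list[str], embeddings: dict[str, Vector]) -> Vector:
    """Mean vector over every token of every color name that has an embedding.
    Multi-word names such as 'canary yellow' contribute each token separately."""
    vectors = [embeddings[tok] for name in color_names
               for tok in name.lower().split() if tok in embeddings]
    if not vectors:
        raise ValueError("none of the color names has an embedding")
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

Comparing a map to a seed word is then `cosine_similarity(average_vector(names, glove), glove[seed])`, where `names` are the map’s color names and `glove` is the loaded embedding dictionary.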

Fig. 3 shows how similar a series of seed-word vectors are to the average vector of each color map. $$CM_P$$, the Pantone map, is more similar to positive, marketable-sounding words and to words evoking leisure. $$CM_X$$ is more similar to snot, a decidedly unmarketable word discussed later, and Jaffer’s aggregation of $$CM_S$$, $$CM_R$$, and $$CM_M$$ is slightly more similar to blood, hardly a marketable term either.

Given these tendencies, I weight these color maps as described in table 2, and combine them into one large master color mapping, which will become the basis of the imagination model below.
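The chapter does not spell out how the weights in table 2 are applied, so the following is only one plausible reading, sketched under a loudly stated assumption: where maps disagree about a name, each map votes for its own hex value with its weight, and the value with the largest total wins. The function and the `WEIGHTS` dictionary are hypothetical.

```python
from collections import defaultdict

# Weights from table 2 (hypothetical dictionary keys).
WEIGHTS = {"CM_S": 2, "CM_R": 2, "CM_M": 2, "CM_P": 1, "CM_X": 3}


def merge_color_maps(weighted_maps: list[tuple[dict[str, str], int]]) -> dict[str, str]:
    """Combine (name -> hex map, weight) pairs into one master mapping.

    Assumption: when maps disagree about a name, each map votes for its hex
    value with its weight, and the value with the largest total wins; ties go
    to the value that was seen first."""
    votes: dict[str, dict[str, int]] = defaultdict(dict)
    for color_map, weight in weighted_maps:
        for name, hex_value in color_map.items():
            tally = votes[name.lower()]
            key = hex_value.lower()
            tally[key] = tally.get(key, 0) + weight
    # max() returns the first maximal key, so insertion order breaks ties.
    return {name: max(tally, key=tally.get) for name, tally in votes.items()}
```

Under this reading a call would look like `merge_color_maps([(saccardo, WEIGHTS["CM_S"]), (xkcd, WEIGHTS["CM_X"])])`, with `saccardo` and `xkcd` standing in for the loaded maps. Names found in only one map pass through unchanged; the weights matter only where maps collide.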

## Deep Imagining: Color Inference

Mapping color expressions to hex values is only the beginning. Explicit color words are not the only words that suggest color to a reader, and a text imagined broadly can be understood more fully than one imagined narrowly. So it would help to imagine those aspects of a text whose colors are harder to imagine because they are implied rather than named.

But this is a difficult problem: how can we derive the color of an object, or an adjective, where that color is known to a human reader, but not to a computer? For example, the words Statue of Liberty would recall the pale greenish color of copper oxide to those familiar with the statue, even from images, although this mapping isn’t readily available in a database.

If a poet or novelist presents us with imagery which is all of a single color, we want to be able to see that. One of the tasks of a literary critic, after all, is to be sensitive to the arrangements of the writer, so as to point out their resonances.

Take Katherine Mansfield’s 1922 story “The Garden Party,” for instance, an inspiration for Woolf’s Mrs Dalloway, and a classic modernist short story. (The collection of the same name is among the most colorful works, as an analysis below shows.) Set on a fine day in early summer in New Zealand, the story is resplendent in greenery, which is to say, flora. But besides the grass, the lawn, the green bushes, the karaka-trees, and the leaves and stems of the flowers, there is an unusual abundance of other green things as well. Laura’s sister Meg is wearing a “green turban” when she arrives for breakfast (Mansfield 287). A band plays from a tennis court, which we might assume is green, since it is compared to a pond, and tennis courts are usually green (ibid.). The band members themselves wear green, which leads Kitty to compare them to frogs (294). Green baize doors separate the servants’ rooms from the rest of the house (289). Some of this is explicitly labeled as green, but some, like frogs and tennis courts, we are simply expected to know are green. While frog green does appear in one of the sources of $$CM_J$$, and tennis court green appears in some responses to the original $$CM_X$$ survey, neither value appears in the final mapping.

So to computationally imagine not just green itself, but things which are very likely to appear green, we need to find a way to imagine colors from any given word. This is where we must develop an engine to model deep imagination.

### Word Proximity-based, $$M_P$$

One of the simplest methods of color inference is to calculate the linear distance, in words, from a known color word to a target word. Given a large enough corpus, it is quite likely that green, for example, will appear within several words of grass. By noticing that the distances from green to grass are much shorter than those from red to grass, we might infer that grass is green: that is, that the literary imagination of grass has the color category green.

Another example is inferring the color of a gull. Anyone who has visited the North Atlantic shore knows that gulls tend to be white and gray. Of the 170 times that the word gull appears in $$C_{PG}$$, white appears within about ten words of it nineteen times, and the lemma grey six times. Red does not appear once. Both green and yellow, however, appear twice, although not in the same dependency relations. Given these collocations, we can write a model that guesses that gull is mostly white, a little gray, and with hints of green and yellow.
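The proximity-only model can be sketched as a windowed collocation count. This sketch assumes the corpus has already been tokenized, lowercased, and lemmatized; the function name and the ten-token default window (matching the “about ten words” above) are illustrative choices, not the chapter’s code.

```python
from collections import Counter


def color_profile(tokens: list[str], target: str, color_words: set[str],
                  window: int = 10) -> dict[str, float]:
    """Relative frequency of each color word found within `window` tokens
    (on either side) of any occurrence of `target`.

    Assumes `tokens` are already lowercased and lemmatized."""
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i and tokens[j] in color_words:
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()} if total else {}
```

With the counts reported above for gull (white 19, grey 6, green 2, yellow 2), the normalized profile would be roughly 0.66 white, 0.21 grey, and 0.07 each for green and yellow.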

Syntactic proximity, however, is preferable to raw proximity, so I developed an algorithm that scores relations between two neighboring words using both linear word distances and syntactic distances. I calculate syntactic distances by traversing the dependency trees of their containing sentences. By way of illustration, take these lines from Arthur Conan Doyle’s 1906 novel Sir Nigel:

Next morning they found themselves in a dangerous rock studded sea with a small island upon their starboard quarter. It was girdled with high granite cliffs of a reddish hue, and slopes of bright green grassland lay above them. (Doyle, Sir Nigel)

The clause “slopes of bright green grassland lay above them” is parsed into the syntactic dependency graph shown in fig. 4.

```mermaid
graph LR
  lay --- S[slopes of]
  S --- grassland
  grassland --- bright
  grassland --- green
  lay --- A[above them]
```

Here, bright and green are descendants (syntactic dependents) of grassland. A more accurate parse might even treat bright green as a single semantic unit.

This model infers, for each target word $$W_T$$, its associated color words $$W_C$$ by traversing the syntax tree and weighting each association accordingly.

However, modifiers are not always direct descendants of the words they modify, since the relation may cross a sentence boundary. (Imagine a passage that read: “The slopes of grassland. How bright green they were!”) To account for such cases, I also compute weights based on the raw distances between words. The full algorithm is this:

1. It begins by identifying a color word $$W_C$$, from color map $$CM_X$$, in the target text.
2. It then parses the containing sentence, and determines its syntactic dependencies.
3. Starting from $$W_C$$, it navigates through parent words and parent noun chunks $$W_T$$ to the root of the sentence.
4. If $$W_T$$ is a noun or adjective, it is assigned a score: 2 if it is a direct parent of $$W_C$$, or 1 if it is a grandparent of $$W_C$$, at two steps’ removal in the syntactic tree.
5. All other nouns and adjectives are now candidate $$W_T$$s, and are assigned a score of $$1/i$$, where $$i$$ is the distance, in number of tokens, from $$W_C$$. Thus a directly adjacent word scores 1, and a word two tokens away scores 0.5.
6. These scores are then averaged for each token that shares the same lemma.
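A minimal sketch of steps 1 to 6, under several assumptions that are not from the chapter: the dependency parse is supplied as a list of hypothetical `Token` records (in practice it would come from a parser such as spaCy, whose tokens expose `lemma_`, `pos_`, and `head`); noun chunks are not merged; and step 5 is limited to the same sentence, although the cross-sentence case above motivates reaching further. Ancestors above the grandparent are left to the linear-distance scoring of step 5.

```python
from collections import defaultdict
from dataclasses import dataclass

CANDIDATE_POS = {"NOUN", "ADJ"}  # Universal Dependencies part-of-speech tags


@dataclass
class Token:
    lemma: str
    pos: str
    head: int  # index of the syntactic parent; the root points to itself


def score_sentence(tokens: list[Token], color_words: set[str]) -> dict[tuple[str, str], list[float]]:
    """Steps 1-5 for one parsed sentence: {(color, target lemma): [scores]}."""
    scores: dict[tuple[str, str], list[float]] = defaultdict(list)
    for c_idx, color in enumerate(tokens):
        if color.lemma not in color_words:  # step 1
            continue
        # Steps 3-4: walk up the tree; a noun/adjective parent scores 2,
        # a noun/adjective grandparent scores 1.
        scored = {c_idx}
        node, weight = c_idx, 2
        while weight > 0 and tokens[node].head != node:
            node = tokens[node].head
            if tokens[node].pos in CANDIDATE_POS:
                scores[(color.lemma, tokens[node].lemma)].append(weight)
                scored.add(node)
            weight -= 1
        # Step 5: every other noun/adjective scores 1 / (token distance).
        for t_idx, target in enumerate(tokens):
            if t_idx not in scored and target.pos in CANDIDATE_POS:
                scores[(color.lemma, target.lemma)].append(1 / abs(t_idx - c_idx))
    return scores


def imagine(sentences: list[list[Token]], color_words: set[str]) -> dict[str, dict[str, float]]:
    """Step 6: average the scores of all tokens sharing a lemma.
    Returns {target lemma: {color: mean score}}."""
    pooled: dict[tuple[str, str], list[float]] = defaultdict(list)
    for sentence in sentences:
        for key, values in score_sentence(sentence, color_words).items():
            pooled[key].extend(values)
    profile: dict[str, dict[str, float]] = defaultdict(dict)
    for (color, target), values in pooled.items():
        profile[target][color] = sum(values) / len(values)
    return dict(profile)
```

For the Doyle clause, encoding the parse of fig. 4 by hand (with grassland attached to slopes, as the merged “slopes of” node suggests) gives grassland a green score of 2 as the parent, slopes a score of 1 as the grandparent, and bright a score of 1 as an adjacent adjective.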

The resulting data structure looks like fig. 5, for grass: