Without the methods of the digital humanities, my history of nostalgia would have been impossible to write. For the past four years, I have been working with a linguist, Rebekah Baglini (now a Mellon fellow at Stanford University), on the semantic morphology of nostalgia in the HathiTrust/Google Books dataset (~3.2 million books). Although this dataset is very messy, we have been able to produce a corpus of 25,000 occurrences of nostalgia, which we use to address four important questions:
- Semantics. How is nostalgia defined? How can computational approaches allow us to study the meaning of a concept, its semantic neighborhood? To address these questions, we determined that the fifty-word contexts where the word appears – roughly the size of a paragraph – represent the best frames for isolating the keywords that give nostalgia its meaning. Peter de Bolla arrived at a similar conclusion in his recent work, The Architecture of Concepts, which we hope lends support to our decision.
- Historical Semantics. How does nostalgia change over time? How can we distinguish these changes? By measuring the relative frequency of keyword co-occurrences in these contexts across distinct twenty-year periods, we identified shifts in the semantics of nostalgia between 1800 and 1919. For example, between 1800 and 1819, keywords like “doctor” and “disease” occur most frequently, while between 1880 and 1899 keywords like “life” and “youth” emerge for the first time.
- Structure. How does nostalgia fit into larger frameworks of knowledge? How are these frameworks structured? Here, we produced network analyses in each twenty-year period to reveal the relations between our key co-occurring concepts (rather than just the unilateral relationship obtaining between nostalgia and a single concept).
- Variation. How can we tell if there are different versions of nostalgia in use in a given period? In order to test the widely held assumption that concepts belong to larger debates, we used a community detection algorithm to distinguish different clusters of language used to describe nostalgia. (N.B. This is basically a more robust form of topic modeling, and it was recommended to us by a sociologist who also worked on the project.) We found, for example, that between 1800 and 1819, nostalgia is actually quite homogeneous, with only two clusters, while between 1900 and 1919 there is considerable variety, with twenty-seven statistically significant clusters. Explaining possible reasons for this diversification will be an important part of the humanistic side of my project.
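The first two steps in the list above (fifty-word contexts and keyword frequencies by twenty-year period) can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual pipeline. The function names, the even split of the window around the target word, the choice to measure "relative frequency" as the share of contexts in which a keyword appears, and the three toy sentences are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

TARGET = "nostalgia"
WINDOW = 50  # total words of context; split evenly around the hit (assumption)


def contexts(text, target=TARGET, window=WINDOW):
    """Yield the words surrounding each occurrence of `target`."""
    words = re.findall(r"[a-z]+", text.lower())
    half = window // 2
    for i, word in enumerate(words):
        if word == target:
            yield words[max(0, i - half):i] + words[i + 1:i + 1 + half]


def period_start(year, span=20, origin=1800):
    """Map a year to the first year of its twenty-year period."""
    return origin + ((year - origin) // span) * span


def keyword_frequencies(documents, stopwords=frozenset()):
    """Return {period: {keyword: share of contexts containing the keyword}}."""
    counts = defaultdict(Counter)
    totals = Counter()
    for year, text in documents:
        period = period_start(year)
        for ctx in contexts(text):
            totals[period] += 1
            # Count each keyword once per context, ignoring stopwords.
            counts[period].update(set(ctx) - stopwords)
    return {
        period: {w: n / totals[period] for w, n in counter.items()}
        for period, counter in counts.items()
    }


# Invented toy sentences, NOT drawn from the HathiTrust/Google Books corpus.
docs = [
    (1805, "The doctor treated nostalgia as a disease of soldiers."),
    (1812, "Nostalgia, the doctor wrote, is a disease."),
    (1890, "He felt nostalgia for the life of his youth."),
]
stop = frozenset({"the", "a", "of", "as", "is", "he", "for", "his"})
freqs = keyword_frequencies(docs, stopwords=stop)
```

Comparing `freqs[1800]` with `freqs[1880]` shows which keywords are shared across periods and which appear only in the later one. On a real corpus, the stopword list and the tokenizer would need far more care than this sketch gives them.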
For example, consider the following non-egocentric network models, which display the most frequent concepts that co-occur with nostalgia in two distinct historical periods:
These two networks depict the language that most likely gives “nostalgia” its meaning. At this point, Rebekah and I have two opposed methods at our disposal for explaining these results. On the one hand, we might want to see if it is possible to describe the transformation of this concept (from a pathological desire to return home to a popular sentimental emotion associated with lost times and places) in the terms of formal logic. On the other hand, we might want to pursue hermeneutic or formalist research by using these results to orient our reading of historical source materials. In the end, we believe that both methods have their advantages, particularly when combined. The drive to produce systematic explanations is important because it leads to significant methodological innovation, as well as to a discussion of what results would count as truth. The drive to read, by contrast, reveals information that might otherwise slip through the cracks. Sometimes this means that anecdotal or exceptional examples are recovered; at other times, it means that entire global frameworks are.
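To make the network and clustering steps described earlier more concrete, here is a minimal Python sketch of how a keyword co-occurrence network might be built and then partitioned. The text does not name the community detection algorithm we used, so this example substitutes a simple deterministic label propagation as a stand-in. It does not test clusters for statistical significance. The function names and the toy contexts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(contexts, keywords, min_weight=1):
    """Count how often each pair of keywords shares a context."""
    weights = Counter()
    for ctx in contexts:
        present = sorted(set(ctx) & keywords)
        weights.update(combinations(present, 2))
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def label_propagation(edges, max_rounds=100):
    """Group nodes by repeatedly adopting the heaviest neighboring label."""
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    labels = {node: node for node in neighbors}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            score = Counter()
            for other, w in neighbors[node].items():
                score[labels[other]] += w
            # Ties are broken alphabetically so the result is deterministic.
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, lab in labels.items():
        clusters[lab].add(node)
    return sorted(sorted(c) for c in clusters.values())


# Invented toy contexts, NOT drawn from the actual corpus.
toy_contexts = [
    ["doctor", "disease", "soldier"],
    ["doctor", "disease"],
    ["disease", "soldier"],
    ["youth", "life", "memory"],
    ["youth", "life"],
    ["life", "memory"],
    ["disease", "life"],  # a single weak bridge between the two groups
]
toy_keywords = {"doctor", "disease", "soldier", "youth", "life", "memory"}
edges = cooccurrence_edges(toy_contexts, toy_keywords)
clusters = label_propagation(edges)
```

In this toy case the medical vocabulary and the sentimental vocabulary fall into separate clusters despite the one context that links them. The edges connect the co-occurring keywords to each other rather than only to “nostalgia,” which is the sense in which the network is non-egocentric.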
Arts, Science & Culture Initiative, May 2013
Digital Crucible: Arts & Humanities & Computation, October 2014 (presentation begins at 26:20)