When Does (German) Literature Take Place? – On the Analysis of Temporal Expressions in Large Corpora

1. Workflow

Our workflow consisted of four steps that were performed in parallel: (1) compilation of a suitable (German-language) corpus; (2) collection of data using the temporal tagger HeidelTime (Strötgen and Gertz, 2012) for the automatic extraction of temporal expressions according to the guidelines of the temporal markup language TimeML (Pustejovsky et al., 2003); (3) data analysis (from heat maps to individual cases); and (4) development of an Android app for exploring the ‘literary year’.

Out of the two biggest corpora with German literary texts, the TextGrid Repository 1 and Gutenberg-DE, 2 we chose the latter. We prepared the corpus so it would only contain fiction and ended up with 2,735 works by 549 authors, the majority of which had been published between 1840 and 1930. The resulting 900 MB of text were fed into HeidelTime to extract date specifications. Just using the explicit (and therefore very correct) expressions, we created a calendar heatmap (where ‘1’ means 0–9 occurrences, ‘2’ means 10–19 occurrences, etc., and ‘+’ means 90 or more occurrences; days with more than 50 occurrences are highlighted). Some expected and unexpected accumulations turned up:

JAN: +333222323131323223222222222131
FEB: 43322222133212332332212322231
MAR: 73334322233243422433 63232322252 (21 st)
APR: +33233223432223223332322223323
MAY: +3544332353 64353232424323223244 (12 th)
JUN: 733233323333324432343324433233
JUL: 9444332333243 652333432224223223 (14 th)
AUG: 83 6442232 72444 63344533332323222 (3 rd, 10 th, 15 th)
SEP: 854433233332234233233221222323
OCT: +353322224223552253432222222133
NOV: 944233333 723225213232222222224 (10 th)
DEC: 55223412132232321322333 92122224 (24 th)

As we can see, first days of a month and fixed holidays (New Year, Christmas) have a much higher frequency. But some other days also stick out, an example being the ‘10th of August’. A look into our results files showed that 74% of its occurrences provide a temporal setting for the fictional plot. The above-average frequency of the 10th of August, though, has to do with a historical event, the storming of the Tuileries Palace on 10 August 1792. About 21% of the occurrences are references to this historical date (cf. Georg Büchner’s play Danton’s Death: ‘ First Citizen: Danton was with us on the 10th August, Danton was with us in September’). Therefore, it is necessary to distinguish between historical dates that left their traces in literary texts and explicit dates that provide a temporal setting of a fictional plot. The collection and analysis of such date accumulations will be systematically expanded, in regard to specific authors, languages, and nations.

2. Significance for Literary Studies

Along the lines of Hans Ulrich Gumbrecht’s study on the year 1926, it would be useful and feasible to assemble the literary history of individual days. Every date has its own literary history, as is indicated by the example of Paul Celan and the ‘20th of January’.

In Celan’s prose poem Conversation in the Mountains (1960), he alludes to Georg Büchner’s short story Lenz, which also describes a passage through the mountains. Büchner’s text starts with the sentence, ‘The 20th of January, Lenz walked through the mountains’. In Der Meridian, Celan’s acceptance speech for the Georg Büchner Prize (Germany’s most prestigious literary accolade), he stresses that Lenz’s hike through the mountains takes place on a 20th of January and extends the temporal frame by referring to another 20th of January, the one of 1942, when the Wannsee Conference took place in Berlin. Celan concludes, ‘Perhaps one may say that every poem has its “20th of January” inscribed?’ (cf. Sieber, 2007, 114).

The automatic extraction of date expressions from large corpora makes such simultaneities visible and enables their systematic exploration.

3. Development of an Android App to Facilitate the Exploration of Date Expressions in World Literature

To get an idea of the seasonal cycle of literature, we developed an Android application that works like a calendar and, for each day of the year, presents passages from canonised texts of world literature that take place on that very day. Screenshots are shown in Figure 1.

It is well known that James Joyce’s Ulysses takes place on 16 June 1904. But there is just one inconspicuous mentioning of the day in the novel (the secretary Ms Dunne types it in on the keyboard); it is made visible in the app. Other examples for such passages are 12 June in Günter Grass’ novel The Tin Drum (birth of Oskar Matzerath’s declared son Kurt), 29 February in Thomas Mann’s Magic Mountain (where the date is of importance as a special variant of the Walpurgisnacht; see Figure 1), and 27 July in Stefan Zweig’s Chess Story (on that day, imprisoned protagonist Dr. B. gets hold of the chess book).

Our app thus represents a database of fictional dates that will be available for further research. To be able to map the entire ‘literary year’, though, we will also have to take approximate temporal expressions into account, which we will be attempting in the next section.

The Seasonal Cycle of Literature

We already mentioned the very specific ratio between exact and approximate dates. In the search for anomalies in the works of individual 19th-century authors, we came across Theodor Fontane and Theodor Storm. A collection of just the month specifications in the fictional works of both authors yielded the results shown in Table 1.

In conformity with the popularity of the first of May, the whole month is strongly represented in the narratives of both authors. However, the summer months (of the Northern Hemisphere) are not. Fontane’s narratives seem to especially take place between September and January, Storm’s works between August and November. Given that every month name has a specific tonal-associative character and creates a stylistic effect, both authors seem to favour autumnal/wintry settings and moods.

4. Conclusion

In this abstract, we presented a method to find date accumulations in large literary corpora. We described the relation between approximate and exact dates and introduced a growing database of exact date specifications in world literature. Furthermore, we approached the seasonal cycle of literature and certain authors to try to answer the question, ‘When does [German] literature take place?’ in a macro-analytic fashion. The results obtained to date already show the potential of the ongoing research.

5.

Notes

1. http://www.textgridrep.de/.

2. http://projekt.gutenberg.de/.

Appendix A

Bibliography
  1. Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. University of Illinois Press, Chicago.
  2. Pustejovsky, J., Castano, J. M., Ingria, R., Sauri, R., Gaizauskas, R. J., Setzer, A., Katz, G. and Radev, D. R. (2003). TimeML: Robust Specification of Event and Temporal Expressions in Text. In New Directions in Question Answering. Cambridge, MA: MIT Press, pp. 28–34.
  3. Sieber, M. (2007). Paul Celans ‘Gespräch im Gebirg’. Erinnerung an eine versäumte Begegnung. Tübingen: Niemeyer, http://books.google.de/books?id=KbfF2oIHrjwC&pg=PA114.
  4. Strötgen, J. and Gertz, M. (2012). Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. In Proceedings of the 8th International Conference on Language. Resources and Evaluation (LREC 2012), pp. 3746–53.
Frank Fischer (frank.fischer@zentr.uni-goettingen.de), Göttingen Centre for Digital Humanities, Germany and Jannik Strötgen (jannik.stroetgen@informatik.uni-heidelberg.de), Heidelberg University, Germany