As cultural historians interested in public debates, we necessarily access historical newspapers only through the lenses that they themselves – as commercial companies, social spaces, and/or cultural phenomena – provide us. In this paper, we propose a digital humanities approach to historical newspapers to overcome these issues of biased information provision. The digital turn has enriched historical research not only with the bottom-up approach of keyword searching, but also with a growing range of digital analysis tools. Applied to (digitized) newspapers, these techniques are able to break down the rhetoric of the journalistic profession and to deconstruct illusion and reality. At the same time, the use of digital techniques in historical study confronts the researcher with problems and considerations of its own, which this paper addresses in detail.
It was in alarming terms that the regional Dutch newspaper Leeuwarder Courant referred, in the fall of 1950, to an American study on the influence of television on family life: “Some researchers spoke to parents whose children watched television for eight or nine hours during weekends. […] Many mothers complained about the impossibility to get them to the dinner table, to their homework, or to bed in time.” One can feel the concern hidden in these sentences – although even three years later fewer than 3,000 Dutch households actually owned a television.
Perhaps this fact only affirms the power of historical news articles like these to convey a sense of the Zeitgeist: although the piece nowhere explicitly expresses opinions, the anxiety that speaks from it cannot be overlooked. The fact that it was printed provides us with a glimpse of a past mentality – namely that televisions are disruptive devices and a threat to family life. Similarly, it is alluring to consider newspapers as the antennas of current events and as ventriloquists of public opinion. As such, historical newspapers are, in theory, immensely rich sources for the study of how ‘the common people’ (as opposed to intellectuals and politicians) experienced, judged, or thought about events, people, and objects.
In this paper, we propose a digital humanities approach to historical newspapers to overcome the traditional issues that to a large extent keep newspapers from fulfilling this promise. After all, newspapers have never, in the past or today, neutrally portrayed the news and the public’s opinion about it. Media theorist Jaap van Ginneken has pointed to the prominent discursive level of language in news media in his Understanding Global News. He argues that news is never written in neutral language, given the fact that ‘[b]oth media producers and media consumers can only perceive and render the world in terms of meaning systems, in terms of pre-existing frames’. According to Van Ginneken, the practice of ideological, political or corporate framing is inherent to the business of newspapers.
However, newspapers not only have ideologies of their own, they also have formats. In other words: they have to select and edit the news before it can be published. Moreover, editorial staffs are social spaces like any other. Journalists and editors collaborate, but also compete with each other. This, too, affects the articles that make it into the paper. Finally, the conventions of newspaper make-up and layout vary considerably over time and place. Do newspapers tend to have long articles, or rather brief ones? Do they make room for opinion, feuilletons, and advertisements? Are articles written concisely or rather elaborately? All these aspects affect the way newspapers bring us both ‘the news’ and ‘the public’s voice’.
As cultural historians interested in public debates we necessarily only access them through the lenses that newspapers themselves – either as commercial companies, social spaces, and/or cultural phenomena – provide us. Consequently, historians working with historical newspapers are confronted with a typical kind of source criticism. As we will argue, digital methodologies applied to (digitized) newspapers are not only able to break down, or rather deconstruct the rhetoric of the journalistic profession, but also to give us glimpses of past mentalities in promising new ways.
This paper is based on our experiences in two digital humanities projects at Utrecht University. The first was a one-year pilot project titled BILAND, which ran in 2012/2013 and was aimed at comparing public debates on genetics and eugenics in the Netherlands and Germany before WWII. The second is the currently running project Translantis, which facilitates the research of six PhD and postdoctoral scholars studying the United States as a model for several aspects of Dutch culture and society. Both projects use the Dutch National Library’s (NL) digitized newspaper archive as their main source. Besides their historical focus, the goal of both projects is the development of a digital tool for the analysis of this enormous dataset. The NL’s collection of digitized newspapers offers over eight million pages of newspapers published between 1618 and 1995. For the period between 1890 and 1990, the database contains 480,112 national and regional newspaper editions. The software is being developed in close cooperation with software engineers from the Informatics department of the University of Amsterdam.
We are currently experiencing a ‘digital turn’ in the humanities as we are confronted with an ever-growing corpus of digitized source material. The analysis of large amounts of digital texts is often referred to as text mining. The term is used to describe a variety of analytical functionalities – based either on statistics or on linguistic algorithms – and the ways to visualize their outcomes. The idea is that these visualizations present text corpora in ways that differ from how human readers have traditionally approached them, thereby offering the possibility of insights that would never have emerged from reading every text within a corpus from A to Z. This article presents text mining as a way to overcome traditional issues surrounding the use of newspapers in historical research. However, it in no way claims that digital techniques make source criticism obsolete. On the contrary, it will argue that they generate a range of new problems that researchers will have to cope with when engaging in digital aspects of historical scholarship.
The Digital Turn
Historical newspapers, in particular, are being digitized with dazzling speed all over the world. The NL has recently finished its above-mentioned immense digitization project. Other European countries eagerly follow the Dutch example. The Europeana Newspaper Project aims to aggregate 18 million digitized historical newspaper pages from as many European countries in 2015.
Still, what does the ‘digital turn’ consist of? The use of computational methods in historical research has a long tradition. What distinguishes the recent digital turn in the humanities and, more specifically, in historical scholarship is, therefore, not so much the use of computers per se. It is the amount of data, which allows for different ways of quantitative analysis (‘big data’). However, the rapid rate of digitization does not remove the problems of source criticism touched upon above. If anything, it makes accounting for them more urgent than ever, as more and more scholarly research finds its way to these magnificently abundant new sources.
When speaking of digital humanities or of digital history as analytic approaches to doing research, two features are genuinely innovative. First, the large amount of readily available digitized material makes it possible to combine separate sets of data in ways and quantities that were previously unimaginable. Traditionally, the creation of databases or collections of archival material was so labor-intensive that strict selections had to be made. Today, massive data sets from different countries can be compared almost instantly. Also, for the first time textual archives can be searched alongside vast digital collections of images or audiovisual material. These data sets are a mer à boire for cultural historical research, enabling new forms of comparative studies.
Second, the use of computers to gain access to this digital information has fundamentally altered the historian’s heuristic process. Even the simplest search techniques bear witness to this (although not exclusively those, as will be argued below). The NL’s digitized newspaper archive, for example, allows researchers to run full-text searches through the entire repository of articles. Single-word queries, or more complex queries that include Boolean operators and fuzzy matching, make it possible to find the occurrence of specific words or combinations of words within texts. Full-text searching enables a bottom-up approach that fundamentally alters our access to source material like newspaper corpora. “Billions of individual words – the fundamental building blocks of culture – are now at our fingertips”, as Bob Nicholson has argued. The advantages for scholarly research are enormous, knowing that “the vast majority of the information in an article – the things that we can learn about the people and the society who produced and read it – are not organized using a top-down system.”
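To make the idea of Boolean queries with fuzzy matching concrete, the following is a minimal sketch in Python. It is not the NL's search engine: the three-article corpus, the function names, and the similarity threshold of 0.8 are assumptions chosen purely for illustration, and the fuzzy comparison uses the standard library's `difflib` rather than whatever matching the archive itself applies.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-corpus; the real archive holds millions of articles.
ARTICLES = {
    "a1": "De Amerikaansche televisie verovert de huiskamer",
    "a2": "Nieuwe fabriek in Rotterdam geopend",
    "a3": "Amerikaanse films en televizie trekken veel publiek",
}


def tokens(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def fuzzy_contains(text, term, threshold=0.8):
    """True if any token resembles `term` closely enough.

    This catches spelling variants and OCR errors; a threshold of 1.0
    reduces the test to exact matching.
    """
    return any(
        SequenceMatcher(None, token, term).ratio() >= threshold
        for token in tokens(text)
    )


def search_all(articles, terms, threshold=0.8):
    """Boolean AND: ids of articles that contain every term."""
    return sorted(
        doc_id for doc_id, text in articles.items()
        if all(fuzzy_contains(text, term, threshold) for term in terms)
    )


def search_any(articles, terms, threshold=0.8):
    """Boolean OR: ids of articles that contain at least one term."""
    return sorted(
        doc_id for doc_id, text in articles.items()
        if any(fuzzy_contains(text, term, threshold) for term in terms)
    )


query = ["amerikaanse", "televisie"]
print(search_all(ARTICLES, query))                 # fuzzy AND query
print(search_all(ARTICLES, query, threshold=1.0))  # exact AND query
```

On this toy corpus the fuzzy query retrieves both a1 (old spelling "Amerikaansche") and a3 (misread "televizie"), whereas the exact query retrieves neither. The same threshold that recovers such variants can also admit false matches, which is one reason the choice deserves to be reported alongside the results.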
Now, the querying of (combinations of) keywords forms a direct point of entry into the content of the corpus. This search strategy has the potential to enrich historical research in vital ways. It enables the researcher to find his object of study – defined by a string of keywords – in places he would never have thought of looking from a top-down perspective. In the specific case of the Translantis project, how would one have gone about giving an overview of the discursive use of references to the United States in newspaper articles? One would not have known where to start. After all, these references could literally have turned up anywhere. Had one wanted to show how diversely America was used in arguments, one would have had no choice but to read every single opinionated newspaper article. With full-text searching, a simple query suffices to gather the same material. It is not hard to imagine that the digital turn has significantly enriched our view of the diversity of associations with America in Dutch public discourse.
This new way of searching and browsing enables discourse analysis through the use of specific words or concepts. The option to filter databases of newspapers on the basis of region and date allows for a much faster way to trace continuities and discontinuities in language. This makes the mapping of ideas, practices, and products and their discursive formations much more feasible.
The capabilities of such a full-text search are easily overlooked, but they have considerably changed the practice of research. In the past, time constraints turned archival research into browsing through a selected number of editions or volumes. In the process, the title of an article pointed the researcher in a specific direction. Using full-text search, single keywords or strings of keywords allow the researcher to dive immediately into the text and study the context of the word(s) used. In the practice of research, full-text search serves an exploratory and heuristic purpose; it guides the researcher towards specific information or possible hypotheses.
Keyword searching by itself has considerable limitations for the proposed research into ideological layers of language in media. For obvious reasons it has, for example, serious trouble finding the ‘unspoken or unconscious assumptions’ that define a collective mentality. Keyword searching can only deal with the articulated dimensions of language. However, humanities research has in recent years witnessed the introduction of a range of digital tools and techniques that are better able to provide a deeper understanding of meaning systems in language.
Digital analytic methods
The advancement of computer technology has brought about a hitherto unseen range of analytical tools and research methods. Geographic information systems and network analysis may have a longer tradition, but the speed and ease with which large networks can be digitally arranged, or spatial references plotted on digital maps, are innovative indeed. The same goes for the thorough digital analysis of data with the help of statistical and/or linguistic methods. This has become a field of study in itself. Originally (and still) described as information retrieval, the field is, in the age of big data, now also commonly referred to as data mining or text mining. Text mining techniques can, for example, help to find ‘strikes that never happened’ or to attribute texts of unknown origin to specific authors. In short, computer technologies can help to answer research questions that scholars did not dream of asking before.
The NL’s own n-gram viewer, for example, allows for sophisticated ways of information extraction from the newspaper corpus. It enables quick comparative or trajectory analysis of n-grams – a word or sequence of words – by displaying their relative occurrences in the newspaper corpus on a temporal axis. For instance, one could chart the trajectory of specific English or “American” words or products in Dutch public discourse. Moreover, one is now able to search for idioms like the Dutch translation of “typically American” – exploring the diversity of what Dutch newspapers considered as such, but also comparing the extent to which things were seen as “typically American” with “typically German” or “typically French.” Comparing queries such as these can inform us about the distinctiveness of America as a reference culture of the Netherlands.
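What “relative occurrence on a temporal axis” amounts to can be sketched in a few lines of Python. The dated snippets below are invented, and the normalization (occurrences of the phrase divided by the total number of n-grams of the same length in that year) is our assumption; the NL's viewer may well normalize differently.

```python
import re
from collections import Counter

# Hypothetical (year, text) pairs standing in for dated newspaper articles.
ARTICLES = [
    (1950, "typisch amerikaansch gemak in de keuken"),
    (1950, "de markt in Leeuwarden was druk"),
    (1951, "typisch amerikaansch en typisch duitsch vergeleken"),
]


def ngrams(text, n):
    """Return all sequences of n consecutive lowercase word tokens."""
    toks = re.findall(r"\w+", text.lower())
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def relative_frequency(articles, phrase):
    """Share of a year's n-grams that equal `phrase`, keyed by year."""
    n = len(phrase.split())
    hits, totals = Counter(), Counter()
    for year, text in articles:
        grams = ngrams(text, n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals) if totals[year]
    }


for phrase in ("typisch amerikaansch", "typisch duitsch"):
    print(phrase, relative_frequency(ARTICLES, phrase))
```

Dividing by the yearly total matters because the number of digitized articles differs widely from year to year; raw counts would largely mirror the size of the corpus rather than the prominence of the phrase.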
The tool built within the BILAND project and its predecessor project WAHSP, and further developed within Translantis, combines the histogram visualization of the n-gram viewer with a word cloud functionality. The tool, named Texcavator, enables the user to save the results of his or her queries into subcollections. Word clouds can be made from these subcollections: a visualization of the words with the highest overall frequency within the entire subcollection.
A form of visualization as basic as word clouds has at least two advantages over simple keyword searching. First, by treating digitized corpora as immense bags-of-words, this technique deconstructs the given structure of texts. As a result, it breaks down most of the described mechanisms that make newspapers such delicate sources for historical research. Second, word clouds stimulate ‘distant readings’ of large corpora of texts, through which associative and correlative patterns can become visible that would never have stood out at the level of the single text. It is particularly such patterns – such as a striking correlation between word x and pejorative y – that provide glimpses into discursive levels of language.
However, it is essential to underline that word clouds do not generate meaning for the keywords used – at least not in the sense in which ‘meaning’ is understood in a humanities context. They easily provoke hasty interpretations, because one almost automatically tends to make associations between different words in the cloud. While using these types of technologies one must, therefore, constantly be aware of the fact that, rather than unambiguous attributions of meaning, word clouds should be considered as new texts – texts that “are not easier to make sense of than the original texts used to make them.” Reading (interpreting) these ‘new texts’ does not reveal their meaning, but their substance or context. Important to note, though, is that the dichotomy assumed here is a false one. Meaning can, after all, hardly emerge from a word without taking into account its context. Therefore, the contextualizing qualities of these digital technologies can help to interpret – to give meaning to – the words at hand. And they do so, we may add, without being troubled by the judgments of single authors.
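The bag-of-words treatment behind a word cloud can be illustrated with a short Python sketch. It is not Texcavator's implementation: the three-document subcollection, the tiny stop word list, and the function name are hypothetical, and a real word cloud would additionally scale the displayed words by the counts computed here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (Dutch function words).
STOP_WORDS = {"de", "het", "een", "en", "van", "in", "is"}

# Hypothetical subcollection, e.g. the saved results of one query.
SUBCOLLECTION = [
    "De erfelijkheid van ziekte is een medisch vraagstuk",
    "Erfelijkheid en ziekte in het gezin",
    "Het ras en de erfelijkheid",
]


def top_words(documents, top_n=5, stop_words=STOP_WORDS):
    """Pool all documents into one bag of words and return the most frequent.

    Word order, article boundaries and authorship are discarded;
    only the overall frequency of each word survives.
    """
    counts = Counter()
    for doc in documents:
        counts.update(
            token for token in re.findall(r"\w+", doc.lower())
            if token not in stop_words
        )
    return counts.most_common(top_n)


print(top_words(SUBCOLLECTION, top_n=2))
```

Because everything but frequency is thrown away, the structure that editors and authors gave to the individual articles no longer shapes the result. What the resulting list means is, by the same token, left entirely to the reader.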
It is this remarkable quality of word clouds to reveal the contexts in which words are used that, among other things, enables a new use of digital newspapers for historical research. An example from the BILAND project may illustrate this. One of the goals of this project was to analyze the use of tacit knowledge of genetics in the Dutch public debate around 1900. Were the readers of Dutch newspapers familiar with theories of genetics, or at least with concepts of inheritance? When the word ‘inheritance’ (‘erfelijkheid’ in Dutch) itself is queried, the Texcavator tool yields a timeline that indicates the start and frequency of the use of the term. Given the ambiguous meanings of the word ‘inheritance’ – referring, e.g., not only to heredity, but also to legal and cultural forms of heritage – the tool displays the dominant contexts in which the concept appeared from year to year. It does so by generating word clouds from the articles of each individual year. In this instance, Texcavator clearly demonstrates that the biological meaning of inheritance was dominant from the end of the 19th through the first half of the 20th century. However, the context in which this concept was debated did change considerably over time. The word cloud makes it plainly visible that articles containing the word ‘inheritance’ in 1867 predominantly focused on medical subjects. In 1935, however, the medical context of inheritance had almost completely been replaced by a legal and racial context.
Beyond word clouds – mining for statistical meaning
Digital technologies like these enable historians to switch between scopes swiftly and effortlessly. First, the heuristic process of digital history is a constant to and fro between distant reading and close reading – between the interpretation of data visualizations and the zooming in on potentially interesting texts. Second, digital techniques allow for agile movement between longue durée and short-term perspectives. Third, text-mining technologies can cope with searches for both concrete and abstract keywords. The Translantis project may serve here as an example. The project’s overall focus is the study of Dutch public debates on domains in which the United States served as a model in the 20th century, such as consumerism, the rise of managerialism, the medicalization of society, or, as the opening quotation illustrates, the introduction of mass media in the Netherlands. Within the managerial domain, companies like ‘Standard Oil’ can be a subject of study, but so can ‘American industry’ as such. Similarly, ‘jeans’ would make an interesting term to search for, as would ‘consumerism’ itself. In the same way, this study as a whole could be reflected on by using the aforementioned ‘meta’ search phrases like ‘typically American’, but also ‘the American way’ or, for example, any combination of ‘America’ and ‘modernity’.
At the same time, it is clear what the limitations of digital keyword searching are. The scope of the subject of research is always and necessarily limited by the keywords that are selected as comprehensively defining it (in this case, ‘the American influence in the Netherlands’ or ‘the US as a model for Dutch business and economy’). Also, the chosen keywords always direct both the heuristic process and the analysis in decisive ways. Topic modeling is a more elaborate way of distant reading that can be deployed to overcome these problems.
Topic modeling is a way to deduce the latent structure within a series of texts. A latent structure could entail a set of words that are often used in close proximity. This might indicate the existence of a recurring topic within texts. Topic modeling is based on the latent relation hypothesis, which states that “pairs of words that co-occur in similar patterns tend to have similar semantic relations”. This hypothesis is translated into a set of rules within a vector space model (VSM), the technique that computational linguists use for the semantic processing of texts. In terms of vectors, the hypothesis is that “word pairs have similar semantic relations, when they have similar row vectors in a pair-pattern matrix”. A growing number of software programs based on such algorithms, such as MALLET, are available to chart which words are used in close proximity within a large collection of texts. Their output comes in the form of strings of words. Both the number of strings and the number of words within a string can usually be chosen as one sees fit.
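The vector-space intuition can be made tangible with a deliberately simplified Python sketch. It does not build the pair-pattern matrix referred to above, nor is it the algorithm implemented in MALLET. Instead, it uses a plainer word-context variant: each word receives a row vector counting how often it shares a document with every other word, and rows are compared by cosine similarity. The four "documents" and all names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy documents: two on economics, two on television.
DOCS = [
    "amerika dollar markt macht",
    "amerika dollar handel markt",
    "televisie gezin jeugd huiswerk",
    "televisie jeugd gezin avond",
]


def cooccurrence_vectors(docs):
    """Row vector per word: how often it shares a document with each other word."""
    vectors = defaultdict(Counter)
    for doc in docs:
        words = set(re.findall(r"\w+", doc.lower()))
        for a, b in combinations(sorted(words), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse row vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(DOCS)
print(cosine(vectors["dollar"], vectors["markt"]))  # words with shared contexts
print(cosine(vectors["dollar"], vectors["jeugd"]))  # words with disjoint contexts
```

Words whose rows point in similar directions ("dollar" and "markt" here) are candidates for belonging to the same topic, while words with disjoint contexts score zero. Actual topic modeling software groups words with considerably more sophisticated statistical machinery, but the underlying reliance on co-occurrence is the same.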
Topic modeling is, thus, able to provide global themes within large collections of historical newspapers. The connection between the words within one ‘topic’ is based on the above-mentioned hypothesis of meaning production, instead of the ideologically, politically or otherwise inspired decisions of editorial boards. As topic modeling is usually based on large numbers of different newspapers and large volumes of these newspapers, it will to a large extent bypass the biases of single authors or editorial boards. Topic modeling will show words connected to the United States, regardless of whether different newspapers had contradictory attitudes towards the superpower across the Atlantic.
Obviously, herein also lie the limitations of topic modeling. Moreover, just like word clouds, automatically generated topics require interpretation. In the designation of a set of words as a topic, the analytical skills of the researcher come into play. For instance, the words “America”, “Powerful”, “Dollar”, and “Market” may indicate a topic concerned with the market power of the United States. The output is not the answer to a question, but a rhetorical object that needs to be read by the researcher. Topic modeling is employed in an exploratory manner. It is especially useful for the discovery of events, users, and objects related to specific topics within specific collections of texts. However, topic modeling can also help to discern discursive practices. We do this, for example, by combining topic models run on multiple subsets of documents, covering either different themes or different historical periods. This allows us to compare the combinations of words that determine a topic across themes or time periods. From this the researcher can deduce discursive shifts, which can then be further scrutinized through close reading.
A specific way to discern actors from a large collection of texts is through Named Entity Recognition (NER). Named Entity Recognition uses semantic rules and databases to extract entities from a set of texts. The principal actors that can be drawn from texts are persons, locations, and organizations. The possibility to automatically discern these entities from a text makes the analysis of the dynamics of public discourse much more practicable. Networks between persons, locations, and organizations can be constructed from the digitized archives.
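How a network can be derived from recognized entities is sketched below in Python. The sketch replaces real Named Entity Recognition with the crudest possible stand-in, a lookup in a small hand-made list of names (a gazetteer); the names, the entity types, and the three articles are all hypothetical. Entities that occur in the same article are connected, and the weight of a connection is the number of articles in which both appear.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer: a real NER system combines such lists with rules
# or statistical models and can also recognize names it has never seen.
GAZETTEER = {
    "Standard Oil": "organization",
    "New York": "location",
    "Rotterdam": "location",
    "Truman": "person",
}

# Hypothetical article texts.
ARTICLES = [
    "Standard Oil opent een kantoor in Rotterdam",
    "Truman sprak in New York over Standard Oil",
    "Standard Oil en de haven van Rotterdam",
]


def find_entities(text):
    """Return the gazetteer names that literally occur in the text."""
    return sorted(name for name in GAZETTEER if name in text)


def cooccurrence_network(articles):
    """Weighted edges between entities mentioned in the same article."""
    edges = Counter()
    for text in articles:
        for a, b in combinations(find_entities(text), 2):
            edges[(a, b)] += 1
    return edges


for (a, b), weight in cooccurrence_network(ARTICLES).most_common():
    print(f"{a} ({GAZETTEER[a]}) -- {b} ({GAZETTEER[b]}): {weight}")
```

Literal string lookup misses spelling variants and cannot tell a person from a place that bears the same name, which is precisely where the rules and databases of actual NER systems come in.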
The data acquired through Named Entity Recognition can be used to create maps. These maps can function as visual and dynamic representations of reference cultures within public discourse. Maps function not as representations of reality, but rather as means to elucidate how discursive practices produce subjects and objects. In other words, a map is a rhetorical device that is employed not to show what is true, but rather to assist the reader in thinking about how concepts can work.
Digital source criticism
The techniques mentioned here – generating word clouds and/or histograms, topic modeling, and named entity extraction – are but a few of the research tools that historians could successfully adopt from computational linguistics or information retrieval. Historians still have a lot of ground to cover, given that these disciplines work on a variety of techniques that far surpass the ones mentioned in terms of algorithmic sophistication. Still, even these word clouds and topic models have in common that they enable the researcher to build a narrative based on historical newspapers while seeing through the things that make them such complex sources for historical study.
At the same time, the use of digital techniques in historical study confronts the researcher with problems and considerations of its own. The limitations of keyword searching have already been stressed above, as has the need to interpret digitally generated results. This is not a trivial observation. Visualizations of any kind, be they word clouds, histograms or topic models, are rhetorical objects in their own right. Their “argumentative power” tends to steer the interpretation. The use of these techniques, therefore, requires a new form of sensitivity from the humanities scholar. He has to renew the critical stance he has been trained to adopt while using historical newspapers, towards both his object and his method of analysis (which, when using computational techniques, cannot always easily be separated).
The researcher, for instance, has to learn that producing and using software is about making interpretative choices. Every algorithm that processes the input data affects the outcome: the output data. Humanities scholars, therefore, have to learn to be very aware of each of those (pre)processing steps. Only when the scholar knows how to use the software can he decide whether or not to use stop word lists, for example, i.e. to ignore seemingly redundant words in the analysis. The same applies to stemming or lemmatization, techniques that reduce words to a common root form or lemma. Texcavator makes use of all of the preprocessing techniques named here.
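That preprocessing choices change the outcome can be shown with a small Python sketch that counts the words of one sentence under three settings. The stop word list and the suffix-stripping function are hypothetical and intentionally crude; they merely stand in for the curated lists and linguistic stemmers or lemmatizers that tools such as Texcavator rely on.

```python
import re
from collections import Counter

# Hypothetical stop word list; real lists contain hundreds of entries.
STOP_WORDS = {"the", "of", "and", "a", "in"}


def crude_stem(word):
    """Strip a few English suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def count_terms(text, remove_stop_words=False, stem=False):
    """Count word frequencies with optional preprocessing steps."""
    toks = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        toks = [t for t in toks if t not in STOP_WORDS]
    if stem:
        toks = [crude_stem(t) for t in toks]
    return Counter(toks)


TEXT = "The influence of the television and the televisions of the neighbours"

print(count_terms(TEXT).most_common(1))
print(count_terms(TEXT, remove_stop_words=True).most_common(4))
print(count_terms(TEXT, remove_stop_words=True, stem=True).most_common(1))
```

Without preprocessing the most frequent word is "the"; with stop words removed, four words tie at one occurrence each; only after stemming does "television" stand out, because singular and plural are merged. Each setting would produce a different word cloud from the same text, which is why these choices belong in the account of the method.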
Another essential consideration concerns the data used. The 80 million newspaper articles in the Dutch corpus sound like an immense heap, far more than anyone could ever read in a lifetime. However, these articles cover the whole of the last four centuries, whereas both the BILAND and the Translantis project focus only on, more or less, the last one hundred years. Moreover, as we have learned from the National Library, the digitized corpus contains only 8 percent of all historical newspapers in its collection. To use the corpus responsibly, one must first have an idea of what it contains in terms of the number and spread of newspapers, but also of the volumes of these newspapers (the WWII period is, for example, extremely overrepresented in the Dutch corpus, which makes analyses much more difficult to interpret). Second, it is just as important to know what is missing from the corpus. Is 8 percent of all available (not: published!) newspapers enough to count as a representative reflection of ‘the Dutch public debate’? Not if one knows that most nationwide newspapers are missing for the postwar period due to copyright restrictions.
Also, digital research adds some new methodological problems to comparative research between different language regions. Tools usually function only for a specific language, as a consequence of language-dependent linguistic preprocessing steps. The analytical consequences of preprocessing, which may vary between languages, complicate the comparability of the generated outcomes. Furthermore, the fact that digital analyses rely so heavily on linguistic elements also hampers comparison. How does one compare concepts that go by completely different names in different countries (as the example of eugenics shows, which is called ‘eugenetica’ in Dutch, but in Germany was predominantly called ‘Rassenhygiene’): does one compare words or the concepts they represent? In other words, the researcher needs to have a very clear view of what it is he aims to compare. Lastly, the comparability of different sets of digitized archives is problematic. The researcher probably has less control over the archives at hand than in traditional media research: he is very much dependent on what others judge necessary or practical to digitize. Within this position he has to use datasets of comparable sizes, scopes and mindsets to make his comparison sensible.
The overall lesson is to be very sensitive when using digital techniques in historical (newspaper) research. Our experience shows that scholars who are not used to working with software as part of their daily scholarly practice tend to see computational techniques as utterly objective or neutral. They have to get used to the idea that this is never the case.
What comes with this observation is the realization that digital methods are there to assist and not to replace the historian. He has to stay in charge of his research every step of the (computational) way. Just as importantly, he has to be aware that the strengths of digital methods like text mining lie in the heuristic process preceding the analysis. Digital history is a quantitative history of texts – in contrast to the forms of quantitative or structural history of numerical data that reached their height in the 1960s and 1970s. While statistical data enabled historians to make socio-economic analyses that gave a fundamentally new dimension to historical storytelling, quantitative textual data has the same potential for the analysis of cultural and intellectual history. Digital history has in common with ‘traditional’ quantitative history that the presentation of the data does not in itself suffice as proof. Software can never make up for the need for ‘old-fashioned’ historical analysis and narrative. Nonetheless, digital tools can revolutionize the exploratory search process. They are, through innovative and often surprising forms of visualization, able to present historians with a completely original view of texts and the other materials they are used to working with.
Quantification here is not necessarily used to fix one’s eye on possible correlations, but to trigger historians to look in every (unexpected) way possible for associations that seem worthwhile looking into. It is more like hermeneutics gone digital:
[T]he nature of data and the way it has been used by historians in the past differs in several important respects from contemporary uses of data. This is especially true in terms of the sheer quantity of data now available that can be gathered in a short time and thus guides humanistic inquiry. The process of guiding should be a greater part of our historical writing.
Gibbs and Owens stress that, as exploratory search tools, text mining and other digital techniques are highly promising means to iteratively “help with discovering or framing research questions” in the first place. To keep as open a view as possible, one consequently needs as little pre-processing as possible. Exploratory searching as a way of “iterative interaction with data” needs agility and technical straightforwardness.
The language of news, according to Van Ginneken, tends to reinstate ‘the commonplace views of certain issues, shared by (most members of) a society or culture’. News discourses are, in other words, expressions of collective mentalities in the Burkian sense of the ‘unspoken or unconscious assumptions’ that build the ‘categories, metaphors, and symbols’ with which people think. While ‘the meaning of a word to someone (or to an entire group) is never simple and straightforward; it is always complex and layered, ambiguous and contradictory, with certain elements placed in and others hidden from immediate sight’, historical newspapers form rich sources for anyone studying these mentalities. Whereas elaborate quantitative research in general generates knowledge by bringing the dominant aspects of data to the fore at the expense of the statistically or semantically irrelevant, exploratory searching with digital techniques has a natural tendency to account for these ambiguities and contradictions. After all, it broadens the heuristic scope significantly compared to the traditional top-down approach of historical newspaper study. The different forms of visualization also speed up the iterative process and stimulate serendipity. As a result, exploratory searching leads to the opposite of traditional quantitative historical research. It is able to make visible the odd stories, the counter-narratives that counterbalance the dominance of grand narratives. In the present research, it makes the process of transfer and appropriation of American concepts, ideas and practices in the Netherlands appear less straightforward and more diffuse. In this way, exploratory searching is a welcome and advanced corrective against the threat of essentialism and determinism.
 “Aan Televisie Verslaafde Jeugd,” Leeuwarder Courant, October 27, 1950.
 Jaap van Ginneken, Understanding Global News: A Critical Introduction (London: Sage, 1998), 144–165.
 Ibid., 144.
 See: www.delpher.nl.
 For an overview, see: J. Nyhan, A. Flinn, and A. Welsh, “Oral History and the Hidden Histories Project: Towards Histories of Computing in the Humanities,” Literary and Linguistic Computing, July 30, 2013.
 See, for example: Bob Nicholson, “The Digital Turn,” Media History 19, no. 1 (2013): 59–73; for an overview of what it means to do history in the digital age, see: Toni Weller, History in the Digital Age (London; New York: Routledge, 2013).
 On Big Data, see: Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston: Houghton Mifflin Harcourt, 2013).
 Digital humanities (DH) have become an umbrella term under which a wide variety of scholarly practices have been gathered, ranging from the (development of techniques for the) digitization of source material, the use of digital methods as analytical and visualization tools, and the experimentation with digital forms of scholarly communication, presentation and publication. Here, DH and digital history are used specifically as analytical methods.
 Historical newspaper archives from many different countries are available, for example. Europeana is just one organization that promotes the further digitization of source material Europe-wide. For Europeana’s WWI project, see: http://www.europeana-collections-1914-1918.eu/
 The Dutch National Library allows its collections of digitized newspapers, magazines, books and transcripts of radio news broadcasts to be searched through one interface: www.delpher.nl. The Netherlands Institute for Sound and Vision preserves one of the largest digitized audiovisual collections in Europe, containing the majority of Dutch public radio and TV broadcasts: www.beeldengeluid.nl/en. The Rijksmuseum has opened its digital collection to the public, making high-resolution images of 150,000 pieces of art from its collection publicly available: www.rijksmuseum.nl/en.
 Nicholson, “The Digital Turn.”
 Johanna Drucker, SpecLab: Digital Aesthetics and Projects in Speculative Computing (Chicago: University of Chicago Press, 2009), 9.
 Nicholson, “The Digital Turn,” 67.
 See, for example, the project Geography of the Post: http://cameronblevins.org/gotp.
 See, for example: Anand Rajaraman and Jeffrey D. Ullman, Mining of Massive Datasets (New York; Cambridge: Cambridge University Press, 2012).
 Martha Van den Hoven, Antal Van den Bosch, and Kalliopi Zervanou, “Beyond Reported History: Strikes That Never Happened,” in Proceedings of the First International AMICUS Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts, Vienna, Austria, 2010, 20–28.
 Patrick Juola, “Authorship Attribution,” Foundations and Trends in Information Retrieval 1, no. 3 (2006): 233–334.
 D. Sculley and Bradley M. Pasanek, “Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities,” Literary and Linguistic Computing 23, no. 4 (2008): 16.
 Hans Ulrich Gumbrecht, Production of Presence (Stanford, Calif.: Stanford University Press, 2004), 29–30, 47–49.
 Franco Moretti, Distant Reading (London; New York: Verso, 2013).
 Peter D. Turney and Patrick Pantel, “From Frequency to Meaning: Vector Space Models of Semantics,” Journal of Artificial Intelligence Research 37, no. 1 (2010): 153.
, Hardcover, Blackwell Companions to Literature and Culture (Oxford: Blackwell Publishing Professional, 2008), 4–5, http://www.digitalhumanities.org/companionDLS/.
 For a description of the possible ways to analyze NER see: Caroline Sporleder et al., “Identifying Named Entities in Text Databases from the Natural History Domain,” in Fifth International Conference on Language Resources and Evaluation (LREC-06), 2006, 1742–45; Seth van Hooland et al., “Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections,” Literary and Linguistic Computing, November 29, 2013.
 Drucker, SpecLab, 44.
 Bernhard Rieder and Theo Röhle, “Digital Methods: Five Challenges,” in Understanding Digital Humanities, ed. David M. Berry (Basingstoke; New York: Palgrave Macmillan, 2012), 73.
 Gerben Zaagsma elaborates on this point in: Gerben Zaagsma, “On Digital History,” BMGN-Low Countries Historical Review 128, no. 4 (2013): 3–29.
 See, for a critical stance toward digital historical analysis, also: Hinke Piersma and Kees Ribbens, “Digital Historical Research: Context, Concepts and the Need for Reflection,” BMGN-Low Countries Historical Review 128, no. 4 (2013): 78–102.
 Peter Haber, “Writing History by the Numbers. A New Historiographic Approach for the 21st Century?,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (Ann Arbor: University of Michigan Press, 2013).
 Although new in the humanities, ‘exploratory data analysis’ has a long tradition in the social sciences. See, for example: John W. Tukey, Exploratory Data Analysis (Reading, Mass.: Addison-Wesley Pub. Co., 1977); Frederick Hartwig and Brian E. Dearing, Exploratory Data Analysis (Beverly Hills: Sage Publications, 1979).
 Van Ginneken, Understanding Global News, 161.
 Peter Burke, Varieties of Cultural History (Ithaca, N.Y.: Cornell University Press, 1997), 162.
 Van Ginneken, Understanding Global News, 146.