One Republic of Learning / Digitizing the Humanities

The New York Times

In the Republic of Learning humanities scholars often see themselves as second-class citizens. Their plaintive cries are not without cause. When universities trim budgets it is often their departments that take the hit. In the last 10 years, however, there has been one bright spot: the “digital humanities,” a vast enterprise that aims to digitize our cultural heritage, put it online for all to see, and do so with a scholarly punctilio that Google does not.

The digital humanities have captured the imaginations of funders and university administrators. They are being built by a new breed of scholar able to both investigate Cicero’s use of the word “lascivium” and code in Python. If you want to read Cicero’s letter in which lascivium appears, or the lyrics of 140,000 Dutch folk songs, now you can. Texts are living things: Digitization transforms them from caterpillars into butterflies. But the true promise of digitization is not just better websites. Rather, it is the transformation of the humanities into science.

By “science” I mean using numbers to test hypotheses. Numbers are the signature of science; they allow us to describe patterns and relationships with a precision that words do not. The quantification of the humanities is driven by an inexorable logic: Digitization breeds numbers; numbers demand statistics. The new breed of digital humanists is mining and visualizing data with the same facility with which bioinformaticians analyze genomes and cosmologists classify galaxies. All of them could, if they cared to, understand each other’s results perfectly well.

Most traditional — analog — humanists, I suspect, delight in the new databases but do not fully grasp their consequences. One great literary critic did so years ago. “What,” asked Harold Bloom in 1973, “is Poetic Influence anyway? Can the study of it really be anything more than the wearisome industry of source-hunting, of allusion-counting, an industry that will soon touch apocalypse anyway when it passes from scholars to computers?”

Bloom’s apocalypse arrived in 2012 when a group of mathematicians analyzed the pattern of stylistic influences in more than 7,700 texts. Just the year before, Bloom published “The Anatomy of Influence,” his swan song. Less a work of rational criticism than a testament of personal aesthetic faith, its claims are immune to quantitative tests, or indeed tests of any kind. “I am an Epicurean literary critic, reliant upon sensations, perceptions, impressions,” he wrote. But scientists know that impressions lie; that they tell us what we want to hear, not what is.

It’s easy to see how it will go. A traditional, analog, scholar will make some claim about the origin, fate or significance of some word, image, trope or theme in some Great Work. He’ll support it with apt quotations, and fillet the canon for more of the same. His evidence will be the sort that natural scientists call “anecdotal” — but that won’t worry him since he’s not doing science.

But then a code-capable graduate student will download the texts — not just the canon, but a thousand more — run the algorithms, produce the graphs, estimate the p values, and show the claim to be false, if false it indeed is. There will be no rejoinder; the analog scholar won’t even know how to read the results. Quantification has triumphed in field after field of the natural and social sciences. It will here, too.

Science, however, is not just about measurement. Science offers theories — of a particular kind. The French poet Paul Valéry said that a “work of art becomes a machine intended to excite and combine the individual formations” of our minds. Yes, but how does the machine work?

A comparison with biology shows what’s missing. To explain organic diversity, biologists have built a theory of evolution whose major tenets are couched in math and generally agreed. To explain cultural diversity, the humanities have offered only a succession of incommensurable interpretive fashions and uncountable particular studies, many of which, to be sure, enrich our understanding of this writer or that, but which only add texture to the tapestry of culture and do nothing to explain its whole.

There is an explanatory vacuum. Some scholars think that it will be filled by something resembling the theory of organic evolution. I think they’re right. But it will also draw elements from epidemiology, cognitive psychology and behavioral economics. Whatever it looks like, we can be sure of one thing: It will be expressed not in words, but equations.

If the rudiments of a new cultural science are visible, so are its limits. There is one great difference between human and natural things: The former have meaning; the latter do not. That is why the humanities are filled with critics and the natural sciences are not: Critics tell us what artifacts mean. And that, it seems, is something an algorithm cannot do.

I say “seems” because deep-learning algorithms are becoming very good at extracting meaning from data; and, as art becomes data, it is always possible that new meanings may be revealed by algorithmic microscopes yet unbuilt. That said, it would take a very clever algorithm to flag up irony in Jane Austen. More fundamentally, the truth of art criticism is not the same kind as scientific truth.

Will there be a new kulturkampf — a great battle between quantification and interpretation? Or will the humanities, weakened by their own interminable, internecine Theory Wars, gratefully accept the peace imposed by science? Some will fight. Hard words such as “imperialism,” “scientism,” and “vaulting ambition” will be flung about — the vocabulary of anti-science is rich, well-honed, and all the more pungent for a little Shakespeare.

But most scholars, I believe, will simply accept quantitative tools for the power that they offer. Some will use them to survey vast bodies of literature; others to unravel the tiniest philological knots. Under the Pax Scientia criticism will continue, but be tamed. The epistemological feuds of the 20th century will yield to the technical quarrels typical of science. The scene will be less tumultuous, some will say less exciting, but it will be a renaissance.

Whether the new humanists will accept, or even understand, the rise of a mathematical theory of culture is another matter. It’s being built by biologists, economists and physicists and being published in the unforgiving terms and journals that such scientists read. I hope they do. After all, it seeks to explain the world of human-made things that they know and love. And it holds the promise — long sought, often heralded, never fulfilled — that the Republic of Learning, so divided for so long, will become one.

Armand Marie Leroi is a professor of evolutionary developmental biology at Imperial College, London, and the author, most recently, of “The Lagoon: How Aristotle Invented Science.”