the gospel story as told by Matthew:
Comparing two digital texts

MA English Literature student Emily Burke discusses findings from her work placement with the Cassirer Archive.

As part of my MA English Literature, I was offered the opportunity to complete a work placement module of 100 hours. I chose to work with the Cassirer Archive in the University of Sheffield’s School of English, as I hope to pursue a career in academia, and I felt that learning more about the processes surrounding archiving, cataloguing, and digitisation would be beneficial for the future. Due to the COVID-19 pandemic, this work placement was completed entirely from home: my supervisor, Dr. Iona Hine, and I met over Google Meet on a weekly basis to discuss progress and any issues that had arisen with the work.

When registering for the placement module, I had very little knowledge of either Ancient and Biblical Studies or methods of digital preservation, though both interested me. Along with tasks such as quick-cleaning data, and proofreading and adding emphasis to Cassirer’s Categories, one of my main responsibilities was to compare two different versions of Cassirer’s translation of The Gospel Story as Told by Matthew. Dr. Hine outlines the background of these two differing texts below:

Back at the start of the Cassirer project, I was handed a USB drive with a few digital assets. The contents included a couple of hundred lines from Cassirer’s Oedipus, a copy of the proposal underpinning the University of Sheffield’s work with the Cassirer materials, a list of Cassirer’s known works, and a folder named “God’s New Covenant”. The latter contained 12 documents, one of which could be identified as .wps (Microsoft Works) format, and four of which had in their file name “restored”. On inspection, it was apparent that these represented an existing digital copy of Cassirer’s New Testament translation in three parts, albeit in need of some attention. Even those marked “repaired” had visible formatting issues. This was least true of the file marked “GNC 1”.

It was a logical step to encode the digital edition of Cassirer’s New Testament (to be published on the legacy website) using these born-digital (word-processed) files as the foundation. However, after I had completed the encoding of Matthew, it became clear the digital text was not identical to that in the printed edition of God’s New Covenant (Eerdmans, 1989). Before encoding Mark, Luke and John, I scanned the print version, applied optical character recognition (an algorithm to identify text in an image), and ran comparisons to reveal the differences.

By the time I reached Acts, the problems of the born-digital edition (GNC 2) made it an inferior candidate for the digitisation base. I now began to work solely with the text digitised by optical character recognition. I had edited Mark, Luke and John to resemble the printed text, making notes on the patterns of difference observed.

The task I set for Emily was to identify the differences in Matthew’s Gospel, and then to revise the existing encoded text so that it (like the rest of the encoded New Testament) matched the print edition. As I recall, I rather deliberately withheld my hypothesis about the file origins, leaving room for her to form an independent hypothesis based on her observations.

The process of comparing the two texts yielded some intriguing differences that I (Emily) had not expected. One such variation was that the word-processed version used British English, shown through the spelling of words such as ‘recognise’, ‘paralysed’, and ‘realise’. The OCR/print version, however, used American English (‘recognize’, ‘paralyzed’, ‘realize’). It makes sense that the print version used such spelling: the publisher of God’s New Covenant was based in the USA.
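For readers curious how spelling differences of this kind could be surfaced automatically, here is a minimal sketch in Python using the standard library's `difflib` module. It is an illustration only and not the tool used during the placement. The function names `tokenize` and `word_differences` are hypothetical, and the sample sentences are invented for the example rather than quoted from Cassirer's translation.

```python
import difflib
import re


def tokenize(text):
    """Split text into words and punctuation marks, dropping whitespace."""
    return re.findall(r"\w+|[^\w\s]", text)


def word_differences(version_a, version_b):
    """Return (operation, words_in_a, words_in_b) for each stretch that differs."""
    a_tokens = tokenize(version_a)
    b_tokens = tokenize(version_b)
    # autojunk=False stops difflib from ignoring very frequent words in long texts.
    matcher = difflib.SequenceMatcher(None, a_tokens, b_tokens, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (
                    tag,
                    " ".join(a_tokens[a_start:a_end]),
                    " ".join(b_tokens[b_start:b_end]),
                )
            )
    return differences


# Invented sample sentences (assumption: NOT quotations from either version).
word_processed = "They did not recognise him, and the paralysed man began to realise it."
print_edition = "They did not recognize him, and the paralyzed man began to realize it."

for operation, old, new in word_differences(word_processed, print_edition):
    print(f"{operation}: {old!r} -> {new!r}")
```

Comparing word by word rather than line by line keeps each reported difference small, which makes patterns such as ‘-ise’ versus ‘-ize’ easier to spot. A list like this only locates the differences; deciding what they mean still depends on close reading.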

Close reading revealed other patterns of difference between the two versions of Matthew. For example, the word-processed text favoured the noun ‘man’ as a term to refer to all of humanity (‘the nature of the man’; Matthew 7:24). The print edition employs gender non-specific language (‘the nature of the person’).

Generally speaking, the terms ‘man’ and ‘mankind’ are used to encompass all people, and were widely used until a marked decline in the 1970s.

This chart shows the relative frequency of "mankind" in written texts through time, based on Google Books data (via Google NGrams).
Source: J-B Michel*, Y. Kui Shen, A. Presser Aiden, A. Veres, M. K. Gray, W. Brockman, The Google Books Team, J. P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, S. Pinker, M. A. Nowak, and E. Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (Published online ahead of print: 12/16/2010).

Ronald Weitzman, Cassirer’s editor, addresses this in his introduction to God’s New Covenant, writing:

‘As for the use of exclusive expressions, such as “man” and “men” to describe both members of the human race […] they cannot always be responsibly or intelligently expunged; and Cassirer resisted what he saw as an encroachment upon the English language and which fatally undermines its poetic force […] subsequently, the decision has been made to respect his wish in most instances, only permitting change, with caution and trepidation, where this could be made without tampering with interpretation or weakening the translation’s stylistic distinctiveness.’

(Weitzman, Introducing the Translation and its Translator, pp. xv-xvi)

Yet despite this defence, the changes are evident in the printed text. The fact that Weitzman discusses this particular editorial decision in his introduction indicates that the exclusive nature of the language bothered some readers, and that some terms were changed to suit a wider audience. This change also better reflects the Greek vocabulary, which was less gendered.

Regarding inclusivity, another key change occurred in Matthew 28:19. Here, the word-processed version has, ‘make people in any and every nation’. The print edition reads: ‘make people in every nation’. This change is subtle. It can easily be argued that it does not change the meaning of the sentence in any real way. However, I perceived the omission of ‘any and’ as more exclusionary, something somewhat at odds with the message of inclusivity that changing ‘man’ to ‘person’ conveys. When I discussed this point with Dr. Hine, she confirmed that there is only one word in the Greek text. Cassirer was deliberately expansive and used ‘any and every’ in each instance in order to be inclusive. My assumption regarding this difference is simply that the editors felt that using ‘every’ alone was encompassing enough, and removed the other two words for the sake of conciseness rather than exclusivity.

One final change that both Dr. Hine and I puzzled over came from Matthew 12:33. The word-processed version reads, ‘Make a tree a good tree and its fruit will be good; make a tree a rotten tree and its fruit will be rotten’. The print version has, ‘Suppose a tree is good, then its fruit will be good; suppose a tree has fallen into decay, then its fruit will be worthless’. Though both versions convey a similar message, the lexical choices differ in important ways. Most notably, compare ‘rotten’ in the former with ‘worthless’ in the latter. Of the other New Testament translations I looked at for comparison, most use a turn of phrase similar to the word-processed version. For example, the New International Version has, ‘Make a tree good and its fruit will be good, or make a tree bad and its fruit will be bad’.

Due to COVID-19 restrictions, we are unable to go to the physical archive and look at the typescripts to investigate this further. However, it is possible that both versions come from Cassirer himself, and he changed his interpretation of the phrase. Changing the verb from ‘make’ to ‘suppose’ seems to indicate the removal of blame from the person who has grown the tree—was this something that Cassirer considered?

My belief while completing this task was that the print edition pre-dated the word-processed text received by the archive, and that the latter attempted to “correct” perceived issues with the print edition such as the American-English spelling. Dr. Hine’s hypothesis was in fact the opposite of my own: she believed the word-processed text was an earlier draft of the print edition. Either one of us could be correct, as at present there is no definitive way of confirming which came first.

Dr. Hine reflects:

As Emily acknowledges, she is not a biblical scholar. Her mention of Greek usage is dependent on my input.

Alongside Weitzman’s introduction, there are other documents attesting to the discussions that took place as Cassirer’s translation was prepared for publication. An expert who had been asked to review the text by the publisher strongly objected to the exclusive language. The printed text takes this into account in two ways: defending Cassirer’s practice (Weitzman in the introduction, as quoted above); and amending it where there was a strong case to be made, generally because Cassirer was translating a term such as ἄνθρωπος (anthrópos, human) which lacks the uncompromising maleness of ἀνήρ (anér, man) or ἀδελφός (adelphos, brother). Emily logged six such changes, clustered in Matthew 7 & 13. That the scope of inclusive amendments is noticeable in just one gospel, and at what was (at earliest) a late stage in its preparation for print, suggests how strongly gendered Cassirer’s unedited version could have seemed to a reviewer.

Emily shows how simply comparing two English versions can reveal and raise questions about the impact and nuance of translation. It is fascinating to think how Cassirer’s actual inclusiveness (“any and every”)—or his desire to emphasise particular aspects of the Gospel message—may have been diminished by particular expectations about how the Bible ought to be translated.

It has been frustrating that we could not readily consult the various typescripts deposited at Sheffield. These record different stages in Cassirer’s work, with variant translations. This tendency to adapt and amend is visible in Cassirer’s Categories of St Paul, as highlighted in the #Cassirer40 Twitter threads. Perhaps the “make” v. “suppose” passages come from different iterations of Cassirer’s work. “Make” represents a more traditional English treatment. It is the default translation of the Greek (ποιήσατε). “Suppose” gives a different flavour to this speech.

Most of the evidence points to the printed text being a more deeply edited version when compared with the word-processed files. However, our uncertainty rests on the discovery that in the mid-1990s Weitzman corresponded with someone who intended to reprint Cassirer’s New Testament. Weitzman sent errata, and received .wps files by return. This information unsettles the authority of the printed text. (We know it contains errors.) Yet given the diversity of typewritten and digital options, it is far from clear what should best represent Cassirer’s own work.

Emily’s task was not purely theoretical: The OCR algorithm makes mistakes. Most are easily found and corrected during the encoding phase. That said, coding can itself obscure some kinds of errors. The comparison task found one or two further issues, which Emily was able to correct. All who consult Cassirer’s online bible will be grateful, as we are, for that careful work.

The header image for this post shows a portion of The Gospel Story as told by Matthew from one of the typescripts in the Sheffield Cassirer archive (CASSIRER 043; Matthew 3:3-7).
