History of the Reuther dictionary

This is an outline of the historical background to Reuther’s dictionary; for more details see Austin 2023 (in References).

The original Reuther Diari-German manuscript materials are held by the South Australian Museum as a set of bound notebooks (catalogued as AA266-09). Here is a sample from Volume IV, showing the handwriting and style of the documents.

In 1974, the Australian Institute of Aboriginal Studies (AIAS, now Australian Institute of Aboriginal and Torres Strait Islander Studies, AIATSIS) provided funding for Pastor Philipp Scherer, the first archivist of the Lutheran Church of Australia, to translate the whole of Reuther’s manuscript into English. The resulting typed document includes the Dictionary in Volumes I to IV, amounting to 2,180 pages. Here is a sample (page 1885) from Scherer’s 1974 translation of the dictionary, showing part of Reuther’s Volume IV, pages 80-81 (compare the picture above).

In 1981 AIAS published a microfiche of the whole translation. This is difficult to use because specialist equipment is needed to read it, and it can only be searched by going through it page by page. In 1989 David Nash and Jane Simpson, working at AIATSIS on the National Lexicography Project, scanned the Scherer translation of the Dictionary using a Kurzweil Discover 7320 Model 30 scanner and optical character reader and created 44 plain text digital files. Simpson proof-read the scanned files and corrected many obvious errors; however, many mistakes in the Diyari remained (e.g. ] or J for j, nq for ng, ~ for uninterpreted characters), along with inconsistent representations of white space. The resulting files added to the value of the Scherer translation but were still not ideal.

From 1991 to 2000 Peter Austin partially edited the proof-read scanned text files in Microsoft Word to correct more errors, especially in the Diyari words, and to regularise the formatting.

In 2014-2015, with funding support from the Dieri Aboriginal Corporation, David Nathan processed the Word files to further clean them up and remove inconsistencies. He produced a combined XML-marked-up plain text version, with tags encoding information content, such as <gloss>…</gloss>, plus a Cascading Style Sheet (CSS) which specifies how the file displays (e.g. in a web browser). Paragraphs that are indented in the translation were tagged as <tabp>…</tabp>.
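To illustrate how content tags of this kind make the text machine-readable, here is a minimal sketch in Python. Only the tag names <gloss> and <tabp> come from the description above; the <entry> and <headword> elements, the sample fragment, and the helper collect_by_tag are assumptions made for illustration and are not taken from the actual dictionary file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <gloss> and <tabp> are the tags named in the text;
# <entry>, <headword> and all of the content are invented placeholders.
SAMPLE = """
<entry>
  <headword>kumari tapana</headword>
  <gloss>placeholder English gloss</gloss>
  <tabp>placeholder indented paragraph, first</tabp>
  <tabp>placeholder indented paragraph, second</tabp>
</entry>
"""


def collect_by_tag(xml_text, tag):
    """Return the stripped text of every element with the given tag name."""
    root = ET.fromstring(xml_text.strip())
    return [(el.text or "").strip() for el in root.iter(tag)]


glosses = collect_by_tag(SAMPLE, "gloss")
indented = collect_by_tag(SAMPLE, "tabp")

print(glosses)
print(len(indented), "indented paragraphs")
```

Because each kind of content sits inside its own tag, a stylesheet can display it distinctively and a script can count or extract it without guessing from layout.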

From 2020 to 2023, with assistance from Edward Garrett, Peter Austin edited the XML file, correcting numerous errors in the English translations and the Diyari, and adding tens of thousands of content tags, including coding and classifying 13,158 notes (from Nathan’s <tabp>), 3,879 examples, and 1,766 footnotes added by the translator.1 Here is an example of what the marked-up XML file looks like (beginning with the sub-entry [9] “kumari tapana” seen above):

Garrett and Austin refined the CSS to create a specialist edition of the Dictionary, which is published here. Garrett then converted the XML to JavaScript Object Notation (JSON) and indexed the JSON dictionary using the open-source in-memory Redis database. Redis therefore serves as the back-end for the user-friendly edition of Reuther’s dictionary.
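The general shape of an XML-to-JSON conversion followed by key-based indexing can be sketched as below. This is not the project's actual pipeline: the element names <dictionary>, <entry> and <headword>, the record layout, and the functions entry_to_dict, build_index and lookup are all hypothetical, and a plain Python dict stands in for the Redis key-value store so that the example runs without a database server.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical input; the structure and glosses are placeholders.
SAMPLE = """<dictionary>
  <entry><headword>tapana</headword><gloss>placeholder gloss A</gloss></entry>
  <entry><headword>kumari tapana</headword><gloss>placeholder gloss B</gloss></entry>
</dictionary>"""


def entry_to_dict(entry):
    """Turn one <entry> element into a JSON-serialisable record."""
    return {
        "headword": (entry.findtext("headword") or "").strip(),
        "glosses": [(g.text or "").strip() for g in entry.findall("gloss")],
    }


def build_index(xml_text):
    """Map each headword to a JSON string, as a key-value store would hold it."""
    root = ET.fromstring(xml_text)
    index = {}
    for entry in root.iter("entry"):
        record = entry_to_dict(entry)
        index[record["headword"]] = json.dumps(record, ensure_ascii=False)
    return index


def lookup(index, headword):
    """Fetch and decode the record stored under a headword, or None if absent."""
    raw = index.get(headword)
    return json.loads(raw) if raw is not None else None


index = build_index(SAMPLE)
result = lookup(index, "kumari tapana")
print(result)
```

Storing each entry as a JSON string under its headword means a search needs only a single key lookup rather than a scan of the whole XML file, which is the kind of access an in-memory store is suited to.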

Here is a screenshot of the online specialist edition that shows Scherer’s page 1885 (Reuther’s Vol IV, pages 80-81) seen in the image above. This shows the colours, labels, and layout that have been added to the translation via the process of editing and XML tagging:

The following screenshot shows the results of searching for “kumari tapana” in the user-friendly online version:

Clicking on the underlined link to “tapana” takes the user to an entry which has the following format for the page shown above:2

Footnotes:

  1. Technically, the outcome is a semi-diplomatic edition because it “seeks to reproduce only some of [the] features of the original” (see Wikipedia). We have also adjusted the English translation in various systematic ways, discussed here.
  2. The user-friendly edition suppresses some details such as the footnotes and page numbers, which are less useful to general readers.