Friday 5 January 2018

Transcribing a blot

One of the tasks in documenting artifacts as part of the project is transcribing labels on the bottles of materia medica in the pharmacy.

Mostly this is fairly straightforward - the labels are on the whole beautifully stencilled in india ink on good quality paper, and so while they may be a little yellowed they're perfectly legible. It's the early twentieth century ones that are more of a problem - cheaper paper, and sloppily written in faded fountain pen ink.

To be sure they have their peculiarities - the extensive use of Æ in nineteenth century pharmaceutical Latin and outdated abbreviations like TṚ for tincture - but it's all fairly straightforward.

Until a couple of days ago, when I came across the following


where the label had been corrected, presumably at a later date - if you look carefully you can see what appears to be an extra L which has been blotted out in a different, thinner ink.

This of course raises a number of questions about transcribing the label - should I transcribe the label as it was meant to be read, or include the blot, or transcribe it as the original text and note that the first L had been blotted out at (presumably) a later date?

I decided to go for the middle route and transcribe the label as you would read it today, blot and all.

While I knew about the Text Encoding Initiative and the Leiden Epigraphy conventions, which I'm using to indicate missing or illegible characters, I didn't know about blots.

My first thought was to simply insert a Unicode blot symbol, except there isn't one - as a stopgap until I could spend more time with Google I decided to use the Cyrillic Zhe (Ж) as


  • there was no Cyrillic text involved in the pharmacy anywhere
  • it sort of looked like the H^HZ^HN sequence we used to use in WordStar days to generate a cursor symbol on daisywheel printers when doing documentation
  • having learned to read and write Russian I could write it with a degree of fluidity

I guess I could have used the Unicode block character ( █ ) but as I also keep a longhand paper workbook in parallel with the transcription spreadsheet Ж seemed a better choice.

I started off by searching for things like 'epigraphy blot' without much success - well I guess stone inscriptions don't have blots, although they do have erasures, so I don't think it was that silly a search. 

Changing the search terms to something like 'TEI transcription blot' was more useful and produced a lot of information on how to represent blots in XML as well as important questions such as whether it was a correction by the author or a correction at a later date and differentiating between the two, as well as what to do if you weren't sure.

The only problem was that all this information was for creating XML markup, whereas I was transcribing the labels to an Excel spreadsheet using Unicode, and I needed a standard pre-XML way of doing this that was going to be intelligible to someone else.

In the end I found the answer in the EpiDoc documentation maintained by Stoa.org. Under 'erased and lost' it not only documented the TEI XML but also referenced previous pre-XML paper technology conventions, in this case [[[...]]], which was ideal.
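As a rough illustration of tidying up afterwards, here is a minimal Python sketch that swaps the stopgap Ж for the [[[...]]] notation once the cells have been read out of the spreadsheet. The function names and the sample label text are made up for the example, and the XML in the second function is only my guess at a TEI-style equivalent (the attribute values are an assumption, not copied from the EpiDoc pages), so treat it as a sketch rather than a recipe.

```python
import re
from xml.sax.saxutils import escape

STOPGAP = "\u0416"          # Cyrillic capital Zhe, the stopgap blot marker
ERASED_LOST = "[[[...]]]"   # pre-XML notation for text erased and lost


def to_leiden(text: str) -> str:
    """Replace each run of stopgap markers with one [[[...]]] marker."""
    return re.sub(f"{STOPGAP}+", ERASED_LOST, text)


def to_tei(text: str) -> str:
    """Render stopgap markers as a TEI-style deletion containing a gap.

    Assumption: <del rend="erasure"> wrapping <gap reason="lost" .../> is a
    reasonable encoding for a blotted-out letter; check the EpiDoc
    guidelines before relying on these attribute values.
    """
    def blot(match: re.Match) -> str:
        count = len(match.group(0))  # one marker per blotted character
        return (
            '<del rend="erasure">'
            f'<gap reason="lost" quantity="{count}" unit="character"/>'
            "</del>"
        )

    # Escape &, < and > first so the label text is safe inside XML.
    return re.sub(f"{STOPGAP}+", blot, escape(text))


if __name__ == "__main__":
    label = "BEL\u0416ADONNA"  # hypothetical label text, not the real one
    print(to_leiden(label))
    print(to_tei(label))
```

Keeping the stopgap as a single unusual character makes this sort of search and replace trivial, which is another small argument for picking something that appears nowhere else in the transcriptions.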

This little journey has raised a whole lot of questions, including whether we should be using TEI XML encoding for the labels.

The short answer is probably not. Unicode in Excel plus some standard notation is more than adequate in 99.9% of cases, and the whole majestic edifice that is TEI seems like complete overkill. But this little diversion certainly shows the importance of discussing and agreeing on transcription standards before starting on something as seemingly straightforward as a sequence of nineteenth century materia medica labels ...


