Ancient Greek OCR

Yes, you can scan documents written in ancient Greek (or images of such documents) and then copy the text where you need it (like pasting into an editor where you can make an interlinear translation). It’s easy to set up on Linux; see below.

I found this on the single-page site Ancient Greek OCR.

Basically, the recommended solution is based on the renowned Tesseract OCR backend. Fedora 20 includes a package for ancient Greek, tesseract-langpack-grc.

I did not use the grc.traineddata file from the site, as I was pretty sure the one that shipped with the package would be up to date and probably more compatible with what I’d installed.
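For anyone who wants to confirm from a script that the packaged ancient Greek data is actually visible to Tesseract, here is a minimal Python sketch. It assumes the tesseract binary is on your PATH and supports the --list-langs option; the function names parse_langs and installed_langs are made up for this example.

```python
import subprocess


def parse_langs(listing):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def installed_langs():
    """Ask the local tesseract binary which language packs it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some tesseract versions print the listing on stderr, so check both.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

If "grc" shows up in the list returned by installed_langs(), the language pack from the package is in place and there should be no need to drop in a separate traineddata file.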

The page author recommends OCRFeeder for Linux as a graphical frontend, but since that isn’t available for Fedora 20, I installed gImageReader from the main repo instead (gImageReader is also recommended for Windows).

Once it was installed, I gave it a try using a high-resolution .tiff file from the source image library on the Textkit site. The results were very impressive, even when working with mixed English and ancient Greek, something that Google Translate, among others, has a lot of difficulty with. One word of warning to those running slower CPUs, like the i3 in my machine, without a high-end graphics card: the software will take quite a while to process a page.
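gImageReader is a graphical frontend, but the same mixed-language recognition can probably be scripted against the tesseract command line directly. The following Python sketch is one way that might look, not what I did above. It assumes both the grc and eng language data are installed, and the names build_command and ocr_page, along with the file name in the usage note, are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image, output_base, langs=("grc", "eng")):
    """Return the tesseract argument list for one image.

    Several languages are joined with "+", e.g. "grc+eng".
    """
    return ["tesseract", str(image), str(output_base), "-l", "+".join(langs)]


def ocr_page(image, langs=("grc", "eng")):
    """Run tesseract on one image and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "page"
        subprocess.run(build_command(image, output_base, langs), check=True)
        # tesseract appends ".txt" to the output base name.
        return (Path(tmp) / "page.txt").read_text(encoding="utf-8")
```

Calling ocr_page("page.tiff") would then return the recognized text as a string, ready to paste into an editor. Expect it to be just as slow per page as the graphical route on a modest CPU.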

This is a solution I can recommend to those of my liberal arts brothers and sisters who have occasion to work with ancient Greek text.

As Thales is reputed to have once said, τί εὔκολον; Τὸ ἄλλῳ ὑποτίθεσθαι.


About phil

My name is Phil Lembo. In my day job I’m an enterprise IT architect for a leading distribution and services company. The rest of my time I try to maintain a semi-normal family life in the suburbs of Raleigh, NC. E-mail me at philipATlembobrothersDOTcom. The opinions expressed here are entirely my own and not those of my employers, past, present or future (except where I quote others, who will need to accept responsibility for their own rants).