Manuscript OCR

OCR training data for Caroline Minuscule, for the Tesseract OCR engine. It is developed by Rescribe and is based on the work described in the paper Modelling Medieval Hands: Practical OCR for Caroline Minuscule. We also produce OCR training data for early printed Latin and Ancient Greek.

The training data is still in active development. We are keen to improve it, so any feedback is very welcome. It is released as free software under the Apache License 2.0.

Downloads

Caroline Minuscule OCR v1.0 (for Tesseract v4.x) (2020-05-06)
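
Once downloaded, the training data can be passed to Tesseract with its standard --tessdata-dir and -l options. The following is a minimal sketch in Python, not an official Rescribe tool. The language name "carolineminuscule" and the helper names build_tesseract_command and run_ocr are assumptions for illustration; the language name must match the downloaded file name without its .traineddata extension.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, tessdata_dir, lang="carolineminuscule"):
    """Return the argument list for a Tesseract run using custom training data.

    `lang` is an assumed name: it must equal the downloaded file name
    without the .traineddata extension.
    """
    return [
        "tesseract",
        str(image),         # input page image
        str(output_base),   # Tesseract appends .txt to this base name
        "--tessdata-dir", str(tessdata_dir),  # directory holding the .traineddata file
        "-l", lang,
    ]


def run_ocr(image, output_base, tessdata_dir, lang="carolineminuscule"):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image, output_base, tessdata_dir, lang)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(f"{output_base}.txt")
```

For example, run_ocr("page.png", "page", "tessdata") would read a hypothetical page.png, look for the training data in a hypothetical tessdata directory, and write the recognised text to page.txt.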

Contact

For comments, bugs, criticisms, code, help, or anything else, contact the folks at Rescribe: info@rescribe.xyz.