This workshop introduces the basics of optical character recognition (OCR), which makes a digitized document available for full-text searching and other kinds of text manipulation. Attendees will learn how to use Google Docs to create basic machine-readable text from an image file and will be introduced to Tesseract for OCR through exercises in Google Colab. This workshop is open to researchers interested in OCR for any language. It is strongly recommended that attendees 1) prepare a digitized, highly legible sample image file for trying out the tools, and 2) have a Google account so that they can complete the exercises and save their work.
Presenters: Dale J. Correa, Mercedes Morris, Natalya Stanke
This workshop will be a hybrid event.
PCL Scholars Lab Data Lab, or Zoom (registration: https://utexas.zoom.us/meeting/register/tJMkc-mrrDksGtGE9Nk6BaSPaYMQV7gYjuoZ#/registration)
Location: Scholars Lab, Perry-Castañeda Library, Virtual