Tesseract OCR engine

Optical Character Recognition (OCR) software converts images into text data. Installing Tesseract on Debian Wheezy is simple.

tesseract-ocr: command-line OCR tool

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google and is probably one of the most accurate open source OCR engines available. It can read a wide variety of image formats and convert them to text in over 40 languages.


$ sudo aptitude install tesseract-ocr
The following NEW packages will be installed:
liblept3{a} libtesseract3{a} tesseract-ocr tesseract-ocr-eng{a} tesseract-ocr-equ{a}
tesseract-ocr-osd{a}
0 packages upgraded, 6 newly installed, 0 to remove and 510 not upgraded.
Need to get 29.4 MB of archives. After unpacking 79.9 MB will be used.

Use ImageMagick to convert a PNG or JPEG file to uncompressed TIFF format:
$ convert sample2.png -auto-level -compress none sample2.tiff

Run tesseract to convert the image into text. The second argument is the output base name, so the result is written to sample2.txt:
$ tesseract sample2.tiff sample2
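
The two steps above can be wrapped in a small shell script. This is only a sketch and not part of the original workflow: the names ocr_base, ocr_image and OCR_LANG are made up for illustration, and it assumes that ImageMagick's convert and tesseract are both on the PATH.

```shell
#!/bin/sh
# Sketch: convert each image given on the command line to an uncompressed
# TIFF, then run Tesseract on it. All helper names here are hypothetical.

# Language passed to tesseract's -l option; defaults to English (eng).
OCR_LANG=${OCR_LANG:-eng}

# Print the file name without its last extension (sample2.png -> sample2).
ocr_base() {
    printf '%s\n' "${1%.*}"
}

# Convert one image to TIFF and OCR it; the text lands in <base>.txt.
ocr_image() {
    base=$(ocr_base "$1")
    convert "$1" -auto-level -compress none "$base.tiff" &&
        tesseract "$base.tiff" "$base" -l "$OCR_LANG"
}

# Process every file named on the command line; does nothing without arguments.
for img in "$@"; do
    ocr_image "$img" || echo "OCR failed for $img" >&2
done
```

Saved as, say, ocr.sh, it could be run as "sh ocr.sh sample2.png", which should leave sample2.tiff and sample2.txt next to the original image.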

[Image: ocr-sample2]

The resulting text, with recognition errors left exactly as Tesseract produced them:

1. Alinonrls

There are a lot of health benefits associated with almonds.
Almonds are very high in vitamin E and protein as well as
other nutrients such as maglesinm and pllospllorns. Almonds
contain anLi-cancer properties as well. Whether almonds are
best raw or pasteurized is still a soI|rce of heated debate.

The page http://www.win.tue.nl/~aeb/linux/ocr/tesseract.html provides a thorough explanation of how to use Tesseract on GNU/Linux.

About janpenguin

Email: janpenguin [at] riseup [dot] net. All content on this blog is made with Free and Open Source Software on GNU/Linux.