What Is an OCR Scanner?
Optical Character Recognition (OCR) is a data-entry technique in which an OCR scanner reads characters printed in a specific font and sends them to your computer. The American National Standards Institute (ANSI) defines this font as a character set made up of the digits 0 through 9, the letters A through Z, and a few special characters, each with a defined size and shape. OCR fonts can be reproduced consistently, and both people and OCR scanners can read and distinguish them.
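Because the character set is fixed, text meant for OCR can be checked against it before it is printed. The short Python sketch below does that. `unsupported_chars` is a hypothetical helper, and the special characters it allows are placeholders, since the ANSI special characters are not listed here.

```python
import string

# Digits 0-9 and letters A-Z, as listed in the text.
BASE_CHARS = set(string.digits + string.ascii_uppercase)
# Assumption: the text mentions "a few special characters" without naming
# them, so this small set is a hypothetical placeholder, not the ANSI list.
SPECIAL_CHARS = set(" -./")
OCR_CHARSET = BASE_CHARS | SPECIAL_CHARS


def unsupported_chars(text: str) -> set[str]:
    """Return the characters in `text` that are not in OCR_CHARSET."""
    return {ch for ch in text if ch not in OCR_CHARSET}


print(sorted(unsupported_chars("ACCT 12345-67")))  # []
print(sorted(unsupported_chars("Acct #12345")))    # ['#', 'c', 't']
```

Lowercase letters are flagged here only because the description above lists A through Z; a real font specification may differ.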
-
Categories
-
OCR scanners fall into two categories: Text Input and Data Capture. Text Input scanners read the entire document, or at least large portions of it. Documents can be fed by hand, or the scanner can feed, read, sort and stack them automatically. With a Text Input scanner, a person edits the recognized text either during or after scanning. Data Capture scanners capture and format data during the scanning process, and no human editing takes place. Because no one reviews the output, Data Capture scanners must be more accurate.
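Since no person reviews Data Capture output, the system itself has to decide whether each field is usable. The Python sketch below illustrates the idea with a hypothetical fixed-width layout; the `capture` function, the `Remittance` record and the field positions are all invented for illustration. Each recognized line is split into fields and checked, and a line that fails is rejected rather than sent to an editor.

```python
from dataclasses import dataclass
from typing import Optional

DIGITS = set("0123456789")


@dataclass
class Remittance:
    account: str
    amount_cents: int


def _all_digits(s: str) -> bool:
    # Only ASCII 0-9 count; str.isdigit() would also accept other Unicode digits.
    return bool(s) and set(s) <= DIGITS


def capture(line: str) -> Optional[Remittance]:
    """Format one recognized line into fields; return None to reject it.

    Hypothetical layout: characters 0-7 are an 8-digit account number,
    character 8 is a space, and characters 9-15 are a 7-digit amount in cents.
    """
    if len(line) != 16 or line[8] != " ":
        return None
    account, amount = line[:8], line[9:]
    if not (_all_digits(account) and _all_digits(amount)):
        return None
    return Remittance(account=account, amount_cents=int(amount))


print(capture("12345678 0004250"))  # Remittance(account='12345678', amount_cents=4250)
print(capture("1234S678 0004250"))  # None: an "S" was read where a digit belongs
```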
Types
-
Scanners can be stationary or hand-held. Stationary scanners, such as flatbed, sheet-fed and drum scanners, mainly use Text Input: they read and process document images and store them on your computer, where you can then edit or otherwise format the captured text. Hand-held scanners, such as digital pens or bar code scanners, use either Text Input or Data Capture to read and process data, which they then store either for later editing or "locked" to prevent editing.
-
Methods
-
Briefly, an OCR scanner takes a picture of the document. OCR software then locates the OCR-font characters in that picture and converts them to text using one of two methods: Matrix Matching or Feature Extraction. Matrix Matching is a form of pattern matching in which the software compares each character with a library of stored characters or character templates and picks the closest match. Feature Extraction does not rely on a predefined library; instead, it identifies characters by general features such as open areas, closed shapes, and intersecting lines. Feature Extraction is sometimes referred to as Intelligent Character Recognition, or ICR.
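The two methods can be contrasted with a toy Python sketch. It is not how any particular OCR product works: the 5x5 glyphs, the `TEMPLATES` library and the hole-count rule are invented for illustration, and real engines use much larger images and richer templates and features. `matrix_match` picks the template with the fewest mismatched pixels; `feature_match` ignores templates and classifies by a single feature, the number of closed shapes (enclosed background regions).

```python
from collections import deque

Glyph = tuple[str, ...]  # rows of "#" (ink) and "." (background)

# Hypothetical template library for Matrix Matching.
TEMPLATES: dict[str, Glyph] = {
    "O": (".###.", "#...#", "#...#", "#...#", ".###."),
    "B": ("####.", "#...#", "####.", "#...#", "####."),
    "L": ("#....", "#....", "#....", "#....", "#####"),
}


def matrix_match(glyph: Glyph) -> str:
    """Return the template label with the fewest mismatched pixels."""
    def mismatches(a: Glyph, b: Glyph) -> int:
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(TEMPLATES, key=lambda label: mismatches(glyph, TEMPLATES[label]))


def count_holes(glyph: Glyph) -> int:
    """Count background regions fully enclosed by ink (4-connected)."""
    rows, cols = len(glyph), len(glyph[0])
    seen: set[tuple[int, int]] = set()

    def flood(start: tuple[int, int]) -> None:
        queue = deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and glyph[nr][nc] == "." and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))

    # Background connected to the border is "outside", not a hole.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and glyph[r][c] == "." and (r, c) not in seen:
                flood((r, c))
    # Each remaining unvisited background region is an enclosed hole.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] == "." and (r, c) not in seen:
                flood((r, c))
                holes += 1
    return holes


# Toy rule set: one feature (hole count) separates these three letters.
HOLES_TO_LABEL = {0: "L", 1: "O", 2: "B"}


def feature_match(glyph: Glyph) -> str:
    return HOLES_TO_LABEL.get(count_holes(glyph), "?")


# An "O" with one stray ink pixel in the top-left corner.
noisy_o = ("####.", "#...#", "#...#", "#...#", ".###.")
print(matrix_match(noisy_o), feature_match(noisy_o))  # O O
```

The trade-off shows even at this scale: Matrix Matching needs a stored template for every character shape it may meet, while the feature rule works without templates but depends on the feature surviving noise; a single missing pixel in the loop of the "O" would open the hole and change its count.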
Benefits
-
The most significant benefit of using an OCR scanner is the elimination of human data-entry errors. OCR scanners can read at speeds of more than 200 characters per second. An OCR scanner's accuracy rate can reach 99.9975 percent, or one character misread in 40,000, compared with a human misread rate of one in 300 characters. Automatic check-digit validation can bring the OCR error rate down to fewer than one misread in 3,000,000 characters.
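These accuracy figures are error rates in another form: one misread in 40,000 characters is an error rate of 1/40,000, or 0.0025 percent, which leaves 99.9975 percent accuracy. The Python sketch below converts the rates and estimates misreads over a hypothetical batch of one million characters. The text does not say which check-digit scheme is used; the Luhn (mod 10) algorithm shown here is only one common scheme, chosen as an assumption. Luhn detects any single-digit misread, which is how a check digit lets a system flag errors instead of accepting them.

```python
def accuracy_percent(errors_per: int) -> float:
    """Accuracy when one character in `errors_per` is misread."""
    return 100 * (1 - 1 / errors_per)


def expected_errors(characters: int, errors_per: int) -> float:
    """Expected misreads in a batch of `characters` characters."""
    return characters / errors_per


def luhn_check_digit(payload: str) -> int:
    """Luhn (mod 10) check digit for a string of ASCII digits (assumed scheme)."""
    total = 0
    # Walk right to left; double every other digit, starting with the
    # rightmost payload digit, since the check digit is appended after it.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """True if the last digit matches the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


print(f"{accuracy_percent(40_000):.4f}")       # 99.9975
print(f"{accuracy_percent(300):.2f}")          # 99.67
print(expected_errors(1_000_000, 40_000))      # 25.0
print(round(expected_errors(1_000_000, 300)))  # 3333
print(luhn_valid("79927398713"))               # True
print(luhn_valid("19927398713"))               # False: first digit misread
```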
Considerations
-
Poor-quality originals result in less accurate OCR output. Handwritten documents, documents containing styled text, older documents, photocopies and most faxed documents do not work well with OCR scanners. Documents that generally work well include printed text in font sizes smaller than 72 points, text from laser and inkjet printers, faxes with a resolution of 200 dots per inch (dpi) or greater, and commercially printed materials such as books, brochures, and magazines.
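Resolution matters because it sets how many pixels each character gets. A typographic point is 1/72 inch, so a font's em height in pixels is roughly points × dpi / 72. The Python sketch below applies that conversion and wraps the two thresholds from this section in a pre-check; `looks_scannable` is a hypothetical helper, not a rule from any OCR product.

```python
POINTS_PER_INCH = 72  # typographic (DTP/PostScript) points per inch


def em_height_px(font_pt: float, dpi: int) -> float:
    """Approximate pixel height of a font's em square at a scan resolution.

    Individual letters are usually shorter than the full em.
    """
    return font_pt / POINTS_PER_INCH * dpi


def looks_scannable(font_pt: float, dpi: int, is_fax: bool = False) -> bool:
    """Hypothetical pre-check using only the thresholds stated in the text."""
    if font_pt >= 72:
        return False
    if is_fax and dpi < 200:
        return False
    return True


for dpi in (100, 200, 300):
    print(dpi, round(em_height_px(10, dpi), 1))  # 13.9, 27.8, 41.7
print(looks_scannable(10, 200, is_fax=True))     # True
print(looks_scannable(10, 100, is_fax=True))     # False
```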