
SCOTIA SYSTEMS BLOG




Free Windows OCR Recommendation – FreeOCR V3

April 15th, 2010 admin

Scanned PDF files are becoming increasingly common as a replacement for faxes. But how many times have you tried to select the text in a PDF, only to find you can’t because it’s stored as an image?

Well, I’ve just searched Google for a solution, knowing that they also use OCR (Optical Character Recognition) to index PDF files.

It turns out that Google are developing an open source OCR platform based on an engine called Tesseract, which was developed by HP Labs between 1985 and 1995.

FreeOCR is a free Windows GUI which uses the Tesseract engine – and I have to say it works pretty well!
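If you'd rather script the same engine than click through a GUI, here is a minimal Python sketch that calls the Tesseract command-line tool directly. It assumes the `tesseract` executable is installed and on your PATH, and that you have already exported the scanned page to an image file, since Tesseract reads images rather than PDFs. The function names and the default output name `ocr_output` are my own for illustration; they are not part of FreeOCR or Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <language>."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_image(image_path, output_base="ocr_output", language="eng"):
    """Run Tesseract on one image file and return the recognised text.

    Assumes the tesseract executable is installed; raises RuntimeError if not.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, language)
    subprocess.run(command, check=True)
    # Tesseract appends ".txt" to the output base name it is given.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


# Example call (hypothetical file name):
# text = ocr_image("scanned_page.png")
```

Recognition quality will depend heavily on the scan resolution, so treat this as a starting point rather than a finished tool.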






An experiment in Search Engine OCR – Part 2

February 23rd, 2010 admin

OK, so the results are in for the first part of this experiment – now for the second part.

This time, I’m submitting a PDF file to Google to see what gets indexed and how quickly.

Here’s the PDF File