       Readiris Pro:
      Turn Scanned Documents Into Editable Text
       by Kaye Coller

When you scan a page of text, the scanner takes a picture of it - it becomes a graphic. You can't put a graphic into a word processor and edit it the way you can text. OCR (optical character recognition) programs are designed to recognize the words in a graphic and turn them into text you can edit in your word processor. The problem has always been that things like faint printing, different font styles, fly specks, and the like can hinder accurate conversion into text, so the user has to go through the document carefully to make sure it's legible and that there are no weird characters scattered around.

Readiris Pro 7, developed by I.R.I.S, a Belgian software company, promises "Superior OCR accuracy at up to 1,300 characters per second." It claims to be 40% more accurate than version 6. Other features include improved ability to detect and read tables, plus improved handling of color documents, auto format technology, multi-page documents/batch OCR, and more. New features include the ability to save documents as Adobe Acrobat PDF files, digital camera support, and page orientation detection. It also recognizes up to 93 languages.

The program requires at least a 486-based Intel PC or compatible, 16 MB of RAM (32 MB recommended), 55 to 70 MB of disk space, and the Windows XP, ME, 2000, 98, 95, or NT operating system. It's also a good idea to have MS Word, which I don't. More about that later. List price is $140.39, but it's available at http://www.irislink.com/ for $99.99.

Installation is fairly simple; just follow the instructions in the installation program. After installation, you need to install "The Connect Capability" to configure your scanner and word processor to work with Readiris. The word processor was where I ran into trouble. Several versions of MS Word are supported, but I use WordPerfect 8. Some earlier versions of WordPerfect can be configured, though. I emailed I.R.I.S support explaining my problem. They claim they'll get back to you within 48 hours. I emailed them at 2:49 pm and they replied at 4:55 pm - and this was on a Friday. Good service! Unfortunately, there was no simple solution. I'll explain what they told me a few paragraphs from now.

To test the program, I used a page from our newsletter, Interface. It had 2 columns, some text in boxes, both serif and sans serif text, and graphics. The page was clean, with good contrast. Readiris claims to be effective with poor quality scans, but I wanted to give it a reasonable test. The good news first: Readiris does a great job of creating a PDF file. This type of file can be read on any computer as long as Acrobat Reader, a free program, is installed. It copied the page exactly. The graphics weren't quite as sharp as the original, but still good. Of course, you can't edit this document in your word processor.

Next, I saved it as an HTML file, the format used to create a web page. The program did a nice job here too. I found 2 minor mistakes in punctuation and one more serious mistake where a period was omitted from a web address. However, it took a lot less time to scan the page, use Readiris to create the HTML, and correct those mistakes than it would have taken to type it and put in the code myself. I plan to use the program to convert the LCCUG bylaws to text. Once they're posted as text, they'll be searchable and also take less time to load.
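For readers who like to tinker, a mistake like the missing period in a web address is the sort of thing a short script can help spot before a page is posted. The following is only a rough Python sketch and has nothing to do with how Readiris itself works. The function name `suspicious_addresses` and its two rules of thumb (a host needs at least one period after any leading "www", and "www" should not run straight into the site name) are assumptions made up for this illustration.

```python
import re

def suspicious_addresses(text):
    """Return address-like tokens that look like they lost a period.

    Hypothetical helper for proofreading OCR output; the rules are
    simple heuristics, not a full URL validator.
    """
    flagged = []
    # Pick up anything that starts like a web address.
    for token in re.findall(r"(?:https?://|\bwww)\S+", text):
        # Keep only the host part and drop trailing sentence punctuation.
        host = re.sub(r"^https?://", "", token).split("/")[0].rstrip(".,;:)")
        labels = host.split(".")
        if labels[0] == "www":
            labels = labels[1:]
        # Rule 1: a host needs at least two parts, e.g. "irislink" and "com".
        # Rule 2: "wwwirislink" suggests the period after "www" was dropped.
        if len(labels) < 2 or (labels[0].startswith("www") and len(labels[0]) > 3):
            flagged.append(token)
    return flagged

sample = "See http://wwwirislink.com/ and http://www.irislink.com/ for details."
flagged = suspicious_addresses(sample)
```

A check like this only narrows down where to look; the flagged addresses still need to be compared against the original page by eye.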

If I had MS Word, Readiris would send the converted scan directly to that program. Since I don't, Support told me to set the output format so that the converted text is sent to an external file. I found my best choices were either WordPerfect or plain RTF (Rich Text Format, which can be read by most word processors). When saving in plain RTF, I could choose "recreate source document", which is supposed to retain all formatting and include graphics. The program goes through a process of "recognizing" the text, during which it puts questionable letters or symbols on the screen. If there's a mistake, you make the correction.

After saving the file to plain Rich Text Format, I loaded it into WordPad, which wouldn't load the graphics. It didn't always reproduce the same font style either, turning sans serif into serif, and vice versa, sometimes on the same line. Boldface also didn't show up, and a couple of lines were spaced really close together for some reason. Plus, some words ran together. There were a few other minor errors, but nothing time consuming to correct. One thing I found: if I set my scanner to gray scale, I got more errors. Black and white worked much better.
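To picture the difference between the two scanner settings, it can help to see what a black and white scan amounts to: every gray pixel is forced to either pure black or pure white. The Python sketch below is only an illustration of that idea. It is not how any particular scanner or Readiris does it, and the function name `to_black_and_white` and the cutoff value of 128 are assumptions chosen for the example.

```python
def to_black_and_white(gray_rows, threshold=128):
    """Map each grayscale pixel (0 = black, 255 = white) to 0 or 255.

    Hypothetical helper: pixels at or above the threshold become white,
    everything darker becomes black.
    """
    return [
        [255 if pixel >= threshold else 0 for pixel in row]
        for row in gray_rows
    ]

# A tiny made-up "scan": light paper with a few dark ink pixels.
page = [
    [250, 240, 60, 245],
    [235, 40, 30, 130],
]
bw = to_black_and_white(page)
```

With a clean, high-contrast page, the light gray paper turns white and the ink turns black, which leaves fewer in-between shades for an OCR program to misread.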

Choosing WordPerfect doesn't allow the option of recreating the source document, but it does copy the graphics. Although it didn't put the text into columns, altogether it did a better job than WordPad. There were still minor mistakes, but no problems with line spacing. It's easy to format columns in WordPerfect, so with only a little work, I could get the page looking almost like the original. And with a little more work, I could put the sections of text that had been in boxes in the original into boxes in WordPerfect.

I don't know if I would have gotten better results with the supported versions of MS Word. By the time I demonstrate the program at the May 28 meeting, I will have had a chance to work with Word on the club's computer and will let you know then if that works better. Even with the problems I had, I found the program useful for my purposes.

