To: "William D. Garriott" <wdg@xxxxxxxxxxx>, FrameUsers List <Framers@xxxxxxxxxxxxxx>, Frame List <Framers@xxxxxxxxx>
Subject: Re: to Re: Conversion Utility Wanted (scanned text --> live text)
From: Jay Smith <jay@xxxxxxxxxxxx>
Date: Tue, 11 May 1999 18:03:10 -0400
Organization: Jay Smith & Associates
References: <007f01be9bf4$ee5ab4c0$08091681@buzz-lightyear.cwru.edu>
Sender: owner-framers@xxxxxxxxx
Bill,
You hit the nail on the head... And I have avoided doing OCR for three years
myself!
HOWEVER...
1) From what I hear from people who do use it extensively, the OCR software
has gotten a little better. Just the same, training the software is
essential. And if you are scanning (for OCR) from old hand-set letterpress
type, forget it -- type it yourself.
2) If the material being scanned is set in a small type size, one can sometimes
do a photocopy blowup of it first -- depending upon the page size of the
original and the scanner bed size.
Lastly...and I don't want to start a thread here, but does anybody remember
back when a version of OmniPage (OCR) would hard-crash a Win3.1 machine if the
text being scanned included the character strings "SS", "S.S.", or "S/S",
as in a ship named the "SS Victoria" or some such? I never got Omni to
admit that somebody was "never forgetting". Bless that person, but don't
crash my machine!
Jay
--
Jay Smith
e-mail: jay@jaysmith.com
The Press for History(tm), The Press for Education(tm),
The Press for [Your Industry](tm), The Press for....(tm)
On-demand printing and binding of hardbound books.
Minimum run one copy.
P.O. Box 650
Snow Camp, NC 27349 USA
Phone: Int+US+336-376-9991
Toll-Free Phone in US & Canada:
1-800-447-8267
Fax: Int+US+336-376-6750
William D. Garriott wrote:
>
> Jay's information matches my experience (which was so painful that I haven't
> attempted it for a few years). We worked with 300 dpi TIFF files, but what I
> was not prepared for was the TRAINING required for EACH typeface. You have
> to help some of these programs recognize the difference between a "cl" and a
> "d", an "lo" and a "b". The tighter the kerning, and the smaller the text,
> the harder it is for the software to distinguish between a single letter and
> a combo.
>
> Once our OCR software was "trained," it produced 90-some percent accuracy.
> However, the accuracy was based on the recognition of a "letter" (if it
> guessed a letter, it was accurate!), not a correctly spelled word! We soon
> discovered that with the vast array of typefaces in all their iterations
> (each printer may print a little differently on spacing, software can set
> lines of text tighter or looser...), there was no way we could pull it off
> more cheaply than retyping the document!
>
> Oh, the unfulfilled promises of technology...
>
> Best wishes,
>
> Bill
>
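Bill's point about per-letter versus per-word accuracy can be put into rough numbers. The sketch below is only an illustration, not how any particular OCR package scores itself: the function names are made up for this example, the first function assumes character errors are independent of one another, and the second assumes the OCR output has the same number of words as the original.

```python
from difflib import SequenceMatcher
from typing import Tuple


def expected_word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every letter of a word is right.

    Assumes each character is recognized independently (a simplification).
    """
    return char_accuracy ** word_length


def char_and_word_accuracy(reference: str, ocr_output: str) -> Tuple[float, float]:
    """Compare OCR output to the original text at letter level and word level.

    Letter accuracy: share of reference characters that difflib can align
    with the OCR output. Word accuracy: share of words that match exactly,
    compared position by position (assumes equal word counts).
    """
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched_chars = sum(block.size for block in matcher.get_matching_blocks())
    char_acc = matched_chars / len(reference)

    ref_words = reference.split()
    ocr_words = ocr_output.split()
    correct_words = sum(1 for r, o in zip(ref_words, ocr_words) if r == o)
    word_acc = correct_words / len(ref_words)
    return char_acc, word_acc


if __name__ == "__main__":
    # A "cl" misread as "d", as in the message above.
    chars, words = char_and_word_accuracy("the clock", "the dock")
    print(f"letter accuracy: {chars:.0%}, word accuracy: {words:.0%}")
    # Per-letter accuracy that sounds good still leaves many words wrong.
    print(f"95% per letter, 5-letter words: {expected_word_accuracy(0.95, 5):.0%}")
```

Under these assumptions, a per-letter figure in the nineties translates into a noticeably lower share of correctly spelled words, and every wrong word still has to be found and fixed by hand.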