And don't forget to convert most line breaks into spaces, so that you have free-flowing text for the EPUB. If the ultimate goal is an EPUB, it is better to use another OCR program, one that does not keep the line breaks as Acrobat does.

Newer versions of Acrobat Pro join lines into paragraphs and in general try to preserve the layout of the original document (as far as I remember, Acrobat 11 Pro does). So usually no additional conversion of line breaks into spaces is necessary (apart from fixing typical OCR errors). On the other hand, I agree that there are better OCR programs than Acrobat. Personally I prefer Tesseract (current 4.00 beta version) plus the gImageReader frontend. Admittedly, all text formatting is lost (it saves plain text), but it usually takes less time to recreate the formatting from scratch than to correct the formatting produced by Acrobat. gImageReader lets you select a rectangular area of the image to be OCR-ed (the selection can be applied to multiple pages at once), which is very useful for getting rid of headers and footers (e.g. page numbers, which are useless in reflowable text). I guess that ABBYY FineReader or OmniPage may be more convenient tools, but I have only a little experience with them.

Thanks for the clarification on Acrobat joining lines into paragraphs, but Acrobat would have to do guesswork. I got FineReader 6 "Sprint" as an add-on to an Epson flatbed scanner, later upgraded to version 11, and still later bought version 12 with a "Black Friday" discount. I hope that this year's "Black Friday" will bring another offer which I can't refuse. Besides that, I am toying with the idea of buying a page quota for recognition of Fraktur (Gothic) script, once I have a project for it. FineReader keeps character formatting such as italics or bold with the output form "Free form text", and also has an option to keep line breaks. Besides this output form, it can produce PDF or DjVu files with the same layout as the original, with or without the image of the original document. I have never tried DjVu or EPUB, and I don't know what FB2 is.

Well, I thank you for all of your advice. It appears to be a hell of a lot of work to create an EPUB, because the text basically needs to be rebuilt, or retyped into MS Word or some other word processor, and then converted to an EPUB. I'm not a Linux user. I had Ubuntu years ago, but I really didn't like constantly hacking my computer to make it work.
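For anyone who ends up with plain OCR text that still has hard line breaks, the "convert most line breaks into spaces" step can be scripted. What follows is only a rough sketch, not how any of the programs mentioned in this thread do it. The function name `unwrap_lines` is made up for illustration, and it assumes that a blank line marks a real paragraph break and that a lowercase letter plus hyphen at the end of a line is a word split across lines.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped OCR lines into free-flowing paragraphs.

    Assumption: a blank line separates paragraphs; every other
    line break is just a wrap inherited from the printed page.
    """
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    # Split on one or more blank (or whitespace-only) lines.
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        joined = lines[0]
        for ln in lines[1:]:
            # Heuristic: "jum-" followed by "ped" is one word split by
            # the typesetter, so drop the hyphen and join directly.
            if re.search(r"[a-z]-$", joined) and ln[0].islower():
                joined = joined[:-1] + ln
            else:
                joined += " " + ln
        paragraphs.append(joined)

    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "The quick brown fox jum-\nped over the\nlazy dog.\n\nSecond para-\ngraph here.\n"
    print(unwrap_lines(sample))
```

The hyphen rule is guesswork: a genuine compound such as "well-known" that happens to break at the hyphen will be merged into one word, so the result still needs proofreading.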