12-14-2019, 10:26 AM
The problem with basic Tesseract is that it is command-line only. That is obviously the best way if you are OCR-ing a whole book. One problem is loss of formatting: you tend to get long lines of text with no breaks, no headings, etc.
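For the whole-book case, a rough sketch of driving the command line from a script, assuming the pages are already saved as separate image files and that the `tesseract` binary is on the PATH. The function names, the `*.png` pattern and the `eng` language default are just illustrative choices, not anything Tesseract requires.

```python
import subprocess
from pathlib import Path


def build_command(image, lang="eng"):
    """Return the tesseract command for one page image.

    Tesseract adds .txt to the output base itself, so page01.png
    ends up as page01.txt next to the image.
    """
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_folder(folder, pattern="*.png", lang="eng"):
    """OCR every matching image in the folder, in sorted (page) order."""
    for image in sorted(Path(folder).glob(pattern)):
        # check=True stops the run if tesseract reports an error on a page
        subprocess.run(build_command(image, lang), check=True)
```

Calling something like `ocr_folder("scans")` (a hypothetical folder name) would leave one text file per page, which still need the formatting put back by hand.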
I use it in Linux for small 'screen captured' text images using a GUI (I prefer YAGF, but it is not working in 'buntu 18.04, so gImageReader).
A screen capture always needs some pre-processing in Gimp: scaling up 200% - 300%, cleaning the background, etc.
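To show what those two steps amount to, here is a minimal sketch in plain Python on a tiny grid of grey values (0 = black, 255 = white). It is not what Gimp does internally: Gimp scales with smoother interpolation, whereas this just repeats pixels (nearest neighbour), and the threshold of 128 is an assumed value that would need adjusting per image.

```python
def scale_up(pixels, factor=3):
    """Enlarge a grid of grey values by repeating each pixel (nearest neighbour)."""
    scaled = []
    for row in pixels:
        # repeat every pixel 'factor' times across the row
        wide = [value for value in row for _ in range(factor)]
        # then repeat the widened row 'factor' times downwards
        scaled.extend(list(wide) for _ in range(factor))
    return scaled


def clean_background(pixels, threshold=128):
    """Push every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]
```

The idea is simply that bigger characters on a flat white background give the OCR engine less to get wrong.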
There is a Tesseract for Windows with GUI here: https://ocr.space/blog/p/free-ocr-windows.html
And a quick try-out in a Win10 VM: https://i.imgur.com/H7fvKCu.jpg. That is typical; some post-OCR corrections are needed. Still better than typing out the whole thing.