10-02-2020, 08:38 AM
(This post was last modified: 10-02-2020, 08:40 AM by rich2005. Edit Reason: typo)
The ofn-remove-grid.py plugin works really well, and it answers the topic question about removing that table. Here is a test with a more typical scan, greyscale at 300 ppi:
1. Scan straight into Gimp. Not great, uneven colour, slightly skewed. A typical scan.
2. Apply levels to get even colour
3. A bit faint so apply threshold to get more contrast
4. Apply the plugin (grow selection = 1) and the dividing lines are gone. A few speckles remain.
5. Export that to a PNG and run it through Tesseract - the text is recognised
but
6. Run the original scan through Tesseract and get a better result. Tesseract ignores the lines.
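For anyone who wants to repeat the comparison in steps 5 and 6 from a script, here is a rough Python sketch. It assumes the tesseract command-line program is installed and on the PATH. The file names are made up for illustration, and scoring the OCR output against a hand-typed reference with difflib is my own choice of measure, not something built into Tesseract or YAGF. The --dpi value matches the 300 ppi scan.

```python
import difflib
import subprocess


def tesseract_cmd(image_path, lang="eng", dpi=300):
    """Build the argument list for one Tesseract run that prints text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--dpi", str(dpi)]


def run_ocr(image_path, lang="eng", dpi=300):
    """Run Tesseract on an image and return the recognised text."""
    result = subprocess.run(
        tesseract_cmd(image_path, lang, dpi),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout


def similarity(ocr_text, reference_text):
    """Score from 0 to 1 for how closely OCR output matches a reference text.

    Whitespace is normalised first so line breaks do not affect the score.
    """
    a = " ".join(ocr_text.split())
    b = " ".join(reference_text.split())
    return difflib.SequenceMatcher(None, a, b).ratio()


def compare(original_png, cleaned_png, reference_txt):
    """Return (score for the original scan, score for the cleaned export)."""
    with open(reference_txt, encoding="utf-8") as handle:
        reference = handle.read()
    return (
        similarity(run_ocr(original_png), reference),
        similarity(run_ocr(cleaned_png), reference),
    )
```

Calling compare("scan-original.png", "scan-cleaned.png", "reference.txt") with your own files (those names are placeholders) gives two scores, and the higher one is the closer match to the typed reference.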
I am gradually adding to my Kubuntu 20.04 desktop. I think I will give Tesseract 5 a go and see if YAGF works with that version. There are PPAs for Tesseract.