Erase / remove borders from a table (scanned pdf)
#8
The ofn-remove-grid.py plugin works really well and answers the topic question about removing the borders from that table. Here it is tried on a more typical scan: greyscale, 300 ppi.

   

1. Scan straight into Gimp. Not great: uneven colour, slightly skewed. A typical scan.
2. Apply levels to even out the colour.
3. The text is a bit faint, so apply threshold to get more contrast.
4. Apply the plugin (grow selection = 1) and the dividing lines are gone. A few speckles remain.
5. Export that to a png and run it through Tesseract - the text is recognised
but
6. Run the original scan through Tesseract and get a better result. Tesseract ignores the lines.
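
For anyone wanting to script the comparison in steps 5 and 6 rather than click through it, here is a rough Python sketch. It is not what I did in Gimp: apply_levels and apply_threshold are my own simplified stand-ins for the Levels and Threshold tools, working on a plain list of grey values (0-255), and the black/white/cutoff numbers and file names are made-up examples. The only real external piece is the tesseract command line (image, then "stdout" as the output base, then -l for the language).

```python
import shutil
import subprocess


def apply_levels(pixels, black, white):
    """Linearly stretch grey values so `black` maps to 0 and `white` to 255.

    Simplified stand-in for a Levels adjustment; values outside the
    range are clamped. `pixels` is a list of rows of ints in 0..255.
    """
    span = max(white - black, 1)  # avoid division by zero
    return [
        [min(255, max(0, round((p - black) * 255 / span))) for p in row]
        for row in pixels
    ]


def apply_threshold(pixels, cutoff=128):
    """Turn every pixel pure black or pure white around `cutoff`."""
    return [[255 if p >= cutoff else 0 for p in row] for row in pixels]


def tesseract_command(image_path, lang="eng"):
    """Build the command that prints recognised text to standard output."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file names: the untouched scan and the cleaned export.
    for name in ("original-scan.png", "cleaned-scan.png"):
        print(f"--- {name} ---")
        print(run_ocr(name))
```

Running it prints the text Tesseract finds in each file one after the other, so the two results can be compared side by side.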

I am gradually adding to my Kubuntu 20.04 desktop. I think I will give Tesseract 5 a go and see if YAGF works with that version. There are PPAs for Tesseract.


Messages In This Thread
RE: Erase / remove borders from a table (scanned pdf) - by rich2005 - 10-02-2020, 08:38 AM
