ofn-remove-grid
#1
ofn-remove-grid removes table borders from scanned text. It is found here.

It was spawned from this discussion.

As always, comments, suggestions and bug reports welcome.

Enjoy.
#2
Using the script I'm not getting good results for the image presented on this page:

What's the way to remove all lines and borders in image (keep texts) programmatically?
https://stackoverflow.com/questions/3394...ogrammatic

I tried values for the area threshold from 6500 to 9500, but most of the page is deleted.

PS: When I manually execute the steps indicated in post #4 (https://www.gimp-forum.net/Thread-Erase-...8#pid20168), the result is great!

[Image: G1uQx.png]
#3
(10-02-2020, 04:39 PM)Krikor Wrote: Using the script I'm not getting good results for the image presented on this page:

What's the way to remove all lines and borders in image (keep texts) programmatically?
https://stackoverflow.com/questions/3394...ogrammatic

I tried values for the area threshold from 6500 to 9500, but most of the page is deleted.

PS: When I manually execute the steps indicated in post #4 (https://www.gimp-forum.net/Thread-Erase-...8#pid20168), the result is great!

[Image: G1uQx.png]

Quoting the script doc:

Quote:Assumptions and algorithm

The script assumes that the area of grid squares is larger than the area of any visually isolated bits of text, that are normally individual characters. So it won’t work well if the image includes a very large text and a tiny table.

... which is pretty much the case with the image you show. That said, eyeballing the smallest grid area (which appears to be the one at (1890,1020)), making a selection just inside it (which gives an area threshold of around 5400px), and then running the script removes this:

   

Which isn't too bad given the circumstances.
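
To make the assumption quoted from the doc concrete, here is a minimal plain-Python sketch of an area threshold. It is an illustration only, not the script's code: the `split_by_area` name, the `(name, area)` pairs and the sample areas are all made up, and whether the script compares with `>=` or `>` is not stated in this thread.

```python
def split_by_area(shapes, threshold):
    """Split (name, area) pairs into (grid, text) using an area threshold.

    Assumption: anything whose area reaches the threshold is treated as a
    grid cell, everything smaller is treated as text.
    """
    grid = [name for name, area in shapes if area >= threshold]
    text = [name for name, area in shapes if area < threshold]
    return grid, text


# Made-up areas in square pixels, for illustration only.
shapes = [("cell", 6000), ("letter", 400), ("large heading glyph", 7000)]
grid, text = split_by_area(shapes, 5400)
print(grid)  # ['cell', 'large heading glyph']
print(text)  # ['letter']
```

The large glyph ends up on the grid side, which is the failure mode the doc warns about when the image mixes very large text with a small table.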

The one difference between ofn-remove-grid and ofn-path-filter-strokes is that ofn-remove-grid takes a shortcut to compute areas: it actually computes the area of the bounding box, whereas ofn-path-filter-strokes computes the area of the polygon (but with a formula that is only valid for convex polygons, so not for typical letters...).
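
A small sketch of how far the two area measures can diverge on a letter-like outline. The function names are hypothetical, and the shoelace formula used for the polygon is a stand-in chosen for the example: the thread does not say which formula ofn-path-filter-strokes actually uses.

```python
def bounding_box_area(points):
    """Area of the axis-aligned bounding box around a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def shoelace_area(points):
    """Area of a simple polygon given its vertices in order (shoelace formula).

    Assumption: used here for illustration only, not taken from either script.
    """
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# An L-shaped outline, roughly like a letter "L", vertices in order.
l_shape = [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)]
print(bounding_box_area(l_shape))  # 100
print(shoelace_area(l_shape))      # 36.0
```

For a shape like this the bounding box is almost three times the outline's area, so a threshold tuned for one measure will not carry over to the other.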
#4
Uploaded a new version that can also filter on width and height, for those cases where there are only horizontal or vertical dividers.
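
Why width and height help can be shown with a short sketch. This is not the script's code: `is_divider`, `min_width` and `min_height` are hypothetical names, and the exact meaning of the script's width and height options is not described in this thread.

```python
def is_divider(width, height, min_width=None, min_height=None):
    """Return True if a bounding box reaches either size limit.

    Assumption: a limit left at None is simply not checked.
    """
    if min_width is not None and width >= min_width:
        return True
    if min_height is not None and height >= min_height:
        return True
    return False


# A horizontal rule 800 px wide and 3 px tall covers only 2400 square
# pixels, so an area threshold of a few thousand would miss it.
print(is_divider(800, 3, min_width=500))  # True
# A 12 x 20 px character is well under the width limit.
print(is_divider(12, 20, min_width=500))  # False
```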
#5
Ofnuts,

In the post https://www.gimp-forum.net/Thread-Erase-...3#pid20173 you say the following:

"Made a last minor improvement: if there is a selection, its area is used as the area threshold, in other words, I automated the hint above, no need to explicitly use the histogram and copy values."

I'm not sure if I'm misunderstanding, but if I make a selection and leave it active when I call the plugin, the plugin still uses the value contained in the threshold field.

If I'm not mistaken, the default value of the threshold if not changed is 1000. And I believe that is the value used by the plugin and not the value of the selected area.

If I use the histogram and type the value found in it in the plugin's threshold field, everything works perfectly, but if I try to just leave the selection active without typing the value, the plugin just uses the value that is already typed in that field.
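
For reference, the behaviour described in the quoted sentence could be sketched as below. This is only a guess at the intent, not the plugin's code: `effective_threshold` and its parameters are hypothetical, and the selection is assumed to be rectangular so that its area equals the area of its bounds.

```python
def effective_threshold(field_value, selection_bounds=None):
    """Pick the area threshold: the selection's area if there is one,
    otherwise the value typed in the threshold field.

    selection_bounds is (x1, y1, x2, y2) or None when nothing is selected.
    """
    if selection_bounds is None:
        return field_value
    x1, y1, x2, y2 = selection_bounds
    return (x2 - x1) * (y2 - y1)


print(effective_threshold(1000))                   # 1000
print(effective_threshold(1000, (0, 0, 90, 60)))   # 5400
```

What is observed instead matches the first call in both cases: the field value is used whether or not a selection is active.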

I'm using version v0.2 from 2020-10-23.
                               .....
Samj PortableGimp 2.10.28 - Win-10 /64.

