Blackest Pixel
#1
Hi

I'm writing a Python script to clean up a scanned text document (I'll have to clean up thousands of pages).
Is there a way to determine which pixel in the image is the darkest?
I want to use "pdb.gimp_image_select_color" with the darkest pixel's color (which is not always pure black, i.e. (0, 0, 0)).
#2
There is no direct way, as far as I can tell. You can make several calls to the GIMP histogram function, lowering the upper limit of the range until it no longer contains any pixels:

The call is:
Code:
mean, std_dev, median, pixels, count, percentile = pdb.gimp_drawable_histogram(drawable, channel, start_range, end_range)

The values you are interested in are count and/or percentile (as far as I can tell, count=pixels*percentile).

You use:

Code:
_,_,_,_, count,_ = pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, 0.,max)

and you try different max values. With a dichotomic (binary) search, eight calls are enough to resolve the 256 levels of an 8-bit image. Something like:

Code:
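def blackest(drawable):
    # Binary search for the darkest value: [0, top] always contains
    # at least one pixel, and [0, bot] contains none (once bot > 0)
    bot, top = 0., 1.
    while top - bot > 1. / 256:  # eight halvings: one 8-bit gray level
        print("%5.3f < x < %5.3f" % (bot, top))
        threshold = (top + bot) / 2.
        _, _, _, _, count, _ = pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, 0., threshold)
        print("%5.3f px @ %5.3f" % (count, threshold))
        if count:
            top = threshold  # pixels this dark exist: look lower
        else:
            bot = threshold  # nothing this dark: look higher
    # return top, not the last threshold, which may contain no pixels
    return top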
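To check the bisection logic outside GIMP, here is a minimal pure-Python sketch. It assumes pixel values already scaled to 0.0 to 1.0 and replaces the GIMP call with a hypothetical range_count() helper that plays the role of the "count" result; with a 1/256 tolerance it makes exactly eight queries.

Code:
```python
# Outside GIMP: simulate the histogram query on a plain list of values.
# range_count() is a hypothetical stand-in for the "count" returned by
# pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, start, end).


def range_count(values, start, end):
    """Number of values v with start <= v <= end (values scaled to 0.0..1.0)."""
    return sum(1 for v in values if start <= v <= end)


def blackest(values, tolerance=1.0 / 256):
    """Bisect for the darkest value using only range-count queries.

    Invariant: [0, top] always contains at least one pixel and
    [0, bot] contains none (except at the start, when bot is 0.0).
    Returns (top, calls): an upper bound within `tolerance` of the
    darkest value, and the number of histogram queries made.
    """
    bot, top = 0.0, 1.0
    calls = 0
    while top - bot > tolerance:
        threshold = (top + bot) / 2.0
        calls += 1
        if range_count(values, 0.0, threshold):
            top = threshold  # some pixels are at least this dark
        else:
            bot = threshold  # nothing this dark: the darkest is higher
    return top, calls


if __name__ == "__main__":
    # 8-bit gray levels scaled to 0..1; the darkest is 37/255 (about 0.145)
    page = [v / 255.0 for v in (255, 250, 180, 37, 60, 240)]
    top, calls = blackest(page)
    print("darkest <= %.4f after %d calls" % (top, calls))
    # prints: darkest <= 0.1484 after 8 calls
```

The returned bound is at most one 8-bit level above the true darkest value, which is why returning the last "top" rather than the last probe matters.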


However, a color selection on the result may select only a single pixel and may not give you the result you want. What is the whole process?
#3
Hi Ofnuts, thanks for taking the time to respond, and for moving this to the right board.

We are scanning very old documents that were typed on a typewriter. We scan them at 600 dpi in grayscale. The task now is to clean up the scans and reduce the file size as much as possible (300 dpi, black and white) so that we can share these documents. Some documents are in very bad shape, with a lot of noise.

Pointing me to the histogram solved my problem. Based on the percentile, I look for the most populated range in the darker part of the histogram and set all pixels up to that level to black.


Code:
def SetBlack(image, drawable):
    # Find the most populated bin in the darker half of the histogram:
    # on a typed page this is the "black" of the text
    MaxPercentile = 0.0
    MaxEndRange = 0.0
    Increment = 0.025  # bin width: 2.5% of the value range

    # 21 bins of width 0.025 cover values 0.0 to 0.525
    for x in range(21):
        start_range = x * Increment
        end_range = start_range + Increment
        mean, std_dev, median, pixels, count, percentile = pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, start_range, end_range)
        if percentile > MaxPercentile:
            MaxPercentile = percentile
            MaxEndRange = end_range
        #pdb.gimp_message(str(percentile))

    MaxEndRange = MaxEndRange * 1.2  # add a 20% margin
    #pdb.gimp_message(str(MaxPercentile))
    #pdb.gimp_message(str(MaxEndRange))
    # Curve through (0, 0), (MaxEndRange, 0) and (0.9, 1): values up to
    # MaxEndRange become (roughly) black, 0.9 maps to white
    pdb.gimp_drawable_curves_spline(drawable, HISTOGRAM_VALUE, 6, (0.0, 0.0, MaxEndRange, 0.0, 0.9, 1.0))


Apologies for my coding style; I'm new to GIMP and Python.
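For experimenting with the bin width or the 20% margin without GIMP, the SetBlack() logic can be sketched in plain Python on a list of gray values scaled to 0.0 to 1.0. The function names are hypothetical, the inclusive bin edges are an assumption about the GIMP call, and apply_curve() is a piecewise-linear stand-in for GIMP's spline, so mid-tones will not match gimp_drawable_curves_spline exactly.

Code:
```python
def dark_peak(values, increment=0.025, bins=21):
    """Upper edge of the most populated bin among the `bins` darkest bins.

    Mirrors SetBlack(): bins of width `increment` start at 0.0, and the
    first bin with the largest share of pixels wins (strict '>').
    Both bin edges are inclusive here (an assumption about the GIMP call).
    """
    if not values:
        return 0.0
    total = float(len(values))
    best_share, best_end = 0.0, 0.0
    for i in range(bins):
        start = i * increment
        end = start + increment
        share = sum(1 for v in values if start <= v <= end) / total
        if share > best_share:
            best_share, best_end = share, end
    return best_end


def apply_curve(v, black, white=0.9):
    """Piecewise-linear stand-in for the spline (0,0), (black,0), (white,1)."""
    if v <= black:
        return 0.0
    if v >= white:
        return 1.0
    return (v - black) / (white - black)


def set_black(values, margin=1.2):
    """Map every value up to the dark peak (plus a 20% margin) to black."""
    black = dark_peak(values) * margin
    return [apply_curve(v, black) for v in values]


if __name__ == "__main__":
    # ten ink pixels at 0.11, three noise pixels at 0.31, fifty paper pixels at 0.95
    page = [0.11] * 10 + [0.31] * 3 + [0.95] * 50
    print(dark_peak(page))      # about 0.125: the bin holding the ink
    print(set_black(page)[10])  # a noise pixel, lifted toward white
```

Because the peak is searched only among the darker bins, the paper (by far the most common value) never wins, which is what lets this simple argmax find the ink.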
#4
It looks like you are reinventing the automatic contrast stretch; try either of these:

Code:
pdb.gimp_drawable_levels_stretch(drawable)
pdb.plug_in_autostretch_hsv(image, drawable)
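What such a stretch does can be sketched in a few lines of plain Python. This is a simplified min/max version on gray values scaled to 0.0 to 1.0, with a hypothetical function name; GIMP's own implementations may differ in details such as per-channel handling or ignoring a few outlier pixels.

Code:
```python
def stretch(values):
    """Linear contrast stretch: map [min(values), max(values)] onto [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # flat image: nothing to stretch
        return list(values)
    scale = 1.0 / (hi - lo)
    return [(v - lo) * scale for v in values]
```

For example, stretch([0.2, 0.5, 0.8]) gives approximately [0.0, 0.5, 1.0]: the darkest ink becomes pure black and the lightest paper pure white, with no threshold to choose by hand.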

Also, if you batch-process, ImageMagick is likely a better tool than GIMP.
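As a starting point for such a batch, here is a small Python 3 sketch that runs ImageMagick 7's magick command (ImageMagick 6 calls it convert) on every TIFF in a folder. The folder layout, the *.tif pattern and the option values (40% deskew, 50% threshold, Group 4 compression) are assumptions to tune on your own scans; -resample relies on each file recording its 600 dpi resolution.

Code:
```python
import subprocess
from pathlib import Path


def magick_command(src, dst, deskew="40%", threshold="50%", dpi=300):
    """Build (but do not run) an ImageMagick 7 command for one scanned page."""
    return [
        "magick", str(src),
        "-deskew", deskew,        # straighten slightly rotated pages
        "-resample", str(dpi),    # 600 dpi -> 300 dpi (needs density stored in the file)
        "-threshold", threshold,  # grayscale -> pure black and white
        "-compress", "Group4",    # CCITT Group 4: compact bilevel TIFF
        str(dst),
    ]


def batch(src_dir, dst_dir, pattern="*.tif"):
    """Convert every matching file in src_dir, writing results to dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob(pattern)):
        subprocess.run(magick_command(src, out / src.name), check=True)
```

Keeping the command builder separate from batch() makes it easy to print and test the exact command on one page before running it on thousands.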
#5
Thanks, I'll look into these. I don't want to reinvent anything.
Yes, I thought ImageMagick might be the way to go when I came across it while looking for an automatic deskew routine.
Now for a few more sleepless nights, playing with a new program.