Cleaning scanned magazine pages
#1
I'm totally inexperienced with GIMP, but I chanced upon this recipe that would meet my needs for a rather big project I'm currently doing. It's about cleaning scanned magazine pages where the print on the back side shines through the thin paper, making reading less enjoyable.

GIMP Recipe:
  1. Open a file.
  2. Convert your document to grayscale: Image - Mode - Grayscale.
  3. Select the background color: Select - By Color, click with mouse pointer on the color of the background.
  4. Invert the selected color: Select - Invert.
  5. Copy the selection: Edit - Copy.
  6. Create a new file: File - New.
  7. In the dialog of a new file, in field: Advanced Options choose: Fill with: White, hit Ok.
  8. Click anywhere in the window of the new opened document, just to choose it.
  9. Paste the content of a clipboard: Edit - Paste.
  10. Add a new layer to enhance the black text: Layer - New from Visible, in the layer's palette, in field: Mode: choose Multiply.
  11. Combine two layers: Layer - Merge Down.
  12. Save the result as a JPEG file: File - Export As, choose jpeg and set the quality to at least 60.
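
For readers who prefer to see the recipe as arithmetic, here is a minimal sketch in plain Python of what the steps appear to do to a grayscale page. It assumes the image is a list of rows of 0-255 values, that "Select by Color" picks every pixel within a threshold of the clicked value, and that Multiply blends as a*b/255. The function names, the threshold of 40 and the sample pixels are illustrative assumptions, not GIMP's API. Copying the inverted selection onto a white canvas is modelled as simply turning the selected background white.

```python
# Sketch only: a grayscale image is a list of rows of 0-255 integers.
# Function names, threshold and sample data are illustrative assumptions.

def select_by_color(image, seed_value, threshold):
    """Mask that is True where a pixel is within `threshold` of the clicked value."""
    return [[abs(p - seed_value) <= threshold for p in row] for row in image]

def clear_to_white(image, mask):
    """Replace every selected (background) pixel with pure white."""
    return [[255 if m else p for p, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]

def multiply_with_self(image):
    """Multiply blend of a layer with its own copy: p * p / 255 darkens the text."""
    return [[(p * p) // 255 for p in row] for row in image]

def clean_page(image, background_value, threshold):
    mask = select_by_color(image, background_value, threshold)
    return multiply_with_self(clear_to_white(image, mask))

page = [
    [230, 200, 230],  # paper, show-through, paper
    [40, 230, 90],    # dark text, paper, lighter text
]
cleaned = clean_page(page, background_value=230, threshold=40)
print(cleaned)  # [[255, 255, 255], [6, 255, 31]]
```

White stays white under Multiply (255 * 255 / 255 = 255), while the text values drop from 40 to 6 and from 90 to 31, which is why the extra layer makes the print look blacker.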

I have a couple of questions.
  •  When I get to "Combine two layers: Layer - Merge Down." the Merge Down option is grayed out, so I can't apply that step.
What could be the reason? And why does it say "Combine two layers" when there are three?

1. Floated selection (Pasted Layer)
2. Visible
3. Background
  •  When selecting a background color to be - erased/turned to white - what would be the best way to pick a wider range of grayness than the rather sparse instruction the recipe suggests "By Color, click with mouse pointer on the color of the background."
Reply
#2
(08-31-2017, 05:19 AM) MrNiceButDim wrote:
1. Floated selection (Pasted Layer)
2. Visible
3. Background

A Floating Selection appears after pasting. It is an interim step and many options are not available until the step is completed.

There are two options:
1. Right click on the Floating Selection in the layers dialogue and select To New Layer
2. Click the Anchor button in the layers dialogue. The Floating Selection will be merged into the active layer.

For now, select the To New Layer option. Then you can do what you want with the new layer.
Reply
#3
Replace from step 9:

9) Paste the content of a clipboard: Edit - Paste.
10) Layer>Anchor Layer (Ctrl-H)
11) Layer>Duplicate layer (Ctrl-Shift-D), then set the duplicate's mode to Multiply
12) Export (no need to merge, this is done when you export).

But IMHO for strictly the same results (assuming you start from JPEG):

1) Select background
2) Edit>Clear [Del] (this replaces the background with pure white)
3) Possibly duplicate/multiply
4) Export

However, both techniques rely on the hypothesis that there exists a threshold level for "Select by color" at which the whole background (white and see-through characters) is selected, without selecting anything in the good text. They can be improved by reducing the threshold and, after an initial simple click, shift-clicking in several places to add the missing bits. But there can be easier solutions, so just post an image (or a representative extract) so that we can work on real data.
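
To make the shift-click idea concrete, here is a small sketch in plain Python. It assumes that each click selects all pixels within the threshold of the clicked value and that shift-clicking adds to the existing selection. The pixel values, the seeds and the threshold of 20 are made up for illustration.

```python
# Sketch only: pixels are a flat list of 0-255 gray values.
# Values, seeds and threshold are illustrative assumptions.

def select_by_color(values, seed, threshold):
    """Indices of all pixels within `threshold` of the clicked value."""
    return {i for i, v in enumerate(values) if abs(v - seed) <= threshold}

def select_with_shift_clicks(values, seeds, threshold):
    """Union of the selections from the first click and every shift-click."""
    selected = set()
    for seed in seeds:
        selected |= select_by_color(values, seed, threshold)
    return selected

pixels = [235, 228, 190, 180, 60, 95]  # paper, paper, see-through x2, text x2
text_indices = {4, 5}

one_click = select_by_color(pixels, seed=235, threshold=20)
several = select_with_shift_clicks(pixels, seeds=[235, 185], threshold=20)

print(sorted(one_click))       # [0, 1]
print(sorted(several))         # [0, 1, 2, 3]
print(several & text_indices)  # set()
```

With a low threshold the single click only catches the paper, but a second click on a see-through character adds those pixels while staying well away from the text values.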
Reply
#4
Another way:

GMIC has (in the Repair section) a filter called Repair Scanned Document. That may work for you.


Reply
#5
Quote:When selecting a background color to be - erased/turned to white - what would be the best way to pick a wider range of grayness than the rather sparse instruction the recipe suggests "By Color, click with mouse pointer on the color of the background."


A quick scan as an example.

You can adjust the Select by Color threshold to make the selection more or less selective. The default value is typically 15, and clicking on an 'empty' space in the image background gives something like this. The top text is selected, but so is a lot of the 'reverse'.

screenshot http://i.imgur.com/Kg1JJtT.jpg

Then it becomes a matter of increasing the threshold until you get a satisfactory result. This one is a compromise: a few speckles remain from the reverse, and a light section of the top text is missing.

http://i.imgur.com/krOkNab.jpg


and the result http://i.imgur.com/TShnZff.jpg
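
The compromise described above can be sketched as a threshold sweep in plain Python. This assumes a handful of hand-labelled sample pixels (the labels, values and thresholds are invented for illustration) and counts, for each threshold, how many 'reverse' pixels survive and how many text pixels get wiped out with the background.

```python
# Sketch only: sample values and labels are illustrative assumptions.

def sweep(pixels, labels, seed, thresholds):
    """For each threshold return (threshold, reverse pixels left, text pixels lost)."""
    rows = []
    for t in thresholds:
        selected = [abs(v - seed) <= t for v in pixels]
        reverse_left = sum(1 for s, lab in zip(selected, labels)
                           if lab == "reverse" and not s)
        text_lost = sum(1 for s, lab in zip(selected, labels)
                        if lab == "text" and s)
        rows.append((t, reverse_left, text_lost))
    return rows

pixels = [240, 232, 205, 185, 170, 180, 70, 40]
labels = ["paper", "paper", "reverse", "reverse", "reverse",
          "text", "text", "text"]

rows = sweep(pixels, labels, seed=240, thresholds=[15, 40, 60, 80])
for row in rows:
    print(row)
# (15, 3, 0)
# (40, 2, 0)
# (60, 1, 1)
# (80, 0, 1)
```

In this made-up sample one light text pixel (180) is brighter than the darkest see-through pixel (170), so no single threshold removes all of the reverse without also losing some text. That is the situation where quick mask touch-ups or extra clicks come in.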


There are other ways to adjust a selection: parts can be painted out using a quick mask, or selections can be combined using the selection modes. It all depends on the original scan and on how much time and effort you want to spend.

And I agree with ofnuts: there are more economical ways, but get something working first.
Reply
#6
I often ask questions in user forums and I must say that this seems to be the most useful I have ever encountered. A stream of good suggestions in no time at all. Brilliant!

I haven't had the opportunity to test everything out yet but I will. So far I have tried:

Blighty's suggestion on Floated selection (Pasted layer) and that works well.

Ofnuts' suggestion replacing from step 9. Works even better, simpler way for me to understand. Question: Why does the Export dialog suggest saving it as a png file when the original is jpg? As jpg is the most common format for me to save in it would be useful to have that as default.

Ofnuts' second suggestion in four steps. The result looks as good as the other ones and it is even simpler (which is good when it comes to automating parts of the process later).

Blighty's GMIC suggestion. I surprised myself by successfully installing GMIC. It didn't produce a result as good as the other ones, but what can you expect from a 30-second test? It might improve if I spend some time with it.

rich2005. Good mini-tutorial. Would have taken me quite some time to find out myself.

I did get one response in another forum: put a black sheet of paper under (or over, depending on how you see it) the page you're scanning. It's such a fun and extraordinary suggestion that I must try it out. It will, however, take five days for the paper I ordered to arrive, so I'll report back on that. I must say I have mixed feelings about this. Nice if it works, but it also means that I would have to scan all of the pages again.

I'm attaching the sample page I've been working with. It's a little bit cheaty, as the only image on the page is nicely framed by black borders and therefore not affected by the text manipulation, but that's another thing to be dealt with later. (In case you wonder, the language is Swedish.)

Sample page
Reply
#7
Yes, redoing the scan is the best way.... You don't need black paper, anything dark will do. In any case this will likely darken the background in the scan, so you will have to use Brightness/Contrast to fix it (but this will be an easy step).
Reply
#8
Quote:...Putting a black sheet of paper under ...

Can sometimes be problematic if using a sheet-feed scanner.

However, back to the original method and good practice for me. I changed the work-flow a little. I think this is a little more economical.

Color select
Invert selection
Copy selection
Paste-As -> New Image
Remove alpha channel
Optionally, Auto-white-balance.

Then, to semi-automate the process, I assigned that sequence to keyboard shortcuts.

If you can be bothered, here is the whole thing, three and a half minutes:



Reply
#9
Good results too with Colors>Desaturate followed by the Curves tool:
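
For anyone reading without the screenshots, the idea can be sketched in plain Python. This is only an approximation under stated assumptions: desaturation is modelled as a simple average of R, G and B, the curve is piecewise-linear rather than the smooth curve GIMP draws by default, and the control points are invented for illustration, not taken from the screenshots.

```python
# Sketch only: control points and sample pixels are illustrative assumptions.

def desaturate_average(rgb):
    """Gray value as the plain average of the three channels."""
    r, g, b = rgb
    return (r + g + b) // 3

def apply_curve(value, points):
    """Piecewise-linear curve through sorted (input, output) control points."""
    if value <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if value <= x1:
            return round(y0 + (y1 - y0) * (value - x0) / (x1 - x0))
    return points[-1][1]

# Everything brighter than 170 goes to white, dark text is pushed darker.
curve = [(0, 0), (80, 20), (170, 255), (255, 255)]

scan = [(210, 205, 200), (40, 40, 40), (116, 116, 116)]  # paper, text, see-through
result = [apply_curve(desaturate_average(px), curve) for px in scan]
print(result)  # [255, 10, 114]
```

The steep middle section of the curve is what separates paper from print. Where exactly to place it depends on how dark the see-through characters are in a given scan.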

   

   
Reply
#10
rich2005 What an amazing tutorial, thank you so much. (Do I detect a very slight Welsh accent? Smile ) Questions:

About the method of assigning keyboard shortcuts: if I check "Save keyboard shortcuts on exit" I guess that will change them for good, but if I don't I have to redo them next time, don't I?

Is there another way of automating workflows, like recording scripts or something like that? (Well, I'm sure there is, but the question is really: is it too complicated for a beginner like me?)

And my scanner is a Mustek ScanExpress A3 Flatbed. It was quite cheap when I bought it in 2011 and I must say it was very good value for the price. It would be next to impossible to scan a thick magazine with a sheet-feed scanner, I think.

Ofnuts Wow, I didn't choose my username for no reason, and yet you managed to explain it in a way so that even I got the gist of it. I won't pretend to actually understand the inner workings of Levels, but your example was so clear-cut that I might be able to use it for this particular set of scans, which in many cases share the same characteristics.

I do understand that to get the best possible result for each page you need to treat them individually, but perhaps, by finding an "average" case that gets a good result, it would be possible to automate the procedure. It is of course a matter of finding a balance between quality and time spent when it's a matter of 2000-2500 pages.
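
A rough sketch of that "average case" idea, in plain Python: tune a white cutoff by hand on a few sample pages, take the median as the shared setting, and apply it unchanged to every page. The cutoff values and the tiny "pages" below are invented for illustration, and real pages would of course be loaded from the scan files rather than typed in.

```python
# Sketch only: tuned cutoffs and page data are illustrative assumptions.
from statistics import median

def shared_cutoff(tuned_cutoffs):
    """One setting for the whole batch: the median of the hand-tuned values."""
    return round(median(tuned_cutoffs))

def clean_all(pages, cutoff):
    """Turn every pixel at or above the cutoff white, on every page alike."""
    return [[255 if v >= cutoff else v for v in page] for page in pages]

tuned_on_samples = [182, 190, 176, 188, 185]  # best cutoff found per sample page
cutoff = shared_cutoff(tuned_on_samples)

pages = [
    [230, 186, 50],  # paper, see-through, text
    [200, 180, 70],  # a page with slightly darker see-through
]
cleaned_pages = clean_all(pages, cutoff)
print(cutoff)         # 185
print(cleaned_pages)  # [[255, 255, 50], [255, 180, 70]]
```

The second page shows the trade-off: its see-through pixel (180) falls just under the shared cutoff and survives, which is the price of not tuning each page individually.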
Reply