How to batch open multiple PDF files and batch export them again as PDF files
#1
Hello!

I've been reading a bit about the gimp batch mode and gimp scripts but I'm not sure what I'm trying to do is possible and I was looking for some guidance.

I have multiple PDF files (let's say 100). I want to batch open them with GIMP (specifying the resolution - DPI) and maybe even more options (Anti-aliasing, the layer etc). Then I want to directly export them as PDF again (specifying the export options - apply layer masks, convert bitmaps and omit hidden layers).

Is this possible in the first place? And if yes, could you please give me some pointers as to how I can achieve that?

Thank you very much in advance!
#2
You can't specify the resolution when importing them via the API, unfortunately.
#3
Saw your post on gimpchat, are you an ex-M$ photoshop user? If so then try and forget old PS habits. Using Gimp 2.8, nothing wrong with that, old plugins more likely to work. Probably not using a cutting edge linux, again nothing wrong with that. (oh, just seen you are from CERN - very impressive)

Apart from writing a bespoke plug-in (Ofnuts gave advice on that), here are my thoughts on the subject.

100 PDFs to open. Gimp can open them as layers, but that involves 100 clicks on the import button. Take into consideration that Gimp will render the PDF(s) to bitmaps: what was a scalable vector (including text) becomes a fixed-ppi bitmap. If the PDFs are multi-page, Gimp will number the layers per document, i.e. 1,2,3, 1#1,2#1,3#1,4 which can result in a scrambled final combined PDF. There are scripts/plug-ins that will rename layers with consecutive numbers.
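
As a rough illustration of the renaming idea (not one of the scripts mentioned above), here is a minimal Python sketch that produces zero-padded consecutive names. It assumes the layer names are already listed in the desired page order; `consecutive_names` and the `page-` prefix are made-up names for this example. Zero-padding matters because a shell glob such as *.png expands in lexicographic order, so page-10 would otherwise sort before page-2.

```python
def consecutive_names(layer_names, prefix="page-"):
    """Return one zero-padded consecutive name per layer, in the given order.

    Assumption: layer_names is already in the page order you want.
    """
    # Pad to the number of digits of the highest index so names sort correctly.
    width = len(str(len(layer_names)))
    return [
        "{}{:0{}d}".format(prefix, index, width)
        for index, _ in enumerate(layer_names, start=1)
    ]


if __name__ == "__main__":
    # Per-document numbering as Gimp produces it for multi-page PDFs.
    scrambled = ["1", "2", "3", "1#1", "2#1", "3#1"]
    for old, new in zip(scrambled, consecutive_names(scrambled)):
        print(old, "->", new)
```

Applying the new names to the actual layers would have to happen inside Gimp (for example from its Python console); that part is not shown here.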

A better way is to combine the PDFs beforehand with a utility such as PDFsam (PDF Split and Merge); there is a free version. Combining gives a single PDF to open in Gimp.
or
If many of the pages are pure text, splitting lets you pick just the graphic pages from the PDF(s) to be edited and recombined. Depending on the number of pure-text pages, this can make a big difference to the final PDF size.

edit: Also have a look at pdftk. It is command-line only, but that makes it more suitable for a Linux bash script.

What is possible with Gimp (and linux)
Install ImageMagick (IM). It is probably already there; if not, it will be in your distro repository.
Export all the layers. There are scripts/plug-ins for this: the attached sg-save-all-layers.scm exports as PNG and will flatten layer groups and layer masks (not all of them do that). Screenshot: https://i.imgur.com/k0HgM6q.jpg  Now combine those into a PDF using the IM convert command:

Code:
convert /path/to/*.png new.pdf

(A note about IM and the pdf/eps/ps formats: I use Kubuntu 16.04, and that command suddenly stopped working after the last update, a month ago. I had to go into /etc/ImageMagick-6/policy.xml and comment out the lines that disable those formats. Why disabled? Who knows.)
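
Whether a given policy.xml disables those formats can be checked without reading the file by hand. The following is a minimal Python sketch, assuming the entries have the usual `<policy domain="coder" rights="none" pattern="..."/>` form. `SAMPLE_POLICY` is an illustrative stand-in rather than a copy of any real file, and `blocked_coders` is a made-up helper name. As far as I know, distributions disabled these coders as a precaution against Ghostscript vulnerabilities, so re-enabling them is a trade-off rather than a pure fix.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; a real policy.xml contains more entries.
SAMPLE_POLICY = """<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="read|write" pattern="PNG" />
</policymap>"""


def blocked_coders(policy_xml):
    """Return the patterns of all coder policies whose rights are 'none'."""
    root = ET.fromstring(policy_xml)
    blocked = []
    for policy in root.iter("policy"):
        if policy.get("domain") == "coder" and policy.get("rights") == "none":
            blocked.append(policy.get("pattern"))
    return blocked


if __name__ == "__main__":
    print(blocked_coders(SAMPLE_POLICY))
```

To inspect the real file, its text could be read from /etc/ImageMagick-6/policy.xml and passed to the same function. Lines that were commented out are XML comments and are ignored by the parser.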

ImageMagick, like Gimp, rasterizes every file, so you do end up with large PDFs.

The above but straight from Gimp.
There is an (old, bit flaky) plug-in, export-layers-to-pdf.py (attached), that uses IM (see the note about the pdf format) to export the layers to a temp folder and combine them into a new PDF. Caveat: it will fail with layer groups and layer masks, so it is up to you to flatten first. Screenshot with the result open in a PDF viewer: https://i.imgur.com/CqG5eMg.jpg Choose a quality setting less than 100 and the bitmaps are JPEG, which makes a big difference to the PDF size.

Long and complicated: Yes, but at least it is not dumbed-down click-n-wish.


Attached Files: export-to-pdf.zip (Size: 4.07 KB)
#4
(11-07-2018, 09:56 PM)Ofnuts Wrote: You can't specify the resolution when importing them via the API, unfortunately.

Thank you for the information! Unfortunately, that is already a blocking issue for me...

(11-08-2018, 10:01 AM)rich2005 Wrote: Saw your post on gimpchat, are you an ex-M$ photoshop user? If so then try and forget old PS habits. Using Gimp 2.8, nothing wrong with that, old plugins more likely to work. Probably not using a cutting edge linux, again nothing wrong with that. (oh, just seen you are from CERN - very impressive)

Before Gimp I was using...Microsoft Paint :-P So thankfully no PS habits to drop but not much experience either to help me.
You're right: Gimp 2.8 because of https://packages.debian.org/stretch/gimp

(11-08-2018, 10:01 AM)rich2005 Wrote: Apart from a bespoke plugin and Ofnuts gave advice on that. My thoughts on the subject.

Thank you so much for your detailed answer - I really appreciate it.

A few notes on my use case: we're processing scientific papers (PDF, with hundreds of pages of content and hundreds of figures at times) that eventually get printed. When printing, conversion to PDF/X is needed. (Vector) Figures are suspect #1 to create problems when converting to PDF/X. One good-enough solution in that case is to rasterise problematic figures (or even the entire page that contains the figure). When rasterising we want to use a high enough DPI (>=300, usually 500) to make sure there is no visible drop in quality (for A4 printing).
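
As a rough illustration of what those DPI values mean for an A4 page, here is a small Python sketch of the arithmetic. It assumes a full A4 page (210 x 297 mm) and uncompressed 8-bit RGB, which is what a PPM file stores; the function names are made up for this example.

```python
A4_MM = (210.0, 297.0)   # A4 width and height in millimetres
MM_PER_INCH = 25.4


def a4_pixels(dpi):
    """Pixel dimensions (width, height) of a full A4 page at the given DPI."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)


def raw_rgb_bytes(dpi):
    """Uncompressed size in bytes of that page at 3 bytes per pixel (8-bit RGB)."""
    width, height = a4_pixels(dpi)
    return width * height * 3


if __name__ == "__main__":
    for dpi in (300, 500):
        width, height = a4_pixels(dpi)
        megabytes = raw_rgb_bytes(dpi) / 1e6
        print("{} DPI: {} x {} px, about {:.1f} MB uncompressed".format(
            dpi, width, height, megabytes))
```

Going from 300 to 500 DPI multiplies the pixel count by roughly 2.8, which is why the choice of compression in the final PDF matters so much for file size.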

This is where my initial question and Gimp come in. Normally we have the PDF/EPS figure files (could be hundreds) or we use pdftk to pick single pages. Then we use Gimp to open them (which converts them to bitmaps with the DPI we select, as you mention) and then export them to PDF again. For a few figures/pages that's easy to manually do - but when there is a lot of them we were looking for ways to automate.

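For now, given a single-page input.pdf file, we automated it like this (imagine this in a bash script with loops etc.; pdftoppm treats its second argument as an output name root, so -singlefile is used here to get exactly output.ppm):
Code:
$ pdftoppm -r 300 -singlefile input.pdf output
Code:
$ convert -density 300 output.ppm output.pdf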
However, we're not as happy with the outcome of pdftoppm + convert as we are with Gimp, especially when it comes to the final file size. Gimp seems to optimize a few things in the process (ImageMagick vs cairo graphics ?). So we were wishing to automate it in Gimp.

See for example the difference in these files: https://cernbox.cern.ch/index.php/s/LWdT2UDsECdVrfn
* levels2nn_INPUT.pdf is the original file (based on an EPS file, following epspdf conversion) --> 11K
* levels2nn_GIMP.pdf is the output of Gimp (import with 300 DPI and export as PDF) --> 151K
* levels2nn_PPM_IM.pdf is the output of pdftoppm converted to PDF with IM (using 300 DPI as well, with the commands quoted just above) --> 250K
* levels2nn_PPM_GIMP.pdf is the output of pdftoppm imported into Gimp and exported to PDF (first scale image to 300 DPI and then export as PDF) --> 97K

Maybe the ideal combination would be to batch convert to PPM (with pdftoppm) and then batch open in GIMP, scale to 300 DPI and export as PDF. Would that be possible with a Gimp script?

Thank you both very much for your time!
#5
To (sort of) answer my own question, here is what I decided to do:
Note #1: the following will work for single pages, some extra bash scripting is needed for multiple pages
Note #2: choose the <DPI> of your liking

[1] In case the file is EPS, first convert to PDF

Code:
$ epstopdf input_file.eps input_file.pdf

[2] Get the PDF metadata

Code:
$ pdftk input_file.pdf dump_data_utf8 output input_file.data

[3] Convert the PDF file to PPM

Code:
$ pdftoppm -cropbox -r <DPI> input_file.pdf > input_file.ppm

[4] Convert the PPM to PDF

Code:
$ gimp -idfn --batch-interpreter python-fu-eval -b "image = pdb.gimp_file_load('input_file.ppm', 'input_file.ppm'); pdb.gimp_image_set_resolution(image, <DPI>, <DPI>); drawable = pdb.gimp_image_get_active_drawable(image); pdb.file_pdf_save(image, drawable, 'output_file.pdf', 'output_file.pdf', True, True, True); pdb.gimp_quit(1)"

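[5] Set the PDF metadata on the file exported by Gimp (pdftk cannot write to its own input file, so the result goes to a new file; final_file.pdf is just a placeholder name)

Code:
$ pdftk output_file.pdf update_info_utf8 input_file.data output final_file.pdf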
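
For many files, steps [2] to [5] could be driven from a small script. Below is a minimal Python sketch of that idea, under a few assumptions: each input is a single-page PDF, pdftoppm is called with -singlefile and an output root instead of the shell redirect, the metadata is applied to the Gimp-exported file rather than to the original input, and the `_raster` / `_final` suffixes and all function names are made up for this example. It only builds and prints the commands unless dry_run is switched off.

```python
import shlex
import subprocess
from pathlib import Path


def gimp_batch_code(ppm, pdf, dpi):
    """Python-Fu snippet using the same PDB calls as step [4]."""
    return (
        "image = pdb.gimp_file_load({ppm!r}, {ppm!r}); "
        "pdb.gimp_image_set_resolution(image, {dpi}, {dpi}); "
        "drawable = pdb.gimp_image_get_active_drawable(image); "
        "pdb.file_pdf_save(image, drawable, {pdf!r}, {pdf!r}, True, True, True); "
        "pdb.gimp_quit(1)"
    ).format(ppm=ppm, pdf=pdf, dpi=dpi)


def pipeline_commands(pdf_path, dpi=300):
    """Return the commands for steps [2] to [5] as argument lists."""
    stem = str(Path(pdf_path).with_suffix(""))
    data = stem + ".data"
    ppm = stem + ".ppm"
    raster = stem + "_raster.pdf"   # hypothetical name for the Gimp export
    final = stem + "_final.pdf"     # hypothetical name for the end result
    return [
        ["pdftk", str(pdf_path), "dump_data_utf8", "output", data],
        ["pdftoppm", "-cropbox", "-r", str(dpi), "-singlefile",
         str(pdf_path), stem],
        ["gimp", "-idfn", "--batch-interpreter", "python-fu-eval",
         "-b", gimp_batch_code(ppm, raster, dpi)],
        ["pdftk", raster, "update_info_utf8", data, "output", final],
    ]


def run_all(pdf_paths, dpi=300, dry_run=True):
    """Print the commands for every file, or execute them if dry_run is False."""
    for path in pdf_paths:
        for command in pipeline_commands(path, dpi):
            if dry_run:
                print(" ".join(shlex.quote(arg) for arg in command))
            else:
                subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(["levels2nn.pdf"], dpi=300)
```

Passing the arguments as lists avoids shell quoting problems with the long -b string. Multi-page files would still need to be split first, as in Note #1.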

Any feedback would be welcome :-)