Possible to convert pic pdf to image series?
#1
Hello and good morning.

I post on Debian user group and forums with the same username, and on Knoppix forums as rajibando.

I have been using GIMP for the last 12 years, but only for minor tasks. I don't have any high-end technical knowledge of GIMP, just common-sense usage ideas.

I have a pdf file that contains only images.

There is a long process: Right Click on the PDF file ⟶ Open with GIMP ⟶ Import from pdf ⟶ Open as Images ⟶ set dpi and other options ⟶ each image opened.
Then save each image as a separate file.

Is there a shortcut? That is, an automated/batch process for the above steps, without actually opening any file in a GIMP window?
Reply
#2
I would not use Gimp for this. There is a package poppler-utils which has several PDF related utilities.

One is pdfimages, which extracts images. See man pdfimages for all the options (which of course I should have done before posting). This bit is useful:

If both -png and -tiff are specified, CMYK images will be written as TIFF and all other images will be written as PNG.

Code:
pdfimages -png -tiff filename.pdf your-ref
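
A minimal sketch of how this command could be wrapped for reuse, assuming poppler-utils is installed. The last argument (your-ref above) is what the pdfimages man page calls the image root: a filename prefix to which pdfimages appends a number and an extension, so a root of scan gives scan-000.png, scan-001.png and so on. The function name extract_pdf_images and the choice of the PDF's base name as the prefix are assumptions for illustration, not part of poppler.

```shell
#!/usr/bin/env bash
# Extract every embedded image from a PDF into its own file.
# Usage: extract_pdf_images input.pdf [output-dir]
extract_pdf_images() {
    local pdf="$1"
    local outdir="${2:-.}"
    local prefix

    # Illustrative naming choice: reuse the PDF's base name as the prefix.
    prefix="$(basename "$pdf" .pdf)"

    mkdir -p "$outdir" || return 1

    # The final argument is the image root ("your-ref" in the post):
    # pdfimages appends -000, -001, ... plus the file extension to it.
    # CMYK images are written as TIFF, everything else as PNG.
    pdfimages -png -tiff "$pdf" "$outdir/$prefix"
}
```

Calling extract_pdf_images filename.pdf extracted should leave files such as extracted/filename-000.png. If the embedded images are JPEGs, adding -j keeps them as .jpg files instead of converting them to PNG.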

Edit: A bit more information will help. Are the images in the PDF actually scans of whole pages, so that what is really required is a method of cropping the image(s) from the page background? There is a Gimp script, DivideScannedImages, which has a batch function. Can you give details of a single page layout?

There is also this (not Gimp, but ImageMagick): https://stackoverflow.com/questions/4968...background
Reply
#3
Yes, I use poppler-utils, for example pdftotext to convert PDF files to text files.
But I don't know the exact code line to do the work for a series of image files.
Such as,
pdftoppm -jpeg filename.pdf /folder
pdftocairo -jpeg filename.pdf /folder
pdfimages -jpeg filename.pdf /folder

Beyond me to interpret man pages such as https://www.systutorials.com/docs/linux/...dftocairo/
or
https://docs.oracle.com/cd/E88353_01/htm...ges-1.html

Example code lines are generally hard to come by.

I come from a non-unix GUI (read doze) background, so I am not adept at deciphering man pages.

This issue has been at the back of my mind for some time. So I thought this might be the right time to get a solution via gimp.

So thank you for the code line.

Could you, the Admin, the leader of this forum, please elaborate on what is "your-ref" on your code line,
Code:
pdfimages -png -tiff filename.pdf your-ref
?

Edit:
For the present purpose, Imagemagick isn't necessary for separating individual images from a larger image.
Each page of the pdf file contains one image file.
Reply
#4
Ahh... I have only just seen your edit. One image per page? Is it embedded in text, or is there just surrounding space?

However, I was expecting something like this, and it might help someone.
----------------
This example extracts images from a multi-page PDF (nothing special, I just made one using XSane).

It uses a script, DivideScannedImagesLatest.scm, which goes in your Gimp profile ~/.config/GIMP/2.10/scripts, and a compiled plugin, deskew64, which goes in ~/.config/GIMP/2.10/plug-ins.

The workflow goes:
Break the PDF up into pages using pdfimages.
Extract the images from the pages using Gimp.

To save a lot of writing, a 5-minute video - not viewed - deleted.


Attached Files
.zip   extractimages.zip (Size: 35.51 KB / Downloads: 266)
Reply
#5
The PDF file contains only images and blank space. No text. If desired, I could send you the link by PM.

Kindly accept my accolades and admiration. I haven't seen another pro-active and hands-on admin like you in my entire experience with Debian or Knoppix. All admins are quite distant by several orders from simple users like us.

Yes, your post shall help those who come later. So thank you.
Reply
#6
You can still use that procedure with a single-image-per-page PDF. It will straighten (if required) and crop down to just the images. By all means send a link to the PDF; there might be other or better ways to process the file.
Reply
#7
(06-07-2021, 03:19 PM)rich2005 Wrote: You can still use that procedure with a single image per page PDF. It will straighten (if required) and crop down to just the images....
Did you mean using your first code line?

I had posted one query in my earlier post:
Quote:Could you ... please elaborate on what is "your-ref" on your code line,
Code:
pdfimages -png -tiff filename.pdf your-ref

(06-07-2021, 03:19 PM)rich2005 Wrote: ... By all means send a link to the PDF there might be other or better ways to process the file.

Sure, will do.
Reply
#8
I hope you do not have a library full of similar PDFs to process. This scanned document does not lend itself to an automated extraction process. However, some improvements can be made and each page exported to a separate jpg as required.

The 29-page PDF supplied took 10 minutes to process. To do a better job would require each page to be checked and modified individually, and I am guessing that is not what you want to do.

The workflow went like this:
Open the PDF. I checked the actual embedded images and they are small jpgs. My feeling is there is not much point importing at 300 ppi, so I used 150 ppi.
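
If the aim is simply one image file per page at a chosen resolution, without opening Gimp, pdftoppm from poppler-utils can render the pages directly. A rough sketch, assuming poppler-utils is installed; the function name render_pdf_pages and the 150 dpi default (matching the import resolution mentioned above) are illustrative choices, not anything taken from the supplied PDF:

```shell
#!/usr/bin/env bash
# Render each page of a PDF to a JPEG at a chosen resolution.
# Usage: render_pdf_pages input.pdf [output-dir] [dpi]
render_pdf_pages() {
    local pdf="$1"
    local outdir="${2:-.}"
    local dpi="${3:-150}"   # illustrative default, same as the 150 ppi import
    local prefix

    # Illustrative naming choice: reuse the PDF's base name as the prefix.
    prefix="$(basename "$pdf" .pdf)"

    mkdir -p "$outdir" || return 1

    # -r sets the resolution in DPI, -jpeg selects JPEG output.
    # pdftoppm appends the page number and .jpg to the prefix.
    pdftoppm -r "$dpi" -jpeg "$pdf" "$outdir/$prefix"
}
```

Unlike pdfimages, this renders whole pages, including the surrounding blank space, so the results would still need the straightening and cropping described below.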

The overall readability of the pages can be improved using the plug-in gmic_gimp_qt from http://www.gmic.eu. There is a version compiled for Debian. Once it is installed in Gimp, open gmic and look for the Repair scanned images filter.

The pages are all different sizes / angles / offsets. Try to find a sensible 'window' that fits best, and set up some guides for that 'window'.

For multi-layer manipulation, hold the shift key down to toggle layer visibility and links on / off.

I rotated the whole layer stack a very small amount.
One over-large layer I scaled down.
One small layer I scaled up.
The whole image I cropped to the size of the 'window'.

Exporting the layers as individual jpegs: if you are using the latest Debian, Gimp might not have python support, so the attachment is a script-fu. Unzip it and put it in ~/.config/GIMP/2.10/scripts.
That gives File -> Export Layers. Put in your export location / filename.jpg to get numbered files.

It took me about 10 minutes for that, but then I know my way around Gimp.

As a video - not viewed - deleted

best of luck


Attached Files
.zip   export-layers-plus.scm.zip (Size: 3.91 KB / Downloads: 248)
Reply
#9
First words: Our illustrious Admin, our leader, your Neural Network clusters rock! You really are magic! I am surprised that you aren't being utilised better by GNU-Linux for its achieving loftier ideals.


rich2005 Wrote:The pages are all different sizes / angles / offsets...
Yes, that was the first thought I had when I opened the file with evince.
So is it better to use poppler-utils to extract, and then manually adjust all pages?

rich2005 Wrote:Exporting the layers as individual jpegs. If using the latest Debian, Gimp might not have python support, so the attachment is a script-fu. Unzip and put in  ~/.config/GIMP/2.10/scripts.
That gives File -> Export Layers  put in your export location / filename.jpg to get numbered files.  ...
So there is no script file that runs gimp from a terminal and extracts the individual images? Then in this respect, pdfimages et al would indeed have been better.

BTW, I could have stupidly posted another question that belongs in a different thread: running gimp in a terminal, like poppler-utils. But that idea also defeats the purpose of a GUI.

Thank goodness, this experience has already answered that stupid question.


rich2005 Wrote:... I rotated the whole layer stack a very small amount.
One over large layer I scaled down
One small layer I scaled up.
The whole image I cropped to the size of the 'window'  ...

So, if all pages are turned at different angles and have to be handled manually, it would be better to use poppler-utils and then gimp to cut out individual pages (from the 2-page format) and polish up the pics.

In the end, I feel I simply wasted your time, energy, money and zeal, which could have been better used for a higher purpose.

I am ashamed, and I apologise. I had in fact asked you to clarify certain aspects of the pdfimages code line.

Unfortunately, by then I had already set you off on solving the problem via gimp. I have to atone for putting your abilities to a poor mission, when the entire community could have been better served by them. I apologise to the community too.
Reply

