PDF to TXT
#3
(10-01-2020, 06:43 PM)Ofnuts Wrote: The "JFIF" at the beginning hints that there is a JPEG.

That said, the right LibreOffice app for opening a PDF is Draw, not Writer. It can locate the text parts, but it treats each line as an independent text part, so don't expect to reflow paragraphs.

Now, if you scan documents, the scanner output is an image, not text, so your PDF is just a bunch of JPEGs, and MS-Office is doing some OCR. In that case you need an OCR program, which in the FOSS world is usually Tesseract.

In fact, several times the file opened directly in Draw, even though I had selected Writer. But it can only be edited as an image, not as text.
How could I be generating an image?
I configured the scanner to generate a .pdf.
[Image: bIXhKVf.jpg]
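
For anyone wondering how a .pdf can still be "an image": a PDF is only a container, and a scanner usually just wraps each scanned page (a JPEG) in it. Below is a rough sketch of a check for that, not a real PDF parser. The function name `looks_like_scanned_pdf` is made up for this example, and the heuristic assumes the relevant PDF objects are stored uncompressed, which is not always the case.

```python
def looks_like_scanned_pdf(data: bytes) -> bool:
    """Rough heuristic: JPEG image data present and no font resources.

    Assumption: the PDF dictionaries are not hidden inside compressed
    object streams. If they are, this check can give the wrong answer.
    """
    # /DCTDecode is the PDF filter name for JPEG-compressed streams;
    # "JFIF" is the marker found near the start of many JPEG files.
    has_jpeg = b"/DCTDecode" in data or b"JFIF" in data
    # Real (selectable) text needs at least one font resource.
    has_font = b"/Font" in data
    return has_jpeg and not has_font
```

To try it on a file, read the bytes first, for example `looks_like_scanned_pdf(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. If it returns True, the file most likely needs OCR before any text can be extracted.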


Quote:Now, if you scan documents, the scanner output is an image, not text, so your PDF is just a bunch of JPEGs, and MS-Office is doing some OCR. In that case you need an OCR program, which in the FOSS world is usually Tesseract.

Hmm, OK...
I remember trying to use Tesseract before, but I couldn't get it to work. Either I didn't know how to use it or there was some other problem. I'll download it and try again.
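
For reference, here is a minimal sketch of how the OCR step could be scripted, assuming both `pdftoppm` (from Poppler) and `tesseract` are installed and on the PATH. The helper names (`rasterize_command`, `tesseract_command`, `ocr_pdf`), the `page` file prefix, the 300 dpi setting and the `eng` language code are all choices made for this example, not anything from the thread.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page, named <prefix>-<page number>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_command(image: Path, lang: str = "eng") -> list[str]:
    # tesseract takes an output base name and appends ".txt" to it itself
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_pdf(pdf: Path, work_dir: Path, lang: str = "eng") -> str:
    """Turn each page into a PNG, OCR it, and return the combined text."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, work_dir / "page"), check=True)
    texts = []
    # Page order relies on pdftoppm zero-padding the page numbers.
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(tesseract_command(image, lang), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

A call such as `ocr_pdf(Path("scan.pdf"), Path("ocr_work"))` would then return the recognized text, with `scan.pdf` and `ocr_work` being placeholder names. For a document in another language, pass the matching Tesseract language code, provided that language pack is installed.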

Thx Ofnuts!


Messages In This Thread
PDF to TXT - by Krikor - 10-01-2020, 06:05 PM
RE: PDF to TXT - by Ofnuts - 10-01-2020, 06:43 PM
RE: PDF to TXT - by Krikor - 10-01-2020, 06:58 PM
RE: PDF to TXT - by Ofnuts - 10-01-2020, 07:54 PM
RE: PDF to TXT - by meetdilip - 10-03-2020, 04:24 AM
RE: PDF to TXT - by Krikor - 10-03-2020, 01:16 PM