PDF to TXT
#2
(10-01-2020, 06:05 PM)Krikor Wrote: In short, I scan a PDF document and then try to edit it as text.

My problem is that I can only do this using Microsoft Word, not with LibreOffice Writer or OpenOffice Writer.

With LibreOffice the file opens but does not allow me to edit it.

With OpenOffice 4.1.7 (currently the latest version), editing is allowed, but the text appears encoded. I've tried several Character Set options, but a page of ASCII codes always appears.
[Image: N8duLGJ.png]
I didn't want to depend on having to use Word.

Does anyone know how to use either LibreOffice or OpenOffice to obtain the scanned PDF result in an editable text file?

I thank you for your help.

The "JFIF" at the beginning of that garbled text hints that there is a JPEG in the file.
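
If you want to check this without opening the file in an office suite, here is a rough sketch that scans the raw bytes of a PDF for JFIF headers. The function name is made up for illustration, and it is only a heuristic: it assumes the JPEG streams are stored as-is inside the PDF, and it will miss JPEGs that carry an Exif header instead of a JFIF one, or streams wrapped in another encoding.

```python
def count_embedded_jpegs(data: bytes) -> int:
    """Count JFIF headers found in a blob of bytes (heuristic only)."""
    # A JFIF file starts with: SOI marker (FF D8), APP0 marker (FF E0),
    # a 2-byte segment length, then the 5-byte identifier "JFIF" + NUL.
    marker = b"\xff\xd8\xff\xe0"
    count = 0
    pos = data.find(marker)
    while pos != -1:
        if data[pos + 6:pos + 11] == b"JFIF\x00":
            count += 1
        pos = data.find(marker, pos + 1)
    return count


if __name__ == "__main__":
    import sys

    # Usage: python count_jpegs.py scan.pdf   (file name is an example)
    with open(sys.argv[1], "rb") as fh:
        print(count_embedded_jpegs(fh.read()), "JFIF header(s) found")
```

A count at or near the number of pages, with little else in the file, is a fair sign the PDF is just scanned images.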

That said, the right LibreOffice app for opening a PDF is Draw, not Writer. It can locate the text parts, but it treats each line as an independent text object, so don't expect paragraphs to reflow.

Now, if you scan documents, the scanner output is an image, not text, so your PDF is just a bunch of JPEGs, and MS Office is presumably doing some OCR. In that case you need an OCR program, which in the FOSS world is usually Tesseract.
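
As a rough sketch of that workflow, assuming the `tesseract` command-line tool is installed and the pages have already been extracted from the PDF as image files (for example with `pdfimages -j scan.pdf page` from poppler-utils, which should give files named like page-000.jpg): the helper names below are made up for illustration, and "eng" is just the example language.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List


def tesseract_command(image: Path, lang: str = "eng") -> List[str]:
    """Build the command line: tesseract <image> <output base> -l <lang>."""
    # Tesseract appends ".txt" to the output base itself.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_images(images: Iterable[Path], lang: str = "eng") -> List[Path]:
    """Run Tesseract on each image and return the expected .txt paths."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    outputs = []
    for image in images:
        subprocess.run(tesseract_command(image, lang), check=True)
        outputs.append(image.with_suffix(".txt"))
    return outputs


if __name__ == "__main__":
    # Assumes the page images sit in the current directory.
    for txt in ocr_images(sorted(Path(".").glob("page-*.jpg"))):
        print("wrote", txt)
```

The resulting .txt files open fine in Writer. How good the text is depends a lot on the scan quality and on picking the right language.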


Messages In This Thread
PDF to TXT - by Krikor - 10-01-2020, 06:05 PM
RE: PDF to TXT - by Ofnuts - 10-01-2020, 06:43 PM
RE: PDF to TXT - by Krikor - 10-01-2020, 06:58 PM
RE: PDF to TXT - by Ofnuts - 10-01-2020, 07:54 PM
RE: PDF to TXT - by meetdilip - 10-03-2020, 04:24 AM
RE: PDF to TXT - by Krikor - 10-03-2020, 01:16 PM
