Portable Document Format (PDF) has become an industry-standard format, used whenever a publisher wants a document's layout and appearance to remain the same on every computer. For the end user, however, PDFs can be harder to search, edit, and reuse than ordinary text files, so utilities exist for converting PDF to text on Ubuntu.
Open the terminal by clicking "Applications" in the menu bar, then "Accessories," then "Terminal." On most Ubuntu desktops you can also press Ctrl+Alt+T. This opens the Ubuntu command-line terminal, the Linux equivalent of the Windows Command Prompt.
Type the following command into the terminal:
pdftotext file.pdf
Replace "file.pdf" with the name of the PDF file. A TXT file with the same base name (file.txt in this example) will be created in the same directory as the PDF.
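The naming rule described above can be sketched in shell. This is only an illustration: "txt_name_for" is a hypothetical helper written for this example, not part of poppler-utils, and "file.pdf" is the placeholder name from the step above. The sketch also passes the output name explicitly as a second argument, which "pdftotext" accepts.

```shell
# txt_name_for is a hypothetical helper, not part of poppler-utils.
# It mirrors the default naming: strip a trailing ".pdf", append ".txt".
txt_name_for() {
    printf '%s\n' "${1%.pdf}.txt"
}

pdf="file.pdf"                 # placeholder name from the step above
txt="$(txt_name_for "$pdf")"   # file.txt

# Only run the conversion if the tool and the input file are both present.
if command -v pdftotext >/dev/null 2>&1 && [ -f "$pdf" ]; then
    pdftotext "$pdf" "$txt"    # second argument names the output file explicitly
fi
```

Naming the output file explicitly is useful when the text should go somewhere other than next to the PDF.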
Type the following to print out the resulting text:
cat file.txt
Be sure to check your results. PDF to TXT conversion is inexact at best: it usually works, but sometimes the resulting text comes out garbled or in the wrong order.
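One way to make that check routine is a small wrapper that converts the file and then looks at the output. The following is a minimal sketch, assuming "pdftotext" is installed; "convert_and_check" is a hypothetical name, and the return codes are a choice made for this example. It cannot detect jumbled text on its own, so it also prints the first lines for a visual check.

```shell
# convert_and_check is a hypothetical helper name.
# Returns 0 on success, 1 if the conversion failed,
# 2 if the output contains no letters or digits at all.
convert_and_check() {
    pdf="$1"
    txt="${pdf%.pdf}.txt"
    if ! pdftotext "$pdf" "$txt"; then
        echo "conversion failed for $pdf" >&2
        return 1
    fi
    if ! grep -q '[[:alnum:]]' "$txt"; then
        echo "warning: $txt contains no readable text" >&2
        return 2
    fi
    wc -w "$txt"        # rough size check: word count of the result
    head -n 20 "$txt"   # show the first lines so jumbled text is easy to spot
}
```

For example, "convert_and_check file.pdf" converts the file and prints a word count followed by the first 20 lines of file.txt.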
Tips & Warnings
- "pdftotext" has a number of options that let you specify exactly how the conversion runs, such as "-layout" to preserve the original page layout and "-f" and "-l" to set the first and last page to convert. Type "man pdftotext" into your terminal to see the full list.
- Your success in converting PDF to text will vary with each PDF file. Depending on the layout of the PDF file, it may turn out very well or the text may be hopelessly jumbled. Always check the results yourself before electronically distributing a text file converted from a PDF.
- "pdftotext" is part of the "poppler-utils" package, which is normally installed by default on Ubuntu. If it is missing from your system, you can install it by typing "sudo apt-get install poppler-utils" into your terminal.
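The tips above can be sketched together in shell: first a check for whether "pdftotext" is available, then a wrapper around a few of its options. "extract_pages" is a hypothetical helper name chosen for this example. The flags it uses ("-f", "-l", "-layout", and "-" as the output name for standard output) are documented in the "pdftotext" man page, but check the version on your system.

```shell
# Report whether pdftotext is available before relying on it.
if command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext found"
else
    echo "pdftotext missing; try: sudo apt-get install poppler-utils"
fi

# extract_pages is a hypothetical helper: extract_pages FILE FIRST LAST
# Converts only pages FIRST..LAST, keeps the physical layout, and writes
# the text to standard output instead of a file.
extract_pages() {
    pdftotext -f "$2" -l "$3" -layout "$1" -
}
```

For example, "extract_pages file.pdf 1 3 | less" pages through the text of the first three pages without creating a TXT file.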