How to Extract Text From HTML
A HyperText Markup Language (HTML) file contains a number of elements, including tags, scripts and text. In some cases, you'll need to isolate the text from your HTML document so that you can use it elsewhere, such as in an article or publication. You can extract text from an HTML file in one of several ways on your computer.
Instructions
-
From Your Browser
-
1
Open the HTML file in your Web browser of choice. The browser renders the page and displays its text on screen.
-
2
Press "CTRL + A" to select all of the text on the screen and then "CTRL + C" to copy the text to your computer's clipboard.
-
3
Press "CTRL + V" to paste the text from your HTML file to another application, such as a Microsoft Word document.
From Your HTML Editor
-
4
Open the HTML file in Notepad or your HTML editor of choice. Notepad is a common choice because it is included with Windows at no extra cost.
-
5
Navigate to the <body> section of the HTML file. Click the line directly after the opening <body> tag and select everything up to the line right before the closing </body> tag.
-
6
Press "CTRL + C" to copy the selection and then open a new blank Notepad document. Press "CTRL + V" to paste the copied HTML into the new document.
-
7
Go through the new Notepad document and delete any tags (like <a href> or <img> tags) so that your file only contains basic text.
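If the file is long, deleting tags by hand gets tedious. The steps above can be sketched as a short Python script using only the standard library's html.parser module. This is a minimal sketch, not a complete solution: the names BodyTextExtractor and extract_body_text are made up for illustration, and the script assumes the file has an explicit <body> tag.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects text found inside <body>, ignoring <script> and <style>.

    Hypothetical helper for illustration; assumes an explicit <body> tag.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False   # True between <body> and </body>
        self.skip_depth = 0    # greater than 0 while inside <script> or <style>
        self.chunks = []       # pieces of text in document order

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Tags such as <a href> and <img> never reach this method;
        # only the text between tags does.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_body_text(html):
    """Return the text of the <body> section with all tags removed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><title>Ignored</title></head>"
    '<body><p>Hello <a href="page.html">world</a></p>'
    '<img src="photo.png"><script>var a = 1;</script></body></html>'
)
print(extract_body_text(sample))
```

Because the pieces are joined with single spaces, punctuation that directly follows a link or other inline tag may pick up an extra space. For simple pages the result is close to what you would get by deleting the tags manually.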
Using Software
-
8
Download an extractor program to your system. HTML Text Extractor, for example, is compatible with Windows systems.
-
9
Load the extractor program and type in the Web address of the Web page containing the text you need to extract.
-
10
Click "Extracted Text" on the toolbar to access the text only. Click "Copy" to add the text to your clipboard and then press "CTRL + V" to paste it into another application.
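If you would rather not install a separate program, a short script can do roughly the same job: download a page from its Web address and keep only the text. The following is a rough sketch using Python's standard library. It does not reproduce how HTML Text Extractor works internally, and the names TextOnly, html_to_text and fetch_page_text, as well as the example address, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextOnly(HTMLParser):
    """Keeps visible text and drops tags, scripts, styles and the page title."""

    HIDDEN = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hidden_depth = 0  # greater than 0 while inside a hidden element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup from an HTML string and collapse runs of whitespace."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def fetch_page_text(url):
    """Download the page at `url` and return its text content."""
    with urlopen(url, timeout=10) as response:
        # Fall back to UTF-8 when the server does not declare an encoding.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)


# Example usage (requires an Internet connection; the address is a placeholder):
# print(fetch_page_text("https://example.com/"))
```

The output is a single run of text with line breaks and extra spaces collapsed, which you can then paste or save into another application.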