How to Use HTML Purifier As a Tidy Alternative
HTML Purifier is an HTML-filtering library for PHP. It includes a set of features inspired by HTML Tidy, a tool that cleans up HTML and converts outdated markup to standards-compliant code. You can therefore use HTML Purifier from PHP as an alternative to HTML Tidy and clean up your website’s HTML on the fly. HTML Purifier offers three tidy levels (light, medium and heavy), so you can choose how aggressively it processes your HTML code. The default level, medium, shouldn’t cause problems with most Web pages.
Instructions
1
Download HTML Purifier from HTMLPurifier.org/Download.
2
Extract the downloaded HTML Purifier .tar.gz or .zip file to your PHP server. If you don’t want to extract the entire archive, you can omit every folder except the “library” folder inside it.
3
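Run the following command on your server so that the Web server can write to the Serializer directory, replacing “/path/to/HTMLPurifier/DefinitionCache/Serializer” with the path to the Serializer directory on your system:
chmod -R 0755 /path/to/HTMLPurifier/DefinitionCache/Serializer
This mode is enough only when the Web server’s user owns the directory. If the Web server still cannot write to it, you may need to grant group write permission by using 0775 instead of 0755.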
4
In a text editor, open the PHP file in which you want to use HTML Purifier.
5
Determine your document’s document type and character encoding from the “<!DOCTYPE html PUBLIC” declaration and the “<meta http-equiv="Content-type"” tag in the file.
6
Add the following code to your file, replacing “/location/of/htmlpurifier/library/HTMLPurifier.auto.php” with the location of the HTMLPurifier.auto.php file on your system:
<?php
require_once '/location/of/htmlpurifier/library/HTMLPurifier.auto.php';
7
Add the following code to the file to create a configuration object and set the tidy level, replacing “medium” with “light” or “heavy” if you want to use a different setting. HTML Purifier uses the medium level by default; you can switch to the light level if the medium level causes problems. The heavy level replaces code aggressively and may cause problems.
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.TidyLevel', 'medium');
8
Add the following code to your file if the page’s document type is XHTML 1.0 Transitional and its encoding is UTF-8, which are HTML Purifier’s defaults:
$purifier = new HTMLPurifier($config);
Add the following code instead if the document uses a different document type or character set, replacing “ISO-8859-2” with your document’s encoding and “HTML 4.01 Strict” with your document’s document type:
$config->set('Core.Encoding', 'ISO-8859-2');
$config->set('HTML.Doctype', 'HTML 4.01 Strict');
$purifier = new HTMLPurifier($config);
9
Add the following code to the file, where $dirty_html holds the HTML you want to clean up; the cleaned HTML is returned in $clean_html:
$clean_html = $purifier->purify($dirty_html);
?>
10
Save the file and upload it to your Web server.
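Put together, the steps above might look like the following sketch. The include path is the same placeholder used in step 6, and the sample $dirty_html string and the final echo are assumptions added for illustration; they are not part of HTML Purifier. The configuration object is created before the purifier so that every setting, including the tidy level, takes effect.

```php
<?php
// Placeholder path, as in step 6; adjust it to your installation.
require_once '/location/of/htmlpurifier/library/HTMLPurifier.auto.php';

// Build the configuration first so the purifier picks up every setting.
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.TidyLevel', 'medium');          // or 'light' / 'heavy'
$config->set('Core.Encoding', 'ISO-8859-2');       // omit for UTF-8 pages
$config->set('HTML.Doctype', 'HTML 4.01 Strict');  // omit for XHTML 1.0 Transitional

// Pass the configuration to the purifier; without it the defaults are used.
$purifier = new HTMLPurifier($config);

// Hypothetical input: outdated presentational markup.
$dirty_html = '<center><font color="red">Hello</font></center>';

// purify() returns the cleaned-up markup as a string.
$clean_html = $purifier->purify($dirty_html);
echo $clean_html;
?>
```

In a real page, $dirty_html would usually come from your own content source, such as a database field or a template fragment, rather than a hard-coded string.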
Tips & Warnings
Test your website after enabling HTML Purifier. If you encounter problems, try setting the tidy level to “light” or disabling HTML Purifier entirely.
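One way to make that troubleshooting easier is to keep the tidy level and an on/off switch in one place. The sketch below assumes a hypothetical helper, tidy_html(), and two hypothetical variables, $use_purifier and $tidy_level; none of them are part of HTML Purifier itself.

```php
<?php
// Placeholder path; adjust it to your installation.
require_once '/location/of/htmlpurifier/library/HTMLPurifier.auto.php';

// Hypothetical troubleshooting switches (not HTML Purifier settings).
$use_purifier = true;     // set to false to bypass HTML Purifier entirely
$tidy_level   = 'light';  // fall back from 'medium' if pages break

// Hypothetical helper: returns cleaned HTML, or the input unchanged when disabled.
function tidy_html($html, $enabled, $level)
{
    if (!$enabled) {
        return $html;
    }
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.TidyLevel', $level);
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($html);
}

$dirty_html = '<p>Sample content</p>'; // placeholder input
echo tidy_html($dirty_html, $use_purifier, $tidy_level);
?>
```

When the switch is off, the markup is passed through without any filtering, so this bypass is only suitable for content you already trust.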