Perl Diver 2.33


HTML::Clean

Name HTML::Clean
Version 1.4
Located at /usr/share/perl5
File /usr/share/perl5/HTML/Clean.pm
Is Core No


NAME

HTML::Clean - Cleans up HTML code for web browsers, not humans


SYNOPSIS

  use HTML::Clean;
  $h = HTML::Clean->new($filename); # or..
  $h = HTML::Clean->new($htmlcode);
  $h->compat();
  $h->strip();
  $data = $h->data();
  print $$data;


DESCRIPTION

The HTML::Clean module encapsulates a number of common techniques for minimizing the size of HTML files. You can typically save between 10% and 50% of the size of an HTML file using these methods. It provides the following features:

Remove unneeded whitespace (beginning of line, etc)
Remove unneeded META elements.
Remove HTML comments (except for styles, javascript and SSI)
Replace tags with equivalent shorter tags (<strong> --> <b>)
etc.

The entire process is configurable, so you can pick and choose what you want to clean.


THE HTML::Clean CLASS

$h = HTML::Clean->new($dataorfile, [$level]);

This creates a new HTML::Clean object, which is a prerequisite for all other methods in this module.

The $dataorfile parameter supplies the input HTML, either a filename, or a reference to a scalar value holding the HTML, for example:

  $h = HTML::Clean->new("/htdocs/index.html");
  $html = "<strong>Hello!</strong>";
  $h = HTML::Clean->new(\$html);

An optional 'level' parameter controls the level of optimization performed. Levels range from 1 to 9. Level 1 includes only simple fast optimizations. Level 9 includes all optimizations.
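
As a rough sketch of the constructor in use, the following passes a scalar reference and an explicit level, then reads the result back through data(). The sample HTML is made up for illustration, and the `or die` check assumes new() returns a false value on failure, which the text above does not state.

```perl
use strict;
use warnings;
use HTML::Clean;

# Hypothetical sample input held in memory.
my $html = "<html>\n<body>\n   <strong>Hello!</strong>\n</body>\n</html>\n";

# Level 1 asks for only the simple, fast optimizations.
# Assumption: new() returns a false value if it cannot set up the object.
my $h = HTML::Clean->new(\$html, 1)
    or die "Could not create HTML::Clean object\n";

$h->strip();

# data() returns a scalar reference, so dereference it to get the HTML text.
my $cleaned = ${ $h->data() };
print $cleaned;
```

Passing a higher level (up to 9) requests more of the optimizations; which ones each level enables is not spelled out here.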

$h->initialize($dataorfile)

This function allows you to reinitialize the HTML data used by the current object. This is useful if you are processing many files.

$dataorfile has the same usage as the new method.

Returns 0 on error, 1 on success.
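
One possible way to reuse a single object across many files is sketched below. The helper name clean_files is hypothetical, and the sketch seeds the object with the first file because the text above only shows new() being called with input; it also assumes new() returns a false value on failure.

```perl
use strict;
use warnings;
use HTML::Clean;

# Hypothetical helper: clean each file in turn with one reused object.
# Returns a hash mapping filename => cleaned HTML; files that fail to
# load are skipped with a warning.
sub clean_files {
    my @files = @_;
    return () unless @files;

    # Seed the object with the first file (assumption: false on failure).
    my $h = HTML::Clean->new($files[0])
        or die "Could not load $files[0]\n";

    my %cleaned;
    for my $file (@files) {
        # initialize() returns 0 on error, 1 on success.
        unless ($h->initialize($file)) {
            warn "Skipping $file: could not load it\n";
            next;
        }
        $h->strip();
        # Copy the string out before the next initialize() replaces the data.
        $cleaned{$file} = ${ $h->data() };
    }
    return %cleaned;
}

my %result = clean_files(@ARGV);
print "$_: ", length($result{$_}), " bytes after cleaning\n"
    for sort keys %result;
```

Run it with the HTML files to process as command-line arguments.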

$h->level([$level])

Get/set the optimization level. $level is a number from 1 to 9.

$myref = $h->data()

Returns the current HTML data as a scalar reference.

strip(\%options);

Removes excess space from the HTML.

You can control the optimizations used by specifying them in the %options hash reference.

The following options are recognized:

Boolean values (0 or 1)
  whitespace    Remove excess whitespace.
  shortertags   <strong> -> <b>, etc.
  blink         No blink tags.
  contenttype   Remove default contenttype.
  comments      Remove excess comments.
  entities      &quot; -> ", etc.
  dequote       Remove quotes from tag parameters where possible.
  defcolor      Recode colors in shorter form (#ffffff -> white, etc.).
  javascript    Remove excess spaces and newlines in JavaScript code.
  htmldefaults  Remove default values for some HTML tags.
  lowercasetags Translate all HTML tags to lowercase.
Parameterized values
  meta        Takes a space-separated list of meta tags to remove;
              the default is "GENERATOR FORMATTER".
  emptytags   Takes a space-separated list of tags to remove when there is no
              content between the start and end tag, like this: <b></b>.
              The default is 'b i font center'.

Please note that if your HTML includes preformatted regions (that is, <pre>...</pre> blocks), we do not recommend removing whitespace, as doing so will alter the rendered output.

HTML::Clean will print a warning if it finds a preformatted region and is asked to strip whitespace. To prevent this, specify that you don't want to strip whitespace, for example:

  $h->strip( {whitespace => 0} );
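
A slightly fuller sketch combining several of the options listed above might look like the following. The sample HTML is invented for illustration, and how options left out of the hash are treated is not described here, so the sketch names each one it relies on.

```perl
use strict;
use warnings;
use HTML::Clean;

# Hypothetical input containing a preformatted region.
my $html = join "\n",
    '<html>',
    '<body>',
    '<pre>',
    '  keep   this   spacing',
    '</pre>',
    '<strong>Hi</strong>',
    '</body>',
    '</html>',
    '';

# Assumption: new() returns a false value on failure.
my $h = HTML::Clean->new(\$html)
    or die "Could not create HTML::Clean object\n";

$h->strip({
    whitespace  => 0,                      # leave whitespace (and <pre>) alone
    shortertags => 1,                      # e.g. <strong> -> <b>
    comments    => 1,                      # remove excess comments
    meta        => 'GENERATOR FORMATTER',  # meta tags to remove
    emptytags   => 'b i font center',      # drop these tags when empty
});

my $out = ${ $h->data() };
print $out;
```

Because whitespace is switched off, this call should not trigger the preformatted-region warning described above.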

compat()

This function improves the cross-platform compatibility of your HTML. It currently checks for the following problems:

Ensuring all IMG tags have ALT attributes.
Use of Arial, Futura, or Verdana as a font face.
Positioning the <title> tag immediately after the <head> tag.

defrontpage();

This function converts pages created with Microsoft Frontpage to something a Unix server will understand a bit better. This function currently does the following:

Converts Frontpage 'hit counters' into a unix specific format.
Removes some frontpage specific html comments


SEE ALSO

Modules

FrontPage::Web, FrontPage::File

Web Sites

Distribution Site - http://people.itu.int/~lindner/


AUTHORS and CO-AUTHORS

Paul Lindner for the International Telecommunication Union (ITU)

Pavel Kuptsov <admin@modernperl.ru>


COPYRIGHT

The HTML::Strip module is Copyright (c) 1998,99 by the ITU, Geneva Switzerland. All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.