HTML::Laundry

Perl module to clean HTML by the piece

HTML::Laundry Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Steve Cook
  • Publisher web site: http://search.cpan.org/~scook/

HTML::Laundry Description

HTML::Laundry is an HTML::Parser-based HTML normalizer module meant for small pieces of HTML, such as user comments and Atom feed entries, rather than full pages. Laundry takes these snippets and returns clean, sanitary, UTF-8-based XHTML. The parser's behavior may be changed with callbacks, and the whitelist of acceptable elements and attributes may be updated on the fly.

A snippet is cleaned in several ways:

* Normalized, using HTML::Parser: element and attribute names are lowercased, empty elements are forced into the empty-tag syntax (e.g. <br />) where needed, and unknown elements and attributes are stripped.
* Sanitized, using an extensible whitelist of valid elements and attributes based on Mark Pilgrim and Aaron Swartz's work on sanitize.py: tags and attributes known to be possible attack vectors are removed.
* Tidied, using HTML::Tidy or HTML::Tidy::libXML (whichever is available): unclosed tags are closed and the output is generally neatened; future versions may also use tidying to deal with character-encoding issues.
* Optionally rebased, turning relative URLs in attributes into absolute ones.

HTML::Laundry provides mechanisms to extend the list of known allowed (and disallowed) tags, along with callback methods that let scripts using HTML::Laundry extend its behavior in various ways. Future versions may provide additional options for altering the rules used to clean snippets.

Out of the box, HTML::Laundry does not currently know about the tag and its children.
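The normalizing and whitelisting steps described above can be pictured with a small, self-contained sketch in core Perl. It is not HTML::Laundry's implementation and does not use its API: the whitelist contents and the helpers clean_start_tag, allow_element and disallow_element are hypothetical, chosen only to show how a single start tag might be lowercased, filtered against a whitelist that can change at run time, and serialized in XHTML empty-tag form.

```perl
use strict;
use warnings;

# Hypothetical whitelists for illustration only; these are NOT
# HTML::Laundry's actual tables.
my %allowed_elements   = map { $_ => 1 } qw(p br i b em strong a img);
my %allowed_attributes = map { $_ => 1 } qw(href src alt title);
my %empty_elements     = map { $_ => 1 } qw(br img hr);

# Normalize and sanitize one start tag: lowercase element and attribute
# names, drop anything not on the whitelist, escape attribute values,
# and emit empty elements in XHTML's self-closing form.
sub clean_start_tag {
    my ($tag, %attr) = @_;
    $tag = lc $tag;
    return '' unless $allowed_elements{$tag};    # strip unknown elements
    my @pairs;
    for my $name (sort keys %attr) {
        my $lc = lc $name;
        next unless $allowed_attributes{$lc};    # strip unknown attributes
        (my $value = $attr{$name}) =~ s/&/&amp;/g;
        $value =~ s/"/&quot;/g;
        $value =~ s/</&lt;/g;
        push @pairs, qq{$lc="$value"};
    }
    my $attrs = @pairs ? ' ' . join(' ', @pairs) : '';
    return $empty_elements{$tag} ? "<$tag$attrs />" : "<$tag$attrs>";
}

# The whitelist is an ordinary hash, so it can be updated on the fly.
sub allow_element    { $allowed_elements{ lc $_[0] } = 1 }
sub disallow_element { delete $allowed_elements{ lc $_[0] } }

print clean_start_tag('P', STYLE => 'font-size: 300%'), "\n";  # <p>
print clean_start_tag('BR'), "\n";                             # <br />
print clean_start_tag('SCRIPT'), "\n";                         # empty line
allow_element('blockquote');
print clean_start_tag('BLOCKQUOTE', CITE => 'x'), "\n";        # <blockquote>
```

A production sanitizer also has to inspect attribute values (for example, rejecting javascript: URLs in href), which this sketch does not attempt.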
For sanitizing full HTML pages, consider using HTML::Scrubber or HTML::Defang.

SYNOPSIS

```perl
#!/usr/bin/perl -w
use strict;
use HTML::Laundry;

my $laundry = HTML::Laundry->new();

# Deliberately messy input: uppercase tags, stray </BR>, an unclosed <I>,
# and a <SCRIPT> element.
my $snippet = q{
<P STYLE="font-size: 300%"><BLINK>"You may get to touch her<BR>
If your gloves are sterilized<BR></BR>
Rinse your mouth with Listerine</BR>
Blow disinfectant in her eyes"</BLINK><BR>
-- X-Ray Spex, <I>Germ-Free Adolescents<I>
<SCRIPT>alert('!!');</SCRIPT>
};
my $germfree = $laundry->clean($snippet);
# $germfree is now:
#   <p>"You may get to touch her<br />
#   If your gloves are sterilized<br />
#   Rinse your mouth with Listerine<br />
#   Blow disinfectant in her eyes"<br />
#   -- X-Ray Spex, <i>Germ-Free Adolescents</i></p>
```

Requirements: Perl
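The "optionally rebased" step in the description turns relative URLs in attributes into absolute ones. The following is a simplified sketch of that URL resolution in core Perl, not HTML::Laundry's own code; the function name rebase_url and the example URLs are hypothetical. It handles only scheme-relative, root-relative and plain relative references.

```perl
use strict;
use warnings;

# Simplified rebasing sketch (hypothetical helper). Handles "//host/x",
# "/x" and "x" references only; it does NOT collapse "./" or "../"
# segments, nor resolve "?query"- or "#fragment"-only references.
sub rebase_url {
    my ($url, $base) = @_;
    return $url if $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*:};   # already absolute
    my ($scheme, $authority, $path) =
        $base =~ m{^([A-Za-z][A-Za-z0-9+.-]*):(//[^/?#]*)?([^?#]*)}
        or return $url;                                     # unusable base
    $authority //= '';
    return "$scheme:$url"           if $url =~ m{^//};      # scheme-relative
    return "$scheme:$authority$url" if $url =~ m{^/};       # root-relative
    (my $dir = $path) =~ s{[^/]*$}{};                       # drop last segment
    $dir = '/' if $dir eq '';
    return "$scheme:$authority$dir$url";
}

print rebase_url('images/a.png', 'http://example.com/blog/post.html'), "\n";
# http://example.com/blog/images/a.png
```

For real use, URI->new_abs($url, $base) from the CPAN URI distribution performs full RFC 3986 reference resolution, including dot segments, queries and fragments.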

