YAPE::HTML

YAPE::HTML is Yet Another Parser/Extractor for HTML.

YAPE::HTML Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Jeff Pinyan
  • Publisher web site: http://search.cpan.org/~pinyan/YAPE-Regex-3.03/Regex/Element.pm

YAPE::HTML Description

YAPE::HTML is Yet Another Parser/Extractor for HTML.

SYNOPSIS

    use YAPE::HTML;
    use strict;

    my $content = "<html>...</html>";
    my $parser = YAPE::HTML->new($content);
    my ($extor, @fonts, @urls, @headings, @comments);

    # here is the tokenizing part
    while (my $chunk = $parser->next) {
      if ($chunk->type eq 'tag' and $chunk->tag eq 'font') {
        if (my $face = $chunk->get_attr('face')) {
          push @fonts, $face;
        }
      }
    }

    # here we catch any errors
    unless ($parser->done) {
      die sprintf "bad HTML: %s (%s)",
        $parser->error, $parser->chunk;
    }

    # here is the extracting part

    # <A> tags with HREF attributes
    # <IMG> tags with SRC attributes
    $extor = $parser->extract(a => ['href'], img => ['src']);
    while (my $chunk = $extor->()) {
      push @urls, $chunk->get_attr(
        $chunk->tag eq 'a' ? 'href' : 'src'
      );
    }

    # <H1>, <H2>, ..., <H6> tags
    $extor = $parser->extract(qr/^h[1-6]$/ => []);
    while (my $chunk = $extor->()) {
      push @headings, $chunk;
    }

    # all comments
    $extor = $parser->extract(-COMMENT => []);
    while (my $chunk = $extor->()) {
      push @comments, $chunk;
    }

YAPE MODULES

The YAPE hierarchy of modules is an attempt at a unified means of parsing and extracting content. It attempts to maintain a generic interface, to promote simplicity and reusability. The API is powerful, yet simple. The modules tokenize their input (tokenization can be intercepted) and build trees, so that specific nodes can be extracted.

This module is yet another parser and tree-builder for HTML documents. It is designed to make extraction and modification of HTML documents simple. The API allows for easy custom additions to the document being parsed, and allows very specific tag, text, and comment extraction.

Requirements:

  • Perl

