HTML::Tree - overview of HTML::TreeBuilder et al
# HTML::Tree is basically just a happy alias to HTML::TreeBuilder.
use HTML::TreeBuilder ();
use vars qw( $VERSION );
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new(); # empty tree
$tree->parse_file($filename);        # $filename: path to an HTML file
# Then do something with the tree, using HTML::Element
# methods -- for example:
$tree->dump;
# Finally, free the tree:
$tree->delete;
# Delegate to HTML::TreeBuilder: swap the invocant, then tail-call via goto.
sub new {
    shift; unshift @_, 'HTML::TreeBuilder';
    goto &HTML::TreeBuilder::new;
}
sub new_from_file {
    shift; unshift @_, 'HTML::TreeBuilder';
    goto &HTML::TreeBuilder::new_from_file;
}
sub new_from_content {
    shift; unshift @_, 'HTML::TreeBuilder';
    goto &HTML::TreeBuilder::new_from_content;
}
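The HTML::Tree constructors delegate with Perl's C<goto &NAME> form, which replaces the current call frame instead of nesting a new one. The self-contained sketch below reproduces that pattern with two hypothetical packages, C<Real::Builder> (standing in for HTML::TreeBuilder) and C<Alias::Builder> (standing in for HTML::Tree), so it runs without HTML-Tree installed. It shows that the returned object is blessed into the real class, that the caller's arguments pass through unchanged, and that the real constructor sees the original caller rather than the alias.

```perl
use strict;
use warnings;

# Hypothetical class standing in for HTML::TreeBuilder.
package Real::Builder;

sub new {
    my ($class, %args) = @_;
    # Record which package made the call; after a goto this is the
    # original caller, not the alias package.
    return bless { %args, called_from => scalar caller() }, $class;
}

# Hypothetical class standing in for HTML::Tree.
package Alias::Builder;

sub new {
    shift;                          # drop the 'Alias::Builder' invocant
    unshift @_, 'Real::Builder';    # as if the caller wrote Real::Builder->new
    goto &Real::Builder::new;       # replace this frame, passing @_ as-is
}

package main;

my $obj = Alias::Builder->new(indent => 2);
print ref($obj), "\n";            # Real::Builder
print $obj->{indent}, "\n";       # 2
print $obj->{called_from}, "\n";  # main
```

By the same mechanism, C<< HTML::Tree->new >> hands back an HTML::TreeBuilder object, which is why the rest of this documentation talks about TreeBuilder rather than HTML::Tree.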
HTML-Tree is a suite of Perl modules for making parse trees out of
HTML source. It consists mainly of two modules, whose documentation
you should refer to: L<HTML::TreeBuilder|HTML::TreeBuilder>
and L<HTML::Element|HTML::Element>.
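As a minimal sketch of how the two modules divide the work, assuming HTML-Tree is installed: HTML::TreeBuilder's C<new_from_content> constructor parses a string into a tree, and HTML::Element methods (C<look_down>, C<as_text>, C<attr>, C<delete>) query and then free it. The sample HTML and all variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample document for illustration.
my $html = <<'END_HTML';
<html><body>
<h1>Links</h1>
<p><a href="http://www.perl.org/">Perl</a> and
<a href="http://www.cpan.org/">CPAN</a></p>
</body></html>
END_HTML

# Parse the whole string; the result is the root (<html>) element.
my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down() searches the subtree: in scalar context it returns the
# first matching element, in list context all of them.
my $heading = $tree->look_down(_tag => 'h1')->as_text;
my @links   = $tree->look_down(_tag => 'a');
my @pairs   = map { [ $_->as_text, $_->attr('href') ] } @links;

print "Heading: $heading\n";
printf "%s -> %s\n", @$_ for @pairs;

# Free the tree when finished; elements hold links to their parents,
# and delete() is the documented way to release the whole structure.
$tree->delete;
```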
HTML::TreeBuilder is the module that builds the parse trees. (It uses
HTML::Parser to do the work of breaking the HTML up into tokens.)
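To make "tokens" concrete, the sketch below drives HTML::Parser directly (assuming it is installed) through its version 3 handler API and records each start tag, end tag, and non-blank run of text. It only illustrates the kind of event stream a tree builder receives; it is not a description of TreeBuilder's internals, and the C<on_text> helper is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;
use HTML::Parser ();

my @tokens;

# Version 3 API: each handler is [ callback, argspec ], where the
# argspec names the values passed to the callback.
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @tokens, "start:$_[0]" }, 'tagname' ],
    end_h   => [ sub { push @tokens, "end:$_[0]" },   'tagname' ],
    text_h  => [ \&on_text, 'dtext' ],
);

# Report each run of text in one piece rather than in chunks.
$parser->unbroken_text(1);

sub on_text {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
    push @tokens, "text:$text" if length $text;
}

$parser->parse('<p>Hello <b>world</b></p>');
$parser->eof;    # signal end of input so any pending text is flushed

print "$_\n" for @tokens;
```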
The tree that TreeBuilder builds for you is made up of objects of the
class HTML::Element.
If you find that you do not properly understand the documentation
for HTML::TreeBuilder and HTML::Element, it may be because you are
unfamiliar with tree-shaped data structures, or with object-oriented
modules in general. Sean Burke has written some articles for
I<The Perl Journal> (C<www.tpj.com>) that seek to provide that background.
The full text of those articles is contained in this distribution, as:
=item L<HTML::Tree::AboutObjects|HTML::Tree::AboutObjects>
"User's View of Object-Oriented Modules" from TPJ17.
=item L<HTML::Tree::AboutTrees|HTML::Tree::AboutTrees>
=item L<HTML::Tree::Scanning|HTML::Tree::Scanning>
"Scanning HTML" from TPJ19.
Readers already familiar with object-oriented modules and tree-shaped
data structures should read just the last article. Readers without
that background should read the first, then the second, and then the
third.
You can find documentation for this module with the perldoc command.
You can also look for information at:
=item * AnnoCPAN: Annotated CPAN documentation
L<http://annocpan.org/dist/HTML-Tree>
=item * CPAN Ratings
L<http://cpanratings.perl.org/d/HTML-Tree>
=item * RT: CPAN's request tracker
L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-Tree>
=item * Search CPAN
L<http://search.cpan.org/dist/HTML-Tree>
L<HTML::TreeBuilder>, L<HTML::Element>, L<HTML::Tagset>,
L<HTML::Parser>, L<HTML::DOMbo>
The book I<Perl & LWP> by Sean M. Burke, published by
O'Reilly and Associates, 2002. ISBN: 0-596-00178-9
It has several chapters on HTML processing in general,
and on HTML-Tree specifically. There's more info at:
http://www.oreilly.com/catalog/perllwp/
http://www.amazon.com/exec/obidos/ASIN/0596001789
=head1 SOURCE REPOSITORY
HTML::Tree is maintained in Subversion hosted at perl.org.
http://svn.perl.org/modules/HTML-Tree
The latest development work is always at:
http://svn.perl.org/modules/HTML-Tree/trunk
Any patches sent should be diffed against this repository.
=head1 ACKNOWLEDGEMENTS
Thanks to Gisle Aas, Sean Burke and Andy Lester for their original work.
Thanks to Chicago Perl Mongers (http://chicago.pm.org) for their
patches submitted to HTML::Tree as part of the Phalanx project
(http://qa.perl.org/phalanx).
Thanks to the following people for additional patches and documentation:
Terrence Brannon, Gordon Lack, Chris Madsen and Ricardo Signes.
Original HTML-Tree author: Gisle Aas. Handed off to Sean M. Burke
and Andy Lester. Currently maintained by Pete Krawczyk
C<< <petek@cpan.org> >>.
Copyright 1995-1998 Gisle Aas; 1999-2004 Sean M. Burke;
2005 Andy Lester; 2006 Pete Krawczyk. (Except the articles
contained in HTML::Tree::AboutObjects, HTML::Tree::AboutTrees, and
HTML::Tree::Scanning, which are all copyright 2000 The Perl Journal.)
Except for those three TPJ articles, the whole HTML-Tree distribution,
of which this file is a part, is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
Those three TPJ articles may be distributed under the same terms as
The programs in this library are distributed in the hope that they
will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.