X-Git-Url: http://vcs.maemo.org/git/?a=blobdiff_plain;f=dev%2Farm%2Flibhtml-tree-perl%2Flibhtml-tree-perl-3.23%2Flib%2FHTML%2FElement%2Ftraverse.pm;fp=dev%2Farm%2Flibhtml-tree-perl%2Flibhtml-tree-perl-3.23%2Flib%2FHTML%2FElement%2Ftraverse.pm;h=744b4b0492d2fd20d9d9b9010723aa21b849c781;hb=f477fa73365d491991707e7ed9217b48d6994551;hp=0000000000000000000000000000000000000000;hpb=da95c414033799c3a62606f299c3c00b5c77ca11;p=dh-make-perl
diff --git a/dev/arm/libhtml-tree-perl/libhtml-tree-perl-3.23/lib/HTML/Element/traverse.pm b/dev/arm/libhtml-tree-perl/libhtml-tree-perl-3.23/lib/HTML/Element/traverse.pm
new file mode 100644
index 0000000..744b4b0
--- /dev/null
+++ b/dev/arm/libhtml-tree-perl/libhtml-tree-perl-3.23/lib/HTML/Element/traverse.pm
@@ -0,0 +1,329 @@
+
+# This is a .pm just to (try to) make some CPAN document converters
+# convert it happily as part of the dist's documentation tree.
+package HTML::Element::traverse;
+  # Time-stamp: "2002-11-22 23:53:39 MST"
+use HTML::Element ();
+$VERSION = $VERSION = $HTML::Element::VERSION;
+1;
+
+__END__
+
+=head1 NAME
+
+HTML::Element::traverse - discussion of HTML::Element's traverse method
+
+=head1 SYNOPSIS
+
+  # $element->traverse is unnecessary and obscure.
+  # Don't use it in new code.
+
+=head1 DESCRIPTION
+
+C<HTML::Element> provides a method C<traverse> that traverses the tree
+and calls user-specified callbacks for each node, in pre- or
+post-order. However, use of the method is quite superfluous: if you
+want to recursively visit every node in the tree, it's almost always
+simpler to write a subroutine that does just that, than it is to bundle up
+the pre- and/or post-order code in callbacks for the C<traverse>
+method.
+
+=head1 EXAMPLES
+
+Suppose you want to traverse at/under a node $start_node and give elements
+an 'id' attribute unless they already have one.
+
+You can use the C<traverse> method:
+
+  {
+    my $counter = 'x0000';
+    $start_node->traverse(
+      [ # Callbacks;
+        # pre-order callback:
+        sub {
+          my $x = $_[0];
+          $x->attr('id', $counter++) unless defined $x->attr('id');
+          return HTML::Element::OK; # keep traversing
+        },
+        # post-order callback:
+        undef
+      ],
+      1, # don't call the callbacks for text nodes
+    );
+  }
+
+or you can just be simple and clear (and not have to understand the
+calling format for C<traverse>) by writing a sub that traverses the
+tree by just calling itself:
+
+  {
+    my $counter = 'x0000';
+    sub give_id {
+      my $x = $_[0];
+      $x->attr('id', $counter++) unless defined $x->attr('id');
+      foreach my $c ($x->content_list) {
+        give_id($c) if ref $c; # ignore text nodes
+      }
+    };
+    give_id($start_node);
+  }
+
+See, isn't that nice and clear?
+
+But, if you really need to know:
+
+=head1 THE TRAVERSE METHOD
+
+The C<traverse()> method is a general object-method for traversing a
+tree or subtree and calling user-specified callbacks. It accepts the
+following syntaxes:
+
+=over
+
+=item $h->traverse(\&callback)
+
+=item or $h->traverse(\&callback, $ignore_text)
+
+=item or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
+
+=back
+
+These all mean to traverse the element and all of its children. That
+is, this method starts at node $h, "pre-order visits" $h, traverses its
+children, and then will "post-order visit" $h. "Visiting" means that
+the callback routine is called, with these arguments:
+
+    $_[0] : the node (element or text segment),
+    $_[1] : a startflag, and
+    $_[2] : the depth
+
+If the $ignore_text parameter is given and true, then the pre-order
+call I<will not> happen for text content.
+
+The startflag is 1 when we enter a node (i.e., in pre-order calls) and
+0 when we leave the node (in post-order calls).
+
+Note, however, that post-order calls don't happen for nodes that are
+text segments or are elements that are prototypically empty (like "br",
+"hr", etc.).
+
+If we visit text nodes (i.e., unless $ignore_text is given and true),
+then when text nodes are visited, we will also pass two extra
+arguments to the callback:
+
+    $_[3] : the element that's the parent
+             of this text node
+    $_[4] : the index of this text node
+             in its parent's content list
+
+Note that you can specify that the pre-order routine can
+be a different routine from the post-order one:
+
+    $h->traverse( [\&pre_callback,\&post_callback], ...);
+
+You can also specify that no post-order calls are to be made,
+by providing a false value as the post-order routine:
+
+    $h->traverse([ \&pre_callback,0 ], ...);
+
+And similarly for suppressing pre-order callbacks:
+
+    $h->traverse([ 0,\&post_callback ], ...);
+
+Note that these two syntaxes specify the same operation:
+
+    $h->traverse([\&foo,\&foo], ...);
+    $h->traverse( \&foo       , ...);
+
+The return values from calls to your pre- or post-order
+routines are significant, and are used to control recursion
+into the tree.
+
+These are the values you can return, listed in descending order
+of my estimation of their usefulness:
+
+=over
+
+=item HTML::Element::OK, 1, or any other true value
+
+...to keep on traversing.
+
+Note that C<HTML::Element::OK> et
+al are constants. So if you're running under C<use strict>
+(as I hope you are), and you say:
+C<return HTML::Element::PRUEN>
+the compiler will flag this as an error (an unallowable
+bareword, specifically), whereas if you spell PRUNE correctly,
+the compiler will not complain.
+
+=item undef, 0, '0', '', or HTML::Element::PRUNE
+
+...to block traversing under the current element's content.
+(This is ignored if received from a post-order callback,
+since by then the recursion has already happened.)
+If this is returned by a pre-order callback, no
+post-order callback for the current node will happen.
+(Recall that if your callback exits with just C<return;>,
+it is returning undef -- at least in scalar context, and
+C<traverse> always calls your callbacks in scalar context.)
+
+=item HTML::Element::ABORT
+
+...to abort the whole traversal immediately.
+This is often useful when you're looking for just the first
+node in the tree that meets some criterion of yours.
+
+=item HTML::Element::PRUNE_UP
+
+...to abort continued traversal into this node and its parent
+node. No post-order callback for the current or parent
+node will happen.
+
+=item HTML::Element::PRUNE_SOFTLY
+
+Like PRUNE, except that the post-order call for the current
+node is not blocked.
+
+=back
+
+Almost every task to do with extracting information from a tree can be
+expressed in terms of traverse operations (usually in only one pass,
+and usually paying attention to only pre-order, or to only
+post-order), or operations based on traversing. (In fact, many of the
+other methods in this class are basically calls to traverse() with
+particular arguments.)
+
+The source code for HTML::Element and HTML::TreeBuilder contains
+several examples of the use of the "traverse" method to gather
+information about the content of trees and subtrees.
+
+(Note: you should not change the structure of a tree I<while> you are
+traversing it.)
+
+[End of documentation for the C<traverse()> method]
+
+=head2 Traversing with Recursive Anonymous Routines
+
+Now, if you've been reading
+I<Structure and Interpretation of Computer Programs> too much, maybe
+you even want a recursive lambda. Go ahead:
+
+  {
+    my $counter = 'x0000';
+    my $give_id;
+    $give_id = sub {
+      my $x = $_[0];
+      $x->attr('id', $counter++) unless defined $x->attr('id');
+      foreach my $c ($x->content_list) {
+        $give_id->($c) if ref $c; # ignore text nodes
+      }
+    };
+    $give_id->($start_node);
+    undef $give_id;
+  }
+
+It's a bit nutty, and it's I<still> more concise than a call to the
+C<traverse> method!
+
+It is left as an exercise to the reader to figure out how to do the
+same thing without using a C<$give_id> symbol at all.
+
+It is also left as an exercise to the reader to figure out why I
+undefine C<$give_id>, above; and why I could have achieved the same effect
+with any of:
+
+    $give_id = 'I like pie!';
+   # or...
+    $give_id = [];
+   # or even;
+    $give_id = sub { print "Mmmm pie!\n" };
+
+But not:
+
+    $give_id = sub { print "I'm $give_id and I like pie!\n" };
+   # nor...
+    $give_id = \$give_id;
+   # nor...
+    $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
+
+=head2 Doing Recursive Things Iteratively
+
+Note that you may at times see an iterative implementation of
+pre-order traversal, like so:
+
+   {
+     my @to_do = ($tree); # start-node
+     while(@to_do) {
+       my $this = shift @to_do;
+
+       # "Visit" the node:
+       $this->attr('id', $counter++)
+        unless defined $this->attr('id');
+
+       unshift @to_do, grep ref $_, $this->content_list;
+        # Put children on the stack -- they'll be visited next
+     }
+   }
+
+This can I<under certain circumstances> be more efficient than just a
+normal recursive routine, but at the cost of being rather obscure. It
+gains efficiency by avoiding the overhead of function-calling, but
+since there are several method dispatches however you do it (to
+C<attr> and C<content_list>), the overhead for a simple function call
+is insignificant.
+
+=head2 Pruning and Whatnot
+
+The C<traverse> method does have the fairly neat features of
+the C<ABORT>, C<PRUNE_UP> and C<PRUNE_SOFTLY> signals. None of these
+can be implemented I<totally> straightforwardly with recursive
+routines, but it is quite possible. C<ABORT>-like behavior can be
+implemented either by non-local returning with C<eval>/C<die>:
+
+    my $died_on; # if you need to know where...
+    sub thing {
+      ... visits $_[0]...
+      ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
+      ... else call thing($child) for each child...
+      ...any post-order visiting $_[0]...
+    }
+    eval { thing($node) };
+    if($@) {
+      if($@ =~ m<^ABORT_TRAV>) {
+        ...it died (aborted) on $died_on...
+      } else {
+        die $@; # some REAL error happened
+      }
+    }
+
+or you can just do it with flags:
+
+    my($abort_flag, $died_on);
+    sub thing {
+      ... visits $_[0]...
+      ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
+      foreach my $c ($_[0]->content_list) {
+        thing($c);
+        return if $abort_flag;
+      }
+      ...any post-order visiting $_[0]...
+      return;
+    }
+
+    $abort_flag = $died_on = undef;
+    thing($node);
+    ...if defined $abort_flag, it died on $died_on
+
+=head1 SEE ALSO
+
+L<HTML::Element>
+
+=head1 COPYRIGHT
+
+Copyright 2000,2001 Sean M. Burke
+
+=head1 AUTHOR
+
+Sean M. Burke, E<lt>sburke@cpan.orgE<gt>
+
+=cut