4 Pod::Simple::Subclassing -- write a formatter as a Pod::Simple subclass
package Pod::SomeFormatter;
@ISA = qw(Pod::Simple);
sub _handle_element_start {
  my($parser, $element_name, $attr_hash_r) = @_;
  ...
}
sub _handle_element_end {
  my($parser, $element_name) = @_;
  ...
}
sub _handle_text {
  my($parser, $text) = @_;
  ...
}
32 This document is about using Pod::Simple to write a Pod processor,
33 generally a Pod formatter. If you just want to know about using an
34 existing Pod formatter, instead see its documentation and see also the
35 docs in L<Pod::Simple>.
37 The zeroth step in writing a Pod formatter is to make sure that there
38 isn't already a decent one in CPAN. See L<http://search.cpan.org/>, and
39 run a search on the name of the format you want to render to. Also
40 consider joining the Pod People list
41 L<http://lists.perl.org/showlist.cgi?name=pod-people> and asking whether
42 anyone has a formatter for that format -- maybe someone cobbled one
43 together but just hasn't released it.
45 The first step in writing a Pod processor is to read L<perlpodspec>,
46 which contains notes and information on writing a Pod parser (which has been
47 largely taken care of by Pod::Simple), but also a lot of requirements
48 and recommendations for writing a formatter.
50 The second step is to actually learn the format you're planning to
51 format to -- or at least as much as you need to know to represent Pod,
52 which probably isn't much.
54 The third step is to pick which of Pod::Simple's interfaces you want to
55 use -- the basic interface via Pod::Simple or L<Pod::Simple::Methody> is
56 event-based (sort of like L<HTML::Parser>'s interface, or sort of like
57 L<XML::Parser>'s "Handlers" interface), but L<Pod::Simple::PullParser>
58 provides a token-stream interface, sort of like L<HTML::TokeParser>'s
59 interface; L<Pod::Simple::SimpleTree> provides a simple tree interface,
60 rather like XML::Parser's "Tree" interface. Users familiar with
61 XML-handling will find one of these styles relatively familiar; but if
62 you would be even more at home with XML, there are classes that produce
63 an XML representation of the Pod stream, notably
64 L<Pod::Simple::XMLOutStream>; you can feed the output of such a class to
65 whatever XML parsing system you are most at home with.
67 The last step is to write your code based on how the events (or tokens,
68 or tree-nodes, or the XML, or however you're parsing) will map to
69 constructs in the output format. Also be sure to consider how to escape
70 text nodes containing arbitrary text, and also what to do with text
71 nodes that represent preformatted text (from verbatim sections).
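As an illustration of that last step, here is a minimal sketch of an event-based formatter that maps a few elements to HTML-like tags and escapes text nodes. The class name C<My::TinyHTML>, the C<%tag_for> table, and the C<my_html> hash key are assumptions made up for this example; they are not part of Pod::Simple.

```perl
use strict;
use warnings;

package My::TinyHTML;    # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

# The few element names this sketch understands, and the tag used for each.
my %tag_for = (Para => 'p', B => 'b', I => 'i', C => 'code');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    my $tag = $tag_for{$element_name} or return;   # ignore everything else
    $parser->{my_html} .= "<$tag>";
}

sub _handle_element_end {
    my ($parser, $element_name) = @_;
    my $tag = $tag_for{$element_name} or return;
    $parser->{my_html} .= "</$tag>";
    $parser->{my_html} .= "\n" if $element_name eq 'Para';
}

sub _handle_text {
    my ($parser, $text) = @_;
    # Escape the characters that are special in the output format.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $parser->{my_html} .= $text;
}

package main;

my $parser = My::TinyHTML->new;
$parser->parse_string_document("=pod\n\nHello B<AT&T>!\n");
my $html = $parser->{my_html};
print $html;
```

A real formatter would also have to decide what to do with Verbatim content, headings, lists, and links; this sketch silently drops the markup for anything it does not know.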
77 TODO intro... mention that events are supplied for implicits, like for
81 In the following section, we use XML to represent the event structure
82 associated with a particular construct. That is, TODO
86 =item C<< $parser->_handle_element_start( I<element_name>, I<attr_hashref> ) >>
88 =item C<< $parser->_handle_element_end( I<element_name> ) >>
90 =item C<< $parser->_handle_text( I<text_string> ) >>
99 =item events with an element_name of Document
101 Parsing a document produces this event structure:
103 <Document start_line="543">
107 The value of the I<start_line> attribute will be the line number of the first
108 Pod directive in the document.
110 If there is no Pod in the given document, then the
111 event structure will be this:
113 <Document contentless="1" start_line="543">
116 In that case, the value of the I<start_line> attribute will not be meaningful;
117 under current implementations, it will probably be the line number of the
118 last line in the file.
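A small sketch of how a subclass might inspect the Document attributes described above. The class name C<My::DocInfo>, the helper C<document_attrs>, and the C<my_doc_attr> key are hypothetical names used only for this example.

```perl
use strict;
use warnings;

package My::DocInfo;     # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    return unless $element_name eq 'Document';
    $parser->{my_doc_attr} = { %$attr_hash_r };   # keep a copy of the attributes
}
sub _handle_element_end { }
sub _handle_text        { }

package main;

# Parse a string and return the attributes of its Document element.
sub document_attrs {
    my ($source) = @_;
    my $parser = My::DocInfo->new;
    $parser->parse_string_document($source);
    return $parser->{my_doc_attr};
}

my $with_pod    = document_attrs("print 1;\n\n=head1 NAME\n\nSomething\n\n=cut\n");
my $without_pod = document_attrs("print 1;\nprint 2;\n");

print "first Pod directive at line $with_pod->{start_line}\n";
print "second document has no Pod\n" if $without_pod->{contentless};
```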
120 =item events with an element_name of Para
122 Parsing a plain (non-verbatim, non-directive, non-data) paragraph in
123 a Pod document produces this event structure:
125 <Para start_line="543">
126 ...all events in this paragraph...
129 The value of the I<start_line> attribute will be the line number of the start
132 For example, parsing this paragraph of Pod:
134 The value of the I<start_line> attribute will be the
135 line number of the start of the paragraph.
137 produces this event structure:
139 <Para start_line="129">
144 attribute will be the line number of the first Pod directive
148 =item events with an element_name of B, C, F, or I.
150 Parsing a BE<lt>...E<gt> formatting code (or of course any of its
151 semantically identical syntactic variants
152 S<BE<lt>E<lt> ... E<gt>E<gt>>,
153 or S<BE<lt>E<lt>E<lt>E<lt> ... E<gt>E<gt>E<gt>E<gt>>, etc.)
154 produces this event structure:
160 Currently, there are no attributes conveyed.
162 Parsing C, F, or I codes produces the same structure, with only a
163 different element name.
165 If your parser object has been set to accept other formatting codes,
166 then they will be presented like these B/C/F/I codes -- i.e., without
169 =item events with an element_name of S
171 Normally, parsing an SE<lt>...E<gt> sequence produces this event
172 structure, just as if it were a B/C/F/I code:
178 However, Pod::Simple (and presumably all derived parsers) offers the
179 C<nbsp_for_S> option which, if enabled, will suppress all S events, and
180 instead change all spaces in the content to non-breaking spaces. This is
181 intended for formatters that output to a format that has no code that
182 means the same as SE<lt>...E<gt>, but which has a code/character that
183 means non-breaking space.
185 =item events with an element_name of X
187 Normally, parsing an XE<lt>...E<gt> sequence produces this event
188 structure, just as if it were a B/C/F/I code:
194 However, Pod::Simple (and presumably all derived parsers) offers the
195 C<nix_X_codes> option which, if enabled, will suppress all X events
196 and ignore their content. For formatters/processors that don't use
197 X events, this is presumably quite useful.
200 =item events with an element_name of L
202 Because the LE<lt>...E<gt> is the most complex construct in the
203 language, it should not surprise you that the events it generates are
204 the most complex in the language. Most of the complexity is hidden away in
205 the attribute values, so for those of you writing a Pod formatter that
206 produces a non-hypertextual format, you can just ignore the attributes
207 and treat an L event structure like a formatting element that
208 (presumably) doesn't actually produce a change in formatting. That is,
209 the content of the L event structure (as opposed to its
210 attributes) is always what text should be displayed.
212 There are, at first glance, three kinds of L links: URL, man, and pod.
214 When a LE<lt>I<some_url>E<gt> code is parsed, it produces this event
217 <L content-implicit="yes" to="that_url" type="url">
221 The C<type="url"> attribute is always specified for this type of
224 For example, this Pod source:
226 L<http://www.perl.com/CPAN/authors/>
228 produces this event structure:
230 <L content-implicit="yes" to="http://www.perl.com/CPAN/authors/" type="url">
231 http://www.perl.com/CPAN/authors/
234 When a LE<lt>I<manpage(section)>E<gt> code is parsed (and these are
235 fairly rare and not terribly useful), it produces this event structure:
237 <L content-implicit="yes" to="manpage(section)" type="man">
241 The C<type="man"> attribute is always specified for this type of
244 For example, this Pod source:
248 produces this event structure:
250 <L content-implicit="yes" to="crontab(5)" type="man">
254 In the rare cases where a man page link has a section specified, that text appears
255 in a I<section> attribute. For example, this Pod source:
257 L<crontab(5)/"ENVIRONMENT">
259 will produce this event structure:
261 <L content-implicit="yes" section="ENVIRONMENT" to="crontab(5)" type="man">
262 "ENVIRONMENT" in crontab(5)
265 In the rare case where the Pod document has code like
266 LE<lt>I<sometext>|I<manpage(section)>E<gt>, then the I<sometext> will appear
267 as the content of the element, the I<manpage(section)> text will appear
268 only as the value of the I<to> attribute, and there will be no
269 C<content-implicit="yes"> attribute (whose presence means that the Pod parser
270 had to infer what text should appear as the link text -- as opposed to
271 cases where that attribute is absent, which means that the Pod parser did
272 I<not> have to infer the link text, because that L code explicitly specified
275 For example, this Pod source:
277 L<hell itself!|crontab(5)>
279 will produce this event structure:
281 <L to="crontab(5)" type="man">
285 The last type of L structure is for links to/within Pod documents. It is
286 the most complex because it can have a I<to> attribute, I<or> a
287 I<section> attribute, or both. The C<type="pod"> attribute is always
288 specified for this type of L code.
290 In the most common case, the simple case of a LE<lt>podpageE<gt> code
291 produces this event structure:
293 <L content-implicit="yes" to="Net::Ping" type="pod">
297 For example, this Pod source:
301 produces this event structure:
303 <L content-implicit="yes" to="Net::Ping" type="pod">
307 In cases where there is link-text explicitly specified, it
308 is to be found in the content of the element (and not the
309 attributes), just as with the LE<lt>I<sometext>|I<manpage(section)>E<gt>
310 case discussed above. For example, this Pod source:
312 L<Perl Error Messages|perldiag>
314 produces this event structure:
316 <L to="perldiag" type="pod">
320 In cases of links to a section in the current Pod document,
321 there is a I<section> attribute instead of a I<to> attribute.
322 For example, this Pod source:
326 produces this event structure:
328 <L content-implicit="yes" section="Member Data" type="pod">
332 As another example, this Pod source:
334 L<the various attributes|/"Member Data">
336 produces this event structure:
338 <L section="Member Data" type="pod">
339 the various attributes
342 In cases of links to a section in a different Pod document,
343 there are both a I<section> attribute and a I<to> attribute.
344 For example, this Pod source:
346 L<perlsyn/"Basic BLOCKs and Switch Statements">
348 produces this event structure:
350 <L content-implicit="yes" section="Basic BLOCKs and Switch Statements" to="perlsyn" type="pod">
351 "Basic BLOCKs and Switch Statements" in perlsyn
354 As another example, this Pod source:
356 L<SWITCH statements|perlsyn/"Basic BLOCKs and Switch Statements">
358 produces this event structure:
360 <L section="Basic BLOCKs and Switch Statements" to="perlsyn" type="pod">
364 Incidentally, note that we do not distinguish between these syntaxes:
369 L<Member Data> [deprecated syntax]
371 That is, they all produce the same event structure, namely:
373 <L content-implicit="yes" section="Member Data" type="pod">
374 "Member Data"
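The attribute combinations described for L events can be checked with a short sketch like the following. The class name C<My::LinkLister> and the C<my_links> key are hypothetical. The I<to> and I<section> values are stringified explicitly here, on the assumption that they may be objects rather than plain strings.

```perl
use strict;
use warnings;

package My::LinkLister;  # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    return unless $element_name eq 'L';
    my ($to, $section) = @$attr_hash_r{qw(to section)};
    push @{ $parser->{my_links} }, {
        type     => $attr_hash_r->{type},
        to       => defined $to      ? "$to"      : undef,
        section  => defined $section ? "$section" : undef,
        implicit => $attr_hash_r->{'content-implicit'} ? 1 : 0,
    };
}
sub _handle_element_end { }
sub _handle_text        { }

package main;

my $source = join "\n",
    '=pod',
    '',
    'See L<http://www.perl.com/CPAN/authors/>, L<crontab(5)>,',
    'L<Perl Error Messages|perldiag>, and L</"Member Data">.',
    '';

my $parser = My::LinkLister->new;
$parser->parse_string_document($source);
my @links = @{ $parser->{my_links} || [] };

for my $link (@links) {
    printf "type=%s to=%s section=%s implicit=%d\n",
        $link->{type},
        defined $link->{to}      ? $link->{to}      : '(none)',
        defined $link->{section} ? $link->{section} : '(none)',
        $link->{implicit};
}
```

A formatter for a non-hypertextual format can skip all of this and simply pass the content of the L element through.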
377 =item events with an element_name of E or Z
379 While there are Pod codes EE<lt>...E<gt> and ZE<lt>E<gt>, these
380 I<do not> produce any E or Z events -- that is, there are no such
383 =item events with an element_name of Verbatim
385 When a Pod verbatim paragraph (AKA "codeblock") is parsed, it
386 produces this event structure:
388 <Verbatim start_line="543" xml:space="preserve">
392 The value of the I<start_line> attribute will be the line number of the
393 first line of this verbatim block. The I<xml:space> attribute is always
394 present, and always has the value "preserve".
396 The text content will have tabs already expanded.
399 =item events with an element_name of head1 .. head4
401 When a "=head1 ..." directive is parsed, it produces this event
408 For example, a directive consisting of this:
410 =head1 Options to C<new> et al.
412 will produce this event structure:
414 <head1 start_line="543">
422 "=head2" thru "=head4" directives are the same, except for the element
423 names in the event structure.
425 =item events with an element_name of over-bullet
427 When an "=over ... Z<>=back" block is parsed where the items are
428 a bulleted list, it will produce this event structure:
430 <over-bullet indent="4" start_line="543">
431 <item-bullet start_line="545">
434 ...more item-bullets...
437 The value of the I<indent> attribute is whatever value is after the
438 "=over" directive, as in "=over 8". If no such value is specified
439 in the directive, then the I<indent> attribute has the value "4".
441 For example, this Pod source:
455 produces this event structure:
457 <over-bullet indent="4" start_line="10">
458 <item-bullet start_line="12">
461 <item-bullet start_line="14">
466 =item events with an element_name of over-number
468 When an "=over ... Z<>=back" block is parsed where the items are
469 a numbered list, it will produce this event structure:
471 <over-number indent="4" start_line="543">
472 <item-number number="1" start_line="545">
475 ...more item-number...
478 This is like the "over-bullet" event structure; but note that the contents
479 are "item-number" instead of "item-bullet", and note that they will have
480 a "number" attribute, which some formatters/processors may ignore
481 (since, for example, there's no need for it in HTML when producing
482 an "<OL><LI>...</LI>...</OL>" structure), but which any processor may use.
484 Note that the values for the I<number> attributes of "item-number"
485 elements in a given "over-number" area I<will> start at 1 and go up by
486 one each time. If the Pod source doesn't follow that order (even though
487 it really should!), whatever numbers it has will be ignored (with
488 the correct values being put in the I<number> attributes), and an error
489 message might be issued to the user.
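The following sketch collects the I<indent> and I<number> attributes from a small numbered list. The class name C<My::ListInfo> and the hash keys C<my_indent> and C<my_numbers> are assumptions made for this example only.

```perl
use strict;
use warnings;

package My::ListInfo;    # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name, $attr_hash_r) = @_;
    if ($element_name eq 'over-number') {
        $parser->{my_indent} = $attr_hash_r->{indent};
    }
    elsif ($element_name eq 'item-number') {
        push @{ $parser->{my_numbers} }, $attr_hash_r->{number};
    }
}
sub _handle_element_end { }
sub _handle_text        { }

package main;

my $source = join "\n",
    '=over', '',
    '=item 1.', '', 'First thing', '',
    '=item 2.', '', 'Second thing', '',
    '=back', '';

my $parser = My::ListInfo->new;
$parser->parse_string_document($source);

my $indent  = $parser->{my_indent};
my @numbers = @{ $parser->{my_numbers} || [] };
print "indent=$indent numbers=@numbers\n";
```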
491 =item events with an element_name of over-text
493 These events are somewhat unlike the other over-*
494 structures, as far as what their contents are. When
495 an "=over ... Z<>=back" block is parsed where the items are
496 a list of text "subheadings", it will produce this event structure:
498 <over-text indent="4" start_line="543">
502 ...stuff (generally Para or Verbatim elements)...
504 ...more item-text and/or stuff...
507 The I<indent> attribute is as with the other over-* events.
509 For example, this Pod source:
523 produces this event structure:
525 <over-text indent="4" start_line="20">
526 <item-text start_line="22">
529 <Para start_line="24">
532 <item-text start_line="26">
539 <Para start_line="28">
546 =item events with an element_name of over-block
548 These events are somewhat unlike the other over-*
549 structures, as far as what their contents are. When
550 an "=over ... Z<>=back" block is parsed where there are no items,
551 it will produce this event structure:
553 <over-block indent="4" start_line="543">
554 ...stuff (generally Para or Verbatim elements)...
557 The I<indent> attribute is as with the other over-* events.
559 For example, this Pod source:
563 For cutting off our trade with all parts of the world
565 For transporting us beyond seas to be tried for pretended offenses
567 He is at this time transporting large armies of foreign mercenaries to
568 complete the works of death, desolation and tyranny, already begun with
569 circumstances of cruelty and perfidy scarcely paralleled in the most
570 barbarous ages, and totally unworthy the head of a civilized nation.
574 will produce this event structure:
576 <over-block indent="4" start_line="2">
577 <Para start_line="4">
578 For cutting off our trade with all parts of the world
580 <Para start_line="6">
581 For transporting us beyond seas to be tried for pretended offenses
583 <Para start_line="8">
584 He is at this time transporting large armies of [...more text...]
588 =item events with an element_name of item-bullet
590 See L</"events with an element_name of over-bullet">, above.
592 =item events with an element_name of item-number
594 See L</"events with an element_name of over-number">, above.
596 =item events with an element_name of item-text
598 See L</"events with an element_name of over-text">, above.
600 =item events with an element_name of for
604 =item events with an element_name of Data
612 =head1 More Pod::Simple Methods
614 Pod::Simple provides a lot of methods that aren't generally interesting
615 to the end user of an existing Pod formatter, but some of which you
616 might find useful in writing a Pod formatter. They are listed below. The
617 first several methods (the accept_* methods) are for declaring the
618 capabilities of your parser, notably what C<=for I<targetname>> sections
619 it's interested in, and what extra NE<lt>...E<gt> codes it accepts beyond
620 the ones described in L<perlpod>.
624 =item C<< $parser->accept_targets( I<SOMEVALUE> ) >>
626 As the parser sees sections like:
628 =for html <img src="fig1.jpg">
638 ...the parser will ignore these sections unless your subclass has
639 specified that it wants to see sections targeted to "html" (or whatever
640 the formatter name is).
642 If you want to process all sections, even if they're not targeted for you,
643 call this before you start parsing:
645 $parser->accept_targets('*');
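To see the effect of accept_targets, the sketch below parses the same source twice, once accepting the "html" target and once accepting nothing. The class name C<My::DataCollector>, the helper C<data_text>, and the hash keys are hypothetical; the example assumes that the content of an accepted C<=for> section arrives as text inside a Data element.

```perl
use strict;
use warnings;

package My::DataCollector;   # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name) = @_;
    $parser->{my_in_data} = 1 if $element_name eq 'Data';
}
sub _handle_element_end {
    my ($parser, $element_name) = @_;
    $parser->{my_in_data} = 0 if $element_name eq 'Data';
}
sub _handle_text {
    my ($parser, $text) = @_;
    # Keep only the text that belongs to Data elements.
    $parser->{my_data} .= $text if $parser->{my_in_data};
}

package main;

# Parse $source, accepting the given targets, and return any Data text seen.
sub data_text {
    my ($source, @targets) = @_;
    my $parser = My::DataCollector->new;
    $parser->accept_targets(@targets) if @targets;
    $parser->parse_string_document($source);
    return defined $parser->{my_data} ? $parser->{my_data} : '';
}

my $source = join "\n",
    '=pod', '',
    'Visible text.', '',
    '=for html <img src="fig1.jpg">', '',
    'More text.', '';

my $accepted = data_text($source, 'html');
my $ignored  = data_text($source);

print "with accept_targets('html'): [$accepted]\n";
print "without:                     [$ignored]\n";
```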
647 =item C<< $parser->accept_targets_as_text( I<SOMEVALUE> ) >>
649 This is like accept_targets, except that it specifies also that the
650 content of sections for this target should be treated as Pod text even
651 if the target name in "=for I<targetname>" doesn't start with a ":".
653 At time of writing, I don't think you'll need to use this.
656 =item C<< $parser->accept_codes( I<Codename>, I<Codename>... ) >>
658 This tells the parser that you accept additional formatting codes,
659 beyond just the standard ones (I B C L F S X, plus the two weird ones
660 you don't actually see in the parse tree, Z and E). For example, to also
661 accept codes "N", "R", and "W":
663 $parser->accept_codes( qw( N R W ) );
665 B<TODO: document how this interacts with =extend, and long element names>
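A brief sketch of accept_codes in use. It only records the names of the elements that are started; the class name C<My::CodeLister> and the C<my_elements> key are hypothetical.

```perl
use strict;
use warnings;

package My::CodeLister;  # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name) = @_;
    push @{ $parser->{my_elements} }, $element_name;
}
sub _handle_element_end { }
sub _handle_text        { }

package main;

my $parser = My::CodeLister->new;
$parser->accept_codes( qw( N R W ) );   # must be called before parsing
$parser->parse_string_document("=pod\n\nSome text N<a note> here.\n");

my @elements = @{ $parser->{my_elements} || [] };
print "elements started: @elements\n";
```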
668 =item C<< $parser->accept_directive_as_data( I<directive_name> ) >>
670 =item C<< $parser->accept_directive_as_verbatim( I<directive_name> ) >>
672 =item C<< $parser->accept_directive_as_processed( I<directive_name> ) >>
674 In the unlikely situation that you need to tell the parser that you will
675 accept additional directives ("=foo" things), you need to first set the
676 parser to treat its content as data (i.e., not really processed at
677 all), or as verbatim (mostly just expanding tabs), or as processed text
678 (parsing formatting codes like BE<lt>...E<gt>).
680 For example, to accept a new directive "=method", you'd presumably
683 $parser->accept_directive_as_processed("method");
685 so that you could have Pod lines like:
687 =method I<$whatever> thing B<um>
689 Making up your own directives breaks compatibility with other Pod
690 formatters, in a way that using "=for I<target> ..." lines doesn't;
691 however, you may find this useful if you're making a Pod superset
692 format where you don't need to worry about compatibility.
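As a sketch of the "=method" case, the following records which elements are started when that directive has been accepted as processed text. The class name C<My::MethodLister> and the C<my_starts> key are hypothetical; the example assumes the new directive is reported under the element name "method", in the same way "=head1" is reported as "head1".

```perl
use strict;
use warnings;

package My::MethodLister;    # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name) = @_;
    push @{ $parser->{my_starts} }, $element_name;
}
sub _handle_element_end { }
sub _handle_text        { }

package main;

my $source = join "\n",
    '=pod', '',
    '=method I<$whatever> thing B<um>', '';

my $parser = My::MethodLister->new;
$parser->accept_directive_as_processed('method');
$parser->parse_string_document($source);

my @starts = @{ $parser->{my_starts} || [] };
print "elements started: @starts\n";
```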
695 =item C<< $parser->nbsp_for_S( I<BOOLEAN> ); >>
697 Setting this attribute to a true value (and by default it is false) will
698 turn "SE<lt>...E<gt>" sequences into sequences of words separated by
699 C<\xA0> (non-breaking space) characters. For example, it will take this:
701 I like S<Dutch apple pie>, don't you?
703 and treat it as if it were:
705 I like DutchE<nbsp>appleE<nbsp>pie, don't you?
707 This is handy for output formats that don't have anything quite like an
708 "SE<lt>...E<gt>" code, but which do have a code for non-breaking space.
710 There is currently no method for going the other way; but I can
711 probably provide one upon request.
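The effect of nbsp_for_S can be observed with a sketch like this one, which concatenates all text events and notes whether any S element was started. The class name C<My::TextGrabber> and its hash keys are hypothetical.

```perl
use strict;
use warnings;

package My::TextGrabber; # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name) = @_;
    $parser->{my_saw_S} = 1 if $element_name eq 'S';
}
sub _handle_element_end { }
sub _handle_text {
    my ($parser, $text) = @_;
    $parser->{my_text} .= $text;
}

package main;

my $parser = My::TextGrabber->new;
$parser->nbsp_for_S(1);
$parser->parse_string_document("=pod\n\nI like S<Dutch apple pie>, don't you?\n");

my $text  = $parser->{my_text};
my $saw_S = $parser->{my_saw_S} ? 1 : 0;

# Show the non-breaking spaces as underscores so they are visible.
(my $visible = $text) =~ s/\xA0/_/g;
print "$visible\n";
print "S events seen: $saw_S\n";
```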
714 =item C<< $parser->version_report() >>
716 This returns a string reporting the $VERSION value from your module (and
717 its classname) as well as the $VERSION value of Pod::Simple. Note that
718 L<perlpodspec> requires output formats (wherever possible) to note
719 this detail in a comment in the output format. For example, for
720 some kind of SGML output format:
722 print OUT "<!-- \n", $parser->version_report, "\n -->";
725 =item C<< $parser->pod_para_count() >>
727 This returns the count of Pod paragraphs seen so far.
730 =item C<< $parser->line_count() >>
732 This is the current line number being parsed. But you might find the
733 "start_line" event attribute more accurate, when it is present.
736 =item C<< $parser->nix_X_codes( I<SOMEVALUE> ) >>
738 This attribute, when set to a true value (and it is false by default)
739 ignores any "XE<lt>...E<gt>" sequences in the document being parsed.
740 Many formats don't actually use the content of these codes, so have
741 no reason to process them.
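A short sketch of nix_X_codes, checking that neither an X element nor its content reaches the handlers. The class name C<My::NoIndex> and its hash keys are hypothetical.

```perl
use strict;
use warnings;

package My::NoIndex;     # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start {
    my ($parser, $element_name) = @_;
    $parser->{my_saw_X} = 1 if $element_name eq 'X';
}
sub _handle_element_end { }
sub _handle_text {
    my ($parser, $text) = @_;
    $parser->{my_text} .= $text;
}

package main;

my $parser = My::NoIndex->new;
$parser->nix_X_codes(1);
$parser->parse_string_document("=pod\n\nSorting X<sorting, stable> is stable.\n");

my $text  = $parser->{my_text};
my $saw_X = $parser->{my_saw_X} ? 1 : 0;
print "text: [$text]\n";
print "X events seen: $saw_X\n";
```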
744 =item C<< $parser->merge_text( I<SOMEVALUE> ) >>
746 This attribute, when set to a true value (and it is false by default)
747 makes sure that only one event (or token, or node) will be created
748 for any single contiguous sequence of text. For example, consider
749 this somewhat contrived example:
751 I just LOVE Z<>hotE<32>apple pie!
753 When that is parsed and events are about to be called on it, it may
754 actually seem to be four different text events, one right after another:
755 one event for "I just LOVE ", one for "hot", one for " ", and one for
756 "apple pie!". But if you have merge_text on, then you're guaranteed
757 that it will be fired as one text event: "I just LOVE hot apple pie!".
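The contrived example above can be tried with the following sketch, which records each text event separately. The class name C<My::TextEvents>, the helper C<text_events>, and the C<my_texts> key are hypothetical.

```perl
use strict;
use warnings;

package My::TextEvents;  # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start { }
sub _handle_element_end   { }
sub _handle_text {
    my ($parser, $text) = @_;
    push @{ $parser->{my_texts} }, $text;   # one entry per text event
}

package main;

# Parse $source and return the list of text events, with or without merging.
sub text_events {
    my ($source, $merge) = @_;
    my $parser = My::TextEvents->new;
    $parser->merge_text($merge);
    $parser->parse_string_document($source);
    return @{ $parser->{my_texts} || [] };
}

my $source = "=pod\n\nI just LOVE Z<>hotE<32>apple pie!\n";

my @merged   = text_events($source, 1);
my @unmerged = text_events($source, 0);

print scalar(@merged),   " text event(s) with merge_text on\n";
print scalar(@unmerged), " text event(s) with merge_text off\n";
```

Either way, concatenating the events gives the same string; merge_text only changes how many events it is split across.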
760 =item C<< $parser->code_handler( I<CODE_REF> ) >>
762 This specifies code that should be called when a code line is seen
763 (i.e., a line outside of the Pod). Normally this is undef, meaning
764 that no code should be called. If you provide a routine, it should
767 sub get_code_line { # or whatever you'll call it
768 my($line, $line_number, $parser) = @_;
772 Note, however, that sometimes the Pod events aren't processed in exactly
773 the same order as the code lines are -- i.e., if you have a file with
774 Pod, then code, then more Pod, sometimes the code will be processed (via
775 whatever you have code_handler call) before all of the preceding Pod
779 =item C<< $parser->cut_handler( I<CODE_REF> ) >>
781 This is just like the code_handler attribute, except that it's for
782 "=cut" lines, not code lines. The same caveats apply. "=cut" lines are
783 unlikely to be interesting, but this is included for completeness.
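A sketch of code_handler and cut_handler together, collecting the non-Pod lines and the "=cut" lines of a small mixed file. The class name C<My::Quiet> and the variables C<@code_lines> and C<@cut_lines> are hypothetical, and the example assumes cut_handler callbacks receive the same three arguments as code_handler callbacks.

```perl
use strict;
use warnings;

package My::Quiet;       # hypothetical class name; ignores all Pod events
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start { }
sub _handle_element_end   { }
sub _handle_text          { }

package main;

my (@code_lines, @cut_lines);

my $parser = My::Quiet->new;
$parser->code_handler(sub {
    my ($line, $line_number, $parser) = @_;
    push @code_lines, [ $line_number, $line ];
});
$parser->cut_handler(sub {
    my ($line, $line_number, $parser) = @_;
    push @cut_lines, [ $line_number, $line ];
});

$parser->parse_string_document(
    "print 1;\n\n=head1 NAME\n\nDemo\n\n=cut\n\nprint 2;\n"
);

for my $entry (@code_lines) {
    my ($number, $line) = @$entry;
    $line =~ s/[\r\n]+\z//;      # in case line endings are passed through
    print "code line $number: $line\n" if length $line;
}
print scalar(@cut_lines), " =cut line(s) seen\n";
```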
786 =item C<< $parser->whine( I<linenumber>, I<complaint string> ) >>
788 This notes a problem in the Pod, which will be reported in the "Pod
789 Errors" section of the document and/or sent to STDERR, depending on the
790 values of the attributes C<no_whining>, C<no_errata_section>, and
793 =item C<< $parser->scream( I<linenumber>, I<complaint string> ) >>
795 This notes an error like C<whine> does, except that it is not
796 suppressible with C<no_whining>. This should be used only for very
800 =item C<< $parser->source_dead(1) >>
802 This aborts parsing of the current document, by switching on the flag
803 that indicates that EOF has been seen. In particularly drastic cases,
804 you might want to do this. It's rather nicer than just calling
807 =item C<< $parser->hide_line_numbers( I<SOMEVALUE> ) >>
809 Some subclasses that indiscriminately dump event attributes (well,
810 except for ones beginning with "~") can use this object attribute for
811 refraining from dumping the "start_line" attribute.
813 =item C<< $parser->no_whining( I<SOMEVALUE> ) >>
815 This attribute, if set to true, will suppress reports of non-fatal
816 error messages. The default value is false, meaning that complaints
817 I<are> reported. How they get reported depends on the values of
818 the attributes C<no_errata_section> and C<complain_stderr>.
820 =item C<< $parser->no_errata_section( I<SOMEVALUE> ) >>
822 This attribute, if set to true, will suppress generation of an errata
823 section. The default value is false -- i.e., an errata section will be
826 =item C<< $parser->complain_stderr( I<SOMEVALUE> ) >>
828 This attribute, if set to true, will send complaints to STDERR. The
829 default value is false -- i.e., complaints do not go to STDERR.
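How these three attributes interact can be seen in a sketch like the following, which parses a document containing an unknown directive and looks for the errata heading in the resulting text. The class name C<My::ErrataProbe>, the helper C<parsed_text>, and the C<my_text> key are hypothetical; the example assumes that an unknown directive is reported as an ordinary (non-fatal) complaint and that the errata section is headed "POD ERRORS".

```perl
use strict;
use warnings;

package My::ErrataProbe; # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start { }
sub _handle_element_end   { }
sub _handle_text {
    my ($parser, $text) = @_;
    $parser->{my_text} .= "$text\n";
}

package main;

# Parse $source after calling the named attribute methods with a true value.
sub parsed_text {
    my ($source, @attributes_to_set) = @_;
    my $parser = My::ErrataProbe->new;
    $parser->$_(1) for @attributes_to_set;
    $parser->parse_string_document($source);
    return defined $parser->{my_text} ? $parser->{my_text} : '';
}

my $source = "=pod\n\n=foo bar\n\nSome text.\n";   # "=foo" is not a known directive

my $default    = parsed_text($source);
my $no_errata  = parsed_text($source, 'no_errata_section');
my $no_whining = parsed_text($source, 'no_whining');

print "default:           ", ($default    =~ /POD ERRORS/ ? "errata section\n" : "no errata section\n");
print "no_errata_section: ", ($no_errata  =~ /POD ERRORS/ ? "errata section\n" : "no errata section\n");
print "no_whining:        ", ($no_whining =~ /POD ERRORS/ ? "errata section\n" : "no errata section\n");
```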
831 =item C<< $parser->bare_output( I<SOMEVALUE> ) >>
833 Some formatter subclasses use this as a flag for whether output should
834 have prologue and epilogue code omitted. For example, setting this to
835 true for an HTML formatter class should omit the
836 "<html><head><title>...</title><body>..." prologue and the
837 "</body></html>" epilogue.
839 If you want to set this to true, you should probably also set
840 C<no_whining> or at least C<no_errata_section> to true.
842 =item C<< $parser->preserve_whitespace( I<SOMEVALUE> ) >>
844 If you set this attribute to a true value, the parser will try to
845 preserve whitespace in the output. This means that such formatting
846 conventions as two spaces after periods will be preserved by the parser.
847 This is primarily useful for output formats that treat whitespace as
848 significant (such as text or *roff, but not HTML).
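A sketch comparing the text delivered with and without preserve_whitespace. The class name C<My::Spaces>, the helper C<para_text>, and the C<my_text> key are hypothetical; the example assumes that, by default, runs of whitespace in an ordinary paragraph are collapsed to a single space.

```perl
use strict;
use warnings;

package My::Spaces;      # hypothetical class name
use Pod::Simple ();
our @ISA = ('Pod::Simple');

sub _handle_element_start { }
sub _handle_element_end   { }
sub _handle_text {
    my ($parser, $text) = @_;
    $parser->{my_text} .= $text;
}

package main;

# Parse $source and return the concatenated text events.
sub para_text {
    my ($source, $preserve) = @_;
    my $parser = My::Spaces->new;
    $parser->preserve_whitespace($preserve);
    $parser->parse_string_document($source);
    return $parser->{my_text};
}

my $source = "=pod\n\nOne.  Two.\n";    # two spaces after the first period

my $collapsed = para_text($source, 0);
my $preserved = para_text($source, 1);

print "default:   [$collapsed]\n";
print "preserved: [$preserved]\n";
```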
855 L<Pod::Simple> -- event-based Pod-parsing framework
857 L<Pod::Simple::Methody> -- like Pod::Simple, but each sort of event
858 calls its own method (like C<start_head3>)
860 L<Pod::Simple::PullParser> -- a Pod-parsing framework like Pod::Simple,
861 but with a token-stream interface
863 L<Pod::Simple::SimpleTree> -- a Pod-parsing framework like Pod::Simple,
864 but with a tree interface
866 L<Pod::Simple::Checker> -- a simple Pod::Simple subclass that reads
867 documents, and then makes a plaintext report of any errors found in the
870 L<Pod::Simple::DumpAsXML> -- for dumping Pod documents as tidily
871 indented XML, showing each event on its own line
873 L<Pod::Simple::XMLOutStream> -- dumps a Pod document as XML (without
874 introducing extra whitespace as Pod::Simple::DumpAsXML does).
876 L<Pod::Simple::DumpAsText> -- for dumping Pod documents as tidily
877 indented text, showing each event on its own line
879 L<Pod::Simple::LinkSection> -- class for objects representing the values
880 of the TODO and TODO attributes of LE<lt>...E<gt> elements
882 L<Pod::Escapes> -- the module that Pod::Simple uses for evaluating
883 EE<lt>...E<gt> content
885 L<Pod::Simple::Text> -- a simple plaintext formatter for Pod
887 L<Pod::Simple::TextContent> -- like Pod::Simple::Text, but
888 makes no effort to indent or wrap the text being formatted
892 L<perlpodspec|perlpodspec>
897 =head1 COPYRIGHT AND DISCLAIMERS
899 Copyright (c) 2002 Sean M. Burke. All rights reserved.
901 This library is free software; you can redistribute it and/or modify it
902 under the same terms as Perl itself.
904 This program is distributed in the hope that it will be useful, but
905 without any warranty; without even the implied warranty of
906 merchantability or fitness for a particular purpose.
910 Sean M. Burke C<sburke@cpan.org>
914 Hm, my old podchecker version (1.2) says:
915 *** WARNING: node 'http://search.cpan.org/' contains non-escaped | or / at line 38 in file Subclassing.pod
916 *** WARNING: node 'http://lists.perl.org/showlist.cgi?name=pod-people' contains non-escaped | or / at line 41 in file Subclassing.pod