[tz] TZ file comments UTF-8? Bastardized HTML?

Mon Jan 14 12:10:20 UTC 2013

Russ Allbery <rra at stanford.edu> wrote:
 |Steffen "Daode" Nurpmeso <sdaoden at gmail.com> writes:
 |> How about switching to a simple, generic approach, as in
 |
 |>   L<http://www.url1.com>
 |>   and/or
 |>   L<Some descriptive text><http://www.url2.com>
 |
 |> and having "I" for informational links that follow the same style
 |> but do not provide hyperlinks and "D" for dead links?
 |
 |Note that this is very, very close to POD syntax, and if you move the rest
 |of the way to POD syntax, you could then just extract the comments and use
 |any of the existing POD formatters and, from that, get any output format
 |you want from text through HTML to PDF.

This would indeed not be a big step, neither for the parser, which
sits as a filter in the middle, nor for the editor when
unrotting/converting the database files.
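
Russ's suggestion of extracting the comments and feeding them to an
existing POD formatter could be sketched as follows.  This is only an
illustration under stated assumptions: the sub names comments_to_pod
and format_comments are hypothetical, comment lines are taken to start
with '#' (plus one optional space of padding), and data lines are
simply dropped, each acting as a paragraph break.  The output_string
and parse_string_document methods come from Pod::Simple, on which
Pod::Text is built.

```perl
use strict;
use warnings;
use Pod::Text;

# Hypothetical helper: collect the '#' comment lines of a tz source file
# into a POD document.  Data lines are dropped but end the current
# paragraph, as does an empty comment line ("#").  Assumes no comment
# text starts with '=', which POD would read as a command.
sub comments_to_pod {
    my ($source) = @_;
    my @pod = ('=pod', '');
    for my $line (split /\n/, $source) {
        if ($line =~ /^#(.*)\z/) {
            (my $text = $1) =~ s/^ //;    # drop one space of padding after '#'
            push @pod, $text;
        }
        else {
            push @pod, '';                # data line: end the paragraph
        }
    }
    push @pod, '', '=cut', '';
    return join "\n", @pod;
}

# Hypothetical helper: format the extracted POD with Pod::Text, using
# the same options as the Pod::Text snippet further down in this mail.
sub format_comments {
    my ($source) = @_;
    my $out = '';
    my $parser = Pod::Text->new(loose => 1, indent => 0, width => 72);
    $parser->output_string(\$out);
    $parser->parse_string_document(comments_to_pod($source));
    return $out;
}

print format_comments(<<'EOF');
# Zone data for a made-up example.
#
# See L<the example page|http://www.example.invalid/>.
Zone Etc/Example 0 - EXT
EOF
```

Swapping Pod::Text for another Pod::Simple formatter would give HTML
or other output from the same intermediate POD.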

 |The only changes that you'd need to make would be to change your second
 |example to:
 |
 |    L<Some descriptive text|http://www.url2.com>

See below.
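
Rewriting the proposed two-bracket link form into the POD form Russ
shows is mechanical, so converting existing entries could start with
a substitution along these lines.  A sketch only: to_pod_links is a
hypothetical name, and it assumes neither the text nor the URL
contains '<', '>' or '|'.

```perl
use strict;
use warnings;

# Hypothetical converter: rewrite the form proposed in this thread,
#   L<Some descriptive text><http://www.url2.com>
# into POD's own link syntax,
#   L<Some descriptive text|http://www.url2.com>
# Plain L<url> links have no second bracket pair and are left untouched.
sub to_pod_links {
    my ($text) = @_;
    $text =~ s/L<([^<>|]*)><([^<>|]*)>/L<$1|$2>/g;
    return $text;
}

print to_pod_links("L<http://www.url1.com>\n");
print to_pod_links("L<Some descriptive text><http://www.url2.com>\n");
```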

 |and then decide how you'd like to handle the I and D links.  One option
 |for the latter would be to just prefilter the output to remove the link
 |and return just the descriptive text before processing the rest of the
 |document with a POD formatter, which would be quite easy to do using
 |either a simple script or existing POD parsing modules.
 |
 |If you do decide to use POD, you may want to pick a different letter than
 |I, since I<> is already a POD formatting code (for italics).  D<> is safe
 |to use; there's no existing formatting code.
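
Russ's prefilter option might look like the sketch below.  It assumes
D<> for dead links and U<> for informational links (the letter Steffen
suggests below in place of I<>); both letters are only proposals from
this thread, not POD codes, and prefilter_links is a hypothetical
name.  Following Russ's description, it keeps just the descriptive
text, falling back to the bare URL when there is none, and leaves real
POD codes such as L<> and I<> alone.

```perl
use strict;
use warnings;

# Hypothetical prefilter, run before the text reaches a POD formatter.
# D<> (dead link) and U<> (informational link) are proposals from this
# thread, not POD formatting codes.  Nested <> inside them is not handled.
sub prefilter_links {
    my ($pod) = @_;
    # D<text|url> or U<text|url>  ->  text
    $pod =~ s/[DU]<([^<>|]*)\|[^<>]*>/$1/g;
    # D<url> or U<url>            ->  url (there is no text to keep)
    $pod =~ s/[DU]<([^<>]*)>/$1/g;
    return $pod;
}

print prefilter_links(
    "See D<the old page|http://www.example.invalid/old>, "
  . "U<http://www.example.invalid/> and L<http://www.url1.com>.\n");
```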

Well, that was just what came to my mind on Saturday.
Maybe U<> (for URL)?  It seems unused.

 |POD would also give you headings, verbatim paragraphs, and a (very
 |verbose, admittedly) syntax for lists if you wanted to use them, but you
 |of course wouldn't have to.

I would think of headings and (verbatim) paragraphs, trying to
make the output match the input.  From what I see at a glance,
that will be somewhat hard and require adjustments, because text
filling obviously was not considered when entries were added to
the tz database files.
I also can hardly imagine that sprinkling formatting tags through
the files will be accepted by the current tz maintainer(s), since
otherwise they might have done it 15 years ago.

 |This doesn't solve the character set problem, sadly; POD of course has its
 |own method of representing characters outside the input character set, but
 |it's as ugly as (or uglier than) all the encoding methods already proposed.
 |But neither would it get in the way; POD is happy with any input character
 |set you want to use, and just wants a single =encoding command somewhere
 |to tell it what to expect if you're not using US-ASCII or UTF-8.

I think the script can easily be extended with another mode that
simply takes an encoding name, reads text from STDIN or a file, and
converts it to tz database-style comment output along the way,
i.e., HTML entities.  It would then have to decode HTML entities
into E<> escapes; that should work, shouldn't it?  That shouldn't
be hard either.
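
The two conversions described here could be sketched as below, using
Perl's core Encode module for the decoding step.  text_to_entities and
entities_to_pod are hypothetical names.  Numeric entities are
normalized on the way because POD reads a number with a leading zero
inside E<> as octal and one with a 0x prefix as hexadecimal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical mode 1: decode bytes in the named encoding and write every
# non-ASCII character as a numeric HTML entity, giving 7-bit comment text.
sub text_to_entities {
    my ($encoding, $bytes) = @_;
    my $text = decode($encoding, $bytes);  # dies if the encoding is unknown
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# Hypothetical mode 2: turn HTML entities into POD E<> escapes.
sub entities_to_pod {
    my ($text) = @_;
    $text =~ s/&#[xX]([0-9A-Fa-f]+);/sprintf('E<0x%X>', hex $1)/ge;  # &#xE9;
    $text =~ s/&#([0-9]+);/sprintf('E<%d>', $1)/ge;  # &#233; (drops leading 0s)
    $text =~ s/&([A-Za-z][A-Za-z0-9]*);/E<$1>/g;      # &eacute; -> E<eacute>
    return $text;
}

my $comment = text_to_entities('ISO-8859-1', "Z\xFCrich");
print "$comment\n";                      # Z&#252;rich
print entities_to_pod($comment), "\n";   # ZE<252>rich
```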

 |I'm happy to help with any POD issues, including showing how to subclass
 |Pod::Text to add new formatting codes, if you want to go that route.

Well, I think for L<> to work the way you show it above, the POD
parser must be adjusted, or we need to use special tags and make
POD treat them as links.
I haven't yet used POD in any way more sophisticated than this:

  use Pod::Text;
  my $parser = Pod::Text->new(loose => 1, indent => 0, width => 72);
  $parser->parse_from_file($0, '-');

and, of course, for normal module documentation, so some hints on
how to do the required task would be appreciated.
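
For the subclassing route Russ offers, a minimal sketch could look
like this, assuming a Pod::Simple-based Pod::Text as shipped with
current Perl.  accept_codes is a Pod::Simple method that makes the
parser pass extra formatting codes through instead of discarding
them; the assumption is that Pod::Text then calls a method named cmd_
plus the lowercased code letter with the attribute hash and the
formatted contents.  Those cmd_ methods are Pod::Text internals rather
than a documented interface, so check them against the installed
version.  The package name Pod::Text::TZ and the D<>/U<> codes are
hypothetical.

```perl
use strict;
use warnings;

package Pod::Text::TZ;    # hypothetical subclass name
use parent 'Pod::Text';

sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    # Accept the thread's proposed codes instead of dropping them as unknown.
    $self->accept_codes(qw(D U));
    return $self;
}

# Split "text|url" into (text, url); a bare "url" yields (undef, url).
sub _split_link {
    my ($contents) = @_;
    return $contents =~ /\A([^|]*)\|(.*)\z/s ? ($1, $2) : (undef, $contents);
}

# D<text|url>: dead link, print only the text (or the bare URL).
sub cmd_d {
    my ($self, $attrs, $text) = @_;
    my ($label, $url) = _split_link($text);
    return defined $label ? $label : $url;
}

# U<text|url>: informational link, show the URL but not as a hyperlink.
sub cmd_u {
    my ($self, $attrs, $text) = @_;
    my ($label, $url) = _split_link($text);
    return defined $label ? "$label <$url>" : "<$url>";
}

package main;

my $pod = "=pod\n\nSee D<the old page|http://www.example.invalid/old>"
        . " and U<the new page|http://www.example.invalid/>.\n";
my $out = '';
my $parser = Pod::Text::TZ->new(indent => 0, width => 72);
$parser->output_string(\$out);
$parser->parse_string_document($pod);
print $out;
```

Unlike a prefilter, this keeps D<> and U<> visible to the formatter,
so each output format can decide how to render them.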

Of course, it all stands or falls with whether these adjustments
will be acceptable for the tz database as such.
I'm not planning to fork the project.

 |-- 
 |Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

--steffen