[tz] Simplification and unification of scheme:// anchors

Ian Abbott abbotti at mev.co.uk
Wed Jan 30 17:27:04 UTC 2013


On 2013-01-30 14:02, Steffen Daode Nurpmeso wrote:
> Ian Abbott <abbotti at mev.co.uk> wrote:
>   |On 2013-01-30 11:28, Steffen Daode Nurpmeso wrote:
>   |> Ian Abbott <abbotti at mev.co.uk> wrote:
>   |>|While on the subject, the backslash escapes at the ends of the lines
>   |>|with a <URL> with a parenthesised comment on the following line is kind
>   |>|of ugly.  I'm sure it must be possible to re-work your script to avoid
>   |>|the need for that.  (I.e. if a line ends with a <URL> plus optional
>   |>|whitespace, check if the following line starts with optional whitespace
>   |>|plus parenthesised link text.)
>   |>
>   |> Hmm.
>   |> So i've reworked the (Pod-less) script to support multiple follow
>   |> lines in the middle of nowhere, and changed the two links from
>   |> which i remembered that it did matter.
>   |>
>   |> This updated version also fixes the "trailing empty line after
>   |> rules are included in data boxes" issue.
>   |> And it uses normal text paragraphs for the comment text, forcing
>   |> newline breaks via <br />, instead of using preformatted text for
>   |> that, which makes it even nicer, since some of the dramatically
>   |> long links will now be wrapped by browsers.
>   |
>   |Self closing tags such as <br /> are only legal in xhtml, not plain
>   |html, so you'll need to output a XML declaration and a DOCTYPE in your
>   |script.
>
> That is indeed a good point, it must be '<br>'.

That depends on what DOCTYPE you decide to use.

There are various other things wrong with the output, such as '&', '<' 
and '>' not being turned into the entities '&amp;', '&lt;' and '&gt;'. 
Note that if doing that, you'd need to make sure not to convert the 
existing entities such as '&aacute;' into '&amp;aacute;'.  That would be 
easier if the existing HTML entities were converted to UTF-8 sequences 
first!
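
As a rough illustration of that ordering (decode the existing entities
first, then escape), here is a minimal Python sketch.  It is not the
script under discussion; the function name escape_comment is made up for
the example, and it assumes the comment text has already been read as a
Unicode string.

```python
import html

def escape_comment(text):
    """Escape '&', '<' and '>' without double-escaping existing entities."""
    # Step 1: turn existing entities such as '&aacute;' into real characters.
    decoded = html.unescape(text)
    # Step 2: escape the three special characters; quotes are left alone.
    return html.escape(decoded, quote=False)

print(escape_comment("Bogot&aacute; & <URL>"))
```

Because the entities are decoded before escaping, the '&' in '&aacute;'
never reaches the escaping step, so it cannot become '&amp;aacute;'.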

(There are also a few odd-ball bits of mark-up in the original text,
such as <e'>, which need to be dealt with by a separate patch to the data
files, e.g. to replace <e'> with the HTML entity &eacute; or with the
UTF-8 sequence é if going down the UTF-8 road.)

Also, validator.w3.org is your friend!

>   |>   # For more about the first ten years of DST in the United States, see
>   |>   # Robert Garland's <http://www.clpgh.org/exhibit/dst.html> \
>   |> -# (``Ten years of daylight saving from the Pittsburgh standpoint'', \
>   |. Carnegie Library of Pittsburgh, 1927).
>   |> +# (``Ten years of daylight saving from the Pittsburgh standpoint'', \
>   |> +# Carnegie Library of Pittsburgh, 1927).
>   |
>   |It would still be great to get rid of the backslash line continuations
>   |and modify the script to work without them.
>
> :)
> I personally like it explicit and would definitely go for the L<><>
> syntax i've used first, since it is completely unambiguous.

There's also the MediaWiki style for external links, e.g.:

[http://www.foobar.org/baz.html Meaningful link text]

which is not too unreadable, but less readable than having the 
Meaningful link text in parentheses.  For long URLs, it might be split 
like this:

[http://www.foobar.org/baz.html
Meaningful link text]

or even:

[http://www.foobar.org/baz.html Meaningful
link
text]

which should be fine as long as the Meaningful link text contains no ']' 
characters (or at least no unmatched ']' characters if matched pairs of 
'[' and ']' are to be allowed).
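
A minimal Python sketch of picking such links out of a block of text,
under the simpler of the two rules (no ']' at all in the link text).  The
names WIKI_LINK and wiki_links are invented for the example, it assumes
http or https URLs, and it assumes any leading '#' comment markers have
already been stripped from the lines.

```python
import re

# Assumption: the URL is http(s) and the link text contains no ']'.
# The character class [^\]] also matches newlines, so the link text
# may be split over several lines.
WIKI_LINK = re.compile(r"\[(https?://[^\s\]]+)\s+([^\]]+)\]")

def wiki_links(text):
    """Return (url, link_text) pairs, joining link text split across lines."""
    return [(url, " ".join(label.split()))
            for url, label in WIKI_LINK.findall(text)]

sample = "[http://www.foobar.org/baz.html Meaningful\nlink\ntext]"
print(wiki_links(sample))
```

Allowing matched pairs of '[' and ']' inside the link text would need a
small bracket counter rather than a single regular expression.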

> I would also spend some more time and convert the many "headlines" that
> yet exist in the comments to enough markup to get to something
> real; in fact with not that much effort, maybe a weekend, it would
> be possible to adjust the comments so that the script could use
> indents, lists and normal paragraphs without any <br> at all;
> then the Pod-way (and many others, too) could be pursued,
> also leading to cross-referenced PDF output -- and that is
> something that would surely be interesting for some people, as
> i suppose.

It depends how much mark-up people are willing to put up with in the 
tzdata files, but I suspect not very much, if any!  The primary method 
for viewing the tzdata files should be the plain text originals, not the 
output from some fancy converter.

> But the idea you proposed won't work with git(1), since trailing
> whitespace is a no-go; right?

You shouldn't need trailing anything, right?  If the line ends with a URL,
see if the next line(s) contain the start of the link text before you
decide to output the <br /> or whatever.
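
Something along these lines, as a hedged Python sketch rather than a
patch to the actual script.  The names find_links and link_text_after
are made up, URLs are assumed to be http or https inside '<' and '>',
and the link text is assumed to use balanced parentheses.

```python
import re

URL_AT_END = re.compile(r"<(https?://[^<>\s]+)>\s*$")   # line ends with <URL>
TEXT_START = re.compile(r"^#?\s*\(")                    # next line opens with '('

def link_text_after(lines, start):
    """Return the parenthesised text beginning at lines[start], or None."""
    if start >= len(lines) or not TEXT_START.match(lines[start]):
        return None
    depth = 0
    collected = []
    for line in lines[start:]:
        body = line.lstrip("#").strip()      # drop the comment marker
        for pos, ch in enumerate(body):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:               # outermost ')' closes the text
                    collected.append(body[:pos + 1])
                    return " ".join(collected)[1:-1]
        collected.append(body)
    return None                              # parentheses never balanced

def find_links(lines):
    """Return (url, link_text_or_None) for every line ending in <URL>."""
    found = []
    for i, line in enumerate(lines):
        match = URL_AT_END.search(line)
        if match:
            found.append((match.group(1), link_text_after(lines, i + 1)))
    return found

comment = [
    "# Robert Garland's <http://www.clpgh.org/exhibit/dst.html>",
    "# (``Ten years of daylight saving from the Pittsburgh standpoint'',",
    "# Carnegie Library of Pittsburgh, 1927).",
]
print(find_links(comment))
```

No backslashes and no trailing whitespace are needed: the only signal is
a line ending in <URL> followed by a line starting with '('.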

-- 
-=( Ian Abbott @ MEV Ltd.    E-mail: <abbotti at mev.co.uk>        )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=-



