[tz] Non-ASCII outside comments?

Guy Harris guy at alum.mit.edu
Fri Jun 27 04:27:39 UTC 2014

On Jun 26, 2014, at 4:56 PM, Matt Johnson <mj1856 at hotmail.com> wrote:

> Guys, please don't forget that zic is not the only usage of the tzdata.  Countless platforms, languages, libraries, and applications use TZ identifiers.  Putting a non-ASCII character in a time zone name is likely going to break many things.

If by "the tzdata" you mean "the files that come with the IANA time zone distribution", then all we can do is

	1) have the Theory file state that official tz files will use only characters from a given subset of ASCII in zone names, and

	2) follow the rules of the Theory file when putting out tz data releases.

If by "the tzdata" you mean "any data that anyone ever constructs for time zone processing", then all we can do is

	1) nothing

because somebody who constructs a file with tz syntax is not obliged to run it through zic at all, much less run it through zic with the "-v" flag.

The Theory file currently says

        Use only valid POSIX file name components (i.e., the parts of      
                names other than '/').  Do not use the file name
                components '.' and '..'.  Within a file name component,
                use only ASCII letters, '.', '-' and '_'.  Do not use
                digits, as that might create an ambiguity with POSIX
                TZ strings.  A file name component must not exceed 14
                characters or start with '-'.  E.g., prefer 'Brunei'
                to 'Bandar_Seri_Begawan'.

and I didn't see a patch from Paul that would *remove* any of those requirements, so, as long as we follow the Theory file rules, no official tz file will have a zone name that includes non-ASCII characters - or even ASCII characters other than a-z, A-Z, ".", "-", and "_".
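
To make the quoted rules concrete, here is a rough sketch of a check for a candidate zone name. It assumes only the Theory text quoted above; the function name follows_theory_rules is made up for illustration, and this is not what zic itself does.

```python
import re

# Assumption: this encodes only the Theory text quoted above, not zic's behavior.
# A component may contain ASCII letters, '.', '-' and '_' (so no digits).
_COMPONENT = re.compile(r"[A-Za-z._-]+")


def follows_theory_rules(zone_name: str) -> bool:
    """Return True if every '/'-separated component obeys the quoted rules."""
    for component in zone_name.split("/"):
        # Empty components, '.' and '..' are not acceptable.
        if component in ("", ".", ".."):
            return False
        # At most 14 characters, and no leading '-'.
        if len(component) > 14 or component.startswith("-"):
            return False
        # Only the allowed ASCII subset; this also rejects non-ASCII letters.
        if _COMPONENT.fullmatch(component) is None:
            return False
    return True
```

Under that sketch, 'Asia/Brunei' passes, while 'Asia/Bandar_Seri_Begawan' fails on the 14-character limit and a name such as 'Europe/Zürich' fails on the non-ASCII letter.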

So all Paul would be obliged to do would be to continue to follow the Theory rules; I have no reason to imagine that he would do anything other than that.

He is *not* obliged to make zic unconditionally reject files that have non-ASCII characters; if some third party creates a tz file with non-ASCII characters in a zone name, and hands it to software that parses tz files, and Something Bad Happens, that's a matter for the third party and the developer of the software in question to resolve.
