Request from CLDR committee

Paul Eggert eggert at CS.UCLA.EDU
Thu Aug 18 21:52:49 UTC 2005


"Mark Davis" <mark.davis at icu-project.org> writes:

> I agree that these are oddities, and that ISO really shouldn't have
> had country codes for them. Unfortunately, these are more than
> glitches; many systems use ISO codes internally. And completeness
> helps in testing, even if these are uninhabited.

It depends on what sort of "completeness" you want.  To my mind, a
system that can adequately represent absent or missing data is more
complete than a system that cannot.

> I don't see anything wrong with setting them to be the time that
> would otherwise be the case, using the maritime zones.

> If you don't change them, you really have to change Theory to be
> accurate. It says now:
>
> "       Include at least one location per time zone rule set per country.
>                One such location is enough.  Use ISO 3166 (see the file
>                iso3166.tab) to help decide whether something is a country.
> "
> to something like:
>
> "       For most countries, include at least one location per time
> zone rule set per country.
>                One such location is enough.  Use ISO 3166 (see the file
>                iso3166.tab) to help decide whether something is a country.
> "

Yes, that's true.  Thanks.  I'll insert the word "inhabited" before
"country"; that's simpler.

> The x/x/x... notation is fine for internal use,  but isn't really the
> best for end users.

The x/x/x notation is intended only for "internal" use, if by "internal"
you mean "visible to programmers and people who know how to use the shell
and such-like".  I don't see why end users would ever need to see it.
And if end users do need to see it, I don't see why they would care
whether there are slashes in it.

> If we can't depend on the TZ group to assign unique final fields, we
> can prepare to have deviations from them wherever
> necessary.

If slashes are causing problems for you, you can replace them with
some other character, e.g., colons.

> However, we really would like to avoid that. With all the
> possible names, and with exceptions such as Bahia already in the
> database, it is unclear why non-unique final fields have to be used:
> perhaps you can take a few minutes to explain why that is required?

It's meant as a convenience to programmers and people who know how to
use the shell and such-like.  It's better to use mnemonics.

> I had thought that all the rules for *all* the timezones,
> Etc/GMT+-X, America/Los_Angeles, etc. were defined in terms of
> offsets from UTC. If some of them are not, that is very unclear from
> the documentation. Can you explain which are and which aren't?

They are not merely histories of offsets from UTC.  They are also
histories of whether the time in question is daylight saving time (the
tm_isdst flag of C; see
<http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html>),
and the time zone abbreviation in effect (the tzname of POSIX; see
<http://www.opengroup.org/onlinepubs/009695399/functions/tzname.html>).
So, even if the UTC offset histories are identical, the other bits
of information may mean that the Zone entries still differ.
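
To make that concrete, here is a minimal sketch in Python.  The record
layout, the zone data, and the abbreviations "XST" and "YST" are all
made up for illustration; they are not taken from the tz database.  It
only shows that two histories can agree on every UTC offset and still
differ as Zone entries.

```python
from typing import List, NamedTuple


class LocalTimeType(NamedTuple):
    """One entry in a zone's history (hypothetical layout for illustration)."""
    utoff: int   # seconds east of UTC
    isdst: bool  # corresponds to C's tm_isdst
    abbr: str    # corresponds to POSIX tzname


# Two made-up histories: identical offsets, different abbreviations.
zone_a: List[LocalTimeType] = [LocalTimeType(6 * 3600, False, "XST")]
zone_b: List[LocalTimeType] = [LocalTimeType(6 * 3600, False, "YST")]


def same_offsets(a: List[LocalTimeType], b: List[LocalTimeType]) -> bool:
    """Compare only the UTC offsets, ignoring isdst and abbr."""
    return [t.utoff for t in a] == [t.utoff for t in b]


def same_history(a: List[LocalTimeType], b: List[LocalTimeType]) -> bool:
    """Compare all three pieces of information."""
    return a == b
```

With this data, same_offsets(zone_a, zone_b) is true while
same_history(zone_a, zone_b) is false, which is the situation
described above.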

>> How about if you simplify your life by simply ignoring the Etc/* names?
>
> But those are used for maritime rules, right?

Not in POSIX, no.  You can just use something like TZ='<GMT+12>-12'.
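
As a rough illustration of how such a string reads, here is a small
Python sketch that handles only the fixed-offset form (no daylight
saving rule).  The function name parse_fixed_posix_tz is my own
invention, not part of any library.  Note that POSIX counts offsets
west of Greenwich as positive, so "-12" means twelve hours east.

```python
import re

# Fixed-offset POSIX TZ strings only: an abbreviation (plain letters, or
# letters/digits/signs inside <...>) followed by [+-]hh[:mm[:ss]].
_FIXED_TZ = re.compile(
    r"^(?:<(?P<quoted>[A-Za-z0-9+-]{3,})>|(?P<plain>[A-Za-z]{3,}))"
    r"(?P<sign>[+-]?)(?P<h>\d{1,2})(?::(?P<m>\d{2}))?(?::(?P<s>\d{2}))?$"
)


def parse_fixed_posix_tz(tz: str):
    """Return (abbreviation, seconds east of UTC) for a fixed-offset TZ string."""
    match = _FIXED_TZ.match(tz)
    if match is None:
        raise ValueError(f"not a fixed-offset POSIX TZ string: {tz!r}")
    abbr = match["quoted"] or match["plain"]
    west = (int(match["h"]) * 3600
            + int(match["m"] or 0) * 60
            + int(match["s"] or 0))
    if match["sign"] == "-":
        west = -west
    # POSIX offsets are positive west of Greenwich; flip to "east of UTC".
    return abbr, -west
```

For the string above, parse_fixed_posix_tz('<GMT+12>-12') gives the
abbreviation "GMT+12" and an offset of 43200 seconds east of UTC.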

> We also need them for cases where parsing a zone gives no other
> information. If we parse a string like "12:00am GMT-500", we cannot
> assign it to a zone with a location, since we don't know which one
> it would be.

But in that case you need approximately 24*60 entries, right?  That
sounds excessive.
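
One alternative to a table of entries, sketched here in Python, is to
build a POSIX TZ string on demand from the parsed offset.  The function
name posix_tz_for_offset is hypothetical, and I am assuming that
"GMT-500" means five hours behind UTC, i.e. -300 minutes east.

```python
def posix_tz_for_offset(minutes_east: int) -> str:
    """Build a fixed-offset POSIX TZ string from minutes east of UTC."""
    sign = "+" if minutes_east >= 0 else "-"
    hours, minutes = divmod(abs(minutes_east), 60)
    # Numeric abbreviation such as "-05" or "+0530", quoted with <...>.
    abbr = f"{sign}{hours:02d}" + (f"{minutes:02d}" if minutes else "")
    # POSIX counts west of Greenwich as positive, so the sign flips.
    posix_sign = "-" if minutes_east > 0 else ""
    offset = f"{posix_sign}{hours}" + (f":{minutes:02d}" if minutes else "")
    return f"<{abbr}>{offset}"


# One string per whole-minute offset in a 24-hour span: 24*60 of them.
all_strings = {posix_tz_for_offset(m) for m in range(-12 * 60, 12 * 60)}
```

Here posix_tz_for_offset(-300) yields '<-05>5', and all_strings holds
the 24*60 = 1440 distinct values without any of them being stored in a
database.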

>>>             AQ      Antarctica/Mawson
>>>             AQ      Antarctica/Vostok -- same since 57
>>
>> As far as I know it's merely a coincidence that the two bases use the
>> same time zone.  Most crucially the bases belong to different
>> countries (so to some extent the "different country" rule applies,
>> though admittedly Antarctica is special) and use different supply
>> lines.  I'd rather leave them alone for now.
>
> Then you need additional clarification in Theory, that not only do you
> subtract country codes from ISO, you add them as well ;-)

They do use different tz abbreviations (tzname), if that's what we need
for an excuse.

>>> List C. TZIDs that are linked, but refer to different locations.
>>
>> As a rule, these locations have identical time zone histories except
>> before the advent of standard time; and LMT (by definition) is
>> approximate, so I don't see the harm in keeping them linked.
>> If they are trouble for your system, perhaps you can keep a
>> list of exceptions and filter them out.
>
> We do, but that means that we have to check for "faux" links with
> every version of the TZ database. If at least you put them in a
> special file that we could mechanically check for, or had some kind of
> extra notation, even a comment that we could parse for (documented in
> Theory), then we wouldn't have to hack around it.

How about if you just ignore all the "Link" entries?  Or, if that's
not quite right, ignore them and add back the few exceptional entries
that you need.
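
A minimal sketch of that filtering in Python follows.  It assumes the
usual source layout, where a "Link" line has the form
"Link TARGET LINK-NAME" and "#" starts a comment.  The function name
zone_names, the keep_links parameter, and the sample data are
hypothetical.

```python
def zone_names(source_text, keep_links=()):
    """Return Zone names, plus only those Link names listed in keep_links."""
    keep = set(keep_links)
    names = []
    for raw in source_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if fields[0] == "Zone" and len(fields) >= 2:
            names.append(fields[1])
        elif fields[0] == "Link" and len(fields) >= 3:
            # fields[1] is the target, fields[2] is the alias being defined.
            if fields[2] in keep:
                names.append(fields[2])
    return names


# Made-up sample data, not real tz database content.
SAMPLE = """\
# hypothetical entries
Zone Example/Alpha 1:00 - XST
Link Example/Alpha Example/Beta
Link Example/Alpha Example/Gamma
"""
```

With the sample above, zone_names(SAMPLE) returns only Example/Alpha,
and zone_names(SAMPLE, keep_links=["Example/Gamma"]) adds back the one
exceptional entry.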


