[tz] Non-valid timezones, is there a rule to remove them?

Fri Jan 24 16:43:17 UTC 2014

>> I maintain the mapping data between the IANA tzids and Windows time
>> zones in the Unicode CLDR project and review the data about once a quarter
>> [http://www.unicode.org/repos/cldr/trunk/common/supplemental/windowsZones.xml].
>> 
>> -Yoshito 
> 
> I've found a few differences between your mapping table and my tzdata [
> http://paste.stg.fedoraproject.org/4440/90540445/].
> I would be happy to provide them as a patch; just let me know if
> you are interested.
> 
> Moreover, what is the reason for these kinds of differences?
> Could they differ from distro to distro?
> 

These are frequently asked questions, and I think the CLDR project should
provide a document about this. This document [
http://cldr.unicode.org/development/development-process/design-proposals/extended-windows-olson-zid-mapping
] is a little old, but it gives some useful background information about
the mapping data. (Sorry for my crappy English)

The CLDR project defines a stable set of tzids, because tzids are also used
as keys for localized display names in the locale data. For example:

TZ database: America/Argentina/Buenos_Aires vs. CLDR: America/Buenos_Aires

Old versions of the tz database only had "America/Buenos_Aires", and the
CLDR project uses it as a key for localized display names for the zone.
Later, the tz database reorganized the Argentina zones, and
"America/Buenos_Aires" was moved to the backward file as below:

Link    America/Argentina/Buenos_Aires  America/Buenos_Aires
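
As a rough illustration only (this is not part of the tz or CLDR tooling), a
Link line like the one above can be read as "Link TARGET LINK-NAME". The class
and method names in this sketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class BackwardLinks {
    /**
     * Parses tz "Link TARGET LINK-NAME" lines and returns a map from
     * link name (the old ID) to target (the current tz database ID).
     * Hypothetical helper, written only for illustration.
     */
    static Map<String, String> parseLinks(String text) {
        Map<String, String> targetByLink = new HashMap<>();
        for (String line : text.split("\\R")) {
            // Drop trailing comments, which start with '#'.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 3 && fields[0].equals("Link")) {
                targetByLink.put(fields[2], fields[1]);
            }
        }
        return targetByLink;
    }

    public static void main(String[] args) {
        String backward = "Link    America/Argentina/Buenos_Aires  America/Buenos_Aires";
        System.out.println(parseLinks(backward).get("America/Buenos_Aires"));
    }
}
```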

Because we don't want the CLDR locale data files to change the key
identifying the zone, we preserve America/Buenos_Aires as the canonical name
of the zone and add America/Argentina/Buenos_Aires as an alias. The mapping
is defined in another place [
http://www.unicode.org/repos/cldr/trunk/common/bcp47/timezone.xml]

<type name="arbue" description="Buenos Aires, Argentina" 
alias="America/Buenos_Aires America/Argentina/Buenos_Aires"/>

In this file, the first entry of the alias attribute is the canonical 'long'
id for the zone in the CLDR project, and the remaining entries are its
aliases. That means a consumer of windowsZones.xml also needs to use this
additional mapping data.
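
A minimal sketch of how a consumer might apply that rule, assuming the alias
attribute values have already been extracted from timezone.xml. The class and
method names are hypothetical and are not CLDR or ICU APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class CldrAliasResolver {
    // Maps every known ID (canonical or alias) to the CLDR canonical long ID.
    private final Map<String, String> canonicalById = new HashMap<>();

    /**
     * Registers one alias attribute value, for example
     * "America/Buenos_Aires America/Argentina/Buenos_Aires".
     * The first space-separated entry is treated as the canonical ID.
     */
    public void addAliasAttribute(String aliasAttribute) {
        String[] ids = aliasAttribute.trim().split("\\s+");
        String canonical = ids[0];
        for (String id : ids) {
            canonicalById.put(id, canonical);
        }
    }

    /** Returns the CLDR canonical ID, or the input unchanged if it is unknown. */
    public String canonicalize(String tzid) {
        return canonicalById.getOrDefault(tzid, tzid);
    }

    public static void main(String[] args) {
        CldrAliasResolver resolver = new CldrAliasResolver();
        resolver.addAliasAttribute("America/Buenos_Aires America/Argentina/Buenos_Aires");
        // Canonicalize the tz database ID before looking it up in windowsZones.xml.
        System.out.println(resolver.canonicalize("America/Argentina/Buenos_Aires"));
    }
}
```

The canonicalized ID, rather than the tz database ID, would then be the key to
look up in windowsZones.xml.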

A 'missing' zone such as "Australia/Lord_Howe" is really unmappable.
In the tz database, this zone is defined as

Zone Australia/Lord_Howe 10:36:20 -     LMT     1895 Feb
                        10:00   -       EST     1981 Mar
                        10:30   LH      LHST

However, Windows does not have any zone using the UTC+10:30 offset. I use a
small tool in the ICU project to maintain the mapping data, and it has
exception data for such zones [
http://source.icu-project.org/repos/icu/icuapps/trunk/WinTZ/src/com/ibm/icu/dev/tools/wintz/mapper/MapData.java
]. I think the comments below explain why these are not included.

    /*
     * There are some Olson time zones that do not have the same base UTC
     * offset in Windows time zones. These zones are not supported by Windows.
     */
    static final String[] NO_BASE_OFFSET_MATCH_ZONES_ARRAY = {
        "Australia/Eucla",      // +8:45
        "Australia/Lord_Howe",  // +10:30
        "Etc/GMT-14",           // +14:00
        "Pacific/Chatham",      // +12:45
        "Pacific/Kiritimati",   // +14:00
        "Pacific/Marquesas",    // -9:30
        "Pacific/Norfolk",      // +11:30
    };

    /*
     * These Olson time zones are using different DST rules from Windows
     * zones with same base offset.
     */
    static final String[] NO_DST_RULE_MATCH_ZONES_ARRAY = {
        // UTC-10:00/North American DST rule.
        // Closest match - "Hawaiian Standard Time" (no DST)
        "America/Adak",

        // UTC-08:00/no DST.
        // Closest match - "Pacific Standard Time" (observes DST).
        "Etc/GMT+8",
        "America/Metlakatla",
        "Pacific/Pitcairn",

        // UTC-09:00/no DST
        // Closest match - "Alaskan Standard Time" (observes DST).
        "Etc/GMT+9",
        "Pacific/Gambier",

        // UTC-06:00/Southern Hemisphere style DST rule.
        // Closest match - "Central America Standard Time" (observes
        // Northern Hemisphere style DST rule).
        "Pacific/Easter",

        // UTC-03:00 zone with North American DST rule.
        // Closest match - "Greenland Standard Time" (observes EU DST rule).
        "America/Miquelon",

        // UTC+02:00 with DST (Mar - Sep).
        // Closest match - "E. Europe Standard Time", "Israel Standard Time"
        // and some others
        "Asia/Gaza",
        "Asia/Hebron",
    };
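
To illustrate the first kind of exception (no Windows zone with the same base
offset), here is a small sketch using java.time. It is not the ICU tool's
actual logic. The Windows zone table in it is a deliberately tiny,
hand-written sample, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BaseOffsetCheck {
    // Assumption: an incomplete sample of Windows zones and their base UTC offsets.
    static final Map<String, ZoneOffset> SAMPLE_WINDOWS_ZONES = new LinkedHashMap<>();
    static {
        SAMPLE_WINDOWS_ZONES.put("Cen. Australia Standard Time", ZoneOffset.ofHoursMinutes(9, 30));
        SAMPLE_WINDOWS_ZONES.put("AUS Eastern Standard Time", ZoneOffset.ofHours(10));
        SAMPLE_WINDOWS_ZONES.put("Central Pacific Standard Time", ZoneOffset.ofHours(11));
    }

    /** Returns the sample Windows zones whose base offset equals the tz zone's standard offset. */
    static List<String> candidates(String tzid, Instant when) {
        // getStandardOffset ignores DST, so it gives the zone's base offset at that instant.
        ZoneOffset base = ZoneId.of(tzid).getRules().getStandardOffset(when);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, ZoneOffset> entry : SAMPLE_WINDOWS_ZONES.entrySet()) {
            if (entry.getValue().equals(base)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2014-01-24T00:00:00Z");
        System.out.println(candidates("Australia/Sydney", when));
        System.out.println(candidates("Australia/Lord_Howe", when));
    }
}
```

With this sample table, a zone whose base offset is +10:30 has no candidate at
all, which is the situation the first array above records. The second array
covers zones that pass this check but then fail on the DST rule comparison,
which this sketch does not attempt.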

There is a request to include 'unmappable' zones in the data [
http://unicode.org/cldr/trac/ticket/5589], and I'm planning to work on this
in the near future.

I don't want to hijack this mailing list to discuss CLDR-specific
implementation details. If you have further questions, please post them
directly to the CLDR project. You can send questions to the CLDR users
mailing list (cldr-users at unicode.org) [
http://www.unicode.org/consortium/distlist.html#cldr_list], and problem
reports or new feature requests to the CLDR trac [
http://unicode.org/cldr/trac].

Thanks,
Yoshito
