Time Zone Localizations

Mark Davis mark.davis at jtcsv.com
Fri Jun 11 22:51:27 UTC 2004


Comments interleaved below. Since this is getting back into the translation
issues, I'm cc'ing the cldr group.

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- 
From: <jcowan at reutershealth.com>
To: "Mark Davis" <mark.davis at jtcsv.com>
Cc: <tz at lecserver.nci.nih.gov>; "Chuck Soper" <chucks at lmi.net>
Sent: Fri, 2004 Jun 11 13:23
Subject: Re: Time Zone Localizations


> Mark Davis scripsit:
>
> > However, to reduce the translation requirements and make the data
> > more manageable, we do want to set up some uniqueness criteria. If
> > two IDs have exactly the same behavior since the time when time zones
> > were adopted,
>
> In fact the Olson data do not separate timezones in a given country
> that have been the same since 1970-01-01.  Otherwise, Indiana would have
> something like 30 time zones instead of just four.
>
> > and have always been in the same country over that period, we only want
> > one of them to be in the main list. The other can be an alternate --
> > and still work-- but we would recommend an extremely low priority
> > on translation.
>
> I think that is a mistake, for two reasons:  national chauvinism and
> future-proofing.  About the former, nothing need be said; but the whole
> point of setting your zone to the country you are in (especially if you
> live there) is that you don't want to have to reset it if your national
> legislature changes the rules, either the DST rules or the zone proper.
> Within the EU, DST rules are harmonized, but which zone to adopt is a
> purely national decision.

I said "have always been in the same country over that period"; this would not
make any zones "modern equivalents" that were in different countries. But see
below.

>
> > Many (I would dare say the vast majority) of end users just don't care
> > now that there was once a difference between Dawson, Whitehorse and
> > Los Angeles.
>
> This strikes me as backwards.  If you're in the U.S., you should see U.S.
> choices; in Canada you should see Canadian ones.

My fault for confusing you; I mistyped Los Angeles instead of Vancouver. Here is
a real example. Within each group ended by a semicolon, the IDs separated by
commas are modern equivalents, and all are within the same country (Canada).
Thus America/Dawson, America/Whitehorse, and America/Vancouver are not
distinguished by country, and all behave the same nowadays.

America/Dawson, America/Whitehorse, America/Vancouver;
America/Dawson_Creek;
America/Inuvik, America/Yellowknife, America/Edmonton, America/Cambridge_Bay;
America/Swift_Current, America/Regina;
America/Rainy_River, America/Rankin_Inlet;
America/Winnipeg;
America/Iqaluit, America/Pangnirtung, America/Nipigon, America/Thunder_Bay,
America/Montreal;
America/Goose_Bay;
America/Glace_Bay, America/Halifax;
America/St_Johns
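
To make the "main list plus alternates" idea concrete, here is a minimal sketch
in Python. The groups are copied from the list above. The message does not say
which member of a group would be the main one, so taking the first ID of each
group is purely an assumption of this sketch, as are the names GROUPS,
build_main_map, MAIN_OF and MAIN_LIST.

```python
# Groups of modern-equivalent Canadian zone IDs, copied from the list above.
# ASSUMPTION: the first ID in each group is treated as the "main" one.
GROUPS = [
    ["America/Dawson", "America/Whitehorse", "America/Vancouver"],
    ["America/Dawson_Creek"],
    ["America/Inuvik", "America/Yellowknife", "America/Edmonton",
     "America/Cambridge_Bay"],
    ["America/Swift_Current", "America/Regina"],
    ["America/Rainy_River", "America/Rankin_Inlet"],
    ["America/Winnipeg"],
    ["America/Iqaluit", "America/Pangnirtung", "America/Nipigon",
     "America/Thunder_Bay", "America/Montreal"],
    ["America/Goose_Bay"],
    ["America/Glace_Bay", "America/Halifax"],
    ["America/St_Johns"],
]


def build_main_map(groups):
    """Map every zone ID (main or alternate) to the main ID of its group."""
    main_of = {}
    for group in groups:
        main = group[0]
        for zone_id in group:
            main_of[zone_id] = main
    return main_of


MAIN_OF = build_main_map(GROUPS)
MAIN_LIST = [group[0] for group in GROUPS]

print(len(MAIN_OF), "IDs collapse to", len(MAIN_LIST), "main entries")
```

Alternates still resolve (MAIN_OF["America/Vancouver"] gives the main ID of its
group), but only the entries in MAIN_LIST would need translation at normal
priority.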

>
> > Absolutely!
>
> I think the series of fallbacks is unnecessarily complex.  In particular,
> the fallback from "Pacific Time" to "GMT-07:00/08:00" doesn't tell me
> that much, because I don't know a priori whether it's winter or summer
> currently.
>
> In addition, it fails to exploit the nice thing about the use of city
> names in Olson, namely that city names don't need that much localization:
> in the vast majority of cases, the internationally known name is the
> only name.  (Transliteration might be required if the current locale
> has no Latin letters.)  Thus the full combinatorial explosion of city
> name x language can mostly be short-circuited.

If that were true, we'd not have as much of a problem. And if everyone spoke
English this would all be much easier ;-) Look at London, from the CLDR:

<ldml><dates><timeZoneNames><zone type="GMT"><exemplarCity>
"Londain": ·ga·
"Londen": ·nl·
"London": ·da· ·en· ·fr· ·sv·
"Londra": ·it·
"Londres": ·es· ·pt·
"Lontoo": ·fi·
"ロンドン": ·ja·
"伦敦": ·zh·
"倫敦": ·zh_Hant·
"런던": ·ko·
...

These are only for a few languages, but there is a lot of variation. A great
part of the motivation for this is to cut down on the amount of data required,
just from the sheer magnitude of the problem when you multiply the figures by
the 90 languages currently in CLDR, plus the many more languages to come.

>
> I propose a simpler scheme, therefore:
>
> 1) If you have a translation for the time zone name x the language, use it.
>
> 2a) Get the localized name for the city (or if none, the Olson city name);
> 2b) Get the "Tampo de '%1'" schema for the language (or if none, use just
> "%1");
> 2c) Substitute the city name into the schema and use that.
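
The proposed steps 1 and 2a-2c can be sketched as a single lookup function. This
is only an illustration of the quoted scheme, not code from CLDR or ICU. The
function name, the dictionary layout, the locale code "xx", and the use of
"Europe/London" as the key for the London data are all assumptions; the sample
strings ("Pacific Time", "Lontoo", "Tampo de '%1'") are taken from this thread.

```python
def display_name(zone_id, lang, zone_names, city_names, schemas):
    """Return a display name for zone_id in lang, following steps 1, 2a-2c."""
    # Step 1: a direct translation of the zone name for this language wins.
    name = zone_names.get((zone_id, lang))
    if name is not None:
        return name
    # Step 2a: localized city name, else the Olson city name from the ID
    # (last path component, underscores shown as spaces).
    olson_city = zone_id.rsplit("/", 1)[-1].replace("_", " ")
    city = city_names.get((zone_id, lang), olson_city)
    # Step 2b: the per-language schema, else just "%1".
    schema = schemas.get(lang, "%1")
    # Step 2c: substitute the city name into the schema.
    return schema.replace("%1", city)


# Hypothetical sample data, keyed by (zone ID, language) or by language.
ZONE_NAMES = {("America/Los_Angeles", "en"): "Pacific Time"}
CITY_NAMES = {("Europe/London", "fi"): "Lontoo"}
SCHEMAS = {"xx": "Tampo de '%1'"}  # "xx" is a placeholder locale code

print(display_name("America/Los_Angeles", "en", ZONE_NAMES, CITY_NAMES, SCHEMAS))
print(display_name("Europe/London", "fi", ZONE_NAMES, CITY_NAMES, SCHEMAS))
print(display_name("Europe/London", "xx", ZONE_NAMES, CITY_NAMES, SCHEMAS))
```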

I can understand your desire for simplicity, and I am not happy with there being
8 possible steps. But depending on city data would be very painful. We already
have in CLDR a lot of country data, so if we can leverage that it really helps.
Let's look at the figures. There are 239 countries. Of them, 210 have a single
zone. Using a country name for each of them is essentially free. Of the
remainder, 8 only have multiple zones historically. So the modern ones are again
essentially free. Of the rest, cities might be the best way to go. We would need
99 cities for modern zone distinctions, 140 if we added historic also. If you
multiply that by 90 languages it is still a lot of data, but *way*, *way* better
than the 558 x 90 we are faced with now! So that is the reason for Step #4.1 in
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/time_zone_localization.html
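
For anyone who wants the figures above multiplied out, here is a small Python
sketch. The input numbers are the ones quoted in this message; the variable
names are just for illustration, and the derived counts are plain arithmetic on
those inputs.

```python
LANGUAGES = 90             # languages currently in CLDR, per this message
ZONE_IDS = 558             # zone IDs we would otherwise have to translate
COUNTRIES = 239
SINGLE_ZONE = 210          # countries with a single zone
HISTORIC_ONLY_MULTI = 8    # countries with multiple zones only historically
CITIES_MODERN = 99         # cities needed for modern zone distinctions
CITIES_WITH_HISTORIC = 140 # cities needed if historic ones are added

multi_zone = COUNTRIES - SINGLE_ZONE                     # remainder
still_multi = multi_zone - HISTORIC_ONLY_MULTI           # need city names
country_name_enough = SINGLE_ZONE + HISTORIC_ONLY_MULTI  # "essentially free"

full_cost = ZONE_IDS * LANGUAGES
modern_cost = CITIES_MODERN * LANGUAGES
historic_cost = CITIES_WITH_HISTORIC * LANGUAGES

print("countries covered by a country name:", country_name_enough)
print("countries still needing cities:", still_multi)
print("strings, every zone ID:", full_cost)
print("strings, modern cities only:", modern_cost)
print("strings, modern + historic cities:", historic_cost)
```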

But we still need some fallback in case there is no unique country, and no
translated city. Now, it may be better to nuke #4.2 and #5, e.g. dropping the
GMT part. GMT format when there is no daylight savings does not lose any
information (nowadays). Where there is daylight savings, it does lose information --
although actually not much -- but avoids the problem of using cities that may
either be unknown to the user or not in a script s/he can read. The only place
where it is ambiguous (within a country) is if you have two zones that have the
same summer & winter offsets, but start at different times. That is pretty rare.
(Across countries, or historically, it is not quite so rare.) That being said,
we are not wedded to the GMT format either; have to toss it around a bit.
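
A rough sketch of the GMT fallback and of the one ambiguous case described
above (same summer and winter offsets within a country, different switch
dates). The label layout follows the "GMT-07:00/08:00" example quoted earlier in
this thread; putting the daylight offset first and dropping the sign on the
second part is an assumption read off that one example. The function names and
the sample zones ("XX", "Example/...") are made up for illustration.

```python
from collections import defaultdict


def fmt_offset(minutes):
    """Format a GMT offset given in minutes as +HH:MM or -HH:MM."""
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"


def gmt_label(std_minutes, dst_minutes=None):
    """'GMT-08:00' without daylight savings, 'GMT-07:00/08:00' with it."""
    if dst_minutes is None or dst_minutes == std_minutes:
        return "GMT" + fmt_offset(std_minutes)
    # ASSUMPTION: daylight offset first, then the standard offset unsigned.
    return "GMT" + fmt_offset(dst_minutes) + "/" + fmt_offset(std_minutes)[1:]


def ambiguous_labels(zones):
    """Find (country, label) pairs shared by more than one zone.

    zones is an iterable of (country, zone_id, std_minutes, dst_minutes).
    """
    by_label = defaultdict(list)
    for country, zone_id, std, dst in zones:
        by_label[(country, gmt_label(std, dst))].append(zone_id)
    return {key: ids for key, ids in by_label.items() if len(ids) > 1}


# Hypothetical zones: East and West share offsets but (we suppose) switch
# on different dates, which the label cannot express.
SAMPLE = [
    ("XX", "Example/East", -300, -240),
    ("XX", "Example/West", -300, -240),
    ("XX", "Example/Fixed", -360, None),
]

print(gmt_label(-480), gmt_label(-480, -420))
print(ambiguous_labels(SAMPLE))
```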

You are right that GMT format does not protect against future changes; but we
have to look at likelihood. The city format also doesn't protect against all
possible changes; I might use America/Los_Angeles right now meaning my time
zone, but if the N. California counties changed to a different zone, splitting
that one, then it wouldn't be correct any more.

[Of course, what would really be nice is if the world could agree to all switch
to/from daylight savings at the same (local) time, e.g. 02:00 the last Sundays
in March and September. Then you could convey all modern zones with three
formats, without loss of information:
- GMT-08:00 (for no daylight savings)
- GMT-08:00N (for daylight savings March-Sept), and
- GMT-08:00S (for daylight savings Sept-March).
Of course, the chances of something sensible like this are, well, zip.]
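
Purely for fun, the three hypothetical formats from the aside above could be
produced like this. The function name and argument convention are assumptions of
the sketch; the suffixes N and S and the sample labels are the ones listed in
the aside.

```python
def hemisphere_label(std_minutes, dst_season=None):
    """Build 'GMT-08:00', 'GMT-08:00N' or 'GMT-08:00S'.

    std_minutes: standard GMT offset in minutes (negative = west of GMT).
    dst_season: None for no daylight savings, "N" for March-Sept,
                "S" for Sept-March.
    """
    if dst_season not in (None, "N", "S"):
        raise ValueError("dst_season must be None, 'N' or 'S'")
    sign = "-" if std_minutes < 0 else "+"
    hours, mins = divmod(abs(std_minutes), 60)
    return f"GMT{sign}{hours:02d}:{mins:02d}{dst_season or ''}"


for season in (None, "N", "S"):
    print(hemisphere_label(-480, season))
```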

>
> When I'm communicating with users about the Reuters Health system, I always
> refer to events occurring at such-and-such a time, New York time.  That
> communicates not only a GMT offset but a set of DST rules.  This is also
> what's typically done in legal documents -- see the legal ads for bond
> redemption announcements in a newspaper.
>
> -- 
> John Cowan <jcowan at reutershealth.com>     http://www.reutershealth.com
> I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
> han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
>
>


