[tz] Idea for internationalized time point unique time zone abbreviations

Thu Jun 7 16:53:20 UTC 2012

Boruch Baum <boruch_baum at gmx.com> writes:
> CRITICISM: As Andy Lipscomb, Brian Inglis and Peter Machata all
> recently posted, localizations for all timezone names and
> abbreviations can (and should) be expected to be in the local language
> and character set.
>
> Who is/are responsible for maintaining each locality's localization
> lists, in order, for example, to guarantee uniqueness?

In Fedora Linux in particular, zone identifier translations are kept in
system-config-date.  I recall that Debian has a dedicated package for
that.  In both cases, the translation is done by calling gettext on a
time zone identifier.

There are some improvements to be made.  In particular, the zone ID
prefixes (e.g. "America") get translated anew each time a new zone
appears, so typos are not unheard of.  This could be solved by
translating each fragment separately (providing context to disambiguate
namespace collisions) [1].  Nils Philippsen and I have been about to
publish a zoneinfo translation repository for some time now, but can't
seem to get around to it.  As a bonus, per-fragment translation would
let you partially translate even new zones.  E.g., you could translate
America/Indiana/NewZone into a hybrid Америка/Индиана/NewZone, which is
probably better than the bare thing.  (At least it sorts correctly.)
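To make the per-fragment idea concrete, here is a minimal sketch in
Python.  The CATALOG dictionary and the translate_zone_id function are
hypothetical names made up for this illustration; a real setup would
look fragments up through gettext with a message context rather than in
a hard-coded table.  The context used here is simply the parent path of
the fragment, which is one possible way to disambiguate collisions.

```python
# Hypothetical catalog keyed by (context, fragment).  The context is the
# parent path of the fragment, "" for the top-level component.
CATALOG = {
    ("", "America"): "Америка",
    ("America", "Indiana"): "Индиана",
}


def translate_zone_id(zone_id, catalog=CATALOG):
    """Translate each '/'-separated fragment; keep unknown ones as-is."""
    parts = zone_id.split("/")
    translated = []
    for index, part in enumerate(parts):
        context = "/".join(parts[:index])
        # Fall back to the untranslated fragment when there is no entry.
        translated.append(catalog.get((context, part), part))
    return "/".join(translated)


print(translate_zone_id("America/Indiana/NewZone"))
```

A zone that nobody has translated yet still gets its known prefixes
translated, and a completely unknown identifier passes through
unchanged.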

So this can be solved, and is being solved, outside the zoneinfo
project: there are existing Open Source translation efforts for time
zone strings in particular.  Of course this is only for presentation
purposes.  You don't translate the contents of $TZ, much like you don't
translate program option names or library function names.

gettext might be kinda sorta used for translating abbreviations as well.
One would need to mangle the abbreviation, e.g. "Europe/Prague//CET", to
construct a unique string, which is admittedly quite awkward.  (Also,
glibc wouldn't know to do it when formatting localized date strings.)
Then you could pass this to gettext and get back "SEČ" as you expect.
Scripts could be written to sort the abbreviations into pools of
equivalent entries, so that "Europe/Prague//CET" doesn't have to be
translated separately from "Europe/Bratislava//CET" etc.
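A rough sketch of the mangling and pooling, in Python.  Everything here
is an assumption made for illustration: the function names, the sample
data, and above all the pooling criterion.  I group entries that share
both the abbreviation and the UTC offset, which is just one plausible
definition of "equal"; real scripts might need something stricter.

```python
from collections import defaultdict

# Illustrative sample: (zone ID, abbreviation, UTC offset in seconds).
SAMPLE = [
    ("Europe/Prague", "CET", 3600),
    ("Europe/Bratislava", "CET", 3600),
    ("America/Chicago", "CST", -21600),
    ("Asia/Shanghai", "CST", 28800),
]


def abbreviation_msgid(zone_id, abbreviation):
    """Build the mangled, unique string to hand to gettext."""
    return f"{zone_id}//{abbreviation}"


def pool_abbreviations(entries):
    """Group msgids by (abbreviation, offset); assumed pooling criterion."""
    pools = defaultdict(list)
    for zone_id, abbreviation, offset in entries:
        pools[(abbreviation, offset)].append(
            abbreviation_msgid(zone_id, abbreviation))
    return dict(pools)


pools = pool_abbreviations(SAMPLE)
for key, msgids in pools.items():
    print(key, msgids)
```

With this criterion the two CET entries land in one pool and need a
single translation, while the two unrelated meanings of CST stay apart.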

[1] Scripts for this are available:
    https://github.com/pmachata/zoneinfo-localization

> The GLIBC library of GNU/Linux currently has a function nl_langinfo,
> which gives programmers an easy way to localize pretty much all time
> and date related data elements... but not timezone names or
> abbreviations. The GNU/Linux coreutil commandline 'date +%Z' command
> will return a timezone abbreviation, but seemingly never localized. I
> see no option in the GNU/Linux coreutil commandline 'date' command to
> return a timezone name.

That's because the TZ identifier is not stored in the zoneinfo file.
Doing so would prevent us from hardlinking identical zones to save disk
space.  (Though I don't know whether this is the reason the TZID is
absent from the file, or whether the hardlinking trick is a consequence
of its absence.)
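If you want to see whether two zone files on your system are hardlinked
in this way, something like the following Python sketch should do.  The
function name are_hardlinked is made up for this example, and whether
any given pair of zones is actually hardlinked depends on how your
distribution installs tzdata.

```python
import os


def are_hardlinked(path_a, path_b):
    """Return True if both paths name the same inode on the same device."""
    # lstat() does not follow symlinks, so a symlink to a file is not
    # reported as a hardlink of that file.
    stat_a = os.lstat(path_a)
    stat_b = os.lstat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)
```

Two names for one inode share the same bytes, so there is no room in
the file for either name to be recorded as "the" identifier.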

Furthermore, the time and date functions just look into /etc/localtime,
which is a _copy_ of the zoneinfo file for your time zone.  It's a copy
for historical reasons: zoneinfo would be installed into /usr/share, and
/usr could be separately mounted, and you need correct time even if the
mount fails, or before it happens.  I'm not sure how relevant this is
these days.  In Fedora 17 in particular, /usr is now the canonical
location for system files; /bin, /lib*, etc. are just symlinks into
/usr.  But in any case, "date" and glibc simply don't know, in general,
what the zone ID is.  All they have is /etc/localtime.
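To illustrate why the zone ID can't be recovered in general, here is a
Python sketch that guesses it by comparing the bytes of a localtime file
against every file under a zoneinfo tree.  The function name
candidate_zone_ids is hypothetical, and both paths are parameters
because their locations are an assumption about your system.  This is a
heuristic, not something "date" or glibc does.

```python
import os


def candidate_zone_ids(localtime_path, zoneinfo_root):
    """List zone IDs whose file content equals that of localtime_path."""
    with open(localtime_path, "rb") as handle:
        wanted = handle.read()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(zoneinfo_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                with open(full_path, "rb") as handle:
                    same = handle.read() == wanted
            except OSError:
                continue  # unreadable entry, skip it
            if same:
                relative = os.path.relpath(full_path, zoneinfo_root)
                matches.append(relative.replace(os.sep, "/"))
    return sorted(matches)
```

Because identical zones share identical bytes, this can return several
candidates, and nothing in the file says which one the administrator
originally picked.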

Thanks,
PM