[tz] WSJ follows AP to Kyiv

Guy Harris guy at alum.mit.edu
Tue Nov 19 23:22:01 UTC 2019


On Nov 19, 2019, at 2:31 PM, Bryan J Smith <b.j.smith at ieee.org> wrote:

> So this is yet another consideration, because I understand it (again, insert possible ignorance on my part) ... 
>  - Linguistical:  Anglicized
>  - Technical:  Historically UTF-7 US/NIST ASCII (again, circa '86)

The items in theory.html that might correspond to those would presumably be:

* Use only valid POSIX file name components (i.e., the parts of names other than '/'). Do not use the file name components '.' and '..'. Within a file name component, use only ASCII letters, '.', '-' and '_'. Do not use digits, as that might create an ambiguity with POSIX TZ strings. A file name component must not exceed 14 characters or start with '-'. E.g., prefer Asia/Brunei to Asia/Bandar_Seri_Begawan. Exceptions: see the discussion of legacy names below.

and

* Use mainstream English spelling, e.g., prefer Europe/Rome to Europa/Roma, and prefer Europe/Athens to the Greek Ευρώπη/Αθήνα or the Romanized Evrópi/Athína. The POSIX file name restrictions encourage this guideline.
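As a rough illustration of the first guideline, here is a minimal Python sketch that checks each '/'-separated component of a name. The function names `component_problems` and `check_tz_name` are hypothetical and not part of tzdb or its tooling. The sketch deliberately ignores the legacy-name exceptions, so it will flag existing names such as Etc/GMT+5, which contain a digit and '+'.

```python
import string

# Hypothetical checker for the file-name-component guideline quoted above.
# It does NOT model the legacy-name exceptions.
ALLOWED_CHARS = frozenset(string.ascii_letters + ".-_")  # note: no digits
MAX_COMPONENT_LEN = 14


def component_problems(component: str) -> list:
    """Return a list of guideline violations for one component (empty if none)."""
    problems = []
    if component in ("", ".", ".."):
        problems.append("empty, '.' or '..'")
    bad = sorted(set(component) - ALLOWED_CHARS)
    if bad:
        problems.append("disallowed characters: " + "".join(bad))
    if len(component) > MAX_COMPONENT_LEN:
        problems.append(f"longer than {MAX_COMPONENT_LEN} characters")
    if component.startswith("-"):
        problems.append("starts with '-'")
    return problems


def check_tz_name(name: str) -> dict:
    """Map each offending '/'-separated component to its list of problems."""
    report = {}
    for component in name.split("/"):
        problems = component_problems(component)
        if problems:
            report[component] = problems
    return report


for name in ["Asia/Brunei", "Asia/Bandar_Seri_Begawan", "Europe/Київ", "Etc/GMT+5"]:
    print(name, check_tz_name(name) or "OK")
```

Asia/Brunei passes, while Bandar_Seri_Begawan is rejected only for its length, which matches the "prefer Asia/Brunei" example in the guideline.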

We should probably *exclude* UTF-7, as its "+{modified-Base64-encoded UTF-16}-" sequences 1) are ugly and 2) make it more likely that we'd hit the 14-character limit.  Admittedly, the 14-character limit is *probably* not an issue on most UN*Xes and isn't an issue on modern Windows ("modern" as in "post-1995" :-)), but we may want to keep it for compatibility with any other software that has that limit wired in.
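To make the length point concrete, the following sketch uses Python's built-in "utf-7" codec (RFC 2152) to encode a few candidate components and compares the results with the 14-character limit. The helper name `utf7_component` is made up for this example. Note also that an encoded run starts with '+' and can contain Base64 characters such as digits and '/', none of which the guideline allows (and '/' would even split the component).

```python
# Encode candidate name components with Python's built-in "utf-7" codec
# (RFC 2152) and compare the encoded length with the 14-character limit.
MAX_COMPONENT_LEN = 14


def utf7_component(name: str) -> str:
    """Return the UTF-7 form of a name component (UTF-7 output is pure ASCII)."""
    return name.encode("utf-7").decode("ascii")


for name in ["Kyiv", "Київ", "Киев", "Αθήνα"]:
    encoded = utf7_component(name)
    fits = len(encoded) <= MAX_COMPONENT_LEN
    print(f"{name} -> {encoded} ({len(encoded)} chars, fits: {fits})")
```

"Kyiv" passes through unchanged, the four-letter "Київ" grows to close to the limit, and the five-letter "Αθήνα" already exceeds it.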

So, historically, it's the "POSIX portable character set":

	https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html

and probably "printable characters in the POSIX portable character set, and no spaces, either".

I'll leave it up to Paul to explain why we prefer mainstream English spelling even when the native spelling (e.g., "Roma") can be written in printable ASCII.
 
> The former addresses ... 
>  - Roma, which fits in UTF-7

That's not a component of a former tzdb identifier!  We *never* used Europe/Roma; under the new naming scheme it was *always* Europe/Rome.  (I'm not sure what, if anything, we did for Italy in the old naming scheme; there's nothing for it in the "backward" file, unlike (the Republic of) Ireland, for which we now have Europe/Dublin and previously had Eire.)

Calcutta *is* a component of a former identifier (Asia/Calcutta, now Asia/Kolkata); both it *and* Kolkata fit in printable ASCII.

> Plus, now ... 
>  - Kyiv, which fits in UTF-7

Or, more to the point, fits in printable ASCII.  We could fit both Київ and Киев into UTF-7, but not into printable ASCII.  We can, however, fit both Kyiv and Kiev into printable ASCII, so the *character encoding required* is not an issue when it comes to "Kyiv" vs. "Kiev".
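The looser criterion mentioned earlier, printable characters in the POSIX portable character set with no spaces, amounts to the ASCII range '!' (0x21) through '~' (0x7E). A minimal sketch of that check (the function name is hypothetical) shows that the encoding question does not distinguish Kyiv from Kiev, while both Cyrillic spellings fail it even though UTF-7 can represent them.

```python
def is_printable_ascii_no_space(s: str) -> bool:
    """True if every character is printable ASCII other than space (0x21-0x7E)."""
    return all("!" <= ch <= "~" for ch in s)


for name in ["Kyiv", "Kiev", "Київ", "Киев"]:
    # UTF-7 can represent all four names; printable ASCII is the real constraint.
    utf7 = name.encode("utf-7")
    print(name, is_printable_ascii_no_space(name), utf7)
```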

> But we could clarify further, historically.  So maybe it's time to put something, even if just further clarifications, in theory.html, as this comes up and I could see it this way, as others do.
> 
> Then, _after_ that, only could the project (not saying so) start the debate of (and this is just what I could think) ... 
> 
>  A)  If it fits in UTF-7, should we allow it to be changed?

Only if it also fits in printable ASCII.  If it requires anything outside of printable ASCII, no.

>  B)  If it fits in some Latin set, will it be allowed to be changed?

See previous answer.

> I.e., I don't see this as Kyiv only, but being a project-wide change to Roma, possibly others ... and then the 'rolls downhill' after that.

"Rome" vs. "Roma" is not equivalent to "Kyiv" vs. "Kiev".  "Kyiv" vs. "Kiev" is closer to "Kolkata" vs. "Calcutta".

We use "Rome" rather than "Roma" because we want "mainstream English spelling".

"Kyiv" and "Kiev" *are* romanized versions of "Київ" and "Киев", respectively.  Neither "Київ" nor "Киев" is "mainstream English spelling"; the question is whether "Kyiv" or "Kiev" is the "mainstream English spelling" of the city's name.  It appears that "Kyiv" has become more mainstream over time.

The "mainstream English spelling" of a city's name in a country where English is not a (the?) primary language doesn't necessarily match the native-language name.  "Berlin" is "Berlin", but "München" is "Munich" and "Москва́" is "Moscow".  Perhaps, in some alternate world, the English-language name of the capital of Ukraine could have been "Kiv", which is a transliteration of neither the Ukrainian name nor the Russian name; in that case, the tzdb identifier would have been "Europe/Kiv".
