[tz] Pre-1970 data

Sat Nov 6 07:54:24 UTC 2021

On Fri, Nov 05, 2021 at 02:53:23PM -0700, Brian Park via tz wrote:
> On Fri, Nov 5, 2021 at 12:01 PM Brian Park <brian at xparks.net> wrote:
> 
> > I agree that it is conceptually cleaner if the Core TZDB identifiers were
> > internal only. But I understand that some people would consider ISO-country
> > identifiers to be out of scope of this project, although there are many ad
> > hoc ones currently in the database. I think a file like 'countryzone'
> > should be added only if there are people willing to maintain such a list.
> > It may need to be a separate project, to avoid forcing the TZ Coordinator
> > to pick up the slack if those maintainers drop off.
> >
> 
> Following up my own post, I took an initial stab at what this 'countryzone'
> file would look like, and immediately ran into problems that convince me
> that this does *not* belong in the TZDB project. The scope seems too large,
> so it seems better as a separate project.
> 
> I started from an ISO-3166 CSV file (see
> https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes for a human
> readable version), and I found:
> 
> 1) Many country names are too long to fit into 14 characters. Let's say we
> relax that constraint because we deprecate support for any old Unix system
> that cannot support these longer file names. But there are countries like
> "Heard Island and McDonald Islands", "South Georgia and the South Sandwich
> Islands", and "United States Minor Outlying Islands", and "British Indian
> Ocean Territory". Just from an ergonomics perspective, we should find a way
> to shorten these very long names.
> 
> 2) If we shorten some countries, like "Bosnia and Herzegovina" to just
> "Bosnia" for convenience, are we going to offend people? I don't know
> anyone from Bosnia and Herzegovina, so I have no idea. Each country that we
> shorten needs to be researched carefully.
> 
> 3) At least 5 countries have non-ASCII characters in their ISO names: "Côte
> d'Ivoire", "Curaçao", "Åland Islands", "Saint Barthélemy", "Réunion".
> Personally, I would like to use only ASCII characters because they are the
> lowest common denominator that is guaranteed to work, outside of mainframes
> using EBCDIC. If we remove these non-ASCII characters, are we going to
> offend the people of those countries, even though these are supposed to be
> English versions of their country names?

This also brings up the question of why any of the subregion
identifiers should be included at all. They are not countries, and I
find it hard to defend that Jan Mayen (population: 4 (scientists on the
weather station)) should have its own time zone when a US state
such as Texas shouldn't.
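
For what it's worth, the 14-character and ASCII-only constraints from
points 1) and 3) of the quoted message are easy to check mechanically.
A rough Python sketch, not anything that exists in TZDB: the sample
list, MAX_LEN, and the helper names to_ascii, to_identifier and
too_long are all made up for illustration, and replacing spaces with
underscores is only an assumption about how such names would be formed.

```python
import unicodedata

# Hypothetical sample: ISO 3166 English names mentioned in the quoted message.
ISO_NAMES = [
    "Bosnia and Herzegovina",
    "Côte d'Ivoire",
    "Curaçao",
    "Åland Islands",
    "Saint Barthélemy",
    "Réunion",
    "Heard Island and McDonald Islands",
]

MAX_LEN = 14  # file name component limit referred to in point 1)


def to_ascii(name):
    """Decompose accented letters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_identifier(name):
    """ASCII-fold the name and replace spaces with underscores (assumed style)."""
    return to_ascii(name).replace(" ", "_")


def too_long(name):
    """True if the resulting identifier exceeds MAX_LEN characters."""
    return len(to_identifier(name)) > MAX_LEN


if __name__ == "__main__":
    for name in ISO_NAMES:
        flag = "TOO LONG" if too_long(name) else "ok"
        print(f"{to_identifier(name):40} {flag}")
```

This only answers the mechanical part. Whether a folded or shortened
name is acceptable to the people concerned still has to be decided
case by case.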

/MF