[tz] Dealing with Pre-1970 Data
guy at alum.mit.edu
Sun Sep 1 01:18:43 UTC 2013
On Aug 31, 2013, at 4:39 PM, Stephen Colebourne <scolebourne at joda.org> wrote:
> On 31 August 2013 22:06, Paul Eggert <eggert at cs.ucla.edu> wrote:
>> Lester Caine wrote:
>>> having a single repository for ALL available data should be a goal?
>> Sure, it's a worthy goal, even if pre-1970 timestamps are
>> currently out of scope for the current database. We could
>> collect all the data that we can for an extended database
>> that contains new zones that differ from existing ones only
>> for pre-1970 timestamps. We could then derive the current
>> database by applying a filter to the extended database,
>> along the lines that Zefram suggested. This filtering could
>> be done automatically and at the source level, so existing
>> tz source file readers would not need to be changed, and we
>> wouldn't have to maintain two copies of the database.
> I can see no reason why an optional additional file "extended" could
> not exist for new zone IDs that only exist to record history before
> 1970. Such additional zones would not appear in zone.tab. Most people
> would just ignore the "extended" file.
> However, I would argue that any zone ID that already exists (or is
> newly created) should have its full pre-1970 history retained and
> enhanced within the main tzdb files, so all current consumers simply
> pickup the enhancements.
I.e., grandfather in *existing* tzids that exist only to record history before 1970, leaving them in the non-"extended" part of the database, but don't add *new* such tzids to the non-"extended" part of the database?
> Filtering data applies to database consumers via zic, not the database
> itself. The database itself should not be limited to storing data
> after 1970. If someone makes a contribution to an existing ID before
> 1970, that data should be included in the main files. Whether that
> contribution causes a Link to become a full Zone should never be a
> relevant factor.
I can see at least two ways of handling this:
1) have the database include the full collection of tzids wherein if two regions had different standardized time at any point, they get separate tzids, have a winnowing process that can be instructed to winnow out regions that had the same standardized time subsequent to some cutoff date (e.g., January 1, 1970), and allow consumers of the database to do whatever winnowing they choose, including none;
2) have an "extended" database for any *new* cases where we split a region due to pre-1970 differences, preserve the existing such splits, and not have a winnowing process, just a choice, on the part of consumers of the database, to use the "extended" file or not.
("Consumer" here doesn't mean "end user"; I suspect most end users don't know enough to care and may not even want to care - they're busy dealing with other problems. "Consumer" here refers to software that packages the database and uses it.)
Both of those preserve all the historical information we have, and leave it up to the consumer to decide whether they care at all about pre-1970 differences or not. The first doesn't let the consumer (again, as defined above) choose to include some but not all zone splits due to pre-1970 differences (in particular, it doesn't let the consumer preserve the existing splits); the second doesn't let the consumer choose to include none of them.
The first suggestion is what I think Zefram was suggesting; the second suggestion sounds like what you're suggesting.
Either suggestion would allow full support for pre-1970 standardized times (to the extent that any given release of the database has information about those times) to be provided by a consumer.
The first one allows a consumer to winnow out all tzids that are not of interest if they don't view conversion of pre-1970 times as something they want to put significant effort into supporting (I suspect that a number of UN*Xes might choose that option), reducing the number of options to present to whoever's configuring the system to use a particular tzid.
The second one doesn't require choosing between tossing out all splits due to pre-1970 differences and tossing out none of them.
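To make the winnowing in the first approach concrete, here is a rough Python sketch. It is only an illustration under assumptions of my own, not anything zic or the tz code does today: each zone is reduced to a sorted list of (UTC timestamp, UTC offset in seconds) pairs, the zone names and offsets are made up, and the "keep the alphabetically first name" rule is just a placeholder for whatever policy would really pick the surviving tzid.

```python
from bisect import bisect_right
from collections import defaultdict


def signature_after(transitions, cutoff):
    """Return the offset history from `cutoff` onward as a hashable tuple.

    `transitions` is a sorted list of (unix_time, utc_offset_seconds).
    Assumption: the first entry is also in effect before its own timestamp.
    """
    times = [when for when, _ in transitions]
    i = bisect_right(times, cutoff)           # transitions at or before cutoff
    current = transitions[max(i - 1, 0)][1]   # offset in effect at the cutoff
    sig = [(cutoff, current)]
    for when, offset in transitions[i:]:
        if offset != sig[-1][1]:              # skip no-op transitions
            sig.append((when, offset))
    return tuple(sig)


def winnow(zones, cutoff=None):
    """Map each surviving tzid to the tzids folded into it.

    cutoff=None means "no winnowing": every zone survives on its own.
    """
    if cutoff is None:
        return {name: [] for name in sorted(zones)}
    groups = defaultdict(list)
    for name in sorted(zones):
        groups[signature_after(zones[name], cutoff)].append(name)
    # Placeholder policy: the alphabetically first name in a group survives.
    return {names[0]: names[1:] for names in groups.values()}


# Hypothetical zones: A and B differ only before 1970; C differs after it.
ZONES = {
    "Region/A": [(-2000000000, 3600), (100000000, 7200)],
    "Region/B": [(-2000000000, 3000), (-500000000, 3600), (100000000, 7200)],
    "Region/C": [(-2000000000, 3600), (200000000, 7200)],
}

if __name__ == "__main__":
    print(winnow(ZONES, cutoff=0))   # cutoff of January 1, 1970 UTC
    print(winnow(ZONES))             # no winnowing at all
```

With a cutoff of 0 (January 1, 1970 UTC), Region/B folds into Region/A and Region/C stays separate; with no cutoff, all three survive. Note that this gives the all-or-nothing behavior described above: a single cutoff cannot keep some pre-1970 splits while dropping others.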
(What reasons, if any, exist for removing tzids that exist due to differences in pre-1970 standardized time? Is that intended to reduce the number of zones to present to someone configuring the system if your system doesn't make a vigorous effort to support those times, and to reduce the effort needed to maintain them? Is it intended to be consistent with a rule forbidding adding *new* tzids due to differences in pre-1970 standardized time? Is it intended to reduce political disputes over tzids - presumably by reducing the number of tzids and thus the number of possible points of complaint, as there's nothing particularly politically interesting about January 1, 1970, as far as I can tell?
I went back through the archive and, if the controversial changes are the ones from the "Move links to 'backward' if they exist only because of country codes." message, I'm probably missing something, as that doesn't appear to be discarding existing pre-1970 splits, at least from the description.)