[tz] Error in Scandinavian tz data

Paw Boel Nielsen paw at boel-it.dk
Tue Jun 27 11:13:10 UTC 2023


> Beyond that, I think there is also a need to recognise that TZDB's
> pre-1970 data is the de facto truth for large parts of the world. 
> Most end users don't care about the accuracy, just that someone 
> has made an effort to record it. I still believe that reinstating the 
> data would be by far the best outcome, but a rules-based truncation
> approach would be a viable alternative if pre-1970 data is not 
> something to be maintained.

Hear hear


-----Original Message-----
From: tz <tz-bounces at iana.org> On Behalf Of Stephen Colebourne via tz
Sent: Tuesday, June 27, 2023 1:09 PM
To: Time zone mailing list <tz at iana.org>
Subject: Re: [tz] Error in Scandinavian tz data

On Tue, 27 Jun 2023 at 10:31, Paul Eggert <eggert at cs.ucla.edu> wrote:
> On 2023-06-27 01:35, Stephen Colebourne via tz wrote:
> > The simplest approach would be to determine a rule, eg.
> > - take the standard offset that applied on most days (modal
> >   average) for the 5 years from 1970 to 1975 - and use that pre-1970.
> > - or, take the LMT of the city and round to the nearest hour
> > So long as the answer is reasonable for most cases, it would be fine.
>
> This would take the existing TZif files (admittedly problematic, as
> you say) and make them worse, as they'd become wrong for every
> location, even the location that names the Zone.

Being slightly wrong everywhere is a much better outcome than what we have today.


> Surely it would be better to discard the pre-1970 data - then users 
> would be on notice that it's missing. And there's a standard way to do 
> that, documented in the Makefile: use 'make ZFLAGS=-r @0'. Perhaps this
> option should be documented more prominently.
>
> It's not clear that -r @0 should be the Makefile default, though, as
> that could well cause more trouble than it would cure. For example, it 
> would cause the following behavior:
>
>         $ export TZ=Europe/Copenhagen
>         $ date -r 1; date -r 0; date -r -1
>         Thu Jan  1 01:00:01 CET 1970
>         Thu Jan  1 01:00:00 CET 1970
>         Wed Dec 31 23:59:59 -00 1969
>
> and the UT offset zero and abbreviation -00 of pre-1970 timestamps 
> would likely give many users pause. That being said, in installations 
> not needing pre-1970 timestamps, -r @0 is a clear win.
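
For readers following the quoted output, here is a minimal Python sketch of the lookup behaviour that truncation at timestamp 0 implies. It does not read TZif files or invoke zic. The function name, the constants and the fixed CET offset are assumptions made for illustration, and DST is ignored.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for a zone truncated at the epoch (not real tzdb data):
# at or after timestamp 0 use CET (+01:00); before it the local time is
# unspecified, which shows up as UT offset zero with the abbreviation "-00".
CUTOFF = 0
POST_CUTOFF = (timedelta(hours=1), "CET")
UNKNOWN = (timedelta(0), "-00")

def format_truncated(ts: int) -> str:
    """Format a Unix timestamp the way a zone truncated at CUTOFF would."""
    offset, abbr = POST_CUTOFF if ts >= CUTOFF else UNKNOWN
    local = datetime(1970, 1, 1) + timedelta(seconds=ts) + offset
    return local.strftime("%Y-%m-%d %H:%M:%S ") + abbr

for ts in (1, 0, -1):
    print(format_truncated(ts))
```

The one-hour jump between timestamps -1 and 0 is the discontinuity under discussion.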

In most cases, end users do not pick what options to install with.
They just get what they are given by their packager.

Since the packager cannot know whether the end user wants pre-1970 data or not, a sensible packager will err on the side of providing more data, not less, and thus want to include pre-1970 data.

'make ZFLAGS=-r @0' is of no interest to packagers precisely because it is obviously wrong, i.e. choosing UTC does not make people think; it just means the make option is not used. It is simply not good enough to be a viable choice. It is also not close enough to any values that have previously been placed in long-term storage.

Hopefully all this explains why "Surely it would be better to discard the pre-1970 data - then users would be on notice that it's missing"
(ZFLAGS=-r @0) isn't a viable route forward.


The rule-based truncation I outlined above is a compromise position suitable for use as the default in the makefile. I believe it meets the needs of packagers who would hopefully accept it. It provides truncated data pre-1970, but in a way that is not completely unreasonable:
* It is good enough for most use cases except those that really care about historical detail
* It avoids the weird per-second LMT offsets in the far past that often confuse end users
* It is close enough to what end users have in long-term storage to not cause migration issues

Just returning UTC does not meet these goals.
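
The two candidate rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the daily sample is made up rather than taken from tzdb, and the Copenhagen LMT of +0:50:20 is quoted from memory of the Zone line.

```python
from collections import Counter

def modal_offset(daily_offsets):
    """Rule 1: the standard offset (seconds) that applied on the most days."""
    return Counter(daily_offsets).most_common(1)[0][0]

def round_lmt_to_hour(lmt_seconds):
    """Rule 2: round an LMT offset (seconds) to the nearest whole hour.

    Exact half hours round away from zero; this tie-break is an assumption.
    """
    sign = -1 if lmt_seconds < 0 else 1
    return sign * ((abs(lmt_seconds) + 1800) // 3600) * 3600

# Made-up sample: one standard offset per day for 1970-1974 (1826 days),
# with a hypothetical change of standard time near the end of the window.
sample = [3600] * 1500 + [7200] * 326
print(modal_offset(sample))               # offset to use pre-1970 under rule 1

copenhagen_lmt = 50 * 60 + 20             # +0:50:20, assumed
print(round_lmt_to_hour(copenhagen_lmt))  # offset to use pre-1970 under rule 2
```

In this made-up case both rules give +01:00, which is the sense in which the result would be close to what is already in long-term storage.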


> it's likely wrong for Copenhagen

I think you do a disservice to TZDB's many authors here. I'd argue that the data for Copenhagen is likely to be entirely correct, as it has had many eyes on it for many years.

Beyond that, I think there is also a need to recognise that TZDB's
pre-1970 data is the de facto truth for large parts of the world. Most end users don't care about the accuracy, just that someone has made an effort to record it. I still believe that reinstating the data would be by far the best outcome, but a rules-based truncation approach would be a viable alternative if pre-1970 data is not something to be maintained.

Stephen

