[tz] Error in Scandinavian tz data

Stephen Colebourne scolebourne at joda.org
Tue Jun 27 11:09:12 UTC 2023


On Tue, 27 Jun 2023 at 10:31, Paul Eggert <eggert at cs.ucla.edu> wrote:
> On 2023-06-27 01:35, Stephen Colebourne via tz wrote:
> > The simplest approach would be to determine a rule, eg.
> >   - take the standard offset that applied on most days (modal average)
> > for the 5 years from 1970 to 1975 - and use that pre-1970.
> > - or, take the LMT of the city and round to the nearest hour
> > So long as the answer is reasonable for most cases, it would be fine.
>
> This would take the existing TZif files (admittedly problematic, as you
> say) and make them worse, as they'd become wrong for every location,
> even the location that names the Zone.

Being slightly wrong everywhere is a much better outcome than what we
have today.


> Surely it would be better to discard the pre-1970 data - then users
> would be on notice that it's missing. And there's a standard way to do
> that, documented in the Makefile: use 'make ZFLAGS=-r@0'. Perhaps this
> option should be documented more prominently.
>
> It's not clear that -r@0 should be the Makefile default, though, as that
> could well cause more trouble than it would cure. For example, it would
> cause the following behavior:
>
>         $ export TZ=Europe/Copenhagen
>         $ date -r 1; date -r 0; date -r -1
>         Thu Jan  1 01:00:01 CET 1970
>         Thu Jan  1 01:00:00 CET 1970
>         Wed Dec 31 23:59:59 -00 1969
>
> and the UT offset zero and abbreviation -00 of pre-1970 timestamps would
> likely give many users pause. That being said, in installations not
> needing pre-1970 timestamps, -r@0 is a clear win.

In most cases, end users do not pick what options to install with.
They just get what they are given by their packager.

Since the packager cannot know whether the end user wants pre-1970
data or not, a sensible packager will err on the side of providing
more data, not less, and thus want to include pre-1970 data.

'make ZFLAGS=-r@0' is of no interest to packagers precisely because it
is obviously wrong, i.e. choosing UTC does not make people think, it
just means the make option goes unused. It is simply not good enough
to be a viable choice. It is also not close enough to any values that
have previously been placed in long-term storage.

Hopefully all this explains why "Surely it would be better to discard
the pre-1970 data - then users would be on notice that it's missing"
(ZFLAGS=-r@0) isn't a viable route forward.


The rule-based truncation I outlined above is a compromise position
suitable for use as the default in the Makefile. I believe it meets
the needs of packagers, who would hopefully accept it. It provides
truncated data pre-1970, but in a way that is not completely
unreasonable:
* It is good enough for most use cases except those that really care
about historical detail
* It avoids the weird per-second LMT offsets in the far past that
often confuse end users
* It is close enough to what end users have in long-term storage to
not cause migration issues
Just returning UTC does not meet these goals.
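
To make the two candidate rules concrete, here is a rough Python sketch.
It is only an illustration of the idea, not something zic or the Makefile
does today. The function names, the std_offset_on callback and the sample
zone are all made up for the example; offsets are in seconds east of UT.

```python
import math
from collections import Counter
from datetime import date, timedelta


def modal_standard_offset(std_offset_on,
                          start=date(1970, 1, 1),
                          end=date(1975, 1, 1)):
    """Rule 1: the standard offset in effect on the most days in [start, end).

    std_offset_on is a hypothetical callback mapping a date to the
    zone's standard offset (seconds east of UT) on that date.
    """
    counts = Counter()
    day = start
    while day < end:
        counts[std_offset_on(day)] += 1
        day += timedelta(days=1)
    return counts.most_common(1)[0][0]


def lmt_rounded_to_hour(lmt_seconds):
    """Rule 2: round an LMT offset to the nearest whole hour.

    Exact half-hour ties round toward the larger (more easterly) offset.
    """
    return math.floor(lmt_seconds / 3600 + 0.5) * 3600


# Made-up zone: standard time was UT+1 until the end of 1971, UT+2 after.
def sample_zone(day):
    return 3600 if day < date(1972, 1, 1) else 7200


pre_1970_by_rule_1 = modal_standard_offset(sample_zone)

# An LMT offset of 0:50:20 (3020 s), used here purely as sample input.
pre_1970_by_rule_2 = lmt_rounded_to_hour(50 * 60 + 20)
```

For the made-up zone, UT+2 applies on more days of 1970-1975 than UT+1,
so rule 1 would use UT+2 for everything before 1970. Rule 2 turns the
0:50:20 sample into a clean one-hour offset.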


> it's likely wrong for Copenhagen

I think you do a disservice to TZDB's many authors here. I'd argue
that the data for Copenhagen is likely to be entirely correct, as it
has had many eyes on it for many years.

Beyond that, I think there is also a need to recognise that TZDB's
pre-1970 data is the de facto truth for large parts of the world. Most
end users don't care about the accuracy, just that someone has made an
effort to record it. I still believe that reinstating the data would
be by far the best outcome, but a rule-based truncation approach
would be a viable alternative if pre-1970 data is not something to be
maintained.

Stephen


