[tz] localtime_r multiple times slower for Europe/Moscow timezone
gharris at sonic.net
Wed Jan 11 08:07:07 UTC 2023
On Jan 10, 2023, at 6:40 PM, Paul Eggert via tz <tz at iana.org> wrote:
> This is a known performance issue in glibc; see:
This dates back to the addition of 64-bit time support; in the discussion, Robbin Kawabata suggested:
> 1. This is another idea for supporting dates into the far future.
> Is it feasible for zic to encode variable information in the data file
> for the last set of rules, that would be used for times past the last
> entry in the transition table? Then localtime() would use the variables
> algorithmically rather than using table-driven data, for dates past the
> last table entry. Perhaps set a minimum size for the table entries
> (ie, have table entries at least up to year xxxx.)
> zic would effectively map the last set of transition rules of an Olson
> timezone to an equivalent POSIX timezone.
where the sequence of transitions stretching into the future is compressed by providing data for an algorithm to compute them.
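To illustrate what "providing data for an algorithm" means in practice: a POSIX-style rule of the form Mm.w.d (for example, M3.2.0, the second Sunday of March) can be turned into a calendar date with a little arithmetic. The following is only a sketch; the function names are mine and are not taken from zic or from any libc.

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only. */
static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int days_in_month(int y, int m)
{
    static const int dim[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return dim[m - 1] + (m == 2 && is_leap(y));
}

/* Sakamoto's method for the Gregorian calendar; returns 0 for Sunday. */
static int day_of_week(int y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Day of the month selected by a POSIX "Mm.w.d" rule in the given year.
   week is 1..5, where 5 means "the last such weekday of the month";
   wday is 0..6, where 0 is Sunday. */
static int mwd_day_of_month(int year, int month, int week, int wday)
{
    assert(month >= 1 && month <= 12);
    assert(week >= 1 && week <= 5);
    assert(wday >= 0 && wday <= 6);

    int first_dow = day_of_week(year, month, 1);
    int day = 1 + (wday - first_dow + 7) % 7;   /* first occurrence of wday */
    day += (week - 1) * 7;
    if (day > days_in_month(year, month))       /* only possible for week 5 */
        day -= 7;
    return day;
}
```

For 2023, mwd_day_of_month(2023, 3, 2, 0) gives 12 (March 12) and mwd_day_of_month(2023, 10, 5, 0) gives 29 (October 29), so no table entry is needed for any year the rule covers.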
Unless I missed a mail message in my archive search, we ended up going with
> As before, zic writes a second instance of headers and data to time zone files;
> the second instance has eight-byte transition times to cover far-future
> (and far past) cases. Zic also puts a newline-enclosed POSIX-style time zone
> string at the end of the file when possible (or, when a zone can't be
> represented using POSIX, puts a newline-enclode empty string at the end of the
> file). (Enclosing the string in newlines makes for meaningful output from the
> "tail -1" command applied to time zone files.) When a POSIX-style string is
> available, zic does *not* write 400 years worth of data.
> The files that don't have a POSIX string at the end are:
> For zones such as America/Godthab, we use the previous dodge of writing 400
> years worth of data to the time zone data file and then working modulo 400
> in localtime.
("enclode" is a typo for "enclosed"; Arthur later sent out a message with the subject "yet another try at 64-bit changes", in which the typo is fixed.)
The problem appears to be that, if the tzdb region's file 1) has no transitions past the time being converted and 2) has a POSIX TZ string, then localtime_r() and localtime() will, as per the above, parse the POSIX TZ string.
If the results of the first parse of the TZ string are saved and are reused for all subsequent conversions, this won't be too bad.
If they are not, and you're converting a lot of times with the same tz setting, you're going to parse the same string over and over again, which seems a bit costly.
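As a sketch of what "saving the results" could look like (this is not glibc's code, and every name below is hypothetical): keep the last string parsed alongside its parsed form, and reparse only when the string changes. To keep the example short the parser handles only the simplest "STDoffset" form, such as "MSK-3", and the cache is not thread-safe.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a POSIX TZ string. */
struct parsed_tz {
    char std_name[8];
    long utc_offset;            /* seconds east of UTC */
};

static unsigned long parse_calls;   /* counts real parses, for illustration */

/* Toy parser: alphabetic name of 3+ letters followed by whole hours.
   POSIX offsets are positive west of Greenwich, hence the negation. */
static int parse_posix_tz(const char *tz, struct parsed_tz *out)
{
    size_t n = 0;
    char *end;

    parse_calls++;
    while (isalpha((unsigned char)tz[n]) && n < sizeof out->std_name - 1) {
        out->std_name[n] = tz[n];
        n++;
    }
    out->std_name[n] = '\0';
    if (n < 3)
        return -1;
    long hours = strtol(tz + n, &end, 10);
    if (end == tz + n || *end != '\0')
        return -1;
    out->utc_offset = -hours * 3600L;
    return 0;
}

/* One-entry cache keyed on the TZ string itself. */
static char cached_str[64];
static struct parsed_tz cached_tz;
static int cache_valid;

static const struct parsed_tz *get_parsed_tz(const char *tz)
{
    struct parsed_tz fresh;

    assert(tz != NULL);
    if (cache_valid && strcmp(tz, cached_str) == 0)
        return &cached_tz;      /* hit: no reparse */
    if (strlen(tz) >= sizeof cached_str || parse_posix_tz(tz, &fresh) != 0)
        return NULL;
    strcpy(cached_str, tz);
    cached_tz = fresh;
    cache_valid = 1;
    return &cached_tz;
}
```

With this, a thousand calls to get_parsed_tz("MSK-3") cost one parse plus 999 string comparisons; a real implementation would also need locking and would have to handle the full POSIX syntax.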
GNU libc's code does *not* save the results of the parse. Its code to test whether the time is past the last transition's time is
else if (__glibc_unlikely (timer >= transitions[num_transitions - 1]))
which suggests that they assumed that this was an unlikely case. There may have been a time when it was unlikely, but, with the current zic, this will, I think, be the case for any tzdb region that does not currently adjust the clocks, and there are quite a few of them.
I am waiting for Sourceware to give me a Bugzilla account. Once they do, I will point this out to them.