[tz] Support for the IANA database in the Python standard library

Paul Ganssle paul at ganssle.io
Wed Feb 26 02:40:49 UTC 2020


Thanks for the comments, I'll update the proposal in response. My
responses are here:

> * Having global state can lead to problems (even in single-threaded
> apps), and this proposal has global state in the implementation of the
> 'set_tzpath' function. Instead of that function, how about an optional
> argument to the ZoneInfo constructors? The optional argument would
> specify the path for apps that don't want the default path. Then apps
> wouldn't need set_tzpath.

Yes, `set_tzpath` is the part of this proposal that I am most ambivalent
about and also, unsurprisingly, the one I've gotten the most pushback
about. My response to that aspect of it is found in my last reply in
this post:
https://discuss.python.org/t/pep-615-support-for-the-iana-time-zone-database-in-the-standard-library/3468/4

Basically, both the mutable global state and the constructor argument
have undesirable interactions with the caching behavior of the main
constructor (and dropping the caching behavior would cause worse
problems). My position is that you probably /shouldn't/ call set_tzpath
except in specific situations like tests or "call it once in main()
before anything else has run", and that a free function that mutates the
global state accomplishes this better and encourages people not to make
a bunch of calls to the constructor with inconsistent search paths.

I'm amenable to dropping the whole thing, though, TBH.
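
To make the cache interaction concrete, here is a minimal sketch. It does
not use the reference implementation: `ToyZone`, its `tzpath` keyword
argument and both directory names are hypothetical, and the cache is
assumed to be keyed on the zone name alone. Under those assumptions, a
per-call search path is silently ignored whenever the zone is already
cached:

```python
class ToyZone:
    """Hypothetical stand-in for the proposed ZoneInfo; not the real class."""

    _cache = {}  # assumed to be keyed on the zone name only

    def __new__(cls, key, tzpath=("/usr/share/zoneinfo",)):
        # tzpath is only consulted on a cache miss.
        if key not in cls._cache:
            instance = super().__new__(cls)
            instance.key = key
            instance.loaded_from = tuple(tzpath)  # where the data "came from"
            cls._cache[key] = instance
        return cls._cache[key]


a = ToyZone("America/New_York")
# Same zone, different (hypothetical) search path: the cached object wins.
b = ToyZone("America/New_York", tzpath=("/opt/custom/zoneinfo",))
```

Keying the cache on the pair of name and path would avoid the surprise, at
the cost of allowing two distinct objects for the same zone name.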

> * The proposal is silent on the issue of what happens during updates
> to tzdata files while a Python program is running. Should this update
> be reflected immediately in all Python-level calls? Should it wait
> until the next call to ZoneInfo on the affected zone, which means that
> zones are not immutable? Should the update be ignored and the program
> proceed as if the update hasn't happened? Or are all these valid
> implementations of the spec? Whichever you decide, the decision should
> be written down so that developers are put on notice. 

I am going to update this to clarify, but I think this is /mostly/
covered by the caching behavior described in the section on constructors
<https://www.python.org/dev/peps/pep-0615/#id6>, once I make explicit
the assumption that the full time zone data must be eagerly read into
memory at construction (rather than being implemented in terms of system
calls or something of that nature). With that assumption in place, the
answer is that the data is updated whenever a cache miss occurs - the
first time any given zone is constructed or, depending on the
implementation, the first time it is constructed after a previous
version has been ejected from the cache (in the reference
implementation, we use a "strong" LRU cache of 8 zones and an unbounded
"weak" cache, so if you construct 9 zones and hold no references to any
of them, constructing the first one again will be a cache miss, while
constructing any of the other 8 would be a cache hit).
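
The two-level cache described above can be sketched roughly as follows.
This is a toy model rather than the reference implementation: `ToyZone`
and its `loads` counter are made up for illustration, a "load" stands in
for reading a TZif file, and the example relies on CPython freeing
unreferenced objects immediately.

```python
import weakref
from collections import OrderedDict


class ToyZone:
    """Toy model of a strong LRU cache layered over a weak cache."""

    _strong_cache = OrderedDict()  # keeps the 8 most recently used zones alive
    _strong_cache_size = 8
    _weak_cache = weakref.WeakValueDictionary()  # unbounded, holds no strong refs
    loads = 0  # counts cache misses, i.e. simulated reads from disk

    def __new__(cls, key):
        instance = cls._weak_cache.get(key)
        if instance is None:
            # Cache miss: this is the point where data would be (re)read.
            instance = super().__new__(cls)
            instance.key = key
            cls.loads += 1
            cls._weak_cache[key] = instance
        # Mark the zone as most recently used, then evict the oldest entry.
        cls._strong_cache[key] = cls._strong_cache.pop(key, instance)
        if len(cls._strong_cache) > cls._strong_cache_size:
            cls._strong_cache.popitem(last=False)
        return instance


# Construct 9 zones without keeping any references to them.
for i in range(1, 10):
    ToyZone(f"Zone/{i}")
loads_after_nine = ToyZone.loads

# Zones 2 through 9 are still held by the strong cache: all hits.
for i in range(2, 10):
    ToyZone(f"Zone/{i}")
loads_after_hits = ToyZone.loads

# Zone 1 was evicted and nothing else referenced it: a miss.
ToyZone("Zone/1")
loads_after_miss = ToyZone.loads
```

Note that in this model the miss on the first zone itself evicts the
oldest remaining entry, so which zones are hits depends on the order in
which they are requested.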

This does mean that if you call ZoneInfo("America/New_York") when your
installed zoneinfo data is 2019c and then you upgrade to 2020a and call
ZoneInfo("US/Eastern"), the two objects may have different behaviors,
but I think this is largely unavoidable without a pretty significant
performance impact.
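
A small simulation of that scenario, assuming data is read eagerly and
only on a cache miss. `ToyZone` and the `installed_version` variable are
hypothetical stand-ins for the real class and for the tzdata files on
disk:

```python
installed_version = "2019c"  # hypothetical stand-in for the tzdata on disk


class ToyZone:
    """Toy zone that snapshots the installed data version when first built."""

    _cache = {}

    def __new__(cls, key):
        if key not in cls._cache:
            instance = super().__new__(cls)
            instance.key = key
            # Data is read once, at construction, and never refreshed.
            instance.data_version = installed_version
            cls._cache[key] = instance
        return cls._cache[key]


ny = ToyZone("America/New_York")         # built from 2019c
installed_version = "2020a"              # simulate a system tzdata upgrade
eastern = ToyZone("US/Eastern")          # cache miss: built from 2020a
ny_again = ToyZone("America/New_York")   # cache hit: still the 2019c object
```

Here `ny` and `eastern` describe the same region from two different
releases of the data, which is the inconsistency described above.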

In any case, I'll add a whole section on "what happens if the tzdata is
updated during an interpreter run", thanks for the pointer.

> * How do I get tzdata-based leap second support, if I want it? Or are
> leap seconds specifically out of scope?

Beyond the fact that I plan to ship non-"zone" files in the tzdata
fallback package (and thus include the leap seconds), leap seconds are
out of scope for this proposal. Python's datetime type has no support
for leap seconds currently, and other than being tracked in the same
database, I think they're at least somewhat orthogonal to the primary
problem we're solving here (a tzinfo implementation).

Leap second support is on my long list of improvements for the datetime
module, so I'll probably get around to it at some point in the future.

I have had another query about leap seconds, so I'll probably want to
state explicitly in the proposal that they're out of scope.

> * You might want to mention TZDIST (RFC 7808) as an alternative source
> for tzdata that a Python implementation could use instead of TZif
> files in a local filesystem. 

Yes, I will have to look into this. My main concern is that I'm hoping
to use a time zone data source that can be managed at the system level,
independent of language stack. I will admit to never having looked into
the details, but I was under the impression that tzdist was something
that the system would consume, rather than individual programs. Is that
wrong?

I'm also not clear - are there public tzdist servers, or is the
suggestion that we would have a Python-specific tzdist service and end
users would subscribe to it for updates?

I'm mainly asking because I decided early on (on some very good advice)
that effectively distributing the data is a big enough task on its own
that it would bog down the initial implementation to try and handle both
at once, so my goal with this is to get something that will work /if you
have the data/, and provide a reasonable way to get the data and handle
the data distribution in a separate proposal. If tzdist is consistent
with a backwards compatible upgrade from a version using TZif files at
some point in the future, I'm happy to put it off as, "We should look
into this when we try to solve the distribution problem." It sorta seems
like it /should/ be possible to seamlessly transition from system files
to tzdist (at least depending on how strong our promises are about the
tz search path, anyway).

In any case, thanks so much for the comments, you've given me a lot of
food for thought already!

Best,
Paul
