[tz] Estimate for a new tzdb release with Brazil rules.

Robert Elz kre at munnari.OZ.AU
Fri May 17 00:52:39 UTC 2019

    Date:        Thu, 16 May 2019 13:42:57 -0700
    From:        Alan Mintz <alan.mintz at gmail.com>
    Message-ID:  <CAMLM5-XApfetNyjyG2wwuT6Br6oi0rOvuxoxGxTzM1sOtq3gGg at mail.gmail.com>

  | With all due respect for the volunteer nature of the project,

Really, that affects validating and dealing with the data more than
anything else - the "make a release" part of this is (by comparison) trivial.

  | putting a specific timeframe on it is a good idea,
  | even if it's just a loose goal.

That may be, but not because of:

  | Policy makers are much more likely to co-operate when they have a specific
  | target (instead of just "as early as possible") that they use to push their
  | colleagues to a decision.

It would likely have exactly the wrong effect ... "we know we only
need to announce our decision a month in advance, as they guarantee
a new release within a month" - what we want is for the policy makers
to make their decisions as early as possible, and for that, it would
be better for them to believe it might take us a year to make their
new rules available!

Not that I would suggest that we go that way ... a specific timeframe is
more a push upon us (well, upon Paul currently) to make the releases
happen.   At least tzdata releases - nothing says that we have to
continue the recent practice of doing source releases in sync with
data, every time.

The only real issue with doing "too many" releases (aside from using up the
alphabet for release names - which really, is very unlikely to happen) 
is concern for the workload imposed upon downstream maintainers, having
to continually import, test, and distribute updated versions (that is
real work that is part of a release).

But nothing says that they are required to do that for every release, or to
do it without waiting a while to see if a new release appears first.
But for that (as also being one of those people, for now) I know I'd
much rather have a new release available, and ignore it for a while
in case something newer appears to save myself some work, than to not have
it available because someone upstream is waiting for the same reason,
and thus miss my release deadline and end up distributing known bad data
(or have to go install "experimental" patches - perhaps only installed that
day, and not yet really known correct, and thus take the risk of breaking
what worked before, rather than just failing to keep up).

Re earlier messages:

Tim Parenti <tim at timtimeonline.com> said:
  | For as low-volume as our changes are, I don't think cron is necessarily
  | the best strategy, 

Agreed, that was just a suggestion to require less manual work.


  | though, and obviously, manually turning the crank on a release is
  | non-trivial, 

Not obvious at all - it ought to be trivial.   What's more, since we're
telling people (downstream distributors) that they should be fetching the
data from git, and distributing that, every git commit (push?) is effectively
making a new release (just in a different sequence of release names).  All of
the hard work has happened before that.   What follows to convert the
data in the git repo (assuming you have a git command!) into the tarballs
with 2019d names is just bookkeeping - the only slightly tricky part is
generating the signature, which needs access to the key (and which is the
real reason that using cron might not be the best idea).
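For concreteness, that bookkeeping step can be sketched roughly as below.
The function names are made up for illustration, and the real tzdb release
process uses its own Makefile targets and may publish a different set of
files - this just shows how mechanical the git-to-tarball conversion is:

```python
import subprocess

def tarball_names(version):
    """Tarball names for a release identifier like "2019d".

    Illustrative only; the actual set of files published per release
    (split code/data, combined, compression formats) may differ.
    """
    return [f"tzcode{version}.tar.gz", f"tzdata{version}.tar.gz"]

def make_release(version, tag="main"):
    """Hypothetical sketch: export the repo at a tag, then sign the result.

    Requires git and gpg on PATH, plus access to the signing key - which
    is exactly why running this unattended from cron is questionable.
    """
    data_tar = f"tzdata{version}.tar.gz"
    subprocess.run(["git", "archive", "--format=tar.gz",
                    f"--output={data_tar}", tag], check=True)
    subprocess.run(["gpg", "--armor", "--detach-sign", data_tar], check=True)
```

Everything here except the signing step is pure bookkeeping that any
downstream consumer of the git repo could reproduce themselves.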

  | For something ~6 months out, perhaps waiting ~4-6 weeks to catch straggling
  | changes is warranted,

Even that's too long, and there is no real point.   Better to get the
data out to as many people as possible, so we get better feedback, and
can make corrections (and a new release) on the odd occasion that it
is needed.

I think something between a week and ten days is about right to wait
for list readers and very early adopters to find errors (which is all
we really should be waiting for - if we pick up some other jurisdiction's
changes in the same update, just by fluke, that's fine - but we should
never be waiting in the hope that someone else will change their rules
before we are forced to release the previous changes - never).

So, pick a day of the week (weekend days included) as the regular release
day, and then release everything that was in the repo 4 days earlier
than that (ie: make the tarballs then, but only actually distribute them
if there are no reported problems).   So, release Friday, with everything
done up to the end of the previous Monday (or whatever).   (Or if
manual work is needed at IANA, release Monday, with the cutoff for the
data being the end of the previous Thursday - the actual days don't
matter.)   If there is a problem, simply remove the pending tarballs
and that week's release doesn't happen.   (Similarly, if nothing has
changed, no new tarballs are made, and no release happens.)
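The cutoff arithmetic above is trivial to write down (the function name is
invented for illustration; any 4-day lag between cutoff and release works
the same way):

```python
from datetime import date, timedelta

def cutoff_for(release_day: date, lag_days: int = 4) -> date:
    """Only commits up to the end of this date go into the release."""
    return release_day - timedelta(days=lag_days)

# Release on a Friday; the cutoff is the end of the previous Monday.
print(cutoff_for(date(2019, 5, 17)))  # -> 2019-05-13, a Monday
```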

Then downstream can adopt their own strategy for updating their releases
in the knowledge that they have new data available, and they can wait
until the next tz release day, if that is ahead of their deadline, to see if
there are more changes.

Finally, for anyone concerned that we might run out of alphabet, don't
be ... first because the a..z za..zz sequence contains 52 identifiers,
sufficient for one for every week of the year, but more rationally because
there simply never are that many changes.

Even in the one year when we "came close" to exhausting the a..z sequence
(it wasn't really that close: in 2009 we used 21 letters, up to 'u',
which left 5 still available - that's about 20% - lots of margin there),
that year was extraordinary: there were changes every month, 2 months
had 3, 5 more had 2, and the other 5 months just 1 change.   It is hard to
imagine anything messier than that - most years don't even get close.
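The arithmetic on release names can be checked with a few lines (assuming,
as described above, that the sequence after 'z' continues with 'za'):

```python
import string

def release_suffixes():
    """The a..z then za..zz suffix sequence: 26 + 26 = 52 identifiers."""
    letters = string.ascii_lowercase
    return list(letters) + ["z" + c for c in letters]

suffixes = release_suffixes()
print(len(suffixes))  # -> 52, one per week of the year
print(suffixes[20])   # -> u, the furthest we got, in 2009
```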


ps: Even if we keep doing combined source/data releases, nothing says that
we need to make new source tarballs from the repo every week - those can
be generated manually when it is decided that the sources should be updated
in the release.   Deciding to release sources is a more complex process, as
those really need to have been verified to work in different environments,
whereas the data either is correct, or is not.
