[tz] [PROPOSED PATCH 1/2] Also distribute code and data in a single tarball
Paul Eggert
eggert at cs.ucla.edu
Mon Aug 22 18:57:08 UTC 2016
Robert Elz wrote:
> Old style distribution was a single file containing both code and data.
> It was split because it was more manageable (for everyone) that way
I must say it's not more manageable for me, and I recall at least one other
complaint about splitting the distribution in two.
The original idea was that the code would be stable and so could be distributed
separately and independently from the data. And for a while we did that: for
example, in 1999 there was one more data release than code release, and users
just had to know that tzcode1999h was released contemporaneously with (and
tested with)
tzdata1999i. But this was confusing. On at least one occasion it confused even
me, and I released something with the "wrong" version number. So starting in
late 2012 code and data releases have been issued in lockstep, even if there is
no change to the code other than the version number. So nowadays the release
comes as two tarballs, but it's really just one release.
Unfortunately the tarball separation is incomplete, as some files are
distributed in both tarballs, which means that if you combine tzcode version X
with tzdata version Y (not recommended) you may need to resolve conflicts.
There's another point of confusion with the old format. Its tarballs extract
files into the working directory, which is unusual, and I recall at least one
complaint about it. It's almost universal practice nowadays for distribution
tarballs to extract into a subdirectory named after the distribution. In some
circles having a bunch of top-level files in an archive is even considered to be
a security risk. In the long run it'll be a win if we switch to common practice
and avoid these problems.
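
To make the "extracts into a subdirectory" convention concrete, here is a minimal Python sketch that reports whether every member of a tarball lives under one top-level directory. The function name is hypothetical and not part of any tz tooling; it only inspects member names and extracts nothing.

```python
import tarfile

def extracts_into_one_directory(tar_path):
    """Return True if all members sit under a single top-level directory."""
    tops = set()
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            # Drop empty and "." components so "./dir/file" is handled too.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if not parts:
                continue
            if len(parts) == 1 and not member.isdir():
                # A non-directory entry directly at the top level.
                return False
            tops.add(parts[0])
    return len(tops) == 1
```

Under these assumptions, an old-style tarball whose members are bare file names would yield False, and a tarball whose members all start with the same directory name would yield True.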
I hope the above all helps to explain the format switch a bit more.
People who just want the data can grab the unified tarball and get just the
files they want. That's what they do now anyway, as the data tarball contains
several non-data files.
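
As an illustration of grabbing only the wanted files from a unified tarball, here is a minimal Python sketch using the standard tarfile module. The function name and its arguments are assumptions made for this example; it matches members by base name and writes them flat into a destination directory that must already exist.

```python
import os
import tarfile

def extract_selected(tar_path, wanted, dest):
    """Copy only regular files whose base name is in `wanted` into `dest`.

    Returns the archive member names that were extracted.
    """
    extracted = []
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" autodetects compression
        for member in tar.getmembers():
            base = os.path.basename(member.name)
            if member.isfile() and base in wanted:
                # Write under dest using the base name only, so paths stored
                # in the archive cannot place files elsewhere.
                src = tar.extractfile(member)
                with src, open(os.path.join(dest, base), "wb") as out:
                    out.write(src.read())
                extracted.append(member.name)
    return extracted
```

For example, `extract_selected("tzdb-X.tar.xz", {"africa", "europe"}, "data")` would copy just those two data files, assuming the archive contains members with those base names.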
> ps: I don't really see any need for a better compression technique than
> what is currently used (or really, for that matter, any compression at all.)
The need for compression will go up once we start distributing .tzs files (which
partly motivated the format switch). With the current draft the tarball sizes
look like this:
      bytes  file
     665600  tzcodeX.tar
    1669120  tzdataX.tar
    2334720  concatenation of tzcodeX.tar + tzdataX.tar
     202051  tzcodeX.tar.gz
     393052  tzdataX.tar.gz
     595103  concatenation of tzcodeX.tar.gz + tzdataX.tar.gz
     383796  tzdb-X.tar.xz
Granted, if you're well connected all these sizes are small. But if you're
paying for each kilobyte of download, a 36% savings over split .gz format (84%
savings from raw tarballs) is nice to have.
Also, the compressed tzdb tarball is a tad smaller than the compressed tzdata
tarball, so even if you want just the data you're still a bit ahead by
downloading the new combined format.
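
As a rough check of the arithmetic above, here is a minimal Python sketch that recomputes the savings from the byte counts in the table and compares the combined .xz tarball against the data-only .gz tarball. The helper name is made up for this example.

```python
# Sizes in bytes, copied from the table above.
sizes = {
    "tzcodeX.tar": 665600,
    "tzdataX.tar": 1669120,
    "tzcodeX.tar.gz": 202051,
    "tzdataX.tar.gz": 393052,
    "tzdb-X.tar.xz": 383796,
}

def savings_percent(new_size, old_size):
    """Percentage saved by downloading new_size bytes instead of old_size."""
    return 100.0 * (1 - new_size / old_size)

raw_total = sizes["tzcodeX.tar"] + sizes["tzdataX.tar"]
gz_total = sizes["tzcodeX.tar.gz"] + sizes["tzdataX.tar.gz"]
xz = sizes["tzdb-X.tar.xz"]

vs_gz = savings_percent(xz, gz_total)
vs_raw = savings_percent(xz, raw_total)
smaller_than_data_gz = xz < sizes["tzdataX.tar.gz"]

print(f"savings over split .gz:    {vs_gz:.1f}%")
print(f"savings over raw tarballs: {vs_raw:.1f}%")
print(f"tzdb .xz smaller than tzdata .gz: {smaller_than_data_gz}")
```

Rounded to whole percentages, the two savings figures match the 36% and 84% quoted above.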