[tz] [PROPOSED PATCH 1/2] Also distribute code and data in a single tarball

Paul Eggert eggert at cs.ucla.edu
Mon Aug 22 18:57:08 UTC 2016

Robert Elz wrote:
> Old style distribution was a single file containing both code and data.
> It was split because it was more manageable (for everyone) that way

I must say it's not more manageable for me, and I recall at least one other 
complaint about splitting the distribution in two.

The original idea was that the code would be stable and so could be distributed 
separately and independently from the data. And for a while we did that: for 
example, in 1999 there was one more data release than code release, and users 
just had to know that tzcode1999h was released contemporaneously (and tested) with 
tzdata1999i. But this was confusing. On at least one occasion it confused even 
me, and I released something with the "wrong" version number. So starting in 
late 2012 code and data releases have been issued in lockstep, even if there is 
no change to the code other than the version number. So nowadays the release 
comes as two tarballs, but it's really just one release.

Unfortunately the tarball separation is incomplete, as some files are 
distributed in both tarballs, which means that if you combine tzcode version X 
with tzdata version Y (not recommended) you may need to resolve conflicts.

There's another point of confusion with the old format. Its tarballs extract 
files into the working directory, which is unusual and I recall at least one 
complaint about it. It's almost universal practice nowadays for distribution 
tarballs to extract into a subdirectory named after the distribution. In some 
circles having a bunch of top-level files in an archive is even considered to be 
a security risk. In the long run it'll be a win if we switch to common practice 
and avoid these problems.
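
As a rough illustration of the difference between the two layouts, here is a 
minimal Python sketch that checks whether every member of a tar archive sits 
under one top-level directory. The member names and the "tzdb-X" directory name 
are assumptions chosen for the demo, not the contents of any real release, and 
the check is not a complete security audit (it does not look for ".." 
components or absolute paths, for example).

```python
import io
import tarfile

def make_tar(names):
    """Build an in-memory tar archive holding one tiny file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            body = b"placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")

def path_parts(name):
    """Split a member name into components, dropping '' and '.'."""
    return [p for p in name.split("/") if p not in ("", ".")]

def extracts_into_single_directory(tar):
    """True if all members live under exactly one top-level directory."""
    tops = set()
    for member in tar.getmembers():
        parts = path_parts(member.name)
        if not parts:
            continue
        # A non-directory at the top level would land in the working directory.
        if len(parts) == 1 and not member.isdir():
            return False
        tops.add(parts[0])
    return len(tops) == 1

# Hypothetical member names, for illustration only.
old_style = make_tar(["Makefile", "africa"])
new_style = make_tar(["tzdb-X/Makefile", "tzdb-X/africa"])

print(extracts_into_single_directory(old_style))
print(extracts_into_single_directory(new_style))
```

The first archive mimics the old layout and fails the check; the second mimics 
the subdirectory layout and passes.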

I hope the above all helps to explain the format switch a bit more.

People who just want the data can grab the unified tarball and get just the 
files they want. That's what they do now anyway, as the data tarball contains 
several non-data files.
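
For what it's worth, picking a few files out of a combined tarball need not 
involve unpacking everything. The sketch below builds a small stand-in archive 
in memory and reads only the members whose base names are wanted. The archive 
contents, the "tzdb-X" directory name, and the WANTED set are all assumptions 
made for the demo.

```python
import io
import tarfile

# Hypothetical selection of data files, for illustration only.
WANTED = {"africa", "europe"}

def build_demo_tarball():
    """Return the bytes of a small xz-compressed tar archive (a stand-in)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, body in [
            ("tzdb-X/Makefile", b"code\n"),
            ("tzdb-X/africa", b"data A\n"),
            ("tzdb-X/europe", b"data E\n"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()

def read_selected(archive, wanted):
    """Map base name to contents for each regular file whose name is wanted."""
    selected = {}
    # "r:*" lets tarfile detect the compression format by itself.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar:
            base = member.name.rsplit("/", 1)[-1]
            if member.isfile() and base in wanted:
                selected[base] = tar.extractfile(member).read()
    return selected

picked = read_selected(build_demo_tarball(), WANTED)
print(sorted(picked))
```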

> ps: I don't really see any need for a better compression technique than
> what is currently used (or really, for that matter, any compression at all.)

The need for compression will go up once we start distributing .tzs files (which 
partly motivated the format switch). With the current draft the tarball sizes 
look like this:

     bytes file

    665600 tzcodeX.tar
   1669120 tzdataX.tar
   2334720 concatenation of tzcodeX.tar + tzdataX.tar

    202051 tzcodeX.tar.gz
    393052 tzdataX.tar.gz
    595103 concatenation of tzcodeX.tar.gz + tzdataX.tar.gz

    383796 tzdb-X.tar.xz

Granted, if you're well connected all these sizes are small. But if you're 
paying for each kilobyte of download, a 36% savings over split .gz format (84% 
savings from raw tarballs) is nice to have.
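
The percentages above can be rechecked from the byte counts in the table. A 
minimal Python sketch (the dictionary keys and the helper name are made up for 
this example; the numbers are copied from the table):

```python
# Sizes in bytes, copied from the table above.
sizes = {
    "tar_concat": 2334720,   # tzcodeX.tar + tzdataX.tar
    "gz_concat": 595103,     # tzcodeX.tar.gz + tzdataX.tar.gz
    "xz_combined": 383796,   # tzdb-X.tar.xz
}

def savings_percent(old, new):
    """Return the percentage reduction when going from old to new bytes."""
    return 100.0 * (old - new) / old

vs_gz = savings_percent(sizes["gz_concat"], sizes["xz_combined"])
vs_raw = savings_percent(sizes["tar_concat"], sizes["xz_combined"])

print(f"savings over split .gz format: {vs_gz:.1f}%")
print(f"savings over raw tarballs:     {vs_raw:.1f}%")
```

Rounded to whole percentages, the two results match the 36% and 84% figures 
quoted above.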

Also, the compressed tzdb tarball is a tad smaller than the compressed tzdata 
tarball, so even if you want just the data you're still a bit ahead by 
downloading the new combined format.
