[tz] [PROPOSED PATCH 2/2] Use lz format for new tarball

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Aug 30 22:34:41 UTC 2016


On Tue, Aug 30, 2016 at 5:18 PM, Paul Eggert <eggert at cs.ucla.edu> wrote:

>   $ ls -l tz*.tar.*z*
>   -rw-r--r-- 1 eggert eggert 202609 Aug 30 14:00 tzcode2016X.tar.gz
>   -rw-r--r-- 1 eggert eggert 394169 Aug 30 14:00 tzdata2016X.tar.gz
>   -rw-r--r-- 1 eggert eggert 426667 Aug 30 14:10 tzdb-2016X.tar.bz2
>   -rw-r--r-- 1 eggert eggert 382991 Aug 30 14:00 tzdb-2016X.tar.lz
>

If the size of the data distribution is a concern, it looks like one can
achieve much better compression simply by discarding the comment lines (and
blank lines) in the data files:

$ cat africa antarctica asia australasia \
    europe northamerica southamerica | wc -c
  647830
$ cat africa antarctica asia australasia \
    europe northamerica southamerica | egrep -v '^\w*(#.*|$)' | wc -c
  151231
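For readers who want to reproduce the filtered byte count without GNU grep, here is a minimal Python sketch of the same filter. It assumes Unix (LF) line endings, and the function name `strip_comment_lines` is hypothetical, not part of any tz tooling. It applies the regular expression from the egrep command line by line. As the comments note, the filter is conservative: the pattern drops blank lines and lines that begin with `#`, while trailing comments and indented comment lines, if present, stay in the stream.

```python
import re
import sys
from pathlib import Path

# The pattern from the egrep command above, as a bytes regex so that \w
# is ASCII-only and byte counts match `wc -c`.  A line is dropped when it
# consists of zero or more word characters followed by either a '#'
# comment or the end of the line.  In practice that removes blank lines
# and lines starting with '#'.  Comments after data on the same line, and
# comment lines indented with tabs or spaces, are kept, because
# whitespace is not a word character.
DROP = re.compile(rb'^\w*(#.*|$)')


def strip_comment_lines(data: bytes) -> bytes:
    """Return the lines of `data` that `egrep -v '^\\w*(#.*|$)'` would print.

    Assumes Unix (LF) line endings, as in the tz data files.
    """
    kept = [line for line in data.splitlines(keepends=True)
            if not DROP.match(line.rstrip(b'\n'))]
    return b''.join(kept)


if __name__ == '__main__':
    # Usage (hypothetical script name):
    #   python3 strip_comments.py africa antarctica asia australasia \
    #       europe northamerica southamerica
    # prints the raw and filtered byte counts, like the two `wc -c` runs.
    raw = b''.join(Path(name).read_bytes() for name in sys.argv[1:])
    print(len(raw), len(strip_comment_lines(raw)))
```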

Given the structured (low-entropy) nature of the resulting stream, it
compresses very well:

$ cat africa antarctica asia australasia \
    europe northamerica southamerica | egrep -v '^\w*(#.*|$)' | xz -c | wc -c
   24600
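The compression step can likewise be sketched in Python with the standard `gzip`, `bz2` and `lzma` modules, which compares the three codecs on the same input in one run, in the spirit of the tarball sizes quoted above. This is an illustrative sketch, not the tz project's tooling: the function name `compressed_sizes` is hypothetical, gzip and bzip2 run at Python's default level 9, and xz runs at preset 6 (the xz command's default), so the counts should be close to, but are not guaranteed to equal, those from the command-line tools. lzip is not in the standard library, so the .lz format is not covered.

```python
from __future__ import annotations

import bz2
import gzip
import lzma
import sys
from pathlib import Path


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the size of `data` before and after each stdlib compressor.

    gzip and bz2 use Python's default level (9); lzma uses its default
    preset (6), which is also the xz command's default, so the 'xz' entry
    should be close to `xz -c | wc -c` but need not match it byte for byte.
    """
    return {
        'raw': len(data),
        'gzip': len(gzip.compress(data)),
        'bzip2': len(bz2.compress(data)),
        'xz': len(lzma.compress(data)),  # .xz container format by default
    }


if __name__ == '__main__':
    # Concatenate the files named on the command line, like `cat`, then
    # report one "codec<TAB>bytes" line per entry.
    data = b''.join(Path(name).read_bytes() for name in sys.argv[1:])
    if data:
        for name, size in compressed_sizes(data).items():
            print(f'{name}\t{size}')
```

To compare the raw and comment-stripped streams, one could run the script once on the seven data files and once on a file holding the egrep output (for example `stripped`, a hypothetical name).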