[tz] [PROPOSED PATCH 2/2] Use lz format for new tarball
Alexander Belopolsky
alexander.belopolsky at gmail.com
Tue Aug 30 22:34:41 UTC 2016
On Tue, Aug 30, 2016 at 5:18 PM, Paul Eggert <eggert at cs.ucla.edu> wrote:
> $ ls -l tz*.tar.*z*
> -rw-r--r-- 1 eggert eggert 202609 Aug 30 14:00 tzcode2016X.tar.gz
> -rw-r--r-- 1 eggert eggert 394169 Aug 30 14:00 tzdata2016X.tar.gz
> -rw-r--r-- 1 eggert eggert 426667 Aug 30 14:10 tzdb-2016X.tar.bz2
> -rw-r--r-- 1 eggert eggert 382991 Aug 30 14:00 tzdb-2016X.tar.lz
>
If the size of the data distribution is a concern, it looks like one can
achieve much better compression by simply discarding the comments in the
data files:
$ cat africa antarctica asia australasia \
europe northamerica southamerica | wc -c
647830
$ cat africa antarctica asia australasia \
europe northamerica southamerica | egrep -v '^\w*(#.*|$)' | wc -c
151231
Because the resulting stream is highly structured (low entropy), it
compresses very well:
$ cat africa antarctica asia australasia \
europe northamerica southamerica | egrep -v '^\w*(#.*|$)' | xz -c | wc -c
24600
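
For anyone who wants to try the idea without a tz checkout, here is a
minimal sketch of the comment-stripping step. The helper name
strip_comments and the sample text are hypothetical, made up for
illustration. Note that the filter is a variant of the one above: it uses
the POSIX class [[:space:]] so that indented comment lines are dropped
too, which means byte counts on the real files may differ from the
figures quoted above.

```shell
#!/bin/sh
# Sketch only: strip_comments is a hypothetical helper, and the sample
# text is invented for illustration; it is not taken from the tz data.

strip_comments() {
    # Drop blank lines and lines that contain nothing but a comment.
    # Trailing comments on data lines are left untouched.
    grep -E -v '^[[:space:]]*(#|$)'
}

sample='# Illustrative comment
Rule X 2000 only - Jan 1 0:00 0 -

  # indented comment
Zone Example/Place 0:00 X %s'

# tr removes the padding that some wc implementations add.
raw=$(printf '%s\n' "$sample" | wc -c | tr -d ' ')
stripped=$(printf '%s\n' "$sample" | strip_comments | wc -c | tr -d ' ')

echo "raw bytes:      $raw"
echo "stripped bytes: $stripped"
```

On the real data the helper would replace the egrep stage, e.g.
"cat africa ... southamerica | strip_comments | xz -c | wc -c".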