[tz] timezone DB distribution

Brian Inglis Brian.Inglis at SystematicSw.ab.ca
Sat Aug 22 19:29:39 UTC 2020


On 2020-08-20 14:37, Paul Eggert wrote:
> On 8/20/20 12:33 PM, Brian Inglis wrote:
>> They often have bigger issues with space for decoding, data storage, and use;
>> one suggestion was a stream compressed list of file base names and POSIX strings
>> from the last line of the files e.g.
> 
> The main argument for POSIX strings is simplicity rather than saving storage or
> network capacity or even CPU time. Currently all the tzdb data can be compressed
> to 22,418 bytes, via 'lzip -9 tzdata.zi'. Although limiting tzdata to
> (compressed) names and POSIX strings would shrink that considerably, I doubt
> whether the data shrinkage is worth it, except perhaps for embedded applications
> where all timestamps are future timestamps.

Agreed - the issues are not with IoT development SoC boards that have GB of RAM
and flash and run minimal BSD or Linux distros, but with deployed SoCs, which
may have only MB or even KB of ROM and RAM, and perhaps no flash.
The code space and memory required for modern decompressors may exceed the
memory budget available on the chip, and decompression may exceed the time
budget available at the deployed processor speed.
The simple reference lzip decompressor lzd requires 5MB of virtual memory on a
32-bit system.
Editing the data, for example by using tzdata.zi, POSIX TZ strings, or airport
codes for locations, and by keeping only a single location for each set of
identical current rules per offset, probably reduces size more and allows
lighter-weight code on limited-memory systems.
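
As a rough illustration of that last reduction, here is a minimal Python sketch
that keeps one name per distinct POSIX TZ string. The function name
dedupe_by_posix, the (name, string) input format, and the handful of sample
entries are assumptions for illustration only; they are not taken from
tz-posix.log.

```python
def dedupe_by_posix(entries):
    """Keep the first name seen for each distinct POSIX TZ string.

    entries: iterable of (name, posix_tz) pairs (hypothetical input format).
    Returns a list of (name, posix_tz) pairs in first-seen order.
    """
    kept = {}
    for name, posix_tz in entries:
        # setdefault stores the name only the first time a string is seen
        kept.setdefault(posix_tz, name)
    return [(name, posix_tz) for posix_tz, name in kept.items()]


# Illustrative sample only, not real tz-posix.log content
sample = [
    ("Edmonton", "MST7MDT,M3.2.0,M11.1.0"),
    ("Denver", "MST7MDT,M3.2.0,M11.1.0"),
    ("Phoenix", "MST7"),
    ("Berlin", "CET-1CEST,M3.5.0,M10.5.0/3"),
    ("Paris", "CET-1CEST,M3.5.0,M10.5.0/3"),
]

for name, posix_tz in dedupe_by_posix(sample):
    print(name, posix_tz)
```

Which name survives for each rule set is a policy choice; this sketch simply
keeps the first one it encounters.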

For comparison, I took only the current continent data files and concatenated,
tarred, zipped, and 7zed them to produce the tz.* files.
I also picked the POSIX TZ strings from the last lines of the equivalent
zoneinfo files, kept only the basename of each path, and manually eliminated
truly redundant duplicated names to get tz-posix.log.
I then compressed tz.tar, tz.txt, tzdata.zi, and tz-posix.log with the available
compressors, all at the highest -9 compression level where available, including
zip with bzip2, to get the following file and archive sizes:

750K  tz.tar
741K  tz.txt
329K  tz.tar.Z
328K  tz.txt.Z
251K  tz.tar.gz
250K  tz.txt.gz
225K  tz.tar.zst
224K  tz.txt.zst
219K  tz.zip
202K  tz.tar.7z
202K  tz.tar.xz
202K  tz.7z
202K  tz.txt.7z
202K  tz.txt.xz
202K  tz.tar.lz
201K  tz.txt.lz
192K  tz.tar.zip
192K  tz.tar.bz2
192K  tz.txt.zip
192K  tz.txt.bz2
109K  tzdata.zi
 39K  tzdata.zi.Z
 27K  tzdata.zi.gz
 26K  tzdata.zi.zst
 22K  tzdata.zi.7z
 22K  tzdata.zi.xz
 22K  tzdata.zi.lz
 22K  tzdata.zi.zip
 22K  tzdata.zi.bz2
 12K  tz-posix.log
5.4K  tz-posix.log.Z
3.8K  tz-posix.log.gz
3.7K  tz-posix.log.zst
3.6K  tz-posix.log.zip
3.6K  tz-posix.log.7z
3.5K  tz-posix.log.xz
3.5K  tz-posix.log.lz
3.5K  tz-posix.log.bz2
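
For anyone wanting to try a similar comparison, here is a minimal Python sketch;
it is not the procedure used to produce the figures above. It assumes TZif
version 2 or later data, where the POSIX TZ string is the newline-enclosed
footer on the last line, and it uses only the compressors in the Python
standard library (zlib, bz2, lzma), so sizes will differ somewhat from the
command-line tools. The function names and the synthetic stand-in data are
assumptions for illustration.

```python
import bz2
import lzma
import zlib


def posix_tz_from_tzif(data: bytes) -> str:
    """Return the footer (POSIX TZ string) of TZif version 2+ data."""
    if data[:4] != b"TZif":
        raise ValueError("not TZif data")
    if data[4:5] == b"\0":
        raise ValueError("TZif version 1 data has no footer")
    if not data.endswith(b"\n"):
        raise ValueError("missing footer terminator")
    end = len(data) - 1                 # index of the closing newline
    start = data.rfind(b"\n", 0, end)   # newline that opens the footer
    if start < 0:
        raise ValueError("missing footer")
    return data[start + 1:end].decode("ascii")


def compressed_sizes(text: bytes) -> dict:
    """Return the byte counts of text under each stdlib compressor."""
    return {
        "raw": len(text),
        "zlib": len(zlib.compress(text, 9)),        # deflate, as in gzip/zip
        "bz2": len(bz2.compress(text, 9)),
        "xz": len(lzma.compress(text, preset=9)),   # default .xz container
    }


# Synthetic stand-in: a header-sized stub plus a footer, not a real zone file
blob = b"TZif2" + bytes(39) + b"\nEST5EDT,M3.2.0,M11.1.0\n"
tz_string = posix_tz_from_tzif(blob)

# Synthetic "name string" listing, repeated so there is something to compress
listing = ("Example " + tz_string + "\n").encode("ascii") * 50
for label, size in compressed_sizes(listing).items():
    print(label, size)
```

With real data, blob would be the bytes read from a compiled zoneinfo file and
listing would be the assembled list of names and strings.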

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

