Back-of-the-envelope cost of extra data :-)

Paul Eggert eggert at CS.UCLA.EDU
Wed May 4 17:47:52 UTC 2005


"Olson, Arthur David (NIH/NCI)" <olsona at dc37a.nci.nih.gov> writes:

> This output...
> 	Script started on Wed 04 May 2005 10:15:00 AM EDT
> 	lecserver$ du -s tz*/tmp/*/zoneinfo
> 	489	tz/tmp/etc/zoneinfo
> 	1709	tzexp/tmp/etc/zoneinfo
> 	lecserver$ exit
>
> 	script done on Wed 04 May 2005 10:15:08 AM EDT
> ...indicates that "old-format" data eats up about half a megabyte of disk
> space total while "new-format" data eats up about 2 megabytes.

This depends on filesystem blocking.  I calculate the actual data as
growing from 301,265 to 2,128,478 bytes, a factor of 7.  The
old-format data produces tiny files that waste a lot of disk space due
to internal fragmentation; the new-format data produces larger files,
with less internal fragmentation.  With Solaris 9 UFS, using 1 KiB
units, I get:

$ du -sk tz*/etc/zoneinfo
660	tz-0/etc/zoneinfo
2445	tz-1/etc/zoneinfo

or a 3.7x growth on that file system.  I don't know what units your
"du" was reporting, but I'd be a bit surprised if they were 1 KiB,
as that would mean you're shoehorning 2,128,478 bytes (about 2,079 KiB)
into 1709 KiB.  Are you using a compressed file system?
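
To make the fragmentation point concrete, here is a rough Python sketch
that compares the raw byte count of a tree with the space it would take
if every file were rounded up to a whole allocation unit.  The function
names and the 1 KiB default are my own assumptions for illustration; the
model ignores directories, indirect blocks, and any file-system-specific
layout, so it only approximates what "du" reports.

```python
import os


def allocated_size(size, block):
    """Bytes used by a file of `size` bytes when space is allocated in whole blocks."""
    # Ceiling division: a partly filled last block still costs a full block.
    return -(-size // block) * block


def tree_sizes(root, block=1024):
    """Return (apparent_bytes, allocated_bytes) for regular files under `root`.

    Hard-linked files are counted once, keyed on (device, inode).
    """
    apparent = allocated = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for a file already counted
            seen.add(key)
            apparent += st.st_size
            allocated += allocated_size(st.st_size, block)
    return apparent, allocated


if __name__ == "__main__":
    import sys

    for root in sys.argv[1:]:
        apparent, allocated = tree_sizes(root)
        print(f"{root}: {apparent} bytes of data, {allocated} bytes allocated")
```

Run against the two zoneinfo trees, the gap between the two numbers
should be proportionally much larger for the tree of tiny files.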

> Maximum total cost: 650 million computers * three twentieths of a cent:
> $975,000 (ulp!)

Yup.  It'd be nice to shrink this a bit, if it's feasible.  Perhaps go
to a varying-width format?
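
As a check on the quoted figure, the arithmetic can be redone exactly.
The variable names are mine; the inputs are the ones quoted above
(650 million computers at three twentieths of a cent each).

```python
from fractions import Fraction

machines = 650_000_000                 # quoted number of computers
cents_per_machine = Fraction(3, 20)    # "three twentieths of a cent"

# Exact rational arithmetic avoids floating-point rounding.
total_dollars = machines * cents_per_machine / 100
print(total_dollars)  # 975000
```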


