Back-of-the-envelope cost of extra data :-)
Paul Eggert
eggert at CS.UCLA.EDU
Wed May 4 17:47:52 UTC 2005
"Olson, Arthur David (NIH/NCI)" <olsona at dc37a.nci.nih.gov> writes:
> This output...
> Script started on Wed 04 May 2005 10:15:00 AM EDT
> lecserver$ du -s tz*/tmp/*/zoneinfo
> 489 tz/tmp/etc/zoneinfo
> 1709 tzexp/tmp/etc/zoneinfo
> lecserver$ exit
>
> script done on Wed 04 May 2005 10:15:08 AM EDT
> ...indicates that "old-format" data eats up about half a megabyte of disk
> space total while "new-format" data eats up about 2 megabytes.
This depends on filesystem blocking. I calculate the actual data as
growing from 301,265 to 2,128,478 bytes, a factor of 7. The
old-format data produces tiny files that waste a lot of disk space due
to internal fragmentation; the new-format data produces larger files,
with less internal fragmentation. With Solaris 9 UFS, using 1 KiB
units, I get:
$ du -sk tz*/etc/zoneinfo
660 tz-0/etc/zoneinfo
2445 tz-1/etc/zoneinfo
or a 3.7x growth on that file system. I don't know what units your
"du" was reporting in, but I'd be a bit surprised if they were 1 KiB,
since that would mean you're shoehorning 2,128,478 bytes (about
2,079 KiB) into 1709 KiB. Are you using a compressed file system?
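To make the blocking effect concrete, here is a rough sketch that models a file system which rounds every file up to a whole number of blocks. The file counts and per-file sizes are made-up round numbers chosen to give a 7x data growth; they are not measurements of the real zoneinfo trees. Real file systems also charge for directories, fragments and indirect blocks, which this ignores. The last lines just redo the arithmetic on the figures quoted above.

```python
def allocated_bytes(file_sizes, block_size):
    """Bytes used if each file is rounded up to whole blocks (simplified model)."""
    return sum(((size + block_size - 1) // block_size) * block_size
               for size in file_sizes)

# Hypothetical trees: 500 small files versus 500 files seven times larger.
old_sizes = [600] * 500    # 300,000 bytes of data (assumed, not measured)
new_sizes = [4200] * 500   # 2,100,000 bytes of data (assumed, not measured)

for block_size in (1024, 8192):
    old_disk = allocated_bytes(old_sizes, block_size)
    new_disk = allocated_bytes(new_sizes, block_size)
    print(f"block={block_size}: data grows {sum(new_sizes) / sum(old_sizes):.1f}x, "
          f"disk usage grows {new_disk / old_disk:.1f}x")

# Arithmetic on the figures from this message.
byte_growth = 2_128_478 / 301_265   # growth in actual data bytes
du_growth = 2445 / 660              # growth in Solaris 9 UFS "du -sk" output
new_data_kib = 2_128_478 / 1024     # new-format data expressed in KiB
print(f"bytes: {byte_growth:.2f}x, du: {du_growth:.2f}x, "
      f"new data: {new_data_kib:.0f} KiB")
```

The larger the block size relative to the files, the more the rounding hides the growth in the underlying data, which is why "du" ratios differ from one file system to the next.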
> Maximum total cost: 650 million computers * three twentieths of a cent:
> $975,000 (ulp!)
Yup. It'd be nice to shrink this a bit, if it's feasible. Perhaps go
to a varying-width format?
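For what it's worth, one common way to get a varying-width integer is to store 7 bits per byte and use the high bit as a "more bytes follow" flag, after a zigzag mapping so that small negative values stay short. The sketch below only illustrates that general idea; it is not a proposal for the actual tz file layout, and all the function names are made up for the example.

```python
def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Inverse of zigzag()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def encode_varint(n):
    """Encode a signed integer, 7 payload bits per byte, low-order bits first."""
    u = zigzag(n)
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)

def decode_varint(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    u = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        u |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return unzigzag(u), pos

# Small values are cheap; extreme 64-bit values cost more than a fixed 8 bytes.
for value in (0, -1, 3600, 2**31 - 1, -2**63):
    encoded = encode_varint(value)
    assert decode_varint(encoded)[0] == value
    print(value, len(encoded), "byte(s)")
```

The tradeoff is that a full 64-bit value can take up to 10 bytes this way instead of 8, so any saving depends on most stored values being small.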