[tz] Inappropriate project direction

Steffen Nurpmeso steffen at sdaoden.eu
Mon Feb 12 17:55:03 UTC 2018


Bryan Smith <b.j.smith at ieee.org> wrote:
 |Not weighing in on the overall debate, just adding some technical bits ...
 |
 |Steffen Nurpmeso <steffen at sdaoden.eu> wrote:
 |> Stephen Colebourne <scolebourne at joda.org> wrote:
 |>|3) Move to a niche archive format
 |>|
 |>|The main archive format was changed from the well-known and widely
 |>|supported gz to the niche lz format. There was and is no justification
 |>|for using a niche format on a project as important as this one.
 |>|
 |>|- downstream consumers need to find and use the niche format
 |>|- Windows does not have proper support for the niche format
 |>
 |> Yes i do not like that either, given that xz is far more common
 |> and zstd is seeing more usage over time.  It may be that it has
 |> archive format advantages over the former.
 |
 |Ignoring the compressor and/or archiver aspect**,
 | and ...
 |Ignore whether adding an LZ77-based compressor (gzip) was justified ...
 |
 |If one is going to add an LZMA container-based compressor (e.g., 7z,
 |xz, lzip, et al), if portability and longevity are paramount, then
 |lzip is probably best.  Although xz finally added some POSIX meta that
 |7z lacked, it's still not something I'd recommend when exchanging
 |between various POSIX (let alone non-POSIX) systems.
 |
 |E.g., I only use xz with tar when I'm sending data to be used on the
 |same POSIX architecture/platform, and not for long-term retention.
 |lzip is really the first real attempt -- not saying it is, but it
 |is the first, real attempt -- at a universal POSIX archive standard.

I am very much surprised by the latter statement.
There is pax, now with three uncompressed formats (ustar, cpio..).
And below.

I have no idea on the standardizing issue, just noting that zstd
is the only good outcome of an american company that i know about
and the first thing that i use.  It is BSD licensed and has an
additional patent grant, and it is now used by e.g. the FreeBSD
kernel because it can be very fast or compress very well.  Though
not as well as the ones mentioned.

Longevity is a real problem.  I think only data duplication and
overall checksumming will do that.  Darkness and other defined
environmental parameters; e.g., in Germany we have the
«Barbarastollen» underground archive (with a false Wikipedia entry,
if i trust the German Wikipedia, as it is buried only 200 metres
underground and thus soon subject to new warheads, methinks).

I have never looked at and compared the header and data chunk
checksumming of any of the compressors etc. that are mentioned
here.  I am sure you need overall plus chunk including header
checksumming plus data repetitions in order to get something
useful.  Correcting 1-bit errors is possibly a theoretical issue.
Multiple copies in distinct places look better.
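
A minimal sketch of that idea, not taken from any of the tools
discussed here: per-chunk CRC-32 values plus one overall SHA-256
digest, with recovery picking each chunk from whichever copy still
verifies.  The names protect() and recover() and the 4096 byte chunk
size are hypothetical, and the checksum list itself is assumed to be
stored intact (a real format would have to protect its header too).

```python
import hashlib
import zlib

CHUNK = 4096  # hypothetical chunk size, chosen only for illustration


def protect(data: bytes, chunk_size: int = CHUNK):
    """Return (per-chunk CRC-32 list, overall SHA-256 hex digest)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    crcs = [zlib.crc32(c) for c in chunks]
    return crcs, hashlib.sha256(data).hexdigest()


def recover(copies, crcs, digest, chunk_size: int = CHUNK) -> bytes:
    """Rebuild the data from several possibly damaged copies.

    For every chunk, take it from the first copy whose CRC-32 matches,
    then verify the reassembled data against the overall digest.
    """
    out = []
    for idx, crc in enumerate(crcs):
        start = idx * chunk_size
        for copy in copies:
            chunk = copy[start:start + chunk_size]
            if zlib.crc32(chunk) == crc:
                out.append(chunk)
                break
        else:
            raise ValueError(f"chunk {idx} is damaged in every copy")
    data = b"".join(out)
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("overall checksum mismatch")
    return data
```

With two copies that are damaged in different chunks, recover()
returns the original bytes; if the same chunk is damaged in every
copy, it raises instead of guessing.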

That makes me realize that the american nuclear missile launch
system still used floppy disks for the keys last year, unless this
was a duck or what is the name for fake news over there.
All very volatile.  If i recall, bitsavers.org have also lost
funding or priority, or both..

 |Which brings me to ...
 |
 |> That is a GNU project decision and Paul Eggert is one of _the_
 |> GNU contributors.
 |
 |There is still a heavy copyleft (e.g., GPL) v. non-copyleft FLOSS
 |debate that rages on, with various people putting various values --
 |from nothing to everything -- on that.  lzip -- or more problematic,
 |its lzlib -- is GPLv2 -- not even LGPL -- which has some asterisks
 |for commercial developers.  I don't know the state of public domain
 |lzip (pdlzip), but it seems to be good enough for any decompression
 |(and most general compression).
 |
 |Lastly ...
 |
 |> Also, .gz is supported, it is just the big all-in-one ball which \
 |> uses that format.
 |
 |I don't see projects dropping LZ77-based compressors anytime soon.  If
 |they want to add a 2nd format option, then that's going to happen, and
 |debates will rage.
 |
 |However, I also expect the GNU Project or pro-copyleft maintainers to
 |push a GNU solution over others.  So lzip isn't surprising.
 |
 |- bjs
 |
 |**P.S.  The other bit is that lzip -- like everything from legacy
 |LZ77-based PKZip to LZMA2 7-Zip -- actually focuses on compressing a
 |file into an archive, instead of being a streaming compressor for
 |either files or an archive.  Yes, it's not very UNIX-like to do that,
 |but there are too many advantages with it to ignore.  It's a common
 |approach on Windows, since Windows never provided a native archiving
 |(w/o compression) approach, unlike UNIX systems.
 |
 |E.g., years ago, when linear access (tape) was common, I used to use
 |afio for per-file compression, over a compressed cpio stream (of files
 |in the archive), to backup (to tape).  It offered too many advantages,
 |all while being cpio compatible (if cpio was used to extract, then it
 |was a stream of compressed files).  Several of these advantages still
 |exist, like better handling of multi-volume archives (no different
 |than in the days of tape).
 |
 |I was hopeful the "Austin Group" of IEEE POSIX + XOpen SuS
 |lineage/workgroup would have addressed this for the 21st century, or
 |at least defined an archiver that could do compression on a per-file
 |basis, even if the compressor choice was still external.  But instead,
 |we got 'pax' which just 'punted' (I'll use the popular, American
 |reference) the problem, and why 'tar' is still popular.

I do not like the pax command line (either).  Puuh, yes, this is very
dense.  Well, a series of competing stream compressors to be used
on an archive, not like zip which is an all-in-one and now offers
compression equal to or better than ARJ or RAR, which were better
in the 90s when i was using Windows/4DOS.

Compression of an archive as a whole will always be better than
compressing individual members, sufficient window size provided.
Of course.  The nice thing with the UNIX approach is that you can
easily create anything from a script to a C program to create what
you want with POSIX standardized components, if compress a.k.a.
Lempel-Ziv / Lempel-Ziv-Welch is good enough.  It likely is not
today.  The pax ustar interchange subformat has a simple checksum.
That as well as cksum(1) as such would benefit from better
algorithms.  Note that FreeBSD just now introduces CRC-32
checksums to the UFS file system, for an upcoming release.
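
The whole-archive versus per-member point from the start of this
paragraph can be checked with a small sketch.  It uses zlib from the
Python standard library as a stand-in for the compressors discussed
here, and the member contents are made up for illustration; the
function names are hypothetical.

```python
import zlib


def compressed_size_whole(members) -> int:
    """Size when all members are concatenated and compressed as one stream."""
    return len(zlib.compress(b"".join(members), 9))


def compressed_size_each(members) -> int:
    """Total size when every member is compressed as its own stream."""
    return sum(len(zlib.compress(m, 9)) for m in members)


# Made-up members that share a lot of content, as files in one
# source archive often do.
members = [
    (f"# zone file {n}\n"
     + "Rule EU 1980 max - Mar lastSun 1:00u 1:00 S\n" * 20).encode()
    for n in range(50)
]

whole = compressed_size_whole(members)
each = compressed_size_each(members)
print(f"whole archive: {whole} bytes, per member: {each} bytes")
```

For members with shared content like these, the single stream comes
out smaller, because matches can reach back into earlier members and
the per-stream overhead is paid only once.  The "sufficient window
size" caveat matters: deflate can only look back 32 KiB, so redundancy
between members that are further apart than that is not found.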

Maybe the future will bring an extension to compress or a new set
of tools to achieve better compression.  But then i personally,
just in case it mattered, would vote for software that scales to
needs from fast to small as well as the one i mentioned does, from
almost as fast as lz4 (while compressing better and thus also
requiring more energy) to compressing almost as well as xz or
lzip, ending up as a universal tool, which is preferable for
a standard in my eyes.

Have a nice day.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

