Problem

Until now there have been only two compression methods of significant importance: the traditional UNIX compress and GNU gzip. The latter is a much stronger compressor than the former, and gzipped files are very widespread on the Internet.

The sacred mission of the Quasijarus Project is to maintain and promote True Pure UNIX(R). The GNU Project is the very antithesis of this, as GNU stands for "GNU's Not UNIX". Therefore, GNU is the Quasijarus Project's arch-enemy, and any GNUisms or anything that even remotely resembles GNU must not come closer than a cannon shot to 4.3BSD-Quasijarus. Among other things, this means that I will never include gzip in the standard system, and I will by all means discourage users from installing it locally.

Nonetheless, it is highly desirable to have a way to use compression as strong as gzip's, as well as read and write gzipped files for exchanging information with the big bad corrupt GNU-infested world out there.

Solution

My first step in solving this problem was to incorporate Jean-loup Gailly and Mark Adler's zlib into the standard system. zlib doesn't have 'g' in its name, and it has a permissive, BSD-compatible license, not a GNU one, so this is acceptable. The addition of zlib makes it much easier to attack the problem of strong compression, since it supports the deflate compression method used by PKZIP, Info-ZIP, gzip, and others.

My next step was to define the strong compressed file format. I wanted it to be different from gzip, so that I can say with a straight face that 4.3BSD-Quasijarus doesn't contain any GNU impurities, but very close to gzip, so that it's very easy to convert between the pure and gzip formats for information interchange purposes. Using gzip's format as is would be unacceptable for another reason as well: gzip stores some information in the header that the standard compress doesn't store anywhere, and thus gzip's format can never be adopted as standard for True Pure UNIX(R). gzip's notion of multi-member archives is unacceptable too.

I have chosen to use a format that is identical to gzip's in all respects except the header, and that explicitly bans multi-member archives. In this format the header consists of only two bytes (the magic number \037\241, in octal), immediately followed by deflated data. The trailer is exactly the same as gzip's, i.e., CRC-32 and the uncompressed length. Since this format is different from gzip's (has a different magic number), it is pure and acceptable. Since it differs from single-member gzip only in the header, one can convert between the two at lightning speed: only the header needs to be changed, and there is no compression, decompression, CRC calculation, or any other computation involved in the conversion.
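
As a rough illustration of the layout described above, here is a minimal Python sketch that writes and reads such a stream using the standard zlib module. It is not the code in compress(1); the function names are hypothetical, and the little-endian trailer layout (CRC-32 first, then the length modulo 2^32) is taken from the gzip format, which the text says the trailer matches exactly.

```python
import struct
import zlib

STRONG_MAGIC = b"\x1f\xa1"  # \037\241 in octal, the two-byte header from the text


def strong_compress(data: bytes, level: int = 9) -> bytes:
    """Hypothetical helper: magic, raw deflate data, gzip-style trailer."""
    # Negative wbits asks zlib for a raw deflate stream with no zlib wrapper.
    deflater = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = deflater.compress(data) + deflater.flush()
    # Trailer as in gzip: CRC-32, then uncompressed length mod 2**32,
    # both stored as little-endian 32-bit integers.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)
    return STRONG_MAGIC + body + trailer


def strong_decompress(blob: bytes) -> bytes:
    """Hypothetical helper: inverse of strong_compress, with trailer checks."""
    if blob[:2] != STRONG_MAGIC:
        raise ValueError("not a compress -s stream")
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    data = inflater.decompress(blob[2:]) + inflater.flush()
    if not inflater.eof:
        raise ValueError("deflated data is truncated")
    # Whatever follows the end of the deflate stream must be the 8-byte
    # trailer and nothing else, since multi-member archives are banned.
    trailer = inflater.unused_data
    if len(trailer) != 8:
        raise ValueError("expected exactly 8 trailer bytes")
    crc, size = struct.unpack("<II", trailer)
    if crc != zlib.crc32(data) & 0xFFFFFFFF:
        raise ValueError("CRC-32 mismatch")
    if size != len(data) & 0xFFFFFFFF:
        raise ValueError("length mismatch")
    return data
```

The compression level is a free choice here; the text does not say which level compress -s uses.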

Next, I modified compress(1) to support this format in addition to the traditional one. For compression the strong compression format is enabled by -s. For decompression the input format is sensed automatically. Since the compress -s header stores exactly the same amount of information as the traditional compress one, all compress(1) semantics stay the same, i.e., the suffix is .Z, all compress(1) features work the same, etc.
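
Sensing the input format can be done from the first two bytes alone. The following Python sketch shows the idea; it is only an illustration, not the logic in compress(1). The function name and the returned labels are hypothetical, and \037\235 is the well-known magic number of the traditional compress format.

```python
MAGIC_FORMATS = {
    b"\x1f\x9d": "traditional",  # \037\235, classic compress
    b"\x1f\xa1": "strong",       # \037\241, compress -s
}


def sense_format(head: bytes) -> str:
    """Hypothetical helper: classify a .Z stream by its two-byte magic."""
    try:
        return MAGIC_FORMATS[bytes(head[:2])]
    except KeyError:
        raise ValueError("not in compressed format") from None
```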

Finally, I wrote a program to convert between compress -s and gzip formats. This program is named gzcompat, and it always reads from stdin and writes to stdout. It automatically senses the input format on stdin and produces the opposite format on stdout, so it can be used on either end of a compression-involving pipeline. Since this program's name clearly indicates that it's for backward compatibility only, it can be and is included in the standard system.
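
The header swap that such a converter performs can be sketched in a few lines of Python. This is an assumption-laden illustration and not the gzcompat source: the function names are hypothetical, the gzip header layout follows the published gzip file format, and the choice to emit a minimal gzip header (no file name, zero timestamp, OS byte 255 for "unknown") is mine, since the text does not say what gzcompat writes there. The sketch also does not detect multi-member gzip input.

```python
import struct

STRONG_MAGIC = b"\x1f\xa1"  # compress -s magic from the text (\037\241)
GZIP_MAGIC = b"\x1f\x8b"
# Minimal 10-byte gzip header: magic, CM=8 (deflate), FLG=0, MTIME=0,
# XFL=0, OS=255 (unknown). These field values are an assumption.
MINIMAL_GZIP_HEADER = GZIP_MAGIC + b"\x08\x00" + b"\x00" * 4 + b"\x00\xff"

# gzip FLG bits that announce optional header fields.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def gzip_header_length(blob: bytes) -> int:
    """Return the size of the gzip header, including optional fields."""
    if len(blob) < 10 or blob[2] != 8:
        raise ValueError("unsupported or truncated gzip header")
    flags = blob[3]
    pos = 10
    if flags & FEXTRA:    # 2-byte little-endian length, then that many bytes
        (xlen,) = struct.unpack_from("<H", blob, pos)
        pos += 2 + xlen
    if flags & FNAME:     # NUL-terminated original file name
        pos = blob.index(b"\x00", pos) + 1
    if flags & FCOMMENT:  # NUL-terminated comment
        pos = blob.index(b"\x00", pos) + 1
    if flags & FHCRC:     # 2-byte header CRC
        pos += 2
    return pos


def gzcompat(blob: bytes) -> bytes:
    """Sense the input format and return the same stream in the other one."""
    if blob[:2] == STRONG_MAGIC:
        return MINIMAL_GZIP_HEADER + blob[2:]
    if blob[:2] == GZIP_MAGIC:
        return STRONG_MAGIC + blob[gzip_header_length(blob):]
    raise ValueError("input is neither compress -s nor gzip")
```

Note that only the header bytes are touched: the deflated data and the trailer are copied through unchanged, which is why the conversion involves no compression, decompression, or CRC calculation. Going from gzip to the strong format discards whatever the gzip header carried, such as the original file name and timestamp.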

zlib, the new compress, and gzcompat first appeared in the 4.3BSD-Quasijarus0a release.

The Quasijarus Consortium has adopted the compress -s format as the standard for all on-line software distribution. All BSD tape images and other distributions from now on will be in this format. The strong compression code is available as a separate package in the BSD distribution archive (it is itself uncompressed). Downloading, compiling, and installing it is the first step in upgrading an earlier 4.2BSD or 4.3BSD system to 4.3BSD-Quasijarus.

@(#)compress.html 1.1 03/12/18

Michael Sokolov
msokolov@ivan.Harhan.ORG