[BusyBox] Where I left off with bzip compression...

Rob Landley rob at landley.net
Fri Feb 20 09:29:58 UTC 2004


Okay, the following produces archives that both jseward's bunzip and my bunzip 
correctly decompress.  They don't exactly match jseward's version due to the 
iterative algorithm that selects better huffman coding tables not quite 
agreeing on the definition of "better" in a couple cases (all sorts of weird 
butterfly effects; deterministic on a given set of input, but change the 
input slightly and the result might be a smaller file for an input with 
seemingly _less_ redundancy.).

Sometimes my version produces a smaller archive than his, and sometimes his 
produces a smaller one than mine.  I spent a month trying to consistently 
beat him, and can't.  (This code may still be taking more interations at 
huffman code generation, actually.  Didn't help, it just made the resulting 
file _larger_.  Still a valid archive that decompresses with either version 
of bunzip to match the original data, though.)

This evening I intended to adjust this to match jseward's version's output 
exactly (which involves doing _stupid_ things in a couple of places, notably 
inserting data into and them immediately removing it back out of the 
heapsort, which perturbs the heap just enough that later equavliant entries 
come out in a different order, of course, and yes having ties come out in a 
different order in the sorting algorithm makes the resulting archive size 
different), but wont have time before my road trip.  I'll do it after I come 
to rest in a week or so.

bzip.c is the new compression engine.  crap.c is the string sorting stuff 
(ripped straight from jseward's version, I intend to clean it up but haven't 
yet.)

It compiles with "gcc -Os bzip.c crap.c -o bzip", then do "cat filename | 
./a.out > filename.bz2" to test it.

Any line in bzip.c starting with // is debug crap you can just delete from the 
sources, it's currently commented out anyway...

Back in a few days...

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bzip.c
Type: text/x-csrc
Size: 16457 bytes
Desc: not available
Url : http://lists.busybox.net/pipermail/busybox/attachments/20040220/668ff64e/attachment.c 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crap.c
Type: text/x-csrc
Size: 29817 bytes
Desc: not available
Url : http://lists.busybox.net/pipermail/busybox/attachments/20040220/668ff64e/attachment-0001.c 


More information about the busybox mailing list