shouldn't tar quit at end of archive?

Glenn L. McGrath bug1 at ihug.co.nz
Fri Mar 31 13:56:24 PST 2006


On Fri, 31 Mar 2006 15:07:29 -0500
Paul Fox <pgf at brightstareng.com> wrote:

> i think busybox tar is over-aggressive about consuming its input
> data.
> 
> after it detects the end of the tar archive, tar continues
> reading its input until EOF is reached. 
<snip>
> second, it causes tar to exit when it detects the first occurrence
> of a null header in the archive, rather than waiting for the
> second such header.  not all tar archives are terminated with two
> null headers.


hmm, i thought one of tar's "features" was that you can have
random multiples of zeroed 512-byte blocks in the middle of archives.

Something about making tape drives faster (a faster snail) by aligning
each header to 4k or whatever boundaries.

I just did a test with GNU tar (and star): i appended two tar archives,
which resulted in a dozen or so zeroed blocks in between them. I was
expecting GNU tar to be able to extract both of them once appended, but
it only sees the first tar archive.

I may have misjudged tar's behaviour in relation to multiple internal
zeroed headers.
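
For anyone who wants to repeat that experiment without two tarballs to
hand, here is a rough sketch using Python's tarfile module. It is not
GNU tar or busybox code, so it only illustrates the idea; make_tar and
list_members are made-up helper names, and the file names and contents
are placeholders.

```python
import io
import tarfile

def make_tar(name, payload):
    """Build a small in-memory tar archive holding one file (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def list_members(data, ignore_zeros=False):
    """Return member names; ignore_zeros=True skips zeroed blocks instead of stopping."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tf:
        return tf.getnames()

# Appending two archives leaves the first archive's zeroed trailer
# (plus its padding) sitting between the two sets of headers.
combined = make_tar("a.txt", b"first\n") + make_tar("b.txt", b"second\n")

print(list_members(combined))                     # stops at the first zeroed block
print(list_members(combined, ignore_zeros=True))  # reads past the zeroed blocks
```

By default the reader stops at the first zeroed block and only reports
a.txt; with ignore_zeros=True it carries on and finds b.txt as well.
GNU tar has a similar switch, -i / --ignore-zeros, which is probably
what i was half remembering.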


> ironically, i was the last person to touch the lines i'm now
> changing, when i committed the patch that was submitted with bug
> 262.  before that patch, tar also attempted to read all of
> the data in the file, but if it happened that there was non-zero
> data after the end-of-archive had been detected, tar would resume
> trying to interpret that data.  the fix for 262 addressed that
> issue, at the expense of requiring the doubled null block at the
> end.
> 
> one thing i don't understand about the current code is the comment
> regarding emptying the gz or bz2 pipe.  when or why would this be
> an issue?

gzip and bzip2 count the number of bytes that are in the archive, so
when you get to the trailer they compare the number of bytes we read
(or intentionally skipped over) with the number stored in the
compressor's trailer.

Also, we need to read all the data to compute the checksum; if we don't
read all the zero headers the checksum will fail.
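
To make the trailer check concrete, here is a small sketch for the gzip
case only, again in Python rather than the busybox C code. The payload
is a stand-in for tar data, and the variable names are just for
illustration. A gzip member ends with 8 bytes: a CRC32 of the
uncompressed data followed by its length modulo 2^32, both
little-endian.

```python
import gzip
import struct
import zlib

# Stand-in for a tar stream: some data followed by zeroed blocks.
payload = b"tar data stand-in" + b"\0" * 1024
compressed = gzip.compress(payload)

# Last 8 bytes of the gzip member: CRC32, then ISIZE (length mod 2**32).
crc32, isize = struct.unpack("<II", compressed[-8:])

crc_matches = crc32 == zlib.crc32(payload)
size_matches = isize == len(payload) % (1 << 32)
print(crc_matches, size_matches)

# If the reader stops before the trailing zeroed blocks, the CRC it has
# accumulated covers less data and is expected to differ from the trailer.
print(zlib.crc32(payload[:17]) == crc32)
```

So a decompressor that verifies the trailer has to see every
uncompressed byte, zeroed blocks included, before it can say the stream
was good.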

Unfortunately it's easy to walk into problems with tar.


Glenn

