[PATCH] diff: rewrite V2. -1005 bytes

Matheus Izvekov mizvekov at gmail.com
Thu Jan 14 23:17:23 UTC 2010


On 23:50 Thu 14 Jan     , Denys Vlasenko wrote:
> On Thursday 14 January 2010 15:25, Matheus Izvekov wrote:
> > On 04:14 Thu 14 Jan     , Denys Vlasenko wrote:
> > > On Thursday 14 January 2010 03:01, Matheus Izvekov wrote:
> > > > On 01:34 Thu 14 Jan     , Denys Vlasenko wrote:
> > > > > Attached patch cuts off ~250 more bytes:
> > > > > 
> > > > > function                                             old     new   delta
> > > > > read_token                                           176     183      +7
> > > > > diff_main                                           1265    1198     -67
> > > > > diff                                                3033    2847    -186
> > > > > ------------------------------------------------------------------------------
> > > > > (add/remove: 0/0 grow/shrink: 1/2 up/down: 7/-253)           Total: -246 bytes
> > > > 
> > > > Actually, you did notice EOF was a token, but then you reused the eof
> > > > flag in the 8th bit position to represent it.
> > > > Problem is, with flag -b set, the user does not see that token,
> > > > because it is considered a space, and it should treat all kinds
> > > > of spaces as if they were the same.
> > > > 
> > > > To sum it up, all flags should be moved up 1 bit and 0x1ff should be
> > > > used instead of 0xff as the mask.
> > > 
> > > Can you point out where exactly is the bug:
> > > 
> > > typedef int token_t;
> > > enum {
> > >         SHIFT_EOF = (sizeof(token_t)*8 - 8) - 1,
> >           remove this line
> > >         TOK_EOF   = 1 << 8,
> ...
> 
> This isn't answering my question.
> I meant: "explain where my code goes off the rails".
> 
> You don't explain it. Instead, you are showing how you'd change
> the code so that it is "becoming correct".
> This does not convince me that old code was incorrect.
> 
> Below is the diff between your code and mine.
> As far as I can see, it nearly literally translates
> usage of bool fields into the usage of bytes.
> So where's the bug?

The bug would be that when flag -b is set, and an EOF comes, the caller
would see token 0xffffff20 instead of 0x20.
The problem is reusing the same bit that represents that the file ended
to also represent information about the token itself.
For example, if you changed this part:

+                               tok &= ~0xff;
+                               tok |= TOK_SPACE + ' ';

to the below:

+                               tok &= ~0x1ff;
+                               tok |= TOK_SPACE + ' ';

Then the caller would never know the file ended, and would get stuck.
There is just no way around this, you have to reserve another bit for
the token value.

Also, this:
#define TOK2CHAR(t) (((t) << SHIFT_EOF) >> SHIFT_EOF)

Is uneeded, because the token value is only used to compare to outputs
emmited by this same function.
It only matters that each token has a different value, and that they
are assigned systematically.
So this could just become instead:
#define TOK2CHAR(t) (((t) & 0x1ff)


More information about the busybox mailing list