[PATCH] unify itoa

Rob Landley rob at landley.net
Tue Jul 11 19:42:38 UTC 2006


On Tuesday 11 July 2006 10:44 am, Denis Vlasenko wrote:
> On Tuesday 11 July 2006 02:47, Rob Landley wrote:
> > > The attached test program shows that snprintf is 2.5 times slower
> > > than utoa. Operations done per second:
> > > 	sprintf='12345678' count=27346
> > > 	utoa_to_buf='12345678' count=69880
> > > 	utoa='12345678' count=65934
> > >
> > > A few days ago I ran "hexdump | grep" over entire hard drive.
> > > Took a long time. I wish hexdump was optimized a bit better...
> >
> > I haven't really looked closely at hexdump, but what does a base 10 utoa
> > have to do with a base 16 hexdump?
>
> This is a real-life example of a program which does a lot of
> int -> string conversions, so that it actually mattered for me recently.
>
> hexdump is of course using base 16 conversion. tcpdump could be
> an example of a proc which does base 10 conversions, but I didn't
> use it lately.

Understood, but if I wanted to speed up hexdump | grep I'd profile both 
hexdump and grep, and probably find out that our grep is dog slow (there's 
the pending patch to speed up get_chomped_line_from_file() in my todo heap, 
which is waiting for me to get back around to the sed NUL bug and make sure 
that they play nice together).

> > The lookup table is a classic size/speed tradeoff, something BusyBox has
> > traditionally come down on the "size" side of.  Doing a write() on the
> > result is going to be way slower than the actual conversion anyway.  What
> > actual bottleneck are you seeing?
> >
> > I definitely agree the bounds checking needs to be added, though.  I'll
> > do that now...
>
> The version in current svn produces _most significant_ digits
> if buffer is too small. IOW: utoa_to_buf_rob(100223,buf,3)=="10".
> The usual practuce is to produce "23".

If the buffer is too small it essentially produces garbage, we just don't want 
it to overflow and stomp memory it doesn't own.

> How about this modified version?
...
> It's even a bit smaller. gcc is smart enough to know div instruction
> produces the remainder also, and we get n %= i for free.

Very nice.  My only concern is on which platforms does division automatically 
produce a remainder?  I expect division to be supported everywhere, and to 
perform fairly well.  I have a vague recollection that in some cases 
environments modulus is overlooked and (as a result) kind of evil, but I 
don't remember specifics.

> I still dislike the speed, i/=10 eats too much CPU...

It's a tight L1 cache-hot loop.  In a pipeline like hexdump | grep it's 
unlikely to be even 1% of the total CPU overhead.  What platform have you 
profiled this on?  I don't believe integer division has boiled down to a 
subtraction loop on any interesting processor for a quarter-century (I 
remember that multiplication could be done in something like 6 clock cycles 
without being _that_ clever, although this is a vague recollection from my 
undergraduate days circa 1993), but then I'm not entirely certain how some of 
the really low-transistor-budget embedded processors (like ARM) are 
implemented.  I doubt they're _stupid_, though.  Many of them have floating 
point units, there's no reason for them to stint on integer arithmetic 
operations...

Could I have some benchmark numbers on the system you're worried about, 
please?  I already mentioned I tried the following here:

#include <stdio.h>

// 0x4f bytes
void utoa_to_buf(unsigned n, char *buf, int buflen)
{
        int i, out = 0;
        for (i=1000000000; i; i/=10) {
                int res = n/i;

                if (res || out || i == 1) {
                        out++;
                        n -= res*i;
                        *buf++ = '0' + res;
                }
        }
        *buf = 0;
}

int main(int argc, char *argv[])
{
  int i;
  char blah[16];

  for(i=0;i<1000000;i++) {
    utoa_to_buf(i, blah, 16);
  }
}

Ran that under "time", and got 0.707 seconds.  1000000/.707 is 1.4 million 
according to my handy dandy python interpreter (the best calculator Linux 
ever shipped with).  Adding bounds checking didn't slow it down _that_ much.

What do you get on the system you're worried about the performance of?

Rob
-- 
Never bet against the cheap plastic solution.



More information about the busybox mailing list