Proposition: use a hashtable instead of bsearch to locate applets

Fri May 30 18:19:29 UTC 2014


Am 30.05.2014 20:07, schrieb Rich Felker:
> On Fri, May 30, 2014 at 12:13:41PM +0100, Laurent Bercot wrote:
>>
>> On 05/30/2014 10:16 AM, Bartosz Gołaszewski wrote:
>>> I've checked the times just by looking up all the applets in a loop
>>> and measuring the time using gettimeofday() - the results are: ~220
>>> microseconds for bsearch and ~150 microseconds for hashtable on my
>>> linux laptop. Is it significant? I think so. Is it noticeable?
>>> Probably not. The idea came to me, when thinking about unifying the
>>> hashtables used in busybox. I guess you're right and it isn't really
>>> worth the size increase.
>>
>>  I think it would definitely be a worthwhile improvement if all busybox
>> is doing was looking up the applets in a loop. ;)
>>  A more realistic test, however, would be to fork+exec a trivial applet
>> (true, for instance) in a loop, and measure the times with bsearch vs.
>> with the hash table. And I'm certain you'll find that the difference
>> becomes quite insignificant.
> 
> The lowest time I've ever seen for exec (not even counting fork;
> measured from immediately before the exec syscall to entry into main)
> is over 200µs, and can easily exceed 1ms with dynamic linking. So even
> if the program did nothing but applet lookup and exit, I think we'd be
> looking at a performance increase of ~30% at best.
> 
> Realistically, as soon as you throw in some actual work and other
> syscalls that actually do something, I suspect the difference to be
> less than 5%.
> 
> If there is a desire to change the way applet lookup is done, I would
> suggest trying to make it optimal in both size and performance. 150µs
> is not impressive at all. A perfect hash constructed at build time for
> the list of busybox applet-name strings should make it possible to do
> the applet lookup in ~1µs with trivial code/table size (all constant
> in .text).
> 

We should keep in mind that busybox is also about size reduction, so we need
a more flexible code that can be used on other places as well (e.g. shell).
>From my experience i would not expect to much most times the hogs are somewhere
else.  Glibc has some hash functions (no idea, if POSIX or GNU or ...)
using them should not make a notable size difference.

re,
 wh