Requirements for adding new tools

Thu Mar 23 08:20:07 PST 2006

On Thursday 23 March 2006 8:01 am, Dag Wieers wrote:
> Hi,
>
> I'm very interested to have tcptraceroute functionality in busybox in
> order to have bootable rescue images with the capability to check if
> firewalls are configured correctly and trace where the problems are.
>
> Is this a reasonable request or somehow (technically or
> philosophically) impossible or undesirable ?

Sure, that sounds reasonable.  Go for it.  We could use a traceroute.

> What is necessary to consider something like this for inclusion ?

People keep asking me this.  Ok, I'll try to articulate something.

People have to make hard choices when they build small systems.  What do they 
want to spend their byte budget on?  The goal of busybox is to give the best 
"bang for the byte", not to avoid tempting users with extra features.  So 
"including this would make busybox too big" is not a real objection when you 
can just switch it off in the config, as long as you get a great bang for 
the buck.

What is a turn-off is complexity and external dependencies.  When looking at 
new features, first ask "how complex is this?"  (Both to implement and to 
understand.)  The simpler it is, the more interesting it is.  We also focus 
on things that have no external dependencies.  We don't link with X11 or with 
curses.  We're moving towards making libcrypt optional, and try to reduce 
dependencies on libm.  We don't link against zlib or libbzip2, since those 
are built in as well.  Ideally, we want to get away with linking against libc 
and our shared code in libbb, and that's it.  Library dependencies bloat 
busybox really fast, in a way that doesn't show up in the executable size but 
is just as real.

Next, do you want to solve a problem or do you want to include an 
implementation?  To me, the worst kind of busybox applets are the external 
packages that are nailed to the side of busybox and then the version in 
busybox is repeatedly resynced against those external packages, which is the 
_opposite_ of cleaning them up into a small, simple, well-integrated part of 
busybox that does as little as possible and shares as much code as possible 
with the other applets.

In general, we shouldn't _care_ what some other implementation is doing 
(modulo security fixes which may or may not apply to us if we _are_ a 
separate implementation).  We should instead care about the problem the 
applet solves, and the specification for the expected behavior (if any).  
Learn from other implementations, sure, but if you want to use the other 
package why are you involving busybox?  (And you will always need separate 
packages to build your system, we're not rolling libc or the kernel into 
busybox.)

I generally don't look at the source code of another package when working on a 
busybox version.  I read the man page, and the specification (if any), and I 
test the heck out of it to see how it treats corner cases (and try to turn 
that into an automated regression test suite I can run against both 
versions).
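
As a rough sketch of that kind of regression test (this is not code from 
busybox; the function names and the two stand-in commands are made up for 
illustration), a Python harness can feed the same inputs to both versions and 
report every case where exit status or output differs:

```python
import subprocess
import sys


def observe(argv, stdin_data):
    """Run one command and return what a caller can see: (exit status, stdout)."""
    proc = subprocess.run(argv, input=stdin_data, capture_output=True, timeout=10)
    return proc.returncode, proc.stdout


def differential_test(reference, candidate, cases):
    """Run every case against both commands and return the disagreements.

    reference and candidate are argv prefixes (lists of strings).
    Each case is a tuple (extra_args, stdin_bytes).
    """
    failures = []
    for extra_args, stdin_data in cases:
        expected = observe(reference + extra_args, stdin_data)
        actual = observe(candidate + extra_args, stdin_data)
        if expected != actual:
            failures.append((extra_args, stdin_data, expected, actual))
    return failures


# Stand-ins so the sketch runs anywhere: two tiny "implementations" of an
# uppercasing filter, the second one wrong on empty input.  In real use these
# would be something like ["sort"] and ["busybox", "sort"].
REFERENCE = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
CANDIDATE = [sys.executable, "-c",
             "import sys; d = sys.stdin.read(); "
             "sys.stdout.write(d.upper() if d else 'EMPTY')"]

# Corner cases go in this list: empty input, no trailing newline, and so on.
CASES = [([], b"abc\n"), ([], b"no newline"), ([], b"")]

if __name__ == "__main__":
    for args, data, expected, actual in differential_test(REFERENCE, CANDIDATE, CASES):
        print("MISMATCH", args, data, "expected", expected, "got", actual)
```

The harness only looks at observable behavior, so it never needs to know how 
either version is implemented.  Comparing stderr as well is a judgment call, 
since error message wording usually differs between implementations.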

There are of course exceptions to this, like bunzip2, where the implementation 
was the only specification for the problem.  In that case I did track down 
papers on the Burrows-Wheeler transform and move-to-front encoding (I already 
knew Huffman coding), and try to derive something like a spec I could then 
implement.  I tore the old code apart, rearranged it into a straight line, 
deleted everything I could until I just had pseudo-code, made lots and lots 
of comments, and then wrote fresh code in a new window from the pseudo-code 
and those comments.  I didn't have anything that actually _compiled_ for 
weeks.

When you expect to be able to shrink things more than 50%, you don't _want_ to 
use the old code as a starting point.  The old code is full of distracting 
things you don't have to do, and what you want to focus on is what's the 
minimal set of things this code _does_ need to do?  Which is defined by the 
task, not by the implementation.  If this sounds like a bad approach to doing 
a busybox version, think twice about whether the applet's a good candidate 
for busybox.

Dropbear is an example of something we probably don't want to merge.  To start 
with, the current version is just fine as is.  The current package is well 
maintained, and the maintainer has shown no interest in discontinuing the 
existing version in favor of merging with busybox.  It's complex enough that 
we couldn't easily maintain our own fork, and marshalling patches back and 
forth between two versions is unnecessary work anyway.  Dropbear is already 
built with size in mind, so we're not likely to do a significantly smaller 
implementation, yet it's fairly large (100k) and depends on a lot of math 
libraries which most of the rest of busybox wouldn't benefit from.

If you come up with a patch to shrink dropbear, the dropbear maintainer will 
be happy to accept it.  If busybox needs some encryption logic (such as https 
support for httpd), rather than trying to glue dropbear code into busybox why 
not ask the dropbear maintainer if he can provide an https filter mode we can 
run data through the way tar uses gzip?  Work with dropbear rather than 
trying to include it.
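
The "run data through a filter" plumbing is small.  Here is a minimal Python 
sketch of it, assuming nothing about dropbear (the https filter mode is only 
a suggestion above, not an existing feature); filter_through and the REVERSE 
stand-in command are hypothetical names for illustration:

```python
import subprocess
import sys


def filter_through(argv, data):
    """Pipe data through an external command and return what it writes to stdout."""
    # check=True turns a non-zero exit status from the filter into an exception.
    proc = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return proc.stdout


# Stand-in filter so the sketch runs without any particular tool installed:
# a child process that reverses its input.  With gzip installed, the commands
# would be ["gzip", "-c"] to compress and ["gzip", "-dc"] to decompress.
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read()[::-1])"]

if __name__ == "__main__":
    print(filter_through(REVERSE, b"hello"))
```

The caller only depends on the filter's command line and its stdin/stdout 
behavior, so the two packages can be maintained separately.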

Right now, I plan to throw out all four shells in busybox and write one common 
implementation.  I'm looking closely at lash (the smallest), and looking a 
bit at hush (to learn from its mistakes: that was an earlier unification 
attempt that stalled when it couldn't be as small as lash and thus couldn't 
replace it).  I may or may not use code from either, haven't decided yet, 
but my implementation will "be informed by" both since both were trying for 
small and simple.  I'll probably take a quick glance through msh as well to 
see what features it has, but I'm not sure I'll look at ash at all and have 
_no_ plans to use any of its code.

What do I plan to use as implementation guides?  The spec and test cases I 
write up from lash, hush, and maybe msh.  The bash man page.  The susv3 
sections on the shell.  That "csh considered harmful" paper I once read.  
Every existing shell script I can find to use as a test case...

It's a darn big project, but that's what's needed to do it _right_.

Rob
-- 
Never bet against the cheap plastic solution.