Requirements for adding new tools
Rob Landley
rob at landley.net
Thu Mar 23 08:20:07 PST 2006
On Thursday 23 March 2006 8:01 am, Dag Wieers wrote:
> Hi,
>
> I'm very interested to have tcptraceroute functionality in busybox in
> order to have bootable rescue images with the capability to check if
> firewalls are configured correctly and trace where the problems are.
>
> Is this a reasonable request or somehow (technically or
> philosophically) impossible or undesirable ?
Sure, that sounds reasonable. Go for it. We could use a traceroute.
> What is necessary to consider something like this for inclusion ?
People keep asking me this. Ok, I'll try to articulate something.
People have to make hard choices when they build small systems. What do they
want to spend their byte budget on? The goal of busybox is to give the best
"bang for the byte", not to avoid tempting users with extra features. So
"including this would make busybox too big" is not a real objection when you
can just switch it off in the config, as long as you get a great bang for
the buck.
What is a turn-off is complexity and external dependencies. When looking at
new features first ask "how complex is this"? (Both to implement and to
understand.) The simpler it is, the more interesting it is. We also focus
on things that have no external dependencies. We don't link with X11 or with
curses. We're moving towards making libcrypt optional, and try to reduce
dependencies on libm. We don't link against zlib or libbzip2, since those
are built in as well. Ideally, we want to get away with linking against libc
and our shared code in libbb, and that's it. Library dependencies bloat
busybox really fast, in a way that doesn't show up in the executable size but
is just as real.
Next, do you want to solve a problem or do you want to include an
implementation? To me, the worst kind of busybox applets are the external
packages that are nailed to the side of busybox and then the version in
busybox is repeatedly resynced against those external packages, which is the
_opposite_ of cleaning them up into a small, simple, well-integrated part of
busybox that does as little as possible and shares as much code as possible
with the other applets.
In general, we shouldn't _care_ what some other implementation is doing
(modulo security fixes which may or may not apply to us if we _are_ a
separate implementation). We should instead care about the problem the
applet solves, and the specification for the expected behavior (if any).
Learn from other implementations, sure, but if you want to use the other
package why are you involving busybox? (And you will always need separate
packages to build your system, we're not rolling libc or the kernel into
busybox.)
I generally don't look at the source code of another package when working on a
busybox version. I read the man page, and the specification (if any), and I
test the heck out of it to see how it treats corner cases (and try to turn
that into an automated regression test suite I can run against both
versions).
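As a rough illustration of that kind of regression suite (a sketch, not code from the original message), the same arguments and input can be fed to both versions and the observable results compared. The function names and the shape of a test case here are assumptions made up for the example:

```python
import subprocess

def run_case(command, args, stdin_text):
    """Run one command and capture what a black-box test can observe."""
    result = subprocess.run(
        list(command) + list(args),
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=10,
    )
    # Only the exit status and stdout are compared in this sketch.
    return result.returncode, result.stdout

def compare_implementations(reference, candidate, cases):
    """Return every case where the two commands disagree.

    Each case is an (args, stdin_text) pair; both commands get the
    same arguments and the same input.
    """
    mismatches = []
    for args, stdin_text in cases:
        expected = run_case(reference, args, stdin_text)
        actual = run_case(candidate, args, stdin_text)
        if expected != actual:
            mismatches.append((args, stdin_text, expected, actual))
    return mismatches
```

Each corner case found by hand becomes one more entry in `cases`, so the list grows into the behavioral spec for the applet. Here `reference` and `candidate` are argument lists such as the existing tool and the busybox applet under test.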
There are of course exceptions to this, like bunzip2, where the implementation
was the only specification for the problem. In that case I did track down
papers on the Burrows-Wheeler transform and move-to-front encoding (I already
knew Huffman coding), and try to derive something like a spec I could then
implement. I tore the old code apart, rearranged it into a straight line,
deleted everything I could until I just had pseudo-code, made lots and lots
of comments, and then wrote fresh code in a new window from the pseudo-code
and those comments. I didn't have anything that actually _compiled_ for
weeks.
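For readers who have not met move-to-front encoding, here is a minimal sketch of the idea over a 256-entry byte table. It is a textbook illustration only, not the bunzip2 code discussed above, and the function names are invented for the example:

```python
def mtf_encode(data):
    """Replace each byte with its current position in a recency table."""
    table = list(range(256))
    out = []
    for byte in data:
        index = table.index(byte)
        out.append(index)
        # Move the byte just seen to the front of the table.
        table.pop(index)
        table.insert(0, byte)
    return out

def mtf_decode(indices):
    """Invert mtf_encode by replaying the same table updates."""
    table = list(range(256))
    out = bytearray()
    for index in indices:
        byte = table.pop(index)
        out.append(byte)
        table.insert(0, byte)
    return bytes(out)
```

Recently seen bytes encode as small numbers, so runs of identical bytes turn into runs of zeros, which an entropy coder such as Huffman coding can then pack tightly.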
When you expect to be able to shrink things more than 50%, you don't _want_ to
use the old code as a starting point. The old code is full of distracting
things you don't have to do, and what you want to focus on is what's the
minimal set of things this code _does_ need to do? Which is defined by the
task, not by the implementation. If this sounds like a bad approach to doing
a busybox version, think twice about whether the applet's a good candidate
for busybox.
Dropbear is an example of something we probably don't want to merge. To start
with, the current version is just fine as is. The current package is well
maintained, and the maintainer has shown no interest in discontinuing the
existing version in favor of merging with busybox. It's complex enough that
we couldn't easily maintain our own fork, and marshalling patches back and
forth between two versions is unnecessary work anyway. Dropbear is already
built with size in mind, so we're not likely to do a significantly smaller
implementation, yet it's fairly large (100k) and depends on a lot of math
libraries which most of the rest of busybox wouldn't benefit from.
If you come up with a patch to shrink dropbear, the dropbear maintainer will
be happy to accept it. If busybox needs some encryption logic (such as https
support for httpd), rather than trying to glue dropbear code into busybox why
not ask the dropbear maintainer if he can provide an https filter mode we can
run data through the way tar uses gzip? Work with dropbear rather than
trying to include it.
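The "filter" arrangement described above amounts to handing data to a separate program on its stdin and reading the result from its stdout. A minimal sketch of that pattern follows; the helper name is made up for the example, and no https filter mode is claimed to exist in dropbear:

```python
import subprocess

def run_through_filter(filter_command, data):
    """Feed bytes to an external filter's stdin and return its stdout."""
    result = subprocess.run(
        filter_command,
        input=data,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the filter exits non-zero
    )
    return result.stdout
```

Assuming gzip is installed and on the PATH, `run_through_filter(["gzip", "-c"], b"hello")` would return the compressed bytes. The caller needs no compression or encryption code of its own, only the ability to start a process and move bytes through a pipe.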
Right now, I plan to throw out all four shells in busybox and write one common
implementation. I'm looking closely at lash (the smallest), and looking a
bit at hush (to learn from its mistakes: that was an earlier unification
attempt that stalled when it couldn't be as small as lash and thus couldn't
replace it). I may or may not use code from either, haven't decided yet,
but my implementation will "be informed by" both since both were trying for
small and simple. I'll probably take a quick glance through msh as well to
see what features it has, but I'm not sure I'll look at ash at all and have
_no_ plans to use any of its code.
What do I plan to use as implementation guides? The spec and test cases I
write up from lash, hush, and maybe msh. The bash man page. The susv3
sections on the shell. That "csh considered harmful" paper I once read.
Every existing shell script I can find to use as a test case...
It's a darn big project, but that's what's needed to do it _right_.
Rob
--
Never bet against the cheap plastic solution.