mount -a remounts tmpfs entries: bug or feature?
Rob Landley
rob at landley.net
Tue Nov 25 04:30:48 PST 2008
On Monday 24 November 2008 17:12:45 Denys Vlasenko wrote:
> On Monday 24 November 2008 14:59, Rob Landley wrote:
> > On Sunday 23 November 2008 09:01:35 Denys Vlasenko wrote:
> > > On Friday 07 November 2008 03:01, busybox at eehouse.org wrote:
> > > > busybox's implementation of mount differs from the standalone version
> >
> > Back in the 1.1 timeframe I rewrote it more or less from scratch,
> > something like 3 times, trying to get it to behave sanely. (Mount is
> > tricksy.)
> >
> > I see it's been fairly heavily edited then. Kind of horrible to read
> > through now, actually.
>
> NFS code was merged into mount.c. Somebody asked me to do it
> (it was before my maintainership). In hindsight, that was not
> such a good idea - now it's not readable.
>
> Other than the NFS code, what places don't you like, Rob?
An #ifdef for _dietlibc_, special-casing rootfs, special-casing shared subtree
flags, the mount_option_str separately quoting "\0" (small but ugly; you
wouldn't do it for \n)...
Some of it might just be that I've gotten used to looking at the toybox
infrastructure for things like option parsing, so constructs like:
#if ENABLE_BLAH
#define ifBlah (logic)
#else
#define ifBlah 0
#endif
just look really wrong, although that one was probably my fault once upon a
time. Also, I don't #ifdef things out of my shared globals struct, on the
theory that it's probably going to round up to page granularity anyway and
the extra source complexity usually doesn't even make it pull in fewer cache
lines, while with the simple way you can just go (CFG_WHATSIS && TT.whatsis)
in the code and let the compiler drop it out.
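A minimal sketch of that style, for the curious. The names CFG_WHATSIS, struct this_globals and TT below are illustrative stand-ins, not toybox's real definitions (there TT is a macro, here it's just a plain struct variable). The member exists unconditionally, and since CFG_WHATSIS is a compile-time 0 or 1, the optimizer throws away the guarded branch when the feature is off:

```c
#include <stdio.h>

/* Hypothetical config switch; a build system would set this to 0 or 1. */
#define CFG_WHATSIS 1

/* Shared globals: no #ifdef around the member, it always exists. */
struct this_globals {
	int whatsis;   /* set by option parsing when the feature is requested */
	long count;    /* how many times do_thing() has run */
};
static struct this_globals TT;

/* Returns 1 if the optional feature path ran, 0 otherwise. */
static int do_thing(void)
{
	TT.count++;
	/* With CFG_WHATSIS defined as 0 this branch is dead code and gcc drops
	 * it, but it is still parsed and type-checked in every configuration. */
	if (CFG_WHATSIS && TT.whatsis) {
		printf("whatsis enabled\n");
		return 1;
	}
	return 0;
}
```

Unlike an #ifdef, the disabled branch still has to compile, so it can't quietly bit-rot in configurations nobody happens to build.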
There are lots of #if ENABLE in general that could be if (ENABLE) instead:
ENABLE_MOUNT_LABEL in resolve_mount_spec(), for example. In general a static
function should be inlineable and optimizable away with gcc 4.x. I'm looking
at verbose_mount() here, which only has two callers anyway, so guarding it at
the call sites might be better, although the second call site has comments on
each _argument_, which is sick...
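Roughly what "guard it at the call site" means, as a sketch rather than busybox's actual code: ENABLE_FEATURE_MOUNT_VERBOSE is a hypothetical switch, and fake_mount() is a stand-in for mount(2) (same argument order) so the thing runs without root. With the switch at 0, verbose_mount() has no live callers and gcc can discard it, no #if required:

```c
#include <stdio.h>

/* Hypothetical feature switch, normally 0 or 1 from the build config. */
#define ENABLE_FEATURE_MOUNT_VERBOSE 1

static int verbose;  /* would be set by a -v option */

/* Stand-in for mount(2) so the sketch runs unprivileged; always succeeds. */
static int fake_mount(const char *source, const char *target,
		const char *type, unsigned long flags, const char *data)
{
	(void)source; (void)target; (void)type; (void)flags; (void)data;
	return 0;
}

/* Defined unconditionally; no #if around it. */
static int verbose_mount(const char *source, const char *target,
		const char *type, unsigned long flags, const char *data)
{
	int rc = fake_mount(source, target, type, flags, data);

	if (verbose)
		fprintf(stderr, "mount('%s','%s','%s',0x%lx,'%s'):%d\n",
			source, target, type, flags, data, rc);
	return rc;
}

/* The guard lives at the call site: if the feature is compiled out, the
 * verbose_mount() call is dead code and the function can be dropped. */
static int do_mount(const char *source, const char *target,
		const char *type, unsigned long flags, const char *data)
{
	if (ENABLE_FEATURE_MOUNT_VERBOSE)
		return verbose_mount(source, target, type, flags, data);
	return fake_mount(source, target, type, flags, data);
}
```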
Speaking of verbose_mount(), in its second caller you have this:
rc = verbose_mount(/*source:*/ "", /*target:*/ argv[0],
/*type:*/ "", /*flags:*/ i, /*data:*/ "");
I think it would have been more readable as:
// verbose_mount(source, target, type, flags, data)
rc = verbose_mount ("", argv[0], "", i, "");
Or just assuming readers could look at the names of the variables in the
prototype...
> > And kind of broken in several places. Ooh, ick.
>
> Where? :(
Alas, I no longer remember what specifically that was in response to. (I was
still recovering from food poisoning and trying not to give myself a
headache.) I do remember that "several" in this instance was only 2 or 3, and
it was more "it won't do the right thing given these arguments" rather than a
security issue.
A quick glance at the code shows it's got lots of:
// WARNING. I am not sure this matches util-linux's
// behavior. It's possible util-linux does not
// take -o opts from mtab (takes only mount source).
I actually explicitly tested that sort of thing over a period of 3 months and
worked out what the correct behavior should _be_, and implemented it at the
time. (See "needed to write up a spec, didn't manage". My fault again.
Mount is only anything like simple and straightforward after you sit down and
study it for weeks, and the day or two I had set aside for writing down a
coherent explanation of how mounting should work wasn't nearly enough.)
> > Some filesystem types are per-instance, and some are shared with all
> > instances (most block backed ones, non-containerized versions of /proc
> > and /sys...).
> >
> > Did you ever read the thing I wrote about the four types of filesystems
> > (block backed, ram backed, synthetic, and network)?
>
> No. Do you have a URL?
Nope, I think it might have been on timesys's website, but a lot of their old
content got closed up or moved around when their engineering department
disintegrated in late 2006. (It's all different people now...)
Here's a quick and dirty, unedited, stream of consciousness dump which sounds
to me like it's coming from Captain Obvious, but on the off chance it might
prove useful...
Mounting a filesystem just connects a filesystem driver to a directory, and
the driver can put anything it darn well pleases in that directory, but it
generally falls into four categories. Two of them have "backing store" and
two of them don't:
- block backed: the classic one everybody thinks of, and which mount is
actually _designed_ around. When Unix was young this was the only type of
filesystem. Your filesystem driver (specified with -t fsname) acts as a
lens to look at a specified block device through. This means you have
_two_ drivers involved in every read and write from this filesystem: a
filesystem driver to interpret the format and a block driver to talk to
the hardware (which is implicit, it's providing the block device you
pointed the filesystem driver at). Note that ramdisks are block backed
filesystems; a ramdisk driver produces a block device out of a chunk of
memory, and then you format it and look at it through a filesystem driver
such as ext2.
Note that "block device" is a specific API, a randomly seekable range of
bytes with invariant size. Block backed filesystems _only_ talk to this
API, to the point that loopback devices exist solely to provide a block
device API wrapper around normal files (which are perfectly capable of
providing a range of bytes, but they don't guarantee their length won't
change while you're using 'em. I believe attempting to truncate a file
which a loop device is attached to no longer panics the kernel, but I
haven't actually tried it).
- Network filesystems: the first complication, a filesystem that talks to
something _other_ than a block device for its backing store. (I think
these showed up sometime in the early 80's, but am not looking it up right
now.) You'd think they'd have been smart and made it talk to a character
device, but the BSD guys who added networking to Unix didn't give network
cards /dev entries, and then Sun hired them to produce NFS. (They also
inflicted vi upon us.) In general these suckers sort of act like they
talk to their backing store via a serial protocol that can fit through a
pipe (or character device, or socket, or...), but you've really got to
squint and _want_ to see it. In practice with NFS as one of the early
models for this (those who do not understand TCP are doomed to reinvent it
poorly via UDP; you wouldn't _think_ Bill Joy would have been in that
group, and yet...), the result was a mess of back-channels and
side-channels and weird overlapping incestuous knowledge of their backing
store. This group includes everything from samba through FUSE, and
outliers you can lump in here include jffs2 (since the backing store isn't
a normal block device; it _must_ be flash, which the driver has incestuous
knowledge of).
With network file systems, the "block device" field mount passes in gets
treated as an address of the backing store, but how to interpret that
address (a URL? Flash memory range? Cookie to look up in a database?)
is up to the driver. You can identify these because the filesystem
holds arbitrary files, it has a persistent backing store, and that backing
store is something _other_ than a normal block device. (The sane ones of
these at least still have a separate driver or program or something
handling the backing store, but they're not all cleanly separated. Case
in point, jffs2 again, which has code in it to erase flash banks, and thus
has to know about NAND vs NOR and thus
http://www.linuxdevices.com/news/NS7386103729.html is news and... ugh.
Clean orthogonal separation is a good thing. In the network filesystem
space, Linux finally gave us a universal API, and it's called FUSE.)
- ram backed: Now we get weird: filesystems that store arbitrary files, but
have no persistent backing store. Really this abuses the disk cache to
act like a filesystem, by plugging it up so the cached data has nowhere to
go and just stays in the cache instead. The implementation is very small
and very simple because the page and dentry caches already _exist_ as
common code in the VFS layer, so it only takes a ~100 line driver to
stub out a few things and give you a temporary filesystem.
Linus Torvalds invented this approach in April 2000:
http://kernel-traffic.org/kernel-traffic/kt20000424_64.html#1
Linus wanted ramfs kept simple, so another variant (tmpfs) was invented
that allows size limits and swapping out the pages. (Ordinarily, swapping
out disk cache is counterproductive: you tell the filesystem driver to get
rid of the page, and it writes it to backing store if need be and then
frees the memory, since it can read it in again from backing store (or, in
the case of synthetic filesystems, generate new contents). Ramfs is
almost unique in that the data has nowhere to go and _can't_ be freed.
tmpfs shuffles cache pages into anonymous pages as if they belonged to a
process, and lets 'em get swapped out. It uses the swap partition as
a transient backing store, but it still goes away when you reboot.)
The next fun thing was rootfs, which is an instance of ramfs (alas, _not_
tmpfs) that gets auto-created at boot and populated from a cpio archive.
Remember how ramdisks are really block backed filesystems? That means
they need two device drivers (the block driver and the filesystem format
interpreting driver that turns the data in the block device into files and
directories and writes it all back again in the right places as
necessary). And to boot, they need to be statically linked. Plus the
data is copied from the block device into the page cache, so you have
_two_ copies of the data when the files are in use. I don't have to sell
this crowd on why this is cool, but it's also _simple_. The trick to
making the problems of turning a chunk of memory into a block device (for
use as a block backed filesystem) go away was _not_to_do_that_, and as
far as I can tell it simply hadn't occurred to anybody that you could get
_away_ with not doing that until Linus did it. Obvious in retrospect, of
course, but most good ideas are. :)
- synthetic filesystems. Here we really go off into the weeds, filesystem
drivers that don't even store arbitrary files. The files here are just a
way of communicating with the driver; writing to them provides information
for the driver to act upon to perform special effects, and reading from
them lets the driver supply information back to userspace. The driver
isn't "storing" information like a normal filesystem, it's eating what you
write into it and hallucinating any darn contents it feels like in return.
Examples include sysfs, proc, debugfs, usbfs, the late unlamented devfs,
and more. The first synthetic filesystem was /proc, which was invented to
show information about processes (so ps didn't have to try to parse
/dev/mem to find internal kernel structures; yes, that's how it used to do
it, you may barf now). For a while it was the only synthetic filesystem,
so every time somebody wanted to pass any _other_ info to userspace (like
/proc/version) they added it to /proc until it became a horrible compost
heap. And then libfs was invented, as described in
http://lwn.net/Articles/57369/; go read that instead, because I'm
falling asleep.
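To make the four categories concrete, here's a hypothetical lookup table keyed on the -t type, recording whether each type has persistent backing store and what mount's source argument means to its driver. The table is a small illustrative sample drawn from the examples above; no real mount implementation classifies filesystems this way, and fs_lookup() is made up for the sketch:

```c
#include <stddef.h>
#include <string.h>

enum fs_kind { FS_BLOCK, FS_NETWORK, FS_RAM, FS_SYNTHETIC };

struct fs_info {
	const char *type;        /* name passed to mount -t */
	enum fs_kind kind;
	int persistent;          /* backing store that survives a reboot? */
	const char *source_is;   /* how the driver treats mount's source field */
};

/* Small illustrative sample, not an exhaustive or authoritative list. */
static const struct fs_info fs_table[] = {
	{ "ext2",  FS_BLOCK,     1, "path to a block device" },
	{ "nfs",   FS_NETWORK,   1, "driver-specific address (host:/path)" },
	{ "jffs2", FS_NETWORK,   1, "driver-specific flash (MTD) address" },
	{ "ramfs", FS_RAM,       0, "ignored" },
	{ "tmpfs", FS_RAM,       0, "ignored" },
	{ "proc",  FS_SYNTHETIC, 0, "ignored" },
	{ "sysfs", FS_SYNTHETIC, 0, "ignored" },
};

/* Return the table entry for a filesystem type, or NULL if not listed. */
static const struct fs_info *fs_lookup(const char *type)
{
	for (size_t i = 0; i < sizeof(fs_table) / sizeof(fs_table[0]); i++)
		if (!strcmp(fs_table[i].type, type))
			return &fs_table[i];
	return NULL;
}
```

For the ram backed and synthetic rows the source string is just a label that shows up in /proc/mounts; the driver never looks at it.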
All of the above was A) much more coherent in the version I actually bothered
to _edit_, and B) actually the _introduction_ to a longer document (it's what I
wound up writing when I sat down to do a mount spec and decided I needed to
start with some background). To be honest, I don't really remember where it
went from there. (I vaguely recall a segue into mount parameters, which
include flags, string flags, and the block device parameter itself. Was that
it? Dunno.)
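For what it's worth, the flags vs. string flags split presumably maps onto the last two arguments of Linux's mount(2), mountflags and data. A minimal sketch, assuming glibc's <sys/mount.h> MS_* constants: generic options like ro and nosuid become flag bits, and anything mount doesn't recognize gets passed through to the filesystem driver as a comma-separated string. parse_mount_opts() is a hypothetical helper and the option table is a small sample, not busybox's or util-linux's parser:

```c
#include <string.h>
#include <sys/mount.h>

struct opt { const char *name; unsigned long set, clear; };

/* Small sample of generic options; the real lists are much longer. */
static const struct opt opts[] = {
	{ "ro",     MS_RDONLY, 0 },
	{ "rw",     0,         MS_RDONLY },
	{ "nosuid", MS_NOSUID, 0 },
	{ "nodev",  MS_NODEV,  0 },
	{ "noexec", MS_NOEXEC, 0 },
};

/* Turn "ro,nosuid,size=64m" into MS_* flag bits (the return value) and a
 * data string for the driver ("size=64m"). data must have room for
 * strlen(optstr) + 1 bytes. Later options override earlier ones. */
static unsigned long parse_mount_opts(const char *optstr, char *data)
{
	unsigned long flags = 0;
	const char *p = optstr;
	size_t n = sizeof(opts) / sizeof(opts[0]);

	*data = '\0';
	while (*p) {
		size_t i, len = strcspn(p, ",");

		for (i = 0; i < n; i++)
			if (strlen(opts[i].name) == len && !strncmp(p, opts[i].name, len))
				break;
		if (i < n) {
			flags = (flags | opts[i].set) & ~opts[i].clear;
		} else if (len) {
			/* Not a generic option: hand it to the filesystem driver. */
			if (*data)
				strcat(data, ",");
			strncat(data, p, len);
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return flags;
}
```

The result would then go to mount(source, target, type, flags, data), with the block device (or whatever the driver thinks the source means) as the first argument.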
I go sleep now.
Rob
More information about the busybox mailing list