rm -r fails to delete entire hierarchy when path goes in and out of it

Gian Ntzik gian.ntzik08 at imperial.ac.uk
Thu Sep 18 16:20:06 UTC 2014


Isaac Dunham <ibid.ag at gmail.com> writes:

> On Wed, Sep 17, 2014 at 06:08:45PM -0400, Joshua Judson Rosen wrote:
>> GNU rm opens the top-level directory and uses unlinkat(), fstatat(), etc. to remove
>> files and subdirectories without having to resolve the paths for every
>> file/directory processed; so it runs into the problem (that some ".." link no
>> longer exists) only when it finally reaches that top-level directory.
>> It still refuses to operate on paths that _end_ in "/.." (or "/.").
>
> This is explicitly required by POSIX.
> Which implies that determining a canonical name should not be done.
>

The POSIX specification of rm
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/rm.html#tag_20_111

explicitly requires that rm fail on arguments whose final component is dot
or dot-dot. Nothing in the specification says or implies that dot or
dot-dot cannot appear elsewhere in the pathname identifying a file
system object to remove.

Note that rm is required to ignore the entries dot and dot-dot when
processing the contents of a directory (step 2.c in the Description).
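Step 2.c can be pictured with a minimal recursive shell sketch. The function name rm_tree is hypothetical, and the sketch omits prompting, the -f and -i options, and the checks on the operand itself. It skips dot and dot-dot as 2.c requires, and it names each child by appending the entry name to the parent's pathname, so every removal resolves the full pathname of the original operand again. Judging from the error messages in the example further down, busybox 1.21.1 appears to do something similar, but that is an assumption, not a reading of its source.

```sh
# Hypothetical sketch of rm -r step 2.c (no prompting, no -f/-i handling,
# no checks on the operand). Each child is named by appending its entry
# name to the parent's pathname, so every call re-resolves the whole
# pathname of the original operand.
rm_tree() {
    if [ -d "$1" ] && [ ! -L "$1" ]; then
        # "$1"/.* may or may not match . and .. depending on the shell,
        # so both are filtered out explicitly, as step 2.c requires.
        # "entry" is global (POSIX sh has no "local"); recursion may
        # overwrite it, but the loop reassigns it before each use.
        for entry in "$1"/* "$1"/.*; do
            case ${entry##*/} in
                .|..) continue ;;
            esac
            # Skip a glob pattern that matched nothing and stayed literal.
            [ -e "$entry" ] || [ -L "$entry" ] || continue
            rm_tree "$entry" || return
        done
        rmdir -- "$1"
    else
        # Non-directories, including symbolic links, are unlinked directly.
        rm -f -- "$1"
    fi
}
```

Run on /tmp/a/e/x/a with the directories from that example, this sketch removes /tmp/a/b, unlinks the symbolic link /tmp/a/e/x, and then fails to rmdir /tmp/a/e/x/a/e because that pathname no longer resolves, leaving an empty /tmp/a/e behind.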

A canonical starting path for the recursive traversal can still be
generated after the required checks on the initial argument.
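One way to do this, sketched in shell under the assumption that the operand's parent directory exists: the helper name rm_start_path is hypothetical, it checks only the final-component rule (and refuses the root directory), and it canonicalizes the parent with the POSIX cd -P and pwd -P rather than a realpath utility. Only the parent is canonicalized, so an operand that is itself a symbolic link still names the link.

```sh
# Hypothetical helper: apply the POSIX check on the final component of an
# rm operand, then print a canonical starting path for the traversal.
# POSIX sh has no "local", so base and parent are global variables.
rm_start_path() {
    # Prefix "./" so an operand starting with '-' is not taken as an option.
    case $1 in -*) set -- "./$1" ;; esac
    # basename strips trailing slashes, so "a/../" yields "..";
    # for the root directory it yields "/".
    base=$(basename "$1")
    case $base in
        .|..|/)
            echo "rm: refusing to remove '$1'" >&2
            return 1 ;;
    esac
    # Canonicalize only the parent: cd -P resolves its symbolic links and
    # dot-dot components, and pwd -P prints the resulting physical path.
    # CDPATH is cleared so cd cannot jump elsewhere or print a path.
    parent=$(CDPATH= cd -P "$(dirname "$1")" && pwd -P) || return 1
    case $parent in
        /) printf '/%s\n' "$base" ;;
        *) printf '%s/%s\n' "$parent" "$base" ;;
    esac
}
```

For the example below, rm_start_path /tmp/a/e/x/a would print /tmp/a (assuming /tmp is not itself a symbolic link), so a traversal started there never passes through /tmp/a/e/x. Keeping the last component unresolved matters: canonicalizing the whole operand would turn rm -r on a symbolic link into removal of the link's target.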


To follow up on my initial report of the issue: the problem is not
specific to the use of dot-dot in the pathname. It arises because the
pathname used to identify a directory traverses descendants of that
directory, and this can happen with symbolic links as well.

Consider the following example, again tested on Ubuntu 14.04 64-bit with
the busybox-i686 binary, version 1.21.1 (each command is run through
busybox-i686):

$ mkdir -p /tmp/a/b/c
$ mkdir -p /tmp/a/e
$ ln -s /tmp /tmp/a/e/x
$ ls /tmp/a
b  e
$ ls /tmp/a/e/x/a
b  e
$ rm -r /tmp/a/e/x/a
rm: can't remove '/tmp/a/e/x/a/e': No such file or directory
rm: can't remove '/tmp/a/e/x/a': No such file or directory
$ ls /tmp/a
e

In fact, the actual behavior is determined by the order in which
directory entries are visited. If /tmp/a/e is visited first, the
results are different. As far as I know, this order is determined by
the underlying file system implementation.

This means that, even with no interference from a concurrently
executing process and with a pathname that resolves successfully, the
behavior of rm -r is non-deterministic, in the sense that it cannot be
precisely predicted.

