[Bug 14541] New: sed: s-command with "semi-special" delimiters get wrong behaviour

bugzilla at busybox.net bugzilla at busybox.net
Fri Jan 21 16:06:09 UTC 2022


https://bugs.busybox.net/show_bug.cgi?id=14541

            Bug ID: 14541
           Summary: sed: s-command with "semi-special" delimiters get
                    wrong behaviour
           Product: Busybox
           Version: 1.30.x
          Hardware: All
                OS: All
            Status: NEW
          Severity: major
          Priority: P5
         Component: Standard Compliance
          Assignee: unassigned at busybox.net
          Reporter: calestyo at scientia.org
                CC: busybox-cvs at busybox.net
  Target Milestone: ---

Hey.

I recently looked into the behaviour of sed implementations when unusual
delimiters are used in:
- context addresses
- s-command


My conclusion was that POSIX itself is pretty ambiguous with respect to how it
defines these, their parsing, and their semantics.
See more about that here:
- https://www.austingroupbugs.net/view.php?id=1550
  (not so interesting, just minor clarification proposals)
- https://www.austingroupbugs.net/view.php?id=1551
- https://www.austingroupbugs.net/view.php?id=1556

Issue 1551 in particular gives several examples where BusyBox's sed behaves
considerably differently from GNU's sed (which by itself is, however, not
necessarily a bug, especially when even POSIX seems ambiguous).

But there is at least one case where I think BusyBox's sed is definitely wrong.

It's described in detail in:
- https://austingroupbugs.net/view.php?id=1551
  (the "main" report, see point (3) there)
- https://austingroupbugs.net/view.php?id=1551#c5611
  (some additions to point (3))

and a tabular overview is given in:
- https://austingroupbugs.net/view.php?id=1551#c5612




In short (using BREs as an example):

Consider a delimiter that is not a special character by itself (again using
BREs as an example), such as the (non-standard) '+', but that becomes special
when preceded by a '\' (that is itself not quoted).
The same example would work with the (standard) '(', though with different
effects.

In BusyBox sed, the following happens:
   $ printf '%s\n' '9+' | busybox sed 's+9\++X+'
   X+
   $ printf '%s\n' '99+' | busybox sed 's+9\++X+'
   X+
   $ printf '%s\n' '999+' | busybox sed 's+9\++X+'
   X+

In GNU sed:
   $ printf '%s\n' '9+' | sed 's+9\++X+'
   X
   $ printf '%s\n' '99+' | sed 's+9\++X+'
   9X
   $ printf '%s\n' '999+' | sed 's+9\++X+'
   99X


In BREs, '+' alone is never special, and the '\+' here is clearly the escape of
the delimiter.
Regardless of what POSIX actually means by "literal" (see the discussion in
the tickets above), there is IMO no way to interpret it the way BusyBox
seems to, which is:

- "un-delimiter" the '+' (because of its preceding '\')
- still keep the '\' with respect to the RE and give special meaning to the '+'

That's like "doubling" the effect of the '\'.
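
To make the two readings concrete, here is a minimal sketch that rewrites them
with '/' as the delimiter, so that no delimiter escaping is involved. The
function names are hypothetical and exist only for this illustration; the
second function assumes a sed whose BREs support the non-standard '\+'.

```shell
# Sketch only: both readings of 's+9\++X+', expressed with '/' as delimiter.
# Function names are made up for this example.

# Reading A (matches the GNU sed results quoted above): '\+' is merely the
# escaped delimiter, so the BRE is '9+', a '9' followed by a literal '+'.
literal_plus() {
    printf '%s\n' "$1" | sed 's/9+/X/'
}

# Reading B (what BusyBox sed appears to do): the backslash stays in the RE,
# so the BRE is '9\+'. Assumption: the sed in use supports the non-standard
# '\+' in BREs, where it means "one or more of the preceding item".
repeated_nine() {
    printf '%s\n' "$1" | sed 's/9\+/X/'
}

literal_plus '99+'    # prints 9X
```

Writing the pattern with a delimiter that does not occur in the RE avoids the
ambiguity altogether, which may serve as a workaround in portable scripts.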




Comment https://austingroupbugs.net/view.php?id=1551#c5611, mentioned above,
describes how dangerous this actually is.
It happens even with delimiters that don't seem "special" in any way, as
in:
   $ printf '%s\n' '9' | busybox sed 'sw9\wwXw'
   9
   $ printf '%s\n' '99' | busybox sed 'sw9\wwXw'
   X
   $ printf '%s\n' '999' | busybox sed 'sw9\wwXw'
   X9

BusyBox's behaviour gets odd when that normal character is preceded by a '\'
and the resulting sequence then has some (non-standard) special meaning
(like '\w').

GNU sed handles these like:
   $ printf '%s\n' '99' | sed 'sw9\wwXw'
   99
   $ printf '%s\n' '9w' | sed 'sw9\wwXw'
   X
So it effectively takes the '\w', makes it a non-delimiter (removing the
escaping), and is left with just the character 'w', which is literal by itself.
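
The same comparison can be sketched for the 'w' case, again with '/' as the
delimiter. As before, the function names are hypothetical, and the second
function assumes a sed that supports the non-standard '\w' (word character)
in BREs.

```shell
# Sketch only: both readings of 'sw9\wwXw', expressed with '/' as delimiter.
# Function names are made up for this example.

# Reading A (matches the GNU sed results quoted above): '\w' is merely the
# escaped delimiter, so the BRE is '9w', a '9' followed by a literal 'w'.
literal_w() {
    printf '%s\n' "$1" | sed 's/9w/X/'
}

# Reading B (what BusyBox sed appears to do): the backslash stays in the RE,
# so the BRE is '9\w'. Assumption: the sed in use supports the non-standard
# '\w' in BREs, where it matches any word character.
word_char() {
    printf '%s\n' "$1" | sed 's/9\w/X/'
}

literal_w '9w'    # prints X
literal_w '99'    # prints 99 (no literal 'w', so nothing is replaced)
```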



I'd guess that all this might also apply to context addresses in BusyBox sed.
I'm not sure whether EREs are also affected in some weird way.



Thanks,
Chris.
