sed behaving strangely when -n and the delete command are combined
Denys Vlasenko
vda.linux at googlemail.com
Fri Jul 10 09:54:39 UTC 2009
On Fri, Jul 10, 2009 at 9:23 AM, Rob Landley<rob at landley.net> wrote:
> So the first page of the patch is still nothing but white noise...
Okay... Attached 5.patch is the meat of the change, and 6.patch is
white noise...
>>@@ -893,35 +894,51 @@ static void process_files(void)
>>
>> || (!sed_cmd->beg_line && !sed_cmd->end_line
>>
>> && !sed_cmd->beg_match && !sed_cmd->end_match)
>> /* Or did we match the start of a numerical range? */
>>- || (sed_cmd->beg_line > 0 && (sed_cmd->beg_line == linenum))
>>+ || (sed_cmd->beg_line > 0 && (sed_cmd->beg_line == linenum
>>+ /* "shadowed beginning" case: "1d;1,ENDp" - p
>> still matches at line 2 + * even though 1d
>> skipped line 1 which is a start line for p */
>
> It's somewhat disturbing to have a comment inside a nested parenthetical. The
> indentation gives no hints so it's a bit hard to track the structure here.
What?
/* Or did we match the start of a numerical range? */
|| (sed_cmd->beg_line > 0 &&
(sed_cmd->beg_line == linenum
/* "shadowed
beginning" case: "1d;1,ENDp" - p still matches at line 2
* even though
1d skipped line 1 which is a start line for p */
||
(sed_cmd->end_line && sed_cmd->beg_line < linenum && sed_cmd->end_line
>= linenum)
||
(sed_cmd->end_match && sed_cmd->beg_line < linenum)
)
)
> I note the comment could just be one line:
>
> /* "1d;1,ENDp" should start matching at line 2. */
Are you serious? I mean, do you really complain that I do not write
comments in exactly the way you like?
>> || (sed_cmd->end_line && sed_cmd->beg_line < linenum && sed_cmd->end_line
>> >= linenum)
>
> This is more complicated than it needs to be.
>
> If you're going to slavishly copy what gnu does, you might want to run a few
> more tests to see what that actually is. Specifically:
>
> echo -e "one\ntwo\nthree\nfour" | sed -ne '2d;2,1p'
> three
>
> So the logic can be simplified: when your start is a number it triggers on the
> next line >= to that number. Just change the match start to >=, and then
> disable numerical starting matches are disabled after the first hit.
I did not think about this example. Let me try... aha, now works:
# echo -ne "first\nsecond\nthird\nfourth\n" | sed -n '1d;2,1p'
second
# echo -ne "first\nsecond\nthird\nfourth\n" | ./busybox sed -n '1d;2,1p'
second
But this breaks another case:
# echo -ne "first\nsecond\nthird\nfourth\n" | sed -n '1d;1,3p'
second
third
# echo -ne "first\nsecond\nthird\nfourth\n" | ./busybox sed -n '1d;1,3p'
(nothing)
See? You waste your time writing up agressive emails
but you make mistakes too.
> Beyond
> that, you want to fall through like it was already doing to the normal match
> ending logic, which has had <= linenum since my days.
>
> So you can get closer to the gnu behavior by changing the code _less_. Which
Nice theory, until you actually try to implement it instead of just describing
how easy it is and how idiotic someone else's code is.
> means you don't have to avoid snapshotting the line and continuing on to the
> "should we end the match now" logic.
>
> Also, why do you discard the comment about snapshotting the line? The
- /* Snapshot the value */
matched = sed_cmd->in_match;
Because comment says that we are saving a value. Well.
That's kind of obvious. The comment should say why we do that
in order to be useful.
>>+ /* Is this line the end of the current match? */
>> if (matched) {
>>- sed_cmd->in_match = !(
>>+ int n = (
>> /* has the ending line come, or is this a single address
>> command? */
>> - (sed_cmd->end_line ?
>>+ sed_cmd->end_line ?
>> sed_cmd->end_line == -1 ?
>> !next_line
>>
>> : (sed_cmd->end_line <= linenum)
>> :
>> : !sed_cmd->end_match
>>
>>- )
>>+ );
>
> I think this is more random cosmetic changes mixed in with the functional
> ones? Except for not setting in_match immediately, which I think is actually
> a bad thing.
>
>>+ if (!n) {
>> /* or does this line matches our last address regex */
>>- || (sed_cmd->end_match && old_matched
>>+ n = (sed_cmd->end_match
>>+ && old_matched
>> && (regexec(sed_cmd->end_match,
>>- pattern_space, 0, NULL, 0) == 0))
>>+ pattern_space, 0, NULL, 0) == 0)
>> );
>>+ if (n && sed_cmd->beg_line > 0) {
>>+ /* Once matched, "n,regex" range is dead, disabling it
>> */ + regfree(sed_cmd->end_match);
>>+ free(sed_cmd->end_match);
>>+ sed_cmd->end_match = NULL;
>>+ }
>>+ }
>>+ sed_cmd->in_match = !n;
>> }
>
> Ok, buried in the random wordwrapping changes, you're now calling regfree in
> the middle of a sed, which seems way overkill.
sed_cmd->in_match = !(complex multi line expression)
was already somewhat hard to understand. When I needed to throw in
even more into it, it got really bad. So I split it into
separate statements:
if (matched) {
int n = (
/* has the ending line come, or is
this a single address command? */
sed_cmd->end_line ?
sed_cmd->end_line == -1 ?
!next_line
: (sed_cmd->end_line <= linenum)
: !sed_cmd->end_match
);
if (!n) {
/* or does this line matches our last
address regex */
n = (sed_cmd->end_match
&& old_matched
&& (regexec(sed_cmd->end_match,
pattern_space, 0,
NULL, 0) == 0)
);
if (n && sed_cmd->beg_line > 0) {
/* Once matched, "n,regex"
range is dead, disabling it */
regfree(sed_cmd->end_match);
free(sed_cmd->end_match);
sed_cmd->end_match = NULL;
}
}
sed_cmd->in_match = !n;
}
I do not see what's wrong here from the style POV.
> If we're in a match, and beg_line > 0, set beg_line = -1. That's all you need
> to do to prevent the match from triggering again after we end it. You can
> even do that test every time, it's small and simple and really cheap, happens
> totally in L1 cache, and more or less parallelizes away on a modern processor
> with multiple execution cores.
This makes sense. Except for now it can erroneously match here:
/* Or did we match last line of input? */
|| (sed_cmd->beg_line == -1 && next_line == NULL);
I guess I can try to use -2
>> /* Okay, so did this line match? */
>>- if (sed_cmd->invert ? !matched : matched) {
>>+ if (sed_cmd->invert ? matched : !matched)
>>+ continue; /* no */
>>+
>
> Um, not noise. This is, in fact, a functional change totally unrelated to the
> rest of the patch of the kind that tends to cause subtle bugs, which I used to
> be very sensitive to in sed because it's so complicated and fiddly.
>
> You left this change in, but didn't change the indentation of everything that
> used to be a block, so the indentation is now misleading.
I did. You complained that the patch is messed up. So I rediffed it with
diff -bB to make it look better. Now you complain about *that*.
I did mix the fix with whitespace and comment changes.
This complaint is valid.
Please review attached 7.patch which fixes 2d;2,1p
function old new delta
process_files 2173 2120 -53
--
vda
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 5.patch
Type: text/x-patch
Size: 2048 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20090710/d571090d/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6.patch
Type: text/x-patch
Size: 15863 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20090710/d571090d/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7.patch
Type: text/x-patch
Size: 2301 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20090710/d571090d/attachment-0005.bin>
More information about the busybox
mailing list