sed problems with hexadecimal substitutions

Denis Vlasenko vda.linux at googlemail.com
Wed Jan 17 22:47:41 UTC 2007


On Tuesday 16 January 2007 03:48, Andre wrote:
> I'm trying to use Busybox sed for quick and dirty UTF16 to ascii
> conversion:
> 
> $ xxd tst.uft16 
> 0000000: 6100 6200 6300 3100 3200 3300 0a         a.b.c.1.2.3..
> 
> GNU sed version 4.1.2 gives the expected result:
> 
> $ sed 's/\x00//g' < tst.uft16 | xxd
> 0000000: 6162 6331 3233 0a                        abc123.
> 
> Busybox 1.3.1 sed doesn't seem to do anything:
> 
> $ busybox-1.3.1 sed 's/\x00//g' < tst.uft16 | xxd
> 0000000: 6100 6200 6300 3100 3200 3300 0a         a.b.c.1.2.3..

Yes. It actually treats NUL and \n as two possible line termination
chars, and runs s/// on the line contents (naturally, _excluding_ the
terminating character)!
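
To make the consequence concrete, here is a minimal standalone sketch
(not BusyBox code; next_line() is a hypothetical helper) of splitting
input with both \n and NUL treated as terminators. The terminator is
returned separately from the line content, so an s/\x00//g run on the
content never sees a NUL to delete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: return the next line of buf[0..n) starting at *pos,
 * or NULL when the buffer is exhausted. Both '\n' and NUL end a line; the
 * terminator is reported in *term and is NOT part of the *len content bytes.
 * A final unterminated line is reported with *term == EOF. */
static const char *next_line(const char *buf, size_t n, size_t *pos,
                             size_t *len, int *term)
{
    size_t start, i;

    assert(pos != NULL && len != NULL && term != NULL);
    start = *pos;
    if (start >= n)
        return NULL;
    for (i = start; i < n; i++) {
        if (buf[i] == '\n' || buf[i] == '\0') {
            *len = i - start;
            *term = (unsigned char)buf[i];
            *pos = i + 1;
            return buf + start;
        }
    }
    /* No terminator before the end of the buffer. */
    *len = n - start;
    *term = EOF;
    *pos = n;
    return buf + start;
}
```

Fed the 13 bytes of tst.uft16, this yields seven lines: a, b, c, 1, 2, 3
(each ended by NUL) and a final empty line ended by \n. None of the line
contents contains a NUL byte.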

> Busybox 1.2.1 sed removes the \x00 chars, but then adds spurious
> newline chars after every conversion...

No. It mangles NULs always, irrespective of whether you specified \x00
anywhere or not. This was fixed, as you observed.

The problem is that we use regexec() in order to find matches
(see do_subst_command() in sed.c), and regexec() operates on NUL-terminated
strings. Sort of hard to match NULs using this API ;)

If you go as far as actually looking into the code, note that
the line terminator (\n? NUL? NO terminator at all (EOF)?)
is stored in last_char, like this:
         last_char = next_last_char;
         next_line = get_next_line(&next_last_char);
get_next_line(..., int*) stores the last char ORed with two flags:
0x100 = "there was no line terminator at all (EOF + last line wasn't terminated)"
0x200 = "this is the first line from the file"
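
A sketch of that packing scheme, assuming the terminator byte sits in the
low 8 bits and the flags above it. The flag values are the ones given
above; the names LINE_NO_TERMINATOR and LINE_FIRST_IN_FILE and both helper
functions are hypothetical, not the identifiers used in sed.c.

```c
#include <assert.h>

/* Hypothetical names for the two flag bits; values as described above. */
enum {
    LINE_NO_TERMINATOR = 0x100, /* EOF reached, last line was not terminated */
    LINE_FIRST_IN_FILE = 0x200  /* first line read from this file */
};

/* Combine the terminator byte ('\n' or NUL) with the flag bits. */
static int pack_last_char(unsigned char term, int flags)
{
    assert((flags & ~(LINE_NO_TERMINATOR | LINE_FIRST_IN_FILE)) == 0);
    return term | flags;
}

/* Recover the terminator byte from the low 8 bits. Only meaningful
 * when LINE_NO_TERMINATOR is not set. */
static unsigned char terminator_of(int last_char)
{
    return (unsigned char)(last_char & 0xff);
}
```

The flags have to live above bit 7 because both \n and NUL are legitimate
terminator values, so no byte value is left over to mean "no terminator".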

How does GNU sed code handle this?
--
vda