[BusyBox 0004774]: Bitwise operations in awk applet are done with the default signedness of longs, which varies with compilation options/platforms
bugs at busybox.net
Mon Sep 1 03:41:04 UTC 2008
The following issue has been REOPENED.
======================================================================
http://busybox.net/bugs/view.php?id=4774
======================================================================
Reported By: benoar
Assigned To: BusyBox
======================================================================
Project: BusyBox
Issue ID: 4774
Category: Standards Compliance
Reproducibility: always
Severity: minor
Priority: normal
Status: feedback
======================================================================
Date Submitted: 08-27-2008 21:33 PDT
Last Modified: 08-31-2008 20:41 PDT
======================================================================
Summary: Bitwise operations in awk applet are done with the
default signedness of longs, which varies with compilation options/platforms
Description:
When using bitwise operations in the awk applet, the signedness of the value
operated on depends on how busybox is compiled, because these operations
are defined to work on "long". This can lead to "strange" results when long
is signed by default and values higher than 2^31-1 are used, e.g.:
echo|awk '{ print and(0x80000000,1) }'
gives 1 when compiled with signed longs, whereas it gives 0 when compiled
with unsigned longs by default.
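For reference, the arithmetic the reporter expects: 0x80000000 is 2^31, whose lowest bit is 0, so and(0x80000000,1) should print 0. C guarantees that ULONG_MAX is at least 2^32-1, so converting 2147483648.0 to unsigned long is in range and well-defined everywhere, whereas converting it to a 32-bit signed long is not. The sketch below only illustrates that idea; the helper names are hypothetical and this is not the attached patch or BusyBox's code.

```c
/* Hypothetical helper, not BusyBox code: AND two awk numbers that are
 * ASSUMED to be non-negative and no larger than ULONG_MAX. Inside that
 * range the double -> unsigned long conversion is well-defined in C;
 * outside it (e.g. negative inputs) the conversion is undefined. */
static unsigned long and_as_unsigned(double a, double b)
{
    return (unsigned long)a & (unsigned long)b;
}

/* The report's example: 0x80000000 == 2147483648 == 2^31, lowest bit 0. */
static unsigned long example_and(void)
{
    return and_as_unsigned(2147483648.0, 1.0); /* 0 on any conforming platform */
}
```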
I don't know if there is a standard for bitwise operations in awk
(http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html doesn't
give me a clue), but most of the platforms I have tested behave as if long
is unsigned by default.
Furthermore, working on signed values when doing bitwise operations is
awkward.
This may be a gcc bug (using the wrong default); I don't know. In any
case, this bug affects openwrt on big-endian platforms, which use
signed long by default (I filed
https://dev.openwrt.org/cgi-bin/trac.fcgi/ticket/3946), contrary to
little-endian platforms, which use unsigned long.
I think the correct solution is to explicitly state that these bitwise
operations operate on unsigned long. A patch is attached to correct this
behavior. It only affects platforms that use signed long by default.
======================================================================
----------------------------------------------------------------------
vda - 08-28-08 13:53
----------------------------------------------------------------------
long is always signed in C. Only char has unspecified signedness.
Please give a concrete example of incorrect awk behavior. If GNU awk
produces a result which is different from busybox's awk, that is a good
indication of a bug.
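The distinction vda draws can be checked with the standard <limits.h> macros: LONG_MIN is negative on every conforming implementation, while plain char has the range of either signed char or unsigned char, chosen by the implementation (often fixed by the target ABI). The function names below are hypothetical; this is a minimal illustration only.

```c
#include <limits.h>
#include <stdbool.h>

/* long (like short and int) is always a signed type:
 * LONG_MIN < 0 holds on every conforming C implementation. */
static bool long_is_signed(void)
{
    return LONG_MIN < 0;
}

/* Plain char is the exception: CHAR_MIN is either 0 (char behaves
 * like unsigned char) or SCHAR_MIN (char behaves like signed char). */
static bool char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```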
----------------------------------------------------------------------
vda - 08-28-08 16:16
----------------------------------------------------------------------
Applied to svn, thanks! Also see:
http://busybox.net/downloads/fixes-1.12.0/busybox-1.12.0-awk.patch
----------------------------------------------------------------------
benoar - 08-31-08 20:41
----------------------------------------------------------------------
I am reopening this bug after investigating the root cause of the problem
further, so that the busybox developers can decide what the right thing to
do is.
First, vda, sorry for the signedness confusion: you are right, long is
always signed; there is no such thing as a "default signedness" for it, and
the problem doesn't come from there. This is where I was confused.
Actually, the "problem" comes from the cast from double (the internal bb
awk representation of numbers) to (unsigned) long. The bug I saw in
openwrt was in fact that different architectures cast values > 2^31-1 to
different integral values.
But I think the reason for this behavior is that there is no rule on how
to cast a value _not_ in the destination type's range, and 0x80000000 >
LONG_MAX! So the behavior was undefined, and I got different results on
different archs. This is only a supposition, please correct me if I am
wrong.
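The supposition matches the C standard: C99 6.3.1.4 says that when a finite floating value is converted to an integer type, the fractional part is discarded, and if the integral part cannot be represented in the target type, the behavior is undefined. With a 32-bit long, (long)2147483648.0 is therefore undefined, and different compilers and CPUs may legitimately produce different values. A conversion can be made safe by range-checking the double first. The sketch below assumes a two's-complement long; the function name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: convert d to long only when C defines the result,
 * i.e. when the truncated value lies in [LONG_MIN, LONG_MAX].
 * Returns false (leaving *out untouched) for NaN, infinities and
 * out-of-range values instead of invoking undefined behavior.
 * Slightly conservative: fractions just below LONG_MIN are rejected. */
static bool double_to_long_checked(double d, long *out)
{
    /* Assuming two's complement, LONG_MIN is -2^(N-1), exactly
     * representable as a double; its negation 2^(N-1) == LONG_MAX + 1
     * is exact too. (double)LONG_MAX would round up on 64-bit longs.
     * NaN fails both comparisons and is rejected. */
    const double lo = (double)LONG_MIN;
    const double hi = -(double)LONG_MIN;

    if (!(d >= lo && d < hi))
        return false;
    *out = (long)d; /* defined: truncated d fits in long */
    return true;
}
```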
So the patch I sent obviously suits me, but I am not sure it pleases
everybody: now, negative values in bitwise operations get cast to
something undefined. I get 0 on an armeb, for example.
To me, using bitwise ops on values >0 and <ULONG_MAX looks better than
using them on values >LONG_MIN and <LONG_MAX. But not everybody may agree.
If someone can find a reference on awk's behavior at integer limits
regarding bitwise ops, I'd be grateful.
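One possible way to keep the unsigned range benoar prefers while still giving negative values a defined meaning is to accept the union of both ranges. The sketch below is a hypothetical policy, not the attached patch and not BusyBox's code. It assumes a two's-complement long and an unsigned long of the same width N, so that ULONG_MAX is 2^N-1. Negative operands are converted to long (defined, since they are in range) and then to unsigned long, which C defines as reduction modulo ULONG_MAX+1, so -1 behaves as all-ones bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical operand conversion for awk bitwise functions:
 * accepts integral parts in [LONG_MIN, ULONG_MAX], rejects the rest
 * (including NaN and infinities) instead of invoking undefined behavior. */
static bool double_to_bits(double d, unsigned long *out)
{
    const double long_lo = (double)LONG_MIN;          /* -2^(N-1), exact */
    const double ulong_hi = -2.0 * (double)LONG_MIN;  /* 2^N == ULONG_MAX + 1, exact */

    if (d >= long_lo && d < 0.0) {
        /* In range for long; long -> unsigned long wraps modulo 2^N. */
        *out = (unsigned long)(long)d;
        return true;
    }
    if (d >= 0.0 && d < ulong_hi) {
        /* In range for unsigned long: direct conversion is defined. */
        *out = (unsigned long)d;
        return true;
    }
    return false;
}

/* Hypothetical and(a, b) built on the policy above. */
static bool bitwise_and(double a, double b, unsigned long *out)
{
    unsigned long x, y;

    if (!double_to_bits(a, &x) || !double_to_bits(b, &y))
        return false;
    *out = x & y;
    return true;
}
```

With this policy, and(0x80000000,1) yields 0 on every platform and and(-1,255) yields 255, while operands outside [LONG_MIN, ULONG_MAX] are reported as errors rather than silently converted.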
Issue History
Date Modified   Username  Field             Change
======================================================================
08-27-08 21:33  benoar    New Issue
08-27-08 21:33  benoar    Status            new => assigned
08-27-08 21:33  benoar    Assigned To       => BusyBox
08-27-08 21:33  benoar    File Added: bitwise_ops_on_unsigned_long.patch
08-28-08 13:53  vda       Note Added: 0010854
08-28-08 16:16  vda       Status            assigned => closed
08-28-08 16:16  vda       Note Added: 0010864
08-28-08 16:16  vda       Resolution        open => fixed
08-28-08 16:16  vda       Fixed in Version  => svn
08-31-08 20:41  benoar    Status            closed => feedback
08-31-08 20:41  benoar    Resolution        fixed => reopened
08-31-08 20:41  benoar    Note Added: 0010914
======================================================================