[Bug 15748] New: with leading quote, printf prints value of first byte of a character instead of its numeric value in the codeset
bugzilla at busybox.net
Thu Aug 31 07:15:14 UTC 2023
https://bugs.busybox.net/show_bug.cgi?id=15748
Bug ID: 15748
Summary: with leading quote, printf prints value of first byte
of a character instead of its numeric value in the
codeset
Product: Busybox
Version: 1.35.x
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Standard Compliance
Assignee: unassigned at busybox.net
Reporter: cslycord at gmail.com
CC: busybox-cvs at busybox.net
Target Milestone: ---
According to the POSIX standard for printf:
If the leading character is a single-quote or double-quote, the value shall be
the numeric value in the underlying codeset of the character following the
single-quote or double-quote.
This implies that the value should be the character's codepoint, which is what
coreutils and bash use.
In busybox, it instead returns the value of the first byte of the character,
which matches the codepoint only for single-byte (ASCII) characters.
Examples:
* 바
HEX codepoint: BC14
DEC codepoint: 48148
Hex UTF-8 bytes: EB B0 94
(UTF-8 bytes converted to DEC): 235 176 148
https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EB%B0%94&mode=char
* 학
HEX codepoint: D559
DEC codepoint: 54617
Hex UTF-8 bytes: ED 95 99
(UTF-8 bytes converted to DEC): 237 149 153
https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%ED%95%99&mode=char
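As a worked check of the 바 figures above: in a 3-byte UTF-8 sequence the lead byte contributes its low 4 bits and each continuation byte its low 6 bits, and concatenating them gives the codepoint. The first byte alone (EB = 235) is therefore only a fragment of the value.

```latex
\begin{aligned}
\texttt{EB} &= 1110\,\underline{1011} &&\Rightarrow 1011\\
\texttt{B0} &= 10\,\underline{110000} &&\Rightarrow 110000\\
\texttt{94} &= 10\,\underline{010100} &&\Rightarrow 010100\\
\text{codepoint} &= 1011\,110000\,010100_2 = \texttt{BC14}_{16}
  = 11\cdot 2^{12} + 48\cdot 2^{6} + 20 = 48148_{10}
\end{aligned}
```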
# busybox printf '%X' "'바"
EB
# busybox printf '%X' "'학"
ED
# busybox printf '%d' "'바"
235
# busybox printf '%d' "'학"
237
(these are the HEX and DEC values of the first byte of the character)
For comparison, the printf from coreutils/bash:
# printf '%X' "'바"
BC14
# printf '%X' "'학"
D559
# printf '%d' "'바"
48148
# printf '%d' "'학"
54617
(which are the HEX and DEC values of the character's codepoint)
The same happens with multibyte Chinese characters.
# (coreutils) printf '%X' "'传"
4F20
# busybox printf '%X' "'传"
E4
传 has HEX codepoint 4F20 and Hex UTF-8 bytes: E4 BC A0