From bugzilla at busybox.net Tue Feb 1 18:48:38 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Tue, 01 Feb 2022 18:48:38 +0000
Subject: [Bug 14041] New: using modprobe/insmod with compressed modules gives
scary kernel warnings
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14041
Bug ID: 14041
Summary: using modprobe/insmod with compressed modules gives
scary kernel warnings
Product: Busybox
Version: 1.33.x
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Other
Assignee: unassigned at busybox.net
Reporter: nolange79 at gmail.com
CC: busybox-cvs at busybox.net
Target Milestone: ---
I am using kernel 5.4 on x86_64 for an embedded system.
Loading a compressed module gives the kernel log error:
kernel: Module has invalid ELF structures
Steps to reproduce are simply:
# busybox insmod nbd.ko.gz
Some points:
- This happens with both gzip and xz.
- The util-linux insmod/modprobe work without a log entry.
- The module seems to be loaded correctly and works.
- Decompressing the module (with the same busybox executable) and then loading
it produces no log entry:
# (after unloading the module again)
# busybox gzip -d nbd.ko.gz
# busybox insmod nbd.ko
I don't know if there is any functional issue, but I am tempted to raise the
severity since I can't rule one out either.
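Until busybox handles this natively, the decompress-then-load workaround can be
scripted. This is a hypothetical helper of my own (the function name and
temp-file handling are not busybox code), just to sketch the idea:

```shell
# Hypothetical helper: print the path of an uncompressed copy of a
# module, decompressing .gz/.xz to a temp file first (not busybox code).
decompress_module() {
    case "$1" in
        *.gz) t=$(mktemp) && gzip -dc "$1" > "$t" && echo "$t" ;;
        *.xz) t=$(mktemp) && xz -dc "$1" > "$t" && echo "$t" ;;
        *)    echo "$1" ;;
    esac
}
# usage (as root): insmod "$(decompress_module nbd.ko.gz)"
```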
--- Comment #1 from sylvain.prat at gmail.com ---
I also ran into this problem. I was wondering how Alpine Linux managed with
compressed kernel modules, and I finally found out that they don't use busybox's
modprobe anymore... Since the decompression methods already exist in busybox,
it shouldn't be too hard to implement, I guess, but I'm not competent enough to
do it myself.
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at busybox.net Tue Feb 1 20:16:08 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Tue, 01 Feb 2022 20:16:08 +0000
Subject: [Bug 14541] sed: s-command with "semi-special" delimiters get wrong
behaviour
In-Reply-To:
References:
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14541
Christoph Anton Mitterer changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|REOPENED |RESOLVED
--- Comment #4 from Christoph Anton Mitterer ---
I hadn't seen the 2nd commit, f12fb1e4092900f26f7f8c71cde44b1cd7d26439, when
testing.
That also fixes the case from comment #3.
Now, BusyBox sed seems to behave identically to GNU sed in all the cases I had
given in:
https://www.austingroupbugs.net/view.php?id=1551#c5612
In particular, it also seems to consider "un-delimited" delimiters that are also
special characters as "still special" (or at least I tried that with '.') -
which is, while IMO not clearly defined by POSIX, identical to the behaviour of
GNU sed; see https://www.austingroupbugs.net/view.php?id=1551#c5648 for test
cases.
Thus closing again.
Thanks.
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at busybox.net Wed Feb 2 05:11:03 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Wed, 02 Feb 2022 05:11:03 +0000
Subject: [Bug 14566] New: ifupdown: Document supported stanzas for interfaces
file
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14566
Bug ID: 14566
Summary: ifupdown: Document supported stanzas for interfaces
file
Product: Busybox
Version: 1.33.x
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Networking
Assignee: unassigned at busybox.net
Reporter: michael at cassaniti.id.au
CC: busybox-cvs at busybox.net
Target Milestone: ---
Hi,
First, thank you so much for Busybox. It makes my life very easy, I must say.
I'm using Busybox 1.33.1 under Alpine Linux 3.14. The current configuration
should be at this URL:
https://git.alpinelinux.org/aports/tree/main/busybox/busyboxconfig?id=1aa6700d1e4ef810f2319506e48a8b5316d17abe
I've read the interfaces man pages at these URLs, and they don't all agree
on the supported stanzas:
-
https://salsa.debian.org/debian/ifupdown/-/raw/19052e2ecb0a908428813b5bc25d5bd0283c5a18/interfaces.5.pre
- https://manpages.org/etc-network-interfaces/5
- https://www.systutorials.com/docs/linux/man/5-interfaces/
I'm likely not the only one to be confused about which stanzas Busybox
does and does not support. I read the source code and did my best to
determine what is supported. This documentation would cover what is
__natively__ supported, since using additional scripts essentially allows
extending the syntax.
So far I've found that the following directives are not supported, and I assume
I've missed some:
- rename
- inherits
- allow- stanzas (e.g.: allow-hotplug)
- no-auto-down
- no-scripts
- description
- template
- source-dir is supported but not source-directory
Please note I'm not requesting that any of these stanzas be supported as part of
this bug. Personally, it would be nice to have rename supported, but I can
understand why the others mentioned above are not included.
--
You are receiving this mail because:
You are on the CC list for the bug.
From vda.linux at googlemail.com Thu Feb 3 13:58:02 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Thu, 3 Feb 2022 14:58:02 +0100
Subject: [git commit] libbb/sha256: optional x86 hardware accelerated hashing
Message-ID: <20220203135206.237C982911@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=6472ac942898437e040171cec991de1c0b962f72
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
64 bit:
function old new delta
sha256_process_block64_shaNI - 730 +730
.rodata 108314 108586 +272
sha256_begin 31 83 +52
------------------------------------------------------------------------------
(add/remove: 5/1 grow/shrink: 2/0 up/down: 1055/-1) Total: 1054 bytes
32 bit:
function old new delta
sha256_process_block64_shaNI - 747 +747
.rodata 104318 104590 +272
sha256_begin 29 84 +55
------------------------------------------------------------------------------
(add/remove: 5/1 grow/shrink: 2/0 up/down: 1075/-1) Total: 1074 bytes
Signed-off-by: Denys Vlasenko
---
libbb/Config.src | 6 +
libbb/Kbuild.src | 2 +
libbb/hash_md5_sha.c | 54 ++++---
libbb/hash_md5_sha256_x86-32_shaNI.S | 283 +++++++++++++++++++++++++++++++++++
libbb/hash_md5_sha256_x86-64_shaNI.S | 281 ++++++++++++++++++++++++++++++++++
libbb/hash_md5_sha_x86-32_shaNI.S | 4 +-
libbb/hash_md5_sha_x86-64.S | 2 +-
libbb/hash_md5_sha_x86-64.S.sh | 2 +-
libbb/hash_md5_sha_x86-64_shaNI.S | 4 +-
9 files changed, 612 insertions(+), 26 deletions(-)
diff --git a/libbb/Config.src b/libbb/Config.src
index 708d3b0c8..0ecd5bd46 100644
--- a/libbb/Config.src
+++ b/libbb/Config.src
@@ -70,6 +70,12 @@ config SHA1_HWACCEL
On x86, this adds ~590 bytes of code. Throughput
is about twice as fast as fully-unrolled generic code.
+config SHA256_HWACCEL
+ bool "SHA256: Use hardware accelerated instructions if possible"
+ default y
+ help
+ On x86, this adds ~1k bytes of code.
+
config SHA3_SMALL
int "SHA3: Trade bytes for speed (0:fast, 1:slow)"
default 1 # all "fast or small" options default to small
diff --git a/libbb/Kbuild.src b/libbb/Kbuild.src
index b9d34de8e..653025e56 100644
--- a/libbb/Kbuild.src
+++ b/libbb/Kbuild.src
@@ -59,6 +59,8 @@ lib-y += hash_md5_sha.o
lib-y += hash_md5_sha_x86-64.o
lib-y += hash_md5_sha_x86-64_shaNI.o
lib-y += hash_md5_sha_x86-32_shaNI.o
+lib-y += hash_md5_sha256_x86-64_shaNI.o
+lib-y += hash_md5_sha256_x86-32_shaNI.o
# Alternative (disabled) MD5 implementation
#lib-y += hash_md5prime.o
lib-y += messages.o
diff --git a/libbb/hash_md5_sha.c b/libbb/hash_md5_sha.c
index a23db5152..880ffab01 100644
--- a/libbb/hash_md5_sha.c
+++ b/libbb/hash_md5_sha.c
@@ -13,6 +13,27 @@
#define NEED_SHA512 (ENABLE_SHA512SUM || ENABLE_USE_BB_CRYPT_SHA)
+#if ENABLE_SHA1_HWACCEL || ENABLE_SHA256_HWACCEL
+# if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
+static void cpuid(unsigned *eax, unsigned *ebx, unsigned *ecx, unsigned *edx)
+{
+ asm ("cpuid"
+ : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
+ : "0"(*eax), "1"(*ebx), "2"(*ecx), "3"(*edx)
+ );
+}
+static smallint shaNI;
+void FAST_FUNC sha1_process_block64_shaNI(sha1_ctx_t *ctx);
+void FAST_FUNC sha256_process_block64_shaNI(sha256_ctx_t *ctx);
+# if defined(__i386__)
+struct ASM_expects_76_shaNI { char t[1 - 2*(offsetof(sha256_ctx_t, hash) != 76)]; };
+# endif
+# if defined(__x86_64__)
+struct ASM_expects_80_shaNI { char t[1 - 2*(offsetof(sha256_ctx_t, hash) != 80)]; };
+# endif
+# endif
+#endif
+
/* gcc 4.2.1 optimizes rotr64 better with inline than with macro
* (for rotX32, there is no difference). Why? My guess is that
* macro requires clever common subexpression elimination heuristics
@@ -1142,25 +1163,6 @@ static void FAST_FUNC sha512_process_block128(sha512_ctx_t *ctx)
}
#endif /* NEED_SHA512 */
-#if ENABLE_SHA1_HWACCEL
-# if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
-static void cpuid(unsigned *eax, unsigned *ebx, unsigned *ecx, unsigned *edx)
-{
- asm ("cpuid"
- : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
- : "0"(*eax), "1"(*ebx), "2"(*ecx), "3"(*edx)
- );
-}
-void FAST_FUNC sha1_process_block64_shaNI(sha1_ctx_t *ctx);
-# if defined(__i386__)
-struct ASM_expects_76_shaNI { char t[1 - 2*(offsetof(sha1_ctx_t, hash) != 76)]; };
-# endif
-# if defined(__x86_64__)
-struct ASM_expects_80_shaNI { char t[1 - 2*(offsetof(sha1_ctx_t, hash) != 80)]; };
-# endif
-# endif
-#endif
-
void FAST_FUNC sha1_begin(sha1_ctx_t *ctx)
{
ctx->hash[0] = 0x67452301;
@@ -1173,7 +1175,6 @@ void FAST_FUNC sha1_begin(sha1_ctx_t *ctx)
#if ENABLE_SHA1_HWACCEL
# if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
{
- static smallint shaNI;
if (!shaNI) {
unsigned eax = 7, ebx = ebx, ecx = 0, edx = edx;
cpuid(&eax, &ebx, &ecx, &edx);
@@ -1225,6 +1226,19 @@ void FAST_FUNC sha256_begin(sha256_ctx_t *ctx)
memcpy(&ctx->total64, init256, sizeof(init256));
/*ctx->total64 = 0; - done by prepending two 32-bit zeros to init256 */
ctx->process_block = sha256_process_block64;
+#if ENABLE_SHA256_HWACCEL
+# if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
+ {
+ if (!shaNI) {
+ unsigned eax = 7, ebx = ebx, ecx = 0, edx = edx;
+ cpuid(&eax, &ebx, &ecx, &edx);
+ shaNI = ((ebx >> 29) << 1) - 1;
+ }
+ if (shaNI > 0)
+ ctx->process_block = sha256_process_block64_shaNI;
+ }
+# endif
+#endif
}
#if NEED_SHA512
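The detection added above reads CPUID leaf 7, and the EBX bit tested (bit 29)
is the SHA extensions flag. On Linux the same capability shows up in
/proc/cpuinfo as the "sha_ni" flag, so a quick shell check (my own sketch, not
part of the commit) is:

```shell
# Check the SHA-NI CPU flag the patch probes via CPUID leaf 7, EBX bit 29.
# Linux exports it in /proc/cpuinfo as "sha_ni" on x86.
if grep -qw sha_ni /proc/cpuinfo 2>/dev/null; then
    echo "sha_ni present: hardware-accelerated path can be used"
else
    echo "sha_ni absent: generic C implementation will be used"
fi
```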
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
new file mode 100644
index 000000000..56e37fa38
--- /dev/null
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -0,0 +1,283 @@
+#if ENABLE_SHA256_HWACCEL && defined(__GNUC__) && defined(__i386__)
+/* The code is adapted from Linux kernel's source */
+
+// We use shorter insns, even though they are for "wrong"
+// data type (fp, not int).
+// For Intel, there is no penalty for doing it at all
+// (CPUs which do have such penalty do not support SHA1 insns).
+// For AMD, the penalty is one extra cycle
+// (allegedly: I failed to find measurable difference).
+
+//#define mova128 movdqa
+#define mova128 movaps
+//#define movu128 movdqu
+#define movu128 movups
+//#define shuf128_32 pshufd
+#define shuf128_32 shufps
+
+ .section .text.sha256_process_block64_shaNI, "ax", @progbits
+ .globl sha256_process_block64_shaNI
+ .hidden sha256_process_block64_shaNI
+ .type sha256_process_block64_shaNI, @function
+
+#define DATA_PTR %eax
+
+#define SHA256CONSTANTS %ecx
+
+#define MSG %xmm0
+#define STATE0 %xmm1
+#define STATE1 %xmm2
+#define MSGTMP0 %xmm3
+#define MSGTMP1 %xmm4
+#define MSGTMP2 %xmm5
+#define MSGTMP3 %xmm6
+#define MSGTMP4 %xmm7
+
+ .balign 8 # allow decoders to fetch at least 3 first insns
+sha256_process_block64_shaNI:
+ pushl %ebp
+ movl %esp, %ebp
+ subl $32, %esp
+ andl $~0xF, %esp # paddd needs aligned memory operand
+
+ movu128 76+0*16(%eax), STATE0
+ movu128 76+1*16(%eax), STATE1
+
+ shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
+ shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
+ mova128 STATE0, MSGTMP4
+ palignr $8, STATE1, STATE0 /* ABEF */
+ pblendw $0xF0, MSGTMP4, STATE1 /* CDGH */
+
+# mova128 PSHUFFLE_BSWAP32_FLIP_MASK, SHUF_MASK
+ lea K256, SHA256CONSTANTS
+
+ /* Save hash values for addition after rounds */
+ mova128 STATE0, 0*16(%esp)
+ mova128 STATE1, 1*16(%esp)
+
+ /* Rounds 0-3 */
+ movu128 0*16(DATA_PTR), MSG
+ pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ mova128 MSG, MSGTMP0
+ paddd 0*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Rounds 4-7 */
+ movu128 1*16(DATA_PTR), MSG
+ pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ mova128 MSG, MSGTMP1
+ paddd 1*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP1, MSGTMP0
+
+ /* Rounds 8-11 */
+ movu128 2*16(DATA_PTR), MSG
+ pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ mova128 MSG, MSGTMP2
+ paddd 2*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP2, MSGTMP1
+
+ /* Rounds 12-15 */
+ movu128 3*16(DATA_PTR), MSG
+ pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ mova128 MSG, MSGTMP3
+ paddd 3*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP3, MSGTMP4
+ palignr $4, MSGTMP2, MSGTMP4
+ paddd MSGTMP4, MSGTMP0
+ sha256msg2 MSGTMP3, MSGTMP0
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP3, MSGTMP2
+
+ /* Rounds 16-19 */
+ mova128 MSGTMP0, MSG
+ paddd 4*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP0, MSGTMP4
+ palignr $4, MSGTMP3, MSGTMP4
+ paddd MSGTMP4, MSGTMP1
+ sha256msg2 MSGTMP0, MSGTMP1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP0, MSGTMP3
+
+ /* Rounds 20-23 */
+ mova128 MSGTMP1, MSG
+ paddd 5*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP1, MSGTMP4
+ palignr $4, MSGTMP0, MSGTMP4
+ paddd MSGTMP4, MSGTMP2
+ sha256msg2 MSGTMP1, MSGTMP2
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP1, MSGTMP0
+
+ /* Rounds 24-27 */
+ mova128 MSGTMP2, MSG
+ paddd 6*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP2, MSGTMP4
+ palignr $4, MSGTMP1, MSGTMP4
+ paddd MSGTMP4, MSGTMP3
+ sha256msg2 MSGTMP2, MSGTMP3
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP2, MSGTMP1
+
+ /* Rounds 28-31 */
+ mova128 MSGTMP3, MSG
+ paddd 7*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP3, MSGTMP4
+ palignr $4, MSGTMP2, MSGTMP4
+ paddd MSGTMP4, MSGTMP0
+ sha256msg2 MSGTMP3, MSGTMP0
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP3, MSGTMP2
+
+ /* Rounds 32-35 */
+ mova128 MSGTMP0, MSG
+ paddd 8*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP0, MSGTMP4
+ palignr $4, MSGTMP3, MSGTMP4
+ paddd MSGTMP4, MSGTMP1
+ sha256msg2 MSGTMP0, MSGTMP1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP0, MSGTMP3
+
+ /* Rounds 36-39 */
+ mova128 MSGTMP1, MSG
+ paddd 9*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP1, MSGTMP4
+ palignr $4, MSGTMP0, MSGTMP4
+ paddd MSGTMP4, MSGTMP2
+ sha256msg2 MSGTMP1, MSGTMP2
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP1, MSGTMP0
+
+ /* Rounds 40-43 */
+ mova128 MSGTMP2, MSG
+ paddd 10*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP2, MSGTMP4
+ palignr $4, MSGTMP1, MSGTMP4
+ paddd MSGTMP4, MSGTMP3
+ sha256msg2 MSGTMP2, MSGTMP3
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP2, MSGTMP1
+
+ /* Rounds 44-47 */
+ mova128 MSGTMP3, MSG
+ paddd 11*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP3, MSGTMP4
+ palignr $4, MSGTMP2, MSGTMP4
+ paddd MSGTMP4, MSGTMP0
+ sha256msg2 MSGTMP3, MSGTMP0
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP3, MSGTMP2
+
+ /* Rounds 48-51 */
+ mova128 MSGTMP0, MSG
+ paddd 12*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP0, MSGTMP4
+ palignr $4, MSGTMP3, MSGTMP4
+ paddd MSGTMP4, MSGTMP1
+ sha256msg2 MSGTMP0, MSGTMP1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP0, MSGTMP3
+
+ /* Rounds 52-55 */
+ mova128 MSGTMP1, MSG
+ paddd 13*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP1, MSGTMP4
+ palignr $4, MSGTMP0, MSGTMP4
+ paddd MSGTMP4, MSGTMP2
+ sha256msg2 MSGTMP1, MSGTMP2
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Rounds 56-59 */
+ mova128 MSGTMP2, MSG
+ paddd 14*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP2, MSGTMP4
+ palignr $4, MSGTMP1, MSGTMP4
+ paddd MSGTMP4, MSGTMP3
+ sha256msg2 MSGTMP2, MSGTMP3
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Rounds 60-63 */
+ mova128 MSGTMP3, MSG
+ paddd 15*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Add current hash values with previously saved */
+ paddd 0*16(%esp), STATE0
+ paddd 1*16(%esp), STATE1
+
+ /* Write hash values back in the correct order */
+ shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
+ shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
+ mova128 STATE0, MSGTMP4
+ pblendw $0xF0, STATE1, STATE0 /* DCBA */
+ palignr $8, MSGTMP4, STATE1 /* HGFE */
+
+ movu128 STATE0, 76+0*16(%eax)
+ movu128 STATE1, 76+1*16(%eax)
+
+ movl %ebp, %esp
+ popl %ebp
+ ret
+ .size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
+
+.section .rodata.cst256.K256, "aM", @progbits, 256
+.balign 16
+K256:
+ .long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ .long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ .long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ .long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ .long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ .long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ .long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ .long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ .long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ .long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ .long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ .long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ .long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ .long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ .long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ .long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+.section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
+.balign 16
+PSHUFFLE_BSWAP32_FLIP_MASK:
+ .octa 0x0c0d0e0f08090a0b0405060700010203
+
+#endif
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
new file mode 100644
index 000000000..1c2b75af3
--- /dev/null
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -0,0 +1,281 @@
+#if ENABLE_SHA256_HWACCEL && defined(__GNUC__) && defined(__x86_64__)
+/* The code is adapted from Linux kernel's source */
+
+// We use shorter insns, even though they are for "wrong"
+// data type (fp, not int).
+// For Intel, there is no penalty for doing it at all
+// (CPUs which do have such penalty do not support SHA1 insns).
+// For AMD, the penalty is one extra cycle
+// (allegedly: I failed to find measurable difference).
+
+//#define mova128 movdqa
+#define mova128 movaps
+//#define movu128 movdqu
+#define movu128 movups
+//#define shuf128_32 pshufd
+#define shuf128_32 shufps
+
+ .section .text.sha256_process_block64_shaNI, "ax", @progbits
+ .globl sha256_process_block64_shaNI
+ .hidden sha256_process_block64_shaNI
+ .type sha256_process_block64_shaNI, @function
+
+#define DATA_PTR %rdi
+
+#define SHA256CONSTANTS %rax
+
+#define MSG %xmm0
+#define STATE0 %xmm1
+#define STATE1 %xmm2
+#define MSGTMP0 %xmm3
+#define MSGTMP1 %xmm4
+#define MSGTMP2 %xmm5
+#define MSGTMP3 %xmm6
+#define MSGTMP4 %xmm7
+
+#define SHUF_MASK %xmm8
+
+#define ABEF_SAVE %xmm9
+#define CDGH_SAVE %xmm10
+
+ .balign 8 # allow decoders to fetch at least 2 first insns
+sha256_process_block64_shaNI:
+ movu128 80+0*16(%rdi), STATE0
+ movu128 80+1*16(%rdi), STATE1
+
+ shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
+ shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
+ mova128 STATE0, MSGTMP4
+ palignr $8, STATE1, STATE0 /* ABEF */
+ pblendw $0xF0, MSGTMP4, STATE1 /* CDGH */
+
+ mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), SHUF_MASK
+ lea K256(%rip), SHA256CONSTANTS
+
+ /* Save hash values for addition after rounds */
+ mova128 STATE0, ABEF_SAVE
+ mova128 STATE1, CDGH_SAVE
+
+ /* Rounds 0-3 */
+ movu128 0*16(DATA_PTR), MSG
+ pshufb SHUF_MASK, MSG
+ mova128 MSG, MSGTMP0
+ paddd 0*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Rounds 4-7 */
+ movu128 1*16(DATA_PTR), MSG
+ pshufb SHUF_MASK, MSG
+ mova128 MSG, MSGTMP1
+ paddd 1*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP1, MSGTMP0
+
+ /* Rounds 8-11 */
+ movu128 2*16(DATA_PTR), MSG
+ pshufb SHUF_MASK, MSG
+ mova128 MSG, MSGTMP2
+ paddd 2*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP2, MSGTMP1
+
+ /* Rounds 12-15 */
+ movu128 3*16(DATA_PTR), MSG
+ pshufb SHUF_MASK, MSG
+ mova128 MSG, MSGTMP3
+ paddd 3*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP3, MSGTMP4
+ palignr $4, MSGTMP2, MSGTMP4
+ paddd MSGTMP4, MSGTMP0
+ sha256msg2 MSGTMP3, MSGTMP0
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP3, MSGTMP2
+
+ /* Rounds 16-19 */
+ mova128 MSGTMP0, MSG
+ paddd 4*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP0, MSGTMP4
+ palignr $4, MSGTMP3, MSGTMP4
+ paddd MSGTMP4, MSGTMP1
+ sha256msg2 MSGTMP0, MSGTMP1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP0, MSGTMP3
+
+ /* Rounds 20-23 */
+ mova128 MSGTMP1, MSG
+ paddd 5*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP1, MSGTMP4
+ palignr $4, MSGTMP0, MSGTMP4
+ paddd MSGTMP4, MSGTMP2
+ sha256msg2 MSGTMP1, MSGTMP2
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP1, MSGTMP0
+
+ /* Rounds 24-27 */
+ mova128 MSGTMP2, MSG
+ paddd 6*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP2, MSGTMP4
+ palignr $4, MSGTMP1, MSGTMP4
+ paddd MSGTMP4, MSGTMP3
+ sha256msg2 MSGTMP2, MSGTMP3
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP2, MSGTMP1
+
+ /* Rounds 28-31 */
+ mova128 MSGTMP3, MSG
+ paddd 7*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP3, MSGTMP4
+ palignr $4, MSGTMP2, MSGTMP4
+ paddd MSGTMP4, MSGTMP0
+ sha256msg2 MSGTMP3, MSGTMP0
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP3, MSGTMP2
+
+ /* Rounds 32-35 */
+ mova128 MSGTMP0, MSG
+ paddd 8*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP0, MSGTMP4
+ palignr $4, MSGTMP3, MSGTMP4
+ paddd MSGTMP4, MSGTMP1
+ sha256msg2 MSGTMP0, MSGTMP1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP0, MSGTMP3
+
+ /* Rounds 36-39 */
+ mova128 MSGTMP1, MSG
+ paddd 9*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP1, MSGTMP4
+ palignr $4, MSGTMP0, MSGTMP4
+ paddd MSGTMP4, MSGTMP2
+ sha256msg2 MSGTMP1, MSGTMP2
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP1, MSGTMP0
+
+ /* Rounds 40-43 */
+ mova128 MSGTMP2, MSG
+ paddd 10*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP2, MSGTMP4
+ palignr $4, MSGTMP1, MSGTMP4
+ paddd MSGTMP4, MSGTMP3
+ sha256msg2 MSGTMP2, MSGTMP3
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP2, MSGTMP1
+
+ /* Rounds 44-47 */
+ mova128 MSGTMP3, MSG
+ paddd 11*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP3, MSGTMP4
+ palignr $4, MSGTMP2, MSGTMP4
+ paddd MSGTMP4, MSGTMP0
+ sha256msg2 MSGTMP3, MSGTMP0
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP3, MSGTMP2
+
+ /* Rounds 48-51 */
+ mova128 MSGTMP0, MSG
+ paddd 12*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP0, MSGTMP4
+ palignr $4, MSGTMP3, MSGTMP4
+ paddd MSGTMP4, MSGTMP1
+ sha256msg2 MSGTMP0, MSGTMP1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+ sha256msg1 MSGTMP0, MSGTMP3
+
+ /* Rounds 52-55 */
+ mova128 MSGTMP1, MSG
+ paddd 13*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP1, MSGTMP4
+ palignr $4, MSGTMP0, MSGTMP4
+ paddd MSGTMP4, MSGTMP2
+ sha256msg2 MSGTMP1, MSGTMP2
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Rounds 56-59 */
+ mova128 MSGTMP2, MSG
+ paddd 14*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ mova128 MSGTMP2, MSGTMP4
+ palignr $4, MSGTMP1, MSGTMP4
+ paddd MSGTMP4, MSGTMP3
+ sha256msg2 MSGTMP2, MSGTMP3
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Rounds 60-63 */
+ mova128 MSGTMP3, MSG
+ paddd 15*16(SHA256CONSTANTS), MSG
+ sha256rnds2 STATE0, STATE1
+ shuf128_32 $0x0E, MSG, MSG
+ sha256rnds2 STATE1, STATE0
+
+ /* Add current hash values with previously saved */
+ paddd ABEF_SAVE, STATE0
+ paddd CDGH_SAVE, STATE1
+
+ /* Write hash values back in the correct order */
+ shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
+ shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
+ mova128 STATE0, MSGTMP4
+ pblendw $0xF0, STATE1, STATE0 /* DCBA */
+ palignr $8, MSGTMP4, STATE1 /* HGFE */
+
+ movu128 STATE0, 80+0*16(%rdi)
+ movu128 STATE1, 80+1*16(%rdi)
+
+ ret
+ .size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
+
+.section .rodata.cst256.K256, "aM", @progbits, 256
+.balign 16
+K256:
+ .long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ .long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ .long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ .long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ .long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ .long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ .long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ .long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ .long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ .long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ .long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ .long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ .long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ .long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ .long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ .long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+.section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
+.balign 16
+PSHUFFLE_BSWAP32_FLIP_MASK:
+ .octa 0x0c0d0e0f08090a0b0405060700010203
+
+#endif
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index 166cfd38a..11b855e26 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -20,7 +20,7 @@
#define extr128_32 pextrd
//#define extr128_32 extractps # not shorter
- .section .text.sha1_process_block64_shaNI,"ax",@progbits
+ .section .text.sha1_process_block64_shaNI, "ax", @progbits
.globl sha1_process_block64_shaNI
.hidden sha1_process_block64_shaNI
.type sha1_process_block64_shaNI, @function
@@ -224,7 +224,7 @@ sha1_process_block64_shaNI:
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
.section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
-.align 16
+.balign 16
PSHUFFLE_BYTE_FLIP_MASK:
.octa 0x000102030405060708090a0b0c0d0e0f
diff --git a/libbb/hash_md5_sha_x86-64.S b/libbb/hash_md5_sha_x86-64.S
index 743269d98..47ace60de 100644
--- a/libbb/hash_md5_sha_x86-64.S
+++ b/libbb/hash_md5_sha_x86-64.S
@@ -1394,7 +1394,7 @@ sha1_process_block64:
.size sha1_process_block64, .-sha1_process_block64
.section .rodata.cst16.sha1const, "aM", @progbits, 16
- .align 16
+ .balign 16
rconst0x5A827999:
.long 0x5A827999
.long 0x5A827999
diff --git a/libbb/hash_md5_sha_x86-64.S.sh b/libbb/hash_md5_sha_x86-64.S.sh
index 47c40af0d..656fb5414 100755
--- a/libbb/hash_md5_sha_x86-64.S.sh
+++ b/libbb/hash_md5_sha_x86-64.S.sh
@@ -433,7 +433,7 @@ echo "
.size sha1_process_block64, .-sha1_process_block64
.section .rodata.cst16.sha1const, \"aM\", @progbits, 16
- .align 16
+ .balign 16
rconst0x5A827999:
.long 0x5A827999
.long 0x5A827999
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index 33cc3bf7f..ba92f09df 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -20,7 +20,7 @@
#define extr128_32 pextrd
//#define extr128_32 extractps # not shorter
- .section .text.sha1_process_block64_shaNI,"ax",@progbits
+ .section .text.sha1_process_block64_shaNI, "ax", @progbits
.globl sha1_process_block64_shaNI
.hidden sha1_process_block64_shaNI
.type sha1_process_block64_shaNI, @function
@@ -218,7 +218,7 @@ sha1_process_block64_shaNI:
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
.section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
-.align 16
+.balign 16
PSHUFFLE_BYTE_FLIP_MASK:
.octa 0x000102030405060708090a0b0c0d0e0f
From vda.linux at googlemail.com Thu Feb 3 14:11:23 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Thu, 3 Feb 2022 15:11:23 +0100
Subject: [git commit] libbb/sha256: code shrink in 32-bit x86
Message-ID: <20220203140444.EF21982A68@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=de6cb4bed82356db72af81890c7c26d7e85fb50d
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 747 722 -25
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 56e37fa38..632dab7e6 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -49,8 +49,7 @@ sha256_process_block64_shaNI:
palignr $8, STATE1, STATE0 /* ABEF */
pblendw $0xF0, MSGTMP4, STATE1 /* CDGH */
-# mova128 PSHUFFLE_BSWAP32_FLIP_MASK, SHUF_MASK
- lea K256, SHA256CONSTANTS
+ movl $K256+8*16, SHA256CONSTANTS
/* Save hash values for addition after rounds */
mova128 STATE0, 0*16(%esp)
@@ -60,7 +59,7 @@ sha256_process_block64_shaNI:
movu128 0*16(DATA_PTR), MSG
pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
mova128 MSG, MSGTMP0
- paddd 0*16(SHA256CONSTANTS), MSG
+ paddd 0*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -69,7 +68,7 @@ sha256_process_block64_shaNI:
movu128 1*16(DATA_PTR), MSG
pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
mova128 MSG, MSGTMP1
- paddd 1*16(SHA256CONSTANTS), MSG
+ paddd 1*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -79,7 +78,7 @@ sha256_process_block64_shaNI:
movu128 2*16(DATA_PTR), MSG
pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
mova128 MSG, MSGTMP2
- paddd 2*16(SHA256CONSTANTS), MSG
+ paddd 2*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -89,7 +88,7 @@ sha256_process_block64_shaNI:
movu128 3*16(DATA_PTR), MSG
pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
mova128 MSG, MSGTMP3
- paddd 3*16(SHA256CONSTANTS), MSG
+ paddd 3*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP3, MSGTMP4
palignr $4, MSGTMP2, MSGTMP4
@@ -101,7 +100,7 @@ sha256_process_block64_shaNI:
/* Rounds 16-19 */
mova128 MSGTMP0, MSG
- paddd 4*16(SHA256CONSTANTS), MSG
+ paddd 4*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP0, MSGTMP4
palignr $4, MSGTMP3, MSGTMP4
@@ -113,7 +112,7 @@ sha256_process_block64_shaNI:
/* Rounds 20-23 */
mova128 MSGTMP1, MSG
- paddd 5*16(SHA256CONSTANTS), MSG
+ paddd 5*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP1, MSGTMP4
palignr $4, MSGTMP0, MSGTMP4
@@ -125,7 +124,7 @@ sha256_process_block64_shaNI:
/* Rounds 24-27 */
mova128 MSGTMP2, MSG
- paddd 6*16(SHA256CONSTANTS), MSG
+ paddd 6*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP2, MSGTMP4
palignr $4, MSGTMP1, MSGTMP4
@@ -137,7 +136,7 @@ sha256_process_block64_shaNI:
/* Rounds 28-31 */
mova128 MSGTMP3, MSG
- paddd 7*16(SHA256CONSTANTS), MSG
+ paddd 7*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP3, MSGTMP4
palignr $4, MSGTMP2, MSGTMP4
@@ -149,7 +148,7 @@ sha256_process_block64_shaNI:
/* Rounds 32-35 */
mova128 MSGTMP0, MSG
- paddd 8*16(SHA256CONSTANTS), MSG
+ paddd 8*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP0, MSGTMP4
palignr $4, MSGTMP3, MSGTMP4
@@ -161,7 +160,7 @@ sha256_process_block64_shaNI:
/* Rounds 36-39 */
mova128 MSGTMP1, MSG
- paddd 9*16(SHA256CONSTANTS), MSG
+ paddd 9*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP1, MSGTMP4
palignr $4, MSGTMP0, MSGTMP4
@@ -173,7 +172,7 @@ sha256_process_block64_shaNI:
/* Rounds 40-43 */
mova128 MSGTMP2, MSG
- paddd 10*16(SHA256CONSTANTS), MSG
+ paddd 10*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP2, MSGTMP4
palignr $4, MSGTMP1, MSGTMP4
@@ -185,7 +184,7 @@ sha256_process_block64_shaNI:
/* Rounds 44-47 */
mova128 MSGTMP3, MSG
- paddd 11*16(SHA256CONSTANTS), MSG
+ paddd 11*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP3, MSGTMP4
palignr $4, MSGTMP2, MSGTMP4
@@ -197,7 +196,7 @@ sha256_process_block64_shaNI:
/* Rounds 48-51 */
mova128 MSGTMP0, MSG
- paddd 12*16(SHA256CONSTANTS), MSG
+ paddd 12*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP0, MSGTMP4
palignr $4, MSGTMP3, MSGTMP4
@@ -209,7 +208,7 @@ sha256_process_block64_shaNI:
/* Rounds 52-55 */
mova128 MSGTMP1, MSG
- paddd 13*16(SHA256CONSTANTS), MSG
+ paddd 13*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP1, MSGTMP4
palignr $4, MSGTMP0, MSGTMP4
@@ -220,7 +219,7 @@ sha256_process_block64_shaNI:
/* Rounds 56-59 */
mova128 MSGTMP2, MSG
- paddd 14*16(SHA256CONSTANTS), MSG
+ paddd 14*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP2, MSGTMP4
palignr $4, MSGTMP1, MSGTMP4
@@ -231,7 +230,7 @@ sha256_process_block64_shaNI:
/* Rounds 60-63 */
mova128 MSGTMP3, MSG
- paddd 15*16(SHA256CONSTANTS), MSG
+ paddd 15*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
From vda.linux at googlemail.com Thu Feb 3 14:17:42 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Thu, 3 Feb 2022 15:17:42 +0100
Subject: [git commit] libbb/sha256: code shrink in 64-bit x86
Message-ID: <20220203141101.202F882A2F@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=a1429fbb8ca373efc01939d599f6f65969b1a366
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 730 706 -24
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-64_shaNI.S | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index 1c2b75af3..f3df541e4 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -50,7 +50,7 @@ sha256_process_block64_shaNI:
pblendw $0xF0, MSGTMP4, STATE1 /* CDGH */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), SHUF_MASK
- lea K256(%rip), SHA256CONSTANTS
+ leaq K256+8*16(%rip), SHA256CONSTANTS
/* Save hash values for addition after rounds */
mova128 STATE0, ABEF_SAVE
@@ -60,7 +60,7 @@ sha256_process_block64_shaNI:
movu128 0*16(DATA_PTR), MSG
pshufb SHUF_MASK, MSG
mova128 MSG, MSGTMP0
- paddd 0*16(SHA256CONSTANTS), MSG
+ paddd 0*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -69,7 +69,7 @@ sha256_process_block64_shaNI:
movu128 1*16(DATA_PTR), MSG
pshufb SHUF_MASK, MSG
mova128 MSG, MSGTMP1
- paddd 1*16(SHA256CONSTANTS), MSG
+ paddd 1*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -79,7 +79,7 @@ sha256_process_block64_shaNI:
movu128 2*16(DATA_PTR), MSG
pshufb SHUF_MASK, MSG
mova128 MSG, MSGTMP2
- paddd 2*16(SHA256CONSTANTS), MSG
+ paddd 2*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -89,7 +89,7 @@ sha256_process_block64_shaNI:
movu128 3*16(DATA_PTR), MSG
pshufb SHUF_MASK, MSG
mova128 MSG, MSGTMP3
- paddd 3*16(SHA256CONSTANTS), MSG
+ paddd 3*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP3, MSGTMP4
palignr $4, MSGTMP2, MSGTMP4
@@ -101,7 +101,7 @@ sha256_process_block64_shaNI:
/* Rounds 16-19 */
mova128 MSGTMP0, MSG
- paddd 4*16(SHA256CONSTANTS), MSG
+ paddd 4*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP0, MSGTMP4
palignr $4, MSGTMP3, MSGTMP4
@@ -113,7 +113,7 @@ sha256_process_block64_shaNI:
/* Rounds 20-23 */
mova128 MSGTMP1, MSG
- paddd 5*16(SHA256CONSTANTS), MSG
+ paddd 5*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP1, MSGTMP4
palignr $4, MSGTMP0, MSGTMP4
@@ -125,7 +125,7 @@ sha256_process_block64_shaNI:
/* Rounds 24-27 */
mova128 MSGTMP2, MSG
- paddd 6*16(SHA256CONSTANTS), MSG
+ paddd 6*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP2, MSGTMP4
palignr $4, MSGTMP1, MSGTMP4
@@ -137,7 +137,7 @@ sha256_process_block64_shaNI:
/* Rounds 28-31 */
mova128 MSGTMP3, MSG
- paddd 7*16(SHA256CONSTANTS), MSG
+ paddd 7*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP3, MSGTMP4
palignr $4, MSGTMP2, MSGTMP4
@@ -149,7 +149,7 @@ sha256_process_block64_shaNI:
/* Rounds 32-35 */
mova128 MSGTMP0, MSG
- paddd 8*16(SHA256CONSTANTS), MSG
+ paddd 8*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP0, MSGTMP4
palignr $4, MSGTMP3, MSGTMP4
@@ -161,7 +161,7 @@ sha256_process_block64_shaNI:
/* Rounds 36-39 */
mova128 MSGTMP1, MSG
- paddd 9*16(SHA256CONSTANTS), MSG
+ paddd 9*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP1, MSGTMP4
palignr $4, MSGTMP0, MSGTMP4
@@ -173,7 +173,7 @@ sha256_process_block64_shaNI:
/* Rounds 40-43 */
mova128 MSGTMP2, MSG
- paddd 10*16(SHA256CONSTANTS), MSG
+ paddd 10*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP2, MSGTMP4
palignr $4, MSGTMP1, MSGTMP4
@@ -185,7 +185,7 @@ sha256_process_block64_shaNI:
/* Rounds 44-47 */
mova128 MSGTMP3, MSG
- paddd 11*16(SHA256CONSTANTS), MSG
+ paddd 11*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP3, MSGTMP4
palignr $4, MSGTMP2, MSGTMP4
@@ -197,7 +197,7 @@ sha256_process_block64_shaNI:
/* Rounds 48-51 */
mova128 MSGTMP0, MSG
- paddd 12*16(SHA256CONSTANTS), MSG
+ paddd 12*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP0, MSGTMP4
palignr $4, MSGTMP3, MSGTMP4
@@ -209,7 +209,7 @@ sha256_process_block64_shaNI:
/* Rounds 52-55 */
mova128 MSGTMP1, MSG
- paddd 13*16(SHA256CONSTANTS), MSG
+ paddd 13*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP1, MSGTMP4
palignr $4, MSGTMP0, MSGTMP4
@@ -220,7 +220,7 @@ sha256_process_block64_shaNI:
/* Rounds 56-59 */
mova128 MSGTMP2, MSG
- paddd 14*16(SHA256CONSTANTS), MSG
+ paddd 14*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
mova128 MSGTMP2, MSGTMP4
palignr $4, MSGTMP1, MSGTMP4
@@ -231,7 +231,7 @@ sha256_process_block64_shaNI:
/* Rounds 60-63 */
mova128 MSGTMP3, MSG
- paddd 15*16(SHA256CONSTANTS), MSG
+ paddd 15*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
From vda.linux at googlemail.com Sat Feb 5 23:33:42 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Sun, 6 Feb 2022 00:33:42 +0100
Subject: [git commit] libbb/sha256: code shrink in 64-bit x86
Message-ID: <20220205234357.20D04819E6@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=31c1c310772fa6c897ee1585ea15fc38f3ab3dff
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 706 701 -5
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-64_shaNI.S | 96 ++++++++++++++++++------------------
1 file changed, 48 insertions(+), 48 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index f3df541e4..dbf391135 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -31,9 +31,7 @@
#define MSGTMP1 %xmm4
#define MSGTMP2 %xmm5
#define MSGTMP3 %xmm6
-#define MSGTMP4 %xmm7
-
-#define SHUF_MASK %xmm8
+#define XMMTMP4 %xmm7
#define ABEF_SAVE %xmm9
#define CDGH_SAVE %xmm10
@@ -45,11 +43,12 @@ sha256_process_block64_shaNI:
shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
- mova128 STATE0, MSGTMP4
+ mova128 STATE0, XMMTMP4
palignr $8, STATE1, STATE0 /* ABEF */
- pblendw $0xF0, MSGTMP4, STATE1 /* CDGH */
+ pblendw $0xF0, XMMTMP4, STATE1 /* CDGH */
- mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), SHUF_MASK
+/* XMMTMP4 holds flip mask from here... */
+ mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), XMMTMP4
leaq K256+8*16(%rip), SHA256CONSTANTS
/* Save hash values for addition after rounds */
@@ -58,7 +57,7 @@ sha256_process_block64_shaNI:
/* Rounds 0-3 */
movu128 0*16(DATA_PTR), MSG
- pshufb SHUF_MASK, MSG
+ pshufb XMMTMP4, MSG
mova128 MSG, MSGTMP0
paddd 0*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -67,7 +66,7 @@ sha256_process_block64_shaNI:
/* Rounds 4-7 */
movu128 1*16(DATA_PTR), MSG
- pshufb SHUF_MASK, MSG
+ pshufb XMMTMP4, MSG
mova128 MSG, MSGTMP1
paddd 1*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -77,7 +76,7 @@ sha256_process_block64_shaNI:
/* Rounds 8-11 */
movu128 2*16(DATA_PTR), MSG
- pshufb SHUF_MASK, MSG
+ pshufb XMMTMP4, MSG
mova128 MSG, MSGTMP2
paddd 2*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -87,13 +86,14 @@ sha256_process_block64_shaNI:
/* Rounds 12-15 */
movu128 3*16(DATA_PTR), MSG
- pshufb SHUF_MASK, MSG
+ pshufb XMMTMP4, MSG
+/* ...to here */
mova128 MSG, MSGTMP3
paddd 3*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, MSGTMP4
- palignr $4, MSGTMP2, MSGTMP4
- paddd MSGTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP4
+ palignr $4, MSGTMP2, XMMTMP4
+ paddd XMMTMP4, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -103,9 +103,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 4*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, MSGTMP4
- palignr $4, MSGTMP3, MSGTMP4
- paddd MSGTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP4
+ palignr $4, MSGTMP3, XMMTMP4
+ paddd XMMTMP4, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -115,9 +115,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 5*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, MSGTMP4
- palignr $4, MSGTMP0, MSGTMP4
- paddd MSGTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP4
+ palignr $4, MSGTMP0, XMMTMP4
+ paddd XMMTMP4, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -127,9 +127,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 6*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, MSGTMP4
- palignr $4, MSGTMP1, MSGTMP4
- paddd MSGTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP4
+ palignr $4, MSGTMP1, XMMTMP4
+ paddd XMMTMP4, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -139,9 +139,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 7*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, MSGTMP4
- palignr $4, MSGTMP2, MSGTMP4
- paddd MSGTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP4
+ palignr $4, MSGTMP2, XMMTMP4
+ paddd XMMTMP4, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -151,9 +151,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 8*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, MSGTMP4
- palignr $4, MSGTMP3, MSGTMP4
- paddd MSGTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP4
+ palignr $4, MSGTMP3, XMMTMP4
+ paddd XMMTMP4, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -163,9 +163,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 9*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, MSGTMP4
- palignr $4, MSGTMP0, MSGTMP4
- paddd MSGTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP4
+ palignr $4, MSGTMP0, XMMTMP4
+ paddd XMMTMP4, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -175,9 +175,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 10*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, MSGTMP4
- palignr $4, MSGTMP1, MSGTMP4
- paddd MSGTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP4
+ palignr $4, MSGTMP1, XMMTMP4
+ paddd XMMTMP4, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -187,9 +187,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 11*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, MSGTMP4
- palignr $4, MSGTMP2, MSGTMP4
- paddd MSGTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP4
+ palignr $4, MSGTMP2, XMMTMP4
+ paddd XMMTMP4, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -199,9 +199,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 12*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, MSGTMP4
- palignr $4, MSGTMP3, MSGTMP4
- paddd MSGTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP4
+ palignr $4, MSGTMP3, XMMTMP4
+ paddd XMMTMP4, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -211,9 +211,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 13*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, MSGTMP4
- palignr $4, MSGTMP0, MSGTMP4
- paddd MSGTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP4
+ palignr $4, MSGTMP0, XMMTMP4
+ paddd XMMTMP4, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -222,9 +222,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 14*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, MSGTMP4
- palignr $4, MSGTMP1, MSGTMP4
- paddd MSGTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP4
+ palignr $4, MSGTMP1, XMMTMP4
+ paddd XMMTMP4, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -243,9 +243,9 @@ sha256_process_block64_shaNI:
/* Write hash values back in the correct order */
shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
- mova128 STATE0, MSGTMP4
+ mova128 STATE0, XMMTMP4
pblendw $0xF0, STATE1, STATE0 /* DCBA */
- palignr $8, MSGTMP4, STATE1 /* HGFE */
+ palignr $8, XMMTMP4, STATE1 /* HGFE */
movu128 STATE0, 80+0*16(%rdi)
movu128 STATE1, 80+1*16(%rdi)
From vda.linux at googlemail.com Sat Feb 5 23:56:13 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Sun, 6 Feb 2022 00:56:13 +0100
Subject: [git commit] libbb/sha256: code shrink in 32-bit x86
Message-ID: <20220205235036.87148821F1@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=4f40735c87f8292a87c066b3b7099b0be007cf59
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 722 713 -9
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 93 +++++++++++++++++++-----------------
1 file changed, 48 insertions(+), 45 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 632dab7e6..417da37d8 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -31,7 +31,7 @@
#define MSGTMP1 %xmm4
#define MSGTMP2 %xmm5
#define MSGTMP3 %xmm6
-#define MSGTMP4 %xmm7
+#define XMMTMP4 %xmm7
.balign 8 # allow decoders to fetch at least 3 first insns
sha256_process_block64_shaNI:
@@ -45,10 +45,12 @@ sha256_process_block64_shaNI:
shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
- mova128 STATE0, MSGTMP4
+ mova128 STATE0, XMMTMP4
palignr $8, STATE1, STATE0 /* ABEF */
- pblendw $0xF0, MSGTMP4, STATE1 /* CDGH */
+ pblendw $0xF0, XMMTMP4, STATE1 /* CDGH */
+/* XMMTMP4 holds flip mask from here... */
+ mova128 PSHUFFLE_BSWAP32_FLIP_MASK, XMMTMP4
movl $K256+8*16, SHA256CONSTANTS
/* Save hash values for addition after rounds */
@@ -57,7 +59,7 @@ sha256_process_block64_shaNI:
/* Rounds 0-3 */
movu128 0*16(DATA_PTR), MSG
- pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ pshufb XMMTMP4, MSG
mova128 MSG, MSGTMP0
paddd 0*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -66,7 +68,7 @@ sha256_process_block64_shaNI:
/* Rounds 4-7 */
movu128 1*16(DATA_PTR), MSG
- pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ pshufb XMMTMP4, MSG
mova128 MSG, MSGTMP1
paddd 1*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -76,7 +78,7 @@ sha256_process_block64_shaNI:
/* Rounds 8-11 */
movu128 2*16(DATA_PTR), MSG
- pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ pshufb XMMTMP4, MSG
mova128 MSG, MSGTMP2
paddd 2*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -86,13 +88,14 @@ sha256_process_block64_shaNI:
/* Rounds 12-15 */
movu128 3*16(DATA_PTR), MSG
- pshufb PSHUFFLE_BSWAP32_FLIP_MASK, MSG
+ pshufb XMMTMP4, MSG
+/* ...to here */
mova128 MSG, MSGTMP3
paddd 3*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, MSGTMP4
- palignr $4, MSGTMP2, MSGTMP4
- paddd MSGTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP4
+ palignr $4, MSGTMP2, XMMTMP4
+ paddd XMMTMP4, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -102,9 +105,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 4*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, MSGTMP4
- palignr $4, MSGTMP3, MSGTMP4
- paddd MSGTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP4
+ palignr $4, MSGTMP3, XMMTMP4
+ paddd XMMTMP4, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -114,9 +117,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 5*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, MSGTMP4
- palignr $4, MSGTMP0, MSGTMP4
- paddd MSGTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP4
+ palignr $4, MSGTMP0, XMMTMP4
+ paddd XMMTMP4, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -126,9 +129,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 6*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, MSGTMP4
- palignr $4, MSGTMP1, MSGTMP4
- paddd MSGTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP4
+ palignr $4, MSGTMP1, XMMTMP4
+ paddd XMMTMP4, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -138,9 +141,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 7*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, MSGTMP4
- palignr $4, MSGTMP2, MSGTMP4
- paddd MSGTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP4
+ palignr $4, MSGTMP2, XMMTMP4
+ paddd XMMTMP4, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -150,9 +153,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 8*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, MSGTMP4
- palignr $4, MSGTMP3, MSGTMP4
- paddd MSGTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP4
+ palignr $4, MSGTMP3, XMMTMP4
+ paddd XMMTMP4, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -162,9 +165,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 9*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, MSGTMP4
- palignr $4, MSGTMP0, MSGTMP4
- paddd MSGTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP4
+ palignr $4, MSGTMP0, XMMTMP4
+ paddd XMMTMP4, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -174,9 +177,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 10*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, MSGTMP4
- palignr $4, MSGTMP1, MSGTMP4
- paddd MSGTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP4
+ palignr $4, MSGTMP1, XMMTMP4
+ paddd XMMTMP4, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -186,9 +189,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 11*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, MSGTMP4
- palignr $4, MSGTMP2, MSGTMP4
- paddd MSGTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP4
+ palignr $4, MSGTMP2, XMMTMP4
+ paddd XMMTMP4, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -198,9 +201,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 12*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, MSGTMP4
- palignr $4, MSGTMP3, MSGTMP4
- paddd MSGTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP4
+ palignr $4, MSGTMP3, XMMTMP4
+ paddd XMMTMP4, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -210,9 +213,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 13*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, MSGTMP4
- palignr $4, MSGTMP0, MSGTMP4
- paddd MSGTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP4
+ palignr $4, MSGTMP0, XMMTMP4
+ paddd XMMTMP4, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -221,9 +224,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 14*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, MSGTMP4
- palignr $4, MSGTMP1, MSGTMP4
- paddd MSGTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP4
+ palignr $4, MSGTMP1, XMMTMP4
+ paddd XMMTMP4, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -242,9 +245,9 @@ sha256_process_block64_shaNI:
/* Write hash values back in the correct order */
shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
- mova128 STATE0, MSGTMP4
+ mova128 STATE0, XMMTMP4
pblendw $0xF0, STATE1, STATE0 /* DCBA */
- palignr $8, MSGTMP4, STATE1 /* HGFE */
+ palignr $8, XMMTMP4, STATE1 /* HGFE */
movu128 STATE0, 76+0*16(%eax)
movu128 STATE1, 76+1*16(%eax)
From vda.linux at googlemail.com Sun Feb 6 18:53:10 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Sun, 6 Feb 2022 19:53:10 +0100
Subject: [git commit] *: slap on a few ALIGN* where appropriate
Message-ID: <20220206184644.726FD82C83@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=ca466f385ac985a8b3491daa9f326dc480cdee70
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
The result of looking at "grep -F -B2 '*fill*' busybox_unstripped.map"
function old new delta
.rodata 108586 108460 -126
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-126) Total: -126 bytes
text data bss dec hex filename
970412 4219 1848 976479 ee65f busybox_old
970286 4219 1848 976353 ee5e1 busybox_unstripped
Signed-off-by: Denys Vlasenko
---
console-tools/reset.c | 2 +-
coreutils/od.c | 2 +-
include/platform.h | 1 +
libbb/appletlib.c | 2 +-
libbb/get_console.c | 2 +-
miscutils/bc.c | 2 +-
miscutils/man.c | 2 +-
networking/ifupdown.c | 8 ++++----
networking/interface.c | 6 +++---
networking/libiproute/ipaddress.c | 2 +-
networking/udhcp/common.c | 2 +-
networking/udhcp/d6_dhcpc.c | 2 +-
shell/ash.c | 2 +-
util-linux/hexdump.c | 2 +-
util-linux/nsenter.c | 2 +-
util-linux/unshare.c | 2 +-
16 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/console-tools/reset.c b/console-tools/reset.c
index b3acf69f8..cc04e4fcc 100644
--- a/console-tools/reset.c
+++ b/console-tools/reset.c
@@ -36,7 +36,7 @@ int stty_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
int reset_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
int reset_main(int argc UNUSED_PARAM, char **argv UNUSED_PARAM)
{
- static const char *const args[] = {
+ static const char *const args[] ALIGN_PTR = {
"stty", "sane", NULL
};
diff --git a/coreutils/od.c b/coreutils/od.c
index 9a888dd5f..6f22331e0 100644
--- a/coreutils/od.c
+++ b/coreutils/od.c
@@ -144,7 +144,7 @@ odoffset(dumper_t *dumper, int argc, char ***argvp)
}
}
-static const char *const add_strings[] = {
+static const char *const add_strings[] ALIGN_PTR = {
"16/1 \"%3_u \" \"\\n\"", /* a */
"8/2 \" %06o \" \"\\n\"", /* B, o */
"16/1 \"%03o \" \"\\n\"", /* b */
diff --git a/include/platform.h b/include/platform.h
index ad27bb31a..ea0512f36 100644
--- a/include/platform.h
+++ b/include/platform.h
@@ -346,6 +346,7 @@ typedef unsigned smalluint;
# define ALIGN4
#endif
#define ALIGN8 __attribute__((aligned(8)))
+#define ALIGN_INT __attribute__((aligned(sizeof(int))))
#define ALIGN_PTR __attribute__((aligned(sizeof(void*))))
/*
diff --git a/libbb/appletlib.c b/libbb/appletlib.c
index 03389f541..841b3b873 100644
--- a/libbb/appletlib.c
+++ b/libbb/appletlib.c
@@ -651,7 +651,7 @@ static void check_suid(int applet_no)
# if ENABLE_FEATURE_INSTALLER
static const char usr_bin [] ALIGN1 = "/usr/bin/";
static const char usr_sbin[] ALIGN1 = "/usr/sbin/";
-static const char *const install_dir[] = {
+static const char *const install_dir[] ALIGN_PTR = {
&usr_bin [8], /* "/" */
&usr_bin [4], /* "/bin/" */
&usr_sbin[4] /* "/sbin/" */
diff --git a/libbb/get_console.c b/libbb/get_console.c
index 7f2c75332..9044efea1 100644
--- a/libbb/get_console.c
+++ b/libbb/get_console.c
@@ -37,7 +37,7 @@ static int open_a_console(const char *fnam)
*/
int FAST_FUNC get_console_fd_or_die(void)
{
- static const char *const console_names[] = {
+ static const char *const console_names[] ALIGN_PTR = {
DEV_CONSOLE, CURRENT_VC, CURRENT_TTY
};
diff --git a/miscutils/bc.c b/miscutils/bc.c
index ae370ff55..ab785bbc8 100644
--- a/miscutils/bc.c
+++ b/miscutils/bc.c
@@ -6011,7 +6011,7 @@ static BC_STATUS zxc_program_assign(char inst)
#endif
if (ib || sc || left->t == XC_RESULT_OBASE) {
- static const char *const msg[] = {
+ static const char *const msg[] ALIGN_PTR = {
"bad ibase; must be [2,16]", //XC_RESULT_IBASE
"bad obase; must be [2,"BC_MAX_OBASE_STR"]", //XC_RESULT_OBASE
"bad scale; must be [0,"BC_MAX_SCALE_STR"]", //XC_RESULT_SCALE
diff --git a/miscutils/man.c b/miscutils/man.c
index d319e8bba..deaf9e5ab 100644
--- a/miscutils/man.c
+++ b/miscutils/man.c
@@ -303,7 +303,7 @@ int man_main(int argc UNUSED_PARAM, char **argv)
config_close(parser);
if (!man_path_list) {
- static const char *const mpl[] = { "/usr/man", "/usr/share/man", NULL };
+ static const char *const mpl[] ALIGN_PTR = { "/usr/man", "/usr/share/man", NULL };
man_path_list = (char**)mpl;
/*count_mp = 2; - not used below anyway */
}
diff --git a/networking/ifupdown.c b/networking/ifupdown.c
index 737113dd4..6c4ae27f2 100644
--- a/networking/ifupdown.c
+++ b/networking/ifupdown.c
@@ -532,7 +532,7 @@ static int FAST_FUNC v4tunnel_down(struct interface_defn_t * ifd, execfn * exec)
}
# endif
-static const struct method_t methods6[] = {
+static const struct method_t methods6[] ALIGN_PTR = {
# if ENABLE_FEATURE_IFUPDOWN_IP
{ "v4tunnel" , v4tunnel_up , v4tunnel_down , },
# endif
@@ -627,7 +627,7 @@ struct dhcp_client_t {
const char *stopcmd;
};
-static const struct dhcp_client_t ext_dhcp_clients[] = {
+static const struct dhcp_client_t ext_dhcp_clients[] ALIGN_PTR = {
{ "dhcpcd",
"dhcpcd[[ -h %hostname%]][[ -i %vendor%]][[ -I %client%]][[ -l %leasetime%]] %iface%",
"dhcpcd -k %iface%",
@@ -774,7 +774,7 @@ static int FAST_FUNC wvdial_down(struct interface_defn_t *ifd, execfn *exec)
"-p /var/run/wvdial.%iface% -s 2", ifd, exec);
}
-static const struct method_t methods[] = {
+static const struct method_t methods[] ALIGN_PTR = {
{ "manual" , manual_up_down, manual_up_down, },
{ "wvdial" , wvdial_up , wvdial_down , },
{ "ppp" , ppp_up , ppp_down , },
@@ -797,7 +797,7 @@ static int FAST_FUNC link_up_down(struct interface_defn_t *ifd UNUSED_PARAM, exe
return 1;
}
-static const struct method_t link_methods[] = {
+static const struct method_t link_methods[] ALIGN_PTR = {
{ "none", link_up_down, link_up_down }
};
diff --git a/networking/interface.c b/networking/interface.c
index ea6a2c8a8..6b6c0944a 100644
--- a/networking/interface.c
+++ b/networking/interface.c
@@ -446,13 +446,13 @@ static char *get_name(char name[IFNAMSIZ], char *p)
* %n specifiers (even the size of integers may not match).
*/
#if INT_MAX == LONG_MAX
-static const char *const ss_fmt[] = {
+static const char *const ss_fmt[] ALIGN_PTR = {
"%n%llu%u%u%u%u%n%n%n%llu%u%u%u%u%u",
"%llu%llu%u%u%u%u%n%n%llu%llu%u%u%u%u%u",
"%llu%llu%u%u%u%u%u%u%llu%llu%u%u%u%u%u%u"
};
#else
-static const char *const ss_fmt[] = {
+static const char *const ss_fmt[] ALIGN_PTR = {
"%n%llu%lu%lu%lu%lu%n%n%n%llu%lu%lu%lu%lu%lu",
"%llu%llu%lu%lu%lu%lu%n%n%llu%llu%lu%lu%lu%lu%lu",
"%llu%llu%lu%lu%lu%lu%lu%lu%llu%llu%lu%lu%lu%lu%lu%lu"
@@ -731,7 +731,7 @@ static const struct hwtype ib_hwtype = {
#endif
-static const struct hwtype *const hwtypes[] = {
+static const struct hwtype *const hwtypes[] ALIGN_PTR = {
&loop_hwtype,
&ether_hwtype,
&ppp_hwtype,
diff --git a/networking/libiproute/ipaddress.c b/networking/libiproute/ipaddress.c
index 17a838411..ecc3848ff 100644
--- a/networking/libiproute/ipaddress.c
+++ b/networking/libiproute/ipaddress.c
@@ -58,7 +58,7 @@ typedef struct filter_t filter_t;
static void print_link_flags(unsigned flags, unsigned mdown)
{
- static const int flag_masks[] = {
+ static const int flag_masks[] ALIGN_INT = {
IFF_LOOPBACK, IFF_BROADCAST, IFF_POINTOPOINT,
IFF_MULTICAST, IFF_NOARP, IFF_UP, IFF_LOWER_UP };
static const char flag_labels[] ALIGN1 =
diff --git a/networking/udhcp/common.c b/networking/udhcp/common.c
index 8e9b93655..ae818db05 100644
--- a/networking/udhcp/common.c
+++ b/networking/udhcp/common.c
@@ -19,7 +19,7 @@ const uint8_t MAC_BCAST_ADDR[6] ALIGN2 = {
* See RFC2132 for more options.
* OPTION_REQ: these options are requested by udhcpc (unless -o).
*/
-const struct dhcp_optflag dhcp_optflags[] = {
+const struct dhcp_optflag dhcp_optflags[] ALIGN2 = {
/* flags code */
{ OPTION_IP | OPTION_REQ, 0x01 }, /* DHCP_SUBNET */
{ OPTION_S32 , 0x02 }, /* DHCP_TIME_OFFSET */
diff --git a/networking/udhcp/d6_dhcpc.c b/networking/udhcp/d6_dhcpc.c
index 9d2a8f5d3..9fc690315 100644
--- a/networking/udhcp/d6_dhcpc.c
+++ b/networking/udhcp/d6_dhcpc.c
@@ -65,7 +65,7 @@
/* "struct client_data_t client_data" is in bb_common_bufsiz1 */
-static const struct dhcp_optflag d6_optflags[] = {
+static const struct dhcp_optflag d6_optflags[] ALIGN2 = {
#if ENABLE_FEATURE_UDHCPC6_RFC3646
{ OPTION_6RD | OPTION_LIST | OPTION_REQ, D6_OPT_DNS_SERVERS },
{ OPTION_DNS_STRING | OPTION_LIST | OPTION_REQ, D6_OPT_DOMAIN_LIST },
diff --git a/shell/ash.c b/shell/ash.c
index 55df54bd0..adb0f223a 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -313,7 +313,7 @@ typedef long arith_t;
/* ============ Shell options */
/* If you add/change options hare, update --help text too */
-static const char *const optletters_optnames[] = {
+static const char *const optletters_optnames[] ALIGN_PTR = {
"e" "errexit",
"f" "noglob",
/* bash has '-o ignoreeof', but no short synonym -I for it */
diff --git a/util-linux/hexdump.c b/util-linux/hexdump.c
index 57e7e8db7..307a84803 100644
--- a/util-linux/hexdump.c
+++ b/util-linux/hexdump.c
@@ -71,7 +71,7 @@ static void bb_dump_addfile(dumper_t *dumper, char *name)
fclose(fp);
}
-static const char *const add_strings[] = {
+static const char *const add_strings[] ALIGN_PTR = {
"\"%07.7_ax \"16/1 \"%03o \"\"\n\"", /* b */
"\"%07.7_ax \"16/1 \"%3_c \"\"\n\"", /* c */
"\"%07.7_ax \"8/2 \" %05u \"\"\n\"", /* d */
diff --git a/util-linux/nsenter.c b/util-linux/nsenter.c
index e6339da2f..1aa045b35 100644
--- a/util-linux/nsenter.c
+++ b/util-linux/nsenter.c
@@ -93,7 +93,7 @@ enum {
* The user namespace comes first, so that it is entered first.
* This gives an unprivileged user the potential to enter other namespaces.
*/
-static const struct namespace_descr ns_list[] = {
+static const struct namespace_descr ns_list[] ALIGN_INT = {
{ CLONE_NEWUSER, "ns/user", },
{ CLONE_NEWIPC, "ns/ipc", },
{ CLONE_NEWUTS, "ns/uts", },
diff --git a/util-linux/unshare.c b/util-linux/unshare.c
index 68ccdd874..06b938074 100644
--- a/util-linux/unshare.c
+++ b/util-linux/unshare.c
@@ -120,7 +120,7 @@ enum {
NS_USR_POS, /* OPT_user, NS_USR_POS, and ns_list[] index must match! */
NS_COUNT,
};
-static const struct namespace_descr ns_list[] = {
+static const struct namespace_descr ns_list[] ALIGN_INT = {
{ CLONE_NEWNS, "mnt" },
{ CLONE_NEWUTS, "uts" },
{ CLONE_NEWIPC, "ipc" },
From vda.linux at googlemail.com Sun Feb 6 19:07:12 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Sun, 6 Feb 2022 20:07:12 +0100
Subject: [git commit] *: slap on a few ALIGN_PTR where appropriate
Message-ID: <20220206190009.C99FD82B4D@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=987be932ed3cbea56b68bbe85649191c13b66015
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Signed-off-by: Denys Vlasenko
---
coreutils/test.c | 2 +-
e2fsprogs/fsck.c | 2 +-
libbb/getopt32.c | 2 +-
miscutils/devfsd.c | 4 ++--
modutils/modutils-24.c | 4 ++--
networking/inetd.c | 2 +-
procps/nmeter.c | 2 +-
selinux/setenforce.c | 2 +-
shell/hush.c | 10 +++++-----
9 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/coreutils/test.c b/coreutils/test.c
index a914c7490..840a0daaf 100644
--- a/coreutils/test.c
+++ b/coreutils/test.c
@@ -242,7 +242,7 @@ int depth;
depth--; \
return __res; \
} while (0)
-static const char *const TOKSTR[] = {
+static const char *const TOKSTR[] ALIGN_PTR = {
"EOI",
"FILRD",
"FILWR",
diff --git a/e2fsprogs/fsck.c b/e2fsprogs/fsck.c
index 96c1e51e0..028f8a803 100644
--- a/e2fsprogs/fsck.c
+++ b/e2fsprogs/fsck.c
@@ -190,7 +190,7 @@ struct globals {
* Required for the uber-silly devfs /dev/ide/host1/bus2/target3/lun3
* pathames.
*/
-static const char *const devfs_hier[] = {
+static const char *const devfs_hier[] ALIGN_PTR = {
"host", "bus", "target", "lun", NULL
};
#endif
diff --git a/libbb/getopt32.c b/libbb/getopt32.c
index 5ab4d66f1..e861d0567 100644
--- a/libbb/getopt32.c
+++ b/libbb/getopt32.c
@@ -296,7 +296,7 @@ Special characters:
/* Code here assumes that 'unsigned' is at least 32 bits wide */
-const char *const bb_argv_dash[] = { "-", NULL };
+const char *const bb_argv_dash[] ALIGN_PTR = { "-", NULL };
enum {
PARAM_STRING,
diff --git a/miscutils/devfsd.c b/miscutils/devfsd.c
index 839d00fd0..fb9ebcf60 100644
--- a/miscutils/devfsd.c
+++ b/miscutils/devfsd.c
@@ -928,7 +928,7 @@ static void action_compat(const struct devfsd_notify_struct *info, unsigned int
unsigned int i;
char rewind_;
/* 1 to 5 "scsi/" , 6 to 9 "ide/host" */
- static const char *const fmt[] = {
+ static const char *const fmt[] ALIGN_PTR = {
NULL ,
"sg/c%db%dt%du%d", /* scsi/generic */
"sd/c%db%dt%du%d", /* scsi/disc */
@@ -1468,7 +1468,7 @@ const char *get_old_name(const char *devname, unsigned int namelen,
const char *pty1;
const char *pty2;
/* 1 to 5 "scsi/" , 6 to 9 "ide/host", 10 sbp/, 11 vcc/, 12 pty/ */
- static const char *const fmt[] = {
+ static const char *const fmt[] ALIGN_PTR = {
NULL ,
"sg%u", /* scsi/generic */
NULL, /* scsi/disc */
diff --git a/modutils/modutils-24.c b/modutils/modutils-24.c
index ac8632481..d0bc2a6ef 100644
--- a/modutils/modutils-24.c
+++ b/modutils/modutils-24.c
@@ -3458,7 +3458,7 @@ static int obj_load_progbits(char *image, size_t image_size, struct obj_file *f,
static void hide_special_symbols(struct obj_file *f)
{
- static const char *const specials[] = {
+ static const char *const specials[] ALIGN_PTR = {
SPFX "cleanup_module",
SPFX "init_module",
SPFX "kernel_version",
@@ -3484,7 +3484,7 @@ static int obj_gpl_license(struct obj_file *f, const char **license)
* linux/include/linux/module.h. Checking for leading "GPL" will not
* work, somebody will use "GPL sucks, this is proprietary".
*/
- static const char *const gpl_licenses[] = {
+ static const char *const gpl_licenses[] ALIGN_PTR = {
"GPL",
"GPL v2",
"GPL and additional rights",
diff --git a/networking/inetd.c b/networking/inetd.c
index e71be51c3..fb2fbe323 100644
--- a/networking/inetd.c
+++ b/networking/inetd.c
@@ -1538,7 +1538,7 @@ int inetd_main(int argc UNUSED_PARAM, char **argv)
#if ENABLE_FEATURE_INETD_SUPPORT_BUILTIN_ECHO \
|| ENABLE_FEATURE_INETD_SUPPORT_BUILTIN_DISCARD
# if !BB_MMU
-static const char *const cat_args[] = { "cat", NULL };
+static const char *const cat_args[] ALIGN_PTR = { "cat", NULL };
# endif
#endif
diff --git a/procps/nmeter.c b/procps/nmeter.c
index 2310e9844..088d366bf 100644
--- a/procps/nmeter.c
+++ b/procps/nmeter.c
@@ -70,7 +70,7 @@ typedef struct proc_file {
smallint last_gen;
} proc_file;
-static const char *const proc_name[] = {
+static const char *const proc_name[] ALIGN_PTR = {
"stat", // Must match the order of proc_file's!
"loadavg",
"net/dev",
diff --git a/selinux/setenforce.c b/selinux/setenforce.c
index 996034f8e..2267be451 100644
--- a/selinux/setenforce.c
+++ b/selinux/setenforce.c
@@ -26,7 +26,7 @@
/* These strings are arranged so that odd ones
* result in security_setenforce(1) being done,
* the rest will do security_setenforce(0) */
-static const char *const setenforce_cmd[] = {
+static const char *const setenforce_cmd[] ALIGN_PTR = {
"0",
"1",
"permissive",
diff --git a/shell/hush.c b/shell/hush.c
index 6dc2ecaac..ae81f0da5 100644
--- a/shell/hush.c
+++ b/shell/hush.c
@@ -564,7 +564,7 @@ enum {
#define NULL_O_STRING { NULL }
#ifndef debug_printf_parse
-static const char *const assignment_flag[] = {
+static const char *const assignment_flag[] ALIGN_PTR = {
"MAYBE_ASSIGNMENT",
"DEFINITELY_ASSIGNMENT",
"NOT_ASSIGNMENT",
@@ -3682,7 +3682,7 @@ static void free_pipe_list(struct pipe *pi)
#ifndef debug_print_tree
static void debug_print_tree(struct pipe *pi, int lvl)
{
- static const char *const PIPE[] = {
+ static const char *const PIPE[] ALIGN_PTR = {
[PIPE_SEQ] = "SEQ",
[PIPE_AND] = "AND",
[PIPE_OR ] = "OR" ,
@@ -3717,7 +3717,7 @@ static void debug_print_tree(struct pipe *pi, int lvl)
[RES_XXXX ] = "XXXX" ,
[RES_SNTX ] = "SNTX" ,
};
- static const char *const CMDTYPE[] = {
+ static const char *const CMDTYPE[] ALIGN_PTR = {
"{}",
"()",
"[noglob]",
@@ -7659,7 +7659,7 @@ static int generate_stream_from_string(const char *s, pid_t *pid_p)
if (is_prefixed_with(s, "trap")
&& skip_whitespace(s + 4)[0] == '\0'
) {
- static const char *const argv[] = { NULL, NULL };
+ static const char *const argv[] ALIGN_PTR = { NULL, NULL };
builtin_trap((char**)argv);
fflush_all(); /* important */
_exit(0);
@@ -9826,7 +9826,7 @@ static int run_list(struct pipe *pi)
static const char encoded_dollar_at[] ALIGN1 = {
SPECIAL_VAR_SYMBOL, '@' | 0x80, SPECIAL_VAR_SYMBOL, '\0'
}; /* encoded representation of "$@" */
- static const char *const encoded_dollar_at_argv[] = {
+ static const char *const encoded_dollar_at_argv[] ALIGN_PTR = {
encoded_dollar_at, NULL
}; /* argv list with one element: "$@" */
char **vals;
From vda.linux at googlemail.com Tue Feb 8 02:29:16 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Tue, 8 Feb 2022 03:29:16 +0100
Subject: [git commit] libbb/sha1: shrink unrolled x86-64 code
Message-ID: <20220208022853.C4572831C9@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=4923f74e5873b25b8205a4059964cff75ee731a8
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha1_process_block64 3482 3481 -1
.rodata 108460 108412 -48
------------------------------------------------------------------------------
(add/remove: 1/4 grow/shrink: 0/2 up/down: 0/-49) Total: -49 bytes
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha_x86-64.S | 33 ++++++++++-----------------------
libbb/hash_md5_sha_x86-64.S.sh | 34 +++++++++++-----------------------
2 files changed, 21 insertions(+), 46 deletions(-)
diff --git a/libbb/hash_md5_sha_x86-64.S b/libbb/hash_md5_sha_x86-64.S
index e26c46f25..287cfe547 100644
--- a/libbb/hash_md5_sha_x86-64.S
+++ b/libbb/hash_md5_sha_x86-64.S
@@ -24,6 +24,7 @@ sha1_process_block64:
# xmm0..xmm3: W[]
# xmm4,xmm5: temps
# xmm6: current round constant
+# xmm7: all round constants
# -64(%rsp): area for passing RCONST + W[] from vector to integer units
movl 80(%rdi), %eax # a = ctx->hash[0]
@@ -32,16 +33,17 @@ sha1_process_block64:
movl 92(%rdi), %edx # d = ctx->hash[3]
movl 96(%rdi), %ebp # e = ctx->hash[4]
- movaps rconst0x5A827999(%rip), %xmm6
+ movaps sha1const(%rip), %xmm7
+ pshufd $0x00, %xmm7, %xmm6
# Load W[] to xmm registers, byteswapping on the fly.
#
# For iterations 0..15, we pass W[] in rsi,r8..r14
- # for use in RD1A's instead of spilling them to stack.
+ # for use in RD1As instead of spilling them to stack.
# We lose parallelized addition of RCONST, but LEA
- # can do two additions at once, so it's probably a wash.
+ # can do two additions at once, so it is probably a wash.
# (We use rsi instead of rN because this makes two
- # LEAs in two first RD1A's shorter by one byte).
+ # LEAs in two first RD1As shorter by one byte).
movq 4*0(%rdi), %rsi
movq 4*2(%rdi), %r8
bswapq %rsi
@@ -253,7 +255,7 @@ sha1_process_block64:
roll $5, %edi # rotl32(a,5)
addl %edi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
- movaps rconst0x6ED9EBA1(%rip), %xmm6
+ pshufd $0x55, %xmm7, %xmm6
# PREP %xmm1 %xmm2 %xmm3 %xmm0 -64+16*1(%rsp)
movaps %xmm0, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
@@ -614,7 +616,7 @@ sha1_process_block64:
roll $5, %esi # rotl32(a,5)
addl %esi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
- movaps rconst0x8F1BBCDC(%rip), %xmm6
+ pshufd $0xaa, %xmm7, %xmm6
# PREP %xmm2 %xmm3 %xmm0 %xmm1 -64+16*2(%rsp)
movaps %xmm1, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
@@ -1001,7 +1003,7 @@ sha1_process_block64:
roll $5, %esi # rotl32(a,5)
addl %esi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
- movaps rconst0xCA62C1D6(%rip), %xmm6
+ pshufd $0xff, %xmm7, %xmm6
# PREP %xmm3 %xmm0 %xmm1 %xmm2 -64+16*3(%rsp)
movaps %xmm2, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
@@ -1475,25 +1477,10 @@ sha1_process_block64:
.section .rodata.cst16.sha1const, "aM", @progbits, 16
.balign 16
-rconst0x5A827999:
+sha1const:
.long 0x5A827999
- .long 0x5A827999
- .long 0x5A827999
- .long 0x5A827999
-rconst0x6ED9EBA1:
- .long 0x6ED9EBA1
- .long 0x6ED9EBA1
- .long 0x6ED9EBA1
.long 0x6ED9EBA1
-rconst0x8F1BBCDC:
.long 0x8F1BBCDC
- .long 0x8F1BBCDC
- .long 0x8F1BBCDC
- .long 0x8F1BBCDC
-rconst0xCA62C1D6:
- .long 0xCA62C1D6
- .long 0xCA62C1D6
- .long 0xCA62C1D6
.long 0xCA62C1D6
#endif
diff --git a/libbb/hash_md5_sha_x86-64.S.sh b/libbb/hash_md5_sha_x86-64.S.sh
index fb1e4b57e..a10ac411d 100755
--- a/libbb/hash_md5_sha_x86-64.S.sh
+++ b/libbb/hash_md5_sha_x86-64.S.sh
@@ -34,6 +34,7 @@ exec >hash_md5_sha_x86-64.S
xmmT1="%xmm4"
xmmT2="%xmm5"
xmmRCONST="%xmm6"
+xmmALLRCONST="%xmm7"
T=`printf '\t'`
# SSE instructions are longer than 4 bytes on average.
@@ -125,6 +126,7 @@ sha1_process_block64:
# xmm0..xmm3: W[]
# xmm4,xmm5: temps
# xmm6: current round constant
+# xmm7: all round constants
# -64(%rsp): area for passing RCONST + W[] from vector to integer units
movl 80(%rdi), %eax # a = ctx->hash[0]
@@ -133,16 +135,17 @@ sha1_process_block64:
movl 92(%rdi), %edx # d = ctx->hash[3]
movl 96(%rdi), %ebp # e = ctx->hash[4]
- movaps rconst0x5A827999(%rip), $xmmRCONST
+ movaps sha1const(%rip), $xmmALLRCONST
+ pshufd \$0x00, $xmmALLRCONST, $xmmRCONST
# Load W[] to xmm registers, byteswapping on the fly.
#
# For iterations 0..15, we pass W[] in rsi,r8..r14
- # for use in RD1A's instead of spilling them to stack.
+ # for use in RD1As instead of spilling them to stack.
# We lose parallelized addition of RCONST, but LEA
- # can do two additions at once, so it's probably a wash.
+ # can do two additions at once, so it is probably a wash.
# (We use rsi instead of rN because this makes two
- # LEAs in two first RD1A's shorter by one byte).
+ # LEAs in two first RD1As shorter by one byte).
movq 4*0(%rdi), %rsi
movq 4*2(%rdi), %r8
bswapq %rsi
@@ -359,7 +362,7 @@ RD1A bx cx dx bp ax 4; RD1A ax bx cx dx bp 5; RD1A bp ax bx cx dx 6; RD1A dx
a=`PREP %xmm0 %xmm1 %xmm2 %xmm3 "-64+16*0(%rsp)"`
b=`RD1A cx dx bp ax bx 8; RD1A bx cx dx bp ax 9; RD1A ax bx cx dx bp 10; RD1A bp ax bx cx dx 11;`
INTERLEAVE "$a" "$b"
-a=`echo " movaps rconst0x6ED9EBA1(%rip), $xmmRCONST"
+a=`echo " pshufd \\$0x55, $xmmALLRCONST, $xmmRCONST"
PREP %xmm1 %xmm2 %xmm3 %xmm0 "-64+16*1(%rsp)"`
b=`RD1A dx bp ax bx cx 12; RD1A cx dx bp ax bx 13; RD1A bx cx dx bp ax 14; RD1A ax bx cx dx bp 15;`
INTERLEAVE "$a" "$b"
@@ -378,7 +381,7 @@ INTERLEAVE "$a" "$b"
a=`PREP %xmm1 %xmm2 %xmm3 %xmm0 "-64+16*1(%rsp)"`
b=`RD2 cx dx bp ax bx 28; RD2 bx cx dx bp ax 29; RD2 ax bx cx dx bp 30; RD2 bp ax bx cx dx 31;`
INTERLEAVE "$a" "$b"
-a=`echo " movaps rconst0x8F1BBCDC(%rip), $xmmRCONST"
+a=`echo " pshufd \\$0xaa, $xmmALLRCONST, $xmmRCONST"
PREP %xmm2 %xmm3 %xmm0 %xmm1 "-64+16*2(%rsp)"`
b=`RD2 dx bp ax bx cx 32; RD2 cx dx bp ax bx 33; RD2 bx cx dx bp ax 34; RD2 ax bx cx dx bp 35;`
INTERLEAVE "$a" "$b"
@@ -397,7 +400,7 @@ INTERLEAVE "$a" "$b"
a=`PREP %xmm2 %xmm3 %xmm0 %xmm1 "-64+16*2(%rsp)"`
b=`RD3 cx dx bp ax bx 48; RD3 bx cx dx bp ax 49; RD3 ax bx cx dx bp 50; RD3 bp ax bx cx dx 51;`
INTERLEAVE "$a" "$b"
-a=`echo " movaps rconst0xCA62C1D6(%rip), $xmmRCONST"
+a=`echo " pshufd \\$0xff, $xmmALLRCONST, $xmmRCONST"
PREP %xmm3 %xmm0 %xmm1 %xmm2 "-64+16*3(%rsp)"`
b=`RD3 dx bp ax bx cx 52; RD3 cx dx bp ax bx 53; RD3 bx cx dx bp ax 54; RD3 ax bx cx dx bp 55;`
INTERLEAVE "$a" "$b"
@@ -439,25 +442,10 @@ echo "
.section .rodata.cst16.sha1const, \"aM\", @progbits, 16
.balign 16
-rconst0x5A827999:
+sha1const:
.long 0x5A827999
- .long 0x5A827999
- .long 0x5A827999
- .long 0x5A827999
-rconst0x6ED9EBA1:
- .long 0x6ED9EBA1
- .long 0x6ED9EBA1
- .long 0x6ED9EBA1
.long 0x6ED9EBA1
-rconst0x8F1BBCDC:
.long 0x8F1BBCDC
- .long 0x8F1BBCDC
- .long 0x8F1BBCDC
- .long 0x8F1BBCDC
-rconst0xCA62C1D6:
- .long 0xCA62C1D6
- .long 0xCA62C1D6
- .long 0xCA62C1D6
.long 0xCA62C1D6
#endif"
From vda.linux at googlemail.com Mon Feb 7 01:34:04 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Mon, 7 Feb 2022 02:34:04 +0100
Subject: [git commit] libbb/sha1: shrink and speed up unrolled x86-64 code
Message-ID: <20220208022853.B8D66831C4@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=c193cbd6dfd095c6b8346bab1ea6ba7106b3e5bb
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha1_process_block64 3514 3482 -32
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 8 +-
libbb/hash_md5_sha256_x86-64_shaNI.S | 8 +-
libbb/hash_md5_sha_x86-32_shaNI.S | 4 +-
libbb/hash_md5_sha_x86-64.S | 144 +++++++++++++++++++++++++++--------
libbb/hash_md5_sha_x86-64.S.sh | 9 ++-
libbb/hash_md5_sha_x86-64_shaNI.S | 4 +-
6 files changed, 131 insertions(+), 46 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 417da37d8..39e2baf41 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -257,8 +257,8 @@ sha256_process_block64_shaNI:
ret
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
-.section .rodata.cst256.K256, "aM", @progbits, 256
-.balign 16
+ .section .rodata.cst256.K256, "aM", @progbits, 256
+ .balign 16
K256:
.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
@@ -277,8 +277,8 @@ K256:
.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
-.section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
-.balign 16
+ .section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
+ .balign 16
PSHUFFLE_BSWAP32_FLIP_MASK:
.octa 0x0c0d0e0f08090a0b0405060700010203
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index dbf391135..c6c931341 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -253,8 +253,8 @@ sha256_process_block64_shaNI:
ret
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
-.section .rodata.cst256.K256, "aM", @progbits, 256
-.balign 16
+ .section .rodata.cst256.K256, "aM", @progbits, 256
+ .balign 16
K256:
.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
@@ -273,8 +273,8 @@ K256:
.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
-.section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
-.balign 16
+ .section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
+ .balign 16
PSHUFFLE_BSWAP32_FLIP_MASK:
.octa 0x0c0d0e0f08090a0b0405060700010203
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index 11b855e26..5d082ebfb 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -223,8 +223,8 @@ sha1_process_block64_shaNI:
ret
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
-.section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
-.balign 16
+ .section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
+ .balign 16
PSHUFFLE_BYTE_FLIP_MASK:
.octa 0x000102030405060708090a0b0c0d0e0f
diff --git a/libbb/hash_md5_sha_x86-64.S b/libbb/hash_md5_sha_x86-64.S
index 47ace60de..e26c46f25 100644
--- a/libbb/hash_md5_sha_x86-64.S
+++ b/libbb/hash_md5_sha_x86-64.S
@@ -180,8 +180,13 @@ sha1_process_block64:
# PREP %xmm0 %xmm1 %xmm2 %xmm3 -64+16*0(%rsp)
movaps %xmm3, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm0, %xmm5
+ shufps $0x4e, %xmm1, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm2, %xmm0 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm0 # ^
@@ -252,8 +257,13 @@ sha1_process_block64:
# PREP %xmm1 %xmm2 %xmm3 %xmm0 -64+16*1(%rsp)
movaps %xmm0, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm1, %xmm5
+ shufps $0x4e, %xmm2, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm3, %xmm1 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm1 # ^
@@ -323,8 +333,13 @@ sha1_process_block64:
# PREP %xmm2 %xmm3 %xmm0 %xmm1 -64+16*2(%rsp)
movaps %xmm1, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm2, %xmm5
+ shufps $0x4e, %xmm3, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm0, %xmm2 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm2 # ^
@@ -392,8 +407,13 @@ sha1_process_block64:
# PREP %xmm3 %xmm0 %xmm1 %xmm2 -64+16*3(%rsp)
movaps %xmm2, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm3, %xmm5
+ shufps $0x4e, %xmm0, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm1, %xmm3 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm3 # ^
@@ -457,8 +477,13 @@ sha1_process_block64:
# PREP %xmm0 %xmm1 %xmm2 %xmm3 -64+16*0(%rsp)
movaps %xmm3, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm0, %xmm5
+ shufps $0x4e, %xmm1, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm2, %xmm0 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm0 # ^
@@ -522,8 +547,13 @@ sha1_process_block64:
# PREP %xmm1 %xmm2 %xmm3 %xmm0 -64+16*1(%rsp)
movaps %xmm0, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm1, %xmm5
+ shufps $0x4e, %xmm2, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm3, %xmm1 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm1 # ^
@@ -588,8 +618,13 @@ sha1_process_block64:
# PREP %xmm2 %xmm3 %xmm0 %xmm1 -64+16*2(%rsp)
movaps %xmm1, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm2, %xmm5
+ shufps $0x4e, %xmm3, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm0, %xmm2 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm2 # ^
@@ -653,8 +688,13 @@ sha1_process_block64:
# PREP %xmm3 %xmm0 %xmm1 %xmm2 -64+16*3(%rsp)
movaps %xmm2, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm3, %xmm5
+ shufps $0x4e, %xmm0, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm1, %xmm3 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm3 # ^
@@ -718,8 +758,13 @@ sha1_process_block64:
# PREP %xmm0 %xmm1 %xmm2 %xmm3 -64+16*0(%rsp)
movaps %xmm3, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm0, %xmm5
+ shufps $0x4e, %xmm1, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm2, %xmm0 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm0 # ^
@@ -795,8 +840,13 @@ sha1_process_block64:
# PREP %xmm1 %xmm2 %xmm3 %xmm0 -64+16*1(%rsp)
movaps %xmm0, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm1, %xmm5
+ shufps $0x4e, %xmm2, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm3, %xmm1 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm1 # ^
@@ -872,8 +922,13 @@ sha1_process_block64:
# PREP %xmm2 %xmm3 %xmm0 %xmm1 -64+16*2(%rsp)
movaps %xmm1, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm2, %xmm5
+ shufps $0x4e, %xmm3, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm0, %xmm2 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm2 # ^
@@ -950,8 +1005,13 @@ sha1_process_block64:
# PREP %xmm3 %xmm0 %xmm1 %xmm2 -64+16*3(%rsp)
movaps %xmm2, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm3, %xmm5
+ shufps $0x4e, %xmm0, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm1, %xmm3 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm3 # ^
@@ -1027,8 +1087,13 @@ sha1_process_block64:
# PREP %xmm0 %xmm1 %xmm2 %xmm3 -64+16*0(%rsp)
movaps %xmm3, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm0, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm1, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm0, %xmm5
+ shufps $0x4e, %xmm1, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm2, %xmm0 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm0 # ^
@@ -1104,8 +1169,13 @@ sha1_process_block64:
# PREP %xmm1 %xmm2 %xmm3 %xmm0 -64+16*1(%rsp)
movaps %xmm0, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm1, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm2, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm1, %xmm5
+ shufps $0x4e, %xmm2, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm3, %xmm1 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm1 # ^
@@ -1169,8 +1239,13 @@ sha1_process_block64:
# PREP %xmm2 %xmm3 %xmm0 %xmm1 -64+16*2(%rsp)
movaps %xmm1, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm2, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm3, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm2, %xmm5
+ shufps $0x4e, %xmm3, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm0, %xmm2 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm2 # ^
@@ -1234,8 +1309,13 @@ sha1_process_block64:
# PREP %xmm3 %xmm0 %xmm1 %xmm2 -64+16*3(%rsp)
movaps %xmm2, %xmm4
psrldq $4, %xmm4 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd $0x4e, %xmm3, %xmm5 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq %xmm0, %xmm5 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps %xmm3, %xmm5
+ shufps $0x4e, %xmm0, %xmm5 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps %xmm1, %xmm3 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps %xmm4, %xmm5 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
xorps %xmm5, %xmm3 # ^
diff --git a/libbb/hash_md5_sha_x86-64.S.sh b/libbb/hash_md5_sha_x86-64.S.sh
index 656fb5414..fb1e4b57e 100755
--- a/libbb/hash_md5_sha_x86-64.S.sh
+++ b/libbb/hash_md5_sha_x86-64.S.sh
@@ -203,8 +203,13 @@ echo "# PREP $@
movaps $xmmW12, $xmmT1
psrldq \$4, $xmmT1 # rshift by 4 bytes: T1 = ([13],[14],[15],0)
- pshufd \$0x4e, $xmmW0, $xmmT2 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
- punpcklqdq $xmmW4, $xmmT2 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# pshufd \$0x4e, $xmmW0, $xmmT2 # 01001110=2,3,0,1 shuffle, ([2],[3],x,x)
+# punpcklqdq $xmmW4, $xmmT2 # T2 = W4[0..63]:T2[0..63] = ([2],[3],[4],[5])
+# same result as above, but shorter and faster:
+# pshufd/shufps are subtly different: pshufd takes all dwords from source operand,
+# shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one!
+ movaps $xmmW0, $xmmT2
+ shufps \$0x4e, $xmmW4, $xmmT2 # 01001110=(T2.dw[2], T2.dw[3], W4.dw[0], W4.dw[1]) = ([2],[3],[4],[5])
xorps $xmmW8, $xmmW0 # ([8],[9],[10],[11]) ^ ([0],[1],[2],[3])
xorps $xmmT1, $xmmT2 # ([13],[14],[15],0) ^ ([2],[3],[4],[5])
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index ba92f09df..8ddec87ce 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -217,8 +217,8 @@ sha1_process_block64_shaNI:
ret
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
-.section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
-.balign 16
+ .section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
+ .balign 16
PSHUFFLE_BYTE_FLIP_MASK:
.octa 0x000102030405060708090a0b0c0d0e0f
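[Editor's note: the pshufd/shufps comment repeated in the patch above can be checked with a small model. This is a minimal Python sketch of the dword-selection semantics (AT&T operand order, index 0 = least significant dword); the helper names are my own, not from the commit.]

```python
def pshufd(src, imm):
    # pshufd $imm, src, dst: all four result dwords are selected from src
    return [src[(imm >> s) & 3] for s in (0, 2, 4, 6)]

def punpcklqdq(dst, src):
    # punpcklqdq src, dst: interleave the low qwords (dwords 0,1) of dst and src
    return [dst[0], dst[1], src[0], src[1]]

def shufps(dst, src, imm):
    # shufps $imm, src, dst: result dwords 0,1 selected from dst (2nd operand
    # in AT&T order), dwords 2,3 from src -- exactly the asymmetry the
    # commit comment points out
    return [dst[(imm >> 0) & 3], dst[(imm >> 2) & 3],
            src[(imm >> 4) & 3], src[(imm >> 6) & 3]]

W0 = [0, 1, 2, 3]   # schedule words [0..3]
W4 = [4, 5, 6, 7]   # schedule words [4..7]

# old sequence: pshufd $0x4e gives ([2],[3],[0],[1]), punpcklqdq fills in [4],[5]
old = punpcklqdq(pshufd(W0, 0x4e), W4)
# new sequence: movaps W0,T2; shufps $0x4e,W4,T2 -- one copy plus one shuffle
new = shufps(list(W0), W4, 0x4e)
print(old, new)  # both [2, 3, 4, 5]
```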
From vda.linux at googlemail.com Tue Feb 8 07:22:17 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Tue, 8 Feb 2022 08:22:17 +0100
Subject: [git commit] libbb/sha1: shrink x86 hardware accelerated hashing
Message-ID: <20220208073205.50AE1813BB@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=71a1cccaad679bd102f87283f78c581a8fb0e255
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha1_process_block64_shaNI 32-bit 524 517 -7
sha1_process_block64_shaNI 64-bit 510 508 -2
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha_x86-32_shaNI.S | 37 +++++++++++++++++--------------------
libbb/hash_md5_sha_x86-64_shaNI.S | 24 ++++++++++++------------
2 files changed, 29 insertions(+), 32 deletions(-)
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index 5d082ebfb..0f3fe57ca 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -32,14 +32,10 @@
#define MSG1 %xmm4
#define MSG2 %xmm5
#define MSG3 %xmm6
-#define SHUF_MASK %xmm7
- .balign 8 # allow decoders to fetch at least 3 first insns
+ .balign 8 # allow decoders to fetch at least 2 first insns
sha1_process_block64_shaNI:
- pushl %ebp
- movl %esp, %ebp
- subl $32, %esp
- andl $~0xF, %esp # paddd needs aligned memory operand
+ subl $16, %esp
/* load initial hash values */
xor128 E0, E0
@@ -47,30 +43,33 @@ sha1_process_block64_shaNI:
pinsrd $3, 76+4*4(%eax), E0 # load to uppermost 32-bit word
shuf128_32 $0x1B, ABCD, ABCD # DCBA -> ABCD
- mova128 PSHUFFLE_BYTE_FLIP_MASK, SHUF_MASK
+ mova128 PSHUFFLE_BYTE_FLIP_MASK, %xmm7
+
+ movu128 0*16(%eax), MSG0
+ pshufb %xmm7, MSG0
+ movu128 1*16(%eax), MSG1
+ pshufb %xmm7, MSG1
+ movu128 2*16(%eax), MSG2
+ pshufb %xmm7, MSG2
+ movu128 3*16(%eax), MSG3
+ pshufb %xmm7, MSG3
/* Save hash values for addition after rounds */
- movu128 E0, 16(%esp)
+ movu128 E0, %xmm7
movu128 ABCD, (%esp)
/* Rounds 0-3 */
- movu128 0*16(%eax), MSG0
- pshufb SHUF_MASK, MSG0
paddd MSG0, E0
mova128 ABCD, E1
sha1rnds4 $0, E0, ABCD
/* Rounds 4-7 */
- movu128 1*16(%eax), MSG1
- pshufb SHUF_MASK, MSG1
sha1nexte MSG1, E1
mova128 ABCD, E0
sha1rnds4 $0, E1, ABCD
sha1msg1 MSG1, MSG0
/* Rounds 8-11 */
- movu128 2*16(%eax), MSG2
- pshufb SHUF_MASK, MSG2
sha1nexte MSG2, E0
mova128 ABCD, E1
sha1rnds4 $0, E0, ABCD
@@ -78,8 +77,6 @@ sha1_process_block64_shaNI:
xor128 MSG2, MSG0
/* Rounds 12-15 */
- movu128 3*16(%eax), MSG3
- pshufb SHUF_MASK, MSG3
sha1nexte MSG3, E1
mova128 ABCD, E0
sha1msg2 MSG3, MSG0
@@ -210,16 +207,16 @@ sha1_process_block64_shaNI:
sha1rnds4 $3, E1, ABCD
/* Add current hash values with previously saved */
- sha1nexte 16(%esp), E0
- paddd (%esp), ABCD
+ sha1nexte %xmm7, E0
+ movu128 (%esp), %xmm7
+ paddd %xmm7, ABCD
/* Write hash values back in the correct order */
shuf128_32 $0x1B, ABCD, ABCD
movu128 ABCD, 76(%eax)
extr128_32 $3, E0, 76+4*4(%eax)
- movl %ebp, %esp
- popl %ebp
+ addl $16, %esp
ret
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index 8ddec87ce..fc2ca92e8 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -32,7 +32,6 @@
#define MSG1 %xmm4
#define MSG2 %xmm5
#define MSG3 %xmm6
-#define SHUF_MASK %xmm7
.balign 8 # allow decoders to fetch at least 2 first insns
sha1_process_block64_shaNI:
@@ -43,30 +42,33 @@ sha1_process_block64_shaNI:
pinsrd $3, 80+4*4(%rdi), E0 # load to uppermost 32-bit word
shuf128_32 $0x1B, ABCD, ABCD # DCBA -> ABCD
- mova128 PSHUFFLE_BYTE_FLIP_MASK(%rip), SHUF_MASK
+ mova128 PSHUFFLE_BYTE_FLIP_MASK(%rip), %xmm7
+
+ movu128 0*16(%rdi), MSG0
+ pshufb %xmm7, MSG0
+ movu128 1*16(%rdi), MSG1
+ pshufb %xmm7, MSG1
+ movu128 2*16(%rdi), MSG2
+ pshufb %xmm7, MSG2
+ movu128 3*16(%rdi), MSG3
+ pshufb %xmm7, MSG3
/* Save hash values for addition after rounds */
- mova128 E0, %xmm9
+ mova128 E0, %xmm7
mova128 ABCD, %xmm8
/* Rounds 0-3 */
- movu128 0*16(%rdi), MSG0
- pshufb SHUF_MASK, MSG0
paddd MSG0, E0
mova128 ABCD, E1
sha1rnds4 $0, E0, ABCD
/* Rounds 4-7 */
- movu128 1*16(%rdi), MSG1
- pshufb SHUF_MASK, MSG1
sha1nexte MSG1, E1
mova128 ABCD, E0
sha1rnds4 $0, E1, ABCD
sha1msg1 MSG1, MSG0
/* Rounds 8-11 */
- movu128 2*16(%rdi), MSG2
- pshufb SHUF_MASK, MSG2
sha1nexte MSG2, E0
mova128 ABCD, E1
sha1rnds4 $0, E0, ABCD
@@ -74,8 +76,6 @@ sha1_process_block64_shaNI:
xor128 MSG2, MSG0
/* Rounds 12-15 */
- movu128 3*16(%rdi), MSG3
- pshufb SHUF_MASK, MSG3
sha1nexte MSG3, E1
mova128 ABCD, E0
sha1msg2 MSG3, MSG0
@@ -206,7 +206,7 @@ sha1_process_block64_shaNI:
sha1rnds4 $3, E1, ABCD
/* Add current hash values with previously saved */
- sha1nexte %xmm9, E0
+ sha1nexte %xmm7, E0
paddd %xmm8, ABCD
/* Write hash values back in the correct order */
From vda.linux at googlemail.com Tue Feb 8 14:23:26 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Tue, 8 Feb 2022 15:23:26 +0100
Subject: [git commit] libbb/sha1: shrink x86 hardware accelerated hashing
(32-bit)
Message-ID: <20220208141656.F168E829FC@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=eb52e7fa522d829fb400461ca4c808ee5c1d6428
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha1_process_block64_shaNI 517 511 -6
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha_x86-32_shaNI.S | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index 0f3fe57ca..ad814a21b 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -35,11 +35,9 @@
.balign 8 # allow decoders to fetch at least 2 first insns
sha1_process_block64_shaNI:
- subl $16, %esp
-
/* load initial hash values */
- xor128 E0, E0
movu128 76(%eax), ABCD
+ xor128 E0, E0
pinsrd $3, 76+4*4(%eax), E0 # load to uppermost 32-bit word
shuf128_32 $0x1B, ABCD, ABCD # DCBA -> ABCD
@@ -56,7 +54,7 @@ sha1_process_block64_shaNI:
/* Save hash values for addition after rounds */
movu128 E0, %xmm7
- movu128 ABCD, (%esp)
+ /*movu128 ABCD, %xmm8 - NOPE, 32bit has no xmm8 */
/* Rounds 0-3 */
paddd MSG0, E0
@@ -208,7 +206,9 @@ sha1_process_block64_shaNI:
/* Add current hash values with previously saved */
sha1nexte %xmm7, E0
- movu128 (%esp), %xmm7
+ /*paddd %xmm8, ABCD - 32-bit mode has no xmm8 */
+ movu128 76(%eax), %xmm7 # recreate original ABCD
+ shuf128_32 $0x1B, %xmm7, %xmm7 # DCBA -> ABCD
paddd %xmm7, ABCD
/* Write hash values back in the correct order */
@@ -216,7 +216,6 @@ sha1_process_block64_shaNI:
movu128 ABCD, 76(%eax)
extr128_32 $3, E0, 76+4*4(%eax)
- addl $16, %esp
ret
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
From vda.linux at googlemail.com Tue Feb 8 14:34:02 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Tue, 8 Feb 2022 15:34:02 +0100
Subject: [git commit] libbb/sha1: shrink x86 hardware accelerated hashing
(32-bit)
Message-ID: <20220208142920.E0AE082B5D@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=eb8d5f3b8f3c91f3ed82a52b4ce52a154c146ede
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha1_process_block64_shaNI 511 507 -4
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha_x86-32_shaNI.S | 9 ++++-----
libbb/hash_md5_sha_x86-64_shaNI.S | 3 +--
2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index ad814a21b..a61b3cbed 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -53,8 +53,8 @@ sha1_process_block64_shaNI:
pshufb %xmm7, MSG3
/* Save hash values for addition after rounds */
- movu128 E0, %xmm7
- /*movu128 ABCD, %xmm8 - NOPE, 32bit has no xmm8 */
+ mova128 E0, %xmm7
+ /*mova128 ABCD, %xmm8 - NOPE, 32bit has no xmm8 */
/* Rounds 0-3 */
paddd MSG0, E0
@@ -207,12 +207,11 @@ sha1_process_block64_shaNI:
/* Add current hash values with previously saved */
sha1nexte %xmm7, E0
/*paddd %xmm8, ABCD - 32-bit mode has no xmm8 */
- movu128 76(%eax), %xmm7 # recreate original ABCD
- shuf128_32 $0x1B, %xmm7, %xmm7 # DCBA -> ABCD
- paddd %xmm7, ABCD
+ movu128 76(%eax), %xmm7 # get original ABCD (not shuffled)...
/* Write hash values back in the correct order */
shuf128_32 $0x1B, ABCD, ABCD
+ paddd %xmm7, ABCD # ...add it to final ABCD
movu128 ABCD, 76(%eax)
extr128_32 $3, E0, 76+4*4(%eax)
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index fc2ca92e8..b32029360 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -36,9 +36,8 @@
.balign 8 # allow decoders to fetch at least 2 first insns
sha1_process_block64_shaNI:
/* load initial hash values */
-
- xor128 E0, E0
movu128 80(%rdi), ABCD
+ xor128 E0, E0
pinsrd $3, 80+4*4(%rdi), E0 # load to uppermost 32-bit word
shuf128_32 $0x1B, ABCD, ABCD # DCBA -> ABCD
From bugzilla at busybox.net Tue Feb 8 15:48:37 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Tue, 08 Feb 2022 15:48:37 +0000
Subject: [Bug 14571] New: ash crashes with fork (&) and stty -echo
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14571
Bug ID: 14571
Summary: ash crashes with fork (&) and stty -echo
Product: Busybox
Version: 1.33.x
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Other
Assignee: unassigned at busybox.net
Reporter: cyrilbur at gmail.com
CC: busybox-cvs at busybox.net
Target Milestone: ---
Setting -echo and leaving a dangling fork results in an ash crash.
I have a relatively stripped-down busybox, and I am using the busybox coreutils.
Reproduce:
stty -echo
sleep 1 & ps &
^ This is the problem
ash will crash.
I have done the same thing with bash and dash, and neither crashes.
If I have time I will endeavour to get more information.
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at busybox.net Tue Feb 8 18:41:34 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Tue, 08 Feb 2022 18:41:34 +0000
Subject: [Bug 14576] New: unzip: test skipped with bad archive
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14576
Bug ID: 14576
Summary: unzip: test skipped with bad archive
Product: Busybox
Version: 1.33.x
Hardware: All
OS: Linux
Status: NEW
Severity: major
Priority: P5
Component: Standard Compliance
Assignee: unassigned at busybox.net
Reporter: dharanendiran at gmail.com
CC: busybox-cvs at busybox.net
Target Milestone: ---
When I run the testsuite, the unzip (bad archive) test was skipped. Is this test
always expected to be skipped here, or does it require further components to succeed?
# ./runtest -v unzip
======================
echo -ne '' >input
echo -ne '' | unzip -q foo.zip foo/ && test -d foo && test ! -f foo/bar && echo
yes
PASS: unzip (subdir only)
SKIPPED: unzip (bad archive)
======================
echo -ne '' >input
echo -ne '' | unzip -p ../unzip_bad_lzma_1.zip 2>&1; echo $?
PASS: unzip (archive with corrupted lzma 1)
======================
echo -ne '' >input
echo -ne '' | unzip -p ../unzip_bad_lzma_2.zip 2>&1; echo $?
PASS: unzip (archive with corrupted lzma 2)
#
The following config options are enabled in busybox:
FEATURE_UNZIP_CDF CONFIG_UNICODE_SUPPORT UUDECODE
--
You are receiving this mail because:
You are on the CC list for the bug.
From vda.linux at googlemail.com Wed Feb 9 00:30:23 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Wed, 9 Feb 2022 01:30:23 +0100
Subject: [git commit] libbb/sha256: code shrink in 32-bit x86
Message-ID: <20220209003846.5A4608148C@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=461a994b09c5022b93bccccf903b39438d61bbf1
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 697 676 -21
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index a849dfcc2..846230e3e 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -34,16 +34,18 @@
#define XMMTMP %xmm7
+#define SHUF(a,b,c,d) $(a+(b<<2)+(c<<4)+(d<<6))
+
.balign 8 # allow decoders to fetch at least 2 first insns
sha256_process_block64_shaNI:
- movu128 76+0*16(%eax), STATE0
- movu128 76+1*16(%eax), STATE1
- shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
- shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
+ movu128 76+0*16(%eax), STATE1 /* DCBA (msb-to-lsb: 3,2,1,0) */
+ movu128 76+1*16(%eax), STATE0 /* HGFE */
+/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
mova128 STATE0, XMMTMP
- palignr $8, STATE1, STATE0 /* ABEF */
- pblendw $0xF0, XMMTMP, STATE1 /* CDGH */
+ shufps SHUF(1,0,1,0), STATE1, STATE0 /* ABEF */
+ shufps SHUF(3,2,3,2), STATE1, XMMTMP /* CDGH */
+ mova128 XMMTMP, STATE1
/* XMMTMP holds flip mask from here... */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK, XMMTMP
@@ -231,18 +233,19 @@ sha256_process_block64_shaNI:
sha256rnds2 STATE1, STATE0
/* Write hash values back in the correct order */
- shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
- shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
+ /* STATE0: ABEF (msb-to-lsb: 3,2,1,0) */
+ /* STATE1: CDGH */
mova128 STATE0, XMMTMP
- pblendw $0xF0, STATE1, STATE0 /* DCBA */
- palignr $8, XMMTMP, STATE1 /* HGFE */
+/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
+ shufps SHUF(3,2,3,2), STATE1, STATE0 /* DCBA */
+ shufps SHUF(1,0,1,0), STATE1, XMMTMP /* HGFE */
/* add current hash values to previous ones */
+ movu128 76+1*16(%eax), STATE1
+ paddd XMMTMP, STATE1
+ movu128 STATE1, 76+1*16(%eax)
movu128 76+0*16(%eax), XMMTMP
paddd XMMTMP, STATE0
- movu128 76+1*16(%eax), XMMTMP
movu128 STATE0, 76+0*16(%eax)
- paddd XMMTMP, STATE1
- movu128 STATE1, 76+1*16(%eax)
ret
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
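[Editor's note: the SHUF(a,b,c,d) macro added in this commit packs four 2-bit dword selectors into a shuffle immediate. A quick Python transcription of the macro, for cross-checking the immediates used in the patch:]

```python
def SHUF(a, b, c, d):
    # mirrors the asm macro: $(a + (b<<2) + (c<<4) + (d<<6))
    # each argument selects the source dword for result dwords 0..3
    return a + (b << 2) + (c << 4) + (d << 6)

print(hex(SHUF(2, 3, 0, 1)))  # 0x4e, the "2,3,0,1" shuffle seen earlier
print(hex(SHUF(1, 0, 1, 0)))  # 0x11, used to build ABEF
print(hex(SHUF(3, 2, 3, 2)))  # 0xbb, used to build CDGH
```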
From vda.linux at googlemail.com Tue Feb 8 23:33:39 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Wed, 9 Feb 2022 00:33:39 +0100
Subject: [git commit] libbb/sha256: code shrink in 32-bit x86
Message-ID: <20220209003846.4FB3582DFD@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=c0ff0d4528d718c20b9ca2290bd10d59e9f794a3
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 713 697 -16
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 130 ++++++++++++++++-------------------
libbb/hash_md5_sha256_x86-64_shaNI.S | 107 ++++++++++++++--------------
2 files changed, 114 insertions(+), 123 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 39e2baf41..a849dfcc2 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -31,35 +31,27 @@
#define MSGTMP1 %xmm4
#define MSGTMP2 %xmm5
#define MSGTMP3 %xmm6
-#define XMMTMP4 %xmm7
- .balign 8 # allow decoders to fetch at least 3 first insns
-sha256_process_block64_shaNI:
- pushl %ebp
- movl %esp, %ebp
- subl $32, %esp
- andl $~0xF, %esp # paddd needs aligned memory operand
+#define XMMTMP %xmm7
+ .balign 8 # allow decoders to fetch at least 2 first insns
+sha256_process_block64_shaNI:
movu128 76+0*16(%eax), STATE0
movu128 76+1*16(%eax), STATE1
- shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
- shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
- mova128 STATE0, XMMTMP4
- palignr $8, STATE1, STATE0 /* ABEF */
- pblendw $0xF0, XMMTMP4, STATE1 /* CDGH */
+ shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
+ shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
+ mova128 STATE0, XMMTMP
+ palignr $8, STATE1, STATE0 /* ABEF */
+ pblendw $0xF0, XMMTMP, STATE1 /* CDGH */
-/* XMMTMP4 holds flip mask from here... */
- mova128 PSHUFFLE_BSWAP32_FLIP_MASK, XMMTMP4
+/* XMMTMP holds flip mask from here... */
+ mova128 PSHUFFLE_BSWAP32_FLIP_MASK, XMMTMP
movl $K256+8*16, SHA256CONSTANTS
- /* Save hash values for addition after rounds */
- mova128 STATE0, 0*16(%esp)
- mova128 STATE1, 1*16(%esp)
-
/* Rounds 0-3 */
movu128 0*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
mova128 MSG, MSGTMP0
paddd 0*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -68,7 +60,7 @@ sha256_process_block64_shaNI:
/* Rounds 4-7 */
movu128 1*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
mova128 MSG, MSGTMP1
paddd 1*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -78,7 +70,7 @@ sha256_process_block64_shaNI:
/* Rounds 8-11 */
movu128 2*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
mova128 MSG, MSGTMP2
paddd 2*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -88,14 +80,14 @@ sha256_process_block64_shaNI:
/* Rounds 12-15 */
movu128 3*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
/* ...to here */
mova128 MSG, MSGTMP3
paddd 3*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, XMMTMP4
- palignr $4, MSGTMP2, XMMTMP4
- paddd XMMTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP
+ palignr $4, MSGTMP2, XMMTMP
+ paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -105,9 +97,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 4*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, XMMTMP4
- palignr $4, MSGTMP3, XMMTMP4
- paddd XMMTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP
+ palignr $4, MSGTMP3, XMMTMP
+ paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -117,9 +109,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 5*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, XMMTMP4
- palignr $4, MSGTMP0, XMMTMP4
- paddd XMMTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP
+ palignr $4, MSGTMP0, XMMTMP
+ paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -129,9 +121,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 6*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, XMMTMP4
- palignr $4, MSGTMP1, XMMTMP4
- paddd XMMTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP
+ palignr $4, MSGTMP1, XMMTMP
+ paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -141,9 +133,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 7*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, XMMTMP4
- palignr $4, MSGTMP2, XMMTMP4
- paddd XMMTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP
+ palignr $4, MSGTMP2, XMMTMP
+ paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -153,9 +145,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 8*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, XMMTMP4
- palignr $4, MSGTMP3, XMMTMP4
- paddd XMMTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP
+ palignr $4, MSGTMP3, XMMTMP
+ paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -165,9 +157,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 9*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, XMMTMP4
- palignr $4, MSGTMP0, XMMTMP4
- paddd XMMTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP
+ palignr $4, MSGTMP0, XMMTMP
+ paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -177,9 +169,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 10*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, XMMTMP4
- palignr $4, MSGTMP1, XMMTMP4
- paddd XMMTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP
+ palignr $4, MSGTMP1, XMMTMP
+ paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -189,9 +181,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 11*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, XMMTMP4
- palignr $4, MSGTMP2, XMMTMP4
- paddd XMMTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP
+ palignr $4, MSGTMP2, XMMTMP
+ paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -201,9 +193,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 12*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, XMMTMP4
- palignr $4, MSGTMP3, XMMTMP4
- paddd XMMTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP
+ palignr $4, MSGTMP3, XMMTMP
+ paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -213,9 +205,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 13*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, XMMTMP4
- palignr $4, MSGTMP0, XMMTMP4
- paddd XMMTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP
+ palignr $4, MSGTMP0, XMMTMP
+ paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -224,9 +216,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 14*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, XMMTMP4
- palignr $4, MSGTMP1, XMMTMP4
- paddd XMMTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP
+ palignr $4, MSGTMP1, XMMTMP
+ paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -238,22 +230,20 @@ sha256_process_block64_shaNI:
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
- /* Add current hash values with previously saved */
- paddd 0*16(%esp), STATE0
- paddd 1*16(%esp), STATE1
-
/* Write hash values back in the correct order */
- shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
- shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
- mova128 STATE0, XMMTMP4
- pblendw $0xF0, STATE1, STATE0 /* DCBA */
- palignr $8, XMMTMP4, STATE1 /* HGFE */
-
+ shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
+ shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
+ mova128 STATE0, XMMTMP
+ pblendw $0xF0, STATE1, STATE0 /* DCBA */
+ palignr $8, XMMTMP, STATE1 /* HGFE */
+ /* add current hash values to previous ones */
+ movu128 76+0*16(%eax), XMMTMP
+ paddd XMMTMP, STATE0
+ movu128 76+1*16(%eax), XMMTMP
movu128 STATE0, 76+0*16(%eax)
+ paddd XMMTMP, STATE1
movu128 STATE1, 76+1*16(%eax)
- movl %ebp, %esp
- popl %ebp
ret
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index c6c931341..b5c950a9a 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -31,7 +31,8 @@
#define MSGTMP1 %xmm4
#define MSGTMP2 %xmm5
#define MSGTMP3 %xmm6
-#define XMMTMP4 %xmm7
+
+#define XMMTMP %xmm7
#define ABEF_SAVE %xmm9
#define CDGH_SAVE %xmm10
@@ -41,14 +42,14 @@ sha256_process_block64_shaNI:
movu128 80+0*16(%rdi), STATE0
movu128 80+1*16(%rdi), STATE1
- shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
- shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
- mova128 STATE0, XMMTMP4
- palignr $8, STATE1, STATE0 /* ABEF */
- pblendw $0xF0, XMMTMP4, STATE1 /* CDGH */
+ shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
+ shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
+ mova128 STATE0, XMMTMP
+ palignr $8, STATE1, STATE0 /* ABEF */
+ pblendw $0xF0, XMMTMP, STATE1 /* CDGH */
-/* XMMTMP4 holds flip mask from here... */
- mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), XMMTMP4
+/* XMMTMP holds flip mask from here... */
+ mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), XMMTMP
leaq K256+8*16(%rip), SHA256CONSTANTS
/* Save hash values for addition after rounds */
@@ -57,7 +58,7 @@ sha256_process_block64_shaNI:
/* Rounds 0-3 */
movu128 0*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
mova128 MSG, MSGTMP0
paddd 0*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -66,7 +67,7 @@ sha256_process_block64_shaNI:
/* Rounds 4-7 */
movu128 1*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
mova128 MSG, MSGTMP1
paddd 1*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -76,7 +77,7 @@ sha256_process_block64_shaNI:
/* Rounds 8-11 */
movu128 2*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
mova128 MSG, MSGTMP2
paddd 2*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
@@ -86,14 +87,14 @@ sha256_process_block64_shaNI:
/* Rounds 12-15 */
movu128 3*16(DATA_PTR), MSG
- pshufb XMMTMP4, MSG
+ pshufb XMMTMP, MSG
/* ...to here */
mova128 MSG, MSGTMP3
paddd 3*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, XMMTMP4
- palignr $4, MSGTMP2, XMMTMP4
- paddd XMMTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP
+ palignr $4, MSGTMP2, XMMTMP
+ paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -103,9 +104,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 4*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, XMMTMP4
- palignr $4, MSGTMP3, XMMTMP4
- paddd XMMTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP
+ palignr $4, MSGTMP3, XMMTMP
+ paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -115,9 +116,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 5*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, XMMTMP4
- palignr $4, MSGTMP0, XMMTMP4
- paddd XMMTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP
+ palignr $4, MSGTMP0, XMMTMP
+ paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -127,9 +128,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 6*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, XMMTMP4
- palignr $4, MSGTMP1, XMMTMP4
- paddd XMMTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP
+ palignr $4, MSGTMP1, XMMTMP
+ paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -139,9 +140,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 7*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, XMMTMP4
- palignr $4, MSGTMP2, XMMTMP4
- paddd XMMTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP
+ palignr $4, MSGTMP2, XMMTMP
+ paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -151,9 +152,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 8*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, XMMTMP4
- palignr $4, MSGTMP3, XMMTMP4
- paddd XMMTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP
+ palignr $4, MSGTMP3, XMMTMP
+ paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -163,9 +164,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 9*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, XMMTMP4
- palignr $4, MSGTMP0, XMMTMP4
- paddd XMMTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP
+ palignr $4, MSGTMP0, XMMTMP
+ paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -175,9 +176,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 10*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, XMMTMP4
- palignr $4, MSGTMP1, XMMTMP4
- paddd XMMTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP
+ palignr $4, MSGTMP1, XMMTMP
+ paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -187,9 +188,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP3, MSG
paddd 11*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP3, XMMTMP4
- palignr $4, MSGTMP2, XMMTMP4
- paddd XMMTMP4, MSGTMP0
+ mova128 MSGTMP3, XMMTMP
+ palignr $4, MSGTMP2, XMMTMP
+ paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -199,9 +200,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP0, MSG
paddd 12*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP0, XMMTMP4
- palignr $4, MSGTMP3, XMMTMP4
- paddd XMMTMP4, MSGTMP1
+ mova128 MSGTMP0, XMMTMP
+ palignr $4, MSGTMP3, XMMTMP
+ paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -211,9 +212,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP1, MSG
paddd 13*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP1, XMMTMP4
- palignr $4, MSGTMP0, XMMTMP4
- paddd XMMTMP4, MSGTMP2
+ mova128 MSGTMP1, XMMTMP
+ palignr $4, MSGTMP0, XMMTMP
+ paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -222,9 +223,9 @@ sha256_process_block64_shaNI:
mova128 MSGTMP2, MSG
paddd 14*16-8*16(SHA256CONSTANTS), MSG
sha256rnds2 STATE0, STATE1
- mova128 MSGTMP2, XMMTMP4
- palignr $4, MSGTMP1, XMMTMP4
- paddd XMMTMP4, MSGTMP3
+ mova128 MSGTMP2, XMMTMP
+ palignr $4, MSGTMP1, XMMTMP
+ paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
sha256rnds2 STATE1, STATE0
@@ -241,11 +242,11 @@ sha256_process_block64_shaNI:
paddd CDGH_SAVE, STATE1
/* Write hash values back in the correct order */
- shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
- shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
- mova128 STATE0, XMMTMP4
- pblendw $0xF0, STATE1, STATE0 /* DCBA */
- palignr $8, XMMTMP4, STATE1 /* HGFE */
+ shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
+ shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
+ mova128 STATE0, XMMTMP
+ pblendw $0xF0, STATE1, STATE0 /* DCBA */
+ palignr $8, XMMTMP, STATE1 /* HGFE */
movu128 STATE0, 80+0*16(%rdi)
movu128 STATE1, 80+1*16(%rdi)
From vda.linux at googlemail.com Wed Feb 9 00:42:49 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Wed, 9 Feb 2022 01:42:49 +0100
Subject: [git commit] libbb/sha256: code shrink in 64-bit x86
Message-ID: <20220209003846.64EAB8315B@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=11bcea7ac0ac4b2156c1b2d53f926d789b9792b4
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 701 680 -21
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-64_shaNI.S | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index b5c950a9a..bc063b9cc 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -37,16 +37,18 @@
#define ABEF_SAVE %xmm9
#define CDGH_SAVE %xmm10
+#define SHUF(a,b,c,d) $(a+(b<<2)+(c<<4)+(d<<6))
+
.balign 8 # allow decoders to fetch at least 2 first insns
sha256_process_block64_shaNI:
- movu128 80+0*16(%rdi), STATE0
- movu128 80+1*16(%rdi), STATE1
- shuf128_32 $0xB1, STATE0, STATE0 /* CDAB */
- shuf128_32 $0x1B, STATE1, STATE1 /* EFGH */
+ movu128 80+0*16(%rdi), STATE1 /* DCBA (msb-to-lsb: 3,2,1,0) */
+ movu128 80+1*16(%rdi), STATE0 /* HGFE */
+/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
mova128 STATE0, XMMTMP
- palignr $8, STATE1, STATE0 /* ABEF */
- pblendw $0xF0, XMMTMP, STATE1 /* CDGH */
+ shufps SHUF(1,0,1,0), STATE1, STATE0 /* ABEF */
+ shufps SHUF(3,2,3,2), STATE1, XMMTMP /* CDGH */
+ mova128 XMMTMP, STATE1
/* XMMTMP holds flip mask from here... */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), XMMTMP
@@ -242,14 +244,15 @@ sha256_process_block64_shaNI:
paddd CDGH_SAVE, STATE1
/* Write hash values back in the correct order */
- shuf128_32 $0x1B, STATE0, STATE0 /* FEBA */
- shuf128_32 $0xB1, STATE1, STATE1 /* DCHG */
+ /* STATE0: ABEF (msb-to-lsb: 3,2,1,0) */
+ /* STATE1: CDGH */
mova128 STATE0, XMMTMP
- pblendw $0xF0, STATE1, STATE0 /* DCBA */
- palignr $8, XMMTMP, STATE1 /* HGFE */
+/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
+ shufps SHUF(3,2,3,2), STATE1, STATE0 /* DCBA */
+ shufps SHUF(1,0,1,0), STATE1, XMMTMP /* HGFE */
movu128 STATE0, 80+0*16(%rdi)
- movu128 STATE1, 80+1*16(%rdi)
+ movu128 XMMTMP, 80+1*16(%rdi)
ret
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
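The SHUF() macro introduced in this commit packs four 2-bit dword selectors into a shufps immediate, and the new load sequence relies on AT&T-syntax shufps taking its low two result dwords from the destination register and the high two from the source (as the in-line comment says). A minimal Python model of that behavior (assumed instruction semantics, for illustration only; not busybox code):

```python
def SHUF(a, b, c, d):
    # Matches "#define SHUF(a,b,c,d) $(a+(b<<2)+(c<<4)+(d<<6))"
    return a | (b << 2) | (c << 4) | (d << 6)

def shufps(imm, src, dst):
    # Result dwords 0,1 are selected from dst, dwords 2,3 from src
    # (the "2nd operand" in AT&T syntax is the destination).
    return [dst[imm & 3],
            dst[(imm >> 2) & 3],
            src[(imm >> 4) & 3],
            src[(imm >> 6) & 3]]

# State as loaded, dword 0 (least significant) first:
state1 = ['A', 'B', 'C', 'D']   # "DCBA" read msb-to-lsb
state0 = ['E', 'F', 'G', 'H']   # "HGFE"
xmmtmp = state0[:]              # mova128 STATE0, XMMTMP
abef = shufps(SHUF(1, 0, 1, 0), state1, state0)  # F,E,B,A in dword order
cdgh = shufps(SHUF(3, 2, 3, 2), state1, xmmtmp)  # H,G,D,C in dword order
```

Running this gives SHUF(1,0,1,0) = 0x11 and the two shuffles produce exactly the ABEF/CDGH packing noted in the patch comments (read msb-to-lsb).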
From vda.linux at googlemail.com Wed Feb 9 00:50:22 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Wed, 9 Feb 2022 01:50:22 +0100
Subject: [git commit] libbb/sha256: code shrink in x86 assembly
Message-ID: <20220209004444.C2B4182B8E@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=caa9c4f707b661cf398f2c2d66f54f5b0d8adfe2
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha256_process_block64_shaNI 32-bit 676 673 -3
sha256_process_block64_shaNI 64-bit 680 677 -3
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 11 +++++------
libbb/hash_md5_sha256_x86-64_shaNI.S | 11 +++++------
2 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 846230e3e..aa68193bd 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -39,13 +39,12 @@
.balign 8 # allow decoders to fetch at least 2 first insns
sha256_process_block64_shaNI:
- movu128 76+0*16(%eax), STATE1 /* DCBA (msb-to-lsb: 3,2,1,0) */
- movu128 76+1*16(%eax), STATE0 /* HGFE */
+ movu128 76+0*16(%eax), XMMTMP /* DCBA (msb-to-lsb: 3,2,1,0) */
+ movu128 76+1*16(%eax), STATE1 /* HGFE */
/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
- mova128 STATE0, XMMTMP
- shufps SHUF(1,0,1,0), STATE1, STATE0 /* ABEF */
- shufps SHUF(3,2,3,2), STATE1, XMMTMP /* CDGH */
- mova128 XMMTMP, STATE1
+ mova128 STATE1, STATE0
+ shufps SHUF(1,0,1,0), XMMTMP, STATE0 /* ABEF */
+ shufps SHUF(3,2,3,2), XMMTMP, STATE1 /* CDGH */
/* XMMTMP holds flip mask from here... */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK, XMMTMP
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index bc063b9cc..4663f750a 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -42,13 +42,12 @@
.balign 8 # allow decoders to fetch at least 2 first insns
sha256_process_block64_shaNI:
- movu128 80+0*16(%rdi), STATE1 /* DCBA (msb-to-lsb: 3,2,1,0) */
- movu128 80+1*16(%rdi), STATE0 /* HGFE */
+ movu128 80+0*16(%rdi), XMMTMP /* DCBA (msb-to-lsb: 3,2,1,0) */
+ movu128 80+1*16(%rdi), STATE1 /* HGFE */
/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
- mova128 STATE0, XMMTMP
- shufps SHUF(1,0,1,0), STATE1, STATE0 /* ABEF */
- shufps SHUF(3,2,3,2), STATE1, XMMTMP /* CDGH */
- mova128 XMMTMP, STATE1
+ mova128 STATE1, STATE0
+ shufps SHUF(1,0,1,0), XMMTMP, STATE0 /* ABEF */
+ shufps SHUF(3,2,3,2), XMMTMP, STATE1 /* CDGH */
/* XMMTMP holds flip mask from here... */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), XMMTMP
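This follow-up commit saves one mova128 per function by loading the DCBA words straight into the scratch register: since shufps only writes its destination, renaming which register plays which role makes the final copy unnecessary. A sketch comparing the two sequences (assumed instruction semantics, not busybox code):

```python
def shufps(imm, src, dst):
    # Low two result dwords come from dst, high two from src (AT&T order).
    return [dst[imm & 3], dst[(imm >> 2) & 3],
            src[(imm >> 4) & 3], src[(imm >> 6) & 3]]

DCBA = ['A', 'B', 'C', 'D']   # dword 0 first
HGFE = ['E', 'F', 'G', 'H']

# Old sequence: three register moves around the shuffles
state1, state0 = DCBA[:], HGFE[:]
xmmtmp = state0[:]                        # mova128 STATE0, XMMTMP
state0 = shufps(0x11, state1, state0)     # ABEF
xmmtmp = shufps(0xBB, state1, xmmtmp)     # CDGH
state1 = xmmtmp[:]                        # mova128 XMMTMP, STATE1

# New sequence: DCBA lands in XMMTMP directly; one mova128 fewer
xmmtmp2, state1b = DCBA[:], HGFE[:]
state0b = state1b[:]                      # mova128 STATE1, STATE0
state0b = shufps(0x11, xmmtmp2, state0b)  # ABEF
state1b = shufps(0xBB, xmmtmp2, state1b)  # CDGH

assert (state0, state1) == (state0b, state1b)
```

Both orderings leave the same ABEF/CDGH values in STATE0/STATE1, which is why the change is behavior-preserving.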
From vda.linux at googlemail.com Wed Feb 9 10:29:23 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Wed, 9 Feb 2022 11:29:23 +0100
Subject: [git commit] whitespace fix
Message-ID: <20220209102223.E965181D55@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=6a6c1c0ea91edeeb18736190feb5a7278d3d1141
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 6 +++---
libbb/hash_md5_sha256_x86-64_shaNI.S | 6 +++---
libbb/hash_md5_sha_x86-32_shaNI.S | 4 ++--
libbb/hash_md5_sha_x86-64_shaNI.S | 4 ++--
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index aa68193bd..413e2df9e 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -250,7 +250,7 @@ sha256_process_block64_shaNI:
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
.section .rodata.cst256.K256, "aM", @progbits, 256
- .balign 16
+ .balign 16
K256:
.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
@@ -270,8 +270,8 @@ K256:
.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
.section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
- .balign 16
+ .balign 16
PSHUFFLE_BSWAP32_FLIP_MASK:
- .octa 0x0c0d0e0f08090a0b0405060700010203
+ .octa 0x0c0d0e0f08090a0b0405060700010203
#endif
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index 4663f750a..c246762aa 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -257,7 +257,7 @@ sha256_process_block64_shaNI:
.size sha256_process_block64_shaNI, .-sha256_process_block64_shaNI
.section .rodata.cst256.K256, "aM", @progbits, 256
- .balign 16
+ .balign 16
K256:
.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
@@ -277,8 +277,8 @@ K256:
.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
.section .rodata.cst16.PSHUFFLE_BSWAP32_FLIP_MASK, "aM", @progbits, 16
- .balign 16
+ .balign 16
PSHUFFLE_BSWAP32_FLIP_MASK:
- .octa 0x0c0d0e0f08090a0b0405060700010203
+ .octa 0x0c0d0e0f08090a0b0405060700010203
#endif
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index a61b3cbed..afca98a62 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -219,8 +219,8 @@ sha1_process_block64_shaNI:
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
.section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
- .balign 16
+ .balign 16
PSHUFFLE_BYTE_FLIP_MASK:
- .octa 0x000102030405060708090a0b0c0d0e0f
+ .octa 0x000102030405060708090a0b0c0d0e0f
#endif
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index b32029360..54d122788 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -217,8 +217,8 @@ sha1_process_block64_shaNI:
.size sha1_process_block64_shaNI, .-sha1_process_block64_shaNI
.section .rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
- .balign 16
+ .balign 16
PSHUFFLE_BYTE_FLIP_MASK:
- .octa 0x000102030405060708090a0b0c0d0e0f
+ .octa 0x000102030405060708090a0b0c0d0e0f
#endif
From vda.linux at googlemail.com Thu Feb 10 14:38:10 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Thu, 10 Feb 2022 15:38:10 +0100
Subject: [git commit] libbb/sha: improve comments
Message-ID: <20220210143100.BAFC48142B@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=6f56fa17131b3cbb84e887c6c5fb202f2492169e
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 18 +++++++++---------
libbb/hash_md5_sha256_x86-64_shaNI.S | 19 +++++++++----------
libbb/hash_md5_sha_x86-32_shaNI.S | 2 +-
libbb/hash_md5_sha_x86-64_shaNI.S | 2 +-
4 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 413e2df9e..4b33449d4 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -4,7 +4,7 @@
// We use shorter insns, even though they are for "wrong"
// data type (fp, not int).
// For Intel, there is no penalty for doing it at all
-// (CPUs which do have such penalty do not support SHA1 insns).
+// (CPUs which do have such penalty do not support SHA insns).
// For AMD, the penalty is one extra cycle
// (allegedly: I failed to find measurable difference).
@@ -39,12 +39,13 @@
.balign 8 # allow decoders to fetch at least 2 first insns
sha256_process_block64_shaNI:
- movu128 76+0*16(%eax), XMMTMP /* DCBA (msb-to-lsb: 3,2,1,0) */
- movu128 76+1*16(%eax), STATE1 /* HGFE */
+ movu128 76+0*16(%eax), XMMTMP /* ABCD (little-endian dword order) */
+ movu128 76+1*16(%eax), STATE1 /* EFGH */
/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
mova128 STATE1, STATE0
- shufps SHUF(1,0,1,0), XMMTMP, STATE0 /* ABEF */
- shufps SHUF(3,2,3,2), XMMTMP, STATE1 /* CDGH */
+ /* --- -------------- ABCD -- EFGH */
+ shufps SHUF(1,0,1,0), XMMTMP, STATE0 /* FEBA */
+ shufps SHUF(3,2,3,2), XMMTMP, STATE1 /* HGDC */
/* XMMTMP holds flip mask from here... */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK, XMMTMP
@@ -232,12 +233,11 @@ sha256_process_block64_shaNI:
sha256rnds2 STATE1, STATE0
/* Write hash values back in the correct order */
- /* STATE0: ABEF (msb-to-lsb: 3,2,1,0) */
- /* STATE1: CDGH */
mova128 STATE0, XMMTMP
/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
- shufps SHUF(3,2,3,2), STATE1, STATE0 /* DCBA */
- shufps SHUF(1,0,1,0), STATE1, XMMTMP /* HGFE */
+ /* --- -------------- HGDC -- FEBA */
+ shufps SHUF(3,2,3,2), STATE1, STATE0 /* ABCD */
+ shufps SHUF(1,0,1,0), STATE1, XMMTMP /* EFGH */
/* add current hash values to previous ones */
movu128 76+1*16(%eax), STATE1
paddd XMMTMP, STATE1
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index c246762aa..5ed80c2ef 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -4,7 +4,7 @@
// We use shorter insns, even though they are for "wrong"
// data type (fp, not int).
// For Intel, there is no penalty for doing it at all
-// (CPUs which do have such penalty do not support SHA1 insns).
+// (CPUs which do have such penalty do not support SHA insns).
// For AMD, the penalty is one extra cycle
// (allegedly: I failed to find measurable difference).
@@ -42,12 +42,13 @@
.balign 8 # allow decoders to fetch at least 2 first insns
sha256_process_block64_shaNI:
- movu128 80+0*16(%rdi), XMMTMP /* DCBA (msb-to-lsb: 3,2,1,0) */
- movu128 80+1*16(%rdi), STATE1 /* HGFE */
+ movu128 80+0*16(%rdi), XMMTMP /* ABCD (little-endian dword order) */
+ movu128 80+1*16(%rdi), STATE1 /* EFGH */
/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
mova128 STATE1, STATE0
- shufps SHUF(1,0,1,0), XMMTMP, STATE0 /* ABEF */
- shufps SHUF(3,2,3,2), XMMTMP, STATE1 /* CDGH */
+ /* --- -------------- ABCD -- EFGH */
+ shufps SHUF(1,0,1,0), XMMTMP, STATE0 /* FEBA */
+ shufps SHUF(3,2,3,2), XMMTMP, STATE1 /* HGDC */
/* XMMTMP holds flip mask from here... */
mova128 PSHUFFLE_BSWAP32_FLIP_MASK(%rip), XMMTMP
@@ -243,13 +244,11 @@ sha256_process_block64_shaNI:
paddd CDGH_SAVE, STATE1
/* Write hash values back in the correct order */
- /* STATE0: ABEF (msb-to-lsb: 3,2,1,0) */
- /* STATE1: CDGH */
mova128 STATE0, XMMTMP
/* shufps takes dwords 0,1 from *2nd* operand, and dwords 2,3 from 1st one */
- shufps SHUF(3,2,3,2), STATE1, STATE0 /* DCBA */
- shufps SHUF(1,0,1,0), STATE1, XMMTMP /* HGFE */
-
+ /* --- -------------- HGDC -- FEBA */
+ shufps SHUF(3,2,3,2), STATE1, STATE0 /* ABCD */
+ shufps SHUF(1,0,1,0), STATE1, XMMTMP /* EFGH */
movu128 STATE0, 80+0*16(%rdi)
movu128 XMMTMP, 80+1*16(%rdi)
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index afca98a62..c7fb243ce 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -4,7 +4,7 @@
// We use shorter insns, even though they are for "wrong"
// data type (fp, not int).
// For Intel, there is no penalty for doing it at all
-// (CPUs which do have such penalty do not support SHA1 insns).
+// (CPUs which do have such penalty do not support SHA insns).
// For AMD, the penalty is one extra cycle
// (allegedly: I failed to find measurable difference).
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index 54d122788..c13cdec07 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -4,7 +4,7 @@
// We use shorter insns, even though they are for "wrong"
// data type (fp, not int).
// For Intel, there is no penalty for doing it at all
-// (CPUs which do have such penalty do not support SHA1 insns).
+// (CPUs which do have such penalty do not support SHA insns).
// For AMD, the penalty is one extra cycle
// (allegedly: I failed to find measurable difference).
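The comment changes above are purely a change of naming convention: a 128-bit load of four consecutive little-endian u32 hash words puts A in dword 0, so the same register contents can be labeled "ABCD (little-endian dword order)" or "DCBA (msb-to-lsb)". A short illustration, assuming the generic SHA-256 state layout (values 0..7 stand in for the real hash words A..H):

```python
import struct

# Hash words A..H stored in memory as consecutive little-endian u32s.
mem = struct.pack("<8I", *range(8))
# movu128 of hash[0..3]: dword 0 of the register holds A (0), dword 3
# holds D (3) -- "ABCD" reading dwords upward, "DCBA" reading msb-to-lsb.
xmm = list(struct.unpack("<4I", mem[:16]))
```

Nothing in the generated code changes; only the comments now name dwords in ascending order.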
From vda.linux at googlemail.com Fri Feb 11 05:08:27 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Fri, 11 Feb 2022 06:08:27 +0100
Subject: [git commit] libbb/sha1: shrink unrolled x86-64 code
Message-ID: <20220211050806.E034782212@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=8154146be491bc66ab34d5d5f2a2466ddbdcff52
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
function old new delta
sha1_process_block64 3481 3384 -97
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha_x86-64.S | 129 ++++++++++++++++++++---------------------
libbb/hash_md5_sha_x86-64.S.sh | 111 +++++++++++++++++------------------
2 files changed, 117 insertions(+), 123 deletions(-)
diff --git a/libbb/hash_md5_sha_x86-64.S b/libbb/hash_md5_sha_x86-64.S
index 287cfe547..51fde082a 100644
--- a/libbb/hash_md5_sha_x86-64.S
+++ b/libbb/hash_md5_sha_x86-64.S
@@ -27,68 +27,60 @@ sha1_process_block64:
# xmm7: all round constants
# -64(%rsp): area for passing RCONST + W[] from vector to integer units
- movl 80(%rdi), %eax # a = ctx->hash[0]
- movl 84(%rdi), %ebx # b = ctx->hash[1]
- movl 88(%rdi), %ecx # c = ctx->hash[2]
- movl 92(%rdi), %edx # d = ctx->hash[3]
- movl 96(%rdi), %ebp # e = ctx->hash[4]
-
movaps sha1const(%rip), %xmm7
+ movaps bswap32_mask(%rip), %xmm4
pshufd $0x00, %xmm7, %xmm6
- # Load W[] to xmm registers, byteswapping on the fly.
+ # Load W[] to xmm0..3, byteswapping on the fly.
#
- # For iterations 0..15, we pass W[] in rsi,r8..r14
+ # For iterations 0..15, we pass RCONST+W[] in rsi,r8..r14
# for use in RD1As instead of spilling them to stack.
- # We lose parallelized addition of RCONST, but LEA
- # can do two additions at once, so it is probably a wash.
# (We use rsi instead of rN because this makes two
- # LEAs in two first RD1As shorter by one byte).
- movq 4*0(%rdi), %rsi
- movq 4*2(%rdi), %r8
- bswapq %rsi
- bswapq %r8
- rolq $32, %rsi # rsi = W[1]:W[0]
- rolq $32, %r8 # r8 = W[3]:W[2]
- movq %rsi, %xmm0
- movq %r8, %xmm4
- punpcklqdq %xmm4, %xmm0 # xmm0 = r8:rsi = (W[0],W[1],W[2],W[3])
-# movaps %xmm0, %xmm4 # add RCONST, spill to stack
-# paddd %xmm6, %xmm4
-# movups %xmm4, -64+16*0(%rsp)
+ # ADDs in two first RD1As shorter by one byte).
+ movups 16*0(%rdi), %xmm0
+ pshufb %xmm4, %xmm0
+ movaps %xmm0, %xmm5
+ paddd %xmm6, %xmm5
+ movq %xmm5, %rsi
+# pextrq $1, %xmm5, %r8 #SSE4.1 insn
+# movhpd %xmm5, %r8 #can only move to mem, not to reg
+ shufps $0x0e, %xmm5, %xmm5
+ movq %xmm5, %r8
+
+ movups 16*1(%rdi), %xmm1
+ pshufb %xmm4, %xmm1
+ movaps %xmm1, %xmm5
+ paddd %xmm6, %xmm5
+ movq %xmm5, %r9
+ shufps $0x0e, %xmm5, %xmm5
+ movq %xmm5, %r10
- movq 4*4(%rdi), %r9
- movq 4*6(%rdi), %r10
- bswapq %r9
- bswapq %r10
- rolq $32, %r9 # r9 = W[5]:W[4]
- rolq $32, %r10 # r10 = W[7]:W[6]
- movq %r9, %xmm1
- movq %r10, %xmm4
- punpcklqdq %xmm4, %xmm1 # xmm1 = r10:r9 = (W[4],W[5],W[6],W[7])
+ movups 16*2(%rdi), %xmm2
+ pshufb %xmm4, %xmm2
+ movaps %xmm2, %xmm5
+ paddd %xmm6, %xmm5
+ movq %xmm5, %r11
+ shufps $0x0e, %xmm5, %xmm5
+ movq %xmm5, %r12
- movq 4*8(%rdi), %r11
- movq 4*10(%rdi), %r12
- bswapq %r11
- bswapq %r12
- rolq $32, %r11 # r11 = W[9]:W[8]
- rolq $32, %r12 # r12 = W[11]:W[10]
- movq %r11, %xmm2
- movq %r12, %xmm4
- punpcklqdq %xmm4, %xmm2 # xmm2 = r12:r11 = (W[8],W[9],W[10],W[11])
+ movups 16*3(%rdi), %xmm3
+ pshufb %xmm4, %xmm3
+ movaps %xmm3, %xmm5
+ paddd %xmm6, %xmm5
+ movq %xmm5, %r13
+ shufps $0x0e, %xmm5, %xmm5
+ movq %xmm5, %r14
- movq 4*12(%rdi), %r13
- movq 4*14(%rdi), %r14
- bswapq %r13
- bswapq %r14
- rolq $32, %r13 # r13 = W[13]:W[12]
- rolq $32, %r14 # r14 = W[15]:W[14]
- movq %r13, %xmm3
- movq %r14, %xmm4
- punpcklqdq %xmm4, %xmm3 # xmm3 = r14:r13 = (W[12],W[13],W[14],W[15])
+ # MOVQs to GPRs (above) have somewhat high latency.
+ # Load hash[] while they are completing:
+ movl 80(%rdi), %eax # a = ctx->hash[0]
+ movl 84(%rdi), %ebx # b = ctx->hash[1]
+ movl 88(%rdi), %ecx # c = ctx->hash[2]
+ movl 92(%rdi), %edx # d = ctx->hash[3]
+ movl 96(%rdi), %ebp # e = ctx->hash[4]
# 0
- leal 0x5A827999(%rbp,%rsi), %ebp # e += RCONST + W[n]
+ addl %esi, %ebp # e += RCONST + W[n]
shrq $32, %rsi
movl %ecx, %edi # c
xorl %edx, %edi # ^d
@@ -100,7 +92,7 @@ sha1_process_block64:
addl %edi, %ebp # e += rotl32(a,5)
rorl $2, %ebx # b = rotl32(b,30)
# 1
- leal 0x5A827999(%rdx,%rsi), %edx # e += RCONST + W[n]
+ addl %esi, %edx # e += RCONST + W[n]
movl %ebx, %edi # c
xorl %ecx, %edi # ^d
andl %eax, %edi # &b
@@ -111,7 +103,7 @@ sha1_process_block64:
addl %edi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
# 2
- leal 0x5A827999(%rcx,%r8), %ecx # e += RCONST + W[n]
+ addl %r8d, %ecx # e += RCONST + W[n]
shrq $32, %r8
movl %eax, %edi # c
xorl %ebx, %edi # ^d
@@ -123,7 +115,7 @@ sha1_process_block64:
addl %edi, %ecx # e += rotl32(a,5)
rorl $2, %ebp # b = rotl32(b,30)
# 3
- leal 0x5A827999(%rbx,%r8), %ebx # e += RCONST + W[n]
+ addl %r8d, %ebx # e += RCONST + W[n]
movl %ebp, %edi # c
xorl %eax, %edi # ^d
andl %edx, %edi # &b
@@ -134,7 +126,7 @@ sha1_process_block64:
addl %edi, %ebx # e += rotl32(a,5)
rorl $2, %edx # b = rotl32(b,30)
# 4
- leal 0x5A827999(%rax,%r9), %eax # e += RCONST + W[n]
+ addl %r9d, %eax # e += RCONST + W[n]
shrq $32, %r9
movl %edx, %edi # c
xorl %ebp, %edi # ^d
@@ -146,7 +138,7 @@ sha1_process_block64:
addl %edi, %eax # e += rotl32(a,5)
rorl $2, %ecx # b = rotl32(b,30)
# 5
- leal 0x5A827999(%rbp,%r9), %ebp # e += RCONST + W[n]
+ addl %r9d, %ebp # e += RCONST + W[n]
movl %ecx, %edi # c
xorl %edx, %edi # ^d
andl %ebx, %edi # &b
@@ -157,7 +149,7 @@ sha1_process_block64:
addl %edi, %ebp # e += rotl32(a,5)
rorl $2, %ebx # b = rotl32(b,30)
# 6
- leal 0x5A827999(%rdx,%r10), %edx # e += RCONST + W[n]
+ addl %r10d, %edx # e += RCONST + W[n]
shrq $32, %r10
movl %ebx, %edi # c
xorl %ecx, %edi # ^d
@@ -169,7 +161,7 @@ sha1_process_block64:
addl %edi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
# 7
- leal 0x5A827999(%rcx,%r10), %ecx # e += RCONST + W[n]
+ addl %r10d, %ecx # e += RCONST + W[n]
movl %eax, %edi # c
xorl %ebx, %edi # ^d
andl %ebp, %edi # &b
@@ -210,7 +202,7 @@ sha1_process_block64:
paddd %xmm6, %xmm5
movups %xmm5, -64+16*0(%rsp)
# 8
- leal 0x5A827999(%rbx,%r11), %ebx # e += RCONST + W[n]
+ addl %r11d, %ebx # e += RCONST + W[n]
shrq $32, %r11
movl %ebp, %edi # c
xorl %eax, %edi # ^d
@@ -222,7 +214,7 @@ sha1_process_block64:
addl %edi, %ebx # e += rotl32(a,5)
rorl $2, %edx # b = rotl32(b,30)
# 9
- leal 0x5A827999(%rax,%r11), %eax # e += RCONST + W[n]
+ addl %r11d, %eax # e += RCONST + W[n]
movl %edx, %edi # c
xorl %ebp, %edi # ^d
andl %ecx, %edi # &b
@@ -233,7 +225,7 @@ sha1_process_block64:
addl %edi, %eax # e += rotl32(a,5)
rorl $2, %ecx # b = rotl32(b,30)
# 10
- leal 0x5A827999(%rbp,%r12), %ebp # e += RCONST + W[n]
+ addl %r12d, %ebp # e += RCONST + W[n]
shrq $32, %r12
movl %ecx, %edi # c
xorl %edx, %edi # ^d
@@ -245,7 +237,7 @@ sha1_process_block64:
addl %edi, %ebp # e += rotl32(a,5)
rorl $2, %ebx # b = rotl32(b,30)
# 11
- leal 0x5A827999(%rdx,%r12), %edx # e += RCONST + W[n]
+ addl %r12d, %edx # e += RCONST + W[n]
movl %ebx, %edi # c
xorl %ecx, %edi # ^d
andl %eax, %edi # &b
@@ -287,7 +279,7 @@ sha1_process_block64:
paddd %xmm6, %xmm5
movups %xmm5, -64+16*1(%rsp)
# 12
- leal 0x5A827999(%rcx,%r13), %ecx # e += RCONST + W[n]
+ addl %r13d, %ecx # e += RCONST + W[n]
shrq $32, %r13
movl %eax, %edi # c
xorl %ebx, %edi # ^d
@@ -299,7 +291,7 @@ sha1_process_block64:
addl %edi, %ecx # e += rotl32(a,5)
rorl $2, %ebp # b = rotl32(b,30)
# 13
- leal 0x5A827999(%rbx,%r13), %ebx # e += RCONST + W[n]
+ addl %r13d, %ebx # e += RCONST + W[n]
movl %ebp, %edi # c
xorl %eax, %edi # ^d
andl %edx, %edi # &b
@@ -310,7 +302,7 @@ sha1_process_block64:
addl %edi, %ebx # e += rotl32(a,5)
rorl $2, %edx # b = rotl32(b,30)
# 14
- leal 0x5A827999(%rax,%r14), %eax # e += RCONST + W[n]
+ addl %r14d, %eax # e += RCONST + W[n]
shrq $32, %r14
movl %edx, %edi # c
xorl %ebp, %edi # ^d
@@ -322,7 +314,7 @@ sha1_process_block64:
addl %edi, %eax # e += rotl32(a,5)
rorl $2, %ecx # b = rotl32(b,30)
# 15
- leal 0x5A827999(%rbp,%r14), %ebp # e += RCONST + W[n]
+ addl %r14d, %ebp # e += RCONST + W[n]
movl %ecx, %edi # c
xorl %edx, %edi # ^d
andl %ebx, %edi # &b
@@ -1475,6 +1467,11 @@ sha1_process_block64:
ret
.size sha1_process_block64, .-sha1_process_block64
+ .section .rodata.cst16.bswap32_mask, "aM", @progbits, 16
+ .balign 16
+bswap32_mask:
+ .octa 0x0c0d0e0f08090a0b0405060700010203
+
.section .rodata.cst16.sha1const, "aM", @progbits, 16
.balign 16
sha1const:
diff --git a/libbb/hash_md5_sha_x86-64.S.sh b/libbb/hash_md5_sha_x86-64.S.sh
index a10ac411d..f34e6e6fa 100755
--- a/libbb/hash_md5_sha_x86-64.S.sh
+++ b/libbb/hash_md5_sha_x86-64.S.sh
@@ -129,65 +129,57 @@ sha1_process_block64:
# xmm7: all round constants
# -64(%rsp): area for passing RCONST + W[] from vector to integer units
- movl 80(%rdi), %eax # a = ctx->hash[0]
- movl 84(%rdi), %ebx # b = ctx->hash[1]
- movl 88(%rdi), %ecx # c = ctx->hash[2]
- movl 92(%rdi), %edx # d = ctx->hash[3]
- movl 96(%rdi), %ebp # e = ctx->hash[4]
-
movaps sha1const(%rip), $xmmALLRCONST
+ movaps bswap32_mask(%rip), $xmmT1
pshufd \$0x00, $xmmALLRCONST, $xmmRCONST
- # Load W[] to xmm registers, byteswapping on the fly.
+ # Load W[] to xmm0..3, byteswapping on the fly.
#
- # For iterations 0..15, we pass W[] in rsi,r8..r14
+ # For iterations 0..15, we pass RCONST+W[] in rsi,r8..r14
# for use in RD1As instead of spilling them to stack.
- # We lose parallelized addition of RCONST, but LEA
- # can do two additions at once, so it is probably a wash.
# (We use rsi instead of rN because this makes two
- # LEAs in two first RD1As shorter by one byte).
- movq 4*0(%rdi), %rsi
- movq 4*2(%rdi), %r8
- bswapq %rsi
- bswapq %r8
- rolq \$32, %rsi # rsi = W[1]:W[0]
- rolq \$32, %r8 # r8 = W[3]:W[2]
- movq %rsi, %xmm0
- movq %r8, $xmmT1
- punpcklqdq $xmmT1, %xmm0 # xmm0 = r8:rsi = (W[0],W[1],W[2],W[3])
-# movaps %xmm0, $xmmT1 # add RCONST, spill to stack
-# paddd $xmmRCONST, $xmmT1
-# movups $xmmT1, -64+16*0(%rsp)
-
- movq 4*4(%rdi), %r9
- movq 4*6(%rdi), %r10
- bswapq %r9
- bswapq %r10
- rolq \$32, %r9 # r9 = W[5]:W[4]
- rolq \$32, %r10 # r10 = W[7]:W[6]
- movq %r9, %xmm1
- movq %r10, $xmmT1
- punpcklqdq $xmmT1, %xmm1 # xmm1 = r10:r9 = (W[4],W[5],W[6],W[7])
-
- movq 4*8(%rdi), %r11
- movq 4*10(%rdi), %r12
- bswapq %r11
- bswapq %r12
- rolq \$32, %r11 # r11 = W[9]:W[8]
- rolq \$32, %r12 # r12 = W[11]:W[10]
- movq %r11, %xmm2
- movq %r12, $xmmT1
- punpcklqdq $xmmT1, %xmm2 # xmm2 = r12:r11 = (W[8],W[9],W[10],W[11])
-
- movq 4*12(%rdi), %r13
- movq 4*14(%rdi), %r14
- bswapq %r13
- bswapq %r14
- rolq \$32, %r13 # r13 = W[13]:W[12]
- rolq \$32, %r14 # r14 = W[15]:W[14]
- movq %r13, %xmm3
- movq %r14, $xmmT1
- punpcklqdq $xmmT1, %xmm3 # xmm3 = r14:r13 = (W[12],W[13],W[14],W[15])
+ # ADDs in two first RD1As shorter by one byte).
+ movups 16*0(%rdi), %xmm0
+ pshufb $xmmT1, %xmm0
+ movaps %xmm0, $xmmT2
+ paddd $xmmRCONST, $xmmT2
+ movq $xmmT2, %rsi
+# pextrq \$1, $xmmT2, %r8 #SSE4.1 insn
+# movhpd $xmmT2, %r8 #can only move to mem, not to reg
+ shufps \$0x0e, $xmmT2, $xmmT2
+ movq $xmmT2, %r8
+
+ movups 16*1(%rdi), %xmm1
+ pshufb $xmmT1, %xmm1
+ movaps %xmm1, $xmmT2
+ paddd $xmmRCONST, $xmmT2
+ movq $xmmT2, %r9
+ shufps \$0x0e, $xmmT2, $xmmT2
+ movq $xmmT2, %r10
+
+ movups 16*2(%rdi), %xmm2
+ pshufb $xmmT1, %xmm2
+ movaps %xmm2, $xmmT2
+ paddd $xmmRCONST, $xmmT2
+ movq $xmmT2, %r11
+ shufps \$0x0e, $xmmT2, $xmmT2
+ movq $xmmT2, %r12
+
+ movups 16*3(%rdi), %xmm3
+ pshufb $xmmT1, %xmm3
+ movaps %xmm3, $xmmT2
+ paddd $xmmRCONST, $xmmT2
+ movq $xmmT2, %r13
+ shufps \$0x0e, $xmmT2, $xmmT2
+ movq $xmmT2, %r14
+
+ # MOVQs to GPRs (above) have somewhat high latency.
+ # Load hash[] while they are completing:
+ movl 80(%rdi), %eax # a = ctx->hash[0]
+ movl 84(%rdi), %ebx # b = ctx->hash[1]
+ movl 88(%rdi), %ecx # c = ctx->hash[2]
+ movl 92(%rdi), %edx # d = ctx->hash[3]
+ movl 96(%rdi), %ebp # e = ctx->hash[4]
"
PREP() {
@@ -266,15 +258,15 @@ local rN=$((7+n0/2))
echo "
# $n
";test $n0 = 0 && echo "
- leal $RCONST(%r$e,%rsi), %e$e # e += RCONST + W[n]
+ addl %esi, %e$e # e += RCONST + W[n]
shrq \$32, %rsi
";test $n0 = 1 && echo "
- leal $RCONST(%r$e,%rsi), %e$e # e += RCONST + W[n]
+ addl %esi, %e$e # e += RCONST + W[n]
";test $n0 -ge 2 && test $((n0 & 1)) = 0 && echo "
- leal $RCONST(%r$e,%r$rN), %e$e # e += RCONST + W[n]
+ addl %r${rN}d, %e$e # e += RCONST + W[n]
shrq \$32, %r$rN
";test $n0 -ge 2 && test $((n0 & 1)) = 1 && echo "
- leal $RCONST(%r$e,%r$rN), %e$e # e += RCONST + W[n]
+ addl %r${rN}d, %e$e # e += RCONST + W[n]
";echo "
movl %e$c, %edi # c
xorl %e$d, %edi # ^d
@@ -440,6 +432,11 @@ echo "
ret
.size sha1_process_block64, .-sha1_process_block64
+ .section .rodata.cst16.bswap32_mask, \"aM\", @progbits, 16
+ .balign 16
+bswap32_mask:
+ .octa 0x0c0d0e0f08090a0b0405060700010203
+
.section .rodata.cst16.sha1const, \"aM\", @progbits, 16
.balign 16
sha1const:
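The commit above replaces four bswapq/rolq/punpcklqdq groups with a single pshufb against bswap32_mask. The .octa constant, stored little-endian, yields the in-memory index sequence 3,2,1,0, 7,6,5,4, ..., which byte-swaps each 32-bit lane. A Python model of the trick (assumed pshufb semantics; note the next commit in this archive reverts it because pshufb is an SSSE3 instruction):

```python
import struct

MASK_OCTA = 0x0c0d0e0f08090a0b0405060700010203
mask = MASK_OCTA.to_bytes(16, "little")      # in-memory byte order

def pshufb(mask, src):
    # Each destination byte i is source byte mask[i] & 0x0F
    # (the high "zeroing" bit of each mask byte is unused here).
    return bytes(src[m & 0x0F] for m in mask)

block = bytes(range(16))                     # 16 message bytes
swapped = pshufb(mask, block)
# Reference: read four big-endian dwords, store them little-endian,
# i.e. byte-swap every 32-bit lane.
ref = struct.pack("<4I", *struct.unpack(">4I", block))
assert swapped == ref
```

One pshufb per 16-byte chunk thus does the per-dword byteswap that previously cost a bswapq/rolq pair per 8 bytes.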
From vda.linux at googlemail.com Fri Feb 11 13:53:26 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Fri, 11 Feb 2022 14:53:26 +0100
Subject: [git commit] libbb/sha1: revert last commit: pshufb is a SSSE3 insn,
can't use it
Message-ID: <20220211134649.1F2D782E01@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=dda77e83762861b52d62f0f161e2b4bf8092eacf
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 4 ++
libbb/hash_md5_sha256_x86-64_shaNI.S | 4 ++
libbb/hash_md5_sha_x86-32_shaNI.S | 5 ++
libbb/hash_md5_sha_x86-64.S | 127 +++++++++++++++++----------------
libbb/hash_md5_sha_x86-64.S.sh | 133 +++++++++++++++++++++--------------
libbb/hash_md5_sha_x86-64_shaNI.S | 5 ++
6 files changed, 163 insertions(+), 115 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index 4b33449d4..c059fb18d 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -15,6 +15,10 @@
//#define shuf128_32 pshufd
#define shuf128_32 shufps
+// pshufb and palignr are SSSE3 insns.
+// We do not check SSSE3 in cpuid,
+// all SHA-capable CPUs support it as well.
+
.section .text.sha256_process_block64_shaNI, "ax", @progbits
.globl sha256_process_block64_shaNI
.hidden sha256_process_block64_shaNI
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index 5ed80c2ef..9578441f8 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -15,6 +15,10 @@
//#define shuf128_32 pshufd
#define shuf128_32 shufps
+// pshufb and palignr are SSSE3 insns.
+// We do not check SSSE3 in cpuid,
+// all SHA-capable CPUs support it as well.
+
.section .text.sha256_process_block64_shaNI, "ax", @progbits
.globl sha256_process_block64_shaNI
.hidden sha256_process_block64_shaNI
diff --git a/libbb/hash_md5_sha_x86-32_shaNI.S b/libbb/hash_md5_sha_x86-32_shaNI.S
index c7fb243ce..2366b046a 100644
--- a/libbb/hash_md5_sha_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha_x86-32_shaNI.S
@@ -20,6 +20,11 @@
#define extr128_32 pextrd
//#define extr128_32 extractps # not shorter
+// pshufb is a SSSE3 insn.
+// pinsrd, pextrd, extractps are SSE4.1 insns.
+// We do not check SSSE3/SSE4.1 in cpuid,
+// all SHA-capable CPUs support them as well.
+
.section .text.sha1_process_block64_shaNI, "ax", @progbits
.globl sha1_process_block64_shaNI
.hidden sha1_process_block64_shaNI
diff --git a/libbb/hash_md5_sha_x86-64.S b/libbb/hash_md5_sha_x86-64.S
index 51fde082a..f0daa30f6 100644
--- a/libbb/hash_md5_sha_x86-64.S
+++ b/libbb/hash_md5_sha_x86-64.S
@@ -27,60 +27,68 @@ sha1_process_block64:
# xmm7: all round constants
# -64(%rsp): area for passing RCONST + W[] from vector to integer units
+ movl 80(%rdi), %eax # a = ctx->hash[0]
+ movl 84(%rdi), %ebx # b = ctx->hash[1]
+ movl 88(%rdi), %ecx # c = ctx->hash[2]
+ movl 92(%rdi), %edx # d = ctx->hash[3]
+ movl 96(%rdi), %ebp # e = ctx->hash[4]
+
movaps sha1const(%rip), %xmm7
- movaps bswap32_mask(%rip), %xmm4
pshufd $0x00, %xmm7, %xmm6
# Load W[] to xmm0..3, byteswapping on the fly.
#
- # For iterations 0..15, we pass RCONST+W[] in rsi,r8..r14
+ # For iterations 0..15, we pass W[] in rsi,r8..r14
# for use in RD1As instead of spilling them to stack.
+ # We lose parallelized addition of RCONST, but LEA
+ # can do two additions at once, so it is probably a wash.
# (We use rsi instead of rN because this makes two
- # ADDs in two first RD1As shorter by one byte).
- movups 16*0(%rdi), %xmm0
- pshufb %xmm4, %xmm0
- movaps %xmm0, %xmm5
- paddd %xmm6, %xmm5
- movq %xmm5, %rsi
-# pextrq $1, %xmm5, %r8 #SSE4.1 insn
-# movhpd %xmm5, %r8 #can only move to mem, not to reg
- shufps $0x0e, %xmm5, %xmm5
- movq %xmm5, %r8
-
- movups 16*1(%rdi), %xmm1
- pshufb %xmm4, %xmm1
- movaps %xmm1, %xmm5
- paddd %xmm6, %xmm5
- movq %xmm5, %r9
- shufps $0x0e, %xmm5, %xmm5
- movq %xmm5, %r10
+ # LEAs in two first RD1As shorter by one byte).
+ movq 4*0(%rdi), %rsi
+ movq 4*2(%rdi), %r8
+ bswapq %rsi
+ bswapq %r8
+ rolq $32, %rsi # rsi = W[1]:W[0]
+ rolq $32, %r8 # r8 = W[3]:W[2]
+ movq %rsi, %xmm0
+ movq %r8, %xmm4
+ punpcklqdq %xmm4, %xmm0 # xmm0 = r8:rsi = (W[0],W[1],W[2],W[3])
+# movaps %xmm0, %xmm4 # add RCONST, spill to stack
+# paddd %xmm6, %xmm4
+# movups %xmm4, -64+16*0(%rsp)
- movups 16*2(%rdi), %xmm2
- pshufb %xmm4, %xmm2
- movaps %xmm2, %xmm5
- paddd %xmm6, %xmm5
- movq %xmm5, %r11
- shufps $0x0e, %xmm5, %xmm5
- movq %xmm5, %r12
+ movq 4*4(%rdi), %r9
+ movq 4*6(%rdi), %r10
+ bswapq %r9
+ bswapq %r10
+ rolq $32, %r9 # r9 = W[5]:W[4]
+ rolq $32, %r10 # r10 = W[7]:W[6]
+ movq %r9, %xmm1
+ movq %r10, %xmm4
+ punpcklqdq %xmm4, %xmm1 # xmm1 = r10:r9 = (W[4],W[5],W[6],W[7])
- movups 16*3(%rdi), %xmm3
- pshufb %xmm4, %xmm3
- movaps %xmm3, %xmm5
- paddd %xmm6, %xmm5
- movq %xmm5, %r13
- shufps $0x0e, %xmm5, %xmm5
- movq %xmm5, %r14
+ movq 4*8(%rdi), %r11
+ movq 4*10(%rdi), %r12
+ bswapq %r11
+ bswapq %r12
+ rolq $32, %r11 # r11 = W[9]:W[8]
+ rolq $32, %r12 # r12 = W[11]:W[10]
+ movq %r11, %xmm2
+ movq %r12, %xmm4
+ punpcklqdq %xmm4, %xmm2 # xmm2 = r12:r11 = (W[8],W[9],W[10],W[11])
- # MOVQs to GPRs (above) have somewhat high latency.
- # Load hash[] while they are completing:
- movl 80(%rdi), %eax # a = ctx->hash[0]
- movl 84(%rdi), %ebx # b = ctx->hash[1]
- movl 88(%rdi), %ecx # c = ctx->hash[2]
- movl 92(%rdi), %edx # d = ctx->hash[3]
- movl 96(%rdi), %ebp # e = ctx->hash[4]
+ movq 4*12(%rdi), %r13
+ movq 4*14(%rdi), %r14
+ bswapq %r13
+ bswapq %r14
+ rolq $32, %r13 # r13 = W[13]:W[12]
+ rolq $32, %r14 # r14 = W[15]:W[14]
+ movq %r13, %xmm3
+ movq %r14, %xmm4
+ punpcklqdq %xmm4, %xmm3 # xmm3 = r14:r13 = (W[12],W[13],W[14],W[15])
# 0
- addl %esi, %ebp # e += RCONST + W[n]
+ leal 0x5A827999(%rbp,%rsi), %ebp # e += RCONST + W[n]
shrq $32, %rsi
movl %ecx, %edi # c
xorl %edx, %edi # ^d
@@ -92,7 +100,7 @@ sha1_process_block64:
addl %edi, %ebp # e += rotl32(a,5)
rorl $2, %ebx # b = rotl32(b,30)
# 1
- addl %esi, %edx # e += RCONST + W[n]
+ leal 0x5A827999(%rdx,%rsi), %edx # e += RCONST + W[n]
movl %ebx, %edi # c
xorl %ecx, %edi # ^d
andl %eax, %edi # &b
@@ -103,7 +111,7 @@ sha1_process_block64:
addl %edi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
# 2
- addl %r8d, %ecx # e += RCONST + W[n]
+ leal 0x5A827999(%rcx,%r8), %ecx # e += RCONST + W[n]
shrq $32, %r8
movl %eax, %edi # c
xorl %ebx, %edi # ^d
@@ -115,7 +123,7 @@ sha1_process_block64:
addl %edi, %ecx # e += rotl32(a,5)
rorl $2, %ebp # b = rotl32(b,30)
# 3
- addl %r8d, %ebx # e += RCONST + W[n]
+ leal 0x5A827999(%rbx,%r8), %ebx # e += RCONST + W[n]
movl %ebp, %edi # c
xorl %eax, %edi # ^d
andl %edx, %edi # &b
@@ -126,7 +134,7 @@ sha1_process_block64:
addl %edi, %ebx # e += rotl32(a,5)
rorl $2, %edx # b = rotl32(b,30)
# 4
- addl %r9d, %eax # e += RCONST + W[n]
+ leal 0x5A827999(%rax,%r9), %eax # e += RCONST + W[n]
shrq $32, %r9
movl %edx, %edi # c
xorl %ebp, %edi # ^d
@@ -138,7 +146,7 @@ sha1_process_block64:
addl %edi, %eax # e += rotl32(a,5)
rorl $2, %ecx # b = rotl32(b,30)
# 5
- addl %r9d, %ebp # e += RCONST + W[n]
+ leal 0x5A827999(%rbp,%r9), %ebp # e += RCONST + W[n]
movl %ecx, %edi # c
xorl %edx, %edi # ^d
andl %ebx, %edi # &b
@@ -149,7 +157,7 @@ sha1_process_block64:
addl %edi, %ebp # e += rotl32(a,5)
rorl $2, %ebx # b = rotl32(b,30)
# 6
- addl %r10d, %edx # e += RCONST + W[n]
+ leal 0x5A827999(%rdx,%r10), %edx # e += RCONST + W[n]
shrq $32, %r10
movl %ebx, %edi # c
xorl %ecx, %edi # ^d
@@ -161,7 +169,7 @@ sha1_process_block64:
addl %edi, %edx # e += rotl32(a,5)
rorl $2, %eax # b = rotl32(b,30)
# 7
- addl %r10d, %ecx # e += RCONST + W[n]
+ leal 0x5A827999(%rcx,%r10), %ecx # e += RCONST + W[n]
movl %eax, %edi # c
xorl %ebx, %edi # ^d
andl %ebp, %edi # &b
@@ -202,7 +210,7 @@ sha1_process_block64:
paddd %xmm6, %xmm5
movups %xmm5, -64+16*0(%rsp)
# 8
- addl %r11d, %ebx # e += RCONST + W[n]
+ leal 0x5A827999(%rbx,%r11), %ebx # e += RCONST + W[n]
shrq $32, %r11
movl %ebp, %edi # c
xorl %eax, %edi # ^d
@@ -214,7 +222,7 @@ sha1_process_block64:
addl %edi, %ebx # e += rotl32(a,5)
rorl $2, %edx # b = rotl32(b,30)
# 9
- addl %r11d, %eax # e += RCONST + W[n]
+ leal 0x5A827999(%rax,%r11), %eax # e += RCONST + W[n]
movl %edx, %edi # c
xorl %ebp, %edi # ^d
andl %ecx, %edi # &b
@@ -225,7 +233,7 @@ sha1_process_block64:
addl %edi, %eax # e += rotl32(a,5)
rorl $2, %ecx # b = rotl32(b,30)
# 10
- addl %r12d, %ebp # e += RCONST + W[n]
+ leal 0x5A827999(%rbp,%r12), %ebp # e += RCONST + W[n]
shrq $32, %r12
movl %ecx, %edi # c
xorl %edx, %edi # ^d
@@ -237,7 +245,7 @@ sha1_process_block64:
addl %edi, %ebp # e += rotl32(a,5)
rorl $2, %ebx # b = rotl32(b,30)
# 11
- addl %r12d, %edx # e += RCONST + W[n]
+ leal 0x5A827999(%rdx,%r12), %edx # e += RCONST + W[n]
movl %ebx, %edi # c
xorl %ecx, %edi # ^d
andl %eax, %edi # &b
@@ -279,7 +287,7 @@ sha1_process_block64:
paddd %xmm6, %xmm5
movups %xmm5, -64+16*1(%rsp)
# 12
- addl %r13d, %ecx # e += RCONST + W[n]
+ leal 0x5A827999(%rcx,%r13), %ecx # e += RCONST + W[n]
shrq $32, %r13
movl %eax, %edi # c
xorl %ebx, %edi # ^d
@@ -291,7 +299,7 @@ sha1_process_block64:
addl %edi, %ecx # e += rotl32(a,5)
rorl $2, %ebp # b = rotl32(b,30)
# 13
- addl %r13d, %ebx # e += RCONST + W[n]
+ leal 0x5A827999(%rbx,%r13), %ebx # e += RCONST + W[n]
movl %ebp, %edi # c
xorl %eax, %edi # ^d
andl %edx, %edi # &b
@@ -302,7 +310,7 @@ sha1_process_block64:
addl %edi, %ebx # e += rotl32(a,5)
rorl $2, %edx # b = rotl32(b,30)
# 14
- addl %r14d, %eax # e += RCONST + W[n]
+ leal 0x5A827999(%rax,%r14), %eax # e += RCONST + W[n]
shrq $32, %r14
movl %edx, %edi # c
xorl %ebp, %edi # ^d
@@ -314,7 +322,7 @@ sha1_process_block64:
addl %edi, %eax # e += rotl32(a,5)
rorl $2, %ecx # b = rotl32(b,30)
# 15
- addl %r14d, %ebp # e += RCONST + W[n]
+ leal 0x5A827999(%rbp,%r14), %ebp # e += RCONST + W[n]
movl %ecx, %edi # c
xorl %edx, %edi # ^d
andl %ebx, %edi # &b
@@ -1467,11 +1475,6 @@ sha1_process_block64:
ret
.size sha1_process_block64, .-sha1_process_block64
- .section .rodata.cst16.bswap32_mask, "aM", @progbits, 16
- .balign 16
-bswap32_mask:
- .octa 0x0c0d0e0f08090a0b0405060700010203
-
.section .rodata.cst16.sha1const, "aM", @progbits, 16
.balign 16
sha1const:
diff --git a/libbb/hash_md5_sha_x86-64.S.sh b/libbb/hash_md5_sha_x86-64.S.sh
index f34e6e6fa..57e77b118 100755
--- a/libbb/hash_md5_sha_x86-64.S.sh
+++ b/libbb/hash_md5_sha_x86-64.S.sh
@@ -99,6 +99,30 @@ INTERLEAVE() {
)
}
+# movaps bswap32_mask(%rip), $xmmT1
+# Load W[] to xmm0..3, byteswapping on the fly.
+# For iterations 0..15, we pass RCONST+W[] in rsi,r8..r14
+# for use in RD1As instead of spilling them to stack.
+# (We use rsi instead of rN because this makes two
+# ADDs in two first RD1As shorter by one byte).
+# movups 16*0(%rdi), %xmm0
+# pshufb $xmmT1, %xmm0 #SSSE3 insn
+# movaps %xmm0, $xmmT2
+# paddd $xmmRCONST, $xmmT2
+# movq $xmmT2, %rsi
+# #pextrq \$1, $xmmT2, %r8 #SSE4.1 insn
+# #movhpd $xmmT2, %r8 #can only move to mem, not to reg
+# shufps \$0x0e, $xmmT2, $xmmT2 # have to use two-insn sequence
+# movq $xmmT2, %r8 # instead
+# ...
+#
+# ...
+#- leal $RCONST(%r$e,%rsi), %e$e # e += RCONST + W[n]
+#+ addl %esi, %e$e # e += RCONST + W[n]
+# ^^^^^^^^^^^^^^^^^^^^^^^^
+# The above is -97 bytes of code...
+# ...but pshufb is a SSSE3 insn. Can't use it.
+
echo \
"### Generated by hash_md5_sha_x86-64.S.sh ###
@@ -129,57 +153,65 @@ sha1_process_block64:
# xmm7: all round constants
# -64(%rsp): area for passing RCONST + W[] from vector to integer units
+ movl 80(%rdi), %eax # a = ctx->hash[0]
+ movl 84(%rdi), %ebx # b = ctx->hash[1]
+ movl 88(%rdi), %ecx # c = ctx->hash[2]
+ movl 92(%rdi), %edx # d = ctx->hash[3]
+ movl 96(%rdi), %ebp # e = ctx->hash[4]
+
movaps sha1const(%rip), $xmmALLRCONST
- movaps bswap32_mask(%rip), $xmmT1
pshufd \$0x00, $xmmALLRCONST, $xmmRCONST
# Load W[] to xmm0..3, byteswapping on the fly.
#
- # For iterations 0..15, we pass RCONST+W[] in rsi,r8..r14
+ # For iterations 0..15, we pass W[] in rsi,r8..r14
# for use in RD1As instead of spilling them to stack.
+ # We lose parallelized addition of RCONST, but LEA
+ # can do two additions at once, so it is probably a wash.
# (We use rsi instead of rN because this makes two
- # ADDs in two first RD1As shorter by one byte).
- movups 16*0(%rdi), %xmm0
- pshufb $xmmT1, %xmm0
- movaps %xmm0, $xmmT2
- paddd $xmmRCONST, $xmmT2
- movq $xmmT2, %rsi
-# pextrq \$1, $xmmT2, %r8 #SSE4.1 insn
-# movhpd $xmmT2, %r8 #can only move to mem, not to reg
- shufps \$0x0e, $xmmT2, $xmmT2
- movq $xmmT2, %r8
-
- movups 16*1(%rdi), %xmm1
- pshufb $xmmT1, %xmm1
- movaps %xmm1, $xmmT2
- paddd $xmmRCONST, $xmmT2
- movq $xmmT2, %r9
- shufps \$0x0e, $xmmT2, $xmmT2
- movq $xmmT2, %r10
-
- movups 16*2(%rdi), %xmm2
- pshufb $xmmT1, %xmm2
- movaps %xmm2, $xmmT2
- paddd $xmmRCONST, $xmmT2
- movq $xmmT2, %r11
- shufps \$0x0e, $xmmT2, $xmmT2
- movq $xmmT2, %r12
-
- movups 16*3(%rdi), %xmm3
- pshufb $xmmT1, %xmm3
- movaps %xmm3, $xmmT2
- paddd $xmmRCONST, $xmmT2
- movq $xmmT2, %r13
- shufps \$0x0e, $xmmT2, $xmmT2
- movq $xmmT2, %r14
-
- # MOVQs to GPRs (above) have somewhat high latency.
- # Load hash[] while they are completing:
- movl 80(%rdi), %eax # a = ctx->hash[0]
- movl 84(%rdi), %ebx # b = ctx->hash[1]
- movl 88(%rdi), %ecx # c = ctx->hash[2]
- movl 92(%rdi), %edx # d = ctx->hash[3]
- movl 96(%rdi), %ebp # e = ctx->hash[4]
+ # LEAs in two first RD1As shorter by one byte).
+ movq 4*0(%rdi), %rsi
+ movq 4*2(%rdi), %r8
+ bswapq %rsi
+ bswapq %r8
+ rolq \$32, %rsi # rsi = W[1]:W[0]
+ rolq \$32, %r8 # r8 = W[3]:W[2]
+ movq %rsi, %xmm0
+ movq %r8, $xmmT1
+ punpcklqdq $xmmT1, %xmm0 # xmm0 = r8:rsi = (W[0],W[1],W[2],W[3])
+# movaps %xmm0, $xmmT1 # add RCONST, spill to stack
+# paddd $xmmRCONST, $xmmT1
+# movups $xmmT1, -64+16*0(%rsp)
+
+ movq 4*4(%rdi), %r9
+ movq 4*6(%rdi), %r10
+ bswapq %r9
+ bswapq %r10
+ rolq \$32, %r9 # r9 = W[5]:W[4]
+ rolq \$32, %r10 # r10 = W[7]:W[6]
+ movq %r9, %xmm1
+ movq %r10, $xmmT1
+ punpcklqdq $xmmT1, %xmm1 # xmm1 = r10:r9 = (W[4],W[5],W[6],W[7])
+
+ movq 4*8(%rdi), %r11
+ movq 4*10(%rdi), %r12
+ bswapq %r11
+ bswapq %r12
+ rolq \$32, %r11 # r11 = W[9]:W[8]
+ rolq \$32, %r12 # r12 = W[11]:W[10]
+ movq %r11, %xmm2
+ movq %r12, $xmmT1
+ punpcklqdq $xmmT1, %xmm2 # xmm2 = r12:r11 = (W[8],W[9],W[10],W[11])
+
+ movq 4*12(%rdi), %r13
+ movq 4*14(%rdi), %r14
+ bswapq %r13
+ bswapq %r14
+ rolq \$32, %r13 # r13 = W[13]:W[12]
+ rolq \$32, %r14 # r14 = W[15]:W[14]
+ movq %r13, %xmm3
+ movq %r14, $xmmT1
+ punpcklqdq $xmmT1, %xmm3 # xmm3 = r14:r13 = (W[12],W[13],W[14],W[15])
"
PREP() {
@@ -258,15 +290,15 @@ local rN=$((7+n0/2))
echo "
# $n
";test $n0 = 0 && echo "
- addl %esi, %e$e # e += RCONST + W[n]
+ leal $RCONST(%r$e,%rsi), %e$e # e += RCONST + W[n]
shrq \$32, %rsi
";test $n0 = 1 && echo "
- addl %esi, %e$e # e += RCONST + W[n]
+ leal $RCONST(%r$e,%rsi), %e$e # e += RCONST + W[n]
";test $n0 -ge 2 && test $((n0 & 1)) = 0 && echo "
- addl %r${rN}d, %e$e # e += RCONST + W[n]
+ leal $RCONST(%r$e,%r$rN), %e$e # e += RCONST + W[n]
shrq \$32, %r$rN
";test $n0 -ge 2 && test $((n0 & 1)) = 1 && echo "
- addl %r${rN}d, %e$e # e += RCONST + W[n]
+ leal $RCONST(%r$e,%r$rN), %e$e # e += RCONST + W[n]
";echo "
movl %e$c, %edi # c
xorl %e$d, %edi # ^d
@@ -432,11 +464,6 @@ echo "
ret
.size sha1_process_block64, .-sha1_process_block64
- .section .rodata.cst16.bswap32_mask, \"aM\", @progbits, 16
- .balign 16
-bswap32_mask:
- .octa 0x0c0d0e0f08090a0b0405060700010203
-
.section .rodata.cst16.sha1const, \"aM\", @progbits, 16
.balign 16
sha1const:
diff --git a/libbb/hash_md5_sha_x86-64_shaNI.S b/libbb/hash_md5_sha_x86-64_shaNI.S
index c13cdec07..794e97040 100644
--- a/libbb/hash_md5_sha_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha_x86-64_shaNI.S
@@ -20,6 +20,11 @@
#define extr128_32 pextrd
//#define extr128_32 extractps # not shorter
+// pshufb is a SSSE3 insn.
+// pinsrd, pextrd, extractps are SSE4.1 insns.
+// We do not check SSSE3/SSE4.1 in cpuid,
+// all SHA-capable CPUs support them as well.
+
.section .text.sha1_process_block64_shaNI, "ax", @progbits
.globl sha1_process_block64_shaNI
.hidden sha1_process_block64_shaNI
From vda.linux at googlemail.com Fri Feb 11 22:03:27 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Fri, 11 Feb 2022 23:03:27 +0100
Subject: [git commit] whitespace fixes
Message-ID: <20220211215609.0CC91831C4@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=1f272c06d02e7c7f0f3af1f97165722255c8828d
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha_x86-64.S | 8 ++++----
libbb/hash_md5_sha_x86-64.S.sh | 14 +++++++-------
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/libbb/hash_md5_sha_x86-64.S b/libbb/hash_md5_sha_x86-64.S
index f0daa30f6..1d55b91f8 100644
--- a/libbb/hash_md5_sha_x86-64.S
+++ b/libbb/hash_md5_sha_x86-64.S
@@ -71,8 +71,8 @@ sha1_process_block64:
movq 4*10(%rdi), %r12
bswapq %r11
bswapq %r12
- rolq $32, %r11 # r11 = W[9]:W[8]
- rolq $32, %r12 # r12 = W[11]:W[10]
+ rolq $32, %r11 # r11 = W[9]:W[8]
+ rolq $32, %r12 # r12 = W[11]:W[10]
movq %r11, %xmm2
movq %r12, %xmm4
punpcklqdq %xmm4, %xmm2 # xmm2 = r12:r11 = (W[8],W[9],W[10],W[11])
@@ -81,8 +81,8 @@ sha1_process_block64:
movq 4*14(%rdi), %r14
bswapq %r13
bswapq %r14
- rolq $32, %r13 # r13 = W[13]:W[12]
- rolq $32, %r14 # r14 = W[15]:W[14]
+ rolq $32, %r13 # r13 = W[13]:W[12]
+ rolq $32, %r14 # r14 = W[15]:W[14]
movq %r13, %xmm3
movq %r14, %xmm4
punpcklqdq %xmm4, %xmm3 # xmm3 = r14:r13 = (W[12],W[13],W[14],W[15])
diff --git a/libbb/hash_md5_sha_x86-64.S.sh b/libbb/hash_md5_sha_x86-64.S.sh
index 57e77b118..40c979d35 100755
--- a/libbb/hash_md5_sha_x86-64.S.sh
+++ b/libbb/hash_md5_sha_x86-64.S.sh
@@ -99,7 +99,7 @@ INTERLEAVE() {
)
}
-# movaps bswap32_mask(%rip), $xmmT1
+# movaps bswap32_mask(%rip), $xmmT1
# Load W[] to xmm0..3, byteswapping on the fly.
# For iterations 0..15, we pass RCONST+W[] in rsi,r8..r14
# for use in RD1As instead of spilling them to stack.
@@ -110,8 +110,8 @@ INTERLEAVE() {
# movaps %xmm0, $xmmT2
# paddd $xmmRCONST, $xmmT2
# movq $xmmT2, %rsi
-# #pextrq \$1, $xmmT2, %r8 #SSE4.1 insn
-# #movhpd $xmmT2, %r8 #can only move to mem, not to reg
+# #pextrq \$1, $xmmT2, %r8 #SSE4.1 insn
+# #movhpd $xmmT2, %r8 #can only move to mem, not to reg
# shufps \$0x0e, $xmmT2, $xmmT2 # have to use two-insn sequence
# movq $xmmT2, %r8 # instead
# ...
@@ -197,8 +197,8 @@ sha1_process_block64:
movq 4*10(%rdi), %r12
bswapq %r11
bswapq %r12
- rolq \$32, %r11 # r11 = W[9]:W[8]
- rolq \$32, %r12 # r12 = W[11]:W[10]
+ rolq \$32, %r11 # r11 = W[9]:W[8]
+ rolq \$32, %r12 # r12 = W[11]:W[10]
movq %r11, %xmm2
movq %r12, $xmmT1
punpcklqdq $xmmT1, %xmm2 # xmm2 = r12:r11 = (W[8],W[9],W[10],W[11])
@@ -207,8 +207,8 @@ sha1_process_block64:
movq 4*14(%rdi), %r14
bswapq %r13
bswapq %r14
- rolq \$32, %r13 # r13 = W[13]:W[12]
- rolq \$32, %r14 # r14 = W[15]:W[14]
+ rolq \$32, %r13 # r13 = W[13]:W[12]
+ rolq \$32, %r14 # r14 = W[15]:W[14]
movq %r13, %xmm3
movq %r14, $xmmT1
punpcklqdq $xmmT1, %xmm3 # xmm3 = r14:r13 = (W[12],W[13],W[14],W[15])
From bugzilla at busybox.net Fri Feb 11 22:39:50 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Fri, 11 Feb 2022 22:39:50 +0000
Subject: [Bug 14586] lsof missing from command description page
In-Reply-To:
References:
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14586
Mike Frysinger changed:
What |Removed |Added
----------------------------------------------------------------------------
CC|vapier at gentoo.org |busybox-cvs at busybox.net
Assignee|unassigned at buildroot.uclibc |unassigned at busybox.net
|.org |
Component|Website |Website
Product|Infrastructure |Busybox
--
You are receiving this mail because:
You are on the CC list for the bug.
From vda.linux at googlemail.com Fri Feb 11 23:52:12 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Sat, 12 Feb 2022 00:52:12 +0100
Subject: [git commit] libbb/sha256: explicitly use sha256rnds2's %xmm0 (MSG)
argument
Message-ID: <20220211234704.AFEFE82DB5@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=c2e7780e526b0f421c3b43367a53019d1dc5f2d6
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Else, the code seemingly does not use MSG.
Signed-off-by: Denys Vlasenko
---
libbb/hash_md5_sha256_x86-32_shaNI.S | 64 +++++++++++++++---------------
libbb/hash_md5_sha256_x86-64_shaNI.S | 76 ++++++++++++++++++------------------
2 files changed, 70 insertions(+), 70 deletions(-)
diff --git a/libbb/hash_md5_sha256_x86-32_shaNI.S b/libbb/hash_md5_sha256_x86-32_shaNI.S
index c059fb18d..3905bad9a 100644
--- a/libbb/hash_md5_sha256_x86-32_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-32_shaNI.S
@@ -60,18 +60,18 @@ sha256_process_block64_shaNI:
pshufb XMMTMP, MSG
mova128 MSG, MSGTMP0
paddd 0*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Rounds 4-7 */
movu128 1*16(DATA_PTR), MSG
pshufb XMMTMP, MSG
mova128 MSG, MSGTMP1
paddd 1*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP1, MSGTMP0
/* Rounds 8-11 */
@@ -79,9 +79,9 @@ sha256_process_block64_shaNI:
pshufb XMMTMP, MSG
mova128 MSG, MSGTMP2
paddd 2*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP2, MSGTMP1
/* Rounds 12-15 */
@@ -90,151 +90,151 @@ sha256_process_block64_shaNI:
/* ...to here */
mova128 MSG, MSGTMP3
paddd 3*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP3, XMMTMP
palignr $4, MSGTMP2, XMMTMP
paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP3, MSGTMP2
/* Rounds 16-19 */
mova128 MSGTMP0, MSG
paddd 4*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP0, XMMTMP
palignr $4, MSGTMP3, XMMTMP
paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP0, MSGTMP3
/* Rounds 20-23 */
mova128 MSGTMP1, MSG
paddd 5*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP1, XMMTMP
palignr $4, MSGTMP0, XMMTMP
paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP1, MSGTMP0
/* Rounds 24-27 */
mova128 MSGTMP2, MSG
paddd 6*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP2, XMMTMP
palignr $4, MSGTMP1, XMMTMP
paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP2, MSGTMP1
/* Rounds 28-31 */
mova128 MSGTMP3, MSG
paddd 7*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP3, XMMTMP
palignr $4, MSGTMP2, XMMTMP
paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP3, MSGTMP2
/* Rounds 32-35 */
mova128 MSGTMP0, MSG
paddd 8*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP0, XMMTMP
palignr $4, MSGTMP3, XMMTMP
paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP0, MSGTMP3
/* Rounds 36-39 */
mova128 MSGTMP1, MSG
paddd 9*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP1, XMMTMP
palignr $4, MSGTMP0, XMMTMP
paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP1, MSGTMP0
/* Rounds 40-43 */
mova128 MSGTMP2, MSG
paddd 10*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP2, XMMTMP
palignr $4, MSGTMP1, XMMTMP
paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP2, MSGTMP1
/* Rounds 44-47 */
mova128 MSGTMP3, MSG
paddd 11*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP3, XMMTMP
palignr $4, MSGTMP2, XMMTMP
paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP3, MSGTMP2
/* Rounds 48-51 */
mova128 MSGTMP0, MSG
paddd 12*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP0, XMMTMP
palignr $4, MSGTMP3, XMMTMP
paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP0, MSGTMP3
/* Rounds 52-55 */
mova128 MSGTMP1, MSG
paddd 13*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP1, XMMTMP
palignr $4, MSGTMP0, XMMTMP
paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Rounds 56-59 */
mova128 MSGTMP2, MSG
paddd 14*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP2, XMMTMP
palignr $4, MSGTMP1, XMMTMP
paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Rounds 60-63 */
mova128 MSGTMP3, MSG
paddd 15*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Write hash values back in the correct order */
mova128 STATE0, XMMTMP
diff --git a/libbb/hash_md5_sha256_x86-64_shaNI.S b/libbb/hash_md5_sha256_x86-64_shaNI.S
index 9578441f8..082ceafe4 100644
--- a/libbb/hash_md5_sha256_x86-64_shaNI.S
+++ b/libbb/hash_md5_sha256_x86-64_shaNI.S
@@ -38,8 +38,8 @@
#define XMMTMP %xmm7
-#define ABEF_SAVE %xmm9
-#define CDGH_SAVE %xmm10
+#define SAVE0 %xmm8
+#define SAVE1 %xmm9
#define SHUF(a,b,c,d) $(a+(b<<2)+(c<<4)+(d<<6))
@@ -59,26 +59,26 @@ sha256_process_block64_shaNI:
leaq K256+8*16(%rip), SHA256CONSTANTS
/* Save hash values for addition after rounds */
- mova128 STATE0, ABEF_SAVE
- mova128 STATE1, CDGH_SAVE
+ mova128 STATE0, SAVE0
+ mova128 STATE1, SAVE1
/* Rounds 0-3 */
movu128 0*16(DATA_PTR), MSG
pshufb XMMTMP, MSG
mova128 MSG, MSGTMP0
paddd 0*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Rounds 4-7 */
movu128 1*16(DATA_PTR), MSG
pshufb XMMTMP, MSG
mova128 MSG, MSGTMP1
paddd 1*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP1, MSGTMP0
/* Rounds 8-11 */
@@ -86,9 +86,9 @@ sha256_process_block64_shaNI:
pshufb XMMTMP, MSG
mova128 MSG, MSGTMP2
paddd 2*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP2, MSGTMP1
/* Rounds 12-15 */
@@ -97,155 +97,155 @@ sha256_process_block64_shaNI:
/* ...to here */
mova128 MSG, MSGTMP3
paddd 3*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP3, XMMTMP
palignr $4, MSGTMP2, XMMTMP
paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP3, MSGTMP2
/* Rounds 16-19 */
mova128 MSGTMP0, MSG
paddd 4*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP0, XMMTMP
palignr $4, MSGTMP3, XMMTMP
paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP0, MSGTMP3
/* Rounds 20-23 */
mova128 MSGTMP1, MSG
paddd 5*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP1, XMMTMP
palignr $4, MSGTMP0, XMMTMP
paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP1, MSGTMP0
/* Rounds 24-27 */
mova128 MSGTMP2, MSG
paddd 6*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP2, XMMTMP
palignr $4, MSGTMP1, XMMTMP
paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP2, MSGTMP1
/* Rounds 28-31 */
mova128 MSGTMP3, MSG
paddd 7*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP3, XMMTMP
palignr $4, MSGTMP2, XMMTMP
paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP3, MSGTMP2
/* Rounds 32-35 */
mova128 MSGTMP0, MSG
paddd 8*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP0, XMMTMP
palignr $4, MSGTMP3, XMMTMP
paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP0, MSGTMP3
/* Rounds 36-39 */
mova128 MSGTMP1, MSG
paddd 9*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP1, XMMTMP
palignr $4, MSGTMP0, XMMTMP
paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP1, MSGTMP0
/* Rounds 40-43 */
mova128 MSGTMP2, MSG
paddd 10*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP2, XMMTMP
palignr $4, MSGTMP1, XMMTMP
paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP2, MSGTMP1
/* Rounds 44-47 */
mova128 MSGTMP3, MSG
paddd 11*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP3, XMMTMP
palignr $4, MSGTMP2, XMMTMP
paddd XMMTMP, MSGTMP0
sha256msg2 MSGTMP3, MSGTMP0
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP3, MSGTMP2
/* Rounds 48-51 */
mova128 MSGTMP0, MSG
paddd 12*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP0, XMMTMP
palignr $4, MSGTMP3, XMMTMP
paddd XMMTMP, MSGTMP1
sha256msg2 MSGTMP0, MSGTMP1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
sha256msg1 MSGTMP0, MSGTMP3
/* Rounds 52-55 */
mova128 MSGTMP1, MSG
paddd 13*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP1, XMMTMP
palignr $4, MSGTMP0, XMMTMP
paddd XMMTMP, MSGTMP2
sha256msg2 MSGTMP1, MSGTMP2
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Rounds 56-59 */
mova128 MSGTMP2, MSG
paddd 14*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
mova128 MSGTMP2, XMMTMP
palignr $4, MSGTMP1, XMMTMP
paddd XMMTMP, MSGTMP3
sha256msg2 MSGTMP2, MSGTMP3
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Rounds 60-63 */
mova128 MSGTMP3, MSG
paddd 15*16-8*16(SHA256CONSTANTS), MSG
- sha256rnds2 STATE0, STATE1
+ sha256rnds2 MSG, STATE0, STATE1
shuf128_32 $0x0E, MSG, MSG
- sha256rnds2 STATE1, STATE0
+ sha256rnds2 MSG, STATE1, STATE0
/* Add current hash values with previously saved */
- paddd ABEF_SAVE, STATE0
- paddd CDGH_SAVE, STATE1
+ paddd SAVE0, STATE0
+ paddd SAVE1, STATE1
/* Write hash values back in the correct order */
mova128 STATE0, XMMTMP
From vda.linux at googlemail.com Fri Feb 18 16:09:51 2022
From: vda.linux at googlemail.com (Denys Vlasenko)
Date: Fri, 18 Feb 2022 17:09:51 +0100
Subject: [git commit] libbb/sha1: update config help text with new performance
numbers
Message-ID: <20220218161429.8F9DB813EE@busybox.osuosl.org>
commit: https://git.busybox.net/busybox/commit/?id=1891fdda59092a215d3a407d9108bbbe6ab8df7a
branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master
Signed-off-by: Denys Vlasenko
---
libbb/Config.src | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/libbb/Config.src b/libbb/Config.src
index 0ecd5bd46..66a3ffa23 100644
--- a/libbb/Config.src
+++ b/libbb/Config.src
@@ -57,11 +57,12 @@ config SHA1_SMALL
range 0 3
help
Trade binary size versus speed for the sha1 algorithm.
+ With FEATURE_COPYBUF_KB=64:
throughput MB/s size of sha1_process_block64
value 486 x86-64 486 x86-64
- 0 367 375 3657 3502
- 1 224 229 654 732
- 2,3 200 195 358 380
+ 0 440 485 3481 3502
+ 1 265 265 641 696
+ 2,3 220 210 342 364
config SHA1_HWACCEL
bool "SHA1: Use hardware accelerated instructions if possible"
From bugzilla at busybox.net Sun Feb 20 04:57:25 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Sun, 20 Feb 2022 04:57:25 +0000
Subject: [Bug 14576] unzip: test skipped with bad archive
In-Reply-To:
References:
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14576
--- Comment #1 from dharan ---
Hi Team,
Can you please share an update on the requested SKIPPED test case?
SKIPPED: unzip (bad archive)
Regards,
-Dharan
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at busybox.net Wed Feb 23 16:19:45 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Wed, 23 Feb 2022 16:19:45 +0000
Subject: [Bug 11736] KCONFIG_ALLCONFIG does not apply passed config
(regression in 0b1c62934)
In-Reply-To:
References:
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=11736
--- Comment #1 from Axel Fontaine ---
This issue is still present in the latest release. Is there any workaround?
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at busybox.net Mon Feb 28 19:15:08 2022
From: bugzilla at busybox.net (bugzilla at busybox.net)
Date: Mon, 28 Feb 2022 19:15:08 +0000
Subject: [Bug 14616] New: Printf format code and data type do not match in
taskset
Message-ID:
https://bugs.busybox.net/show_bug.cgi?id=14616
Bug ID: 14616
Summary: Printf format code and data type do not match in
taskset
Product: Busybox
Version: 1.33.x
Hardware: Other
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Other
Assignee: unassigned at busybox.net
Reporter: pdvb at yahoo.com
CC: busybox-cvs at busybox.net
Target Milestone: ---
The following code uses an (unsigned long) "%lx" format code, but passes an
(unsigned long long) value to printf. The result is that on architectures
which use 32-bit for (unsigned long) and 64-bit for (unsigned long long) the
printf produces incorrect output.
#define TASKSET_PRINTF_MASK "%lx"
static unsigned long long from_mask(ul *mask, unsigned sz_in_bytes UNUSED_PARAM)
{
return *mask;
}
This was broken by commit ef0e76cc on 1/29/2017
The quick fix is to define the function as:
static unsigned long from_mask()
--
You are receiving this mail because:
You are on the CC list for the bug.