From bugzilla at busybox.net Mon Nov 1 17:32:57 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Mon, 01 Nov 2021 17:32:57 +0000 Subject: [Bug 14306] New: ash: incorrect tilde expansion Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14306 Bug ID: 14306 Summary: ash: incorrect tilde expansion Product: Busybox Version: 1.33.x Hardware: All OS: Linux Status: NEW Severity: normal Priority: P5 Component: Standard Compliance Assignee: unassigned at busybox.net Reporter: dg+busybox at atufi.org CC: busybox-cvs at busybox.net Target Milestone: --- On busybox 1.33.1, tilde expansion incorrectly alters words when the tilde-prefix matches no valid login name (note the missing ending slash): $ ash -c 'echo ~~nouser/' ~~nouser $ bash -c 'echo ~~nouser/' ~~nouser/ -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Wed Nov 3 04:12:17 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Wed, 03 Nov 2021 04:12:17 +0000 Subject: [Bug 14316] New: get_free_loop needs waiting Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14316 Bug ID: 14316 Summary: get_free_loop needs waiting Product: Busybox Version: 1.33.x Hardware: PC OS: Linux Status: NEW Severity: major Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: aswjh at 163.com CC: busybox-cvs at busybox.net Target Milestone: --- libbb/loop.c: set_loop Sometimes the loop device is not ready after get_free_loop, which raises "can't setup loop device: No such file or directory". It works if we wait (usleep in a retry loop) before "goto open_lfd": try = xasprintf(LOOP_FORMAT, i); for (lc=0; lc<100; lc++) { if (stat(try, &buf2)==0) break; usleep(20); } goto open_lfd; -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Wed Nov 3 04:23:16 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Wed, 03 Nov 2021 04:23:16 +0000 Subject: [Bug 14316] get_free_loop needs waiting In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14316 --- Comment #1 from wjh --- Linux box 5.3.11-tinycore64 #1 SMP Wed Nov 20 08:16:37 CST 2019 x86_64 GNU/Linux -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Fri Nov 5 00:16:49 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Fri, 05 Nov 2021 00:16:49 +0000 Subject: [Bug 14326] New: [PATCH] pkill: add -e to display the name and PID of the process being killed Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14326 Bug ID: 14326 Summary: [PATCH] pkill: add -e to display the name and PID of the process being killed Product: Busybox Version: unspecified Hardware: All OS: Linux Status: NEW Severity: enhancement Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: sautier.louis at gmail.com CC: busybox-cvs at busybox.net Target Milestone: --- Created attachment 9146 --> https://bugs.busybox.net/attachment.cgi?id=9146&action=edit 0001-pkill-add-e-to-display-the-name-and-PID-of-the-proce.patch Hello, I found this pkill feature very useful so I implemented it. Please let me know if the attached patch is OK. -- You are receiving this mail because: You are on the CC list for the bug. 
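For reference, the option proposed in bug 14326 mirrors procps-ng's pkill -e, which echoes each process it signals. A rough usage sketch (output format taken from procps-ng; the attached busybox patch may format it differently, and the PID shown is only illustrative):

```
$ sleep 300 &
$ pkill -e sleep
sleep killed (pid 12345)
```
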
From bugzilla at busybox.net Sun Nov 7 19:29:45 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sun, 07 Nov 2021 19:29:45 +0000 Subject: [Bug 14331] New: tr doesn't understand [:class:] character classes Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14331 Bug ID: 14331 Summary: tr doesn't understand [:class:] character classes Product: Busybox Version: 1.30.x Hardware: All OS: All Status: NEW Severity: critical Priority: P5 Component: Standard Compliance Assignee: unassigned at busybox.net Reporter: calestyo at scientia.org CC: busybox-cvs at busybox.net Target Milestone: --- Hey. Contrary to what POSIX mandates: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html busybox's tr doesn't seem to understand any of the character classes, and I'd guess it doesn't understand the other formats given in POSIX's EXTENDED DESCRIPTION either. Not only does it not understand them, it even takes such characters literally, so e.g. busybox tr -d '[:alpha:]' will remove 'a' and so on. Cheers, Chris. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Sun Nov 7 21:30:24 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sun, 07 Nov 2021 21:30:24 +0000 Subject: [Bug 14331] tr doesn't understand [:class:] character classes In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14331 --- Comment #1 from Ron Yorston --- Character classes should work with the default build configuration, though they can be disabled by turning off FEATURE_TR_CLASSES. Is it possible that's the case for the binary you're using? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Sun Nov 7 22:06:08 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sun, 07 Nov 2021 22:06:08 +0000 Subject: [Bug 14331] tr doesn't understand [:class:] character classes In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14331 Christoph Anton Mitterer changed: What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |INVALID Status|NEW |RESOLVED --- Comment #2 from Christoph Anton Mitterer --- Indeed, Debian seems to have disabled this. Sorry for the noise. Thanks, Chris. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Sun Nov 7 22:55:08 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sun, 07 Nov 2021 22:55:08 +0000 Subject: [Bug 14331] tr doesn't understand [:class:] character classes In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14331 --- Comment #3 from Christoph Anton Mitterer --- Just for the record, forwarded downstream to: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998803 -- You are receiving this mail because: You are on the CC list for the bug. 
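For reference on bug 14331: with FEATURE_TR_CLASSES enabled (the busybox defconfig), a stock binary is expected to handle the classes the same way coreutils tr does. A minimal sanity check along these lines (not run against the Debian build in question):

```
$ echo 'abc-123' | busybox tr -d '[:alpha:]'
-123
$ echo 'abc-123' | busybox tr '[:lower:]' '[:upper:]'
ABC-123
```
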
From bugzilla at busybox.net Sun Nov 7 23:44:35 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sun, 07 Nov 2021 23:44:35 +0000 Subject: [Bug 14336] New: busybox sed differs from GNU sed with respect to NUL (0x00) Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14336 Bug ID: 14336 Summary: busybox sed differs from GNU sed with respect to NUL (0x00) Product: Busybox Version: 1.30.x Hardware: All OS: All Status: NEW Severity: normal Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: calestyo at scientia.org CC: busybox-cvs at busybox.net Target Milestone: --- Hey. Not sure whether this is a "bug" or just something not defined by POSIX (I'm not really sure whether POSIX says anything with respect to sed and NUL),... at least it doesn't seem to be a configure option this time. I've noted a differing behaviour between busybox' sed and GNU sed with respect to 0x00: It seems that GNU sed, leaves any 0x00 (as well as other "binary" characters) in the current line and respects it when matching. busybox' sed OTOH, doesn't do this but seems to terminate the string upon such 0x00. Example Files: $ hd test-with-0x00 00000000 66 6f 6f 0a 62 61 72 0a 7a 65 72 00 0a 62 61 7a |foo.bar.zer..baz| 00000010 0a 7a 65 72 00 0a 65 6e 64 0a |.zer..end.| 0000001a $ hd test-with-lone-0x00 00000000 66 6f 6f 0a 62 61 72 0a 00 0a 62 61 7a 0a 7a 65 |foo.bar...baz.ze| 00000010 72 00 0a 65 6e 64 0a |r..end.| 00000017 $ hd test-with-0x02-and-0x00 00000000 66 6f 6f 0a 62 61 72 0a 7a 65 02 00 0a 62 61 7a |foo.bar.ze...baz| 00000010 0a 7a 65 72 00 0a 65 6e 64 0a |.zer..end.| 0000001a $ hd test-with-0x00-followed-by-alpha 00000000 66 6f 6f 0a 62 61 72 0a 7a 65 72 00 6f 6f 0a 62 |foo.bar.zer.oo.b| 00000010 61 7a 0a 7a 65 72 00 74 74 0a 65 6e 64 0a |az.zer.tt.end.| 0000001e GNU sed: $ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd 00000000 7a 65 72 00 0a |zer..| 00000005 $ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd 00000000 00 0a |..| 00000002 $ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd 00000000 7a 65 02 00 0a |ze...| 00000005 $ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd 00000000 7a 65 72 00 6f 6f 0a |zer.oo.| 00000007 (Note that GNU sed's -z option is NOT used.) busybox' sed: $ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd $ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd $ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd $ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd $ So it seems that busybox' sed simply does the matching till the 0x00 (which is perhaps used as string terminator), while GNU sed goes fully down the end of line (\n). Though it's worth to bring this to your attention. Cheers, Chris. -- You are receiving this mail because: You are on the CC list for the bug. 
From vda.linux at googlemail.com Tue Nov 9 12:51:22 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Tue, 9 Nov 2021 13:51:22 +0100 Subject: [git commit] which: add -a to help text Message-ID: <20211109124747.1A5A48B7EE@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=15f7d618ea7f8c3a0277c98309268b709e20d77c branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta packed_usage 34075 34079 +4 Signed-off-by: Denys Vlasenko --- debianutils/which.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/debianutils/which.c b/debianutils/which.c index b9f1b92fd..23692dc6f 100644 --- a/debianutils/which.c +++ b/debianutils/which.c @@ -17,9 +17,10 @@ //kbuild:lib-$(CONFIG_WHICH) += which.o //usage:#define which_trivial_usage -//usage: "COMMAND..." +//usage: "[-a] COMMAND..." //usage:#define which_full_usage "\n\n" -//usage: "Locate COMMAND" +//usage: "Locate COMMAND\n" +//usage: "\n -a Show all matches" //usage: //usage:#define which_example_usage //usage: "$ which login\n" From bugzilla at busybox.net Wed Nov 10 08:18:13 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Wed, 10 Nov 2021 08:18:13 +0000 Subject: [Bug 14341] New: BusyBox – 14 new vulnerabilities Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14341 Bug ID: 14341 Summary: BusyBox – 14 new vulnerabilities Product: Busybox Version: 1.33.x Hardware: All OS: Linux Status: NEW Severity: normal Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: xiechengliang1 at huawei.com CC: busybox-cvs at busybox.net Target Milestone: --- The jfrog website has disclosed 14 vulnerabilities, which were fixed in busybox 1.34.0, but I can't find the fix commit for each CVE. Who can help me? Reference: https://jfrog.com/blog/unboxing-busybox-14-new-vulnerabilities-uncovered-by-claroty-and-jfrog/ -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Sat Nov 20 00:17:12 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sat, 20 Nov 2021 00:17:12 +0000 Subject: [Bug 14361] New: udhcpc ignores T1 and T2 values Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14361 Bug ID: 14361 Summary: udhcpc ignores T1 and T2 values Product: Busybox Version: unspecified Hardware: All OS: Linux Status: NEW Severity: normal Priority: P5 Component: Networking Assignee: unassigned at busybox.net Reporter: luke-jr+busyboxbugs at utopios.org CC: busybox-cvs at busybox.net Target Milestone: --- It seems the code just hard-codes T1 at half the lease time https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c?h=1_34_stable#n1802 Values provided by the DHCP server (opt 58, 59) just get ignored... Use case: I issue 24-hour leases, and want the lease to be used that long if necessary (e.g., if the router is down), but I also want DHCP renewals every minute so: 1) I can rapidly give static IP leases and have the clients pick up on them; and 2) router reboots can quickly rebuild their lease table without persistent state. -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at busybox.net Sat Nov 20 11:33:22 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sat, 20 Nov 2021 11:33:22 +0000 Subject: [Bug 13736] LABEL/UUID mount in fstab doesn't work In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=13736 --- Comment #1 from stsp --- Ironically, I reported the same bug to util-linux and it was fixed: https://github.com/util-linux/util-linux/issues/1492 I wonder if busybox's mount also uses libblkid from util-linux, so maybe in busybox this is now fixed too? Or is it the same bug in an entirely different code base? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Sun Nov 21 10:42:50 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Sun, 21 Nov 2021 10:42:50 +0000 Subject: [Bug 13736] LABEL/UUID mount in fstab doesn't work In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=13736 --- Comment #2 from Fabrice Fontaine --- From my understanding, this is a different code base: https://git.busybox.net/busybox/tree/util-linux -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Tue Nov 23 17:34:49 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Tue, 23 Nov 2021 17:34:49 +0000 Subject: [Bug 14376] New: Tar component in busybox version 1.34.1 has a memory leak bug when trying to unpack a tar file. Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14376 Bug ID: 14376 Summary: Tar component in busybox version 1.34.1 has a memory leak bug when trying to unpack a tar file. Product: Busybox Version: unspecified Hardware: All OS: Linux Status: NEW Severity: major Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: spwpun at gmail.com CC: busybox-cvs at busybox.net Target Milestone: --- Created attachment 9156 --> https://bugs.busybox.net/attachment.cgi?id=9156&action=edit try to unpack this file with cmds above. Hi~ In libbb/xfuncs_printf.c:50, malloc is called twice, for archive_handle and archive_handle->file_header, with 184 and 72 bytes of heap space. Back in the tar_main function, the two pointers (tar_handle, tar_handle->file_header) haven't been freed when it returns. Compile cmds: ``` make O=/path/to/build defconfig make O=/path/to/build menuconfig # and choose ASAN options cd /path/to/build && make -j4 ``` Reproduce cmd: ``` ./busybox_unstripped tar -xf test.tar ``` Backtrace in gdb: ``` [#0] 0x555555e7022e ? tar_main(argc=0x3, argv=0x7fffffffe430) [#1] 0x555555b06aac ? run_applet_no_and_exit(applet_no=0x148, name=0x7fffffffe709 "tar", argv=0x7fffffffe430) [#2] 0x555555b06b6b ? run_applet_and_exit(name=0x7fffffffe709 "tar", argv=0x7fffffffe430) [#3] 0x555555b067cf ? busybox_main(argv=0x7fffffffe430) [#4] 0x555555b06b29 ? run_applet_and_exit(name=0x7fffffffe6f6 "busybox_unstripped", argv=0x7fffffffe428) [#5] 0x555555b06cbf ? 
main(argc=0x4, argv=0x7fffffffe428) ``` LeakSanitizer log: ``` ================================================================= ==120986==ERROR: LeakSanitizer: detected memory leaks Direct leak of 184 byte(s) in 1 object(s) allocated from: #0 0x7efda806bb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40) #1 0x555577ed8987 in xmalloc /home/zy/packages/dhcp-targets/busybox-1.34.1/libbb/xfuncs_printf.c:50 Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7efda806bb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40) #1 0x555577ed8987 in xmalloc /home/zy/packages/dhcp-targets/busybox-1.34.1/libbb/xfuncs_printf.c:50 SUMMARY: AddressSanitizer: 256 byte(s) leaked in 2 allocation(s). ``` -- You are receiving this mail because: You are on the CC list for the bug. From vda.linux at googlemail.com Tue Nov 23 04:31:30 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Tue, 23 Nov 2021 05:31:30 +0100 Subject: [git commit branch/1_33_stable] unlzma: fix a case where we could read before beginning of buffer Message-ID: <20211124132354.46FE78F1C6@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=d326be2850ea2bd78fe2c22d6c45c3b861d82937 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/1_33_stable Testcase: 21 01 01 00 00 00 00 00 e7 01 01 01 ef 00 df b6 00 17 02 10 11 0f ff 00 16 00 00 Unfortunately, the bug is not reliably causing a segfault, the behavior depends on what's in memory before the buffer. function old new delta unpack_lzma_stream 2762 2768 +6 Signed-off-by: Denys Vlasenko (cherry picked from commit 04f052c56ded5ab6a904e3a264a73dc0412b2e78) --- archival/libarchive/decompress_unlzma.c | 5 ++++- testsuite/unlzma.tests | 17 +++++++++++++---- testsuite/unlzma_issue_3.lzma | Bin 0 -> 27 bytes 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/archival/libarchive/decompress_unlzma.c b/archival/libarchive/decompress_unlzma.c index 0744f231a..fb5aac8fe 100644 --- a/archival/libarchive/decompress_unlzma.c +++ b/archival/libarchive/decompress_unlzma.c @@ -290,8 +290,11 @@ unpack_lzma_stream(transformer_state_t *xstate) uint32_t pos; pos = buffer_pos - rep0; - if ((int32_t)pos < 0) + if ((int32_t)pos < 0) { pos += header.dict_size; + if ((int32_t)pos < 0) + goto bad; + } match_byte = buffer[pos]; do { int bit; diff --git a/testsuite/unlzma.tests b/testsuite/unlzma.tests index 0e98afe09..fcc6e9441 100755 --- a/testsuite/unlzma.tests +++ b/testsuite/unlzma.tests @@ -8,14 +8,23 @@ # Damaged encrypted streams testing "unlzma (bad archive 1)" \ - "unlzma /dev/null; echo \$?" \ -"1 + "unlzma &1 >/dev/null; echo \$?" \ +"unlzma: corrupted data +1 " "" "" # Damaged encrypted streams testing "unlzma (bad archive 2)" \ - "unlzma /dev/null; echo \$?" \ -"1 + "unlzma &1 >/dev/null; echo \$?" \ +"unlzma: corrupted data +1 +" "" "" + +# Damaged encrypted streams +testing "unlzma (bad archive 3)" \ + "unlzma &1 >/dev/null; echo \$?" 
\ +"unlzma: corrupted data +1 " "" "" exit $FAILCOUNT diff --git a/testsuite/unlzma_issue_3.lzma b/testsuite/unlzma_issue_3.lzma new file mode 100644 index 000000000..cc60f29e4 Binary files /dev/null and b/testsuite/unlzma_issue_3.lzma differ From vda.linux at googlemail.com Tue Nov 23 04:31:30 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Tue, 23 Nov 2021 05:31:30 +0100 Subject: [git commit branch/1_33_stable] ash: parser: Fix VSLENGTH parsing with trailing garbage Message-ID: <20211124132354.54F288F1C7@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=5b939a6d290651bcd836083d2a3e6fa6ff7bc636 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/1_33_stable Let's adopt Herbert Xu's patch, not waiting for it to reach dash git: hush already has a similar fix. Signed-off-by: Denys Vlasenko (cherry picked from commit 53a7a9cd8c15d64fcc2278cf8981ba526dfbe0d2) --- shell/ash.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/shell/ash.c b/shell/ash.c index a33ab0626..1ca45f9c1 100644 --- a/shell/ash.c +++ b/shell/ash.c @@ -12635,7 +12635,7 @@ parsesub: { do { STPUTC(c, out); c = pgetc_eatbnl(); - } while (!subtype && isdigit(c)); + } while ((subtype == 0 || subtype == VSLENGTH) && isdigit(c)); } else if (c != '}') { /* $[{[#]][}] */ int cc = c; @@ -12665,11 +12665,6 @@ parsesub: { } else goto badsub; - if (c != '}' && subtype == VSLENGTH) { - /* ${#VAR didn't end with } */ - goto badsub; - } - if (subtype == 0) { static const char types[] ALIGN1 = "}-+?="; /* ${VAR...} but not $VAR or ${#VAR} */ @@ -12726,6 +12721,8 @@ parsesub: { #endif } } else { + if (subtype == VSLENGTH && c != '}') + subtype = 0; badsub: pungetc(); } From vda.linux at googlemail.com Wed Nov 24 13:27:03 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Wed, 24 Nov 2021 14:27:03 +0100 Subject: [git commit branch/1_33_stable] Bump version to 1.33.2 Message-ID: <20211124132354.794FF8F1C6@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=db726ae0c61ffec6b58e19749e0c63aaaf4f6989 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/1_33_stable Signed-off-by: Denys Vlasenko --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 35d1589cb..5af09b38c 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 1 PATCHLEVEL = 33 -SUBLEVEL = 1 +SUBLEVEL = 2 EXTRAVERSION = NAME = Unnamed From vda.linux at googlemail.com Tue Nov 23 04:31:30 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Tue, 23 Nov 2021 05:31:30 +0100 Subject: [git commit branch/1_33_stable] hush: fix handling of "cmd && &" Message-ID: <20211124132354.70F2A8F1C7@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=bb612052900542046ce75e61a4e0b030c946984b branch: https://git.busybox.net/busybox/commit/?id=refs/heads/1_33_stable function old new delta done_pipe 213 231 +18 Signed-off-by: Denys Vlasenko (cherry picked from commit 83a4967e50422867f340328d404994553e56b839) --- shell/hush.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/shell/hush.c b/shell/hush.c index 249728b9d..41a4653ea 100644 --- a/shell/hush.c +++ b/shell/hush.c @@ -3694,9 +3694,10 @@ static void debug_print_tree(struct pipe *pi, int lvl) pin = 0; while (pi) { - fdprintf(2, "%*spipe %d %sres_word=%s followup=%d %s\n", + fdprintf(2, "%*spipe %d #cmds:%d %sres_word=%s followup=%d %s\n", lvl*2, "", pin, + pi->num_cmds, (IF_HAS_KEYWORDS(pi->pi_inverted ? "! 
" :) ""), RES[pi->res_word], pi->followup, PIPE[pi->followup] @@ -3839,6 +3840,9 @@ static void done_pipe(struct parse_context *ctx, pipe_style type) #endif /* Replace all pipes in ctx with one newly created */ ctx->list_head = ctx->pipe = pi; + /* for cases like "cmd && &", do not be tricked by last command + * being null - the entire {...} & is NOT null! */ + not_null = 1; } else { no_conv: ctx->pipe->followup = type; From vda.linux at googlemail.com Tue Nov 23 04:31:30 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Tue, 23 Nov 2021 05:31:30 +0100 Subject: [git commit branch/1_33_stable] hush: fix handling of \^C and "^C" Message-ID: <20211124132354.643E18F1C6@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=56a335378ac100d51c30b21eee499a2effa37fba branch: https://git.busybox.net/busybox/commit/?id=refs/heads/1_33_stable function old new delta parse_stream 2238 2252 +14 encode_string 243 256 +13 ------------------------------------------------------------------------------ (add/remove: 0/0 grow/shrink: 2/0 up/down: 27/0) Total: 27 bytes Signed-off-by: Denys Vlasenko (cherry picked from commit 1b7a9b68d0e9aa19147d7fda16eb9a6b54156985) --- shell/ash_test/ash-misc/control_char3.right | 1 + shell/ash_test/ash-misc/control_char3.tests | 2 ++ shell/ash_test/ash-misc/control_char4.right | 1 + shell/ash_test/ash-misc/control_char4.tests | 2 ++ shell/hush.c | 11 +++++++++++ shell/hush_test/hush-misc/control_char3.right | 1 + shell/hush_test/hush-misc/control_char3.tests | 2 ++ shell/hush_test/hush-misc/control_char4.right | 1 + shell/hush_test/hush-misc/control_char4.tests | 2 ++ 9 files changed, 23 insertions(+) diff --git a/shell/ash_test/ash-misc/control_char3.right b/shell/ash_test/ash-misc/control_char3.right new file mode 100644 index 000000000..283e02cbb --- /dev/null +++ b/shell/ash_test/ash-misc/control_char3.right @@ -0,0 +1 @@ +SHELL: line 1: : not found diff --git a/shell/ash_test/ash-misc/control_char3.tests b/shell/ash_test/ash-misc/control_char3.tests new file mode 100755 index 000000000..4359db3f3 --- /dev/null +++ b/shell/ash_test/ash-misc/control_char3.tests @@ -0,0 +1,2 @@ +# (set argv0 to "SHELL" to avoid "/path/to/shell: blah" in error messages) +$THIS_SH -c '\' SHELL diff --git a/shell/ash_test/ash-misc/control_char4.right b/shell/ash_test/ash-misc/control_char4.right new file mode 100644 index 000000000..2bf18e684 --- /dev/null +++ b/shell/ash_test/ash-misc/control_char4.right @@ -0,0 +1 @@ +SHELL: line 1: -: not found diff --git a/shell/ash_test/ash-misc/control_char4.tests b/shell/ash_test/ash-misc/control_char4.tests new file mode 100755 index 000000000..48010f154 --- /dev/null +++ b/shell/ash_test/ash-misc/control_char4.tests @@ -0,0 +1,2 @@ +# (set argv0 to "SHELL" to avoid "/path/to/shell: blah" in error messages) +$THIS_SH -c '"-"' SHELL diff --git a/shell/hush.c b/shell/hush.c index 9fead37da..249728b9d 100644 --- a/shell/hush.c +++ b/shell/hush.c @@ -5235,6 +5235,11 @@ static int encode_string(o_string *as_string, } #endif o_addQchr(dest, ch); + if (ch == SPECIAL_VAR_SYMBOL) { + /* Convert "^C" to corresponding special variable reference */ + o_addchr(dest, SPECIAL_VAR_QUOTED_SVS); + o_addchr(dest, SPECIAL_VAR_SYMBOL); + } goto again; #undef as_string } @@ -5346,6 +5351,11 @@ static struct pipe *parse_stream(char **pstring, if (ch == '\n') continue; /* drop \, get next char */ nommu_addchr(&ctx.as_string, '\\'); + if (ch == SPECIAL_VAR_SYMBOL) { + nommu_addchr(&ctx.as_string, ch); + /* Convert \^C to corresponding special 
variable reference */ + goto case_SPECIAL_VAR_SYMBOL; + } o_addchr(&ctx.word, '\\'); if (ch == EOF) { /* Testcase: eval 'echo Ok\' */ @@ -5670,6 +5680,7 @@ static struct pipe *parse_stream(char **pstring, /* Note: nommu_addchr(&ctx.as_string, ch) is already done */ switch (ch) { + case_SPECIAL_VAR_SYMBOL: case SPECIAL_VAR_SYMBOL: /* Convert raw ^C to corresponding special variable reference */ o_addchr(&ctx.word, SPECIAL_VAR_SYMBOL); diff --git a/shell/hush_test/hush-misc/control_char3.right b/shell/hush_test/hush-misc/control_char3.right new file mode 100644 index 000000000..94b4f8699 --- /dev/null +++ b/shell/hush_test/hush-misc/control_char3.right @@ -0,0 +1 @@ +hush: can't execute '': No such file or directory diff --git a/shell/hush_test/hush-misc/control_char3.tests b/shell/hush_test/hush-misc/control_char3.tests new file mode 100755 index 000000000..4359db3f3 --- /dev/null +++ b/shell/hush_test/hush-misc/control_char3.tests @@ -0,0 +1,2 @@ +# (set argv0 to "SHELL" to avoid "/path/to/shell: blah" in error messages) +$THIS_SH -c '\' SHELL diff --git a/shell/hush_test/hush-misc/control_char4.right b/shell/hush_test/hush-misc/control_char4.right new file mode 100644 index 000000000..698e21427 --- /dev/null +++ b/shell/hush_test/hush-misc/control_char4.right @@ -0,0 +1 @@ +hush: can't execute '-': No such file or directory diff --git a/shell/hush_test/hush-misc/control_char4.tests b/shell/hush_test/hush-misc/control_char4.tests new file mode 100755 index 000000000..48010f154 --- /dev/null +++ b/shell/hush_test/hush-misc/control_char4.tests @@ -0,0 +1,2 @@ +# (set argv0 to "SHELL" to avoid "/path/to/shell: blah" in error messages) +$THIS_SH -c '"-"' SHELL From vda at busybox.net Wed Nov 24 13:27:03 2021 From: vda at busybox.net (Denys Vlasenko) Date: Wed, 24 Nov 2021 14:27:03 +0100 Subject: [tag/1_33_2] new tag created Message-ID: <20211124132406.675128F1C7@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=db726ae0c61ffec6b58e19749e0c63aaaf4f6989 tag: https://git.busybox.net/busybox/commit/?id=refs/tags/1_33_2 Bump version to 1.33.2 From bugzilla at busybox.net Thu Nov 25 11:44:57 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Thu, 25 Nov 2021 11:44:57 +0000 Subject: [Bug 14381] New: busybox awk '$2 == var' can fail to give only lines with given search string Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14381 Bug ID: 14381 Summary: busybox awk '$2 == var' can fail to give only lines with given search string Product: Busybox Version: unspecified Hardware: All OS: Linux Status: NEW Severity: major Priority: P5 Component: Standard Compliance Assignee: unassigned at busybox.net Reporter: ricercar at tuta.io CC: busybox-cvs at busybox.net Target Milestone: --- Created attachment 9161 --> https://bugs.busybox.net/attachment.cgi?id=9161&action=edit .config Version:1.34.1 Expected: this code to only output second line, but it outputs both lines: > printf "8 0091\n9 0133\n"|~/busybox-1.34.1/busybox awk '$2 == 0133' -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at busybox.net Thu Nov 25 11:51:37 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Thu, 25 Nov 2021 11:51:37 +0000 Subject: [Bug 14386] New: ls -sh does not show human readable size Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14386 Bug ID: 14386 Summary: ls -sh does not show human readable size Product: Busybox Version: unspecified Hardware: All OS: Linux Status: NEW Severity: minor Priority: P5 Component: Standard Compliance Assignee: unassigned at busybox.net Reporter: ricercar at tuta.io CC: busybox-cvs at busybox.net Target Milestone: --- Created attachment 9166 --> https://bugs.busybox.net/attachment.cgi?id=9166&action=edit .config Version: 1.34.1 and 1.33.1 Expected: Human readable sizes, but the command below shows the same output as ls -s (only "total:" is human readable): ~/busybox-1.34.1/busybox ls -sh -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Thu Nov 25 15:46:05 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Thu, 25 Nov 2021 15:46:05 +0000 Subject: [Bug 14381] busybox awk '$2 == var' can fail to give only lines with given search string In-Reply-To: References: Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14381 --- Comment #1 from ricercar at tuta.io --- I'm on Alpine Linux btw, which uses musl. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at busybox.net Thu Nov 25 21:28:30 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Thu, 25 Nov 2021 21:28:30 +0000 Subject: [Bug 14391] New: sha1sum slow on x64 and possibly others Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14391 Bug ID: 14391 Summary: sha1sum slow on x64 and possibly others Product: Busybox Version: unspecified Hardware: All OS: All Status: NEW Severity: enhancement Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: blazejroszkowski at gmail.com CC: busybox-cvs at busybox.net Target Milestone: --- Created attachment 9171 --> https://bugs.busybox.net/attachment.cgi?id=9171&action=edit dot config file from my build on Fedora VM sha1sum in BusyBox is over twice as slow as a decently optimized C implementation. All tests were done on x86-64 (VMs and real OSes, Windows and Linux, two laptops). I don't know if this is a performance problem on other architectures (I'd guess it is). Test file (1 GiB): dd if=/dev/urandom of=gig.gig bs=1024 count=$((1024 * 1024)) GNU coreutils sha1sum (Git for Windows, no libcrypto use) and my own implementation take (on my personal laptop) 2.8 seconds. BusyBox takes 6.3 seconds. It's around 2x slower on my work laptop as well. This is present in at least versions: 1.34.1 (my own build on Fedora 34, .config attached), 1.34.1 in latest Alpine (3.15.0), 1.34.1 from Fedora 34 repos, 1.34.0 on Windows. I also remember it being present on Ubuntu 18.04 LTS and 20.04 LTS as well (busybox from the repos). My optimized plain C sha1 implementation (that I'm happy to contribute) is here: https://github.com/FRex/blasha1 The only downside I see from using the optimized C version is a potential increase in binary size, since the optimized code is heavily unrolled (but I didn't investigate this increase). I've searched for "performance", "sha1", "sha1sum", and "speed" on this Bugzilla and found nothing about this. I understand it's an old, semi-obsolete algorithm, but if BusyBox provides this util, I assume it's better for it to be fast rather than slow. 
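For anyone reproducing the numbers above, the test boils down to the following (commands and timings taken from the report; they will of course vary by machine):

```
$ dd if=/dev/urandom of=gig.gig bs=1024 count=$((1024 * 1024))
$ time busybox sha1sum gig.gig   # ~6.3 seconds reported
$ time sha1sum gig.gig           # GNU coreutils: ~2.8 seconds reported
```
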
-- You are receiving this mail because: You are on the CC list for the bug. From vda.linux at googlemail.com Sat Nov 27 10:28:11 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 11:28:11 +0100 Subject: [git commit] tls: P256: 64-bit optimizations Message-ID: <20211127105345.39F0C880C3@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=4bc9da10718df7ed9e992b1ddd2e80d53d894177 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta sp_256_proj_point_dbl_8 421 428 +7 sp_256_point_from_bin2x32 78 84 +6 sp_256_cmp_8 38 42 +4 sp_256_to_bin_8 28 31 +3 ------------------------------------------------------------------------------ (add/remove: 0/0 grow/shrink: 4/0 up/down: 20/0) Total: 20 bytes Signed-off-by: Denys Vlasenko --- include/platform.h | 2 + networking/tls_sp_c32.c | 114 +++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 101 insertions(+), 15 deletions(-) diff --git a/include/platform.h b/include/platform.h index 9e1fb047d..ad27bb31a 100644 --- a/include/platform.h +++ b/include/platform.h @@ -239,6 +239,7 @@ typedef uint64_t bb__aliased_uint64_t FIX_ALIASING; # define move_from_unaligned_long(v, longp) ((v) = *(bb__aliased_long*)(longp)) # define move_from_unaligned16(v, u16p) ((v) = *(bb__aliased_uint16_t*)(u16p)) # define move_from_unaligned32(v, u32p) ((v) = *(bb__aliased_uint32_t*)(u32p)) +# define move_from_unaligned64(v, u64p) ((v) = *(bb__aliased_uint64_t*)(u64p)) # define move_to_unaligned16(u16p, v) (*(bb__aliased_uint16_t*)(u16p) = (v)) # define move_to_unaligned32(u32p, v) (*(bb__aliased_uint32_t*)(u32p) = (v)) # define move_to_unaligned64(u64p, v) (*(bb__aliased_uint64_t*)(u64p) = (v)) @@ -250,6 +251,7 @@ typedef uint64_t bb__aliased_uint64_t FIX_ALIASING; # define move_from_unaligned_long(v, longp) (memcpy(&(v), (longp), sizeof(long))) # define move_from_unaligned16(v, u16p) (memcpy(&(v), (u16p), 2)) # define move_from_unaligned32(v, u32p) (memcpy(&(v), (u32p), 4)) +# define move_from_unaligned64(v, u64p) (memcpy(&(v), (u64p), 8)) # define move_to_unaligned16(u16p, v) do { \ uint16_t __t = (v); \ memcpy((u16p), &__t, 2); \ diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 4d4ecdd74..d09f7e881 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -29,6 +29,20 @@ static void dump_hex(const char *fmt, const void *vp, int len) typedef uint32_t sp_digit; typedef int32_t signed_sp_digit; +/* 64-bit optimizations: + * if BB_UNALIGNED_MEMACCESS_OK && ULONG_MAX > 0xffffffff, + * then loads and stores can be done in 64-bit chunks. + * + * A narrower case is when arch is also little-endian (such as x86_64), + * then "LSW first", uint32[8] and uint64[4] representations are equivalent, + * and arithmetic can be done in 64 bits too. + */ +#if defined(__GNUC__) && defined(__x86_64__) +# define UNALIGNED_LE_64BIT 1 +#else +# define UNALIGNED_LE_64BIT 0 +#endif + /* The code below is taken from parts of * wolfssl-3.15.3/wolfcrypt/src/sp_c32.c * and heavily modified. @@ -58,6 +72,22 @@ static const sp_digit p256_mod[8] = { * r A single precision integer. * a Byte array. 
*/ +#if BB_UNALIGNED_MEMACCESS_OK && ULONG_MAX > 0xffffffff +static void sp_256_to_bin_8(const sp_digit* rr, uint8_t* a) +{ + int i; + const uint64_t* r = (void*)rr; + + sp_256_norm_8(rr); + + r += 4; + for (i = 0; i < 4; i++) { + r--; + move_to_unaligned64(a, SWAP_BE64(*r)); + a += 8; + } +} +#else static void sp_256_to_bin_8(const sp_digit* r, uint8_t* a) { int i; @@ -71,6 +101,7 @@ static void sp_256_to_bin_8(const sp_digit* r, uint8_t* a) a += 4; } } +#endif /* Read big endian unsigned byte array into r. * @@ -78,6 +109,21 @@ static void sp_256_to_bin_8(const sp_digit* r, uint8_t* a) * a Byte array. * n Number of bytes in array to read. */ +#if BB_UNALIGNED_MEMACCESS_OK && ULONG_MAX > 0xffffffff +static void sp_256_from_bin_8(sp_digit* rr, const uint8_t* a) +{ + int i; + uint64_t* r = (void*)rr; + + r += 4; + for (i = 0; i < 4; i++) { + uint64_t v; + move_from_unaligned64(v, a); + *--r = SWAP_BE64(v); + a += 8; + } +} +#else static void sp_256_from_bin_8(sp_digit* r, const uint8_t* a) { int i; @@ -90,6 +136,7 @@ static void sp_256_from_bin_8(sp_digit* r, const uint8_t* a) a += 4; } } +#endif #if SP_DEBUG static void dump_256(const char *fmt, const sp_digit* r) @@ -125,6 +172,20 @@ static void sp_256_point_from_bin2x32(sp_point* p, const uint8_t *bin2x32) * return -ve, 0 or +ve if a is less than, equal to or greater than b * respectively. */ +#if UNALIGNED_LE_64BIT +static signed_sp_digit sp_256_cmp_8(const sp_digit* aa, const sp_digit* bb) +{ + const uint64_t* a = (void*)aa; + const uint64_t* b = (void*)bb; + int i; + for (i = 3; i >= 0; i--) { + if (a[i] == b[i]) + continue; + return (a[i] > b[i]) * 2 - 1; + } + return 0; +} +#else static signed_sp_digit sp_256_cmp_8(const sp_digit* a, const sp_digit* b) { int i; @@ -140,6 +201,7 @@ static signed_sp_digit sp_256_cmp_8(const sp_digit* a, const sp_digit* b) } return 0; } +#endif /* Compare two numbers to determine if they are equal. * @@ -196,8 +258,6 @@ static int sp_256_add_8(sp_digit* r, const sp_digit* a, const sp_digit* b) ); return reg; #elif ALLOW_ASM && defined(__GNUC__) && defined(__x86_64__) - /* x86_64 has no alignment restrictions, and is little-endian, - * so 64-bit and 32-bit representations are identical */ uint64_t reg; asm volatile ( "\n movq (%0), %3" @@ -294,8 +354,6 @@ static int sp_256_sub_8(sp_digit* r, const sp_digit* a, const sp_digit* b) ); return reg; #elif ALLOW_ASM && defined(__GNUC__) && defined(__x86_64__) - /* x86_64 has no alignment restrictions, and is little-endian, - * so 64-bit and 32-bit representations are identical */ uint64_t reg; asm volatile ( "\n movq (%0), %3" @@ -440,8 +498,6 @@ static void sp_256_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) r[15] = accl; memcpy(r, rr, sizeof(rr)); #elif ALLOW_ASM && defined(__GNUC__) && defined(__x86_64__) - /* x86_64 has no alignment restrictions, and is little-endian, - * so 64-bit and 32-bit representations are identical */ const uint64_t* aa = (const void*)a; const uint64_t* bb = (const void*)b; uint64_t rr[8]; @@ -551,17 +607,32 @@ static void sp_256_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) } /* Shift number right one bit. Bottom bit is lost. 
*/ -static void sp_256_rshift1_8(sp_digit* r, sp_digit* a, sp_digit carry) +#if UNALIGNED_LE_64BIT +static void sp_256_rshift1_8(sp_digit* rr, uint64_t carry) +{ + uint64_t *r = (void*)rr; + int i; + + carry = (((uint64_t)!!carry) << 63); + for (i = 3; i >= 0; i--) { + uint64_t c = r[i] << 63; + r[i] = (r[i] >> 1) | carry; + carry = c; + } +} +#else +static void sp_256_rshift1_8(sp_digit* r, sp_digit carry) { int i; - carry = (!!carry << 31); + carry = (((sp_digit)!!carry) << 31); for (i = 7; i >= 0; i--) { - sp_digit c = a[i] << 31; - r[i] = (a[i] >> 1) | carry; + sp_digit c = r[i] << 31; + r[i] = (r[i] >> 1) | carry; carry = c; } } +#endif /* Divide the number by 2 mod the modulus (prime). (r = a / 2 % m) */ static void sp_256_div2_8(sp_digit* r, const sp_digit* a, const sp_digit* m) @@ -570,7 +641,7 @@ static void sp_256_div2_8(sp_digit* r, const sp_digit* a, const sp_digit* m) if (a[0] & 1) carry = sp_256_add_8(r, a, m); sp_256_norm_8(r); - sp_256_rshift1_8(r, r, carry); + sp_256_rshift1_8(r, carry); } /* Add two Montgomery form numbers (r = a + b % m) */ @@ -634,15 +705,28 @@ static void sp_256_mont_tpl_8(sp_digit* r, const sp_digit* a /*, const sp_digit* } /* Shift the result in the high 256 bits down to the bottom. */ -static void sp_256_mont_shift_8(sp_digit* r, const sp_digit* a) +#if BB_UNALIGNED_MEMACCESS_OK && ULONG_MAX > 0xffffffff +static void sp_256_mont_shift_8(sp_digit* rr) +{ + uint64_t *r = (void*)rr; + int i; + + for (i = 0; i < 4; i++) { + r[i] = r[i+4]; + r[i+4] = 0; + } +} +#else +static void sp_256_mont_shift_8(sp_digit* r) { int i; for (i = 0; i < 8; i++) { - r[i] = a[i+8]; + r[i] = r[i+8]; r[i+8] = 0; } } +#endif /* Mul a by scalar b and add into r. (r += a * b) */ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) @@ -800,7 +884,7 @@ static void sp_256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/ goto inc_next_word0; } } - sp_256_mont_shift_8(a, a); + sp_256_mont_shift_8(a); if (word16th != 0) sp_256_sub_8_p256_mod(a); sp_256_norm_8(a); @@ -820,7 +904,7 @@ static void sp_256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/ goto inc_next_word; } } - sp_256_mont_shift_8(a, a); + sp_256_mont_shift_8(a); if (word16th != 0) sp_256_sub_8_p256_mod(a); sp_256_norm_8(a); From vda.linux at googlemail.com Sat Nov 27 11:03:43 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 12:03:43 +0100 Subject: [git commit] tls: tweak debug printout Message-ID: <20211127105932.4C80A88224@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=446d136109633c12d748d63e2034db238f77ef97 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Signed-off-by: Denys Vlasenko --- networking/tls.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/networking/tls.c b/networking/tls.c index 675ef4b3a..415952f16 100644 --- a/networking/tls.c +++ b/networking/tls.c @@ -1883,10 +1883,12 @@ static void process_server_key(tls_state_t *tls, int len) keybuf += 4; switch (t32) { case _0x03001d20: //curve_x25519 + dbg("got x25519 eccPubKey\n"); tls->flags |= GOT_EC_CURVE_X25519; memcpy(tls->hsd->ecc_pub_key32, keybuf, 32); break; case _0x03001741: //curve_secp256r1 (aka P256) + dbg("got P256 eccPubKey\n"); /* P256 point can be transmitted odd- or even-compressed * (first byte is 3 or 2) or uncompressed (4). 
*/ @@ -1899,7 +1901,6 @@ static void process_server_key(tls_state_t *tls, int len) } tls->flags |= GOT_EC_KEY; - dbg("got eccPubKey\n"); } static void send_empty_client_cert(tls_state_t *tls) From vda.linux at googlemail.com Sat Nov 27 14:06:57 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 15:06:57 +0100 Subject: [git commit] tls: P256: remove constant-time trick in sp_256_proj_point_add_8 Message-ID: <20211127152257.67EC98B52C@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=bbda85c74b7a53d8b2bb46f3b44d8f0932a6e95d branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta sp_256_proj_point_add_8 576 544 -32 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 79 +++++++++++++++++++++++-------------------------- 1 file changed, 37 insertions(+), 42 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 29dd04293..3b0473036 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -1269,52 +1269,47 @@ static NOINLINE void sp_256_proj_point_add_8(sp_point* r, sp_point* p, sp_point* && (sp_256_cmp_equal_8(p->y, q->y) || sp_256_cmp_equal_8(p->y, t1)) ) { sp_256_proj_point_dbl_8(r, p); + return; } - else { - sp_point tp; - sp_point *v; - - v = r; - if (p->infinity | q->infinity) { - memset(&tp, 0, sizeof(tp)); - v = &tp; - } - *r = p->infinity ? *q : *p; /* struct copy */ - /* U1 = X1*Z2^2 */ - sp_256_mont_sqr_8(t1, q->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t3, t1, q->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t1, t1, v->x /*, p256_mod, p256_mp_mod*/); - /* U2 = X2*Z1^2 */ - sp_256_mont_sqr_8(t2, v->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t4, t2, v->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t2, t2, q->x /*, p256_mod, p256_mp_mod*/); - /* S1 = Y1*Z2^3 */ - sp_256_mont_mul_8(t3, t3, v->y /*, p256_mod, p256_mp_mod*/); - /* S2 = Y2*Z1^3 */ - sp_256_mont_mul_8(t4, t4, q->y /*, p256_mod, p256_mp_mod*/); - /* H = U2 - U1 */ - sp_256_mont_sub_8(t2, t2, t1 /*, p256_mod*/); - /* R = S2 - S1 */ - sp_256_mont_sub_8(t4, t4, t3 /*, p256_mod*/); - /* Z3 = H*Z1*Z2 */ - sp_256_mont_mul_8(v->z, v->z, q->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(v->z, v->z, t2 /*, p256_mod, p256_mp_mod*/); - /* X3 = R^2 - H^3 - 2*U1*H^2 */ - sp_256_mont_sqr_8(v->x, t4 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_sqr_8(t5, t2 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(v->y, t1, t5 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t5, t5, t2 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_sub_8(v->x, v->x, t5 /*, p256_mod*/); - sp_256_mont_dbl_8(t1, v->y /*, p256_mod*/); - sp_256_mont_sub_8(v->x, v->x, t1 /*, p256_mod*/); - /* Y3 = R*(U1*H^2 - X3) - S1*H^3 */ - sp_256_mont_sub_8(v->y, v->y, v->x /*, p256_mod*/); - sp_256_mont_mul_8(v->y, v->y, t4 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t5, t5, t3 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_sub_8(v->y, v->y, t5 /*, p256_mod*/); + if (p->infinity || q->infinity) { + *r = p->infinity ? 
*q : *p; /* struct copy */ + return; } + + /* U1 = X1*Z2^2 */ + sp_256_mont_sqr_8(t1, q->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t3, t1, q->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t1, t1, r->x /*, p256_mod, p256_mp_mod*/); + /* U2 = X2*Z1^2 */ + sp_256_mont_sqr_8(t2, r->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t4, t2, r->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t2, t2, q->x /*, p256_mod, p256_mp_mod*/); + /* S1 = Y1*Z2^3 */ + sp_256_mont_mul_8(t3, t3, r->y /*, p256_mod, p256_mp_mod*/); + /* S2 = Y2*Z1^3 */ + sp_256_mont_mul_8(t4, t4, q->y /*, p256_mod, p256_mp_mod*/); + /* H = U2 - U1 */ + sp_256_mont_sub_8(t2, t2, t1 /*, p256_mod*/); + /* R = S2 - S1 */ + sp_256_mont_sub_8(t4, t4, t3 /*, p256_mod*/); + /* Z3 = H*Z1*Z2 */ + sp_256_mont_mul_8(r->z, r->z, q->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->z, r->z, t2 /*, p256_mod, p256_mp_mod*/); + /* X3 = R^2 - H^3 - 2*U1*H^2 */ + sp_256_mont_sqr_8(r->x, t4 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t5, t2 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->y, t1, t5 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t5, t5, t2 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sub_8(r->x, r->x, t5 /*, p256_mod*/); + sp_256_mont_dbl_8(t1, r->y /*, p256_mod*/); + sp_256_mont_sub_8(r->x, r->x, t1 /*, p256_mod*/); + /* Y3 = R*(U1*H^2 - X3) - S1*H^3 */ + sp_256_mont_sub_8(r->y, r->y, r->x /*, p256_mod*/); + sp_256_mont_mul_8(r->y, r->y, t4 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t5, t5, t3 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sub_8(r->y, r->y, t5 /*, p256_mod*/); } /* Multiply the point by the scalar and return the result. From vda.linux at googlemail.com Sat Nov 27 14:00:14 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 15:00:14 +0100 Subject: [git commit] tls: P256: do not open-code copying of struct variables Message-ID: <20211127152257.5EB0D8B528@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=26c85225229b0a439bcc66c8ee786d16f23be9ed branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta sp_256_ecc_mulmod_8 536 534 -2 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index d09f7e881..29dd04293 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -1361,13 +1361,13 @@ static void sp_256_ecc_mulmod_8(sp_point* r, const sp_point* g, const sp_digit* dump_512("t[1].y %s\n", t[1].y); dump_512("t[1].z %s\n", t[1].z); dbg("t[2] = t[%d]\n", y); - memcpy(&t[2], &t[y], sizeof(sp_point)); + t[2] = t[y]; /* struct copy */ dbg("t[2] *= 2\n"); sp_256_proj_point_dbl_8(&t[2], &t[2]); dump_512("t[2].x %s\n", t[2].x); dump_512("t[2].y %s\n", t[2].y); dump_512("t[2].z %s\n", t[2].z); - memcpy(&t[y], &t[2], sizeof(sp_point)); + t[y] = t[2]; /* struct copy */ n <<= 1; c--; From vda.linux at googlemail.com Sat Nov 27 15:24:49 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 16:24:49 +0100 Subject: [git commit] tls: P256: fix sp_256_div2_8 - it wouldn't use a[] if low bit is 0 Message-ID: <20211127152257.88FB58B52C@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=dcfd8d3d1013ba989fa511f44bb0553a88c1ef10 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master It worked by chance because the only caller passed both parameters as two pointers to the same array. 
My fault (I made this error when converting from 26-bit code). Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index baed62f41..b3f7888f5 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -636,12 +636,14 @@ static void sp_256_rshift1_8(sp_digit* r, sp_digit carry) } #endif -/* Divide the number by 2 mod the modulus (prime). (r = a / 2 % m) */ -static void sp_256_div2_8(sp_digit* r, const sp_digit* a, const sp_digit* m) +/* Divide the number by 2 mod the modulus (prime). (r = (r / 2) % m) */ +static void sp_256_div2_8(sp_digit* r /*, const sp_digit* m*/) { + const sp_digit* m = p256_mod; + int carry = 0; - if (a[0] & 1) - carry = sp_256_add_8(r, a, m); + if (r[0] & 1) + carry = sp_256_add_8(r, r, m); sp_256_norm_8(r); sp_256_rshift1_8(r, carry); } @@ -1125,7 +1127,7 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) /* T2 = Y * Y */ sp_256to512z_mont_sqr_8(t2, r->y /*, p256_mod, p256_mp_mod*/); /* T2 = T2/2 */ - sp_256_div2_8(t2, t2, p256_mod); + sp_256_div2_8(t2 /*, p256_mod*/); /* Y = Y * X */ sp_256to512z_mont_mul_8(r->y, r->y, r->x /*, p256_mod, p256_mp_mod*/); /* X = T1 * T1 */ From vda.linux at googlemail.com Sat Nov 27 14:50:40 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 15:50:40 +0100 Subject: [git commit] tls: P256: remove redundant zeroing in sp_256_map_8 Message-ID: <20211127152257.8031F8820C@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=8cbb70365f653397c8c2b9370214d5aed36ec9fa branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Previous change made it obvious that we zero out already-zeroed high bits function old new delta sp_256_ecc_mulmod_8 534 494 -40 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 74ded2cda..baed62f41 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -1062,7 +1062,6 @@ static void sp_256_map_8(sp_point* r, sp_point* p) /* x /= z^2 */ sp_256to512z_mont_mul_8(r->x, p->x, t2 /*, p256_mod, p256_mp_mod*/); - memset(r->x + 8, 0, sizeof(r->x) / 2); sp_512to256_mont_reduce_8(r->x /*, p256_mod, p256_mp_mod*/); /* Reduce x to less than modulus */ if (sp_256_cmp_8(r->x, p256_mod) >= 0) @@ -1071,7 +1070,6 @@ static void sp_256_map_8(sp_point* r, sp_point* p) /* y /= z^3 */ sp_256to512z_mont_mul_8(r->y, p->y, t1 /*, p256_mod, p256_mp_mod*/); - memset(r->y + 8, 0, sizeof(r->y) / 2); sp_512to256_mont_reduce_8(r->y /*, p256_mod, p256_mp_mod*/); /* Reduce y to less than modulus */ if (sp_256_cmp_8(r->y, p256_mod) >= 0) From vda.linux at googlemail.com Sat Nov 27 14:47:26 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 15:47:26 +0100 Subject: [git commit] tls: P256: explain which functions use double-wide arrays, no code changes Message-ID: <20211127152257.725D88B607@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=4415f7bc06f1ee382bcbaabd86c3d7aca0b46d93 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta sp_512to256_mont_reduce_8 - 243 +243 sp_256to512z_mont_mul_8 - 150 +150 sp_256to512z_mont_sqr_8 - 7 +7 sp_256_mont_sqr_8 7 - -7 sp_256_mont_mul_8 150 - -150 sp_256_mont_reduce_8 243 - -243 ------------------------------------------------------------------------------ (add/remove: 3/3 grow/shrink: 0/0 
up/down: 400/-400) Total: 0 bytes Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 211 +++++++++++++----------------------------------- 1 file changed, 58 insertions(+), 153 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 3b0473036..74ded2cda 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -455,8 +455,10 @@ static void sp_256_sub_8_p256_mod(sp_digit* r) } #endif -/* Multiply a and b into r. (r = a * b) */ -static void sp_256_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) +/* Multiply a and b into r. (r = a * b) + * r should be [16] array (512 bits). + */ +static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) { #if ALLOW_ASM && defined(__GNUC__) && defined(__i386__) sp_digit rr[15]; /* in case r coincides with a or b */ @@ -704,9 +706,11 @@ static void sp_256_mont_tpl_8(sp_digit* r, const sp_digit* a /*, const sp_digit* } } -/* Shift the result in the high 256 bits down to the bottom. */ +/* Shift the result in the high 256 bits down to the bottom. + * High half is cleared to zeros. + */ #if BB_UNALIGNED_MEMACCESS_OK && ULONG_MAX > 0xffffffff -static void sp_256_mont_shift_8(sp_digit* rr) +static void sp_512to256_mont_shift_8(sp_digit* rr) { uint64_t *r = (void*)rr; int i; @@ -717,7 +721,7 @@ static void sp_256_mont_shift_8(sp_digit* rr) } } #else -static void sp_256_mont_shift_8(sp_digit* r) +static void sp_512to256_mont_shift_8(sp_digit* r) { int i; @@ -728,7 +732,10 @@ static void sp_256_mont_shift_8(sp_digit* r) } #endif -/* Mul a by scalar b and add into r. (r += a * b) */ +/* Mul a by scalar b and add into r. (r += a * b) + * a = p256_mod + * b = r[0] + */ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) { // const sp_digit* a = p256_mod; @@ -857,11 +864,11 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) /* Reduce the number back to 256 bits using Montgomery reduction. * - * a A single precision number to reduce in place. + * a Double-wide number to reduce in place. * m The single precision number representing the modulus. * mp The digit representing the negative inverse of m mod 2^n. */ -static void sp_256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/) +static void sp_512to256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/) { // const sp_digit* m = p256_mod; sp_digit mp = p256_mp_mod; @@ -884,7 +891,7 @@ static void sp_256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/ goto inc_next_word0; } } - sp_256_mont_shift_8(a); + sp_512to256_mont_shift_8(a); if (word16th != 0) sp_256_sub_8_p256_mod(a); sp_256_norm_8(a); @@ -892,7 +899,7 @@ static void sp_256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/ else { /* Same code for explicit mp == 1 (which is always the case for P256) */ sp_digit word16th = 0; for (i = 0; i < 8; i++) { - /*mu = a[i];*/ +// mu = a[i]; if (sp_256_mul_add_8(a+i /*, m, mu*/)) { int j = i + 8; inc_next_word: @@ -904,148 +911,46 @@ static void sp_256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/ goto inc_next_word; } } - sp_256_mont_shift_8(a); + sp_512to256_mont_shift_8(a); if (word16th != 0) sp_256_sub_8_p256_mod(a); sp_256_norm_8(a); } } -#if 0 -//TODO: arm32 asm (also adapt for x86?) 
-static void sp_256_mont_reduce_8(sp_digit* a, sp_digit* m, sp_digit mp) -{ - sp_digit ca = 0; - - asm volatile ( - # i = 0 - mov r12, #0 - ldr r10, [%[a], #0] - ldr r14, [%[a], #4] -1: - # mu = a[i] * mp - mul r8, %[mp], r10 - # a[i+0] += m[0] * mu - ldr r7, [%[m], #0] - ldr r9, [%[a], #0] - umull r6, r7, r8, r7 - adds r10, r10, r6 - adc r5, r7, #0 - # a[i+1] += m[1] * mu - ldr r7, [%[m], #4] - ldr r9, [%[a], #4] - umull r6, r7, r8, r7 - adds r10, r14, r6 - adc r4, r7, #0 - adds r10, r10, r5 - adc r4, r4, #0 - # a[i+2] += m[2] * mu - ldr r7, [%[m], #8] - ldr r14, [%[a], #8] - umull r6, r7, r8, r7 - adds r14, r14, r6 - adc r5, r7, #0 - adds r14, r14, r4 - adc r5, r5, #0 - # a[i+3] += m[3] * mu - ldr r7, [%[m], #12] - ldr r9, [%[a], #12] - umull r6, r7, r8, r7 - adds r9, r9, r6 - adc r4, r7, #0 - adds r9, r9, r5 - str r9, [%[a], #12] - adc r4, r4, #0 - # a[i+4] += m[4] * mu - ldr r7, [%[m], #16] - ldr r9, [%[a], #16] - umull r6, r7, r8, r7 - adds r9, r9, r6 - adc r5, r7, #0 - adds r9, r9, r4 - str r9, [%[a], #16] - adc r5, r5, #0 - # a[i+5] += m[5] * mu - ldr r7, [%[m], #20] - ldr r9, [%[a], #20] - umull r6, r7, r8, r7 - adds r9, r9, r6 - adc r4, r7, #0 - adds r9, r9, r5 - str r9, [%[a], #20] - adc r4, r4, #0 - # a[i+6] += m[6] * mu - ldr r7, [%[m], #24] - ldr r9, [%[a], #24] - umull r6, r7, r8, r7 - adds r9, r9, r6 - adc r5, r7, #0 - adds r9, r9, r4 - str r9, [%[a], #24] - adc r5, r5, #0 - # a[i+7] += m[7] * mu - ldr r7, [%[m], #28] - ldr r9, [%[a], #28] - umull r6, r7, r8, r7 - adds r5, r5, r6 - adcs r7, r7, %[ca] - mov %[ca], #0 - adc %[ca], %[ca], %[ca] - adds r9, r9, r5 - str r9, [%[a], #28] - ldr r9, [%[a], #32] - adcs r9, r9, r7 - str r9, [%[a], #32] - adc %[ca], %[ca], #0 - # i += 1 - add %[a], %[a], #4 - add r12, r12, #4 - cmp r12, #32 - blt 1b - - str r10, [%[a], #0] - str r14, [%[a], #4] - : [ca] "+r" (ca), [a] "+r" (a) - : [m] "r" (m), [mp] "r" (mp) - : "memory", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r12", "r14" - ); - - memcpy(a, a + 8, 32); - if (ca) - a -= m; -} -#endif /* Multiply two Montogmery form numbers mod the modulus (prime). * (r = a * b mod m) * * r Result of multiplication. + * Should be [16] array (512 bits), but high half is cleared to zeros (used as scratch pad). * a First number to multiply in Montogmery form. * b Second number to multiply in Montogmery form. * m Modulus (prime). * mp Montogmery mulitplier. */ -static void sp_256_mont_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b +static void sp_256to512z_mont_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b /*, const sp_digit* m, sp_digit mp*/) { //const sp_digit* m = p256_mod; //sp_digit mp = p256_mp_mod; - sp_256_mul_8(r, a, b); - sp_256_mont_reduce_8(r /*, m, mp*/); + sp_256to512_mul_8(r, a, b); + sp_512to256_mont_reduce_8(r /*, m, mp*/); } /* Square the Montgomery form number. (r = a * a mod m) * * r Result of squaring. + * Should be [16] array (512 bits), but high half is cleared to zeros (used as scratch pad). * a Number to square in Montogmery form. * m Modulus (prime). * mp Montogmery mulitplier. 
*/ -static void sp_256_mont_sqr_8(sp_digit* r, const sp_digit* a +static void sp_256to512z_mont_sqr_8(sp_digit* r, const sp_digit* a /*, const sp_digit* m, sp_digit mp*/) { //const sp_digit* m = p256_mod; //sp_digit mp = p256_mp_mod; - sp_256_mont_mul_8(r, a, a /*, m, mp*/); + sp_256to512z_mont_mul_8(r, a, a /*, m, mp*/); } /* Invert the number, in Montgomery form, modulo the modulus (prime) of the @@ -1068,15 +973,15 @@ static const uint32_t p256_mod_2[8] = { #endif static void sp_256_mont_inv_8(sp_digit* r, sp_digit* a) { - sp_digit t[2*8]; //can be just [8]? + sp_digit t[2*8]; int i; memcpy(t, a, sizeof(sp_digit) * 8); for (i = 254; i >= 0; i--) { - sp_256_mont_sqr_8(t, t /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t, t /*, p256_mod, p256_mp_mod*/); /*if (p256_mod_2[i / 32] & ((sp_digit)1 << (i % 32)))*/ if (i >= 224 || i == 192 || (i <= 95 && i != 1)) - sp_256_mont_mul_8(t, t, a /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t, t, a /*, p256_mod, p256_mp_mod*/); } memcpy(r, t, sizeof(sp_digit) * 8); } @@ -1152,22 +1057,22 @@ static void sp_256_map_8(sp_point* r, sp_point* p) sp_256_mont_inv_8(t1, p->z); - sp_256_mont_sqr_8(t2, t1 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t1, t2, t1 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t2, t1 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t1, t2, t1 /*, p256_mod, p256_mp_mod*/); /* x /= z^2 */ - sp_256_mont_mul_8(r->x, p->x, t2 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->x, p->x, t2 /*, p256_mod, p256_mp_mod*/); memset(r->x + 8, 0, sizeof(r->x) / 2); - sp_256_mont_reduce_8(r->x /*, p256_mod, p256_mp_mod*/); + sp_512to256_mont_reduce_8(r->x /*, p256_mod, p256_mp_mod*/); /* Reduce x to less than modulus */ if (sp_256_cmp_8(r->x, p256_mod) >= 0) sp_256_sub_8_p256_mod(r->x); sp_256_norm_8(r->x); /* y /= z^3 */ - sp_256_mont_mul_8(r->y, p->y, t1 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->y, p->y, t1 /*, p256_mod, p256_mp_mod*/); memset(r->y + 8, 0, sizeof(r->y) / 2); - sp_256_mont_reduce_8(r->y /*, p256_mod, p256_mp_mod*/); + sp_512to256_mont_reduce_8(r->y /*, p256_mod, p256_mp_mod*/); /* Reduce y to less than modulus */ if (sp_256_cmp_8(r->y, p256_mod) >= 0) sp_256_sub_8_p256_mod(r->y); @@ -1202,9 +1107,9 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) } /* T1 = Z * Z */ - sp_256_mont_sqr_8(t1, r->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t1, r->z /*, p256_mod, p256_mp_mod*/); /* Z = Y * Z */ - sp_256_mont_mul_8(r->z, r->y, r->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->z, r->y, r->z /*, p256_mod, p256_mp_mod*/); /* Z = 2Z */ sp_256_mont_dbl_8(r->z, r->z /*, p256_mod*/); /* T2 = X - T1 */ @@ -1212,21 +1117,21 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) /* T1 = X + T1 */ sp_256_mont_add_8(t1, r->x, t1 /*, p256_mod*/); /* T2 = T1 * T2 */ - sp_256_mont_mul_8(t2, t1, t2 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t2, t1, t2 /*, p256_mod, p256_mp_mod*/); /* T1 = 3T2 */ sp_256_mont_tpl_8(t1, t2 /*, p256_mod*/); /* Y = 2Y */ sp_256_mont_dbl_8(r->y, r->y /*, p256_mod*/); /* Y = Y * Y */ - sp_256_mont_sqr_8(r->y, r->y /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(r->y, r->y /*, p256_mod, p256_mp_mod*/); /* T2 = Y * Y */ - sp_256_mont_sqr_8(t2, r->y /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t2, r->y /*, p256_mod, p256_mp_mod*/); /* T2 = T2/2 */ sp_256_div2_8(t2, t2, p256_mod); /* Y = Y * X */ - sp_256_mont_mul_8(r->y, r->y, r->x /*, p256_mod, p256_mp_mod*/); + 
sp_256to512z_mont_mul_8(r->y, r->y, r->x /*, p256_mod, p256_mp_mod*/); /* X = T1 * T1 */ - sp_256_mont_mul_8(r->x, t1, t1 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->x, t1, t1 /*, p256_mod, p256_mp_mod*/); /* X = X - Y */ sp_256_mont_sub_8(r->x, r->x, r->y /*, p256_mod*/); /* X = X - Y */ @@ -1234,7 +1139,7 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) /* Y = Y - X */ sp_256_mont_sub_8(r->y, r->y, r->x /*, p256_mod*/); /* Y = Y * T1 */ - sp_256_mont_mul_8(r->y, r->y, t1 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->y, r->y, t1 /*, p256_mod, p256_mp_mod*/); /* Y = Y - T2 */ sp_256_mont_sub_8(r->y, r->y, t2 /*, p256_mod*/); dump_512("y2 %s\n", r->y); @@ -1279,36 +1184,36 @@ static NOINLINE void sp_256_proj_point_add_8(sp_point* r, sp_point* p, sp_point* } /* U1 = X1*Z2^2 */ - sp_256_mont_sqr_8(t1, q->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t3, t1, q->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t1, t1, r->x /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t1, q->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t3, t1, q->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t1, t1, r->x /*, p256_mod, p256_mp_mod*/); /* U2 = X2*Z1^2 */ - sp_256_mont_sqr_8(t2, r->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t4, t2, r->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t2, t2, q->x /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t2, r->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t4, t2, r->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t2, t2, q->x /*, p256_mod, p256_mp_mod*/); /* S1 = Y1*Z2^3 */ - sp_256_mont_mul_8(t3, t3, r->y /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t3, t3, r->y /*, p256_mod, p256_mp_mod*/); /* S2 = Y2*Z1^3 */ - sp_256_mont_mul_8(t4, t4, q->y /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t4, t4, q->y /*, p256_mod, p256_mp_mod*/); /* H = U2 - U1 */ sp_256_mont_sub_8(t2, t2, t1 /*, p256_mod*/); /* R = S2 - S1 */ sp_256_mont_sub_8(t4, t4, t3 /*, p256_mod*/); /* Z3 = H*Z1*Z2 */ - sp_256_mont_mul_8(r->z, r->z, q->z /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(r->z, r->z, t2 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->z, r->z, q->z /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->z, r->z, t2 /*, p256_mod, p256_mp_mod*/); /* X3 = R^2 - H^3 - 2*U1*H^2 */ - sp_256_mont_sqr_8(r->x, t4 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_sqr_8(t5, t2 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(r->y, t1, t5 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t5, t5, t2 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(r->x, t4 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_sqr_8(t5, t2 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->y, t1, t5 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t5, t5, t2 /*, p256_mod, p256_mp_mod*/); sp_256_mont_sub_8(r->x, r->x, t5 /*, p256_mod*/); sp_256_mont_dbl_8(t1, r->y /*, p256_mod*/); sp_256_mont_sub_8(r->x, r->x, t1 /*, p256_mod*/); /* Y3 = R*(U1*H^2 - X3) - S1*H^3 */ sp_256_mont_sub_8(r->y, r->y, r->x /*, p256_mod*/); - sp_256_mont_mul_8(r->y, r->y, t4 /*, p256_mod, p256_mp_mod*/); - sp_256_mont_mul_8(t5, t5, t3 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(r->y, r->y, t4 /*, p256_mod, p256_mp_mod*/); + sp_256to512z_mont_mul_8(t5, t5, t3 /*, p256_mod, p256_mp_mod*/); sp_256_mont_sub_8(r->y, r->y, t5 /*, p256_mod*/); } From vda.linux at googlemail.com Sat Nov 27 17:42:27 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 
27 Nov 2021 18:42:27 +0100 Subject: [git commit] tls: P256: do not open-code copying of struct variables Message-ID: <20211127182803.A28D58D5F1@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=9c671fe3dd2e46a28c02d266130f56a1a6296791 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index b3f7888f5..3291b553c 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -865,6 +865,8 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) } /* Reduce the number back to 256 bits using Montgomery reduction. + * Note: the result is NOT guaranteed to be less than p256_mod! + * (it is only guaranteed to fit into 256 bits). * * a Double-wide number to reduce in place. * m The single precision number representing the modulus. @@ -1276,7 +1278,7 @@ static void sp_256_ecc_mulmod_8(sp_point* r, const sp_point* g, const sp_digit* if (map) sp_256_map_8(r, &t[0]); else - memcpy(r, &t[0], sizeof(sp_point)); + *r = t[0]; /* struct copy */ memset(t, 0, sizeof(t)); //paranoia } From vda.linux at googlemail.com Sat Nov 27 18:27:03 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 19:27:03 +0100 Subject: [git commit] tls: P256: change logic so that we don't need double-wide vectors everywhere Message-ID: <20211127182803.AB8928D5F8@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=f92ae1dc4bc00e352e683b826609efa5e1e22708 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Change sp_256to512z_mont_{mul,sqr}_8 to not require/zero upper 256 bits. There is only one place where we actually used that (and that's why there used to be zeroing memset of top half!). Fix up that place. As a bonus, 256x256->512 multiply no longer needs to care for "r overlaps a or b" case. This shrinks sp_point structure as well, not just temporaries. function old new delta sp_256to512z_mont_mul_8 150 - -150 sp_256_mont_mul_8 - 147 +147 sp_256to512z_mont_sqr_8 7 - -7 sp_256_mont_sqr_8 - 7 +7 sp_256_ecc_mulmod_8 494 543 +49 sp_512to256_mont_reduce_8 243 249 +6 sp_256_point_from_bin2x32 73 70 -3 sp_256_proj_point_dbl_8 353 345 -8 sp_256_proj_point_add_8 544 499 -45 ------------------------------------------------------------------------------ (add/remove: 2/2 grow/shrink: 2/3 up/down: 209/-213) Total: -4 bytes Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 178 ++++++++++++++++++++---------------------------- 1 file changed, 72 insertions(+), 106 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 3291b553c..3452b08b9 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -49,9 +49,9 @@ typedef int32_t signed_sp_digit; */ typedef struct sp_point { - sp_digit x[2 * 8]; - sp_digit y[2 * 8]; - sp_digit z[2 * 8]; + sp_digit x[8]; + sp_digit y[8]; + sp_digit z[8]; int infinity; } sp_point; @@ -456,12 +456,11 @@ static void sp_256_sub_8_p256_mod(sp_digit* r) #endif /* Multiply a and b into r. (r = a * b) - * r should be [16] array (512 bits). + * r should be [16] array (512 bits), and must not coincide with a or b. 
*/ static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) { #if ALLOW_ASM && defined(__GNUC__) && defined(__i386__) - sp_digit rr[15]; /* in case r coincides with a or b */ int k; uint32_t accl; uint32_t acch; @@ -493,16 +492,15 @@ static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) j--; i++; } while (i != 8 && i <= k); - rr[k] = accl; + r[k] = accl; accl = acch; acch = acc_hi; } r[15] = accl; - memcpy(r, rr, sizeof(rr)); #elif ALLOW_ASM && defined(__GNUC__) && defined(__x86_64__) const uint64_t* aa = (const void*)a; const uint64_t* bb = (const void*)b; - uint64_t rr[8]; + const uint64_t* rr = (const void*)r; int k; uint64_t accl; uint64_t acch; @@ -539,11 +537,8 @@ static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) acch = acc_hi; } rr[7] = accl; - memcpy(r, rr, sizeof(rr)); #elif 0 //TODO: arm assembly (untested) - sp_digit tmp[16]; - asm volatile ( "\n mov r5, #0" "\n mov r6, #0" @@ -575,12 +570,10 @@ static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) "\n cmp r5, #56" "\n ble 1b" "\n str r6, [%[r], r5]" - : [r] "r" (tmp), [a] "r" (a), [b] "r" (b) + : [r] "r" (r), [a] "r" (a), [b] "r" (b) : "memory", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r12", "r14" ); - memcpy(r, tmp, sizeof(tmp)); #else - sp_digit rr[15]; /* in case r coincides with a or b */ int i, j, k; uint64_t acc; @@ -600,11 +593,10 @@ static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) j--; i++; } while (i != 8 && i <= k); - rr[k] = acc; + r[k] = acc; acc = (acc >> 32) | ((uint64_t)acc_hi << 32); } r[15] = acc; - memcpy(r, rr, sizeof(rr)); #endif } @@ -709,30 +701,11 @@ static void sp_256_mont_tpl_8(sp_digit* r, const sp_digit* a /*, const sp_digit* } /* Shift the result in the high 256 bits down to the bottom. - * High half is cleared to zeros. */ -#if BB_UNALIGNED_MEMACCESS_OK && ULONG_MAX > 0xffffffff -static void sp_512to256_mont_shift_8(sp_digit* rr) +static void sp_512to256_mont_shift_8(sp_digit* r, sp_digit* a) { - uint64_t *r = (void*)rr; - int i; - - for (i = 0; i < 4; i++) { - r[i] = r[i+4]; - r[i+4] = 0; - } + memcpy(r, a + 8, sizeof(*r) * 8); } -#else -static void sp_512to256_mont_shift_8(sp_digit* r) -{ - int i; - - for (i = 0; i < 8; i++) { - r[i] = r[i+8]; - r[i+8] = 0; - } -} -#endif /* Mul a by scalar b and add into r. (r += a * b) * a = p256_mod @@ -868,11 +841,12 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) * Note: the result is NOT guaranteed to be less than p256_mod! * (it is only guaranteed to fit into 256 bits). * - * a Double-wide number to reduce in place. + * r Result. + * a Double-wide number to reduce. Clobbered. * m The single precision number representing the modulus. * mp The digit representing the negative inverse of m mod 2^n. 
*/ -static void sp_512to256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit mp*/) +static void sp_512to256_mont_reduce_8(sp_digit* r, sp_digit* a/*, const sp_digit* m, sp_digit mp*/) { // const sp_digit* m = p256_mod; sp_digit mp = p256_mp_mod; @@ -895,10 +869,10 @@ static void sp_512to256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit goto inc_next_word0; } } - sp_512to256_mont_shift_8(a); + sp_512to256_mont_shift_8(r, a); if (word16th != 0) - sp_256_sub_8_p256_mod(a); - sp_256_norm_8(a); + sp_256_sub_8_p256_mod(r); + sp_256_norm_8(r); } else { /* Same code for explicit mp == 1 (which is always the case for P256) */ sp_digit word16th = 0; @@ -915,10 +889,10 @@ static void sp_512to256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit goto inc_next_word; } } - sp_512to256_mont_shift_8(a); + sp_512to256_mont_shift_8(r, a); if (word16th != 0) - sp_256_sub_8_p256_mod(a); - sp_256_norm_8(a); + sp_256_sub_8_p256_mod(r); + sp_256_norm_8(r); } } @@ -926,35 +900,34 @@ static void sp_512to256_mont_reduce_8(sp_digit* a/*, const sp_digit* m, sp_digit * (r = a * b mod m) * * r Result of multiplication. - * Should be [16] array (512 bits), but high half is cleared to zeros (used as scratch pad). * a First number to multiply in Montogmery form. * b Second number to multiply in Montogmery form. * m Modulus (prime). * mp Montogmery mulitplier. */ -static void sp_256to512z_mont_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b +static void sp_256_mont_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b /*, const sp_digit* m, sp_digit mp*/) { //const sp_digit* m = p256_mod; //sp_digit mp = p256_mp_mod; - sp_256to512_mul_8(r, a, b); - sp_512to256_mont_reduce_8(r /*, m, mp*/); + sp_digit t[2 * 8]; + sp_256to512_mul_8(t, a, b); + sp_512to256_mont_reduce_8(r, t /*, m, mp*/); } /* Square the Montgomery form number. (r = a * a mod m) * * r Result of squaring. - * Should be [16] array (512 bits), but high half is cleared to zeros (used as scratch pad). * a Number to square in Montogmery form. * m Modulus (prime). * mp Montogmery mulitplier. */ -static void sp_256to512z_mont_sqr_8(sp_digit* r, const sp_digit* a +static void sp_256_mont_sqr_8(sp_digit* r, const sp_digit* a /*, const sp_digit* m, sp_digit mp*/) { //const sp_digit* m = p256_mod; //sp_digit mp = p256_mp_mod; - sp_256to512z_mont_mul_8(r, a, a /*, m, mp*/); + sp_256_mont_mul_8(r, a, a /*, m, mp*/); } /* Invert the number, in Montgomery form, modulo the modulus (prime) of the @@ -964,11 +937,8 @@ static void sp_256to512z_mont_sqr_8(sp_digit* r, const sp_digit* a * a Number to invert. */ #if 0 -/* Mod-2 for the P256 curve. 
*/ -static const uint32_t p256_mod_2[8] = { - 0xfffffffd,0xffffffff,0xffffffff,0x00000000, - 0x00000000,0x00000000,0x00000001,0xffffffff, -}; +//p256_mod - 2: +//ffffffff 00000001 00000000 00000000 00000000 ffffffff ffffffff ffffffff - 2 //Bit pattern: //2 2 2 2 2 2 2 1...1 //5 5 4 3 2 1 0 9...0 9...1 @@ -977,15 +947,15 @@ static const uint32_t p256_mod_2[8] = { #endif static void sp_256_mont_inv_8(sp_digit* r, sp_digit* a) { - sp_digit t[2*8]; + sp_digit t[8]; int i; memcpy(t, a, sizeof(sp_digit) * 8); for (i = 254; i >= 0; i--) { - sp_256to512z_mont_sqr_8(t, t /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t, t /*, p256_mod, p256_mp_mod*/); /*if (p256_mod_2[i / 32] & ((sp_digit)1 << (i % 32)))*/ if (i >= 224 || i == 192 || (i <= 95 && i != 1)) - sp_256to512z_mont_mul_8(t, t, a /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t, t, a /*, p256_mod, p256_mp_mod*/); } memcpy(r, t, sizeof(sp_digit) * 8); } @@ -1056,25 +1026,28 @@ static void sp_256_mod_mul_norm_8(sp_digit* r, const sp_digit* a) */ static void sp_256_map_8(sp_point* r, sp_point* p) { - sp_digit t1[2*8]; - sp_digit t2[2*8]; + sp_digit t1[8]; + sp_digit t2[8]; + sp_digit rr[2 * 8]; sp_256_mont_inv_8(t1, p->z); - sp_256to512z_mont_sqr_8(t2, t1 /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t1, t2, t1 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t2, t1 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t1, t2, t1 /*, p256_mod, p256_mp_mod*/); /* x /= z^2 */ - sp_256to512z_mont_mul_8(r->x, p->x, t2 /*, p256_mod, p256_mp_mod*/); - sp_512to256_mont_reduce_8(r->x /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(rr, p->x, t2 /*, p256_mod, p256_mp_mod*/); + memset(rr + 8, 0, sizeof(rr) / 2); + sp_512to256_mont_reduce_8(r->x, rr /*, p256_mod, p256_mp_mod*/); /* Reduce x to less than modulus */ if (sp_256_cmp_8(r->x, p256_mod) >= 0) sp_256_sub_8_p256_mod(r->x); sp_256_norm_8(r->x); /* y /= z^3 */ - sp_256to512z_mont_mul_8(r->y, p->y, t1 /*, p256_mod, p256_mp_mod*/); - sp_512to256_mont_reduce_8(r->y /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(rr, p->y, t1 /*, p256_mod, p256_mp_mod*/); + memset(rr + 8, 0, sizeof(rr) / 2); + sp_512to256_mont_reduce_8(r->y, rr /*, p256_mod, p256_mp_mod*/); /* Reduce y to less than modulus */ if (sp_256_cmp_8(r->y, p256_mod) >= 0) sp_256_sub_8_p256_mod(r->y); @@ -1091,8 +1064,8 @@ static void sp_256_map_8(sp_point* r, sp_point* p) */ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) { - sp_digit t1[2*8]; - sp_digit t2[2*8]; + sp_digit t1[8]; + sp_digit t2[8]; /* Put point to double into result */ if (r != p) @@ -1101,17 +1074,10 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) if (r->infinity) return; - if (SP_DEBUG) { - /* unused part of t2, may result in spurios - * differences in debug output. Clear it. 
- */ - memset(t2, 0, sizeof(t2)); - } - /* T1 = Z * Z */ - sp_256to512z_mont_sqr_8(t1, r->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t1, r->z /*, p256_mod, p256_mp_mod*/); /* Z = Y * Z */ - sp_256to512z_mont_mul_8(r->z, r->y, r->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->z, r->y, r->z /*, p256_mod, p256_mp_mod*/); /* Z = 2Z */ sp_256_mont_dbl_8(r->z, r->z /*, p256_mod*/); /* T2 = X - T1 */ @@ -1119,21 +1085,21 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) /* T1 = X + T1 */ sp_256_mont_add_8(t1, r->x, t1 /*, p256_mod*/); /* T2 = T1 * T2 */ - sp_256to512z_mont_mul_8(t2, t1, t2 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t2, t1, t2 /*, p256_mod, p256_mp_mod*/); /* T1 = 3T2 */ sp_256_mont_tpl_8(t1, t2 /*, p256_mod*/); /* Y = 2Y */ sp_256_mont_dbl_8(r->y, r->y /*, p256_mod*/); /* Y = Y * Y */ - sp_256to512z_mont_sqr_8(r->y, r->y /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(r->y, r->y /*, p256_mod, p256_mp_mod*/); /* T2 = Y * Y */ - sp_256to512z_mont_sqr_8(t2, r->y /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t2, r->y /*, p256_mod, p256_mp_mod*/); /* T2 = T2/2 */ sp_256_div2_8(t2 /*, p256_mod*/); /* Y = Y * X */ - sp_256to512z_mont_mul_8(r->y, r->y, r->x /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->y, r->y, r->x /*, p256_mod, p256_mp_mod*/); /* X = T1 * T1 */ - sp_256to512z_mont_mul_8(r->x, t1, t1 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->x, t1, t1 /*, p256_mod, p256_mp_mod*/); /* X = X - Y */ sp_256_mont_sub_8(r->x, r->x, r->y /*, p256_mod*/); /* X = X - Y */ @@ -1141,7 +1107,7 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) /* Y = Y - X */ sp_256_mont_sub_8(r->y, r->y, r->x /*, p256_mod*/); /* Y = Y * T1 */ - sp_256to512z_mont_mul_8(r->y, r->y, t1 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->y, r->y, t1 /*, p256_mod, p256_mp_mod*/); /* Y = Y - T2 */ sp_256_mont_sub_8(r->y, r->y, t2 /*, p256_mod*/); dump_512("y2 %s\n", r->y); @@ -1155,11 +1121,11 @@ static void sp_256_proj_point_dbl_8(sp_point* r, sp_point* p) */ static NOINLINE void sp_256_proj_point_add_8(sp_point* r, sp_point* p, sp_point* q) { - sp_digit t1[2*8]; - sp_digit t2[2*8]; - sp_digit t3[2*8]; - sp_digit t4[2*8]; - sp_digit t5[2*8]; + sp_digit t1[8]; + sp_digit t2[8]; + sp_digit t3[8]; + sp_digit t4[8]; + sp_digit t5[8]; /* Ensure only the first point is the same as the result. 
*/ if (q == r) { @@ -1186,36 +1152,36 @@ static NOINLINE void sp_256_proj_point_add_8(sp_point* r, sp_point* p, sp_point* } /* U1 = X1*Z2^2 */ - sp_256to512z_mont_sqr_8(t1, q->z /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t3, t1, q->z /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t1, t1, r->x /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t1, q->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t3, t1, q->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t1, t1, r->x /*, p256_mod, p256_mp_mod*/); /* U2 = X2*Z1^2 */ - sp_256to512z_mont_sqr_8(t2, r->z /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t4, t2, r->z /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t2, t2, q->x /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t2, r->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t4, t2, r->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t2, t2, q->x /*, p256_mod, p256_mp_mod*/); /* S1 = Y1*Z2^3 */ - sp_256to512z_mont_mul_8(t3, t3, r->y /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t3, t3, r->y /*, p256_mod, p256_mp_mod*/); /* S2 = Y2*Z1^3 */ - sp_256to512z_mont_mul_8(t4, t4, q->y /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t4, t4, q->y /*, p256_mod, p256_mp_mod*/); /* H = U2 - U1 */ sp_256_mont_sub_8(t2, t2, t1 /*, p256_mod*/); /* R = S2 - S1 */ sp_256_mont_sub_8(t4, t4, t3 /*, p256_mod*/); /* Z3 = H*Z1*Z2 */ - sp_256to512z_mont_mul_8(r->z, r->z, q->z /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(r->z, r->z, t2 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->z, r->z, q->z /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->z, r->z, t2 /*, p256_mod, p256_mp_mod*/); /* X3 = R^2 - H^3 - 2*U1*H^2 */ - sp_256to512z_mont_sqr_8(r->x, t4 /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_sqr_8(t5, t2 /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(r->y, t1, t5 /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t5, t5, t2 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(r->x, t4 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(t5, t2 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->y, t1, t5 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t5, t5, t2 /*, p256_mod, p256_mp_mod*/); sp_256_mont_sub_8(r->x, r->x, t5 /*, p256_mod*/); sp_256_mont_dbl_8(t1, r->y /*, p256_mod*/); sp_256_mont_sub_8(r->x, r->x, t1 /*, p256_mod*/); /* Y3 = R*(U1*H^2 - X3) - S1*H^3 */ sp_256_mont_sub_8(r->y, r->y, r->x /*, p256_mod*/); - sp_256to512z_mont_mul_8(r->y, r->y, t4 /*, p256_mod, p256_mp_mod*/); - sp_256to512z_mont_mul_8(t5, t5, t3 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r->y, r->y, t4 /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(t5, t5, t3 /*, p256_mod, p256_mp_mod*/); sp_256_mont_sub_8(r->y, r->y, t5 /*, p256_mod*/); } From vda.linux at googlemail.com Sat Nov 27 18:36:23 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sat, 27 Nov 2021 19:36:23 +0100 Subject: [git commit] tls: P256: trivial x86-64 fix Message-ID: <20211127183312.2BC338DB0C@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=0b13ab66f43fc1a9437361cfcd33b485422eb0ae branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 3452b08b9..4c8f08d4e 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -500,7 +500,7 @@ static void sp_256to512_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b) #elif ALLOW_ASM && 
defined(__GNUC__) && defined(__x86_64__) const uint64_t* aa = (const void*)a; const uint64_t* bb = (const void*)b; - const uint64_t* rr = (const void*)r; + uint64_t* rr = (void*)r; int k; uint64_t accl; uint64_t acch; From vda.linux at googlemail.com Sun Nov 28 01:56:02 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sun, 28 Nov 2021 02:56:02 +0100 Subject: [git commit] tls: P256: pad struct sp_point to 64 bits (on 64-bit arches) Message-ID: <20211128015203.3E0788E29B@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=1b93c7c4ecc47318905b6e6f801732b7dd31e0ee branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta curve_P256_compute_pubkey_and_premaster 198 190 -8 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 4c8f08d4e..37e1cfa1c 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -49,14 +49,19 @@ typedef int32_t signed_sp_digit; */ typedef struct sp_point { - sp_digit x[8]; + sp_digit x[8] +#if ULONG_MAX > 0xffffffff + /* Make sp_point[] arrays to not be 64-bit misaligned */ + ALIGNED(8) +#endif + ; sp_digit y[8]; sp_digit z[8]; int infinity; } sp_point; /* The modulus (prime) of the curve P256. */ -static const sp_digit p256_mod[8] = { +static const sp_digit p256_mod[8] ALIGNED(8) = { 0xffffffff,0xffffffff,0xffffffff,0x00000000, 0x00000000,0x00000000,0x00000001,0xffffffff, }; @@ -903,7 +908,7 @@ static void sp_512to256_mont_reduce_8(sp_digit* r, sp_digit* a/*, const sp_digit * a First number to multiply in Montogmery form. * b Second number to multiply in Montogmery form. * m Modulus (prime). - * mp Montogmery mulitplier. + * mp Montogmery multiplier. */ static void sp_256_mont_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b /*, const sp_digit* m, sp_digit mp*/) @@ -920,7 +925,7 @@ static void sp_256_mont_mul_8(sp_digit* r, const sp_digit* a, const sp_digit* b * r Result of squaring. * a Number to square in Montogmery form. * m Modulus (prime). - * mp Montogmery mulitplier. + * mp Montogmery multiplier. */ static void sp_256_mont_sqr_8(sp_digit* r, const sp_digit* a /*, const sp_digit* m, sp_digit mp*/) @@ -1145,7 +1150,6 @@ static NOINLINE void sp_256_proj_point_add_8(sp_point* r, sp_point* p, sp_point* return; } - if (p->infinity || q->infinity) { *r = p->infinity ? 
*q : *p; /* struct copy */ return; From rep.dot.nop at gmail.com Sun Nov 28 09:53:22 2021 From: rep.dot.nop at gmail.com (Bernhard Reutner-Fischer) Date: Sun, 28 Nov 2021 10:53:22 +0100 Subject: [git commit] libarchive: remove duplicate forward declaration Message-ID: <20211128095007.D8F8E8D36F@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=bfefa6ab6cf30507009cca7182c7302900fb5534 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Signed-off-by: Bernhard Reutner-Fischer --- include/bb_archive.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/bb_archive.h b/include/bb_archive.h index dc5e55f0a..e0ef8fc4e 100644 --- a/include/bb_archive.h +++ b/include/bb_archive.h @@ -195,7 +195,6 @@ char get_header_ar(archive_handle_t *archive_handle) FAST_FUNC; char get_header_cpio(archive_handle_t *archive_handle) FAST_FUNC; char get_header_tar(archive_handle_t *archive_handle) FAST_FUNC; char get_header_tar_gz(archive_handle_t *archive_handle) FAST_FUNC; -char get_header_tar_xz(archive_handle_t *archive_handle) FAST_FUNC; char get_header_tar_bz2(archive_handle_t *archive_handle) FAST_FUNC; char get_header_tar_lzma(archive_handle_t *archive_handle) FAST_FUNC; char get_header_tar_xz(archive_handle_t *archive_handle) FAST_FUNC; From vda.linux at googlemail.com Sun Nov 28 10:15:34 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sun, 28 Nov 2021 11:15:34 +0100 Subject: [git commit] tls: P256: simplify sp_256_mont_inv_8 (no need for a temporary) Message-ID: <20211128101115.1778F8D47D@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=cfb615781df5c7439fe0060a85e6b6a56d10dc7f branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta sp_256_ecc_mulmod_8 543 517 -26 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 37e1cfa1c..9bd5c6832 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -938,7 +938,7 @@ static void sp_256_mont_sqr_8(sp_digit* r, const sp_digit* a /* Invert the number, in Montgomery form, modulo the modulus (prime) of the * P256 curve. (r = 1 / a mod m) * - * r Inverse result. + * r Inverse result. Must not coincide with a. * a Number to invert. */ #if 0 @@ -952,17 +952,15 @@ static void sp_256_mont_sqr_8(sp_digit* r, const sp_digit* a #endif static void sp_256_mont_inv_8(sp_digit* r, sp_digit* a) { - sp_digit t[8]; int i; - memcpy(t, a, sizeof(sp_digit) * 8); + memcpy(r, a, sizeof(sp_digit) * 8); for (i = 254; i >= 0; i--) { - sp_256_mont_sqr_8(t, t /*, p256_mod, p256_mp_mod*/); + sp_256_mont_sqr_8(r, r /*, p256_mod, p256_mp_mod*/); /*if (p256_mod_2[i / 32] & ((sp_digit)1 << (i % 32)))*/ if (i >= 224 || i == 192 || (i <= 95 && i != 1)) - sp_256_mont_mul_8(t, t, a /*, p256_mod, p256_mp_mod*/); + sp_256_mont_mul_8(r, r, a /*, p256_mod, p256_mp_mod*/); } - memcpy(r, t, sizeof(sp_digit) * 8); } /* Multiply a number by Montogmery normalizer mod modulus (prime). 
From vda.linux at googlemail.com Sun Nov 28 11:21:23 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sun, 28 Nov 2021 12:21:23 +0100 Subject: [git commit] libbb: code shrink in des encryption, in setup_salt() Message-ID: <20211128112225.5A6958F29C@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=00b5051cd25ef7e42ac62637ba16b70d3ac1014a branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta pw_encrypt 978 971 -7 .rodata 108208 108192 -16 des_crypt 1211 1181 -30 ------------------------------------------------------------------------------ (add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-53) Total: -53 bytes Signed-off-by: Denys Vlasenko --- libbb/pw_encrypt_des.c | 29 ++++++++++++++--------------- testsuite/cryptpw.tests | 14 ++++++++++++++ 2 files changed, 28 insertions(+), 15 deletions(-) diff --git a/libbb/pw_encrypt_des.c b/libbb/pw_encrypt_des.c index dcd3521e2..fe8237cfe 100644 --- a/libbb/pw_encrypt_des.c +++ b/libbb/pw_encrypt_des.c @@ -363,7 +363,7 @@ des_init(struct des_ctx *ctx, const struct const_des_ctx *cctx) old_rawkey0 = old_rawkey1 = 0; old_salt = 0; #endif - saltbits = 0; + //saltbits = 0; /* not needed: we call setup_salt() before do_des() */ bits28 = bits32 + 4; bits24 = bits28 + 4; @@ -481,12 +481,11 @@ des_init(struct des_ctx *ctx, const struct const_des_ctx *cctx) return ctx; } - +/* Accepts 24-bit salt at max */ static void setup_salt(struct des_ctx *ctx, uint32_t salt) { - uint32_t obit, saltbit; - int i; + uint32_t invbits; #if USE_REPETITIVE_SPEEDUP if (salt == old_salt) @@ -494,15 +493,15 @@ setup_salt(struct des_ctx *ctx, uint32_t salt) old_salt = salt; #endif - saltbits = 0; - saltbit = 1; - obit = 0x800000; - for (i = 0; i < 24; i++) { - if (salt & saltbit) - saltbits |= obit; - saltbit <<= 1; - obit >>= 1; - } + invbits = 0; + + salt |= (1 << 24); + do { + invbits = (invbits << 1) + (salt & 1); + salt >>= 1; + } while (salt != 1); + + saltbits = invbits; } static void @@ -736,14 +735,14 @@ des_crypt(struct des_ctx *ctx, char output[DES_OUT_BUFSIZE], des_setkey(ctx, (char *)keybuf); /* - * salt_str - 2 bytes of salt + * salt_str - 2 chars of salt (converted to 12 bits) * key - up to 8 characters */ output[0] = salt_str[0]; output[1] = salt_str[1]; salt = (ascii_to_bin(salt_str[1]) << 6) | ascii_to_bin(salt_str[0]); - setup_salt(ctx, salt); + setup_salt(ctx, salt); /* set ctx->saltbits for do_des() */ /* Do it. 
*/ do_des(ctx, /*0, 0,*/ &r0, &r1, 25 /* count */); diff --git a/testsuite/cryptpw.tests b/testsuite/cryptpw.tests index 8ec476c9f..0dd91fe15 100755 --- a/testsuite/cryptpw.tests +++ b/testsuite/cryptpw.tests @@ -7,6 +7,20 @@ # testing "description" "command" "result" "infile" "stdin" +#optional USE_BB_CRYPT +testing "cryptpw des 12" \ + "cryptpw -m des QWErty '123456789012345678901234567890'" \ + '12MnB3PqfVbMA\n' "" "" + +testing "cryptpw des 55" \ + "cryptpw -m des QWErty 55" \ + '55tgFLtkT1Y72\n' "" "" + +testing "cryptpw des zz" \ + "cryptpw -m des QWErty zz" \ + 'zzIZaaXWOkxVk\n' "" "" +#SKIP= + optional USE_BB_CRYPT_SHA testing "cryptpw sha256" \ "cryptpw -m sha256 QWErty '123456789012345678901234567890'" \ From vda.linux at googlemail.com Sun Nov 28 11:55:20 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sun, 28 Nov 2021 12:55:20 +0100 Subject: [git commit] tls: P256: add comment on logic in sp_512to256_mont_reduce_8, no code changes Message-ID: <20211128115118.298808F2A3@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=832626227ea3798403159080532f763a37273a91 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index 9bd5c6832..eb6cc2431 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -850,6 +850,20 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) * a Double-wide number to reduce. Clobbered. * m The single precision number representing the modulus. * mp The digit representing the negative inverse of m mod 2^n. + * + * Montgomery reduction on multiprecision integers: + * Montgomery reduction requires products modulo R. + * When R is a power of B [in our case R=2^128, B=2^32], there is a variant + * of Montgomery reduction which requires products only of machine word sized + * integers. T is stored as an little-endian word array a[0..n]. The algorithm + * reduces it one word at a time. First an appropriate multiple of modulus + * is added to make T divisible by B. [In our case, it is p256_mp_mod * a[0].] + * Then a multiple of modulus is added to make T divisible by B^2. + * [In our case, it is (p256_mp_mod * a[1]) << 32.] + * And so on. Eventually T is divisible by R, and after division by R + * the algorithm is in the same place as the usual Montgomery reduction was. + * + * TODO: Can conditionally use 64-bit (if bit-little-endian arch) logic? */ static void sp_512to256_mont_reduce_8(sp_digit* r, sp_digit* a/*, const sp_digit* m, sp_digit mp*/) { @@ -941,15 +955,6 @@ static void sp_256_mont_sqr_8(sp_digit* r, const sp_digit* a * r Inverse result. Must not coincide with a. * a Number to invert. 
*/ -#if 0 -//p256_mod - 2: -//ffffffff 00000001 00000000 00000000 00000000 ffffffff ffffffff ffffffff - 2 -//Bit pattern: -//2 2 2 2 2 2 2 1...1 -//5 5 4 3 2 1 0 9...0 9...1 -//543210987654321098765432109876543210987654321098765432109876543210...09876543210...09876543210 -//111111111111111111111111111111110000000000000000000000000000000100...00000111111...11111111101 -#endif static void sp_256_mont_inv_8(sp_digit* r, sp_digit* a) { int i; @@ -957,7 +962,15 @@ static void sp_256_mont_inv_8(sp_digit* r, sp_digit* a) memcpy(r, a, sizeof(sp_digit) * 8); for (i = 254; i >= 0; i--) { sp_256_mont_sqr_8(r, r /*, p256_mod, p256_mp_mod*/); - /*if (p256_mod_2[i / 32] & ((sp_digit)1 << (i % 32)))*/ +/* p256_mod - 2: + * ffffffff 00000001 00000000 00000000 00000000 ffffffff ffffffff ffffffff - 2 + * Bit pattern: + * 2 2 2 2 2 2 2 1...1 + * 5 5 4 3 2 1 0 9...0 9...1 + * 543210987654321098765432109876543210987654321098765432109876543210...09876543210...09876543210 + * 111111111111111111111111111111110000000000000000000000000000000100...00000111111...11111111101 + */ + /*if (p256_mod_minus_2[i / 32] & ((sp_digit)1 << (i % 32)))*/ if (i >= 224 || i == 192 || (i <= 95 && i != 1)) sp_256_mont_mul_8(r, r, a /*, p256_mod, p256_mp_mod*/); } From vda.linux at googlemail.com Sun Nov 28 14:44:08 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sun, 28 Nov 2021 15:44:08 +0100 Subject: [git commit] tls: P256: add 64-bit montgomery reduce (disabled), small optimization in 32-bit code Message-ID: <20211128144102.094C390981@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=90b0d3304455ad432c49f38e0419ac7820a625f7 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master function old new delta sp_512to256_mont_reduce_8 191 185 -6 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 177 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 159 insertions(+), 18 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index eb6cc2431..b1c410037 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -705,36 +705,174 @@ static void sp_256_mont_tpl_8(sp_digit* r, const sp_digit* a /*, const sp_digit* } } -/* Shift the result in the high 256 bits down to the bottom. - */ +/* Shift the result in the high 256 bits down to the bottom. */ static void sp_512to256_mont_shift_8(sp_digit* r, sp_digit* a) { memcpy(r, a + 8, sizeof(*r) * 8); } +// Disabled for now. Seems to work, but ugly and 40 bytes larger on x86-64. +#if 0 //UNALIGNED_LE_64BIT +/* 64-bit little-endian optimized version. + * See generic 32-bit version below for explanation. + * The benefit of this version is: even though r[3] calculation is atrocious, + * we call sp_256_mul_add_4() four times, not 8. 
+ */ +static int sp_256_mul_add_4(uint64_t *r /*, const uint64_t* a, uint64_t b*/) +{ + uint64_t b = r[0]; + +# if 0 + const uint64_t* a = (const void*)p256_mod; +//a[3..0] = ffffffff00000001 0000000000000000 00000000ffffffff ffffffffffffffff + uint128_t t; + int i; + t = 0; + for (i = 0; i < 4; i++) { + uint32_t t_hi; + uint128_t m = ((uint128_t)b * a[i]) + r[i]; + t += m; + t_hi = (t < m); + r[i] = (uint64_t)t; + t = (t >> 64) | ((uint128_t)t_hi << 64); + } + r[4] += (uint64_t)t; + return (r[4] < (uint64_t)t); /* 1 if addition overflowed */ +# else + // Unroll, then optimize the above loop: + //uint32_t t_hi; + //uint128_t m; + uint64_t t64, t64u; + + //m = ((uint128_t)b * a[0]) + r[0]; + // Since b is r[0] and a[0] is ffffffffffffffff, the above optimizes to: + // m = r[0] * ffffffffffffffff + r[0] = (r[0] << 64 - r[0]) + r[0] = r[0] << 64; + //t += m; + // t = r[0] << 64 = b << 64; + //t_hi = (t < m); + // t_hi = 0; + //r[0] = (uint64_t)t; +// r[0] = 0; +//the store can be eliminated since caller won't look at lower 256 bits of the result + //t = (t >> 64) | ((uint128_t)t_hi << 64); + // t = b; + + //m = ((uint128_t)b * a[1]) + r[1]; + // Since a[1] is 00000000ffffffff, the above optimizes to: + // m = b * ffffffff + r[1] = (b * 100000000 - b) + r[1] = (b << 32) - b + r[1]; + //t += m; + // t = b + (b << 32) - b + r[1] = (b << 32) + r[1]; + //t_hi = (t < m); + // t_hi = 0; + //r[1] = (uint64_t)t; + r[1] += (b << 32); + //t = (t >> 64) | ((uint128_t)t_hi << 64); + t64 = (r[1] < (b << 32)); + t64 += (b >> 32); + + //m = ((uint128_t)b * a[2]) + r[2]; + // Since a[2] is 0000000000000000, the above optimizes to: + // m = b * 0 + r[2] = r[2]; + //t += m; + // t = t64 + r[2]; + //t_hi = (t < m); + // t_hi = 0; + //r[2] = (uint64_t)t; + r[2] += t64; + //t = (t >> 64) | ((uint128_t)t_hi << 64); + t64 = (r[2] < t64); + + //m = ((uint128_t)b * a[3]) + r[3]; + // Since a[3] is ffffffff00000001, the above optimizes to: + // m = b * ffffffff00000001 + r[3]; + // m = b + b*ffffffff00000000 + r[3] + // m = b + (b*ffffffff << 32) + r[3] + // m = b + (((b<<32) - b) << 32) + r[3] + //t += m; + // t = t64 + (uint128_t)b + ((((uint128_t)b << 32) - b) << 32) + r[3]; + t64 += b; + t64u = (t64 < b); + t64 += r[3]; + t64u += (t64 < r[3]); + { + uint64_t lo,hi; + //lo = (((b << 32) - b) << 32 + //hi = (((uint128_t)b << 32) - b) >> 32 + //but without uint128_t: + hi = (b << 32) - b; /* form lower 32 bits of "hi" part 1 */ + b = (b >> 32) - (/*borrowed above?*/(b << 32) < b); /* upper 32 bits of "hi" are in b */ + lo = hi << 32; /* (use "hi" value to calculate "lo",... */ + t64 += lo; /* ...consume... */ + t64u += (t64 < lo); /* ..."lo") */ + hi >>= 32; /* form lower 32 bits of "hi" part 2 */ + hi |= (b << 32); /* combine lower and upper */ + t64u += hi; /* consume "hi" */ + } + //t_hi = (t < m); + // t_hi = 0; + //r[3] = (uint64_t)t; + r[3] = t64; + //t = (t >> 64) | ((uint128_t)t_hi << 64); + // t = t64u; + + r[4] += t64u; + return (r[4] < t64u); /* 1 if addition overflowed */ +# endif +} + +static void sp_512to256_mont_reduce_8(sp_digit* r, sp_digit* aa/*, const sp_digit* m, sp_digit mp*/) +{ +// const sp_digit* m = p256_mod; + int i; + uint64_t *a = (void*)aa; + + sp_digit carry = 0; + for (i = 0; i < 4; i++) { +// mu = a[i]; + if (sp_256_mul_add_4(a+i /*, m, mu*/)) { + int j = i + 4; + inc_next_word: + if (++j > 7) { /* a[8] array has no more words? */ + carry++; + continue; + } + if (++a[j] == 0) /* did this overflow too? 
*/ + goto inc_next_word; + } + } + sp_512to256_mont_shift_8(r, aa); + if (carry != 0) + sp_256_sub_8_p256_mod(r); + sp_256_norm_8(r); +} + +#else /* Generic 32-bit version */ + /* Mul a by scalar b and add into r. (r += a * b) * a = p256_mod * b = r[0] */ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) { -// const sp_digit* a = p256_mod; -//a[7..0] = ffffffff 00000001 00000000 00000000 00000000 ffffffff ffffffff ffffffff sp_digit b = r[0]; - uint64_t t; -// t = 0; -// for (i = 0; i < 8; i++) { -// uint32_t t_hi; -// uint64_t m = ((uint64_t)b * a[i]) + r[i]; -// t += m; -// t_hi = (t < m); -// r[i] = (sp_digit)t; -// t = (t >> 32) | ((uint64_t)t_hi << 32); -// } -// r[8] += (sp_digit)t; - +# if 0 + const sp_digit* a = p256_mod; +//a[7..0] = ffffffff 00000001 00000000 00000000 00000000 ffffffff ffffffff ffffffff + int i; + t = 0; + for (i = 0; i < 8; i++) { + uint32_t t_hi; + uint64_t m = ((uint64_t)b * a[i]) + r[i]; + t += m; + t_hi = (t < m); + r[i] = (sp_digit)t; + t = (t >> 32) | ((uint64_t)t_hi << 32); + } + r[8] += (sp_digit)t; + return (r[8] < (sp_digit)t); /* 1 if addition overflowed */ +# else // Unroll, then optimize the above loop: //uint32_t t_hi; uint64_t m; @@ -748,7 +886,8 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) //t_hi = (t < m); // t_hi = 0; //r[0] = (sp_digit)t; - r[0] = 0; +// r[0] = 0; +//the store can be eliminated since caller won't look at lower 256 bits of the result //t = (t >> 32) | ((uint64_t)t_hi << 32); // t = b; @@ -840,6 +979,7 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) r[8] += (sp_digit)t; return (r[8] < (sp_digit)t); /* 1 if addition overflowed */ +# endif } /* Reduce the number back to 256 bits using Montgomery reduction. @@ -861,7 +1001,7 @@ static int sp_256_mul_add_8(sp_digit* r /*, const sp_digit* a, sp_digit b*/) * Then a multiple of modulus is added to make T divisible by B^2. * [In our case, it is (p256_mp_mod * a[1]) << 32.] * And so on. Eventually T is divisible by R, and after division by R - * the algorithm is in the same place as the usual Montgomery reduction was. + * the algorithm is in the same place as the usual Montgomery reduction. * * TODO: Can conditionally use 64-bit (if bit-little-endian arch) logic? */ @@ -914,6 +1054,7 @@ static void sp_512to256_mont_reduce_8(sp_digit* r, sp_digit* a/*, const sp_digit sp_256_norm_8(r); } } +#endif /* Multiply two Montogmery form numbers mod the modulus (prime). * (r = a * b mod m) From vda.linux at googlemail.com Sun Nov 28 20:43:51 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Sun, 28 Nov 2021 21:43:51 +0100 Subject: [git commit] tls: P256: enable 64-bit version of montgomery reduction Message-ID: <20211128204212.CC9C390AC1@busybox.osuosl.org> commit: https://git.busybox.net/busybox/commit/?id=8514b4166d7a9d7720006d852ae67f43baed8ef1 branch: https://git.busybox.net/busybox/commit/?id=refs/heads/master After more testing, (1) I'm more sure it is indeed correct, and (2) it is a significant speedup - we do a lot of those multiplications. 
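[Editor's note, not part of the original mail: for readers following along, the word-by-word reduction these commits keep specializing can be summarized by a small generic sketch. The version below is illustrative only: it uses the textbook form with an explicit mp multiplier and 8x32-bit limbs, whereas the busybox code hardwires p256_mod (where mp == 1 and most modulus words are 0 or 0xffffffff) and, after this commit, also has a 4x64-bit variant. The function name, array layout and return convention here are made up for the example.]

#include <stdint.h>
#include <string.h>

/* Illustrative sketch, not busybox code: reduce a 512-bit value t[0..15]
 * modulo a 256-bit modulus m[0..7], where mp == -m^-1 mod 2^32.
 * Returns 1 if the caller still has to subtract m once more. */
static int mont_reduce_sketch(uint32_t t[16], const uint32_t m[8], uint32_t mp)
{
	uint32_t carry = 0;
	int i, j;

	for (i = 0; i < 8; i++) {
		/* multiple of m that makes t[i] divisible by 2^32 */
		uint32_t mu = t[i] * mp;
		uint64_t acc = 0;

		for (j = 0; j < 8; j++) {
			/* t += (mu * m) << (32*i), one word at a time */
			acc += (uint64_t)mu * m[j] + t[i + j];
			t[i + j] = (uint32_t)acc;
			acc >>= 32;
		}
		/* fold the leftover carry into the next word up; the carry
		 * produced here belongs one word higher still, so defer it
		 * to the next iteration (busybox instead propagates it at
		 * once via its inc_next_word loop) */
		acc += (uint64_t)t[i + 8] + carry;
		t[i + 8] = (uint32_t)acc;
		carry = (uint32_t)(acc >> 32);
	}
	/* t is now divisible by 2^256: the result is its high half */
	memcpy(t, t + 8, 8 * sizeof(t[0]));
	return carry != 0;
}

[With mp fixed to 1 and a modulus whose words are mostly 0 or 0xffffffff, nearly all of the multiplications above collapse into shifts and additions, which is what the unrolled sp_256_mul_add_8 (32-bit) and sp_256_mul_add_4 (64-bit) routines in the patches exploit.]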
function old new delta sp_512to256_mont_reduce_8 191 223 +32 Signed-off-by: Denys Vlasenko --- networking/tls_sp_c32.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/networking/tls_sp_c32.c b/networking/tls_sp_c32.c index b1c410037..cb166e413 100644 --- a/networking/tls_sp_c32.c +++ b/networking/tls_sp_c32.c @@ -711,12 +711,13 @@ static void sp_512to256_mont_shift_8(sp_digit* r, sp_digit* a) memcpy(r, a + 8, sizeof(*r) * 8); } -// Disabled for now. Seems to work, but ugly and 40 bytes larger on x86-64. -#if 0 //UNALIGNED_LE_64BIT +#if UNALIGNED_LE_64BIT /* 64-bit little-endian optimized version. * See generic 32-bit version below for explanation. * The benefit of this version is: even though r[3] calculation is atrocious, * we call sp_256_mul_add_4() four times, not 8. + * Measured run time improvement of curve_P256_compute_pubkey_and_premaster() + * call on x86-64: from ~1500us to ~900us. Code size +32 bytes. */ static int sp_256_mul_add_4(uint64_t *r /*, const uint64_t* a, uint64_t b*/) { @@ -794,18 +795,18 @@ static int sp_256_mul_add_4(uint64_t *r /*, const uint64_t* a, uint64_t b*/) t64u = (t64 < b); t64 += r[3]; t64u += (t64 < r[3]); - { - uint64_t lo,hi; + { // add ((((uint128_t)b << 32) - b) << 32): + uint64_t lo, hi; //lo = (((b << 32) - b) << 32 //hi = (((uint128_t)b << 32) - b) >> 32 //but without uint128_t: - hi = (b << 32) - b; /* form lower 32 bits of "hi" part 1 */ + hi = (b << 32) - b; /* make lower 32 bits of "hi", part 1 */ b = (b >> 32) - (/*borrowed above?*/(b << 32) < b); /* upper 32 bits of "hi" are in b */ lo = hi << 32; /* (use "hi" value to calculate "lo",... */ t64 += lo; /* ...consume... */ t64u += (t64 < lo); /* ..."lo") */ - hi >>= 32; /* form lower 32 bits of "hi" part 2 */ - hi |= (b << 32); /* combine lower and upper */ + hi >>= 32; /* make lower 32 bits of "hi", part 2 */ + hi |= (b << 32); /* combine lower and upper 32 bits */ t64u += hi; /* consume "hi" */ } //t_hi = (t < m); From bugzilla at busybox.net Mon Nov 29 08:18:13 2021 From: bugzilla at busybox.net (bugzilla at busybox.net) Date: Mon, 29 Nov 2021 08:18:13 +0000 Subject: [Bug 14401] New: The unzip Binary is from older version which has security vulnerabilities. Message-ID: https://bugs.busybox.net/show_bug.cgi?id=14401 Bug ID: 14401 Summary: The unzip Binary is from older version which has security vulnerabilities. Product: Busybox Version: 1.33.x Hardware: All OS: All Status: NEW Severity: major Priority: P5 Component: Other Assignee: unassigned at busybox.net Reporter: sandep121 at gmail.com CC: busybox-cvs at busybox.net Target Milestone: --- Current unzip binary has following security issues: https://nvd.nist.gov/vuln/detail/CVE-2005-0602 https://nvd.nist.gov/vuln/detail/CVE-2001-1268 https://nvd.nist.gov/vuln/detail/CVE-2001-1269 -- You are receiving this mail because: You are on the CC list for the bug. From vda.linux at googlemail.com Tue Nov 30 22:41:13 2021 From: vda.linux at googlemail.com (Denys Vlasenko) Date: Tue, 30 Nov 2021 23:41:13 +0100 Subject: [git commit] Announce 1.33.2 Message-ID: <20211130223754.BABBD92198@busybox.osuosl.org> commit: https://git.busybox.net/busybox-website/commit/?id=76db5f4e96656f869defbbddf2d572a87daf3ddd branch: https://git.busybox.net/busybox-website/commit/?id=refs/heads/master Signed-off-by: Denys Vlasenko --- news.html | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/news.html b/news.html index bdffc39..fc1f8fb 100644 --- a/news.html +++ b/news.html @@ -34,6 +34,16 @@

+
  • 30 November 2021 -- BusyBox 1.33.2 (stable)

    BusyBox 1.33.2. (git)

    Bug fix release. 1.33.2 has fixes for hush and ash (parsing fixes) and
    unlzma (fix a case where we could read before beginning of buffer).

  • 30 September 2021 -- BusyBox 1.34.1 (stable)

    BusyBox 1.34.1. (git)