Proposal for a new applet: strings

Sat Jul 22 17:31:28 UTC 2023

On Sat, 22 Jul 2023 at 15:40, tito <farmatito at tiscali.it> wrote:

> Hi,
>
> I'm not the maintainer so I can say nothing about integration,
> I can just point out things that look strange to me and my limited knowledge.
> When I read that this code is faster vs other code as I'm a curious
> person I just try to see how much faster it is and why as there
> is always something to learn on the busybox mailing list.
> If in my little tests it is not faster then I think I'm entitled
> to ask questions about it as science results should be reproducible.
>
> For simple benchmarking maybe reading a big enough file
> into memory and feeding it to strings in a few 1000 iterations
> should do to avoid bias from hdd/sdd and system load, one shot shows:
>
> ramtmp="$(mktemp -p /dev/shm/)"
>  dd if=vmlinux.o of=$ramtmp
> echo $ramtmp
> /dev/shm/tmp.ll3G2kzKE1
>
> 1) coreutils strings
> time  strings $ramtmp > /dev/null

This is not representative, because you are reading the input file
from tmpfs, which is not how strings is used in almost all cases: the
input is sometimes in ramfs, but usually not. Sending the output to
tmpfs, on the other hand, makes perfect sense, especially on devices
where the hdd/ssd/flash is particularly slow. After all, the strings
output is temporary by nature and, IMHO, is usually piped to grep.

>
> of course a few more iterations would give statistically better results.

The suite I provided with benchmark.sh is the answer: running it with
cache dropping enabled and disabled checks the two most important
system states, which cover all the cases that matter in real life,
AFAIK.
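
For reference, a minimal sketch of how those two system states (cold
and warm page cache) could be prepared before a timed run. This is not
the actual benchmark.sh: prepare_cache is a hypothetical helper name,
and the cold case assumes Linux and root, since it writes to
/proc/sys/vm/drop_caches.

```shell
#!/bin/sh
# Hypothetical helper, not taken from benchmark.sh.
# prepare_cache MODE FILE
#   cold: flush dirty pages, then drop the page cache (Linux, needs root)
#   warm: read FILE once so that the next read is served from the cache
prepare_cache() {
    case "$1" in
        cold) sync && echo 3 > /proc/sys/vm/drop_caches ;;
        warm) cat "$2" > /dev/null ;;
        *)    echo "usage: prepare_cache cold|warm FILE" >&2; return 2 ;;
    esac
}

# Possible usage (file name is only an example):
#   prepare_cache cold vmlinux.o && time strings vmlinux.o > /dev/null
#   prepare_cache warm vmlinux.o && time strings vmlinux.o > /dev/null
```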

> 2)  busybox strings vs  new strings:
>
> for i in $list; do if test -f $i; then  ./Desktop/strings $i > out1.txt; ./Desktop/busybox strings $i > out2.txt; diff -q out1.txt out2.txt; fi; done
> Files out1.txt and out2.txt differ

Confirmed: there are some differences in the output, found with this:

for i in /usr/bin/*; do
  if test -f "$i"; then
    ./strings "$i" > out1.txt
    busybox strings "$i" > out2.txt
    diff -q out1.txt out2.txt || break
  fi
done

diff -pruN out1.txt out2.txt

Particularly long strings, of more than 4096 characters, are split
into blocks separated by \n. It is clearly a corner case in which the
\n should be omitted when printing. Thanks for this test: I ran some
tests myself, but I did not catch a string overrunning the 4096 buffer.
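
A small sketch of how this corner case could be reproduced without
scanning /usr/bin: build a file containing a single printable run
longer than 4096 characters and count the lines each implementation
prints. The helper names (make_long_run, count_lines) and the 5000
length are my assumptions, chosen only to exceed the buffer size.

```shell
#!/bin/sh
# make_long_run FILE: write one 5000-character printable run,
# delimited by a NUL byte on each side, into FILE.
make_long_run() {
    {
        printf '\000'
        head -c 5000 /dev/zero | tr '\000' 'A'
        printf '\000'
    } > "$1"
}

# count_lines FILE CMD...: number of output lines CMD prints for FILE.
# The run should come out as a single line, so the expected count is 1;
# a larger count means the run was split into blocks.
count_lines() {
    file="$1"; shift
    "$@" "$file" | wc -l | tr -d ' '
}

# Possible usage:
#   f="$(mktemp)"; make_long_run "$f"
#   count_lines "$f" ./strings
#   count_lines "$f" busybox strings
```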

> I suspect this could be a problem for integration  and also size of code after integration is relevant.

It is a corner case that could be addressed. I did not check the size
of strings in busybox. However, once it is confirmed that size matters
more than speed for busybox (I agree on this), it can be proposed to
binutils (or coreutils), depending on which package includes strings.
For aarch64, I found the binary in binutils, AFAIR.

Best regards, R-