Proposal for a new applet: strings

Sun Jul 23 11:18:09 UTC 2023

On Sun, 23 Jul 2023 12:00:56 +0200
"Roberto A. Foglietta" <roberto.foglietta at gmail.com> wrote:

> On Sun, 23 Jul 2023 at 11:42, tito <farmatito at tiscali.it> wrote:
> >
> > On Sun, 23 Jul 2023 00:36:09 +0200
> > "Roberto A. Foglietta" <roberto.foglietta at gmail.com> wrote:
> >
> > > On Sat, 22 Jul 2023 at 21:29, tito <farmatito at tiscali.it> wrote:
> > > >
> > > > On Sat, 22 Jul 2023 19:31:28 +0200
> > > > "Roberto A. Foglietta" <roberto.foglietta at gmail.com> wrote:
> > > >
> > > > > On Sat, 22 Jul 2023 at 15:40, tito <farmatito at tiscali.it> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm not the maintainer, so I can say nothing about integration;
> > > > > > I can just point out things that look strange to me, given my limited knowledge.
> > > > > > When I read that this code is faster than other code, being a curious
> > > > > > person, I just try to see how much faster it is and why, as there
> > > > > > is always something to learn on the busybox mailing list.
> > > > > > If in my little tests it is not faster, then I think I'm entitled
> > > > > > to ask questions about it, as scientific results should be reproducible.
> > > > > >
> > > > > > For simple benchmarking, reading a big enough file
> > > > > > into memory and feeding it to strings for a few thousand iterations
> > > > > > should do, to avoid bias from hdd/ssd and system load. One shot shows:
> > > > > >
> > > > > > ramtmp="$(mktemp -p /dev/shm/)"
> > > > > >  dd if=vmlinux.o of=$ramtmp
> > > > > > echo $ramtmp
> > > > > > /dev/shm/tmp.ll3G2kzKE1
> > > > > >
> > > > > > 1) coreutils strings
> > > > > > time  strings $ramtmp > /dev/null
> > > > >
> > > > > This is not correct because you are reading a file in tmpfs while the
> > > >
> > > > Yes, this was exactly the purpose of the test: to eliminate all
> > > > factors connected to the underlying block devices and to time
> > > > the code speed of the different implementations.
> > > >
> > >
> > > Which is wrong, because you made an assumption that is far from
> > > typical usage, and in some cases you cannot even use it, because a
> > > 4GB ISO image would not necessarily fit into a tmpfs on every
> > > system. Abstract benchmarks can be fun but usually do not
> > > depict/measure reality. By the same logic, we could trash Ohm's law
> > > because we can reach near-zero temperature in the laboratory!
> >
> > I see, but dropping the caches etc. doesn't seem to be a typical use case either.
> 
> Dropping the cache is a trick to bring the system back to its state
> just after boot, or as close to it as possible. It is indispensable for
> a comparison with normal operation, which has a larger
> variance in completion time across runs.
> 
> >
> > Using the same optimization flag -O3, the busybox applet on a real-life
> > system gives close empirical results, which is what most
> > people will see in their normal use cases (one shot, no loops running,
> > no files in memory, no dropped caches, no giant multi-GB files),
> > so the performance increase is swallowed by the system
> > or by other bottlenecks.
> >
> 
> This is correct. AFAIK my busybox has been compiled with -O2; I have to check.
> 
> 
> > I think the size will rather increase as there are a bunch of features
> > missing that the original bb implementation already has:
> >
> > 1) multiple file handling (a must, I would dare to say)
> 
> Which is not such a problem, after all
> 
> for i in "$@"; do simply-strings "$i" | sed -e "s/^/$i:/"; done
> 
> The sed also puts the file name in front of each string, which
> is useful for grepping. However, the single-file limitation pushes you
> to customize the approach:
> 
> for i in "$@"; do simply-strings "$i" | grep -w "word" && break; done; echo $i

Don't cheat, this change would break other people's scripts.
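
For what it's worth, a wrapper would also have to survive real paths: the
sed one-liner above fails as soon as "$i" contains a '/', because the path
ends up inside the s/// expression. A minimal sketch of a safer wrapper,
assuming the tool takes exactly one file argument; strings_multi is a
hypothetical name, and the "file: string" prefix imitates what GNU
strings -f prints:

```shell
# strings_multi CMD FILE...
# Hypothetical helper: runs a single-file command CMD on each FILE and
# prefixes every output line with "FILE: ".
strings_multi() {
    cmd=$1
    shift
    for f in "$@"; do
        # printf keeps the path out of any regex or sed expression,
        # so slashes and spaces in "$f" are harmless.
        "$cmd" "$f" | while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
}
```

Usage would be something like:
strings_multi simply-strings /bin/ls /bin/cat | grep -w "word"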

> For example. However, I admit that you are right about multiple-file
> input. Personally, I do not need it at all, and if I do, I use a
> custom for loop.
> 
> 
> > 2) -a -f -o -n -t command line options
> > The options are:
> >   -a - --all                Scan the entire file, not just the data section [default]
> >   -f --print-file-name      Print the name of the file before each string
> >   -n --bytes=[number]       Locate & print any NUL-terminated sequence of at
> >                                                least [number] characters (default 4).
> >   -t --radix={o,d,x}        Print the location of the string in base 8, 10 or 16
> >   -o                        An alias for --radix=o
> >
> 
> Yes, strings has a lot of options, and the busybox version also has
> several. This is the strongest criticism against proceeding with an
> integration. I will check if I can put an optimization into bb strings,
> just for my own curiosity.

This would be far better than reinventing the wheel.

> 
> > 3) output compatible with original gnu strings
> >
> > > In attachment the new version with the test suite and the benchmark
> > > suite in the header. The benchmark suite did not change with respect
> > > to the script file I just sent.
> > >
> > > Best regards, R-
> >
> > BTW: there still seem to be corner-cases:
> > list=`find /usr`
> > for i in $list; do if test -f $i; then  ./strings $i > out1.txt; strings $i > out2.txt; diff -q out1.txt out2.txt; fi; done
> > Files out1.txt and out2.txt differ
> > Files out1.txt and out2.txt differ
> > Files out1.txt and out2.txt differ
> > Files out1.txt and out2.txt differ
> >
> > test is still running....
> 
> ok, I will do a run. Can you please echo the filenames instead?
> 
> for i in $list; do if test -f $i; then  ./strings $i > out1.txt;
> strings $i > out2.txt; diff -q out1.txt out2.txt >/dev/null || echo
> $i; fi; done
> 
> Thanks, R-

If you hire me as a beta tester... at least you owe me a beer if we ever meet in person.

root at devuan:/home/tito/Desktop# for i in $list; do if test -f $i; then  ./strings $i > out1.txt; strings $i > out2.txt; diff -q out1.txt out2.txt; if test $? -eq 1 ; then echo $i; fi; fi; done
Files out1.txt and out2.txt differ
/usr/share/themes/Adapta-Nokto-Eta/gtk-3.24/gtk.gresource
Files out1.txt and out2.txt differ
/usr/share/themes/Adapta/gtk-3.24/gtk.gresource
Files out1.txt and out2.txt differ
/usr/share/themes/Adapta-Nokto/gtk-3.24/gtk.gresource
Files out1.txt and out2.txt differ
/usr/share/themes/Adapta-Eta/gtk-3.24/gtk.gresource
Files out1.txt and out2.txt differ
/usr/lib/x86_64-linux-gnu/libkomsooxml.so.17.0.0
Files out1.txt and out2.txt differ
/usr/lib/x86_64-linux-gnu/libkomsooxml.so.17
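
Note that the loop above word-splits $list, so any path containing
whitespace is skipped or misread. A minimal sketch of a variant that
avoids this, assuming both implementations take a single file argument;
compare_impls is a hypothetical helper name, not something from the patch:

```shell
# compare_impls DIR CMD_A CMD_B
# Hypothetical helper: prints the path of every regular file under DIR
# for which CMD_A and CMD_B produce different output.
compare_impls() {
    dir=$1 cmd_a=$2 cmd_b=$3
    # find passes the paths as arguments, so no word splitting occurs.
    find "$dir" -type f -exec sh -c '
        cmd_a=$1; cmd_b=$2; shift 2
        out_a=$(mktemp) out_b=$(mktemp)
        for f in "$@"; do
            "$cmd_a" "$f" > "$out_a" 2>/dev/null
            "$cmd_b" "$f" > "$out_b" 2>/dev/null
            # cmp -s is silent and returns non-zero when the outputs differ.
            cmp -s "$out_a" "$out_b" || printf "%s\n" "$f"
        done
        rm -f "$out_a" "$out_b"
    ' sh "$cmd_a" "$cmd_b" {} +
}
```

Something like "compare_impls /usr ./strings strings" should then list
only the files whose output differs.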

Ciao,
Tito