[git commit] use_less_ram.html: explain SORT_BY_ALIGNMENT trick

Denys Vlasenko vda.linux at googlemail.com
Sat Apr 23 17:14:05 UTC 2016


commit: https://git.busybox.net/busybox-website/commit/?id=72428d2157555b893c4d1424870cf5d1608a7c7a
branch: https://git.busybox.net/busybox-website/commit/?id=refs/heads/master

Signed-off-by: Denys Vlasenko <vda.linux at googlemail.com>
---
 use_less_ram.html | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/use_less_ram.html b/use_less_ram.html
index da92292..247c3a4 100644
--- a/use_less_ram.html
+++ b/use_less_ram.html
@@ -5,7 +5,7 @@
 <p>
 Busybox is designed to be frugal with memory usage. However, it is still
 written in C. Compiler and linker are usually not written with a focus
-to strongly minimize memory usage. This can be helped, though.
+to strongly minimize memory usage.
 </p>
 
 <h4>Overview of RAM usage by Linux binaries</h4>
@@ -70,6 +70,17 @@ more data+bss pages than necessary.
 It is possible to have a Busybox binary with almost all applets and only
 8 kilobytes of data+bss.
 </p>
+
+<h4>Using libc which is better at using RAM sparingly</h4>
+<p>
+[TODO:] Static builds are better than dynamic ones. Glibc is horrible for static builds,
+though. Ideally, libs should not initialize "[heap]" before user program calls malloc().
+Libc should aggressively prune (return to OS) freed malloc space (glibc defaults are bad).
+Even though we don't need that in Busybox, there should be a method to prune
+used (dirtied) stack space after the use of deep recursion or large on-stack objects.
+</p><p>
+</p>
+
 <h4>Optimizing start of data</h4>
 <p>
 A simple solution for RO/RW mappings for text and data would be to
@@ -107,8 +118,32 @@ and replace it with:
 </p><p>
 <pre>. = ALIGN (0x1000); . = DATA_SEGMENT_ALIGN (0x1000, 0x1000);</pre>
 </p>
+
 <h4>Reducing padding between data</h4>
 <p>
+Busybox is linked with "--sort-section alignment". This helps, but unfortunately
+ld does not do a good enough job on its own. It's possible to help it.
+</p><p>
+Output sections in the executable are specified in linker script as follows:
+</p><p>
+<pre>.rodata : { *(.rodata .rodata.*) }</pre>
+</p><p>
+If the bare section wildcards in such rules are replaced with "SORT_BY_ALIGNMENT(wildcard)",
+the result is often more compact.
+In practice, the vast majority of sections which benefit from alignment sorting
+are .rodata, .data and .bss ones, thus it's enough to change only three locations
+in the linker script:
+</p><p>
+<pre>   *(SORT_BY_ALIGNMENT(.rodata) SORT_BY_ALIGNMENT(.rodata.*) {the rest})
+ ...
+   *(SORT_BY_ALIGNMENT(.data) SORT_BY_ALIGNMENT(.data.*) {the rest})
+ ...
+   *(SORT_BY_ALIGNMENT(.bss) SORT_BY_ALIGNMENT(.bss.*) {the rest})</pre>
+</p><p>
+Unfortunately, the better "SORT_BY_ALIGNMENT(.rodata .rodata.*)", which would
+sort .rodata and .rodata.foo sections by alignment as one group, not separately,
+is not valid syntax.
+</p><p>
 GCC, even with -Os, compiles byte and 16-bit arrays into data object definitions
 which require word alignment (at least 4 bytes). Some versions even require
 32-byte alignment for arrays of more than 32 bytes long. This includes explicitly
@@ -128,16 +163,8 @@ Busybox code is evolving, new arrays constantly pop up, coders forget
 to add ALIGNn on them. You can run "grep -F -B3 '*fill*' busybox_unstripped.map"
 to find all linker-added padding in your binary, and add forgotten ALIGNn's.
 Please send a patch if you do.
-<p>
-<h4>Using libc which is better at using RAM sparingly</h4>
-<p>
-[TODO:] Static builds are better than dynamic ones. Glibc is horrible for static builds,
-though. Ideally, libs should not initialize "[heap]" before user program calls malloc().
-Libc should aggressively prune (return to OS) freed malloc space (glibc defaults are bad).
-Even though we don't need that in Busybox, there should be a method to prune
-used (dirtied) stack space after the use of deep recursion or large on-stack objects.
-</p><p>
 </p>
+
 <h4>Converting bss to data</h4>
 <p>
 Busybox's data and bss sections are small already, some 4-12 kilobytes.
@@ -175,6 +202,7 @@ This has a drawback that on-disk binary contains a few zeroed pages and they
 will need to be read when formerly-bss variables are touched. IOW:
 this has a small speed penalty.
 </p>
+
 <h4>Use space at the end of bss: FEATURE_USE_BSS_TAIL</h4>
 <p>
 The end of bss is usually not page-aligned. There is an unused space in the last page.

