A smaller µlfs (2023-04-21) - jlxip's blog


Yesterday I published µlfs [1]. While I think it's great and can't be made much simpler without losing a lot of functionality, its xz-compressed size of almost 50 MB doesn't exactly inspire confidence in its simplicity.

The elephant in the room is the kernel modules: there are 20 MB of them. Even worse, they are present twice in the filesystem: once in the root, and again in the initramfs. Since the initramfs is gzip-compressed, compressing the whole img file afterwards cannot deduplicate the two copies, so their size is practically doubled for no reason.
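The effect is easy to reproduce with a small Python sketch (an illustration, not part of the author's tooling). Synthetic text stands in for the module files, and `lzma.compress` writes the same .xz format as the xz command. Two plain copies should compress to barely more than one, while a plain copy plus a gzip-compressed copy should cost roughly the sum of both, because the gzip stream no longer shares any byte sequences with the original.

```python
import gzip
import lzma
import random
import string


def synthetic_payload(seed: int = 42, words: int = 50_000) -> bytes:
    """Build compressible, text-like data (a stand-in for the module files)."""
    rng = random.Random(seed)
    vocab = [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 10)))
        for _ in range(2000)
    ]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode()


def xz_size(data: bytes) -> int:
    """Size of `data` after LZMA compression in the .xz container."""
    return len(lzma.compress(data))


payload = synthetic_payload()
single = xz_size(payload)
# Two identical plain copies: LZMA encodes the second one as back-references.
two_plain = xz_size(payload + payload)
# Plain copy + gzip'd copy (like rootfs + initramfs): the second looks like noise.
plain_and_gzip = xz_size(payload + gzip.compress(payload))

print(f"one copy:         {single} bytes")
print(f"two plain copies: {two_plain} bytes")
print(f"plain + gzip:     {plain_and_gzip} bytes")
```

The exact numbers depend on the data, but the gap between the last two lines is the "practically doubled" size described above.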

A naive approach to reducing the size of the distro would be to just take some modules out. However, randomly discarding drivers from a distro that is meant to be shared is a terrible idea, and cherry-picking which ones to throw away and which ones to keep is very time-consuming and requires a lot of research.

In general, this problem is unsolvable: you should always try to keep as many drivers (free ones, at least) as you can, or you risk some user being unable to connect to the internet to download the missing ones, or even to boot!

So the only real approach is to boot the distro in the environment it's meant to run in, enumerate the modules in use, and remove the rest. Assuming no new hardware will be plugged in, which is generally the case for VMs, everything keeps working and most of the bloat is shaved off.

For this purpose I wrote µlfs_runtime_module_minimization.sh [2]. This post explains how it works; by the end, you'll see how I got those 50 MB down to just 14 MB.

The process

When minimizing VM images, including raw ones (such as the img we're dealing with), the trick is to use zerofree [3], a tool that fills the unused blocks of an ext2/3/4 filesystem with zeros. Otherwise, leftover data from deleted files gets compressed along with everything else, while runs of zeros compress to almost nothing. Thus, the first step, at line 7, is to check that zerofree is installed. "which"ing it is enough, since the shebang at line 1 invokes bash with "-eu". For those who don't know:

- The "-e" flag makes the script exit as soon as any command exits with a non-zero status. This is generally good practice, since the script stops instead of carrying on from an invalid state and possibly messing something up.

- The "-u" flag makes the script exit if an unset variable is referenced. This is not as important as "-e", but it helps catch typos in variable names. You really don't want to run "rm -rf /$TYPO".
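Both flags are easy to observe. The Python sketch below assumes `bash` is installed and on the PATH; `TYPO` is just an illustrative variable name, and `false` stands in for a failing command such as `which zerofree` when zerofree is missing. It runs tiny inline scripts with and without `-eu`:

```python
import os
import subprocess

# Make sure the "typo" variable really is unset for the demo.
ENV = {k: v for k, v in os.environ.items() if k != "TYPO"}


def run_bash(script: str, *flags: str) -> subprocess.CompletedProcess:
    """Run an inline bash script with the given flags and capture its output."""
    return subprocess.run(
        ["bash", *flags, "-c", script], capture_output=True, text=True, env=ENV
    )


# -e: `false` exits with a non-zero status, so the echo after it never runs.
errexit = run_bash("false; echo 'still running'", "-eu")
# -u: expanding the unset $TYPO aborts the script before the command runs.
nounset = run_bash('echo "would run: rm -rf /$TYPO"', "-eu")
# Without the flags, both problems go unnoticed.
lenient = run_bash('false; echo "would run: rm -rf /$TYPO"')

print(errexit.returncode, repr(errexit.stdout))
print(nounset.returncode, nounset.stderr.strip())
print(lenient.returncode, lenient.stdout.strip())  # 0 would run: rm -rf /
```

The last line is exactly the failure mode "-u" guards against: the unset variable silently expands to nothing.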

Then, the image is attached to a loop device and mounted. At line 17, the script drops another script onto the root of the filesystem. This script is what the user must run later. It can't be made to run automatically, since not all modules may be loaded yet when it runs at boot. I'd recommend running it right after login as: sh -x /rmm.sh

This inner script gets the used modules from /proc/modules. It takes each one's name and passes them, one by one, to modinfo, which prints some information about each module, including the path of the file it was loaded from. Line 20 just strips the leading "filename: " from each line.
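The two parsing steps are easy to mirror in Python. This sketch works on captured text instead of calling modinfo itself. The sample lines, including the kernel version 6.1.0, are hypothetical, but they follow the real formats: /proc/modules has the module name in its first column, and modinfo prints `field: value` lines padded with spaces.

```python
from __future__ import annotations


def used_module_names(proc_modules: str) -> list[str]:
    """Return the module names listed in /proc/modules (first field of each line)."""
    return [line.split()[0] for line in proc_modules.splitlines() if line.strip()]


def module_path(modinfo_output: str) -> str:
    """Extract the value of the `filename:` field from `modinfo <name>` output."""
    for line in modinfo_output.splitlines():
        if line.startswith("filename:"):
            # Drop the "filename:" label and the alignment spaces after it.
            return line[len("filename:"):].strip()
    raise ValueError("no filename: line in modinfo output")


# Hypothetical sample data in the real formats.
PROC_MODULES = """\
e1000 155648 0 - Live 0x0000000000000000
ahci 45056 2 - Live 0x0000000000000000
"""
MODINFO_E1000 = """\
filename:       /lib/modules/6.1.0/kernel/drivers/net/ethernet/intel/e1000/e1000.ko
license:        GPL v2
"""

print(used_module_names(PROC_MODULES))  # ['e1000', 'ahci']
print(module_path(MODINFO_E1000))
```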

Then, all modules are listed at line 24, using just "find" on the root fs. The subtraction (all minus used) is done at line 28 with "comm", from coreutils. Note that both "used" and "all" are sorted before being written, since comm requires sorted input [4].
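The reason for the sorting requirement is that comm walks both files once, in lockstep, like the merge step of merge sort. The Python sketch below reproduces that single pass for the "lines only in the second file" column (the same column `comm -13 used all` prints); the module paths are hypothetical. Python compares strings by code point, which matches sorting with LC_ALL=C; whatever the order, sort and comm must agree on it.

```python
from __future__ import annotations


def only_in_second(first: list[str], second: list[str]) -> list[str]:
    """Lines of `second` missing from `first`, like `comm -13 first second`.

    Both lists must already be sorted in the same order: this single forward
    pass silently gives wrong answers otherwise.
    """
    result = []
    i = 0
    for line in second:
        # Advance through `first` until we reach or pass the current line.
        while i < len(first) and first[i] < line:
            i += 1
        if i < len(first) and first[i] == line:
            continue  # present in both files
        result.append(line)  # only in `second`
    return result


used = ["kernel/a.ko", "kernel/c.ko"]
all_ = ["kernel/a.ko", "kernel/b.ko", "kernel/c.ko", "kernel/d.ko"]
print(only_in_second(used, all_))  # ['kernel/b.ko', 'kernel/d.ko']

# Unsorted input breaks the merge: 'a' is reported as unused even though it is used.
print(only_in_second(["c", "a"], ["a", "b", "c"]))  # ['a', 'b']
```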

The unused modules are then removed from the root fs at line 31 and, as a cleanup step, the directories left empty are removed too.
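The removal plus cleanup can be sketched in Python as follows. This is an illustration of the idea, not the script's actual commands, and the directory layout is made up. The key detail is walking the tree bottom-up, so that a directory containing only directories that were just deleted is itself recognized as empty.

```python
import os
import tempfile


def remove_and_prune(root: str, relpaths) -> None:
    """Delete the given files under `root`, then remove directories left empty."""
    for rel in relpaths:
        os.remove(os.path.join(root, rel))
    # Bottom-up walk: children are visited before their parents, so a parent
    # whose subdirectories were all removed is seen as empty too.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical module tree: one used driver, two unused sound modules.
    for rel in ("net/e1000/e1000.ko", "sound/pci/snd-hda.ko", "sound/core/snd.ko"):
        os.makedirs(os.path.join(root, os.path.dirname(rel)), exist_ok=True)
        open(os.path.join(root, rel), "wb").close()
    remove_and_prune(root, ["sound/pci/snd-hda.ko", "sound/core/snd.ko"])
    remaining = sorted(os.listdir(root))
    kept_e1000 = os.path.exists(os.path.join(root, "net/e1000/e1000.ko"))

print(remaining)  # ['net']
```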

The initramfs is decompressed and extracted to /root/initramfs. The extraction path is then prepended to each path in the list of unused modules, and those files are removed. Empty directories are removed once again, and the new initramfs is packed, compressed and written to its final location.
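Prepending the extraction path is simple string work, but one trap is worth knowing if you port this step to Python: `join` discards everything before an absolute component, so the leading "/" of each module path has to be stripped first. This sketch uses `posixpath` so it behaves the same on any OS; the module path is hypothetical.

```python
import posixpath

EXTRACT_DIR = "/root/initramfs"  # extraction directory used in the post


def under_extract_dir(module_path: str, extract_dir: str = EXTRACT_DIR) -> str:
    """Map an absolute rootfs path to the same file inside the extracted initramfs."""
    # Strip the leading "/" so join() doesn't throw away extract_dir.
    return posixpath.join(extract_dir, module_path.lstrip("/"))


unused = "/lib/modules/6.1.0/kernel/sound/core/snd.ko"  # hypothetical path
print(posixpath.join(EXTRACT_DIR, unused))  # /lib/modules/6.1.0/kernel/sound/core/snd.ko
print(under_extract_dir(unused))  # /root/initramfs/lib/modules/6.1.0/kernel/sound/core/snd.ko
```

The first print is the bug: the prefix is silently lost, and deleting that path would hit the root fs copy instead of the initramfs one.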

Finally, the temporary txt files, the extraction directory, and the script itself are removed in order to leave a clean hierarchy.

Back on the host, the inner script is made executable and the filesystem is unmounted. The script then waits for the user to run rmm.sh inside the VM.

When they're done, the image is attached to a loop device once again so that the partition's block device is accessible (zerofree needs the filesystem to be unmounted, or mounted read-only [3]). zerofree is run, and everything is released again.
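Why zeroing matters so much: deleted files leave their old contents in free blocks, and a compressor has no way of knowing those bytes are garbage. This Python sketch builds a toy "image" of 4 KiB blocks (the block size and the used/free split are arbitrary assumptions), fills the free blocks with leftover noise, and compares xz-compressed sizes before and after zeroing them. It only imitates zerofree's effect on the data; the real tool finds the free blocks through the ext2/3/4 filesystem metadata.

```python
import lzma
import random

BLOCK = 4096          # assumed block size
USED, FREE = 64, 192  # assumed number of allocated and free blocks


def noise(rng: random.Random, n: int) -> bytes:
    """Pseudo-random bytes standing in for the contents of deleted files."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


rng = random.Random(0)
# Allocated blocks hold real (compressible) data.
used = (b"module data " * (USED * BLOCK // 12 + 1))[: USED * BLOCK]
stale = used + noise(rng, FREE * BLOCK)  # free blocks still hold old data
zeroed = used + bytes(FREE * BLOCK)      # what zerofree leaves behind

assert len(stale) == len(zeroed)  # same image size either way
stale_size = len(lzma.compress(stale))
zeroed_size = len(lzma.compress(zeroed))
print("stale free blocks: ", stale_size)
print("zeroed free blocks:", zeroed_size)
```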

At this point, hdd.img contains only the modules that were actually in use. Compress it with: xz -k -T`nproc` -v hdd.img

Bonus: Since the initramfs is now way smaller, the whole distro boots much faster!

Thanks for reading.

-- jlxip


[1] https://jlxip.net/blog/entries/1
[2] https://gist.github.com/jlxip/b7f609ff31849f7d4ae7be10485f2901
[3] https://manpages.ubuntu.com/manpages/xenial/man8/zerofree.8.html
[4] https://unix.stackexchange.com/a/443579/258686