GNU Coreutils feature request: sort by human-readable disk sizes

The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system. These are the core utilities which are expected to exist on every operating system.

This post is an elaboration on a feature request I recently sent to the GNU Coreutils mailing list.

I’ve been playing with backups a bit lately, and my favorite command line tool in that effort (other than rsync of course) has been du. It’s usually known for its annoyingly recursive and verbose output, but I just learned to tame it with the --max-depth=1 option, e.g.

$ du -h --max-depth=1
4.0K    ./bin
52M     ./code
765M    ./Desktop
12K     ./mail
17M     ./music
124K    ./PDF
5.5G    ./photos
153M    ./public_html
652K    ./ufc
7.3G    .

Perfect! It helps me quickly home in on which directories to trim, and which aren’t worth the bother. The -h option “print[s] sizes in human readable format (e.g., 1K 234M 2G),” but as the list of directories gets longer, it gets harder to differentiate the 153Ms from the 652Ks.

If I don’t use -h, I can pipe the du output through sort -n, but human-readable sizes are so nice. So I proposed to the coreutils group that sort be extended with an option for sorting by human-readable disk size values. Using the example above, the hypothetical usage and output would look like this:

$ du -h --max-depth=1 | sort -h
4.0K    ./bin
12K     ./mail
124K    ./PDF
652K    ./ufc
17M     ./music
52M     ./code
153M    ./public_html
765M    ./Desktop
5.5G    ./photos
7.3G    .
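
For the curious, here is a rough Python sketch of the comparison such an option would need. It is not how GNU sort does (or would do) it. The names human_size_key and sort_human are made up for illustration, and it assumes the powers-of-1024 suffixes (K, M, G, T, P) that du -h prints, so lowercase --si output is not handled.

```python
import re

# Assumed multipliers for the suffixes `du -h` prints (powers of 1024).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

def human_size_key(line):
    """Return an approximate byte count for a line such as '4.0K\t./bin'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)([KMGTP]?)", line)
    if match is None:
        return 0.0  # assumption: lines without a leading size sort first
    number, suffix = match.groups()
    return float(number) * UNITS[suffix]

def sort_human(lines):
    """Sort du-style lines by the size at the start of each line, smallest first."""
    return sorted(lines, key=human_size_key)

# A few lines from the du output above, deliberately out of order.
sample = ["52M\t./code", "4.0K\t./bin", "5.5G\t./photos", "652K\t./ufc"]
for line in sort_human(sample):
    print(line)
```

The idea is simply to turn each size back into a number before comparing, so the suffix carries more weight than the digits in front of it.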

Perhaps it’s time to refresh my knowledge of C?

12 Comments

Agreed. In the meantime, I’ve been using KDirStat (http://kdirstat.sourceforge.net/) to help clean up my disks.

try:

ruby -e 'puts `du --si -s #{ARGV.join(%q( ))}`.lines.sort_by{|foo|foo=~/(\d+\.?\d*)(\w)/;[$2,1/$1.to_f]}'  FILEPATTERN

quoted up for bash to use in an alias:

alias dus='ruby -e "puts \`du --si -s #{ARGV.join(%q( ))}\`.lines.sort_by{|foo|foo=~/(\d+\.?\d*)(\w)/;[\$2,1/\$1.to_f]}"'

whoa. your comment box hates code.

http://pastie.caboo.se/100617

Try that.

Also, I noticed you used -h instead of --si

--si gives you lowercase k, which makes the size prefixes alpha sortable to actual size.
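
A quick way to check that claim, sketched in Python: sorting the --si suffixes k, M and G as plain characters does put them in size order (largest first), because uppercase letters come before lowercase ones in ASCII. The helper name by_suffix is made up for illustration. Once a T suffix shows up, though, it lands in the wrong place.

```python
def by_suffix(suffixes):
    """Sort suffix letters the way a plain character sort would."""
    return sorted(suffixes)

print(by_suffix(["k", "M", "G"]))       # largest unit first: G, M, k
print(by_suffix(["k", "M", "G", "T"]))  # T falls between M and k
```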

Corey, the command line force is strong in you. I was going for simple. :) And somehow getting the GNU folks to update sort.c seemed simpler than a ruby command line alias. Good news is, my email seems to have gotten some traction.

zen

I’ve been wanting sort to do this for ages. It can sort “dictionary”, “numeric”, and by “month name”, but not by “human readable”? :) Today I finally wrote a simple sort script in perl to handle the output of “du -h”.

fyi… your “hypothetical output” of “sort -h” isn’t quite right. ;)

zen, how embarrassing! this has been on the web for 5 months and nobody noticed my mistake until now. thanks for the heads up, it’s fixed now.

simpz

Did this make it into a future release plan? I haven’t seen any objections expressed.

Steve

I’ve wanted this feature for ages. In fact, a few versions ago (coreutils 5.18 or so maybe) I had found a patch to add the switch, but it broke on a more recent update. I wonder what held it up?

Peter

Would be great to have this built in!

schmildo

Yeah, I really want this option too. But maybe a sort config file is a way to prevent future needless feature requests. That way any new ordering types can be easily added, and people can more flexibly modify the output.
As an example, you could sort data based on groups of your own pre-defined priority. Or colours by the order they appear in the rainbow without complex scripting. Additionally, human readable size abbreviations vary: Kb=KiB, Mb=MiB, Gb=GiB, etc. If a task is common, I should not have to use an excessive amount of piping and recursive queries.

mika

With newer coreutils you can simply use:
du -h --max-depth=1 | sort -h

just like your hypothetical example.
