performance: dua-cli full scan takes way longer than gdu #223

Closed
glowinthedark opened this issue Jan 21, 2024 · 7 comments
Labels
question Further information is requested

Comments

@glowinthedark

glowinthedark commented Jan 21, 2024

Directory scanning with dua i /some/folder takes orders of magnitude longer compared to gdu even when setting -t <some-number-bigger-than-number-of-cpu-cores>.

I didn't run any proper benchmarks, but as an example: by the time dua's progress info shows around 64k scanned files, gdu has reached 300k+ files on the same folder. dua-cli takes minutes longer to complete a full scan.

The huge speed difference has been observed with APFS (macOS), HFS+ (macOS), exFAT (macOS, Linux), and ext4 (Linux on armv7/arm64 and Intel CPUs).

Note: dua-cli is still as fast as or faster than ncdu, so apparently it's gdu that does some serious optimizations to speed up the scan. On macOS APFS, a full gdu scan takes less time than calling Apple's out-of-the-box Finder Get Info on the same folder.

@Byron
Owner

Byron commented Jan 21, 2024

Thanks for reporting!

Can you try to use hyperfine and see the impact of the thread-count on performance? Note that I threw in pdu as well as it usually is the fastest way to iterate.

root=<path-to-measure>
hyperfine -N -w1 -M2 "gdu $root" "dua -t1 $root" "dua -t2 $root" "dua -t4 $root" "dua -t8 $root" "pdu $root"

The theory is that dua uses too many threads, which can actually hurt performance on macOS; I noticed that 3 to 4 threads usually give the best performance. Maybe there is a number that brings it closer to gdu. Lastly, pdu is typically faster than dua and I'd expect it to be as fast as gdu or faster. Please note that it has flags for thread counts as well, in case you want to dive deeper if the results are interesting.
Also note that this uses the non-interactive version of dua which uses the same traversal engine under the hood.
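
The thread-count sweep above can also be generated instead of typed out by hand. Here is a minimal sketch in Rust, assuming `hyperfine`, `gdu`, `dua` and `pdu` are on `PATH`. `hyperfine_args` is a hypothetical helper, not part of any of these tools, and it only uses the flags that already appear in the command above (`-N -w1 -M2`).

```rust
use std::process::Command;

/// Build the hyperfine argument list for a dua thread-count sweep.
/// Hypothetical helper for illustration; not part of dua or hyperfine.
/// Note: paths containing spaces would need quoting, which this sketch skips.
fn hyperfine_args(root: &str, thread_counts: &[u32]) -> Vec<String> {
    // Same flags as in the command quoted in the thread.
    let mut args: Vec<String> = ["-N", "-w1", "-M2"].iter().map(|s| s.to_string()).collect();
    args.push(format!("gdu {root}"));
    for t in thread_counts {
        args.push(format!("dua -t{t} {root}"));
    }
    args.push(format!("pdu {root}"));
    args
}

fn main() {
    let sweep = [1, 2, 4, 8];
    match std::env::args().nth(1) {
        // With a path argument, actually run the benchmark.
        Some(root) => match Command::new("hyperfine").args(hyperfine_args(&root, &sweep)).status() {
            Ok(status) => println!("hyperfine finished: {status}"),
            Err(err) => eprintln!("could not start hyperfine: {err}"),
        },
        // Without one, just show the arguments that would be passed.
        None => println!("{:?}", hyperfine_args(".", &sweep)),
    }
}
```

Keeping the sweep in one slice makes it easy to add further thread counts (for example 3, since 3 to 4 threads are mentioned as a sweet spot) without editing a long command line.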

@Byron Byron added question Further information is requested help wanted Extra attention is needed labels Jan 21, 2024
@glowinthedark
Author

glowinthedark commented Jan 21, 2024

@Byron

hyperfine results

linux arm64 ext4 (772.98 GiB total, HDD)

System details: RAM: 4 GB
$ uname -a
Linux iq 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A76
    Model:               1
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r4p1
    CPU(s) scaling MHz:  100%
    CPU max MHz:         2400.0000
    CPU min MHz:         1000.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Summary
  'gdu /media/t12/Music' ran
    1.07 ± 0.01 times faster than 'dua -t2 /media/t12/Music'
    1.13 ± 0.00 times faster than 'dua -t4 /media/t12/Music'
    1.31 ± 0.02 times faster than 'dua -t8 /media/t12/Music'
    1.49 ± 0.01 times faster than 'dua -t1 /media/t12/Music'

macos APFS (78.48 GiB total, built-in SSD)

System details:
$ uname -a
Darwin NCM38333.local 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:21:53 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6020 arm64
  Chip:	Apple M2 Pro
  Total Number of Cores:	12 (8 performance and 4 efficiency)
  Memory:	32 GB
Summary
  dua -t8 ~/projects ran
    1.08 ± 0.00 times faster than pdu ~/projects
    1.30 ± 0.00 times faster than dua -t4 ~/projects
    1.50 ± 0.01 times faster than gdu ~/projects
    2.16 ± 0.00 times faster than dua -t2 ~/projects
    3.94 ± 0.02 times faster than dua -t1 ~/projects

The non-interactive dua mode is performing great, i.e. dua -t8 ~/projects is very fast on APFS.

The slowness shows up in interactive mode, e.g. dua -t8 i ~/projects, which takes almost forever. I'm not sure what the hyperfine command for testing interactive mode would be, as I suppose it cannot handle tty mode (?)

Byron added a commit that referenced this issue Jan 22, 2024
Previously, each time the UI refreshed (every 250ms), it displayed
entries but also checked their metadata to ensure they exist.

This could lead to performance loss when the displayed folder
has a lot of entries.
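
To get a feel for the cost this commit message describes, here is a rough sketch in Rust. It is not dua's actual code: `View`, `refresh` and `exists` are hypothetical names, and the paths are made up so that every lookup fails. It only illustrates that checking every displayed entry on each refresh multiplies the number of metadata lookups by the number of refreshes.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the entries shown in the UI; not dua's real type.
struct View {
    entries: Vec<PathBuf>,
    stat_calls: usize,
}

/// One metadata lookup per call, which is what makes frequent refreshes costly.
fn exists(path: &Path) -> bool {
    std::fs::symlink_metadata(path).is_ok()
}

impl View {
    /// One UI refresh; returns how many entries would be drawn.
    /// With `verify` set, every displayed entry is checked on disk first.
    fn refresh(&mut self, verify: bool) -> usize {
        let mut shown = 0;
        for path in &self.entries {
            if verify {
                self.stat_calls += 1;
                if !exists(path) {
                    continue;
                }
            }
            shown += 1;
        }
        shown
    }
}

fn main() {
    // Assumed numbers: 1,000 displayed entries, a 10 s scan, one refresh every 250 ms.
    let mut view = View {
        entries: (0..1_000).map(|i| PathBuf::from(format!("/nonexistent-dua-sketch/{i}"))).collect(),
        stat_calls: 0,
    };
    let refreshes = 10_000 / 250;
    for _ in 0..refreshes {
        view.refresh(true);
    }
    println!("{} refreshes, {} metadata lookups", refreshes, view.stat_calls);
}
```

With these assumed numbers, 40 refreshes over 1,000 entries add up to 40,000 lookups that compete with the scan itself, and the count grows linearly with the number of displayed entries.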
@Byron Byron mentioned this issue Jan 22, 2024
@Byron
Owner

Byron commented Jan 22, 2024

Thanks for the measurements, very interesting results!

It's very interesting that gdu manages to be this much faster on Linux, and thread-scaling doesn't seem to do dua much good with -t2 being the best value on a 4-core machine.

On MacOS it scales much better, but the question remains why it's slow in interactive mode.
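
The scaling difference can be read off the two hyperfine summaries. Each line there is a factor relative to the fastest command, so dividing the `dua -t1` factor by the `dua -tN` factor gives the speedup of N threads over one. A small sketch in Rust, with the factors copied from the summaries above (the ± uncertainties are ignored, and `speedup_over_t1` is a hypothetical helper):

```rust
/// Speedup of `dua -tN` over `dua -t1`, given each run's
/// "times slower than the fastest command" factor from a hyperfine summary.
fn speedup_over_t1(t1_factor: f64, tn_factor: f64) -> f64 {
    t1_factor / tn_factor
}

fn main() {
    // (threads, factor relative to the fastest command in that summary).
    let linux = [(1, 1.49), (2, 1.07), (4, 1.13), (8, 1.31)];
    // On macOS `dua -t8` was the fastest command, hence its factor of 1.00.
    let macos = [(1, 3.94), (2, 2.16), (4, 1.30), (8, 1.00)];

    for (name, runs) in [("linux arm64 ext4 (HDD)", linux), ("macos APFS (SSD)", macos)] {
        let t1 = runs[0].1;
        for (threads, factor) in runs {
            println!(
                "{name}: dua -t{threads} is {:.2}x as fast as dua -t1",
                speedup_over_t1(t1, factor)
            );
        }
    }
}
```

On the Linux HDD run the best case, `-t2`, is only about 1.39x faster than a single thread and more threads make it worse again, while on the macOS SSD run `-t8` reaches 3.94x.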

I have a hunch and implemented a fix in #225, which you are invited to try out. If you'd say that the ~/projects folder has a lot of top-level entries, then my hunch might be true.

Something you could also check is how many threads gdu uses by default - it's entirely unclear to me why it's so much faster on Linux except that maybe it's related to internal inefficiencies during traversal which weigh dua down (see #224). Edit: Maybe it's also related to the HDD being less receptive to the order of traversal or something related to it due to generally higher latencies. Whatever it is that makes it faster on SSD might be what makes it slower on HDD.

PS: I have made a new release with the fix, and would hope it will improve the situation as this is the only guess I had: https://github.com/Byron/dua-cli/releases/tag/v2.27.2 . Should it still not release the handbrakes you'd probably need to instrument a run, but we'll get there when we get there.

@Byron Byron removed the help wanted Extra attention is needed label Jan 22, 2024
@glowinthedark
Author

glowinthedark commented Jan 22, 2024

compiling for apple silicon on macos m2 throws an error while running cargo install dua-cli

error[E0446]: crate-private type `FilesystemScan` in public interface
  --> ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dua-cli-2.27.2/src/interactive/app/state.rs:42:5
   |
27 | pub(crate) struct FilesystemScan {
   | -------------------------------- `FilesystemScan` declared as crate-private
...
42 |     pub scan: Option<FilesystemScan>,
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't leak crate-private type

For more information about this error, try `rustc --explain E0446`.
error: could not compile `dua-cli` (bin "dua") due to previous error
error: failed to compile `dua-cli v2.27.2`, intermediate artifacts can be found at `/var/folders/py/73sb2fsj37xbmtkgw111l07w0000gp/T/cargo-installoMoXeN`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.
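
For context, E0446 here means that a `pub` field exposes a type that is only `pub(crate)`. The sketch below shows one way to make the two declarations consistent by narrowing the field's visibility. It is only an assumption about how such an error can be resolved, not the actual change made in the repository; the `entries_seen` field is hypothetical, and only `FilesystemScan` and `scan` come from the error message. As far as I know, newer Rust toolchains report this pattern as a lint rather than a hard error, which could explain why it does not show up on every setup.

```rust
mod state {
    /// Crate-private type, as named in the error message.
    /// The `entries_seen` field is made up for this sketch.
    #[derive(Debug, Default)]
    pub(crate) struct FilesystemScan {
        pub(crate) entries_seen: u64,
    }

    #[derive(Debug, Default)]
    pub struct AppState {
        // The failing code declared this as `pub scan`. A `pub` field cannot
        // mention a `pub(crate)` type, so the field is narrowed to match.
        // Making `FilesystemScan` fully `pub` would be the other option.
        pub(crate) scan: Option<FilesystemScan>,
    }
}

fn main() {
    let mut app = state::AppState::default();
    assert!(app.scan.is_none());

    app.scan = Some(state::FilesystemScan { entries_seen: 3 });
    if let Some(scan) = &app.scan {
        println!("scan in progress, {} entries seen", scan.entries_seen);
    }
}
```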

same error when explicitly checking out the tag (both on macos m2 and linux arm64):

git clone https://github.com/Byron/dua-cli.git && cd dua-cli
git checkout tags/v2.27.2
cargo build --release

Tried the Intel X86 binary from the releases — completes Ok:

/tmp/dua-v2.27.2-x86_64-apple-darwin/dua i ~/projects
Sort mode: size descending  Total disk usage: 149.07 GB  
Processed 1743246 entries in 9.81s 

the original m2-binary (v2.20.1 arm64) still shows scanning apparently even after scanning finished (although the number of entries is not identical) 🤔

Entries: 1 in 0s (472/s)  -> scanning <- 149.07 GB  
Entries: 1743248 in 8.99s

@Byron
Owner

Byron commented Jan 22, 2024

compiling for apple silicon on macos m2 throws an error while running cargo install dua-cli

This is fixed now in main, see #226 .

the original m2-binary (v2.20.1 arm64) still shows scanning apparently even after scanning finished (although the number of entries is not identical) 🤔

This typically means that it is indeed still scanning, but presumably all threads are stalled. I recommend building the latest version and trying again. Let's see.

@glowinthedark
Author

glowinthedark commented Jan 22, 2024

Pulling, building and running the latest main now makes dua -t8 i .. finish scanning in about the same time as gdu, with just ~2..3 seconds difference on macos m2 (1744024 entries in 22.25s). On linux rpi 5 arm64 with 8GB RAM, scanning a 765GB file system tree on an NVMe M.2 drive takes roughly the same time as gdu (723.05 GiB Processed 640603 entries in 5.25s); it's hard to tell the difference.

thank you so much for taking the time to look into this — much appreciated! 🙏

@Byron
Owner

Byron commented Jan 22, 2024

Thanks so much for letting me know, it's much appreciated, too :).

It's great to hear that the fix did indeed work, and that gdu isn't unconditionally faster anymore :).

Closing, as it sounds like this issue is no more.

@Byron Byron closed this as completed Jan 22, 2024