
Add reference to dua-cli in the README as similar tool #14

Merged · 2 commits · May 29, 2021

Conversation

@Byron (Contributor) commented May 29, 2021

As `dua` provides both a CLI mode and an interactive mode via `dua i`, I placed it into both categories.

Disclaimer: I am the author of this tool and have adapted [this paragraph](https://github.com/Byron/dua-cli/blob/60f432417fe2adbbd54de7293f1c3ffcd45365f7/README.md#L168-L181) for my own README.

Edit: Now that I have used pdu a little, I finally appreciate the way the data is presented. Whereas dua gives a high-level overview, pdu dives in to reveal exactly where the main offenders in terms of disk space usage are. It took me a while (I even wrote my own tool to solve this problem), but now I can see the benefits of this kind of visualization.

dust never worked for me as it was too slow and…used too much memory, so pdu truly makes a difference here.

Lastly I encourage you to build a TUI which allows the safe deletion of picked items to support the entire workflow people are usually using pdu for.

@KSXGitHub (Owner) left a comment

I don't like duplication.

If the main and default interface of dua is CLI, add dua (optional TUI) to the CLI list.
If the main and default interface of dua is TUI, add dua (optional CLI) to the TUI list.

@KSXGitHub (Owner) commented

Thanks for your compliment btw 😄. Though I would imagine that pdu uses even more memory than dust (threads and all). Of course, I'm no expert in parallel computing.

@KSXGitHub merged commit 0382691 into KSXGitHub:master on May 29, 2021
@Byron (Contributor, Author) commented May 29, 2021

Thanks for your compliment btw 😄. Though I would imagine that pdu uses even more memory than dust (threads and all). Of course, I'm no expert in parallel computing.

I compared it to dua; here is the result for pdu:

real        10.25
user         2.20
sys         56.06
           146489344  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               13239  page reclaims
                  50  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
              160643  voluntary context switches
              313338  involuntary context switches
        154768495577  instructions retired
        153908010215  cycles elapsed
           146114624  peak memory footprint

And here is the one for dua:

real        10.56
user         5.42
sys         26.00
           185794560  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               15260  page reclaims
                  50  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
              207419  voluntary context switches
              355053  involuntary context switches
        196983443197  instructions retired
         90303457631  cycles elapsed
           179029640  peak memory footprint

So, erm, it uses less than dua, which certainly does less, too 🤦‍♂️. It's a bit surprising to me, as dua doesn't have much state of its own in this mode (it uses jwalk).

@KSXGitHub (Owner) commented

Regarding this line, I see that you intend to push items to aggregates? My advice is: beware of reallocation (Vec resizing). I would attempt to give a size hint (Vec::with_capacity) or switch to another container type.
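A minimal sketch of the size-hint idea, assuming the number of top-level entries is known (or cheaply countable) up front; the names here are invented for illustration:

```rust
// Pre-allocating the Vec avoids repeated reallocation as the aggregate
// list grows; entries.len() serves as the size hint.
fn collect_aggregates(entries: &[&str]) -> Vec<(String, u64)> {
    let mut aggregates = Vec::with_capacity(entries.len());
    for name in entries {
        let size = 0u64; // placeholder for the real size computation
        aggregates.push((name.to_string(), size));
    }
    aggregates
}

fn main() {
    let aggregates = collect_aggregates(&["src", "target", "docs"]);
    println!("{aggregates:?}");
}
```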

Lastly I encourage you to build a TUI which allows the safe deletion of picked items to support the entire workflow people are usually using pdu for.

I'm not sure if lazy me is willing to do that. Even the relatively simple CLI doesn't yet have any integration tests of its own because the complexity of setting up the environment alone is too much for my lazy ass. Testing a TUI would be nigh-impossible. How could one guarantee the stability of the software without tests?

Finally, would you mind if I add dua to the list of benchmarks?

@Byron (Contributor, Author) commented May 29, 2021

Regarding this line, I see that you intend to push items to aggregates?

I believe this is just the top-level that will be listed later - it's not huge and won't show up in any profile.

I'm not sure if lazy me is willing to do that.

Indeed, it would be quite some work. If lines-of-code count is any measure, the 4k lines of pdu would certainly go up quite a bit. dua clocks in at 3.4k LOC for everything Rust, so pdu's approach seems more complex, which probably translates to the TUI as well.
I am lazy too, but once I want something badly enough, I will do it :D. With dua, the problem I was having of clearing disk space is solved, and it isn't clear that a pdu TUI would do so much better to make it worth the effort.

How could one guarantee the stability of the software without tests?

It's actually straightforward. dua itself is an event-driven engine whose events manipulate its state, and that is perfectly testable. The rendering is done with tui, which supports a test backend that can be used with snapshot testing. That way one can unit-test both the application state and the looks. dua doesn't test the looks; that's something I only check visually. Furthermore, it only tests the most important happy paths of typical user journeys. The latter were written after I had something working that was easy enough not to require unit tests solely to protect against regressions.
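For illustration, a minimal sketch of such an event-driven, unit-testable core; the types and event set here are invented and are not dua's actual ones:

```rust
// Events mutate state; rendering is a separate concern, so state
// transitions can be unit-tested without any terminal involved.
#[derive(Debug, PartialEq)]
enum Event {
    CursorDown,
    MarkForDeletion,
}

#[derive(Debug, Default, PartialEq)]
struct AppState {
    cursor: usize,
    marked: Vec<usize>,
}

impl AppState {
    fn handle(&mut self, event: Event) {
        match event {
            Event::CursorDown => self.cursor += 1,
            Event::MarkForDeletion => self.marked.push(self.cursor),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn marks_entry_under_cursor() {
        let mut state = AppState::default();
        state.handle(Event::CursorDown);
        state.handle(Event::MarkForDeletion);
        assert_eq!(state.marked, vec![1]);
    }
}
```

Snapshot-testing the rendered output would then sit on top of this, feeding the same state into tui's test backend.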

Finally, would you mind if I add dua to the list of benchmarks?

Not at all - it would be nice if you could ping me here as I am curious about the findings. Basically you would be comparing the pdu engine with jwalk which dua relies on.
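For context, a rough sketch of what such a jwalk-based traversal looks like (summing apparent sizes with its walkdir-like parallel iterator; paths and error handling are simplified, and the exact calls are an assumption rather than dua's actual code):

```rust
use jwalk::WalkDir;

fn main() {
    // jwalk walks the tree in parallel, distributing read_dir calls
    // across a thread pool under the hood.
    let total: u64 = WalkDir::new(".")
        .into_iter()
        .filter_map(Result::ok)                    // skip unreadable entries
        .filter_map(|entry| entry.metadata().ok()) // skip entries without metadata
        .map(|metadata| metadata.len())            // apparent size in bytes
        .sum();
    println!("{total} bytes");
}
```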

@Byron (Contributor, Author) commented May 29, 2021

[Screenshot 2021-05-29 at 11:35:31]

So… I couldn't help but imagine how a TUI could work. But let's start with a question: Is there a way to increase the contrast of these percentage lines? It's so hard to make out on certain levels - probably my main sore point with pdu right now.

The reason I keep thinking about workflow here is what I am usually doing with that data: I want to delete some of it. Even though a full-blown TUI with selection and subsequent (potentially parallel) deletion would be great, maybe there is a way to output the list of paths that it displayed before in a format that makes copy-pasting for deletion easier.

In the end, the user needs complete paths, and pdu could provide them. I could imagine running it once for visualization, and another time to get a flat list of paths that can be copied manually for removal.

@Byron (Contributor, Author) commented May 29, 2021

You see, I really like pdu because it's automating a part of what dua i allows doing: 1) find a list of candidates for deletion, and 2) delete them. 1) is done better in pdu, and I have a feeling that with a little tuning it could replace some uses of dua for me.

Right now I would probably run it before dua i to get an idea, and then use dua i to queue the offenders for deletion. The suggestion above of outputting a flat (and parseable) list of items would definitely help automate these tasks.

Maybe something like pdu --list-level 2 | dua i --queue-pdu-list could use the existing TUI of dua to schedule folders on level 2 of the directory tree for deletion. The user would then have to dequeue the ones they don't want to delete.
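As a sketch of the consuming side (purely hypothetical: neither of the flags above exists today), a tool receiving such a flat list could simply read newline-separated paths from stdin and queue them:

```rust
use std::io::{self, BufRead};
use std::path::PathBuf;

fn main() {
    // Hypothetical consumer of a flat path list piped in from another tool:
    // every non-empty stdin line becomes a candidate queued for review/deletion.
    let queue: Vec<PathBuf> = io::stdin()
        .lock()
        .lines()
        .filter_map(Result::ok)
        .filter(|line| !line.trim().is_empty())
        .map(PathBuf::from)
        .collect();

    for path in &queue {
        println!("queued for review: {}", path.display());
    }
}
```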

@KSXGitHub (Owner) commented

it isn't clear that a pdu TUI would do so much better to make it worth the effort

Correction: pdu's UI isn't original, I stole it from dust, which in turn stole it from dutree.

I'm probably not going to implement an interactive TUI for deleting files in the near future. However, parallel-disk-usage is also a library crate; the data structures and algorithms for aggregating and visualizing the directory tree are already there. Anyone who wants this feature badly enough could build a tool on top of this library. I am also interested in seeing how it could be done.

Is there a way to increase the contrast of these percentage lines? It's so hard to make out on certain levels - probably my main sore point with pdu right now.

That's the problem of your terminal and/or your fonts.

I use Tilix (which uses VTE under the hood) as my main terminal, and Hack Nerd Font as my font. Here is the screenshot:

[Tilix screenshot]

I also tested the same command on Alacritty, and it doesn't look as good:

[Alacritty screenshot]

@KSXGitHub (Owner) commented

Right now I would probably run it before dua i to get an idea, and then use dua i to queue the offenders for deletion. The suggestion above of outputting a flat (and parseable) list of items would definitely help automate these tasks.

Maybe something like pdu --list-level 2 | dua i --queue-pdu-list could use the existing TUI of dua to schedule folders on level 2 of the directory tree for deletion. The user would then have to dequeue the ones they don't want to delete.

GNU's du can already create a flat, machine-readable list of items. I also plan to add JSON input/output to pdu in the future (it wouldn't be flat however).
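To make the "wouldn't be flat" part concrete, here is a rough serde sketch of what a nested JSON shape for disk-usage data might look like; the field names are invented for illustration and are not pdu's actual format:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical nested node: each directory carries its aggregated size
// and its children, mirroring the tree rather than a flat list of paths.
#[derive(Serialize, Deserialize, Debug)]
struct Node {
    name: String,
    size: u64,
    #[serde(default)]
    children: Vec<Node>,
}

fn main() -> serde_json::Result<()> {
    let tree = Node {
        name: "tmp.sample".into(),
        size: 3,
        children: vec![
            Node { name: "a".into(), size: 1, children: vec![] },
            Node { name: "b".into(), size: 2, children: vec![] },
        ],
    };
    println!("{}", serde_json::to_string_pretty(&tree)?);
    Ok(())
}
```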

@Byron (Contributor, Author) commented May 29, 2021

That's the problem of your terminal and/or your fonts.

Thanks for the hint, I will see how I can get Alacritty to display this better then, and change the font.

I also plan to add JSON input/output to pdu in the future (it wouldn't be flat however).

Neat, that would work fine as well, as I would implement this specifically to be able to use pdu as part of a processing pipeline. I subscribed to releases to stay informed :).

@KSXGitHub (Owner) commented

I also plan to add JSON input/output to pdu in the future (it wouldn't be flat however).

Neat, that would work fine as well, as I would implement this specifically to be able to use pdu as part of a processing pipeline. I subscribed to releases to stay informed :).

Subscribe to #17 and #18 as well.

@KSXGitHub (Owner) commented May 29, 2021

Finally, would you mind if I add dua to the list of benchmarks?

Not at all - it would be nice if you could ping me here as I am curious about the findings. Basically you would be comparing the pdu engine with jwalk which dua relies on.

I am happy to inform you that the benchmark report is now available as a release artifact.

@Byron (Contributor, Author) commented May 29, 2021

Congratulations, it's amazing to see there is still performance to be gained in this field.

To me the sub-second runs don't matter that much, but for bigger trees this really starts to show and a couple of milliseconds become seconds.
On my test-set with 1.44 million files, dua takes 12.2s whereas pdu takes 10.3s, which is quite significant.

I took a look at what it would mean to use pdu as an engine and noticed this would pull in additional dependencies like clap and thus increase the compile time. To fix this in the current setup, cargo features could be used.

In the meantime I will be waiting for the JSON export feature to land, which would allow me to combine pdu's greater speed at figuring out good candidates for deletion with the actual deletion TUI of dua :D.

@KSXGitHub (Owner) commented

I took a look at what it would mean to use pdu as an engine and noticed this would pull in additional dependencies like clap and thus increase the compile time. To fix this in the current setup, cargo features could be used.

Sound advice. I will be implementing this soon.

@KSXGitHub (Owner) commented May 29, 2021

I took a look at what it would mean to use pdu as an engine and noticed this would pull in additional dependencies like clap and thus increase the compile time. To fix this in the current setup, cargo features could be used.

In version 0.2.0 (which may or may not yet be released), the CLI part of the parallel-disk-usage library and its dependencies (clap, structopt, etc.) can now be disabled by disabling default features.

@KSXGitHub (Owner) commented Jun 3, 2021

Version 0.3.0 has been released. It can now print disk usage data as JSON to stdout as well as visualize input JSON from stdin. A new benchmark report with the latest version of dua has also been produced.

@Byron (Contributor, Author) commented Jun 4, 2021

Thanks a lot! I have added an issue to hopefully one day implement a pdu integration.

On another note, can it be that the picture in the README right below the list of program versions used to create it is out of date? The benchmark has run multiple times by now, yet its last-modified date appears as Sun, 30 May 2021 04:24:53 GMT (produced with xh HEAD https://camo.githubusercontent.com/8a2a9497f22a5d1879128c069cfdb8c1679a7f8b620f5077d602b0170c7b5d11/68747470733a2f2f6b73786769746875622e6769746875622e696f2f706172616c6c656c2d6469736b2d75736167652d302e322e342d62656e63686d61726b732f746d702e62656e63686d61726b2d7265706f72742e636f6d706574696e672e626c6b73697a652e737667).

By now I have reproduced part of the benchmark run and am curious about what's happening on the CI runner.

Note that both pdu and dua are extremely close; in theory, pdu could be a little faster, but it now suffers from the same M1 problem that dua would suffer from had I not compiled in a different default thread count on Apple Silicon. So dua uses 4 threads whereas pdu uses 8.
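For illustration, capping the default thread count on Apple Silicon might look roughly like this with rayon; this is a sketch under the assumption that the traversal runs on a rayon thread pool, and the cfg condition and counts are examples rather than dua's actual build configuration:

```rust
fn main() {
    // Hypothetical: use fewer threads by default on Apple Silicon, where
    // extra I/O threads landing on efficiency cores can hurt wall-clock time.
    let threads = if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        4
    } else {
        std::thread::available_parallelism().map(|n| n.get()).unwrap_or(8)
    };

    rayon::ThreadPoolBuilder::new()
        .num_threads(threads)
        .build_global()
        .expect("the global thread pool is only initialized once");

    // ... the actual traversal work would run on this pool ...
}
```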

hyperfine 'dua --apparent-size tmp.sample' 'pdu tmp.sample' 'du tmp.sample'

Benchmark #1: dua --apparent-size tmp.sample
  Time (mean ± σ):      76.4 ms ±   7.1 ms    [User: 105.4 ms, System: 257.2 ms]
  Range (min … max):    73.0 ms … 111.1 ms    26 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (111.1 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark #2: pdu tmp.sample
  Time (mean ± σ):      83.5 ms ±   3.9 ms    [User: 81.3 ms, System: 515.8 ms]
  Range (min … max):    81.2 ms … 103.0 ms    28 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (103.0 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark #3: du tmp.sample
  Time (mean ± σ):     152.9 ms ±   1.5 ms    [User: 11.6 ms, System: 140.7 ms]
  Range (min … max):   150.0 ms … 156.1 ms    19 runs

Summary
  'dua --apparent-size tmp.sample' ran
    1.09 ± 0.11 times faster than 'pdu tmp.sample'
    2.00 ± 0.19 times faster than 'du tmp.sample'

It's interesting how fast du is given that it uses way less system resources, making it the definitive winner per Watt :D.

As for the reason the world looks different on CI, the only explanation I could pull out of thin air is that hyperfine is run individually for each program, whereas it could also be invoked once for all of them to produce the output the report is generated from.

@KSXGitHub (Owner) commented

As for the reason the world looks different on CI, the only explanation I could pull out of thin air is that hyperfine is run individually for each program, whereas it could also be invoked once for all of them to produce the output the report is generated from.

I think it's actually about the way you invoke hyperfine:

hyperfine 'dua --apparent-size tmp.sample' 'pdu tmp.sample' 'du tmp.sample'

There are also warnings:

Warning: The first benchmarking run for this command was significantly slower than the rest (111.1 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Warning: The first benchmarking run for this command was significantly slower than the rest (103.0 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

In the GitHub Workflow files, I always add --warmup=3 to every hyperfine command.

If you also want to measure cold start, I suggest rebooting after each benchmark.

@KSXGitHub (Owner) commented

On another note, can it be that the picture in the README right below the list of program versions used to create it is out of date?

Yes, I have yet to update the benchmark section of the README. But it doesn't actually matter, because dua's performance doesn't change much in the benchmark performed by pdu 0.3.0 (direct link to the benchmark reports).

There's also 0.4.0, which I have yet to check out.

@Byron (Contributor, Author) commented Jun 5, 2021

I think it's actually about the way you invoke hyperfine:

It's the lazy way of invoking it, admittedly. Ultimately it's still hyperfine that runs the programs hundreds of times to get comparable values.

If you also want to measure cold start, I suggest rebooting after each benchmark.

With --prepare it's possible to purge the fs cache; on macOS it would be --prepare purge.

[…] because dua's performance doesn't change much […]

And that's the last unresolved riddle here. Thus far the arrival of pdu has already uncovered a lot of interesting knowledge and, as far as I can see, also helped fix a synchronization issue in the pdu progress reporting. From my tests I know that both tools are very similar regarding performance and it comes down to milliseconds. dua being consistently slower than single-threaded programs, however, makes no sense to me and I am sure there is more interesting knowledge to be uncovered here.

Please don't get me wrong, to me it matters not who is 'the fastest', but I want to understand what's going on as the benchmark contradicts both my experience and measurements alike.

@KSXGitHub (Owner) commented

dua being consistently slower than single-threaded programs, however, makes no sense to me and I am sure there is more interesting knowledge to be uncovered here.

I am still in disbelief that these fast programs are actually single-threaded.

Please don't get me wrong, to me it matters not who is 'the fastest', but I want to understand what's going on as the benchmark contradicts both my experience and measurements alike.

I didn't intend to make pdu the fastest either. I only wanted a dust with acceptable performance; the fact that it became the fastest is unintentional.

@Byron (Contributor, Author) commented Jun 5, 2021

I am still in disbelief that these fast programs are actually single-threaded.

That's a good point. Last time I tested them on macOS they were. Maybe that has changed. What matters is the version the CI system is using, and their threadedness should be easy to observe with time.

Just to try one, I downloaded the latest source of ncdu, built it and ran it like this:

➜  ncdu-1.15.1 time ./ncdu  ~/dev
./ncdu ~/dev  0.83s user 17.37s system 56% cpu 32.105 total

It doesn't even saturate a single CPU core, which is quite typical on my system when doing single-threaded filesystem traversals. What matters is the system time (it was stuck in GUI mode for a while longer), so it takes about 18s to traverse what takes dua 10.5s at its best configuration and pdu currently 11.5s due to using 8 threads.

Grepping through the code to look for threading didn't yield results either, so I doubt there is a compile flag to turn that on.
