Add file count #31
This is a rudimentary implementation that also outputs the total number of files found. The count currently includes directories, so the reporting is naive in that sense.
I should also note that this might be a breaking change: the output format has changed, so any clients parsing that output will likely break. Maybe it's better to add this behind a flag to preserve the old output? I wasn't sure whether the API had been marked as stable yet.
Thank you very much for your contribution (sorry for the late reply)! I'd like to keep diskus simple, but this seems like a reasonable addition. Before we merge this, we should
Thank you for mentioning it. I wouldn't be worried about breaking changes in this respect. But now that you mention it, it might make sense to add something like a
I've added the change to split files and directories. I'll work on running the Also, it looks like there's been some restructuring, so I'll resolve the conflicts here shortly. Let me know if you see any issues with the implementation logic.
Unfortunately, no :-(
No worries, I'll write up a bash script for it as well. Thanks!
So I'm not exactly sure how to validate these results.
Cold cache (caches dropped before every run):
hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' '~/workspace/diskus/target/release/diskus' 'du -sh' 'sn p -d0 -j8' 'dust -d0'
Benchmark #1: ~/workspace/diskus/target/release/diskus
Time (mean ± σ): 286.1 ms ± 56.0 ms [User: 195.6 ms, System: 418.6 ms]
Range (min … max): 226.0 ms … 365.3 ms 10 runs
Benchmark #2: du -sh
Time (mean ± σ): 1.854 s ± 0.095 s [User: 128.1 ms, System: 540.7 ms]
Range (min … max): 1.661 s … 2.001 s 10 runs
Benchmark #3: sn p -d0 -j8
Time (mean ± σ): 292.2 ms ± 34.0 ms [User: 74.2 ms, System: 340.6 ms]
Range (min … max): 250.5 ms … 355.2 ms 10 runs
Benchmark #4: dust -d0
Time (mean ± σ): 2.006 s ± 0.093 s [User: 267.7 ms, System: 537.7 ms]
Range (min … max): 1.927 s … 2.200 s 10 runs
Summary
'~/workspace/diskus/target/release/diskus' ran
1.02 ± 0.23 times faster than 'sn p -d0 -j8'
6.48 ± 1.31 times faster than 'du -sh'
7.01 ± 1.41 times faster than 'dust -d0'
Warm cache (five warmup runs):
hyperfine --warmup 5 '~/workspace/diskus/target/release/diskus' 'du -sh' 'sn p -d0 -j8' 'dust -d0'
Benchmark #1: ~/workspace/diskus/target/release/diskus
Time (mean ± σ): 65.1 ms ± 7.4 ms [User: 81.1 ms, System: 138.9 ms]
Range (min … max): 53.4 ms … 87.8 ms 44 runs
Benchmark #2: du -sh
Time (mean ± σ): 129.6 ms ± 3.7 ms [User: 39.9 ms, System: 88.6 ms]
Range (min … max): 127.2 ms … 145.1 ms 22 runs
Benchmark #3: sn p -d0 -j8
Time (mean ± σ): 42.1 ms ± 2.2 ms [User: 42.9 ms, System: 108.0 ms]
Range (min … max): 39.7 ms … 52.5 ms 67 runs
Benchmark #4: dust -d0
Time (mean ± σ): 249.8 ms ± 52.6 ms [User: 131.6 ms, System: 114.5 ms]
Range (min … max): 176.4 ms … 348.3 ms 16 runs
Summary
'sn p -d0 -j8' ran
1.55 ± 0.19 times faster than '~/workspace/diskus/target/release/diskus'
3.08 ± 0.18 times faster than 'du -sh'
5.94 ± 1.29 times faster than 'dust -d0'
However, the results look comparable to the version installed via cargo:
hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' '~/workspace/diskus/target/release/diskus' 'diskus'
Benchmark #1: ~/workspace/diskus/target/release/diskus
Time (mean ± σ): 255.0 ms ± 17.1 ms [User: 164.1 ms, System: 375.7 ms]
Range (min … max): 232.3 ms … 279.5 ms 10 runs
Benchmark #2: diskus
Time (mean ± σ): 271.3 ms ± 26.4 ms [User: 178.7 ms, System: 413.6 ms]
Range (min … max): 240.4 ms … 328.9 ms 10 runs
Summary
'~/workspace/diskus/target/release/diskus' ran
1.06 ± 0.13 times faster than 'diskus'
And:
hyperfine --warmup 5 '~/workspace/diskus/target/release/diskus' 'diskus'
Benchmark #1: ~/workspace/diskus/target/release/diskus
Time (mean ± σ): 65.6 ms ± 7.6 ms [User: 84.7 ms, System: 137.3 ms]
Range (min … max): 55.3 ms … 81.2 ms 36 runs
Benchmark #2: diskus
Time (mean ± σ): 68.3 ms ± 8.1 ms [User: 83.0 ms, System: 149.0 ms]
Range (min … max): 57.0 ms … 93.1 ms 43 runs
Summary
'~/workspace/diskus/target/release/diskus' ran
1.04 ± 0.17 times faster than 'diskus'
Also, for posterity, here's the script I used to generate the test directory:
#!/bin/bash -e
# Generates a rather naive testing directory
# Creates 40,400 files totalling roughly 15GB
# across 10,100 nested directories (plus one for the top dir.)
BASE_DIR=$1
# 15GB/40,400 ~= 370KB
FILE_SIZE_IN_KB=370
for i in {1..100}; do
    mkdir "$BASE_DIR/$i"
    # 4 files in each first-level directory
    for f in {1..4}; do
        head -c ${FILE_SIZE_IN_KB}K </dev/urandom > "$BASE_DIR/$i/$f.txt"
    done
    # 100 second-level directories with 4 files each
    for j in {1..100}; do
        mkdir "$BASE_DIR/$i/$j"
        for f in {1..4}; do
            head -c ${FILE_SIZE_IN_KB}K </dev/urandom > "$BASE_DIR/$i/$j/$f.txt"
        done
    done
done
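For anyone who wants to sanity-check a generated tree: the loop bounds in the script work out to 100 + 100 × 100 = 10,100 directories and 4 × 10,100 = 40,400 files. Below is a minimal sketch for counting what is actually on disk, assuming a `find` that supports `-mindepth` (GNU and BSD both do). The helper name `count_entries` is made up for this example and is not part of diskus.

```shell
#!/bin/bash
# Hypothetical helper (not part of diskus): count the entries below a directory.
# Prints "<files> <directories>", excluding the top directory itself.
# Note: counting lines assumes no file name contains a newline.
count_entries() {
    files=$(find "$1" -mindepth 1 -type f | wc -l)
    dirs=$(find "$1" -mindepth 1 -type d | wc -l)
    # Arithmetic expansion strips the padding some wc implementations add.
    echo "$((files)) $((dirs))"
}

# Expected totals derived from the generator's loop bounds:
# 100 first-level directories, each holding 100 second-level directories,
# with 4 files in every directory.
expected_dirs=$((100 + 100 * 100))
expected_files=$((4 * expected_dirs))
echo "expected: $expected_files files, $expected_dirs directories"
```

Running `count_entries /path/to/testdir` on a freshly generated tree and comparing against the expected line gives a baseline that is independent of diskus, `du`, and the other tools being benchmarked.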
Which version of
I'm pretty sure that's the version I used for the benchmark (which I updated recently). I should have documented the exact versions. It might also be your disk / machine or the test directory. I can try to run the benchmark again with your test directory.
Ah, I have a newer version:
bash$ sn -V
The Tin Summer 1.21.8
I'm not actually able to install
I realized there were binary releases of it. Even with the old version I still see better performance with
hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' '~/workspace/diskus/target/release/diskus' 'du -sh' '/home/torme/extracts/sn p -d0 -j8' 'dust -d0'
Benchmark #1: ~/workspace/diskus/target/release/diskus
Time (mean ± σ): 252.6 ms ± 11.3 ms [User: 155.4 ms, System: 373.6 ms]
Range (min … max): 237.0 ms … 272.0 ms 10 runs
Benchmark #2: du -sh
Time (mean ± σ): 1.814 s ± 0.176 s [User: 116.3 ms, System: 478.4 ms]
Range (min … max): 1.605 s … 2.065 s 10 runs
Benchmark #3: /home/torme/extracts/sn p -d0 -j8
Time (mean ± σ): 332.9 ms ± 41.9 ms [User: 98.3 ms, System: 342.0 ms]
Range (min … max): 303.2 ms … 442.4 ms 10 runs
Benchmark #4: dust -d0
Time (mean ± σ): 1.736 s ± 0.101 s [User: 191.5 ms, System: 434.2 ms]
Range (min … max): 1.587 s … 1.915 s 10 runs
Summary
'~/workspace/diskus/target/release/diskus' ran
1.32 ± 0.18 times faster than '/home/torme/extracts/sn p -d0 -j8'
6.87 ± 0.51 times faster than 'dust -d0'
7.18 ± 0.77 times faster than 'du -sh'
hyperfine --warmup 5 '~/workspace/diskus/target/release/diskus' 'du -sh' '/home/torme/extracts/sn p -d0 -j8' 'dust -d0'
Benchmark #1: ~/workspace/diskus/target/release/diskus
Time (mean ± σ): 65.1 ms ± 6.3 ms [User: 80.7 ms, System: 141.9 ms]
Range (min … max): 54.2 ms … 77.7 ms 38 runs
Benchmark #2: du -sh
Time (mean ± σ): 128.0 ms ± 2.8 ms [User: 45.1 ms, System: 82.0 ms]
Range (min … max): 125.6 ms … 136.8 ms 23 runs
Benchmark #3: /home/torme/extracts/sn p -d0 -j8
Time (mean ± σ): 51.5 ms ± 5.3 ms [User: 70.3 ms, System: 115.0 ms]
Range (min … max): 45.5 ms … 67.2 ms 53 runs
Benchmark #4: dust -d0
Time (mean ± σ): 175.2 ms ± 3.9 ms [User: 85.0 ms, System: 89.0 ms]
Range (min … max): 171.6 ms … 187.1 ms 17 runs
Summary
'/home/torme/extracts/sn p -d0 -j8' ran
1.26 ± 0.18 times faster than '~/workspace/diskus/target/release/diskus'
2.48 ± 0.26 times faster than 'du -sh'
3.40 ± 0.36 times faster than 'dust -d0'
Today, I had a chance to look into this. There are many things to consider:
cold cache:
warm cache:
If I optimize the number of threads for
In any case, the thing we are really interested in is comparing diskus-master against this feature branch. On the test directory, I get: cold:
warm:
For the Dropbox folder, I get: cold:
warm:
In conclusion: there is no statistically significant difference between the two, which is great! In fact, we can even run the recently added t-test script from the hyperfine repo, which tells us in both cases: The two benchmarks are almost the same (p >= 0.05).
This adds better clarity to the return type instead of returning a tuple of three unidentified u64 values.
Digging through my old PRs: are you just waiting on some additional testing for this?
Yes. I would really like this to be a bit more properly tested before we merge this. If I execute
It probably counts the current directory (
Minor: If there is exactly one directory, it would be great if it printed "1 directory". The same goes for files. (Come to think of it, the same could probably happen for the
Next, symlinks are completely ignored. They are reported as neither files nor directories. That's technically correct, but probably unexpected as well. The same is true for other file types like sockets or pipes. I think we should probably count everything that is not a directory as a "file-like entry". In the output, we could probably still call it "file".
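One possible reading of the rules discussed above, sketched in shell rather than in diskus itself: every entry that is not a directory counts as a "file", and a count of one is printed in the singular. Leaving the starting directory out of the count is an assumption made here, and the function names `pluralize` and `summarize` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the proposed reporting rules; not diskus code.

# pluralize COUNT SINGULAR PLURAL
# Prints the count followed by the singular form only when COUNT is exactly 1.
pluralize() {
    if [ "$1" -eq 1 ]; then
        echo "$1 $2"
    else
        echo "$1 $3"
    fi
}

# summarize DIR
# Assumption: the starting directory itself is not counted.
summarize() {
    # Everything that is not a directory (regular files, symlinks, sockets,
    # pipes, ...) counts as a "file". find does not follow symlinks by
    # default, so a symlink to a directory is counted as a file too.
    files=$(find "$1" -mindepth 1 ! -type d | wc -l)
    dirs=$(find "$1" -mindepth 1 -type d | wc -l)
    echo "$(pluralize "$((files))" file files), $(pluralize "$((dirs))" directory directories)"
}
```

For a directory containing one subdirectory, one regular file, and one symlink, `summarize` reports two files and one directory, which matches the "file-like entry" idea while keeping the word "file" in the output.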
This is a rudimentary implementation that also outputs the total number of files found. The count currently includes directories, so the reporting is naive in that sense. We leverage diskus for reporting, and this metric is helpful for us.
Haven't submitted many PRs before, so let me know if you need other things added.