8.0.0 / 46 and 410 don't find duplicates properly Windows 10 #1403

Open
rramstad833 opened this issue Nov 23, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@rramstad833

Bug Description

When deduplicating, czkawka fails to find some bit-identical files across two file systems on Windows 10.

This seems to have been introduced in the last year or so. I'm a long-time user of the program and recently "updated" from 5.1.0, which works fine.

I suspect there's something very wrong in the caching code. Judging from the changelogs, that code has been reworked several times recently.

Steps to reproduce:

1. Take a folder with a lot of files in it and make a copy of that folder.
2. Start czkawka and point it at the two folders, with the original folder marked as reference.
3. Deduplicate and delete all files found, then remove all empty directories.
4. Review the copy and confirm that many files are left, although there should be none, since everything was an exact duplicate.
5. Run czkawka again to deduplicate. It finds a few more duplicates, which is clearly wrong, since the first pass should already have caught them. Delete those files.
6. Review the copy and confirm that files are still left.
7. Use a terminal window and file comparison software to recursively compare the original folder with the copy folder, and verify that all of the remaining files in the copy are exactly the same as the originals.

Note that the final step rules out file corruption: the copy of the original folder is bit-perfect, and another tool such as Cygwin + diff confirms that the remaining files are exact copies of the source files, so czkawka should have picked them up.
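For anyone who wants to repeat that verification without Cygwin, here is a minimal sketch of the recursive comparison in Python. It assumes the leftover files in the copy sit at the same relative paths as in the original; the function name `find_mismatches` and the command-line arguments are hypothetical, and this is not how czkawka itself compares files.

```python
import filecmp
import os
import sys


def find_mismatches(original_root, copy_root):
    """Compare every file under copy_root with the file at the same
    relative path under original_root.

    Returns (missing, different): relative paths with no counterpart in
    the original, and relative paths whose contents differ.
    """
    missing, different = [], []
    for dirpath, _dirnames, filenames in os.walk(copy_root):
        for name in filenames:
            copy_path = os.path.join(dirpath, name)
            rel = os.path.relpath(copy_path, copy_root)
            original_path = os.path.join(original_root, rel)
            if not os.path.isfile(original_path):
                missing.append(rel)
            # shallow=False forces a comparison of the file contents
            # instead of trusting size and modification time.
            elif not filecmp.cmp(original_path, copy_path, shallow=False):
                different.append(rel)
    return sorted(missing), sorted(different)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: compare_trees.py ORIGINAL_DIR COPY_DIR")
    missing, different = find_mismatches(sys.argv[1], sys.argv[2])
    for rel in missing:
        print("no counterpart in original:", rel)
    for rel in different:
        print("contents differ:", rel)
    if not missing and not different:
        print("every remaining file in the copy matches the original")
```

If both lists come back empty, every file still present in the copy is byte-for-byte identical to its original, which is the situation described above.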

Terminal output (optional):

<!--
Add terminal output only if needed - if there are some errors or warnings or you have performance/freeze issues.  
Very helpful in this situation will be logs from czkawka run with RUST_LOG environment variable set e.g. 
`RUST_LOG=debug ./czkawka` or `flatpak run --env=RUST_LOG=debug com.github.qarmin.czkawka` if you use flatpak, which will print more detailed info about executed function.
-->
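The template comment above only shows shell syntax for Linux and flatpak. A rough, platform-neutral sketch of the same idea in Python is given here: launch a program with `RUST_LOG` set and save whatever it prints. The helper name `run_with_debug_log`, the log file name, and the executable path are placeholders, and it is an assumption that the Windows build writes its log lines to standard output or standard error.

```python
import os
import subprocess
import sys


def run_with_debug_log(command, log_path, level="debug"):
    """Run command with RUST_LOG=level and write its combined
    stdout/stderr to log_path. Returns the process exit code."""
    # Copy the current environment and add RUST_LOG for the child only.
    env = dict(os.environ, RUST_LOG=level)
    with open(log_path, "wb") as log_file:
        completed = subprocess.run(
            command,
            env=env,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return completed.returncode


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit("usage: run_with_log.py LOG_FILE PROGRAM [ARGS...]")
    exit_code = run_with_debug_log(sys.argv[2:], sys.argv[1])
    print("program exited with code", exit_code)
```

The call blocks until the launched program exits, so for a GUI session the log file is complete only after the window is closed.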


System

  • Czkawka/Krokiet version:
  • OS version:
  • Installation method:
rramstad833 added the bug label on Nov 23, 2024
@rramstad833
Author

If you can tell me how to generate logs or whatnot, I'm happy to help with debugging.

For reference, we're talking about 100,000+ files and 300 GB or so in the original folder, so it's a decent amount of data.

I recognize that mine is somewhat of a degenerate case, i.e. the two folders are supposed to be exactly identical, and I'm simply proving that before I delete the copy. I'd say the program identifies about two-thirds of the files as duplicates, and the remaining third can be proven identical using system comparison utilities.
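To put a number on that fraction independently of czkawka, a content-based cross-check could look like the sketch below. It groups the reference files by size, hashes only same-size candidates, and reports which files in the copy have a content match anywhere in the reference tree. The choice of SHA-256 and the names `hash_file`, `iter_files`, and `duplicate_coverage` are my own assumptions for illustration; this is not czkawka's algorithm or its cache logic.

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root):
    """Yield the full path of every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def duplicate_coverage(reference_root, copy_root):
    """Return (matched, unmatched): files under copy_root that do or
    do not have a byte-identical counterpart under reference_root."""
    # Group reference files by size so that only same-size candidates
    # ever need to be hashed.
    by_size = defaultdict(list)
    for path in iter_files(reference_root):
        by_size[os.path.getsize(path)].append(path)

    hashes_by_size = {}  # size -> set of digests, filled lazily
    matched, unmatched = [], []
    for path in iter_files(copy_root):
        size = os.path.getsize(path)
        candidates = by_size.get(size)
        if not candidates:
            unmatched.append(path)
            continue
        if size not in hashes_by_size:
            hashes_by_size[size] = {hash_file(p) for p in candidates}
        if hash_file(path) in hashes_by_size[size]:
            matched.append(path)
        else:
            unmatched.append(path)
    return sorted(matched), sorted(unmatched)
```

Run against the untouched original and a fresh copy, `len(matched) / (len(matched) + len(unmatched))` should come out at 1.0, which is the figure to compare with the roughly two-thirds reported above. On 300 GB of data a full hash pass will take a while.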
