Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSS-Fuzz: OSS-Fuzz fuzzing integration #385

Merged
merged 5 commits into from
Nov 4, 2024

Conversation

arthurscchan
Copy link
Contributor

Hi! Would you be interested in setting up fuzzing for the tar-rs module via OSS-Fuzz?

Fuzzing is essentially a stress-testing approach used to find bugs in software, and OSS-Fuzz is a free service run by Google that continuously fuzzes important open-source projects. Integrating your module with OSS-Fuzz could help uncover memory corruption issues that may exist.

This PR adds a Cargo fuzz configuration along with 3 fuzzers for the tar-rs module. In combination with an initial attempt in OSS-Fuzz (google/oss-fuzz#12645), it enables OSS-Fuzz to fuzz the tar-rs module while keeping the fuzzers upstream for further modification and expansion. If you're happy to proceed with the integration and store the fuzzers upstream, please let me know, and I'd be glad to provide more details if needed.

The only thing required at this point is an email associated with a Google account, which will be used to receive notifications when bugs are found.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks! I only took a start of a review on this but I love the idea.

fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/builder.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
@cgwalters
Copy link
Collaborator

The only thing required at this point is an email associated with a Google account, which will be used to receive notifications when bugs are found.

My gmail is [email protected], the other most active maintainers are @xzfc and @alexcrichton so let's see if they have and want to provide gmail accounts too.

(But that said...why can't oss-fuzz learn to e.g. file security issues on GH?)

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking changes requested to track review state above

Signed-off-by: Arthur Chan <[email protected]>
@alexcrichton
Copy link
Owner

Personally I'm all for more fuzzing. I'm cc'd on the wasmtime project on oss-fuzz and we use it to great benefit there. In that sense 👍 from me.

On the specifics of the fuzzers here one thing I might recommend is to more heavily use the arbitrary crate to make parsing the input data easier. I'd also be a bit worried about the coverage of these fuzzers, for example the tar.rs fuzzer looks more like a unit test than a fuzzer since the data is just written to the filesystem. For builder.rs that also looks more like a unit tests since the set of things that can happen in the fuzzer is so small (more-or-less only looking at the first byte). The archive.rs fuzzer looks like it's got some interesting bits going on but it's a bit hard to read/follow I find and I think that arbitrary would help clean it up.

Overall though I think the idea of fuzzing is good, but I think it might be best to shore up what's being fuzzed here. For example this might be a good candidate for differential fuzzing against a different tar library or similar (e.g. linking to libtar in C if that's reasonably easy to do). Either that or perhaps fuzzing things like assigning arbitrary filenames and ensuring that no files are created outside of the folder during extraction (stuff like that). I'll note though that in general fuzzers are most effective when they do no I/O since that enables them to go much faster.

@arthurscchan
Copy link
Contributor Author

@cgwalters I have updated the fuzzers to include more randomness and clean up with arbitrary crate. I have also removed builder.rs which have duplicated targets from the other two fuzzers to make it more clear.

@arthurscchan
Copy link
Contributor Author

@alexcrichton Thank you for your suggestions. I have updated the fuzzers accordingly. I have removed builder.rs as it contains much of the duplicated logic from the other two fuzzers. I have also included more randomness in file structure and contents using the arbitrary crate for both tar.rs and archive.rs. The aim of these fuzzers is to explore and test the code for tar archiving and extraction functionalities. I believe the fuzzers are much improved now, both in terms of randomness and targeted functions. Please do share any further comments on improving the fuzzing targets or approaches to enhance them. Thanks again for your suggestions and feedback.

fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/archive.rs Outdated Show resolved Hide resolved
Signed-off-by: Arthur Chan <[email protected]>
@cgwalters
Copy link
Collaborator

Thanks for your work on this! Your filtering of the pathnames looks sane, but I still worry about having code that reads or writes to the filesystem as a result of the fuzzer. In CI, it doesn't really matter - I'm sure that the oss-fuzz runners are sandboxed heavily.

But someone trying to reproduce a fuzzing failure locally is probably not by default, and it'd be pretty unfortunate if we somehow ended up writing into their $HOME or so for example.

The safest fix is really to avoid all filesystem APIs. I should have recommended this earlier but we could also use https://crates.io/crates/cap-std which itself acts as a sandbox.

Alternatively - and this may be the easiest - we could have the fuzz targets bomb out like this:

if !std::env::var_os("TAR_FUZZ_ACKNOWLEDGE_SANDBOX_RECOMMENDED").is_some() {
  anyhow::bail!("These fuzzing targets may read/write to the filesystem; please run in a sandbox and set TAR_FUZZ_ACKNOWLEDGE_SANDBOX_RECOMMENDED");

or so?

@NobodyXu
Copy link

NobodyXu commented Nov 2, 2024

The safest fix is really to avoid all filesystem APIs. I should have recommended this earlier but we could also use https://crates.io/crates/cap-std which itself acts as a sandbox.

Just my 2c, but it seems that having a sans-io interface would help fuzzing, as it ensures no I/O?

Copy link
Owner

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fuzzers look good to me, thanks! I might second the cap-std suggestion for perhaps creating the directory to crate an archive from. That can help remove the tests for safe paths I think since it should automatically deny access outside the directory (if I understand it right)

While skipping I/O entirely would be best I do realize that tar is intimately tied with the filesystem most of the time so it might just not be possible to skip it for these fuzzers.

fuzz/fuzz_targets/tar.rs Outdated Show resolved Hide resolved
Signed-off-by: Arthur Chan <[email protected]>
@arthurscchan
Copy link
Contributor Author

@alexcrichton @cgwalters I have revamped the logic, using derive_abitrary crate and change the size check to use int_in_range. I have also implemented the sandbox_dir with cap_std and remove unnecessary path checking and sanitisation (since cap_std won't allow going outside from the sandbox_dir). I agree that it is a better handling of the I/O process. I actually second the idea that I/O is kinda not avoidable in fuzzing this project since tar is coupled with I/O process in some sence, and that is also one of the worthy fuzzing target in the project.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for your work on this! I think we can get this in and see if fuzzing at scale turns up anything.

Am I correct in that basically what this fuzzing coverage should find is an unexpected panic or so? We're swallowing/ignoring most Err.

I think it'd be quite interesting to have a followup for Alex's suggestion of differential fuzzing, e.g. something like validate that a tar we create can "round trip" especially through other tar implementations like GNU tar, libarchive, the Go encoding/tar etc.
That would likely turn up a lot of edge cases in interoperability.

fuzz/fuzz_targets/archive.rs Show resolved Hide resolved
@cgwalters cgwalters merged commit 9189d3c into alexcrichton:main Nov 4, 2024
7 checks passed
@arthurscchan
Copy link
Contributor Author

arthurscchan commented Nov 4, 2024

@cgwalters Yes, we are aiming to find unexpected panic, or even deadly signal like Segmentation Fault or else. Thanks for merging the fuzzers in. I will go on and add your email to the OSS-Fuzz integration and it should be ready to go. Not sure if @alexcrichton also wants to be added to the OSS-Fuzz contact for this project. If yes, please give me a email address linked to a Google account and I am happy to add it in for you. Thanks.

@arthurscchan arthurscchan deleted the oss-fuzz branch November 4, 2024 15:46
DavidKorczynski pushed a commit to google/oss-fuzz that referenced this pull request Nov 4, 2024
This PR initialises OSS-Fuzz integration for the tar-rs project in Rust.
New fuzzers have been created, and a PR
(alexcrichton/tar-rs#385) has been submitted
upstream to merge the fuzzers.

---------

Signed-off-by: Arthur Chan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants