Add links to source code on https://introspector.oss-fuzz.com/ #1317

Open
DavidKorczynski opened this issue Nov 15, 2023 · 6 comments

@DavidKorczynski
Contributor

It would be nice to have direct links to the fuzzer source files on the profile pages -- I think some heuristics will be able to do this and it will make it very convenient to browse a given project's structure.

@DavidKorczynski DavidKorczynski self-assigned this Nov 15, 2023
@nathaniel-brough
Contributor

nathaniel-brough commented Nov 22, 2023

Do you think this functionality could be extended to an optional part of the API? That is, it would be great to be able to fetch the source programmatically from GitHub. This could be done by exposing the following in the all-functions endpoint:

  • line_number (e.g. 29)
  • commit (e.g. 660d8bf)

Then you can create a link like the following;

Having it as part of the API would also mean you can do other things like download the file using the github API. The only downside is that not all projects use git (so the commit field might need to be something more generic or optional).
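A rough sketch of what "download the file using the github API" could look like on the client side, assuming the API exposed a file path and an optional revision. The function name `github_contents_url` and its parameters are hypothetical and not part of introspector; the URL shape follows GitHub's REST "repository contents" endpoint, and the llhttp values are the ones used elsewhere in this thread. The revision is optional because, as noted, not every project uses git.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def github_contents_url(owner: str, repo: str, path: str,
                        revision: Optional[str] = None) -> str:
    """Build a GitHub REST 'repository contents' URL for a single file.

    Hypothetical helper: only the URL is built here, no request is sent.
    """
    # quote() keeps "/" intact by default, so nested paths stay readable.
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{quote(path)}"
    if revision:
        # "ref" may be a commit SHA, branch, or tag; omitted when unknown.
        url += "?" + urlencode({"ref": revision})
    return url


url = github_contents_url(
    "nodejs",
    "llhttp",
    "test/fuzzers/fuzz_parser.c",
    "8498ef9d8b0e9539c8c331cf59213529287789e1",
)
print(url)
```

Leaving `revision` out simply asks for the file on the repository's default branch, which is not reproducible over time.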

@DavidKorczynski
Contributor Author

If I understand your thought correctly then I think it would be neat -- namely to have an API available that'll provide you a link to the source code, or, perhaps the source code of each harness.

However, I'm unsure what you meant by commit? Which commits are you referring to for each harness?

My thoughts are:
Take the llhttp project with a single fuzzer: https://introspector.oss-fuzz.com/project-profile?project=llhttp -- in this case, I'd like for the profile page to have a "fuzzers table" with 1 row (because there's a single fuzzer) with a link to https://github.com/nodejs/llhttp/blob/main/test/fuzzers/fuzz_parser.c#L8-L45 for the fuzz_parser harness.

To make it an API, I would make the above URL accessible through it. We should also be able to provide references to other repository hosts, e.g. GitLab, and in the worst case we can provide a URL to the fuzzer's location in the code coverage report, since we'll always (or at least when coverage is working) have a link to the code coverage reports.

Are you perhaps thinking instead of https://github.com/nodejs/llhttp/blob/main/test/fuzzers/fuzz_parser.c#L8-L45 the right link to provide is https://github.com/nodejs/llhttp/blob/8498ef9d8b0e9539c8c331cf59213529287789e1/test/fuzzers/fuzz_parser.c#L8-L45?

@DavidKorczynski
Contributor Author

We may run into some issues with having to predict branch names. Hmm, I'm not sure if there are many edge cases we'll have to handle.

One option is to reduce this to links to the location in the code coverage reports. I think that's also useful in and of itself, but I also think having the source repo URLs, and even the source code itself, provides high value.

@nathaniel-brough
Contributor

> Are you perhaps thinking instead of https://github.com/nodejs/llhttp/blob/main/test/fuzzers/fuzz_parser.c#L8-L45 the right link to provide is https://github.com/nodejs/llhttp/blob/8498ef9d8b0e9539c8c331cf59213529287789e1/test/fuzzers/fuzz_parser.c#L8-L45?

Yeah, that's pretty close to what I was saying. I guess what I was getting at is that you need 3 bits of information to reproducibly find a function:

  • The specific version (my suggestion being the specific git sha of the project that introspector is being run on). If it's not already recorded, this could be extracted using the following commands:
$ cd llhttp # Insert project here
$ git rev-parse HEAD
8498ef9d8b0e9539c8c331cf59213529287789e1
  • The file path e.g. test/fuzzers/fuzz_parser.c
  • The line range e.g. L8-L45

If you stitch all of these pieces together you can reproducibly find the specific function again, and it will always remain the same, e.g.

https://github.com/nodejs/llhttp/blob/8498ef9d8b0e9539c8c331cf59213529287789e1/test/fuzzers/fuzz_parser.c#L8-L45
https://github.com/nodejs/llhttp/blob/{---------------commit-sha-------------}/{-----------path----------}#{range}
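The stitching shown above can be sketched in a few lines. The function name `github_permalink` and its parameter names are assumptions made for illustration; only the URL layout and the llhttp values come from this thread.

```python
def github_permalink(owner: str, repo: str, commit_sha: str,
                     path: str, first_line: int, last_line: int) -> str:
    """Combine version, file path and line range into a GitHub permalink.

    Hypothetical helper; the three pieces are the ones listed above.
    """
    return (
        f"https://github.com/{owner}/{repo}/blob/{commit_sha}/{path}"
        f"#L{first_line}-L{last_line}"
    )


link = github_permalink(
    "nodejs",
    "llhttp",
    "8498ef9d8b0e9539c8c331cf59213529287789e1",
    "test/fuzzers/fuzz_parser.c",
    8,
    45,
)
print(link)
```

Passing a branch name such as `main` instead of the sha yields the earlier, non-pinned form of the link.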

My suggestion is to just include those pieces of information in the API, and leave the URL building up to the user. For example, the same thing would be reproducible from the command line, e.g.

$ git clone https://github.com/nodejs/llhttp.git
$ cd llhttp
$ git checkout 8498ef9d8b0e9539c8c331cf59213529287789e1
$ # Snip out lines 8-45
$ sed -n '8,45 p' test/fuzzers/fuzz_parser.c 
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
 // Truncated ...
}

The latter being closer to what I would likely be doing.
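For anyone doing the snipping step from a script rather than with `sed`, here is a minimal Python sketch of the same 1-indexed, inclusive line slice. The helper name `snip_lines` is made up for this example, and the sample text is a stand-in, not the real contents of fuzz_parser.c.

```python
def snip_lines(text: str, first_line: int, last_line: int) -> str:
    """Return lines first_line..last_line (1-indexed, inclusive).

    Mirrors `sed -n 'A,B p'`: a range running past the end of the
    text just stops at the last available line.
    """
    if first_line < 1 or last_line < first_line:
        raise ValueError("expected 1 <= first_line <= last_line")
    lines = text.splitlines(keepends=True)
    return "".join(lines[first_line - 1:last_line])


# Stand-in file contents, used only to demonstrate the slicing.
sample = "line one\nline two\nline three\nline four\n"
print(snip_lines(sample, 2, 3), end="")
```

With a checked-out repository, the text would come from reading the file at the recorded path after checking out the recorded sha.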

> We may run into some issue with having to predict branch names. Hmm, I'm not sure if there are many edge cases we'll have to handle.

I don't think branch prediction would be an issue with the above approach. A branch in itself is just a stream of sequential commits, whereas a commit is an atomic representation of a git repository. So as long as you collect the git sha when you run introspector, you should be able to reproducibly restore that commit (or use the GitHub API to view the file at that commit).

@nathaniel-brough
Contributor

That's assuming you meant git branch and not some other definition of a branch :)

@nathaniel-brough
Contributor

Also worth noting that GitLab, Bitbucket and others have a similar API structure available as well, e.g.

https://gitlab.com/gnuwget/wget2/-/blob/8271687e29568e9a271afa1b3112325611f48183/fuzz/libwget_atom_url_fuzzer.c#L31-53
https://bitbucket.org/snakeyaml/snakeyaml/src/a4df9e7d7ffdc0c21fe268f872a1e30d03aa8f02/src/main/java9/module-info.java#lines-14:44
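Since the three hosts differ only in the path layout and the line-range fragment, a user-side link builder could look roughly like the sketch below. The function name `source_link` and its parameters are hypothetical; the templates are inferred solely from the example URLs in this thread, so other hosts or self-hosted instances may need different patterns.

```python
def source_link(host: str, project_path: str, commit_sha: str,
                file_path: str, first_line: int, last_line: int) -> str:
    """Build a pinned source link for a few known hosting sites.

    Hypothetical helper; templates follow the example URLs above.
    """
    base = f"https://{host}/{project_path}"
    if host == "github.com":
        return f"{base}/blob/{commit_sha}/{file_path}#L{first_line}-L{last_line}"
    if host == "gitlab.com":
        return f"{base}/-/blob/{commit_sha}/{file_path}#L{first_line}-{last_line}"
    if host == "bitbucket.org":
        return f"{base}/src/{commit_sha}/{file_path}#lines-{first_line}:{last_line}"
    # Unknown host: the caller can fall back to something else.
    raise ValueError(f"no link template for host: {host}")


print(source_link(
    "gitlab.com",
    "gnuwget/wget2",
    "8271687e29568e9a271afa1b3112325611f48183",
    "fuzz/libwget_atom_url_fuzzer.c",
    31,
    53,
))
```

Raising on an unknown host keeps the fallback decision (for example, linking to the coverage report instead) with the caller.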
