Pysa detection capabilities #959

yoann-marquer · 2025-01-10T15:29:41Z

Pysa Bug

Pre-submission checklist
[✓] I've checked the list of common issues and mine does not appear

Bug description
Apologies, this is not exactly a bug but rather a set of questions to better understand Pysa's capabilities. Don't hesitate to redirect us if necessary.

We are trying to use Pysa to detect vulnerabilities in several Python projects.
According to the tutorial and the documentation, Pysa is able to detect a data flow from a source to a sink (one that was not stopped by a sanitizer) and to report a location on the path between the source and the sink as the vulnerability location.

But is it possible to do more with Pysa, by defining dedicated rules?
For instance:

be able to detect if a variable is present in a given procedure/statement
be able to report a vulnerability if a sink is present but there is no data flow between the source and the sink (as opposed to the current behavior where there must be a data flow to report a vulnerability)
be able to detect that there is a data flow between a source and the end of a procedure (instead of a sink)

Reproduction steps
None

Expected behavior
For CVE-2016-9243, is it possible to detect in src/cryptography/hazmat/primitives/kdf/hkdf.py that the variable self._algorithm.digest_size was divided by 8? For instance, by detecting the presence of the character 8 at Line 94?

For CVE-2017-2809, is it possible to detect in ansible_vault/api.py that yaml.load was called instead of yaml.safe_load at Line 18?

For CVE-2016-9909, is it possible to detect in html5lib/serializer/htmlserializer.py that the variable self.quote_attr_values was not compared with the value "legacy", as in the fix at Line 255?

For CVE-2014-7143, is it possible to detect in twisted/web/client.py that the "source" _trustRoot should be present when the "sink" optionsForClientTLS is called (i.e., there is a vulnerability if optionsForClientTLS is called without _trustRoot)?

Similarly, for CVE-2012-2417, is it possible to detect in lib/Crypto/PublicKey/ElGamal.py that the "source" getPrime should be reached before the "sink" .isPrime (i.e., there is a vulnerability if getPrime is called but not .isPrime)?

Logs
None

Additional context
None

arthaud · 2025-01-13T12:05:26Z

Hi,

be able to detect if a variable is present in a given procedure/statement

How would it "detect" the variable? By its name? Then, no, in the general case, Pysa is not able to detect if a variable name is used. If, instead, you want to detect if an attribute is used, then you can model a specific class attribute as a source or sink.

be able to report a vulnerability if a sink is present but there is no data flow between the source and the sink (as opposed to the current behavior where there must be a data flow to report a vulnerability)

No, unfortunately, Pysa can only find flows from sources to sinks. If you just want to know if a function is called from anywhere, you could use the call graph generated by Pysa (see option --dump-call-graph).

be able to detect that there is a data flow between a source and the end of a procedure (instead of a sink)

Yes, you can mark the return value of a function as a sink using return sinks. Unfortunately, it looks like this isn't documented, but this is supported. See this test:
https://github.com/facebook/pyre-check/blob/main/source/interprocedural_analyses/taint/test/integration/return_sinks.py.pysa#L3-L4
https://github.com/facebook/pyre-check/blob/main/source/interprocedural_analyses/taint/test/integration/return_sinks.py#L24-L30
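As a rough sketch of what this could look like, assuming return sinks are written as a TaintSink annotation on the return type (mirroring how return sources are written). The module name `app`, both functions, and the kinds `Secret` and `ReturnedValue` are hypothetical; the linked test files remain the authoritative reference for the exact syntax.

```python
# app.py: a hypothetical module that Pysa would analyze.

def read_token() -> str:
    # Stand-in for a function that would be modeled as a taint source.
    return "secret-token"


def describe_session() -> str:
    # The tainted token reaches the return value, i.e. the end of the procedure.
    return "session for " + read_token()


# Hypothetical .pysa models, kept in a string so this block stays valid Python.
# `Secret` and `ReturnedValue` are made-up kinds; they would also need to be
# declared in taint.config, together with a rule connecting them.
PYSA_MODELS = """
def app.read_token() -> TaintSource[Secret]: ...
def app.describe_session() -> TaintSink[ReturnedValue]: ...
"""
```

With models along these lines, a flow from read_token to the return value of describe_session would be reported like any other source-to-sink flow.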

For CVE-2016-9243, is it possible to detect in src/cryptography/hazmat/primitives/kdf/hkdf.py that the variable self._algorithm.digest_size was divided by 8? For instance, by detecting the presence of the character 8 at Line 94?

You could try this: make int.__truediv__ a sink, so that you find all flows from digest_size (which you would make a source, e.g. DigestSize) to any integer division. Then, you can use ViaValueOf: https://pyre-check.org/docs/pysa-features/#via-value-feature-using-viavalueof
This will make Pysa add a breadcrumb via-value:<divisor> (e.g. via-value:8) to all these flows. Then you can use filters to exclude issues where the divisor is not 8.
However I will admit this is just a workaround and might not be the best approach.
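To illustrate the filtering step only, here is a minimal sketch that keeps the issues whose breadcrumbs include via-value:8. It assumes the issues have already been exported and simplified into dictionaries with a `features` list; that shape, the callable names, and the `has_breadcrumb` helper are all hypothetical and do not reflect Pysa's actual output format.

```python
from typing import Dict, List

# Hypothetical, simplified issues; the real Pysa output is structured differently.
issues: List[Dict] = [
    {"callable": "pkg.kdf.expand", "features": ["via-value:8"]},
    {"callable": "pkg.util.halve", "features": ["via-value:2"]},
    {"callable": "pkg.util.ratio", "features": []},
]


def has_breadcrumb(issue: Dict, breadcrumb: str) -> bool:
    """Return True if the issue carries the given breadcrumb."""
    return breadcrumb in issue.get("features", [])


# Keep only the flows that reached a division by the literal 8.
flagged = [issue for issue in issues if has_breadcrumb(issue, "via-value:8")]

for issue in flagged:
    print(issue["callable"])
```

In practice the same selection would more likely be expressed with the filtering facilities of whatever tool is used to browse Pysa results, rather than with a standalone script.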

For CVE-2017-2809, is it possible to detect in ansible_vault/api.py that yaml.load was called instead of yaml.safe_load at Line 18?

Pysa can only find flows from sources to sinks. For instance, using yaml.load on a static string or something coming from a configuration file is probably not a vulnerability. I could imagine a few things that could be sources here. For instance, anything returned by Vault.decrypt seems like it could be user controlled? Or at least you could make a source DecryptedPayload or something.
You could also just find all calls to yaml.load using the call graph, as I suggested above for another question.
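A minimal sketch of the call-graph approach, assuming the output of --dump-call-graph has already been loaded and normalized into a plain mapping from each caller to the callees it may invoke. The dump format is not described in this thread, so the `call_graph` shape, the `myproject` names, and the `callers_of` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical, simplified call graph: caller -> callees.
# The real --dump-call-graph output may be structured differently.
call_graph: Dict[str, List[str]] = {
    "myproject.api.load_vault": ["myproject.crypto.decrypt", "yaml.load"],
    "myproject.api.load_config": ["yaml.safe_load"],
    "myproject.cli.main": ["myproject.api.load_vault"],
}


def callers_of(graph: Dict[str, List[str]], target: str) -> List[str]:
    """Return every caller that directly invokes `target`, sorted by name."""
    return sorted(caller for caller, callees in graph.items() if target in callees)


print(callers_of(call_graph, "yaml.load"))
```

This only answers "where is yaml.load called?"; it says nothing about whether the argument is attacker-controlled, which is what a source-to-sink model would add.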

For CVE-2016-9909, is it possible to detect in html5lib/serializer/htmlserializer.py that the variable self.quote_attr_values was not compared with the value "legacy", as in the fix at Line 255?

I don't see an easy way for Pysa to detect whether a variable is NOT compared with a specific string. However, I'm wondering if this vulnerability can be modeled differently as a source to sink problem.

For CVE-2014-7143, is it possible to detect in twisted/web/client.py that the "source" _trustRoot should be present when the "sink" optionsForClientTLS is called (i.e., there is a vulnerability if optionsForClientTLS is called without _trustRoot)?

Pysa cannot detect a "not" flow. I think the correct way to model this is to make hostname the source, and filter out flows where _trustRoot was provided. For instance, ViaValueOf can tell you whether an argument was present or not.

Similarly, for CVE-2012-2417, is it possible to detect in lib/Crypto/PublicKey/ElGamal.py that the "source" getPrime should be reached before the "sink" .isPrime (i.e., there is a vulnerability if getPrime is called but not .isPrime)?

I would make getPrime a sanitizer, but then we need something to act as a source. Maybe bits could be the source?
