Evaluate and implement improvements to code analysis #6111

eloquence · 2021-09-28T23:47:50Z

(Based on an issue originally filed by @emkll internally.)

We should consider improving and tweaking our static code analysis to identify common code quality/security issues. We are currently using Bandit, but Pyre and Semgrep also provide taint analysis, to ensure untrusted or user-specified input does not make its way to sensitive functions without being sanitized.

Furthermore, we should add rules for existing issues we've uncovered historically, to prevent regressions. The following is a non-exhaustive list of static code analysis tools

Bandit

We are currently using Bandit, but the default rules provided by the tool may be insufficient. We can consider creating custom rules for Bandit, as we identify new issues or to verify.

Pyre

We have done several small spikes to investigate the use of Pyre in the context of the SecureDrop Client: freedomofpress/securedrop-client#385

Semgrep

We use Semgrep in the SecureDrop Client as of freedomofpress/securedrop-client#1226 and and in securedrop-export as of freedomofpress/securedrop-export#67, but we do not use it in this repo yet.

CodeQL

This solution offered by GitHub is similar to Semgrep, but is only available through GitHub Actions, and as such cannot be run locally.

The text was updated successfully, but these errors were encountered:

tesitura · 2021-10-07T00:45:48Z

Based on interesting criteria for the project I've gathered a comparison between the tools and some insights of possible next steps.

Comparison of tools
https://docs.google.com/spreadsheets/d/1zTMND0SsKuUgur18rvJ12XehSRoPPkfnO3HUM_pjBMg/edit?usp=sharing
Insights on possible interesting tools based on comparison
https://docs.google.com/document/d/1RMd86Jw_WgstF_MzQqTCR0F0kpbQd9NHp4dseoMGtgQ/edit?usp=sharing

lsd-cat · 2022-01-13T11:07:10Z

Having researched both Pyre/Pysa and Semgrep here is a small list of pros and cons of each of them:
Pyre/Pysa:

➕ Better taint analysis
➕ More friendly syntax/settings
➕ Support privacy/permission controls in taint analysis
➖ Seems to be only used and maintained by Facebook
➖ Not a lot of public projects or samples where it is currently in use

Semgrep:

➕ Easy to integrate in CI
➕ Matching rules are somewhat intuitive to write for simple bugs
➕ Matching rules support privacy/authorization checks
➕ Can produce fancy diagrams
➕ Bigger community and examples
➖ Taint analysis is somewhat new and 'light'
➖ Writing meaninful code pattern rules to spot complex bugs can be challenging

CodeQL would bound our CI to GitHub and seems overall less feature rich.

In both cases, instrumenting the code and writing meaningful rules would require to go over most of the code and a lot of manual work at least the first time but would have the benefit of giving us come kind of automated security and regression checking in CI after the initial effort. I would say that the better example and support availability of Semgrep could make the initial effort easier. Seems also like that in case of abandonment of Pyre/Pysa by Facebook, there would not be a big enough community to maintain the tool.

eloquence · 2022-04-27T22:22:28Z

(@L3th3 is looking into this and may open a draft PR this sprint, moving into "in development".)

ghost · 2022-06-02T15:42:41Z

I encountered quite a few problems with pyre/pysa. While it works fine on smaller projects and the sample cases that they provide, I had a very hard time getting it to work on any real world projects such as securedrop. For future reference:

inconsistencies between the models packaged in the wheels and the ones in the github repo. I found that the models covering python's standard library didn't cover the entire standard library, but that the functions which were missing varied based on which set of models were being used. This particular issue is the most blocking.
pysa is finicky about matching functions to their .pysa model, which leads to much fewer execution paths being analyzed than expected, and fewer results
many of the models provided are broken. While there is an option to make that non-blocking, it still means making do with less coverage
inconsistent behavior in different environments. Again, non-blocking, because we can simply use docker environments, but the fact that pyre was raising errors on arch about being unable to modify its own configuration points to some deeper issues within the tool itself. I didn't have this issue in Debian environments
as the tool is designed, pysa needs to be able to be installed in the same venv as the project it's analyzing to also analyze the dependencies. However, this also means that the project's dependencies might need to be adapted to allow pyre to run, which is meh

Overall, while the taint analysis is a very interesting feature, in practice the tool needs to mature before we can really make use of it. In particular, there needs to be some significant improvement on the models they offer, because while we can make our own models, we would still need to rely on the models they provide, which are simply insufficient right now.

eloquence · 2022-07-12T20:46:56Z

Closing as a research task, semgrep support was added in #6479

eloquence added security epic Meta issue tracking child issues labels Sep 28, 2021

eloquence mentioned this issue Sep 28, 2021

Add GitHub CodeQL and/or semgrep for static code analysis #5598

Closed

eloquence closed this as completed Jul 12, 2022

cfm mentioned this issue Jun 26, 2024

Add annotations for data sources and sinks and further static analysis freedomofpress/securedrop-client#385

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate and implement improvements to code analysis #6111

Evaluate and implement improvements to code analysis #6111

eloquence commented Sep 28, 2021

tesitura commented Oct 7, 2021

lsd-cat commented Jan 13, 2022

eloquence commented Apr 27, 2022

ghost commented Jun 2, 2022

eloquence commented Jul 12, 2022

Evaluate and implement improvements to code analysis #6111

Evaluate and implement improvements to code analysis #6111

Comments

eloquence commented Sep 28, 2021

Bandit

Pyre

Semgrep

CodeQL

tesitura commented Oct 7, 2021

lsd-cat commented Jan 13, 2022

eloquence commented Apr 27, 2022

ghost commented Jun 2, 2022

eloquence commented Jul 12, 2022