Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate and implement improvements to code analysis #6111

Closed
eloquence opened this issue Sep 28, 2021 · 5 comments
Closed

Evaluate and implement improvements to code analysis #6111

eloquence opened this issue Sep 28, 2021 · 5 comments
Labels
epic Meta issue tracking child issues security

Comments

@eloquence
Copy link
Member

(Based on an issue originally filed by @emkll internally.)

We should consider improving and tweaking our static code analysis to identify common code quality/security issues. We are currently using Bandit, but Pyre and Semgrep also provide taint analysis, to ensure untrusted or user-specified input does not make its way to sensitive functions without being sanitized.

Furthermore, we should add rules for existing issues we've uncovered historically, to prevent regressions. The following is a non-exhaustive list of static code analysis tools

Bandit

We are currently using Bandit, but the default rules provided by the tool may be insufficient. We can consider creating custom rules for Bandit, as we identify new issues or to verify.

Pyre

We have done several small spikes to investigate the use of Pyre in the context of the SecureDrop Client: freedomofpress/securedrop-client#385

Semgrep

We use Semgrep in the SecureDrop Client as of freedomofpress/securedrop-client#1226 and and in securedrop-export as of freedomofpress/securedrop-export#67, but we do not use it in this repo yet.

CodeQL

This solution offered by GitHub is similar to Semgrep, but is only available through GitHub Actions, and as such cannot be run locally.

@tesitura
Copy link

tesitura commented Oct 7, 2021

Based on interesting criteria for the project I've gathered a comparison between the tools and some insights of possible next steps.

Comparison of tools
https://docs.google.com/spreadsheets/d/1zTMND0SsKuUgur18rvJ12XehSRoPPkfnO3HUM_pjBMg/edit?usp=sharing
Insights on possible interesting tools based on comparison
https://docs.google.com/document/d/1RMd86Jw_WgstF_MzQqTCR0F0kpbQd9NHp4dseoMGtgQ/edit?usp=sharing

@lsd-cat
Copy link
Member

lsd-cat commented Jan 13, 2022

Having researched both Pyre/Pysa and Semgrep here is a small list of pros and cons of each of them:
Pyre/Pysa:

  • ➕ Better taint analysis
  • ➕ More friendly syntax/settings
  • ➕ Support privacy/permission controls in taint analysis
  • ➖ Seems to be only used and maintained by Facebook
  • ➖ Not a lot of public projects or samples where it is currently in use

Semgrep:

  • ➕ Easy to integrate in CI
  • ➕ Matching rules are somewhat intuitive to write for simple bugs
  • ➕ Matching rules support privacy/authorization checks
  • ➕ Can produce fancy diagrams
  • ➕ Bigger community and examples
  • ➖ Taint analysis is somewhat new and 'light'
  • ➖ Writing meaninful code pattern rules to spot complex bugs can be challenging

CodeQL would bound our CI to GitHub and seems overall less feature rich.

In both cases, instrumenting the code and writing meaningful rules would require to go over most of the code and a lot of manual work at least the first time but would have the benefit of giving us come kind of automated security and regression checking in CI after the initial effort. I would say that the better example and support availability of Semgrep could make the initial effort easier. Seems also like that in case of abandonment of Pyre/Pysa by Facebook, there would not be a big enough community to maintain the tool.

@eloquence
Copy link
Member Author

(@L3th3 is looking into this and may open a draft PR this sprint, moving into "in development".)

@ghost
Copy link

ghost commented Jun 2, 2022

I encountered quite a few problems with pyre/pysa. While it works fine on smaller projects and the sample cases that they provide, I had a very hard time getting it to work on any real world projects such as securedrop. For future reference:

  • inconsistencies between the models packaged in the wheels and the ones in the github repo. I found that the models covering python's standard library didn't cover the entire standard library, but that the functions which were missing varied based on which set of models were being used. This particular issue is the most blocking.
  • pysa is finicky about matching functions to their .pysa model, which leads to much fewer execution paths being analyzed than expected, and fewer results
  • many of the models provided are broken. While there is an option to make that non-blocking, it still means making do with less coverage
  • inconsistent behavior in different environments. Again, non-blocking, because we can simply use docker environments, but the fact that pyre was raising errors on arch about being unable to modify its own configuration points to some deeper issues within the tool itself. I didn't have this issue in Debian environments
  • as the tool is designed, pysa needs to be able to be installed in the same venv as the project it's analyzing to also analyze the dependencies. However, this also means that the project's dependencies might need to be adapted to allow pyre to run, which is meh

Overall, while the taint analysis is a very interesting feature, in practice the tool needs to mature before we can really make use of it. In particular, there needs to be some significant improvement on the models they offer, because while we can make our own models, we would still need to rely on the models they provide, which are simply insufficient right now.

@eloquence
Copy link
Member Author

Closing as a research task, semgrep support was added in #6479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Meta issue tracking child issues security
Projects
None yet
Development

No branches or pull requests

3 participants