-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Moving TaintPropagation checker out of alpha #67352
Conversation
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang ChangesThis commit renames alpha.security.taint.TaintPropagation checker to optin.security.taint.TaintPropagation. This checker was stabilized and improved by recent commits thus it's ready for production use. The checker is placed in the optin package as it implements an optional security analysis. Reports by the checker:
All reports are true positive in the sense that they correctly indicate that tainted data gets to the trusted source and the data path is correctly shown. 2 reports are indicating particularly interesting potential vulnerabilities.
The report Untrusted data is used to specify the buffer size Patch is 42.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67352.diff 25 Files Affected:
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index dbd6d7787823530..e68b10dc2c1c6c5 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -533,7 +533,253 @@ Warns when a nullable pointer is returned from a function that has _Nonnull retu
optin
^^^^^
-Checkers for portability, performance or coding style specific rules.
+Optional checkers for portability, performance, security or coding style checkers
+which may not be applicable to all projects.
+
+optin.security.taint
+^^^^^^^^^^^^^^^^^^^^
+
+Checkers implementing
+`taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
+
+.. optin-security-taint-TaintPropagation:
+
+optin.security.taint.TaintPropagation (C, C++)
+""""""""""""""""""""""""""""""""""""""""""""""
+
+Taint analysis identifies potential security vulnerabilities where the
+attacker can inject malicious data to the program to execute an attack
+(privilege escalation, command injection, SQL injection etc.).
+
+The malicious data is injected at the taint source (e.g. ``getenv()`` call)
+which is then propagated through function calls and being used as arguments of
+sensitive operations, also called as taint sinks (e.g. ``system()`` call).
+
+One can defend against this type of vulnerability by always checking and
+sanitizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover and show to the user these potential
+taint source-sink pairs and the propagation call chain.
+
+The most notable examples of taint sources are:
+
+ - data from network
+ - files or standard input
+ - environment variables
+ - data from databases
+
+Let us examine a practical example of a Command Injection attack.
+
+.. code-block:: c
+
+ // Command Injection Vulnerability Example
+ int main(int argc, char** argv) {
+ char cmd[2048] = "/bin/cat ";
+ char filename[1024];
+ printf("Filename:");
+ scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+ strcat(cmd, filename);
+ system(cmd); // Warning: Untrusted data is passed to a system call
+ }
+
+The program prints the content of any user specified file.
+Unfortunately the attacker can execute arbitrary commands
+with shell escapes. For example with the following input the `ls` command is also
+executed after the contents of `/etc/shadow` is printed.
+`Input: /etc/shadow ; ls /`
+
+The analysis implemented in this checker points out this problem.
+
+One can protect against such attack by for example checking if the provided
+input refers to a valid file and removing any invalid user input.
+
+.. code-block:: c
+
+ // No vulnerability anymore, but we still get the warning
+ void sanitizeFileName(char* filename){
+ if (access(filename,F_OK)){// Verifying user input
+ printf("File does not exist\n");
+ filename[0]='\0';
+ }
+ }
+ int main(int argc, char** argv) {
+ char cmd[2048] = "/bin/cat ";
+ char filename[1024];
+ printf("Filename:");
+ scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+ sanitizeFileName(filename);// filename is safe after this point
+ if (!filename[0])
+ return -1;
+ strcat(cmd, filename);
+ system(cmd); // Superfluous Warning: Untrusted data is passed to a system call
+ }
+
+Unfortunately, the checker cannot discover automatically that the programmer
+have performed data sanitation, so it still emits the warning.
+
+One can get rid of this superfluous warning by telling by specifying the
+sanitation functions in the taint configuration file (see
+:doc:`user-docs/TaintAnalysisConfiguration`).
+
+.. code-block:: YAML
+
+ Filters:
+ - Name: sanitizeFileName
+ Args: [0]
+
+The clang invocation to pass the configuration file location:
+
+.. code-block:: bash
+
+ clang --analyze -Xclang -analyzer-config -Xclang optin.security.taint.TaintPropagation:Config=`pwd`/taint_config.yml ...
+
+If you are validating your inputs instead of sanitizing them, or don't want to
+mention each sanitizing function in our configuration,
+you can use a more generic approach.
+
+Introduce a generic no-op `csa_mark_sanitized(..)` function to
+tell the Clang Static Analyzer
+that the variable is safe to be used on that analysis path.
+
+.. code-block:: c
+
+ // Marking sanitized variables safe.
+ // No vulnerability anymore, no warning.
+
+ // User csa_mark_sanitize function is for the analyzer only
+ #ifdef __clang_analyzer__
+ void csa_mark_sanitized(const void *);
+ #endif
+
+ int main(int argc, char** argv) {
+ char cmd[2048] = "/bin/cat ";
+ char filename[1024];
+ printf("Filename:");
+ scanf (" %1023[^\n]", filename);
+ if (access(filename,F_OK)){// Verifying user input
+ printf("File does not exist\n");
+ return -1;
+ }
+ #ifdef __clang_analyzer__
+ csa_mark_sanitized(filename); // Indicating to CSA that filename variable is safe to be used after this point
+ #endif
+ strcat(cmd, filename);
+ system(cmd); // No warning
+ }
+
+Similarly to the previous example, you need to
+define a `Filter` function in a `YAML` configuration file
+and add the `csa_mark_sanitized` function.
+
+.. code-block:: YAML
+
+ Filters:
+ - Name: csa_mark_sanitized
+ Args: [0]
+
+Then calling `csa_mark_sanitized(X)` will tell the analyzer that `X` is safe to
+be used after this point, because its contents are verified. It is the
+responsibility of the programmer to ensure that this verification was indeed
+correct. Please note that `csa_mark_sanitized` function is only declared and
+used during Clang Static Analysis and skipped in (production) builds.
+
+Further examples of injection vulnerabilities this checker can find.
+
+.. code-block:: c
+
+ void test() {
+ char x = getchar(); // 'x' marked as tainted
+ system(&x); // warn: untrusted data is passed to a system call
+ }
+
+ // note: compiler internally checks if the second param to
+ // sprintf is a string literal or not.
+ // Use -Wno-format-security to suppress compiler warning.
+ void test() {
+ char s[10], buf[10];
+ fscanf(stdin, "%s", s); // 's' marked as tainted
+
+ sprintf(buf, s); // warn: untrusted data used as a format string
+ }
+
+ void test() {
+ size_t ts;
+ scanf("%zd", &ts); // 'ts' marked as tainted
+ int *p = (int *)malloc(ts * sizeof(int));
+ // warn: untrusted data used as buffer size
+ }
+
+There are built-in sources, propagations and sinks even if no external taint
+configuration is provided.
+
+Default sources:
+ ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``,
+ ``getch``, ``getchar``, ``getchar_unlocked``, ``getwd``, ``getcwd``,
+ ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``,
+ ``gets``, ``gets_s``, ``getseuserbyname``, ``readlink``, ``readlinkat``,
+ ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
+
+Default propagations rules:
+ ``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``,
+ ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``,
+ ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``,
+ ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``,
+ ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``,
+ ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``,
+ ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
+ ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
+ ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
+ ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
+ ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
+
+Default sinks:
+ ``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``,
+ ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``,
+ ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``,
+ ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
+
+Please note that there are no built-in filter functions.
+
+One can configure their own taint sources, sinks, and propagation rules by
+providing a configuration file via checker option
+``optin.security.taint.TaintPropagation:Config``. The configuration file is in
+`YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The
+taint-related options defined in the config file extend but do not override the
+built-in sources, rules, sinks. The format of the external taint configuration
+file is not stable, and could change without any notice even in a non-backward
+compatible way.
+
+For a more detailed description of configuration options, please see the
+:doc:`user-docs/TaintAnalysisConfiguration`. For an example see
+:ref:`clangsa-taint-configuration-example`.
+
+**Configuration**
+
+* `Config` Specifies the name of the YAML configuration file. The user can
+ define their own taint sources and sinks.
+
+**Related Guidelines**
+
+* `CWE Data Neutralization Issues
+ <https://cwe.mitre.org/data/definitions/137.html>`_
+* `SEI Cert STR02-C. Sanitize data passed to complex subsystems
+ <https://wiki.sei.cmu.edu/confluence/display/c/STR02-C.+Sanitize+data+passed+to+complex+subsystems>`_
+* `SEI Cert ENV33-C. Do not call system()
+ <https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152177>`_
+* `ENV03-C. Sanitize the environment when invoking external programs
+ <https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanitize+the+environment+when+invoking+external+programs>`_
+
+**Limitations**
+
+* The taintedness property is not propagated through function calls which are
+ unknown (or too complex) to the analyzer, unless there is a specific
+ propagation rule built-in to the checker or given in the YAML configuration
+ file. This causes potential true positive findings to be lost.
+
+Optional checkers for portability, performance, security
+or coding style specific rules.
.. _optin-cplusplus-UninitializedObject:
@@ -2217,7 +2463,7 @@ Warn about buffer overflows (newer checker).
buf[0][-1] = 1; // warn
}
- // note: requires alpha.security.taint check turned on.
+ // note: requires optin.security.taint check turned on.
void test() {
char s[] = "abc";
int x = getchar();
@@ -2406,248 +2652,6 @@ pointer. These functions include: getenv, localeconv, asctime, setlocale, strerr
// dereferencing invalid pointer
}
-alpha.security.taint
-^^^^^^^^^^^^^^^^^^^^
-
-Checkers implementing
-`taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
-
-.. _alpha-security-taint-TaintPropagation:
-
-alpha.security.taint.TaintPropagation (C, C++)
-""""""""""""""""""""""""""""""""""""""""""""""
-
-Taint analysis identifies potential security vulnerabilities where the
-attacker can inject malicious data to the program to execute an attack
-(privilege escalation, command injection, SQL injection etc.).
-
-The malicious data is injected at the taint source (e.g. ``getenv()`` call)
-which is then propagated through function calls and being used as arguments of
-sensitive operations, also called as taint sinks (e.g. ``system()`` call).
-
-One can defend against this type of vulnerability by always checking and
-sanitizing the potentially malicious, untrusted user input.
-
-The goal of the checker is to discover and show to the user these potential
-taint source-sink pairs and the propagation call chain.
-
-The most notable examples of taint sources are:
-
- - data from network
- - files or standard input
- - environment variables
- - data from databases
-
-Let us examine a practical example of a Command Injection attack.
-
-.. code-block:: c
-
- // Command Injection Vulnerability Example
- int main(int argc, char** argv) {
- char cmd[2048] = "/bin/cat ";
- char filename[1024];
- printf("Filename:");
- scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
- strcat(cmd, filename);
- system(cmd); // Warning: Untrusted data is passed to a system call
- }
-
-The program prints the content of any user specified file.
-Unfortunately the attacker can execute arbitrary commands
-with shell escapes. For example with the following input the `ls` command is also
-executed after the contents of `/etc/shadow` is printed.
-`Input: /etc/shadow ; ls /`
-
-The analysis implemented in this checker points out this problem.
-
-One can protect against such attack by for example checking if the provided
-input refers to a valid file and removing any invalid user input.
-
-.. code-block:: c
-
- // No vulnerability anymore, but we still get the warning
- void sanitizeFileName(char* filename){
- if (access(filename,F_OK)){// Verifying user input
- printf("File does not exist\n");
- filename[0]='\0';
- }
- }
- int main(int argc, char** argv) {
- char cmd[2048] = "/bin/cat ";
- char filename[1024];
- printf("Filename:");
- scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
- sanitizeFileName(filename);// filename is safe after this point
- if (!filename[0])
- return -1;
- strcat(cmd, filename);
- system(cmd); // Superfluous Warning: Untrusted data is passed to a system call
- }
-
-Unfortunately, the checker cannot discover automatically that the programmer
-have performed data sanitation, so it still emits the warning.
-
-One can get rid of this superfluous warning by telling by specifying the
-sanitation functions in the taint configuration file (see
-:doc:`user-docs/TaintAnalysisConfiguration`).
-
-.. code-block:: YAML
-
- Filters:
- - Name: sanitizeFileName
- Args: [0]
-
-The clang invocation to pass the configuration file location:
-
-.. code-block:: bash
-
- clang --analyze -Xclang -analyzer-config -Xclang alpha.security.taint.TaintPropagation:Config=`pwd`/taint_config.yml ...
-
-If you are validating your inputs instead of sanitizing them, or don't want to
-mention each sanitizing function in our configuration,
-you can use a more generic approach.
-
-Introduce a generic no-op `csa_mark_sanitized(..)` function to
-tell the Clang Static Analyzer
-that the variable is safe to be used on that analysis path.
-
-.. code-block:: c
-
- // Marking sanitized variables safe.
- // No vulnerability anymore, no warning.
-
- // User csa_mark_sanitize function is for the analyzer only
- #ifdef __clang_analyzer__
- void csa_mark_sanitized(const void *);
- #endif
-
- int main(int argc, char** argv) {
- char cmd[2048] = "/bin/cat ";
- char filename[1024];
- printf("Filename:");
- scanf (" %1023[^\n]", filename);
- if (access(filename,F_OK)){// Verifying user input
- printf("File does not exist\n");
- return -1;
- }
- #ifdef __clang_analyzer__
- csa_mark_sanitized(filename); // Indicating to CSA that filename variable is safe to be used after this point
- #endif
- strcat(cmd, filename);
- system(cmd); // No warning
- }
-
-Similarly to the previous example, you need to
-define a `Filter` function in a `YAML` configuration file
-and add the `csa_mark_sanitized` function.
-
-.. code-block:: YAML
-
- Filters:
- - Name: csa_mark_sanitized
- Args: [0]
-
-Then calling `csa_mark_sanitized(X)` will tell the analyzer that `X` is safe to
-be used after this point, because its contents are verified. It is the
-responsibility of the programmer to ensure that this verification was indeed
-correct. Please note that `csa_mark_sanitized` function is only declared and
-used during Clang Static Analysis and skipped in (production) builds.
-
-Further examples of injection vulnerabilities this checker can find.
-
-.. code-block:: c
-
- void test() {
- char x = getchar(); // 'x' marked as tainted
- system(&x); // warn: untrusted data is passed to a system call
- }
-
- // note: compiler internally checks if the second param to
- // sprintf is a string literal or not.
- // Use -Wno-format-security to suppress compiler warning.
- void test() {
- char s[10], buf[10];
- fscanf(stdin, "%s", s); // 's' marked as tainted
-
- sprintf(buf, s); // warn: untrusted data used as a format string
- }
-
- void test() {
- size_t ts;
- scanf("%zd", &ts); // 'ts' marked as tainted
- int *p = (int *)malloc(ts * sizeof(int));
- // warn: untrusted data used as buffer size
- }
-
-There are built-in sources, propagations and sinks even if no external taint
-configuration is provided.
-
-Default sources:
- ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``,
- ``getch``, ``getchar``, ``getchar_unlocked``, ``getwd``, ``getcwd``,
- ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``,
- ``gets``, ``gets_s``, ``getseuserbyname``, ``readlink``, ``readlinkat``,
- ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
-
-Default propagations rules:
- ``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``,
- ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``,
- ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``,
- ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``,
- ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``,
- ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``,
- ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
- ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
- ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
- ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
- ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
-
-Default sinks:
- ``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``,
- ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``,
- ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``,
- ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
-
-Please note that there are no built-in filter functions.
-
-One can configure their own taint sources, sinks, and propagation rules by
-providing a configuration file via checker option
-``alpha.security.taint.TaintPropagation:Config``. The configuration file is in
-`YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The
-taint-related options defined in the config file extend but do not override the
-built-in sources, rules, sinks. The format of the external taint configuration
-file is not stable, and could change without any notice even in a non-backward
-compatible way.
-
-For a more detailed description of configuration options, please see the
-:doc:`user-docs/TaintAnalysisConfiguration`. For an example see
-:ref:`clangsa-taint-configuration-example`.
-
-**Configuration**
-
-* `Config` Specifies the name of the YAML configuration file. The user can
- define their own taint sources and sinks.
-
-**Related Guidelines**
-
-* `CWE Data Neutralization Issues
- <https://cwe.mitre.org/data/definitions/137.html>`_
-* `SEI Cert STR02-C. Sanitize data passed to complex subsystems
- <https://wiki.sei.cmu.edu/confluence/display/c/STR02-C.+Sanitize+data+passed+to+complex+subsystems>`_
-* `SEI Cert ENV33-C. Do not call system()
- <https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152177>`_
-* `ENV03-C. Sanitize the environment when invoking external programs
- <https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanitize+the+environment+when+invoking+external+programs>`_
-
-**Limitations**
-
-* The taintedness property is not propagated through function calls which are
- unknown (or too complex) to the analyzer, unless there is a specific
- propagation rule built-in to the checker or given in the YAML configuration
- file. This causes potential true positive findings to be lost.
-
alpha.unix
^^^^^^^^^^^
diff --git a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
index 94db84494e00b22..04fc54bc8181b7d 100644
--- a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
+++ b/clang/docs/...
[truncated]
|
#61826 has my data (which I unfortunately couldn't publish as-is, but the order of magnitude is, around 300 reports). My main problem with the tainted-number checkers is that they don't consider at constraints at all. Eg., this is clearly a false positive:
This applies to this checker and the VLA checker that gets turned on when you enable the taint checker (it's already enabled but it never fires because there isn't any taint information in the state).
The report may be technically correct but I think the entire idea of the checker never made sense in the first place. It doesn't matter that untrusted data is used to specify buffer size once; it matters that data with different levels of trust, or coming from different sources, is used to specify size of the same buffer on different occasions. It's fine to allocate a buffer of tainted size and then copy exactly that many bytes into it. It's not fine to allocate a buffer of fixed size and copy tainted amount of data into it. It's not fine to allocate a buffer of tainted size and copy a fixed amount of data into it. So I think this checker has to stay in alpha until it can be reworked to make sense. |
(Yeah we still have those false positives, though |
I agree that it's a blocking issue that some numerical arguments act as taint sinks unconditionally (even if the analyzer knows about constraints that show that the actual value is valid). I'd suggest that those concrete issues should be addressed by separate improvement commits, and we should reconsider this "move out of alpha" commit afterwards. As I briefly scrolled through the list of sink functions, it seems that there are two affected function families: the IMO in the
I think these checks should be implemented in MallocChecker, where we already have lots of logic related to the size of allocated memory. The char buf[100], srcBuf[100] = {0};
size_t size = tainted();
if (size > 100)
return;
memcpy(buf, srcBuf, size); while creating a bug report for the variant where the early return condition is @haoNoQ I don't really understand your remark that
What does "size of the same buffer on different occasions" mean here? I'd guess that it refers to something like void *get_mem(bool use_default_size) {
size_t size = 1024;
if (!use_default_size)
size = get_tainted_number_from_config_file();
// do something important that we want to do before each allocation...
return malloc(size);
} but I'd argue that this is actually valid (although a bit unusual) code. Personally I feel that the core idea of this checker is that it marks the return value of some functions as tainted to let security-conscious users may wish to enable pessimistic handling of that data. This is a direct and unavoidable "replace FNs with FPs tradeoff" and I think the users should have this option even if it produces a significant amout of FPs. However you're completely right that even in "pessimistic handling" the analyzer should at least consult the current state before reporting that "this number is tainted, the (Also note that this the simplistic "tainted data + sink = bug report" logic is appropriate for functions like |
No-no, I mean whatever I said in the next sentence, like literally the same buffer, not the same allocation site, not the same variable, but literally the same allocation, except it has multiple size-aware operations performed on it, like this: size_t N = tainted();
size_t M = 10;
// Should warn. Out-of-bounds access if N is too large.
char *buf1 = malloc(M);
memset(buf1, 0, N);
// Should warn. Out-of-bounds access if N is too small.
char *buf2 = malloc(N);
memset(buf2, 0, M);
// Perfectly valid code, even if both operations are performed with tainted, unconstrained size.
char *buf3 = malloc(N);
memset(buf3, 0, N); |
Thanks for the clarification! I agree with your examples and the behavior shown in them can be achieved via the solutions that I suggested (the separate patches for the One minor difference is that I think that it'd be nice to have a checker that reports code like struct rec *read_user_data(FILE *untrusted_file) {
int num_rec;
fscanf(untrusted_file, "%d", &num_rec);
struct rec *res = (struct res *)malloc(num_rec*sizeof(struct rec));
// read each record from the file
return res;
} because a malicious user could use this to exhaust the available memory effectively (i.e. without spending lots of his own resources). However, I agree that this is not the "same type of error" as the "real" sinks like |
@haoNoQ thanks for pointing out #61826 umbrella issue, I somehow missed that. I see this TaintPropagation checker as a generic flexible tool to find potential vulnerable data flows between any taint source and taint sink. The user should be configure sources and sinks in the yaml config file. The default settings will not always be correct. (Whether a file is a tainted source or not, depends a lot on the actual application domain and maybe the deployment.) The user can explicitly mark a value as sanitized in these cases:
See the example for details in the doc: Of course it is annoying to add such instructions for tainted integer values, for which the analyzer can perform bounds checking automatically and refute some trivial false positives cases which you also pointed out. I saw many false positives for malloc(..) calls when tainted buffer was attempted to be copied to another buffer and fixed that in this PR (by removing taint propagation for strlen):
At the end, I agree with you that it is better to remove all the unconditional integer taint sinks from the TaintPropagation checker and handle it in the MallocChecker, CStringChecker to take integer constraints into consideration.
The VLA case you pointed out I fixed in: In case of strings , it is very difficult (in general impossible?) to automatically recognize the programmer's attempt to santize the data. I would add the Does this approach sound acceptable to you? |
…lpha alpha.security.taint.TaintPropagation modeling checker is renamed to optin.taint.TaintPropagation. alpha.security.taint.GenericTaint user facing checker is renamed to optin.taint.genericTaint These checkers were stabilized and improved by recent commits, thus it's ready for (optional) production use. The checker is placed in the optin package as it implements an optional security analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to look at the mentioned 3 TPs, but the links appear to be broken.
Contentwise, I assume its a pure cut-and-paste modulo replacement of alpha
to optin
.
I have no objection from moving this checker out of alpha.
@steakhal now the commit is rebased and the results in the description are also refreshed (not broken). All the earlier problematic reports related to tainted integers (memset, malloc, memcpy ...) are not present now as these were removed from this checker as generic sinks by earlier commits. Your suggested doc fixes also added. |
I'm good now with the change, but I want both @Xazax-hun and @haoNoQ to accept this PR before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
When you're merging the commit, please remove the second half of the PR description ("Reports from ..." and everything after it). For the review it's good that the links to the results are highlighted at that prominent location, but we shouldn't permanently burn this information into the git commit log.
I have no concerns with moving forward here, my understanding is that the blockers have been resolved. Moreover, we are early in the development cycle for the next release so we still have a lot of time to get more experience with this check once it is move out of alpha. But I'd also wait for Artem to greenlight this. |
@haoNoQ gentle ping. Could you please check if this would be good to be merged now? thanks. |
This commit moves the **alpha.security.taint.TaintPropagation** and **alpha.security.taint.GenericTaint** checkers to the **optin.taint** optional package. These checkers were stabilized and improved by recent commits thus they are ready for production use.
This commit moves the alpha.security.taint.TaintPropagation and alpha.security.taint.GenericTaint checkers to the optin.taint optional package.
These checkers were stabilized and improved by recent commits thus it's ready for production use.
The optin.taint.GenericTaint user facing checker is placed in the optin package as it implements an optional security analysis.
Reports by the checker if switched on :
There are 2 new true positive reports in postgresql.
If the environment variables regarded untrusted, an attacker may be able to execute arbitrary command through popen.
The new report in VIM is unrelated to the taint analysis and likely appaered as the symbolic execution took different paths.