Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Moving TaintPropagation checker out of alpha #67352

Merged
merged 2 commits into from
Sep 26, 2024

Conversation

dkrupp
Copy link
Contributor

@dkrupp dkrupp commented Sep 25, 2023

This commit moves the alpha.security.taint.TaintPropagation and alpha.security.taint.GenericTaint checkers to the optin.taint optional package.

These checkers were stabilized and improved by recent commits thus it's ready for production use.

The optin.taint.GenericTaint user facing checker is placed in the optin package as it implements an optional security analysis.

Reports by the checker if switched on :

Project New Reports Resolved Reports
memcached No reports No reports
tmux No reports No reports
twin No reports No reports
vim 1 new reports No reports
openssl No reports No reports
sqlite No reports No reports
libwebm No reports No reports
xerces No reports No reports
postgres 2 new reports No reports

There are 2 new true positive reports in postgresql.
If the environment variables regarded untrusted, an attacker may be able to execute arbitrary command through popen.

The new report in VIM is unrelated to the taint analysis and likely appaered as the symbolic execution took different paths.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Sep 25, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2023

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Changes

This commit renames alpha.security.taint.TaintPropagation checker to optin.security.taint.TaintPropagation.

This checker was stabilized and improved by recent commits thus it's ready for production use.

The checker is placed in the optin package as it implements an optional security analysis.

Reports by the checker:

All reports are true positive in the sense that they correctly indicate that tainted data gets to the trusted source and the data path is correctly shown.

2 reports are indicating particularly interesting potential vulnerabilities.

The report Untrusted data is used to specify the buffer size
is likely not a vulnerability, as an environment string is just copied to another buffer.


Patch is 42.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67352.diff

25 Files Affected:

  • (modified) clang/docs/analyzer/checkers.rst (+248-244)
  • (modified) clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst (+2-2)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+4-3)
  • (modified) clang/test/Analysis/analyzer-config.c (+1-1)
  • (modified) clang/test/Analysis/assume-controlled-environment.c (+2-2)
  • (modified) clang/test/Analysis/bool-assignment.c (+2-2)
  • (modified) clang/test/Analysis/cxx-method-names.cpp (+1-1)
  • (modified) clang/test/Analysis/debug-exprinspection-istainted.c (+1-1)
  • (modified) clang/test/Analysis/diagnostics/sarif-diagnostics-taint-test.c (+1-1)
  • (modified) clang/test/Analysis/diagnostics/sarif-multi-diagnostic-test.c (+1-1)
  • (modified) clang/test/Analysis/global-region-invalidation-errno.c (+2-2)
  • (modified) clang/test/Analysis/global-region-invalidation.c (+1-1)
  • (modified) clang/test/Analysis/redefined_system.c (+1-1)
  • (modified) clang/test/Analysis/string.c (+1-1)
  • (modified) clang/test/Analysis/taint-checker-callback-order-has-definition.c (+1-1)
  • (modified) clang/test/Analysis/taint-checker-callback-order-without-definition.c (+1-1)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+1-1)
  • (modified) clang/test/Analysis/taint-dumps.c (+1-1)
  • (modified) clang/test/Analysis/taint-generic.c (+13-13)
  • (modified) clang/test/Analysis/taint-generic.cpp (+1-1)
  • (modified) clang/test/Analysis/taint-tester.c (+1-1)
  • (modified) clang/test/Analysis/taint-tester.cpp (+1-1)
  • (modified) clang/test/Analysis/taint-tester.m (+3-3)
  • (modified) clang/utils/analyzer/SATestBuild.py (+1-1)
  • (modified) clang/www/analyzer/alpha_checks.html (+3-3)
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index dbd6d7787823530..e68b10dc2c1c6c5 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -533,7 +533,253 @@ Warns when a nullable pointer is returned from a function that has _Nonnull retu
 optin
 ^^^^^
 
-Checkers for portability, performance or coding style specific rules.
+Optional checkers for portability, performance, security or coding style checkers 
+which may not be applicable to all projects.
+
+optin.security.taint
+^^^^^^^^^^^^^^^^^^^^
+
+Checkers implementing
+`taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
+
+.. optin-security-taint-TaintPropagation:
+
+optin.security.taint.TaintPropagation (C, C++)
+""""""""""""""""""""""""""""""""""""""""""""""
+
+Taint analysis identifies potential security vulnerabilities where the
+attacker can inject malicious data to the program to execute an attack
+(privilege escalation, command injection, SQL injection etc.).
+
+The malicious data is injected at the taint source (e.g. ``getenv()`` call)
+which is then propagated through function calls and being used as arguments of
+sensitive operations, also called as taint sinks (e.g. ``system()`` call).
+
+One can defend against this type of vulnerability by always checking and
+sanitizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover and show to the user these potential
+taint source-sink pairs and the propagation call chain.
+
+The most notable examples of taint sources are:
+
+  - data from network
+  - files or standard input
+  - environment variables
+  - data from databases
+
+Let us examine a practical example of a Command Injection attack.
+
+.. code-block:: c
+
+  // Command Injection Vulnerability Example
+  int main(int argc, char** argv) {
+    char cmd[2048] = "/bin/cat ";
+    char filename[1024];
+    printf("Filename:");
+    scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+    strcat(cmd, filename);
+    system(cmd); // Warning: Untrusted data is passed to a system call
+  }
+
+The program prints the content of any user specified file.
+Unfortunately the attacker can execute arbitrary commands
+with shell escapes. For example with the following input the `ls` command is also
+executed after the contents of `/etc/shadow` is printed.
+`Input: /etc/shadow ; ls /`
+
+The analysis implemented in this checker points out this problem.
+
+One can protect against such attack by for example checking if the provided
+input refers to a valid file and removing any invalid user input.
+
+.. code-block:: c
+
+  // No vulnerability anymore, but we still get the warning
+  void sanitizeFileName(char* filename){
+    if (access(filename,F_OK)){// Verifying user input
+      printf("File does not exist\n");
+      filename[0]='\0';
+      }
+  }
+  int main(int argc, char** argv) {
+    char cmd[2048] = "/bin/cat ";
+    char filename[1024];
+    printf("Filename:");
+    scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+    sanitizeFileName(filename);// filename is safe after this point
+    if (!filename[0])
+      return -1;
+    strcat(cmd, filename);
+    system(cmd); // Superfluous Warning: Untrusted data is passed to a system call
+  }
+
+Unfortunately, the checker cannot discover automatically that the programmer
+have performed data sanitation, so it still emits the warning.
+
+One can get rid of this superfluous warning by telling by specifying the
+sanitation functions in the taint configuration file (see
+:doc:`user-docs/TaintAnalysisConfiguration`).
+
+.. code-block:: YAML
+
+  Filters:
+  - Name: sanitizeFileName
+    Args: [0]
+
+The clang invocation to pass the configuration file location:
+
+.. code-block:: bash
+
+  clang  --analyze -Xclang -analyzer-config  -Xclang optin.security.taint.TaintPropagation:Config=`pwd`/taint_config.yml ...
+
+If you are validating your inputs instead of sanitizing them, or don't want to
+mention each sanitizing function in our configuration,
+you can use a more generic approach.
+
+Introduce a generic no-op `csa_mark_sanitized(..)` function to
+tell the Clang Static Analyzer
+that the variable is safe to be used on that analysis path.
+
+.. code-block:: c
+
+  // Marking sanitized variables safe.
+  // No vulnerability anymore, no warning.
+
+  // User csa_mark_sanitize function is for the analyzer only
+  #ifdef __clang_analyzer__
+    void csa_mark_sanitized(const void *);
+  #endif
+
+  int main(int argc, char** argv) {
+    char cmd[2048] = "/bin/cat ";
+    char filename[1024];
+    printf("Filename:");
+    scanf (" %1023[^\n]", filename);
+    if (access(filename,F_OK)){// Verifying user input
+      printf("File does not exist\n");
+      return -1;
+    }
+    #ifdef __clang_analyzer__
+      csa_mark_sanitized(filename); // Indicating to CSA that filename variable is safe to be used after this point
+    #endif
+    strcat(cmd, filename);
+    system(cmd); // No warning
+  }
+
+Similarly to the previous example, you need to
+define a `Filter` function in a `YAML` configuration file
+and add the `csa_mark_sanitized` function.
+
+.. code-block:: YAML
+
+  Filters:
+  - Name: csa_mark_sanitized
+    Args: [0]
+
+Then calling `csa_mark_sanitized(X)` will tell the analyzer that `X` is safe to
+be used after this point, because its contents are verified. It is the
+responsibility of the programmer to ensure that this verification was indeed
+correct. Please note that `csa_mark_sanitized` function is only declared and
+used during Clang Static Analysis and skipped in (production) builds.
+
+Further examples of injection vulnerabilities this checker can find.
+
+.. code-block:: c
+
+  void test() {
+    char x = getchar(); // 'x' marked as tainted
+    system(&x); // warn: untrusted data is passed to a system call
+  }
+
+  // note: compiler internally checks if the second param to
+  // sprintf is a string literal or not.
+  // Use -Wno-format-security to suppress compiler warning.
+  void test() {
+    char s[10], buf[10];
+    fscanf(stdin, "%s", s); // 's' marked as tainted
+
+    sprintf(buf, s); // warn: untrusted data used as a format string
+  }
+
+  void test() {
+    size_t ts;
+    scanf("%zd", &ts); // 'ts' marked as tainted
+    int *p = (int *)malloc(ts * sizeof(int));
+      // warn: untrusted data used as buffer size
+  }
+
+There are built-in sources, propagations and sinks even if no external taint
+configuration is provided.
+
+Default sources:
+ ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``,
+ ``getch``, ``getchar``, ``getchar_unlocked``, ``getwd``, ``getcwd``,
+ ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``,
+ ``gets``, ``gets_s``, ``getseuserbyname``, ``readlink``, ``readlinkat``,
+ ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
+
+Default propagations rules:
+ ``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``,
+ ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``,
+ ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``,
+ ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``,
+ ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``,
+ ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``,
+ ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
+ ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
+ ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
+ ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
+ ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
+
+Default sinks:
+ ``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``,
+ ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``,
+ ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``,
+ ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
+
+Please note that there are no built-in filter functions.
+
+One can configure their own taint sources, sinks, and propagation rules by
+providing a configuration file via checker option
+``optin.security.taint.TaintPropagation:Config``. The configuration file is in
+`YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The
+taint-related options defined in the config file extend but do not override the
+built-in sources, rules, sinks. The format of the external taint configuration
+file is not stable, and could change without any notice even in a non-backward
+compatible way.
+
+For a more detailed description of configuration options, please see the
+:doc:`user-docs/TaintAnalysisConfiguration`. For an example see
+:ref:`clangsa-taint-configuration-example`.
+
+**Configuration**
+
+* `Config`  Specifies the name of the YAML configuration file. The user can
+  define their own taint sources and sinks.
+
+**Related Guidelines**
+
+* `CWE Data Neutralization Issues
+  <https://cwe.mitre.org/data/definitions/137.html>`_
+* `SEI Cert STR02-C. Sanitize data passed to complex subsystems
+  <https://wiki.sei.cmu.edu/confluence/display/c/STR02-C.+Sanitize+data+passed+to+complex+subsystems>`_
+* `SEI Cert ENV33-C. Do not call system()
+  <https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152177>`_
+* `ENV03-C. Sanitize the environment when invoking external programs
+  <https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanitize+the+environment+when+invoking+external+programs>`_
+
+**Limitations**
+
+* The taintedness property is not propagated through function calls which are
+  unknown (or too complex) to the analyzer, unless there is a specific
+  propagation rule built-in to the checker or given in the YAML configuration
+  file. This causes potential true positive findings to be lost.
+
+Optional checkers for portability, performance, security
+or coding style specific rules.
 
 .. _optin-cplusplus-UninitializedObject:
 
@@ -2217,7 +2463,7 @@ Warn about buffer overflows (newer checker).
    buf[0][-1] = 1; // warn
  }
 
- // note: requires alpha.security.taint check turned on.
+ // note: requires optin.security.taint check turned on.
  void test() {
    char s[] = "abc";
    int x = getchar();
@@ -2406,248 +2652,6 @@ pointer. These functions include: getenv, localeconv, asctime, setlocale, strerr
     // dereferencing invalid pointer
   }
 
-alpha.security.taint
-^^^^^^^^^^^^^^^^^^^^
-
-Checkers implementing
-`taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
-
-.. _alpha-security-taint-TaintPropagation:
-
-alpha.security.taint.TaintPropagation (C, C++)
-""""""""""""""""""""""""""""""""""""""""""""""
-
-Taint analysis identifies potential security vulnerabilities where the
-attacker can inject malicious data to the program to execute an attack
-(privilege escalation, command injection, SQL injection etc.).
-
-The malicious data is injected at the taint source (e.g. ``getenv()`` call)
-which is then propagated through function calls and being used as arguments of
-sensitive operations, also called as taint sinks (e.g. ``system()`` call).
-
-One can defend against this type of vulnerability by always checking and
-sanitizing the potentially malicious, untrusted user input.
-
-The goal of the checker is to discover and show to the user these potential
-taint source-sink pairs and the propagation call chain.
-
-The most notable examples of taint sources are:
-
-  - data from network
-  - files or standard input
-  - environment variables
-  - data from databases
-
-Let us examine a practical example of a Command Injection attack.
-
-.. code-block:: c
-
-  // Command Injection Vulnerability Example
-  int main(int argc, char** argv) {
-    char cmd[2048] = "/bin/cat ";
-    char filename[1024];
-    printf("Filename:");
-    scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
-    strcat(cmd, filename);
-    system(cmd); // Warning: Untrusted data is passed to a system call
-  }
-
-The program prints the content of any user specified file.
-Unfortunately the attacker can execute arbitrary commands
-with shell escapes. For example with the following input the `ls` command is also
-executed after the contents of `/etc/shadow` is printed.
-`Input: /etc/shadow ; ls /`
-
-The analysis implemented in this checker points out this problem.
-
-One can protect against such attack by for example checking if the provided
-input refers to a valid file and removing any invalid user input.
-
-.. code-block:: c
-
-  // No vulnerability anymore, but we still get the warning
-  void sanitizeFileName(char* filename){
-    if (access(filename,F_OK)){// Verifying user input
-      printf("File does not exist\n");
-      filename[0]='\0';
-      }
-  }
-  int main(int argc, char** argv) {
-    char cmd[2048] = "/bin/cat ";
-    char filename[1024];
-    printf("Filename:");
-    scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
-    sanitizeFileName(filename);// filename is safe after this point
-    if (!filename[0])
-      return -1;
-    strcat(cmd, filename);
-    system(cmd); // Superfluous Warning: Untrusted data is passed to a system call
-  }
-
-Unfortunately, the checker cannot discover automatically that the programmer
-have performed data sanitation, so it still emits the warning.
-
-One can get rid of this superfluous warning by telling by specifying the
-sanitation functions in the taint configuration file (see
-:doc:`user-docs/TaintAnalysisConfiguration`).
-
-.. code-block:: YAML
-
-  Filters:
-  - Name: sanitizeFileName
-    Args: [0]
-
-The clang invocation to pass the configuration file location:
-
-.. code-block:: bash
-
-  clang  --analyze -Xclang -analyzer-config  -Xclang alpha.security.taint.TaintPropagation:Config=`pwd`/taint_config.yml ...
-
-If you are validating your inputs instead of sanitizing them, or don't want to
-mention each sanitizing function in our configuration,
-you can use a more generic approach.
-
-Introduce a generic no-op `csa_mark_sanitized(..)` function to
-tell the Clang Static Analyzer
-that the variable is safe to be used on that analysis path.
-
-.. code-block:: c
-
-  // Marking sanitized variables safe.
-  // No vulnerability anymore, no warning.
-
-  // User csa_mark_sanitize function is for the analyzer only
-  #ifdef __clang_analyzer__
-    void csa_mark_sanitized(const void *);
-  #endif
-
-  int main(int argc, char** argv) {
-    char cmd[2048] = "/bin/cat ";
-    char filename[1024];
-    printf("Filename:");
-    scanf (" %1023[^\n]", filename);
-    if (access(filename,F_OK)){// Verifying user input
-      printf("File does not exist\n");
-      return -1;
-    }
-    #ifdef __clang_analyzer__
-      csa_mark_sanitized(filename); // Indicating to CSA that filename variable is safe to be used after this point
-    #endif
-    strcat(cmd, filename);
-    system(cmd); // No warning
-  }
-
-Similarly to the previous example, you need to
-define a `Filter` function in a `YAML` configuration file
-and add the `csa_mark_sanitized` function.
-
-.. code-block:: YAML
-
-  Filters:
-  - Name: csa_mark_sanitized
-    Args: [0]
-
-Then calling `csa_mark_sanitized(X)` will tell the analyzer that `X` is safe to
-be used after this point, because its contents are verified. It is the
-responsibility of the programmer to ensure that this verification was indeed
-correct. Please note that `csa_mark_sanitized` function is only declared and
-used during Clang Static Analysis and skipped in (production) builds.
-
-Further examples of injection vulnerabilities this checker can find.
-
-.. code-block:: c
-
-  void test() {
-    char x = getchar(); // 'x' marked as tainted
-    system(&x); // warn: untrusted data is passed to a system call
-  }
-
-  // note: compiler internally checks if the second param to
-  // sprintf is a string literal or not.
-  // Use -Wno-format-security to suppress compiler warning.
-  void test() {
-    char s[10], buf[10];
-    fscanf(stdin, "%s", s); // 's' marked as tainted
-
-    sprintf(buf, s); // warn: untrusted data used as a format string
-  }
-
-  void test() {
-    size_t ts;
-    scanf("%zd", &ts); // 'ts' marked as tainted
-    int *p = (int *)malloc(ts * sizeof(int));
-      // warn: untrusted data used as buffer size
-  }
-
-There are built-in sources, propagations and sinks even if no external taint
-configuration is provided.
-
-Default sources:
- ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``,
- ``getch``, ``getchar``, ``getchar_unlocked``, ``getwd``, ``getcwd``,
- ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``,
- ``gets``, ``gets_s``, ``getseuserbyname``, ``readlink``, ``readlinkat``,
- ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
-
-Default propagations rules:
- ``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``,
- ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``,
- ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``,
- ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``,
- ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``,
- ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``,
- ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
- ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
- ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
- ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
- ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
-
-Default sinks:
- ``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``,
- ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``,
- ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``,
- ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
-
-Please note that there are no built-in filter functions.
-
-One can configure their own taint sources, sinks, and propagation rules by
-providing a configuration file via checker option
-``alpha.security.taint.TaintPropagation:Config``. The configuration file is in
-`YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The
-taint-related options defined in the config file extend but do not override the
-built-in sources, rules, sinks. The format of the external taint configuration
-file is not stable, and could change without any notice even in a non-backward
-compatible way.
-
-For a more detailed description of configuration options, please see the
-:doc:`user-docs/TaintAnalysisConfiguration`. For an example see
-:ref:`clangsa-taint-configuration-example`.
-
-**Configuration**
-
-* `Config`  Specifies the name of the YAML configuration file. The user can
-  define their own taint sources and sinks.
-
-**Related Guidelines**
-
-* `CWE Data Neutralization Issues
-  <https://cwe.mitre.org/data/definitions/137.html>`_
-* `SEI Cert STR02-C. Sanitize data passed to complex subsystems
-  <https://wiki.sei.cmu.edu/confluence/display/c/STR02-C.+Sanitize+data+passed+to+complex+subsystems>`_
-* `SEI Cert ENV33-C. Do not call system()
-  <https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152177>`_
-* `ENV03-C. Sanitize the environment when invoking external programs
-  <https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanitize+the+environment+when+invoking+external+programs>`_
-
-**Limitations**
-
-* The taintedness property is not propagated through function calls which are
-  unknown (or too complex) to the analyzer, unless there is a specific
-  propagation rule built-in to the checker or given in the YAML configuration
-  file. This causes potential true positive findings to be lost.
-
 alpha.unix
 ^^^^^^^^^^^
 
diff --git a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
index 94db84494e00b22..04fc54bc8181b7d 100644
--- a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
+++ b/clang/docs/...
[truncated]

@haoNoQ
Copy link
Collaborator

haoNoQ commented Sep 25, 2023

#61826 has my data (which I unfortunately couldn't publish as-is, but the order of magnitude is, around 300 reports).

My main problem with the tainted-number checkers is that they don't consider at constraints at all. Eg., this is clearly a false positive:

char buf[100];
size_t size = tainted();
if (size > 100)
  return;
memset(buf, 0, size); // EnTrUsTeD DAtA Is UsEd TO sPeCiFY BuFfER SiZe

This applies to this checker and the VLA checker that gets turned on when you enable the taint checker (it's already enabled but it never fires because there isn't any taint information in the state).

Untrusted data is used to specify the buffer size

The report may be technically correct but I think the entire idea of the checker never made sense in the first place. It doesn't matter that untrusted data is used to specify buffer size once; it matters that data with different levels of trust, or coming from different sources, is used to specify size of the same buffer on different occasions.

It's fine to allocate a buffer of tainted size and then copy exactly that many bytes into it. It's not fine to allocate a buffer of fixed size and copy tainted amount of data into it. It's not fine to allocate a buffer of tainted size and copy a fixed amount of data into it.

So I think this checker has to stay in alpha until it can be reworked to make sense.

@haoNoQ
Copy link
Collaborator

haoNoQ commented Sep 25, 2023

(Yeah we still have those false positives, though memset isn't actually covered yet as a potential sink: https://godbolt.org/z/6h5x87vMc)

@NagyDonat
Copy link
Contributor

I agree that it's a blocking issue that some numerical arguments act as taint sinks unconditionally (even if the analyzer knows about constraints that show that the actual value is valid). I'd suggest that those concrete issues should be addressed by separate improvement commits, and we should reconsider this "move out of alpha" commit afterwards.

As I briefly scrolled through the list of sink functions, it seems that there are two affected function families: the malloc-like functions and memcpy-like functions.

IMO in the malloc function family the ideal behavior would be:

  1. Report a tainted buffer size can be huge warning when we cannot deduce that the allocated memory area is less than a certain threshold (e.g. 1 MiB, the user should be able to configure this). I'd say that it's a bug (denial of service opportunity) if untrusted data can trigger huge memory allocations (gigabytes, terabytes, SIZE_MAX etc.).
  2. Mark the extent symbol of the allocated memory area as tainted and ensure that this triggers the the taint-aware (a/k/a paranoid) mode of e.g. ArrayBoundsV2 and similar checkers.

I think these checks should be implemented in MallocChecker, where we already have lots of logic related to the size of allocated memory.

The memcpy function family is a simpler case, there we should just eliminate the current "generic taint sink" behavior (= if the number of copied bits is tainted, always report a bug) and replace it with taint-aware optimism/pessimism in CStringChecker (where AFAIK we already have logic that checks whether the memcopied chunks fits into the target memory region). This logic would let us accept

char buf[100], srcBuf[100] = {0};
size_t size = tainted();
if (size > 100)
  return;
memcpy(buf, srcBuf, size);

while creating a bug report for the variant where the early return condition is (size > 101).

@haoNoQ I don't really understand your remark that

The report may be technically correct but I think the entire idea of the checker never made sense in the first place. It doesn't matter that untrusted data is used to specify buffer size once; it matters that data with different levels of trust, or coming from different sources, is used to specify size of the same buffer on different occasions.

What does "size of the same buffer on different occasions" mean here? I'd guess that it refers to something like

void *get_mem(bool use_default_size) {
  size_t size = 1024;
  if (!use_default_size)
    size = get_tainted_number_from_config_file();
  // do something important that we want to do before each allocation...
  return malloc(size);
}

but I'd argue that this is actually valid (although a bit unusual) code.

Personally I feel that the core idea of this checker is that it marks the return value of some functions as tainted to let security-conscious users may wish to enable pessimistic handling of that data. This is a direct and unavoidable "replace FNs with FPs tradeoff" and I think the users should have this option even if it produces a significant amout of FPs.

However you're completely right that even in "pessimistic handling" the analyzer should at least consult the current state before reporting that "this number is tainted, the memcpy / memset / etc. may cause a buffer overrun".

(Also note that this the simplistic "tainted data + sink = bug report" logic is appropriate for functions like system where the potentially tainted data is a string; because we cannot store meaningful constraints about strings.)

@haoNoQ
Copy link
Collaborator

haoNoQ commented Sep 26, 2023

@haoNoQ I don't really understand your remark that

The report may be technically correct but I think the entire idea of the checker never made sense in the first place. It doesn't matter that untrusted data is used to specify buffer size once; it matters that data with different levels of trust, or coming from different sources, is used to specify size of the same buffer on different occasions.

What does "size of the same buffer on different occasions" mean here? I'd guess that it refers to something like

void *get_mem(bool use_default_size) {
  size_t size = 1024;
  if (!use_default_size)
    size = get_tainted_number_from_config_file();
  // do something important that we want to do before each allocation...
  return malloc(size);
}

but I'd argue that this is actually valid (although a bit unusual) code.

No-no, I mean whatever I said in the next sentence, like literally the same buffer, not the same allocation site, not the same variable, but literally the same allocation, except it has multiple size-aware operations performed on it, like this:

size_t N = tainted();
size_t M = 10;

// Should warn. Out-of-bounds access if N is too large.
char *buf1 = malloc(M);
memset(buf1, 0,     N);

// Should warn. Out-of-bounds access if N is too small.
char *buf2 = malloc(N);
memset(buf2, 0,     M);

 // Perfectly valid code, even if both operations are performed with tainted, unconstrained size.
char *buf3 = malloc(N);
memset(buf3, 0,     N);

@NagyDonat
Copy link
Contributor

NagyDonat commented Sep 27, 2023

No-no, I mean whatever I said in the next sentence, like literally the same buffer, not the same allocation site, not the same variable, but literally the same allocation, except it has multiple size-aware operations performed on it, [...]

Thanks for the clarification! I agree with your examples and the behavior shown in them can be achieved via the solutions that I suggested (the separate patches for the malloc and memcpy function families) .

One minor difference is that I think that it'd be nice to have a checker that reports code like

struct rec *read_user_data(FILE *untrusted_file) {
  int num_rec;
  fscanf(untrusted_file, "%d", &num_rec);
  struct rec *res = (struct res *)malloc(num_rec*sizeof(struct rec));
  // read each record from the file
  return res;
}

because a malicious user could use this to exhaust the available memory effectively (i.e. without spending lots of his own resources). However, I agree that this is not the "same type of error" as the "real" sinks like system that appear in the taint checker, so my suggestion for this involves a distinct bug type (somewhere within MallocChecker) and a config option that can control (enable/disable) these reports separately.

@dkrupp
Copy link
Contributor Author

dkrupp commented Oct 9, 2023

@haoNoQ thanks for pointing out #61826 umbrella issue, I somehow missed that.

I see this TaintPropagation checker as a generic flexible tool to find potential vulnerable data flows between any taint source and taint sink. The user should be configure sources and sinks in the yaml config file. The default settings will not always be correct. (Whether a file is a tainted source or not, depends a lot on the actual application domain and maybe the deployment.)
This checker warns for every sink unconditionally.

The user can explicitly mark a value as sanitized in these cases:

  scanf (" %1023[^\n]", filename);
  if (access(filename,F_OK)){// Verifying user input
    printf("File does not exist\n");
    return -1;
  }
  #ifdef __clang_analyzer__
    csa_mark_sanitized(filename); // Indicating to CSA that filename variable is safe to be used after this point
  #endif
  strcat(cmd, filename);
  system(cmd); // No warning

See the example for details in the doc:
https://clang.llvm.org/docs/analyzer/checkers.html#alpha-security-taint-taintpropagation-c-c

Of course it is annoying to add such instructions for tainted integer values, for which the analyzer can perform bounds checking automatically and refute some trivial false positives cases which you also pointed out.

I saw many false positives for malloc(..) calls when tainted buffer was attempted to be copied to another buffer and fixed that in this PR (by removing taint propagation for strlen):

At the end, I agree with you that it is better to remove all the unconditional integer taint sinks from the TaintPropagation checker and handle it in the MallocChecker, CStringChecker to take integer constraints into consideration.
So this PR does that:

The VLA case you pointed out I fixed in:

In case of strings , it is very difficult (in general impossible?) to automatically recognize the programmer's attempt to santize the data.
So I think csa_mark_sanitized(string); is an acceptable solution to get rid of unwanted warnings (or using a triaging tool).

I would add the TaintPropagation checker to the optin package as an optional security analysis checker which is not enabled by default, because its usage is very application area dependent. Some of the security concious applications would want to create the config file for their sources and sinks and even mark a few FPs, if they can catch a few vulnerabilities.

Does this approach sound acceptable to you?
From which project did you see the ~300 false reports?

…lpha

alpha.security.taint.TaintPropagation
modeling checker is renamed to optin.taint.TaintPropagation.

alpha.security.taint.GenericTaint
user facing checker is renamed to optin.taint.genericTaint

These checkers were stabilized and improved by recent commits,
thus it's ready for (optional) production use.

The checker is placed in the optin package as it implements
an optional security analysis.
Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look at the mentioned 3 TPs, but the links appear to be broken.

Contentwise, I assume its a pure cut-and-paste modulo replacement of alpha to optin.
I have no objection from moving this checker out of alpha.

@steakhal steakhal removed the clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html label Aug 22, 2024
@dkrupp
Copy link
Contributor Author

dkrupp commented Aug 27, 2024

@steakhal now the commit is rebased and the results in the description are also refreshed (not broken).

All the earlier problematic reports related to tainted integers (memset, malloc, memcpy ...) are not present now as these were removed from this checker as generic sinks by earlier commits.

Your suggested doc fixes also added.

@dkrupp dkrupp requested a review from steakhal August 27, 2024 09:36
@steakhal
Copy link
Contributor

I'm good now with the change, but I want both @Xazax-hun and @haoNoQ to accept this PR before landing.

@NagyDonat NagyDonat requested a review from Xazax-hun August 28, 2024 09:41
Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

When you're merging the commit, please remove the second half of the PR description ("Reports from ..." and everything after it). For the review it's good that the links to the results are highlighted at that prominent location, but we shouldn't permanently burn this information into the git commit log.

@Xazax-hun
Copy link
Collaborator

I have no concerns with moving forward here, my understanding is that the blockers have been resolved. Moreover, we are early in the development cycle for the next release so we still have a lot of time to get more experience with this check once it is move out of alpha. But I'd also wait for Artem to greenlight this.

@dkrupp
Copy link
Contributor Author

dkrupp commented Sep 9, 2024

@haoNoQ gentle ping. Could you please check if this would be good to be merged now? thanks.

@dkrupp dkrupp merged commit f82fb06 into llvm:main Sep 26, 2024
9 checks passed
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
This commit moves the **alpha.security.taint.TaintPropagation** and
**alpha.security.taint.GenericTaint** checkers to the **optin.taint**
optional package.

These checkers were stabilized and improved by recent commits thus 
they are ready for production use.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants