Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Removing untrusted buffer size taint warning #68607

Merged
merged 10 commits into from
May 2, 2024

Conversation

dkrupp
Copy link
Contributor

@dkrupp dkrupp commented Oct 9, 2023

alpha.security.taint.TaintPropagation checker
emitted a false warning to the following code

char buf[100];
size_t size = tainted();
if (size > 100)
return;
memset(buf, 0, size); // warn: untrusted data used as buffer size

The checker does not take into consideration that the size tainted variable is bounded.

The false warning was emmitted also for the malloc/calloc calls.

These warning (the sink) should be implemented in the MallocChecker and CStringChecker checkers instead, where more sophisticated handling can be done taking into consideration buffer size and integer constraints.

alpha.security.taint.TaintPropagation checker
emitted a false warning to the following code

char buf[100];
size_t size = tainted();
if (size > 100)
  return;
memset(buf, 0, size); // warn: untrusted data used as buffer size

The checker does not take into consideration that the
size tainted variable is bounded.

The false warning was emmitted also for the malloc/calloc calls.

These warning (the sink) should be implemented in the
MallocChecker and CStringChecker checkers instead, where more sophisticated
handling can be done taking into consideration buffer size and integer constraints.
@dkrupp dkrupp requested a review from NagyDonat October 9, 2023 16:22
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Oct 9, 2023
@dkrupp dkrupp self-assigned this Oct 9, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2023

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Changes

alpha.security.taint.TaintPropagation checker
emitted a false warning to the following code

char buf[100];
size_t size = tainted();
if (size > 100)
return;
memset(buf, 0, size); // warn: untrusted data used as buffer size

The checker does not take into consideration that the size tainted variable is bounded.

The false warning was emmitted also for the malloc/calloc calls.

These warning (the sink) should be implemented in the MallocChecker and CStringChecker checkers instead, where more sophisticated handling can be done taking into consideration buffer size and integer constraints.


Full diff: https://github.com/llvm/llvm-project/pull/68607.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp (+18-31)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+34-34)
  • (modified) clang/test/Analysis/taint-generic.c (+15-11)
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 4ceaf933d0bfc84..b949cac504eddfe 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -59,13 +59,6 @@ constexpr llvm::StringLiteral MsgSanitizeSystemArgs =
     "Untrusted data is passed to a system call "
     "(CERT/STR02-C. Sanitize data passed to complex subsystems)";
 
-/// Check if tainted data is used as a buffer size in strn.. functions,
-/// and allocators.
-constexpr llvm::StringLiteral MsgTaintedBufferSize =
-    "Untrusted data is used to specify the buffer size "
-    "(CERT/STR31-C. Guarantee that storage for strings has sufficient space "
-    "for character data and the null terminator)";
-
 /// Check if tainted data is used as a custom sink's parameter.
 constexpr llvm::StringLiteral MsgCustomSink =
     "Untrusted data is passed to a user-defined sink";
@@ -733,13 +726,23 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{CDF_MaybeBuiltin, {{"stpcpy"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
       {{CDF_MaybeBuiltin, {{"strcat"}}},
-       TR::Prop({{1}}, {{0, ReturnValueIndex}})},
+       TR::Prop({{0,1}}, {{0, ReturnValueIndex}})},
       {{CDF_MaybeBuiltin, {{"wcsncat"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
       {{CDF_MaybeBuiltin, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{CDF_MaybeBuiltin, {{"strdupa"}}},
        TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{CDF_MaybeBuiltin, {{"wcsdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{CDF_MaybeBuiltin, BI.getName(Builtin::BImemcpy)},
+       TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
+      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BImemmove)}},
+       TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
+      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrncpy)}},
+       TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
+      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrndup)}},
+       TR::Prop({{0, 1}}, {{ReturnValueIndex}})},
+      {{CDF_MaybeBuiltin, {"bcopy"}},
+       TR::Prop({{0, 2}}, {{1}})},
 
       // Sinks
       {{{"system"}}, TR::Sink({{0}}, MsgSanitizeSystemArgs)},
@@ -753,32 +756,16 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{{"execvp"}}, TR::Sink({{0, 1}}, MsgSanitizeSystemArgs)},
       {{{"execvpe"}}, TR::Sink({{0, 1, 2}}, MsgSanitizeSystemArgs)},
       {{{"dlopen"}}, TR::Sink({{0}}, MsgSanitizeSystemArgs)},
-      {{CDF_MaybeBuiltin, {{"malloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"calloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"alloca"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"memccpy"}}},
-       TR::Sink({{3}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"realloc"}}},
-       TR::Sink({{1}}, MsgTaintedBufferSize)},
+       // malloc, calloc, alloca, realloc, memccpy
+       // are intentionally left out as taint sinks
+       // because unconditional reporting for these functions
+       // generate many false positives.
+       // These taint sinks should be implemented in other checkers
+       // with more sophisticated sanitation heuristics.
       {{{{"setproctitle"}}}, TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},
       {{{{"setproctitle_fast"}}},
        TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},
-
-      // SinkProps
-      {{CDF_MaybeBuiltin, BI.getName(Builtin::BImemcpy)},
-       TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
-                    MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BImemmove)}},
-       TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
-                    MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrncpy)}},
-       TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
-                    MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrndup)}},
-       TR::SinkProp({{1}}, {{0, 1}}, {{ReturnValueIndex}},
-                    MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"bcopy"}}},
-       TR::SinkProp({{2}}, {{0, 2}}, {{1}}, MsgTaintedBufferSize)}};
+       };
 
   // `getenv` returns taint only in untrusted environments.
   if (TR::UntrustedEnv(C)) {
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 8a7510177f3e444..e3370e64dab319f 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -10,7 +10,8 @@ int scanf(const char *restrict format, ...);
 int system(const char *command);
 char* getenv( const char* env_var );
 size_t strlen( const char* str );
-int atoi( const char* str );
+char *strcat( char *dest, const char *src );
+char* strcpy( char* dest, const char* src );
 void *malloc(size_t size );
 void free( void *ptr );
 char *fgets(char *str, int n, FILE *stream);
@@ -53,34 +54,32 @@ void taintDiagnosticVLA(void) {
 
 // Tests if the originated note is correctly placed even if the path is
 // propagating through variables and expressions
-char *taintDiagnosticPropagation(){
-  char *pathbuf;
-  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
-                                 // expected-note@-1 {{Taint propagated to the return value}}
-  if (size){ // expected-note {{Assuming 'size' is non-null}}
-	               // expected-note@-1 {{Taking true branch}}
-    pathbuf=(char*) malloc(atoi(size)); // expected-warning{{Untrusted data is used to specify the buffer size}}
-                                                // expected-note@-1{{Untrusted data is used to specify the buffer size}}
-                                                // expected-note@-2 {{Taint propagated to the return value}}
-    return pathbuf;
+int taintDiagnosticPropagation(){
+  int res;
+  char *cmd=getenv("CMD"); // expected-note {{Taint originated here}}
+                           // expected-note@-1 {{Taint propagated to the return value}}
+  if (cmd){ // expected-note {{Assuming 'cmd' is non-null}}
+	          // expected-note@-1 {{Taking true branch}}
+    res = system(cmd); // expected-warning{{Untrusted data is passed to a system call}}
+                       // expected-note@-1{{Untrusted data is passed to a system call}}
+    return res;
   }
-  return 0;
+  return -1;
 }
 
 // Taint origin should be marked correctly even if there are multiple taint
 // sources in the function
-char *taintDiagnosticPropagation2(){
-  char *pathbuf;
+int taintDiagnosticPropagation2(){
+  int res;
   char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
-  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
-                                 // expected-note@-1 {{Taint propagated to the return value}}
+  char *cmd=getenv("CMD"); // expected-note {{Taint originated here}}
+                           // expected-note@-1 {{Taint propagated to the return value}}
   char *user_env=getenv("USER_ENV_VAR");//unrelated taint source
-  if (size){ // expected-note {{Assuming 'size' is non-null}}
-	               // expected-note@-1 {{Taking true branch}}
-    pathbuf=(char*) malloc(atoi(size)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
-                                                // expected-note@-1{{Untrusted data is used to specify the buffer size}}
-                                                // expected-note@-2 {{Taint propagated to the return value}}
-    return pathbuf;
+  if (cmd){ // expected-note {{Assuming 'cmd' is non-null}}
+	          // expected-note@-1 {{Taking true branch}}
+    res = system(cmd); // expected-warning{{Untrusted data is passed to a system call}}
+                       // expected-note@-1{{Untrusted data is passed to a system call}}
+    return res;
   }
   return 0;
 }
@@ -95,22 +94,23 @@ void testReadStdIn(){
 }
 
 void multipleTaintSources(void) {
-  int x,y,z;
-  scanf("%d", &x); // expected-note {{Taint originated here}}
+  char cmd[2048],file[1024];
+  scanf ("%1022[^\n] ", cmd); // expected-note {{Taint originated here}}
                    // expected-note@-1 {{Taint propagated to the 2nd argument}}
-  scanf("%d", &y); // expected-note {{Taint originated here}}
+  scanf ("%1023[^\n]", file); // expected-note {{Taint originated here}}
                    // expected-note@-1 {{Taint propagated to the 2nd argument}}
-  scanf("%d", &z);
-  int* ptr = (int*) malloc(y + x); // expected-warning {{Untrusted data is used to specify the buffer size}}
-                                   // expected-note@-1{{Untrusted data is used to specify the buffer size}}
-  free (ptr);
+  strcat(cmd, file);// expected-note {{Taint propagated to the 1st argument}}
+  system(cmd); // expected-warning {{Untrusted data is passed to a system call}}
+               // expected-note@-1{{Untrusted data is passed to a system call}}
 }
 
 void multipleTaintedArgs(void) {
-  int x,y;
-  scanf("%d %d", &x, &y); // expected-note {{Taint originated here}}
+  char cmd[1024],file[1024], buf[2048];
+  scanf("%1022s %1023s", cmd, file); // expected-note {{Taint originated here}}
                           // expected-note@-1 {{Taint propagated to the 2nd argument, 3rd argument}}
-  int* ptr = (int*) malloc(x + y); // expected-warning {{Untrusted data is used to specify the buffer size}}
-                                   // expected-note@-1{{Untrusted data is used to specify the buffer size}}
-  free (ptr);
+  strcpy(buf, cmd);// expected-note {{Taint propagated to the 1st argument}}
+  strcat(buf," ");// expected-note {{Taint propagated to the 1st argument}}
+  strcat(buf, file);// expected-note {{Taint propagated to the 1st argument}}
+  system(buf); // expected-warning {{Untrusted data is passed to a system call}}
+               // expected-note@-1{{Untrusted data is passed to a system call}}
 }
diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c
index c6a01594f15abb7..78305bfbb9d35f0 100644
--- a/clang/test/Analysis/taint-generic.c
+++ b/clang/test/Analysis/taint-generic.c
@@ -305,15 +305,19 @@ void testGets_s(void) {
 
 void testTaintedBufferSize(void) {
   size_t ts;
+  // malloc, calloc, bcopy, memcpy functions are removed as unconditional sinks
+  // from the GenericTaintChecker's default configuration,
+  // because it generated too many false positives.
+  // We would need more sophisticated handling of these reports to enable
+  // these test-cases again.
   scanf("%zd", &ts);
-
-  int *buf1 = (int*)malloc(ts*sizeof(int)); // expected-warning {{Untrusted data is used to specify the buffer size}}
-  char *dst = (char*)calloc(ts, sizeof(char)); //expected-warning {{Untrusted data is used to specify the buffer size}}
-  bcopy(buf1, dst, ts); // expected-warning {{Untrusted data is used to specify the buffer size}}
-  __builtin_memcpy(dst, buf1, (ts + 4)*sizeof(char)); // expected-warning {{Untrusted data is used to specify the buffer size}}
+  int *buf1 = (int*)malloc(ts*sizeof(int)); // warn here, ts is unbounded and tainted
+  char *dst = (char*)calloc(ts, sizeof(char)); // warn here, ts is unbounded tainted
+  bcopy(buf1, dst, ts); // no warning here, since the size of buf1, dst equals ts. Cannot overflow.
+  __builtin_memcpy(dst, buf1, (ts + 4)*sizeof(char)); // warn here, dst overflows (whatever the value of ts)
 
   // If both buffers are trusted, do not issue a warning.
-  char *dst2 = (char*)malloc(ts*sizeof(char)); // expected-warning {{Untrusted data is used to specify the buffer size}}
+  char *dst2 = (char*)malloc(ts*sizeof(char)); // warn here, ts in unbounded
   strncat(dst2, dst, ts); // no-warning
 }
 
@@ -353,7 +357,7 @@ void testStruct(void) {
 
   sock = socket(AF_INET, SOCK_STREAM, 0);
   read(sock, &tainted, sizeof(tainted));
-  __builtin_memcpy(buffer, tainted.buf, tainted.length); // expected-warning {{Untrusted data is used to specify the buffer size}}
+  clang_analyzer_isTainted_int(tainted.length); // expected-warning {{YES }}
 }
 
 void testStructArray(void) {
@@ -368,17 +372,17 @@ void testStructArray(void) {
   __builtin_memset(srcbuf, 0, sizeof(srcbuf));
 
   read(sock, &tainted[0], sizeof(tainted));
-  __builtin_memcpy(dstbuf, srcbuf, tainted[0].length); // expected-warning {{Untrusted data is used to specify the buffer size}}
+  clang_analyzer_isTainted_int(tainted[0].length); // expected-warning {{YES}}
 
   __builtin_memset(&tainted, 0, sizeof(tainted));
   read(sock, &tainted, sizeof(tainted));
-  __builtin_memcpy(dstbuf, srcbuf, tainted[0].length); // expected-warning {{Untrusted data is used to specify the buffer size}}
+  clang_analyzer_isTainted_int(tainted[0].length); // expected-warning {{YES}}
 
   __builtin_memset(&tainted, 0, sizeof(tainted));
   // If we taint element 1, we should not raise an alert on taint for element 0 or element 2
   read(sock, &tainted[1], sizeof(tainted));
-  __builtin_memcpy(dstbuf, srcbuf, tainted[0].length); // no-warning
-  __builtin_memcpy(dstbuf, srcbuf, tainted[2].length); // no-warning
+  clang_analyzer_isTainted_int(tainted[0].length); // expected-warning {{NO}}
+  clang_analyzer_isTainted_int(tainted[2].length); // expected-warning {{NO}}
 }
 
 void testUnion(void) {

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #67352 for the discussion and test results that motivated this patch. This is a conservative change that'll remove some bug reports (both TPs and FPs) and would let us move the generic taint checker out of the current alpha stage. After that, we would be able to recover the lost TPs with follow-up commits that extend MallocChecker and CStringChecker.

There were some trivial grammar/formatting issues, I marked them with inline comments.

@dkrupp dkrupp force-pushed the taint_remove_int_sinks branch from 584187c to 3d6c80d Compare October 10, 2023 09:46
@dkrupp dkrupp requested review from NagyDonat and haoNoQ October 10, 2023 12:38
Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this old commit could be merged after some very minor clarifications.

In addition to the changes marked in inline comments, you could also add some TODO comments in MallocChecker and CStringChecker to mark the places where we want to add the code that would handle these functions.

Copy link

github-actions bot commented Apr 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few minor changes.

@dkrupp dkrupp merged commit 6ceb1c0 into llvm:main May 2, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants