Performance: Slow analysis of arrays #1415

Closed
ryosa opened this issue Feb 1, 2018 · 10 comments · Fixed by #1664

@ryosa

ryosa commented Feb 1, 2018

With SQ 5.6.7 and sonar-cxx 0.9.8, analysis of multidimensional arrays is very slow. A 22-hour run (as a Jenkins job on CentOS 7) did not finish analyzing a file that contains 512 2D arrays; the file size is nearly 5 MB.
It seems that the analysis uses only one CPU. This may be due to Jenkins, but I wonder whether the performance could be improved.
Thanks.

@guwirth
Collaborator

guwirth commented Feb 2, 2018

Hi @ryosa, can you please provide a small sample to reproduce the issue? Regards

@ryosa
Author

ryosa commented Feb 5, 2018

@guwirth The attached file is not large, but it should take several minutes to analyze. Please rename it (drop the .txt extension).
dim_test.cpp.txt

It turned out that the issue also occurs with large 1D arrays, say around 1 MB.

@guwirth
Collaborator

guwirth commented Feb 6, 2018

@ryosa Trying this in the SSLR toolkit, it takes about 5 s on my computer to parse the file and create an AST.

It seems that the analysis uses only one CPU

Normally, a compiler compiles each .cpp file using only one core. To reduce the total time, we could think about compiling more than one .cpp file in parallel, but that would not help here. I think we should look at why the total time is 5 s.

@ryosa
Author

ryosa commented Feb 7, 2018

@guwirth Thank you for trying it and for your comment. If the analysis of arrays could be improved, that would be a big advantage for sonar-cxx. I hope it will be done. :-)

@guwirth
Collaborator

guwirth commented Feb 8, 2018

@ryosa Things I'm wondering about:

  • I see no reason why there should be a difference between one- and multidimensional arrays. Do you really have problems with multidimensional arrays only?
  • Maybe it's a problem with hex numbers? Do you have the same problem with decimal numbers?
  • Maybe it's a problem with parsing (the grammar)?
  • Maybe it's a problem with AST creation (AST too big)? Memory problem?

Can you give some more hints?

@ryosa
Author

ryosa commented Feb 9, 2018

@guwirth Analysis is slow with both 1D and 2D arrays, and the analysis time seems to be proportional to the size of the array. It is not a matter of hex numbers, because analysis is also slow with 1D arrays of floating-point and decimal numbers. The server is equipped with 128 GB of memory. The shell script below runs before each analysis.

export SONAR_SCANNER_OPTS="-Xmx4096m"
export SONAR_RUNNER_OPTS="-Xmx1G -XX:MaxPermSize=512m"
ulimit -s 409600

The parameters below have been set in the properties.

sonar.ce.javaOpts=-Xmx4g -Xms256m -XX:+HeapDumpOnOutOfMemoryError
sonar.search.javaOpts=-Xmx8g -Xms4g -XX:+HeapDumpOnOutOfMemoryError

Clang analyzes the project (7M LOC) in less than 10 minutes using 20+ cores, while cppcheck takes around 40 minutes with the -j20 option, and Klocwork 8 hours. With SQ the analysis never finished; after excluding several files containing large arrays, it finishes in 16 hours.

@Bertk
Contributor

Bertk commented Feb 10, 2018

Hi @ryosa, can you please tell us why you want to run a source code analysis on huge multidimensional arrays? The plug-in does not execute any special analysis on arrays, and to accelerate the analysis I recommend to exclude the files with the huge array.
Please check the SonarQube documentation – Narrowing the Focus

You can also use preprocessor statements to avoid parsing all values of the array while the AST is built, e.g.:

#ifdef SONAR_ANALYSIS
// define only a small array
#else 
// the array with all values
#endif

The #define SONAR_ANALYSIS directive should be placed in a special header file used only for the SonarQube analysis; that header can be activated for the analysis with the property sonar.cxx.forceIncludes.

@guwirth guwirth changed the title from "Slow analysis of multi dimensional array" to "Performance: Slow analysis of arrays" on Feb 10, 2018
@ryosa
Author

ryosa commented Feb 13, 2018

@Bertk

I recommend to exclude the files with the huge array.

As already mentioned, I've done that. However, by its nature the project contains many arrays, and in a project of seven million LOC it is virtually impossible to find them all and judge whether #ifdef SONAR_ANALYSIS can be inserted.

The plug-in does not execute any special analysis

sonar-cxx may not do anything special; however, there must be some cause for the reduced analysis performance.

ivangalkin added a commit to ivangalkin/sonar-cxx that referenced this issue Apr 29, 2018
* precompiled regex patterns are faster than String.split()
  (https://shipilev.net/talks/joker-Oct2014-string-catechism.pdf, from page 72)
* some visitors split strings for every ASTNode ->
  perceptible improvement

I used the example from SonarOpenCommunity#1415 and could measure some speed-up (10%?).
However, basic profiling with `-agentlib:hprof=cpu=samples,depth=100` was
not enough to find a root cause.
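The first bullet of the commit message can be illustrated with a minimal sketch. It is not the code from the commit: the class `LineSplitter` and the line-break delimiter are hypothetical. In OpenJDK, `String.split(String)` compiles its regex argument on every call (only some simple one-character delimiters take a fast path), so a visitor that splits text for every AST node pays that compilation cost again and again. A `static final Pattern` is compiled once and reused.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LineSplitter {
    // Hypothetical delimiter: any line break. Compiled once when the class
    // is loaded and then reused for every call.
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Slower variant: String.split(String) compiles this multi-character
    // regex into a new Pattern on every call.
    static String[] splitEachTime(String text) {
        return text.split("\\r\\n|\\r|\\n");
    }

    // Faster variant: reuse the precompiled pattern.
    static String[] splitPrecompiled(String text) {
        return EOL.split(text);
    }

    public static void main(String[] args) {
        String token = "int a[] = {\r\n  1, 2,\n  3\r};";
        System.out.println(Arrays.toString(splitPrecompiled(token)));
        // Both variants produce the same result; only the cost differs.
        System.out.println(Arrays.equals(splitEachTime(token), splitPrecompiled(token)));
    }
}
```

Both methods return identical arrays; the gain comes purely from not recompiling the pattern per call, which matters when the call sits on a per-AST-node path.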
This was referenced Apr 29, 2018
@guwirth
Collaborator

guwirth commented Dec 16, 2018

@ivangalkin Are you familiar with profiling? I would like to analyze where we are losing so much time in the sample: dim_test.cpp.txt

@ivangalkin
Contributor

@guwirth Back in April I tried to profile the attached example using HPROF. I was not very successful, because the parser code contains several levels of indirection (virtual calls, maybe even reflection). The only thing I remember is that SSLR was always at the top of the stack.

@guwirth guwirth added this to the 1.2.2 milestone Dec 23, 2018
Bertk pushed a commit to Bertk/sonar-cxx that referenced this issue Jun 22, 2019
- reduce the number of regex rules to detect a number
- use only one channel to detect numbers
- improve SonarOpenCommunity#1415
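The idea behind this commit (fewer regex rules, a single channel for numbers) can be illustrated with a toy scanner. This is a hypothetical sketch, not the sonar-cxx implementation: the class `NumberScanner`, the method `numbers`, and the simplified literal grammar (no digit separators, hex floats, or user-defined literals) are assumptions. Each numeric token is matched by one precompiled pattern in a single pass, instead of trying several separate rules in turn; for array initializers with hundreds of thousands of literals, that per-token cost plausibly adds up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberScanner {
    // Hypothetical, simplified single regex for C++ numeric literals.
    // Hex and binary come first so that "0x1F" is not matched as just "0".
    private static final Pattern NUMBER = Pattern.compile(
            "0[xX][0-9a-fA-F]+[uUlL]*"                                      // hexadecimal
          + "|0[bB][01]+[uUlL]*"                                            // binary
          + "|(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)(?:[eE][+-]?[0-9]+)?[uUlLfF]*"); // decimal/float

    // Toy "channel": scans the source once and collects number tokens,
    // skipping identifiers and every other character.
    static List<String> numbers(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(src); // one Matcher reused for all positions
        int pos = 0;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isLetter(c) || c == '_') {
                // skip a whole identifier so that e.g. "a1" does not yield "1"
                while (pos < src.length()
                        && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
                    pos++;
                }
            } else if (m.region(pos, src.length()).lookingAt()) {
                // the pattern always consumes at least one digit, so pos advances
                out.add(m.group());
                pos = m.end();
            } else {
                pos++; // punctuation, whitespace, etc.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "static const double t[2][3] = {{0x1F, 0b101u, 3.14f}, {1e10, .5, 42UL}};";
        System.out.println(numbers(src));
    }
}
```

The order of the alternatives is the key design choice: Java regex alternation takes the first branch that succeeds, so putting the decimal branch first would split `0x1F` into `0` followed by an identifier.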

4 participants