Trace origin of crash when running multi-threaded with XROOTD #13081
Comments
A new Issue was created by @Dr15Jones (Chris Jones). @davidlange6, @smuzaffar, @Degano, @davidlt, @Dr15Jones can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here #13029
assign core
New categories assigned: core @Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
@bbockelm looks like a new crash in the CMS xrootd interface.
What's the best way to re-run the above IB by hand? Just guessing from the stdout and lack of other details, I would guess there's an issue with the delayed close callback.
RelVals?
I doubt the problem is easily repeatable. To rerun the job do
There are 72-virtual-core x86_64 machines (sadly only with 64GB of RAM) and we could hammer some workflows very hard (high number of threads and multiple cmsRun processes) just to reveal threading issues faster.
Bah. No, looking closer at the traceback, it's probably Not entirely sure though; tough to tell with inlining. Any way to recover symbols?
Discussing with Chris, we now think the problem is in This singleton isn't thread-safe and is used when there are concurrent opens (i.e., something we don't do often except for the new threaded mixing module). So, it should be fixed regardless of whether it caused the problem here.
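For anyone following along, here is a minimal sketch of the general pattern being described: a process-wide singleton whose state is touched by concurrent opens, made safe with a mutex. This is not the actual CMSSW or XRootD code and not necessarily what any fix does; `OpenRegistry`, `recordOpen`, `openCount`, and `hammerOpens` are hypothetical names invented for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a process-wide object consulted on every file open.
class OpenRegistry {
public:
  // Function-local static: its initialization is thread-safe since C++11.
  static OpenRegistry& instance() {
    static OpenRegistry registry;
    return registry;
  }

  // Record one open of `url`. The lock serializes access to the shared map,
  // which would otherwise be a data race under concurrent opens.
  void recordOpen(const std::string& url) {
    std::lock_guard<std::mutex> guard(mutex_);
    ++opens_[url];
  }

  // Number of opens recorded so far for `url`.
  std::size_t openCount(const std::string& url) const {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = opens_.find(url);
    return it == opens_.end() ? 0 : it->second;
  }

private:
  OpenRegistry() = default;
  mutable std::mutex mutex_;
  std::map<std::string, std::size_t> opens_;
};

// Simulate concurrent opens from several threads and return how many
// opens were recorded by this call.
std::size_t hammerOpens(const std::string& url, unsigned nThreads, unsigned opensPerThread) {
  assert(nThreads > 0);
  const std::size_t before = OpenRegistry::instance().openCount(url);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; ++t) {
    workers.emplace_back([&url, opensPerThread] {
      for (unsigned i = 0; i < opensPerThread; ++i) {
        OpenRegistry::instance().recordOpen(url);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return OpenRegistry::instance().openCount(url) - before;
}
```

Without the `std::lock_guard` in `recordOpen`, the same stress loop would race on the map and could crash or lose updates, but only intermittently, which fits the "not easily repeatable" behaviour mentioned above.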
I believe #13085 may fix this issue.
Fix was merged, we think. Can this ticket be closed?
The following traceback was found in the integration build after going to a multi-threaded mixing module, which stresses the thread-safety of XROOTD and our interface to it much harder:
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc6_amd64_gcc493/CMSSW_8_0_THREADED_X_2016-01-21-2300/pyRelValMatrixLogs/run/25205.0_ZTT_13+ZTT_13INPUT+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25/step3_ZTT_13+ZTT_13INPUT+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25.log
With traceback