-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HLT farm crash in Run 379530 #44759
Comments
cms-bot internal usage |
A new Issue was created by @trtomei. @smuzaffar, @Dr15Jones, @rappoccio, @sextonkennedy, @antoniovilela, @makortel can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign hlt |
@cms-sw/muon-pog-l2 FYI |
New categories assigned: hlt @Martin-Grunewald,@mmusich you have been requested to review this Pull request/Issue and eventually sign? Thanks |
the affected module actually resides in e/gamma package cmssw/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc Line 193 in 6e82cc8
|
@cms-sw/egamma-pog-l2 FYI |
So this happens because we have a cmssw/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc Lines 173 to 181 in 6e82cc8
we start with cmssw/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc Line 183 in 6e82cc8
resulting in the bound-check failure for I tested that this simple change: diff --git a/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc b/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc
index b4c4c8b9ff7..8c878698508 100644
--- a/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc
+++ b/RecoEgamma/EgammaHLTProducers/plugins/HLTEcalPFClusterIsolationProducer.cc
@@ -174,7 +174,7 @@ void HLTEcalPFClusterIsolationProducer<T1>::produce(edm::Event& iEvent, const ed
int iEA = -1;
auto cEta = std::abs(candRef->eta());
for (int bIt = absEtaLowEdges_.size() - 1; bIt > -1; bIt--) {
- if (cEta > absEtaLowEdges_.at(bIt)) {
+ if (cEta >= absEtaLowEdges_.at(bIt)) {
iEA = bIt;
break;
}
diff --git a/RecoEgamma/EgammaHLTProducers/plugins/HLTHcalPFClusterIsolationProducer.cc b/RecoEgamma/EgammaHLTProducers/plugins/HLTHcalPFClusterIsolationProducer.cc
index 0b195f62b7a..b23763c10a3 100644
--- a/RecoEgamma/EgammaHLTProducers/plugins/HLTHcalPFClusterIsolationProducer.cc
+++ b/RecoEgamma/EgammaHLTProducers/plugins/HLTHcalPFClusterIsolationProducer.cc
@@ -200,7 +200,7 @@ void HLTHcalPFClusterIsolationProducer<T1>::produce(edm::StreamID sid,
int iEA = -1;
auto cEta = std::abs(candRef->eta());
for (int bIt = absEtaLowEdges_.size() - 1; bIt > -1; bIt--) {
- if (cEta > absEtaLowEdges_.at(bIt)) {
+ if (cEta >= absEtaLowEdges_.at(bIt)) {
iEA = bIt;
break;
} prevents the crash. |
assign reconstruction |
New categories assigned: reconstruction @jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Incidentally I am wondering if we might shave off some (probably tiny) reconstruction time by using the |
@mmusich The proposed solution makes sense. Would you like reco to make a PR or you'll take care of it? |
thanks for finding this, Marco. As egamma modules are written in similar fashion, similar unprotected codes are there as well.. so while fixing it maybe one can fix other such unprotected cases as well, if possible.. like for example cmssw/RecoEgamma/EgammaHLTProducers/plugins/EgammaHLTHcalVarProducerFromRecHit.cc Line 232 in 6e82cc8
cmssw/RecoEgamma/EgammaHLTProducers/plugins/HLTHcalPFClusterIsolationProducer.cc Line 203 in 6e82cc8
|
feel free to make the PR (provided @cms-sw/egamma-pog-l2 agrees). I am wondering though - since these classes were not touched in a long time, if the triggering factor (candidate eta=0) is just a matter of chance or there is some other underlying problem upstream. |
Since we are checking the size of |
Ok, since (current and former) egamma experts are reacting in this thread, I'll leave it in their hands, unless someone asks me to do something :-) |
i mean since Marco have it set up already, may be he can do the PR.
this needs some more thinking probably.. this is a good point |
OK, here's a minimal fix: #44764. |
unassign reconstruction
|
proposed fixes: |
A binary-search may even be faster and safer |
+hlt
|
This issue is fully signed and ready to be closed. |
Reporting a crash in Run 379530
Here's the recipe how to reproduce the crashes (tested with CMSSW_14_0_5_patch1 on lxplus8-gpu):
then prepare the reproducer as:
In the log, we will find the following message:
@cms-sw/hlt-l2 FYI
@cms-sw/muon-pog-l2 FYI (sorry, tagged the DPG by mistake)
The text was updated successfully, but these errors were encountered: