This repository has been archived by the owner on Jan 24, 2025. It is now read-only.

OOM Crash - failed to store on BlockDropped in retainer #934

Closed
shufps opened this issue Apr 25, 2024 · 2 comments · Fixed by #947 or #946
Labels
team-node Issues for Node Team
Comments

@shufps
Contributor

shufps commented Apr 25, 2024

Two of our nodes crashed after running out of memory.

It seems they started to log this error message:

Protocol.Engine0    	engine error (err=blockRetainer: failed to store on BlockDropped in retainer: cannot update block metadata for block BlockID(0xbc718142f4c3957f2e7484dec30b891a9edfc09b2d50c8faa8d753d09bb8dc12d4830000:33748) with state dropped as block is already committed)

The message was logged about 50k times per hour.

Memory usage grew sharply at the same time:
image

We have a log file from when it started:
faucet.h.iota2-alphanet_2024-04-24-09.log

Unfortunately it happened at night, so we have no memory profile of this node.

However, we do have a profile of another node that started at the same time but "recovered" later on (although its memory usage is still high):
image

pprof.validator-2_20240425-075134_all.zip

Maybe it shows something 🙈

@alexsporn alexsporn added the team-consensus Issues for Consensus Team label Apr 25, 2024
@alexsporn alexsporn moved this to Backlog in iota-core Apr 25, 2024
@alexsporn alexsporn added this to the v1.0.0-beta milestone Apr 25, 2024
@alexsporn
Member

Same underlying deadlock in the DRR scheduler as in #936

@alexsporn
Member

goroutine 8456281 [sync.RWMutex.RLock, 1150 minutes]:
sync.runtime_SemacquireRWMutexR(0xc00048bb08?, 0xa0?, 0xc0004ef560?)
	/usr/local/go/src/runtime/sema.go:82 +0x25
sync.(*RWMutex).RLock(...)
	/usr/local/go/src/sync/rwmutex.go:70
github.com/iotaledger/iota-core/pkg/protocol/engine/congestioncontrol/scheduler/drr.(*Scheduler).ReadyBlocksCount(0xc000346fa0)

@alexsporn alexsporn added team-node Issues for Node Team and removed team-consensus Issues for Consensus Team labels Apr 29, 2024
This was linked to pull requests Apr 29, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in iota-core Apr 30, 2024
Projects
Status: Done