[proc] put containers and processes running in containers in the same chunk #215
Conversation
This reverts commit 8dbd82f.
…t have only skipped processes in-place
checks/container.go
Outdated
i := 0
for _, ctr := range containers {
	chunk = append(chunk, ctr)
	if len(chunk) == perChunk {
		chunked[i] = chunk
why not append to chunked instead of keeping track of i?
I simply copied over the existing logic, but chunked is already initialized with a list of empty slices: chunked := make([][]*model.Container, chunks). If you used append instead of an index, it would just append after the empty ones at the end. And the pre-allocation is mostly there for efficiency reasons.
I think creating chunked with make([][]*model.Container, 0, chunks) will have the same efficiency advantages and allow you to use append.
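For illustration, a minimal runnable sketch of the difference between the two make forms (with []int standing in for the real slice types):

```go
package main

import "fmt"

func main() {
	// make with length 3: the outer slice already holds three nil
	// entries, so append places new chunks after them.
	a := make([][]int, 3)
	a = append(a, []int{1})
	fmt.Println(len(a)) // 4: three empty slots, then the real chunk

	// make with length 0 and capacity 3: append fills the pre-allocated
	// backing array directly, so the index bookkeeping (the i variable)
	// is unnecessary and the efficiency benefit is kept.
	b := make([][]int, 0, 3)
	b = append(b, []int{1})
	fmt.Println(len(b)) // 1
}
```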
checks/process.go
Outdated
	GroupSize: int32(groupSize),
})
if len(procsByCtr) == 0 {
	return []model.MessageBody{}, nil
why did this change from nil to an empty slice?
Nothing changes; I don't remember why I did that either. I'll change it back.
checks/process.go
Outdated
ctrProcs := make([]*model.Process, 0)
for _, ctr := range containers {
	// if all processes are skipped for this container, don't send the container
	if _, ok := procsByCtr[ctr.Id]; !ok {
does this ever happen?
If a container is running only one process and that process is blacklisted, that would be the case.
Why are we doing this? We still want to send the container even if we're not collecting its processes, no?
Wouldn't that cause confusion in the UI, when people click a container expecting processes to show up but they never do? I could change the logic as well and add something in the UI to make it obvious.
// pack all containerized processes with containers in a message
validIndex := 0
ctrProcs := make([]*model.Process, 0)
for _, ctr := range containers {
Is there a way to move this filtering out of createProcCtrMessages and pass the result in? It's sort of surprising that this function modifies containers.
I slightly disagree. The containers here is only used to create messages, so changing it in this function doesn't create any side effects. On the other hand, it's changed at the very end, which isolates the logic well. But I'm open to discussion, just in case I'm wrong.
containers is a parameter to this function, and you're modifying it on line 139. All I'm saying is that if someone tries to re-use containers after passing it to createProcCtrMessages, they're going to be surprised.
OK. I'll create a separate list inside the function and only append to it when needed.
@leeavital this is addressed
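A minimal sketch of the resulting shape, with stub types standing in for model.Container and model.Process; the filter builds a fresh slice instead of editing containers in place, so the caller's slice is untouched:

```go
package main

import "fmt"

type container struct{ ID string }
type process struct{ Pid int32 }

// keepContainersWithProcs returns only the containers that still have
// at least one non-skipped process in procsByCtr. It appends to a new
// slice rather than modifying the input, avoiding the surprise
// discussed above.
func keepContainersWithProcs(ctrs []*container, procsByCtr map[string][]*process) []*container {
	kept := make([]*container, 0, len(ctrs))
	for _, ctr := range ctrs {
		if _, ok := procsByCtr[ctr.ID]; !ok {
			continue // every process in this container was skipped
		}
		kept = append(kept, ctr)
	}
	return kept
}

func main() {
	ctrs := []*container{{ID: "redis"}, {ID: "all-blacklisted"}}
	procs := map[string][]*process{"redis": {{Pid: 1}}}
	fmt.Println(len(keepContainersWithProcs(ctrs, procs))) // 1
}
```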
maxSize:        2,
containers:     []*containers.Container{},
blacklist:      []string{},
expectedChunks: 2,
I think we should rewrite these tests to actually confirm that the correct processes are grouped with the expected containers, to increase confidence in this PR, instead of just using hardcoded pre-calculated result values.
It might also be useful to have some generation helper functions, like generateProcesses(100), to test more real-world scenarios where maxSize is 100.
@sunhay A test is added to address this.
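For reference, a sketch of the kind of generation helper suggested; generateProcesses here is hypothetical, not the helper the PR actually added:

```go
package main

import "fmt"

type process struct {
	Pid  int32
	Name string
}

// generateProcesses builds n synthetic processes so tests can cover
// realistic sizes (e.g. a maxSize of 100) without hardcoded fixtures.
func generateProcesses(n int) []*process {
	procs := make([]*process, 0, n)
	for i := 0; i < n; i++ {
		procs = append(procs, &process{
			Pid:  int32(i + 1),
			Name: fmt.Sprintf("proc-%d", i+1),
		})
	}
	return procs
}

func main() {
	fmt.Println(len(generateProcesses(100))) // 100
}
```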
checks/container.go
Outdated
@@ -58,10 +58,10 @@ func (c *ContainerCheck) Run(cfg *config.AgentConfig, groupID int32) ([]model.Me
}

groupSize := len(ctrList) / cfg.MaxPerMessage
if len(ctrList) != cfg.MaxPerMessage {
if cfg.MaxPerMessage*groupSize != cfg.MaxPerMessage {
If groupSize := len(ctrList) / cfg.MaxPerMessage, then cfg.MaxPerMessage*groupSize = cfg.MaxPerMessage * len(ctrList) / cfg.MaxPerMessage = len(ctrList).
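Presumably the intent is to add one extra group when the division leaves a remainder; since integer division truncates in Go, that is usually written with a modulo check. A sketch, assuming that intent:

```go
package main

import "fmt"

func main() {
	ctrCount, maxPerMessage := 25, 10

	groupSize := ctrCount / maxPerMessage // 2: integer division truncates
	if ctrCount%maxPerMessage != 0 {
		groupSize++ // one extra group for the 5 leftover containers
	}
	fmt.Println(groupSize) // 3
}
```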
// chunkContainers formats and chunks the ctrList into a slice of chunks using a specific number of chunks.
func chunkContainers(ctrList []*containers.Container, lastRates map[string]util.ContainerRateMetrics, lastRun time.Time, chunks, perChunk int) [][]*model.Container {
	chunked := make([][]*model.Container, 0, chunks)
	chunk := make([]*model.Container, 0, perChunk)
It seems redundant to pass both perChunk and chunks, because perChunk * chunks ~= len(ctrList). Can we get away with only passing one?
I did this because the calculation is already done once before calling chunkContainers (https://github.com/DataDog/datadog-process-agent/pull/215/files#diff-69c4a56eada3efac057ba71ef68a0aa2R60); it's just a matter of passing an additional parameter or doing the calculation twice.
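A sketch of the single-parameter alternative, deriving the per-chunk size inside the function (string elements stand in for the formatted containers):

```go
package main

import "fmt"

// chunkByCount splits items into at most chunks groups, computing the
// per-chunk size from len(items) instead of taking it as a parameter.
func chunkByCount(items []string, chunks int) [][]string {
	perChunk := (len(items) + chunks - 1) / chunks // ceiling division
	chunked := make([][]string, 0, chunks)
	for i := 0; i < len(items); i += perChunk {
		end := i + perChunk
		if end > len(items) {
			end = len(items)
		}
		chunked = append(chunked, items[i:end])
	}
	return chunked
}

func main() {
	fmt.Println(chunkByCount([]string{"a", "b", "c", "d", "e"}, 2))
	// [[a b c] [d e]]
}
```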
checks/process.go
Outdated
@@ -17,6 +17,7 @@ import (

// Process is a singleton ProcessCheck.
var Process = &ProcessCheck{}
var emptyCtrID = ""
nit: const
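i.e., since the value never changes:

```go
const emptyCtrID = ""
```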
// fill in GroupSize for each CollectorProc and convert them to final messages
messages := make([]model.MessageBody, 0, len(msgs))
for _, m := range msgs {
nit: since you're already looping through the list of messages, it might clean things up to move the bookkeeping for counting procs/ctrs here, e.g.:
for _, m := range msgs {
	m.GroupSize = int32(len(msgs))
	messages = append(messages, m)
	totalProcs += len(m.Processes)
	totalContainers += len(m.Containers)
}
Once we start collecting containers with no processes, again :)
To: @DataDog/burrito
When we split big payloads into smaller chunks, there was a bug where the processes for some containers ended up in a different chunk than the containers themselves. As a result, when we tried to attach container tags to processes, the lookup sometimes failed and the container tags were missing from those processes.
This PR tries to gather all processes running in containers together with the containers in a single message, while splitting non-container processes into chunks of messages.
Previously, fmtProcesses and fmtContainers created the process/container objects and also split them evenly. In this version they only return a list of objects, which is then passed along to chunkProcesses and chunkContainers for splitting. The change is everywhere and I have next to no confidence that this would work, please review with all your strength.
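As a rough illustration of the intended split (all names here are hypothetical stand-ins for the real code in checks/process.go):

```go
package main

import "fmt"

type proc struct {
	Pid   int32
	CtrID string // empty for non-container processes
}

// splitProcs separates container processes from host processes so that
// container processes can travel in the same message as their container,
// while host processes are chunked independently.
func splitProcs(procs []proc) (byCtr map[string][]proc, host []proc) {
	byCtr = make(map[string][]proc)
	for _, p := range procs {
		if p.CtrID == "" {
			host = append(host, p)
			continue
		}
		byCtr[p.CtrID] = append(byCtr[p.CtrID], p)
	}
	return byCtr, host
}

func main() {
	procs := []proc{{1, ""}, {2, "c1"}, {3, "c1"}, {4, ""}}
	byCtr, host := splitProcs(procs)
	fmt.Println(len(byCtr["c1"]), len(host)) // 2 2
}
```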