This repository was archived by the owner on Jul 27, 2023. It is now read-only.

[proc] put containers and processes running in containers in the same chunk #215

Merged
merged 23 commits into master from shang/fix-chunk-bug on Nov 30, 2018

Conversation

shang-wang
Contributor

To: @DataDog/burrito

When we split big payloads into smaller chunks, there was a bug where processes for some containers ended up in a different chunk than their container. As a result, when we tried to attach container tags to processes, the lookup sometimes failed and the container tags were missing from those processes.

This PR tries to gather all processes running in containers together with the containers in a single message, while splitting non-container processes into chunks of messages.

Previously, fmtProcesses and fmtContainers created the process/container objects and also split them evenly into chunks; in this version they only return a flat list of objects, which is then passed along to chunkProcesses and chunkContainers for splitting.

The change touches code everywhere and I have next to no confidence that this works, so please review with all your strength.
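The described split between formatting and chunking can be sketched roughly as follows. Only the names fmtProcesses/chunkProcesses come from the PR; the process type and both function bodies here are assumptions for illustration, not the agent's actual code.

```go
package main

import "fmt"

// process stands in for the agent's model.Process (assumed type).
type process struct{ pid int32 }

// fmtProcesses only builds the flat list of process objects.
func fmtProcesses(pids []int32) []process {
	procs := make([]process, 0, len(pids))
	for _, p := range pids {
		procs = append(procs, process{pid: p})
	}
	return procs
}

// chunkProcesses only splits the list into chunks of at most perChunk items.
func chunkProcesses(procs []process, perChunk int) [][]process {
	chunked := make([][]process, 0, (len(procs)+perChunk-1)/perChunk)
	for len(procs) > 0 {
		n := perChunk
		if n > len(procs) {
			n = len(procs)
		}
		chunked = append(chunked, procs[:n])
		procs = procs[n:]
	}
	return chunked
}

func main() {
	chunks := chunkProcesses(fmtProcesses([]int32{1, 2, 3, 4, 5}), 2)
	fmt.Println(len(chunks)) // 3 chunks: [1 2], [3 4], [5]
}
```

Keeping the two steps separate makes it possible to chunk container processes and non-container processes differently, which is what this PR needs.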


i := 0
for _, ctr := range containers {
	chunk = append(chunk, ctr)
	if len(chunk) == perChunk {
		chunked[i] = chunk
Contributor

why not append to chunked instead of keeping track of i ?

Contributor Author

I simply copied over the existing logic, but chunked is already initialized as a list of empty slices: chunked := make([][]*model.Container, chunks). If you use append instead of indexing, new chunks would just be appended after the empty ones at the end. The pre-allocation is mostly for efficiency reasons.

Contributor

I think creating chunked with make([][]*model.Container, 0, chunks) will have the same efficiency advantages and allow you to use append.
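The difference between the two make forms can be shown in isolation; a minimal standalone sketch:

```go
package main

import "fmt"

func main() {
	// make with length 3: the slice already holds three nil entries,
	// so append places new elements after them.
	withLen := make([][]int, 3)
	withLen = append(withLen, []int{1})
	fmt.Println(len(withLen), withLen[0] == nil) // 4 true

	// make with length 0 and capacity 3: append fills the pre-allocated
	// backing array from the front, with no reallocation until it is full.
	withCap := make([][]int, 0, 3)
	withCap = append(withCap, []int{1})
	fmt.Println(len(withCap), cap(withCap)) // 1 3
}
```

Both forms allocate the backing array once, so the capacity form keeps the efficiency benefit while staying compatible with append.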

	GroupSize: int32(groupSize),
})
if len(procsByCtr) == 0 {
	return []model.MessageBody{}, nil
Contributor

why did this change from nil to an empty slice?

Contributor Author

Nothing changed; I don't remember why I did that either... I'll change it back.

ctrProcs := make([]*model.Process, 0)
for _, ctr := range containers {
	// if all processes are skipped for this container, don't send the container
	if _, ok := procsByCtr[ctr.Id]; !ok {
Contributor

does this ever happen?

Contributor Author

If your container is running only one process and that process is blacklisted, would that be the case?

Member

Why are we doing this? We still want to send the container even if we're not collecting its processes, no?

Contributor Author

Wouldn't that cause confusion in the UI when people click a container expecting processes to show up, but nothing ever does? I could change the logic and add something in the UI to make it obvious.

// pack all containered processes with containers in a message
validIndex := 0
ctrProcs := make([]*model.Process, 0)
for _, ctr := range containers {
Contributor

Is there a way to move this filtering out of createProcCtrMessages and pass the result in? It's sort of surprising that this function modifies containers.

Contributor Author

I slightly disagree. The containers here are only used to create messages, so changing them in this function doesn't create any side effects. On the other hand, the change happens at the very end, which isolates the logic well. But I'm open to discussion in case I'm wrong.

Contributor

containers is a parameter to this function and you're modifying it on line 139. All I'm saying is that if someone tries to re-use containers after passing it to createProcCtrMessages they're going to be surprised.

Contributor Author

OK. I'll create a separate list inside the function and only append to it when needed.

Contributor Author

@leeavital this is addressed

	maxSize:        2,
	containers:     []*containers.Container{},
	blacklist:      []string{},
	expectedChunks: 2,
Member (@sunhay, Nov 28, 2018)

I think we should rewrite these tests to actually confirm that the correct processes are grouped with the containers as expected, to increase confidence in this PR, instead of just using hardcoded pre-calculated result values.

It might also be useful to have some generation helper functions, like generateProcesses(100), to test more real-world scenarios where maxSize is 100.

Contributor Author

@sunhay A test is added to address this.

@@ -58,10 +58,10 @@ func (c *ContainerCheck) Run(cfg *config.AgentConfig, groupID int32) ([]model.Me
 	}

 	groupSize := len(ctrList) / cfg.MaxPerMessage
-	if len(ctrList) != cfg.MaxPerMessage {
+	if cfg.MaxPerMessage*groupSize != cfg.MaxPerMessage {
Contributor

If groupSize := len(ctrList) / cfg.MaxPerMessage, then cfg.MaxPerMessage*groupSize = cfg.MaxPerMessage * (len(ctrList) / cfg.MaxPerMessage) = len(ctrList) (ignoring integer truncation), so this condition looks like it reduces to the old len(ctrList) != cfg.MaxPerMessage.
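Note that the simplification only holds when len(ctrList) divides evenly, because integer division drops the remainder. A remainder-aware chunk count can be sketched like this (the chunkCount helper is hypothetical, not part of the PR):

```go
package main

import "fmt"

// chunkCount returns how many chunks are needed to hold n items with at most
// maxPerChunk items per chunk. Integer division drops the remainder, so one
// extra chunk is added when n doesn't divide evenly.
func chunkCount(n, maxPerChunk int) int {
	groups := n / maxPerChunk
	if n%maxPerChunk != 0 {
		groups++
	}
	return groups
}

func main() {
	fmt.Println(chunkCount(9, 3))  // 3
	fmt.Println(chunkCount(10, 3)) // 4: the remainder needs its own chunk
}
```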

// chunkContainers formats and chunks the ctrList into a slice of chunks using a specific number of chunks.
func chunkContainers(ctrList []*containers.Container, lastRates map[string]util.ContainerRateMetrics, lastRun time.Time, chunks, perChunk int) [][]*model.Container {
	chunked := make([][]*model.Container, 0, chunks)
	chunk := make([]*model.Container, 0, perChunk)
Contributor

It seems redundant to pass both perChunk and chunks, because perChunk * chunks ~= len(ctrList). Can we get away with passing only one?

Contributor Author

I did this because the calculation is already done once before calling chunkContainers: https://github.com/DataDog/datadog-process-agent/pull/215/files#diff-69c4a56eada3efac057ba71ef68a0aa2R60. It's just a matter of passing an additional parameter versus doing the calculation twice.
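For reference, the chunking loop under discussion follows a common pattern; a simplified standalone sketch (not the PR's actual chunkContainers, with the container type replaced by strings), where chunks only pre-sizes the outer slice:

```go
package main

import "fmt"

// chunkSlice splits items into chunks of at most perChunk elements.
// chunks is only used to pre-size the outer slice, mirroring the
// signature discussed above.
func chunkSlice(items []string, chunks, perChunk int) [][]string {
	chunked := make([][]string, 0, chunks)
	chunk := make([]string, 0, perChunk)
	for _, it := range items {
		chunk = append(chunk, it)
		if len(chunk) == perChunk {
			chunked = append(chunked, chunk)
			chunk = make([]string, 0, perChunk)
		}
	}
	if len(chunk) > 0 {
		chunked = append(chunked, chunk) // flush the partial final chunk
	}
	return chunked
}

func main() {
	fmt.Println(chunkSlice([]string{"a", "b", "c", "d", "e"}, 3, 2))
	// [[a b] [c d] [e]]
}
```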

@@ -17,6 +17,7 @@ import (

// Process is a singleton ProcessCheck.
var Process = &ProcessCheck{}
var emptyCtrID = ""
Contributor

nit: const


// fill in GroupSize for each CollectorProc and convert them to final messages
messages := make([]model.MessageBody, 0, len(msgs))
for _, m := range msgs {
Contributor

nit: since you're already looping through the list of messages, it might clean things up to move the bookkeeping for counting procs/ctrs here:

e.g.

for _, m := range msgs {
	m.GroupSize = int32(len(msgs))
	messages = append(messages, m)
	totalProcs += len(m.Processes)
	totalContainers += len(m.Containers)
}

Member (@sunhay left a comment)

Once we start collecting containers with no processes again :)

@shang-wang shang-wang merged commit 9afcd7c into master Nov 30, 2018
@shang-wang shang-wang deleted the shang/fix-chunk-bug branch November 30, 2018 21:24