Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model: add system.network.{cpus,memory,network} #4912

Closed
wants to merge 1 commit into from

Conversation

axw
Copy link
Member

@axw axw commented Mar 4, 2021

Motivation/summary

Extend the system metadata with properties for describing the number of CPU cores, amount of system memory, and network information for end-user devices.

Checklist

How to test these changes

TODO

Related issues

#3657

@axw
Copy link
Member Author

axw commented Mar 4, 2021

@vigneshshanmugam here's a start on the new fields for RUM device info.

I started adding a new top-level "browser" field to metadata, but it occurred to me that this is probably also going to be relevant to mobile devices, and in that case there may not be any browsers involved. @bryce-b would we want to capture these properties for mobile?

Still not sure where to put the network info in the Elasticsearch schema. I'd appreciate any thoughts anyone may have.

@codecov-io
Copy link

Codecov Report

Merging #4912 (92a1938) into master (0d5f795) will decrease coverage by 0.35%.
The diff coverage is 76.92%.

@@            Coverage Diff             @@
##           master    #4912      +/-   ##
==========================================
- Coverage   76.57%   76.22%   -0.36%     
==========================================
  Files         166      166              
  Lines       10135    10160      +25     
==========================================
- Hits         7761     7744      -17     
- Misses       2374     2416      +42     
Impacted Files Coverage Δ
model/modeldecoder/v2/model.go 100.00% <ø> (ø)
model/system.go 61.29% <0.00%> (-9.09%) ⬇️
model/modeldecoder/v2/model_generated.go 69.47% <88.88%> (+0.31%) ⬆️
model/modeldecoder/v2/decoder.go 97.84% <100.00%> (+0.01%) ⬆️
beater/tracing.go 48.71% <0.00%> (-28.21%) ⬇️
beater/api/intake/handler.go 86.30% <0.00%> (-10.96%) ⬇️
processor/stream/processor.go 68.61% <0.00%> (-9.58%) ⬇️
...ssor/stream/package_tests/intake_test_processor.go 74.19% <0.00%> (-6.46%) ⬇️
...ack/apm-server/aggregation/txmetrics/aggregator.go 93.24% <0.00%> (ø)
... and 1 more

@apmmachine
Copy link
Contributor

apmmachine commented Mar 4, 2021

💔 Tests Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #4912 updated

  • Start Time: 2021-03-08T02:11:33.016+0000

  • Duration: 69 min 46 sec

  • Commit: 2842f86

Test stats 🧪

Test Results
Failed 194
Passed 4516
Skipped 11
Total 4721

Trends 🧪

Image of Build Times

Image of Tests

Test errors 194

Expand to view the tests failures

> Show only the first 10 test failures

Build and Test / Unit Test / TestPublishIntegration – beater
    Expand to view the error details

     Failed 
    

  • no stacktrace
Build and Test / Unit Test / TestPublishIntegration/Errors – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         integration_test.go:110: 
        integration_test.go:155:   map[string]interface{}{
              }
            
            Run `make update check-approvals` to verify the diff
             
    

Build and Test / Unit Test / TestPublishIntegration/Events – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         integration_test.go:110: 
        integration_test.go:155:   map[string]interface{}{
              }
            
            Run `make update check-approvals` to verify the diff
             
    

Build and Test / Unit Test / TestPublishIntegration/Spans – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         integration_test.go:110: 
        integration_test.go:155:   map[string]interface{}{
              }
            
            Run `make update check-approvals` to verify the diff
             
    

Build and Test / Unit Test / TestPublishIntegration/Transactions – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         integration_test.go:110: 
        integration_test.go:155:   map[string]interface{}{
              }
            
            Run `make update check-approvals` to verify the diff
             
    

Build and Test / Unit Test / TestServerOk – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         server_test.go:73:  
    

Build and Test / Unit Test / TestServerCORS – beater
    Expand to view the error details

     Failed 
    

  • no stacktrace
Build and Test / Unit Test / TestServerCORS/3 – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         server_test.go:307:  
    

Build and Test / Unit Test / TestServerCORS/4 – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         server_test.go:307:  
    

Build and Test / Unit Test / TestServerCORS/5 – beater
    Expand to view the error details

     Failed 
    

    Expand to view the stacktrace

         server_test.go:307:  
    

Steps errors 4

Expand to view the steps failures

Run Window tests
  • Took 5 min 5 sec . View more details on here
Run Window tests
  • Took 3 min 38 sec . View more details on here
Run Linux tests
  • Took 21 min 35 sec . View more details on here
  • Description: ./.ci/scripts/linux-test.sh
Test Sync
  • Took 4 min 30 sec . View more details on here
  • Description: ./.ci/scripts/sync.sh

Log output

Expand to view the last 100 lines of log output

[2021-03-08T02:43:00.862Z] 
[2021-03-08T02:43:00.862Z]     def test_rate_limit_small_hit(self):
[2021-03-08T02:43:00.862Z]         # all requests from the same ip
[2021-03-08T02:43:00.862Z]         # 4 events, batch size 10 => 10+1 events per requ
[2021-03-08T02:43:00.862Z]         codes = self.fire_events("events.ndjson", 8)
[2021-03-08T02:43:00.862Z] >       assert set(codes.keys()) == set([202, 429]), codes
[2021-03-08T02:43:00.862Z] E       AssertionError: defaultdict(<class 'int'>, {400: 8})
[2021-03-08T02:43:00.862Z] E       assert {400} == {202, 429}
[2021-03-08T02:43:00.862Z] E         Extra items in the left set:
[2021-03-08T02:43:00.862Z] E         400
[2021-03-08T02:43:00.862Z] E         Extra items in the right set:
[2021-03-08T02:43:00.862Z] E         202
[2021-03-08T02:43:00.862Z] E         429
[2021-03-08T02:43:00.862Z] E         Use -v to get the full diff
[2021-03-08T02:43:00.862Z] 
[2021-03-08T02:43:00.862Z] tests/system/test_requests.py:196: AssertionError
[2021-03-08T02:43:00.862Z] =============================== warnings summary ===============================
[2021-03-08T02:43:00.862Z] /tmp/python-env/build/ve/linux/lib/python3.7/site-packages/_pytest/junitxml.py:446
[2021-03-08T02:43:00.862Z]   /tmp/python-env/build/ve/linux/lib/python3.7/site-packages/_pytest/junitxml.py:446: PytestDeprecationWarning: The 'junit_family' default value will change to 'xunit2' in pytest 6.0. See:
[2021-03-08T02:43:00.862Z]     https://docs.pytest.org/en/stable/deprecations.html#junit-family-default-value-change-to-xunit2
[2021-03-08T02:43:00.862Z]   for more information.
[2021-03-08T02:43:00.862Z]     _issue_warning_captured(deprecated.JUNIT_XML_DEFAULT_FAMILY, config.hook, 2)
[2021-03-08T02:43:00.862Z] 
[2021-03-08T02:43:00.862Z] -- Docs: https://docs.pytest.org/en/stable/warnings.html
[2021-03-08T02:43:00.862Z] ---------- generated xml file: /src/apm-server/build/TEST-system.xml -----------
[2021-03-08T02:43:00.862Z] ============================= slowest 20 durations =============================
[2021-03-08T02:43:00.862Z] 8.23s call     tests/system/test_integration.py::ExperimentalModeTest::test_experimental_key_indexed
[2021-03-08T02:43:00.862Z] 8.07s call     tests/system/test_auth.py::TestAPIKeyCache::test_cache_full
[2021-03-08T02:43:00.862Z] 8.00s call     tests/system/test_integration.py::ProductionModeTest::test_experimental_key_indexed
[2021-03-08T02:43:00.862Z] 7.72s call     tests/system/test_jaeger.py::GRPCSamplingTest::test_jaeger_grpc_sampling
[2021-03-08T02:43:00.862Z] 5.54s call     tests/system/test_integration_sourcemap.py::SourcemappingCacheIntegrationTest::test_sourcemap_cache_expiration
[2021-03-08T02:43:00.862Z] 5.52s call     tests/system/test_auth.py::TestAPIKeyWithInvalidESConfig::test_backend_intake
[2021-03-08T02:43:00.862Z] 5.47s call     tests/system/test_integration_sourcemap.py::SourcemapESConfigAPIKey::test_sourcemap_applied
[2021-03-08T02:43:00.862Z] 5.41s call     tests/system/test_integration_acm.py::AgentConfigurationIntegrationTest::test_config_requests
[2021-03-08T02:43:00.862Z] 5.20s call     tests/system/test_integration_acm.py::AgentConfigurationIntegrationTest::test_rum_disabled
[2021-03-08T02:43:00.862Z] 5.19s call     tests/system/test_apikey_cmd.py::APIKeyCommandTest::test_info_by_name
[2021-03-08T02:43:00.862Z] 5.19s call     tests/system/test_pipelines.py::PipelineRegisterTest::test_pipeline_registered_and_applied
[2021-03-08T02:43:00.862Z] 4.96s call     tests/system/test_auth.py::TestAPIKeyWithESConfig::test_backend_intake
[2021-03-08T02:43:00.862Z] 4.84s call     tests/system/test_integration.py::EnrichEventIntegrationTest::test_enrich_rum_event
[2021-03-08T02:43:00.862Z] 4.80s call     tests/system/test_integration_sourcemap.py::SourcemappingIntegrationTest::test_fetch_latest_of_multiple_sourcemaps
[2021-03-08T02:43:00.862Z] 4.74s call     tests/system/test_jaeger.py::Test::test_jaeger_auth_tag_removed
[2021-03-08T02:43:00.862Z] 4.72s call     tests/system/test_jaeger.py::TestAuthTag::test_jaeger_authorized
[2021-03-08T02:43:00.862Z] 4.71s call     tests/system/test_integration_sourcemap.py::SourcemappingIntegrationTest::test_backend_error
[2021-03-08T02:43:00.862Z] 4.55s call     tests/system/test_integration_sourcemap.py::SourcemappingIntegrationTest::test_duplicated_sourcemap_warning
[2021-03-08T02:43:00.862Z] 4.28s call     tests/system/test_integration_sourcemap.py::SourcemappingIntegrationTest::test_sourcemap_mapping_cache_usage
[2021-03-08T02:43:00.862Z] 4.26s call     tests/system/test_jaeger.py::TestAuthTag::test_jaeger_unauthorized
[2021-03-08T02:43:00.862Z] =========================== short test summary info ============================
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_auth.py::TestAccessDefault::test_full_access - Asser...
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_auth.py::TestAPIKeyWithESConfig::test_backend_intake
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_integration.py::Test::test_load_docs_with_template_and_add_error
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_integration.py::Test::test_load_docs_with_template_and_add_transaction
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_integration.py::Test::test_mark_type - AssertionErro...
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_integration.py::Test::test_tags_type - AssertionErro...
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_integration.py::EnrichEventIntegrationTest::test_backend_error
[2021-03-08T02:43:00.862Z] FAILED tests/system/test_integration.py::EnrichEventIntegrationTest::test_backend_transaction
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_integration.py::EnrichEventIntegrationTest::test_enrich_backend_event
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_integration.py::ILMDisabledIntegrationTest::test_override_indices_config
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_integration.py::OverrideIndicesIntegrationTest::test_override_indices_config
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_integration.py::OverrideIndicesILMFalseIntegrationTest::test_override_indices_config
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_integration.py::OverrideIndicesILMTrueIntegrationTest::test_override_indices_config
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_pipelines.py::PipelineRegisterTest::test_pipeline_registered_and_applied
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_pipelines.py::PipelineConfigurationNoneTest::test_pipeline_not_applied
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_pipelines.py::PipelineDisableRegisterTest::test_pipeline_not_registered
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::Test::test_deflate - AssertionError: 400
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::Test::test_gzip - AssertionError: 400
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::Test::test_ok - AssertionError: 400
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::Test::test_ok_verbose - AssertionError:...
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::ClientSideTest::test_ok - AssertionErro...
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::CorsTest::test_ok - AssertionError: 400
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::RateLimitTest::test_rate_limit_only_metadata
[2021-03-08T02:43:00.863Z] FAILED tests/system/test_requests.py::RateLimitTest::test_rate_limit_small_hit
[2021-03-08T02:43:00.863Z] ============ 24 failed, 111 passed, 1 warning in 338.23s (0:05:38) =============
[2021-03-08T02:43:00.863Z] make: *** [Makefile:94: system-tests] Error 1
[2021-03-08T02:43:01.812Z] Makefile:99: recipe for target 'docker-system-tests' failed
[2021-03-08T02:43:01.812Z] make: *** [docker-system-tests] Error 2
[2021-03-08T02:43:01.812Z] + cleanup
[2021-03-08T02:43:01.812Z] + rm -rf /tmp/tmp.Sf2prsMujz
[2021-03-08T02:43:01.812Z] + .ci/scripts/docker-get-logs.sh
[2021-03-08T02:43:03.784Z] Post stage
[2021-03-08T02:43:03.806Z] Running in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4912/src/github.com/elastic/apm-server/build
[2021-03-08T02:43:03.834Z] Archiving artifacts
[2021-03-08T02:43:04.177Z] Recording test results
[2021-03-08T02:43:04.871Z] [Checks API] No suitable checks publisher found.
[2021-03-08T02:43:05.213Z] + tar --version
[2021-03-08T02:43:05.548Z] + tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
[2021-03-08T02:43:05.825Z] Archiving artifacts
[2021-03-08T02:43:06.200Z] Failed in branch System and Environment Tests
[2021-03-08T03:20:15.394Z] [INFO] For detailed information see: https://apm-ci.elastic.co/job/apm-integration-tests-selector-mbp/job/master/14836/display/redirect
[2021-03-08T03:20:15.778Z] Copied 19 artifacts from "APM Integration Test MBP Selector » master" build number 14836
[2021-03-08T03:20:16.821Z] Post stage
[2021-03-08T03:20:16.833Z] Recording test results
[2021-03-08T03:20:18.013Z] [Checks API] No suitable checks publisher found.
[2021-03-08T03:20:18.403Z] Running on worker-1225339 in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4912
[2021-03-08T03:20:18.464Z] [INFO] getVaultSecret: Getting secrets
[2021-03-08T03:20:18.748Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-03-08T03:20:20.825Z] + chmod 755 generate-build-data.sh
[2021-03-08T03:20:20.825Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4912/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4912/runs/3 FAILURE 4126418
[2021-03-08T03:20:20.825Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4912/runs/3/steps/?limit=10000 -o steps-info.json
[2021-03-08T03:20:22.275Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4912/runs/3/tests/?status=FAILED -o tests-errors.json
[2021-03-08T03:20:22.985Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4912/runs/3/log/ -o pipeline-log.txt

@vigneshshanmugam
Copy link
Member

here's a start on the new fields for RUM device info

This looks really good.

I started adding a new top-level "browser" field to metadata, but it occurred to me that this is probably also going to be relevant to mobile devices, and in that case there may not be any browsers involved.

Yeah very good point. Is there any harm keeping them in the top level as connectioninfo or networkinfo?

]
},
"effectivetype": {
"description": "EffectiveType describes the effective network type as one of a discrete set of network types, e.g. 'slow-2g', '2g', '3g', or '4g'. See https://developer.mozilla.org/en-US/docs/Web/API/NetworkInformation for more details.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this URL under network so all fields can be referenced from there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

@axw axw force-pushed the browser-metrics branch from 92a1938 to 2842f86 Compare March 8, 2021 02:11
@axw
Copy link
Member Author

axw commented Mar 8, 2021

I started adding a new top-level "browser" field to metadata, but it occurred to me that this is probably also going to be relevant to mobile devices, and in that case there may not be any browsers involved.

Yeah very good point. Is there any harm keeping them in the top level as connectioninfo or networkinfo?

Those are a little bit too generic for my tastes. In theory we might later want to enable backend services to record network information on behalf of their clients. If we did that it would make more sense to nest under client, like client.network.*

I just had a quick look and there has been a discussion about adding device fields to ECS (elastic/ecs#448), but the conclusion was to use host for now.

I'm inclined to record the data as network.* in a way that won't collide with any ECS changes later on. We could capitalise the field names to avoid colliding, per https://www.elastic.co/guide/en/ecs/current/ecs-custom-fields-in-ecs.html#_capitalization

Thus in the ES docs we would end up fields like

  • network.EffectiveType
  • network.RTT
  • network.Downlink
  • network.SaveData
  • ...

WDYT?

@vigneshshanmugam
Copy link
Member

I'm inclined to record the data as network.* in a way that won't collide with any ECS changes later on. We could capitalise the field names to avoid colliding, per https://www.elastic.co/guide/en/ecs/current/ecs-custom-fields-in-ecs.html#_capitalization

Sounds good to me. We need to come up with a shorthand fields for the v3 spec and should be good to go.

@vigneshshanmugam
Copy link
Member

@axw Can we arrive on a consensus here and add support for these new RUM related fields now that we have already done the OTEL side #5436

@axw
Copy link
Member Author

axw commented Jun 21, 2021

@vigneshshanmugam yes, it's just a matter of prioritisation.

For OTel we decided to stick with lower_snake_case names and so I think we should do the same here. We ended up with network.connection_type for OTel; would you be OK with using that name instead of effectivetype?

@vigneshshanmugam
Copy link
Member

We ended up with network.connection_type for OTel; would you be OK with using that name instead of effectivetype?

Yes totally we can use the same naming scheme from OTEL. But for RUM, we are going to be using shortened fields for the V3 spec so it should not be a huge concern from the agent perspective.

yes, it's just a matter of prioritization.

I don't think we have any plans of adding the fields for 7.14 @drewpost Correct me if am wrong, But probably if we can include them as part of 7.15, that should be a good to include them as part of Exploratory view dashboard.

@axw
Copy link
Member Author

axw commented Jun 22, 2021

Yes totally we can use the same naming scheme from OTEL. But for RUM, we are going to be using shortened fields for the V3 spec so it should not be a huge concern from the agent perspective.

That's fine, I was just talking about the field name in Elasticsearch. We can still come up with a shortened name for v3.

@vigneshshanmugam
Copy link
Member

That's fine, I was just talking about the field name in Elasticsearch. We can still come up with a shortened name for v3.

Sounds good. ++ to keeping it as network.connection_type as it resonates pretty well to the https://developer.mozilla.org/en-US/docs/Web/API/NetworkInformation/effectiveType

@mergify
Copy link
Contributor

mergify bot commented Sep 20, 2021

This pull request does not have a backport label. Could you fix it @axw? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Sep 20, 2021
@mergify
Copy link
Contributor

mergify bot commented Nov 10, 2021

This pull request is now in conflicts. Could you fix it @axw? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b browser-metrics upstream/browser-metrics
git merge upstream/master
git push upstream browser-metrics

@axw
Copy link
Member Author

axw commented Nov 11, 2021

This is very stale. It's unclear what the priority is, so let's come back to this when it is clearer. A design doc would be great.

@axw axw closed this Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants