CF Dashboards not showing data: Fixes #423
_Notes here indicate changes as compared to previous v2._

Component Metrics v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

Doppler Server v2
- removed environment variable - cloudfoundry#305
- removed Sinks panel.
- added v2 label.
- added links to other v2 dashboards.

Metron Agent v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

NEW v2 Dashboard Notes:

_Notes here indicate changes as compared to v1._

Apps: Latency v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- query adjusted - cloudfoundry#389, cloudfoundry#384, cloudfoundry#363, cloudfoundry#332

Apps: Requests v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

Apps: System v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- Additional changes to the "Instances" panel to view Desired and Running as "stat" instead of "graph" to give better visual understanding with Desired vs Running.

CF: BBS v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- Removed Malformed LRPs panel.

CF: Cell Summary v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- Additional adjustments regarding the "Total" panels to display as %, and display the used/available as Current instead of avg/min/max.
- Added "All" & multi-select option to the IP variable.

CF: Cells Capacity v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- Added new panel to show Disk used per cell, similar to the other two existing pie charts.
- Adjusted panel "Cell with Least Memory" to display as stat.

CF: Cloud Controller v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- added metrics related to Threadqueue and Result metrics - cloudfoundry#226

CF: Diego Health v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

CF: KPIs v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- removed Auctions per Second; Auctions; Fetch States Duration; desired LRP Sync Duration; Failed Staging Requests; Messages received/dropped by dopplers; Metron agent envelopes per second; metron agent messages sent per second; CC 5xx responses; CC job queue length (no data available for these categories)

CF: LRPs & Tasks v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- changing Desired LRP Sync Duration to use firehose_value_metric_bbs_convergence_lrp_duration instead of firehose_value_metric_nsync_bulker_desired_lrp_sync_duration

CF: Organization Memory Quotas v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

CF: Organization Summary v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

CF: Route Emitter v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

CF: Router v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

CF: Services v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

CF: Space Summary v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.
- removed metrics regarding cf_space_----. Changed these panels to display as Stats.

CF: Summary v2
- removed environment variable - cloudfoundry#305
- added v2 label.
- added links to other v2 dashboards.

Updated jobs/cloudfoundry_dashboards/spec
- added above v2 dashboards
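For anyone who wants to apply the CF: LRPs & Tasks v2 metric swap to their own dashboard copies, here is a rough sketch of one way to do it. It is not how this PR was produced. It assumes the Prometheus queries live under `"expr"` keys in the dashboard JSON, and the `swap_metric` helper and the trimmed dashboard fragment are hypothetical; only the two metric names come from the notes above.

```python
import json

# Metric names taken from the CF: LRPs & Tasks v2 notes.
OLD = "firehose_value_metric_nsync_bulker_desired_lrp_sync_duration"
NEW = "firehose_value_metric_bbs_convergence_lrp_duration"


def swap_metric(node, old=OLD, new=NEW):
    """Return a copy of `node` with `old` replaced by `new` in every "expr" string.

    Walks dicts and lists recursively, so it does not depend on where the
    panels sit in the dashboard layout. (Hypothetical helper.)
    """
    if isinstance(node, dict):
        return {
            key: (
                value.replace(old, new)
                if key == "expr" and isinstance(value, str)
                else swap_metric(value, old, new)
            )
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [swap_metric(item, old, new) for item in node]
    return node


# Hypothetical, heavily trimmed dashboard fragment (not copied from the PR).
dashboard = json.loads("""
{"title": "CF: LRPs & Tasks v2",
 "panels": [{"title": "Desired LRP Sync Duration",
             "targets": [{"expr": "avg(firehose_value_metric_nsync_bulker_desired_lrp_sync_duration)"}]}]}
""")

updated = swap_metric(dashboard)
print(json.dumps(updated, indent=2))
```

Only strings stored under `"expr"` are touched, so titles and descriptions that mention the old metric name stay as they are.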
Awesome work!! Would love to see this merged :)
I work with OP and this is amazing work. A lot of effort went into this and we would really like to see this merged.
Hi @thehandsomezebra, thanks for the effort. I will test the dashboards in some of our environments to see how they work. I saw that you removed the environment from the queries. I understand they caused some trouble, but I think that might remove an important filter for people who use one Prometheus deployment to monitor different environments. Without the environment filter/group by, the metrics would end up in the same dashboard even if they are not in the same environment, or do I understand that wrong?
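To make the concern concrete, here is a small in-memory sketch rather than real Prometheus output. The samples, their values, and the `ip` label are made up; only the `environment` label name comes from this thread. It mimics what a panel that sums a metric would show with and without the environment filter.

```python
from collections import defaultdict

# Made-up samples as (labels, value) pairs; two environments share one Prometheus.
SAMPLES = [
    ({"environment": "prod", "ip": "10.0.0.1"}, 4.0),
    ({"environment": "prod", "ip": "10.0.0.2"}, 6.0),
    ({"environment": "staging", "ip": "10.0.0.1"}, 1.0),
]


def total(samples, environment=None):
    """Sum all sample values, optionally keeping only one environment."""
    return sum(
        value
        for labels, value in samples
        if environment is None or labels.get("environment") == environment
    )


def total_by_environment(samples):
    """Sum sample values grouped by the environment label."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get("environment", "")] += value
    return dict(totals)


print(total(SAMPLES))                 # no filter: both environments are merged
print(total(SAMPLES, "prod"))         # filtered to a single environment
print(total_by_environment(SAMPLES))  # grouped, one total per environment
```

Without the filter, the staging sample is folded into the same total as the prod samples, which is the mixing described above.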
Without the On the flip-side, I have kept the original dashboards that do include the Environment tag, if that does work for the users, they can default to those dashboards. An additional note: We also have Stratos running in our environments, and according to this, we are changing the This resulted in all of our
I have the feeling that the Stratos documentation is wrong, because the metrics environment variable is only used to 'name' certain labels. For example, it gives firehose metrics an environment label with the name of your environment, and the same for cf exporter metrics. That means you can give the environment variable any value you want to help you identify the correct platform. That value will also show up in alert mails related to CF or BOSH, which helps you see immediately which environment the alert is about. Also, the issue you linked for your PR talks about removing bosh-deployment, not environment, and the linked issue can normally be fixed by adjusting the Prometheus config instead of removing bosh-deployments. When testing your new dashboards, I could see improved values for the queries you fixed, the new panels you added, and the style you adjusted, but besides that I saw exactly the same values as before. Fields that were empty before were still empty, so I am wondering whether some issue with the Prometheus configuration is causing the problems in the dashboards. From my point of view, it would be worth checking whether there is an issue with your Prometheus config and, if so, merging the improvements while still including the environment. You can also ping me in the Slack channel for that.
@benjaminguttmann-avtq Yesterday my team checked deeper into this, and we think there is certainly an issue with our env with regards to I'll work on getting that remedied & put the
@thehandsomezebra Thanks for the effort :)
…hanges to these, so I am removing them from the upcoming PR
…e. Also fixed a typo from my last commit
- added v2 label. - added links to other v2 dashboards. - query adjusted - cloudfoundry#389, cloudfoundry#384, cloudfoundry#363, cloudfoundry#332
- added v2 label. - added links to other v2 dashboards.
- added v2 label. - added links to other v2 dashboards. - Additional changes to the "Instances" panel to view Desired and Running as "stat" instead of "graph" to give better visual understanding with Desired vs Running.
- added v2 label. - added links to other v2 dashboards. - Removed Malformed LRPs panel.
- added v2 label. - added links to other v2 dashboards. - added metrics related to Threadqueue and Result metrics - cloudfoundry#226
- added v2 label. - added links to other v2 dashboards. - Additional adjustments regarding the "Total" panels to display as %, and display the used/available as Current instead of avg/min/max. - Added two new panels regarding CPU Usage.
- added v2 label. - added links to other v2 dashboards. - Added two panels to show Disk used per cell & Percent of CPU used per cell, similar to the other two existing pie charts. - Adjusted panel "Cell with Least Memory" to display as stat.
- added v2 label. - added links to other v2 dashboards.
- removed Sinks panel. - added v2 label. - added links to other v2 dashboards.
- added v2 label. - added links to other v2 dashboards. - removed Auctions per Second; Auctions; Fetch States Duration; desired LRP Sync Duration; Failed Staging Requests; Messages received/dropped by dopplers; Metron agent envelopes per second; metron agent messages sent per second; CC 5xx responses; CC job queue length (no data available for these categories)
- added v2 label. - added links to other v2 dashboards. - changing Desired LRP Sync Duration to use firehose_value_metric_bbs_convergence_lrp_duration instead of firehose_value_metric_nsync_bulker_desired_lrp_sync_duration
- added v2 label. - added links to other v2 dashboards.
- added v2 label. - added links to other v2 dashboards. - Pulled variables from `firehose_counter_event_gorouter_requests_route_emitter_total` instead of `firehose_counter_event_route_emitter_messages_emitted_total`
- added v2 label. - added links to other v2 dashboards. - removed metrics regarding cf_space_----. Changed these panels to display as Stats.
I edited my initial comment to accurately reflect my latest push. I think this should be all set for your review @benjaminguttmann-avtq One additional note, for anyone who may find this PR & comment stream. I mentioned the stratos documentation for instructions on how to connect Prometheus-boshrelease; and I wanted to follow with how I "solved" my issue with regards to the
…board -- removing that from this PR.
I realized that I didn't actually update anything in the CF Apps Requests dashboard by making the v2. So I removed that from the PR. (I missed it from the fixes I did in this commit. My mistake.)
Just two small adjustments, because it was sometimes misleading while testing: when using the drop-down to switch between dashboards, you can suddenly end up in the CF docs with a single click, without having moved the cursor :D
Per recommendation from @benjaminguttmann-avtq in cloudfoundry#423 - rearranged the buttons for v2 dropdown and the external link for the CF Metrics. Keeping it in the same arrangement as the other dashboards.
lgtm
To fix missing data for affected users without interrupting users who have no issues, this PR updates three dashboards and introduces seventeen new dashboards for Cloud Foundry.
I have also made a few smaller tweaks where a metric was unavailable or where there was a small request to add information or make it clearer.
All of my updates are noted below:
UPDATED v2 Dashboard Notes:
Notes here indicate changes as compared to previous v2.
Component Metrics v2
Doppler Server v2
Metron Agent v2
NEW v2 Dashboard Notes:
Notes here indicate changes as compared to v1. V1 dashboards remain unchanged.
Apps: Latency v2
Apps: System v2
CF: BBS v2
CF: Cell Summary v2
CF: Cells Capacity v2
CF: Cloud Controller v2
CF: KPIs v2
CF: LRPs & Tasks v2
CF: Route Emitter v2
- Pulled variables from `firehose_counter_event_gorouter_requests_route_emitter_total` instead of `firehose_counter_event_route_emitter_messages_emitted_total`
CF: Space Summary v2
Updated jobs/cloudfoundry_dashboards/spec