MeshCentral-AgentMonitoring

Table of Contents

About
Installation
Prometheus configuration
Metrics and labels
Example
TODO

About

This plugin exports metrics about MeshAgents, and forwards metrics collected from them, through the MeshCentral monitoring module.

ATTENTION: This plugin currently depends on "Add collectors to monitoring" (#6777).

Metrics, especially the up metric, are cached together with the timestamp of the event. This records not only the state at the time of the scrape, but also any up/down cycles that happen between two scrapes.

See Prometheus Federation.
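
The caching behavior described above can be pictured with a small sketch. This is not the plugin's code (the plugin runs inside MeshCentral, which is a Node.js application). The class name `EventCache`, its methods, the `MESHID`/`NODEID` placeholders and the second timestamp are hypothetical and only illustrate how timestamped up/down events recorded between two scrapes could all be delivered on the next scrape.

```python
from dataclasses import dataclass, field


@dataclass
class EventCache:
    """Hypothetical cache of timestamped `up` samples (not the plugin's real API)."""

    samples: list = field(default_factory=list)

    def record(self, instance: str, up: int, timestamp_ms: int) -> None:
        # Store the state together with the time of the event, not the scrape time.
        self.samples.append((instance, up, timestamp_ms))

    def scrape(self) -> list:
        # Render every cached sample as one exposition line, then empty the cache.
        lines = [f'up{{instance="{i}"}} {v} {t}' for i, v, t in self.samples]
        self.samples.clear()
        return lines


cache = EventCache()
# An agent becomes stable and disconnects again between two scrapes
# (the second timestamp is made up for illustration).
cache.record("meshagent/MESHID/NODEID", 1, 1740061020539)
cache.record("meshagent/MESHID/NODEID", 0, 1740061025000)
lines = cache.scrape()
print("\n".join(lines))
```

Both state changes appear in the scrape output with their own timestamps, even though neither happened at scrape time.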

Installation

Prerequisite: First, make sure you have plugins and the monitoring module enabled for your MeshCentral installation. You might want to add plugin-specific settings to MeshCentral's config.json (this might change in future releases):

  "plugins": {
    "enabled": true,
    "pluginSettings": {
      "agent_monitoring": {
        "# prometheusJobName": "meshcentral@my-meshcentral-host",
        "# cacheDirPath": ".cache"
      }
    }
  },
  "prometheus": true,

Restart your MeshCentral server after making this change.

To install, simply add the plugin configuration URL when prompted: https://raw.githubusercontent.com/bitctrl/MeshCentral-AgentMonitoring/refs/heads/main/config.json

After installation, copy the configuration template and adapt it to your needs. Restart your MeshCentral server again after making this change.

Prometheus configuration

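To get the exported metrics into Prometheus, the scrape job must set honor_labels: true so that the job and instance labels exported by the plugin are kept instead of being overwritten by Prometheus. See Prometheus Configuration.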

---
scrape_configs:

  - job_name: meshcentral
    scrape_interval: 60s
    scrape_timeout: 60s
    honor_labels: true
    params:
      'clear-cache':
        - 'yes'
    static_configs:
      - targets:
          - meshcentral-1.example.org:9464
          - meshcentral-2.example.com:9464
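
To see what this job requests from each target, the following sketch builds the scrape URL the way Prometheus does for a static target: default scheme `http`, default `metrics_path` `/metrics`, and the `params` entries as a query string. The helper name `scrape_url` is hypothetical. How the plugin reacts to the `clear-cache` parameter is not described here, so the sketch makes no claim about it.

```python
from urllib.parse import urlencode, urlunsplit


def scrape_url(target, params, scheme="http", metrics_path="/metrics"):
    """Build the URL Prometheus would request for one static target.

    `scheme` and `metrics_path` default to the Prometheus defaults, which
    the configuration above does not override.
    """
    # Prometheus `params` map each name to a list of values.
    query = urlencode(params, doseq=True)
    return urlunsplit((scheme, target, metrics_path, query, ""))


url = scrape_url("meshcentral-1.example.org:9464", {"clear-cache": ["yes"]})
print(url)
```

Fetching that URL by hand (for example with a browser or curl) is a quick way to check the exported metrics before pointing Prometheus at the server.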

Metrics and labels

The following metrics are exported for each agent if it's connected or recently disconnected:

  • up 1 if the agent is connected and stable, 0 otherwise.
    • exported with the timestamp at which the agent became stable, on each scrape while it is connected and stable, and once with the timestamp of disconnection
  • # HELP meshcentral_agent_warmup_seconds Time until agent became stable
    • exported once after the agent became stable
  • # HELP meshcentral_agent_stable_seconds Time agent was up and stable
    • exported once after a stable agent disconnected

Each of the above metrics has the following labels:

  • instance: "meshagent" "/" meshid "/" nodeid
  • agent_name: the name of the agent as displayed in MeshCentral
  • mesh_name: the name of the group (aka mesh)
  • job: the configured job, defaults to "meshcentral" "@" hostname
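
As an illustration of the label scheme above, this sketch assembles the label set for one agent and renders a sample line in the Prometheus text exposition format. The function names `agent_labels`, `escape_label_value` and `format_sample` are hypothetical and are not taken from the plugin. The hostname lookup is an assumption about how the default job name could be derived.

```python
import socket


def agent_labels(meshid, nodeid, agent_name, mesh_name, job=None):
    """Assemble the per-agent labels listed above (illustrative only)."""
    return {
        "instance": f"meshagent/{meshid}/{nodeid}",
        "agent_name": agent_name,
        "mesh_name": mesh_name,
        # Assumed default: "meshcentral@" followed by the local hostname.
        "job": job or f"meshcentral@{socket.gethostname()}",
    }


def escape_label_value(value):
    # The text exposition format requires escaping backslash, quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value, timestamp_ms):
    """Render one sample line: name{labels} value timestamp-in-milliseconds."""
    body = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{name}{{{body}}} {value} {timestamp_ms}"


labels = agent_labels("UM...U3", "5U...xd", "TEAPOT", "A.C.M.E.", job="meshcentral@vision")
line = format_sample("up", labels, 0, 1740061012740)
print(line)
```

With the obfuscated identifiers from the Example section, this reproduces the shape of the first up sample shown there.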

The following metrics are exported for each group (aka mesh):

  • # HELP meshcentral_agent_state_changes_per_scrape MeshAgent state changes per scrape interval
  • # HELP meshcentral_agent_state_changes_per_second MeshAgent state changes per second

Each of the above metrics has the following labels:

  • to_state: either "up" or "down"
  • mesh_name: the name of the group (aka mesh)
  • job: the configured job, defaults to "meshcentral" "@" hostname
  • The instance label is not set explicitly. Prometheus will add the target from its configuration, e.g. "meshcentral-1.example.org:9464".

Example

This is a Grafana screenshot from an earlier implementation based on CGI and SSH. The Prometheus and Grafana setup is the same.

(Screenshot: sconnect-federate)

Some obfuscated example metrics follow:

# TYPE up untyped
up{instance="meshagent/UM...U3/5U...xd",agent_name="TEAPOT",mesh_name="A.C.M.E.",job="meshcentral@vision"} 0 1740061012740
up{instance="meshagent/UM...U3/5U...xd",agent_name="TEAPOT",mesh_name="A.C.M.E.",job="meshcentral@vision"} 1 1740061020539
up{instance="meshagent/UM...U3/5U...xd",agent_name="TEAPOT",mesh_name="A.C.M.E.",job="meshcentral@vision"} 0 1740061019354

# TYPE meshcentral_agent_warmup_seconds gauge
# HELP meshcentral_agent_warmup_seconds Time until agent became stable
meshcentral_agent_warmup_seconds{instance="meshagent/UM...U3/5U...xd",agent_name="TEAPOT",mesh_name="A.C.M.E.",job="meshcentral@vision"} 1.737 1740061020539

# TYPE meshcentral_agent_stable_seconds gauge
# HELP meshcentral_agent_stable_seconds Time agent was up and stable
meshcentral_agent_stable_seconds{instance="meshagent/UM...U3/5U...xd",agent_name="TEAPOT",mesh_name="A.C.M.E.",job="meshcentral@vision"} 0.001 1740061019354

# TYPE meshcentral_agent_state_changes_per_scrape gauge
# HELP meshcentral_agent_state_changes_per_scrape MeshAgent state changes per scrape interval
meshcentral_agent_state_changes_per_scrape{to_state="up",mesh_name="example-corp",job="meshcentral@vision"} 22 1740061018235
meshcentral_agent_state_changes_per_scrape{to_state="down",mesh_name="example-ltd",job="meshcentral@vision"} 175 1740061018235

# TYPE meshcentral_agent_state_changes_per_second gauge
# HELP meshcentral_agent_state_changes_per_second MeshAgent state changes per second
meshcentral_agent_state_changes_per_second{to_state="up",mesh_name="example-corp",job="meshcentral@vision"} 5.2961001444390945 1740061018235
meshcentral_agent_state_changes_per_second{to_state="down",mesh_name="example-ltd",job="meshcentral@vision"} 42.12806933076553 1740061018235

# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 3.236218

# HELP meshcentral_userssessions Users Sessions
# TYPE meshcentral_userssessions gauge
meshcentral_userssessions 2
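
The two state-change metrics in the sample above appear to be related by the time elapsed since the previous scrape: dividing each per-scrape count by its per-second value gives the same interval for both groups. This relationship is inferred from the sample numbers only, not confirmed from the plugin source. The short check below uses the values exactly as printed above.

```python
# Values copied from the sample output above.
per_scrape = {"up": 22, "down": 175}
per_second = {"up": 5.2961001444390945, "down": 42.12806933076553}

# If per_second == per_scrape / elapsed_seconds, both ratios must agree.
intervals = {state: per_scrape[state] / per_second[state] for state in per_scrape}

for state, seconds in intervals.items():
    print(f"{state}: {seconds:.3f} s since the previous scrape")
```

Both ratios come out at roughly 4.154 seconds, which suggests the sample was scraped by hand shortly after a previous scrape rather than on a 60 s interval.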

TODO

  • meshcore module to make MeshCentral something like a Pushgateway for the MeshAgent
  • code cleanup and documentation
