cli tool for computing sync health and exploring causes for message diffs (#2073)

## Motivation

We want a better metric for measuring sync health than the current
number of messages (i.e. "sync percent"). That metric doesn't account
for the fact that there are valid reasons for temporary discrepancies in
message counts (e.g. compaction, small network delays). Instead, we want
to take some chunk of time in the past, count the messages in that chunk
on each node, and compare the counts as a somewhat more evolved sync
metric.

## Change Summary

CLI command that computes sync health between one node and peers,
explores nonzero diffs, and prints out summarized, structured output.

```
❯ node build/cli.js sync-health --max-num-peers 50 --primary-node lamia.farcaster.xyz:2283 --start-time-ofday 11:30:00 --stop-time-ofday 11:35:00
❯ cat health.out  |  jq
{
  "startTime": "2024-06-21T15:15:00.000Z",
  "stopTime": "2024-06-21T15:17:00.000Z",
  "primary": "hoyt.farcaster.xyz:2283",
  "peer": "84.247.160.59:2283",
  "primaryMessageCount": 5311,
  "peerMessageCount": 4930,
  "diff": 381,
  "diffPercentage": 0.0717379024665788,
  "numSuccessToPeer": 378,
  "numErrorToPeer": 6,
  "successTypesToPeer": [
    null,
    3,
    2,
    6
  ],
  "errorMessagesToPeer": [
    "no storage",
    "invalid signer: signer 0x02c2f67b36cec88270462f95d7c553d13bc339be61c02db591c9144fcf2592dd not found for fid 681926"
  ],
  "numSuccessToPrimary": 0,
  "numErrorToPrimary": 0,
  "successTypesToPrimary": [],
  "errorMessagesToPrimary": []
}
```
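The `diff` and `diffPercentage` fields in the sample output are consistent with a plain difference of the two counts, divided by the primary node's count (381 / 5311 ≈ 0.0717). The sketch below reproduces that arithmetic. The `WindowCounts` and `DiffStats` types and the `computeDiffStats` function are hypothetical names for illustration, not taken from `syncHealth.ts`, and the sample cannot tell us whether the real tool reports a signed or an absolute difference.

```typescript
// Hypothetical types for illustration; not the classes added in this commit.
interface WindowCounts {
  primaryMessageCount: number;
  peerMessageCount: number;
}

interface DiffStats {
  diff: number;
  diffPercentage: number;
}

// Assumption: diff is primary minus peer, and the ratio uses the primary
// count as the denominator. This matches the sample output above, but the
// actual implementation may differ (for example, by taking an absolute value).
function computeDiffStats(counts: WindowCounts): DiffStats {
  const diff = counts.primaryMessageCount - counts.peerMessageCount;
  // Guard against division by zero when the primary saw no messages.
  const diffPercentage =
    counts.primaryMessageCount === 0 ? 0 : diff / counts.primaryMessageCount;
  return { diff, diffPercentage };
}

// Counts taken from the sample output above.
const sampleStats = computeDiffStats({
  primaryMessageCount: 5311,
  peerMessageCount: 4930,
});
console.log(sampleStats.diff); // 381
```

Note that despite its name, `diffPercentage` in the sample is a fraction between 0 and 1 (about 7.2% here), not a value already multiplied by 100.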

## Merge Checklist

_Choose all relevant options below by adding an `x` now or at any time
before submitting for review_

- [x] PR title adheres to the [conventional
commits](https://www.conventionalcommits.org/en/v1.0.0/) standard
- [x] PR has a
[changeset](https://github.com/farcasterxyz/hub-monorepo/blob/main/CONTRIBUTING.md#35-adding-changesets)
- [ ] PR has been tagged with a change label(s) (e.g. documentation,
feature, bugfix, or chore)
- [x] PR includes
[documentation](https://github.com/farcasterxyz/hub-monorepo/blob/main/CONTRIBUTING.md#32-writing-docs)
if necessary.
- [x] All [commits have been
signed](https://github.com/farcasterxyz/hub-monorepo/blob/main/CONTRIBUTING.md#22-signing-commits)


<!-- start pr-codex -->

---

## PR-Codex overview
This PR adds a `sync-health` command to the `hubble` CLI. The command
measures sync health and calculates message statistics between nodes.

### Detailed summary
- Added `sync-health` command to measure sync health
- Introduced classes for message stats and sync health stats
- Implemented functions to query message counts and compute stats
- Added functions to pick peers and compute sync IDs

> The following files were skipped due to too many changes:
`apps/hubble/src/utils/syncHealth.ts`


<!-- end pr-codex -->
aditiharini authored Jun 21, 2024
1 parent ff4ec34 commit 2d26d30
Showing 3 changed files with 575 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .changeset/honest-flies-jog.md
@@ -0,0 +1,5 @@
---
"@farcaster/hubble": minor
---

CLI tool for measuring sync health
23 changes: 23 additions & 0 deletions apps/hubble/src/cli.ts
@@ -29,6 +29,7 @@ import { profileGossipServer } from "./profile/gossipProfile.js";
import { getStatsdInitialization, initializeStatsd } from "./utils/statsd.js";
import os from "os";
import { startupCheck, StartupCheckStatus } from "./utils/startupCheck.js";
import { printSyncHealth } from "./utils/syncHealth.js";
import { mainnet, optimism } from "viem/chains";
import { finishAllProgressBars } from "./utils/progressBars.js";
import { MAINNET_BOOTSTRAP_PEERS } from "./bootstrapPeers.mainnet.js";
@@ -944,6 +945,28 @@ const readPeerId = async (filePath: string) => {
  return createFromProtobuf(proto);
};

/*//////////////////////////////////////////////////////////////
SYNC HEALTH COMMAND
//////////////////////////////////////////////////////////////*/

app
  .command("sync-health")
  .description("Measure sync health")
  .requiredOption("--start-time-ofday <time>", "How many seconds ago to start the sync health query")
  .requiredOption("--stop-time-ofday <time>", "How many seconds to count over")
  .option("--max-num-peers <count>", "Maximum number of peers to measure for", "20")
  .option("--primary-node <host:port>", "Node to measure all peers against (required)", "hoyt.farcaster.xyz:2283")
  .option("--outfile <filename>", "File to output measurements to", "health.out")
  .action(async (cliOptions) => {
    await printSyncHealth(
      cliOptions.startTimeOfday,
      cliOptions.stopTimeOfday,
      cliOptions.maxNumPeers,
      cliOptions.primaryNode,
      cliOptions.outfile,
    );
  });

app.parse(process.argv);

///////////////////////////////////////////////////////////////
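
The help strings for `--start-time-ofday` and `--stop-time-ofday` describe second counts, while the example invocation in the commit message passes `HH:MM:SS` values. The parsing itself lives in `apps/hubble/src/utils/syncHealth.ts`, which is not shown on this page. As a rough sketch of how a time-of-day string could be turned into a timestamp, assuming UTC and assuming the time refers to the same calendar day as a reference date (both are assumptions, as is the hypothetical `timeOfDayToDate` helper):

```typescript
// Hypothetical helper for illustration; not the parsing used in syncHealth.ts.
// Assumes the time of day is in UTC and falls on the reference date's UTC day.
function timeOfDayToDate(timeOfDay: string, reference: Date): Date {
  const match = /^(\d{2}):(\d{2}):(\d{2})$/.exec(timeOfDay);
  if (match === null) {
    throw new Error(`expected HH:MM:SS, got "${timeOfDay}"`);
  }
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  if (hours > 23 || minutes > 59 || seconds > 59) {
    throw new Error(`time of day out of range: "${timeOfDay}"`);
  }
  // Combine the reference date's UTC calendar day with the parsed time.
  return new Date(
    Date.UTC(
      reference.getUTCFullYear(),
      reference.getUTCMonth(),
      reference.getUTCDate(),
      hours,
      minutes,
      seconds,
    ),
  );
}

// Build a five-minute window like the one in the example invocation.
const referenceDay = new Date("2024-06-21T00:00:00.000Z");
const windowStart = timeOfDayToDate("11:30:00", referenceDay);
const windowStop = timeOfDayToDate("11:35:00", referenceDay);
console.log(windowStart.toISOString(), windowStop.toISOString());
```

Rejecting malformed input up front keeps a typo such as `11:3:00` from silently producing an unintended window.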