Performance profile Reaper 2.0+ (Cassandra storage) #896

Closed · michaelsembwever opened this issue Apr 15, 2020 · 6 comments

@michaelsembwever
Member

Analyse Reaper trunk to identify performance bottlenecks and weaknesses, focusing only on the Cassandra backend.

This should involve:

  • stress testing
  • snapshots of CPU, Memory, Garbage Collection, Threads, and Classes profiling graphs
  • Method, Object, and Thread call graphs
  • CQL call graphs
  • flame graphs
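
A capture recipe along these lines could produce such artefacts. This is a sketch only: the profiler used for this ticket is not named here, so async-profiler and the standard JDK tools stand in as assumptions.

    # Assumes async-profiler's profiler.sh and the JDK tools are on the PATH,
    # and that a single local Reaper instance is running.
    REAPER_PID=$(pgrep -f cassandra-reaper)

    # CPU flame graph over a 60s window (e.g. while a repair run is active);
    # the output format (.html/.svg) depends on the async-profiler version.
    ./profiler.sh -e cpu -d 60 -f reaper-cpu-flamegraph.html "$REAPER_PID"

    # Allocation flame graph over the same window
    ./profiler.sh -e alloc -d 60 -f reaper-alloc-flamegraph.html "$REAPER_PID"

    # GC and thread snapshots: 60 GC samples at 1s intervals, plus a thread dump
    jstat -gc "$REAPER_PID" 1000 60 > reaper-gc.txt
    jstack "$REAPER_PID" > reaper-threads.txt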

The outcome of this ticket should be observations that validate and warrant further performance optimisation in the codebase. Subsequent issues/PRs will be opened for each identified bottleneck/weakness. For example, #851.

Only performance optimisations that are simple patches will be applied to 2.0. Otherwise, work will be done in master, slated for 2.1.0.

Anyone is free to run performance tests and contribute additional findings to this issue.

@adejanovski adejanovski added the new Issues requiring triage label Apr 15, 2020
@michaelsembwever michaelsembwever added 1 - Ready and removed new Issues requiring triage labels Apr 15, 2020
@michaelsembwever michaelsembwever self-assigned this Apr 15, 2020
@michaelsembwever michaelsembwever changed the title from “Performance profile on Reaper 2.0+ (Cassandra storage)” to “Performance profile Reaper 2.0+ (Cassandra storage)” Apr 15, 2020
@michaelsembwever
Member Author

michaelsembwever commented May 18, 2020

The following profile runs were done on a 6-node single-DC ccm cluster holding ~20GB of inconsistent data, and one Reaper instance using ALL mode with C* storage. Reaper was built using SHA d6b96e2.
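
A rough way to recreate this environment with ccm and cassandra-stress (a sketch: the Cassandra version, data volume, and stress flags below are assumptions, not the exact values used for these runs):

    # 6-node single-DC cluster, started immediately
    ccm create reaper-perf -v 3.11.6 -n 6 -s

    # Write with one node down so replicas end up inconsistent,
    # approximating the ~20GB of inconsistent data used above.
    # Exact stress arguments vary by Cassandra/ccm version.
    ccm node6 stop
    ccm node1 stress -- write n=5000000 cl=ONE -rate threads=50
    ccm node6 start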

Observations

First, hotspots are shown for various operations. Only methods under io.cassandrareaper. are shown.

Hotspots during a normal boot sequence:
[screenshot: reaper-d6b96e2-normal-boot]

Hotspots during the initial boot where schema creation occurs:
[screenshot: reaper-d6b96e2-initialisation-boot]

The difference between the two:
[screenshot: reaper-d6b96e2-boot-diff]

Hotspots during creating a cluster:
[screenshot: reaper-d6b96e2-add-cluster]

Hotspots during creating a repair run:
[screenshot: reaper-d6b96e2-add-repair]

Hotspots during a repair of the 20GB using 122 segments:
[screenshot: reaper-d6b96e2-run-repair]

Hotspots during ~80 minutes of idle:
[screenshot: reaper-d6b96e2-80minutes-idle]
And looking at idle's total time in forward calls:
[screenshot]
And looking at the reverse calls:
[screenshot]

Focusing on Reaper's performance during active repair and while idle…

During active repair the noticeable hotspots are:

  • ClusterFacade.tokenRangeToEndpoint
    forward calls…
    [screenshot: reaper-d6b96e2-run-repair--tokenRangeToEndpoint-calls]
    reverse calls…
    [screenshot]

  • The scheduled tasks ReaperApplication.scheduleRepairManager, DiagEventSubscriptionService.updateEnabledEvents, DiagEventSubscriptionService.pingSseClients, ClusterResource.getEndpointState, and SegmentRunner.getNodeMetrics. Of these, SegmentRunner.getNodeMetrics uses the most time.
    [screenshot: reaper-d6b96e2-run-repair--scheduled-lambda-calls]

  • In the repair path: SegmentRunner.runRepair and RepairRunner.startNextSegment
    [screenshot]
    [screenshot]

Looking at the calls to the Cassandra storage backend, nothing troublesome stands out on the Reaper side:
[screenshot: reaper-d6b96e2-run-repair--cassandra-storage]

Looking at total times to judge Reaper's impact on the Cassandra nodes:
[screenshot]

Looking at CQL statements executed during the repair, parsing the slow query logger with
awk -F"bound values]" '{print $2}' query-logger.txt | sort | uniq -c | sort -rn

3774  SELECT * FROM cluster;
2001  SELECT id,cluster_name,repair_unit_id,cause,owner,state,creation_time,start_time,end_time,pause_time,intensity,last_event,segment_count,repair_parallelism,tables FROM repair_run WHERE id = ? LIMIT 1;
1823  SELECT id,repair_unit_id,segment_id,start_token,end_token,segment_state,coordinator_host,segment_start_time,segment_end_time,fail_count, token_ranges FROM repair_run WHERE id = ? and segment_id = ?;
1438  SELECT * FROM repair_run_by_cluster WHERE cluster_name = ?;
1401  SELECT count(segment_id) FROM repair_run WHERE id = ? AND segment_state = ? ALLOW FILTERING;
1025  SELECT id,repair_unit_id,segment_id,start_token,end_token,segment_state,coordinator_host,segment_start_time,segment_end_time,fail_count, token_ranges FROM repair_run WHERE id = ? AND segment_state = ? ALLOW FILTERING;
 826  SELECT * FROM diagnostic_event_subscription;
 789  SELECT * FROM cluster WHERE name = ?;
 408  SELECT * FROM leader;
 378  SELECT * FROM repair_schedule_v1;
 364  UPDATE leader USING TTL ? SET reaper_instance_id = ?, reaper_instance_host = ?, last_heartbeat = toTimestamp(now()) WHERE leader_id = ? IF reaper_instance_id = ?;
 353  INSERT INTO repair_run(id, cluster_name, repair_unit_id, cause, owner, state, creation_time, start_time, end_time, pause_time, intensity, last_event, segment_count, repair_parallelism,tables) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
 286  SELECT id,repair_unit_id,segment_id,start_token,end_token,segment_state,coordinator_host,segment_start_time,segment_end_time,fail_count, token_ranges FROM repair_run WHERE id = ?;
 240  BEGIN UNLOGGED BATCH INSERT INTO repair_run(id,segment_id,segment_state,coordinator_host,segment_start_time,fail_count) VALUES(?, ?, ?, ?, ?, ?); APPLY BATCH;
 178  DELETE FROM leader WHERE leader_id = ? IF reaper_instance_id = ?;
 177  SELECT count(*) FROM repair_run WHERE id = ?;
 176  INSERT INTO leader(leader_id, reaper_instance_id, reaper_instance_host, last_heartbeat)VALUES(?, ?, ?, toTimestamp(now())) IF NOT EXISTS USING TTL ?;
 122  SELECT reaper_instance_id FROM running_reapers;
 122  BEGIN UNLOGGED BATCH INSERT INTO repair_run(id,segment_id,segment_state,coordinator_host,segment_start_time,fail_count) VALUES(?, ?, ?, ?, ?, ?); INSERT INTO repair_run(id, segment_id, segment_end_time) VALUES(?, ?, ?); APPLY BATCH;
  68  INSERT INTO running_reapers(reaper_instance_id, reaper_instance_host, last_heartbeat) VALUES(?,?,toTimestamp(now()));
   2  SELECT repair_schedule_id FROM repair_schedule_by_cluster_and_keyspace WHERE cluster_name = ? and keyspace_name = ?;

Looking at CQL statements executed during the ~80 minutes of idle, parsing the slow query logger the same way. Note that diagnostic_event_subscription alone is fully scanned 1948 times, roughly once every 2.5 seconds:

1948  SELECT * FROM diagnostic_event_subscription;
 975  SELECT id,cluster_name,repair_unit_id,cause,owner,state,creation_time,start_time,end_time,pause_time,intensity,last_event,segment_count,repair_parallelism,tables FROM repair_run WHERE id = ? LIMIT 1;
 975  SELECT * FROM repair_run_by_cluster WHERE cluster_name = ?;
 975  SELECT * FROM cluster;
 162  INSERT INTO running_reapers(reaper_instance_id, reaper_instance_host, last_heartbeat) VALUES(?,?,toTimestamp(now()));
  81  SELECT * FROM repair_schedule_v1;
   2

Orientation

  • The Cassandra backend automatically enables "distributed reaper" mode even if only one instance of Reaper is ever used. Across the schedulers, metrics fetching, and leader election, there is a lot to save here when Reaper runs only one instance.
  • Diagnostics fetches events and pings SSE clients even when no subscriptions are registered.
  • CassandraStorage.getClusters is called frequently, taking a lot of total time.
  • CassandraStorage.getAllRepairSchedules is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsForCluster is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsWithState is called frequently.
  • ClusterFacade.tokenRangeToEndpoint is expensive.
  • There are lots of selects against the cluster, repair_run, and repair_run_by_cluster tables, via six different select statements, each executed more than ten times the number of segments (122) in the run; a per-table breakdown can be pulled with the sketch after this list.
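
A quick way to get the per-table read breakdown behind that last bullet (a sketch, assuming a GNU-style grep; INSERT/UPDATE statements have no FROM clause and are excluded):

    awk -F"bound values]" '{print $2}' query-logger.txt \
      | grep -oE 'FROM [a-z_]+' | sort | uniq -c | sort -rn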

michaelsembwever added a commit that referenced this issue May 19, 2020
…aper instances

Sidecar mode overrides this, as it always implies more than one Reaper instance.
ALL|EACH|LOCAL modes, when using the Cassandra storage backend, will avoid operations that are only for coordination between multiple Reaper instances.

Once multiple reaper instances are detected, these operations will be enabled until Reaper is restarted.

This work is based on performance findings found in #896
@michaelsembwever
Member Author

michaelsembwever commented May 19, 2020

  • The Cassandra backend automatically enables "distributed reaper" mode even if only one instance of Reaper is ever used. Across the schedulers, metrics fetching, and leader election, there is a lot to save here when Reaper runs only one instance.
  • Diagnostics fetches events and pings SSE clients even when no subscriptions are registered.
  • CassandraStorage.getClusters is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsForCluster is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsWithState is called frequently.

These are tackled by #926 (when Reaper is not distributed).

Fetching of diagnostics events can be further optimised when Reaper is in distributed mode, for example by adjusting the poll frequency (ramping it down when there are no subscriptions).

michaelsembwever added a commit that referenced this issue May 19, 2020
@adejanovski
Contributor

This is awesome work 💪
Do you think we could have a way to easily reproduce these tests and compare results?

michaelsembwever added a commit that referenced this issue May 19, 2020
… for updated events once per minute

This will reduce table scans on diagnostic_event_subscription when Reaper is distributed and diagnostics are not otherwise used or subscribed to.
Based off findings from #896
@michaelsembwever
Member Author

  • Diagnostics fetches events and pings SSE clients even when no subscriptions are registered.

    Fetching of diagnostics events can be further optimised when Reaper is in distributed mode, for example by adjusting the poll frequency (ramping it down when there are no subscriptions).

Addressed in #927

@michaelsembwever
Member Author

  • CassandraStorage.getClusters is called frequently, taking a lot of total time.

When Reaper is in distributed mode, this is addressed in #928.

michaelsembwever added a commit that referenced this issue May 20, 2020
michaelsembwever added a commit that referenced this issue May 20, 2020
adejanovski pushed a commit that referenced this issue May 25, 2020
@michaelsembwever
Member Author

Investigation here is done. PRs can be opened (or referenced back to this) as and when necessary.

michaelsembwever added a commit that referenced this issue May 26, 2020
michaelsembwever added a commit that referenced this issue May 27, 2020
michaelsembwever added a commit that referenced this issue Jun 11, 2020
adejanovski pushed a commit that referenced this issue Jun 18, 2020