Performance profile Reaper 2.0+ (Cassandra storage) #896

Closed · michaelsembwever opened this issue Apr 15, 2020 · 6 comments

@michaelsembwever
Member

Analyse Reaper trunk to identify performance bottlenecks and weaknesses, focusing only on the Cassandra backend.

This should involve:

  • stress testing
  • snapshots of CPU, Memory, Garbage Collection, Threads, and Classes profiling graphs
  • Method, Object, and Thread call graphs
  • CQL call graphs
  • flame graphs
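
A capture recipe along these lines could produce such artefacts. This is a sketch only: the profiler used for this ticket is not named here, so async-profiler and the standard JDK tools stand in as assumptions.

    # Assumes async-profiler's profiler.sh and the JDK tools are on the PATH,
    # and that a single local Reaper instance is running.
    REAPER_PID=$(pgrep -f cassandra-reaper)

    # CPU flame graph over a 60s window (e.g. while a repair run is active);
    # the output format (.html/.svg) depends on the async-profiler version.
    ./profiler.sh -e cpu -d 60 -f reaper-cpu-flamegraph.html "$REAPER_PID"

    # Allocation flame graph over the same window
    ./profiler.sh -e alloc -d 60 -f reaper-alloc-flamegraph.html "$REAPER_PID"

    # GC and thread snapshots: 60 GC samples at 1s intervals, plus a thread dump
    jstat -gc "$REAPER_PID" 1000 60 > reaper-gc.txt
    jstack "$REAPER_PID" > reaper-threads.txt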

The outcome of this ticket should be observations that validate and warrant further performance optimisation in the codebase. Subsequent issues/PRs will be opened for each identified bottleneck/weakness. For example, #851.

Only performance optimisations that are simple patches will be applied to 2.0. Otherwise, work will be done in master, slated for 2.1.0.

Anyone is free to run performance tests and contribute additional findings to this issue.

@adejanovski adejanovski added the new Issues requiring triage label Apr 15, 2020
@michaelsembwever michaelsembwever added 1 - Ready and removed new Issues requiring triage labels Apr 15, 2020
@michaelsembwever michaelsembwever self-assigned this Apr 15, 2020
@michaelsembwever michaelsembwever changed the title from “Performance profile on Reaper 2.0+ (Cassandra storage)” to “Performance profile Reaper 2.0+ (Cassandra storage)” Apr 15, 2020
@michaelsembwever
Member Author

michaelsembwever commented May 18, 2020

The following profile runs were done on a 6-node single-DC ccm cluster holding ~20GB of inconsistent data, and one Reaper instance using ALL mode with C* storage. Reaper was built using SHA d6b96e2.
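
A rough way to recreate this environment with ccm and cassandra-stress (a sketch: the Cassandra version, data volume, and stress flags below are assumptions, not the exact values used for these runs):

    # 6-node single-DC cluster, started immediately
    ccm create reaper-perf -v 3.11.6 -n 6 -s

    # Write with one node down so replicas end up inconsistent,
    # approximating the ~20GB of inconsistent data used above.
    # Exact stress arguments vary by Cassandra/ccm version.
    ccm node6 stop
    ccm node1 stress -- write n=5000000 cl=ONE -rate threads=50
    ccm node6 start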

Observations

First, hotspots are shown for various operations. Only methods under io.cassandrareaper. are shown.

Hotspots during a normal boot sequence:
[screenshot: reaper-d6b96e2-normal-boot]

Hotspots during the initial boot where schema creation occurs:
[screenshot: reaper-d6b96e2-initialisation-boot]

The difference between the two:
[screenshot: reaper-d6b96e2-boot-diff]

Hotspots during creating a cluster:
[screenshot: reaper-d6b96e2-add-cluster]

Hotspots during creating a repair run:
[screenshot: reaper-d6b96e2-add-repair]

Hotspots during a repair of the 20GB using 122 segments:
[screenshot: reaper-d6b96e2-run-repair]

Hotspots during ~80 minutes of idle:
[screenshot: reaper-d6b96e2-80minutes-idle]
And looking at idle's total time in forward calls:
[screenshot]
And looking at the reverse calls:
[screenshot]

Focusing on Reaper's performance during active repair and while idle…

During active repair the noticeable hotspots are:

  • ClusterFacade.tokenRangeToEndpoint
    forward calls…
    [screenshot: reaper-d6b96e2-run-repair--tokenRangeToEndpoint-calls]
    reverse calls…
    [screenshot]

  • The scheduled tasks ReaperApplication.scheduleRepairManager, DiagEventSubscriptionService.updateEnabledEvents, DiagEventSubscriptionService.pingSseClients, ClusterResource.getEndpointState, and SegmentRunner.getNodeMetrics. Of these, SegmentRunner.getNodeMetrics uses the most time.
    [screenshot: reaper-d6b96e2-run-repair--scheduled-lambda-calls]

  • In the repair path: SegmentRunner.runRepair and RepairRunner.startNextSegment
    [screenshot]
    [screenshot]

Looking at the calls to the Cassandra storage backend, nothing troublesome stands out on the Reaper side:
[screenshot: reaper-d6b96e2-run-repair--cassandra-storage]

Looking at total times to judge Reaper's impact on the Cassandra nodes:
[screenshot]

Looking at CQL statements executed during the repair, parsing the slow query logger with
awk -F"bound values]" '{print $2}' query-logger.txt | sort | uniq -c | sort -rn

3774  SELECT * FROM cluster;
2001  SELECT id,cluster_name,repair_unit_id,cause,owner,state,creation_time,start_time,end_time,pause_time,intensity,last_event,segment_count,repair_parallelism,tables FROM repair_run WHERE id = ? LIMIT 1;
1823  SELECT id,repair_unit_id,segment_id,start_token,end_token,segment_state,coordinator_host,segment_start_time,segment_end_time,fail_count, token_ranges FROM repair_run WHERE id = ? and segment_id = ?;
1438  SELECT * FROM repair_run_by_cluster WHERE cluster_name = ?;
1401  SELECT count(segment_id) FROM repair_run WHERE id = ? AND segment_state = ? ALLOW FILTERING;
1025  SELECT id,repair_unit_id,segment_id,start_token,end_token,segment_state,coordinator_host,segment_start_time,segment_end_time,fail_count, token_ranges FROM repair_run WHERE id = ? AND segment_state = ? ALLOW FILTERING;
 826  SELECT * FROM diagnostic_event_subscription;
 789  SELECT * FROM cluster WHERE name = ?;
 408  SELECT * FROM leader;
 378  SELECT * FROM repair_schedule_v1;
 364  UPDATE leader USING TTL ? SET reaper_instance_id = ?, reaper_instance_host = ?, last_heartbeat = toTimestamp(now()) WHERE leader_id = ? IF reaper_instance_id = ?;
 353  INSERT INTO repair_run(id, cluster_name, repair_unit_id, cause, owner, state, creation_time, start_time, end_time, pause_time, intensity, last_event, segment_count, repair_parallelism,tables) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
 286  SELECT id,repair_unit_id,segment_id,start_token,end_token,segment_state,coordinator_host,segment_start_time,segment_end_time,fail_count, token_ranges FROM repair_run WHERE id = ?;
 240  BEGIN UNLOGGED BATCH INSERT INTO repair_run(id,segment_id,segment_state,coordinator_host,segment_start_time,fail_count) VALUES(?, ?, ?, ?, ?, ?); APPLY BATCH;
 178  DELETE FROM leader WHERE leader_id = ? IF reaper_instance_id = ?;
 177  SELECT count(*) FROM repair_run WHERE id = ?;
 176  INSERT INTO leader(leader_id, reaper_instance_id, reaper_instance_host, last_heartbeat)VALUES(?, ?, ?, toTimestamp(now())) IF NOT EXISTS USING TTL ?;
 122  SELECT reaper_instance_id FROM running_reapers;
 122  BEGIN UNLOGGED BATCH INSERT INTO repair_run(id,segment_id,segment_state,coordinator_host,segment_start_time,fail_count) VALUES(?, ?, ?, ?, ?, ?); INSERT INTO repair_run(id, segment_id, segment_end_time) VALUES(?, ?, ?); APPLY BATCH;
  68  INSERT INTO running_reapers(reaper_instance_id, reaper_instance_host, last_heartbeat) VALUES(?,?,toTimestamp(now()));
   2  SELECT repair_schedule_id FROM repair_schedule_by_cluster_and_keyspace WHERE cluster_name = ? and keyspace_name = ?;

Looking at CQL statements executed during the ~80 minutes of idle, parsing the slow query logger the same way. Note that diagnostic_event_subscription alone is fully scanned 1948 times, roughly once every 2.5 seconds:

1948  SELECT * FROM diagnostic_event_subscription;
 975  SELECT id,cluster_name,repair_unit_id,cause,owner,state,creation_time,start_time,end_time,pause_time,intensity,last_event,segment_count,repair_parallelism,tables FROM repair_run WHERE id = ? LIMIT 1;
 975  SELECT * FROM repair_run_by_cluster WHERE cluster_name = ?;
 975  SELECT * FROM cluster;
 162  INSERT INTO running_reapers(reaper_instance_id, reaper_instance_host, last_heartbeat) VALUES(?,?,toTimestamp(now()));
  81  SELECT * FROM repair_schedule_v1;
   2

Orientation

  • The Cassandra backend automatically enables "distributed reaper" mode even if only one instance of Reaper is ever used. Across the schedulers, metrics fetching, and leader election, there is a lot to save here when Reaper runs only one instance.
  • Diagnostics fetches events and pings SSE clients even when no subscriptions are registered.
  • CassandraStorage.getClusters is called frequently, taking a lot of total time.
  • CassandraStorage.getAllRepairSchedules is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsForCluster is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsWithState is called frequently.
  • ClusterFacade.tokenRangeToEndpoint is expensive.
  • There are lots of selects against the cluster, repair_run, and repair_run_by_cluster tables, via six different select statements, each executed more than ten times the number of segments (122) in the run; a per-table breakdown can be pulled with the sketch after this list.
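
A quick way to get the per-table read breakdown behind that last bullet (a sketch, assuming a GNU-style grep; INSERT/UPDATE statements have no FROM clause and are excluded):

    awk -F"bound values]" '{print $2}' query-logger.txt \
      | grep -oE 'FROM [a-z_]+' | sort | uniq -c | sort -rn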

michaelsembwever added a commit that referenced this issue May 19, 2020
…aper instances

Sidecar mode overrides this, as it always implies more than one Reaper instance.
ALL|EACH|LOCAL modes, when using the Cassandra storage backend, will avoid operations that are only for coordination between multiple Reaper instances.

Once multiple reaper instances are detected, these operations will be enabled until Reaper is restarted.

This work is based on performance findings found in #896
@michaelsembwever
Member Author

michaelsembwever commented May 19, 2020

  • The Cassandra backend automatically enables "distributed reaper" mode even if only one instance of Reaper is ever used. Across the schedulers, metrics fetching, and leader election, there is a lot to save here when Reaper runs only one instance.
  • Diagnostics fetches events and pings SSE clients even when no subscriptions are registered.
  • CassandraStorage.getClusters is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsForCluster is called frequently, taking a lot of total time.
  • CassandraStorage.getRepairRunsWithState is called frequently.

These are tackled by #926 (when Reaper is not distributed).

Fetching of diagnostics events can be further optimised when Reaper is in distributed mode, for example by adjusting the poll frequency (ramping it down when there are no subscriptions).

michaelsembwever added a commit that referenced this issue May 19, 2020
@adejanovski
Contributor

This is awesome work 💪
Do you think we could have a way to easily reproduce these tests and compare results?

michaelsembwever added a commit that referenced this issue May 19, 2020
… for updated events once per minute

This will reduce table scans on diagnostic_event_subscription when Reaper is distributed and diagnostics are not otherwise used or subscribed to.
Based off findings from #896
@michaelsembwever
Member Author

  • Diagnostics fetches events and pings SSE clients even when no subscriptions are registered.

    Fetching of diagnostics events can be further optimised when Reaper is in distributed mode, for example by adjusting the poll frequency (ramping it down when there are no subscriptions).

Addressed in #927

@michaelsembwever
Member Author

  • CassandraStorage.getClusters is called frequently, taking a lot of total time.

When Reaper is in distributed mode, this is addressed in #928.

michaelsembwever added a commit that referenced this issue May 20, 2020
michaelsembwever added a commit that referenced this issue May 20, 2020
adejanovski pushed a commit that referenced this issue May 25, 2020
@michaelsembwever
Member Author

Investigation here is done. PRs can be opened (or referenced back to this) as and when necessary.

michaelsembwever added a commit that referenced this issue May 26, 2020
michaelsembwever added a commit that referenced this issue May 27, 2020
michaelsembwever added a commit that referenced this issue Jun 11, 2020
adejanovski pushed a commit that referenced this issue Jun 18, 2020