Latest Synapse 1.70.0rc1 (2022-10-19) release seems to have regressed /messages response time #14250
Labels
- A-Messages-Endpoint: /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill)
- A-Performance: Performance, both client-facing and admin-facing
- O-Occasional: Affects or can be seen by some users regularly or most users rarely
- S-Minor: Blocks non-critical functionality, workarounds exist.
- T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
- X-Regression: Something broke which worked on a previous release
I think we rolled out the latest Synapse 1.70.0rc1 (2022-10-19) to matrix.org this morning. Looking at the graphs (where the spike on the right is), I noticed there seems to be a big uptick in `/messages` response times. We should check whether this persists tomorrow or is just a blip.
https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=now-4d&to=now
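
To double-check the uptick from the client side, here is a minimal timing sketch against the same endpoint; the homeserver URL, access token, and room ID below are placeholders rather than values from this issue, and it assumes an account that can read the room.

```python
import time

import requests

# Placeholders: substitute a real homeserver, access token, and room to benchmark.
HOMESERVER = "https://matrix-client.matrix.org"
ACCESS_TOKEN = "<access token>"
ROOM_ID = "!someroom:example.org"


def time_messages_request(limit: int = 500) -> float:
    """Time one /messages request paginating backwards, as in the benchmark below."""
    url = f"{HOMESERVER}/_matrix/client/v3/rooms/{ROOM_ID}/messages"
    start = time.monotonic()
    resp = requests.get(
        url,
        params={"dir": "b", "limit": limit},
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=300,
    )
    elapsed = time.monotonic() - start
    resp.raise_for_status()
    print(f"HTTP {resp.status_code} in {elapsed:.1f}s, {len(resp.json().get('chunk', []))} events")
    return elapsed


if __name__ == "__main__":
    # Take a few samples to see whether the slowdown persists or was just a blip.
    for _ in range(3):
        time_messages_request()
```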

It looks like we're seeing a major regression in the before backfill processing time (synapse/handlers/federation.py#L197-L503), and some in the after backfill processing time (synapse/handlers/federation_event.py#L643-L657).

What changed?
See https://github.com/matrix-org/synapse/blob/develop/CHANGES.md#synapse-1700rc1-2022-10-19
Is it #13816?
One obvious culprit is #13816, but I would expect everything that it touches to show up in the after backfill processing timing. And the goal of that change is to save time, since we bail early if we are trying to process an event we know will fail.
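
As a rough illustration of that bail-early idea (this is not the actual code from #13816; the names here are hypothetical), the expected effect is that known-bad events are skipped before any expensive per-event work:

```python
from typing import Iterable, List, Set


def events_worth_processing(
    pulled_event_ids: Iterable[str],
    previously_failed_event_ids: Set[str],
) -> List[str]:
    """Hypothetical sketch: drop events we have already tried and failed to
    process, so the expensive per-event work only runs for events that might
    succeed. If anything, this should shrink the after backfill processing time."""
    return [
        event_id
        for event_id in pulled_event_ids
        if event_id not in previously_failed_event_ids
    ]
```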
My `/messages?dir=b&limit=500` benchmark request to Matrix HQ still takes 24s to complete (expected). The one thing that stood out to me was a new `recursively fetching redactions` span under `_process_pulled_events` in the trace.
Is it #14164?

This one seems suspicious, given the new `recursively fetching redactions` span I saw under `_process_pulled_events` in the Jaeger trace, which is work that `_get_events_from_db` does and which changed in this PR. And we removed a cache there. This would account for the after backfill processing time. Just smoke though; I don't know how it actually performs. We do end up running `_process_pulled_event` less, since we only do it for new events.
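
To make the cache concern concrete, here is an illustrative sketch rather than Synapse's `_get_events_from_db`; the `get_redaction_ids` lookup and the cache shape are hypothetical. The point is that without a shared cache, overlapping redaction chains get re-fetched from the database for every pulled event:

```python
from typing import Callable, Dict, List, Optional, Set


def fetch_redactions_recursively(
    event_id: str,
    get_redaction_ids: Callable[[str], List[str]],  # hypothetical DB lookup
    cache: Optional[Dict[str, List[str]]] = None,
    seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Collect redactions that apply to `event_id`, then redactions of those
    redactions, and so on.

    With `cache` supplied, each lookup hits the database at most once across
    all pulled events; without it, the same lookups repeat for every event
    sharing part of a chain, which is roughly the worry about the removed cache.
    """
    seen = seen if seen is not None else set()
    if event_id in seen:
        return set()
    seen.add(event_id)

    if cache is not None and event_id in cache:
        redaction_ids = cache[event_id]
    else:
        redaction_ids = get_redaction_ids(event_id)  # one DB round trip
        if cache is not None:
            cache[event_id] = redaction_ids

    collected = set(redaction_ids)
    for redaction_id in redaction_ids:
        collected |= fetch_redactions_recursively(
            redaction_id, get_redaction_ids, cache, seen
        )
    return collected
```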
What else?

What about the before backfill processing time? Needs investigation.
I wish I had a trace that represented one of these long requests; it should be something longer than 180s. But we only recorded two traces over 120s in the past 2 days: https://jaeger.proxy.matrix.org/search?end=1666298780924000&limit=20&lookback=2d&maxDuration&minDuration=120s&operation=RoomMessageListRestServlet&service=matrix.org%20client_reader&start=1666125980924000
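
For pulling those long traces programmatically, something like this should work against the query-service HTTP API that backs the Jaeger UI (the `/api/traces` path and parameters mirror the search URL above; the host is assumed from that URL and any authentication in front of it is not shown):

```python
import requests

JAEGER = "https://jaeger.proxy.matrix.org"  # host taken from the search URL above


def find_slow_messages_traces(min_duration: str = "120s", lookback: str = "2d") -> list:
    """List long-running RoomMessageListRestServlet traces on the client_reader."""
    resp = requests.get(
        f"{JAEGER}/api/traces",
        params={
            "service": "matrix.org client_reader",
            "operation": "RoomMessageListRestServlet",
            "minDuration": min_duration,
            "lookback": lookback,
            "limit": 20,
        },
        timeout=30,
    )
    resp.raise_for_status()
    traces = resp.json().get("data", [])
    for trace in traces:
        longest = max(trace["spans"], key=lambda span: span["duration"])
        # Jaeger reports span durations in microseconds.
        print(trace["traceID"], f"{longest['duration'] / 1_000_000:.0f}s")
    return traces
```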
And they seem more related to `outgoing-federation-request` for `/backfill`, which we don't control (`RequestSendFailed`), and that wouldn't account for the before/after backfill processing times.