Compatibility with pgbouncer with pool_mode != session #348
Conversation
Add a `session` parameter to `connection()` with default value `True`. When set to `False`, the client limits its usage of prepared statements to unnamed statements inside a transaction only. As a side effect, the handling of unnamed prepared statements is improved irrespective of the `session` value: they are automatically re-prepared if something invalidates them.
Implements #339
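For illustration, a minimal usage sketch of the proposed parameter, assuming the API from this PR (the DSN and host are made up):

```python
import asyncio
import asyncpg

async def main():
    # session=False (proposed in this PR, not a released asyncpg option)
    # tells the client it is talking to pgbouncer in a non-session pool
    # mode: prepared statements are limited to unnamed ones in a transaction.
    conn = await asyncpg.connect(
        'postgresql://app@pgbouncer-host:6432/app',  # hypothetical DSN
        session=False)
    async with conn.transaction():
        row = await conn.fetchrow('SELECT $1::int AS answer', 42)
        print(row['answer'])
    await conn.close()

asyncio.run(main())
```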
Hi Aleksey. Thanks for working on this. The PR looks interesting, however I have concerns with the introduced redundancy. With this change, uncached anonymous statements would incur the penalty of preparing the query twice instead of once. This might be fine for simple queries, but if a query is sufficiently large and complicated, the cost of parsing it twice becomes significant. I feel like the best solution would be to make some changes on the pgbouncer side as well. IIRC, the crux of the issue is that pgbouncer is hardcoded to only support the libpq behaviour by pinning the Parse/Bind/Execute/Sync sequence to a connection. I think teaching it to support Parse/Describe/Sync/Bind/Execute/Sync the same way should remove the need to send the `Parse` twice.
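To make the two flows concrete, here is an illustration-only sketch of the message sequences being compared; the names are standard PostgreSQL wire-protocol messages, and the pinning behaviour is as described above:

```python
# libpq-style flow, which pgbouncer reportedly handles by pinning the
# whole sequence to a single server connection:
LIBPQ_FLOW = ['Parse', 'Bind', 'Execute', 'Sync']

# asyncpg-style flow, with an extra Describe/Sync round-trip used for
# result-type introspection before binding:
ASYNCPG_FLOW = ['Parse', 'Describe', 'Sync', 'Bind', 'Execute', 'Sync']
```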
Uncached statements prepare twice iff `session` is `False`.
This is against the nature of pgbouncer and against the reason why we use pgbouncer with exactly these settings. We don't want a connection to PostgreSQL to be held by anyone outside of a transaction. Holding a connection between Prepare and Execute leads to unpredictable behaviour under high load.
The choice of whether to use pgbouncer should be made explicitly by a human. It has an impact on the performance characteristics, architecture, and code of a service.
This is not true. pgbouncer has no hardcoded sequences. From the PostgreSQL docs: "At completion of each series of extended-query messages, the frontend should issue a Sync message. This parameterless message causes the backend to close the current transaction if it's not inside a BEGIN/COMMIT transaction block (“close” meaning to commit if no error, or roll back if error). Then a ReadyForQuery response is issued. The purpose of Sync is to provide a resynchronization point for error recovery. When an error is detected while processing any extended-query message, the backend issues ErrorResponse, then reads and discards messages until a Sync is reached, then issues ReadyForQuery and returns to normal message processing."
Yes, and this is a problem. There may be multiple reasons why the cache is disabled, too small, or cleared often, etc. Also, I don't think adding a new connection parameter is the right way to go at all. See below.
I was referring strictly to the handling of unnamed statements, where the time between Parse and Execute is predictably small. I agree that named prepared statements, portals and other session-level features are "against the nature of pgbouncer" in transaction mode. I think, however, that a more robust bouncer implementation would reject these requests outright as opposed to failing in an unpredictable fashion.
This is not about the choice to use pgbouncer. This is about detecting the capabilities of the server and adjusting accordingly. We do this already to support various PostgreSQL variants and servers speaking its protocol. pgbouncer is not a transparent proxy by any means, given how it breaks the session semantics, so it should be treated as a PostgreSQL variant that doesn't support prepared statements properly, in which case the client should adjust its behaviour automatically.
If that's, in fact, true, and pgbouncer always waits for a `Sync` before releasing the server connection, then this should be workable.
I am aware of that, thanks. Bottom line, here's what I think should be done to add support for pgbouncer in an acceptable fashion:
Per discussion above.
Can you explain this idea in more detail?
Flush will not help. This would pin one connection to PostgreSQL to one client. An application may be imperfect and fail to return control to the coroutine connected to the database for tens or even hundreds of milliseconds, causing the PostgreSQL connection to be held by pgbouncer for the same time. That is inappropriate for distributed systems.
It's not complicated. The only change is making the protocol send `Flush` instead of the first `Sync`, so the whole sequence reaches the server as a single batch.
Then you need to fix your application! There is no guarantee that your coroutine will be resumed promptly in any case.
I thought the primary purpose of this parameter was to limit memory used by the client. If memory is not an issue, then OK.
Different systems have different requirements. A small error in one of the application's request handlers shouldn't bring down the whole system (by exhausting all PostgreSQL connections); one worker may get stuck or die, but not the system as a whole. Web applications are live: they are modified a couple of times a week, or even a day, and they should be tolerant of small errors.
I don't really care about Python workers: there are many of them and they scale easily. I care about the database itself; the master is a single instance and does not scale.
asyncpg knows nothing about "distributed systems", and neither does any other database client; that is a matter of application architecture. The only thing a database client should do is not interfere. Right now asyncpg prevents the usage of pgbouncer, so scaling is not possible with the current asyncpg version. I don't really mean that asyncpg is "inappropriate"; that word was about holding connections outside of a transaction.
```python
# Diff context from the PR: a statement gets a server-side name only when
# it may be cached or was explicitly named, and only in session mode.
if (use_cache or named) and self._protocol.session:
    stmt_name = self._get_unique_id('stmt')
else:
    stmt_name = ''
```
It is about the memory usage, but mostly server-side, as prepared statements can consume a non-trivial amount of memory. The overall size of the client-side cache should obviously remain capped. I don't think we should expose that as a connection argument just yet; let's default to some larger value than the current default.
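For context, a hedged sketch of how the client-side cap is controlled via the existing `statement_cache_size` argument of `connect()`; the DSN is made up, and setting the size to 0 to disable caching behind a pooler is the commonly suggested workaround, not something this PR adds:

```python
import asyncpg

async def connect_without_cache():
    # statement_cache_size=0 disables client-side caching of prepared
    # statements entirely, trading repeated Parse work for compatibility
    # with poolers that do not preserve session state.
    return await asyncpg.connect(
        'postgresql://app@db-host:5432/app',  # hypothetical DSN
        statement_cache_size=0)
```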
Yes, and if we were to satisfy all the different requirements of different systems in asyncpg, we would have a mess that wouldn't work for anybody. The goal of asyncpg is to work well with vanilla PostgreSQL, as that is a known common baseline. We are willing to support other PostgreSQL-like systems, but only if that support does not compromise the integrity and maintainability of asyncpg.
Again, what does that have to do with asyncpg? Your misbehaving application would exhaust the pgbouncer pool, or any other limited resource pool equally well. Fix it! Arguably, a system that can go down due to a small error in a single component is badly designed. If you are so worried about connection stalls, then refactor the database access into a standalone, well reviewed and tested service, separate from the application logic.
I think you misunderstood, or deliberately ignored, what I wrote. Whatever you do, you cannot guarantee with absolute certainty that the protocol instance will always receive the entire database response in a single loop iteration, and your approach would be subject to exactly the same stalls.
... in transaction mode. And the approach I suggested would fix that without introducing much complexity or side effects for existing workflows.
You just made that claim again.
As it stands, I'm against merging your version.
It's OK if asyncpg is talking to pgbouncer: pgbouncer will cache the response and the PostgreSQL connection will be released. I don't care how many loop iterations are required in Python; I only want the PostgreSQL connection to be released as soon as possible. Think of pgbouncer in front of PostgreSQL the same way you think of nginx in front of a Python backend: it is used to handle slow clients.
I like to keep things simple. A lot of small services that are modified often are harder to maintain.
This PR has no side effects for existing workflows.
When connected to pgbouncer in `pool_mode=transaction`, the current asyncpg doesn't work at all.
Not true in every case. pgbouncer is a buffered proxy, and if the response is larger than its buffer (which is fairly small by default) then it will take multiple network operations to send the response.
Even so, it's a flawed implementation, and I'm against adding a new connection parameter for every "workflow" for reasons I outlined in my previous response. Please stop arguing your case by repeating the old arguments. I've heard them and articulated my opinion at length. Adding the necessary pgbouncer support is possible without these complications.
For simplicity, I would define `local_statement_cache_size = statement_cache_size if named_supported else 1000` so that the current behaviour remains the same. We can work on improving the cache logic as a followup.
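A minimal sketch of that sizing rule, assuming `named_supported` is a hypothetical capability flag detected at connection time (neither name is an existing asyncpg attribute):

```python
DEFAULT_UNNAMED_CACHE_SIZE = 1000  # value suggested above

def effective_cache_size(statement_cache_size: int,
                         named_supported: bool) -> int:
    # With named statements available, honour the user-configured cap, so
    # behaviour against vanilla PostgreSQL stays exactly the same; without
    # them, fall back to a larger cap for unnamed statements.
    if named_supported:
        return statement_cache_size
    return DEFAULT_UNNAMED_CACHE_SIZE
```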
Yes, you are right. I meant "buffer", not "cache". The buffer size is configurable.
They work. In a transaction. Exactly the same way as with vanilla PostgreSQL. I only limit them to being unnamed when connected to pgbouncer, to prevent a cleanup mess.
You cannot put named prepared statements into the cache even inside a transaction, unless you clean them up from the cache once the transaction is closed, and that is way too complicated.
Right. That's why, in the case of pgbouncer, I put unnamed prepared statements into the cache and use the following check:

```python
# Re-send Parse only if this unnamed statement is not the one most
# recently prepared, or if it has been explicitly invalidated.
if state.name == '' and (state != self.last_unnamed
                         or state.need_reprepare):
    query = state.query
    state.need_reprepare = False
    self.last_unnamed = state
else:
    query = None  # no Parse needed; the statement is still current
```
Well, that's no longer a prepared statement then, is it? If you need that, you can always subclass the PreparedStatement class and do a re-prepare there yourself.
It's still a prepared statement, but unnamed and with a lifetime lasting until the next unnamed statement or the end of a transaction. Subclassing is overcomplicated, and anyway this feature requires modifications in coreproto.pyx, protocol.pyx and connection.py. If you don't like the name `session`, I can call it differently.
Which makes it essentially useless in a prepared statement sense. The purpose of an explicit prepared statement is to control precisely when it gets prepared and deallocated. Barring these guarantees it can no longer pretend to be a true prepared statement.
In this case subclassing is the only way to go. Changing things around to make it easier is fine, as long as those changes are generic and have no negative impact.
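For illustration, a minimal sketch of that do-it-yourself idea as a wrapper; the class name and the re-prepare-on-error policy are my assumptions, not anything proposed in this PR (`InvalidSQLStatementNameError` is the exception asyncpg raises for SQLSTATE 26000):

```python
import asyncpg

class RepreparingStatement:
    """Hypothetical wrapper: re-prepares explicitly instead of silently."""

    def __init__(self, conn, query):
        self._conn = conn
        self._query = query
        self._stmt = None

    async def fetch(self, *args):
        if self._stmt is None:
            self._stmt = await self._conn.prepare(self._query)
        try:
            return await self._stmt.fetch(*args)
        except asyncpg.InvalidSQLStatementNameError:
            # The server has forgotten the statement: prepare again,
            # visibly at this layer rather than deep inside the protocol.
            self._stmt = await self._conn.prepare(self._query)
            return await self._stmt.fetch(*args)
```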
Calling it differently does not remove my objection. The magical re-preparation is still there.
True or untrue is subjective. According to the PostgreSQL docs, all of them are true.
It is not magical. It is logical and predictable. More than that, it already existed in the code; it was just moved to a more appropriate place.
Sorry, I don't understand your statement. If I do an explicit `prepare()`, I expect a real prepared statement. Example:

```python
ps = await conn.prepare('complicated query...')

for i in range(100):
    param = await conn.fetchval('simple query', i)
    result = await ps.fetch(param)
```

I expect the above to either work as expected, or fail loudly. What I don't expect is the above doing a silent re-prepare of the complicated query on every iteration of the loop.
"In the extended protocol, the frontend first sends a Parse message, which contains a textual query string, optionally some information about data types of parameter placeholders, and the name of a destination prepared-statement object (an empty string selects the unnamed prepared statement)." "Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE." PREPARE SQL command used in PostgreSQL is an additional feature, not compatible with SQL standard. It doesn't support unnamed statements because it is impossible on a high level of the SQL language, but this not imply what unnamed prepared statements - untrue.
This is how database clients in different languages work. If server-side prepared statements are disabled, a client falls back to client-side ones. This is common, widely used, and familiar behaviour, and users experienced with other database clients expect it. It allows writing code that works optimally both with and without pgbouncer.
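As an illustration of that fallback pattern, a hedged sketch; `server_side_prepare_ok` is a hypothetical capability flag, not an asyncpg attribute, while `prepare()` and `fetch()` are real asyncpg connection methods:

```python
async def run_query(conn, query, *args, server_side_prepare_ok=True):
    if server_side_prepare_ok:
        # Server-side path: an explicit prepared statement.
        stmt = await conn.prepare(query)
        return await stmt.fetch(*args)
    # Client-side fallback: send the query with its parameters each time.
    return await conn.fetch(query, *args)
```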
Where did you get that from? Are you able to show at least one example of a database driver that silently emulates an explicit call to make a server-side prepared statement? Does libpq's `PQprepare` do that?
libpq is a low-level library. PQprepare is tightly tied to the PostgreSQL client/server protocol.
Yes, pgjdbc does: by default it switches to a server-side prepared statement after 5 executions of each query: https://jdbc.postgresql.org/documentation/head/server-prepare.html
Wikipedia can (https://en.wikipedia.org/wiki/Prepared_statement).
This is exactly the point. asyncpg is like libpq. Please read the README.
After which it would attempt to create a server-side prepared statement and fail unpredictably. Everything else you listed is an interface implementing a generic DB API, which is explicitly not the goal of asyncpg.
It is not like libpq. Hidden introspection queries and automagical hidden prepared statements are the most notable differences. OK: your code, your rules. I can live without it.
I'm not against supporting pgbouncer cleanly, which can be done. I'm against your specific approach. If you are unwilling to compromise, so be it.
You want to support pgbouncer in a fashion that will scale badly, with the possibility of connections getting stuck. Cleanly or not is a secondary question; first of all, it should just work. For anyone who is aware of horizontal scaling and knows exactly why they want to use pgbouncer with `pool_mode=transaction`, the fork will live here: https://github.com/my-mail-ru/asyncpg
You provided zero data to back up this claim and ignored my counter-arguments, so this is just FUD.