Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecuteUpdate on Oracle and sub-optimal query #30742

Closed
anoordover opened this issue Apr 21, 2023 · 11 comments
Closed

ExecuteUpdate on Oracle and sub-optimal query #30742

anoordover opened this issue Apr 21, 2023 · 11 comments

Comments

@anoordover
Copy link

In oracle you have two ways to update multiple rows from one table with data from another table:

  • use an ordinary update statement (simpler and more frequently used SQL statement)
  • use a so called merge statement (developers need to get used to these kind of statements)

Both these statement yield to a different execution plan where the second one is better when updating and querying a lot of rows.
Before EF Core version 7 you could only do these kind of update using raw SQL.
From EF Core version 7 you can also use ExecuteUpdate to do these kinds of updates.

My opinion is that when we choose raw SQL because of this issue we need to measure if this issue is still viable and change our code to use ExecuteUpdate when this situation isn't viable anymore (Oracle DB optimizer, EF Core or the Oracle EF Core implementation might change).
Or we might accept the performance impact and use ExecuteUpdate so me en my colleagues only need to have knowledge about EF Core.

What would you do:

  • Go for the optimal raw SQL statement and check the viability of the reason behind this choice
  • Go for the optimal raw SQL and don't do any measurements after the implementation
  • Go for the sub-optimal solution using ExecuteUpdate and trust that implementors will eventually fix these performance problems and go for a single technology
@roji
Copy link
Member

roji commented Apr 21, 2023

@anoordover I'm familiar with MERGE from SQL Server and PostgreSQL, and as far as I know it isn't typically used for the same things that UPDATE is used; while UPDATE can be used to bulk-update many rows in a single table, MERGE is typically used to merge data from one table (or from the client) into another table.

Can you please provide more context on exactly what you're doing, showing the (apparently inferior) UPDATE and (apparently superior) MERGE, as well as the plans for these?

Go for the sub-optimal solution using ExecuteUpdate and trust that implementors will eventually fix these performance problems and go for a single technology

I'd say there's very little chance that ExecuteUpdate will be changed to use MERGE under the covers (although we're not the ones maintaining the Oracle driver). ExecuteUpdate really models the SQL UPDATE statement, MERGE is a different statement with its own syntax etc.

@anoordover
Copy link
Author

anoordover commented Apr 22, 2023

I will continue my investigation on Monday. But for now:
For now I think this might be true (I'm open to corrections...)
It might be that the execution plan of the MERGE statement is different from an ordinary UPDATE statement because the UPDATE statement updates all the (filtered) rows of the target table even if no matching row can be found in the source table (the field(s) will be updated to null?).
The MERGE statement only updates rows in the target table when there is a matching row in the source table.
I think that in the MERGE statement Oracle does a "hash join" on the two table to update the target table.

@anoordover
Copy link
Author

anoordover commented Apr 22, 2023

When my previous comment is correct I think:

  • the executeupdate statement will never be executed as a MERGE statement
  • the Oracle execution plan of the UPDATE statement will not/never be the same as the execution plan of the MERGE statement
  • for optimized queries rawsql might still be needed
  • to prevent rawsql in my application something like a mergeupdate statement should be developed in EF Core (when feasible) ;)

When writing merge statements is difficult for a team they could use Open AI to rewrite an UPDATE statement to a MERGE statement ;)

@roji
Copy link
Member

roji commented Apr 22, 2023

@anoordover I highly recommend bringing a concrete sample here with an actual UPDATE and MERGE which are equivalent, but where the MERGE is apparently more efficient - this is a bit too abstract to reason about without seeing actual SQL statements.

If there's indeed a category of cases where a SQL UPDATE can be transformed to a MERGE and that MERGE works more efficiently, then it may indeed make sense for ExecuteUpdate to to do that. If this is true for other databases which support MERGE (SQL Server, PostgreSQL), then this is something we could consider doing in EF Core itself. Otherwise, the Oracle team could add this optimization to their own provider.

But again, we need to see actual statements and plans.

@anoordover
Copy link
Author

anoordover commented Apr 22, 2023

Important indexes and table:
INDIVIDUALCLAIM (100.000 rows):
  - unique CLAIMID (reference to a claim), REFERENCE (referencenumber for the individual claim)
INDIVIDUALPAYMENT (150.000 rows):
  - unique PAYMENT_ID (reference to a payment), INDIVIDUALCLAIM_REFERENCE (functional reference to the individual claim that is paid with this individual payment)
Both these tables have a master table (CLAIM for INDIVIDUALCLAIM and PAYMENT for INDIVIDUALPAYMENT)

UPDATE "XYZ"."INDIVIDUALPAYMENT" "d"
SET "d"."INDIVIDUALCLAIMID" = (
    SELECT "v"."ID"
    FROM "XYZ"."INDIVIDUALCLAIM" "v"
    WHERE (((("v"."CLAIMID" = 4) AND ("d"."INDIVIDUALCLAIM_REFERENCE" = "v"."REFERENCE"))))
        FETCH FIRST 1 ROWS ONLY)
WHERE "d"."PAYMENT_ID" = 3


Plan hash value: 2362689481

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                                          |   154K|  4379K|   619K (25)| 00:00:25 |
|   1 |  UPDATE                        | INDIVIDUALPAYMENT                        |       |       |            |          |
|*  2 |   TABLE ACCESS FULL            | INDIVIDUALPAYMENT                        |   154K|  4379K|   583   (1)| 00:00:01 |
|*  3 |   VIEW                         |                                          |     1 |    26 |     3   (0)| 00:00:01 |
|*  4 |    WINDOW BUFFER PUSHED RANK   |                                          |     1 |    21 |     3   (0)| 00:00:01 |
|   5 |     TABLE ACCESS BY INDEX ROWID| INDIVIDUALCLAIM                          |     1 |    21 |     3   (0)| 00:00:01 |
|*  6 |      INDEX UNIQUE SCAN         | IX_INDIVIDUALCLAIM_CLAIMID_REFERENCE     |     1 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

"   2 - filter(""d"".""PAYMENT_ID""=3)"
"   3 - filter(""from$_subquery$_003"".""rowlimit_$$_rownumber""<=1)"
   4 - filter(ROW_NUMBER() OVER ( ORDER BY  NULL )<=1)
"   6 - access(""v"".""CLAIMID""=4 AND ""v"".""REFERENCE""=:B1)"


MERGE INTO "XYZ"."INDIVIDUALPAYMENT" "d"
USING (
    SELECT "v"."ID", "v"."REFERENCE"
    FROM "XYZ"."INDIVIDUALCLAIM" "v"
    WHERE "v"."CLAIMID" = 4
) "v"
ON ("d"."INDIVIDUALCLAIM_REFERENCE" = "v"."REFERENCE")
WHEN MATCHED THEN UPDATE SET "d"."INDIVIDUALCLAIMID" = "v"."ID"
                  WHERE "d"."PAYMENT_ID" = 3;


Plan hash value: 2818844525

---------------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |                    |     1 |   127 |       |  2194   (1)| 00:00:01 |
|   1 |  MERGE               | INDIVIDUALPAYMENT  |       |       |       |            |          |
|   2 |   VIEW               |                    |       |       |       |            |          |
|*  3 |    HASH JOIN         |                    |     1 |   122 |  3408K|  2194   (1)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| INDIVIDUALCLAIM    |   105K|  2166K|       |   617   (1)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| INDIVIDUALPAYMENT  |   154K|    14M|       |   583   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

"   3 - access(""d"".""INDIVIDUALCLAIM_REFERENCE""=""v"".""REFERENCE"")"
"   4 - filter(""v"".""CLAIMID""=4)"

Please let me know if you need more information

In the plan for the MERGE-statement you see the HASH JOIN that I mentioned before.

Thinking about these differences triggered me to think about what I would expect when executing an executeupdate statement that uses a source table for updating a field in the target table. My question is: Is it the expected behavior that also the rows in the target table get updated when there is no row being found in the source table or should you be forced to define a where clause on the target table so that only rows will be selected having a matching row in the source table?

@anoordover
Copy link
Author

Adding a where clause on the target table in the UPDATE statement gives me the plan added below.
So maybe it isn't an issue...

Plan hash value: 2995876849
 
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                                     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                                          |     1 |    45 |       |  1179   (1)| 00:00:01 |
|   1 |  UPDATE                        | INDIVIDUALPAYMENT                        |       |       |       |            |          |
|*  2 |   HASH JOIN RIGHT SEMI         |                                          |     1 |    45 |  2896K|  1172   (1)| 00:00:01 |
|*  3 |    INDEX FAST FULL SCAN        | IX_INDIVIDUALCLAIM_CLAIMID_REFERENCE     |   105K|  1650K|       |   147   (1)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL           | INDIVIDUALPAYMENT                        |   154K|  4379K|       |   583   (1)| 00:00:01 |
|*  5 |   VIEW                         |                                          |     1 |    26 |       |     3   (0)| 00:00:01 |
|*  6 |    WINDOW BUFFER PUSHED RANK   |                                          |     1 |    21 |       |     3   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID| INDIVIDUALCLAIM                          |     1 |    21 |       |     3   (0)| 00:00:01 |
|*  8 |      INDEX UNIQUE SCAN         | IX_INDIVIDUALCLAIM_CLAIMID_REFERENCE     |     1 |       |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
"   2 - access(""S"".""REFERENCE""=""d"".""INDIVIDUALCLAIM_REFERENCE"")"
"   3 - filter(""S"".""CLAIMID""=4)"
"   4 - filter(""d"".""PAYMENT_ID""=3)"
"   5 - filter(""from$_subquery$_003"".""rowlimit_$$_rownumber""<=1)"
   6 - filter(ROW_NUMBER() OVER ( ORDER BY  NULL )<=1)
"   8 - access(""v"".""CLAIMID""=4 AND ""v"".""REFERENCE""=:B1)"

@roji
Copy link
Member

roji commented Apr 22, 2023

@anoordover OK. I'd be surprised if Oracle provided no way to do such a simple UPDATE efficiently, forcing users to resort to MERGE syntax instead.

Specifically for your first UPDATE above, that contains a correlated subquery, i.e. it references d.INDIVIDUALCLAIM_REFERENCE from the outer table, meaning that the subquery needs to run for each row in the outer table (INDIVIDUALPAYMENT). This is indeed always bad for perf, also for non-UPDATE queries and on other databases. Most databases allow expressing JOINs instead - just like you can do in regular queries; that does the same thing but much more efficiently. I'm no Oracle expert, so I'm not sure what's possible here.

Note that in any case, #4526 may introduce a way to express MERGE - although that issue still requires lots of careful thinking and design. If it really makes sense in the Oracle context to implement some types of ExecuteUpdate via MERGE rather than UPDATE, then the maintainers of the Oracle provider may be able to add that ability to their provider specifically.

@anoordover
Copy link
Author

anoordover commented Apr 22, 2023

As a side note...
I still need to update my issue on the Oracle implementation.
The syntax mentioned in that issue assigns the same alias to the source and target table.
But I will investigate this later this weekend or on Monday. This seems to be caused by the fact that we prefix all our tables with the schema-name, e.g. XYZ_CLAIM.

Did you notice the temp storage usage in the last plan and the MERGE plan?

Second side note: Oracle says that they don't support executeupdate yet (but they don't throw an exception)

@roji
Copy link
Member

roji commented Apr 23, 2023

@anoordover it seems we're discussing several different things (specifically in oracle/dotnet-db-samples#270).

AFAIK basic ExecuteUpdate does work with Oracle. You seem to have encountered a specific bug with the invalid SQL - that's likely to be a problem on the Oracle side (though not sure), I'd advise tracking that separately from the other ExecuteUpdate-related problems.

@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale May 3, 2023
@michiproep
Copy link

@roji It seems you are right. Simple executeUpdates work but if you have a complex where-query, Oracle actually generates a
UPDATE...
SET
FROM (...)
WHERE...
statement where the FROM-syntax is not even supported by Oracle.

@roji
Copy link
Member

roji commented Jul 17, 2023

@michiproep thanks for confirming - it's probably a good idea to make Oracle aware of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants