Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closure references are represented as constant nodes in expression trees #12841

Closed
roji opened this issue Mar 14, 2022 · 11 comments
Closed

Closure references are represented as constant nodes in expression trees #12841

roji opened this issue Mar 14, 2022 · 11 comments
Labels
Area-Quotations Quotations (compiler support or library). See also "queries" Bug Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code.
Milestone

Comments

@roji
Copy link
Member

roji commented Mar 14, 2022

An issue has been raised with using EF Core and F# that may point to a problem in the way F# generates LINQ expression trees.

In a nutshell, in the following query over an EF Core IQueryable, the i variable is represented by a simple ConstantExpression containing 8 (see full code sample below):

let i = 8
db.Events.Where(fun x -> x.Data = i).ToList()

The same code in C#:

var i = 8;
_ = db.Events.Where(x => x.Data == i).ToList();

... yields a very different expression tree: a FieldExpression is emitted for the closure object, with the object itself (the FieldExpression's Expression) being a ConstantExpression over a type that's identifiable as a closure (Attributes contains TypeAttributes.NestedPublic, and [CompilerGeneratedAttribute] is present).

In other words, the F# code above generates an expression that is identical to what's produced by the following C# code:

_ = db.Events.Where(x => x.Data == 8).ToList();

The distinction between a simple constant and a closure parameter is very important to EF Core: while a simple ConstantExpression gets embedded as-is in the SQL, a closure reference gets generated as a SQL parameter, which gets sent outside of the SQL (the SQL gets a placeholder such as @i). This has a profound effect on performance, as databases typically reuse query plans when the SQL is the same (with different parameters), but when different constants are embedded in the SQL, they do not.

I realize things work quite differently in F#, and it may not necessarily be right for the exact same tree shape to be produced as in C#. However, the distinction itself (between constant and "closure parameter") is quite important.

/cc @NinoFloris @smitpatel @ajcvickers

F# runnable code sample
open System
open System.Linq
open Microsoft.EntityFrameworkCore
open Microsoft.Extensions.Logging

#nowarn "20"

type Event() =
    member val Id: Guid = Guid.Empty with get, set
    member val Data: int = 0 with get, set

type ApplicationDbContext() =
    inherit DbContext()

    override this.OnConfiguring(builder) =
        builder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Action<string>(Console.WriteLine), LogLevel.Information)
            .EnableSensitiveDataLogging() |> ignore

    [<DefaultValue>]
    val mutable private events: DbSet<Event>

    member this.Events
        with public get () = this.events
        and public set v = this.events <- v

[<EntryPoint>]
let main args =
    use db = new ApplicationDbContext()

    db.Database.EnsureDeleted()
    db.Database.EnsureCreated()

    let i = 8
    db.Events.Where(fun x -> x.Data = i).ToList()

    0
C# runnable code sample
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

await using var db = new BlogContext();
await db.Database.EnsureDeletedAsync();
await db.Database.EnsureCreatedAsync();

var i = 8;
_ = db.Events.Where(x => x.Data == i).ToList();

public class BlogContext : DbContext
{
    public DbSet<Event> Events { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class Event
{
    public Guid Id { get; set; }
    public int Data { get; set; }
}
@roji roji added the Bug label Mar 14, 2022
@KevinRansom KevinRansom added the Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code. label Mar 14, 2022
@KevinRansom
Copy link
Member

KevinRansom commented Mar 14, 2022

It looks like this should be fixed in F# to improve Entity Framework interop.

@sep2
Copy link

sep2 commented Mar 15, 2022

Finally found a workaround, just leave it here for others: npgsql/efcore.pg#2302 (comment)

// wrap anything in a Some(...)
let ints = [| 1; 2 |] |> Some

// this will generate the correct behavior
db.Events.Where(fun x -> ints.Value.Contains(x.Id)).ToList()
info: 3/15/2022 18:32:55.869 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command) 
      Executed DbCommand DbCommand (16ms) [Parameters=[@__Value_0={ '1', '2' } (DbType = Object)], CommandType='Text', CommandTimeout='30']
      SELECT e."Id"
      FROM "Events" AS e
      WHERE e."Id" = ANY (@__Value_0)

@dsyme dsyme added Area-Queries Query expressions and library implementation Area-Quotations Quotations (compiler support or library). See also "queries" and removed Area-Queries Query expressions and library implementation labels Mar 30, 2022
@Lanayx
Copy link
Contributor

Lanayx commented May 30, 2022

This shouldn't be marked as low priority since any efcore program written in fsharp will become vulnerable to sql injections, which is one of the highest possible vulnerabilities in any application.

@roji
Copy link
Member Author

roji commented May 30, 2022

@Lanayx that is not true - EF Core properly escapes constants specified in the expression tree to prevent SQL injection. This affects the performance of the query (in a significant way!), but has no security implications.

@Lanayx
Copy link
Contributor

Lanayx commented May 30, 2022

@roji thank you for the clarification, I can sleep well now :)

@vzarytovskii vzarytovskii added this to the Backlog milestone May 30, 2022
@dsyme
Copy link
Contributor

dsyme commented May 30, 2022

There are some alternatives you might also like to try. Given this:

open System.Linq
type R = { Data : int }
let db = [ for i in 1..100 -> { Data = i } ].AsQueryable()

Then I see this:

  1. Representing i as a top level module binding results in a PropertyGet:
let i = 8
<@ fun x -> x.Data = i @>

gives

  Lambda (x,
        Call (None, op_Equality,
              [PropertyGet (Some (x), Data, []), PropertyGet (None, i, [])]))
  1. Representing i as a class-bound-let results in a PropertyGet:
type F3() =
    let mutable i = 9
    member x.Q = <@ fun x -> x.Data = i @>

gives

  Lambda (x,
        Call (None, op_Equality,
              [PropertyGet (Some (x), Data, []),
               FieldGet (Some (ValueWithName (FSI_0011+F3, x)), i)]))
  1. Representing i via a reference call is similar to the case above, and gives a PropertyGet:
let f2() =
    let i = ref 8
    <@ fun x -> x.Data = i.Value @>

f2()

gives

  Lambda (x,
        Call (None, op_Equality,
              [PropertyGet (Some (x), Data, []),
               PropertyGet (Some (ValueWithName ({ contents = 8 }, i)), Value,
                            [])]))
  1. Likewise you can use your own boxing type:
type R<'T> = { Value: 'T }
let R x = { Value = x }
let f4() =
    let i = R 8
    <@ fun x -> x.Data = i.Value @>

f4()

gives

val it: Quotations.Expr<(R -> bool)> =
  Lambda (x,
        Call (None, op_Equality,
              [PropertyGet (Some (x), Data, []),
               PropertyGet (Some (ValueWithName ({ Value = 8 }, i)), Value, [])]))

@dsyme
Copy link
Contributor

dsyme commented May 30, 2022

I think any of (1)-(3) above should work to help make the parameterization structure of the quotation and query more explicit. I suspect (4) won't work. However EF may well be sensitive to each particular variation.

@roji
Copy link
Member Author

roji commented May 30, 2022

@dsyme thanks for these suggestions.

These indeed some like possible workarounds for the general problem, but I'm not sure they'd help in the EF Core case specifically. A very typical scenario is having a function with some parameter, which executes an EF that embeds that parameter inside the expression tree; top-level or module or class binding doesn't seem like it would work well there (unless users factor their code in ways that seem somewhat convulted).

A reference call or boxing type could possibly work, but would involve considerable ceremony for what should ideally be a simple embedding of a parameter in a query. There would also be a pretty bad discoverability problem: even if a special boxing type exists, users would understandably assume that simply embedding a parameter directly is fine (as in C#), whereas in F# it would seem to work but lead to (severely) degraded performance.

Is there any chance that the LINQ expression tree shape here could be different? I'm trying to make the EF (and possibly other LINQ providers) experience better in F#, hopefully that's possible.

@dsyme
Copy link
Contributor

dsyme commented May 31, 2022

@roji The current F# behaviour is by-design. I don't think it can be changed.

Please add a language suggestion for this. The only thing I can think of is to add a new attribute to put on parameters and locals to get them to be represented via a reference into a heap-allocated object. That will also be undiscoverable but perhaps EF samples can always show it.

For comparison, here's some C#:

public class A
{
    static void F(System.Linq.Expressions.Expression<System.Func<int, int>> f)
    {
        Console.WriteLine("Hello, World!, f = {0}", f);
    }

    public static void M(int x)
    {
        F(a => x + 1);
    }
    public static void Main(string[] argv)
    {
        M(3);
    }
}

The output shows how x is lifted to a closure - like the autobox converstion for F#. However this conversion is not applied to F# quotations and it would be a breaking change to apply it.

Hello, World!, f = a => (value(A+<>c__DisplayClass1_0).x + 1)

@roji
Copy link
Member Author

roji commented Jun 3, 2022

OK.

Given that the expression tree shape won't change on the F# side, we're considering introducing an EF-specific mechanism for explicitly forcing parameterization in LINQ queries - EF.Param(). This may even be useful in some C# scenarios, see dotnet/efcore#28151.

Am closing this issue. Until the EF-side support is done, use the ref workaround to make suer queries get parameterized properly, and please upvote dotnet/efcore#28151 to let the EF Core team know that you need it.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Jun 3, 2022
@roji
Copy link
Member Author

roji commented Aug 27, 2024

Note that EF.Param has indeed been introduced on the EF side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Quotations Quotations (compiler support or library). See also "queries" Bug Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code.
Projects
None yet
Development

No branches or pull requests

6 participants