Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow Async Query of Byte[] in 2.2.1 #14498

Closed
danjohnso opened this issue Jan 23, 2019 · 13 comments
Closed

Slow Async Query of Byte[] in 2.2.1 #14498

danjohnso opened this issue Jan 23, 2019 · 13 comments

Comments

@danjohnso
Copy link

danjohnso commented Jan 23, 2019

We are seeing 25-30x slower queries when using async to retrieve files stored as byte[] in our database. The SQL generated is fine, about the same as I would have handcrafted. I expected there would be some penalty to using async here, but not 25-30x.

Not aware if this happened before 2.2, didn't notice in our logs what these queries were performaing until more recently.

Steps to reproduce

Query files stored as byte[] in SQL Server using SingleAsync:

//case 1, 13 seconds to retrieve 10mb file
byte[] file = await _attachmentContext.AttachmentData.Where(x => x.Id == id).Select(x => x.Data).SingleAsync();

//case 2, 500ms to retrieve 10mb file
byte[] file = _attachmentContext.AttachmentData.Where(x => x.Id == id).Select(x => x.Data).Single();

//case 3, to rule out hardware issue, I used the dbContext to retrieve the Database connection and Dapper ORM to execute the same generated SQL, takes about 250ms 
connection = _attachmentContext.Database.GetDbConnection();
if (connection.State != ConnectionState.Open)
{
	connection.Open();
}
byte[] file = await connection.QuerySingleAsync<byte[]>("SELECT TOP 2 Data FROM AttachmentData WHERE Id = @id", new { id });

//case 4, Synchronous version of #3, takes about 175ms
connection = _attachmentContext.Database.GetDbConnection();
if (connection.State != ConnectionState.Open)
{
	connection.Open();
} 
byte[] file = connection.QuerySingle<byte[]>("SELECT TOP 2 Data FROM AttachmentData WHERE Id = @id", new { id });
//full entity and dbcontext
public class AttachmentData
{
	public Guid Id { get; set; }
	public DateTime Created { get; private set; }
	public Guid CreatedById { get; set; }

	private byte[] _data;
	public byte[] Data
	{
		get => _data;
		set
		{
			_data = value;
			if (value != null)
			{
				DataHash = value.Sha1();
				FileSize = value.LongLength;
			}
		}
	}

	public byte[] DataHash { get; private set; }
	public long FileSize { get; private set; }
	public DateTime Modified { get; set; }
	public Guid ModifiedById { get; set; }
	public byte[] RowVersion { get; set; }
}

internal static class AttachmentDataExtensions
{
	public static EntityTypeBuilder<AttachmentData> Map(this EntityTypeBuilder<AttachmentData> entity)
	{
		entity.ToTable(nameof(AttachmentData));
		entity.HasKey(x => x.Id);
		entity.Property(x => x.Id).ValueGeneratedOnAdd().HasDefaultValueSql("NEWID()");

		entity.Property(x => x.Created).ValueGeneratedOnAdd().HasDefaultValueSql("GETUTCDATE()");

		//HACK ef core does not support updates on DateTimes natively as of 2.1, requires a trigger to handle updates so we will just set manually
		entity.Property(x => x.Modified).ValueGeneratedOnAdd().HasDefaultValueSql("GETUTCDATE()");

		entity.Property(x => x.RowVersion).IsRowVersion();

		entity.Property(x => x.Data).IsRequired();
		entity.Property(x => x.DataHash).IsRequired();
		
		return entity;
	}
}


public class AttachmentContext : DbContext
{
	public AttachmentContext(DbContextOptions<AttachmentContext> options)
		: base(options)
	{
	}
	
	public DbSet<AttachmentData> AttachmentData { get; set; }

	protected override void OnModelCreating(ModelBuilder modelBuilder)
	{
		modelBuilder.Entity<AttachmentData>().Map();
		base.OnModelCreating(modelBuilder);
	}
}

Further technical details

EF Core version: 2.2.1
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Win10/Server2016
IDE: Visual Studio 2017 15.9.5

@danjohnso
Copy link
Author

I assume this is not going to make 3.1 then?

@ajcvickers
Copy link
Contributor

@danjohnso Correct.

@roji
Copy link
Member

roji commented Nov 11, 2019

This could simply be a result of dotnet/SqlClient#245 which has recently been fixed in SqlClient (but not yet released). See also #18221.

Bottom line, wait for the next 1.1.0 preview release and try it out - there's a good chance the slowness may disappear. If you could confirm back on this once you test that would be very helpful.

@danjohnso
Copy link
Author

@roji I will try the next preview, but wouldn't I have seen that same issue in scenario 3 and 4 with Dapper if it were driver related?

@roji roji modified the milestones: Backlog, 5.0.0 Nov 11, 2019
@roji
Copy link
Member

roji commented Nov 12, 2019

Apologies, I didn't look closely enough at your code sample.

I did a quick benchmark based on your scenario (code below), and included raw ADO.NET access as well, with interesting results:

Scenario Time
EF Core async 2803ms
EF Core syn 50ms
Dapper async 60ms
Dapper sync 23ms
ADO.NET async 2509ms
ADO.NET sync 12ms

ADO.NET async is extremely slow as well, just as expected because of dotnet/SqlClient#245. My guess is that Dapper is doing something else which isn't triggering the SqlClient perf issue - possibly not actually calling the async ADO.NET API. Maybe @mgravell can help explain what's going on.

At this point this seems to be nothing more than dotnet/SqlClient#245, which has already been fixed for the SqlClient 1.1.0 release. I'm going to go ahead and close this as an EF Core issue, but we can continue exchanging here (and I can reopen if necessary).

Benchmark code:

class Program
{
    static async Task Main(string[] args)
    {
        int id;

        using (var ctx = new TestContext())
        {
            ctx.Database.EnsureDeleted();
            ctx.Database.EnsureCreated();

            var data = new SomeEntity { Data = new byte[1024*1024*10] };
            ctx.SomeEntities.Add(data);
            ctx.SaveChanges();

            id = data.Id;
        }

        var sw = new Stopwatch();

        using (var ctx = new TestContext())
        {
            sw.Restart();
            await ctx.SomeEntities.AsQueryable().Where(x => x.Id == id).Select(x => x.Data).SingleAsync();
            Console.WriteLine($"EF Core async: {sw.ElapsedMilliseconds}ms");
        }

        using (var ctx = new TestContext())
        {
            sw.Restart();
            ctx.SomeEntities.Where(x => x.Id == id).Select(x => x.Data).Single();
            Console.WriteLine($"EF Core sync: {sw.ElapsedMilliseconds}ms");
        }

        using (var ctx = new TestContext())
        {
            sw.Restart();
            using var conn = ctx.Database.GetDbConnection();
            await conn.QuerySingleAsync<byte[]>("SELECT Data FROM SomeEntities WHERE Id = @id", new { id });
            Console.WriteLine($"Dapper async: {sw.ElapsedMilliseconds}ms");
        }

        using (var ctx = new TestContext())
        {
            sw.Restart();
            using var conn = ctx.Database.GetDbConnection();
            conn.QuerySingle<byte[]>("SELECT Data FROM SomeEntities WHERE Id = @id", new { id });
            Console.WriteLine($"Dapper sync: {sw.ElapsedMilliseconds}ms");
        }

        using (var ctx = new TestContext())
        {
            sw.Restart();
            using var conn = ctx.Database.GetDbConnection();
            conn.Open();
            using var cmd = conn.CreateCommand();
            cmd.CommandText = "SELECT Data FROM SomeEntities WHERE Id = @id";
            cmd.Parameters.Add(new SqlParameter("id", id));
            await cmd.ExecuteScalarAsync();
            Console.WriteLine($"ADO.NET async: {sw.ElapsedMilliseconds}ms");
        }

        using (var ctx = new TestContext())
        {
            sw.Restart();
            using var conn = ctx.Database.GetDbConnection();
            conn.Open();
            using var cmd = conn.CreateCommand();
            cmd.CommandText = "SELECT Data FROM SomeEntities WHERE Id = @id";
            cmd.Parameters.Add(new SqlParameter("id", id));
            cmd.ExecuteScalar();
            Console.WriteLine($"ADO.NET sync: {sw.ElapsedMilliseconds}ms");
        }
    }
}

public class SomeEntity
{
    public int Id { get; set; }
    public byte[] Data { get; set; }
}

public class TestContext : DbContext
{
    public DbSet<SomeEntity> SomeEntities { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0");
}

@roji roji closed this as completed Nov 12, 2019
@roji roji removed this from the 5.0.0 milestone Nov 12, 2019
@mgravell
Copy link
Member

mgravell commented Nov 12, 2019

@roji interesting; for info, Dapper here is using SequentialAccess + SingleResult (but not SingleRow, since it needs to check for a second row to assert the Single), then ExecuteReaderAsync and reader.ReadAsync; it does not use GetFieldValueAsync per row - once it is inside a row, it just uses GetValue - which has good pluses and minuses (minus: especially the boxing). Does this explain what you're seeing?

@roji
Copy link
Member

roji commented Nov 12, 2019

it does not use GetFieldValueAsync

Odd - in my tests the slowdown occurs when changing Read into ReadAsync. Changing between GetFieldValueAsync/GetFieldValue/GetValue doesn't seem to impact it. Any chance you could take a look to make sure you're calling ReadAsync, so we get to the bottom of this?

PS Calling GetFieldValueAsync rather than GetValue can be important simply because it's async - the whole point of SequentialAccess is to not have to buffer the entire row in memory, so reading a new field may involve I/O, which Dapper is doing synchronously although the call is supposed to be 100% async.

@danjohnso
Copy link
Author

Can confirm the QuerySingleAsync in Dapper is using ReadAsync. However the deserializer does use GetValue over GetFieldValueAsync, so that must be the difference here is the call not being fully async. I will still give the preview a try when the nuget package drops and see if that will be fast enough for most of our files.

@roji
Copy link
Member

roji commented Nov 12, 2019

@danjohnso ok. The preview should definitely make things much much better, you can see some numbers in dotnet/SqlClient#285.

@pwen090
Copy link

pwen090 commented Dec 20, 2019

To clarify; adding a direct reference to an EF 3.1 project for Microsoft.Data.SqlClient >=1.1 should fix the slowness and eventually Microsoft.EntityFrameworkCore.SqlServer will update its own reference of Microsoft.Data.SqlClient?

@roji
Copy link
Member

roji commented Dec 20, 2019

@pwen090 that's correct. It's fine for a project to reference any version of Microsoft.Data.SqlClient, as long as that version is compatible with what the EF 3.1 provider was compiled against (semver should express that). And yes, the EF 5.0 SQL Server provider will certainly reference a version of Microsoft.Data.SqlClient which doesn't has this issue.

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
@DaveVdE
Copy link

DaveVdE commented Mar 8, 2023

I ran into this issue and sure enough, when I changed to sync FirstOrDefault() it was much faster.

Using Microsoft.EntityFrameworkCore.SqlServer 7.0.3 with Microsoft.Data.SqlClient 5.1.0.

Column is of type varbinary(max) and contains a 26MB zip file, takes 2 minutes to fetch on my local machine, Windows 10 w/ VS2022 17.4.4 running MSSQL 16.0.1000.6 locally.

@roji
Copy link
Member

roji commented Mar 8, 2023

@DaveVdE yeah, that's a well-known problem with SqlClient and is tracked by dotnet/SqlClient#245. It has nothing to do with EF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants