Entity Framework – improve query performance – query compilation

Introduction

In the first post, I described query execution steps that can be potentially improved when the query is executed. In this article, I will focus on the first phase named “query compilation”. I will show you how to solve the following problem:

“When the query is executed the first time, internal plan compiler compiles the query and stores it in a query plan cache. When the query is executed the second time, the compiled query should be taken from this cache. If it does not exist, then compilation must be executed once again. Unfortunately, if you have some specific syntax in your LINQ query, then the query compilation phase is made with every query execution. Microsoft claims that this stage has a medium impact on performance but from my perspective, it has an extremely high influence on CPU and memory resources.”

Cause of issues

There are mainly two reasons why query compilation phase may be executed too often. If it’s executed every time for a specific query then based on my experience, you probably use IEnumerable<>.Contains() method in your query, if it’s not executed with every call but still often then probably you use constants in your query.

There is one more case when query can be compiled every X amount of time but this only exists in a really big application when throughput is very high. If you have such kind of application and you noticed a compilation issue even if you are sure that queries were created correctly, then this can indicate that query eviction algorithm has been executed. By default EF can store only 800 elements in a cache when this threshold is exceeded then clean is executed but there is a way to increase the number of the cached element. There is a great article describing it: https://blogs.msdn.microsoft.com/premier_developer/2017/03/23/entity-framework-query-caching/

IEnumerable<>.Contains()

IEnumerable<>.Contains() – EF has a rule that he doesn’t cache queries containing Contains() method in the expression. He has such rule because.Contains() is used with the collection and is hard to predict the number of elements in your collection. In our project to solve this problem, we made a few extensions methods with our custom implementation of the Contains() and not Contains()., the code is available here: https://gist.github.com/adamlepkowski/f8073dc45de498d4f7c6299f40dcd519. That extension will not work in every scenario that you can imagine building EF query but they may be very handy.

If you use a default implementation of the contains method:

The generated SQL looks like this:

Now let’s change our LINQ query to use our custom extensions method:

Then, the generated SQL statement looks like this:

As you can notice SQL queries are a bit different, in the first query elements from a collection are put inside a SQL query and in the second query collection elements are parameters of the query so that values can be easily replaceable and the same execution plan is used. Extensions that we developed work but it have some limitation regarding a caching strategy – cache is made for a specific amount of elements in the array. It means that if you execute a query twice with the same amount of elements in the collection, then the second query will not be compiled because has the same amount of elements as the collection from the first query had.

Constants

Let’s take a look at the sample LINQ query:

var contacts = dbContext.Contacts.Where(x => x.Name == "Adam").ToList();

The generated SQL looks like this:

As you can see the name that we used for filters is inside the SQL query as a constant. So, let’s change our LINQ query a bit:

Then SQL query looks like this:

Now the query is completely different there is no static ‘Adam’ value inside SQL query but there is a parameter. If we decide to run the second query with a different name than ‘Adam’ then the query plan will be reused. If you try to execute the first query multiple times in different places of your application, then it will be reused only when the constant value is the same.

Provided example is pretty straightforward but It should let understand how to recognise potential problematic queries and analyse a SQL query to find an issue. Interesting is that some LINQ extensions methods use constants instead of variables by default (Skip() and Take()). Those two are mainly used to make paging, and if you run a query to retrieve the first page with 20 elements on the page then the query will be compiled with constants values if you change page size or page number query will be compiled once again. To solve this specific problem you can start using those two methods but from System.Data.Entity.QueryableExtensions namespace.

Problem recognition

The problem is mainly revealed by high CPU consumption or jumping of a CPU usage. If the issue exists in multiple queries then CPU consumption is high, if only a small percentage of your queries have this problem then CPU consumption increase when that problematic query is executed. When it comes to a memory consumption then allocation occur very often and Garbage collector is often fired.

There are three ways to find problematic queries and confirm a suspicion  that problem with query compilation exists:

  1. doTrace / ANTS Profiler / PerfView can be used, if you notice that GetExecutionPlan() method from EntityFramework namespace is often executed then it means that problem exists
  2. based on what I described further in a document you can open your favourite code editor and try to manually search for issues based on a syntax that can be problematic (Contains(), Skip(), Take())
  3. check you generated SQL queries using a SQL Profiler or any other tools to find queries that are not using a constants

Conclusion

I hope that after reading this article you know the impact of the query compilation on your application and you should know how to investigate and solve this problem occurring in your queries. The most important is to remember that this issue is mainly connected with a constant expression used in your LINQ query expression tree and you should always use a parameter / variable expression instead.

 

One Reply to “Entity Framework – improve query performance – query compilation”

  1. Pingback: dotnetomaniak.pl

Leave a Reply

Your email address will not be published. Required fields are marked *