Entity Framework – the query plan cache story

Introduction

This post describes my adventure investigating performance issues in an application that uses Entity Framework as its data access technology. I can tell you that I have really improved my Entity Framework knowledge, especially of the query compilation phase, because I had multiple cases where my queries were compiled over and over again, sometimes seemingly at random.

Issue

Some time ago, I noticed two main issues with my REST API:

  1. the response time was too long compared to returned data,
  2. the response time increased randomly during an application usage.

Additionally, I used system counters to monitor the IIS process and noticed that CPU and memory usage were very high for the API calls where the above issues occurred.

Cause of issues

To investigate these strange problems, I used the JetBrains dotTrace profiler. Having configured dotTrace to profile my application hosted on IIS, I started manually clicking through different parts of the system. After a while, when the issues appeared, I began verifying the profiled data. dotTrace revealed the following:

It appeared that my LINQ queries were being compiled multiple times by Entity Framework. Normally, when a query is executed for the first time, the internal plan compiler compiles it and stores the result in the query plan cache. When the query is executed again, the compiled plan should be taken from this cache; only if it is not found does the compilation have to run once more.

Further investigation

dotTrace showed me that query plans weren't being cached in most of my API resources. What made it even stranger, sometimes the caching worked and sometimes it didn't for the same API request. The investigation took me some time, because I had to go through all the problematic API methods trying to find a pattern, and I have a lot of them. Finally, I found the following problems:

  • Skip & take – queries are not parameterized
  • Contains()
  • Query eviction
  • Entity Framework Plus and global filter interceptor

Skip & Take – parameterized queries

When LINQ is used to build queries, the internal plan compiler stores the compiled query in a cache, where the key is generated from the query expression. This means that if you change your query expression, the key will be different and the query has to be compiled again. If you use Entity SQL, the key is generated from the query text and the parameter collection, where the parameter names and types are checked as well.

Let’s take a look at the sample LINQ query:

var contacts = dbContext.Contacts.Where(x => x.Name == "Adam").ToList();

In the generated SQL, the name is inlined as the literal 'Adam' in the WHERE clause instead of being passed as a parameter.

Now, let’s change the LINQ query so that the value comes from a variable instead of a constant. This time, the generated SQL statement is parameterized.
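A minimal sketch of the variable-based version (assuming the same dbContext and Contacts set as above; the SQL in the comment is approximate, not copied from a real trace):

```csharp
// The value now comes from a local variable, so EF captures it as a closure
// member and emits a SQL parameter instead of inlining the literal.
var name = "Adam";
var contacts = dbContext.Contacts.Where(x => x.Name == name).ToList();

// Approximate generated SQL (parameterized):
// SELECT ... FROM [dbo].[Contacts] AS [Extent1]
// WHERE [Extent1].[Name] = @p__linq__0
```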

The first query is not parameterized because the value of the name is stored as a constant inside the expression tree. This means that every time the query is executed with a different name value, the cached execution plan won't be reused, because each value produces a completely new query.

In the second query, instead of storing a constant value, I put a variable in the expression. With such an approach, the query is compiled only once and subsequent executions take advantage of the query plan cache.
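The difference is visible in the expression trees themselves. A small standalone sketch (no database needed; the Contact class here is a stand-in for the real entity):

```csharp
using System;
using System.Linq.Expressions;

class Contact { public string Name { get; set; } }

static class ExpressionDemo
{
    static void Main()
    {
        // Literal baked into the tree: a ConstantExpression node,
        // different for every value, so every value is a new cache key.
        Expression<Func<Contact, bool>> constantQuery = x => x.Name == "Adam";

        // Captured local variable: a MemberExpression over the closure,
        // which EF can turn into a SQL parameter; the tree shape stays stable.
        var name = "Adam";
        Expression<Func<Contact, bool>> variableQuery = x => x.Name == name;

        Console.WriteLine(((BinaryExpression)constantQuery.Body).Right.NodeType); // Constant
        Console.WriteLine(((BinaryExpression)variableQuery.Body).Right.NodeType); // MemberAccess
    }
}
```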

In my code, I have plenty of queries with paging functionality, and I noticed that the Queryable.Skip() and Take() methods accept an integer parameter, which is stored as a constant in the query expression. Whenever I open a new page or change the number of elements per page, the paging query must be compiled again. I had to change my queries to use the Skip() and Take() overloads from the System.Data.Entity.QueryableExtensions class. These overloads are available since version 6 of Entity Framework and accept an expression instead of an integer value. If they are used with variables, we can finally take advantage of query plan caching.

With these expression-based Skip() and Take() overloads and variables, the generated query contains parameters instead of the explicit values.
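A sketch of an improved paging query using the expression-based overloads (assuming EF6, a Contact entity with an Id column, and skip/take values coming from paging parameters; a stable OrderBy is required before Skip, and the SQL comment is approximate):

```csharp
using System.Linq;
using System.Data.Entity; // the Skip/Take overloads taking Expression<Func<int>> live here (EF6+)

// Ordinary int variables, e.g. taken from the page number and page size.
int skip = 20, take = 10;

var page = dbContext.Contacts
    .OrderBy(x => x.Id)   // paging requires a deterministic order
    .Skip(() => skip)     // expression overload -> emitted as a SQL parameter
    .Take(() => take)
    .ToList();

// Approximate generated SQL (parameterized), on SQL Server 2012+:
// ... ORDER BY [Extent1].[Id]
// OFFSET @p__linq__0 ROWS FETCH NEXT @p__linq__1 ROWS ONLY
```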

Enumerable.Contains()

Every time you use IEnumerable<T>.Contains<T> against an in-memory collection, the query plan is not cached. Take a look at the following code:

int[] ids = {1, 2, 3};
var contacts = dbContext.Contacts.Where(c => ids.Contains(c.Id)).ToList();

The corresponding SQL query is not parameterized: the array values are inlined into an IN clause, e.g. WHERE [Id] IN (1, 2, 3), so each distinct array produces a new query text.

There is a way to make a query using Contains<T>() cacheable. The general idea is to replace the IN clause with a chain of OR comparisons, one comparison per array element.

In my opinion, building such a query by hand is very painful. Additionally, a naive implementation will not work, because it's impossible to use an array indexer inside the expression. However, we can use the PredicateBuilder (http://www.albahari.com/nutshell/predicatebuilder.aspx) from LinqKit to build the OR statement and make our query cacheable.
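A minimal sketch of the PredicateBuilder approach (assuming LinqKit is installed and a Contact entity with an int Id; AsExpandable() is needed so LinqKit can expand the combined predicate for EF):

```csharp
using System.Linq;
using LinqKit; // PredicateBuilder, AsExpandable

int[] ids = { 1, 2, 3 };

// Start from 'false' and OR in one comparison per element.
var predicate = PredicateBuilder.False<Contact>();
foreach (var id in ids)
{
    var temp = id; // capture a fresh variable so each comparison gets its own parameter
    predicate = predicate.Or(c => c.Id == temp);
}

var contacts = dbContext.Contacts.AsExpandable().Where(predicate).ToList();

// Approximate generated SQL (parameterized):
// WHERE [Extent1].[Id] = @p__linq__0 OR [Extent1].[Id] = @p__linq__1 OR ...
```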

With this approach, the previously inlined query becomes parameterized: each element of the array is passed as a separate SQL parameter.

The main problem with this solution is that the cache only gives us an advantage if the number of items in the array is the same on every execution; if the size of the array varies, the query must be recompiled, because each size produces a different expression tree. That's why I decided to use this approach only in places where the array size does not change very often.

Query eviction

I observed that after some time of application use, some API calls took longer than normal. I found information that the query plan cache can store only 800 entries, and after exceeding this threshold the cache is swept. It is possible to increase the number of cached entries, and you can read more about how to do it here: https://blogs.msdn.microsoft.com/premier_developer/2017/03/23/entity-framework-query-caching/

Entity Framework Plus and global filter interceptor

In my application, the user can delete entities (wow :P). However, these entities are not removed from the system; they are just marked as deleted (IsDeleted = true, a soft delete).

To support this scenario, I decided to use a global filter from the Entity Framework Plus library. This library allows me to create the code that will filter out the removed entities from all queries run against the database:

QueryFilterManager.Filter<IDeletable>(q => q.Where(x => x.IsDeleted == false));
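For reference, the IDeletable type used by this filter is just a marker interface along these lines (a sketch; the exact shape in my project may differ):

```csharp
// Implemented by every soft-deletable entity in the model.
public interface IDeletable
{
    bool IsDeleted { get; set; }
}
```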

Still, whenever a new database context is created, the global filter generates a different expression tree. I use the session-per-request pattern for managing the database context, so the problem was pretty serious: as you have already seen above, the cache key is generated from the expression tree. I am not going to describe this issue in detail here, because I have already reported it with an extensive description and sample source code: https://github.com/zzzprojects/EntityFramework-Plus/issues/175. The most important thing is that it will not be fixed, so I had to give up the global filter feature and started using this library instead: https://github.com/jcachat/EntityFramework.DynamicFilters.
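With EntityFramework.DynamicFilters, an equivalent soft-delete filter is registered once in OnModelCreating. A sketch, assuming the same IDeletable interface as in the snippet above (MyDbContext is a hypothetical context name):

```csharp
using System.Data.Entity;
using EntityFramework.DynamicFilters;

public class MyDbContext : DbContext
{
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Exclude soft-deleted rows from every query over IDeletable entities.
        modelBuilder.Filter("IsDeleted", (IDeletable d) => d.IsDeleted, false);
    }
}
```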

Additionally, I must admit that this problem was the most painful one for me, because after solving the other issues described above, I still could not find any reason why some queries weren't cached. I even looked at the Entity Framework source code, but finally, with the help of dotTrace in very detailed profiling mode, I noticed that the issue was caused by the Entity Framework Plus library.

[I use the Entity Framework Plus library throughout my system to speed up my queries, and I believe it is one of the best products that everyone should take into consideration when Entity Framework is used.]

Conclusion

Optimization of the queries was an interesting adventure, because you don't face such issues every day. I had the pleasure of working with great tools like JetBrains dotTrace and dotMemory (the latter wasn't described here, but it helped me find some other bugs in the system). I think that now I understand better how Entity Framework works (and, I hope, so do you), including its advantages and disadvantages.

We carried out a performance test before and after introducing the described changes. The results were surprising: I knew that response times would decrease, but I did not expect such a vast improvement. We measured CPU and memory usage, and it turned out that both dropped as well, potentially because the garbage collector did not have to run so many times.

The chart below compares our performance test results before and after the changes. The X axis presents the 90th percentile of the response time, and the Y axis shows how many times faster or slower our API responded after the changes. Most of the API calls became almost 8 times faster, which is a significant result. In my next system, I will certainly remember to verify whether query plans are cached or not.

