Entity Framework Optimisations
NOTE: This is information is taken from a variety of resources and experience. Particular mentions to Stefan Djokic.
Only retrieve the fields you need
Suppose we have a presentation layer that only displays 2 fields:
var employeeList = context.Employees
.Select(e => new EmployeeDto
{
Name = e.Name,
Email = e.Email
})
.ToList();
Only return the fields that are required, this will:
- Reduced Data Transfer: Fetching only the required fields reduces the amount of data transferred from the database to the application, which can be crucial for performance, especially in networked or distributed environments.
- Lower Memory Usage: Loading fewer fields means using less memory in your application, which is particularly important for large datasets or when dealing with resource-constrained environments.
- Improved Query Performance: Queries can execute faster because the database engine has less data to retrieve and transmit.
- Decreased Entity Tracking Overhead: In EF Core, when entities are tracked, there is overhead associated with change tracking. By retrieving only needed fields, you reduce this overhead, especially when tracking changes is not necessary (e.g., in read-only scenarios).
// Instead of loading the entire Employee entity
var badExample = context.Employees
.Where(e => e.IsActive)
.ToList(); // This loads ALL fields from the database
// Use projection to only get what you need
var goodExample = context.Employees
.Where(e => e.IsActive)
.Select(e => new EmployeeDto
{
Id = e.Id,
Name = e.Name,
Email = e.Email
// Only the fields we actually need
})
.ToList();
// For even better performance with large datasets, use AsNoTracking()
var bestExample = context.Employees
.AsNoTracking() // Disables change tracking
.Where(e => e.IsActive)
.Select(e => new EmployeeDto
{
Id = e.Id,
Name = e.Name,
Email = e.Email
})
.ToList();
EmployeeDto class:
public class EmployeeDto
{
public int Id { get; set; }
public string Name { get; set; }
public string Email { get; set; }
}
Avoid N+1 Queries
This is a common performance issue in database operations, particularly with Object-Relational Mapping (ORM) tools like Entity Framework Core (EF Core). It occurs when your application makes one query to retrieve a set of objects, and then additional queries for each object to retrieve related data. This can lead to a significant performance bottleneck, especially when dealing with a large number of objects.
For example:
// Retrieve all blogs - 1 query
var blogs = context.Blogs.ToList();
foreach (var blog in blogs)
{
// For each blog, this will trigger an additional query
// to fetch its posts - N queries
var posts = blog.Posts;
foreach (var post in posts)
{
Console.WriteLine(post.Title);
}
}
A fix for this:
// Retrieve all blogs and their posts in a single query using eager loading
var blogs = context.Blogs.Include(b => b.Posts).ToList();
foreach (var blog in blogs)
{
foreach (var post in blog.Posts)
{
Console.WriteLine(post.Title);
}
}
- Eager Loading with Include: The
Include(b => b.Posts)method tells EF Core to load the Posts related to each Blog as part of the initial query. This way, when you access the data is already loaded, and no additional queries are needed. - Single Query Execution: The context.Blogs.Include(b => b.Posts).ToList() line executes a single query that retrieves all the Blogs along with their associated Posts. This effectively eliminates the N+1 queries issue.
- Iterating Over Results: The nested foreach loops then iterate over the blogs and their posts. Since the posts are already loaded into memory, no additional queries are executed during iteration.
Why?
- Improved Performance: By reducing the number of queries, you significantly decrease the load on the database and the network overhead.
- Simplified Code Logic: The code is more straightforward to maintain as it explicitly states the intention to load related data.
Using .AsNoTracking()
As touched upon earlier, using AsNoTracking in Entity Framework Core is a performance optimization technique particularly useful in scenarios where you are only reading data from the database and do not intend to update or delete it. This approach can significantly enhance the performance of your queries, especially in large-scale or read-heavy applications.
// Use products for read-only purposes
var products = context.Products.AsNoTracking().ToList();
Suppose you have a Product entity and you want to display a list of products on a web page. In this example, the product list is retrieved without the overhead of change tracking. The advantages are:
- Performance Improvement: Particularly noticeable in large-scale applications or when dealing with large datasets. Queries execute faster, and less memory is consumed.
- Reduced Overhead: Since EF Core does not need to track changes or maintain state information for these entities, there's a reduction in overhead.
Considerations
- Not Suitable for CUD Operations: AsNoTracking should not be used when you plan to update, delete, or otherwise modify the entities. In such cases, EF Core's change tracking is necessary to persist these changes back to the database.
- Best Used for Stateless Operations: Ideal for scenarios like API requests where each request is independent, and you don't need to track the entity's state across multiple operations.
Avoiding Cartesian Explosion
A Cartesian explosion refers to a situation where a query unintentionally produces a disproportionately large number of records due to the way joins are handled, significantly impacting the performance and efficiency of the query.
Suppose we want to list all books along with their authors, but we mistakenly create a query that leads to a Cartesian explosion:
// Incorrect query leading to cartesian explosion
var query = from a in context.Authors
from b in context.Books
select new { a.Name, b.Title };
var results = query.ToList(); // This will produce a cartesian product
In this query, we're incorrectly combining every author with every book, regardless of whether the book was written by that author. Obviously this result set is just plain incorrect, but regardless the corrected version involves properly joining the Authors and Books tables based on the AuthorId:
// Correct query using a proper join
var query = from a in context.Authors
join b in context.Books on a.AuthorId equals b.AuthorId
select new { a.Name, b.Title };
var results = query.ToList(); // This produces the correct result
In this corrected query, each book is matched with its corresponding author using a proper join condition. This eliminates the Cartesian explosion, resulting in a more efficient query and accurate results.
The Impact
- Performance Degradation: The query becomes slower due to the large volume of data being processed.
- Resource Intensive: Such queries consume more memory and CPU resources, which can affect the overall performance of the database server.
- Inaccurate Results: The results are often bloated with duplicate or irrelevant data, making them practically useless.
Use AsSplitQuery()
In Entity Framework Core 5.0 and later, the AsSplitQuery() method is used to split queries that retrieve multiple related entities into separate SQL queries. This can sometimes improve performance, especially when dealing with complex queries or large datasets.
Using AsSplitQuery() is particularly useful in scenarios where a single query could result in an inefficient execution plan or a cartesian explosion. By default, EF Core tries to retrieve all related data in a single SQL query using joins. While this is efficient for small datasets, it can become a bottleneck for large datasets or complex queries.
Consider an example where you have Author and Book entities, and you want to retrieve authors with their books.
A typical query without using AsSplitQuery():
var authors = context.Authors
.Include(a => a.Books)
.ToList();
This query retrieves all authors and their books in a single SQL query.
Using AsSplitQuery() to optimize the above query:
var authors = context.Authors
.Include(a => a.Books)
.AsSplitQuery()
.ToList();
Explanation
-
Without
AsSplitQuery(): EF Core generates a single SQL query with joins, which could be less efficient if the Books collection is large. -
With
AsSplitQuery(): EF Core generates separate SQL queries: one for authors and one for books. This can be more efficient in cases of large datasets or complex relationships.
Benefits
- Performance: Can improve performance and execution plan efficiency for queries involving large related collections.
- Reduced Memory Overhead: Helps to avoid cartesian explosions that consume large amounts of memory.
- Flexibility: Offers an alternative to the default join-based query approach, allowing for optimization based on specific query and data characteristics.
Considerations
- More Database Round-Trips: While
AsSplitQuery()can improve efficiency, it results in more database round-trips. This should be considered, especially in environments where database latency is a concern. - Use Case Specific: Its benefits are more pronounced in certain scenarios, particularly with large datasets. It may not always be the best choice for every query.