Love Entity Framework and ORMs

I love Entity Framework and other object-relational mappers (ORMs) like it. These frameworks let us code against an object-oriented data model and map any changes seamlessly (usually) to the underlying relational model in which the data is persisted. They reduce how much we need to worry about the database itself.

Where I have seen the most value from ORMs is in eliminating the need for developers to hand-craft their own SQL statements. To write really good SQL you need a decent understanding of how the database is going to process your queries; you have to ask yourself questions like “How many table scans is this query going to perform?” and “What indexes could I add to make this run faster?”. Joe Developer usually doesn’t have that insight into how a database operates, and so struggles to write performant, non-trivial queries. Unfortunately, I have seen some really bad handcrafted SQL in my time: queries that perform n-to-the-power-n table scans, and so on.

Thankfully, ORMs are written by people who specialize in databases. The SQL they produce to manipulate the relational data model is usually more efficient than what the average developer would write by hand. Most of the time, I see performance gains when I refactor legacy code to incorporate an ORM.

Hate slow bulk insertions

However, in recent months I have run into a weakness in Entity Framework, and that is bulk insertions. I should clarify that I’m talking specifically about Entity Framework 6 here; this may have improved in later versions.

I was refactoring a .NET service that had a direct link to a SQL Server database and operated with a mix of SQL and LINQ. My goal was to replace the direct connection to the database with the use of a REST API hosted by another service, which would interact with the database via Entity Framework.

Once this refactoring was complete, I discovered that the portion of the service that dealt with the bulk insertion of data had seriously regressed in performance. An operation that had previously taken minutes to insert a large number of rows was now taking hours. The reason is that Entity Framework 6 doesn’t really do bulk insertions, even if you use DbSet.AddRange: the generated SQL still performs each insert one at a time.
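
For context, this is roughly what the slow path looks like in Entity Framework 6. It is a minimal sketch rather than the actual service code; the Order entity and MyDbContext are hypothetical stand-ins:

using System.Collections.Generic;
using System.Data.Entity;

namespace DataService
{
    // Hypothetical entity and context, for illustration only.
    public class Order
    {
        public int Id { get; set; }
        public string Reference { get; set; }
    }

    public class MyDbContext : DbContext
    {
        public DbSet<Order> Orders { get; set; }
    }

    class SlowInsertExample
    {
        public static void Insert(IList<Order> orders)
        {
            using (var context = new MyDbContext())
            {
                // AddRange looks like a single bulk operation, but in Entity Framework 6
                // SaveChanges still sends one INSERT statement per entity to the database.
                context.Orders.AddRange(orders);
                context.SaveChanges();
            }
        }
    }
}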

SqlBulkCopy

I couldn’t allow my refactored service to be slower than the original implementation. The original code had used LINQ to talk directly to the database, and I was now throwing REST calls to another service into the mix, so outperforming it was going to be difficult. But I did manage it, by adding a mechanism to the data service that uses SqlBulkCopy for bulk operations.

SqlBulkCopy is a fairly old class, but it is still the fastest way to get a large number of rows into SQL Server from a .NET application; according to the class documentation, it offers a significant performance advantage over alternatives such as individual INSERT statements. To use it for our bulk insert, we need to construct a DataTable that represents our Entity Framework entities in relational form.

My bulk insertion method

using FastMember;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

namespace DataService
{
    class BulkOperation
    {
        // Bulk insertion of a List of entities into the database. Uses SqlBulkCopy for maximum performance.
        public static void Insert<T>(IList<T> data)
        {
            // Database table name matches the entity type name.
            string destTableName = typeof(T).Name;

            using (var conn = new SqlConnection(ConnectionStringBuilder.GetConnectionString()))
            {
                conn.Open();
                var validColumns = GetTableColumnNames(destTableName, conn);

                using (var bulkCopy = new SqlBulkCopy(conn))
                {
                    bulkCopy.DestinationTableName = destTableName;
                    var dataTable = ToDataTable(data, validColumns.ToArray());

                    // Map columns by name rather than relying on SqlBulkCopy's default
                    // ordinal mapping, so the column order in the DataTable doesn't have
                    // to match the column order of the destination table.
                    foreach (DataColumn column in dataTable.Columns)
                        bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);

                    bulkCopy.WriteToServer(dataTable);
                }
            }
        }

        // Utility method for getting the column names from the schema of a given table. These are used for object-to-datatable
        // mapping for SqlBulkCopy.
        private static List<string> GetTableColumnNames(string tableName, SqlConnection conn)
        {
            List<string> result = new List<string>();

            // Substitute DATABASE_NAME for the actual name of your database.
            var columnsTable = conn.GetSchema("Columns", new[] { "DATABASE_NAME", null, tableName });
            foreach (DataRow row in columnsTable.Rows)
                result.Add(row["COLUMN_NAME"].ToString());

            return result;
        }

        // Converts a List of entities into a DataTable for SqlBulkCopy. Uses FastMember's ObjectReader for reflection-like
        // code simplicity without the performance hit. An array of valid column names for the target table is sent to the
        // ObjectReader as only properties whose name matches a column name should get a mapping to the database.
        private static DataTable ToDataTable<T>(IList<T> data, string[] validColumns)
        {
            DataTable result = new DataTable();
            using (var reader = ObjectReader.Create(data, validColumns))
            {
                result.Load(reader);
            }

            return result;
        }
    }
}

My bulk insert method is simple, short, fast, and quite generic. It works by finding a table whose name matches the class of the object we are inserting and reading that table’s column names from its schema. These column names are then matched against the property names on our entities.

We then use FastMember’s ObjectReader to create an IDataReader from the list of entities we are bulk inserting. Only properties whose names match a column name in the destination table are mapped. FastMember gives us high-performance access to the properties of a generic type, much faster than going through ordinary reflection.

We then load the IDataReader into a DataTable and we’re ready to perform a SqlBulkCopy.
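
Calling the bulk insert from the data service is then a one-liner. Here is a hypothetical usage, reusing the Order entity from the earlier sketch and assuming the destination table is also named Order:

// "orders" might be a list of entities deserialized from an incoming REST request.
// Insert<T> infers T as Order and targets the table whose name matches the type name.
var orders = new List<Order>
{
    new Order { Reference = "A-001" },
    new Order { Reference = "A-002" }
};

BulkOperation.Insert(orders);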

Closing thoughts

Once I had integrated the above into my data service, I found that the refactored client service outperformed the previous implementation by about 20% when bulk inserting a million entities, even with an additional service and HTTP communication now involved. I was quite pleased with the results, as were my users.

The classes used to do this have been around for a while, but in using them I didn’t have to resort to hand-crafting SQL statements, and the mechanism is generic enough to work for new types of entity processed by the service in the future. I did have to work around an aspect of Entity Framework, but not in a way that breaks with the spirit of it or gives up the advantages it brings.

About the Author Kirk MacPhee

An experienced software developer and technical lead, specializing in automation technologies and their application.
