Effectively Pivoting data–analysis (part 2)

pivot_timeconsumption2

This diagram shows the Pivot Table’s time consumption after the first optimizations. So let’s have a closer look at the CalculatePivot method.

Analyze CalculatePivot

In that method the Pivot Table is generated. That means that values of one column are combined into multiple columns based on an identifier, forming a new DataTable.

The method is iterating over all loaded data, to create rows based on values in PARTID-column and then set the target columns value.
As known from first step that can be joined with that loop inside FetchData method.
The return table structure is currently built after receiving all data, but if the data should be filled at receipt, that must also be done there.
After some more small modifications the time consumption is still rather high and there is only one part left I haven’t touched yet.

At the Beginning of CalculatePivot method it is checked whether an row already exists or it has to be created. The DataTable.Select() method seems to be rather inefficient at finding rows by it’s primary key, especially because it is returning a collection of DataRows.
It is recommended to query primary keys with on of the DataTable.Rows.Find() methods, and well, I have to say it’s worth it.
But is that really the fastest way to get a DataRow? No!

You can get a approximately 4-times boost by tracking your primary key information inside a separate dictionary and query against that.
But don’t forget to remove the primary key definition from data table to get the full benefit.

Following test results are based on building a DataTable with 10.000 rows, querying each 100 times.

DataTable.Select() 40767 ms
DataTable.Rows.Find() 21980 ms
Dictionary with Primary Key 20239 ms
Dictionary without Primary Key 11637 ms

This data shows, that you shouldn’t always rely on the methods provided by .NET Framework. There is often a compromise to be agreed when optimizing method for multiple possible usages. A self implemented method, specialized on that what you currently need, is often much faster.

Interestingly the same applies on DataTable.Columns.Contains(string key) method, the own caching solution is a bit faster than the provided Framework method.

pivot_timeconsumption3

A new comparison shows that only two of the initial methods are left. By moving the major parts into FetchData method, 99% of time used, is spend there. But compared to the first implementation of Pivot Table process, it completes in only 2.4% of time. That equates to a 40-times improvement.