Effectively Pivoting Data–Analysis

Before you start thinking about possible optimizations, fetch some performance data. My experience told me that all kind of considerations would be totally useless before you have gathered some performance analysis.

I used the existing methods as coarse orientation points, by measuring their execution time for 10000 parts each with 100 values.

pivot_timeconsumption

The major part of execution time (86.1%) is used by GetColumns method, so I’d expect the best optimization potential there.

The methods I am analyzing can be found here: https://github.com/dwonisch/PivotExample/blob/master/src/PivotExample/Pivot.cs

Analyze GetColumns

First step would be to analyze the method’s expected behavior. This method should provide the information to create the Pivot Table Columns and is using the configuration and the completely fetched set of data.

What you can see is, that the method is iterating through all data rows, in this example there are 1 million, checking all configured columns. That sums up to 100 million iterations to get a total of 101 columns returned.

So what are the advantages of iterating over all data rows. There is only one I could think of: Not showing columns were no data would be available for. In my opinion the complete method can be skipped, because the Pivot Columns are already defined by the user and it would be more appropriate to show an empty column than to hide it.

But lets go on with the decision to hide the columns, there is another place were we are iterating through all data already, the FetchData method. Maybe we could get a performance boost by refactoring the GetColumns method, so that it can be used inside the FetchData iteration.
This change fastened the complete execution up, whoops, it actually decreased performance by 14% (what I didn’t really expect). The next part of that method is, that the configuration parameter is iterated over multiple times. Replacing it with a dictionary should fasten it up a little bit and I was right, introducing dictionaries reduced query time by 87%.

As you can see the processing time from GetColumns moved into FetchData and could be greatly reduced. Now it’s time to have a look at the next method.

pivot_timeconsumption2

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>