In this blog I explain how you can improve the performance of your Power BI report. For a long time I was struggling with a badly performing report I used for a training. Now that I have finally found the solution, I'm happy to share my insights with you.
So if your Power BI report is running slow, or you want to prevent it from running slow, you should probably optimize your data model. This is usually where the most performance gains can be made. Reducing the memory footprint of your model has a direct impact on the performance of your reports and dashboards. To find out which optimizations potentially have the most impact, you need information about the memory used by your data model. VertiPaq*1 Analyzer is the tool to accomplish this. The tool is available from SQLBI; personally I prefer the version that is integrated in DAX Studio.
In this blog I work with an example database called ‘Mrbean’. It holds data on customers and their orders for a fictitious coffee retailer in the Netherlands.
Perform an analysis
Open your report in Power BI Desktop. Then launch DAX Studio (from the External Tools tab) to analyze the memory used by your data model.
Within DAX Studio, select View Metrics (Advanced tab) to perform the analysis; note that you have to be connected to a model first (Home tab > Connect). The results are discussed in the next section.
It is also possible to generate a VPAX file in DAX Studio (use Export Metrics). This file is very small and can be shared easily, making it possible to do a quick analysis of a model for which you can't access the PBIX file.
Interpret the results
Before you start investigating the results, first look at the Summary tab. Here you'll find the total in-memory size of your model in MB. In the example the model is 160 MB, which is pretty large.
The Tables tab provides an overview of the tables in the data model. It displays the total size of each table, as well as the percentage of the total model size each table takes up. In this example the Orders table accounts for 95% of the total size. By expanding the view of the Orders table, it becomes clear that the orders_ID column is the primary cause. This is also shown on the Columns tab, which lists the most expensive columns in the database.
Performance optimization steps
Based on the results of the analysis, I took the following five steps to optimize the model.
- Disable auto date/time
If the table overview shows tables starting with LocalDateTable, auto date/time is enabled. This means Power BI generates a hidden calculated date table for each date column in your model, and these tables increase the model size. Therefore it is better to create your own date table and disable this setting for the file. In this example this saves only 4 MB (2.5%), but the impact can be much bigger.
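Such a custom date table can be built directly in Power Query. Below is a minimal sketch in M; the date range and column names are my own illustration, not taken from the Mrbean model:

```m
let
    // illustrative range; in practice derive it from your fact table
    StartDate = #date(2018, 1, 1),
    EndDate = #date(2025, 12, 31),
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    // one row per calendar day, no gaps
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type)
in
    AddMonth
```

After loading, mark the result as a date table in Power BI Desktop and disable the auto date/time option, so the hidden LocalDateTable tables are no longer generated.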
- Choose the right datatype for a column
VertiPaq is datatype independent: it doesn't matter whether a column has a text, bigint or float data type, VertiPaq creates a dictionary for each of those columns. The size of this dictionary determines the performance, both in terms of speed and of allocated memory.
Since the orders_ID column has the largest share in the size of the model, this is the place to start optimizing. It is a string column with many distinct values, which means low compression. VertiPaq uses the hash algorithm*2 to store the data, resulting in a large dictionary size (106 MB). A column with many distinct values like this is referred to as having a high cardinality.
With an integer data type, value encoding may be applied. This is why I converted the orders_ID column to an integer, replacing the store (S) and webshop (W) indicator in the orders_ID with a number. Unfortunately, what I expected did not happen: there was no change in dictionary size. Hash encoding was still applied to compress the column, because the orders_ID values fluctuate significantly within their range (81,000 to 9,999,990).
To overcome this problem I created an index column based on the orders_ID. This change was made in the data source. Now the number range is smaller (1 to 3,252,459) and the numbers increase linearly, making VertiPaq treat the column like a primary key and choose value encoding. This has a major impact on the size of the data model: the 106 MB dictionary disappears and the hierarchy size decreases tremendously (to 13 MB), resulting in a much smaller model size of 45 MB.
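I made this change in the data source, but if you can't change the source, a similar index can be created as a Power Query step. A sketch in M, assuming the Orders query already exists; the step and column names are illustrative:

```m
let
    // sort first so the index assignment is deterministic for the same data
    Sorted = Table.Sort(Orders, {{"orders_ID", Order.Ascending}}),
    // sequential integers starting at 1: a small, linearly increasing
    // range that lets VertiPaq choose value encoding
    Indexed = Table.AddIndexColumn(Sorted, "orders_index", 1, 1, Int64.Type),
    // the original high-cardinality string column is no longer needed
    Removed = Table.RemoveColumns(Indexed, {"orders_ID"})
in
    Removed
```

If orders_ID is used in a relationship to another table, keep it until a matching index column exists on the other side as well.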
Enabling the right encoding is also applicable to other columns. The customers_ID column in the Orders table also consumes a considerable amount of memory. This is an integer column stored as text. After changing the data type, value encoding is applied. However, because the customers_ID has a lot of digits, the size needed to store the data remains high. Replacing the customers_ID column with an index column in the Orders and Customers tables in the data source solves this problem, saving another 3 MB in model size.
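The data type change itself is a one-line Power Query step. A sketch in M, assuming a previous step named Source; the column name comes from the example model:

```m
// store customers_ID as a 64-bit integer instead of text,
// so VertiPaq can consider value encoding for the column
ChangedType = Table.TransformColumnTypes(Source, {{"customers_ID", Int64.Type}})
```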
- Select the appropriate decimal data type
If your decimal numbers have a precision of four or fewer decimals, it is recommended to use the fixed decimal number data type. This data type is internally stored as an integer and can therefore be stored more efficiently. Applying this to several columns (e.g. wholesale price, consumer price, margin and discount) saved 1 MB in the data model.
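In Power Query, the fixed decimal number data type corresponds to Currency.Type. A sketch in M for the columns mentioned above, assuming a previous step named Source; the exact column names are illustrative:

```m
// fixed decimal (four decimal places) is stored internally as an integer
ChangedToFixed = Table.TransformColumnTypes(Source, {
    {"wholesale price", Currency.Type},
    {"consumer price", Currency.Type},
    {"margin", Currency.Type},
    {"discount", Currency.Type}
})
```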
- Remove unnecessary columns
A logical step in optimizing your data model is to remove the columns you don't need, especially the ones with a high column size. Don't let the thought that you might potentially need a column later stop you: if you remove a column in the Power Query editor, you can always undo this later. In this example birthday is not needed, because there is also an age column available. For the geographical analysis only the customer's region is needed, so latitude, longitude and city can be removed. Finally, customer type can be removed from the Orders table, because it is a customer property already available in the Customers table. This brings the total model size down to 40 MB.
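These removals translate into one step per query in the Power Query editor. A sketch in M, assuming previous steps named Source in the Customers and Orders queries; the column names follow the example:

```m
// Customers query: age and region are kept, the rest is redundant
RemovedFromCustomers = Table.RemoveColumns(Source,
    {"birthday", "latitude", "longitude", "city"})

// Orders query: customer type is already available in the Customers table
RemovedFromOrders = Table.RemoveColumns(Source, {"customer type"})
```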
- Set available in MDX to false
Another way to optimize your model is to look at the size of the automatically generated hierarchy of a column (used by MDX). These hierarchies are used to improve the user experience in Excel and other client applications that run MDX queries against tabular models and Power BI datasets. Their size can be reduced to zero in Tabular Editor by setting the Available In MDX option of the column to false. After applying this to the columns with the largest hierarchy size (orders_ID and customers_ID), the model size decreased to 28 MB.
Within Tabular Editor, select a column and go to the options pane on the right, where you can set Available In MDX to False.
Together, the five optimization steps reduced the model size by 82.5%, from 160 MB to 28 MB. The performance problems I often faced during my trainings with this dataset have now disappeared.
Lessons learned while working on this article
- After performing optimization steps in the Power Query editor, you have to close and re-open the PBIX file in order to see the real size of the data model. Otherwise the string columns will by default show an allocated size of 1 MB, making you think that your optimization steps didn't have the expected impact.
- Sometimes all properties of a column seem suited for value encoding, but VertiPaq still uses the hash algorithm. Changing the encoding hint setting of the column in Tabular Editor to Value may help to solve this problem. To be able to change this setting, you have to select the Allow unsupported Power BI features (experimental) option under Preferences (in the File menu).
- Sometimes the encoding changed after I modified columns used in a relationship: value encoding switched back to hash encoding, bringing the dictionary size back into the model. When I created the relationships in the data model only after performing all optimization steps, this behavior didn't occur.