From SQL to DAX: Filtering Data

Categories

Consider the following SQL syntax:

SELECT * FROM DimProductWHERE Color = 'Red'

It corresponds to this DAX query using FILTER:

EVALUATEFILTER ( Product, Product[Color] = "Red" )

Copy Conventions#1

You can also use a DAX query using CALCULATETABLE:

EVALUATECALCULATETABLE ( Product, Product[Color] = "Red" )

Copy Conventions#2

In case of a simple SQL query like the initial query, there are no semantic differences between the two corresponding DAX options. However, we see a different behavior when other expressions or a more complex filter condition are present.

Consider a double filter like this one:

SELECT * FROM DimProductWHERE Color = 'Red' AND ListPrice > 1000

You have two options using FILTER. The simplest one is applying the same logical expression to a single FILTER:

EVALUATEFILTER (    Product,    AND ( Product[Color] = "Red", Product[ListPrice] > 1000 ))

Copy Conventions#3

With this approach, the logical conditions (color red and list price greater than 1000) are applied to every row in Product, even if some optimization might happen internally to reduce the number of comparisons made.

This is the corresponding solution using CALCULATETABLE:

EVALUATECALCULATETABLE (    Product,    Product[Color] = "Red",    Product[List Price] > 1000)

Copy Conventions#4

The filter arguments in CALCULATETABLE are always put in a logical AND condition. Even if the results produced by FILTER and CALCULATETABLE are identical, the performance might be different. The CALCULATETABLE could be faster if the number of values returned by each logical condition on single columns is relatively small (more details in the Understanding DAX Query Plans white paper). In fact, every filter argument in CALCULATETABLE corresponds to an internal FILTER on a single column. The previous CALCULATETABLE syntax is internally rewritten as:

EVALUATECALCULATETABLE (    Product,    FILTER (        ALL ( Product[Color] ),        Product[Color] = "Red"    ),    FILTER (        ALL ( Product[List Price] ),        Product[List Price] > 1000    ))

Copy Conventions#5

Do not assume that CALCULATETABLE is good and FILTER is bad. Every CALCULATETABLE internally incorporates one or several FILTER, and performance is generally determined by the granularity of each FILTER (even if the DAX engine might perform further optimization).

In the case of an OR condition, you should write the entire logical condition as a single condition even in CALCULATETABLE, including a FILTER condition inside. Consider the following SQL statement:

SELECT * FROM DimProductWHERE Color = 'Red' OR Weight > 100

The corresponding DAX syntax using FILTER is:

EVALUATEFILTER (    Product,    OR ( Product[Color] = "Red", Product[Weight] > 1000 ))

Copy Conventions#6

The corresponding CALCULATETABLE version requires an explicit FILTER argument in order to specify a logical condition that includes two columns of the same table (in this case, no automatic FILTER rewriting is possible).

EVALUATECALCULATETABLE (    Product,    FILTER (        Product,        OR ( Product[Color] = "Red", Product[Weight] > 1000 )    ))

Copy Conventions#7

At this point, you might think that FILTER is simpler to use. However, you should always consider the CALCULATETABLE option, because of the different behavior when you have nested calculations.

For example, consider the following SQL query that returns red products with a total sales amount greater than 100,000 and an average sales amount greater than 3,000 in the calendar year 2006.

SELECT *FROM DimProduct pWHERE    Color = 'Red'    AND ( SELECT SUM([SalesAmount])          FROM [FactInternetSales] s          INNER JOIN DimDate d            ON s.OrderDateKey = d.DateKey          WHERE s.ProductKey = p.ProductKey            AND d.CalendarYear = 2006        ) > 100000    AND ( SELECT AVG([SalesAmount])          FROM [FactInternetSales] s          INNER JOIN DimDate d            ON s.OrderDateKey = d.DateKey          WHERE s.ProductKey = p.ProductKey            AND d.CalendarYear = 2006        ) > 3000

The equivalent DAX expression using only FILTER is shorter than the SQL expression. However, you would still specify the Calendar Year filter in two calculations (sum and average):

EVALUATEFILTER (    FILTER ( Product, Product[Color] = "Red" ),    AND (        CALCULATE (            SUM ( 'Internet Sales'[Sales Amount] ),            'Date'[Calendar Year] = 2006        ) > 100000,        CALCULATE (            AVERAGE ( 'Internet Sales'[Sales Amount] ),            'Date'[Calendar Year] = 2006        ) > 3000    ))

Copy Conventions#8

Using CALCULATETABLE, the filter arguments (color and calendar year) are applied to the entire expression specified in the first argument. For this reason, the two CALCULATE expressions in the FILTER of the following DAX query do not have to include the filter on calendar year, because it is “inherited” from the outer CALCULATETABLE filters.

EVALUATECALCULATETABLE (    FILTER (        Product,        AND (            CALCULATE ( SUM ( 'Internet Sales'[Sales Amount] ) ) > 100000,            CALCULATE ( AVERAGE ( 'Internet Sales'[Sales Amount] ) ) > 3000        )    ),    Product[Color] = "Red",    'Date'[Calendar Year] = 2006)

Copy Conventions#9

Using CALCULATETABLE you propagate a filter to all the expressions embedded in the first argument. This results in shorter syntax simplifying the query, even if it requires particular attention – outer CALCULATETABLE and CALCULATE statements manipulate the filter context and affect any inner expression.