Lookup Table in Power Query

Nolock
4 min read · Mar 6, 2021

I have prepared an article about transitive closure in Power Query. I wanted to publish it right away, but I decided to wait a little bit and first write about a performance boost I used in that code: a lookup table in Power Query.

Imagine the following scenario: you have a recursive algorithm that performs maybe thousands of joins. There is just one table, which you must join to itself in every step of the recursion, and the join always returns at most one value. That can't be performant if you run a full join in every step.
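To make the scenario concrete, here is a minimal sketch of such a step (the table and names are illustrative, not the transitive-closure code itself): every iteration filters the edge table again just to find a single successor.

let
    Edges = Table.FromRecords({[From = "a", To = "b"], [From = "b", To = "c"]}),
    // Every recursion step scans the whole table to find at most one match.
    NextNode = (node as text) as nullable text =>
        let
            Matches = Table.SelectRows(Edges, (row) => row[From] = node)
        in
            if Table.IsEmpty(Matches) then null else Matches{0}[To]
in
    NextNode("a") // returns "b"; a real recursion repeats this thousands of times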

Therefore, I was searching for another solution, and the solution is a lookup table. But how do you build a lookup table in Power Query? Well, it is simpler than expected: with a record. Even a record with a few thousand fields is very fast in comparison to a table. Maybe records are implemented as an associative array; I'm just guessing.

The record data type has a very nice feature: you can concatenate two records into a new one. Fields of the first record are overridden if they also appear in the second record (the second one wins). Example:

[a = 1] & [b = 2] => [a = 1, b = 2]
[a = 1] & [a = 2] => [a = 2]

The code below shows how we can build such a lookup table as a record with the help of List.Accumulate.

SourceList = List.Transform({0..CountOfUniqueNodes}, Text.From),
AsRecord = List.Accumulate(
    SourceList,
    [],
    (state, current) =>
        state &
        Expression.Evaluate(
            "[" & current & "= current]",
            [current = current]
        )
)

Thanks to Expression.Evaluate we can create a record field with an arbitrary name. In my case we end up with a record of the form [0 = 0, 1 = 1, 2 = 2, 3 = 3, …] where all field names and values are strings.
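As an aside, if you prefer to avoid Expression.Evaluate, I believe Record.FromList can build the same record in a single call, since its second argument accepts a list of field names. A hedged alternative sketch, not the code measured below:

// Field names and values are both taken from SourceList (all strings).
AsRecordAlternative = Record.FromList(SourceList, SourceList)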

An equivalent table is much simpler to create:

AsTable = Table.Buffer(
    Table.FromList(
        List.Transform(SourceList, each [From = _, To = _]),
        Record.FieldValues,
        {"From", "To"}
    )
)
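For orientation, a single lookup against each structure looks like this (just a sketch with "42" as an example key, mirroring the measurement functions further below): the record variant is a direct field access, while the table variant has to scan rows.

LookupViaRecord = Record.FieldOrDefault(AsRecord, "42"),
LookupViaTable = Table.SelectRows(AsTable, (row) => row[From] = "42"){0}[To]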

Performance test

One of the best features of Power Query is that it is lazy, and I love that laziness. But now I have to push the Power Query engine to evaluate the steps in the order I want.

StartTimestamp = DateTime.LocalNow(),
Run =
    if StartTimestamp = null then
        null
    else
        List.Sum(
            List.Generate(
                () => 0,
                each _ < CountOfIterations,
                each _ + 1,
                AppliedMethod[Fnc]
            )
        ),
EndTimestamp =
    if Run = null then
        null
    else
        DateTime.LocalNow(),
Result =
    [
        Count Of Unique Nodes = CountOfUniqueNodes,
        Count Of Iterations = CountOfIterations,
        Used Method = AppliedMethod[Name],
        Run = Run,
        Runtime = EndTimestamp - StartTimestamp
    ]

This code guarantees that the steps are executed in exactly this order: StartTimestamp, Run, EndTimestamp. Each step performs a null check on the previous one, which creates a real dependency that the engine cannot skip or reorder.

Let’s melt the processor

This is the whole code I use for melting down the processor.

let
    CountOfUniqueNodes = 1000,
    CountOfIterations = 1000000,

    // Lookup via a record: direct field access by name.
    AppliedMethod = [
        Name = "Record",
        Fnc = (_) => Value.FromText(Record.FieldOrDefault(AsRecord, Number.ToText(Number.Mod(_, CountOfUniqueNodes))))
    ],

    // Lookup via a table: a row scan per lookup (uncomment to test this variant).
    /*AppliedMethod = [
        Name = "Table",
        Fnc = (item) =>
            Value.FromText(
                Table.SelectRows(
                    AsTable,
                    (row) => row[From] = Number.ToText(Number.Mod(item, CountOfUniqueNodes))
                ){0}[To]
            )
    ],*/

    SourceList = List.Transform({0..CountOfUniqueNodes}, Text.From),
    AsRecord = List.Accumulate(
        SourceList,
        [],
        (state, current) =>
            state &
            Expression.Evaluate(
                "[" & current & "= current]",
                [current = current]
            )
    ),
    AsTable = Table.Buffer(
        Table.FromList(
            List.Transform(SourceList, each [From = _, To = _]),
            Record.FieldValues,
            {"From", "To"}
        )
    ),

    StartTimestamp = DateTime.LocalNow(),
    Run =
        if StartTimestamp = null then
            null
        else
            List.Sum(
                List.Generate(
                    () => 0,
                    each _ < CountOfIterations,
                    each _ + 1,
                    AppliedMethod[Fnc]
                )
            ),
    EndTimestamp =
        if Run = null then
            null
        else
            DateTime.LocalNow(),

    Result = [
        Count Of Unique Nodes = CountOfUniqueNodes,
        Count Of Iterations = CountOfIterations,
        Used Method = AppliedMethod[Name],
        Run = Run,
        Runtime = EndTimestamp - StartTimestamp
    ]
in
    Result

I have a table with 1 000 nodes and just as many directed edges. Each node points to itself: 0 -> 0, 1 -> 1, and so on. That is nonsense in real life, but it is good enough for a performance test.

And I invoke a lookup via the record 1 000 000 times.

Wow, it took just 1.57 s on my machine. That's 1.57 µs per lookup on average. Not bad!

Now I want to do the same with a lookup via a table.

… still waiting … the processor is melting …

It finished after 6 minutes and 34 seconds. That is roughly 394 µs per lookup on average, about 250 times slower than the record. Hmm, I was expecting a bad result, but not that bad.

Another test with 10 000 nodes (10 times more than before) and 1 000 000 iterations via a record.

You see, it slows down a little bit. But hey, it is still a breathtaking result. And a graph with 10 000 nodes (the number of edges doesn't matter for this test) is not a small one.

Next time, you will see how I apply this lookup table in a transitive closure.
