Aggregate Tables

I’ll be gathering all my notes on aggregate tables in this post.

What are aggregate tables?

Aggregate tables, also know as summary tables, are fact tables which contain data that has been summarized up to a different level of detail. For example, let’s say that your data warehouse contains a transaction table with the following characteristics (I’ll use a banking example):

Table dimensionality: account id, transaction type, day id, transaction amount
Average number of transactions per day: 30 million
Number of days stored in the transaction table: 30
Approximate number of rows: 900 million rows

Let’s pretend that half of the daily transactions are deposits, so there are approximately 450 million rows that represent deposit transactions. The other half are withdrawals.

Suppose a DW user wants to know how much money was deposited into the bank during the past month. The user, through the reporting software, will issue a query similar to:

select sum(transaction_amount)
from transaction_fact
where transaction_type=’deposit’

Pretend that your DW platform can scan 10 million rows per second; therefore, the approximate time to complete the query will be:

query time = number of rows / scan rate

which in our example translates into:

query time = 900 million rows / 10 million rows/second

query time = 90 seconds

Waiting 90 seconds for such a simple query is simply unacceptable, so here is where an aggregate table can help you. To answer our hypothetical question, we will build an aggregate table which summarizes the transaction table by transaction type. The aggregate may be defined as follows:

create table fact_transaction_aggregate as
select day_id, transaction_type, sum(transaction_amount) as transaction_amount
from transaction_fact
group by day_id, transaction_type

We said before that there are only two transaction types and thirty days of data. Using the simplifying assumption that half of the daily transactions are deposits and half are withdrawals, the size of the new table will be only 60 rows! (table size = 30 days * 2 transaction types)

The SQL needed to get the answer is:

select sum(transaction_amount)
from transaction_aggregate
where transaction_type=’deposit’

And the answer will come back in 0.000006 seconds (60 rows / 10 million rows /second). The result: happy users!

Comments are closed.