<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Warehouse Performance Tuning &#187; Tips</title>
	<atom:link href="http://www.atlogic.com/blog/index.php/category/tips/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.atlogic.com/blog</link>
	<description>Searching for profound knowledge</description>
	<lastBuildDate>Tue, 27 Jan 2009 16:02:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>MySQL vs. PostgreSQL for Data Warehouses that use Views</title>
		<link>http://www.atlogic.com/blog/index.php/2006/11/07/mysql-vs-postgresql-for-data-warehouses-that-use-views/</link>
		<comments>http://www.atlogic.com/blog/index.php/2006/11/07/mysql-vs-postgresql-for-data-warehouses-that-use-views/#comments</comments>
		<pubDate>Tue, 07 Nov 2006 19:25:55 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/index.php/2006/11/07/mysql-vs-postgresql-for-data-warehouses-that-use-views/</guid>
		<description><![CDATA[As part of the Netezza replacement project I&#8217;ve been working on, I ran a quick evaluation of MySQL 5.x before testing PostgreSQL. I didn&#8217;t have time to conduct a thorough evaluation, but I quickly found out that MySQL does not handle queries run against views very well. I ran some tests by creating simple views [...]]]></description>
			<content:encoded><![CDATA[<p>As part of the Netezza replacement project I&#8217;ve been working on, I ran a quick evaluation of MySQL 5.x before testing PostgreSQL. I didn&#8217;t have time to conduct a thorough evaluation, but I quickly found out that MySQL does not handle queries run against views very well. I ran some tests by creating simple views against some very large tables (40+ million rows). Running a query against the base table was many times faster than running the same query against a view that did not even contain a WHERE clause. I&#8217;ll need to do some additional research, but it seems that the implementation of views in MySQL lags far behind the PostgreSQL implementation. This is not very surprising, since support for views was added to MySQL only in version 5.0.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2006/11/07/mysql-vs-postgresql-for-data-warehouses-that-use-views/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Performance Checklist</title>
		<link>http://www.atlogic.com/blog/index.php/2006/11/07/postgresql-performance-checklist/</link>
		<comments>http://www.atlogic.com/blog/index.php/2006/11/07/postgresql-performance-checklist/#comments</comments>
		<pubDate>Tue, 07 Nov 2006 19:13:16 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/index.php/2006/11/07/postgresql-performance-checklist/</guid>
		<description><![CDATA[I found some very valuable tips on the website of an unpublished book: Power PostgreSQL by Josh Berkus and Joe Conway. If you are working with large data sets and complex queries, pay special attention to the section that discusses work_mem.
]]></description>
			<content:encoded><![CDATA[<p>I found some very valuable tips on the website of an unpublished book: <a title="PostgreSQL Performance Checklist" href="http://www.powerpostgresql.com/PerfList/">Power PostgreSQL</a> by Josh Berkus and Joe Conway. If you are working with large data sets and complex queries, pay special attention to the section that discusses work_mem.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2006/11/07/postgresql-performance-checklist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tuning PostgreSQL for Data Warehousing</title>
		<link>http://www.atlogic.com/blog/index.php/2006/10/26/tuning-postgresql-for-data-warehousing/</link>
		<comments>http://www.atlogic.com/blog/index.php/2006/10/26/tuning-postgresql-for-data-warehousing/#comments</comments>
		<pubDate>Thu, 26 Oct 2006 18:15:37 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=30</guid>
		<description><![CDATA[I&#8217;m currently working on a project that will replace a Netezza database with PostgreSQL 8.1. Early test results have been very encouraging, even though I haven&#8217;t found much information online about how best to tune PostgreSQL when dealing with complex queries running against very large tables. I will share what I learn in future posts.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently working on a project that will replace a Netezza database with PostgreSQL 8.1. Early test results have been very encouraging, even though I haven&#8217;t found much information online about how best to tune PostgreSQL when dealing with complex queries running against very large tables. I will share what I learn in future posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2006/10/26/tuning-postgresql-for-data-warehousing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tuning Methodology</title>
		<link>http://www.atlogic.com/blog/index.php/2006/02/24/tuning-methodology/</link>
		<comments>http://www.atlogic.com/blog/index.php/2006/02/24/tuning-methodology/#comments</comments>
		<pubDate>Fri, 24 Feb 2006 15:22:02 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=8</guid>
		<description><![CDATA[
The first step is to identify the root causes of the problem. The IT team may claim that the reporting software does not work well due to the complexity of the database schema, while the business users may claim that even simple reports take forever to run. Regardless of what both groups say, you must [...]]]></description>
			<content:encoded><![CDATA[<ol>
<li>The first step is to identify the root causes of the problem. The IT team may claim that the reporting software does not work well due to the complexity of the database schema, while the business users may claim that even simple reports take forever to run. Regardless of what both groups say, you must gather hard data so that the real issues reveal themselves. The first question you must ask the IT team is: are you capturing DW usage statistics? If usage statistics are not available, skip to the next step. If usage is being tracked, you should be able to answer the following questions:</li>
<ol style="list-style-type: lower-roman" type="i">
<li>Who are the top report users?</li>
<li>Who are the top report creators?</li>
<li>Which users generate the most errors?</li>
<li>What are the most common errors?</li>
<li>What are the most popular reports?</li>
<li>Which reports take the longest to run?</li>
<li>Which database tables are used most frequently?</li>
<li>What is the row count of the most frequently used tables?</li>
<li>What table columns are used the most?</li>
</ol>
</ol>
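<p>Several of the questions above reduce to simple grouping queries once usage is being logged. The following is only a sketch: the <code>report_usage_log</code> table, its columns, and the sample rows are hypothetical, and SQLite stands in for whatever database actually holds your usage statistics.</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical usage log; real reporting tools expose different tables and columns.
conn.execute(
    "create table report_usage_log ("
    "user_name text, report_name text, run_seconds integer, error_message text)"
)
conn.executemany(
    "insert into report_usage_log values (?, ?, ?, ?)",
    [
        ("ana", "daily_sales", 120, None),
        ("ana", "daily_sales", 140, None),
        ("ana", "inventory", 15, None),
        ("ben", "daily_sales", 130, "timeout"),
        ("ben", "margin_by_region", 600, None),
    ],
)

# Who are the top report users?
top_users = conn.execute(
    "select user_name, count(*) as runs from report_usage_log "
    "group by user_name order by runs desc, user_name"
).fetchall()

# Which reports take the longest to run (on average)?
slowest_reports = conn.execute(
    "select report_name, avg(run_seconds) as avg_seconds from report_usage_log "
    "group by report_name order by avg_seconds desc"
).fetchall()

# What are the most common errors?
most_common_errors = conn.execute(
    "select error_message, count(*) as occurrences from report_usage_log "
    "where error_message is not null "
    "group by error_message order by occurrences desc"
).fetchall()

print(top_users)
print(slowest_reports)
print(most_common_errors)
```
</pre>
<p>The remaining questions (most popular reports, most frequently used tables and columns) follow the same group-and-count pattern, provided the log records those attributes.</p>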
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2006/02/24/tuning-methodology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance Tuning Guide</title>
		<link>http://www.atlogic.com/blog/index.php/2006/02/23/performance-tuning-guide/</link>
		<comments>http://www.atlogic.com/blog/index.php/2006/02/23/performance-tuning-guide/#comments</comments>
		<pubDate>Thu, 23 Feb 2006 15:25:32 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=4</guid>
		<description><![CDATA[I&#8217;m using this blog to create a performance tuning guide for data warehousing professionals. The guide will document the steps I follow to diagnose and solve what my clients perceive to be performance problems.
Let me first set a typical scenario that I have found during my consulting career. After several years of development, a large [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m using this blog to create a performance tuning guide for data warehousing professionals. The guide will document the steps I follow to diagnose and solve what my clients perceive to be performance problems.</p>
<p>Let me first set a typical scenario that I have found during my consulting career. After several years of development, a large corporation has finally managed to implement a corporate-wide data warehouse (DW). A dedicated IT development group has built and deployed a reporting solution on top of the DW. Unfortunately, the users of the DW, typically on the business side of the corporation, are very unhappy with the reporting environment. They claim that performance is extremely slow. Reports take forever to run. Many reports return cryptic errors that only IT can decipher. The IT group is bombarded with support requests, and must allocate most of its time to answering support calls. Furthermore, since the corporation&#8217;s business is changing constantly, business users demand that the DW evolve to reflect how the business works. However, since IT spends most of its time fixing problems with the existing reporting environment, it cannot implement new features in the DW.</p>
<p>The scenario described above creates a dynamic whereby business users grow increasingly disillusioned with the DW and the IT group in charge of it, while the IT team grows frustrated with the business users and their insatiable demand for support and feature improvements. Both groups become frustrated with the reporting software, which is the most visible part of the DW.</p>
<p>When the situation becomes critical enough, the IT team decides to bring an external consultant to &#8220;fix the problem&#8221;.</p>
<p>In my next post I will list the high-level steps that a consultant must follow to solve the problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2006/02/23/performance-tuning-guide/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DB2 Configuration Parameters for Data Warehousing</title>
		<link>http://www.atlogic.com/blog/index.php/2005/06/01/db2-configuration-parameters-for-data-warehousing/</link>
		<comments>http://www.atlogic.com/blog/index.php/2005/06/01/db2-configuration-parameters-for-data-warehousing/#comments</comments>
		<pubDate>Thu, 02 Jun 2005 04:22:25 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=29</guid>
		<description><![CDATA[An essential part of the performance tuning process is verifying that your database is configured properly. Scott Hayes and Philip Gunning provide detailed tips for setting DB2 configuration parameters in Tuning Up for OLTP and Data Warehousing.
]]></description>
			<content:encoded><![CDATA[<p>An essential part of the performance tuning process is verifying that your database is configured properly. Scott Hayes and Philip Gunning provide detailed tips for setting DB2 configuration parameters in <a href="http://www.db2mag.com/db_area/archives/2002/q3/hayes.shtml">Tuning Up for OLTP and Data Warehousing</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2005/06/01/db2-configuration-parameters-for-data-warehousing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Measure, Improve, Repeat</title>
		<link>http://www.atlogic.com/blog/index.php/2005/05/24/measure-improve-repeat/</link>
		<comments>http://www.atlogic.com/blog/index.php/2005/05/24/measure-improve-repeat/#comments</comments>
		<pubDate>Wed, 25 May 2005 04:21:51 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=28</guid>
		<description><![CDATA[Scott Hayes wrote Measure, Improve, Repeat, an article that covers some practical advice on tuning DB2. The article does not focus on data warehouse applications, but some of the tips are still valid. I found his tip about getting all performance data with just one command particularly useful:
$ db2 "get snapshot for all on DBNAME" [...]]]></description>
			<content:encoded><![CDATA[<p>Scott Hayes wrote <a href="http://www.db2mag.com/story/showArticle.jhtml;?articleID=161601940">Measure, Improve, Repeat</a>, an article that covers some practical advice on tuning DB2. The article does not focus on data warehouse applications, but some of the tips are still valid. I found his tip about getting all performance data with just one command particularly useful:</p>
<p>$ db2 "get snapshot for all on DBNAME" &gt; allsnap.txt</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2005/05/24/measure-improve-repeat/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teradata data compression</title>
		<link>http://www.atlogic.com/blog/index.php/2005/05/24/teradata-data-compression/</link>
		<comments>http://www.atlogic.com/blog/index.php/2005/05/24/teradata-data-compression/#comments</comments>
		<pubDate>Wed, 25 May 2005 04:03:00 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=27</guid>
		<description><![CDATA[One of the interesting ways performance can be improved in Teradata is through data compression. The linked article mentions some of the details behind data compression.
]]></description>
			<content:encoded><![CDATA[<p>One of the interesting ways performance can be improved in Teradata is through <a href="http://www.teradataforum.com/l020829a.htm">data compression</a>. The linked article mentions some of the details behind data compression.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2005/05/24/teradata-data-compression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Aggregate Tables</title>
		<link>http://www.atlogic.com/blog/index.php/2005/05/20/aggregate-tables/</link>
		<comments>http://www.atlogic.com/blog/index.php/2005/05/20/aggregate-tables/#comments</comments>
		<pubDate>Fri, 20 May 2005 19:01:00 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=26</guid>
		<description><![CDATA[I&#8217;ll be gathering all my notes on aggregate tables in this post.
What are aggregate tables?

Aggregate tables, also known as summary tables, are fact tables which contain data that has been summarized up to a different level of detail. For example, let&#8217;s say that your data warehouse contains a transaction table with the following characteristics (I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be gathering all my notes on aggregate tables in this post.</p>
<p><strong>What are aggregate tables?</strong><br />
Aggregate tables, also known as summary tables, are fact tables which contain data that has been summarized up to a different level of detail. For example, let&#8217;s say that your data warehouse contains a transaction table with the following characteristics (I&#8217;ll use a banking example):</p>
<p>Table dimensionality: account id, transaction type, day id, transaction amount<br />
Average number of transactions per day: 30 million<br />
Number of days stored in the transaction table: 30<br />
Approximate number of rows: 900 million rows</p>
<p>Let&#8217;s pretend that half of the daily transactions are deposits, so there are approximately 450 million rows that represent deposit transactions. The other half are withdrawals.</p>
<p>Suppose a DW user wants to know how much money was deposited into the bank during the past month. The user, through the reporting software, will issue a query similar to:</p>
<p>select sum(transaction_amount)<br />
from transaction_fact<br />
where transaction_type='deposit'</p>
<p>Pretend that your DW platform can scan 10 million rows per second; therefore, the approximate time to complete the query will be:</p>
<p>query time = number of rows / scan rate</p>
<p>which in our example translates into:</p>
<p>query time = 900 million rows / 10 million rows/second</p>
<p>query time = 90 seconds</p>
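<p>The same back-of-the-envelope estimate can be written as a small function. This is only a sketch: the function name is my own, and it assumes a full table scan at a constant rate, which ignores caching, indexes and parallelism.</p>
<pre>
```python
def estimated_scan_seconds(row_count, rows_per_second):
    """Estimate full-scan time as number of rows divided by scan rate."""
    return row_count / rows_per_second


# Figures from the banking example above.
fact_rows = 30_000_000 * 30   # 30 million transactions per day, 30 days stored
scan_rate = 10_000_000        # rows per second the platform is assumed to scan

print(estimated_scan_seconds(fact_rows, scan_rate))
```
</pre>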
<p>Waiting 90 seconds for such a simple query is simply unacceptable, so here is where an aggregate table can help you. To answer our hypothetical question, we will build an aggregate table which summarizes the transaction table by transaction type. The aggregate may be defined as follows:</p>
<p>create table transaction_aggregate as<br />
select day_id, transaction_type, sum(transaction_amount) as transaction_amount<br />
from transaction_fact<br />
group by day_id, transaction_type</p>
<p>We said before that there are only two transaction types and thirty days of data. Because the aggregate holds one row per day and transaction type, the new table will contain only 60 rows! (table size = 30 days * 2 transaction types)</p>
<p>The SQL needed to get the answer is:</p>
<p>select sum(transaction_amount)<br />
from transaction_aggregate<br />
where transaction_type='deposit'</p>
<p>And the answer will come back in 0.000006 seconds (60 rows / 10 million rows/second). The result: happy users!</p>
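<p>To convince yourself that the aggregate returns the same answer as the detail table, you can try the idea on a toy data set. The sketch below uses Python's built-in sqlite3 module rather than a real DW platform, and the sample data (3 days, 4 accounts) is invented for illustration; the aggregate is named transaction_aggregate, as in the final query above.</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transaction_fact ("
    "account_id integer, transaction_type text, "
    "day_id integer, transaction_amount integer)"
)

# Invented sample data: 3 days, 4 accounts, one deposit and one withdrawal
# per account per day.
rows = []
for day_id in range(1, 4):
    for account_id in range(1, 5):
        rows.append((account_id, "deposit", day_id, 100 * account_id))
        rows.append((account_id, "withdrawal", day_id, 10 * account_id))
conn.executemany("insert into transaction_fact values (?, ?, ?, ?)", rows)

# Build the aggregate: one row per day and transaction type.
conn.execute(
    "create table transaction_aggregate as "
    "select day_id, transaction_type, sum(transaction_amount) as transaction_amount "
    "from transaction_fact group by day_id, transaction_type"
)


def total_deposits(table_name):
    # table_name is interpolated only because it is hard-coded in this script,
    # never taken from user input.
    query = (
        f"select sum(transaction_amount) from {table_name} "
        "where transaction_type='deposit'"
    )
    return conn.execute(query).fetchone()[0]


fact_rows = conn.execute("select count(*) from transaction_fact").fetchone()[0]
aggregate_rows = conn.execute("select count(*) from transaction_aggregate").fetchone()[0]

print(fact_rows, aggregate_rows)
print(total_deposits("transaction_fact"), total_deposits("transaction_aggregate"))
```
</pre>
<p>Both queries return the same total, but the second one reads one row per day and transaction type instead of one row per transaction. The trade-off is that the aggregate must be rebuilt or refreshed whenever the detail table changes.</p>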
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2005/05/20/aggregate-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Up and Running with DB2 UDB ESE: Partitioning for Performance</title>
		<link>http://www.atlogic.com/blog/index.php/2005/05/17/up-and-running-with-db2-udb-ese-partitioning-for-performance/</link>
		<comments>http://www.atlogic.com/blog/index.php/2005/05/17/up-and-running-with-db2-udb-ese-partitioning-for-performance/#comments</comments>
		<pubDate>Tue, 17 May 2005 14:48:00 +0000</pubDate>
		<dc:creator>Miguel Barrientos</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.atlogic.com/blog/?p=25</guid>
		<description><![CDATA[This IBM e-book is a must-read if you&#8217;re using DB2 UDB 8.1. IBM Redbooks &#124; Up and Running with DB2 UDB ESE: Partitioning for Performance in an e-Business Intelligence World
It discusses:

Guidelines on building the large database and determining the number of partitions 
Bulk load using the new multipartition load 
Performance enhancements using MultiDimensional Clustering and [...]]]></description>
			<content:encoded><![CDATA[<p>This IBM e-book is a must-read if you&#8217;re using DB2 UDB 8.1. <a href="http://www.redbooks.ibm.com/abstracts/sg246917.html?Open">IBM Redbooks | Up and Running with DB2 UDB ESE: Partitioning for Performance in an e-Business Intelligence World</a></p>
<p>It discusses:</p>
<ul>
<li>Guidelines on building the large database and determining the number of partitions </li>
<li>Bulk load using the new multipartition load </li>
<li>Performance enhancements using MultiDimensional Clustering and Materialized Query Tables</li>
<li>Availability through the new online utilities</li>
<li>Self Managing And Resource Tuning features</li>
<li>Migration scenarios</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.atlogic.com/blog/index.php/2005/05/17/up-and-running-with-db2-udb-ese-partitioning-for-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
