Archive for the ‘Tips’ Category

Reference Architectures Specifications

Monday, May 9th, 2005

Sun and Sybase reference architecture for data warehousing: Reference Architectures Specifications

Data Warehouse Boost on a Budget – Computerworld

Monday, May 9th, 2005

Good article about the Netezza approach: Data Warehouse Boost on a Budget – Computerworld

Oracle9i Data Warehousing Guide — Contents

Monday, May 9th, 2005

Suggested reading: Oracle9i Data Warehousing Guide — Contents

Data Warehousing Infrastructure: Linux Clusters for the Enterprise

Monday, May 9th, 2005

Here is more information about DW hardware, this time for those considering Linux as an OS: Data Warehousing Infrastructure: Linux Clusters for the Enterprise

Very large data warehouse disk storage architecture issues

Monday, May 9th, 2005

As part of my research about performance tuning, I’ve been trying to find some hardware configuration guidelines for data warehousing. So far I’ve found only a few worthwhile articles. A very high-level summary is provided in the article Very large data warehouse disk storage architecture issues written by the authors of Clickstream Data Warehousing. As expected, I’ve found that some of the best material comes from the RDBMS vendors. Oracle provides good info in Hardware and I/O Considerations in Data Warehouses. I’ll look some more in the DB2, Teradata, and SQL Server sites.

Insights: Aggregate Tables Defined

Friday, May 6th, 2005

This article in DM Review suggests the wrong approach to creating aggregate tables. If you try to build an aggregate table in response to specific business needs, you may end up with a large number of aggregates that become a maintenance headache. Besides, unless I’m missing the point, business users don’t need aggregates. What they need is reports that return immediately.

Hardware

Thursday, May 5th, 2005

Data warehouse queries often require scanning millions of records so, as far as hardware is concerned, you need to maximize the amount of memory in your RDBMS server, and use the fastest hard drives your money can buy.

Why should you maximize the amount of memory in your server? The key here is that if the RDBMS can load the most frequently used tables into memory, querying a table stored in memory will be much faster than querying a table stored in disk.

Other issues that I’ll write about later:

  • CPU speed
  • Number of CPUs
  • RAID levels
  • MPP vs SMP architecture
  • The Netezza approach

A complex task

Thursday, May 5th, 2005

Tuning a data warehouse (dw) requires dealing with a thick stack of layers. To name a few:

  • Relational database management system (RDBMS) server hardware
  • RDBMS server operating system (e.g. AIX, Solaris, Windows, Linux)
  • RDBMS (e.g. Oracle, DB2, Teradata)
  • Database schema (tables/views/indexes)
  • Network
  • Application server hardware
  • Application server software
  • Application server operating system

In this blog, I will focus on the database schema layer

Welcome!

Wednesday, May 4th, 2005

This blog will document everything I know about data warehouse performance tuning and optimization.