Sun and Sybase reference architecture for data warehousing: Reference Architectures Specifications
Archive for the ‘Tips’ Category
Reference Architectures Specifications
Monday, May 9th, 2005Data Warehouse Boost on a Budget – Computerworld
Monday, May 9th, 2005Good article about the Netezza approach: Data Warehouse Boost on a Budget – Computerworld
Oracle9i Data Warehousing Guide — Contents
Monday, May 9th, 2005Suggested reading: Oracle9i Data Warehousing Guide — Contents
Data Warehousing Infrastructure: Linux Clusters for the Enterprise
Monday, May 9th, 2005Here is more information about DW hardware, this time for those considering Linux as an OS: Data Warehousing Infrastructure: Linux Clusters for the Enterprise
Very large data warehouse disk storage architecture issues
Monday, May 9th, 2005As part of my research about performance tuning, I’ve been trying to find some hardware configuration guidelines for data warehousing. So far I’ve found only a few worthwhile articles. A very high-level summary is provided in the article Very large data warehouse disk storage architecture issues written by the authors of Clickstream Data Warehousing. As expected, I’ve found that some of the best material comes from the RDBMS vendors. Oracle provides good info in Hardware and I/O Considerations in Data Warehouses. I’ll look some more in the DB2, Teradata, and SQL Server sites.
Insights: Aggregate Tables Defined
Friday, May 6th, 2005This article in DM Review suggests the wrong approach to creating aggregate tables. If you try to build an aggregate table in response to specific business needs, you may end up with a large number of aggregates that become a maintenance headache. Besides, unless I’m missing the point, business users don’t need aggregates. What they need is reports that return immediately.
Hardware
Thursday, May 5th, 2005Data warehouse queries often require scanning millions of records so, as far as hardware is concerned, you need to maximize the amount of memory in your RDBMS server, and use the fastest hard drives your money can buy.
Why should you maximize the amount of memory in your server? The key here is that if the RDBMS can load the most frequently used tables into memory, querying a table stored in memory will be much faster than querying a table stored in disk.
Other issues that I’ll write about later:
- CPU speed
- Number of CPUs
- RAID levels
- MPP vs SMP architecture
- The Netezza approach
A complex task
Thursday, May 5th, 2005Tuning a data warehouse (dw) requires dealing with a thick stack of layers. To name a few:
- Relational database management system (RDBMS) server hardware
- RDBMS server operating system (e.g. AIX, Solaris, Windows, Linux)
- RDBMS (e.g. Oracle, DB2, Teradata)
- Database schema (tables/views/indexes)
- Network
- Application server hardware
- Application server software
- Application server operating system
In this blog, I will focus on the database schema layer
Welcome!
Wednesday, May 4th, 2005This blog will document everything I know about data warehouse performance tuning and optimization.