Showing posts with label warehouse. Show all posts
Showing posts with label warehouse. Show all posts

Thursday, March 29, 2012

Drawing up the spec for our Datawarehose Server

I am in the process of drawing up the spec for our data warehouse server. The
setup that I am having to cater for is a server that will hold:
1) a replicated copy of the tables that are going to be used to extract that
data from as we have 22 offices this will be 22 databases. (ranging in size
from 500 Mb to 20 Gb)
2) A staging database
3) The Data warehouse database
4) Cognos OLAP Cubes
In terms of the hardware I am looking to get a dual processor box (with the
capacity to take 2 more in the future). With 3 or 4 Gb of RAM, and to limit
sql to a portion of this leaving a good amount available for the OS and the
OLAP Cubes.
The part that I am stuck on is the RAID config. It is envisaged that the
cubes will be refreshed once a week.
So far the RAID Solution I have come up with is below:
RAID 1 - For the OS, SQL Binaries etc.
RAID 5 - For the Database files
RAID 5 - For the Transaction Logs
RAID 5 - For the Cognos OLAP Cubes
What are your thoughts on the above?
Id stay away from RAID 5 (you're gonna hear this a lot I suspect).
If you want performance, you should try to go with RAID 1+0 AT Least for the
Data and Logs.
Get RAID Controller(s) with Battery Backed Cache as well.
Hope this helps
Greg Jackson
PDX, Oregon
|||So you appear to be looking for a one size fits all solution. This might
work for a short while during the ramp up phase but most DW installations
will split the functions apart as they run into contension.
From the notes:
You are planning on having the following on one server.
22 source databases with a total space used between 11GB and 440GB
SQL replication running against all 22 source databases
1 stagging DB
1 DW DB
Cognos PowerPlay
I assume IIS and some of the other Cognos products unless you are planning
on using the network to copy the cubes or direct LAN connect from clients.
What ETL tool will you be using DTS, straight TSQL, or third party product?
If third party product where will it be installed?
With this much source data, is there an estimate as to how large the DW
database will be?
Unless you are only popluating a very small amout of data this configuration
will be overloaded inside six months and the first two to three months is
usually used for initial development and limited user testing.
Sorry for the bad news but better to know up front...
"Russell" <Russell@.discussions.microsoft.com> wrote in message
news:BA7C6084-ACD6-4B55-9125-0165108B7D9B@.microsoft.com...
>I am in the process of drawing up the spec for our data warehouse server.
>The
> setup that I am having to cater for is a server that will hold:
> 1) a replicated copy of the tables that are going to be used to extract
> that
> data from as we have 22 offices this will be 22 databases. (ranging in
> size
> from 500 Mb to 20 Gb)
> 2) A staging database
> 3) The Data warehouse database
> 4) Cognos OLAP Cubes
> In terms of the hardware I am looking to get a dual processor box (with
> the
> capacity to take 2 more in the future). With 3 or 4 Gb of RAM, and to
> limit
> sql to a portion of this leaving a good amount available for the OS and
> the
> OLAP Cubes.
> The part that I am stuck on is the RAID config. It is envisaged that the
> cubes will be refreshed once a week.
> So far the RAID Solution I have come up with is below:
> RAID 1 - For the OS, SQL Binaries etc.
> RAID 5 - For the Database files
> RAID 5 - For the Transaction Logs
> RAID 5 - For the Cognos OLAP Cubes
> What are your thoughts on the above?
>

Drawing up the spec for our Datawarehose Server

I am in the process of drawing up the spec for our data warehouse server. Th
e
setup that I am having to cater for is a server that will hold:
1) a replicated copy of the tables that are going to be used to extract that
data from as we have 22 offices this will be 22 databases. (ranging in size
from 500 Mb to 20 Gb)
2) A staging database
3) The Data warehouse database
4) Cognos OLAP Cubes
In terms of the hardware I am looking to get a dual processor box (with the
capacity to take 2 more in the future). With 3 or 4 Gb of RAM, and to limit
sql to a portion of this leaving a good amount available for the OS and the
OLAP Cubes.
The part that I am stuck on is the RAID config. It is envisaged that the
cubes will be refreshed once a week.
So far the RAID Solution I have come up with is below:
RAID 1 - For the OS, SQL Binaries etc.
RAID 5 - For the Database files
RAID 5 - For the Transaction Logs
RAID 5 - For the Cognos OLAP Cubes
What are your thoughts on the above?Id stay away from RAID 5 (you're gonna hear this a lot I suspect).
If you want performance, you should try to go with RAID 1+0 AT Least for the
Data and Logs.
Get RAID Controller(s) with Battery Backed Cache as well.
Hope this helps
Greg Jackson
PDX, Oregon|||So you appear to be looking for a one size fits all solution. This might
work for a short while during the ramp up phase but most DW installations
will split the functions apart as they run into contension.
From the notes:
You are planning on having the following on one server.
22 source databases with a total space used between 11GB and 440GB
SQL replication running against all 22 source databases
1 stagging DB
1 DW DB
Cognos PowerPlay
I assume IIS and some of the other Cognos products unless you are planning
on using the network to copy the cubes or direct LAN connect from clients.
What ETL tool will you be using DTS, straight TSQL, or third party product?
If third party product where will it be installed?
With this much source data, is there an estimate as to how large the DW
database will be?
Unless you are only popluating a very small amout of data this configuration
will be overloaded inside six months and the first two to three months is
usually used for initial development and limited user testing.
Sorry for the bad news but better to know up front...
"Russell" <Russell@.discussions.microsoft.com> wrote in message
news:BA7C6084-ACD6-4B55-9125-0165108B7D9B@.microsoft.com...
>I am in the process of drawing up the spec for our data warehouse server.
>The
> setup that I am having to cater for is a server that will hold:
> 1) a replicated copy of the tables that are going to be used to extract
> that
> data from as we have 22 offices this will be 22 databases. (ranging in
> size
> from 500 Mb to 20 Gb)
> 2) A staging database
> 3) The Data warehouse database
> 4) Cognos OLAP Cubes
> In terms of the hardware I am looking to get a dual processor box (with
> the
> capacity to take 2 more in the future). With 3 or 4 Gb of RAM, and to
> limit
> sql to a portion of this leaving a good amount available for the OS and
> the
> OLAP Cubes.
> The part that I am stuck on is the RAID config. It is envisaged that the
> cubes will be refreshed once a week.
> So far the RAID Solution I have come up with is below:
> RAID 1 - For the OS, SQL Binaries etc.
> RAID 5 - For the Database files
> RAID 5 - For the Transaction Logs
> RAID 5 - For the Cognos OLAP Cubes
> What are your thoughts on the above?
>sql

Tuesday, February 14, 2012

Does truncate and API Load invalidate statistics

As part of my data warehouse nightly build, I truncate my tables in my
target database.
As example, I find it is much quicker to do a bulk API load of 13M
records and to do an update/insert of 100K rows. I also drop the
indexes before the builds and reindex after. Thats an aside.
What I am wondering is how is this impacting the statistics? Do I need
to update them?
Not well versed on statistics and any data is welcomed.
Thanks
Robrcamarda (robc390@.hotmail.com) writes:

Quote:

Originally Posted by

As part of my data warehouse nightly build, I truncate my tables in my
target database.
As example, I find it is much quicker to do a bulk API load of 13M
records and to do an update/insert of 100K rows. I also drop the
indexes before the builds and reindex after. Thats an aside.
What I am wondering is how is this impacting the statistics? Do I need
to update them?


When an indexes added, statistics based a full scan is added for that
index, so you are fairly safe.

Would could matter is statistics of non-indexed columns. These you lose
when you truncate the tables. These statistics are less essential than
index on indexed columns, but there are queries where they may be useful.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx