Wednesday, March 7, 2012

Don't use dateadd

I've got a reasonably efficient query that gives me a count of the top 20 daily values in my database. Now I'd like to figure out the daily total of top 20 values. Then analyze this information to work out the average, min, max and standard deviation of the daily total of top 20 values.

My best effort is horribly slow - does anyone have a better idea how to do this?

Thanks!

The schema of the database it accesses:

create view eventView (timeStr, msec, host, process, dbName, point,
description, rtu, groupName, message, type,
sevInt, time)
as
select dateadd(second,time+60*offset,'01/01/70'), msec, host, process,
dbName, ptName, description, rtuName, groupName,
message, type, sevInt, time
from summary

CREATE TABLE [summary] (
[msrepl_tran_version] uniqueidentifier ROWGUIDCOL NOT NULL CONSTRAINT [DF_Summary_GUID] DEFAULT (newid()),
[time] [GMTtime] NOT NULL ,
[msec] [int] NULL ,
[offset] [GMToffset] NOT NULL ,
[type] [nameType] NULL ,
[host] [nameType] NULL ,
[process] [nameType] NULL ,
[dbName] [nameType] NULL ,
[ptName] [nameType] NULL ,
[description] [descType] NULL ,
[rtuName] [nameType] NULL ,
[groupName] [nameType] NULL ,
[message] [msgType] NOT NULL ,
[fgInt] [tinyint] NULL ,
[sevInt] [tinyint] NULL ,
[key1] [int] NULL ,
[key2] [int] NULL ,
[spooler] [tinyint] NULL
) ON [PRIMARY]
GO

My Top 20 query:

SELECT TOP 20 count (*) as "Number of Alarms", [point], [description]

FROM [event].[dbo].[eventView]
WHERE ([timestr] < left(getdate(),11) and [timestr] >= left(getdate() - 1,11))
GROUP BY point, description HAVING count(*) > 1
ORDER BY "Number of Alarms" desc

And the messy, slow meta query:

declare @.myDay datetime
declare @.begDay datetime

declare @.myTable
table(Alarms int, Point varchar(250), Description varchar(250), Before datetime, After datetime)

declare @.myDaily
table(Date datetime, Alarms int)

select @.myDay = left(getdate(),11)

select @.begDay = left (convert(datetime, '12/01/2006'), 11)

while @.begDay <= @.myDay
begin

insert into @.myTable
SELECT TOP 20 count (*) as "Number of Alarms", [point], [description],@.begDay
FROM [event].[dbo].[eventView]
where ([timestr] < @.begDay and [timestr] >= dateadd(day,-1,@.begDay))
group by point, description
having count(*) > 1
order by "Number of Alarms" desc

select @.begDay = dateadd(day,1,@.begDay)

end

--

insert into @.myDaily
select After as "Date", sum(Alarms) as "Alarms"
from @.myTable
group by After
--

select count(Alarms) as "Count", avg(Alarms) as "Average", max(Alarms) as "Maximum", min(Alarms) as "Minimum", stdev (Alarms) as "Standard Deviation"
from @.myDaily

As I know you can improve performance if you make some change in these places.
1. avoid using function in your where clause.
2. create index in summary table.
3. I am not sure why you have to use dateadd(second,time+60*offset,'01/01/70'), in eventView? You can create another column which store time as your local timezone.


|||

I'm still learning SQL, so I don't understand all of your suggestions:

1. Avoid using function in where clause

Are you referring to the dateadd? How else can I limit the data to daily information?

2. create index in summary table.

Sorry, can't do that. I don't have control over the summary table - it's provided to my company by the owner of the software.

3. why you have to use dateadd(second,time+60*offset,'01/01/70'), in eventView

Because the software mfg stores data in the summary table in the format of "UTC seconds ". Once again, I cannot change the summary table.

|||If I were you, I won't use dateadd(....) in eventView. That is we still use UTC timestamp in eventView.

you can create @.begDay_UTC and @.PreviousDay_UTC and use them in this part of code in your where clause.
([timestr] < @.begDay_UTC and [timestr] >= @.PreviousDay_UTC)

If you can find out index Summary table used, that will help us find out how to improve performance.

Also, the estimated executions in SQL Server Management Studio will help us find out which part of code is the most expensive.|||

You can replace that while loop with a single query that will greatly improve the performance, here it is (I think, haven't tested but should be very close)

insert into @.myTable
SELECT TOP 20 count (*) as "Number of Alarms", [point], [description], [timestr], dateadd(day, 1, [timestr])
FROM [event].[dbo].[eventView]
where [message] not like '%NORMAL state%' and
[message] not like '%restored - normal%' and
[message] not like '%communication%restored%' and
[message] not like '%PLM - NORMAL%' and
[type] = 'alarm' and
[timestr] between @.begDay and @.myDay
group by [timestr], dateadd(day, 1, [timestr]), point, description, dbName
having count(*) > 1
order by "Number of Alarms" desc

|||

Agree this makes more sense and is easier to read. I've tested this and it does not make much difference in the performance.

I'm going to use your suggestion - much cleaner and easier to understand! Thanks...

|||

Think you are trying to grab the daily information using the GROUP BY.

This doesn't work for me because I'm trying to get a set of n days ( = 265 days on my system) daily top 20 values; this query only returns 20 values. I want 265 x 20 values.

|||

Which version of SQL Server are you using? You can simplify the WHILE loop in SQL Server 2005 using the APPLY operator. In SQL Server 2000, there is no easy way to write a single query - you have to do some sort of procedural loop which might be the fastest way. See below for an example in SQL Server 2005:

-- Top 2 order details based on quantity for each order:

select *
from Orders as o
cross apply (
select top 2 *
from "Order Details" as od
where od.OrderID = o.OrderID
order by od.Quantity
) as o2

|||

Our system runs MS SQL Server 2000. And since its the back end providing data archiving for our turnkey system, we will be using this for years to come.

When you say "procedural loop" that makes me think the BEGIN loop is the only way to do this job. Too bad 8-(

No comments:

Post a Comment