Friday, February 24, 2012

Domain Error variation

Hi there,
I am having problems generating the Geomean using "EXP(AVG(LOG()))" in
SQL Server 2005.
I have looked at the common causes of this, namely LOG(0) returning
NULL and hence causing the problem, but I have queried this out of the
test data and all values being used are greater than 0.
Another interesting point to note is that the query (which I'll post
below) performs this calculation over a grouping of values from
different tables, when I add a level of indirection to this (create a
temporary view of the grouping) it seems to work and returns no
problems this was why I threw the Log(0) theory for this problem out
the window.
I think it could be a rounding problem, but this same query ran fine on
SQL Server 2000 (once we set the flags correctly "SET ARITHABORT ON,
NUMERIC_ROUNDABORT OFF"), and am wondering if it is something wrong
with SQL Server 2005, the version I am using is 9.0.1399 and accessing
it through the Management Studio.
It is obviously very hard to post the test data to give you an accurate
way of reproducing it, but I was hoping that someone could place their
finger on the problem by having a look at the SQL.
SELECT
TEST_BATCH.MACHINE_ID AS MACHINE_ID,
TEST_BATCH.TEST_DATE AS TEST_DATE,
TEST.TEST_TYPE_ID AS TEST_TYPE_ID,
EXP(AVG(LOG(TEST.RESULT))) AS MEAN_RESULT,
COUNT(TEST.TEST_ID) AS RESULT_COUNT
FROM
TEST_BATCH
JOIN
TEST_PROCEDURE
ON
(TEST_BATCH.TEST_BATCH_ID = TEST_PROCEDURE.TEST_BATCH_ID)
JOIN
TEST
ON
(TEST_PROCEDURE.TEST_PROCEDURE_ID = TEST.TEST_PROCEDURE_ID)
JOIN
TEST_TYPE
ON
(TEST.TEST_TYPE_ID = TEST_TYPE.TEST_TYPE_ID)
JOIN
MACHINE_TEST_TYPE
ON
(TEST_BATCH.MACHINE_ID = MACHINE_TEST_TYPE.MACHINE_ID) AND
(TEST_TYPE.TEST_TYPE_ID = MACHINE_TEST_TYPE.TEST_TYPE_ID)
WHERE
(TEST_BATCH.MACHINE_ID IS NOT NULL) AND
(TEST_BATCH.TEST_DATE BETWEEN @.START_TEST_DATE AND
@.END_TEST_DATE) AND
(TEST_PROCEDURE.AUTO_REGISTRATION_TEST_REGIME_ID IS NULL) AND
(TEST_PROCEDURE.SAMPLE_TYPE = 'N') AND
(TEST_PROCEDURE.SAMPLE_ID IS NOT NULL) AND
(TEST_PROCEDURE.IS_SUPERSEDED = 'F') AND
(TEST_TYPE.RESULT_MEAN_TYPE = 'G') AND
(TEST.IS_MISSING_RESULT = 'F')
GROUP BY
TEST_BATCH.MACHINE_ID,
TEST_BATCH.TEST_DATE,
TEST.TEST_TYPE_ID
this query is correct with the data model that we have developed, and
as I say it has previously worked.
Many thanks for any light that can be shed on this
AlastairAlastair,
If you replace the offending expression with
MAX(CASE WHEN TEST.RESULT = 0e0 THEN 0 ELSE NULL END)
do you get NULL or 0? What is the data type of
TEST.RESULT?
I can't test anything on SQL Server 2005 until later, but
I suggest you try this as a workaround:
EXP(AVG(LOG(CASE WHEN TEST.RESULT <= 0 THEN NULL END)))
If there are any 0 values, it's possible that in one
query plan they are filtered out before LOG() is applied,
and in another they are filtered afterwards. The CASE
expression prevents them from causing a problem, regardless
of the actual query execution plan.
If that still fails, you could see what results you get if
you select things like
SUM(TEST.RESULT), MAX(LOG(TEST.RESULT)), and so on, to
consider whether the problem is really with LOG, and whether
there is some overflow somewhere.
What is the distribution of data like? What are the MIN, MAX,
and COUNT of TEST.RESULT, for example?
Steve Kass
Drew University
Alastair wrote:

> Hi there,
> I am having problems generating the Geomean using "EXP(AVG(LOG()))" in
> SQL Server 2005.
> I have looked at the common causes of this, namely LOG(0) returning
> NULL and hence causing the problem, but I have queried this out of the
> test data and all values being used are greater than 0.
> Another interesting point to note is that the query (which I'll post
> below) performs this calculation over a grouping of values from
> different tables, when I add a level of indirection to this (create a
> temporary view of the grouping) it seems to work and returns no
> problems this was why I threw the Log(0) theory for this problem out
> the window.
> I think it could be a rounding problem, but this same query ran fine on
> SQL Server 2000 (once we set the flags correctly "SET ARITHABORT ON,
> NUMERIC_ROUNDABORT OFF"), and am wondering if it is something wrong
> with SQL Server 2005, the version I am using is 9.0.1399 and accessing
> it through the Management Studio.
> It is obviously very hard to post the test data to give you an accurate
> way of reproducing it, but I was hoping that someone could place their
> finger on the problem by having a look at the SQL.
> SELECT
> TEST_BATCH.MACHINE_ID AS MACHINE_ID,
> TEST_BATCH.TEST_DATE AS TEST_DATE,
> TEST.TEST_TYPE_ID AS TEST_TYPE_ID,
> EXP(AVG(LOG(TEST.RESULT))) AS MEAN_RESULT,
> COUNT(TEST.TEST_ID) AS RESULT_COUNT
> FROM
> TEST_BATCH
> JOIN
> TEST_PROCEDURE
> ON
> (TEST_BATCH.TEST_BATCH_ID = TEST_PROCEDURE.TEST_BATCH_ID)
> JOIN
> TEST
> ON
> (TEST_PROCEDURE.TEST_PROCEDURE_ID = TEST.TEST_PROCEDURE_ID)
> JOIN
> TEST_TYPE
> ON
> (TEST.TEST_TYPE_ID = TEST_TYPE.TEST_TYPE_ID)
> JOIN
> MACHINE_TEST_TYPE
> ON
> (TEST_BATCH.MACHINE_ID = MACHINE_TEST_TYPE.MACHINE_ID) AND
> (TEST_TYPE.TEST_TYPE_ID = MACHINE_TEST_TYPE.TEST_TYPE_ID)
> WHERE
> (TEST_BATCH.MACHINE_ID IS NOT NULL) AND
> (TEST_BATCH.TEST_DATE BETWEEN @.START_TEST_DATE AND
> @.END_TEST_DATE) AND
> (TEST_PROCEDURE.AUTO_REGISTRATION_TEST_REGIME_ID IS NULL) AND
> (TEST_PROCEDURE.SAMPLE_TYPE = 'N') AND
> (TEST_PROCEDURE.SAMPLE_ID IS NOT NULL) AND
> (TEST_PROCEDURE.IS_SUPERSEDED = 'F') AND
> (TEST_TYPE.RESULT_MEAN_TYPE = 'G') AND
> (TEST.IS_MISSING_RESULT = 'F')
> GROUP BY
> TEST_BATCH.MACHINE_ID,
> TEST_BATCH.TEST_DATE,
> TEST.TEST_TYPE_ID
> this query is correct with the data model that we have developed, and
> as I say it has previously worked.
> Many thanks for any light that can be shed on this
> Alastair
>|||Hi Steve, thanks for the reply.
We chjecked all the MIN and MAX results and there is nothing out of the
ordinary that SQL Server couldn't handle, such as the case that we were
able to create an intermediate table an then perform the calculation on
that.

> What is the data type of TEST.RESULT?
NUMERIC(10, 4)
We did manage to solve the problem using your suggestion of a CASE
statement and also used (TEST.RESULT > 0) in the WHERE clause (although
again I'll state we know that the results are never 0 or below). It
seems that the if you do not explicitly state you do not want 0 or
below in your results then SQL Server seems to throw a wobbly and use
"Domain Error" as an excuse when using the LOG function.
Thanks for your help.
Alastair

No comments:

Post a Comment