Monday, March 26, 2012

Primary keys, indexes and speed

My table consists of several key fields (i.e. used in a complex primary key)
and some other fields where I do aggregation:
CREATE TABLE t
(
id INT NOT NULL,
ts DATETIME NOT NULL,
k1 INT NOT NULL,
k2 INT NOT NULL,
...
a1 FLOAT NOT NULL,
a2 FLOAT NOT NULL,
...
PRIMARY KEY (id, ts, k1, k2)
)
A SELECT query I execute on this table looks like:
SELECT k1, k2, SUM(a1) AS s1, SUM(a2) AS s2 FROM t
WHERE id = SomeNumber AND ts > StartDate AND ts <= EndDate
GROUP BY k1, k2 ORDER BY s1 DESC
As you see, id and ts are used in WHERE and k1 and k2 - in GROUP.
Do I need to add some more indexes to improve speed of retrieving
or above PRIMARY KEY is enough?It seems to be a good candidate for an indexed view ,isn't it? Read about it
in the BOL
"Tumurbaatar S." <nospam_tumur@.magicnet.mn> wrote in message
news:e5qJD2oOFHA.1884@.TK2MSFTNGP15.phx.gbl...
> My table consists of several key fields (i.e. used in a complex primary
key)
> and some other fields where I do aggregation:
> CREATE TABLE t
> (
> id INT NOT NULL,
> ts DATETIME NOT NULL,
> k1 INT NOT NULL,
> k2 INT NOT NULL,
> ...
> a1 FLOAT NOT NULL,
> a2 FLOAT NOT NULL,
> ...
> PRIMARY KEY (id, ts, k1, k2)
> )
> A SELECT query I execute on this table looks like:
> SELECT k1, k2, SUM(a1) AS s1, SUM(a2) AS s2 FROM t
> WHERE id = SomeNumber AND ts > StartDate AND ts <= EndDate
> GROUP BY k1, k2 ORDER BY s1 DESC
> As you see, id and ts are used in WHERE and k1 and k2 - in GROUP.
> Do I need to add some more indexes to improve speed of retrieving
> or above PRIMARY KEY is enough?
>|||Have you tried it yet? Or are you just guessing? Do you have a lot of
other columns? Since you have this as the clustered index, if it is the
ONLY index involved in any query, it might be enough.
Either way, you should test it out and see what happens. A lot depends on
what else you do with the table, because it can be too costly to add a bunch
of indexes to a table to speed up one query and hurt the others.
----
Louis Davidson - drsql@.hotmail.com
SQL Server MVP
Compass Technology Management - www.compass.net
Pro SQL Server 2000 Database Design -
http://www.apress.com/book/bookDisplay.html?bID=266
Blog - http://spaces.msn.com/members/drsql/
Note: Please reply to the newsgroups only unless you are interested in
consulting services. All other replies may be ignored :)
"Tumurbaatar S." <nospam_tumur@.magicnet.mn> wrote in message
news:e5qJD2oOFHA.1884@.TK2MSFTNGP15.phx.gbl...
> My table consists of several key fields (i.e. used in a complex primary
> key)
> and some other fields where I do aggregation:
> CREATE TABLE t
> (
> id INT NOT NULL,
> ts DATETIME NOT NULL,
> k1 INT NOT NULL,
> k2 INT NOT NULL,
> ...
> a1 FLOAT NOT NULL,
> a2 FLOAT NOT NULL,
> ...
> PRIMARY KEY (id, ts, k1, k2)
> )
> A SELECT query I execute on this table looks like:
> SELECT k1, k2, SUM(a1) AS s1, SUM(a2) AS s2 FROM t
> WHERE id = SomeNumber AND ts > StartDate AND ts <= EndDate
> GROUP BY k1, k2 ORDER BY s1 DESC
> As you see, id and ts are used in WHERE and k1 and k2 - in GROUP.
> Do I need to add some more indexes to improve speed of retrieving
> or above PRIMARY KEY is enough?
>|||Tumurbaatar S.,
What about the execution plan, can you post it?
set showplan_text on
go
SELECT k1, k2, SUM(a1) AS s1, SUM(a2) AS s2 FROM t
WHERE id = SomeNumber AND ts > StartDate AND ts <= EndDate
GROUP BY k1, k2 ORDER BY s1 DESC
go
set showplan_text off
go
AMB
P.S. Your last name remind me a student in my classroom, when I was in
college (studying Physics), he was from Mongolia.
"Tumurbaatar S." wrote:

> My table consists of several key fields (i.e. used in a complex primary ke
y)
> and some other fields where I do aggregation:
> CREATE TABLE t
> (
> id INT NOT NULL,
> ts DATETIME NOT NULL,
> k1 INT NOT NULL,
> k2 INT NOT NULL,
> ...
> a1 FLOAT NOT NULL,
> a2 FLOAT NOT NULL,
> ...
> PRIMARY KEY (id, ts, k1, k2)
> )
> A SELECT query I execute on this table looks like:
> SELECT k1, k2, SUM(a1) AS s1, SUM(a2) AS s2 FROM t
> WHERE id = SomeNumber AND ts > StartDate AND ts <= EndDate
> GROUP BY k1, k2 ORDER BY s1 DESC
> As you see, id and ts are used in WHERE and k1 and k2 - in GROUP.
> Do I need to add some more indexes to improve speed of retrieving
> or above PRIMARY KEY is enough?
>
>|||Since ts predicate is a range, I'd try changing the clustered PK index to
put ts first, instead of second... That might make a substantial
improvement...
"Tumurbaatar S." wrote:

> My table consists of several key fields (i.e. used in a complex primary ke
y)
> and some other fields where I do aggregation:
> CREATE TABLE t
> (
> id INT NOT NULL,
> ts DATETIME NOT NULL,
> k1 INT NOT NULL,
> k2 INT NOT NULL,
> ...
> a1 FLOAT NOT NULL,
> a2 FLOAT NOT NULL,
> ...
> PRIMARY KEY (id, ts, k1, k2)
> )
> A SELECT query I execute on this table looks like:
> SELECT k1, k2, SUM(a1) AS s1, SUM(a2) AS s2 FROM t
> WHERE id = SomeNumber AND ts > StartDate AND ts <= EndDate
> GROUP BY k1, k2 ORDER BY s1 DESC
> As you see, id and ts are used in WHERE and k1 and k2 - in GROUP.
> Do I need to add some more indexes to improve speed of retrieving
> or above PRIMARY KEY is enough?
>
>|||Yes, it is. But my application and db is designed to
work with any (at least, with many) OLEDB/ADO compliant
databases. And indexed view, I think, is MS SQL feature.
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:ehYzaRpOFHA.2132@.TK2MSFTNGP14.phx.gbl...
> It seems to be a good candidate for an indexed view ,isn't it? Read about
> it
> in the BOL
>
> "Tumurbaatar S." <nospam_tumur@.magicnet.mn> wrote in message
> news:e5qJD2oOFHA.1884@.TK2MSFTNGP15.phx.gbl...
> key)
>|||Yes, I tried. My server app works (24/7) and regularly adds new records.
And other, client, app retrieves these records to analyze. No update/delete
happens on the table. Only insert and select. And no many variants
of select, just one I posted above.
"Louis Davidson" <dr_dontspamme_sql@.hotmail.com> wrote in message
news:e47gILqOFHA.2252@.TK2MSFTNGP15.phx.gbl...
> Have you tried it yet? Or are you just guessing? Do you have a lot of
> other columns? Since you have this as the clustered index, if it is the
> ONLY index involved in any query, it might be enough.
> Either way, you should test it out and see what happens. A lot depends on
> what else you do with the table, because it can be too costly to add a
> bunch of indexes to a table to speed up one query and hurt the others.
> --
> ----
--
> Louis Davidson - drsql@.hotmail.com
> SQL Server MVP
> Compass Technology Management - www.compass.net
> Pro SQL Server 2000 Database Design -
> http://www.apress.com/book/bookDisplay.html?bID=266
> Blog - http://spaces.msn.com/members/drsql/
> Note: Please reply to the newsgroups only unless you are interested in
> consulting services. All other replies may be ignored :)
> "Tumurbaatar S." <nospam_tumur@.magicnet.mn> wrote in message
> news:e5qJD2oOFHA.1884@.TK2MSFTNGP15.phx.gbl...
>|||Ok, I will try.
"CBretana" <cbretana@.areteIndNOSPAM.com> wrote in message
news:36D28CF4-7839-4F8B-ACDC-D4D23DA61AA9@.microsoft.com...
> Since ts predicate is a range, I'd try changing the clustered PK index to
> put ts first, instead of second... That might make a substantial
> improvement...
>
> "Tumurbaatar S." wrote:
>|||On Tue, 12 Apr 2005 14:25:52 +0800, Tumurbaatar S. wrote:

>See attachment. There's an actual table structure, a query I used and
>a resulting plan.
Hi Tumurbaatar,
I think the query will be executed at about the maximum possible speed.
The clustered index that is created because of the PRIMARY KEY
constraint is optimal for this query. (Note - near the end of this
message, I'll discuss one possible improvement).
Allow me to use an analogy to explain: imagine that you have a phone
book of a large town, were all people are listed in alphabetic order of
the surname. Further, people with the same surname are ordered by first
name. In database terms, the phone book is a table with clustered index
on (Surname, FirstName).
Now suppose you are given the task of finding all people with Surname
'Smith' and whose first name starts with H, I or J. That's not hard -
you quickly flip to the page where the Smith's are listed, skip to the
first whose name starts with H, then start scanning all names - and you
stop as soon as you encounter a Smith whose first name starts with K.
But what if your task had been to find all people with first name 'John'
and surname starting with K, L, or M? In that case, you'd have to scan a
much larger proportion of the phone book (the complete listing of all
surnames starting with K, L, or M, to be precise). That's why I don't
think that CBretana's suggestion will improve the speed of your query.

>By the way, I understood nothing there.
The execution plan, you mean? Okay, take it easy. The best way to
interpret an execution plan is from innermost to outermost. This plan is
quite straightforward, actually :-)
The last line says "Clustered Index S". That is the process I
described above (in the phone book analogy) - the database uses the
index' structure to go straight to the first row with intid = 1 and ts >
'2005-4-1 00:00', then starts processing rows until it is past the last
row with intid = 1 and ts <= '2005-4-1 00:30'.
The line before that is a sort step. This is used to satisfy the group
by clause - after the sort, all rows that need to be grouped together
are together. (There are more strategies the optimizer can use for a
group by, but in this case the optimizer expects the sorting to be the
fastest).
The Stream Aggregate step takes the sorted output and calculates one row
from each group. If you inspect the entire line on your plan, you'll see
the three aggregate expressions you used in the query appear here. Note
that each result is given a name (Expr1002 through Expr1004).
And the final step is another sort step - this time to satisfy the ORDER
BY you specified. Note that the plan says to order by [Expr1004] - and
if you check the Stream Aggregate step, you'll see that Expr1004 is
exactly the name given to the expression you use in the ORDER BY.
I promised a possible improvement. I can't say if this works or not. The
idea is to eliminate one of the sort steps, at the cost of having to
scan a much bigger part of the table. If only a small number of rows in
your table satisfy the criteria in the WHERE clause, you can expect to
see performance plummetting. But if the WHERE clause is not very
selective and you were already reading most of your data anyway, then
this suggestion might help:
Change the table definition to:
CREATE TABLE dilink
(
intid SMALLINT NOT NULL,
ts SMALLDATETIME NOT NULL,
dr INT NOT NULL,
daddr INT NOT NULL,
ib FLOAT NOT NULL,
ob FLOAT NOT NULL,
CONSTRAINT dilink_intidtsdrdaddr
PRIMARY KEY (dr, daddr, intid, ts),
CONSTRAINT dilink_intidts
FOREIGN KEY (intid, ts)
REFERENCES didata (intid, ts) ON DELETE CASCADE
)
The only thing I changed is the order of columns in the primary key
constraint. This will als affect the order of columns in the index that
gets created for this constraint. The effect will be that the database
now has to scan the whole table to find the rows that match the WHERE
clause, but that the rows found will already be in order of dr, daddr
so that there is no need to sort in order to satsify the GROUP BY.
I must add that I *expect* this version to suck. It will only help you
if your table has very unusual data distribution!
If performance of your query is really critical, you'd be better advised
to change this query into an indexed view. That will result in instant
results when you want to see this data - but at the price of slower
inserts, updates and deletes (as SQL Server has to do extra work to keep
the indexed view current).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Many thanks!
The most important criteria is a database size, SELECT speed
is the second one, so may be I will keep a current structure.
Thanks again!
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:kgao51t7i2cv1simi0o9p975bv01m9cbuv@.
4ax.com...
> On Tue, 12 Apr 2005 14:25:52 +0800, Tumurbaatar S. wrote:
>
> Hi Tumurbaatar,
> I think the query will be executed at about the maximum possible speed.
> The clustered index that is created because of the PRIMARY KEY
> constraint is optimal for this query. (Note - near the end of this
> message, I'll discuss one possible improvement).
> Allow me to use an analogy to explain: imagine that you have a phone
> book of a large town, were all people are listed in alphabetic order of
> the surname. Further, people with the same surname are ordered by first
> name. In database terms, the phone book is a table with clustered index
> on (Surname, FirstName).
> Now suppose you are given the task of finding all people with Surname
> 'Smith' and whose first name starts with H, I or J. That's not hard -
> you quickly flip to the page where the Smith's are listed, skip to the
> first whose name starts with H, then start scanning all names - and you
> stop as soon as you encounter a Smith whose first name starts with K.
> But what if your task had been to find all people with first name 'John'
> and surname starting with K, L, or M? In that case, you'd have to scan a
> much larger proportion of the phone book (the complete listing of all
> surnames starting with K, L, or M, to be precise). That's why I don't
> think that CBretana's suggestion will improve the speed of your query.
>
> The execution plan, you mean? Okay, take it easy. The best way to
> interpret an execution plan is from innermost to outermost. This plan is
> quite straightforward, actually :-)
> The last line says "Clustered Index S". That is the process I
> described above (in the phone book analogy) - the database uses the
> index' structure to go straight to the first row with intid = 1 and ts >
> '2005-4-1 00:00', then starts processing rows until it is past the last
> row with intid = 1 and ts <= '2005-4-1 00:30'.
> The line before that is a sort step. This is used to satisfy the group
> by clause - after the sort, all rows that need to be grouped together
> are together. (There are more strategies the optimizer can use for a
> group by, but in this case the optimizer expects the sorting to be the
> fastest).
> The Stream Aggregate step takes the sorted output and calculates one row
> from each group. If you inspect the entire line on your plan, you'll see
> the three aggregate expressions you used in the query appear here. Note
> that each result is given a name (Expr1002 through Expr1004).
> And the final step is another sort step - this time to satisfy the ORDER
> BY you specified. Note that the plan says to order by [Expr1004] - and
> if you check the Stream Aggregate step, you'll see that Expr1004 is
> exactly the name given to the expression you use in the ORDER BY.
>
> I promised a possible improvement. I can't say if this works or not. The
> idea is to eliminate one of the sort steps, at the cost of having to
> scan a much bigger part of the table. If only a small number of rows in
> your table satisfy the criteria in the WHERE clause, you can expect to
> see performance plummetting. But if the WHERE clause is not very
> selective and you were already reading most of your data anyway, then
> this suggestion might help:
> Change the table definition to:
> CREATE TABLE dilink
> (
> intid SMALLINT NOT NULL,
> ts SMALLDATETIME NOT NULL,
> dr INT NOT NULL,
> daddr INT NOT NULL,
> ib FLOAT NOT NULL,
> ob FLOAT NOT NULL,
> CONSTRAINT dilink_intidtsdrdaddr
> PRIMARY KEY (dr, daddr, intid, ts),
> CONSTRAINT dilink_intidts
> FOREIGN KEY (intid, ts)
> REFERENCES didata (intid, ts) ON DELETE CASCADE
> )
> The only thing I changed is the order of columns in the primary key
> constraint. This will als affect the order of columns in the index that
> gets created for this constraint. The effect will be that the database
> now has to scan the whole table to find the rows that match the WHERE
> clause, but that the rows found will already be in order of dr, daddr
> so that there is no need to sort in order to satsify the GROUP BY.
> I must add that I *expect* this version to suck. It will only help you
> if your table has very unusual data distribution!
>
> If performance of your query is really critical, you'd be better advised
> to change this query into an indexed view. That will result in instant
> results when you want to see this data - but at the price of slower
> inserts, updates and deletes (as SQL Server has to do extra work to keep
> the indexed view current).
>
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment