I have order table with ORDER_ID [int] IDENTITY (1, 1) NOT NULL ,
as Primary key and by default also clustered index.
I use this ID in my INNER JOINS with order items to connect them.
I have also orderDate column in my order table, which is datetime field..
A lot of my queries include search or order condition by date, for example,
simplified one:
SELECT * FROM ORDERS o INNER JOIN ORDER_ITEMS i ON o.ORDER_ID=i.ORDER_ID
WHERE o.orderDate>='20050212' AND o.orderDate<'20050228' ORDER BY
o.orderDate
Now I would like to speed up the execution of this query.
I have 3 options:
1: orderDate as Primary key (it will be clustered index) and ORDER_ID not in
any index
2: orderDate+ORDER_ID as Primary key
3:orderDate as Primary key and nonclustered index on ORDER_ID column
Now, date will be in clustered index and select will be much faster, because
date is usually in where and order parts of query.
Order_ID is usually only in join conditions, so, I think it's not so
important to be as clustered index - if, than it should be append to date
column and both will present clustered index.
What is yours opinion?
Any suggestions, expirience with that?
Thank you,
SimonLet's start with basics. An IDENTITY columns can not be a key by
definition. It is not an attribute in the data model, but an exposed
physical locator for the physical storage of the data. It cannot be
validated or verified.
Newbies use it because they don't know what a key is and this looks
like a pointer or record number. Next, rows are not records and columns
are not fields.
It looks like you use date *ranges*, so a clustered index on the date
column would help quite a bit.
But order_id sounds like the natural key for an Orders table (once you
make it a real data type, add a check digit or validation rule, etc.).
There are primary indexes -- those required to enforce business rules
(UNIQUE, PRIMARY KEY) and secondary indexes -- those added for
performance. You have one of each.
As an aside, other products like Sybase will see the PK-FK relationship
between Orders and OrderItems and build a pointer structure that will
"pre-join" them and things will much faster.
In SQL Server you currently have to add indexing to the referencing
table on your own. The original design of SQL Server was done by
people who mapped tables to single files rather than viewing the schema
as a whole. This is why I keep beating people up about confusing
files/records/fields with tables/rows/columns; a bad mental model leads
to bad code.|||Everything Joe Celko said. I'll add a suggestion for generating your order
number (which you *should* be using as a PK). Your order number should be a
"smart key" -- ie, it will actually convey info about your order, unlike
Order #1702. Here's an exmple technique.
Find the magnitude of average orders per day and add one. If you see 40-70
orders, then your magnitude will be three.
You have two options from here. I'd go by your customer/order ratio. If it
is above 0.1 (which I would guess), then don't cater to only a handful of
customers, so a date is better to embed in the order number. Otherwise,
you'll want to embed the customer number.
Normal Ratio: Y{1,2}DDD-S+
Low Ration: C+-S+
Let's explore ...
Y is for year. You can go one or two digits (5 for 2005 or 05 for 2005). It
all depends on how long you need to remember orders.
DDD is day of year. By doing this instead of MMDD, you save a digit. Not
for disk space, for short term memory.
S+ is the daily sequence number. The extra magnitude is for growth.
C+ is your customer number
- helps to split the number (mentally) and make it easier to remember.
Now when you cluster your PK (Order Num), it's ordered by the info you need.
And you can always index the other (Date).
-- Alex Papadimoulis
"simon" wrote:
> I have order table with ORDER_ID [int] IDENTITY (1, 1) NOT NULL ,
> as Primary key and by default also clustered index.
> I use this ID in my INNER JOINS with order items to connect them.
> I have also orderDate column in my order table, which is datetime field..
> A lot of my queries include search or order condition by date, for example
,
> simplified one:
> SELECT * FROM ORDERS o INNER JOIN ORDER_ITEMS i ON o.ORDER_ID=i.ORDER_ID
> WHERE o.orderDate>='20050212' AND o.orderDate<'20050228' ORDER BY
> o.orderDate
> Now I would like to speed up the execution of this query.
> I have 3 options:
> 1: orderDate as Primary key (it will be clustered index) and ORDER_ID not
in
> any index
> 2: orderDate+ORDER_ID as Primary key
> 3:orderDate as Primary key and nonclustered index on ORDER_ID column
> Now, date will be in clustered index and select will be much faster, becau
se
> date is usually in where and order parts of query.
> Order_ID is usually only in join conditions, so, I think it's not so
> important to be as clustered index - if, than it should be append to date
> column and both will present clustered index.
> What is yours opinion?
> Any suggestions, expirience with that?
> Thank you,
> Simon
>
>
>|||> Everything Joe Celko said. I'll add a suggestion for generating your order
> number (which you *should* be using as a PK). Your order number should be
> a
I don't know. I agree with your order number, I just still like to use
identity values for primary keys, with a unique key on things like this
order_id. There are quite a few positive reasons to do so (performance
being one, and development pattern simplification being another) and really,
as long as you have a natural key, it is an exceptionally useful way to have
a non-changing key. And there is never a need to modify a primary key,
which is usually a real pain.
I have never heard an argument against identities that made enough sense to
balance out the ease of use. I certainly will never agree that they are
"exposed physical locators" but I will agree that they cannot be "validated
or verified" as Joe Celko has said. If they were physical locators they
would change as the physical storage of a row was moved. They aren't. They
are not a value I would share with the user, but a convienience in
development that keeps key size managable.
----
Louis Davidson - drsql@.hotmail.com
SQL Server MVP
Compass Technology Management - www.compass.net
Pro SQL Server 2000 Database Design -
http://www.apress.com/book/bookDisplay.html?bID=266
Blog - http://spaces.msn.com/members/drsql/
Note: Please reply to the newsgroups only unless you are interested in
consulting services. All other replies may be ignored :)
"Alex Papadimoulis" <alexRemovePi@.pa3.14padimoulis.com> wrote in message
news:0C0B35F4-07FC-416F-BA3E-C398C4D30525@.microsoft.com...
> Everything Joe Celko said. I'll add a suggestion for generating your order
> number (which you *should* be using as a PK). Your order number should be
> a
> "smart key" -- ie, it will actually convey info about your order, unlike
> Order #1702. Here's an exmple technique.
> Find the magnitude of average orders per day and add one. If you see 40-70
> orders, then your magnitude will be three.
> You have two options from here. I'd go by your customer/order ratio. If it
> is above 0.1 (which I would guess), then don't cater to only a handful of
> customers, so a date is better to embed in the order number. Otherwise,
> you'll want to embed the customer number.
> Normal Ratio: Y{1,2}DDD-S+
> Low Ration: C+-S+
> Let's explore ...
> Y is for year. You can go one or two digits (5 for 2005 or 05 for 2005).
> It
> all depends on how long you need to remember orders.
> DDD is day of year. By doing this instead of MMDD, you save a digit. Not
> for disk space, for short term memory.
> S+ is the daily sequence number. The extra magnitude is for growth.
> C+ is your customer number
> - helps to split the number (mentally) and make it easier to remember.
> Now when you cluster your PK (Order Num), it's ordered by the info you
> need.
> And you can always index the other (Date).
> -- Alex Papadimoulis
> "simon" wrote:
>|||On Fri, 4 Mar 2005 10:54:26 -0600, Louis Davidson wrote:
> I certainly will never agree that they are
> "exposed physical locators" but I will agree that they cannot be "validate
d
> or verified" as Joe Celko has said. If they were physical locators they
> would change as the physical storage of a row was moved. They aren't.
Mr. Celko's use of the word "physical" is still a higher level than what
most people think of as physical. It's higher than magnetic spins on the
hard drive; higher than sectors on the hard drive; higher than bytes in the
file on the filesystem; higher even than the structure inside a DAT file.
Anything that can't be ported from one platform to the next strictly with
SQL statements is "physical" in Mr. Celko's view, because it's part of the
implementation that might change in the next release of the software. Since
the Identity() attribute of a column is specific to MS SQL Server, and
requires internal code to run at the time of insert, it's physical in that
sense. And, since the identity value may be different depending on the
order of inserts (think of an INSERT INTO tbl1 SELECT blah FROM tbl2 ...
the query optimizer may reorder however it likes), it really has nothing
whatsoever to do with the values in the row (or as Mr. Celko would want me
to think, nothing to do with the actual identity of the entity that the
table represents).
As a surrogate key, yes, they're awfully convenient. But I've found that
when I take the trouble to use real keys in my schema, all my code ends up
simplified, not complexified.|||Thanks very much Ross, for your translation of what Joe means when he says
'physical'. It drives me up the wall every time I read a message from him
telling someone that an IDENTITY is a physical locator for the physical
storage. I don't think Joe understands anything about SQL Server real
physical storage. But now knowing that Joe means something entirely
different when he uses that term, I will stop pulling my hair out. But we
will have to be ever vigilant to make sure new users know that when used by
Joe, 'physical' does not mean what they think it means.
Thanks again...
--
Kalen Delaney
SQL Server MVP
www.SolidQualityLearning.com
"Ross Presser" <rpresser@.imtek.com> wrote in message
news:1xbcau9ino4a7$.dlg@.rpresser.invalid...
> On Fri, 4 Mar 2005 10:54:26 -0600, Louis Davidson wrote:
>
> Mr. Celko's use of the word "physical" is still a higher level than what
> most people think of as physical. It's higher than magnetic spins on the
> hard drive; higher than sectors on the hard drive; higher than bytes in
> the
> file on the filesystem; higher even than the structure inside a DAT file.
> Anything that can't be ported from one platform to the next strictly with
> SQL statements is "physical" in Mr. Celko's view, because it's part of the
> implementation that might change in the next release of the software.
> Since
> the Identity() attribute of a column is specific to MS SQL Server, and
> requires internal code to run at the time of insert, it's physical in that
> sense. And, since the identity value may be different depending on the
> order of inserts (think of an INSERT INTO tbl1 SELECT blah FROM tbl2 ...
> the query optimizer may reorder however it likes), it really has nothing
> whatsoever to do with the values in the row (or as Mr. Celko would want me
> to think, nothing to do with the actual identity of the entity that the
> table represents).
> As a surrogate key, yes, they're awfully convenient. But I've found that
> when I take the trouble to use real keys in my schema, all my code ends up
> simplified, not complexified.|||>> I'll add a suggestion for generating your order number (which you
Can you explain how and why it is beneficial to use a "smart key" for an
identifier? Are you aware of any drawbacks of using such "smart" or
intelligent keys?
Anith|||>> But we will have to be ever vigilant to make sure new users know that
Then OTOH, we will have to be aware that the ones using the terms "physical
table", "physical row", "physical column" etc do not mean what they think it
means either :-)
Anith|||Anith,
According to some (such as Louis, who replied earlier), the draw backs are:
> performance
> storage size
> they can change
But let's think about each of those. Are they "real" problems?
Performance. Did you know, your app can shave maybe 30ns if you forgo
database technology altogether. And loops (for, while, etc) -- you can easil
y
save a few clock cycles by not using them. Never accept "performance" as a
reason for doing something unless there are real world data to back this
claim up. I could make silly performance articles about AutoID as well -- th
e
system has to take extra cycles to generate the ID, check that you don't try
to insert it, etc.
Storage Size. The key I suggested was 8 bytes (YDDDSSSS). This would allow
for 999 orders a day for 10 years. That's well over 3 million orders. If you
wanted to do the same with an auto ID, you'd need a bigint (8 bytes).
Whoops. Let's compare a smaller key (YDDDSS, 6 bytes) versus int (4 bytes).
Even if we max out the int (at 2.15 Million), we save a whopping 4.3 Million
bytes. How about we just delete "solitare" instead of worrying about this?
They can change. Oh this is my favorite. How many times has amazon.com told
you "dear customer, we're sorry, but your order number has changed from
21040204-a34 to 24030204-a34." Find me a case where your PK will change, and
I'll show you a poorly designed system.
AutoIDs should remain in MS Access. They're good for one thing -- whipping
together a quick and dirty database. The whole point of "Relational"
databases is to allow data to relate to other data by the data iteself (keys
)
instead of these artificial AutoIDs.
-- Alex Papadimoulis
"Anith Sen" wrote:
> Can you explain how and why it is beneficial to use a "smart key" for an
> identifier? Are you aware of any drawbacks of using such "smart" or
> intelligent keys?
> --
> Anith
>
>|||I also react to the "academic" view of Primary Keys... One argument that
always irritates me is the idea that Identitys are bad because they have no
relationship or connection to the entity in the row... (It is not an
attribute in the data model) But then I will see it argued by the same
individual, that Social Security Number, or CustomerNo, or PartNumber (all
constructed artificuially by a third party) IS an appropriate key! (... But
order_id sounds like the natural key...)
Any value that uniquely identifies a row, imho, is a suitable candidate for
a Key. That value Must be constructed... either from meaningful data, or
from non-meaningful data. If you choose to construct it from meaningful dat
a
(Attributes in the data model) Then you ALWAYS have the problem of picking a
n
attribute (or set of attributes) whose values are least likely to change -
AND the isssue of propagating changes when the real-world values of the
attributes for that row DO change, (And they always will - because nothing i
n
the real-world is 100% fixed, no matter what the academics think.)
If you choose a non-meaningful key, how it is constructed - whether you use
a Identity, or some arbitrary algorithm, doesn't really matter, as long as
you can guarantee uniqueness. And Identities do that quite nicely.
"Anith Sen" wrote:
> Then OTOH, we will have to be aware that the ones using the terms "physica
l
> table", "physical row", "physical column" etc do not mean what they think
it
> means either :-)
> --
> Anith
>
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment