Friday, March 23, 2012

Primary Keys

I was looking for feedback on what the best practice is for assigning
Primary Keys. Should you use an auto-generated number so that you are always
looking up info on a int field? Auto-Generated numbers are not good for
clustered indexes, so should I other fields that make the record unique? Any
feedback on keys and indexes are welcomed.
Thanks
Gunner
"Gunner" <gunner@.docksidesoftware.com> wrote in message
news:u7PtjF8eEHA.4068@.TK2MSFTNGP11.phx.gbl...
> I was looking for feedback on what the best practice is for assigning
> Primary Keys. Should you use an auto-generated number so that you are
always
> looking up info on a int field? Auto-Generated numbers are not good for
> clustered indexes, so should I other fields that make the record unique?
Any
> feedback on keys and indexes are welcomed.
Why do you feel that auto-generated numbers are not good for clustered
indexes?
Here's an article on primary keys:
http://www.aspfaq.com/show.asp?id=2504
|||First off Auto generated keys (Identity) is often a good choice for
clustered indexes if there isn't another column more suited. This question
in general can start wars but whether you use an Identity or not is really
up to your app and can not be answered with a yes or no in such a general
fashion.
Andrew J. Kelly SQL MVP
"Gunner" <gunner@.docksidesoftware.com> wrote in message
news:u7PtjF8eEHA.4068@.TK2MSFTNGP11.phx.gbl...
> I was looking for feedback on what the best practice is for assigning
> Primary Keys. Should you use an auto-generated number so that you are
always
> looking up info on a int field? Auto-Generated numbers are not good for
> clustered indexes, so should I other fields that make the record unique?
Any
> feedback on keys and indexes are welcomed.
> Thanks
> Gunner
>
|||I have read in several places that a clustered index of sequential numbers
creates hotspots and causes performance issues. And sometimes to the point
where it would be better not to have a cluster index at all.
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:efYy5I8eEHA.2812@.tk2msftngp13.phx.gbl...
> "Gunner" <gunner@.docksidesoftware.com> wrote in message
> news:u7PtjF8eEHA.4068@.TK2MSFTNGP11.phx.gbl...
> always
> Any
> Why do you feel that auto-generated numbers are not good for clustered
> indexes?
> Here's an article on primary keys:
> http://www.aspfaq.com/show.asp?id=2504
>
|||> I have read in several places that a clustered index of sequential numbers
> creates hotspots and causes performance issues.
If you have a billion inserts, sure. The advantage of a GUID here is that
not all inserts go on the same page, since the GUID is random. However,
this can cause page splits and heavier fragmentation.
Hopefully "I read somewhere that..." is not enough information to make you
give up on an approach. If you do that, you're going to run out of options,
because every approach has pros and cons.
Also note that the primary key is not always the best candidate for the
clustered index. A DATETIME column, for example, can often be a better
candidate, and this is the implementation for tables in many systems that I
work with.
But as Andrew stated, it all depends, there is no yes or no answer. You
need to weigh more factors than some opinion you read on some web site.
http://www.aspfaq.com/
(Reverse address to reply.)
|||"Gunner" <gunner@.docksidesoftware.com> wrote in message
news:%23gFuLS8eEHA.2440@.tk2msftngp13.phx.gbl...
> I have read in several places that a clustered index of sequential numbers
> creates hotspots and causes performance issues. And sometimes to the point
> where it would be better not to have a cluster index at all.
Actually, that hotspot is GOOD from a performance standpoint... Very
simplified version:
SQL Server allocates disc space in 8-kb chunks, called pages (and
further, 8-page chunks called extents). A clustered index orders the actual
physical pages on the disc. So let's say we were clustering on a numeric
column and had the following two pages defined, each of which could hold two
rows of data:
Page 1:
0043
0045
Page 2:
0057
0058
As you can see, both of the pages are full (as they can only hold two rows
of data). Now I want to insert a new row, which has the value '0044'.
Since I'm clustering on the numeric column, the rows are physically ordered
by that value. But there's no room in the page between 0043 and 0045. So
SQL Server will have to allocate a new page, move 0045 to it, and THEN 0044
can be inserted. That's called a 'page split'.
We'd end up with something like this:
Page 1:
0043
0044
(Pointer to Page 3)
Page 2:
0057
0058
Page 3:
0045
(Pointer back to page 2)
Notice that the clustered index has now also become fragmented.
But now let's pretend we were clustering on an IDENTITY.
First column is now our identity:
Page 1:
1 0043
2 0045
Page 2:
3 0057
4 0058
If I go to insert 0044, no page split is necessary; SQL Server can allocate
more space at the end of the file, rather than in the middle, and drop the
new row right in:
Page 3:
5 0044
So now there's no cost of moving old data, just the cost of inserting the
new data.
Note that although I believe I've conveyed the basic idea, this explanation
is very simplified and probably somewhat incorrect. Please refer to
_Inside SQL Server 2000_ by Kalen Delaney for a much better and more
detailed explanation.
|||No, I know that that I will have to weigh my options each time and that a
GUID is sometimes not only the best choice but my only choice. My whole
reason for the post was just to get a feedback on what other people use or
look to use most of the time for clustered indexes, like the DATETIME column
you mentioned. Alot of time I am alone on some of this and it is just good
to get input from other people. I am not working on anything particular
right now, just wanted info. Thanks.
"Aaron [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
news:#62oTZ8eEHA.2560@.TK2MSFTNGP09.phx.gbl...[vbcol=seagreen]
numbers
> If you have a billion inserts, sure. The advantage of a GUID here is that
> not all inserts go on the same page, since the GUID is random. However,
> this can cause page splits and heavier fragmentation.
> Hopefully "I read somewhere that..." is not enough information to make you
> give up on an approach. If you do that, you're going to run out of
options,
> because every approach has pros and cons.
> Also note that the primary key is not always the best candidate for the
> clustered index. A DATETIME column, for example, can often be a better
> candidate, and this is the implementation for tables in many systems that
I
> work with.
> But as Andrew stated, it all depends, there is no yes or no answer. You
> need to weigh more factors than some opinion you read on some web site.
> --
> http://www.aspfaq.com/
> (Reverse address to reply.)
>
|||Where is Kimberly when you really need her.
She teaches an excellent course on indexing and index choice. One important
issue is that you can chose to make the Primary Key on a non-clustered
index. This can lead to some very serious performance improvements up to a
point. SO, here a few points distilled from her class. ANy errors are mine
alone.
Identity columns make for excellent clustered unique indexes. They are
monotonically increasing so page splits are reduced. They are narrow, so
non-clustered index comparisons, joins, and lookups are much faster. They
hot-spot inserts, which is good up to a point. That point varies, but it is
when the latch manager becomes saturated assembling the page from the rows.
I especially like the fact that an identity puts data in insert order on the
disks. Clustered index scans run faster due to lower fragmentation. My
favorite is the effect this has on cache. By putting more recently inserted
data together on a page, it gives the cache manager a non-random
distribution that usually increases cache hit ratio and page life
expectancy. Instead of loading a random page to get a recently added row,
the system can just keep a more active page in memory longer.
IMHO, do not use the Identity column as the primary key. I am of the school
that the PK should be data-centric, not arbitrarily assigned. That is
almost as sensitive a topic as politics or top vs. bottom posting so I will
just state my position and leave it there.
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
I support the Professional Association for SQL Server
www.sqlpass.org
"Gunner" <gunner@.docksidesoftware.com> wrote in message
news:u7PtjF8eEHA.4068@.TK2MSFTNGP11.phx.gbl...
> I was looking for feedback on what the best practice is for assigning
> Primary Keys. Should you use an auto-generated number so that you are
always
> looking up info on a int field? Auto-Generated numbers are not good for
> clustered indexes, so should I other fields that make the record unique?
Any
> feedback on keys and indexes are welcomed.
> Thanks
> Gunner
>
|||Hi,
A Primary Key is really a conceptual, Data Model / Normalisation
concept. The initial choice of Primary Key should not be made with
database integrity, not performance, in mind. This is in contrast to an
Index, which is a physical database, Performance related construct.
Primary Keys will often be dictated by the business object the entity
represents. eg a 12 character Identity Number uniquely identifies a Person.
However, once the "Natural" Primary Keys have been identified, it is
sometimes best to introduce an alternate, "Surrogate Key". This is
often done in the Data Warehousing world, as explained in Ralph Kimbals
article http://www.dbmsmag.com/9805d05.html.
The arguments he presents for using an Integer valued surrogate key,
also hold true for the relational world.
thanks
Ian
ps. A Clustered Index should organise the data in the physical sequence
that it is most often accessed. Thus minimising the number of Logical
Reads required to load sequential records. If this sequence happens to
be the same as your Identity field, then it is fine to use it as a
clustered index.
Gunner wrote:
> I was looking for feedback on what the best practice is for assigning
> Primary Keys. Should you use an auto-generated number so that you are always
> looking up info on a int field? Auto-Generated numbers are not good for
> clustered indexes, so should I other fields that make the record unique? Any
> feedback on keys and indexes are welcomed.
> Thanks
> Gunner
>
sql

No comments:

Post a Comment