Tuesday, March 20, 2012

Primary Key Datatypes

What are the most effeicent primary keys?
Is a Varchar or binary datatype bad for a primary key?This question makes no sense. There is no single, magical, universal
primary key data type. This is like believing that God put a magical
17 digit Hebrew number on everything in creation.
You have to design each schema and each table based on your data model.
First you look for industry standards, then your look for natural keys
and finally, as the last resort, you design your own encoding scheme.
You do not get to pick the data types so much as you discover them when
you are building the data model.
If you want to get some tips about the design of encoding schemes, get
a copy of SQL FOR SMARTIES and learn about types of encodings, check
digits, etc.|||As a general rule, the integer based datatypes (tinyint, smallint, int,
bigint) are going to be more efficient as far as diskspace and have better
query performance.
You can easily test this by loading up a few tables and testing the join
performance of the different datatypes.
Keep in mind that the PK for your table does not have to be the only key
field in the table. If you need to store a "Customer ID key" that is
'AAAA123456', there is no reason you cannot create a unique constraint
(var)char column and use the smallint behind the scenes. In other words, th
e
underlying databae relationships do not have to be tied directly to the
business rules.
Representing the example above would require a char(10) field repeated in
every related table if it were the PK. The same thing as a smallint for
instance will only require 2 bytes in each related table and the query
performance would be much better.
HTH,
John
"John" wrote:

> What are the most effeicent primary keys?
> Is a Varchar or binary datatype bad for a primary key?
>|||>> What are the most effeicent primary keys?
It is a silly question. Keys, by themselves, have nothing to do with
efficiency since they are intended for data integrity purposes.
That said, physical implementation of key values might have some impact on
certain DML operations, but they cannot be generalized, since many other
factors could be involved. In SQL Server, with everything else being equal,
values that have a smaller size at the physical level often helps faster
physical operations.
There is no such accepted rule however, in general, large varchar and binary
keys are often cumbersome and complex, which is not a desirable
characteristic.
SQL Server has specialized implementations for binary datatypes which make
them unsuitable for certain operations. Also, there are certain specific
instances where relational operations involving large VARCHAR values ( esp.
trailing space issues, sparse ones, etc. ) perform poorly than those with
smaller CHAR or INT counterparts. On those grounds, one can conclude large
varchar and binary types are poor choices for a primary key.
Anith|||>> Keep in mind that the PK for your table does not have to be the only key field
[sic] in the table. If you need to store a "Customer ID" key that is 'AAAA1234
56', there is no reason you cannot create a unique constraint (var)char colu
mn and use the sm
allint behind the scenes. <<
Be sure to tell him that he has to check these two keys to be sure they
are in synch everytime they are used. This means extra storage for the
redundancy and for the procedural valifdation code.
But data integrity is shot to hell. Hey, if it does not have to be
right, then I can be very, very fast.|||>extra storage for the redundancy
this "extra storage" argument is just plain wrong. Why don't you create
a child table with FK to a parent, say 100 child rows per a parent one,
create an index on FK.
Consider 2 situations for the FK column:
varchar(30)
int
let's say it's an OLTP database, the child table is narrow, say 30
bytes of data except for the FK column. Determine the size fo the child
table and the FK index, and see for yourself.
You have been repeating that false "extra storage" argument for years,
why don't you see for yourself just once?|||>> this "extra storage" argument is just plain wrong. Why don't you create a
child table with FK to a parent, say 100 child rows per a parent one, crea
te an index on FK. <<
So this extra table and its index are not redundant and take up no
physical space?
The real problem is that MS SQL Server is still a decade or more behind
the technology
In Sybase SQL Anywhere and other products, the PK values are stored
ONCE and only once in the schema. The FK references are implemented as
pointer chains back to the PK value. This makes CASCADE operations
insanely fast. It makes the database smaller. It makes optimizations
really fast. Etc.
You are doing manually -- with all the human errors -- what the SQL
engine should be doing and validating.|||>The real problem is that MS SQL Server is still a decade or more >behind th
e technology
Oh yes, sure, and so are DB2 and Oracle - "still a decade or more
behind the technology", right?
I guess in the real world these Big 3 RDBMS ARE the technology, are
they not.
Maybe we just live in different universes|||There are actually 2 questions here:
1) What is the most efficient datatype to use for a primary key?
This is really irrelevant, since the primary key is made up of real data
that identifies one entity from a set of like entities. Whatever type
the primary key needs to be is what it will need to be.
The primary key is an aspect of logical design, where efficiency
[performance] isn't a factor.
However, the 2nd [and i'm swagging that it is more to the point of your
post] question is:
2) What is the most efficient datatype to use for implementation of
referential integrity?
Since the logical design of the database and physical implementation are
not necessarily the same, you can implement RI as necessary for
performance (which is still secondary to integrity, btw).
Typically, it is more efficient DRI using integer types. They are
handled more efficiently, and eliminate the need for cascading updates
when data in the primary key changes [which it can, since it's real
data: usually it has to change due to data input errors]
I find it much less confusing to translate the logical primary key to
the physical primary key constraint.
For the RI key, I make it not nullable, put a unique constraint on it
[required for DRI], and put FK references to that column. However, I
only add an RI key if it will be referenced - it isn't a given that it
will be.
John wrote:

>What are the most effeicent primary keys?
>Is a Varchar or binary datatype bad for a primary key?
>
>|||> Be sure to tell him that he has to check these two keys to be sure they
> are in synch everytime they are used. This means extra storage for the
> redundancy and for the procedural valifdation code.
That's not necessary with IDENTITY, because each natural key value exists in
only one place in the database and IDENTITY key values are immutable.
You've also neglected to mention all of the extra code required in every
application to deal with the fact that the natural primary key value CAN
change.
Your statement about data integrity just doesn't hold water. It is just as
possible to store an incorrect natural foreign key value as it is to store
an incorrect surrogate key value. Declarative Referential Integrity
constraints ensure that the values correspond to a valid primary key value
in the referenced table--regardless of whether you're using a natural or
surrogate foreign key. If anything, integrity is improved with surrogates.
Only one check constraint is required for the natural key value because it
exists in only one place, which means less code. Less code means fewer
bugs.
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1130353804.653625.148220@.f14g2000cwb.googlegroups.com...
> Be sure to tell him that he has to check these two keys to be sure they
> are in synch everytime they are used. This means extra storage for the
> redundancy and for the procedural valifdation code.
>
> But data integrity is shot to hell. Hey, if it does not have to be
> right, then I can be very, very fast.
>

No comments:

Post a Comment