Friday, March 23, 2012

Primary Keys

I came across a database created by another person in the company that
frequently has 2 primary keys on its tables. I'm not sure why it was done
this way, my guess is to create uniqueness with the 2-field combination.
My question is: Is this poor design? Shouldn't we create constraints or
indexes if 2-field uniqueness is the purpose. Is one way more efficient
than the other? Any other thoughts as to why a table would be set up this
way? Thanks.>> I came across a database created by another person in the company that[vbcol=seagreen]
Since 2 primary keys are impossible in a single SQL table, I guess you meant
it to be a single key with two columns.
[vbcol=seagreen]
The number of columns in a key by itself does not tell us whether a specific
design is good or bad. There is nothing wrong in having multi-column keys;
however multi-column references can occasionally mess up data integrity
since SQL products allow NULLs in referencing columns.
[vbcol=seagreen]
Not necessarily. Keys by themselves are constraints and guarantee uniqueness
any way.
[vbcol=seagreen]
That is mostly irrelevant since at the logical level, key selection should
not be based on efficiency which is determined at the implementation level.
Considering the physical implementation level, unless we consider all
possible query combinations, data access paths and optimization rules and
test out all of them on a specific system using on a specific dataset, we
cannot generalize and conclude single column keys are efficient than
multicolumn keys or vice-versa.
Anith|||Sorry wasn't clear, you are correct, the tables have 2 columns combined to
make a concatenated primary key. In some instances, one column is a foreign
key of another table.
My next question: If a primary key consists of 2 columns, how would, or
could you, make that a foreign key in another table, and how would you query
such a thing?
Say for instance, I have a customer table, and an address table.
The address table fields:
CustomerID
AddressDescription
Address
Sample Data:
CustomerID AddressDescription Address
1 Main Office 123 Main
Street
1 Secondary Location 456 Main Street
So now I want to create another table that lists, for instance, people
located at each office. How would I set up the foreign key to point to the
address table. Wouldn't it be necessary to create another field in the
address table, maybe an identity field, and make that field the primary key?
And how would you query the database when joining tables? If you joined the
address table to a query, could you base it on both the CustomerID and
AddressDescription being equal to another field in another table. You can't
do this, can you? So I guess efficiency really isn't my question, maybe
more that it's a practicality or best practice issue: Under what
circumstances would a 2-column primary key be preferable to a nonmeaningful
primary key, with perhaps constraints on the 2 columns if the combined
uniqueness is the desired result?
I hope I'm making sense, thanks for your thoughts! And thanks for the
reminder about allowed nulls in the referenced columns, that could raise
serious issues.
"Anith Sen" <anith@.bizdatasolutions.com> wrote in message
news:OXm4hmstEHA.2184@.TK2MSFTNGP12.phx.gbl...
> Since 2 primary keys are impossible in a single SQL table, I guess you
meant
> it to be a single key with two columns.
>
> The number of columns in a key by itself does not tell us whether a
specific
> design is good or bad. There is nothing wrong in having multi-column keys;
> however multi-column references can occasionally mess up data integrity
> since SQL products allow NULLs in referencing columns.
>
> Not necessarily. Keys by themselves are constraints and guarantee
uniqueness
> any way.
>
> That is mostly irrelevant since at the logical level, key selection should
> not be based on efficiency which is determined at the implementation
level.
> Considering the physical implementation level, unless we consider all
> possible query combinations, data access paths and optimization rules and
> test out all of them on a specific system using on a specific dataset, we
> cannot generalize and conclude single column keys are efficient than
> multicolumn keys or vice-versa.
> --
> Anith
>|||To avoid problems and other types of modifications to the data structures, I
would suggest switching to a surrogate key and making that the PK. You can
then create a UNIQUE index covering those two columns.
This surrogate key allows you more flexibility down the road. It's easier
to add tables, create the joins (less typing anyhow), modify the current PK
to something else etc.
HTH
Rick
"Rock" <rockisland@.yahoo.com> wrote in message
news:OLbfeCttEHA.2804@.TK2MSFTNGP14.phx.gbl...
> Sorry wasn't clear, you are correct, the tables have 2 columns combined to
> make a concatenated primary key. In some instances, one column is a
foreign
> key of another table.
> My next question: If a primary key consists of 2 columns, how would, or
> could you, make that a foreign key in another table, and how would you
query
> such a thing?
> Say for instance, I have a customer table, and an address table.
> The address table fields:
> CustomerID
> AddressDescription
> Address
> Sample Data:
> CustomerID AddressDescription Address
> 1 Main Office 123 Main
> Street
> 1 Secondary Location 456 Main Street
> So now I want to create another table that lists, for instance, people
> located at each office. How would I set up the foreign key to point to
the
> address table. Wouldn't it be necessary to create another field in the
> address table, maybe an identity field, and make that field the primary
key?
> And how would you query the database when joining tables? If you joined
the
> address table to a query, could you base it on both the CustomerID and
> AddressDescription being equal to another field in another table. You
can't
> do this, can you? So I guess efficiency really isn't my question, maybe
> more that it's a practicality or best practice issue: Under what
> circumstances would a 2-column primary key be preferable to a
nonmeaningful
> primary key, with perhaps constraints on the 2 columns if the combined
> uniqueness is the desired result?
> I hope I'm making sense, thanks for your thoughts! And thanks for the
> reminder about allowed nulls in the referenced columns, that could raise
> serious issues.
>
>
> "Anith Sen" <anith@.bizdatasolutions.com> wrote in message
> news:OXm4hmstEHA.2184@.TK2MSFTNGP12.phx.gbl...
that[vbcol=seagreen]
> meant
> specific
keys;[vbcol=seagreen]
the[vbcol=seagreen]
> uniqueness
should[vbcol=seagreen]
> level.
and[vbcol=seagreen]
we[vbcol=seagreen]
>|||>> My next question: If a primary key consists of 2 columns, how would, or[vbcol=seagreen]
That would be simple. Just make sure your referencing table has two columns
with compatible datatypes and you can use the REFERENCES clause in the
CREATE TABLE or ALTER TABLE DDL like:
CREATE TABLR tbl (
c1 INT NOT NULL, c2 CHAR(30) NOT NULL, ...
( c1, c2 ) REFERENCES tbl2 ( c1, c2 )
PRIMARY KEY ( c1, c2 ) ) ;
For exact syntax, please refer to SQL Server Books Online.
[vbcol=seagreen]
Considering the example in your post, if the design is clean and both
columns are non-nullable they can be considered as valid choice for primary
key. ( Note that there are other desirable characteristics like stability,
minimality, simplicity and familiarity that come into play during key
selection as well )
[vbcol=seagreen]
As with many design considerations, the answer is it depends. If two columns
in the primary key can truly distinguish one entity from another, there is
no need to introduce a redundant identifier. And all the references to this
table must include the two columns and meaningful queries can be expressed
without loss of integrity.
However, multi-column keys can be often cumbersome, thereby could fail the
simplicity criteria. And NULLs in referencing columns is a serious issue in
SQL products which gives another logical reason to avoid multi-column keys
in some cases. That often gives a logical reason to use an additional
identifier which is unfamiliar, but stable, simple and minimal.
BTW, nothing prevents one from introducing artificial identifiers in all
your tables, but there are no proven advantages in doing so.
[vbcol=seagreen]
Yes, it does. But neither the SQL standards nor the vendors are interested
in addressing such issues since they are mostly busy implementing UFOs :-)
Anith|||On Wed, 20 Oct 2004 12:14:14 -0600, Rock wrote:

>My next question: If a primary key consists of 2 columns, how would, or
>could you, make that a foreign key in another table, and how would you quer
y
>such a thing?
>Say for instance, I have a customer table, and an address table.
>The address table fields:
>CustomerID
>AddressDescription
>Address
>Sample Data:
>CustomerID AddressDescription Address
>1 Main Office 123 Main
>Street
>1 Secondary Location 456 Main Street
>So now I want to create another table that lists, for instance, people
>located at each office. How would I set up the foreign key to point to the
>address table. Wouldn't it be necessary to create another field in the
>address table, maybe an identity field, and make that field the primary key?[/vbcol
]
Hi Rock,
That's not necessary. Your people table would look like this:
CREATE TABLE People (....,
...,
CustomerID int NOT NULL,
AddressDescription varchar(25) NOT NULL,
...,
FOREIGN KEY (CustomerID, AddressDescription)
REFERENCES Addresses,
...
)
[vbcol=seagreen]
>And how would you query the database when joining tables? If you joined th
e
>address table to a query, could you base it on both the CustomerID and
>AddressDescription being equal to another field in another table. You can'
t
>do this, can you?
Yes, you can:
SELECT some interesting columns
FROM People
INNER JOIN Addresses
ON Addresses.CustomerID = People.CustomerID
AND Addresses.AddressDescription = People.AddressDescription
WHERE ....

> So I guess efficiency really isn't my question, maybe
>more that it's a practicality or best practice issue: Under what
>circumstances would a 2-column primary key be preferable to a nonmeaningful
>primary key, with perhaps constraints on the 2 columns if the combined
>uniqueness is the desired result?
If the addresses table used CustomerID plus a short code for the
addresstype as it's primary key, I'd just use this natural key.
In your case, the more verbose AddressDescription is used as part of the
primary key. As you can see above, this complete description would have to
be repeated each time an address is referenced. This will use up more
space in your database. I would definitely consider using a shorter key.
This key *could* be the AddressType (moving the AddressDescription to
another table, listing all AddressTypes with their description), or I
could choose to introduce a surrogate key. This will take up more space in
the addresses table itself (as you add an extra column, plus an extra
index to check uniqueness on both the surrogate key and the combination of
CustomerID / AddressDescription), but it would reduce the amount of space
needed for references. It would increase speed on joins, but it might also
introduce the need to join in an extra table on some queries.
The bottom line is: there is no fixed answer, it all depends. At the end
of the day, you'll have to weigh all options against each other and choose
the one that fits your situation best.

>I hope I'm making sense, thanks for your thoughts! And thanks for the
>reminder about allowed nulls in the referenced columns, that could raise
>serious issues.
These can be easiliy avoided. Either set a NOT NULL constraint on the
columns, or (if you must allow NULLS) add a CHECK constraint to ensure
that either both columns are NULL, or both columns are not NULL.
CREATE TABLE People (....,
...,
CustomerID int, -- NULL allowed
AddressDescription varchar(25), -- NULL allowed
...,
FOREIGN KEY (CustomerID, AddressDescription)
REFERENCES Addresses,
CHECK ( ( CustomerID IS NULL
AND AddressDescription IS NULL)
OR ( CustomerID IS NOT NULL
AND AddressDescription IS NOT NULL))
...
)
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo,
[vbcol=seagreen]
Not quite that easily. The problem is in regard to multi-column primary keys
and missing information. Relational literature has volumes about such
logical problems and the lack of theory behind handling missing information
being a serious issue.
We know SQL has only one way of dealing with missing information -- NULLs.
Since there is no mandatory requirement that all foreign key column values
should be non-nullable, SQL allows foreign key references on columns whose
values are known, unknown or partially known.
Now, consider a table with a multi-column primary key with columns, colA and
colB which is referenced by another table with corresponding multi-column
foreign key.
When the user inserts a referencing row with a value for colA and NULL for
colB, should the DBMS allow the insert? Since logically DBMS cannot preserve
the integrity of the relationship should it abort the insert? Or should it
consider the validity of the colA value alone and proceed with the insert?
What if there are duplicate values for colA in the referencing table which
cannot be evaluated due to NULLs in colB? If the colA value exists only once
should the DBMS substitute the only corresponding value for colB and proceed
with the insert ?
When deleting all the rows from the referenced table which contain a
specific value for colA, what should the DBMS do to the referencing table
that contains corresponding rows with a value for colA and a NULL for colB ?
While updating a few rows in the referenced table which contain a known colA
and colB, should the DBMS allow cascading actions on all corresponding rows
in the referencing table? Or should it evaluate the existence of rows with a
value for colA and exclude the rows with a null for colB ?
Anith|||On Wed, 20 Oct 2004 17:21:36 -0500, Anith Sen wrote:

>Hugo,
>
>Not quite that easily. The problem is in regard to multi-column primary key
s
>and missing information. Relational literature has volumes about such
>logical problems and the lack of theory behind handling missing information
>being a serious issue.
(snip)
>When the user inserts a referencing row with a value for colA and NULL for
>colB, should the DBMS allow the insert? Since logically DBMS cannot preserv
e
>the integrity of the relationship should it abort the insert? Or should it
>consider the validity of the colA value alone and proceed with the insert?
(snip more consequences of this proposed insert)
Hi Anith,
That is exactly why I suggested to either give all foreign key columns a
NOT NULL constraint, or (if NULL must be allowed) a CHECK to enforce that
they are either ALL null, or ALL not null. This will prevent all the
potential problems you describe.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment