Preventing Deadlock: Primary Key

I am setting up some tables where I used to have an identity column as the primary key. I changed it so the primary key is not a char field length of 20.

Is there going to be a big performance hit for this? I didn't like the identity field because every time I referenced a table I had to do a join to get the name of object.

EG:

-- Old way
tbProductionLabour
ID (pk)| Descr | fkCostCode
-------
1 | REBAR | 1J

tbTemplateLabour
fkTemplateID | fkLabourID | Manpower | Hours
--------------
1 | 1 | 1 | 0.15

-- New way
tbProductionLabour
Labour | fkCostCode
-------
REBAR | 1J

tbTemplateLabour
fkTemplateID | fkLabour | Manpower | Hours
--------------
1 | REBAR | 1 | 0.15

This is a very basic example, but you get the idea of what I am referring to.

Any thoughts?

MikeI didn't like the identity field because every time I referenced a table I had to do a join to get the name of object.

The light! Don't look at the light!

I guess I'll add you to the "not preffering surrogates" group

http://weblogs.sqlteam.com/brettk/archive/2004/06/09/1530.aspx

Excuse me while I climb back upon my barst...um desk chair...yeah that's right...|||EDIT: Didn't we have this conversation already?|||EDIT: Didn't we have this conversation already?
Probably, but as an in-experienced developer (wannabe) I was wondering if not using a identity field will really make that great of a performance difference.

I think the char(20) for a primary key is better mainly because when I open a table I am not seeing a number which I will have to look up. It also would reduce the number of joins I would have to make (which I guess would be better for performance). But, for looking up values or joining on that field, would the performance reduction be neglable or significant enough to want to use the identity field?

Mike|||Let me ask you this...

What do you think would be faster.

A). An umpteen table join on surrogate keys to get the data, or

B). A SELECT against 1 Table|||Let me ask you this...

What do you think would be faster.

A). An umpteen table join on surrogate keys to get the data, or

B). A SELECT against 1 Table
I would assume option B, but let me ask you this.

What do you think would be faster:
A) Looking up values based on an integer
or
B) Looking up values based on an umpteen char string

?

Mike B|||OK, I'll give you 80/1000 of a second

USE Northwind
GO

SET NOCOUNT ON
CREATE TABLE myTable99(Col1 sysname)
CREATE TABLE myTable00(Col1 sysname, Col2 int IDENTITY(1,1))
GO

DECLARE @.x int
SELECT @.x = 1
WHILE @.x < 1000
BEGIN
INSERT INTO myTable99(Col1) SELECT TABLE_NAME FROM INFORMATION_SCHEMA.Tables
INSERT INTO myTable00(Col1) SELECT TABLE_NAME FROM INFORMATION_SCHEMA.Tables
SELECT @.x = @.x + 1
END

INSERT INTO myTable99(Col1) SELECT 'Brett'

CREATE INDEX myIndex99 ON myTable99(Col1)
CREATE INDEX myIndex00 ON myTable00(Col2)

SELECT COUNT(*) FROM myTable99

DECLARE @.x1 datetime, @.y1 datetime, @.x2 datetime, @.y2 datetime
SELECT @.x1 = GetDate()
SELECT @.x1 AS systime, 'Starting int look up'
SELECT * FROM myTable00 WHERE Col2 = 216784
SELECT @.y1 = GetDate()
SELECT @.y1 AS systime, 'Endinging int look up'

SELECT @.x2 = GetDate()
SELECT @.x2 AS systime, 'Starting sysname look up'
SELECT * FROM myTable00 WHERE Col1 = 'Brett'
SELECT @.y2 = GetDate()
SELECT @.y2 AS systime, 'Endinging int look up'

SELECT DATEDIFF(ms,@.x1, @.y1), DATEDIFF(ms,@.x2, @.y2)
GO

SET NOCOUNT OFF
DROP TABLE myTable99
DROP TABLE myTable00
GO|||For small tables with no updates, it doesn't matter much. If you start to update the char values you are using as keys, you'll lose hair very quickly. If you add rows (beyond about 100,000 or so), the numeric keys will be significantly faster, due to lower total physical IO.

The short answer boils down to you can use what you want. As you scale upward, the surrogate keys look better and better!

-PatP|||For small tables with no updates, it doesn't matter much. If you start to update the char values you are using as keys, you'll lose hair very quickly. If you add rows (beyond about 100,000 or so), the numeric keys will be significantly faster, due to lower total physical IO.

The short answer boils down to you can use what you want. As you scale upward, the surrogate keys look better and better!

-PatP
Yeah, I pretty much aggree with that. The tables I am refering to will not be that big and they are kind of complex so I thought it would be best to use as many natural keys as possible. Of course there are places in my DB where I use the identity because I don't think the natural keys are all that good to use.

How many people use CompanyName as a natural key in a companies table?

I understand there maybe more then one CompanyName in the table but should they be unique by appending a number or geographical location, or using the CompanyName / Address as the primary key.

My thought is that it is best to use a surrogate here because to carry a company name and address as a forein key to other tables is probably costly.

There is the argument that the address can change but is that really a problem since you can specify "Cascade update related fields" on the other tables?

I would personally love to see something other then a number when I am looking at these tables but....

Mike B|||At least in my opinion, company name stinks as a primary key. We aren't all that big, but we have several hundred companies scattered wildly about North America with the same name, and on a worldwide basis it gets even worse.

JOINs are cheap. SQL Server makes them nearly free IF you keep the FK value small (INT or smaller) and you've got enough RAM in your server.

-PatP|||JOINs are cheap.

That's gotta be the most open ended statement I've heard in a while...

Also... "Company's with the same name scattered around"?

Either they truly are a different company, which means they are separate legal entity, or they are a site for a company...

Are you essentially saying that accessing 1 table would be slower than accessing many?|||Many are franchises, some just reuse common names in different jurisdictions. The net result is that if you look for companies with names like Subway or McDonald's you find hundreds of hits, most (but not all) with separate EIN values.

If you have to haul a fifty byte VARCHAR off the disk versus a four byte integer for every row in a 34 million row table, versus a join to a table cached in RAM, then the JOIN is cheaper than the single table. While SQL Server is good at hiding physical details from the user, some things are still big enough tasks so that smart design beats brute force every time.

-PatP

Monday, March 12, 2012

Primary Key

No comments:

Post a Comment

Preventing Deadlock

Blog Archive

About Me