Hey guys, just a quick question for you. I think I know the answer, but I
would like to get the expert's opinions...
I have a primary key called RequestID that is to be constructed of the
year, month, day, and a four-digit sequence number with leading zeros. An
example of the ninth Request placed today would be:
200506090009
As well as being the primary key, this field will be used in WHERE clauses
and in joins to several other tables in my schema.
My question is; should the data type be bigint or char(12)?Well, judging by size, I would choose the bigint since it is only 8 bytes
instead of the 12 bytes that are used by a char(12). I am not sure on this,
but I would assume that searching on an int would be faster. Again, that is
pure speculation on my part and I have no way to back it up. Just my 2¢.
Chris
"Tim Baur" wrote:
> Hey guys, just a quick question for you. I think I know the answer, but I
> would like to get the expert's opinions...
> I have a primary key called RequestID that is to be constructed of the
> year, month, day, and a four-digit sequence number with leading zeros. An
> example of the ninth Request placed today would be:
> 200506090009
> As well as being the primary key, this field will be used in WHERE clauses
> and in joins to several other tables in my schema.
> My question is; should the data type be bigint or char(12)?
>|||>> As well as being the primary key, this field will be used in WHERE
Logically, the data type of an attribute is often decided based on the
operations applicable on its values. Therefore as a primary consideration,
identify the potential operations -- either integer specific or character
specific -- and use that for determining the type.
Also, physical factors like storage size of the values, potential
performance implications etc. can be a valid considerations during specific
implementation, depending on how the queries are formulated and how the data
is updated. So 8 bytes vs. 12 bytes, esp. in a large table often tend to
favor smaller size datatype, esp. for key columns.
Anith|||"Anith Sen" <anith@.bizdatasolutions.com> wrote in
news:ePLNgaTbFHA.3932@.TK2MSFTNGP12.phx.gbl:
> Logically, the data type of an attribute is often decided based on the
> operations applicable on its values. Therefore as a primary
> consideration, identify the potential operations -- either integer
> specific or character specific -- and use that for determining the
> type.
> Also, physical factors like storage size of the values, potential
> performance implications etc. can be a valid considerations during
> specific implementation, depending on how the queries are formulated
> and how the data is updated. So 8 bytes vs. 12 bytes, esp. in a large
> table often tend to favor smaller size datatype, esp. for key columns.
>
Thank you, Chris and Anith, for your input. I also suspect that bigint is
the best choice, but because it is numeric more than size considerations.
If given the choice between char(8) and bigint I would still choose bigint.
I know from general programming experience that numeric lists are faster to
index, sort, and search than strings. I suspect that SQL Server wouldn't
be any different in this, but that is just a guess. I would be curious to
hear the definitive answer on the topic.|||Well, I am definitely not an expert, but I definitely have opinions, so here
goes...
May I choose super-secret option number three, "None of the Above"? How
about using a two-column key, consisting of RequestDate(smalldatetime) and
RequestNumber(smallint)? This would provide the following benefits:
1) Uses two less bytes than a bigint (bigint = 8 bytes, smalldatetime = 4
bytes and smallint = 2 bytes)
2) You get a larger set of available numbers for RequestNumber - 10000 in
you setup versus 32000+ in my setup (if you only use the positive values).
3) Makes standard types of calculations extremely easy. If you want to
retrieve items for a range of dates, you do not need to parse the key column
to extract the date portion. If you want to look for 'gaps' in the request
numbers for a given day, you do not need to parse the key column. The list
goes on and on.
4) Searches for a specific value are no problem - just allow the user to
enter the value in the encoded format, and convert it to your two column
values. When presenting the value, re-format to the encoded format (which
should really be done in the user interface layer anyway). The users never
need to deal with the actual storage format - and that should (almost) never
be the end-users' concern anyway.
Since you did not post DDL for this table, it is impossible to know, but I
would guess that you might already have a column for RequestDate. In that
case, there really is no good reason for the duplication - just run with
what you have.
"Tim Baur" <trbo20DISREG@.ARDyahoo.com> wrote in message
news:Xns9670A0623995Ctrbo20DISREGARDyaho
o@.207.46.248.16...
> Hey guys, just a quick question for you. I think I know the answer, but I
> would like to get the expert's opinions...
> I have a primary key called RequestID that is to be constructed of the
> year, month, day, and a four-digit sequence number with leading zeros. An
> example of the ninth Request placed today would be:
> 200506090009
> As well as being the primary key, this field will be used in WHERE clauses
> and in joins to several other tables in my schema.
> My question is; should the data type be bigint or char(12)?|||Just a thought,
How about Decimal(12,0)?
That will use 9 bytes though, not really any advantage from a "compact key"
perspective.
Better than having a composite primary key. IMO composite primary key's are
evil.
Just my $0.02.
Richard|||"Jeremy Williams" <jeremydwill@.netscape.net> wrote in
news:OIfXkkTbFHA.2420@.TK2MSFTNGP12.phx.gbl:
> Subject: Re: Primary Key, bigint or char?
> From: "Jeremy Williams" <jeremydwill@.netscape.net>
> Newsgroups: microsoft.public.sqlserver.programming
> Well, I am definitely not an expert, but I definitely have opinions,
> so here goes...
> May I choose super-secret option number three, "None of the Above"?
> How about using a two-column key, consisting of
> RequestDate(smalldatetime) and RequestNumber(smallint)? This would
> provide the following benefits:
> 1) Uses two less bytes than a bigint (bigint = 8 bytes, smalldatetime
> = 4 bytes and smallint = 2 bytes)
> 2) You get a larger set of available numbers for RequestNumber - 10000
> in you setup versus 32000+ in my setup (if you only use the positive
> values). 3) Makes standard types of calculations extremely easy. If
> you want to retrieve items for a range of dates, you do not need to
> parse the key column to extract the date portion. If you want to look
> for 'gaps' in the request numbers for a given day, you do not need to
> parse the key column. The list goes on and on.
> 4) Searches for a specific value are no problem - just allow the user
> to enter the value in the encoded format, and convert it to your two
> column values. When presenting the value, re-format to the encoded
> format (which should really be done in the user interface layer
> anyway). The users never need to deal with the actual storage format -
> and that should (almost) never be the end-users' concern anyway.
> Since you did not post DDL for this table, it is impossible to know,
> but I would guess that you might already have a column for
> RequestDate. In that case, there really is no good reason for the
> duplication - just run with what you have.
>
These are all good points, Jeremy, and your approach was considered.
The way it currently stands:
The request table has an assignment log that will log the creation
date. The point to using a date in the key is because I want the
request number to reset after each day. I would rather the users not be
able to easily figure out how many requests the help desk gets during
the course of a w.
There will *never* be more than 9,999 requests in one day. There
will probably never be more than 10.
Gaps in the sequence are no big deal. I would use Count() to get a
count of records for any particular day or date range.
Composite Keys are not the friendliest to use, especially in a table
as central to the schema as this one. I do use them in other places,
but this particular key serves as a foreign key in six other tables.|||Yeah, I know some developers/DBAs have an unnatural aversion to composite
keys. It seems to "make their teeth itch".
I have never understood the problem myself - it seems much more natural to
use the key information you are already collecting than to make up yet
another key just so you can get it down to one column. As often as not, all
this does is force the developer to join tables unnecessarily when
retrieving data, or use inefficient criteria in their WHERE clauses because
they are parsing the "composite" single column.
As for keeping the user from knowing the number of requests per w, I was
under the impression you would reset the request number each day no matter
how you constructed the key. It would not seem to make a difference whether
you were munging that in with the date in one column, or keeping them
separate - the work is the same. You must have something more going on here
that you have not shown, I guess.
Since your design is already decided on (no composite keys in this
situation), then there really does not seem to be anything compelling
criteria other than storage size to recommend one choice over the other. Go
with the bigint (as others have already said) - it should suit you design
well. Thanks for the dialog!
"Tim Baur" <trbo20DISREG@.ARDyahoo.com> wrote in message
news:Xns96715ECD5928Etrbo20DISREGARDyaho
o@.207.46.248.16...
> "Jeremy Williams" <jeremydwill@.netscape.net> wrote in
> news:OIfXkkTbFHA.2420@.TK2MSFTNGP12.phx.gbl:
>
> These are all good points, Jeremy, and your approach was considered.
> The way it currently stands:
> The request table has an assignment log that will log the creation
> date. The point to using a date in the key is because I want the
> request number to reset after each day. I would rather the users not be
> able to easily figure out how many requests the help desk gets during
> the course of a w.
> There will *never* be more than 9,999 requests in one day. There
> will probably never be more than 10.
> Gaps in the sequence are no big deal. I would use Count() to get a
> count of records for any particular day or date range.
> Composite Keys are not the friendliest to use, especially in a table
> as central to the schema as this one. I do use them in other places,
> but this particular key serves as a foreign key in six other tables.
>|||>> Yeah, I know some developers/DBAs have an unnatural aversion to composite
Actually some level of aversion might be quite natural given the existence
of known issues caused by composite keys that are well documented in
relational literature. Not sure if my teeth itches :-)
Not specific to this case, but in general, simple keys often assist in
formulating simpler relational expressions. Simple keys are often minimal
and always irreducible and therefore partial key dependencies are never an
issue.
And as already mentioned above, there are certain known problems exhibited
by compound keys in referencing table when data may be missing. For
instance, there is no logical choice for a DBMS to decide on whether to
insert/update a row in a referencing table when the values in a subset of
the columns in a multi-column referencing key are missing.
You are right; often developers include additional identifiers assuming they
are a panacea, without understanding the potential benefits or implications
of using an existing key, simple or composite.
Anith|||Thanks for replying, Anith - It is nice to get the input of one of the
actual experts!
Can you guide me to some resources where I can study the composite-key
issues you mentioned further? I have not come across any issues so far, but
I have only been developing in SQL for 10 years or so, so I am sure I have
not seen all there is to see.
My experiences with partial key dependencies typically signified that there
was normalization issues that needed to be worked out - creating an
additional simple key in addition to the composite key would really not have
helped at all. And I freely admit that I do not understand what you mean by:
"For instance, there is no logical choice for a DBMS to decide on whether to
insert/update a row in a referencing table when the values in a subset of
the columns in a multi-column referencing key are missing." If the composite
key is being used in a referencing/referenced table scenario, how does
missing data get involved - a normalized design would require all components
of the composite key to exist, wouldn't it?
I am always looking to expand my knowledge, so please let me know where I
can find out more. Thanks!
"Anith Sen" <anith@.bizdatasolutions.com> wrote in message
news:uYEHCldbFHA.1384@.TK2MSFTNGP09.phx.gbl...
composite
> Actually some level of aversion might be quite natural given the existence
> of known issues caused by composite keys that are well documented in
> relational literature. Not sure if my teeth itches :-)
>
yet
> Not specific to this case, but in general, simple keys often assist in
> formulating simpler relational expressions. Simple keys are often minimal
> and always irreducible and therefore partial key dependencies are never an
> issue.
> And as already mentioned above, there are certain known problems exhibited
> by compound keys in referencing table when data may be missing. For
> instance, there is no logical choice for a DBMS to decide on whether to
> insert/update a row in a referencing table when the values in a subset of
> the columns in a multi-column referencing key are missing.
>
their
> You are right; often developers include additional identifiers assuming
they
> are a panacea, without understanding the potential benefits or
implications
> of using an existing key, simple or composite.
> --
> Anith
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment