Preventing Deadlock: Primary Keys When importing data

I am creating a stored procedure for our SQL server which does the following:
1. Drops a table called 'Teachers'

drop table [users_dd].[dbo].[Teachers]

2. Recreates that table ready for new data like so...

CREATE TABLE [users_dd].[dbo].[Teachers] (
[username] char (30) NOT NULL,
[SL_Teachcode] nvarchar (12) NULL,
[DueDate] smalldatetime DEFAULT GetDate(),
[ID] INT IDENTITY (1,1) PRIMARY KEY NOT NULL)

3. The last step attempts to import new data. I have used the import/export data wizard to create this query to import new data.

select [users].[username], [users].[SL_Teachcode]
from [users]
where [users].[Staff_teaching]=1
order by [users].[SL_Teachcode]

The last step returns an error because the ID field will not accept Nulls. Of course I don't want it to have nulls as this is to be the primary key for this table. What I want is an automatically generated number there but how do I make that happen? I thought it was all in the definition of the field, ie [ID] INT IDENTITY (1,1) PRIMARY KEY NOT NULL). It seems there is something I missing here however.

Can anybody tell me what I doing wrong (or not doing)?

Murrayuse some thing like:

insert into [users_dd].[dbo].[Teachers] (
[username],].[SL_Teachcode])
select [users].[username], [users].[SL_Teachcode]
from [users]
where [users].[Staff_teaching]=1
order by [users].[SL_Teachcode]

tel us if it works
i try it on my side , it works.
Ridwan,,,|||If you do not need transactional recovery during this load you might do one or a combination of the folowing.

Instead of delete the table with corresponding rows, you might truncate it first and then delete it, less logged and faster. If you do drop the table, you can use select into instead of insert as that is minimally logged and faster as well.

HTH|||SELECT INTO is not faster and is not recommended because of concurrency implications. Truncating and then deleting just doesn't make sence. Why not just TRUNCATE?|||select into is not faster than insert into? I beg to differ, as inserts are fully logged they will take longer. They will also bloat your transaction log and if you have to reallocate space on a drive that could really hurt performance and slow things down. With select into there is no question of drive space allocation and it will run faster. Just for kicks I just ran a select into statement for 1.4 million rows, it took 26 seconds. For that same structure and table an insert into took 2 minutes and 54 seconds.

Now, obviously the argument would be the log and possibly the database file was growing in this time span so to make it an even playing field I truncated the table and did it again with no hard drive growth. The insert into only took 15 seconds without the hard drive growth but after a truncate and a drop table the select into only took 9 seconds.

So, I do not see any situation where the insert into is as fast as the select into and you get the bonus of knowing the hard drive will not have to reallocate space for the log file.

Yes, I meant truncate then DROP the table, not delete. Truncate with delete does not make sense, agreed.|||yup, blurred it out too fast :o

rhigdon, you're absolutely right, - did a 1M-row select into in 20 seconds and insert has been running for the past 6 minutes (Personal edition).

Now you got me experimenting with increased preallocated trx. log.

But I also noticed that we're assuming that recovery mode is Simple. Will it be the same difference in speed if I have it set to Full? Well, I'm still waiting on my insert :)|||No problem :)

I would assume so about the full and simple, you can watch that t-log grow by playing with the dbcc sqlperf('logspace') before and after inserts or select into's.|||BTW, do you have your auto-grow settings enabled? Most of our apps here got converted from 6.5 to either directly 2K or to 7.0 then 2K. Since in 6.5 world we didn't have this luxury of auto-growing, once discovered, - we first got relieved, but then started noticing that for 150+ servers we will have to implement space utilization monitors based on activity within each database. And then - react really quickly once the size reaches the disk limitation. So instead we performed a 2-month space growth analysis and pre-estimated potential annual growth for each database. Then we pre-allocated space on both data and trx. log devices and turned off auto-growth feature. Now we can tell with the level of accuracy to +/- 10MB how much disk space will be needed for any given database by the beginning of the next year.|||Thanks Ridwan,
I tried that code and worked fine. I think I went wrong by using the import/exprot data wizard to construct my query. As soon as I did it 'by hand' it worked fine.
Interesting conversation about the difference between select into and insert into. In this example there will never be more than about 90 -100 records so speed is not a big issue to this task as it takes less than a second to run anyhow.
Thank you all for your input... I hope I can return the favour someday.
Murray|||rdjabarov - that sounds pretty interesting, yes, we use auto-grow for everything. I do have a script that uses some stored procs that runs daily to check for free space and alert me if it gets low though.

Monday, March 26, 2012

Primary Keys When importing data

No comments:

Post a Comment

Preventing Deadlock

Blog Archive

About Me