I have a scheduled job that runs every minute to monitor for
long-running jobs in production databases. This job has been running
without issue for at least 8 months now. I have not done ANY work on
this server in at least 2 months and no one has rights to do anything
within the SQL Server on this box except me. A couple of days ago, I
started getting job failure messages sporradically and the reason was
that '##jobs already exists in the database'. This is a global temp
table that gets created, populated, and dropped in the stored procedure
that's run in my job.
I ran a trace to see what's going on and why the job thinks the temp
table still exists. I can see where the DROP TABLE statement for the
temp table executes and completes on a job run that succeeds, but if
the next job run fails on the statement that creates the temp table, I
can do a select for that table and it does exist and is still populated
with the records from the previous run. I also ran the trace with
errors/exceptions and none were picked up.
Has anyone ever seen where a DROP TABLE statement is issued and
completes, but the table doesn't actually drop? Below you will find
the stored procedure, the results of the trace (where the first job run
wraps up with the final statements and the next job run tries to
execute and fails), and the select from the temp table after the DROP
TABLE statement is executed. Sorry for the length.
Any ideas would be appreciated!
STORED PROCEDURE
--
CREATE PROCEDURE [dbo].[LongRunningJobs]
@.emailaddress varchar(100)
AS
-- Description: This stored procedure will detect long running jobs.
-- A long running job is defined as a job that has
-- been running longer than what is specified as the time
-- limit for the job in the LongRunningJobs_JobTimeLimits
-- table. If it detects a long running job an email is sent.
-- DECLARE variables
SET NOCOUNT ON
DECLARE @.jobcount int
DECLARE @.jobid uniqueidentifier
DECLARE @.jobname sysname
DECLARE @.spid smallint
DECLARE @.logintime datetime
DECLARE @.duration int
-- CREATE TABLE to hold job information
CREATE TABLE #enum_job (
Job_ID uniqueidentifier,
Last_Run_Date int,
Last_Run_Time int,
Next_Run_Date int,
Next_Run_Time int,
Next_Run_Schedule_ID int,
Requested_To_Run int,
Request_Source int,
Request_Source_ID varchar(100),
Running int,
Current_Step int,
Current_Retry_Attempt int,
State int
)
-- CREATE TABLE to hold job_id and the job_id in hex character format
CREATE TABLE ##jobs (
job_id uniqueidentifier ,
job_id_char varchar(100)
)
-- Get a list of jobs
INSERT INTO #enum_job
EXEC master.dbo.xp_sqlagent_enum_jobs 1, '' -- doesn't seem to
matter what you put here
-- calculate the #jobs TABLE with job_id's
-- and their hex character representation
INSERT INTO ##jobs
SELECT job_id, dbo.fn_hex_to_char(job_id,16) FROM #enum_job
-- get a COUNT of long running jobs
SELECT @.jobcount = COUNT(*)
FROM master.dbo.sysprocesses a
join ##jobs b
ON SUBSTRING(a.program_name,32,32)= b.job_id_char
join msdb.dbo.sysjobs c
ON b.job_id = c.job_id
join dbo.LongRunningJobs_JobTimeLimits d
ON c.job_id = d.job_id
WHERE login_time < DATEADD(mi,-(d.minutes),GETDATE()) -- check for jobs
that have been running longer than 1 minute.
-- record long running jobs in LongRunningJobs_Jobs
IF @.jobcount > 0
BEGIN
UPDATE LongRunningJobs_Jobs
SET still_running = 0
DECLARE jobs_cursor CURSOR
FOR
SELECT SUBSTRING(c.name,1,78), c.job_id, a.spid, a.login_time
FROM master.dbo.sysprocesses a
join ##jobs b
ON SUBSTRING(a.program_name,32,32)= b.job_id_char
join msdb.dbo.sysjobs c
ON b.job_id = c.job_id
join dbo.LongRunningJobs_JobTimeLimits d
ON c.job_id = d.job_id
WHERE login_time < DATEADD(mi,-(d.minutes),GETDATE())
OPEN jobs_cursor
FETCH NEXT FROM jobs_cursor INTO @.jobname, @.jobid, @.spid, @.logintime
WHILE @.@.FETCH_STATUS = 0
BEGIN
IF NOT EXISTS (SELECT job_id, login_time FROM
dbo.LongRunningJobs_Jobs
WHERE job_id = @.jobid AND login_time = @.logintime)
BEGIN
SET @.duration = (SELECT DATEDIFF(mi, @.logintime, getdate()))
EXEC LongRunningJobs_Notification @.jobid, @.logintime, @.duration,
@.emailaddress
INSERT INTO dbo.LongRunningJobs_Jobs (job_id, job_name, spid,
login_time, still_running)
VALUES (@.jobid, @.jobname, @.spid, @.logintime, 1)
FETCH NEXT FROM jobs_cursor INTO @.jobname, @.jobid, @.spid, @.logintime
END
ELSE
UPDATE LongRunningJobs_Jobs
SET still_running = 1
WHERE job_id = @.jobid AND login_time = @.logintime
FETCH NEXT FROM jobs_cursor INTO @.jobname, @.jobid, @.spid, @.logintime
END
CLOSE jobs_cursor
DEALLOCATE jobs_cursor
END
ELSE
UPDATE LongRunningJobs_Jobs
SET still_running = 0
DELETE FROM LongRunningJobs_Jobs
WHERE still_running = 0
DROP TABLE #enum_job
DROP TABLE ##jobs
GO
TRACE RESULTS
--
EventClass TextData
-- --
SP:StmtStarting -- LongRunningJobs
UPDATE LongRunningJobs_Jobs
SET still_running = 0
SP:StmtCompleted -- LongRunningJobs
UPDATE LongRunningJobs_Jobs
SET still_running = 0
SP:StmtStarting -- LongRunningJobs
DELETE FROM LongRunningJobs_Jobs
WHERE still_running = 0
SP:StmtCompleted -- LongRunningJobs
DELETE FROM LongRunningJobs_Jobs
WHERE still_running = 0
SP:StmtStarting -- LongRunningJobs
DROP TABLE #enum_job
SP:StmtCompleted -- LongRunningJobs
DROP TABLE #enum_job
SP:StmtStarting -- LongRunningJobs
DROP TABLE ##jobs
SP:StmtCompleted -- LongRunningJobs
DROP TABLE ##jobs
SP:Completed EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SQL:StmtCompleted EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SQL:BatchCompleted EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SQL:BatchStarting EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SQL:StmtStarting EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SP:Starting EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SP:StmtStarting -- LongRunningJobs
SET NOCOUNT ON
SP:StmtCompleted -- LongRunningJobs
SET NOCOUNT ON
SP:StmtStarting -- LongRunningJobs
CREATE TABLE #enum_job (
Job_ID uniqueidentifier,
Last_Run_Date int,
Last_Run_Time int,
Next_Run_Date int,
Next_Run_Time int,
Next_Run_Schedule_ID int,
Requested_To_Run int,
Request_Source int,
Request_Source_ID varch
SP:StmtCompleted -- LongRunningJobs
CREATE TABLE #enum_job (
Job_ID uniqueidentifier,
Last_Run_Date int,
Last_Run_Time int,
Next_Run_Date int,
Next_Run_Time int,
Next_Run_Schedule_ID int,
Requested_To_Run int,
Request_Source int,
Request_Source_ID varch
SP:StmtStarting -- LongRunningJobs
CREATE TABLE ##jobs (
job_id uniqueidentifier ,
job_id_char varchar(100)
)
-- Get a list of jobs
SQL:BatchCompleted EXEC ISS_DBAData_Production.dbo.LongRunningJobs
'rfsupport@.kroger.com'
SELECT OF ##jobs AFTER DROP STATEMENT HAS BEEN EXECUTED SHOWS TABLE
STILL EXISTS
---
job_id job_id_char
--
----
0887FA02-2785-415B-A1E7-0F70EA9B2BF9 02FA870885275B41A1E70F70EA9B2BF9
A22E5C05-9331-4D13-A2E0-C4470EF51BEE 055C2EA23193134DA2E0C4470EF51BEE
B45F740A-3847-42E4-B5F6-0A7AC400E5B8 0A745FB44738E442B5F60A7AC400E5B8
7EDEBA0D-730C-4C57-BD79-4ABAF8D2FB8F 0DBADE7E0C73574CBD794ABAF8D2FB8F
6152F60D-7CA7-4ADD-A117-43D1E6A9E07E 0DF65261A77CDD4AA11743D1E6A9E07E
F7810610-F203-4C52-BE0C-BBF9CF68C887 100681F703F2524CBE0CBBF9CF68C887
F482D610-0171-4E1F-B43F-1266AE7506A2 10D682F471011F4EB43F1266AE7506A2
9FB2FA11-B69F-41DF-AD25-49FF5C605F37 11FAB29F9FB6DF41AD2549FF5C605F37
6CBB1318-95D3-4B03-B5B5-E8C0B526087C 1813BB6CD395034BB5B5E8C0B526087CForgot to mention:
SQL Server 2000 Enterprise Edition with sp4.
This is an active/active cluster.
The job is set up on both nodes and is only failing on the first node.|||Apan,
Why not try making the table a local temporary table instead of global
temporary table?...proc won't see it as an existing table...since you're
dropping it at the end of the proc anyway.
HTH
Jerry
"apan" <anapatterson@.yahoo.com> wrote in message
news:1148661240.078643.204250@.y43g2000cwc.googlegroups.com...
> Forgot to mention:
> SQL Server 2000 Enterprise Edition with sp4.
> This is an active/active cluster.
> The job is set up on both nodes and is only failing on the first node.
>|||I had originally set it up as a global temp table because there were
going to be other sprocs querying it. I've since changed this (nothing
else in the database calls on this temp table), so I probably could
change it to a local temp table.
However, that really doesn't answer my question as to why it's
occasionally not being dropped even though the DROP TABLE command
executes. I'm not sure why it would matter if it's local or global if
the DROP is called.|||FYI - I've gone ahead and changed the global to a local to see if this
fixes the problem since I don't really need a global. But I'd still
like to see if anyone's got any ideas as to my previous question...
BTW, thanks for your input Jerry...forgot to say that before.|||It wouldn't...any blocking occuring that could prevent this? My solution
was an imediate "get you past this" solution.
HTH
Jerry
"apan" <anapatterson@.yahoo.com> wrote in message
news:1148663035.537739.42460@.y43g2000cwc.googlegroups.com...
>I had originally set it up as a global temp table because there were
> going to be other sprocs querying it. I've since changed this (nothing
> else in the database calls on this temp table), so I probably could
> change it to a local temp table.
> However, that really doesn't answer my question as to why it's
> occasionally not being dropped even though the DROP TABLE command
> executes. I'm not sure why it would matter if it's local or global if
> the DROP is called.
>|||Nope, no blocking. I also traced locks and errors/exceptions in my
profile trace when I ran it and none showed up. Also checked the
Locks/Process ID in Enterprise Manager for blocking when I could see
that the temp table hadn't been dropped and didn't see any blocking
there.|||Hmmm...if you query for the ## table in TEMPDB does it still exist?
"apan" <anapatterson@.yahoo.com> wrote in message
news:1148664132.029575.250840@.38g2000cwa.googlegroups.com...
> Nope, no blocking. I also traced locks and errors/exceptions in my
> profile trace when I ran it and none showed up. Also checked the
> Locks/Process ID in Enterprise Manager for blocking when I could see
> that the temp table hadn't been dropped and didn't see any blocking
> there.
>|||When I queried the global table (see last 20 or so lines of original
post for query details), I believe I was just doing it from one of the
user db's and it found it and brought back data. I don't think it
would matter if I queried for it from TEMPDB since it was a global
table...or would it?|||The global temporary table IS stored in the TEMPDB database. But if you're
getting records back the it does exist.
"apan" <anapatterson@.yahoo.com> wrote in message
news:1148666005.668894.271130@.y43g2000cwc.googlegroups.com...
> When I queried the global table (see last 20 or so lines of original
> post for query details), I believe I was just doing it from one of the
> user db's and it found it and brought back data. I don't think it
> would matter if I queried for it from TEMPDB since it was a global
> table...or would it?
>|||I will post back on Tuesday as to whether or not I'm still getting
failures after making the change to use a local temp table. I haven't
received any failures so far, but the failures are sporradic and can
occur every minute or a couple times every few hours. This whole thing
is very weird. Not only because the DROP statement should be dropping
the object, but especially since the second node of this active/active
cluster is running the exact same job with the version that contains
the global temp table and I've not heard a peep from it. The only
difference between the two nodes are the user db's. The contents of
the db I've created for my admin stuff (like this sproc) is exactly the
same on both nodes (except now one has local temp table and the other
has a global one). And the fact that it's just started failing in the
last couple of days after about 8 months of running perfectly (and no
changes have been made to either server lately) is just perplexing.
Anyway...I'll post again on Tuesday. Thanks for your input.|||FYI - haven't received any errors from this job since I made the change
on Friday. No rhyme or reason as to why this change would have made a
difference. Still would be interested to know if anyone has any idea
what might have been happening within SQL Server to cause it to not
actually drop the table when it said it had.
No comments:
Post a Comment