Wednesday, June 18, 2008

Using SCOM to monitor SQL Error Logs

Our SQL DBA reported that occasionally he will see errors in the SQL Error Logs similar to:

SQL Server has encountered 45 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [T:\MSSQL\Data\tempdev6_Data.NDF] in database [tempdb] (2). The OS file handle is 0x000005B4. The offset of the latest long IO is: 0x0000000c7b2000

Monitoring for this alert at first sounded simple, but as I dug into it I realized it could be much more difficult than I expected. First off, this error is reported in the SQL Error Logs and only in the SQL Error Logs, so we cant simply use the NT Event log to alert us. Secondly, depending on how you install SQL, these error logs could be anywhere on the system. And lastly, this is needed to be monitored on SQL 2000 and SQL 2005 systems, so xp_ReadErrorLog won't work for us.

So the breakdown seems fairly simple: find out where the logs are, use the text log monitor, and search for the string.

1. Find out where the SQL Error logs are stored. This first step turns out to be fairly simple, when the SQL DB Engine is discovered, one of the attributes it discovers is "$MPElement[Name='SQL!Microsoft.SQLServer.DBEngine']/ErrorLogLocation$". This should tell us where the logs are stored on each server regardless of how they were built or configured.

2. Use the text log monitor to search the SQL Error logs. Using the "Matches Regex" function, I should be able to simply search for the string "of IO requests taking longer than 15 seconds to complete on file" and report an alert on this

Sounds simple, reality however is rarely simple. The ErrorLogLocation attribute is stored as a single string such as "d:\MSSQL\log\ERRORLOG", but the text log monitor requires the path and file name to be separate components and there is no method within SCOM (that I am aware of) to split these components from a single string.

Other options:

  • Hard code the error log path and name
    • This differs on each box
    • It could be done with an override, but that would be a nightmare
  • Create an extended SQL DB Engine class that includes this setting as 2 separate attributes
    • That's a lot more work than I want to tackle, plus it's a maintenance nightmare
  • Create a new text log monitor that takes only 1 attribute for the path and name
    • Turns out this is included in SCOM as a binary (i.e. in a DLL) and would have no idea where to start with that
  • Create a script that searches the log file for a RegEx
    • This is possible, but potentially filled with problems

Using a script ultimately seems to be the best option, with it I can pass whatever options are needed (log path, how long to go back, the string to search for, and anything else needed) and it can be somewhat expandable in the future. In this case I decided to drop an NT Event message of the error and then use a separate rule to pick up this event and alert on it. I also added another parameter of DBVersion because SQL 2000 and SQL 2005 store their files in different formats

' SQLErrorLog.vbs


' param 0 - path to errorlog $Target/Property[Type="SQLServer!Microsoft.SQLServer.DBEngine"]/ErrorLogLocation$

' param 1 - time in minutes to include 30

' param 2 - string to match "of IO requests taking longer than"

' param 3 - db version $Target/Property[Type="SQLServer!Microsoft.SQLServer.DBEngine"]/Version$

Const ForReading = 1, ForWriting = 2, ForAppending = 8

Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0


SET oArgs = WScript.Arguments

SET oShell = CreateObject("Wscript.Shell")

set fso = CreateObject("Scripting.FileSystemObject")

errorLog = oArgs(0)

iTime = oArgs(1)

sMatch = oArgs(2)

dbVer = oArgs(3)

'oShell.LogEvent EVENT_SUCCESS, "Beginning check, errorLog: " & errorLog & ", iTime: " & iTime & ", sMatch: " & sMatch & ", dbVer: " & dbVer

' read text file

IF LEFT(dbVer,1) = 9 THEN

set f = fso.OpenTextFile(errorLog, ForReading,,TristateTrue)


set f = fso.OpenTextFile(errorLog, ForReading,,TristateFalse)


arLines = split(f.ReadAll,vbCrLf)

' find lines that are iTime minutes old

for i = UBound(arLines)-1 to 0 step -1

line = arLines(i)

lineTime = CDate(Left(line,19))

IF lineTime < DateAdd("n",- iTime, Now) THEN



IF RegExTest(sMatch, line) THEN

' generate alert

oShell.LogEvent EVENT_ERROR, "SQL I/O Error" & vbCrLf & line





Function RegExTest(sPattern, sString)

SET regEx = new RegExp

regEx.Pattern = sPattern

regEx.IgnoreCase = TRUE

regEx.Global = TRUE

SET Matches = regEx.Execute(sString)

IF Matches.Count > 0 THEN

RegExTest = true


RegExTest = false




Anonymous said...

Great article!!
I am trying to do similar things to check SQL error log for errors with severity level between 17 and 25. Also, this is for SQL 2K5 database. You mentioned about 'xp_readerrorlog' procedure. Will this proc make implementation easier? If yes, could you share the information if you know how it can be done.


Anonymous said...

I am looking for the exact same thing as anonymous. Is this possible?

shobnaamkoly said...

Its really a nice post, the content of this blog is really awesome and extraordinary.And also please Read link bvba Woodstone which provide information server monitoring software & server monitoring tools