Monday, June 30, 2008

Monitoring Logon Scripts via SCOM

In a large environment you need to know when something goes wrong. This is not necessarily because you want to know about every issue that can occur, but because a single small issue can quickly grow into many small issues, which together add up to one large issue.

Say, for instance, that you have a logon script that maps users' home folders and some shared printers in a Terminal Server/Citrix environment. If a user's printer fails to map correctly, that could be the first symptom of a larger environmental issue that is affecting all users, whether they report it or not. If we monitor this activity in SCOM, we will know what is failing, where it is failing, who it is failing for, and when it is failing, and all of that should help point us to why it is failing.

The solution is fairly simple: edit the logon script to identify and report failures, then create a SCOM management pack (MP) to catalog those failures.

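Our logon script uses VBScript, so a simple subroutine can check for an error code and write the error to the NT Event Log. Then just call this sub after anything that may fail. Note that Err is only populated when the script runs with On Error Resume Next; otherwise the first runtime error stops the script before the check ever runs.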

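' Writes the current error (if any) to the script's own log and to the NT Event Log.
' Assumes oShell (a WScript.Shell object), userName, EVENT_ERROR (= 1) and
' WriteToLog are defined elsewhere in the logon script.
SUB CheckError(strSource)
    IF Err.Number <> 0 THEN
        WriteToLog strSource & " -- " & Err.Description & " " & Err.Number & " (" & Hex(Err.Number) & ")"
        oShell.LogEvent EVENT_ERROR, "Printer.vbs" & vbCrLf & userName & " - " & strSource & " -- " & Err.Description & " " & Err.Number & " (" & Hex(Err.Number) & ")"
        ' reset Err so the next check only reports new failures
        Err.Clear
    END IF
END SUB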
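To make the event text concrete, here is a small Python sketch (not part of the logon script) that builds the same string CheckError passes to oShell.LogEvent. The function name and sample values are hypothetical. The one subtle detail is that VBScript's Hex() prints a negative Err.Number as its 32-bit two's-complement value, which masking with 0xFFFFFFFF reproduces.

```python
def format_event_message(user_name: str, source: str, description: str, number: int) -> str:
    """Build the event description in the same layout CheckError uses (hypothetical helper)."""
    # VBScript's Hex() shows a negative Long as its 32-bit two's-complement value,
    # e.g. Hex(-2147024843) = "80070035"; masking to 32 bits gives the same digits.
    hex_code = format(number & 0xFFFFFFFF, "X")
    return f"Printer.vbs\r\n{user_name} - {source} -- {description} {number} ({hex_code})"


print(format_event_message("jdoe", "Map H:", "The network path was not found.", -2147024843))
```

The masking step matters because COM errors surface in Err.Number as negative values, and the hex form (here 80070035) is usually what you search for.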


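The SCOM MP then simply uses an NT Event Log rule to look for these events and raise alerts on them. WshShell.LogEvent writes to the Application log under the source WSH, so the rule can filter on that source and on the "Printer.vbs" text at the start of the description.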
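Because every event follows the same layout, anything that consumes these events can pull the user, the failing step, and the error code back out of the description, which is where the "who" and "what" come from. The Python sketch below assumes the description arrives exactly as CheckError wrote it; the pattern and function name are hypothetical and not part of SCOM.

```python
import re
from typing import Dict, Optional

# Layout written by CheckError:
#   Printer.vbs<CRLF><user> - <source> -- <description> <number> (<hex>)
EVENT_PATTERN = re.compile(
    r"^Printer\.vbs\r?\n(?P<user>.+?) - (?P<source>.+?) -- (?P<description>.*) "
    r"(?P<number>-?\d+) \((?P<hex>[0-9A-F]+)\)$"
)


def parse_event(description: str) -> Optional[Dict[str, str]]:
    """Split a logon-script event description into its fields, or return None."""
    match = EVENT_PATTERN.match(description)
    return match.groupdict() if match else None


print(parse_event("Printer.vbs\r\njdoe - Map H: -- The network path was not found. -2147024843 (80070035)"))
```

The description group is greedy so that a message containing " -- " or numbers still leaves the trailing error number and hex code in their own groups.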

Friday, June 27, 2008

Changing the PrintProcessor on a Windows 2000 Print Server

One of our customers has a W2k print server with several hundred (nearly 600) printers. Half are used by a program named CreateForm and therefore use a print processor named CfPrint; the rest use a mixture of WinPrint and various HP processors. They recently began having printer problems that we couldn't diagnose, so the idea came up to limit the number of variables in play: drivers, configurations, processors, etc. So it's up to me to make nearly 300 print queues use WinPrint as their processor.

Under W2k3 this would be easy: write a WMI script that queries the printers and sets the processor to WinPrint. W2k, however, exposes these properties as read-only, so it was back to the drawing board.

It turns out to be possible to set the processor using printui.dll, but it takes a little knowledge to make it work. First, it doesn't seem to set the processor when run locally; instead, I had to run it remotely from a W2k3 system. Second, its command line is complex and can be difficult to get right. I came up with the following:

rundll32 printui.dll,PrintUIEntry /Xs /n "\\server\printer" PrintProcessor WinPrint
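The pieces of that command are /Xs (set printer settings), /n followed by the printer's UNC name, and then the setting name and its new value. If you would rather drive it from a script, this Python sketch assembles the same arguments as a list, which keeps printer names containing spaces intact as a single argument. The function name is hypothetical, and actually running the result (for example with subprocess.run(args, check=True)) still requires Windows.

```python
from typing import List


def printui_set_processor_args(server: str, printer: str, processor: str = "WinPrint") -> List[str]:
    """Build the rundll32/printui argument list that sets one queue's print processor."""
    return [
        "rundll32",
        "printui.dll,PrintUIEntry",
        "/Xs",                        # set printer settings
        "/n", f"\\\\{server}\\{printer}",  # UNC name of the queue, e.g. \\server\printer
        "PrintProcessor", processor,  # setting name followed by its new value
    ]


print(printui_set_processor_args("server", "Accounting HP"))
```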

Now that we have a command to set the processor, we need a list of printers to change. As I stated earlier, half of these are used by CreateForm and therefore can't be changed, so we need a list of all printers that are not using the CfPrint processor. The command-line utility WMIC works well for this; the following command lists every printer along with its processor.

wmic /node:server path win32_printer get DeviceID, PrintProcessor

Copy the output into Excel or some other tool, filter out anything you don't want to change, and save the remaining printer names to a text file named printers.txt, one per line. Then simply iterate through the file and run the printui command for each printer:
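If you would rather skip Excel, the filtering step can be scripted. This Python sketch assumes you have the default WMIC table output as text (when redirected to a file, WMIC may write it as Unicode, so open that file accordingly). Column boundaries are taken from where each name starts in the header row, because printer names can contain spaces. The function names and sample data are hypothetical.

```python
import re
from typing import Dict, List


def parse_wmic_table(output: str) -> List[Dict[str, str]]:
    """Parse WMIC's fixed-width table output into one dict per row."""
    lines = [line.rstrip() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0]
    # Each column starts at the offset where its name appears in the header.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for idx, (name, start) in enumerate(columns):
            end = columns[idx + 1][1] if idx + 1 < len(columns) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def printers_to_change(rows: List[Dict[str, str]], keep: str = "CfPrint") -> List[str]:
    """Return the DeviceIDs of printers whose processor is not the one we must keep."""
    return [r["DeviceID"] for r in rows if r["PrintProcessor"].lower() != keep.lower()]
```

Write the returned names to printers.txt, one per line, and the loop that follows can consume it unchanged.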

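for /F "delims=" %i in (printers.txt) do rundll32 printui.dll,PrintUIEntry /Xs /n "\\server\%i" PrintProcessor WinPrint

The "delims=" option keeps printer names that contain spaces intact (by default for /F splits each line at spaces and tabs). If you put this in a batch file, double the percent signs (%%i).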

That's it! You can re-run the WMIC command to make sure you got everything. Of course, if you're changing as many printers as I am, it may take a few minutes for everything to settle.

Wednesday, June 18, 2008

Using SCOM to monitor SQL Error Logs

Our SQL DBA reported that he occasionally sees errors in the SQL Error Logs similar to:

SQL Server has encountered 45 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [T:\MSSQL\Data\tempdev6_Data.NDF] in database [tempdb] (2). The OS file handle is 0x000005B4. The offset of the latest long IO is: 0x0000000c7b2000

Monitoring for this alert sounded simple at first, but as I dug into it I realized it could be much more difficult than I expected. First, this error is reported in the SQL Error Logs and only in the SQL Error Logs, so we can't simply use the NT Event Log to alert us. Second, depending on how SQL was installed, these error logs could be anywhere on the system. And lastly, this needs to be monitored on both SQL 2000 and SQL 2005 systems, so xp_ReadErrorLog won't work for us.

So the breakdown seems fairly simple: find out where the logs are, point the text log monitor at them, and search for the string.

1. Find out where the SQL Error logs are stored. This first step turns out to be fairly simple: when the SQL DB Engine is discovered, one of the attributes it discovers is "$MPElement[Name='SQL!Microsoft.SQLServer.DBEngine']/ErrorLogLocation$". This should tell us where the logs are stored on each server, regardless of how the server was built or configured.

2. Use the text log monitor to search the SQL Error logs. Using the "Matches Regex" function, I should be able to search for the string "of IO requests taking longer than 15 seconds to complete on file" and raise an alert when it appears.
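The search string contains no regex metacharacters, so it works as a pattern unchanged. As a hypothetical extension, a pattern with \d+ also catches other thresholds and can capture the details worth putting in an alert. This Python sketch checks such a pattern against the sample message above; the pattern and function name are illustrative, not something SCOM provides.

```python
import re
from typing import Dict, Optional

# \d+ accepts any count/threshold; the named groups pick out alert details.
IO_STALL = re.compile(
    r"(?P<count>\d+) occurrence\(s\) of IO requests taking longer than "
    r"(?P<seconds>\d+) seconds to complete on file \[(?P<file>[^\]]+)\] "
    r"in database \[(?P<database>[^\]]+)\]",
    re.IGNORECASE,
)


def parse_io_stall(line: str) -> Optional[Dict[str, str]]:
    """Return the captured fields if the line reports slow IO, else None."""
    match = IO_STALL.search(line)
    return match.groupdict() if match else None
```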

It sounds simple; reality, however, is rarely simple. The ErrorLogLocation attribute is stored as a single string such as "d:\MSSQL\log\ERRORLOG", but the text log monitor requires the path and the file name as separate components, and there is no method within SCOM (that I am aware of) to split a single string into those components.
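Outside of SCOM the split itself is trivial, which is part of what makes a script attractive. In VBScript, the FileSystemObject methods GetParentFolderName and GetFileName do it; the equivalent in Python is ntpath.split, which applies Windows path rules on any OS. The wrapper name is hypothetical.

```python
import ntpath
from typing import Tuple


def split_errorlog_location(location: str) -> Tuple[str, str]:
    """Split an ErrorLogLocation value into (directory, file name) using Windows path rules."""
    directory, file_name = ntpath.split(location)
    return directory, file_name


print(split_errorlog_location(r"d:\MSSQL\log\ERRORLOG"))
```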

Other options:

  • Hard code the error log path and name
    • This differs on each box
    • It could be done with an override, but that would be a nightmare
  • Create an extended SQL DB Engine class that includes this setting as 2 separate attributes
    • That's a lot more work than I want to tackle, plus it's a maintenance nightmare
  • Create a new text log monitor that takes only 1 attribute for the path and name
    • It turns out this is implemented in SCOM as a binary (i.e. in a DLL), and I would have no idea where to start with that
  • Create a script that searches the log file for a RegEx
    • This is possible, but potentially filled with problems


Using a script ultimately seems to be the best option: with it I can pass whatever options are needed (log path, how far back to look, the string to search for, and anything else) and it can be extended in the future. In this case I decided to write an NT Event Log entry for the error and then use a separate rule to pick up this event and alert on it. I also added a DBVersion parameter because SQL 2000 and SQL 2005 store their error logs in different formats (ANSI text for SQL 2000, Unicode for SQL 2005).
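The DBVersion check boils down to choosing a text encoding before reading the file. Here is a Python sketch of the same decision, assuming the Version property looks like "9.00.3042.00" and treating major version 9 and later as Unicode (UTF-16). The cp1252 choice for SQL 2000 is an assumption that matches a Western European ANSI code page; the function names are hypothetical.

```python
from typing import List


def errorlog_encoding(db_version: str) -> str:
    """Pick the error log encoding from a version string such as '9.00.3042.00'."""
    major = int(db_version.split(".")[0])
    # SQL 2005 (9.x) writes Unicode; SQL 2000 (8.x) writes ANSI (code page assumed).
    return "utf-16" if major >= 9 else "cp1252"


def read_errorlog(path: str, db_version: str) -> List[str]:
    """Read the whole error log and return it as a list of lines."""
    with open(path, encoding=errorlog_encoding(db_version), errors="replace") as f:
        return f.read().splitlines()
```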


' SQLErrorLog.vbs
'
' param 0 - path to errorlog      $Target/Property[Type="SQLServer!Microsoft.SQLServer.DBEngine"]/ErrorLogLocation$
' param 1 - time in minutes to include      30
' param 2 - string to match      "of IO requests taking longer than"
' param 3 - db version      $Target/Property[Type="SQLServer!Microsoft.SQLServer.DBEngine"]/Version$

Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
CONST EVENT_SUCCESS = 0, EVENT_ERROR = 1, EVENT_WARNING = 2, EVENT_INFORMATION = 4

SET oArgs = WScript.Arguments
SET oShell = CreateObject("WScript.Shell")
SET fso = CreateObject("Scripting.FileSystemObject")

errorLog = oArgs(0)
iTime = CLng(oArgs(1))
sMatch = oArgs(2)
dbVer = oArgs(3)

'oShell.LogEvent EVENT_SUCCESS, "Beginning check, errorLog: " & errorLog & ", iTime: " & iTime & ", sMatch: " & sMatch & ", dbVer: " & dbVer

' read text file: SQL 2005 (version 9.x) writes its error log as Unicode, SQL 2000 (8.x) as ANSI
IF CInt(Split(dbVer, ".")(0)) >= 9 THEN
    SET f = fso.OpenTextFile(errorLog, ForReading, False, TristateTrue)
ELSE
    SET f = fso.OpenTextFile(errorLog, ForReading, False, TristateFalse)
END IF
arLines = Split(f.ReadAll, vbCrLf)
f.Close

' walk backwards from the newest line; stop at the first line older than iTime minutes
cutoff = DateAdd("n", -iTime, Now)
FOR i = UBound(arLines) TO 0 STEP -1
    line = arLines(i)
    ' lines without a leading timestamp (continuations, trailing blank) are skipped
    ' instead of letting CDate raise a type mismatch
    IF IsDate(Left(line, 19)) THEN
        IF CDate(Left(line, 19)) < cutoff THEN
            EXIT FOR
        ELSEIF RegExTest(sMatch, line) THEN
            ' generate alert
            oShell.LogEvent EVENT_ERROR, "SQL I/O Error" & vbCrLf & line
            WScript.Quit
        END IF
    END IF
NEXT

' case-insensitive regular expression test
Function RegExTest(sPattern, sString)
    SET regEx = New RegExp
    regEx.Pattern = sPattern
    regEx.IgnoreCase = TRUE
    RegExTest = regEx.Test(sString)
End Function
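To try the scanning logic off-box, here is the same backwards, time-bounded search as a Python sketch. Like the script, it assumes each log entry begins with a "yyyy-mm-dd hh:mm:ss" timestamp and skips lines that don't; the function name and sample data are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional


def find_recent_match(lines: List[str], pattern: str, minutes: int, now: datetime) -> Optional[str]:
    """Return the newest line within the last `minutes` that matches `pattern`, else None."""
    cutoff = now - timedelta(minutes=minutes)
    regex = re.compile(pattern, re.IGNORECASE)  # mirrors regEx.IgnoreCase = TRUE
    for line in reversed(lines):  # newest entries are at the end of the log
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # no leading timestamp, e.g. a continuation line
        if stamp < cutoff:
            return None  # everything earlier is outside the window
        if regex.search(line):
            return line
    return None
```

Stopping at the first out-of-window timestamp keeps each run cheap even on a large error log, since only the tail is examined.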