Proactive Patching alerts

As an SME or team lead, have you ever needed ‘Proactive Patching alerts’, i.e. which servers need patches applied, aren’t patching, or were missed? This pack builds on three pillars (Health, Security, Compliance) to help Cyber teams and others. It began as an alternative to a complex pack with an SSRS report that a customer used to identify systems. The report was long and had many blank lines and pages, which called for a rewrite. This pack started with the pending restart monitor, taken directly from the AquilaWeb reboot pack logic. That logic helps SysAdmin/Domain Admin/NOC/NOSC/SOC teams know when servers need reboots. The need is driven further by the multiple reboots (sometimes) required by Windows monthly updates and application updates. Used across multiple customers, this is the first pack that enables a proactive stance on the ‘Am I compliant?’ question.
Quick Download: https://github.com/theKevinJustin/ProactivePatchUptimeReboot/
Testing the Proactive Patch alerts
David Allen built the ‘Aquilaweb.Support.PendingReboot.Monitor.PendingReboot’ PowerShell monitor to tell system owners when the pending restart flag was present. Some builds, though, make system changes that repeatedly flip the registry key, causing many alerts. Also, downloading the Aquila pack is tricky now that TechNet has been retired.
David’s great idea was the foundation to build upon. It raised the question: what if the server was not patched, or not rebooted, within a period of time? With my Cyber hat on, this became the next piece of content to create. That raised another question: do these scenarios need to be reflected in health (monitor), or not (rule)? We’re all about choices and free will, so the pack is built with both options (rules disabled out of the box).

The pack is set up to alert on CBS application updates, SCCM/MECM/Config Mgr Endpoint Management updates, and Windows Updates. In my experience, these give the most accurate alerts on secure builds where the Application/System Owner needs to take action.
The Last Patch and Last Reboot monitors/rules in the download are set to 45 days. Tune this value down if patching occurs at the 30 day mark, or increase it if you need more time before alerts.
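The threshold check itself is simple date arithmetic. As a minimal sketch (not the pack's actual logic, which lives in its PowerShell monitors and rules), the hypothetical helper patch_status below takes the last patch time and the current time as epoch seconds, plus a threshold in days, and reports whether the gap exceeds the threshold:

```shell
# Hypothetical helper (not part of the pack): print "overdue" when more than
# threshold_days have passed between last_epoch and now_epoch, else "ok".
patch_status() {
  last_epoch=$1
  now_epoch=$2
  threshold_days=$3
  age_days=$(( (now_epoch - last_epoch) / 86400 ))
  if [ "$age_days" -gt "$threshold_days" ]; then
    echo "overdue"
  else
    echo "ok"
  fi
}

# Example: a server last patched 50 days ago, checked against the 45 day default.
now=$(date +%s)
last=$(( now - 50 * 86400 ))
echo "Patch status: $(patch_status "$last" "$now" 45)"
```

Dropping the last argument to 30 mirrors tuning the value down for a 30 day patch cycle.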

Otherwise, download and import into your environment. Depending on your subscription/notification settings, the Proactive set of alerts are built upon the Windows Operating System class. If subscriptions include the class, the notifications are automatic to System/Application owners.
Useful links
David Allen blog
Addendum, what does it mean blog
Deciding ‘Event Collection vs. Alert’ rule

Ever run through an event log scenario, deciding whether ‘event collection vs. alert rule’ is the way to pull the needle from the haystack? There are a few ways to do this with monitoring tools. If you’re cloud centric, use a KQL query (assuming you’re collecting the event logs). If you’re using Operations Manager (SCOM), there are a few ways to consume the events. SCOM ACS is basically a DB for collecting Security events, and is a feature most customers leave unused. Kevin Holman has many blog posts on ACS and on testing the filter, as well as a management pack (MP) fragment (blog here, GitHub fragment library here).
Let’s walk through criteria deciding ‘event collection vs. alert rule’:
- Do the event(s) happen often? If so, how often?
- Can you filter the event description to limit the number of events gathered?
- Do you need match count or samples before action required? (i.e. count x events in y time)
- Is there a regulatory or compliance requirement to collect every event?
- Is this something you want to visualize with PowerBI?
- For better visualizations, would the EventID help view/sort data in a tabular output (think PowerShell property), as well as TimeRaised/TimeGenerated and Event Description?
Example – DC Security events
When there is a regulatory requirement to collect events, we need to decide ‘event collection vs. alert rule’, and IF we can filter for specific pieces of the event. Holman has examples of alert parameters and dynamic data, which are very useful to get the needles out of the haystacks. Depending on your goals, use event parameters, or leverage CustomFields in the alert to build the required fields.
Depending on the requirements, event collection is useful to collect related EventIDs with regular expressions. Use event rules WHEN action is required. Regular expressions help filter what we collect (via event collection or alert rule). By extension, utilize CustomFields in the alerts to help the presentation, or the SQL query behind a PowerBI report.
Let’s talk about regular expressions examples for rules (or monitors)
MatchesRegularExpression
<Expression>
<RegExExpression>
<ValueExpression>
<XPathQuery Type="String">EventDescription</XPathQuery>
</ValueExpression>
<Operator>MatchesRegularExpression</Operator>
<Pattern>^(Security ID:.*admin*)|^(Security ID:.*[des]a*)$</Pattern>
</RegExExpression>
</Expression>
<Expression>
<RegExExpression>
<ValueExpression>
<XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
</ValueExpression>
<Operator>MatchesMOM2005BooleanRegularExpression</Operator>
<Pattern>^(4625|4740)$</Pattern>
</RegExExpression>
</Expression>
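Before pasting a pattern into a rule, it can help to try it against a few sample values. The sketch below runs the EventID pattern from the expression above through grep -E. This is an approximation: grep -E uses POSIX extended regular expressions rather than the regex engine SCOM uses, though a plain alternation like this one behaves the same in both. The helper name classify_event_id is made up for illustration.

```shell
# Echo "collect" when the event ID matches the pattern, otherwise "skip".
classify_event_id() {
  if printf '%s\n' "$1" | grep -Eq '^(4625|4740)$'; then
    echo "collect"
  else
    echo "skip"
  fi
}

# 4624 and 46250 are included to show that the anchors (^ and $) reject
# other IDs and longer numbers that merely contain 4625.
for id in 4625 4740 4624 46250; do
  echo "$id: $(classify_event_id "$id")"
done
```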
Contains example
<Expression>
<RegExExpression>
<ValueExpression>
<XPathQuery Type="String">EventDescription</XPathQuery>
</ValueExpression>
<Operator>ContainsSubstring</Operator>
<Pattern>Proactive DailyTasks ADDS Monitors close automation for</Pattern>
</RegExExpression>
</Expression>
<Expression>
<RegExExpression>
<ValueExpression>
<XPathQuery Type="String">Params/Param[2]</XPathQuery>
</ValueExpression>
<Operator>ContainsSubstring</Operator>
<Pattern>dnsserver</Pattern>
</RegExExpression>
</Expression>
DoesNotContain example
<Expression>
<RegExExpression>
<ValueExpression>
<XPathQuery Type="String">EventDescription</XPathQuery>
</ValueExpression>
<Operator>DoesNotContainSubstring</Operator>
<Pattern>None</Pattern>
</RegExExpression>
</Expression>
Holman MP Fragment example of specific EventID:
<Rule ID="Rule.StateChangeAlerts" Enabled="true" Target="SCOMMagementServer.Class" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
<Category>EventCollection</Category>
<DataSources>
<DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventCollector">
<ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<LogName>TestAPP</LogName>
<AllowProxying>false</AllowProxying>
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value Type="UnsignedInteger">600</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="String">PublisherName</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value Type="String">APP Test Log Monitoring</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</DataSource>
</DataSources>
<WriteActions>
<WriteAction ID="CollectToDB" TypeID="SC!Microsoft.SystemCenter.CollectEvent" />
<WriteAction ID="CollectToDW" TypeID="SCDW!Microsoft.SystemCenter.DataWarehouse.PublishEventData" />
</WriteActions>
</Rule>
Lastly, let’s talk about the use of CustomFields to add additional data to the alert, but NOT in the event description (Holman’s blog here).
For a tabular view of alert data (from PowerShell, or a SQL query of the Alerts view), we might need to display fields such as EventDisplayNumber, TimeRaised, and Message (alternates are Parameters or UnformattedDescription). Additionally, check the alert output details from the SCOM MS in PowerShell via Get-SCOMAlert -Name "MonitorDisplayNameHere" | fl | more
Leverage Custom Fields to add
EventID $Data/EventDisplayNumber$
Event Category $Data/EventCategory$
Happy Authoring!
Additional links
https://learn.microsoft.com/en-us/answers/questions/69667/scom-event-collection-rule

SCOM Monitor reset logic

Ever want to reset SCOM monitors, and wish it was just a simple Reset Button for unhealthy monitors?
I’ve been using Scott Murr’s TechNet gallery loop to maintain my alerts, and ensure monitors are healthy for all my management packs.
Below is the blurb I put in my DS/WA scripts to reset SCOM monitors. It builds on Andrew’s methods, which I hadn’t known about before (just think much uglier code!).
My PowerShell variables to reset SCOM monitors include my Addendum pack and the core pack; a DNS example is provided below (thank you Andrew!).
## Grab the MP, get the Monitors and Rules from the MP, then grab all alerts found inside the Monitors/Rules
$SCOMCoreMP = Get-SCOMManagementPack -DisplayName "Microsoft Windows Server 2016 and 1709+ DNS Monitoring"
$SCOMAddendumMP = Get-SCOMManagementPack -DisplayName "Microsoft Windows Server 2016 DNS Monitoring Addendum"
$SCOMCoreRules = $SCOMCoreMP.GetRules()
$SCOMCoreMonitors = $SCOMCoreMP.GetMonitors()
$SCOMAddendumRules = $SCOMAddendumMP.GetRules()
$SCOMAddendumMonitors = $SCOMAddendumMP.GetMonitors()
$SCOMCoreReportAlerts = Get-SCOMAlert | ? { ($_.Name -in $SCOMCoreRules.DisplayName) -or ($_.Name -in $SCOMCoreMonitors.DisplayName) }
$SCOMCoreReportAlerts.Count
$SCOMAddendumReportAlerts = Get-SCOMAlert | ? { ($_.Name -in $SCOMAddendumRules.DisplayName) -or ($_.Name -in $SCOMAddendumMonitors.DisplayName) }
$SCOMAddendumReportAlerts.Count
$SCOMOpenReportAlerts = $SCOMCoreReportAlerts | ? { ( $_.ResolutionState -ne "255" ) }
$SCOMOpenReportAlerts.Count
$SCOMOpenAddendumReportAlerts = $SCOMAddendumReportAlerts | ? { ( $_.ResolutionState -ne "255" ) }
$SCOMOpenAddendumReportAlerts.Count
$SCOMCoreRuleAlerts = Get-SCOMAlert | ? { ( $_.Name -in $SCOMCoreRules.DisplayName) -AND ( $_.ResolutionState -ne "255" ) }
$SCOMCoreRuleAlerts.Count
$SCOMAddendumRuleAlerts = Get-SCOMAlert | ? { ( $_.Name -in $SCOMAddendumRules.DisplayName) -AND ( $_.ResolutionState -ne "255" ) }
$SCOMAddendumRuleAlerts.Count
$SCOMCoreMonitorAlerts = Get-SCOMAlert | ? { ($_.Name -in $SCOMCoreMonitors.DisplayName ) -AND ( $_.ResolutionState -ne "255" ) }
$SCOMCoreMonitorAlerts.Count
$SCOMAddendumMonitorAlerts = Get-SCOMAlert | ? { ($_.Name -in $SCOMAddendumMonitors.DisplayName ) -AND ( $_.ResolutionState -ne "255" ) }
$SCOMAddendumMonitorAlerts.Count
$AutoClosed = $SCOMCoreMonitorAlerts.Count + $SCOMCoreRuleAlerts.Count + $SCOMAddendumMonitorAlerts.Count + $SCOMAddendumRuleAlerts.Count
$Test = $SCOMCoreReportAlerts.Count + $SCOMAddendumReportAlerts.Count
$OpenAlerts = $SCOMOpenReportAlerts.Count + $SCOMOpenAddendumReportAlerts.Count
$ResetMonitors = $SCOMCoreMonitors + $SCOMAddendumMonitors
$MonitorAlerts = $SCOMCoreMonitorAlerts.Count + $SCOMAddendumMonitorAlerts.Count
#
# If Cleanup needed, array of report monitors
# Reset Monitors Script
# Put ps1 in mgmtpacks folder
# https://sc.scomurr.com/scom-2012-monitor-reset-cleaning-up-the-environment/
# Download
# https://gallery.technet.microsoft.com/SCOM-2012-Batch-reset-63a17534
#Alternate
#https://gallery.technet.microsoft.com/scriptcenter/Auto-reset-script-for-d8b775ca
if ( $MonitorAlerts -gt 0 )
{
foreach ( $MonitorDisplayName in $ResetMonitors.DisplayName )
{
$Monitors = @( Get-SCOMMonitor -displayname $MonitorDisplayName )
# Set up monitor objects to reset
foreach ($Monitor in $Monitors)
{
$MonitorClass = Get-SCOMClass -Id $Monitor.Target.Id
# Wrap in @() so .Count works even when only one instance is returned
$ActiveMonitors = @( Get-SCOMClassInstance -Class $MonitorClass | ? { ($_.HealthState -ne 'Success') -AND ($_.HealthState -ne 'Uninitialized') -AND ($_.IsAvailable -eq $true) } )
Write-Host "Found $($ActiveMonitors.Count) active monitors."
if ( $ActiveMonitors.Count -gt 0 )
{
foreach ($ActiveMonitor in $ActiveMonitors)
{
Write-Host " Resetting Health State on '$($ActiveMonitor.FullName)'"
$ActiveMonitor.ResetMonitoringState($Monitor.ID)
}
}
}
}
}
Build FluentD conf file
Ready to build out a FluentD conf file?
Let’s build a simple FluentD configuration file. The docs site has another example. Paste the configuration below, and save it as <yourlogfile>.conf
Create custom log file to test
cd /etc/opt/microsoft/omsagent/scom/conf/omsagent.d/
# vi <yourlogfile>.conf
vi mylog.conf
# Example conf file
<source>
# Specifies input plugin. Tail is a fluentd input plugin – http://docs.fluentd.org/v0.12/articles/in_tail
type tail
# Specify the log file path. Supports wild cards.
path /var/log/mylog
# Recommended so that Fluentd will record the position it last read into this file.
pos_file /home/omsagent/fluent-logging/mylog.pos
# Used to correlate the directives.
tag scom.log.mylog
format /(?<message>.*)/
</source>
<filter scom.log.mylog>
type filter_scom_simple_match
regexp1 message 911
event_id1 911
</filter>
<match scom.log.mylog>
#Disable mutual Auth
enable_server_auth false
# Output plugin to use
type out_scom
log_level trace
num_threads 5
# Size of the buffer chunk. If the top chunk exceeds this limit or the time limit flush_interval, a new empty chunk is pushed to the top of the queue and bottom chunk is written out.
buffer_chunk_limit 5m
flush_interval 15s
# Specifies the buffer plugin to use.
buffer_type file
# Specifies the file path for buffer. Fluentd must have write access to this directory.
buffer_path /var/opt/microsoft/omsagent/scom/state/out_scom_common*.buffer
# If queue length exceeds the specified limit, events are rejected.
buffer_queue_limit 10
# Control the buffer behavior when the queue becomes full: exception, block, drop_oldest_chunk
buffer_queue_full_action drop_oldest_chunk
# Number of times Fluentd will attempt to write the chunk if it fails.
retry_limit 10
# If the bottom chunk fails to be written out, it will remain in the queue and Fluentd will retry after waiting retry_wait seconds
retry_wait 30s
# The retry wait time doubles each time until max_retry_wait.
max_retry_wait 9m
</match>
Save (:wq!)
# Restart Agent
/opt/microsoft/omsagent/bin/service_control restart
# Check for errors – see blog
grep -i error /var/opt/microsoft/omsagent/scom/log/omsagent.log
# Test strings into your logfile
# Options
echo test >> /var/log/mylog
echo 911 error >> /var/log/mylog
# mimic syslog or messages syntax
echo `date +"%b %e %H:%M:%S"` MYLOG 911 test string. Call 911 >> /var/log/mylog
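To preview which of these test lines the filter's `regexp1 message 911` setting should pick up, you can grep for the same string. This is only a rough stand-in for the FluentD filter, and the sketch writes to a throwaway temp file rather than /var/log/mylog so it is safe to run anywhere:

```shell
# Write the three test strings to a temporary file instead of the real log.
log=$(mktemp)
echo test >> "$log"
echo 911 error >> "$log"
echo "$(date +"%b %e %H:%M:%S") MYLOG 911 test string. Call 911" >> "$log"

# Count the lines that contain 911 (grep -c counts lines, not occurrences).
matches=$(grep -c '911' "$log")
total=$(wc -l < "$log" | tr -d ' ')
echo "$matches of $total lines contain 911"
rm -f "$log"
```

The plain "test" line is the negative case: it should not raise an alert.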
Please stay tuned for more management pack options to alert on the strings. Refer to the part1/2 blogs for more details on unit testing for alerts.
OMSAgent FluentD debunked – Configure Linux FluentD – part2
Now to begin – OMSAgent FluentD debunked
Configure Linux FluentD – part2 (see part one (1) here)
First, my thanks to Mike Johnston@Microsoft (CSS SEE SME) for helping validate my steps and testing to configure Linux FluentD on an Ubuntu server! Are you ready to bust a myth? OMSAgent FluentD debunked.
If you’re starting fresh, or just joining, start with Part 1, which configures the packs and assumes the SCOM agent is installed and working. Now it’s time to use the feature, so we need to get the agent configured and tested.
Part one (1) quick summary
- Verify pre-reqs – SCOM Linux Management packs for Linux/Universal Linux (2019 @ 10.19.1082.0), UNIX/Linux Log File monitoring (2019 @ 10.19.1008.0)
- Linux server has SCOM Agent installed, configured, and updated (sudoers configured) – GUI blog here
- Use docs.microsoft.com article
Load Sample Log monitoring pack
This piece is missing in the doc, but the content development team has this covered in a subsequent docs article. We need to load a sample log monitoring pack to the SCOM management group, so we can test functionality.
Grab the file here, otherwise you can copy/paste from the docs article pretty easily.
Verify OMED service running on Management Server
It’s now time to enable the OMED service on the management server, and we can start with the docs subsection
Navigation steps from SCOM console (GUI)
- From the Operations console, go to Monitoring>Operations Manager>Management Server>Management Servers State.
- Select the management server in the Management Servers state.
- From Tasks, select Health Service Tasks>Enable System Center OMED Server
Steps to set/start the service via PowerShell (as admin)
# Verify service startup type is automatic
Get-Service OMED | select -property Name,StartType
# Example output
Name StartType
---- ---------
OMED Automatic
# Set startup type
Set-Service -Name OMED -StartupType Automatic
# Start OMED service on SCOM management server (MS)
start-service OMED
Now we’re ready to test the UNIX agent!
Configure SCOM/OMSagent on Linux server
And now it’s time to switch to the agent side. I’m assuming that you’ve already configured the SCOM agent on the Linux server. So it’s time to verify that the SCOM agent and OMSAgent are configured and working. Let’s go back to the docs subsection for our sanity check, because we need to create folders, set ownership, etc.
Create files and set permissions
Verify SCOM certificate
Configuring FluentD requires that the SCOM management server (MS) has signed the certificate on the UNIX server. The docs article tells you to generate a new certificate for FluentD, which then needs to be signed by the management server.
Overview
Generate the certs on the agent > copy to MS > sign > copy back to agent
Step by step instructions
1. Generate certs
/opt/microsoft/scx/bin/tools/scxsslconfig -c -g /etc/opt/microsoft/omsagent/scom/certs/
2. Rename certificates
cp -p omi-host-server.domain.pem scom-cert.pem
cp -p omikey.pem scom-key.pem
3. Copy certs to MS (sftp/ssh via WinSCP, or your app of choice)
4. Sign certs on MS via scxcertconfig -sign
Open PowerShell (as admin)
Go to your SCOM management server directory (hopefully d:)
cd ‘D:\Program Files\Microsoft System Center\Operations Manager\Server’
scxcertconfig -sign scom-cert.pem
scxcertconfig -sign scom-key.pem
5. Copy certs back to agent from MS (sftp/ssh via WinSCP, or your app of choice)
6. Verify the SCOM certificate shows your Management Server (MS) in the DC= line in the certificate
openssl x509 -in scom-cert.pem -noout -text

7. Restart omsagent
As the ALLINONE server is one of my 2019 SCOM labs, I can verify that my cert is now signed by the management server (MS). Time to load the certificate, and then restart the agent to see if we have any errors
# Restart Agent
/opt/microsoft/omsagent/bin/service_control restart
Verify omsagent.log errors
Verify any errors from the omsagent.log
Depending on where you are with your UNIX/Linux commands, this may help provide some context or use case examples.
My example –
The first error after restart was ‘permission denied’. FluentD runs under the omsagent ID, and needs access to whatever log you monitor, at least read (4). For the syslog example, I made omsagent the owner, and omiusers the group. The smarter choice, with the security hat on, is to leave the owner as root and make the file readable, or add omsagent to the root group.
Search /var/opt/microsoft/omsagent/scom/log/omsagent.log for errors. The commands build on one another, from simpler to more complex. Don’t worry if UNIX/Linux is new; I’m all about examples, so I hope that helps bridge the gap!
# Tail omsagent.log for progress
# Option 1 Continual output updates from file
tail -f /var/opt/microsoft/omsagent/scom/log/omsagent.log
# Option 2 – get last 10 lines
tail /var/opt/microsoft/omsagent/scom/log/omsagent.log
# Option 3 – get last 100 lines
tail -100 /var/opt/microsoft/omsagent/scom/log/omsagent.log
# Option 4 – Get a little fancier – search for a string
grep string /var/opt/microsoft/omsagent/scom/log/omsagent.log
# Option 5 – Specific example = error, case insensitive (-i)
grep -i error /var/opt/microsoft/omsagent/scom/log/omsagent.log
# Option 6 – grep -E strings and -v to exclude what you don’t want to see
grep -i error /var/opt/microsoft/omsagent/scom/log/omsagent.log | grep -Ev "Permission denied|stacktrace"
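The options above can be folded into one small helper that counts the errors you still care about. A minimal sketch: the function name count_real_errors is made up, and the sample lines are placeholders written to a temp file, not real omsagent.log output.

```shell
# Count lines containing "error" (case-insensitive), ignoring known noise.
count_real_errors() {
  grep -i 'error' "$1" | grep -Ev 'Permission denied|stacktrace' | wc -l | tr -d ' '
}

# Placeholder log lines for the demo only.
sample=$(mktemp)
printf '%s\n' \
  'sample error: Permission denied' \
  'sample info: all good' \
  'sample ERROR: something else' > "$sample"

real_errors=$(count_real_errors "$sample")
echo "Errors left after filtering: $real_errors"
rm -f "$sample"
```

On a real agent, pass /var/opt/microsoft/omsagent/scom/log/omsagent.log as the argument.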
Verify FluentD config files
Verify FluentD conf files and omsagent.conf has INCLUDE line
The INCLUDE line points to a directory, which allows a ‘Gold depot’ to control what log files are monitored on destination Linux servers. The goal is a standard repository (gold depot) from which you simply copy the conf file you want for a logfile/app/daemon, restart the agent, and you’re off to the races monitoring that log file.
Verify omsagent.conf includes directory
grep -i include /etc/opt/microsoft/omsagent/scom/conf/omsagent.conf
# If there’s output, make sure that omsagent.d path exists
# Verify permissions show omsagent:omiusers
ls -al /etc/opt/microsoft/omsagent/scom/conf | grep omsagent
10. Back to step 8’s problem: fixing the FluentD conf files so we can test! Step 9 verified that FluentD is configured via omsagent.conf, and also via the specific configuration files (.conf) in the omsagent.d directory.
Next, we need to restart the agent to verify the configuration and see whether any errors show up on the FluentD side.
My error was that the ‘out_scom’ plugin was already in use by some other test conf files.
grep -i error /var/opt/microsoft/omsagent/scom/log/omsagent.log | grep "Permission denied" | tail
Example of omsagent.log where we have traced an event for our mylog
Mike explained that my error was due to having multiple FluentD conf files using the same buffer path for ‘out_scom’. I searched the conf files to see who had ‘out_scom’ and removed one of my old test files from months back when I was testing the feature.
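A quick way to spot that kind of collision is to list every buffer_path that shows up more than once across the conf files. This is a sketch under assumptions: the helper name find_duplicate_buffer_paths is made up, it expects buffer_path to start a line (as in the example conf file), and the demo uses a throwaway directory with made-up files.

```shell
# Print each buffer_path value that appears more than once under the given directory.
find_duplicate_buffer_paths() {
  grep -hE '^[[:space:]]*buffer_path[[:space:]]' "$1"/*.conf 2>/dev/null \
    | awk '{print $2}' | sort | uniq -d
}

# Demo: two made-up conf files share a buffer path, a third does not.
demo_dir=$(mktemp -d)
printf 'buffer_path /tmp/demo/out_scom_common*.buffer\n' > "$demo_dir/old_test.conf"
printf 'buffer_path /tmp/demo/out_scom_common*.buffer\n' > "$demo_dir/mylog.conf"
printf 'buffer_path /tmp/demo/other*.buffer\n' > "$demo_dir/third.conf"

duplicates=$(find_duplicate_buffer_paths "$demo_dir")
echo "Shared buffer paths: $duplicates"
rm -rf "$demo_dir"
```

On the agent, point it at /etc/opt/microsoft/omsagent/scom/conf/omsagent.d and then grep for the reported path to see which files use it.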
# Example of errors in the omsagent.log
Don’t forget to restart the omsagent for reading in the new file changes
# Restart Agent
/opt/microsoft/omsagent/bin/service_control restart
I’ll cover building a fluentd conf file in another blog post for brevity.
Time to test for alerts!
Time to test our FluentD conf file and append entries into the log file!
Starting simple again
# Options
echo test >> /var/log/mylog
echo 911 error >> /var/log/mylog
# Echo entries into test logfile to mimic syslog or messages
echo `date +"%b %e %H:%M:%S"` MYLOG 911 test string. Call 911 >> /var/log/mylog
# Verify
tail /var/log/mylog
Switch over to SCOM management server, and look for alerts
Navigate to the Monitoring Tab > Active alerts
References for more information
In case you need a refresher on all the date options, I found the CyberCiti FAQ helpful.
The goal is to make the echo statement a closer match to real log entries, so test/UAT examples exercise the string matches properly.
echo `date +"%b %e %H:%M:%S"` MYLOG 911 test string. Call 911
And what does it look like?
Configure Linux FluentD

What are you Fluent in?
Don’t worry, if you’re on the edge, this may be a Scotty moment of “I’m giving her all she’s got” with your current monitoring environment.
Maybe you’re thinking ‘convince me’, so what does FluentD provide monitoring wise?
- System Center Operations Manager (SCOM) 2016+ has enhanced log file monitoring capabilities for Linux servers.
- Wild card characters in log file name and path.
- New match patterns for customizable log search
- Use community published plugins versus having to build from scratch
I wanted to take a moment to validate the steps provided, and not just because I’ve spent a large part of my career supporting cross-platform environments for large enterprise companies. So, let’s get started!
Review the FluentD setup procedure
Let’s speed up the ‘do’ part. Review the procedure at docs.microsoft.com
FluentD basic operation here
Configuration Overview here
FluentD pre-reqs
Server types are covered = Linux (RedHat/Ubuntu)
Management Packs required for SCOM
- Load latest Linux Operating system management packs (2019).
- Find the pack download here
- Load the relevant Linux or Universal Linux packs
- Verify Microsoft.Linux.Log.Monitoring pack is loaded
Verify ID/password and sudoers capability for root
Use docs article if you need additional assistance to install agent on the Linux server. Alternatively, use my blog posts for PowerShell or GUI install
Update agent to latest release
Grab the latest OMS Agent release from github
- From server command line:
wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh
sh onboard_agent.sh
- The commands above download and run the onboarding script, which installs the OMSAgent and its pre-req packages
- This includes OMI, scx, OMSAgent, OMSConfig, auoms, Apache, Docker, MySQL (if the last three apply to your Linux server)
Load Management packs from latest UNIX release.
Verify packs are version 10.19.1082.0
Navigation steps:
From SCOM Console > Administration Tab > Installed Management packs
After adding the updated Linux Management packs
Linux server screenshots installing OMSAgent via wget command
Stay tuned, I’m currently testing FluentD configuration on my 2019 UR1 lab environment on Ubuntu16.
Using Unix MP’s for Shell commands and scripts
Ready to move out of the UI?
Thanks to Saurav Babu, and Tim Helton’s help, I was able to push my MP authoring limits further.
The good thing with the Shell command template in SCOM is that your script is encoded.
Bad news
- If functionality doesn’t exist in the UI, you can’t easily pull the monitor and just add variables to get that functionality.
- Scripts and Shell commands are encoded (great news for security!)
Now to the use case – need Sample Count and Match Count to prevent false positive alerts
The UNIX Shell Command library allows us to use the following variables out of the box:
Interval, SyncTime, TargetSystem, UserName, Password, Script, ScriptArgs, TimeOut, TimeOutInMS, HealthyExpression, ErrorExpression
AND we can override Interval, Script, TimeOut, TimeOutInMS
If that’s not enough options, then read on!
When the built-in functionality doesn’t exist
For this UNIX shell command/script monitor, we required SampleCount and MatchCount
Variables explained
SampleCount is the number of samples required for an alert.
If SampleCount = 4, this means 4 samples will generate an alert
MatchCount is the number of intervals before the monitor state changes
If Interval = 60 (s), and MatchCount = 10, then it will take 10 minutes (600s) before we alert
Combining the 2 means 4 samples over 10 minutes will generate an alert.
Sometimes this is called alert suppression, or counting failures before alerting
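The timing arithmetic above is easy to sanity check before you pick override values. A tiny sketch, where seconds_until_alert is a made-up helper name and the only rule applied is Interval multiplied by MatchCount:

```shell
# Seconds before the monitor changes state: Interval * MatchCount.
seconds_until_alert() {
  echo $(( $1 * $2 ))
}

delay=$(seconds_until_alert 60 10)
echo "Interval=60s, MatchCount=10 -> ${delay}s ($(( delay / 60 )) minutes)"
```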
I built a custom DataSource, ProbeAction, and WriteAction, as the UNIX Shell Library MP did not include these additional variables.
Please review my updated MP Fragments TechNet Gallery for the custom MP and fragments!
https://gallery.technet.microsoft.com/Uncommon-Custom-MP-c5a12a86
Encoding the script or command to run
The other issue with UNIX scripts and commands, is the UI encodes the scripts.
How do we get around it you ask?
Since we are building an MP Fragment and MP, we must figure out how to encode.
To encode the script to put into your SCOM monitor (and MP Fragment)
Example
$script = 'if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq "1" ]; then echo false; else echo true; fi'
# Verify script variable
$script
# Get $script bytes
$s = [System.Text.Encoding]::UTF8.GetBytes($script)
# Verify script bytes output (optional as bytes broken out by line)
$s
# Encode script to Base64
$encoded = [System.Convert]::ToBase64String($s)
# Verify $encoded
$encoded
# Optional
# Verify string converts back properly
[System.Text.Encoding]::UTF8.GetString($s)
$encoded output is what needs to be entered into the <script></script> variable in your monitor
Example Output
PS C:\Users\scomadmin\desktop> $script = 'if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq "1" ]; then echo false; else echo true; fi'
PS C:\Users\scomadmin\desktop> $script
if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq "1" ]; then echo false; else echo true; fi
PS C:\Users\scomadmin\desktop> $s = [System.Text.Encoding]::UTF8.GetBytes($script)
PS C:\Users\scomadmin\desktop> $s
PS C:\Users\scomadmin\desktop> $s = [System.Text.Encoding]::UTF8.GetBytes($script)
PS C:\Users\scomadmin\desktop> $encoded = [System.Convert]::ToBase64String($s)
PS C:\Users\scomadmin\desktop> $encoded
aWYgWyBgcHMgLWVmIHwgZ3JlcCBzbGVlcCB8IGdyZXAgLXYgZ3JlcCB8IHdjIC1sYCAtZXEgIjEiIF07IHRoZW4gZWNobyBmYWxzZTsgZWxzZSBlY2hvIHRydWU7IGZp
PS C:\Users\scomadmin\desktop> [System.Text.Encoding]::UTF8.GetString($s)
if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq "1" ]; then echo false; else echo true; fi
PS C:\Users\scomadmin\desktop>
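If you are already on the Linux box, the same encoding can be sketched with the base64 utility instead of PowerShell. Assumptions: a base64 command that accepts --decode (GNU coreutils and macOS both do), and a made-up helper name encode_script. Since both approaches encode the same UTF-8 bytes, the result should match the PowerShell output above.

```shell
# Encode a script to single-line Base64 (tr strips any line wrapping base64 adds).
encode_script() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

script='if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq "1" ]; then echo false; else echo true; fi'
encoded=$(encode_script "$script")
echo "$encoded"

# Optional: confirm the value decodes back to the original script.
printf '%s' "$encoded" | base64 --decode
echo
```

Keep the script in single quotes so the shell does not run the backticks while you assign the variable.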
References
Jonathan Almquist’s blog post
Kevin Holman’s blog on service with Samples
Clarification on Registry Key discoveries
I ran across this in my travels: difficulty getting a registry monitor to work properly.
To clarify some of the registry MP fragments, make sure you follow the whole path.
This post is to help with using the Monitor.RegistryValue.Exists.mpx fragment
Example – Verify Registry Key under TestService
This is an excerpt from the MP Fragment header
%%
Description:
This fragment includes a Monitor which checks for the existence of a registry VALUE
RegValuePath – needs to be in the format of “SOFTWARE\Microsoft\CCM\HttpPort” or “SYSTEM\CurrentControlSet\Services\CcmExec\Start” as HKLM is assumed
RegValueName – needs to be the actual Reg VALUE name or your description of it (NO SPACES or special characters allowed) such as “HttpPort”
Version: 1.1
LastModified: 29-May-2017
%%
In the MP Fragment, you substitute the variables
<AttributeName>##RegValueName##</AttributeName>
<Path>##RegValuePath##</Path>
<AttributeName>ObjectName</AttributeName>
<Path>SYSTEM\SysInfo\AppName</Path>
Registry Key = HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Test
Fragment variable (##RegValuePath##) = SYSTEM\CurrentControlSet\Services\HealthService\Test
AttributeName or ##RegValueName## is simply whatever you want to call the attribute
For my example, the name of the Registry value is simply Test
Substitute Test for ##RegValueName##
If you’re testing in the lab, lower the Frequency value (the interval, in seconds) so you don’t have to wait as long
<Frequency>120</Frequency>
Remember to raise the Frequency value again when you’re done
Upload MP (don’t forget to version your pack!)
Watch Health Explorer and test away adding or removing your key
Helpful testing tips to add a key to the registry and flip the health
reg add "HKLM\System\CurrentControlSet\Services\TestService" /v "Test" /t REG_SZ /d Test
reg delete "HKLM\System\CurrentControlSet\Services\TestService" /v "Test"
Get to know your monitor
Ever need to disable a specific monitor?
I know I get tired of clicking through the console, maybe you do too?
Do you know the Monitor name and class?
If yes, then you can enable/disable monitors from PowerShell
So let’s get started.
From your management server, you can run SCOM commands as your ID (assuming your ID is set up in SCOM)
This example has 2 purposes:
- SQL2016 SP1 does NOT populate the proper fields, and will be fixed in SP2 per the SQL Engineering blog (Look at comments section – blog here)
- Tired of the warning alerts in my SCOM console
Find the monitors
$Monitor = Get-SCOMMonitor | where { $_.DisplayName -like "Service Pack Compliance" } | where { $_.Name -like "*Microsoft.SQLServer.2016.DBEngine*" }
Let’s focus for a second on some differences, and how you can interchange the two depending on what information you know
DisplayName attribute is what you see in the console (note the spaces)
Name attribute typically has dots for the spaces
Override a class
Disable-SCOMMonitor -Class $Class -ManagementPack $MP -Monitor $Monitor
Just in case you need to undo the override
Enable-SCOMMonitor -Class $Class -ManagementPack $MP -Monitor $Monitor
Override a group
$Group = (Get-SCOMGroup -DisplayName "Group*")
# Enable the group
Enable-SCOMMonitor -Group $Group -ManagementPack $MP -Monitor $Monitor
# Disable the group
Disable-SCOMMonitor -Group $Group -ManagementPack $MP -Monitor $Monitor
Reference Links
Disable-SCOMMonitor https://docs.microsoft.com/en-us/powershell/systemcenter/systemcenter2016/operationsmanager/vlatest/disable-scommonitor
Enable-SCOMMonitor https://docs.microsoft.com/en-us/powershell/systemcenter/systemcenter2016/OperationsManager/vlatest/Enable-SCOMMonitor