See the nice warm toasty updated packs
Fresh off the press, right to your door, just in time for that gift for your special someone! Time for new updates to keep you ever-green’d, up to date, fixes, etc. ;-P
Holman updated his SCOM.Management pack for SCOM2022 UR2
Addendum packs updated
Multiple packs received multiple updates: removed debug detail from the DS/WA (Data Source/Write Action workflows) Health Explorer outputs, and simplified the management pack recovery tasks for a single WA script.
Active Directory Certificate Services (ADCS) version agnostic 2016+ addendum https://github.com/theKevinJustin/ADCS2016-Addendum 2012 here See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/18/adcs-addendum-packs/
Active Directory Domain Services addendum https://github.com/theKevinJustin/ADDSAddendumAgnostic
See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/18/adds-addendum-pack/
Active Directory Federation Services addendum https://github.com/theKevinJustin/ADFSAddendum See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/18/adfs-addendum-pack/
FileServices Agnostic addendum https://github.com/theKevinJustin/FileServicesAddendum See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/31/file-services-addendum/
MCM/MEM/MECM/SCCM Configuration Manager addendum https://github.com/theKevinJustin/MECMSCCMAddendum See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/30/mecm-sccm-addendum-pack/
PKI certificate monitoring addendum https://github.com/theKevinJustin/PKIAddendum See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/24/pki-addendum-pack/
Proactive NOSC DailyTasks reports addendum https://github.com/theKevinJustin/ProactiveNOSCDailyTasks See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/15/proactive-daily-reports/
SCOM Core addendum https://github.com/theKevinJustin/SCOMCoreAddendum See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/30/scomcore-addendum-pack/
Top Process workflows tied to monitors in Tier1 https://github.com/theKevinJustin/TopProcessTier1 See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/15/proactive-daily-reports/
Windows Server 2012/2012R2 Operating System Addendum https://github.com/theKevinJustin/2012OSAddendum See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/28/os-addendum-packs/
Windows Server 2016+ version agnostic Operating System Addendum https://github.com/theKevinJustin/2016ServerAgnostic See the blog post for capabilities here https://kevinjustin.com/blog/2023/08/28/os-addendum-packs/
‘NiCE VMware addendum’ enhances VMware monitoring, tuning alerts to ‘manual intervention’ required alerting. The NiCE folks have been around for some time as a trusted Microsoft partner, creating additional monitoring functionality across Microsoft products. Having completed a number of projects implementing the VMware pack, it’s time to share the configuration and alert report capabilities.
Quick Download HTTPS://GITHUB.COM/THEKEVINJUSTIN/NICEVMWAREADDENDUM/
Changes to NiCE VMware pack
Key breakdown of VMware ESX environment monitoring
Adjustments to vendor pack to further the mantra ‘alert when manual intervention required’.
Set monitor alerts to multiple samples over an hour (i.e. compute and performance of ESX environment)
Reports by team (requires regular expression updates for environment servers owned by each team)
Monitor reset logic, and service monitorType (count logic for X failures over Y time, before alert)
Overrides to change vendor pack provided discoveries, rules, monitors
Remove alert noise for unmanaged objects in ESX environment
Customize pack for environment
Customize the ‘NiCE VMware addendum’ pack for specific environment. This means updating group discoveries, and GUIDs for group specific overrides. Further updates are required to update server naming conventions for team virtualization reports.
Classes/groups created for pack
Breakout of Discoveries that need pattern updates to match
Find/Replace ##ESXHostDataStoreNamingConventions## with names to exclude
Example of regular expressions for multiple customers
Update disable guest machine alerts
Disable guest machines in ESX environment to disable alerts.
Replace with relevant guest naming conventions
Example template/guest/virtual machine names typically disabled
Service MonitorType adds Samples and Intervals to alert after consecutive failures (x failures in y minutes, then alert)
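To make the 'x failures in y minutes' idea concrete, here is a minimal sketch of the count logic. It is written in Python only for illustration (the pack itself is SCOM XML with scripted workflows), and the class name, parameters, and sample values are hypothetical, not taken from the pack.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureCounter:
    """Alert only after `samples` consecutive failures inside `window`."""

    def __init__(self, samples: int, window: timedelta):
        self.samples = samples
        self.window = window
        self._failures = deque()

    def record(self, healthy: bool, when: datetime) -> bool:
        """Record one check result; return True when an alert should fire."""
        if healthy:
            # A healthy sample resets the count (the 'monitor reset' idea).
            self._failures.clear()
            return False
        self._failures.append(when)
        # Drop failures that fell out of the time window.
        while self._failures and when - self._failures[0] > self.window:
            self._failures.popleft()
        return len(self._failures) >= self.samples


# Hypothetical values: 3 failures within 15 minutes, checked every 5 minutes.
counter = FailureCounter(samples=3, window=timedelta(minutes=15))
start = datetime(2023, 9, 1, 8, 0)
results = [counter.record(False, start + timedelta(minutes=5 * i)) for i in range(3)]
# results == [False, False, True]
```

The first two failures stay quiet and only the third raises the alert, which is the behavior that keeps one-off blips from paging anyone.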
Rules, Monitors, Recoveries
List of workflows used to troubleshoot/resolve problems
NiCE VMware management pack https://www.nice.de/nice-vmware-mp/
‘IIS addendum packs’ to tune IIS from 2012 forward. The GitHub repository has two packs: 2012 and 2016+ (the version agnostic pack). Each includes an IIS enabled group, daily report and cleanup DataSource and WriteAction (tasks), as well as a regular expression to set up the IIS enabled group. The IIS enabled group enables IIS monitoring only on the servers where IIS monitoring is needed.
Customize for environment
Update addendums to server naming conventions for enabled IIS monitoring. Read below to better understand addendum functionality.
First, the addendums include a class/group, DataSource and WriteAction alert reports, automated alert closure workflows, and an event count logic/reset monitorType.
Second, in the group discovery, find/replace the pattern with the application/web server naming conventions where IIS monitoring IS wanted.
Third, the version agnostic pack has overrides to disable most perf and rule alerts. I can provide OFF packs to turn off performance counter collection rules, keeping both the OperationsManager and OperationsManagerDW databases cleaner, and therefore faster with less data.
Lastly, once the addendum is updated, save the file, move it to a SCOM MS, and import.
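Before pasting a pattern into the group discovery, it helps to test it against a list of server names. The sketch below is Python for illustration only; the pattern and server names are hypothetical, and SCOM evaluates group discovery expressions with its own regex engine, so re-check the final pattern there.

```python
import re

# Hypothetical naming convention: names starting with WEB, APP, or IIS
# followed by a number (e.g. WEB01, App-IIS03). Adjust to your environment.
IIS_PATTERN = re.compile(r"^(web|app|iis)[-\w]*\d+", re.IGNORECASE)


def iis_enabled(servers):
    """Return the servers whose names match the IIS naming pattern."""
    return [name for name in servers if IIS_PATTERN.search(name)]


servers = [
    "WEB01.contoso.com",
    "sql-prod-02.contoso.com",
    "App-IIS03.contoso.com",
    "DC01.contoso.com",
]
matched = iis_enabled(servers)
# matched == ['WEB01.contoso.com', 'App-IIS03.contoso.com']
```

Only the matched names would land in the IIS enabled group; everything else stays out of IIS monitoring.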
Enjoy the ‘IIS addendum packs’ for how few alerts, perhaps life changing?! (sarcasm)
Download Addendum packs https://github.com/theKevinJustin/IISAddendums
IIS2012 SCOM Management pack download https://www.microsoft.com/en-us/download/details.aspx?id=34767
IIS2016+ SCOM management pack download https://www.microsoft.com/en-us/download/details.aspx?id=54445
Time to configure the Microsoft System Center Core Monitoring pack per health model and best practice. That’s where the SCOMCore Addendum pack comes in. Addendum adds High Agent Handle count group, daily report and alert closure automation, and rule/monitor overrides. Some assembly required – update the discovery pattern for offending high handle counts, and high handle count group ContextInstance GUID after import.
Quick Download: https://github.com/theKevinJustin/SCOMCoreAddendum
High Agent Handle count was more of an issue before the x365 platform migration, when UC, SharePoint, and email (i.e. Lync/Skype, SharePoint, Exchange on prem) went to the cloud. It is still seen where cloud scalability options and virtualization/storage limitations exist. The typical example is an over-utilized virtual machine in hybrid/IaaS/premise scenarios. Kevin Holman caught this performance issue years back, creating a monitoring alerts pack and blog. In case you’re on SCOM jeopardy, the LAW/OMS/Microsoft Monitoring Agent/SCOM agent has a built-in health check. The built-in health check restarts the service when the Handle Count or memory of the HealthService (aka Microsoft Monitoring Agent service) runs too hot, per the SCOM PG. SCOM agent restarts cause config churn and high compute, as workflows re-run after the service restarts.
Assess agent restarts
Begin by verifying if you have Kevin Holman’s pack for SCOM agent restarts downloaded and installed, which sets memory/handle count informational alerts https://github.com/thekevinholman/SCOM.AgentThresholds
Validate pack installed
Configure addendum for environment
Download and Install ‘SCOMCore Addendum pack’ here
Open saved XML in notepad or Notepad++ (your favorite XML editor here!)
Update the regular expression pattern line for offending servers in the
Figure out the group GUID for the high agent handle count
From PowerShell on SCOM management server, run:
Get-SCOMClassInstance -DisplayName "Proactive High Agent Handle Count servers" | fl DisplayName,ID
Save file and Import > enjoy fewer alerts!
Kevin Holman blog on SCOM agent restarts
Holman’s pack for SCOM agent restarts and setting memory/handle count alerts https://github.com/thekevinholman/SCOM.AgentThresholds
Addendum download https://github.com/theKevinJustin/SCOMCoreAddendum
The ‘MSSQL Addendum pack’ wouldn’t be possible without Brandon Pires’ contributions. Brandon dealt with my many questions to better alert! If you need more background, check the ‘why addendum pack’ post.
The pack is based on the SQL engineering blog and program team making multiple updates per year for SQL monitoring. First, the addendum creates two groups for dev/test and notification/subscription modeling. Second, the overrides (man, there are a bunch!) aid consumption of real issues. Lastly, most environments should be SQL 2016+, as the 2012R2 EOL/EOSL is quickly approaching in October!
MSSQL group discoveries require updates to be applicable to environment
First, the MSSQL packs MUST be installed, as the Addendum pack requires them. The addendum is based on the MSSQL 2016+ version agnostic pack, which is currently supported, as the 2012/2012R2 products are near end of support.
Find/Replace the variables as needed:
Addendum pack contains discovery, monitor, and rule overrides to tune MSSQL to CSA (old PFE/CE/CSAe Microsoft Field engineer recommendations), to match the health model reducing critical ‘wake me up in the middle of the night’ alerts.
Download pack, and save to your environment
Import into SCOM
MSSQL Addendum references
SQL Releases TechCommunity here
Engineering team latest management pack, TechCommunity release v18.104.22.168
Import ‘gotcha’ importing new custom functionality blog
To begin, the ‘ADFS addendum pack’ needs acknowledgement of the contributors who dealt with my many questions to better alert on AD issues! My thanks to Jason Windisch for his help and expertise with Active Directory Federation Services (ADFS). If you need more background, check the ‘why addendum pack’ post. BTW, what do you associate with the word – Federation?
Overview of capabilities
The Active Directory Federation Services ‘ADFS Addendum pack’ configures an ADFS group of related classes for notification/subscription modeling. Second, the rules, service monitors, tasks, service recovery, alert cleanup, and summary reports aid consumption of real issues. Third, if you have ADFS2012R2, I have an addendum pack, but coordination is necessary to get the ADFS management packs MSI (not currently available). Lastly, most environments should be 2016+, as the EOL/EOSL is quickly approaching in October!
ADFS Group discovery requires server names applicable to environment
Tailoring the pack(s) to your environment
First, the Active Directory Federation Services management packs MUST be installed for the ‘ADFS Addendum pack’ to load. 2016+ agnostic is currently supported, as the 2012,2012R2 products are near end of support.
Find/Replace the variables as needed
First, the DataSources (DS) and WriteActions (WA) clean up alerts and create daily reports; the WA are the on-demand task versions.
Data source (DS) scheduled workflows run weekdays between 0600-0700, SCOM management server local time. The summary and team reports (run during this time) summarize key insights. NOTE: the Monday report gathers the last 72 hours, so administrators get a ‘what happened over the weekend’ view. Tuesday-Friday reports cover the past 24 hours. Lastly, the group policy report summarizes unique GPUpdate error output.
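The Monday versus Tuesday-Friday lookback is easy to sketch. This is an illustration in Python, not the pack's actual script, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def report_window(now: datetime):
    """Return (start, end) of the alert lookback, or None on weekends."""
    weekday = now.weekday()  # Monday == 0 ... Sunday == 6
    if weekday >= 5:
        return None  # reports only run on weekdays
    # Monday looks back 72 hours to cover the weekend; other days 24 hours.
    hours = 72 if weekday == 0 else 24
    return now - timedelta(hours=hours), now


monday = datetime(2023, 9, 4, 6, 30)   # a Monday, inside the 0600-0700 run
tuesday = datetime(2023, 9, 5, 6, 30)
monday_start, _ = report_window(monday)    # 2023-09-01 06:30 (Friday)
tuesday_start, _ = report_window(tuesday)  # 2023-09-04 06:30 (Monday)
```

The Monday run reaches back to Friday morning, which is what gives administrators the 'what happened over the weekend' view.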
Addendum pack rules schedule data source execution, add on-demand tasks. The service monitor, and Recovery tasks add service recovery automation to bring us to the ‘manual intervention required’ alerting. There are a few monitor/rule overrides to match the health model.
Download updated ‘ADFS addendum pack’ and save to your environment
Import into SCOM
ADFS 2016+ management pack download
‘Why addendum packs’? What value can they bring to my customer? Kevin Holman started the Addendum thought process quite a while back. Added functionality to a core application/program/product. The first example of this pack naming convention is his SQL RunAs Addendum to simplify SQL monitoring. Let’s break down a number of examples how the SCOM community has built packs to better monitoring, and how I believe the addendum packs bring IT Ninja lessons from Microsoft experts monitoring to your environment.
Why Addendum packs
Better monitoring from the experts, including customer examples for other ‘blind spots’ in monitoring. Blind spots consist of ‘not monitored’ pieces of infrastructure, from simply an event, ping, service, tcp port check, process, web site, scripted workflow, with the purpose to identify a problem.
The goal of monitoring is to:
Identify, self-heal, automatically run recovery or diagnostic workflows, and alert when manual intervention is required. It doesn’t matter what tool you use; they all do some portion of these steps.
The addendum packs do these things, adding a few differentiators.
Auto closure daily scripts (close rules/monitors)
Auto reports of problems (M-F 0600-0700 local, reflecting last 24-72 hours of open/closed alerts)
Employ count logic (x in y time)
Self-heal monitors with no new events
Adjust alert severities to health model
where critical (red) = outage, warning (yellow) = issue, informational reports or FYI’s
Capable of updating alerts (status, owner, ticketID+)
Tasks to run workflows on-demand
Recovery tasks – (i.e. service restart automation or TopProcess, Logical disk cleanup, MECM Client cache clean )
Integrate additional monitoring (like DFS replication queue script/alerts)
Synthetic checks for DNS and web applications
Web Availability and Transactional monitoring, ADFS, CRL, PowerShell Invoke-WebRequest, and more
Security and Compliance checks
Imagine I forgot something capability wise.
Stay tuned, as this builds into an even better outcome, quality data into ‘a single pane of glass’ of multiple tools within PowerBI.
Ever wish you had task manager output when you had a monitor go unhealthy? Following Kevin Holman’s lead to ‘Monitor Processes‘, the idea landed to build out the ‘Top Process PowerShell script’. This morphed into a management pack with Knowledge entries to better explain what is being done. Integrating Top Process into Health Explorer output as a recovery task helped provide another step before alerting. The idea started from the need to prove which Security tool(s) were causing the over-utilized compute spikes, causing non-responsive server(s). Thinking back to my UNIX days, we simply used top, vmstat, iostat, and other commands to identify problematic processes. Integrating PowerShell scripts into SCOM is part of the fun, then linking the obfuscated Security processes for the final output. From there, extrapolate into Azure Functions or Azure Logic apps, for additional functionality for cloud native monitoring.
Quick Download: https://github.com/theKevinJustin/TopProcess
Tier1 separated monitoring (no AD) https://github.com/theKevinJustin/TopProcessTier1
Building out the ‘Top Process PowerShell script’
Kevin Holman built a ‘Monitor.Performance.ConsecSamples.ThenScript.TwoState.mpx’ fragment, beginning the logical journey. His fragment helped me start with a working model, taking processes and cores into consideration for true CPU usage on multi-core servers.
We need to see the processes and their corresponding values, then build an output table (custom object). After gathering the processes, feed the TopProcesses array, and lastly sort the array by CPUValue.
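As a rough sketch of that gather, normalize, and sort flow (in Python for illustration; the real script is PowerShell and is not reproduced here), the sample process names and numbers below are made up. Dividing by the logical core count is my assumption for how 'true CPU usage on multi-core servers' is derived.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    name: str
    raw_cpu_percent: float  # summed across cores, so it can exceed 100


def top_processes(samples, logical_cores, limit=5):
    """Build the output table and return the top `limit` rows by CPUValue."""
    rows = [
        {"Name": s.name, "CPUValue": round(s.raw_cpu_percent / logical_cores, 1)}
        for s in samples
    ]
    # Highest CPU consumers first.
    return sorted(rows, key=lambda row: row["CPUValue"], reverse=True)[:limit]


# Hypothetical samples from a 4-core server.
samples = [
    ProcessSample("w3wp", 40.0),
    ProcessSample("MsMpEng", 180.0),
    ProcessSample("MonitoringHost", 12.0),
]
top_two = top_processes(samples, logical_cores=4, limit=2)
# top_two == [{'Name': 'MsMpEng', 'CPUValue': 45.0}, {'Name': 'w3wp', 'CPUValue': 10.0}]
```

Without the per-core division, a process using 180% across four cores would look worse than the server's actual 45% load.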
Next, we’ll want to see what applications/tools might be involved, including Active Client, IIS, monitoring, and EndPoint Management tools (keep things honest!).
Then we build an output of the data so we can take the datasource (DS) or WriteAction (WA) into a scripted monitor/rule, or recovery tasks linked to various monitors. Even built a forked version in case of SAW/Red Forest, separating Tier0 monitoring from Tier1 (snippet below is NOT that pack)
It’s that time to figure out the ConfigMgr SMS role alerts – If you are monitoring your SCCM/MECM environment, then you get role failure alerts. Many times, the Operations Helpdesk, NOSC, NOC, SOC, etc. will get alerts when various roles fail on the Configuration Manager platform. The common ask is why, what do you see, etc. Much like Exchange, ConfigMgr internalizes the checks that are seen in the console as registry keys or events documenting said degraded component/feature. Helping the MECM administrator understand the failure is key to decoding how to notify administrator, and when the helpdesk needs to act on ‘ConfigMgr SMS role alerts’.
Example – MECM/SCCM looks at replication probe action state $Config/RoleName$
The role check is based on a variable of the RoleName in a registry key that the application updates.
This is the origin of ConfigMgr SMS role alerts
HKLM:\SOFTWARE\Microsoft\SMS\Operations Management\SMS Server Role\$Config/RoleName$\Availability State
1 is critical state
2,3,4 are warning states
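The mapping above can be written as a small lookup. This is a Python sketch for illustration with a hypothetical function name; values other than 1-4 are not described in this post, so the sketch labels them 'Other' rather than guessing.

```python
def role_health(availability_state: int) -> str:
    """Map the 'Availability State' registry value to an alert severity."""
    if availability_state == 1:
        return "Critical"
    if availability_state in (2, 3, 4):
        return "Warning"
    # Assumption: anything else is not covered by the post, so do not alert on it here.
    return "Other"


severities = [role_health(value) for value in (1, 2, 3, 4)]
# severities == ['Critical', 'Warning', 'Warning', 'Warning']
```

Knowing which value maps to which severity makes it easier to decide what the helpdesk acts on and what goes to the MECM administrator.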
If more details are needed, download SCCM/MECM Management Pack for SCOM here
Use Tyson’s SCOM Helper pack to unseal, and inspect XML.
Once you know the origin of the ConfigMgr SMS role alerts, you can begin tuning the MECM alerts to your environment. Understanding role alerts will help both teams understand MECM application health. First, use MECM application health to trend alerts/outages. Second, leverage maintenance mode schedules, or MM scripts to NOT monitor for common administration tasks. From my experience, the alerts are the result of MECM Admins maintaining the application – common actions like building application/package lists, cleanup actions, site maintenance, backups, etc. Lastly, set up a subscription to notify after the tuning discussion. See my blog on building a subscription for more details.