August 2023 - Kevin Justin's Blog

NiCE VMware addendum

‘NiCE VMware addendum’ enhances VMware monitoring by tuning the vendor pack to alert only when manual intervention is required. The NiCE folks have been around for some time as a trusted Microsoft partner, creating additional monitoring functionality across Microsoft products. Having completed a number of projects implementing the VMware pack, it’s time to share the configuration and alert report capabilities.

Quick Download HTTPS://GITHUB.COM/THEKEVINJUSTIN/NICEVMWAREADDENDUM/

Changes to NiCE VMware pack

Key breakdown of VMware ESX environment monitoring

NiCE VMware monitoring features for ESX, vSphere, vSAN environments

Adjustments to vendor pack to further the mantra ‘alert when manual intervention required’.

Set monitor alerts to multiple samples over an hour (i.e. compute and performance of ESX environment)

Reports by team (requires regular expression updates for environment servers owned by each team)

Monitor reset logic, and service monitorType (count logic for X failures over Y time, before alert)

Overrides to change vendor pack provided discoveries, rules, monitors

Remove alert noise for unmanaged objects in ESX environment

Customize pack for environment

Customize the ‘NiCE VMware addendum’ pack for your specific environment. This means updating group discoveries, and GUIDs for group specific overrides. Server naming conventions also need updating for the team virtualization reports.

Classes/groups created for pack

Discoveries

Breakout of Discoveries that need pattern updates to match

Find/Replace ##ESXHostDataStoreNamingConventions## with names to exclude

Example of regular expressions for multiple customers

Update disable guest machine alerts

Disable monitoring for guest machines in the ESX environment that should not generate alerts.

Find ##ESXGuestServersDiskUsageNamingConventions##

Replace with relevant guest naming conventions

Example template/guest/virtual machine names typically disabled

Service MonitorType

The Service MonitorType adds Samples and Intervals so the monitor alerts only after consecutive failures (x failures in y minutes, then alert).
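To make the ‘x failures in y minutes’ idea concrete, here is a minimal Python sketch of the count logic. It is not the pack’s MonitorType XML; the class name and the `samples` / `window_minutes` parameters are hypothetical names chosen for illustration, and the real monitor may treat resets differently.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ServiceSampleMonitor:
    """Alert only after `samples` failed checks land inside `window_minutes`."""
    samples: int          # x failures (hypothetical parameter name)
    window_minutes: int   # y minutes (hypothetical parameter name)
    _failures: deque = field(default_factory=deque)

    def record(self, minute: int, service_running: bool) -> bool:
        """Record one service check; return True when an alert should fire."""
        if service_running:
            # A healthy sample resets the failure count.
            self._failures.clear()
            return False
        self._failures.append(minute)
        # Drop failures that are older than the window.
        while self._failures and minute - self._failures[0] >= self.window_minutes:
            self._failures.popleft()
        return len(self._failures) >= self.samples


monitor = ServiceSampleMonitor(samples=3, window_minutes=15)
results = [monitor.record(m, False) for m in (0, 5, 10)]
print(results)  # [False, False, True]
```

A single blip (one failed sample followed by a healthy one) never alerts, which is the point of the Samples and Intervals settings.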

Rules, Monitors, Recoveries

List of workflows used to troubleshoot/resolve problems

Documentation

NiCE VMware management pack https://www.nice.de/nice-vmware-mp/

File Services Addendum

‘File Services Addendum’, named Microsoft Windows Server FileServices 2016 Addendum, adds replication health/backlog script, seed and group classes, replication/service monitors, recovery tasks, and overrides to tune monitored environment.

Quick Download HTTPS://GITHUB.COM/THEKEVINJUSTIN/FILESERVICESADDENDUM

Overview of File Services monitoring

The addendum assumes the version-agnostic File Services pack (v10.0.0.0) is installed.

Looking at the XML file in Notepad++, the pack references show which other management packs the workflows refer to. Kevin Holman taught building backwards compatibility into MP authoring, which allows import into SCOM 2012+ without errors. To take this one step further, the v10.0.0.0 File Services packs referenced are the version-agnostic packs.

NOTE: File Services Addendum references may need updates if the whole file services management packs are NOT installed.

References screenshot

Addendum logic

Capabilities

Daily report and close automation, on-demand tasks for reports

DFS backlog script errors

SmSvc, DFSN, DFSR service recovery and rule alerts (from Holman fragments library)

DFS replication backlog watcher, script, alerts

Notepad++ screenshot

Next, we look at the group/class discoveries

Update the Class/Group discoveries for DFS servers or script install paths for replication script.

Find and replace FilePath and ##DFSServerNamingConvention## variable.

Save file and Import
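If you would rather script the find/replace than do it by hand in Notepad++, the step can be sketched in Python as below. This is only an illustration: the function names are made up here, and the replacement value `(?i)dfs0` is a hypothetical naming-convention pattern, not one from the pack.

```python
import re

# Matches Holman-style tokens such as ##DFSServerNamingConvention##
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_]+)##")


def fill_placeholders(xml_text: str, values: dict[str, str]) -> str:
    """Replace each ##Name## token that has a value; leave unknown tokens alone."""
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, xml_text)


def remaining_placeholders(xml_text: str) -> list[str]:
    """List tokens still present, as a pre-import sanity check."""
    return sorted(set(PLACEHOLDER.findall(xml_text)))


sample = "<Pattern>##DFSServerNamingConvention##</Pattern>"
updated = fill_placeholders(sample, {"DFSServerNamingConvention": "(?i)dfs0"})
print(updated)                          # <Pattern>(?i)dfs0</Pattern>
print(remaining_placeholders(updated))  # []
```

Checking for leftover ## tokens before import is a cheap way to catch a missed variable.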

Documentation

Kevin Holman MP authoring with fragments https://kevinholman.com/2019/01/17/mp-authoring-with-fragments-introducing-combo-fragments/

Kevin Holman MP fragment library https://github.com/thekevinholman/FragmentLibrary

Addendum GitHub Repository HTTPS://GITHUB.COM/THEKEVINJUSTIN/FILESERVICESADDENDUM

IIS addendum packs

‘IIS addendum packs’ tune IIS monitoring from 2012 forward. The GitHub repository has two packs: 2012 and 2016+ (the version agnostic pack). Each includes an IIS enabled group, daily report and cleanup DataSource and WriteAction (tasks), as well as a regular expression to set up the IIS enabled group. The IIS enabled group enables IIS monitoring only on the servers where IIS monitoring is needed.

Customize for environment

Update addendums to server naming conventions for enabled IIS monitoring. Read below to better understand addendum functionality.

First, the addendums include class/group discoveries, DataSource and WriteAction alert reports, automated alert closure workflows, and an event count logic/reset monitorType.

Second, in the group discovery, find/replace the pattern with the application/web server naming conventions where IIS monitoring IS wanted.

Third, the version agnostic pack has overrides to disable most perf and rule alerts. OFF packs can be provided to turn off performance counter collection rules, keeping both the OperationsManager and OperationsManagerDW databases cleaner, and therefore faster with less data.

Lastly, once the addendum is updated, save the file, move it to a SCOM MS, and import.

Enjoy the ‘IIS addendum packs’ for how few alerts, perhaps life changing?! (sarcasm)

Documentation

Download Addendum packs https://github.com/theKevinJustin/IISAddendums

IIS2012 SCOM Management pack download https://www.microsoft.com/en-us/download/details.aspx?id=34767

IIS2016+ SCOM management pack download https://www.microsoft.com/en-us/download/details.aspx?id=54445

Proactive Security bundle

Proactive Security bundle to help with three (3) various DC authentication event sets encompassing Kerberos, NetLogon, and DCOM. These events were enabled as part of the server cumulative patches. The management packs run workflows on the servers, then combine into a daily alert report of the unique event description details.
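The ‘unique event description’ roll-up can be pictured with a short Python sketch. This is an illustration of the consolidation idea only, not the pack’s report script; the event dictionaries and their `description` key are assumed shapes, and the sample descriptions are placeholders.

```python
from collections import Counter


def summarize_events(events: list[dict]) -> list[tuple[str, int]]:
    """Collapse raw events into (unique description, occurrence count) pairs."""
    counts = Counter(event["description"].strip() for event in events)
    return counts.most_common()  # most frequent description first


# Placeholder events standing in for a day of collected DC events.
events = [
    {"server": "dc01", "description": "sample description A"},
    {"server": "dc02", "description": "sample description A "},
    {"server": "dc01", "description": "sample description B"},
]
for description, count in summarize_events(events):
    print(f"{count:>3}  {description}")
```

One report line per unique description keeps the daily alert readable even when the same event fires on many servers.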

Quick Download https://github.com/theKevinJustin/DCAuthAlerts

Save the files from GitHub to your local SCOM MS and import.

Proactive Security bundle components

Proactive DC Kerberos KDC Authentications 1.0.0.1
Download: https://github.com/theKevinJustin/DCAuthAlerts
Documentation: https://kevinjustin.com/blog/2023/08/30/DC-Auth-Alerts/
Purpose: Monitor DC Kerberos authentication alerts on CA, DC role servers, as well as any operating system. Daily alert report consolidates alerts as well as on-demand report tasks.
Change Impact: Low
Security Impact: Low
Any testing needed: No

Proactive DC NetLogon Allowed Sessions 1.0.3.1
Download: https://github.com/theKevinJustin/DCAuthAlerts
Documentation: https://kevinjustin.com/blog/2023/08/30/DC-Auth-Alerts/
Purpose: Monitor DC NetLogon authentication alerts on DC role servers. Daily alert report consolidates alerts as well as on-demand report tasks.
Change Impact: Low
Security Impact: Low
Any testing needed: No

Proactive Microsoft Windows DCOM Server Security Bypass 1.0.0.8
Download: https://github.com/theKevinJustin/DCAuthAlerts
Documentation: https://kevinjustin.com/blog/2023/08/30/DC-Auth-Alerts/
Purpose: Monitor DC DCOM security bypass event IDs 10036, 10037, and 10038 in Security EventLog. Pull from DC and run SCOM alert report, as well as on-demand report task.
Change Impact: Low
Security Impact: Low
Any testing needed: No

MECM/SCCM Addendum pack

The ‘MECM/SCCM Addendum pack’ started from administrators and field engineers’ inputs on actionable/manual intervention required alerts. While Endpoint Management has taken on a number of names over the past few years, monitoring the platform functionality has stayed pretty much the same. The underlying application infrastructure is based on registry key discovery of installed roles.

Quick Download https://github.com/theKevinJustin/MCMAddendum

Tailor the addendum for environment

Add monitoring for MECM servers per health model through daily team report, alert cleanup, custom groups to address subscription objects, servers, custom disk and client cache cleanup workflows, and lastly service restart automation.

Quick overview

The classes and DataSource/WriteAction alert reports require updates to target server naming convention(s). The alert report is most effective this way, only giving the administrator/AppOwner alerts relevant to owned/supported servers. Why – make the changes most effective, i.e. alert when manual intervention required.

Workflows, classes, and MonitorType

Update Discovery to find/replace hashtags

Leveraging Kevin Holman’s MP fragment find/replace common variables notated by the ##variable##, we begin by updating the ##MECMServerNamingConvention## with a regular expression of the servers involved with Configuration Management.

Second, we update the disk-specific alerts for when drives fill. Different drives need different amounts of free space before the application/server crashes, and these thresholds differ from the OS Logical Disk composite alerts for % and MB free. The disk-specific updates give the administrator unique alerts for common disk full scenarios.

Third, update MECM Group discoveries for various regular expressions.

Lastly, review MECM Rules, Tasks, Monitor and Overrides for pack functionality.

After updating relevant pieces, save file, move to SCOM MS, and Import.

My customers have loved this, hopefully this experience is shared!

Documentation

Kevin Holman MP fragments

Endpoint Management https://learn.microsoft.com/en-us/mem/endpoint-manager-overview

Microsoft System Center 2012 Configuration Manager Monitoring 5.0.8239.1010
Download https://systemcenter.wiki/?GetCategory=System+Center+2012+Configuration+Manager

Trellix Agent pack

Time to monitor the ‘Trellix agent’ pack

Trellix bought McAfee and rebranded, but the service, application, registry keys, etc. have not yet changed. Many times, the pack fills in the gaps that the admin misses. Examples include alerting when application services crash or become non-responsive, or simply summarizing issues in a daily alert report.

Quick Download: https://github.com/theKevinJustin/TrellixAgentMonitoring

Did you know?

System Event ID 7031 is logged for each application/service when the process has issues?

Trellix agent services have a monitor alert when System Event Log, EventID 7031 events have the agent services in the event description.
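As a rough illustration of that match (not the pack’s actual monitor), the logic boils down to: System log, Event ID 7031, and a watched service name somewhere in the description. In the Python sketch below the event dictionaries and the service name ‘Example Agent Service’ are hypothetical stand-ins; substitute the real agent service display names from your servers.

```python
def crashed_agent_services(events: list[dict], service_names: list[str]) -> list[str]:
    """Return watched service names found in System log 7031 event descriptions."""
    hits = []
    for event in events:
        if event["log"] == "System" and event["id"] == 7031:
            description = event["description"].lower()
            hits.extend(name for name in service_names if name.lower() in description)
    return hits


# Hypothetical sample data for illustration only.
events = [
    {"log": "System", "id": 7031,
     "description": "The Example Agent Service service terminated unexpectedly."},
    {"log": "System", "id": 7036,
     "description": "The Example Agent Service service entered the running state."},
    {"log": "Application", "id": 7031,
     "description": "The Example Agent Service service terminated unexpectedly."},
]
print(crashed_agent_services(events, ["Example Agent Service"]))
# ['Example Agent Service']
```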

Second, my own spin for Application monitoring starts with the mantra ‘smarter vs. harder’. Besides dynamic discovery based on registry key, adding the Service MonitorType gives additional monitoring flexibility, adding Samples and Intervals to decrease false positive alerts. Simply put – count logic – x failures in y time before alerting.

Third, the pack adds Trellix Agent rules, monitors, an on-demand report task, and recovery scripts to build out the manual intervention required alert action mantra.

Optional – Configure addendum for environment

Download and Install ‘Trellix Agent pack’ here

Open saved XML in notepad or Notepad++ (your favorite XML editor here!)

Update the regular expression pattern line for McAfee server group

Save file and Import > enjoy fewer alerts!

Documentation

Addendum download https://github.com/theKevinJustin/TrellixAgentMonitoring

SCOMCore Addendum pack

Time to configure the Microsoft System Center Core Monitoring pack per health model and best practice. That’s where the SCOMCore Addendum pack comes in. Addendum adds High Agent Handle count group, daily report and alert closure automation, and rule/monitor overrides. Some assembly required – update the discovery pattern for offending high handle counts, and high handle count group ContextInstance GUID after import.

Quick Download: https://github.com/theKevinJustin/SCOMCoreAddendum

Background:

High Agent Handle count was more of an issue before UC, SharePoint, and email (i.e. Lync/Skype, SharePoint, Exchange on prem) migrated to the x365 cloud platform. It is still seen where cloud scalability options and virtualization/storage limitations exist. A typical example is an over-utilized virtual machine in hybrid/IaaS/premise scenarios. Kevin Holman caught this performance issue years back, creating a monitoring alerts pack and blog. In case you’re on SCOM jeopardy, the LAW/OMS/Microsoft Monitoring Agent/SCOM agent has a built-in health check. The built-in health check restarts the service when Handle Count or memory of the HealthService (aka Microsoft Monitoring Agent service) runs too hot per SCOM PG. SCOM agent restarts cause config churn and high compute, as workflows re-run after the service restarts.

Assess agent restarts

Begin by verifying if you have Kevin Holman’s pack for SCOM agent restarts downloaded and installed, which sets memory/handle count informational alerts https://github.com/thekevinholman/SCOM.AgentThresholds

Validate pack installed

Verify SCOM Agent Thresholds pack installed.

Configure addendum for environment

Download and Install ‘SCOMCore Addendum pack’ here

Open saved XML in notepad or Notepad++ (your favorite XML editor here!)

Update the regular expression pattern line for offending servers in the

Figure out the group GUID for the high agent handle count

From PowerShell on SCOM management server, run:

Get-SCOMClassInstance -DisplayName "Proactive High Agent Handle Count servers" | fl DisplayName,ID

Find/Replace GUID

Save file and Import > enjoy fewer alerts!

Documentation:

Kevin Holman blog on SCOM agent restarts

Holman’s pack for SCOM agent restarts and setting memory/handle count alerts https://github.com/thekevinholman/SCOM.AgentThresholds

Addendum download https://github.com/theKevinJustin/SCOMCoreAddendum

MCM Addendum pack

Rebranding central – MEM, EM, MECM, SCCM, Configuration manager, depending on the synonym, we’re referring to the same product. Tune the most common critical alerts per the health model to warning.

QUICK DOWNLOAD https://github.com/theKevinJustin/MCMAddendum/

Background

Read Holman’s blog for more details.

Did you know – MCM discoveries are based on registry keys added with various role installs on windows servers. These registry keys are typically under this path: HKLM\SOFTWARE\Microsoft\SMS\Operations Management\Components

What capabilities does the ‘MCM addendum pack’ provide?

Quite simply, the pack provides warning severity overrides for common alerts and disables event collection rules.

Includes warning severity changes for the following rules and monitors:

Monitors

BackupStatus.StatusMessage.Monitor

ReportingPoint.RoleAvailability.Monitor

SoftwareUpdatePoint.RoleAvailability.Monitor

SoftwareUpdatePointSync.AlertState.Monitor

Rules

ComponentServer.ComponentStoppedUnexpectedly.Event.Rule

SiteComponentManager – CanNotFindObjectInAD.Event.Rule, CouldNotAccessSiteSystem.Event.Rule

StateSystem.FailedToExecuteSummaryTask.Event.Rule

WsusConfigurationManager.FailedToConfigProxy.Event.Rule

Utilize the ‘MCM Addendum pack’

Download Kevin Holman’s MCM pack from GitHub.

Download the Addendum here, to get alerts where manual intervention required.

Save packs

Enjoy some acronym humor and ‘who moved my cheese fun!’

Import into SCOM & Enjoy!

If you need more capabilities, reach out on the blog or GitHub.

Documentation

Github repository here

SCCM management pack

Holman blog for MEM, EM, MCM, MECM, CM, ConfigMgr, Configuration Manager

DHCP Addendum pack

Leverage DHCP addendum to tune DHCP subnet monitoring.

Leverage the ‘DHCP Addendum pack’. Why? DHCP manages IP ranges, which affects customer facing services like VPN connectivity, VDI/AVD/appliance devices, as well as client workstations/laptops/GFE’s. The DHCP management pack alerts when a subnet is nearing zero available IP’s before you have an outage. This article will help you understand how the addendum’s new capabilities tune DHCP monitoring to best practice.

QUICK DOWNLOAD(S)

2016+ HTTPS://GITHUB.COM/THEKEVINJUSTIN/DCHPAGNOSTIC

What capabilities does the ‘DHCP Addendum pack’ provide?

Two groups, one DHCP server group, and DHCP subscription group to configure notifications to SME for DHCP related classes

Overrides for common alerts, disable event collection rules

Utilize the DHCP Addendum

Download the DHCP Addendum on GitHub, to get alerts where manual intervention required.

Update XML

The pack greatly decreases alerts, and the XML authoring is an easy feat. After you import the pack, find/replace is required for two pieces.

Discovery group regular expressions (RegEx)

##DHCPServerRegEx##

Find ##DHCPServerRegEx## and replace with your DHCP server expressions.

Example server names: 12dc01, 19dc01,19dc02,19dc03, etc.

RegEx = (?i)12dc0|19dc0
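Before pasting a pattern into the discovery, it is worth a quick test against a handful of server names. The sketch below uses Python’s `re` module as a stand-in; SCOM evaluates the expression with the .NET regex engine, which behaves the same for a simple pattern like this one. The name `16web01` is a made-up non-DHCP server added to show a non-match.

```python
import re

# Pattern from the example above: case-insensitive match on 12dc0 or 19dc0.
pattern = re.compile(r"(?i)12dc0|19dc0")

# Example names from the post, plus one hypothetical server that should NOT match.
servers = ["12dc01", "19dc01", "19DC02", "19dc03", "16web01"]

matched = [name for name in servers if pattern.search(name)]
print(matched)  # ['12dc01', '19dc01', '19DC02', '19dc03']
```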

Using PowerShell on SCOM management server (MS) to determine group GUIDs for replace in Overrides/Discoveries.

Update group GUIDs, after installing this pack.

Find/replace the GUIDs after import. Because group GUIDs are unique to every SCOM management group, hard coding the group ID GUID is not possible. Run Get-SCOMClassInstance to determine the group GUIDs applicable in the management group.

From PowerShell, on your SCOM management server, run get-SCOMClassInstance commands for the two groups added.

Get-SCOMClassInstance -DisplayName "Microsoft Windows DHCP 2016+ Servers" | ft Id

Get-SCOMClassInstance -DisplayName "Microsoft Windows DHCP 2016+ subscription components" | ft Id

Example

Leveraging Notepad++ to find/replace the group GUID with SCOM environment specific GUIDs

Find/Replace the GUID in the pack with the ID from the output above.

Save pack

Import into SCOM & Enjoy!
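The GUID swap above can also be scripted. The Python sketch below pulls the GUID out of pasted command output and replaces an old GUID in the pack text; the function names and both GUID values are invented for illustration, and the `ContextInstance` snippet is a simplified stand-in for the pack’s override XML.

```python
import re

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def extract_guid(command_output: str) -> str:
    """Return the first GUID found in pasted Get-SCOMClassInstance output."""
    match = GUID.search(command_output)
    if match is None:
        raise ValueError("no GUID found in command output")
    return match.group(0).lower()


def swap_guid(xml_text: str, old_guid: str, new_guid: str) -> str:
    """Replace every occurrence of old_guid (any letter case) with new_guid."""
    return re.sub(re.escape(old_guid), lambda _: new_guid, xml_text, flags=re.IGNORECASE)


# Made-up values for illustration only.
pasted = "Id\n--\n0A1B2C3D-1111-2222-3333-444455556666\n"
pack = "<ContextInstance>AAAAAAAA-0000-0000-0000-000000000000</ContextInstance>"

new_guid = extract_guid(pasted)
print(swap_guid(pack, "aaaaaaaa-0000-0000-0000-000000000000", new_guid))
# <ContextInstance>0a1b2c3d-1111-2222-3333-444455556666</ContextInstance>
```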

OS Addendum packs

Download the ‘OS Addendum packs’ for new capabilities, including an Event count logic monitor type, Disk cleanup, Group Policy, self-healing/reset monitors, as well as ‘eventLog full’ logic and reports. Additional monitors reduce alert noise. Examples of common alert scenarios are StorPort storage errors, Group Policy 1096 identification and rebuild, and Disk Cleanup & EventLog service recovery, which includes Event Log file expansion and rollover.

Quick Downloads

https://github.com/theKevinJustin/2012OSAddendum

https://github.com/theKevinJustin/2016ServerAgnostic

Tune ‘OS Addendum packs’ as needed

Update logical disk paths and retentions. The default report contains quite a few common checks, including root folders broken out by path, highest to lowest GB. The workflow is scalable to add additional application paths, as well as file retention timeframes. The workflow runs on a weekly basis to clean up/archive log files and paths. See Disk cleanup logic blog for more details.
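For a feel of what the ‘root folders, highest to lowest GB’ breakdown does, here is a small Python sketch. It is not the pack’s script; the function name is made up, and it skips error handling for folders the account cannot read.

```python
from pathlib import Path


def folder_sizes_gb(root: str) -> list[tuple[str, float]]:
    """Total size of each first-level folder under root, largest first, in GB."""
    report = []
    for child in Path(root).iterdir():
        if child.is_dir():
            total_bytes = sum(f.stat().st_size for f in child.rglob("*") if f.is_file())
            report.append((child.name, total_bytes / 1024**3))
    return sorted(report, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Report on the current directory; point this at a drive root as needed.
    for name, size_gb in folder_sizes_gb("."):
        print(f"{size_gb:10.3f} GB  {name}")
```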

‘OS Addendum packs’ contains Logical disk breakdown of root folders to list paths where files are stored, highest to lowest

UpdateStorPortCountForRepeatedStorageErrors

StorPort storage errors typically cut lots of alerts with storage reads/writes. The ‘count’ monitors decrease the alerts, and the daily alert report consolidates the warning alerts (critical by default). If you’re seeing these alerts, the default should decrease overall alerts to near zero. Tune as needed for disk alerts, by updating MatchCount or TimerWait in Seconds (the x events in y time piece of the monitor logic)
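The ‘x events in y time’ piece can be sketched in a few lines of Python. This only illustrates the counting idea behind MatchCount and TimerWaitInSeconds; the function and parameter names are my own, and the real monitor type may handle window resets differently.

```python
def should_alert(event_times: list[float], match_count: int, timer_wait_seconds: float) -> bool:
    """True when any `match_count` events fall within `timer_wait_seconds` of each other."""
    times = sorted(event_times)
    for start in range(len(times) - match_count + 1):
        # Compare the first and last event of each run of `match_count` events.
        if times[start + match_count - 1] - times[start] <= timer_wait_seconds:
            return True
    return False


# Event timestamps in seconds; three errors inside 60 seconds trips the monitor.
print(should_alert([0, 10, 20], match_count=3, timer_wait_seconds=60))    # True
print(should_alert([0, 100, 200], match_count=3, timer_wait_seconds=60))  # False
```

Raising the match count or shortening the window makes the monitor quieter; going the other way makes it more sensitive.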

Update StorPort Count for Repeated Storage read/write errors

Save file(s) and import