Re-learn an old but still relevant tool – EventLog Explorer

 

Sometimes we forget about tools that can make things easier.

 

Time to talk about EventLog Explorer.

 

Need to repro and test events for an installed program, to see what SCOM will handle?

Read this old MOM team blog post, courtesy of Kevin Holman's blog

 

 

I wanted to try it to test-fire some events; we had a use case where we needed to test Skype events from the SCOM MP.

 

Testing on my SCOM 2016 Management server

 

Download the file and run EventLog Explorer

The Paste icon next to the X is ‘Add to Execution List’ and fills out the bottom pane

The Green Arrow is ‘go’ or execute (similar to PowerShell ISE)

 

Navigate through the Event Log and Event Source in the left-hand pane

Mark events with the checkbox  

 

Add to Execution

 

Verify the events were added to the bottom pane

(my list shows the events fired in yesterday's test alongside today's events, which have not been fired yet)

 

 

 

Click the green box with the white arrow to fire the events, then check Event Viewer

 

 

Yesterday’s test

 

 

 

Today’s test

 

 

Verify alerting occurred as expected!

MMA Agent, cross platform, and Azure

Things that make you go hmmm….

 

 

I ran across a scenario where we were trying to connect Azure cross-platform (Linux) VMs and MMA/SCOM agents to a SCOM management group.

 

The management group was SCOM 2012 R2. The Discovery Wizard in the SCOM console failed to install the agent, with certificate errors.

 

While researching, I found this article first:

Windows Azure VM monitoring blog

There’s a version history for the Azure Monitor VM extension here

Applies to:

SCOM 2012 R2 after UR12, or SCOM 2016 UR2 and later, which deprecated the SHA1 certificate

 

Deprecating SHA1 certificates
Tech Community blog

 

The product team kindly published a TechNet Gallery script to help!

Gallery download – Script to update SHA1 certificates to SHA256 on cross-platform agents – SCOM

TechNet Gallery Download
https://gallery.technet.microsoft.com/scriptcenter/Script-to-update-SHA1-8a30c5ef
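Before running the script, you may want to confirm which signature hash an agent's certificate actually uses. Below is a rough sketch that uses only the Python standard library. It fetches the certificate the cross-platform agent presents on port 1270, then searches the DER bytes for the encoded OIDs of sha1WithRSAEncryption and sha256WithRSAEncryption. This is a byte-search heuristic, not a real X.509 parser. The function names are hypothetical, and the product team's gallery script is still the actual fix.

```python
import ssl

# DER encodings (tag 0x06, length 0x09, value) of the RSA signature-algorithm OIDs
SHA1_WITH_RSA = bytes.fromhex("06092a864886f70d010105")    # 1.2.840.113549.1.1.5
SHA256_WITH_RSA = bytes.fromhex("06092a864886f70d01010b")  # 1.2.840.113549.1.1.11


def signature_hash(der_cert: bytes) -> str:
    """Heuristic: report which RSA signature OID appears in a DER-encoded certificate."""
    if SHA256_WITH_RSA in der_cert:
        return "sha256"
    if SHA1_WITH_RSA in der_cert:
        return "sha1"
    return "unknown"


def agent_cert_hash(host: str, port: int = 1270) -> str:
    """Fetch the agent's certificate (not validated, no CA bundle given) and classify it."""
    pem = ssl.get_server_certificate((host, port))
    return signature_hash(ssl.PEM_cert_to_DER_cert(pem))
```

For example, `agent_cert_hash("agentname.contoso.net")` returns `"sha1"` for an agent that still needs the update. Recent Python and OpenSSL defaults may refuse the older TLS versions that some agents offer. If that happens, the handshake fails before any certificate can be checked.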

 

 

Service Map for SCOM

 

Ever compare your work to an amusement park?

Every business application is like a ride: a roller coaster, or even a kiddie ride.

Has anyone ever asked you for directions to that ride, or more technical questions, like what communication makes up that business application?

 

 

In comes Service Map to save the day!

 

 

Last year I blogged about setting up Service Map with OMS/Log Analytics, but I didn’t get the feature installed for SCOM.

December blog on how to set up OMS/Log Analytics

 

It's basically the SCOM Agent (MMA) and a Dependency Agent (think of the old BlueStripe agent)

 

I'm excited to see the new Service Map hit public preview, hopefully by September

 

 

Check out the blog series:

Planning and PreReqs blog
Install and configure MMA agent blog
Dependency agent blog

Set up Azure Service Principal blog
Set up SCOM Management Group blog

 

 

Possible SQL issues affecting SCOM performance

 

Good reasons for a Risk Assessment

 

SQL RAS runs 800+ queries to check on target SQL servers

Check Best Practice Recommendations (BPR)

 

It may be a good opportunity to audit the SQL build against the BPRs!

 

 

 

I ran across some good examples where SQL settings brought SCOM to a standstill.

One was cardinality estimation (CE), which basically predicts how many rows a query will return.

It has been part of SQL Server since version 7.0 (1998).

 

Let's figure out what SQL Server 2016 runs OoB (out of box):

 

SQL 2016

SELECT SERVERPROPERTY('ProductVersion');
GO


-- Per-database setting: run in the context of the SCOM database (for example, OperationsManager)
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = 'LEGACY_CARDINALITY_ESTIMATION';
GO

 

 

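The other is CLR Strict Security (introduced, and enabled by default, in SQL Server 2017):

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN ('clr enabled', 'clr strict security');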

 

 

Talking with Shawn Nakhostin – SQL PFE, we discussed opportunities and questions around SQL optimization and best practices.

Shawn gave me the following feedback on customer performance issues:

I’ve found some customers who have had performance issues with SQL based on organizational SQL settings:

  1. Trace flag 9481
  2. CLR Strict Security is enabled by default (SQL Server 2017 and later)

 

Trace flag 9481

Enabling or disabling this TF is not a matter of best practice.

The customer should see what works for them.

Here is the explanation:

SQL Server started using a new cardinality estimator in SQL Server 2014.

The product team knew that the new CE improved some of the query plans, but not all of them. In other words, they knew that this would improve overall query performance in “some” environments but might have a different impact in other environments.

For this reason, they created TF 9481. Environments that see query performance degradation after upgrading from SQL Server 2012 or earlier can turn on this trace flag, so that the query optimizer uses the old CE algorithm.

Note: Trace flag 9481 forces the query optimizer to use version 70 of the cardinality estimator (the legacy CE used by SQL Server 7.0 through 2012) when creating the query plan.

https://blogs.technet.microsoft.com/dataplatform/2017/03/22/sql-server-2016-new-features-to-deal-with-the-new-ce/

https://support.microsoft.com/en-in/help/2801413/enable-plan-affecting-sql-server-query-optimizer-behavior-that-can-be
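Why would one CE version help some queries and hurt others? Here is a deliberately simplified Python model of one commonly described difference. For several AND-ed predicates on one table, the legacy CE (version 70) assumes the predicates are independent and multiplies their selectivities. The newer CE (version 120 and later) uses "exponential backoff" (s1 * s2^(1/2) * s3^(1/4) * s4^(1/8), most selective predicate first, at most four predicates). This is a toy for intuition only. The real optimizer also uses histograms, multi-column statistics, and many other rules, and the function names are hypothetical.

```python
from typing import List


def legacy_ce_rows(table_rows: float, selectivities: List[float]) -> float:
    """CE 70 style: treat predicates as independent and multiply their selectivities."""
    rows = table_rows
    for s in selectivities:
        rows *= s
    return rows


def new_ce_rows(table_rows: float, selectivities: List[float]) -> float:
    """CE 120+ style exponential backoff: s1 * s2^(1/2) * s3^(1/4) * s4^(1/8)."""
    rows = table_rows
    # Most selective (smallest) first; only the four most selective predicates count
    for i, s in enumerate(sorted(selectivities)[:4]):
        rows *= s ** (1 / 2 ** i)
    return rows


print(round(legacy_ce_rows(1_000_000, [0.1, 0.2])))  # 20000
print(round(new_ce_rows(1_000_000, [0.1, 0.2])))     # 44721
```

When the predicates really are correlated, the higher backoff estimate is closer to reality. When they are independent, the legacy estimate wins. That trade-off is why TF 9481 is a per-environment decision rather than a best practice.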

 

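CLR Strict Security is enabled by default (SQL Server 2017 and later)

This causes SAFE and EXTERNAL_ACCESS assemblies to be treated as if they were marked UNSAFE.

As a result, assemblies that are not signed (or otherwise trusted) will not load.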

To get the assemblies to load they can do one of the following:

  • Sign the assembly. This may work if you have a few assemblies but becomes a huge task if there are many assemblies to sign.
  • Set the TRUSTWORTHY database property to ON.
    • This is not recommended, because it partly defeats the purpose of using CLR Strict Security.
  • Add the assembly to the trusted assemblies list (sys.sp_add_trusted_assembly).
    • This is called whitelisting, which may be a better option than the previous two.

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/clr-strict-security?view=sql-server-2017
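Whitelisting keys on the SHA2_512 hash of the assembly binary, which `sys.sp_add_trusted_assembly` takes as its `@hash` parameter. As a minimal sketch, this Python helper hashes the DLL bytes and builds the T-SQL call for a DBA to review. The helper names are hypothetical. The sketch also assumes that the DLL file on disk is the same binary registered in SQL Server.

```python
import hashlib
from pathlib import Path


def trusted_assembly_sql(assembly_bytes: bytes, description: str) -> str:
    """Build a sys.sp_add_trusted_assembly call for the assembly's SHA2_512 hash."""
    digest = hashlib.sha512(assembly_bytes).hexdigest()  # 64 bytes -> 128 hex chars
    # Double any single quotes so the description is a valid T-SQL string literal
    safe_desc = description.replace("'", "''")
    return (
        f"EXEC sys.sp_add_trusted_assembly @hash = 0x{digest}, "
        f"@description = N'{safe_desc}';"
    )


def trusted_assembly_sql_from_file(dll_path: str, description: str) -> str:
    """Same as above, reading the assembly from disk (path is whatever DLL you deploy)."""
    return trusted_assembly_sql(Path(dll_path).read_bytes(), description)
```

The generated statement is only printed text. Review it and run it yourself in SQL Server Management Studio or sqlcmd.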

 

 

Updated Skype for Business 2015 (premise) Addendum MP

Updated and completed for Company Knowledge!

 

Many thanks to Nick Wood for his help deciphering user impact for these alerts.

Reach out to Nick on LinkedIn

 

This has been an arduous effort to complete the Skype alerts and components.

 

Gallery Download

 

It's taken a few steps along the way to get all the content delivered.

 

For the whole Skype Addendum journey, please read these additional blog posts:
June blog
January blog

 

The initial Addendum pack had just service recoveries and Azure overrides.

Old pack https://gallery.technet.microsoft.com/Skype-for-Business-2015-b005f49f
This download disabled Azure and set up service recovery tasks.

 

 

The new Gallery download contains the following:

Skype SCOM Alerts.xls
Microsoft.LS.2015.Monitoring.ComponentAndUser.Addendum.xml
Microsoft.SystemCenter.Notifications.Internal.xml
Skype.for.Business.Server.Management.Pack.Alert.Grooming.xml

 

NOTE: The Skype.for.Business.Server.Management.Pack.Install.txt file contains this information as well

 

#############################################################
#
# Breakdown of files
#
#############################################################

#
# Skype SCOM Alerts.xls
# Skype SCOM Alerts XLS is an MP export excerpt formatted as an XLS workbook
#
# Feel free to search this file

# Column D is ‘Escalate to Who’
# This has values as SCOM Engineering, Messaging Ops, Telephony, Messaging Engineering
# Column E is impact
# This has values: P3-P5, *Email
# Column F is the Display String for the monitor
# Column H has the User Impact, Cause, and Troubleshooting steps

 

#
# Microsoft.LS.2015.Monitoring.ComponentAndUser.Addendum.xml
# Addendum management pack sets up company knowledge tab for each Skype monitor, with actionable troubleshooting steps.
#

 

#
# Microsoft.SystemCenter.Notifications.Internal.xml
#

# Back up the current MP first, and merge if you are adding this to your environment!
# Use this pack cautiously, as it will replace existing channels, subscribers, and subscriptions.

# On a management server, open a PowerShell window as Administrator
#
# cd <path>
# Example

cd $HOME\Desktop
# Export to the same folder so the Copy-Item below finds the file
Get-SCOMManagementPack -Name "Microsoft.SystemCenter.Notifications.Internal" | Export-SCOMManagementPack -Path "$HOME\Desktop"

Copy-Item .\Microsoft.SystemCenter.Notifications.Internal.xml .\Original-<CompanyName>-Microsoft.SystemCenter.Notifications.Internal.xml

 

 

# Save the bundled Notifications pack to the same path

# YES, it's that important: this file can stop all alerts from leaving SCOM!
#
# Save file to local drive

# Follow MP Fragment authoring if you need to merge existing Notifications with the Skype pack
# https://kevinholman.com/2016/06/04/authoring-management-packs-the-fast-and-easy-way-using-visual-studio/
#

#
# Skype.for.Business.Server.Management.Pack.Alert.Grooming.xml
#
# This file sets alert severities per the XLS: Warning for P4, P5, and email; Critical for P3

#
#############################################################
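The Alert Grooming rule of thumb (P3 becomes Critical; P4, P5, and email become Warning) is easy to check against the workbook before you import the pack. Below is a small Python sketch that assumes the XLS has been saved as CSV and that the Impact and monitor display-string columns (E and F) carry the headers "Impact" and "Display String". Those header names and the helper names are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple


def severity_for(impact: str) -> Optional[str]:
    """Map an Impact value to the severity described for the Alert Grooming pack."""
    value = impact.upper()
    if "P3" in value:
        return "Critical"
    if any(tag in value for tag in ("P4", "P5", "EMAIL")):
        return "Warning"
    return None  # anything else is left as shipped


def groom(csv_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (monitor display string, expected severity) for each workbook row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(row["Display String"], severity_for(row["Impact"])) for row in rows]
```

If the sketch and the imported pack disagree on a monitor, check that row's Impact value in the workbook first.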

 

 

 

Which IDs is SCOM using?

Ever need to audit what IDs SCOM is using?

Maybe you have to figure out how someone else set up SCOM.

Did they set up SCOM as recommended for best practices with different AD accounts per role?

 

If the IDs are not logged during install, it's a little more difficult to figure out which ID was used for each installer choice:

  • Domain Account for ALL services
  • Enter the unique DOMAIN\OMAA, DOMAIN\OMDAS, DOMAIN\OMREAD, DOMAIN\OMWRITE accounts

 

Try these PowerShell commands to find what SCOM is using.

 

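On the management server, from PowerShell (you don't need admin rights unless you're restarting services):

# OMSDK = System Center Data Access Service, cshost = System Center Management Configuration, HealthService = Microsoft Monitoring Agent
$Services = Get-WmiObject -Class Win32_Service

$Services | Where-Object { $_.Name -eq "OMSDK" -or $_.Name -eq "cshost" -or $_.Name -eq "HealthService" } |
    Format-Table Name, StartName, StartMode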

 

 

 

On the SCOM database and Reporting servers, from PowerShell (you don't need admin rights unless you're restarting services):

$Services = Get-WmiObject -Class Win32_Service

$Services | Where-Object { $_.DisplayName -like "*SQL*" } | Format-Table Name, StartName, StartMode

 

 

Source https://blogs.technet.microsoft.com/heyscriptingguy/2012/02/15/the-scripting-wife-uses-powershell-to-find-service-accounts/

 

 

Azure UNIX server SCOM agent setup errors with OEL v7.x

 

I ran into some customers with UNIX agent problems, including SCOM agents on Oracle Enterprise Linux servers in Azure.

 

 

 

 

 

Basically, this error means one of the following:

  1. Fully-qualified domain name cannot be determined from the UNIX or Linux host itself
  2. The FQDN known to the UNIX/Linux host does not match the FQDN used by the management server to reach the host
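To tell the two causes apart, compare the name the Linux host reports for itself with the FQDN the management server uses (the one shown in the error text). Below is a minimal Python sketch of that comparison. It treats DNS names as case-insensitive and ignores a trailing root dot. `socket.getfqdn()` only approximates what the agent's certificate tooling sees, and the function names are hypothetical.

```python
import socket


def normalize(name: str) -> str:
    # DNS names are case-insensitive; also drop any trailing root dot
    return name.strip().rstrip(".").lower()


def diagnose(host_fqdn: str, connection_fqdn: str) -> str:
    """Classify the mismatch using the two causes listed above."""
    host = normalize(host_fqdn)
    if "." not in host:
        return "cause 1: the host cannot determine its own FQDN"
    if host != normalize(connection_fqdn):
        return "cause 2: the host FQDN differs from the FQDN the management server uses"
    return "names match"


def diagnose_local(connection_fqdn: str) -> str:
    """Run on the UNIX/Linux host itself."""
    return diagnose(socket.getfqdn(), connection_fqdn)
```

For example, on the agent run `diagnose_local("agentname.contoso.net")`. Either "cause" result points to the scxsslconfig fix further down.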

 

 

Full error message text

 

Agent verification failed. Error detail: The server certificate on the destination computer (agentname.contoso.net:1270) has the following errors:

The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

The SSL certificate is signed by an unknown certificate authority.

It is possible that:

  1. The destination certificate is signed by another certificate authority not trusted by the management server.
  2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: agentname.contoso.net.
  3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

 

The server certificate on the destination computer (agentname.contoso.net:1270) has the following errors:

The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

The SSL certificate is signed by an unknown certificate authority.

It is possible that:

  1. The destination certificate is signed by another certificate authority not trusted by the management server.
  2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: agentname.contoso.net.
  3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

 

 

 

 

Troubleshooting links

Old TechNet article for SCOM 2007R2

Docs site link for 1801: the steps haven't changed, and IMHO the docs site is better documented

 

 

Here are some commands to help troubleshoot the UNIX agent:

ScxAdmin

 

Check UNIX Agent status

scxadmin -status

 

Example Output

$ scxadmin -status

scxcimserver: is running

scxcimprovagt: 2 instances running

 

 

 

Set Unix agent to START verbose logging

scxadmin -log-set all verbose

 

 

 

Restart the agent and tail the scx log

scxadmin -restart

cd /var/opt/microsoft/scx/log

tail -f scx.log

 

 

To correct a SCOM agent getting an SSL certificate error:

From the Docs site, the SCXsslConfig “tool is useful in correcting issues in which the fully-qualified domain name cannot be determined from the UNIX or Linux host itself, or the FQDN known to the UNIX/Linux host does not match the FQDN used by the management server to reach the host.”

 

As root:

  1. Get the exact hostname of the server with the hostname command
  2. Stop the SCOM agent: /opt/microsoft/scx/bin/tools/scxadmin -stop
  3. Rebuild the cert: /opt/microsoft/scx/bin/tools/scxsslconfig -v -f -h <hostname> -d <DNS_domain> (-d takes the domain suffix, for example -h agentname -d contoso.net, giving agentname.contoso.net)
  4. Start the SCOM agent: /opt/microsoft/scx/bin/tools/scxadmin -start
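If you need to repeat these steps on several hosts, here is a hedged Python sketch that builds the same stop / rebuild / start commands and, by default, only prints them (dry run). It assumes the paths shown in the steps, and that scxsslconfig's `-d` takes the DNS domain while `-h` takes the short hostname. The helper names are hypothetical.

```python
import socket
import subprocess
from typing import List

TOOLS = "/opt/microsoft/scx/bin/tools"


def rebuild_cert_commands(hostname: str, domain: str) -> List[List[str]]:
    """Steps 2-4 as argument lists; step 1 supplies the hostname."""
    return [
        [f"{TOOLS}/scxadmin", "-stop"],
        [f"{TOOLS}/scxsslconfig", "-v", "-f", "-h", hostname, "-d", domain],
        [f"{TOOLS}/scxadmin", "-start"],
    ]


def run(domain: str, dry_run: bool = True) -> None:
    # Step 1: the kernel hostname (what `hostname` prints), trimmed to the
    # short name in case it is already fully qualified
    hostname = socket.gethostname().split(".")[0]
    for cmd in rebuild_cert_commands(hostname, domain):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # must be run as root
```

Run `run("contoso.net")` first to review the printed commands. Then run `run("contoso.net", dry_run=False)` as root.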

 

 

 

 

 

 

Additional Configuration topics from the docs site

Configuring SSL Ciphers link

Specifying an alternate Temporary Path for scripts link

Universal Linux – Operating System Name/Version link

 

 

Other document links

Holman SCOM 2012R2 Deploying Unix agents

Holman SCOM 2016 Monitor Unix/Linux

Adding agents via PowerShell

Skype for Business 2015 (premise) Addendum MP

Ever try to figure out, from a Skype alert, which server in the pool(s) is failing?

 

While it may not make root cause easy to find, the Skype pack brings a lot of functionality, including synthetic transactions.

 

I was lucky enough to collaborate with Nick Wood, Skype PFE, to provide more detail, troubleshooting steps, and user impact, and to decide what is critical versus warning.

 

What the addendum pack brings

Do you think 656 monitors can all be critical?

  • Sets up service restart recovery tasks for all Skype services
  • Company Knowledge tab for troubleshooting/user impact

 

Gallery Download

 

Here is a visual of our Skype efforts to integrate troubleshooting details into the SCOM console.

NOTE: The Company Knowledge tab is also accessible from the alert

 

Company Knowledge

SCOM Console, Authoring tab, Dispatcher Queue monitor

Highlight the monitor, right-click, and choose Properties

Click the 'Company Knowledge' tab

We incorporated the XLS into SCOM under Company Knowledge, with additional information on user impact, causes, and troubleshooting (under resolutions)

 

Active Directory 2012-2016 Addendum packs updated

Man, time flies!

 

Thought I’d share some new functionality for AD DS (Active Directory Domain Services)

 

I ran across some customer errors with AD Event ID 1084, which is covered by the old 8321 pack but not by the v10.x pack.

Well, if you get these errors, your DC isn’t replicating, and most likely will need to be rebuilt.

 

Gallery download

 

I broke the Recovery Tasks out into their own pack, separate from the added functionality in the Addendum pack.

I figured it was better to ship the packs NOT sealed, so that meant 2 packs.

WYSIWYG (wizzy-wig acronym)

 

What this means

v1.0.0.1 pack had just the AD DS Service Recovery Tasks

v1.0.0.2 pack has a Service Recovery Tasks pack, and the Addendum pack

What I think is cool is that the Addendum pack contains 2 rules: a simple event rule (enabled by default) and a PowerShell rule.

 

For the PowerShell rule, I figured out how to simply look for criteria, count the matches, and alert on them.

We always look for alert suppression; some of the sliding/counting monitors are too much.

 

Starting with Holman’s alerting rule fragment, we can create more powerful combinations than just a single symptom.

Using variations of the Get-Date command, we can specify how far back to look when counting events for an alert.

It's an easier way to count events and figure out an alert threshold.

 

From the rule in the Addendum pack

# Check blog for more detail https://blogs.technet.microsoft.com/heyscriptingguy/2015/01/21/adding-and-subtracting-dates-with-powershell/
# If you want this in other time increments – AddHours, AddSeconds, AddMilliseconds
#
$LastCheck = (Get-Date).AddMinutes(-65)

# -ErrorAction SilentlyContinue: Get-EventLog writes a "No matches found" error when nothing matches
[int]$TempCount = (Get-EventLog -LogName "Directory Service" -Source "NTDS Replication" -InstanceId 1084 -Message "*8451 The replication operation encountered a database error*" -After $LastCheck -ErrorAction SilentlyContinue).Count

IF ($TempCount -ge 1)
{
$Result = "BAD"
$Message = "The number of 1084 Replication Database error events was 1 or more"
}
ELSE
{
$Result = "GOOD"
}
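To try out thresholds before you change the rule, the same look-back-and-count logic can be modeled in a few lines of Python. This is a sketch only: events are plain dicts with hypothetical keys (`source`, `id`, `message`, `time`), not real Event Log records. `window_minutes` plays the role of `AddMinutes(-65)`, and `min_count` is the `-ge 1` threshold.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

PATTERN = "8451 The replication operation encountered a database error"


def check_1084(events: List[Dict], now: datetime,
               window_minutes: int = 65, min_count: int = 1) -> Tuple[str, int]:
    """Count matching events newer than now - window and return (Result, count)."""
    last_check = now - timedelta(minutes=window_minutes)
    count = sum(
        1
        for e in events
        if e["source"] == "NTDS Replication"
        and e["id"] == 1084
        and PATTERN in e["message"]   # same as the -Message "*...*" wildcard
        and e["time"] > last_check    # same as -After $LastCheck
    )
    return ("BAD" if count >= min_count else "GOOD"), count
```

Raising `min_count` or widening `window_minutes` shows quickly how a stricter threshold would have behaved against a sample of past events.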

 

Maybe we need multiple event IDs, or need to search multiple event logs… you decide, and let me know.

 

System Center Orchestrator 1801 Integration Packs

 

Orchestra…?

 

FYI – Additional IPs were released this month for Orchestrator, SMA, and SPF

Orchestrator https://www.microsoft.com/en-us/download/details.aspx?id=56558

Service Management Automation https://www.microsoft.com/en-us/download/details.aspx?id=56559

Service Provider Foundation https://www.microsoft.com/en-us/download/details.aspx?id=56557

 

In case you didn't know, from Lynne Taggart's blog, these integration packs (IPs) were released in February:

System Center 1801+ Integration Pack for HP iLO and OA
System Center 1801+ – Orchestrator Integration Packs
System Center 1801+ Integration Pack for HP Service Manager
System Center 1801+ Integration Pack for IBM Tivoli Netcool/OMNIbus
System Center 1801+ Integration Pack for VMware vSphere
System Center 1801+ Integration Pack for HP Operations Manager

 

Have fun automating!