What’s my path again?

Say what? What's my path again
Say What

What’s my path again?   Why did my command fail?

Ever get ‘command not found’ errors when calling a command on a machine?  Many times, these errors are related to what is defined on said machine.   So with monitoring tools like SCOM, ALA, Azure Automation, BMC Patrol, the ID used in monitoring rely on filepaths defined on the local server (holds true for Windows/UNIX).  Because sometimes even ls, awk, dir, etc. if their various bin directory filepaths are NOT specified as a security hardening measure.  The result of STIG/Security hardening is ALL scripts/commands require a fully qualified filepath.

Fully qualifying command paths holds true for Windows and UNIX, from generic OS commands, AND also application specific files (including an executable).  Updates are required if you want to supply the short name command.  Add the full filepath to PATH= statement.  The alternative is to fully qualify in your SCOM mgmt. pack, so the command will run regardless of user, as long as the path is correct.

 

Check for specified shell

First, let’s check UNIX to see what shell is specified for user(s).

Second, log into your UNIX server, and check files type:  ls -al .* | more  

Use ls -al | more to see what PATH files are in the user directory
Use ls -al | more to see what PATH files are in the user directory

Third, another option with less output

example:  ls -al .*profile

What's my path? Use command ls -al .*profile to find which profile(s) exist
What’s my path? Use command ls -al .*profile to find which profile(s) exist

 

Fourth, Look for the shell defined for the user account

On my server, SCOM user is bash shell (but I do NOT have a .bash_profile, only a .profile (also note NO .ksh_profile) )   Knowing what profiles are configured for user account will help define what is inherited from the OS, (automatically included).  Leverage when calling commands in your management packs for custom rules/monitors.

 

In conclusion, if executable is NOT in the filepath variable, you have two ways to resolve the issue:

  1. Create a .bash_profile
  2. Call bash/ksh shell in your script or command line:   bash; <commandhere>

 

To check path:

UNIX $PATH vs. Windows $ENV:path

UNIX example – ‘echo $PATH’ from UNIX ssh session/logon

What's my path again? Use echo $PATH
UNIX what’s my path? Use echo $PATH

 

Windows PowerShell example

What's my path? Windows PowerShell example of $PATH
What’s my path? Windows PowerShell example of $PATH

 

Here’s my .profile that sets up SCOM user (only /bin shown)

What's my path? Use UNIX .profile to find PATH
What’s my path? Use UNIX .profile to find PATH

 

 

Here’s a UNIX .profile example:

https://www.unix.com/unix-for-dummies-questions-and-answers/21995-basic-profile-setup.html

Example

set PATH=$PATH:/usr/homes/myhome/sqlldr:/appl/oracle/product/9.2.0/bin

Need to find the command UNIX pack runs for perf counter

Magnifying Glass

 

 

Have you ever needed to find the command UNIX pack runs for perf counter?   Say the processor time value doesn’t match what the Unix admin may be saying SCOM is showing.

 

Many times you can look at the SCOM management pack, and those commands trace back to the UNIX library.

 

Background:  The SCOM management server runs many of the cross-plat/xplat workflows to the UNIX agent through WinRM.

 

Agenda
  1. Unseal SCOM UNIX management pack to obtain URI
  2. Understand command line options from UNIX/Linux side, and how to view the output
  3. Enumerate command line
  4. Test Command line from SCOM MS

 

 

 

Unseal SCOM UNIX management pack

The screenshot below is unsealing the Solaris10 pack to XML, and then viewing/searching to show the processor reference.

Solaris 10 processor rules

NOTE that’s a URI, not a script

 

 

How UNIX admin may supply processor output

Example – Unix admin typically uses vmstat or iostat.

 

The screenshot uses ‘vmstat 2 10‘ – a snapshot every 2 second intervals, 10 times

vmstat output

 

We can discuss the vmstat output, but it shows way more than just processor (ready queue, swap, user, system, and cpu %) to help figure out which operating system component is the problem.

 

 

Enumerate command line test

How do we test the command line syntax, to verify what SCOM pulls when running the rule?

For example, we need to make the URI actionable from the management pack.  What is needed to make a usable command?

 

Grab the URI from the pack

http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_ProcessorStatisticalInformation?__cimnamespace=root/scx

 

Because we know the URI, we now build out the syntax with WinRM

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_ProcessorStatisticalInformation?__cimnamespace=root/scx -auth:basic -remote:https://<servername>:1270 -username:<scomID, not necessarily root> -skipCACheck -skipCNCheck -skiprevocationcheck –encoding:utf-8

 

 

Test WinRM command from SCOM MS

For instance, we want to test the WinRM command from the MS to the UNIX server

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_ProcessorStatisticalInformation?__cimnamespace=root/scx -auth:basic -remote:https://ubuntu:1270 -username:scom -skipCACheck -skipCNCheck -skiprevocationcheck –encoding:utf-8

 

Example output

SCX_ProcessorStatisticalInformation
InstanceID = null
Caption = Processor information
Description = CPU usage statistics
ElementName = null
Name = 0
IsAggregate = FALSE
PercentIdleTime = 99
PercentUserTime = 0
PercentNiceTime = 0
PercentPrivilegedTime = 0
PercentInterruptTime = 0
PercentDPCTime = 0
PercentProcessorTime = 1
PercentIOWaitTime = 0

SCX_ProcessorStatisticalInformation
InstanceID = null
Caption = Processor information
Description = CPU usage statistics
ElementName = null
Name = _Total
IsAggregate = TRUE
PercentIdleTime = 99
PercentUserTime = 0
PercentNiceTime = 0
PercentPrivilegedTime = 0
PercentInterruptTime = 0
PercentDPCTime = 0
PercentProcessorTime = 1
PercentIOWaitTime = 0

 

Additional references for WinRM syntax and troubleshooting

Warren’s blog

Docs site

Use Unix MP’s for shell commands

 

Build FluentD conf file

Build trust one block at a time

Ready to build out a FluentD conf file?

 

Let’s build a FluentD conf file.  We can use the docs site for another example.  And now, let’s build a simple FluentD configuration file. Paste the XML code below, and save as <yourlogfile>.conf

Create custom log file to test

cd /etc/opt/microsoft/omsagent/scom/conf/omsagent.d/
# vi <yourlogfile>.conf

vi mylog.conf

# Example conf file

<source>
# Specifies input plugin. Tail is a fluentd input plugin – http://docs.fluentd.org/v0.12/articles/in_tail
type tail
# Specify the log file path. Supports wild cards.
path /var/log/mylog
# Recommended so that Fluentd will record the position it last read into this file.
pos_file /home/omsagent/fluent-logging/mylog.pos

# Used to correlate the directives.
tag scom.log.mylog

format /(?<message>.*)/
</source>

<filter scom.log.mylog>
type filter_scom_simple_match
regexp1 message 911
event_id1 911
</filter>

<match scom.log.mylog>
#Disable mutual Auth
enable_server_auth false

# Output plugin to use
type out_scom
log_level trace
num_threads 5

# Size of the buffer chunk. If the top chunk exceeds this limit or the time limit flush_interval, a new empty chunk is pushed to the top of the
queue and bottom chunk is written out.
buffer_chunk_limit 5m
flush_interval 15s
# Specifies the buffer plugin to use.
buffer_type file
# Specifies the file path for buffer. Fluentd must have write access to this directory.
buffer_path /var/opt/microsoft/omsagent/scom/state/out_scom_common*.buffer
# If queue length exceeds the specified limit, events are rejected.
buffer_queue_limit 10
# Control the buffer behavior when the queue becomes full: exception, block, drop_oldest_chunk
buffer_queue_full_action drop_oldest_chunk
# Number of times Fluentd will attempt to write the chunk if it fails.
retry_limit 10
# If the bottom chunk fails to be written out, it will remain in the queue and Fluentd will retry after waiting retry_wait seconds
retry_wait 30s
# The retry wait time doubles each time until max_retry_wait.
max_retry_wait 9m
</match>

Save (:wq!)

 

# Restart Agent

/opt/microsoft/omsagent/bin/service_control restart

# Check for errors – see blog

grep -i error /var/opt/microsoft/omsagent/scom/log/omsagent.log

# Test strings into your logfile

# Options

echo test >> /var/log/mylog

echo 911 error >> /var/log/mylog

# mimic syslog or messages syntax

echo `date +”%b %e %H:%M:%S”` MYLOG 911 test string. Call 911 >> /var/log/mylog

 

Please stay tuned for more management pack options to alert on the strings.  Refer to the part1/2 blogs for more details on unit testing for alerts.

OMSAgent FluentD debunked – Configure Linux FluentD – part2

Are you stoked and fired up to Configure Linux FluentD - part2 !?

Now to begin – OMSAgent FluentD debunked

Configure Linux FluentD – part2 –> see part one (1) here)

 

 

First, my thanks to Mike Johnston@Microsoft (CSS SEE SME) to help validate my steps and testing, to configure Linux FluentD on an Ubuntu server!  Are you ready to bust a myth – OMSAgent FluentD debunked

 

If you’re starting fresh, or just joining, start with Part 1.  And Part 1 configures packs and assumes SCOM agent is installed and working.  Because it’s time to use the feature, we need to get the agent configured and tested.

 

Part one (1) quick summary

    • Verify pre-reqs – SCOM Linux Management packs for Linux/Universal Linux (2019 @ 10.19.1082.0), UNIX/Linux Log File monitoring (2019 @ 10.19.1008.0)
    • Linux server has SCOM Agent installed, configured, and updated (sudoers configured) – GUI blog here
    • Use docs.microsoft.com article

 

Load Sample Log monitoring pack

This piece is missing in the doc, but the content development team has this covered in a subsequent docs article.  We need to load a sample log monitoring pack to the SCOM management group, so we can test functionality.

Configure FluentD part 2 - This is a picture of the SCOM console GUI showing the OMED pack installed from the Admin tab > Management Packs > Installed Management Packs > with omed in the 'look for:' bar

Grab the file here, otherwise you can copy/paste from the docs article pretty easily.

 

 

Verify OMED service running on Management Server

It’s now time to enable the OMED service on the management server, and we can start with the docs subsection

Navigation steps from SCOM console (GUI)

    1. From the Operations console, go to Monitoring>Operations Manager>Management Server>Management Servers State.
    2. Select the management server in the Management Servers state.
    3. From Tasks, select Health Service Tasks>Enable System Center OMED Server

 

Steps to set/start service PowerShell (as admin)

# Verify service startup type is automatic

get-Service OMED | select -property Name,Starttype

# Example output

PS C:\Users\admin> Get-Service OMED | select -property name,starttype
Name StartType
—- ———
OMED Automatic

# Set startup type

 

# Start OMED service on SCOM management server (MS)

start-service OMED

 

Now we’re ready to test the UNIX agent!

 

 

Configure SCOM/OMSagent on Linux server

And now it’s time to switch to the agent side.  I’m assuming that you’ve already configured the SCOM agent on the Linux server.  So it’s time to verify the SCOM and OMSAgent is configured and working.  Let’s go back to the docs subsection for our sanity check, because we need to create folders, and set ownership, etc.

 

Create files and set permissions
mkdir /etc/opt/microsoft/omsagent/scom/conf/omsagent.d
mkdir /etc/opt/microsoft/omsagent/scom/certs
mkdir /var/opt/microsoft/omsagent/scom/log
mkdir /var/opt/microsoft/omsagent/scom/run
mkdir /var/opt/microsoft/omsagent/scom/state
mkdir /var/opt/microsoft/omsagent/scom/tmp
mkdir /home/omsagent/fluent-logging
# NOTE – This location is flexible for the path to use for log file position files
chown omsagent:omiusers state
chown omsagent:omiusers run
chown omsagent:omiusers log
chown omsagent:omiusers tmp
chown omsagent:omiusers /home/omsagent/fluent-logging

Verify SCOM certificate

Configuring FluentD requires the SCOM management server (MS) has signed the certificate on the UNIX server.  The docs article tells you to generate a new certificate for FluentD, which requires the management server.

Overview

Sign the certs on the agent > copy to MS > sign > copy back to agent

Step by step instructions
    1. Generate certs

/opt/microsoft/scx/bin/tools/scxsslconfig -c -g /etc/opt/microsoft/omsagent/scom/certs/

2. Rename certificates

cp -p omi-host-server.domain.pem to scom-cert.pem

cp -p omikey.pem to scom-key.pem

 

3. Copy certs to MS (sftp/ssh via WinSCP, or your app of choice)

 

4. Sign certs on MS via scxcertconfig -sign

 

Open PowerShell (as admin)

Go to your SCOM management server directory (hopefully d:)

cd ‘D:\Program Files\Microsoft System Center\Operations Manager\Server’

scxcertconfig -sign scom-cert.pem

scxcertconfig -sign scom-key.pem

 

5. Copy certs back to agent from MS (sftp/ssh via WinSCP, or your app of choice)

 

6. Verify the SCOM certificate shows your Management Server (MS) in the DC= line in the certificate

openssl x509 -in scom-cert.pem -noout -text

Verify SSL certificate - openssl syntax, verify the DC= portion is from the SCOM management server (MS)

 

7. Restart omsagent

As the ALLINONE server is one of my 2019 SCOM labs, I can verify that my cert is now signed by the management server (MS).  Time to load the certificate, and then restart the agent to see if we have any errors

# Restart Agent

/opt/microsoft/omsagent/bin/service_control restart

 

 

Verify omsagent.log errors

Verify any errors from the omsagent.log

Depending on where you are with your UNIX/Linux commands, this may help provide some context or use case examples.

My example –

First error after restart was ‘permission denied’.   FluentD runs under the omsagent ID, and needs to have access to whatever log – at least read (4).  For the syslog example, I made omsagent the owner, and omiusers the group.   The smarter, security hat on, choice is to leave as root and make it read capable, or add omsagent to the root group

Configure FluentD part 2 - fluentd permission denied alerts on /var/log/syslog

 

Search /var/opt/microsoft/omsagent/scom/log/omsagent.log for errors.  Commands build on another, from simpler to more complex.  Don’t worry if UNIX/Linux is new, I’m all about examples, so hope that helps bridge the gap!

 

# Tail omsagent.log for progress

# Option 1 Continual output updates from file

tail -f /var/opt/microsoft/omsagent/scom/log/omsagent.log

# Option 2 – get last 10 lines

tail /var/opt/microsoft/omsagent/scom/log/omsagent.log

 

# Option 3 – get last 100 lines

tail -100 /var/opt/microsoft/omsagent/scom/log/omsagent.log

# Option 4 – Get a little fancier – search for a string

grep string /var/opt/microsoft/omsagent/scom/log/omsagent.log

# Option 5 – Specific example = error, case insensitive (-i)

grep -i error /var/opt/microsoft/omsagent/scom/log/omsagent.log

 

# Option 6 – egrep strings and -v to exclude what you don’t want to see

grep -i error /var/opt/Microsoft/omsagent/scom/log/omsagent.log |egrep -v “Permission denied|stacktrace”

 

Verify FluentD config files

Verify FluentD conf files and omsagent.conf has INCLUDE line

The INCLUDE lines allows a directory for a ‘Gold depot’ to control what log files are monitored on destination linux servers.  The goal is a standard repository (gold depot ) to simply copy the conf file you want for logfile/app/daemon, restart agent, and you’re off to the races monitoring that log file.

 

Verify omsagent.conf includes directory

grep -i include /etc/opt/Microsoft/omsagent/scom/conf/omsagent.conf

# If there’s output, make sure that omsagent.d path exists

# Verify permissions show omsagent:omiusers

ls -al /etc/opt/Microsoft/omsagent/scom/conf | grep omsagent

 

10. Back to step 8’s problem, to fix the FluentD conf files, so we can test!  Step 9 verified that FluentD is configured via the omsagent.conf, and also for specific configuration files (.conf) in omsagent.d directory.

ls -al output list of the omsagent.d directory and oms config specific files for various log files

Next, we need to restart the agent to verify configuration, and any errors are seen on the FluentD side.

My error for ‘out_scom’ plugin was already used by some other test conf files.

grep -i error /var/opt/Microsoft/omsagent/scom/log/omsagent.log |grep “Permission denied” |tail

 

Example of omsagent.log where we have traced an event for our mylog

OMSAgent FluentD debunked - omsagent.log permission denied opening logfile errors for /var/log/syslog

Mike explained that my error was due to having multiple FluentD conf files using the same buffer path for ‘out_scom’.  I searched the conf files to see who had ‘out_scom’ and removed one of my old test files from months back when I was testing the feature.

# Example of errors in the omsagent.log

Tail of the omsagent.log where we want to look for errors

 

Don’t forget to restart the omsagent for reading in the new file changes

# Restart Agent

/opt/microsoft/omsagent/bin/service_control restart

 

 

I’ll cover building a fluentd conf file in another blog post for brevity.

 

 

Time to test for alerts!

Time to test our FluentD conf file and append entries into the log file!

Starting simple again

# Options

echo test >> /var/log/mylog

echo 911 error >> /var/log/mylog

# Echo entries into test logfile to mimic syslog or messages

echo `date +”%b %e %H:%M:%S”` MYLOG 911 test string. Call 911

# Verify

tail /var/log/mylog

Switch over to SCOM management server, and look for alerts

Navigate to the Monitoring Tab > Active alerts

OMSAgent FluentD debunked - scom console alerts for fluentd test patterns

 

 

References for more information

In case you need a refresher on all the date options… Found CyberCiti FAQ helpful

Configure FluentD part 2 - output of the date command formatting like syslog or messages

All because the goal is to make the echo statement better for testing closer test/UAT examples on string matches, etc.

echo `date +”%b %e %H:%M:%S”` MYLOG 911 test string. Call 911

And what does it look like?

OMSAgent FluentD debunked - tail of created /var/log/mylog that shows various echo options

 

Configure Linux FluentD

'Thanks' written in collage form for many languages
Thanks

What are you Fluent in?

 

Join me as we configure FluentD on Linux, and continue to improve and document monitoring cross-platform (UNIX/Linux) servers.
Background:
Some of our previous topics included UNIX logical disk class differ from Windows (here), and cross platform agent setup.   Because we always ‘need more power!’, it’s time to configure Linux FluentD.  All this, because the ‘Linux FluentD’ was updated on docs.microsoft.com!

Don’t worry, if you’re on the edge, this may be a Scotty moment of “I’m giving her all she’s got” with your current monitoring environment.

 

 

Maybe you’re thinking ‘convince me’, so what does FluentD provide monitoring wise?

    • System Center Operations Manager (SCOM) 2016+ has enhanced log file monitoring capabilities for Linux servers.
    • Wild card characters in log file name and path.
    • New match patterns for customizable log search
    • Use community published plugins versus having to build from scratch

 

I wanted to take a moment to validate the steps provided, not just because I’ve had a pretty large part of my career supporting cross-platform environments for large enterprise companies.  So, let’s get started!

 

 

Review the FluentD setup procedure 

Let’s speed up the ‘do’ part.  Review the procedure at docs.microsoft.com

FluentD basic operation here

Configuration Overview here

FluentD pre-reqs

Server types are covered = Linux (RedHat/Ubuntu)

Management Packs required for SCOM
        • Load latest Linux Operating system management packs (2019).
          1. Find the pack download here
          2. Load the relevant Linux or Universal Linux packs
          3. Verify Microsoft.Linux.Log.Monitoring pack is loaded
Verify ID/password and sudoers capability for root

Use docs article if you need additional assistance to install agent on the Linux server.  Alternatively, use my blog posts for PowerShell or GUI install

 

Update agent to latest release

Grab the latest OMS Agent release from github

      • From server command line:

wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard\_agent.sh

sh onboard_agent.sh

            1. The wget command above will install the OMSAgent and it’s pre-req packages
            2. This includes OMI, scx, OMSAgent, OMSConfig, auoms,  Apache, Docker, MySQL (if the last three apply to your Linux server)

Load Management packs from latest UNIX release.

Verify packs are version 10.19.1082.0

Navigation steps:

From SCOM Console > Administration Tab > Installed Management packs

List of Linux based SCOM management packs installed needed to configure Linux FluentD

 

After adding the updated Linux Management packs

Screenshot list of 10.19.1082.0 versioned Linux management packs required to 'configure Linux FluentD'

 

 

Linux server screenshots installing OMSAgent via wget command

Output from the wget command to install omsagent on Linux server

screenshot saving file from GitHub to Linux server

Install of oms agent components on Linux server

 

 

Stay tuned, I’m currently testing FluentD configuration on my 2019 UR1 lab environment on Ubuntu16.



Using Unix MP’s for Shell commands and scripts

Ready to move out of the UI ?

Thanks to Saurav Babu, and Tim Helton’s help, I was able to push my MP authoring limits further.

The good thing with the Shell command template in SCOM is that your script is encoded.

Bad news

  1. If functionality doesn’t exist in the UI, you can’t easily pull the monitor and just add variables to get that functionality.
  2. Scripts and Shell commands are encoded (great news for security!)

Now to the use case – need Sample Count and Match Count to prevent false positive alerts

The UNIX Shell Command library allows us to use the following variables out of the box:

Interval, SyncTime, TargetSystem, UserName, Password, Script, ScriptArgs, TimeOut, TimeOutInMS, HealthyExpression, ErrorExpression

AND we can override Interval, Script, TimeOut, TimeOutInMS

If that’s not enough options, then read on!

When the built-in functionality doesn’t exist

For this UNIX shell command/script monitor, we required SampleCount and MatchCount

Variables explained

SampleCount is the number of times (samples for an alert).

If SampleCount = 4, this means 4 samples will generate an alert

MatchCount is the number of intervals before monitor state changes

If Interval = 60 (s), and MatchCount = 10, then it will take 10 minutes (600s before we alert)

Combining the 2 means 4 samples over 10 minutes will generate an alert.

Sometimes this is called alert suppression or counting failures before alerting

Built a custom DataSource, ProbeAction, and WriteAction, as the UNIX Shell Library MP did not include these additional variables.

Please review my updated MP Fragments TechNet Gallery for the custom MP and fragments!

https://gallery.technet.microsoft.com/Uncommon-Custom-MP-c5a12a86

Encoding the script or command to run

The other issue with UNIX scripts and commands, is the UI encodes the scripts.

How do we get around it you ask?

Since we are building an MP Fragment and MP, we must figure out how to encode.

To encode the script to put into your SCOM monitor (and MP Fragment)

Example

$script = ‘if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq “1” ]; then echo false; else echo true; fi’

# Verify script variable
$script

# Get $script bytes
$s = [System.Text.Encoding]::UTF8.GetBytes($script)

# Verify script bytes output (optional as bytes broken out by line)
$s

# Encode script to Base64
$encoded = [System.Convert]::ToBase64String($s)

# Verify $encoded
$encoded

# Optional
# Verify string converts back properly
[System.Text.Encoding]::UTF8.GetString($s)

$encoded output is what needs to be entered into the <script></script> variable in your monitor

Example Output

PS C:\Users\scomadmin\desktop> $script = ‘if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq “1” ]; then echo false;
else echo true; fi’
PS C:\Users\scomadmin\desktop> $script
if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq “1” ]; then echo false; else echo true; fi
PS C:\Users\scomadmin\desktop> $s = [System.Text.Encoding]::UTF8.GetBytes($script)
PS C:\Users\scomadmin\desktop> $s
PS C:\Users\scomadmin\desktop> $s = [System.Text.Encoding]::UTF8.GetBytes($script)

PS C:\Users\scomadmin\desktop> $encoded = [System.Convert]::ToBase64String($s)
PS C:\Users\scomadmin\desktop> $encoded
aWYgWyBgcHMgLWVmIHwgZ3JlcCBzbGVlcCB8IGdyZXAgLXYgZ3JlcCB8IHdjIC1sYCAtZXEgIjEiIF07IHRoZW4gZWNobyBmYWxzZTsgZWxzZSBlY2hvIHRydWU7IGZp
PS C:\Users\scomadmin\desktop> [System.Text.Encoding]::UTF8.GetString($s)
if [ `ps -ef | grep sleep | grep -v grep | wc -l` -eq “1” ]; then echo false; else echo true; fi
PS C:\Users\scomadmin\desktop>

References

Jonathan Almquist’s blog post

Kevin Holman’s blog on service with Samples

Azure UNIX server SCOM agent setup errors with OEL v7.x

 

Ran into some customers with UNIX agent problems, including Azure Oracle Enterprise Linux servers with SCOM agents.

 

 

 

 

 

Basically this error means

  1. Fully-qualified domain name cannot be determined from the UNIX or Linux host itself
  2. The FQDN known to the UNIX/Linux host does not match the FQDN used by the management server to reach the host

 

 

Full error message text

 

Agent verification failed. Error detail: The server certificate on the destination computer (agentname.contoso.net:1270) has the following errors:

The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

The SSL certificate is signed by an unknown certificate authority.

It is possible that:

  1. The destination certificate is signed by another certificate authority not trusted by the management server.
  2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: agentname.contoso.net.
  3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

 

The server certificate on the destination computer (agentname.contoso.net:1270) has the following errors:

The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

The SSL certificate is signed by an unknown certificate authority.

It is possible that:

  1. The destination certificate is signed by another certificate authority not trusted by the management server.
  2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: agentname.contoso.net.
  3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

 

 

 

 

Troubleshooting links

Old TechNet article for SCOM 2007R2

Docs site – link for 1801 – Steps haven’t changed, and IMHO, docs site is better documented

 

 

Here are some commands to help troubleshoot UNIX agent

ScxAdmin

 

Check UNIX Agent status

scxadmin -status

 

Example Output

$ scxadmin -status

scxcimserver: is running

scxcimprovagt: 2 instances running

 

 

 

Set Unix agent to START verbose logging

scxadmin -log-set all verbose

 

 

 

Restart Health Service & tail scx log

scxadmin -restart

cd /var/opt/microsoft/scx/log

tail -f scx.log

 

 

To correct a SCOM agent getting a SSL certificate error:

From the Docs site, the SCXsslConfig “tool is useful in correcting issues in which the fully-qualified domain name cannot be determined from the UNIX or Linux host itself, or the FQDN known to the UNIX/Linux host does not match the FQDN used by the management server to reach the host.”

 

As root:

1.             Get the exact hostname of the server with the hostname command  

2.             Stop the SCOM agent – /opt/microsoft/scx/bin/tools/scxadmin -stop  

3.             Rebuild the cert  – /opt/microsoft/scx/bin/tools/scxsslconfig -v -f -h HOSTNAME -d <FQDN_Here>  

4.             Start the SCOM agent – /opt/microsoft/scx/bin/tools/scxadmin -start 

 

 

 

 

 

 

Additional Configuration topics from the docs site

Configuring SSL Ciphers link

Specifying an alternate Temporary Path for scripts link

Universal Linux – Operating System Name/Version link

 

 

Other document links

Holman SCOM 2012R2 Deploying Unix agents

Holman SCOM 2016 Monitor Unix/Linux

Adding agents via PowerShell