Thursday, September 8, 2016

OMS IP Bug: ‘Operations Manager Failed to Access the Windows Event Log’ SCOM Alert

Issue
A customer of mine, who has one of his SCOM environments connected to OMS, reported that the Alert ‘Operations Manager Failed to Access the Windows Event Log’ was coming in for many SCOM managed servers, but not all of them. They noticed that every Alert was about an attempt to access a non-existent event log: ATA?

Time to investigate
As it turned out, this Alert about not being able to access the ATA event log only occurred on a subset of the SCOM managed servers. As stated before, this particular SCOM Management Group (MG) is connected to OMS, and a group of its computers is managed by OMS. The Alert pops up for exactly those servers.

The non-existent event log, ATA, belongs to Microsoft Advanced Threat Analytics. The specific Rule causing this Alert is Microsoft.SystemCenter.CollectATAEvents:
image
This Rule comes from the MP Microsoft System Center Advisor Advanced Threat Analytics.

What surprises me here is the targeting of the Rule. One of the basics MP authors are taught (I am not an MP author myself, but I am familiar with the foundation and the rules) is NOT to use the Windows Computer Class as a target, simply because it’s too broad! Like using buckshot instead of a well aimed bullet…

And yet, this Rule is like buckshot:
image
Ouch!

And even though this Rule is disabled by default, it’s enabled for the Group Microsoft System Center Advisor Monitoring Server Group:
image

And this Group is populated with all the SCOM managed servers that are also connected to OMS. None of those servers has a Microsoft ATA event log, yet this Rule tries to connect to it:
image
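To make the scope of the problem concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not SCOM code: the server names and the event logs listed per server are hypothetical placeholders. It only shows why every member of the group the Rule is enabled for ends up raising the Alert when the named event log is missing.

```python
# Toy model of the behaviour described above; this is not SCOM code.
# All server names below are hypothetical placeholders.

def servers_raising_alert(rule_enabled_for, logs_by_server, log_name):
    """Return the servers where the rule runs but the event log is missing."""
    return sorted(
        server
        for server in rule_enabled_for
        if log_name not in logs_by_server.get(server, set())
    )

# Servers managed by SCOM only: the rule keeps its default (disabled) here.
scom_only = {"srv-a", "srv-b"}

# Servers also connected to OMS: the rule is enabled for this group.
oms_connected = {"srv-c", "srv-d", "srv-e"}

# Event logs present on each server (none of them has an ATA log).
logs_by_server = {
    name: {"Application", "System", "Security"}
    for name in scom_only | oms_connected
}

alerting = servers_raising_alert(oms_connected, logs_by_server, "ATA")
print(alerting)
```

In this model every OMS-connected server shows up in the result, while the SCOM-only servers never do, which matches the "many servers, but not all of them" pattern the customer saw.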

But when looking deeper into this Rule, it gets even weirder: the Rule doesn’t contain any filters at all.
image
Wow. When an ATA log is present, this basically means EVERY ATA event is uploaded to OMS. How much data is that? Consider this running on hundreds of servers…

So now we have the culprit and the cause. Time to solve it.

Workaround
Since this is a badly written Rule and we don’t have access to its source code, we need a workaround, which is nothing more than an Override that disables it.

In this case I set an Override (Disable) for the Group Windows Server Computer Group and also ENFORCED the same Override in order to be 100% sure it’s effective:
image
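Why enforcing the Override helps can be illustrated with a small Python sketch. This is a deliberately simplified model and not SCOM's actual override resolution: the `Override` class, the `rule_enabled` function, and the tie-break rule are all assumptions made for illustration. The only behaviour it takes from SCOM is that an enforced Override takes precedence over non-enforced ones.

```python
from dataclasses import dataclass

# Simplified, hypothetical model; SCOM's real precedence rules are richer.

@dataclass(frozen=True)
class Override:
    group: str              # group the override targets
    enabled: bool           # value set for the rule's Enabled parameter
    enforced: bool = False  # whether the Enforced flag is set


def rule_enabled(server_groups, overrides, default=False):
    """Resolve the effective Enabled state of a rule for one server."""
    applicable = [o for o in overrides if o.group in server_groups]
    if not applicable:
        return default  # no override applies: the rule keeps its default
    enforced = [o for o in applicable if o.enforced]
    # An enforced override beats every non-enforced one.
    winners = enforced or applicable
    # Assumption for this sketch: if the winners disagree, treat as disabled.
    return all(o.enabled for o in winners)


advisor_group = "Microsoft System Center Advisor Monitoring Server Group"
windows_group = "Windows Server Computer Group"

overrides = [
    Override(advisor_group, enabled=True),                   # enables the rule for the OMS group
    Override(windows_group, enabled=False, enforced=True),   # the workaround
]

# A server that is a member of both groups.
print(rule_enabled({advisor_group, windows_group}, overrides))  # prints False
```

Without the enforced entry, the same server would resolve to enabled in this model, because only the enabling Override would apply.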

Case closed.

1 comment:

Unknown said...

If you monitor additional logs (e.g. IIS logs), all other servers monitored by SCOM and OMS display the same error if they don't have that log. Very annoying issue with OMS.