Tuesday, April 14, 2009

Recently I came across an issue with MOM 2005 Agents reporting communication problems when trying to contact the Management Server. While investigating, I found a number of posts online from people who had the same symptoms.

Most of the discussion online related to MS KB 885416, which describes the following events being reported in the Agent's Application Event Log:

Event Type: Error
Event Source: Microsoft Operations Manager
Event Category: None
Event ID: 21293
Description: The agent was unable to send data to the MOM Server at ManagementServerName. The error code is 10054. An existing connection was forcibly closed by the remote host.

Event Type: Information
Event Source: Microsoft Operations Manager
Event Category: None
Event ID: 26021
Description: The agent has restored communication to ManagementServerName

Unfortunately, as with MS KB 885416, everything pointed to installing a hotfix or Service Pack 1. However, like others posting about this issue, I had already installed Service Pack 1. In fact, this issue is the first item listed as fixed in Service Pack 1, according to http://support.microsoft.com/kb/905420

After hitting a brick wall with my investigation, I contacted Microsoft support for assistance. Once I had given their technician a clear overview of everything I had already investigated, he immediately found that another error was also being logged:

Event ID: 26009
Category: None
Source: Microsoft Operations Manager
Type: Error

Machine: ComputerName
The agent could not connect to the MOM Server FQDN.MOM.Server. The error reported is 'No connection could be made because the target machine actively refused it.'. Verify the management group name is correct, the MOM Server is running, that it is listening on port 1270, and that any firewalls between this agent and the MOM server are configured to pass TCP and UDP traffic on port 1270.

Where the MS tech found this error is still beyond me, as I never saw it myself; it was either in the Event Log or in the Agent's MOM log files.
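Since event 26009 asks you to verify that the MOM Server is listening on port 1270, one quick check from an affected agent is to see whether a TCP connection to that port is accepted at all. The following is only a minimal sketch under my own assumptions: the function name `check_mom_port` and the 3-second timeout are illustrative and not part of MOM, it tests TCP only (not the UDP traffic the message also mentions), and a successful connection does not prove the MOM service itself is healthy.

```python
import socket
from typing import Tuple


def check_mom_port(host: str, port: int = 1270, timeout: float = 3.0) -> Tuple[bool, str]:
    """Attempt a TCP connection to host:port and report the outcome.

    Hypothetical helper for illustration; 1270 is the port named in event 26009.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # The remote host answered but refused the connection, which is the
        # 'target machine actively refused it' wording seen in event 26009.
        return False, "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. traffic dropped by a firewall.
        return False, "timed out"
    except OSError as exc:
        # Anything else: name resolution failure, unreachable network, etc.
        return False, f"error: {exc}"
```

For example, running `check_mom_port("FQDN.MOM.Server")` from an affected agent and getting `(False, "refused")` would be consistent with the "actively refused" message above, whereas `"timed out"` would point more towards a firewall in between.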

Once this error was detected, Microsoft directed me to MS KB 934441, which matched the problem we were having.

After I downloaded the hotfix described in MS KB 934441 and applied it to the affected Management Server, the communication issue was immediately corrected and monitoring was restored.

I am still not 100% sure of the cause, as Microsoft never provided a definitive explanation for why the errors were being reported, but the hotfix has corrected almost every instance of this issue we came across.

NOTE: After patching the affected Management Server, I also patched all of my other Management Servers (6 in total) and then proceeded to patch the Agents, which I am still working through.
