Contoso.se

Welcome to contoso.se! My name is Anders Bengtsson and this is my blog about Azure infrastructure and system management. I am a senior engineer in the FastTrack for Azure team, part of Azure Engineering, at Microsoft. Contoso.se has two main purposes: first, as a platform to share information with the community, and second, as a notebook for myself.

Everything you read here is my own personal opinion and any code is provided "AS-IS" with no warranties.

Anders Bengtsson


Agent Failover

What happens to agents when the management server becomes inaccessible?

In the MOM 2005 Administrator Console you can configure which management servers each agent can fail over to. You do this in the management server properties, under Computers (pic5). For example, you can configure agents on MOMSRV1 to fail over to MOMSRV2 and MOMSRV3. You can't set the order there, that is, whether agents try MOMSRV2 or MOMSRV3 first, but you can do that in the agent registry. For every management server that an agent can report to there are two registry entries, Consolidator X AD Name and Consolidator X Host (pic1). Here, in the agent registry, you can change the failover order. You will find the consolidators under My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software\OnePoint\Configurations\\Operations\Agent\Consolidators.
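To make the registry layout a bit more concrete, here is a small Python sketch that takes the value names and data found under the Consolidators key and lists the hosts in order. It rests on assumptions that are mine and not confirmed above: that X is a number, and that a lower X means the agent tries that server earlier. The `sample` dictionary is made-up data standing in for a real registry read, and `consolidator_order` is a hypothetical helper name.

```python
import re

# Value names follow the "Consolidator X Host" pattern described above.
# Assumption: X is a number and a lower X comes earlier in the failover order.
_HOST_VALUE = re.compile(r"^Consolidator (\d+) Host$")


def consolidator_order(values):
    """Return consolidator host names sorted by their X index."""
    indexed = []
    for name, data in values.items():
        match = _HOST_VALUE.match(name)
        if match:  # skip the "AD Name" entries and anything else
            indexed.append((int(match.group(1)), data))
    return [host for _, host in sorted(indexed)]


# Hypothetical registry contents for one agent (illustration only).
sample = {
    "Consolidator 1 Host": "MOMSRV3",
    "Consolidator 0 AD Name": "MOMSRV1",
    "Consolidator 0 Host": "MOMSRV1",
    "Consolidator 2 Host": "MOMSRV2",
    "Consolidator 1 AD Name": "MOMSRV3",
    "Consolidator 2 AD Name": "MOMSRV2",
}

print(consolidator_order(sample))
```

On a real agent you would fill the dictionary from the registry, for example with Python's standard `winreg` module, instead of the hard-coded sample.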

When the primary management server becomes inaccessible, some events are generated on the agent (pic2 and pic3) and the agent fails over to another management server. If that management server also becomes inaccessible, the agent fails over to the next consolidator, in this case FAMOMSRV03 (pic7).

Note that there are no alerts in the Operator Console when an agent fails over. There are of course alerts when a management server becomes inaccessible (pic6).

If an agent has failed over first to MOMSRV02 and then to MOMSRV03, it will not fail back to MOMSRV02 when MOMSRV02 comes back online. The agent waits for the primary management server to come back online, and then fails back to the primary management server.
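The failback rule above can be summed up in a few lines of Python. This is only a model of the behaviour I observed, not how the MOM agent is actually implemented. The function name `next_server`, the server name MOMSRV01 for the primary, and the wrap-around search when the current server is down are all my own assumptions for the sake of the example.

```python
def next_server(current, order, online):
    """Pick the server an agent reports to, given which servers are online.

    order  -- consolidators in failover order, primary first
    online -- set of servers that are currently reachable
    """
    primary = order[0]
    if primary in online:
        return primary  # always fail back to the primary when it returns
    if current in online:
        return current  # stay put; no failback to an intermediate server
    # Current server is down: try the ones after it, then the ones before it.
    # (Wrapping around is an assumption, the observations do not cover it.)
    i = order.index(current)
    for server in order[i + 1:] + order[:i]:
        if server in online:
            return server
    return current  # nothing reachable, keep waiting


order = ["MOMSRV01", "MOMSRV02", "MOMSRV03"]  # MOMSRV01 is a made-up primary

step1 = next_server("MOMSRV01", order, {"MOMSRV02", "MOMSRV03"})
step2 = next_server(step1, order, {"MOMSRV03"})
step3 = next_server(step2, order, {"MOMSRV02", "MOMSRV03"})
step4 = next_server(step3, order, {"MOMSRV01", "MOMSRV02", "MOMSRV03"})
print(step1, step2, step3, step4)
```

The third step is the interesting one: MOMSRV02 is back online, but the agent stays on MOMSRV03 until the primary returns in step four.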