I have been in the sandbox today, playing with failover. In this scenario I had four machines:
- COOM01, root management server with SQL
- COOM02, management server
- CODC01, domain controller and DNS server, also agent-managed by Operations Manager
- COXP01, XP workstation for Operations Manager administrators
I deployed an agent to the domain controller (CODC01) from COOM01 and assigned CODC01 to that management server. During the agent deployment you can set up failover for an agent. If you use Active Directory integrated settings for your agents, you can specify failover settings there. With the default settings, the agent will automatically fail over to any other management server in the management group.
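To make the default behaviour concrete, here is a toy Python model of how an agent picks the server it reports to: its primary management server if that is up, otherwise the first available failover server, where the default failover list is every other management server in the management group. The classes `ManagementServer` and `Agent` and the helper `default_failover` are hypothetical illustrations, not part of any Operations Manager API, and the model ignores the timing I observed (the real agent took a couple of minutes to fail over).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManagementServer:
    name: str
    online: bool = True


@dataclass
class Agent:
    name: str
    primary: ManagementServer
    failover: list[ManagementServer] = field(default_factory=list)

    def current_server(self) -> ManagementServer | None:
        """Primary if online, else the first online failover server, else None."""
        if self.primary.online:
            return self.primary
        for server in self.failover:
            if server.online:
                return server
        return None


def default_failover(primary: ManagementServer,
                     group: list[ManagementServer]) -> list[ManagementServer]:
    """Hypothetical model of the default: all other servers in the group."""
    return [s for s in group if s is not primary]


# The lab setup: CODC01 reports to COOM01 and may fail over to COOM02.
coom01 = ManagementServer("COOM01")
coom02 = ManagementServer("COOM02")
group = [coom01, coom02]
codc01 = Agent("CODC01", primary=coom01, failover=default_failover(coom01, group))

coom01.online = False                 # turn off the root management server
print(codc01.current_server().name)   # COOM02
coom01.online = True                  # turn it back on
print(codc01.current_server().name)   # COOM01
```

Identity (`is not`) is used in `default_failover` because dataclasses compare by value, and two servers could in principle share field values.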
When an Operations Manager administrator starts the console on his workstation, COXP01, he connects to COOM01, as it is the root management server (RMS).
I then turned off the root management server. On the agent, CODC01, I first got an event ID 21006 telling me that the agent could not connect to COOM01 and that I should check the network. Two minutes later I got an event ID 21015 telling me that the agent will resume communication when COOM01 is back online. Also two minutes after the first event, I got an event telling me that the agent had successfully failed over to COOM02.
I then turned the root management server back on, and after a couple of minutes the agent generated an event ID 21019.
Everything was back to normal. When I opened the console I saw no alerts at all, at least no new ones. This was most likely because the root management server was down and could not update the database. Then I shut down the root management server again, this time with the console open on COXP01. I got the event on CODC01 telling me that the management server COOM01 was not available and that the agent had failed over to COOM02. I did not get any error messages in the console on COXP01, but if I tried to change the view in the console I did get an error. If I tried to open the console on COOM02, I got this:
After the root management server went down, the administrator at the workstation could not see anything in his console. The SDK service, which is responsible for authentication to the database, runs on the root management server. This means that if no root management server is running the SDK service, the console has no way to access the database, no matter which machine it is opened on.
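The asymmetry above, where agents fail over but consoles do not, can be sketched with another toy Python model: every console request is routed to the SDK service on the root management server, with no fallback to another management server. `RootManagementServer`, `Console`, `SdkUnavailableError` and the view name "Active Alerts" are all hypothetical names chosen for illustration; this is not how the console is actually implemented, only a model of the behaviour I saw.

```python
from __future__ import annotations

from dataclasses import dataclass


class SdkUnavailableError(RuntimeError):
    """Raised when a console cannot reach the SDK service."""


@dataclass
class RootManagementServer:
    name: str
    online: bool = True  # in this model the SDK service runs only here


@dataclass
class Console:
    machine: str
    rms: RootManagementServer

    def open_view(self, view: str) -> str:
        # Every console request goes through the SDK service on the RMS;
        # unlike the agent, there is no failover to another management server.
        if not self.rms.online:
            raise SdkUnavailableError(
                f"{self.machine}: cannot reach SDK service on {self.rms.name}"
            )
        return f"{view} loaded via SDK on {self.rms.name}"


rms = RootManagementServer("COOM01")
console = Console("COXP01", rms)
print(console.open_view("Active Alerts"))  # Active Alerts loaded via SDK on COOM01

rms.online = False  # shut down the root management server
for machine in ("COXP01", "COOM02"):
    try:
        Console(machine, rms).open_view("Active Alerts")
    except SdkUnavailableError as err:
        print(err)  # the console fails on both machines
```

Opening the console on COOM02 does not help in this model, because the console still depends on the SDK service hosted on COOM01.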