RMS Disaster Recovery

I saw a question about disaster recovery: how do you deploy a new root management server (RMS) if the first one goes down? In this post I describe a test I ran on this topic.

The RMS is the first management server installed in the management group. The RMS holds some special roles; if these roles are offline, the management group will not work. One of the steps during deployment is to back up the root management server key. This key is used when promoting another server to RMS if the first RMS goes down. This means that you must have the key to recover, so do not forget to back it up. If you don't have the key, there is no way to promote another server to become root management server. The root management server supports clustering, so it does not need to be a single point of failure; remember this when you design your Operations Manager 2007 environment.

First I took a backup of the key on my RMS.

  1. Copy SecureStorageBackup.exe from the installation CD (X:\SupportTools) to the Ops Mgr installation directory (C:\Program Files\System Center Operations Manager 2007\)
  2. Open a command prompt and go to the installation directory
  3. Run the following command: SecureStorageBackup.exe Backup C:\BackupOfKey.bin
  4. You will be asked to input a password to protect the file

Now I had a backup of my RMS key, C:\BackupOfKey.bin. I copied this file to another machine.
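
Since the key file is the only way to promote another server, it is worth checking that the off-machine copy really matches the original. The following is a minimal Python sketch of that idea, not part of Ops Mgr itself: the function names are my own, and backup_command only builds the command line from step 3 (the tool still prompts for the password interactively).

```python
import hashlib
import shutil
from pathlib import Path


def backup_command(key_file: str) -> list[str]:
    """Build the command line from step 3; the tool itself asks for the password."""
    return ["SecureStorageBackup.exe", "Backup", key_file]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_and_verify(source: Path, target_dir: Path) -> Path:
    """Copy the key backup to target_dir and confirm the copy is identical."""
    if source.stat().st_size == 0:
        raise ValueError(f"{source} is empty, the backup probably failed")
    target_dir.mkdir(parents=True, exist_ok=True)
    copy = Path(shutil.copy2(source, target_dir / source.name))
    if sha256_of(copy) != sha256_of(source):
        raise IOError(f"copy of {source} does not match the original")
    return copy
```

For example, copy_and_verify(Path(r"C:\BackupOfKey.bin"), target) would be called with target pointing at a folder on the other machine; the target location is whatever share or removable drive you use, nothing here assumes a specific one.
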
The next step was to shut down my RMS server. After that I could see that my agents had lost their connection to the management server. A member server with the Ops Mgr console installed could no longer start the console: “Failed to connect to server COOPSMGR02, The sdk service is either not running or not yet initialized”.

I installed a new 2003 Server (with the same IP and hostname), ran all updates, joined the domain and installed Ops Mgr 2007 again. I chose to install all the components that the RMS had before, for example the web console, console, server and PowerShell.
After the installation the console started on the new server. All agents were connected again and could receive new rules. In the services console I could see that the SDK Service and Config Service were running, and I could verify that the server was the RMS by looking in the console. I had a new RMS, and I didn't need the key.
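
The service check I did in the services console could also be scripted. Below is a rough Python sketch that reads the state from the output of the Windows sc query command. The short service names OMSDK and OMCFG are my assumption for the SDK Service and Config Service, so confirm them on your own server first; the parsing also assumes English-language sc output.

```python
import re
import subprocess

# Assumed short names for the SDK Service and Config Service; verify them
# in the services console on your own server before relying on them.
ASSUMED_SERVICES = ["OMSDK", "OMCFG"]


def parse_state(sc_output: str) -> str:
    """Extract the state name (e.g. RUNNING) from `sc query <name>` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else "UNKNOWN"


def service_state(name: str) -> str:
    """Ask the Windows service controller for the state of one service."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_state(result.stdout)


def all_running(names: list[str]) -> bool:
    """Return True only if every named service reports RUNNING."""
    return all(service_state(n) == "RUNNING" for n in names)
```

Calling all_running(ASSUMED_SERVICES) on the rebuilt server gives a quick yes/no answer, but it only says the services are started, not that the server actually holds the RMS role; for that I still looked in the console.
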

I read that all run-as accounts are lost if the key is not restored, but I tried uninstalling an agent with the management action account and it succeeded. It could be, though, that this account is also stored somewhere else and therefore still works. To test that, I did some more operations.
I set up a simple task to do something on a machine and configured the task to run with a new run-as profile. I ran the task and verified that it was using the new run-as account. The next step was to do everything one more time to see whether my new run-as account, or at least its password, disappeared during the reinstallation of the RMS.

  1. I shut down my RMS
  2. My member server could no longer start the console, and my agents could no longer connect to a management server
  3. I installed a new server with the same OS, IP and hostname
  4. I installed Ops Mgr 2007 with all components that I had before on my RMS

I started the console on my new RMS; everything seemed fine and the machine was the root management server according to the console. I ran my special task and could see in the output that it was running with my run-as account. In other words, the password and account information had not disappeared during the restore, even though I had not restored the key from the first RMS.