If you have an Orchestrator environment with multiple runbook servers might have notice that when a runbook failover to secondary runbook server the runbook restarts at the first activity. Let’s say you have a runbook that creates a new user account, a user folder and deploy a computer to a client. If that runbook is interrupted between creating the user folder and deploying the computer it will failover to secondary runbook server and restart at creating the user account. The issues with this is this, in this example, is that you will end up with multiple user accounts for the same user, or the runbook will crash as it tries to create the same user account again. In other scenarios it could cause multiple installation of patches, multiple restarts of a server or multiple deployed of the same VM.
In this blog post I will show a couple of alternatives to handle state tracking. I will also include an example on logging for your runbooks. In my examples I use a SQL table to store logs from all runbooks.
SQL
For all state tracking you need to have a unique ID of the runbook job. If the runbook only runs in one instance you could use the runbook name or activity ID, but often the same runbook can run in multiple instances at the same time.
First alternative is to use a SQL table. I often create a database called OrchestratorTool or something similar to store logs, temp data and so on, in this database I create tables for different runbooks. It is often complicated to use the same table for state tracking from multiple runbooks.
Figure 1 show a “master” or “parent” runbook that invokes three other runbooks. This runbook is used from Service Manager when order a new user account, it creates an account, a user folder and a workstation. The “master/parent” runbook is the runbook that we use as a runbook activity in Service Manager. The Service Request ID (SR) from Service Manager is the unquiet ID that these runbooks use to do state tracking. After each Invoke Runbook there is a check to make sure the runbook returns “something” and not NULL. In the end of the runbook it updates the service request in Service Manager with the result, for example account name. If any of the three Invoke Runbook activities returns NULL I want the runbook to stop, I use the Fail Runbook activity to do this. The Fail Activity runbook is a Query Database activity with incorrect configuration. This will make sure the runbook stops. All the “Write to log…” Invoke Runbook activities invoke a runbook that will write data to a log table in a SQL database. Figure 2 show configuration example of a “Write to log” Invoke Runbook activity. Figure 3 show the runbook that writes to the log table.
Figure 1
Figure 2
Figure 3
Figure 4, 5 and 6 shows the different runbooks that the “master/parent” runbook invokes. Each of these runbooks starts by checks if for example a folder has been created for the service request. If the folder is not created it will create a folder and update the tracking database table with info. If there is already a folder in the tracking database table for this service request it will publish that information back to the “master/parent” runbook. Each runbook also includes a couple of “Write to log” invoke runbook activities to write data to the log table in SQL. If any of these runbooks fails the error will be written to the log table and the “master/parent” runbook will fail, with the “Fail runbook” activity. As you can see in figure 2 the severity is CRITICAL, when a runbook sends CRITICAL severity to the log runbook (figure 3) it will generate an alert in Operations Manager with the info and write it to the log table. This means that each runbook don’t have to include “Create Alert” activities to generate an alert in Operations Manager, they just need to write a CRITICAL event to the log table.
Figure 4
Figure 5
Figure 6
With the SQL alternative the “master/parent” runbook can be restarted multiple times and each “child” runbook will check if has already ran for the service request. We use a table in a SQL database to store the state of each service request. We need another runbook to clean up the state table in SQL, or do it as a last step in the “master/parent” runbook.
Service Manager
The second alternative is to use Service Manager to store the state of the runbook. In this alternative each runbook is independent of each other, we don’t have a “master/parent” runbook, and instead each “child” runbook is its own runbook activity in Service Manager. Servicedesk operators can easy re-start or skip an activity if needed and customer can track each step. If you have only one huge runbook activity, invoke a “master/parent” runbook that then invokes 50 other runbooks, it is very difficult to troubleshoot or restart.
When we build runbooks that handle success, warning and failure from all activities the runbook always ends with a success. Figure 7 show an updated version of the runbook shown in figure 5, to create the account I Active Directory. If for example the Create User activity fails the runbook will go to the Write to log – Can’t Create Account activity. That activity will invoke a runbook and write information to the log. That invoke runbook activity will most likely work, so the runbook will end as a success even if it was not a success, we couldn’t create the account in Active Directory. That is the reason for the “Fail Runbook” activity, which will make sure the runbook crash. The Fail Activity runbook is a Query Database activity with incorrect configuration. This will make sure the runbook stops.
In the end of the runbooks shown in figure 7, 8 and 9 you can see an “Update RB Activity” activity. This activity will write the result of the runbook, for example an account, to TEXT1 of the runbook activity. Each instance of a runbook activity has a number of text fields that you can use to store information. In figure 8 you can see that we use “Get Related RB activities” based on the Service Request to get the runbook activity named something with 9.2.1. 9.2.1 Is the runbook that create the account, shown in figure 7. The runbook in figure 8 then readers TEXT1 from that runbook activity and use it as input when create the folder.
Figure 7
Figure 8
Figure 9
The log table I use for these examples looks like the table in figure 10. The SQL state tracking table I use for the first example looks like the table in figure 11.
Figure 10
Figure 11
Download my example runbooks, StateTracking.
Note that this is provided “AS-IS” with no warranties at all. This is not a production ready management pack or solution, just an idea and an example
Hi,
That you can do if you build it yourself. By default Orchestrator will send all jobs to primary runbook server and then when that machine is filled up or offline, it will send jobs to the secondary runbook server. When you use the Invoke Runbook activity you can target a runbook server. You can also target a runbook server on the runbook level, but then it will Always only run on that runbook server so failover will not Work. With the Invoke Runbook activity you can first check the envrionment for example which runbook servers is online and then target it when you invoke a runbook with the invoke runbook activity. It is possible to do what you want, but it will be a lot of extra job in each runbook.
Hi,
I have 2 runbooks server one is in Idle mode and one take all the instance and run
currently 30 job.
the servers is Identical, I want to spreads Instance on booth servers but when one go down for maintenance I want the second to take care all the Instance.
I only have system center Orchestrator to work on.(I dont have all the system center suite)
Please advice.
Best regards,
Itamar
[…] There will still be logs in the technical log database for Orchestrator, more info about logs here. You will also have a work item, a Service Request, in Service Manager which information about the […]