Validate your runbook design

Posted: 28th July 2011 by Anders Bengtsson in Orchestrator

Since I started to work with Opalis I have been surprised to see how fast people build their first policy or runbook. When showing Opalis, or now Orchestrator, to a group of engineers, it doesn’t take many minutes before they have a running workflow. With Orchestrator we don’t have a complex installation; instead we can install quickly and focus on designing runbooks. A disadvantage when something is too easy is that it often is not used as it should be. In Orchestrator and Opalis the problem is that many engineers quickly drag and drop a couple of activities into the policy or runbook, click Start and then lean back. Most likely they have a running runbook, but is it really working? Is it built as it should be? I have written a blog post about fault tolerance, Fault tolerance in Opalis policies, and another about logging, Building a log for your runbooks. Those are two topics that are often left out of those “quick-drag-drop-start” runbooks.

This blog post continues the same theme: validating your runbook design. There is of course a lot to think about when building an A+ runbook, and I have selected a couple of checks. I guess I will add more checks later; this is version 1.0 of the runbook. This version checks the following things for each runbook in your environment (all checked-in runbooks):

  • Missing failed links
    It is important to handle all results from an object. Often you drag a link and go with the default “<activity name> returns success”. But what if it is not success? Maybe you want to roll back some action or write to a log.
  • Missing warning links
    Same as above: it is important to handle all results, not only failed or success.
  • Links with default name
    When you look at a runbook you should get some kind of basic understanding of what it is doing. Renaming links from “Link” to something more descriptive is a good start.
  • Number of link colors in use
    You should change the color of your links. I often use GREEN for success, RED for warning or failed and ORANGE if there is a condition. For example, if I have an activity that can return TRUE or FALSE, I could have two activities after it, and depending on the result the workflow will choose a path. You should try to stick to a standard and not use all kinds of colors, because then the colors lose their descriptive purpose.
  • Number of objects per runbook
    If you get too many objects in the same runbook, it is difficult to administer and also difficult to understand when you look at it. If you instead split it up into “child runbooks”, it is often easier to administer, and you can also re-use these child runbooks in other workflows.
  • Runbook Logging
    Checks whether any runbook has logging enabled. By default, the logging options are disabled. Enabling logging increases the size of the database. Instead you should use an external system for logging; see Building a log for your runbooks.
  • Log Purge settings
    In this blog post, Opalis Log Purging, you can read about how Opalis and Orchestrator handle logs and log purging. It is important to configure log purging to keep your installation healthy.
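
To make the link checks a bit more concrete, here is a minimal Python sketch of two of them: finding links that still have the default name and counting how many links use each color. The `Link` class and the sample data are assumptions made up for illustration; they are not the Orchestrator database schema, and the runbooks in the export do this with database queries rather than Python.

```python
from collections import Counter
from dataclasses import dataclass

DEFAULT_LINK_NAME = "Link"  # the name a new link gets by default


@dataclass
class Link:
    """Hypothetical in-memory model of a link (not the Orchestrator schema)."""
    runbook: str
    source: str
    name: str
    color: str


def default_named_links(links):
    """Return the links that still carry the default name."""
    return [link for link in links if link.name == DEFAULT_LINK_NAME]


def color_usage(links):
    """Count how many links use each color."""
    return Counter(link.color for link in links)


# Made-up sample data for one small runbook.
links = [
    Link("Cleanup", "Get file status", "File exists", "GREEN"),
    Link("Cleanup", "Delete org file", "Link", "GREEN"),
    Link("Cleanup", "Delete org file", "Delete failed", "RED"),
    Link("Cleanup", "Delete org file", "Delete warning", "RED"),
]

for link in default_named_links(links):
    print(f"{link.runbook}: link from '{link.source}' still has the default name")

print(dict(color_usage(links)))
```

A long list of different colors in the second output would be a hint that there is no color standard in use.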

The attached export file (.ois_export) contains a number of runbooks:

  • 4.1 SCO Check Master
    This is the “master” runbook that triggers the other runbooks. The results of the check runbooks are written to an HTM file. The master runbook starts by checking whether an old report file exists; if it does, it is deleted. Then a new report file is started, all the “check” runbooks are invoked, and at the end the report file is closed, that is, the HTML code in the HTM file is closed.
  • 4.1.1, Missing Failed Links
    1. Trigger runbook 4.3.1 to write the start of a HTML table in the report file
    2. Query the Orchestrator database for all objects that are not marked as deleted. The query excludes action servers, counters, variables, links and Send Platform Event objects
    3. For each object returned, a new query checks whether it is the source object of a link whose trigger has %failed% as its value
    4. If the query in step 3 returns 0 rows, the object has no link handling the failed result
    5. Trigger runbook 4.4.3 that writes the data from steps 3 and 4 to the report file
  • 4.1.2, Missing Warning Links
    This runbook works the same way as 4.1.1, except that it checks for missing warning links instead of failed links.
  • 4.1.3, Check Links With Default Name
    1. Trigger runbook 4.3.1 to write the start of a HTML table in the report file
    2. Query the database for all links that are not marked as deleted and have the name “Link” (the default name)
    3. If the query returns 0 rows, the runbook will trigger runbook 4.3.2 to close the HTML table
    4. For each row that is returned, trigger runbook 4.3.3 and write it to the report file
    5. Merge multiple threads
    6. Trigger runbook 4.3.2 to close the HTML table in the report file
  • 4.1.4, Check Link Colors
    1. Trigger runbook 4.3.1 to write the start of a HTML table in the report file
    2. Query the database to get all color values on all links, and count each of them
    3. Trigger runbook 4.3.3 to write the result to the report file
    4. Merge multiple threads
    5. Trigger runbook 4.3.2 to close the HTML table in the report file
  • 4.2.1, Count Objects in Runbook
    1. Trigger runbook 4.3.1 to write the start of a HTML table in the report file
    2. Query the database to get the number of activities per runbook, excluding links and objects marked as deleted
    3. Trigger runbook 4.3.3 to write the result to the report file
    4. Merge multiple threads
    5. Trigger runbook 4.3.2 to close the HTML table in the report file
  • 4.2.2, Runbook Logging
    This runbook checks whether any runbook has “Store Activity-specific Returned Data” or “Store Common Returned Data” enabled. You should not have these enabled in production, as they store a lot of extra data in the database. Use them during development and testing instead, and use an external system for logging in production, if needed.
  • 4.3.1, 4.3.2 and 4.3.3 are used by multiple runbooks to write the HTML table and the results to the report file. In runbook 4.3.3 there is a filter that makes sure that if the runbook name or activity name starts with “4.” it is not written to the report file. The result is that the “SCO Check” runbooks affect the report as little as possible. We could exclude a bit more by editing the database queries: right now we don’t write anything to the report file that includes “4.” in the name, but we still include the “SCO Check” runbooks when we, for example, count link colors in use.
  • 4.4.1, Log Purge Check
    Checks whether scheduled log purge is enabled. Read more about log purging at Opalis Log Purging.
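
The logic behind the missing link checks (4.1.1 and 4.1.2) can be sketched in a few lines of Python against an in-memory SQLite database. Note that the `objects` and `links` tables, their columns and the sample rows are hypothetical and only chosen to mirror the steps described above; the real Orchestrator database looks different, and the runbooks query it with the Query Database activity.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE objects (id INTEGER PRIMARY KEY, runbook TEXT, name TEXT,
                              object_type TEXT, deleted INTEGER);
        CREATE TABLE links (id INTEGER PRIMARY KEY, source_id INTEGER,
                            trigger_value TEXT);
    """)
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", [
        (1, "Cleanup", "Get file status", "activity", 0),
        (2, "Cleanup", "Delete org file", "activity", 0),
        (3, "Cleanup", "Old activity", "activity", 1),   # marked as deleted
        (4, "Cleanup", "Retry counter", "counter", 0),   # excluded object type
    ])
    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
        (1, 1, "returns success"),
        (2, 2, "returns failed"),
        (3, 2, "returns warning"),
    ])
    return conn


def objects_missing_link(conn, result="failed"):
    """Return (runbook, object) pairs that have no outgoing link for `result`."""
    # Step 2: all objects that are not deleted, minus the excluded types.
    objects = conn.execute(
        "SELECT id, runbook, name FROM objects "
        "WHERE deleted = 0 AND object_type NOT IN ('counter', 'variable') "
        "ORDER BY id"
    ).fetchall()
    missing = []
    for object_id, runbook, name in objects:
        # Step 3: is this object the source of a link with a matching trigger?
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM links "
            "WHERE source_id = ? AND trigger_value LIKE ?",
            (object_id, f"%{result}%"),
        ).fetchone()
        # Step 4: zero rows means the result is not handled.
        if count == 0:
            missing.append((runbook, name))
    return missing


conn = build_demo_db()
print(objects_missing_link(conn, "failed"))
print(objects_missing_link(conn, "warning"))
```

Running one query per object keeps the sketch close to the steps in the runbook. A single query with a join would most likely be faster, at the cost of being a bit harder to read.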

These runbooks also use a couple of variables that need to be updated before you start the master runbook. The variables to update are database name, database server and report path.
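
As a rough illustration of how these three variables and the report runbooks (4.3.1, 4.3.2 and 4.3.3) fit together, here is a small Python sketch. The `CheckSettings` class, the function names, the sample values and the exact HTML are all assumptions for the sake of the example; the report produced by the runbooks in the export will not look exactly like this.

```python
import html
import os
import tempfile
from dataclasses import dataclass


@dataclass
class CheckSettings:
    """Hypothetical stand-ins for the three variables used by the runbooks."""
    database_server: str
    database_name: str
    report_path: str


def start_table(report, title):
    """Roughly what 4.3.1 does: write the start of an HTML table."""
    report.write(f"<h2>{html.escape(title)}</h2>\n<table>\n")


def write_row(report, runbook, activity):
    """Roughly what 4.3.3 does: write one row, skipping the check runbooks."""
    if runbook.startswith("4.") or activity.startswith("4."):
        return False
    report.write(
        f"<tr><td>{html.escape(runbook)}</td><td>{html.escape(activity)}</td></tr>\n"
    )
    return True


def close_table(report):
    """Roughly what 4.3.2 does: close the HTML table."""
    report.write("</table>\n")


settings = CheckSettings(
    database_server="SQL01",        # assumed example value
    database_name="Orchestrator",   # assumed example value
    report_path=os.path.join(tempfile.gettempdir(), "sco_check_report.htm"),
)

# Made-up results; the second row belongs to a check runbook and is filtered out.
results = [
    ("Cleanup", "Get file status"),
    ("4.1.1 Missing Failed Links", "Query database"),
]

with open(settings.report_path, "w", encoding="utf-8") as report:
    report.write("<html><body>\n")
    report.write(
        f"<p>Database {html.escape(settings.database_name)} "
        f"on {html.escape(settings.database_server)}</p>\n"
    )
    start_table(report, "Missing failed links")
    written = [write_row(report, runbook, activity) for runbook, activity in results]
    close_table(report)
    report.write("</body></html>\n")

print(written)
```

Keeping the server, database and path in one place means that moving the checks to another environment only requires changing these three values.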

Let’s test this. I have built a simple runbook, shown in the image below. Note that I only handle the warning and failed results for the “Delete org file” object. To be included in the report, a runbook needs to be checked in.

If I now run the “4.1 SCO Check Master” runbook, it will, after a while, result in a report file that looks like this

You can download all runbooks here, 20110728_SCO_CHECK. Please note that this is provided “as is” with no warranties at all. If you have any ideas about what I should add, please post them as a comment to this post or send me an e-mail. Also please note that this is all based on my own ideas and is not a “health check” or official Microsoft guidelines.

  2. Hi Mats,
    In my sandbox it takes around 10 minutes. I guess we could optimize the SQL queries a bit.

  3. mats.w says:

    The runbook works great Anders!
    Just a note! On my SCOrch server in VMware workstation it took almost 17 min for it to complete.