Home » System Center Operations Manager 2007 » Custom alerting based on distributed applications

Contoso.se

Welcome to contoso.se! My name is Anders Bengtsson and this is my blog about Azure infrastructure and system management. I am a senior engineer in the FastTrack for Azure team, part of Azure Engineering, at Microsoft.  Contoso.se has two main purposes, first as a platform to share information with the community and the second as a notebook for myself.

Everything you read here is my own personal opinion and any code is provided "AS-IS" with no warranties.

Anders Bengtsson

MVP
MVP awarded 2007,2008,2009,2010

My Books
Service Manager Unleashed
Service Manager Unleashed
Orchestrator Unleashed
Orchestrator 2012 Unleashed
OMS
Inside the Microsoft Operations Management Suite

Custom alerting based on distributed applications

I ran into a interesting scenario some time ago. A customer have first line operators online 24/7. During none business hours they receive all alerts and needs to call the on-call engineer if needed. But first line don’t have deep knowledge about the environment so sometimes the alerts from Operations Manager is a bit complicated to connect to a service, for example if the alert only tells you that database Y has a problem, and also to understand how critical the alerts are. For example if only one IIS in the IIS farm goes offline, they should not call the on-call engineer in the middle of the night.

We had for example a service including two Windows services. As long as one of them are running, there should not be an alert, and if there is an alert, it should include a simple non-technical description. First we needed to create a distributed application with the two services. We used the Configure Health Rollup feature to configure rollup algorithm to “best health state”. As long as any service is health, the component box will be healthy.

rollup01

When one of the services are stopped, you will receive an alert telling you for example “the print spooler service on computer X has stopped running”. If you don’t need it you can override the monitor and configure it not to generate alerts. When booth services are down the distributed application will switch to critical status. But you will not receive an alert, only for the two services included in the distributed application.

If you need an alert when both services are offline, when the component box switch state, you can override the aggregate rollup monitor in the distributed application. Override it to both configure the alert description and also rename the alert to get a better alert name in the console. In this scenario I override the aggregate monitor on top of my two Availability monitors.

rollup02

Now when both services are offline I get one alert, saying that first line should contact the on-call engineer.

rollup03


2 Comments

  1. Hi, I did that with a dependency rollup monitor. It seems like I have not described that in the blog post, sharp to see that 🙂 So you need to create a new dependency rollup monitor, that depends on the rollup monitor for the services component box. In the dependency rollup monitor you can configure any kind of alert name and description.

  2. Hi Anders,
    i read your post and i am very surprised: How do you exactly rename an Alert (Description, Subject) with an override? I tried it, but did not find a way …

    I would be very happy if you could explain, how you exactly did this 🙂

    Thanks,

    Frank

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.