
Category Archives: Operations Manager 2012

Contoso.se

Welcome to contoso.se! My name is Anders Bengtsson and this is my blog about Azure infrastructure and system management. I am a senior engineer in the FastTrack for Azure team, part of Azure Engineering, at Microsoft. Contoso.se has two main purposes: first, as a platform to share information with the community, and second, as a notebook for myself.

Everything you read here is my own personal opinion and any code is provided "AS-IS" with no warranties.

Anders Bengtsson

MVP
MVP awarded 2007, 2008, 2009, 2010

My Books
Service Manager Unleashed
Orchestrator 2012 Unleashed
Inside the Microsoft Operations Management Suite

Automate failover in SMA with Operations Manager

In a previous post I talked about how failover in SMA works. As described in that post, there is a manual step to transfer jobs from one worker to another. This process can of course be automated in several ways, for example with Orchestrator. Today I will show how it can be done with Operations Manager.

I have set up a rule in Operations Manager that runs a command every five minutes. The command starts a PowerShell script that checks the health of all workers. First it checks which workers there are in the environment, and then it checks whether the Runbook Service is running on each of them.

 

If any of the workers (I have two in my sandbox) is not working, the script excludes it from the SMA configuration. The script also stops and starts the Runbook Service (rbsvc) on the remaining working worker. If the script makes any changes, it writes an event to the Application event log. That event is picked up by another rule that generates an alert in Operations Manager.
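The decision the script makes can be sketched as follows. This is a Python model of the logic only, not the PowerShell script itself: the worker list and the per-worker service state are assumed inputs (in the real script they come from the SMA web service and from checking rbsvc on each worker), and leaving the configuration unchanged when no worker is healthy is an assumption, not something stated above.

```python
from dataclasses import dataclass


@dataclass
class FailoverPlan:
    keep: list          # workers that stay in the SMA worker deployment
    exclude: list       # workers to remove from the deployment
    restart: list       # workers on which to restart the Runbook Service (rbsvc)
    write_event: bool   # whether to write an Application log event for the alert rule


def plan_failover(workers, service_running):
    """Decide the new SMA worker deployment from each worker's Runbook Service state.

    workers: worker names, e.g. as listed by Get-SmaRunbookWorkerDeployment.
    service_running: hypothetical mapping worker name -> True if rbsvc is running.
    """
    healthy = [w for w in workers if service_running.get(w, False)]
    failed = [w for w in workers if not service_running.get(w, False)]
    # Assumption: with nothing failed, or nothing healthy to fail over to, change nothing.
    if not failed or not healthy:
        return FailoverPlan(keep=list(workers), exclude=[], restart=[], write_event=False)
    return FailoverPlan(keep=healthy, exclude=failed, restart=healthy, write_event=True)


plan = plan_failover(["SMA01", "SMA02"], {"SMA01": False, "SMA02": True})
print(plan)  # FailoverPlan(keep=['SMA02'], exclude=['SMA01'], restart=['SMA02'], write_event=True)
```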

If I then run Get-SmaRunbookWorkerDeployment, I can verify that SMA02 is the only worker in my environment. I can also see in my log table that the runbook has resumed on the SMA02 worker.

 

A couple of comments about the script. First, I have hardcoded https://wap01 in the script; WAP01 is my SMA web service. The SMA management pack discovers this component, so the script could instead get the web service hostname from that discovery, and it could also find the workers based on the default discoveries in the SMA management pack. Second, if a worker goes offline, this script excludes it, but it does not include the worker again when it comes back online. That has to be done manually or with an updated version of the script.

Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready solution, just an idea and an example.

Last 30 Minutes Performance Data

I wrote a SQL query that I thought I would share. This query shows the performance data collected for a specific machine during the last 30 minutes. I was working on an issue where some agents stopped sending performance data. Before we found the root cause and a fix, we configured this query in a monitor to alert us if an agent was not sending performance data. Information about how to configure a monitor to run a SQL query can be found here.

-- Performance data collected for one machine (Path) during the last 30 minutes (OperationsManager database)
select Path, ObjectName, CounterName, InstanceName, SampleValue, TimeSampled
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.PerformanceSourceInternalId = pcv.PerformanceSourceInternalId
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where Path = 'dc01.contoso.local'
  -- TimeSampled is stored in UTC, so the window is calculated from GETUTCDATE()
  and TimeSampled > DATEADD(MINUTE, -30, GETUTCDATE())
  and TimeSampled < GETUTCDATE()
order by TimeSampled DESC
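A monitor built on this query essentially asks whether any rows fall inside the 30-minute window. A minimal Python sketch of that check, assuming the sample timestamps have already been fetched and are in UTC like `GETUTCDATE()`; the `Healthy`/`Critical` state names are illustrative.

```python
from datetime import datetime, timedelta


def has_recent_samples(sample_times, now, window=timedelta(minutes=30)):
    """True if any sample lies strictly inside (now - window, now).

    Mirrors: TimeSampled > DATEADD(MINUTE, -30, GETUTCDATE()) AND TimeSampled < GETUTCDATE()
    """
    start = now - window
    return any(start < t < now for t in sample_times)


def monitor_state(sample_times, now):
    # Assumption: no rows in the window means the agent has stopped sending performance data.
    return "Healthy" if has_recent_samples(sample_times, now) else "Critical"


now = datetime(2014, 3, 1, 12, 0)  # treated as UTC, like GETUTCDATE()
print(monitor_state([now - timedelta(minutes=5)], now))   # Healthy
print(monitor_state([now - timedelta(minutes=45)], now))  # Critical
```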

 

Operations Manager Admin Integration Pack (v1)

With the System Center Integration Pack for System Center 2012 Operations Manager, we can integrate with Operations Manager, automate maintenance mode and alert handling, and monitor for alerts and state changes. Unfortunately, there are no activities for handling management packs or management pack objects.

Some time ago I wrote a blog post about self-service for Operations Manager (http://contoso.se/blog/?p=2764). That idea was built around creating new objects, like a rule, in XML and then importing the management pack. While this works fine in some scenarios, in other scenarios it would be nice to have a bit more flexibility and reliability.

Russ Slaten (a true Texan with a lot of guns), a PFE colleague and Operations Manager Jedi from the US, and I have built a first version of what we call the “Operations Manager Admin Integration Pack” for Orchestrator. The purpose of this integration pack is to enable more self-service and automation scenarios in Operations Manager.

[Figure: Operations Manager Admin Integration Pack activities]

The activities in version 1 are:

  • Create MP. Creates a new management pack
  • Create Performance Collection Rule. Creates a rule that collects performance data
  • Create Event Alert Rule. Creates a rule that generates an alert based on an Event Viewer event
  • Delete Management Pack. Deletes a management pack
  • Delete Rule. Deletes a rule based on its rule ID
  • Export Management Pack. Exports a management pack
  • Get Management Pack. Lists management packs from the management group
  • Get Rule. Lists rules. With default settings the activity lists all rules; use the Displayname property to filter the search result
  • Import Management Pack. Imports a management pack to the management group

These activities don’t use global connections; instead, you specify the management server in each activity. This integration pack requires the Operations Manager PowerShell snap-in on all Runbook servers.

As I think you have already realized, this integration pack enables a lot more self-service scenarios where non-Operations Manager Engineers can order new objects and handle management packs. Please take it for a spin and let us know what you think!

Example configuration for the “Create Rule Collection Performance” activity

OM Server Name: SCOM-Lit.Litware.com
MP Name: Custom.Example.Sandbox
Rule Name: Custom.Example.Sandbox.Rule.Test1
Rule Displayname: Sandbox Test Rule 1
Rule Description: Sandbox Rule for testing
Rule Target: 'Microsoft.Windows.Computer'
Object Name: Processor
Counter Name: % Processor Time
All Instances: TRUE (used for multi-instance counters)
Instance Name: if All Instances is FALSE, fill this in with either a target variable or a fixed value
Interval Seconds: 300
DW Only (Data Warehouse): TRUE (if you only want to write to the data warehouse)
Is Optimized: TRUE (use if you are using the optimized data provider)
Tolerance: 10 (percentage or absolute; if the value changes by more than x, collect it, otherwise skip it)
Tolerance Type: Percentage (Percentage or Absolute)
Maximum Sample Separation: 12 (how many samples can be skipped before collection is forced)
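To make the optimization settings concrete, here is a small Python model of how Tolerance, Tolerance Type and Maximum Sample Separation interact as described above. It is an illustration of those descriptions only; the exact comparison the optimized data provider uses is not documented here, so measuring the change against the last written value is an assumption.

```python
def optimized_collect(values, tolerance, tolerance_type="Percentage", max_sample_separation=12):
    """Return the indexes of the samples that would be written.

    Illustrative model, not the Operations Manager module itself: a sample is
    written if it differs from the last written value by more than `tolerance`,
    or if `max_sample_separation` samples in a row have already been skipped.
    """
    if tolerance_type not in ("Percentage", "Absolute"):
        raise ValueError("tolerance_type must be 'Percentage' or 'Absolute'")
    written, last, skipped = [], None, 0
    for i, value in enumerate(values):
        if last is None:
            changed = True  # always write the first sample
        elif tolerance_type == "Absolute":
            changed = abs(value - last) > tolerance
        elif last == 0:
            changed = value != 0  # avoid dividing by zero for percentage checks
        else:
            changed = abs(value - last) / abs(last) * 100 > tolerance
        if changed or skipped >= max_sample_separation:
            written.append(i)
            last, skipped = value, 0
        else:
            skipped += 1
    return written


# % Processor Time samples with Tolerance 10 and Tolerance Type Percentage
print(optimized_collect([50, 52, 60, 61], tolerance=10))  # [0, 2]
```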

Example configuration for the “Create Alert Event Rule” activity

Management Pack ID: Custom.Example.Sandbox1
Management Server: SCOM-Lit.Litware.com
Rule ID: Custom.Example.Sandbox.Rule.AlertTest8
Rule Description: Sandbox Rule for testing
Rule Displayname: Sandbox Test Alert Rule 8
Rule Target: Microsoft.Windows.Computer
Computer Name: $Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$
Event Log Name: Operations Manager
Event ID: 9999
Event Source: OpsMgr Scripting Event
Alert Name: Sandbox Test Alert Rule 8
Alert Priority: High = 2, Medium = 1, Low = 0
Alert Severity: Critical = 2, Warning = 1, Information = 0
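If the priority and severity arrive as names, for example from a request form, a runbook step has to translate them into the numbers listed for the activity. A hypothetical Python helper for that mapping:

```python
ALERT_PRIORITY = {"Low": 0, "Medium": 1, "High": 2}
ALERT_SEVERITY = {"Information": 0, "Warning": 1, "Critical": 2}


def alert_codes(priority, severity):
    """Translate priority and severity names into the numeric values the activity expects."""
    try:
        return ALERT_PRIORITY[priority], ALERT_SEVERITY[severity]
    except KeyError as err:
        raise ValueError(f"unknown priority or severity: {err.args[0]}") from None


print(alert_codes("High", "Critical"))  # (2, 2)
```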

Example configuration for the Create Management Pack activity

[Figure: Create Management Pack configuration]

Example configuration for the Delete Management Pack activity

[Figure: Delete Management Pack configuration]

Example configuration for the Delete Rule activity

[Figure: Delete Rule configuration]

Example configuration for the Export Management Pack activity

[Figure: Export Management Pack configuration]

Example configuration for the Get Management Pack activity

[Figure: Get Management Pack configuration]

Example configuration for the Get Rule activity

[Figure: Get Rule configuration]

Example configuration for the Import Management Pack activity

[Figure: Import Management Pack configuration]

 

Big thanks to our colleague Stefan Stranger for PowerShell support very early one morning 🙂

Download the IP, OMAdminTasks_20130508-1

Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready management pack or solution, just an idea and an example.

Tracking Logon and Logoff Activities in Service Manager

Last week I received an e-mail about tracking logon and logoff activities in Service Manager with Operations Manager. This can be solved with an event collection rule and the default Custom Event report.

  1. In the Operations Manager console, navigate to the Authoring workspace and Rules
  2. Create a new rule of type Collection Rules/Event Based/NT Event Log
  3. Select a suitable management pack or create a new one, then click Next
  4. Input a rule name, for example Contoso – Service Manager – Logon and Logoff
  5. Select a rule target. If you have the Service Manager management pack imported, you can use the SCSM 2012 Management Server class as target, because logon and logoff events are generated on your Service Manager management servers. Click Next
  6. Event Log Type: select or input Operations Manager, then click Next
  7. Build the event expression as in the figure below, then save the new rule. In the example I exclude all events about my service accounts; all service accounts start with svc. Event ID 26328 is logon and event ID 26329 is logoff.

[Figure: event expression for the logon and logoff collection rule]
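The filtering done by that event expression can be summarized in a few lines. A Python sketch, assuming the account name is compared with the `svc` prefix case-insensitively and may include a domain part; which event parameter carries the account is not shown here, so `account` is a hypothetical input.

```python
LOGON_EVENT_ID = 26328
LOGOFF_EVENT_ID = 26329


def should_collect(event_id, account, excluded_prefix="svc"):
    """Collect logon/logoff events, but not those for service accounts.

    Assumption: the account may arrive as DOMAIN\\name; only the name part is
    compared, case-insensitively, with the service-account prefix.
    """
    if event_id not in (LOGON_EVENT_ID, LOGOFF_EVENT_ID):
        return False
    name = account.split("\\")[-1]
    return not name.lower().startswith(excluded_prefix.lower())


print(should_collect(26328, "CONTOSO\\anders"))  # True
print(should_collect(26329, "CONTOSO\\svcSM"))   # False
```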

 

Once the rule is created and deployed to your Service Manager management servers, they will start reporting events as soon as someone logs on to or off from Service Manager. You can create an event view in the same management pack and configure it to show the events generated by your new rule.

 

[Figure: event view with logon and logoff events]

To show this data in a report, you can use the default Custom Event report in the Microsoft Generic Report Library, in the Reporting workspace. The following figure shows the configuration of the Custom Event report

[Figure: Custom Event report configuration]

and the result of the report

[Figure: Custom Event report result]

If you want to look at the events in the Operations Manager data warehouse database, you can use the following SQL query:

-- Logon (26328) and logoff (26329) events in the data warehouse; one row per event parameter
select * from Event.vEvent ev
inner join Event.vEventDetail evd on ev.eventoriginid = evd.eventoriginid
inner join Event.vEventParameter evp on ev.eventoriginid = evp.eventoriginid
where eventdisplaynumber in ('26328', '26329')

If you want to build your own report, you can use SQL Report Builder. I have an example of that here.

Manage new monitoring by self-service (light MP authoring with a pinch of Orchestrator magic)

A common scenario I often see is that everyone in the IT organisation knows that Operations Manager can monitor everything and fulfill all requirements, but it is too complicated for the different expert and administration teams to do anything in Operations Manager themselves. For example, if the Exchange team wants to monitor an event, they need to ask the Operations Manager team to create the rule. Of course the Operations Manager team doesn’t have time to do that the same day; instead there is a delay, and once the rule is created the Exchange team has already solved it in some other way. The result is that Operations Manager is not used as much as it should be.

In previous posts I showed how to handle overrides and groups in Operations Manager with self-service in Service Manager and a bit of Orchestrator. In this post I want to share an idea for handling new monitoring, for example creating new rules from the Service Manager self-service portal.

My example starts with a service request in the Service Manager portal. An engineer goes in and requests a new Windows event rule in Operations Manager. The engineer fills in the event ID, rule name, Windows log, service/system and alert name. A service request is created, and the service request contains a runbook activity.

The runbook activity triggers a “master runbook”. The master runbook first invokes a runbook that finds a suitable management pack, then invokes a runbook to create a new monitor or rule (I have only included the rule part so far), then invokes a runbook to import the management pack into Operations Manager, and finally invokes a runbook to update the service request.

The 60.3 Find MP runbook finds and returns the management pack to use. It uses the service parameter from the service request to select the management pack. All management packs that are in production are stored in a “production” folder. The “Check if MP exists” activity checks whether there is a management pack in that folder for the selected service. If there is, it copies it to an “archive” folder and returns the file path. If there is not, it writes a new management pack file and returns the path of that file. The “Write new MP file” activity writes all the needed XML code to a new XML file, using a number of input parameters.
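The Find MP logic can be sketched with Python's `pathlib` and `shutil`. The one-file-per-service naming (`<service>.xml`), writing a new MP into the production folder, and the `build_new_mp` callback are all hypothetical; the runbook itself uses Orchestrator file activities.

```python
import shutil
from pathlib import Path


def find_mp(service, production_dir, archive_dir, build_new_mp):
    """Return the path of the management pack file to use for `service`.

    Assumptions (hypothetical): one file per service named <service>.xml in the
    production folder, and build_new_mp(service) returns the XML for a new MP.
    """
    production_dir, archive_dir = Path(production_dir), Path(archive_dir)
    production_dir.mkdir(parents=True, exist_ok=True)
    mp_path = production_dir / f"{service}.xml"
    if mp_path.exists():
        # "Check if MP exists" found one: keep a copy in the archive folder first.
        archive_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(mp_path, archive_dir / mp_path.name)
    else:
        # "Write new MP file": create a new management pack from the skeleton.
        mp_path.write_text(build_new_mp(service), encoding="utf-8")
    return mp_path
```

Because the archive copy keeps the same file name, each run overwrites the previous archive copy; adding a timestamp to the name would keep a full history.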

 

The 60.2 Create Rule runbook first translates between the service parameter and the target needed in the management pack. In my example there is only one target, Windows 2008 Computer. The runbook then finds the <Rules>, <DisplayStrings> and <StringResources> sections of the management pack and adds the new rule. We use “Find” to know where in the management pack, on which line, to insert the new configuration. Each “Add Rule -” activity uses input parameters when writing the new rule.
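The “Find” step amounts to locating a section tag and inserting new lines right after it. A minimal Python sketch of that text manipulation, assuming the MP file is handled as a list of lines and each section appears as an opening tag on its own line (an empty `<Rules />` element would need handling first); the XML fragments passed in are hypothetical.

```python
def insert_after(lines, marker, new_lines):
    """Insert new_lines directly after the first line that contains marker."""
    for index, line in enumerate(lines):
        if marker in line:
            return lines[: index + 1] + list(new_lines) + lines[index + 1 :]
    raise ValueError(f"{marker} not found; the MP needs this section before rules can be added")


def add_rule(mp_lines, rule_xml, string_resource_xml, display_string_xml):
    """Add one rule to the <Rules>, <StringResources> and <DisplayStrings> sections."""
    lines = insert_after(mp_lines, "<Rules>", rule_xml)
    lines = insert_after(lines, "<StringResources>", string_resource_xml)
    return insert_after(lines, "<DisplayStrings>", display_string_xml)
```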

 

The 60.5 Import MP runbook runs a PowerShell script to import the management pack into Operations Manager. The last runbook, 60.4 Update Service Request, updates the service request with some information about the new management pack.

This example shows a way to use the self-service portal in Service Manager to order a new event rule in Operations Manager. Orchestrator builds the new rule in a management pack and imports it into Operations Manager. An engineer who doesn’t know much about Operations Manager can still “author” a new rule and import it into Operations Manager. You could include an approval step in the process, and you could also include a check on the Orchestrator side to make sure the management pack and the new rule follow best practices.

You can download my example runbooks here, 60 Create OM Rule. Please note that this is provided “as is” with no warranties at all. This is not a production-ready management pack or solution for your production environment, just an idea and an example.

The next version of this example could include version handling in each MP, which should be easy to build with a couple of counters. Information about the service request that requested the new management pack version could also be included in the management pack description, which is shown in the Operations Manager console.

Orchestrator dashboard in Operations Manager 2012

When you start to use Orchestrator to integrate services and execute workflows, you soon realize that you need an overview of what Orchestrator is actually doing. The Orchestrator management pack for Operations Manager 2012 gives you a good foundation for monitoring the Orchestrator infrastructure, but it tells you little about what Orchestrator is really doing. If Orchestrator is integrated with Service Manager, most runbooks run as activities in Service Manager work items, and we can use Service Manager reports to review what has been executed. In this example I will show you how to build a dashboard in Operations Manager 2012 that shows what is going on in Orchestrator.

With Operations Manager we can run a VB script and return the result as performance data. We can then use reports, performance views or dashboards to look at the performance data. In this example I have created a number of rules that run VB scripts every 15 minutes. Each script queries the Orchestrator database for some information and sends the result back to Operations Manager as performance data. Some of the rules could be merged into one SQL query, but as this is only an example and not a complete management pack, I did not rewrite them. In Operations Manager I have created a dashboard to show the data.

Each script has an override-controlled parameter, Script Arguments, that passes the Orchestrator database server and database name to the script. My example rules use a Run As profile named “Contoso – Orchestrator – DB read account” to configure which account to use when querying the Orchestrator database. With the default settings in this example, each query runs every 15 minutes and asks for data from the last hour.

 

My example dashboard includes five widgets; each widget shows a number of performance instances.

  • Queue
    • Pending Jobs, shows the number of runbooks with pending status, meaning they are waiting to start
    • Top minutes in queue, shows the number of minutes the longest-waiting job has been in the queue
  • Runbook Results
    • Success, shows the number of runbooks that have ended with a success result
    • Warning, shows the number of runbooks that have ended with a warning result
    • Failed, shows the number of runbooks that have ended with a failed result
  • Runbook Jobs. This widget shows the number of times each runbook has run with a success result, so you can easily see which runbook is executed most often. The names you see are the names of the runbooks.
  • Orchestrator Server Status, shows the status of my Orchestrator roles. In this sandbox all roles are on the same server, SCO01.
  • Orchestrator Alerts, shows alerts generated by my Orchestrator machine.
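The numbers behind the Queue, Runbook Results and Runbook Jobs widgets boil down to simple counting. The rules in the example are VB scripts querying the Orchestrator database; the Python sketch below only models the arithmetic over in-memory job records, and the `Job` fields and status strings are hypothetical, not the Orchestrator database schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Job:
    runbook: str
    status: str                      # hypothetical values: Pending, Success, Warning, Failed
    created: datetime
    ended: Optional[datetime] = None


def dashboard_metrics(jobs, now, window=timedelta(hours=1)):
    """Compute the values shown by the Queue, Runbook Results and Runbook Jobs widgets."""
    pending = [j for j in jobs if j.status == "Pending"]
    top_minutes = max(((now - j.created).total_seconds() / 60 for j in pending), default=0.0)
    # Only jobs that ended within the window (the last hour by default) are counted.
    recent = [j for j in jobs if j.ended is not None and j.ended >= now - window]
    results = Counter(j.status for j in recent)
    return {
        "Pending Jobs": len(pending),
        "Top minutes in queue": top_minutes,
        "Success": results["Success"],
        "Warning": results["Warning"],
        "Failed": results["Failed"],
        "Runbook Jobs": dict(Counter(j.runbook for j in recent if j.status == "Success")),
    }
```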

You can download my example MP here, NOT SUPPORTED – Contoso.Orchestrator – v2. Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready management pack or solution for your production environment, just an idea and an example.

As always, big thanks to Patrik for support and good discussions around System Center.

Other examples of scripts in rules that generate performance data can be found here

Building groups in Operations Manager, with a bit of Orchestrator magic

In many scenarios you have a list of servers, a database query result or a place in Active Directory that contains servers you want to monitor in some special way. Often you need the machines in a group in Operations Manager so you can, for example, create overrides, use maintenance mode and build views for that group. It is pretty boring work to build the group manually and then keep it updated.

One way to keep the Operations Manager group in sync with the machine list is to use a runbook that creates a management pack including a group based on the list. This set of example runbooks reads a list of machines and creates a management pack with a group that includes those machines. The list of servers could be generated by another runbook or another tool. The last runbook also imports the management pack into Operations Manager.

The first runbook executes the following steps. In general, it checks whether the machines in the list have an Operations Manager agent, that is, whether they are monitored by Operations Manager.

  1. Delete File. Deletes the old Machines_IDS.txt file if it exists. Machines_IDS.txt is used later in the runbook and needs to be blank before we begin
  2. Get Lines. Reads all lines in the list. The list is simply a text file with servers, one server per row
  3. Get Monitor. Checks if Operations Manager has a Microsoft.Windows.Computer monitor for the servers in the text file
  4. Append Line. For each machine that has a monitor, we write the machine name to a temporary file. This is the same file whose old version step 1 deleted
  5. Junction. Merges multiple threads together
  6. Invoke Runbook. Triggers the next runbook
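Steps 1 to 4 of the first runbook can be sketched in Python as below. `monitored` stands in for the computer names returned by the Get Monitor step (a hypothetical input), and the case-insensitive comparison and skipping of blank lines are assumptions.

```python
from pathlib import Path


def write_monitored_machines(list_file, monitored, output_file):
    """Steps 1-4 of the first runbook: keep only machines that Operations Manager monitors.

    monitored: hypothetical set of computer names returned by the Get Monitor step.
    """
    output = Path(output_file)
    output.unlink(missing_ok=True)                      # 1. Delete File
    names = [line.strip() for line in Path(list_file).read_text(encoding="utf-8").splitlines()]
    known = {name.lower() for name in monitored}        # assume case-insensitive names
    matches = [name for name in names if name and name.lower() in known]  # 2-3
    output.write_text("".join(f"{name}\n" for name in matches), encoding="utf-8")  # 4. Append Line
    return matches
```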
The second runbook executes the following steps. In general, it builds the management pack file in XML.
  1. Delete File. Deletes old MP files
  2. Modify Counter. We use a counter to keep track of the management pack version number. This step adds one to the counter value
  3. Get Counter Value. Gets the value of the same counter as in step 2
  4. Append Line. This step writes the first half of the XML code that needs to be in the management pack. The GroupInstanceID is a random ID that the Operations Manager console generated when I created a test group in the console. You could replace it and all the other names in the management pack.
  5. Read Line. This step reads every machine that the first runbook wrote to the machine list in its step 4
  6. Append Line. This step writes all the machines from step 5 into the management pack file
  7. Junction. Merges multiple threads together
  8. Append Line. Writes the end of the management pack, some more XML
  9. Invoke Runbook. Starts the last runbook and passes the path to the management pack file
The last runbook imports the management pack file into Operations Manager.
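The second runbook's assembly of the management pack, with the counter-based version, can be sketched as follows. The fragments and the `{version}`/`{machine}` placeholders are hypothetical stand-ins for the XML written in steps 4, 6 and 8, and the `1.0.0.<counter>` version format is an assumption.

```python
def build_group_mp(counter, machines, header_template, member_template, footer):
    """Assemble the group management pack text as the second runbook does.

    counter: current value of the version counter (Modify Counter / Get Counter Value).
    header_template / member_template / footer: hypothetical text fragments holding
    the XML from steps 4, 6 and 8; "{version}" and "{machine}" are placeholders.
    """
    new_counter = counter + 1                          # 2. Modify Counter: add one
    version = f"1.0.0.{new_counter}"                   # 3. version format is an assumption
    parts = [header_template.format(version=version)]  # 4. first half of the XML
    parts += [member_template.format(machine=m) for m in machines]  # 5-6. one entry per machine
    parts.append(footer)                               # 8. end of the management pack
    return new_counter, "\n".join(parts) + "\n"


counter, text = build_group_mp(4, ["SRV01", "SRV02"], "<MP Version='{version}'>", "<Member>{machine}</Member>", "</MP>")
print(counter)  # 5
```

Because `str.format` is used, any literal `{` or `}` in the XML fragments would have to be doubled.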

The result is that each time you run this set of runbooks, they generate a new management pack version with a group that includes all the machines from your list that have an agent. The management pack is imported into Operations Manager and you can use the updated group. You could include a step to seal the management pack too. You can download my runbook example here, 20120410_GroupSync_WOLF. Please note that this is provided “as is” with no warranties at all.

Automate Operations Manager with Orchestrator and Service Manager

Lately I have been working with a number of automation scenarios where we use Service Manager and Orchestrator to automate Operations Manager. The scenarios have not really been targeted at end users but at Operations Manager administrators and subject matter experts (SMEs), for example installing and uninstalling agents, maintenance mode and disabling alerts. All this can be done from the Operations Manager console, but that requires both knowledge and, of course, permissions. Another benefit of automating it based on a form is that it is done the same way every time, regardless of which administrator performs the task.

For example, if you want to allow subject matter experts, like your Exchange team, to install the Operations Manager agent on new Exchange machines, you need to give them Operations Manager administrator permissions. Alternatively, they contact your Operations Manager administrators’ team and request to get the agent installed. With the Service Manager 2012 self-service portal you can publish this as a service request. Anyone with enough Service Manager permissions can request an agent in the portal, an Operations Manager administrator can approve it, and then Orchestrator installs the agent.

Another common scenario is that a team gets an alert they don’t want, for example that a service is not running on server X. Instead of forwarding the ticket to the Operations Manager team, they could simply disable it for the server themselves, after an Operations Manager administrator has approved it in Service Manager.

In this blog post I want to share a runbook that disables a rule or monitor based on a ticket ID from Service Manager. The scenario is that a team, for example the Exchange team, gets an incident in Service Manager; the incident is generated from an alert in Operations Manager. After some thinking they realize that they no longer want to monitor this on that server. They browse to the self-service portal and input the incident ID (the ticket ID in Operations Manager). After approval, Orchestrator finds the rule or monitor that generated the alert and disables it for the machine. The Orchestrator runbook also handles the override management pack. The Exchange team could also browse to the Orchestrator console and start the runbook there.

Another question is whether this is a service request or a change request. The current build of Service Manager 2012 allows us to publish service requests, but not change requests, directly in the portal. Therefore I use service requests in this example.

  1. The first activity gets the ticket ID and target server name as input. Which Operations Manager management server the runbook uses is controlled by a variable. All activities use the same variable to contact the same management group.
  2. This activity formats the date and time used for the management pack backup folder
  3. PowerShell script that loads the Operations Manager module and gets the alert with the ticket ID we input when the runbook was started. The script then checks whether the alert comes from a rule or a monitor; we need to know this because different cmdlets are used to disable a rule and a monitor. Next, it finds the original management pack, the management pack that stores the alert source, and checks if there is a suitable override management pack. The override management pack name is “Organisation – original name”, for example “Contoso – Microsoft Unix Library”. “Contoso” is controlled by a variable. All steps echo to the log file, by default C:\TEMP\logfile.txt
  4. If there is no suitable override management pack, this activity creates it as an XML file in the C:\TEMP folder
  5. Imports the override management pack created in step 4
  6. Overrides the rule or the monitor
  7. If an override management pack already exists, this activity makes a backup of it. The backup is stored in C:\TEMP by default; the script creates a folder named with the current timestamp and stores the management pack in it
  8. Overrides the rule or the monitor
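The branching in steps 3 to 8 can be modelled as a short Python function that returns the override MP name and the actions to run. This is a sketch: the set of existing MP names, the timestamp format and the action labels are assumptions; in the runbook the disabling itself is done with Operations Manager PowerShell cmdlets, for example `Disable-SCOMRule` and `Disable-SCOMMonitor`.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def plan_override(source_kind, original_mp, existing_mps, organisation="Contoso",
                  temp_dir=r"C:\TEMP", now=None):
    """Return the override MP name and the ordered actions of the runbook.

    source_kind: "rule" or "monitor", as determined from the alert in step 3.
    existing_mps: hypothetical set of management pack names in the management group.
    """
    if source_kind not in ("rule", "monitor"):
        raise ValueError("source_kind must be 'rule' or 'monitor'")
    name = f"{organisation} – {original_mp}"    # naming convention from step 3
    if name in existing_mps:
        stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")  # timestamp format assumed
        actions = [("backup", str(PureWindowsPath(temp_dir) / stamp))]  # step 7
    else:
        actions = [("create", name), ("import", name)]  # steps 4 and 5
    actions.append((f"disable {source_kind}", name))     # step 6 or 8
    return name, actions
```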

An example of this could be that an alert is first generated in Operations Manager and synchronized to Service Manager. Service Manager writes the incident number back to Operations Manager.

An SME finds out that this rule or monitor can be disabled for this machine. The SME can then navigate to the Orchestrator console or browse to the Service Manager self-service portal and input the ticket ID and the server for which the monitoring should be disabled.

Either way the runbook will be executed and create the override

Summary: Self-service can be targeted at engineers and SME teams, not only end users. We use self-service through the Service Manager self-service portal or the Orchestrator console to execute tasks that the engineer normally doesn’t have the deep skills or permissions to do, in this example disabling a rule or monitor in Operations Manager. We save time both for the engineer and for the Operations Manager team. With the Orchestrator console we can use Orchestrator as a “proxy” to carry out complex tasks in a very simple way. With Service Manager we can build an approval flow to make sure every change is approved in a correct way.

If you want to build a runbook that deploys agents to untrusted networks or workgroup machines, where you need certificate-based authentication, look at Deploy OPSMGR agent to untrusted zones with Opalis. Also look at the posts about fault tolerance and logging. If you want to build a runbook to uninstall (and install) the Operations Manager agent that can be triggered from the Orchestrator console and the Service Manager self-service portal, here is an example.

You can download the runbook here, 20111227_DisableRuleMonitor. Please note that this is provided “as is” with no warranties at all.

AEM cross-forest

Monitoring desktop client hardware, operating system and application faults can be of great value in reducing total cost of ownership (TCO) through identification of widespread faults in the monitored environment. In Operations Manager, Agentless Exception Monitoring (AEM) can help you with that. AEM enables you to monitor operating systems and applications for errors. Client monitoring configures clients to send error reports to an Operations Manager management server. With the reporting function, client monitoring can then generate reports based on these errors. AEM clients don’t need an agent, and all client settings are controlled by a group policy.

This week I tested running AEM between two untrusted Active Directory forests, and it seems to work fine. The biggest challenges are getting name resolution to work, finding a common share to write error logs to, and the design of Operations Manager. Each computer that reports an application error with AEM will be seen as a Windows Computer with unknown state. Therefore you might want to use a dedicated management group for AEM and client monitoring in general. In the picture below you see an AEM view from the OM12 console, including errors from two different forests.

More info around AEM here.

OM12: SSL Certificate Error

This week, when I was working with Linux monitoring from a resource pool in Operations Manager 2012, I got this alert:

SSL Certificate Error
Description: The SSL Certificate used by the Agent has a configuration error.

The server certificate on the destination computer (SERVERNAME:1270) has the following errors: The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable. The SSL certificate is signed by an unknown certificate authority.

During discovery of Linux/UNIX machines, when the discovery wizard deploys an agent, Operations Manager retrieves the certificate from the agent, signs it, deploys it back to the agent, and then restarts the agent. Operations Manager uses certificates to authenticate access to the agent computer. During this process the certificate is signed by the management server that performs the discovery. So even if you configure monitoring to be executed by a resource pool, one of those management servers performs the discovery. After discovery, any management server in the resource pool can do the work, for example run a task against the machine.

You get this alert when another management server in the resource pool tries to communicate with the Linux/UNIX machine. The root cause is that the other management server does not trust the first management server, the one that signed the certificate, as a certificate authority. Therefore, when it communicates with the Linux/UNIX machine, the management server reports that the machine is using an untrusted certificate. The resolution is to copy the certificate of the management server that signed the agent certificate (the Linux/UNIX machine’s certificate in this case) to all other management servers in the resource pool.
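The root cause can be expressed as a small trust check. A Python sketch, where `trusted_signers` is a hypothetical record of which management servers have imported which signing certificates; every server it returns is one you would copy the signing server's certificate to.

```python
def servers_raising_ssl_alert(pool, signer, trusted_signers):
    """Return resource pool members that would raise the SSL Certificate Error alert.

    signer: the management server that discovered the agent and signed its certificate.
    trusted_signers: hypothetical mapping server -> set of management servers whose
    signing certificate that server has imported as trusted.
    """
    return sorted(
        server for server in pool
        if server != signer and signer not in trusted_signers.get(server, set())
    )


pool = ["MS01", "MS02", "MS03"]
print(servers_raising_ssl_alert(pool, "MS01", {"MS02": {"MS01"}}))  # ['MS03']
```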

More info how to copy the certificate between management servers is here.