Contoso.se

Welcome to contoso.se! My name is Anders Bengtsson and this is my blog about Azure infrastructure and system management. I am a senior engineer in the FastTrack for Azure team, part of Azure Engineering, at Microsoft. Contoso.se has two main purposes: first, as a platform to share information with the community, and second, as a notebook for myself.

Everything you read here is my own personal opinion and any code is provided "AS-IS" with no warranties.

Anders Bengtsson

MVP
MVP awarded 2007, 2008, 2009, 2010

My Books
Service Manager Unleashed
Orchestrator 2012 Unleashed
Inside the Microsoft Operations Management Suite

Building reports with Log Analytics data

A common question I see is how to present the data collected with Log Analytics. We can use View Designer in Log Analytics, Power BI, Azure Dashboards, and Excel PowerPivot. But in this blog post, I would like to show another way to build a “report” directly in the Azure portal on top of Log Analytics data.

Workbooks are a feature in Application Insights for building interactive reports. Workbooks are configured under Application Insights, but they can also query data from Log Analytics.

In this example, we will build a workbook for failed logins in Active Directory. The source data (event ID 4625) is collected by the Security and Audit solution in Log Analytics.

If we run a query in Log Analytics to show these events, we can easily see the failed login reasons and the number of events. But we would also like to drill down into these events and see the account names. That is not possible in Log Analytics today, and this is where workbooks bring value.
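As a rough sketch (not the exact query from this post), a query along these lines, using the SecurityEvent table populated by the Security and Audit solution, shows the number of failed logons per failure sub status code:

SecurityEvent
| where AccountType == 'User' and EventID == 4625
| where TimeGenerated > ago(30d)
// count failed logons per sub status (failure reason) code
| summarize FailedLogons = count() by SubStatus
| order by FailedLogons desc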

Any Application Insights instance can be used; no data needs to be collected by the instance (no extra cost) as we will use Log Analytics as a data source. In Application Insights, there are some default workbooks and quick start templates. For this example, we will use the “Default Template.”

In the workbook, we can configure any Log Analytics workspace, in any subscription, as a source. Using different workspaces for different parts of the workbook is possible. The query used in this example is shown below; note that it returns data for the last 30 days.

SecurityEvent
| where AccountType == 'User' and EventID == 4625
| where TimeGenerated > ago(30d)
| extend Reason = case(
    SubStatus == '0xc000005e', 'No logon servers available to service the logon request',
    SubStatus == '0xc0000062', 'Account name is not properly formatted',
    SubStatus == '0xc0000064', 'Account name does not exist',
    SubStatus == '0xc000006a', 'Incorrect password',
    SubStatus == '0xc000006d', 'Bad user name or password',
    SubStatus == '0xc000006e', 'User logon blocked by account restriction',
    SubStatus == '0xc000006f', 'User logon outside of restricted logon hours',
    SubStatus == '0xc0000070', 'User logon blocked by workstation restriction',
    SubStatus == '0xc0000071', 'Password has expired',
    SubStatus == '0xc0000072', 'Account is disabled',
    SubStatus == '0xc0000133', 'Clocks between DC and other computer too far out of sync',
    SubStatus == '0xc000015b', 'The user has not been granted the requested logon right at this machine',
    SubStatus == '0xc0000193', 'Account has expired',
    SubStatus == '0xc0000224', 'User is required to change password at next logon',
    SubStatus == '0xc0000234', 'Account is currently locked out',
    strcat('Unknown reason substatus: ', SubStatus))
| project TimeGenerated, Account, Reason, Computer

In the workbook, under Column Settings, we can configure how the result is grouped. In this example, we group by failed login reason and then by account name.

When running the workbook, we get a list of failed login reasons and can expand each reason to see account names and the number of failed logins. It is possible to add an extra filter to the query to remove “noise”, for example accounts with fewer than three failed login events, as in the sketch after this paragraph.
It is also possible to pin a workbook, or part of a workbook, to an Azure dashboard to easily access the information.
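One possible way to express such a noise filter, as a sketch on top of the query above, is to keep only events from accounts with three or more failed logons:

SecurityEvent
| where AccountType == 'User' and EventID == 4625
| where TimeGenerated > ago(30d)
// extra filter: only keep accounts with three or more failed logons
| where Account in (
    SecurityEvent
    | where AccountType == 'User' and EventID == 4625
    | where TimeGenerated > ago(30d)
    | summarize FailedLogons = count() by Account
    | where FailedLogons >= 3
    | project Account
    )
| project TimeGenerated, Account, Computer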

In the workbook you can also add more text fields, metric fields, and query fields, for example a time chart showing the number of events per day, as in the example below.
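A minimal sketch of such a time chart query, reusing the failed logon data from this example, could look like this:

SecurityEvent
| where AccountType == 'User' and EventID == 4625
| where TimeGenerated > ago(30d)
// number of failed logons per day, rendered as a time chart
| summarize FailedLogons = count() by bin(TimeGenerated, 1d)
| render timechart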

Ingestion Time in Log Analytics

A common topic around Log Analytics is ingestion time. How long does it take before an event is visible in Log Analytics?
The latency depends on three main areas: agent time, pipeline time, and indexing time. This is all described in this Microsoft Docs article.

In Log Analytics (Kusto), each table has a hidden DateTime column called IngestionTime, and the time of ingestion is recorded in that column for each record. IngestionTime can be used to estimate the end-to-end latency of ingesting data into Log Analytics. TimeGenerated is a timestamp from the source system, for example a Windows server. By comparing TimeGenerated and IngestionTime we can estimate the latency in getting the data into Log Analytics. More info about the IngestionTime policy here.

In the image below a test event is generated on a Windows server; note the timestamp (Logged).

When the event is in Log Analytics, we can find it and compare IngestionTime and TimeGenerated. We can see that the difference is around a second. TimeGenerated is the same as “Logged” on the source system. This is just an estimate, as the clocks on the server and in Log Analytics might not be in sync.
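A quick way to look at the two timestamps side by side for recent events, as a sketch against the Event table, is something like this:

Event
| where TimeGenerated > ago(1h)
// compare the source timestamp with the ingestion timestamp, in seconds
| extend IngestionTime = ingestion_time(),
         LatencyInSeconds = datetime_diff('second', ingestion_time(), TimeGenerated)
| project TimeGenerated, IngestionTime, LatencyInSeconds, Computer, EventID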

If we want to calculate the estimated latency, we can use the following query. It takes all events in the Event table, estimates the latency in minutes, and orders the result by latency.

Event
| extend LatencyInMinutes = datetime_diff('minute', ingestion_time(), TimeGenerated)
| project TimeGenerated, IngestionTime = ingestion_time(), LatencyInMinutes
| order by LatencyInMinutes

You can also summarize the average latency per hour, and generate a chart, with the following query. This is useful when investigating latency over a longer period of time.

Event
| extend LatencyInMinutes = datetime_diff('minute', ingestion_time(), TimeGenerated)
| summarize avg(LatencyInMinutes) by bin(TimeGenerated, 1h)
| render timechart

Disclaimer: Cloud is a very fast-moving target. It means that by the time you’re reading this post everything described here could have been changed completely.
Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready solution for your production environment, just an idea, and an example.

Analyze and visualize Azure Firewall with Log Analytics View Designer

A colleague and I have put together a sample view for Log Analytics to analyze and visualize Azure Firewall logs. You can download the sample view here. The sample view visualizes data around application rule and network rule logs. With View Designer in Azure Log Analytics, you can create custom views to visualize data in your Log Analytics workspace; read more about View Designer here.

Monitor Linux Daemon with Log Analytics

In this blog post I would like to share an example of how daemons on Linux machines can be monitored with Log Analytics. Monitoring daemons is not listed as a feature in the Log Analytics portal, but it is possible to do. When a daemon is started or stopped, a line is written to Syslog, and the Microsoft Monitoring Agent can read Syslog and send it to Log Analytics.

The only thing to configure is to enable collection of Syslog, including the daemon facility.

If the daemon is stopped (the cron daemon in this example), the following lines are written to the syslog log file.

Soon after, the same lines show up in Log Analytics as events in the Syslog table.
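To verify this, a simple sketch of a query against the Syslog table, filtered on the daemon facility, lists the most recent daemon events:

Syslog
// most recent events from the daemon facility sent by the agent
| where Facility == "daemon"
| sort by TimeGenerated desc
| project TimeGenerated, Computer, SyslogMessage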

You can now configure an alert, including a notification, for when the daemon stops. The alert can, for example, be visualized in Azure Monitor and sent by e-mail. A query along the lines of the sketch below could be used as the alert condition.
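The sketch below is only one possibility; the message text used to detect a stop ('Stopped', filtered to cron) is an assumption and differs between distributions and init systems, so adjust it to match the events you saw above.

Syslog
| where Facility == "daemon"
// the message text below is an assumption; match it to what your
// distribution writes to syslog when the daemon stops
| where SyslogMessage has "cron" and SyslogMessage has "Stopped"
| summarize StopEvents = count() by Computer, bin(TimeGenerated, 5m)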

Disclaimer: Cloud is a very fast-moving target. It means that by the time you’re reading this post everything described here could have been changed completely.
Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready solution for your production environment, just an idea, and an example.

Deploying a central auditing workspace with Log Analytics

One or multiple workspaces

A common question when discussing Log Analytics design is whether to use one or multiple workspaces. Should there be one central workspace with all data? Should there be one workspace per application? Should there be one workspace for the auditing team? There are many different ideas and scenarios, but a common component is a central workspace for auditing: one workspace where a central team can track security-related events, assessments, and alerts.

The following topics are often involved in the decision to use one or multiple workspaces:

  • Data Region. In which region do we need to store the data? For example, the data might have to be stored within the EU.
  • Data Retention. The number of days to store the data is configured on the workspace level. That means we pay for the same retention setting for all data within a workspace. If some data only needs to be stored for 7 days and some important data needs to be stored for 200 days, we will need to store, and pay for, all data for 200 days.
  • Data Access. Today the workspace is the security boundary for Log Analytics. If we, for example, have log data from two different teams that are not allowed to see each other’s data, we will need to store it in different workspaces.
  • Data Collection. Today, solutions and data collection settings are configured on the workspace level. For example, if we enable collection of warnings in the Application log on Windows servers, they will be collected from all connected Windows servers, even if we only need them from some of the servers. This can increase the total cost by collecting data that is not needed. In this scenario, it might be an idea to connect some servers to one workspace and others to another workspace.

If the decision is to use multiple workspaces, it is possible to multi-home Windows servers to send data to multiple workspaces. Linux servers and some other data sources, for example many PaaS services, can today only send data to a single workspace. One thing to note when configuring multi-homed data sources is that if the same data is collected and inserted into multiple workspaces, we also pay for that data twice. In other words, it is a good idea to make sure that different kinds of data are collected for each workspace, for example audit data to one workspace and application logs to another.

The following figure describes a scenario where two application teams have their own workspaces, and there is one workspace for central auditing. The auditing team needs access to data from both service workspaces, to run analyses and verify that everything is running according to company policies.

To deploy this scenario, simply deploy three workspaces and give the central auditing team read permissions on each service workspace; see Microsoft Docs for more details.

Cross workspace queries

The next step is to start authoring queries to analyze and visualize the data. The data is stored in each service workspace, so the auditing team will need to use the cross-workspace query feature (read more about it at https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-cross-workspace-search). Data is only stored in the two service workspaces; there is no data in the central auditing workspace.

The following query is a cross-workspace query example; it queries two workspaces and lists failed logon events. In the query we use “isfuzzy” to tell Log Analytics that execution of the query continues even if one of the underlying tables or view references is not present. We can also see the two workspace IDs, one for each service workspace, and that we use the SecurityEvent table.

union isfuzzy=true
workspace("b111d916-5556-4b3c-87cf-f8d93dad7ea0").SecurityEvent, workspace("0a9de77d-650f-4bb1-b12f-9bcdb6fb3652").SecurityEvent
| where EventID == 4625 and AccountType == 'User'
| extend LowerAccount=tolower(Account)
| summarize Failed = count() by LowerAccount
| order by Failed desc

The following example shows all failed security baseline checks for the two service workspaces

union isfuzzy=true
workspace("b111d916-5556-4b3c-87cf-f8d93dad7ea0").SecurityBaseline, workspace("0a9de77d-650f-4bb1-b12f-9bcdb6fb3652").SecurityBaseline
| where ( RuleSeverity == "Critical" )
| where ( AnalyzeResult == "Failed" )
| project Computer, Description

To make cross-workspace queries a bit easier, we can create a function. For example, run the following query and then save it as a function, in this example named ContosoSecEvents.

union isfuzzy=true
workspace("b111d916-5556-4b3c-87cf-f8d93dad7ea0").SecurityBaseline, workspace("0a9de77d-650f-4bb1-b12f-9bcdb6fb3652").SecurityBaseline

We can then call the function in our queries, for example, to get all failed security baseline checks. We don’t need to specify workspaces to join, as they are handled by the function.

ContosoSecEvents
| where ( RuleSeverity == "Critical" )
| where ( AnalyzeResult == "Failed" )
| project Computer, Description

Another way of using saved functions is shown in the following example.
First, we have a saved function named ContosoCompMissingUpdates that lists all computers missing updates.

union isfuzzy=true
workspace("b111d916-5556-4b3c-87cf-f8d93dad7ea0").Update, workspace("0a9de77d-650f-4bb1-b12f-9bcdb6fb3652").Update
| where UpdateState == 'Needed' and Optional == false and Classification == 'Security Updates' and Approved != false
| distinct Computer

We can then use the ContosoCompMissingUpdates function within a query showing machines with failed security baseline checks. The result is a list of machines that are missing updates and have failed baseline checks.

ContosoSecEvents
| where ( RuleSeverity == "Critical" )
| where ( AnalyzeResult == "Failed" )
| where Computer in (ContosoCompMissingUpdates)
| project Computer, Description

Disclaimer: Cloud is a very fast-moving target. It means that by the time you’re reading this post everything described here could have been changed completely.
Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready solution for your production environment, just an idea, and an example.

Exporting Azure Resource Manager templates with Azure Automation, and protecting them with Azure Backup

Earlier this week I put together a runbook to back up Azure Resource Manager (ARM) templates for existing resource groups. The runbook exports the resource group as a template and saves it to a JSON file. The JSON file is then uploaded to an Azure file share that can be protected with Azure Backup.

The runbook can be downloaded from here, PS100-ExportRGConfig. The runbook format is PowerShell. The runbook might require an Azure PS module upgrade. I have noticed that in some new Azure Automation accounts, the AzureRM.Resources module doesn’t include Export-AzureRmResourceGroup and needs an update.

Inside of the runbook, you need to configure the following variables:

  • Resourcegrouptoexport, this is the Resource Group you would like to export to a JSON file.
  • storageRG, this is the name of the Resource Group that contains the file share you want to upload the JSON file to.
  • storageAccountName, this is the name of the storage account that contains the Azure file share.
  • filesharename, this is the name of the Azure file share in the storage account. On the Azure file share, there needs to be a directory named templates. You will need to create that directory manually.

When you run the runbook you might see warning messages. There might be some cases where the PowerShell cmdlet fails to generate some parts of the template. Warning messages will inform you of the resources that failed. The template will still be generated for the parts that were successful.

Once the JSON file is written to the Azure File Share you can protect the Azure file share with Azure Backup. Read more about backup for Azure file shares here.

Disclaimer: Cloud is a very fast-moving environment. It means that by the time you’re reading this post everything described here could have been changed completely. Note that this is provided “AS-IS” with no warranties at all. This is not a production ready solution for your production environment, just an idea, and an example.

“Argument is null or empty” error when running post-steps script in Azure Site Recovery

A couple of days ago I was working on Azure Site Recovery post-steps scripts with my colleague Jonathan. The scenario was to fail over two virtual machines running in the West Europe Azure region to the North Europe Azure region. Enabling replication between two Azure regions is not complicated, but not all components are supported for failover between regions; for example, public IP addresses are not. To get all the details about supported scenarios, see the Azure Site Recovery support matrix here. To set up networking after failover, we wrote an Azure Automation runbook and connected the runbook to our recovery plan as a post-steps script. After Azure Site Recovery has run all failover steps, it triggers the post-steps scripts. But we ran into some strange errors in our post-steps scripts and would like to share the solution with the community.

When doing a test failover, everything looked OK from the Azure Site Recovery perspective. Our two virtual machines failed over and the first script was triggered. The script adds public IP addresses to the two machines.

But when looking at the runbook job in Azure Automation, we could see that something was not working.


The runbook could not find the new virtual machine resources in the pre-created resource group. After a couple of different tests, we realized that the new Azure virtual machine resources in North Europe were not ready when the runbook was triggered by the recovery plan.

If we added a small delay of a couple of minutes in the script, everything worked perfectly 🙂

Building an Azure dashboard with server performance data

I guess all of you have seen the dashboards in Azure, the first page when logging in to the Azure portal. In some cases there are resources that you pinned by mistake while deploying, and in some cases it is just blank. In this blog post, I would like to share how to build a simple server health dashboard with basic performance data from servers (CPU workload, free disk space, and free memory). To set this up we need to do three main tasks:

  1. Connect Data sources to get data into Log Analytics
  2. Configure queries to collect the needed data
  3. Pin dashboard/view to Azure Dashboard

Connect Data Sources

If your servers are Azure virtual machines you can read some performance data (see image) directly from the VM using the Azure VM agent, but unfortunately nothing about free memory, CPU, or free disk space.

To collect the required data we need to install an agent inside the OS. Azure Log Analytics (often called OMS) provides features for collecting data from different sources. Log Analytics also provides features for visualizing and analyzing the collected data.

For Log Analytics we first need to install the agent on all servers; more information about that here. Once all servers are connected to the workspace, the next step is to start collecting performance data. You can enable specific performance counters under Advanced settings / Data / Windows Performance Counters or Linux Performance Counters. The image shows Windows Performance Counters, but the steps are the same for Linux servers: install the agent and then enable the performance counters.

In this example, we will add the following performance counters and configure a sample interval of 10 seconds.

  • Memory(*)\Available MBytes
  • LogicalDisk(*)\Free Megabytes
  • LogicalDisk(*)\% Free Space
  • Processor(_Total)\% Processor Time

Building Queries

The next step is to configure queries to visualize the collected data. There is a lot of good information about building queries and working with performance data here and here, but to save you some time you can use the following queries as a foundation.

These queries show the average for each minute (1-minute bins), based on the data we collect every 10 seconds.

Disk, % Free Space. This query shows % free space on each logical disk that has an instance name containing “:” (this filters out, for example, mount point volumes on DPM servers).

Perf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where InstanceName contains ":"
| summarize FreeSpaceP = avg(CounterValue) by bin(TimeGenerated, 1minutes), CounterPath
| sort by TimeGenerated desc
| render timechart

Disk, Free Megabytes

Perf
| where ObjectName == "LogicalDisk" and CounterName == "Free Megabytes"
| where InstanceName contains ":"
| summarize FreeSpaceMb = avg(CounterValue) by bin(TimeGenerated, 1minutes), CounterPath
| sort by TimeGenerated desc
| render timechart

Memory, Available MBytes

Perf
| where ObjectName == "Memory"
| where CounterName == "Available MBytes"
| summarize FreeMemMb = avg(CounterValue) by bin(TimeGenerated, 1minutes), CounterPath
| sort by TimeGenerated desc
| render timechart

Processor, % Processor Time

Perf
| where ObjectName == "Processor"
| where CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize CPU = avg(CounterValue) by bin(TimeGenerated, 1minutes), CounterPath
| sort by TimeGenerated desc
| render timechart

It can take some time before the first data is collected. If you don’t see any data when you run the queries, take another cup of coffee and try again a bit later 😊
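If you want to check what has arrived while you wait, a quick sanity-check sketch against the Perf table lists which counters have reported data in the last hour:

Perf
// list which performance counters have reported data in the last hour
| where TimeGenerated > ago(1h)
| summarize Samples = count(), LastSample = max(TimeGenerated) by ObjectName, CounterName
| order by Samples desc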

Building a view

We now have all data sources connected and queries to visualize the data. The next step is to build views in Log Analytics. This is not a requirement for building an Azure dashboard, but it is nice to have.

Log Analytics View Designer is a feature that we can use to build custom views. These views can later be pinned to the Azure dashboard. To save you some time you can download “Contoso Example Log Analytics Dashboard” (Contoso Servers) and import it into View Designer.

Pin tiles to the Azure Dashboard – Log Analytics tile

There are two ways to pin a tile to the Azure dashboard that we will look at. The first one is to right-click a tile in Log Analytics and select Pin to dashboard. You can see this process in the following two images. On the Azure dashboard you will see the view tile from the Log Analytics solution; if you click it, you will go into Log Analytics and the specific solution. With this approach, you can’t see processor, memory, or disk performance directly on the Azure dashboard.

Pin tiles to the Azure Dashboard – Advanced Analytics

The second alternative is to pin charts directly from the Advanced Analytics portal inside Log Analytics. The Advanced Analytics feature provides functionality that is not available in the Log Search portal, for example Smart Analytics. In Log Analytics, click Analytics to open the Advanced Analytics portal. In the Advanced Analytics portal, run the queries from this blog post and click “Pin” on the right side. Once the different charts/queries are pinned to the Azure dashboard, you can select them, click Edit, and change their title and description.

Disclaimer: Cloud is a very fast-moving target. It means that by the time you’re reading this post everything described here could have been changed completely.
Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready solution for your production environment, just an idea and an example.

Inside the Microsoft Operations Management Suite [e-book] version 2 now public

It took some time and a lot of writing, testing, editing, investigation, and discussion, but now it is finally here!

Download your copy here.

Big thanks for the discussions and collaboration to the authoring team: Tao Yang, Stanislav Zhelyazkov, Pete Zerger, and Kevin Greene.

Happy reading 🙂

Keep your Azure subscription tidy with Azure Automation and Log Analytics

When delivering Azure training or Azure engagements there is always a discussion about how important it is to have a policy and a lifecycle for Azure resources. Not only do we need a process to deploy resources to Azure, we also need a process to remove resources. From a cost perspective this is extra important, as an orphaned IP address or disk will cost money even if it is not in use. We also need policies to make sure everything is configured according to company standards. Much can be solved with ARM policies, but not everything; for example, you can’t use them to make sure all resources have locks configured.

To keep the Azure subscription tidy and to get an event/recommendation when something is not configured correctly, we can use Azure Automation and OMS Log Analytics. In this blog post, I will show an example of how this can be done 😊 The data flow is:

  1. An Azure Automation runbook is triggered on a schedule or run manually. The runbook runs several checks, for example whether there are any orphaned disks.
  2. If there is anything that should be investigated, an event is created in OMS Log Analytics.
  3. In the OMS portal, we can build a dashboard to get a good overview of these events.

The example dashboard shows (download the example dashboard here):

  • Total number of recommendations/events
  • Number of resource types with recommendations
  • Number of resource groups with recommendations. If each resource group corresponds to a service, it is easy to see the number of services that are not configured according to policy

The runbook in this example checks if there are any disks without an owner, any VMs without auto-shutdown configured, any public IP addresses not in use, and any databases without a lock configured. The runbook is based on PowerShell and it is easy to add more checks. The runbook submits data to OMS Log Analytics with Tao Yang’s OMSDataInjection PowerShell module, download here. The data shows up in Log Analytics as a custom log called ContosoAzureCompliance_CL. The name of the log can be changed in the runbook.

The figure below shows the log search interface in the OMS portal. On the left side, you can see that we can filter based on resource, resource type, severity, and resource group. This makes it easy to drill into a specific type of resource or resource group, for example with a query like the sketch below.
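As a rough illustration, the custom log could be summarized per resource group like this; the field names (ResourceGroup_s, Severity_s) are assumptions based on how custom log string fields are typically suffixed, so adjust them to match what the runbook actually submits.

ContosoAzureCompliance_CL
// count recommendations per resource group and severity
// field names are assumptions; custom log string fields get an _s suffix
| summarize Recommendations = count() by ResourceGroup_s, Severity_s
| order by Recommendations desc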

Disclaimer: Cloud is a very fast-moving target. It means that by the time you’re reading this post everything described here could have been changed completely.
Note that this is provided “AS-IS” with no warranties at all. This is not a production-ready solution for your production environment, just an idea and an example.