Building an NSG logger

https://github.com/securethelogs/Azure_NSGLogger

My first attempt at a logger was a short-term fix: https://securethelogs.com/2020/09/17/view-azure-nsg-flow-logs-in-powershell/

It quickly became apparent that this wouldn’t be a long-term way to avoid the Log Analytics costs.

The reason is the sheer volume of logs being generated and the awkward way Microsoft stores them. The blob container grows quickly, and because every hour is split into its own set of blobs, a single hour can mean more than 50 files, which becomes a nightmare. This is a typical path to a single log:

https://{storageAccountName}.blob.core.windows.net/insights-logs-networksecuritygroupflowevent/resourceId=/SUBSCRIPTIONS/{subscriptionID}/RESOURCEGROUPS/{resourceGroupName}/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/{nsgName}/y={year}/m={month}/d={day}/h={hour}/m=00/macAddress={macAddress}/PT1H.json
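To make the path layout concrete, here is a minimal sketch that splits a blob path of this shape into its parts. The function name and the sample values are hypothetical, and it assumes the date parts in the path are UTC.

```powershell
# Hypothetical helper: split an NSG flow log blob path into its parts.
function ConvertFrom-NsgFlowLogPath {
    param([Parameter(Mandatory)][string] $Path)

    # Named groups follow the layout of the path shown above.
    $pattern = 'SUBSCRIPTIONS/(?<sub>[^/]+)/RESOURCEGROUPS/(?<rg>[^/]+)/PROVIDERS/MICROSOFT\.NETWORK/NETWORKSECURITYGROUPS/(?<nsg>[^/]+)/y=(?<y>\d{4})/m=(?<mo>\d{2})/d=(?<d>\d{2})/h=(?<h>\d{2})/m=(?<mi>\d{2})/macAddress=(?<mac>[^/]+)/PT1H\.json$'

    if ($Path -notmatch $pattern) {
        throw "Path does not look like an NSG flow log blob: $Path"
    }

    [pscustomobject]@{
        SubscriptionId = $Matches['sub']
        ResourceGroup  = $Matches['rg']
        Nsg            = $Matches['nsg']
        # Assumption: the y/m/d/h segments are UTC.
        Hour           = [datetime]::new([int]$Matches['y'], [int]$Matches['mo'], [int]$Matches['d'], [int]$Matches['h'], 0, 0, [DateTimeKind]::Utc)
        MacAddress     = $Matches['mac']
    }
}

# Made-up sample values, for illustration only.
$sample = 'resourceId=/SUBSCRIPTIONS/0000-1111/RESOURCEGROUPS/RG1/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/NSG1/y=2020/m=09/d=17/h=14/m=00/macAddress=000D3AF87856/PT1H.json'
ConvertFrom-NsgFlowLogPath -Path $sample
```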

At first I targeted only the blobs created on the current day, but when troubleshooting live you need something closer to the actual time. Pulling a whole day is great, but takes ages to process. To get a single hour, my script queried the path using the client’s time (-hour) and filtered on h=XX.
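As a rough sketch of that hour filter, the prefix for a given hour can be built up front rather than filtering a full day afterwards. The function and parameter names are hypothetical. It assumes the folder names are UTC and that the name segments are upper-cased in the real path (blob prefixes are case-sensitive, so check this against your own container).

```powershell
# Hypothetical helper: build the blob prefix for one hour of one NSG's logs.
function Get-NsgFlowLogHourPrefix {
    param(
        [Parameter(Mandatory)][string] $SubscriptionId,
        [Parameter(Mandatory)][string] $ResourceGroupName,
        [Parameter(Mandatory)][string] $NsgName,
        [int] $HoursBack = 1,
        [datetime] $Now = [datetime]::UtcNow
    )

    $inv = [cultureinfo]::InvariantCulture
    # Assumption: the y=/m=/d=/h= folders use UTC, so convert before stepping back.
    $t = $Now.ToUniversalTime().AddHours(-$HoursBack)

    $template = 'resourceId=/SUBSCRIPTIONS/{0}/RESOURCEGROUPS/{1}/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/{2}/y={3}/m={4}/d={5}/h={6}/'
    $values = @(
        $SubscriptionId.ToUpperInvariant(),
        $ResourceGroupName.ToUpperInvariant(),
        $NsgName.ToUpperInvariant(),
        $t.ToString('yyyy', $inv),
        $t.ToString('MM', $inv),
        $t.ToString('dd', $inv),
        $t.ToString('HH', $inv)
    )
    $template -f $values
}

# Made-up names, for illustration only.
Get-NsgFlowLogHourPrefix -SubscriptionId '0000-1111' -ResourceGroupName 'rg1' -NsgName 'nsg1'
```

The resulting string could then be passed to Get-AzStorageBlob with -Prefix, so that only the blobs for that hour are listed.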

I later moved on to -Context $context

Although this halved the runtime of the original script, it still required manual input. It worked as a solution, but if two users ran the script at the same time they would essentially be running the same thing in parallel. That brings waste and cost; it simply wasn’t efficient enough.

I considered Azure Automation, as I had done something similar in the past, but it had its downsides. Azure Automation is charged on runtime, and parsing the logs in hour blocks was taking too long. I needed something real-time that didn’t have to include the logic to spot newly written NSG flow logs. I also didn’t have an output. I thought about using the grid, the runtime log output, or uploading to the same blob container, but this again felt clunky and not the way to go.

Microsoft BI Tools: Azure Automation

At this point I was racking my brain to solve a user-experience problem rather than build a bigger solution with a lot of waste. I wanted it cheap, fast, and nice and easy to use. Because NSG flow logs are similar to traditional firewall logs, I wanted the same experience you get from real-time log analysers.

Along comes Gobi…

Gobi brought up the idea of Azure Functions and Tables. Not only can a Function be triggered whenever a new blob is created, it can also output the logs as they come in. This reduces the lag and the runtime needed to process each log (JSON to a readable format) and takes the automation to another level.

Microsoft's Serverless Glue: Azure Functions | by Earl Gay | Medium

On the first trial, the script was stripped of the manual input and shortened by more than half. On each trigger, the function is passed the path to the JSON blob, so there is little need to query and filter. Instead it parses the logs at each event and outputs them to a table. This is a storage table that can be hosted within the same storage account.

Not only does this remove the need to grant the function app access (RBAC) across multiple storage accounts, it also means that to view a region’s NSG logs you simply go to that region’s storage account and select Storage Explorer. It has to be done this way because the storage account must be in the same region as the NSGs it logs.

You can use the desktop client, but personally I find it slow. You spend the first 5 minutes looking at the logo before it tells you “Your account has an issue”, yet it still works fine. You can also use PowerShell, but I wanted something that didn’t need modules installed or have additional requirements just to view the logs.

So how did we do it?
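To illustrate the "JSON to readable format" step, here is a minimal sketch of flattening a version 2 flow log into one row per flow tuple. It is not the code from my repo: the function names and output field names are hypothetical, and it only reads the first eight comma-separated fields of each tuple (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision).

```powershell
# Hypothetical helper: turn one flow tuple string into a readable object.
function ConvertFrom-NsgFlowTuple {
    param([Parameter(Mandatory)][string] $Tuple)

    $f = $Tuple.Split(',')
    [pscustomobject]@{
        # The first field is a Unix timestamp in seconds.
        Timestamp       = [DateTimeOffset]::FromUnixTimeSeconds([long]$f[0]).UtcDateTime
        SourceIp        = $f[1]
        DestinationIp   = $f[2]
        SourcePort      = $f[3]
        DestinationPort = $f[4]
        Protocol        = if ($f[5] -eq 'T') { 'TCP' } else { 'UDP' }
        Direction       = if ($f[6] -eq 'I') { 'Inbound' } else { 'Outbound' }
        Decision        = if ($f[7] -eq 'A') { 'Allow' } else { 'Deny' }
    }
}

# Hypothetical helper: walk records > rules > flows > tuples in a flow log blob.
function ConvertFrom-NsgFlowLog {
    param([Parameter(Mandatory)][string] $Json)

    $log = $Json | ConvertFrom-Json
    foreach ($record in $log.records) {
        foreach ($rule in $record.properties.flows) {
            foreach ($flow in $rule.flows) {
                foreach ($tuple in $flow.flowTuples) {
                    # Tag each row with the NSG rule that matched it.
                    ConvertFrom-NsgFlowTuple -Tuple $tuple |
                        Add-Member -NotePropertyName Rule -NotePropertyValue $rule.rule -PassThru
                }
            }
        }
    }
}

# Cut-down sample in the version 2 layout, for illustration only.
$sampleJson = '{"records":[{"properties":{"Version":2,"flows":[{"rule":"DefaultRule_AllowInternetOutBound","flows":[{"mac":"000D3AF87856","flowTuples":["1542110377,10.0.0.4,13.67.143.118,44931,443,T,O,A,B,,,,"]}]}]}}]}'
ConvertFrom-NsgFlowLog -Json $sampleJson
```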

Firstly, we create a storage account within the same region as the NSGs so that they can feed their logs into it. Once done, we enable the NSG flow logs by navigating to the NSG > NSG Flow Logs.

Retention will be handled through lifecycle management on the storage account, and can be changed to suit your requirements. Microsoft mention 5 days in their docs, but if you need longer, make the change. Later on we will set the logs to archive, using one of two options, which will save you pennies.

This can also be done in PowerShell using a script similar to:

# Region to target, and the storage account that will receive the flow logs
$location = "Location"
$saName = "Storageaccountname"
$sarg = "Resourcegroup of storage account"

# Network Watcher for the region (default resource group and naming)
$NW = Get-AzNetworkWatcher -ResourceGroupName NetworkWatcherRg -Name "NetworkWatcher_$location"

# Resource IDs of every NSG in the region, and of the storage account
$nsgs = @((Get-AzNetworkSecurityGroup | Where-Object {$_.Location -eq $location}).Id)
$saId = (Get-AzStorageAccount -ResourceGroupName $sarg -Name $saName).Id

# Enable version 2 flow logs on each NSG
foreach ($nsg in $nsgs){
    Set-AzNetworkWatcherConfigFlowLog -NetworkWatcher $NW -TargetResourceId $nsg -StorageAccountId $saId -EnableFlowLog $true -FormatType Json -FormatVersion 2 -EnableRetention $true -RetentionInDays 180
}

Once enabled, if traffic is flowing, it will create the container:
insights-logs-networksecuritygroupflowevent

If you don’t see this, no traffic is flowing through the NSG.

The next step is to set up your function app. It needs to be integrated with the same storage account that the NSG flow logs are feeding into. You will need to select PowerShell as the runtime; the rest can match your requirements.

Once configured, your first step is to set up an identity. This lets you assign the correct RBAC roles so the function can read and write the values.

Once created, you will need to set the IAM on the storage account. The function needs access to write to the Table and read the blobs. You can customise roles to allow least privilege; for this example, Contributor will do.

Whilst in here, you may as well create the Table to be used. Within your Storage Account go to Tables and create. You will need this name later on. For now, I’ve called mine nsg.

The next step is to change the host.json file to allow a timeout of 10 minutes. You don’t have to do this, but if you’re using the cheaper option, the default is lower than the maximum allowed. Click on App Service Editor and then Go.

In host.json, add "functionTimeout": "00:10:00", on line 3. This gives you a ten-minute timeout instead of five. https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale

I believe the next step is optional, but I do it just in case. The script should create the directory, but it’s flaky. Under Advanced Tools, click Go:

This gives you the option to use the PowerShell debug console.
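If you would rather not hand-edit the file, the same change can be sketched in PowerShell. The function name is hypothetical and the starting host.json below is a made-up example; only the functionTimeout property and its value come from the step above.

```powershell
# Hypothetical helper: add or overwrite functionTimeout in a host.json document.
function Set-FunctionTimeout {
    param(
        [Parameter(Mandatory)][string] $HostJson,
        [string] $Timeout = '00:10:00'
    )

    $config = $HostJson | ConvertFrom-Json
    # -Force replaces the property if it is already present.
    $config | Add-Member -NotePropertyName functionTimeout -NotePropertyValue $Timeout -Force
    $config | ConvertTo-Json -Depth 10
}

# Made-up starting file, for illustration only.
$original = '{ "version": "2.0", "managedDependency": { "enabled": true } }'
Set-FunctionTimeout -HostJson $original
```

The output can be pasted back into host.json in the App Service Editor.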

Once in here, we need to create the directory used within the script. My script uses D:\Home\Temp, so I need to run:

mkdir D:\Home\Temp

That is pretty much it for the prerequisites. The next step is to configure the function. Under the Functions tab, click Add and select Blob Trigger.
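Since the directory creation can be flaky, one option is to make it safe to repeat. This is a minimal sketch with a hypothetical function name; the demo call uses a throwaway temp folder rather than the real path.

```powershell
# Hypothetical helper: create the working directory only if it is missing.
function Initialize-WorkingDirectory {
    param([Parameter(Mandatory)][string] $Path)

    if (-not (Test-Path -LiteralPath $Path -PathType Container)) {
        # -Force also creates any missing parent folders.
        New-Item -ItemType Directory -Path $Path -Force | Out-Null
    }
    # Return the resolved path so the caller can use it straight away.
    (Resolve-Path -LiteralPath $Path).Path
}

# Demo against a throwaway folder; in the function app this would be D:\Home\Temp.
$demo = Join-Path ([IO.Path]::GetTempPath()) ('nsg-demo-' + [guid]::NewGuid())
Initialize-WorkingDirectory -Path $demo
```

Calling it a second time is harmless, so it can sit at the top of a script that runs on every trigger.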

Give it a name and for path, use: insights-logs-networksecuritygroupflowevent/{name}

This is the path where the NSG flow logs are written within Blob storage. Basically this is saying: “run the function every time a blob is created or changed under this path”.

Once created, open it up and go to Integration. Click on Add Output and select Azure Table Storage:

Leave the parameter name and instead change the Table name to whatever you called it above. For this example, I called it nsg.

We can now copy and paste the code stored in my repo: https://github.com/securethelogs/Azure_NSGLogger/blob/main/Azure_NSGLogger_Function.ps1

Highlight everything and paste over the top. Once done, click Save and open up the logs:

It’s good to watch this for the first few minutes to see whether any issues appear. If not, go over to your Table and access it via Storage Explorer:

You can then start to filter and query your logs. The PartitionKey is the date taken from the timestamp in the NSG flow logs. The Time is also derived from this, but has its own field. To get the latest log, click on the Time field twice to switch to descending order.

Now that you have a working table, it’s worth thinking about rotation and retention. Storage accounts are cheap, but there are still pennies to be saved. Because you have a table of the data, you can afford to archive the raw logs after X days. This can easily be handled by lifecycle management again. Below I archive the logs after 1 day.
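As a sketch of how a row like that can be shaped, the snippet below builds a table entity from a timestamp. The function name, the date and time formats, and the GUID RowKey are assumptions for illustration and may differ from what my script does; a table entity does need both a PartitionKey and a RowKey.

```powershell
# Hypothetical helper: build a table entity keyed on the log date.
function New-NsgTableEntity {
    param(
        [Parameter(Mandatory)][datetime] $Timestamp,
        [hashtable] $Fields = @{}
    )

    $inv = [cultureinfo]::InvariantCulture
    $entity = @{
        # Assumption: one partition per day, formatted yyyy-MM-dd.
        PartitionKey = $Timestamp.ToString('yyyy-MM-dd', $inv)
        # Assumption: a random GUID keeps RowKey unique within the partition.
        RowKey       = [guid]::NewGuid().ToString()
        Time         = $Timestamp.ToString('HH:mm:ss', $inv)
    }
    # Copy the remaining columns (source, destination, decision and so on).
    foreach ($key in $Fields.Keys) { $entity[$key] = $Fields[$key] }
    $entity
}

# Made-up values, for illustration only.
$stamp = [datetime]::new(2020, 9, 17, 14, 5, 9, [DateTimeKind]::Utc)
New-NsgTableEntity -Timestamp $stamp -Fields @{ SourceIp = '10.0.0.4'; Decision = 'Allow' }
```

Inside the function app, a hashtable like this is the kind of value you would hand to Push-OutputBinding for the Table output.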
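For reference, a lifecycle rule of this kind can also be written as JSON and pasted into the code view of the lifecycle management blade. The sketch below builds that JSON in PowerShell; the rule name is made up, and it assumes the flow logs are block blobs in the container named earlier.

```powershell
# Lifecycle policy: move flow log blobs to archive one day after last modification.
$policy = @{
    rules = @(
        @{
            enabled    = $true
            name       = 'archive-nsg-flow-logs'   # hypothetical rule name
            type       = 'Lifecycle'
            definition = @{
                filters = @{
                    blobTypes   = @('blockBlob')
                    # Only touch blobs in the flow log container.
                    prefixMatch = @('insights-logs-networksecuritygroupflowevent/')
                }
                actions = @{
                    baseBlob = @{
                        tierToArchive = @{ daysAfterModificationGreaterThan = 1 }
                    }
                }
            }
        }
    )
}

# -Depth must be high enough to reach the innermost setting.
$policyJson = $policy | ConvertTo-Json -Depth 10
$policyJson
```

Change the number of days to suit how far back you expect to re-read the raw logs.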

An alternative is to uncomment this line within my script to archive each log as soon as it has been fed into the Table. Be careful though, as retrieving data from archive storage incurs additional costs.

For rotation, because you don’t want the table to grow and grow, I created an automation account that simply deletes the table and recreates it. There are multiple options for this, but this worked for me.

$connectionName = "AzureRunAsConnection"
try
{
    # Get the connection "AzureRunAsConnection"
    $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName

    # Sign in as the Run As service principal
    $connectArgs = @{
        ServicePrincipal      = $true
        TenantId              = $servicePrincipalConnection.TenantId
        ApplicationId         = $servicePrincipalConnection.ApplicationId
        CertificateThumbprint = $servicePrincipalConnection.CertificateThumbprint
    }
    Connect-AzAccount @connectArgs | Out-Null
}
catch {
    if (!$servicePrincipalConnection)
    {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    } else {
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}

$rsg = "Your Resourcegroup"
$sa = "Your storage account"
$table = "Name of your table"

# Build a storage context from the first account key
$key = (Get-AzStorageAccountKey -StorageAccountName $sa -ResourceGroupName $rsg).Value[0]
$context = New-AzStorageContext -StorageAccountName $sa -StorageAccountKey $key

# Delete the table, wait for the delete to complete, then recreate it
Remove-AzStorageTable -Name $table -Context $context -Force
Start-Sleep -Seconds 60
New-AzStorageTable -Name $table -Context $context

# If the table is still not there, keep retrying every 5 seconds
while ($null -eq (Get-AzStorageTable -Name $table -Context $context -ErrorAction SilentlyContinue)){
    Start-Sleep -Seconds 5
    New-AzStorageTable -Name $table -Context $context
}

This will trigger on a timer and remove the current Table. Once done, it waits a minute, then recreates it.

I found that it takes around 20-30 seconds on average before you can recreate a table with a previously used name. I also chose to delete the whole table, as querying and deleting rows takes too long and the processing costs money, so deleting it does the job. If the table still can’t be recreated after a minute, the script loops until it’s there. Once it’s back, your function will continue as normal. To backfill the data missed, you can load the function, then test and run, which will populate the table with all logs in the hot tier.

If the function does fail, remember that the script also works. You could run just the necessary scripts in PowerShell to view the logs whilst the function app is fixed. All scripts are available on my GitHub: https://github.com/securethelogs/Azure_NSGLogger
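Because Azure Automation is charged on runtime, you may want that retry loop to give up eventually instead of running forever. This is a minimal sketch of a bounded wait; the function name and default values are hypothetical, and the demo uses a stand-in condition rather than a real table check.

```powershell
# Hypothetical helper: poll a condition until it is true or a timeout passes.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock] $Condition,
        [int] $TimeoutSeconds = 300,
        [int] $IntervalSeconds = 5
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        # Stop retrying once the deadline has passed.
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $true
}

# Stand-in condition that succeeds on the third attempt.
$script:attempts = 0
$ok = Wait-Until -Condition { $script:attempts++; $script:attempts -ge 3 } -IntervalSeconds 0
"Succeeded: $ok after $script:attempts attempts"
```

In the runbook, the condition would be a script block that tries to create the table and then checks whether it exists.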

I hope you enjoy 😀

Github: https://github.com/securethelogs/Azure_NSGLogger

Many thanks to Gobi (v-gomage@microsoft.com), for all his help 🙂
