
One very expensive lesson from Malware scanning in Defender for Storage

11 April 2024 By Kelvin Stott

Malware scanning in Defender for Storage is a great new capability from Azure, part of the Microsoft Defender for Cloud suite, which hit GA earlier this year. When it is enabled on an Azure Storage account, all blobs uploaded are automatically scanned for Malware, and events are emitted with the scan result for you to react to accordingly.

As a Microsoft Partner we had early access to this, and we set about implementing it on one of the systems we manage to protect our customers from Malware being uploaded to their system. Our initial set-up was simply to scan all blobs uploaded to the system and react to blobs found to contain Malware by deleting them and sending an email to ourselves and the system administrator. Nothing more complicated than that to begin with. We were pleased and super excited that this would only cost us approximately £16 per month (£8 for Microsoft Defender for Storage & £8 for the AntiMalware add-on). 🙌
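
For context, the reacting part of that set-up is nothing exotic. A minimal sketch of the kind of handler involved might look something like the Azure Function below, triggered by the scan-result events Defender for Storage publishes to Event Grid; the property names and values used here (scanResultType, blobUri, “Malicious”) are assumptions for illustration, so check them against the events emitted in your own environment.

using System;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.Messaging.EventGrid;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class MalwareScanResultHandler
{
    [FunctionName("MalwareScanResultHandler")]
    public static async Task Run([EventGridTrigger] EventGridEvent scanEvent, ILogger log)
    {
        // "scanResultType" and "blobUri" are assumed property names; verify against the real event payload
        var data = JsonDocument.Parse(scanEvent.Data.ToString()).RootElement;
        var scanResult = data.GetProperty("scanResultType").GetString();
        var blobUri = data.GetProperty("blobUri").GetString();

        if (scanResult != "Malicious")
        {
            return; // nothing to do for clean blobs
        }

        log.LogWarning("Malware detected in blob {BlobUri}", blobUri);

        // Delete the offending blob; the Function's identity needs delete permissions on the storage account
        var blob = new BlobClient(new Uri(blobUri), new DefaultAzureCredential());
        await blob.DeleteIfExistsAsync();

        // ...and then email ourselves and the system administrator (omitted here)
    }
}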

Then we got a monthly bill of £431 (£8 for Microsoft Defender for Storage & £423 for the AntiMalware add-on) 🤯.

So where did we go so fantastically wrong?

How is it charged

The first thing to look at is how the AntiMalware feature is priced and how we attempted to estimate the cost. At the time of writing, for the Azure UK data centres, Microsoft will charge you £8 per month per storage account to enable Microsoft Defender for Storage, plus £0.119 per GB of data scanned via the AntiMalware add-on. We used the Ingress metric to gauge the approximate amount of data uploaded to the storage account, and thus to be scanned, and estimated our ~£8 per month scanning figure accordingly, on top of the standard £8 per month charge for the single storage account we were enabling this on.
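
To put rough numbers on that estimate (the 67GB figure below is simply what an ~£8 scanning spend implies at £0.119 per GB; it is not a measured value), the arithmetic looks something like this:

using System;

// Prices quoted above for the Azure UK data centres (at the time of writing)
const decimal defenderPerAccountPerMonth = 8.00m;   // £ per storage account per month
const decimal antiMalwarePerGbScanned    = 0.119m;  // £ per GB scanned

// Our (flawed) assumption: the volume scanned each month equals the Ingress volume.
// An ~£8 scanning estimate therefore implies roughly £8 / £0.119 ≈ 67GB of uploads a month.
decimal estimatedScannedGb   = 67m;
decimal estimatedMonthlyCost = defenderPerAccountPerMonth + estimatedScannedGb * antiMalwarePerGbScanned;
Console.WriteLine($"Estimated monthly cost: £{estimatedMonthlyCost:F2}");       // ~£16

// What the £423 AntiMalware line on the actual bill implies was really scanned
decimal impliedScannedGb = 423m / antiMalwarePerGbScanned;
Console.WriteLine($"Implied volume actually scanned: {impliedScannedGb:F0}GB"); // ~3,555GB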

Crunching the numbers

Clearly, however, this estimate had made some incorrect assumption, and we set out to work out what that was. This began with talking to Microsoft Support to establish how the bill had been calculated, and they helpfully provided a day-by-day usage data sheet for us; a snippet of it is shown below alongside the Ingress metric that we’d used in our estimates for the same dates. As you can see, many of the days have similar figures in both the Ingress and Scanned columns, which is what we expected, but on some days there was a significant difference, which neatly explained the high bill. So why was the amount being scanned so different to the Ingress data?

Date Ingress (GB) Scanned (GB)
07 October 2023 0.4 0.2
08 October 2023 0.5 0.27
09 October 2023 6.1 922.63
10 October 2023 1.6 85.34
11 October 2023 1.6 63.16
12 October 2023 1.3 7.37
19 October 2023 6.3 695.61
21 October 2023 0.5 0.26
22 October 2023 0.5 0.22
23 October 2023 5.6 726.16

The code involved

As we set about trying to figure this out we returned to the code to see if it shed any clues, and tried to correlate what the system might have been doing on the specific days when the cost sky-rocketed.

The system involved here is written in C#, and blobs are uploaded to Blob Storage via the Azure SDK using either the BlobClient.UploadAsync method, or BlobClient.OpenWriteAsync to get a writeable stream and then write data to the blob via that stream.

The BlobClient.UploadAsync call was the most common in the system and was used to upload single files from various sources, all using the same basic code, similar to this (with the container client construction simplified):

public async Task SaveBlob(string blobName, Stream file)
{
    // connectionString and containerName are assumed to be configured elsewhere
    var container = new BlobContainerClient(connectionString, containerName);
    await container.CreateIfNotExistsAsync();
    var blob = container.GetBlobClient(blobName);

    // Upload the whole stream as a single blob
    await blob.UploadAsync(file);
}

The BlobClient.OpenWriteAsync call was used in a far more niche scenario where the system needs to pull together many generated files into a single zip file for download. The code works by creating a zip file blob and then adding files via the writeable Stream returned by BlobClient.OpenWriteAsync. The following is a simplified version of the code involved (again with the container client construction simplified), where the FileGenerators variable represents a list of file factories that are iterated through and put into a zip archive one by one as it is uploaded to the storage account.

public async Task GenerateZip(string blobName)
{
    // connectionString and containerName are assumed to be configured elsewhere
    var container = new BlobContainerClient(connectionString, containerName);
    await container.CreateIfNotExistsAsync();
    var blob = container.GetBlobClient(blobName);

    // Open a writeable stream to the blob and build the zip archive directly into it
    await using (var stream = await blob.OpenWriteAsync(true))
    using (var archive = new System.IO.Compression.ZipArchive(stream, System.IO.Compression.ZipArchiveMode.Create))
    {
        foreach (var file in FileGenerators)
        {
            var archiveEntry = archive.CreateEntry(file.Path);
            await using var archiveEntryStream = archiveEntry.Open();
            await using var documentStream = await file.Generate();
            await documentStream.CopyToAsync(archiveEntryStream);
        }
    }
}

Being a much more niche scenario, the BlobClient.OpenWriteAsync code was not extensively used by the system, and when it was used it generally only handled small quantities of data. However, on certain dates in October it had been used to generate zip files that were somewhat larger and contained many more files than we might consider “typical”, and those dates correlated very well with the high AntiMalware scanning numbers.

BlobClient.OpenWriteAsync vs BlobClient.UploadAsync

So we’d searched the haystack and seemingly found a potential needle, but it was a pretty innocent-looking needle. 🤔

Diving into the documentation for BlobClient.OpenWriteAsync and BlobClient.UploadAsync, we can start to spot how these two methods differ, and the main difference is in what they return.

public virtual System.Threading.Tasks.Task<System.IO.Stream> OpenWriteAsync (bool overwrite, Azure.Storage.Blobs.Models.BlobOpenWriteOptions options = default, System.Threading.CancellationToken cancellationToken = default);

Returns

Task<Stream>

A stream to write to the Append Blob.

public virtual System.Threading.Tasks.Task<Azure.Response<Azure.Storage.Blobs.Models.BlobContentInfo>> UploadAsync (System.IO.Stream content, System.Threading.CancellationToken cancellationToken);

Returns

Task<Response<BlobContentInfo>>

A Response<T> describing the state of the updated block blob.

Did you spot the key, yet somewhat subtle, difference?

The key difference here is that UploadAsync is interacting with a Block Blob while OpenWriteAsync is dealing with an Append Blob.

Most of the time we don’t concern ourselves with these different types of blobs, and with the SDK especially it is, for the most part, hidden away from us. Here, though, it helps to dive in and figure out how they work and how the SDK might be interacting with the storage account (and its REST endpoints) under the hood.

Different Blob types

There are three different types of blobs in Azure Storage: block blobs, append blobs and page blobs. We’ll ignore page blobs here, but all three types are detailed in the Microsoft documentation if you’re interested.

To summarise: block blobs are optimised for uploading large amounts of data efficiently and can be uploaded via the Put Block & Put Blob operations, while append blobs are in effect block blobs created via Put Blob that are further optimised for append operations and allow data to be appended to them via the Append Block operation.
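
To make those operations concrete, the Azure SDK exposes specialised clients (in Azure.Storage.Blobs.Specialized) that drive them explicitly. The sketch below is purely illustrative, with placeholder blob names, and isn’t how our system writes its blobs: the first method stages blocks and commits them as a block blob, while the second creates an append blob and appends to it chunk by chunk.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

public static class BlobTypeExamples
{
    // Writes a sequence of chunks as a block blob: each chunk is staged as a block
    // (Put Block) and the staged blocks are then committed in one go (Put Block List).
    public static async Task WriteAsBlockBlobAsync(BlobContainerClient container, IEnumerable<byte[]> chunks)
    {
        var blockBlob = container.GetBlockBlobClient("example-block-blob"); // placeholder name

        var blockIds = new List<string>();
        var index = 0;
        foreach (var chunk in chunks)
        {
            var blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes($"block-{index++:D6}"));
            using var chunkStream = new MemoryStream(chunk);
            await blockBlob.StageBlockAsync(blockId, chunkStream);
            blockIds.Add(blockId);
        }

        await blockBlob.CommitBlockListAsync(blockIds);
    }

    // Writes the same chunks as an append blob: the blob is created empty (Put Blob)
    // and each chunk is then appended to it (Append Block).
    public static async Task WriteAsAppendBlobAsync(BlobContainerClient container, IEnumerable<byte[]> chunks)
    {
        var appendBlob = container.GetAppendBlobClient("example-append-blob"); // placeholder name

        await appendBlob.CreateIfNotExistsAsync();
        foreach (var chunk in chunks)
        {
            using var chunkStream = new MemoryStream(chunk);
            await appendBlob.AppendBlockAsync(chunkStream);
        }
    }
}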

But how does this relate to our system’s code above, given that we’re not explicitly putting blocks or blobs, or appending blocks, in that code?

Well, that’s just it: we are, it’s just hidden behind SDK-specific language. In C#, when we call BlobClient.UploadAsync it uploads the file using the aforementioned Put Block & Put Blob operations. Meanwhile, when BlobClient.OpenWriteAsync is called it creates an empty append blob via the Put Blob operation and returns a C# Stream which, when written to, calls the Append Block operation, in our case whenever the CopyToAsync method is called. Each time data is uploaded via Append Block, the AntiMalware add-on scans the blob. Notice I say the blob, not simply the uploaded data, and this is key to understanding what happened here.

To give a simplified example, say our system is generating a zip file containing 10 files, each 1GB in size. The following steps will happen:

  1. BlobClient.OpenWriteAsync is called and a Put Blob operation is triggered. We’ll assume for simplicity that the blob is 0 bytes at this stage and nothing is scanned.
  2. The first 1GB file is generated and uploaded to the blob via CopyToAsync, which triggers an Append Block call, and the AntiMalware add-on scans the 1GB of data uploaded. So far so good:
    • Total Ingress & zip file blob size 1GB
    • Total AntiMalware scanned 1GB
  3. The second 1GB file is generated and uploaded to the blob via CopyToAsync, which triggers an Append Block call, and the AntiMalware add-on scans the whole blob: the 1GB of data just uploaded plus the 1GB previously uploaded. The total ingress and the total scanned by AntiMalware are starting to drift apart here.
    • Total Ingress & zip file blob size 2GB
    • Total AntiMalware scanned 3GB = 1GB from above plus the 2GB scanned this time
  4. The third 1GB file is generated and uploaded to the blob via CopyToAsync, which triggers an Append Block call, and the AntiMalware add-on scans the whole blob: the 1GB of data just uploaded plus the 2GB previously uploaded. The total ingress and the total scanned by AntiMalware drift further apart.
    • Total Ingress & zip file blob size 3GB
    • Total AntiMalware scanned 6GB = 3GB from above plus the 3GB scanned this time
  5. This continues all the way up to the tenth and final file, which results in 10GB of ingress but a whopping 55GB of total AntiMalware scanning!

As you can see, the amount scanned grows quadratically with each append, because every new file causes everything uploaded before it to be re-scanned. In the system involved here the files are nowhere near 1GB, but even if each file is only 0.1GB you’ll have scanned over 900GB of data after just 134 files have been appended into the zip 😨
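
If you want to sanity-check those numbers: each append re-scans everything uploaded so far, so appending n equally sized files costs size × n(n + 1) / 2 worth of scanning in total. A couple of lines of C# confirms the figures above:

using System;

// Total GB scanned when fileCount files of fileSizeGb each are appended one at a time
// and every Append Block triggers a scan of the whole blob accumulated so far:
// fileSizeGb * (1 + 2 + ... + fileCount) = fileSizeGb * fileCount * (fileCount + 1) / 2
static decimal TotalScannedGb(int fileCount, decimal fileSizeGb)
    => fileSizeGb * fileCount * (fileCount + 1) / 2;

Console.WriteLine(TotalScannedGb(10, 1.0m));   // 55GB    (the 10 x 1GB example above)
Console.WriteLine(TotalScannedGb(134, 0.1m));  // 904.5GB (over 900GB from 134 x 0.1GB files)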

What have we learnt

The key takeaway here is that when implementing Malware scanning in Azure Storage you need to be mindful that, depending on your use case, ingress is not necessarily equivalent to the amount of data that will be scanned for Malware. You will need to look through the code involved and peel back the layers to establish how it interacts with the storage account and what will and will not be scanned. You may need to redesign parts of your system to combat this and prevent excessive spend.

In our case we’re going to have to re-jig how these zip files are generated so that either the files are assembled as individual block blobs and then combines into a zip all at once meaning that they are scanned just once each or we will create a new storage account without Malware scanning which can be used when assembling the zip archive before it’s moved to the main Malware scanned storage account later in an single operation that will again result in a single scan rather than an exponentially increasing amount of scanning.