Updated November 2023 to clarify exactly what geo-replicated means in Periodic Backup storage.
Azure CosmosDB is a great database that we use for many of the systems we build. However, until recently, the backup story has not been great with only very limited backup options available. Recently Azure has massively improved the situation and anyone who uses CosmosDB in production should take the opportunity to reconfigure their CosmosDB backup.
In this post we will give an overview of the options now available to you. There are two different options, which come with different pros and cons and it is not straightforward to choose between them.
TL;DR: Periodic backups are geo-replicated and can be configured to run frequently with a custom retention period. However, the storage costs often become extremely high.
Continuous backup gives you 30 days of point-in-time restore capability at a very low cost. However, the backup is not geo-replicated, so you need CosmosDB replicas in more than one data centre to protect against catastrophic failure, which generally means doubling the cost of your CosmosDB.
Why are you backing up anyway?
Before we get into what your options are, it is worth talking briefly about why you need to back up in the first place. There are generally two very different scenarios to consider.
In the case that a data centre is destroyed, you need to be able to get your data back from another data centre. This is a low-probability but high-impact scenario. If your data is only stored in the destroyed data centre, you will never be able to get it back. Geo-replication of the backup (what used to be called “off-site backup”) is key to dealing with this scenario.
User or software error
From time to time, a user may make a mistake and delete some data or a software error may corrupt data etc. It is often the case that you don’t know about this immediately. Subtle software errors may lurk for days or weeks before they are spotted and may thus impact quite a lot of data before they are caught. To protect from this, you need a long retention period, so you can go back in time and get the data back before it is lost forever.
Periodic backup is the default option in CosmosDB. By default, it is configured to make a backup every four hours and keep the last two of those. You can now change this to have any interval and retention period you like (within certain limits).
The backups are geo-replicated by default. This means that even if you only have a single replica of your CosmosDB and the data centre is destroyed, your backup is retrievable from a secondary data centre.
An important caveat, however, is that although the data is physically stored in the secondary data centre, Microsoft will only restore these backups to the original data centre. In practice, if the data centre were destroyed, Microsoft would almost certainly restore the data to another data centre, but there is no guarantee of this and no SLA for it.
TL;DR: Geo-replicated backups give you assurance that the data is safe in a secondary location, but they do not give you predictable access to that data in a usable form (i.e. as a CosmosDB account) if the primary data centre suffers an outage.
The main issue is that this setup can be quite expensive. Each GB of backup costs £0.12 per month. For example, if you have a 10GB database that you back up every hour and keep each backup for seven days, you will always have 24 x 7 = 168 backups stored. The cost of that is 10GB x 168 x £0.12 = £201.60, or about £202 per month.
Some more examples:
| Size in GB | Frequency | Retention | # backups | Cost/month |
|---|---|---|---|---|
| 10 | 1 hour | 7 days | 168 | £202 |
| 10 | 4 hours | 30 days | 180 | £216 |
| 10 | 1 hour | 30 days | 720 | £864 |
| 50 | 1 hour | 7 days | 168 | £1,008 |
| 50 | 4 hours | 30 days | 180 | £1,080 |
| 50 | 1 hour | 30 days | 720 | £4,320 |
| 100 | 1 hour | 7 days | 168 | £2,016 |
| 100 | 4 hours | 30 days | 180 | £2,160 |
| 100 | 1 hour | 30 days | 720 | £8,640 |
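If you want to run the same arithmetic for your own database, here is a minimal Python sketch of the calculation used above. The function name is our own, and the £0.12 price is simply the figure quoted in this post; it ignores any free allowance or regional price differences, so treat the result as a rough estimate.

```python
PRICE_PER_GB_MONTH = 0.12  # GBP per GB per month, the figure quoted in this post


def periodic_backup_cost(db_size_gb, interval_hours, retention_days,
                         price_per_gb=PRICE_PER_GB_MONTH):
    """Return (number of backups kept, estimated monthly storage cost in GBP)."""
    # Every backup taken within the retention window is stored in full.
    backups_kept = retention_days * 24 // interval_hours
    monthly_cost = db_size_gb * backups_kept * price_per_gb
    return backups_kept, monthly_cost


if __name__ == "__main__":
    for size_gb, interval, retention in [(10, 1, 7), (50, 4, 30), (100, 1, 30)]:
        kept, cost = periodic_backup_cost(size_gb, interval, retention)
        print(f"{size_gb}GB, every {interval}h, {retention} days: "
              f"{kept} backups, about £{cost:,.0f}/month")
```

The key thing the sketch makes visible is that the cost grows with the product of database size, frequency and retention, which is why it escalates so quickly.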
Continuous backup is relatively new and very exciting. It automatically backs up your data on an ongoing basis and gives you point-in-time restore for the last 30 days. The cost is £0.20 per GB per month. This is based on the size of the database, not the size of the backup, so it is very predictable. For a 50GB database with a single replica, you should expect to pay about £10 per month for continuous backup, which is far more cost-effective than periodic backup. If you have two replicas, you would pay £20 and so on.
However, there is a catch: the continuous backup is only stored in the local data centre - there is no option to geo-replicate it. This means that if the data centre is destroyed, you will lose all your data irretrievably. (Note that some data centres - or “Regions” - have a concept of Availability Zones which can protect you from part of the data centre/region being destroyed, but that distinction is beyond the scope of this post.)
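As a quick sanity check, the continuous backup arithmetic can be sketched in a couple of lines of Python. The function name is our own and the £0.20 price is the figure quoted in this post, so check current pricing for your region before relying on it.

```python
def continuous_backup_cost(db_size_gb, replicas=1, price_per_gb=0.20):
    """Estimated monthly continuous backup cost in GBP.

    The charge is based on database size and is paid once per replica.
    """
    return db_size_gb * replicas * price_per_gb


if __name__ == "__main__":
    print(f"50GB, 1 replica:  about £{continuous_backup_cost(50):.0f}/month")
    print(f"50GB, 2 replicas: about £{continuous_backup_cost(50, replicas=2):.0f}/month")
```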
To protect from catastrophic failures, you need to use replicas as well as continuous backups.
Global distribution is a major feature of CosmosDB; you can specify that the database should automatically be replicated to multiple data centres. This can be used to build highly scalable and resilient systems. In practice, however, most of the systems we see just have a single replica, because that is enough for that system’s requirements.
In other words, to use continuous backup safely, you need to first configure your CosmosDB to have a read-only replica in another region and then switch on continuous backup. This will roughly double your current CosmosDB cost, as you are paying for another copy. If you use auto scale, you may pay less than double.
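To get a feel for what "roughly double" means in money, here is a rough Python sketch of the extra monthly cost of adding a second region plus continuous backup. The function name and the £100 current spend are hypothetical, and the sketch assumes the second replica costs the same as the first, which is an upper bound if you use auto scale.

```python
def extra_cost_continuous_with_replica(current_cosmos_spend, db_size_gb,
                                       backup_price_per_gb=0.20):
    """Rough extra monthly cost (GBP) of a second region plus continuous backup."""
    # Assumption: the second replica costs about as much as the existing one.
    replica_cost = current_cosmos_spend
    # Continuous backup is charged on database size, once per replica (two here).
    backup_cost = db_size_gb * 2 * backup_price_per_gb
    return replica_cost + backup_cost


if __name__ == "__main__":
    # Hypothetical example: a 50GB database currently costing £100 per month.
    extra = extra_cost_continuous_with_replica(100, 50)
    print(f"Extra cost: about £{extra:.0f}/month")
```

Comparing that figure with the periodic backup storage cost for the frequency and retention you want is usually enough to show which option is cheaper for your database.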
If you currently only use a single replica of CosmosDB, you can increase the backup frequency and retention of your database, but for larger databases this quickly becomes very expensive and will often cost more than your existing spend on CosmosDB.
If you wish to use Continuous backup, you should ensure you have a read-only replica in another data centre, which will roughly double your current CosmosDB costs.
If you already have multiple replicas, Continuous Backup is the best option, as long as you don’t fall foul of the limitations.
There are several limitations that you need to be aware of.
Some key ones:
- Multiple write-replicas are not supported.
- “Serverless” mode cannot have multiple replicas. This is not the same as “auto scale”, which is a kind of Provisioned throughput.
How to set it up
Periodic backup is already enabled by default, but you can change the frequency and retention by going to “Backup & Restore” in the CosmosDB menu.
To use continuous backup, you need to enable the feature. That will take a little while. Once it has taken effect, the “Backup & Restore” menu item will be replaced by a new “Point in Time Restore” item.
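If you prefer scripting to clicking through the portal, the periodic backup frequency and retention can also be changed with the Azure CLI. The Python sketch below only builds the argument list for `az cosmosdb update`; the function name, account name and resource group are hypothetical. As far as we know, `--backup-interval` is given in minutes and `--backup-retention` in hours, but check `az cosmosdb update --help` for the current units and allowed ranges.

```python
from typing import List


def periodic_backup_update_args(account: str, resource_group: str,
                                interval_hours: int, retention_days: int) -> List[str]:
    """Build (but do not run) an `az cosmosdb update` command for periodic backup."""
    return [
        "az", "cosmosdb", "update",
        "--name", account,
        "--resource-group", resource_group,
        # Assumption: the CLI expects the interval in minutes.
        "--backup-interval", str(interval_hours * 60),
        # Assumption: the CLI expects the retention in hours.
        "--backup-retention", str(retention_days * 24),
    ]


if __name__ == "__main__":
    # Hypothetical account and resource group names.
    args = periodic_backup_update_args("my-cosmos-account", "my-resource-group", 1, 7)
    print(" ".join(args))
    # To actually apply it: subprocess.run(args, check=True)
```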
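Switching an existing account to continuous backup can also be scripted. The Python sketch below builds the Azure CLI command we would expect to use; the function, account and resource group names are hypothetical. Verify the `--backup-policy-type` flag against the current Azure CLI documentation first, and check whether the switch can be reversed for your account before running it in production.

```python
from typing import List


def enable_continuous_backup_args(account: str, resource_group: str) -> List[str]:
    """Build (but do not run) an `az cosmosdb update` command for continuous backup."""
    return [
        "az", "cosmosdb", "update",
        "--name", account,
        "--resource-group", resource_group,
        "--backup-policy-type", "Continuous",
    ]


if __name__ == "__main__":
    # Hypothetical account and resource group names.
    print(" ".join(enable_continuous_backup_args("my-cosmos-account", "my-resource-group")))
```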
NewOrbit is an Azure Gold Partner and Azure Reseller (“Direct CSP”) as well as a development house. We help other development companies get more out of Azure, with a particular focus on reducing costs and making systems more secure. If you would like to buy your Azure from people who design and develop systems on Azure every day, give us a shout or ping me on Twitter. We usually give you a “trial”, in the form of a Cost, Infrastructure or Security review, so you can see if we can help you and if you like working with us.