Azure Cosmos DB
Sam, a well-known businessman, was looking to invest in a new e-commerce website hosted on the cloud. He wanted a powerful database behind the application, one capable of storing information on thousands of products.
He wanted database interactions to be fast, the data to be available to users across the globe, and downtime to be minimal, since visitors would be arriving at his website around the clock.
After searching for a suitable database for a long time, he finally settled on Azure Cosmos DB.
Why did he choose Azure Cosmos DB? Did it fit his requirements?
Let us answer these questions in this post by understanding how Azure Cosmos DB works and what its key features are. But before that, let us first understand what Azure Cosmos DB is.
Table of Contents
- What is Azure Cosmos DB
- Key features of Azure Cosmos DB
- How Cosmos DB works
- Areas of practical implementation
- Wrapping up
What is Azure Cosmos DB?
Azure Cosmos DB is a fully managed Platform as a Service (PaaS) offering from Microsoft. It is a NoSQL database that is globally distributed, has low latency, and supports multiple query APIs for managing large amounts of data. It grew out of the earlier Azure DocumentDB service.
The term ‘globally distributed’ refers to the ability to replicate a database across different geographical regions around the globe. Cosmos DB also does not require a schema, because it stores data as JSON documents. Users can run SQL queries against the stored JSON documents.
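To make the schema-free idea concrete, here is a minimal sketch in plain Python. It does not use the Cosmos DB SDK; the product documents, their field names, and the helper `find_by_category` are made up for illustration, and the SQL text in the comment only shows roughly what an equivalent query could look like.

```python
import json

# Hypothetical product documents. They share no fixed schema:
# the second one carries a "sizes" field that the others lack.
products = [
    {"id": "1", "category": "books", "name": "Cloud Basics", "price": 25.0},
    {"id": "2", "category": "shoes", "name": "Trail Runner", "price": 80.0,
     "sizes": [8, 9, 10]},
    {"id": "3", "category": "books", "name": "NoSQL Notes", "price": 40.0},
]

# Roughly equivalent SQL over JSON documents (illustrative only):
#   SELECT c.name, c.price FROM c
#   WHERE c.category = "books" AND c.price < 30


def find_by_category(docs, category, max_price):
    """In-memory stand-in for the query sketched in the comment above."""
    return [
        {"name": d["name"], "price": d["price"]}
        for d in docs
        if d.get("category") == category and d.get("price", 0) < max_price
    ]


cheap_books = find_by_category(products, "books", 30)
print(json.dumps(cheap_books))
```

The point of the sketch is that documents with different shapes can sit side by side and still be filtered with a single query.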
Key features of Azure Cosmos DB
- High Availability
Azure Cosmos DB transparently replicates the users’ data across all the Azure regions associated with the account. It offers a staggering 99.999% availability for read/write operations on multi-region accounts configured with multi-region writes. A failover option is also available in case one of the regions fails.
- Quick Response
Since Azure Cosmos DB lets you distribute the data to multiple Azure regions, users can access the data from the database closest to their location. This translates to low latency: read and write operations typically complete in less than 10 milliseconds.
- Easily Manage Regions
Azure Cosmos DB lets you add or remove Azure regions on your account seamlessly. It continuously replicates the data to every region you add.
- Data is Indexed
Azure Cosmos DB automatically indexes every property of every item it stores, without requiring you to define schemas or manage indexes. You can also customize the indexing policy, for example to exclude properties you never query.
- Scalability
Azure Cosmos DB provides the option to scale based on demand. It scales horizontally: data is spread across partitions, and capacity is added behind the scenes as you raise the throughput, so you do not manage servers yourself. At its peak, you can process millions of read/write requests per second.
- Multi-Master Support
This means Cosmos DB can accept writes in more than one region instead of funnelling them all through a single primary. With the multi-master feature (also known as multi-region writes), Azure lets you configure every region as a write region.
- Multi-Model
It can store data as key-value pairs, documents, column families, or graphs. The best part is that, irrespective of the model you choose, you always get high availability, quick response, data indexing, scalability, and global distribution from Cosmos DB.
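As an illustration of the indexing feature in the list above, here is a sketch of what an indexing policy can look like, written as a Python dictionary. The overall shape (`indexingMode`, `includedPaths`, `excludedPaths`) follows the JSON format Cosmos DB uses for indexing policies; the excluded `/description/?` path is a hypothetical example, assuming product items have a long free-text `description` field that is never queried.

```python
import json

# Sketch of an indexing policy: index everything by default ("/*"),
# but skip one property. The "/description/?" path is hypothetical.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/description/?"}],
}

# Serialise it the way it would appear as a JSON policy document.
policy_json = json.dumps(indexing_policy, indent=2)
print(policy_json)
```

Leaving the policy at its default of indexing every path is usually the simplest choice; excluding paths is mainly a way to reduce the cost of writes.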
How Cosmos DB works
To understand how Cosmos DB works, let us go back to Sam’s website, which we discussed at the start of this post. Let us assume that Sam had initially chosen a database without multi-master mode, where information is written only to a primary database (say, one located in Amsterdam).
Since the data is written only to the Amsterdam database, users in the western part of Europe will get the data faster than users in other parts of the world, who face higher network latency. This is unacceptable for an application where thousands of visitors transact globally every second.
However, when Sam chose Azure Cosmos DB, its multi-master feature came to his rescue. With this feature, every region accepts writes and the data is replicated to all the other regions. For example, if a user is located in Chicago, the data is served to this user from the nearest database, located in New York (the secondary database), thus ensuring a low response time. And remember, data written to the primary database is replicated to the secondary database almost immediately.
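The idea of serving each user from the nearest region can be sketched in a few lines of Python. This is only an illustration: the region list, the coordinates, and the helper names are assumptions, and a real Cosmos DB client is normally told which regions to prefer through its configuration rather than computing distances itself.

```python
import math

# Approximate (latitude, longitude) in degrees. The region list is
# hypothetical and mirrors the Amsterdam / New York example above.
REGIONS = {
    "Amsterdam": (52.37, 4.90),
    "New York": (40.71, -74.01),
}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_region(user_location, regions=REGIONS):
    """Return the name of the region closest to the user."""
    return min(regions, key=lambda name: haversine_km(user_location, regions[name]))


chicago = (41.88, -87.63)
print(nearest_region(chicago))  # New York
```

Geographic distance is used here as a stand-in for network latency, which is what actually matters in practice.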
One of the common challenges with most databases is keeping data perfectly synchronized across all the databases (primary and secondary). Azure Cosmos DB takes care of this too, through its Consistency feature.
Cosmos DB relies on replication to provide high availability and low latency. However, it is an arduous task to achieve strong consistency of data along with high availability and low latency. So, Azure Cosmos DB offers a range of trade-offs between consistency, availability, and latency, and lets developers choose among them based on their needs. These options are called consistency levels.
Azure Cosmos DB offers five consistency levels:
- Eventual
- Consistent Prefix
- Session
- Bounded Staleness
- Strong
Remember, the data was first written to the Amsterdam database, and then replication synced it across the other databases (which typically takes a few milliseconds). During this short window, a user near the Amsterdam database may see the updated information while the rest of the world still sees the old information. This is called ‘Eventual Consistency’. On the other hand, with ‘Strong Consistency’, a read always returns the most recent committed write, so all users see the updated information.
The other levels (consistent prefix, session, and bounded staleness) sit between these two extremes. Consistent prefix guarantees that readers never see writes out of order, session guarantees that a client always reads its own writes within a session, and bounded staleness limits how far reads may lag behind writes, by a number of versions or a time interval. As far as performance and availability are concerned, Strong has the lowest performance/availability while Eventual has the highest.
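The difference between the two extremes can be shown with a toy simulation in Python. This is a deliberately simplified model, not how Cosmos DB is implemented: the class `ToyReplicatedStore` and its methods are invented for illustration, replication is triggered by hand, and a "strong" read is modelled simply as a read that never returns anything older than the latest write.

```python
class ToyReplicatedStore:
    """Toy model: one write region plus read replicas that lag behind."""

    def __init__(self, replica_names):
        self.primary = {}
        self.replicas = {name: {} for name in replica_names}
        self.pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply pending writes to every replica, in write order."""
        for key, value in self.pending:
            for data in self.replicas.values():
                data[key] = value
        self.pending.clear()

    def read(self, key, replica, consistency="eventual"):
        if consistency == "strong":
            # Strong: never return a value older than the latest write.
            return self.primary.get(key)
        # Eventual: return whatever the local replica currently holds.
        return self.replicas[replica].get(key)


store = ToyReplicatedStore(["New York"])
store.write("price", 10)
store.replicate()
store.write("price", 12)  # not replicated yet

stale = store.read("price", "New York")            # replica still has 10
fresh = store.read("price", "New York", "strong")  # latest write, 12
store.replicate()
converged = store.read("price", "New York")        # replica caught up, 12
print(stale, fresh, converged)  # 10 12 12
```

The stale read is fast because it never leaves the local replica, which is the performance advantage the weaker levels trade consistency for.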
Areas of practical implementation
Wherever data must be read and written on a massive global scale with near-real-time responses, Azure Cosmos DB is a good fit. Some of the popular areas where Azure Cosmos DB can be implemented are IoT, gaming, e-commerce, web, and mobile applications for social interactions.
Let us consider one more example where Azure Cosmos DB can be used effectively. Say there is a mobile application built for social networking. This app lets users upload posts in the form of photos, videos, music, and text messages, and add comments as text. Now, most people think this is a common operation these days and that any database can handle it. But they could not be more wrong. One of the key ingredients of success in a social networking application is the ability to load posts in real time. If users do not see posts in real time, they will lose interest in the application. This is where Azure Cosmos DB, with the quick response it gets from data replication, comes in handy.
Wrapping up
Understanding the key benefits of Azure Cosmos DB and how it works is critical to knowing why we should choose it over other databases. It is also important to choose the right data model and the right level of consistency to get the performance and availability we need from the database. Thanks to its impressive features, Azure Cosmos DB has become one of the most common back-end databases for mission-critical projects, and it is likely to remain so for many more years.