Everyone knows that cockroaches are annoying, highly persistent bugs. They survive extreme conditions and keep coming back even after you think you have eliminated them all. When it comes to your valuable data, you need a rock-solid solution which is fault-tolerant and extremely reliable. CockroachDB aims to be exactly that: it uses the possibilities of the cloud to achieve resilience. Users like it because it is built around traditional SQL concepts while adding scalability, resilience, and failover. CockroachDB is a distributed SQL database that can run across multiple clouds. In this article, we will explore some of its key features and characteristics to better understand why this solution might be beneficial for your organization.
Background info
History
Many good ideas start with a group of enthusiastic people who work in the same environment or on the same project. The same is true for CockroachDB. In 2015, a group of ex-Googlers founded Cockroach Labs, the company behind CockroachDB. Most of them had worked on Google’s file system infrastructure and were familiar with Google’s Bigtable and Spanner. Both systems inspired them to create a somewhat similar solution for companies outside of Google.
CockroachDB was born. It matured quickly, and at the time of writing the latest release is 20.x. Many large enterprises now trust CockroachDB to store and handle their production data in the cloud.
Deployment model
Simply put, CockroachDB comes in two flavors:
- CockroachCloud. This is a fully hosted and managed solution that is offered for AWS and Azure. It follows the cloud-based pricing model: you only pay for what you use. Pricing is per Node and per hour, based on the specifications of the Node and the type of data storage solution you require.
- CockroachDB Enterprise. For large companies that require additional features and which demand a self-hosted solution. Pricing differs per customer and is based on individual needs.
Besides these flavors, Cockroach Labs offers a 30-day free trial of the product. It gives you the chance to explore the features and see if they fit your use cases.
Both AWS and Azure offer CockroachDB as a “single-click deployment” package. Unsurprisingly, you can also deploy CockroachDB on a Kubernetes cluster of your choice.
AWS
AWS partners with Jetware to offer this solution in the marketplace. In short, you select the region in which you want to deploy your EC2 instances (Virtual Machines), the type of AMI (Amazon Machine Image) you want, and the size of your EC2 instances. After accepting the End User License Agreement, you are ready to roll it out.
With this one-stop-solution, you get a full-blown managed database that scales as you need and which survives disasters by spreading your data across multiple geographic regions. It pushes high availability of your data to the next level.
You can try it out using the free tier pricing model for a limited set of AMIs and EC2 configurations. If you require more processing power and/or custom AMIs you are charged based on the resource type you select during the roll-out. Also, be sure to check out the reviews of existing users to read their opinions about the product.
Azure
Azure users can use the Azure Marketplace solution which Cockroach Labs offers. You can use it for free for 30 days. If you require more than the default 6 vCPUs offered in the free tier plan, you need to contact Cockroach Labs to negotiate a private plan. As of now, there are no customer reviews of the Azure-based solution, which is not surprising given Microsoft SQL Server’s dominance on Azure.
Comparison
One of the key features of CockroachDB is its support for multiple clouds. None of the current cloud-native solutions from the big cloud providers offers this yet. Azure Cosmos DB, Google Cloud Spanner, and Amazon Aurora all support SQL, but none of them offers a multi-cloud solution. There are database solutions that do offer a multi-cloud strategy, like DataStax Enterprise or MongoDB. They also support serverless computing, but they lack support for SQL. CockroachDB bridges the gap between these two worlds.
Relational versus distributed SQL
Every organization with a strong data strategy will probably rethink its database solution sooner or later. If it needs or wants to stick to SQL databases and at the same time bring its storage solution to a higher level, it has to look at the problem from a different perspective. Two options come into play: traditional relational databases with extra capabilities such as sharding, scaling, and fail-over added on top, and distributed SQL databases, which offer these capabilities out of the box.
Here are some key considerations to take into account when you plan the next steps of your data storage solution.
Cloud
Your application(s) might run in the cloud without any problems. If you still depend on a traditional SQL database solution, however, your architecture still relies on legacy infrastructure. Distributed SQL solutions, on the other hand, are architected for cloud-native applications and are thus much more future-proof. For example, microservices that depend on, and react to, systems that can fail won’t benefit from legacy infrastructure that does not support this. The same is true for a legacy database system.
Scale
Relational solutions sometimes have to shard (split) data to enable fast read and write operations. Database professionals need to plan for this and come up with the best solution. It often also requires asynchronous replication, in which transactions are sent to multiple systems without knowing the order in which they are handled. These kinds of problems are hard to solve. Distributed SQL solutions are designed for global scale, which speeds up reads and writes across different locations. Replication becomes easier since this aspect is managed for you.
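To make the manual sharding burden more concrete, here is a minimal Python sketch of application-side hash sharding. The shard names and the helpers `shard_for` and `moved_keys` are hypothetical and only illustrate the idea; this is not how CockroachDB distributes data internally.

```python
import hashlib

# Hypothetical shard names; in a hand-sharded setup each one would be
# a separate database server that the application has to know about.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key, shards=SHARDS):
    """Map a row key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def moved_keys(keys, old_shards, new_shards):
    """Return the keys that land on a different shard after a change."""
    return [k for k in keys
            if shard_for(k, old_shards) != shard_for(k, new_shards)]


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(1000)]
    moved = moved_keys(keys, SHARDS, SHARDS + ["shard-3"])
    print(f"{len(moved)} of {len(keys)} keys must move after adding one shard")
```

With this naive modulo scheme, adding a single shard forces a large share of the keys to move. That is the kind of work a distributed SQL database is meant to take off your hands.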
Transactions
Consistent transactions are absolutely necessary to ensure data quality and integrity. If one transaction fails, your write operation might need to be rolled back. If this does not happen properly, parts of your data can be lost.
A distributed SQL database solution like CockroachDB runs transactions under serializable isolation to ensure consistency: the result is the same as if the transactions had run one after another. Thus you have far less to worry about when it comes to this problem.
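One practical consequence of serializable isolation is that a conflicting transaction can be asked to restart, which CockroachDB signals with SQLSTATE 40001. Below is a minimal sketch of a client-side retry loop in Python. It does not talk to a real database: `SerializationFailure` and `run_transaction` are hypothetical names, and the exception is only a stand-in for whatever error your driver raises.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""


def run_transaction(work, max_retries=5, base_delay=0.01, sleep=time.sleep):
    """Run work() and retry it when it reports a serialization conflict."""
    for attempt in range(max_retries):
        try:
            return work()
        except SerializationFailure:
            # Back off briefly (with jitter) so the competing
            # transaction gets a chance to finish first.
            sleep(base_delay * (2 ** attempt) * random.random())
    raise RuntimeError(f"transaction gave up after {max_retries} attempts")


if __name__ == "__main__":
    attempts = {"count": 0}

    def transfer():
        # Simulated transaction body: conflicts twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise SerializationFailure("restart transaction")
        return "committed"

    print(run_transaction(transfer), "after", attempts["count"], "attempts")
```

The important design point is that the whole transaction body is re-run, not just the statement that failed, so the body should be safe to execute more than once.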
Resilience
Passive fail-over of your database systems always has to deal with a Recovery Point Objective (RPO), because the standby lags behind the primary system. You risk losing the data that was written to the primary but not yet replicated when it goes down, and you face downtime until the fail-over system takes over. This lag is minimized or even completely absent when using a distributed SQL solution, since it keeps active, redundant copies everywhere.
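As a rough back-of-the-envelope illustration, the number of writes at risk is the write rate multiplied by the replication lag. The Python sketch below uses a hypothetical helper `writes_at_risk`, and the numbers are made up for illustration, not measurements.

```python
def writes_at_risk(writes_per_second, replication_lag_seconds):
    """Estimate the writes not yet copied to the standby at failure time."""
    if writes_per_second < 0 or replication_lag_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return writes_per_second * replication_lag_seconds


if __name__ == "__main__":
    # Passive standby that lags 4 seconds behind at 500 writes per second.
    print(writes_at_risk(500, 4))
    # No lag: a write is acknowledged only after it has been replicated.
    print(writes_at_risk(500, 0))
```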
There are certainly many more considerations, but these are the main ones. They give you a frame of reference for what you can expect from a distributed SQL database solution.
Main capabilities
Let’s now analyze some of the key features of CockroachDB to better understand its qualities. It is no surprise, but important to mention explicitly: it uses a standard SQL interface, so you can re-use most of your SQL statements and familiar SQL features like indexes. Support for some features, such as stored procedures, depends on the version, so check the documentation before you rely on them.
Forget failures – focus on transactions
One of the key capabilities is that developers and other (cloud) system operators, like database operators, can truly focus on transactions. There is less need to worry about whether the infrastructure layer works correctly under heavy load or during a faltering fail-over.
Replication between multiple data centers (in multiple regions if you want) is built into CockroachDB. Writes are replicated using the Raft consensus algorithm: a write is committed once a majority of the replicas has acknowledged it. Raft is a widely used algorithm for keeping replicas in agreement and for electing the “leader” of a group of replicas, so this is a solid foundation. CockroachDB uses multiple Virtual Machines (the actual Nodes) to spread the load evenly. It is also possible to spread the load based on your own criteria.
CockroachDB handles both short-term and permanent failures smoothly, so end users do not notice them. In case a Node fails, and with it the replicas it holds, the Raft consensus algorithm elects a new leader among the remaining replicas, which ensures that data is not lost. Long-term failures are handled by creating a completely new replica. Replicas can be placed automatically as well as manually. Users can specify attributes such as the CPU power, the amount of memory, and the geographic location of the Nodes that hold a replica.
All of these factors help you focus on transactions instead of on ways to handle potential failures. Focusing on transactions also means focusing on the business logic of your application, which in turn benefits your end users. It frees up valuable time for the things that make your organization stand out from the rest.
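The reason a Raft-based cluster can lose a Node without losing data comes down to simple majority arithmetic. The following Python sketch shows that arithmetic only. The function names are hypothetical, and it does not implement leader election or log replication.

```python
def quorum(replicas):
    """Smallest majority of a replica group."""
    if replicas < 1:
        raise ValueError("a replica group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas may fail while a majority is still reachable."""
    return replicas - quorum(replicas)


def can_commit(replicas, acknowledgements):
    """A write commits once a majority of replicas has acknowledged it."""
    return acknowledgements >= quorum(replicas)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, "replicas: quorum", quorum(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

Note that an even number of replicas buys nothing extra: four replicas tolerate one failure, exactly like three.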
Geo distribution
Your data needs to be available close to where you use it. Traditional in-house data centers still play a significant role here, but this is slowly changing. Simply put, the closer the data is to your end user, the faster they can access it, so keep latency as low as possible.
Besides this, the way your data is structured (think of SQL table indexes) and the storage options you choose also play a vital role. Spread your data across different geographic locations to bring it close to where you need it. By spreading it across different regions, you are also protected in case an entire region faces an outage. In that case, CockroachDB automatically switches over to another region.
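The idea of serving data from the closest region, and falling back to the next closest one during an outage, can be sketched in a few lines of Python. This is only an illustration of the principle, not CockroachDB’s internal mechanism. The region names, the latency figures, and the helper `pick_region` are all made up.

```python
def pick_region(latencies_ms, healthy):
    """Return the healthy region with the lowest latency."""
    candidates = {region: ms for region, ms in latencies_ms.items()
                  if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical round-trip times as seen by a client in Frankfurt.
    latencies = {"eu-central": 12, "eu-west": 25, "us-east": 95}

    # Normal operation: the closest region serves the request.
    print(pick_region(latencies, {"eu-central", "eu-west", "us-east"}))

    # eu-central is down: the next closest region takes over.
    print(pick_region(latencies, {"eu-west", "us-east"}))
```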
Data sovereignty
Another great feature related to this is the support for data sovereignty. Some companies are required to keep data inside the boundaries of the country in which they are located or registered. Companies in Germany, for example, have to follow this restriction in a lot of cases. Such a restriction can apply to the entire data set, but it can also apply to specific data only. CockroachDB lets you select which data (tables) should be stored in which region. With this in mind, you can be very specific when you design your data plans. When data sovereignty is a key requirement, CockroachDB is a powerful product.
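As a sketch of what pinning tables to regions can look like, the Python snippet below generates one zone configuration statement per table. The table names, region names, and the helper `zone_statement` are hypothetical. The generated SQL follows CockroachDB’s `CONFIGURE ZONE` syntax as I understand it, and it assumes that the Nodes were started with a matching `--locality=region=...` setting, so verify both against the documentation for your version before using it.

```python
import re

# Hypothetical placement plan: table name -> region the data must stay in.
PLACEMENT = {
    "customers_de": "eu-central-1",
    "customers_us": "us-east-1",
}


def zone_statement(table, region):
    """Build a zone configuration statement that pins a table to a region."""
    # Only accept simple names, since the values are placed into SQL text.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[a-z0-9-]+", region):
        raise ValueError(f"unexpected region name: {region!r}")
    return (f"ALTER TABLE {table} CONFIGURE ZONE USING "
            f"constraints = '[+region={region}]';")


if __name__ == "__main__":
    for table, region in PLACEMENT.items():
        print(zone_statement(table, region))
```

Keeping the placement plan in one small mapping makes it easy to review which data is supposed to live where.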
Multi-cloud and hybrid cloud
As mentioned before, only a few data solutions let you spread your data across multiple clouds. It may not be obvious, but the “traditional” cloud providers lack this feature. CockroachDB goes even further by also supporting the hybrid cloud model. Specific data can stay in your on-prem data center while other data is stored in one or multiple public clouds. This gives you an enormous amount of flexibility and also makes it easier to comply with data sovereignty requirements. Bear in mind that this setup is complex and thus requires more effort to implement. Consider the (potential) value of your data before you decide which architecture is best for you.
The role of Kubernetes
Kubernetes also plays a vital role when it comes to CockroachDB. Of course, Kubernetes is a natural platform to deploy CockroachDB on. There are production-grade Helm charts to install it, but the support for Kubernetes goes deeper. You can manage your (distributed) SQL database from inside Kubernetes itself. Some of the key features:
- Kubernetes’ Pods are created and restored on the fly in case of a failure. This works just like any other Pod in Kubernetes. Nothing new to learn here.
- Scaling your SQL databases is easy and straightforward. There is no downtime, and the risk of data loss is kept to a minimum or eliminated altogether.
- Managing CockroachDB from inside a Kubernetes cluster enables you to patch and upgrade your databases without interruption or downtime.
Conclusion
This article gave an overview of what CockroachDB is and what the benefits of distributed SQL databases are. CockroachDB offers a lot of advantages over traditional databases, so it can help teams who want to migrate their data to the cloud but who struggle to maintain it in a secure and robust way. I hope this article encourages you to give CockroachDB a try. Can’t wait to see what’s next.