Canary Deployment

23 April 2021 at 10:00 by ParTech Media

Continuous Integration and Continuous Deployment (CI/CD) are now widely accepted as the standard way to develop and deploy software applications. Over the last few years, the way CI/CD is applied in software development has changed considerably.

A CI/CD test environment is very different from a production environment. Many bugs surface only after the updates or changes have reached real users. This is a major drawback for developers and organizations across the world.

Luckily, there is a solution to this, and it is called Canary deployment. It is a process by which developers can test their application live, with real users, before deploying it to everyone.

Let us find out how in this blog.

Table of Contents

  1. What is Canary deployment?
  2. How does Canary deployment work?
  3. How do you select your Canary?
  4. Practical application involving Canary deployment
  5. Closing Thoughts

What is Canary deployment?

Canary release is a process where new versions are rolled out to a small, selected group of users first. Developers pick users in a particular geographical location, or users who fulfill a certain condition, to receive the update. The changes will eventually be released to everyone, but this small subset of users gets them before the others.

Deploying your updates to this small subset allows developers to spot errors or bugs in the code. This is a better alternative to releasing the update to the entire user base and finding the bugs there.

If the initial users find any bugs or issues, the developers can immediately fix them and launch a new update. If the new version shows no problems, they can launch it to their entire user base.

As highlighted above, a Canary release allows developers to test the application before it goes live to the entire user base. The difference from conventional testing is that the test involves a small number of real users who actually use the application.

In a way, blue-green deployment and Canary deployment are very similar to each other. Both use two environments and shift users between them. In a blue-green deployment, one environment (conventionally blue) serves live traffic while the other (green) hosts the new version until it is ready to take over. But unlike a Canary deployment, a blue-green deployment does not shift a small subset of users before the whole shift.

In a blue-green deployment, all users are switched from one environment to the other at once.

How does Canary deployment work?

Initially, the software development team creates two environments: the pre-existing environment, and a completely new environment that contains the updates and changes to the application.

The developers start by routing a few users to the new environment to test out the changes. The way these users are selected will be explained later in this blog post. As the developers gain confidence with their new version, they start routing even more users to it.

This allows them to test the performance of the updates in real life with real users. Over time they route more and more users to the new version, until everyone has been transferred. Most companies decommission the old version once they have full confidence in their update, while some organizations keep it running for a while as a safeguard. If they find errors, they can roll back and shift all the users back to the old version at short notice.
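The gradual traffic shift and rollback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CanaryRouter` class, its method names, the environment labels `old` and `new`, and the rollout schedule are all hypothetical, and in practice this job is usually done by a load balancer or similar infrastructure rather than by application code.

```python
import random
from typing import Optional


class CanaryRouter:
    """Hypothetical router that splits requests between two environments."""

    def __init__(self, canary_weight: float = 0.0, seed: Optional[int] = None):
        self._rng = random.Random(seed)
        self.canary_weight = 0.0
        self.shift(canary_weight)

    def shift(self, weight: float) -> None:
        """Set the fraction of requests (0.0 to 1.0) sent to the new environment."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")
        self.canary_weight = weight

    def rollback(self) -> None:
        """Send all traffic back to the old environment."""
        self.canary_weight = 0.0

    def route(self) -> str:
        """Pick an environment for one request."""
        # random() returns a value in [0.0, 1.0), so a weight of 0.0 never
        # picks "new" and a weight of 1.0 always does.
        return "new" if self._rng.random() < self.canary_weight else "old"


if __name__ == "__main__":
    router = CanaryRouter(seed=7)
    for weight in (0.05, 0.25, 0.5, 1.0):  # assumed rollout schedule
        router.shift(weight)
        sample = [router.route() for _ in range(1000)]
        print(f"weight={weight:.2f}: {sample.count('new')} of 1000 requests on new")
    router.rollback()
    print("after rollback:", router.route())
```

Note that this sketch decides per request, so the same user could see both versions. A real deployment would usually key the decision on a stable user or session identifier.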

In summary, you can split Canary deployment into these stages:

  1. Deploy the update to one or more Canary servers.
  2. Test the Canary servers and identify any bugs.
  3. Fix the bugs and repeat until the results are satisfactory.
  4. Deploy to the remaining servers.

Some developers run automated tests, while others simply move the test users to the new environment and wait. If the test users are satisfied with the update, it is deployed to all users.
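The stages listed above map naturally onto a small function. The following is a minimal sketch, assuming the caller supplies its own `deploy` and `is_healthy` callbacks. Both names, and the `canary_rollout` function itself, are hypothetical and stand in for whatever deployment tooling and automated tests a team already has.

```python
from typing import Callable, List, Tuple


def canary_rollout(
    servers: List[str],
    canary_count: int,
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Deploy to canaries first; deploy to the rest only if all canaries pass.

    Returns (servers that received the update, whether the rollout completed).
    """
    if not 1 <= canary_count <= len(servers):
        raise ValueError("canary_count must be between 1 and len(servers)")

    canaries, remaining = servers[:canary_count], servers[canary_count:]

    # Stage 1: deploy the update to the canary servers.
    for server in canaries:
        deploy(server)

    # Stage 2: test the canaries. Stage 3 (fixing bugs) happens outside this
    # function, so a failed check simply stops the rollout here.
    if not all(is_healthy(server) for server in canaries):
        return canaries, False

    # Stage 4: deploy to the remaining servers.
    for server in remaining:
        deploy(server)
    return canaries + remaining, True


if __name__ == "__main__":
    deployed: List[str] = []
    updated, completed = canary_rollout(
        ["web-1", "web-2", "web-3"], 1, deployed.append, lambda server: True
    )
    print(updated, completed)
```

When the health check fails, only the canary servers have been touched, which is what keeps the impact of a bad release small.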

How do you select your Canary?

It is important to know which parts of your system you can use to partition traffic. Users and instances are the two most common choices for selecting the test set. A two-way partition is a good start, but a many-way partition is better because it lets you increase your exposure incrementally while gaining confidence in the release. Which one you use depends on the development team and the update they are trying to launch.

Almost every application has users or some concept of an end user. The development team can choose to separate these users by time zone or geographical location. This is a good way to send an update to a selected location and test it there before launching it to all users.
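A location-based partition can be sketched as a simple lookup. The example below assumes each user record carries a `region` field. The `User` class, the region codes, and the order in which regions are added are all made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    user_id: str
    region: str  # hypothetical field, e.g. a country code


def version_for(user: User, canary_regions: FrozenSet[str]) -> str:
    """Return "new" for users in a canary region, "old" for everyone else."""
    return "new" if user.region in canary_regions else "old"


# A many-way partition: each step adds regions, so exposure grows gradually.
# The region codes and their order are assumptions for this example.
ROLLOUT_STEPS = [
    frozenset({"nz"}),
    frozenset({"nz", "au"}),
    frozenset({"nz", "au", "in", "us"}),
]

if __name__ == "__main__":
    users = [User("u1", "nz"), User("u2", "au"), User("u3", "us")]
    for step, regions in enumerate(ROLLOUT_STEPS, start=1):
        print(step, [version_for(user, regions) for user in users])
```

Because every step is a superset of the previous one, a user who has received the new version keeps it as the rollout widens.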

Some organizations also partition by percentage. They set a maximum limit on the share of users who can receive the update. This can be anything from 5% to 15%. For example, with a 15% limit, users who log on to the application after the condition is made live receive an update notification until the limit is reached. If they choose to update, they become part of the test set.
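One common way to implement a percentage partition is to hash each user identifier into a fixed bucket. Note that this is a different technique from the first-come enrollment described above: it picks a stable pseudo-random share of users instead of the first ones to log on. The sketch below assumes string user IDs. The function names and the salt value are hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str = "release-2021-04") -> int:
    """Map a user to a stable bucket from 0 to 99."""
    # The salt (an assumed per-release label) makes each release pick a
    # different subset of users.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_canary(user_id: str, percent: int, salt: str = "release-2021-04") -> bool:
    """Return True if the user falls inside the first `percent` buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bucket(user_id, salt) < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    for percent in (5, 10, 15):
        count = sum(in_canary(user, percent) for user in users)
        print(f"{percent}% limit: {count} of {len(users)} users in the canary")
```

Because a user's bucket never changes for a given salt, raising the limit from 5% to 15% only adds users. Nobody who already has the update is moved back.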

Finally, applications with a large number of users often use a beta program to find their canaries. You might have seen this in applications that are used by millions of people. Early adopters who are willing to try the update can register and download it. This can be beneficial, as early adopters are usually ready to give extensive feedback about the new update.

When you don’t have access to user information, you can roll out updates by instance instead. If more than one instance of your application is running, developers can select one and deploy the update to it.
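An instance-based canary can be sketched as follows, assuming the fleet is tracked as a mapping from instance name to running version. The instance names, version strings, and both function names are hypothetical, and picking the first instance by name is just one arbitrary selection rule.

```python
from typing import Dict, List


def pick_canaries(instances: List[str], count: int = 1) -> List[str]:
    """Pick `count` instances to act as canaries (here: the first by name)."""
    if not 1 <= count <= len(instances):
        raise ValueError("count must be between 1 and len(instances)")
    return sorted(instances)[:count]


def roll_update(
    versions: Dict[str, str], targets: List[str], new_version: str
) -> Dict[str, str]:
    """Return a copy of `versions` with the target instances on `new_version`."""
    unknown = set(targets) - set(versions)
    if unknown:
        raise KeyError(f"unknown instances: {sorted(unknown)}")
    return {
        name: (new_version if name in targets else current)
        for name, current in versions.items()
    }


if __name__ == "__main__":
    fleet = {"app-1": "1.0", "app-2": "1.0", "app-3": "1.0"}
    canaries = pick_canaries(list(fleet))
    print(roll_update(fleet, canaries, "1.1"))
```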

Ultimately, you should select the group that would have the lowest impact on your business: the users or instances that will affect revenue the least if anything goes wrong.

At this stage, you may ask: is the Canary environment similar to the staging environment?

Canary environments are not similar to staging environments. The main reason is that a staging environment is dedicated to a specific task. It will never be a complete production environment, nor is it equipped to be one. The Canary environment, by contrast, is meant to become the production environment that all users will eventually move to.

Practical application involving Canary deployment

Facebook has employed a strategy with multiple stages of canaries. The first stage is visible only to Facebook's internal employees and has all the feature toggles turned on, so that problems with new features can be detected early. After that, the release moves on to a small subset of users selected by the developers at Facebook.

Once both these stages give a positive response, the release is deployed to the whole world.
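A multi-stage canary of this kind can be modelled as an ordered set of stages. The sketch below is not Facebook's implementation. It only illustrates the idea of employees first, then selected users, then everyone, and the `Stage`, `Account`, `sees_new_version`, and `next_stage` names are all hypothetical. Feature toggles are left out to keep it short.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    EMPLOYEES = 1       # internal employees only
    SELECTED_USERS = 2  # plus a small subset of selected users
    EVERYONE = 3        # the whole user base


@dataclass(frozen=True)
class Account:
    account_id: str
    is_employee: bool = False
    is_selected: bool = False


def sees_new_version(account: Account, stage: Stage) -> bool:
    """Decide whether an account gets the new version at the given stage."""
    if stage >= Stage.EVERYONE:
        return True
    if stage >= Stage.SELECTED_USERS and account.is_selected:
        return True
    # Employees see the new version from the first stage onwards.
    return account.is_employee


def next_stage(stage: Stage, positive_response: bool) -> Stage:
    """Advance one stage on a positive response; otherwise stay put."""
    if positive_response and stage < Stage.EVERYONE:
        return Stage(stage + 1)
    return stage


if __name__ == "__main__":
    accounts = [Account("emp", is_employee=True), Account("sel", is_selected=True), Account("other")]
    stage = Stage.EMPLOYEES
    for _ in range(3):
        print(stage.name, [sees_new_version(account, stage) for account in accounts])
        stage = next_stage(stage, positive_response=True)
```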

Closing Thoughts

Canary deployment is a simple concept that has saved time and money for many businesses around the world. As release cycles keep getting faster, Canary deployment is here to stay!