CI/CD-related processes are a commodity for software companies nowadays. In modern companies, applications are not developed, tested, and deployed without CI/CD pipelines. Projects that make use of Artificial Intelligence and Machine Learning capabilities increase the agility of those companies and also expand their business benefits. With the increased use of Machine Learning, the number of Machine Learning models also rises.
In this article, we’ll explore the extra challenges you will encounter when developing Machine Learning models with CI/CD pipelines. CI/CD and Machine Learning – a great marriage.
Setting the right context
First of all, it’s important to set the right context and provide some background information about the terms which are used.
CI
Continuous Integration (CI) refers to building, validating, and packaging software components in a pipeline. The end result of this stage is a single version in a package repository.
CD
Following the CI phase is the CD (Continuous Delivery / Deployment) phase, which deploys the new version of the software package to the target environment of choice. Feedback from that environment flows back into the process so that every next version can be improved.
Machine Learning Model
The subject for both CI and CD within the context of this article is the Machine Learning Model (ML model). Typically, an ML model consists of the following components:
- Raw data and (example) applications
- Data processing modules
- Machine Learning Algorithms
- Infrastructure related resources
All of these components basically consist of source code that needs to be developed, tested, and maintained. Therefore the need to have a solid CI/CD pipeline applies to an ML model as it does to other software applications and pieces of infrastructure.
Understanding the process
A typical CI/CD pipeline for an ML model consists of the following steps, which are derived from a Machine Learning process. Every change to the model also triggers the pipeline to generate a new version of the model. Common steps such as checking out the source code and incrementing the version are left out.
First of all, the raw data is collected. It can be pulled from a database or read from a blob file.
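As a rough sketch of this collection step, the snippet below reads rows either from a database or from a file. SQLite and CSV are assumptions chosen purely for illustration, and the function names `load_from_database` and `load_from_csv` are hypothetical.

```python
import csv
import sqlite3


def load_from_database(connection, query):
    """Return the query results as a list of dictionaries keyed by column name."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def load_from_csv(path):
    """Return the rows of a CSV file as a list of dictionaries.

    Note that csv.DictReader yields every value as a string, so numeric
    columns still need to be converted during preprocessing.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Both functions return the same shape (a list of dictionaries), so the later pipeline stages do not need to know where the data came from.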
Next, data preprocessing modules prepare the raw data. The CI/CD pipeline should validate the data so that it is well-prepared for the next step. More practically, the data should be tested for integrity and accuracy before the ML model is used in production.
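An integrity check of this kind could look roughly like the sketch below. The field names, the allowed range, and the helper `validate_records` are assumptions made for the example; a real pipeline would derive them from its own data contract.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Collects the problems found while checking a batch of records."""
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_records(records, required_fields, numeric_ranges):
    """Check each record for missing fields and out-of-range numeric values."""
    report = ValidationReport()
    for index, record in enumerate(records):
        for name in required_fields:
            if record.get(name) is None:
                report.errors.append(f"row {index}: missing '{name}'")
        for name, (low, high) in numeric_ranges.items():
            value = record.get(name)
            if value is not None and not (low <= value <= high):
                report.errors.append(
                    f"row {index}: '{name}'={value} outside [{low}, {high}]"
                )
    return report


if __name__ == "__main__":
    rows = [
        {"trip_id": 1, "distance_km": 12.5},
        {"trip_id": 2, "distance_km": -3.0},    # invalid: negative distance
        {"trip_id": None, "distance_km": 4.2},  # invalid: missing identifier
    ]
    report = validate_records(
        rows, ["trip_id", "distance_km"], {"distance_km": (0.0, 500.0)}
    )
    for problem in report.errors:
        print(problem)
```

A CI job would typically fail the stage when `report.ok` is false, so that bad data never reaches training.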
Testing and validation
Various testing and validation techniques are used to determine which model is the best.
- Speed test: is the model fast enough to process the massive amounts of data? If not, see if the data sources can be optimized using a different or better structure, or adjust the model itself so it can perform faster. Another option is to add more infrastructure resources through horizontal or vertical scaling.
- Accuracy: if the accuracy of the model is not sufficient, the outcomes are not reliable. Humans then have to step in to correct predictions, which slows down the process or even makes the solution unworkable.
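The two checks above can be turned into an automated gate in the pipeline. The following is a minimal sketch under the assumption that a model is any callable that maps one sample to one prediction; the thresholds and the names `evaluate_model` and `passes_gate` are illustrative, not taken from a specific tool.

```python
import time


def accuracy(predictions, labels):
    """Fraction of predictions that match the expected labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels differ in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def evaluate_model(predict, samples, labels):
    """Run the model once over the samples and measure accuracy and latency."""
    start = time.perf_counter()
    predictions = [predict(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    return {
        "accuracy": accuracy(predictions, labels),
        "seconds_per_sample": elapsed / len(samples),
    }


def passes_gate(metrics, min_accuracy, max_seconds_per_sample):
    """Return True only when both the accuracy and the speed threshold are met."""
    return (
        metrics["accuracy"] >= min_accuracy
        and metrics["seconds_per_sample"] <= max_seconds_per_sample
    )


if __name__ == "__main__":
    def is_even(n):  # stand-in for a real model
        return n % 2 == 0

    metrics = evaluate_model(is_even, [1, 2, 3, 4], [False, True, False, False])
    print(metrics["accuracy"])              # 0.75
    print(passes_gate(metrics, 0.9, 0.01))  # False: accuracy is below 0.9
```

Timing a single pass is a coarse measurement; a production gate would probably repeat the run and look at percentiles rather than one average.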
Often, more than one candidate model exists. ML model creators use an iterative process (often using a CI/CD pipeline) to find the best model. They apply the learning algorithms to the data to find the best candidate.
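One iteration of that search can be sketched as follows: train every candidate on the same data and keep the one with the best validation score. The two toy candidates and the scoring function are assumptions for illustration; a real project would plug in its own algorithms and metrics.

```python
from statistics import mean


def select_best_model(candidates, train_data, validation_data, score):
    """Train every candidate and return (name, model, score) of the best one."""
    best = None
    for name, train in candidates.items():
        model = train(train_data)
        value = score(model, validation_data)
        if best is None or value > best[2]:
            best = (name, model, value)
    if best is None:
        raise ValueError("no candidates supplied")
    return best


def train_constant(data):
    """Baseline: always predict the average target seen during training."""
    average = mean(y for _, y in data)
    return lambda x: average


def train_proportional(data):
    """Least-squares fit of y = slope * x (a line through the origin)."""
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: slope * x


def negative_mae(model, data):
    """Negated mean absolute error, so that a higher score means a better model."""
    return -mean(abs(model(x) - y) for x, y in data)


if __name__ == "__main__":
    train = [(1, 2.0), (2, 4.1), (3, 5.9)]
    validation = [(4, 8.0), (5, 10.2)]
    candidates = {"constant": train_constant, "proportional": train_proportional}
    name, _, value = select_best_model(candidates, train, validation, negative_mae)
    print(name)  # proportional
```

Scoring on data that was not used for training matters here; otherwise the comparison rewards candidates that merely memorize the training set.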
The next step is to actually deploy the ML model in production. Deploying it in production also means setting up the right infrastructure which actually hosts the model. After it has been deployed, the ML model should also be validated.
Different deployment options
There are several deployment models; choose the one that matches your use case. If you require real-time results from your model, choose online deployment. If you can accept a bit of latency, batch deployment becomes an option. The embedded method lets you embed the model in a mobile or edge device.
Online deployment is the most demanding method since the model needs to produce results immediately via APIs. For example: updating the locations and/or arrival times of taxis. Batch deployment lets the model run periodically on the data that has arrived since the previous run. For example: calculating risk scores for new vulnerabilities in your development environments.
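To make the batch idea concrete, the sketch below scores only the records that arrived after the previous run and returns the timestamp to remember for the next one. The record layout, the `cvss` field, and the scoring rule `cvss_risk` are hypothetical stand-ins for a real model and data source.

```python
from datetime import datetime, timezone


def run_batch(records, last_run, score):
    """Score only the records created after the previous run.

    Returns the results plus the timestamp of the newest processed record,
    which the caller stores and passes in as last_run next time.
    """
    new_records = [r for r in records if r["created_at"] > last_run]
    results = [{"id": r["id"], "risk": score(r)} for r in new_records]
    newest = max((r["created_at"] for r in new_records), default=last_run)
    return results, newest


def cvss_risk(record):
    """Hypothetical scoring rule standing in for a real model."""
    return round(record["cvss"] / 10.0, 2)


if __name__ == "__main__":
    previous_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
    vulnerabilities = [
        {"id": "a", "cvss": 9.8,
         "created_at": datetime(2023, 12, 31, tzinfo=timezone.utc)},
        {"id": "b", "cvss": 5.0,
         "created_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    ]
    results, previous_run = run_batch(vulnerabilities, previous_run, cvss_risk)
    print(results)  # only record "b" is new
```

An online deployment would call the same scoring function once per incoming request instead of once per batch, which is why its latency requirements are so much stricter.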
To be usable in large and diverse environments, the deployment pipeline should take the following topics into account:
- Be able to re-use (parts of) the CI/CD pipeline for other models besides the one you’re working on.
- Keep an eye on the pipeline architecture, data quality, and performance to make sure it’s robust and future-proof.
- Use high-quality datasets and populate the data automatically. Make sure the same datasets are used consistently in every pipeline iteration.
- Reduce the number of manual steps needed to maintain the lifecycle of your model to a minimum.
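The first point, re-using the pipeline for other models, can be approached by keeping the stages generic and passing in only the model-specific part. This is a minimal sketch of that idea; the stage names and the shared `context` dictionary are assumptions, not the layout of any particular CI/CD product.

```python
def run_pipeline(stages, context):
    """Run each stage in order; every stage receives and returns the context."""
    for name, stage in stages:
        context = stage(context)
        context.setdefault("completed", []).append(name)
    return context


def collect(context):
    """Generic stage: materialize the raw data from the configured source."""
    context["raw"] = list(context["source"])
    return context


def preprocess(context):
    """Generic stage: drop missing values."""
    context["clean"] = [value for value in context["raw"] if value is not None]
    return context


def make_train_stage(train):
    """Wrap a model-specific training function as a pipeline stage."""
    def stage(context):
        context["model"] = train(context["clean"])
        return context
    return stage


def build_stages(train):
    """Same pipeline for every model; only the training function differs."""
    return [
        ("collect", collect),
        ("preprocess", preprocess),
        ("train", make_train_stage(train)),
    ]


if __name__ == "__main__":
    def average(values):  # toy "model": the mean of the cleaned data
        return sum(values) / len(values)

    result = run_pipeline(build_stages(average), {"source": [1, None, 3]})
    print(result["model"], result["completed"])
```

Swapping `average` for another training function re-uses the collection and preprocessing stages unchanged, which also removes a manual step from onboarding a new model.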
Post-deployment actions
Once the model has been deployed to production, it’s wise to take into account the following points of interest:
- Integrate the CI/CD pipeline with automated regression tests (just like regular software components).
- Only allow changes to the pipeline through pull requests, and only after a peer review has taken place.
- Report metrics and other special occurrences to the entire team. Problems then get attention quickly, which leads to faster responses.
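A regression test for a model usually compares the metrics of the new version with those of the version currently in production. The sketch below assumes that every metric is "higher is better" and that a small tolerance is acceptable; the metric names and the helper functions are illustrative.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return metrics where the candidate is worse than the baseline.

    A metric counts as regressed when it is missing from the candidate or
    lower than the baseline by more than the tolerance.
    """
    regressions = {}
    for name, old_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or new_value < old_value - tolerance:
            regressions[name] = (old_value, new_value)
    return regressions


def format_report(regressions):
    """Build a short message that can be sent to the whole team."""
    if not regressions:
        return "No metric regressions detected."
    lines = ["Metric regressions detected:"]
    for name, (old_value, new_value) in sorted(regressions.items()):
        lines.append(f"- {name}: {old_value} -> {new_value}")
    return "\n".join(lines)


if __name__ == "__main__":
    production = {"accuracy": 0.92, "recall": 0.80}
    new_version = {"accuracy": 0.915, "recall": 0.70}
    print(format_report(find_regressions(production, new_version)))
```

In this example only `recall` is reported, because the drop in `accuracy` stays within the tolerance.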
Various tools exist to help you deploy your model with ease. One of them is TorchServe, which specializes in deploying, serving, and scaling PyTorch models in production environments. A very complete open-source solution is BentoML, a framework to build reliable, scalable, and cost-efficient AI applications. It’s much more than just a “deployment tool” for your ML models.
Retraining the model
In addition to the previous phases, the ML model should also be retrained with every iteration or when new data is added. So the CI/CD pipeline should also include a stage focused on Continuous Training (CT).
This is a must-have for every model since nothing stays static:
- The value of the datasets might decrease as new data arrives or when the structure of a dataset is updated.
- The model itself might not cover all of the use cases of the current system.
- Existing bugs in the model need to be patched and this has an effect on the model itself.
- The business rules which apply to the model change and the model should also incorporate them to remain accurate.
Retraining becomes even more important when things change rapidly.
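A Continuous Training stage needs some rule that decides when retraining is due. The sketch below uses two simple triggers: enough new rows have arrived, or the mean of an input feature has shifted away from the data the model was trained on. Both thresholds and the shift measure are assumptions for illustration; real systems often use more robust drift tests.

```python
from statistics import mean, pstdev


def mean_shift(reference, current):
    """Distance between the two means, in reference standard deviations."""
    spread = pstdev(reference)
    difference = abs(mean(current) - mean(reference))
    if spread == 0:
        return 0.0 if difference == 0 else float("inf")
    return difference / spread


def should_retrain(reference, current, new_rows, max_shift=0.5, min_new_rows=1000):
    """Return the list of reasons to retrain; an empty list means no retraining."""
    reasons = []
    if new_rows >= min_new_rows:
        reasons.append("enough new data")
    if mean_shift(reference, current) > max_shift:
        reasons.append("input drift")
    return reasons


if __name__ == "__main__":
    training_values = [10, 12, 11, 9, 13]  # feature values seen during training
    recent_values = [15, 16, 14, 17]       # feature values seen in production
    print(should_retrain(training_values, recent_values, new_rows=200))
    # ['input drift']
```

Returning the reasons rather than a plain boolean makes it easier to report to the team why a retraining run was started.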
Resources
Various resources help to get you started. Since the range of topics is rather broad, it’s nice to have some starting points.
Neptune.ai provides an excellent article that covers many of the topics that should be taken into account when building a CI/CD pipeline for ML models.
Azure provides a great set of pages to learn a rich set of AI skills. It also covers the support of CI/CD for machine learning for the Databricks service. This includes machine learning elements that require CI/CD as well as other related topics such as DataOps and ModelOps to name a few.
A deep dive into the lessons learned from Ari Bajo is presented on the website of Valohai.com. It covers collecting user feedback for ML models, concept drift as well as model validation through the use of CI/CD pipelines.
Summary
There is so much more to explore. The topic of Machine Learning models is pretty big and the possibilities are expanding every day. In this article, we’ve highlighted the main concepts and points of interest within the domain of CI/CD pipelines for Machine Learning models. We covered the main process, model deployment options, and post-deployment steps. In addition, various examples, use cases, and tools should help you get started and put things into practice. CI/CD and Machine Learning – a great marriage.