ML platforms help you scale machine learning (ML) efforts efficiently, reduce deployment time, and automate and connect pipelines so that models move reliably from development to production deployment.
According to a report by McKinsey, only 12% of corporate AI initiatives have moved beyond the testing and deployment phases. This is due to a lack of understanding of which accelerators to use and of the processes for model deployment, model management, and MLOps. However, the same report showed that organizations that productionize their AI and ML initiatives saw profit margins increase by 3-15%, with those numbers predicted to go as high as 38% by 2035.
What does this mean for your business?
Developing the necessary infrastructure and processes to get your AI and ML-enabled products and systems functioning in the real world can lead to a substantial increase in your company's profits. Most organizations have taken the initial steps to begin exploring AI and ML. The difficult part is deploying AI/ML-based solutions at scale. To truly realize the potential of these new capabilities, companies need to be able to operationalize them and make it easier to deploy, manage, and scale them.
Companies don't need to start from zero to build the ML ecosystem needed to support AI/ML-based innovation. There are already established best practices for how teams can operationalize their AI/ML environment by adopting modern DevOps practices and technologies.
The absence of a unified ML platform/toolkit and automation can be a major disadvantage for companies using traditional ML processes. Organizations can end up spending a significant amount of time setting up the infrastructure instead of tuning or optimizing the model. Lack of the right infrastructure and automation can lead to performance issues while training ML models.
Building scalable platforms and optimized models requires experienced people, which can become a major roadblock for companies still maturing in their ML and MLOps journey. As a result, most organizations’ AI initiatives “never make it from the prototype stage to production”. The reason for this high failure rate is the difficulty of bridging the gap between the data scientists who build and train the inference models, the IT teams that maintain the infrastructure, and the engineers who develop and deploy production-ready ML applications.
What is the difference between ML Automation/Orchestration, Platforms, and ML Ops?
When we talk about ML Ops, the first image that comes to mind is ML automation or orchestration. However, orchestration of any number of steps of the ML process is not ML Ops. For example, the orchestration of data pre-processing to model inference and then post-processing is not ML Ops on its own. Understanding the differences and selecting the right tools, processes, and best practices for your ML journey can go a long way in accelerating ML efforts.
ML Automation is an important part of MLOps. It consists of automating different processes in the ML lifecycle, such as data analysis, training jobs, and inference generation, using orchestrators. Over the years, Quantiphi has built ML automation for several firms. One example is automating inference and training for a Canada-based high-growth AI startup that creates text-to-speech using proprietary deep learning models. We leveraged SageMaker, Lambda, SQS, and a few other AWS services to develop an automated training pipeline and an automated inference pipeline for the client. These pipelines made training efficient and cost-effective and minimized manual intervention in inference generation. They are also scalable and can be optimized for performance.
Developing an ML Platform involves automating different processes such as data analysis, training jobs, and inference, with ML automation existing within the ML Platform. These are point-and-click tools designed for citizen data scientists who prefer low-code/no-code ML for various reasons. Users can point and click to build an ML model. The platform can also join multiple modules, such as pre-processing, inference, and post-processing, to build automation pipelines, but that is still only an advanced form of ML automation and not MLOps in its entirety.
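To make "joining modules" concrete, here is a minimal sketch of chaining pre-processing, inference, and post-processing into one pipeline. It is illustrative only: the step functions, the `build_pipeline` helper, and the fixed linear "model" are hypothetical stand-ins, not part of any Quantiphi or AWS API.

```python
from typing import Callable, List

# A step takes a batch of values and returns a transformed batch.
Step = Callable[[List[float]], List[float]]


def preprocess(values: List[float]) -> List[float]:
    """Hypothetical pre-processing: scale raw values by the batch maximum."""
    top = max(values)
    return [v / top for v in values]


def infer(values: List[float]) -> List[float]:
    """Stand-in for a model call: a fixed linear score, not a real model."""
    return [2.0 * v + 1.0 for v in values]


def postprocess(values: List[float]) -> List[float]:
    """Hypothetical post-processing: round scores for reporting."""
    return [round(v, 2) for v in values]


def build_pipeline(steps: List[Step]) -> Step:
    """Join modules into one callable that runs them in order."""
    def run(values: List[float]) -> List[float]:
        for step in steps:
            values = step(values)
        return values
    return run


pipeline = build_pipeline([preprocess, infer, postprocess])
scores = pipeline([5.0, 10.0, 20.0])
print(scores)
```

Chaining steps like this automates a single run, but it says nothing about testing, versioning, or promoting the pipeline between environments, which is where MLOps begins.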
The Quantiphi ML Platform leverages AWS SageMaker and other services to automate different processes such as data analysis, training jobs, and inference, and provides a point-and-click option to build pipelines and models.
Here is a reference architecture for the Quantiphi ML Platform:
Quantiphi works extensively with AWS computing services, and our ML platform has been designed using AWS SageMaker, CodeBuild, CodePipeline, CloudFormation, and a host of other AWS services to configure and automate deployment, quickly set up ML pipelines, join multiple ML modules, and ensure best-practice implementation.
The ease of automation offered by a point and click tool ends quickly when it comes to building an end-to-end operational AI system in production at scale. An MLOps platform follows the Build, Test, and Release cycle for enterprise-scale ML solutions and helps in automating different ML pipelines across experimentation and production environments. An MLOps Center of Excellence (CoE) would be responsible for running machine learning workloads at scale whereas users may or may not be involved in model development depending on their preference. Various automation features exist within an MLOps platform for automating each step of the product life cycle.
Over the years, Quantiphi has built MLOps platform components and pipelines for several firms. One example is a highly rated US-based real estate firm, for which we created an MLOps platform that automates training and inference pipelines in a fully configured environment requiring minimal manual intervention. Quantiphi's solution includes features that allow users to onboard new algorithms, processing steps, orchestration pipelines, etc. onto the platform, along with a test environment application to test them. It also includes an application for the production environment that allows users to deploy pipelines manually, on a schedule, or in an event-driven manner. The overall solution follows DevOps best practices such as build-test-release cycles, separation of test and production environments, and modularity.
Depending on the organization's technology needs, budget, and resources, firms can choose to implement ML automation, an ML Platform, or an MLOps Platform. At Quantiphi, we work with clients to understand their requirements and decide which of the three is the best fit.
ML Platform Customer Use Case: US-based Marketing Data Company
The client, a US-based technology and services company that enables people-based marketing, had vast amounts of data on-premise and ML models running on virtual machines through a manual process. With over 5000 models in Python, C++, and R, it became difficult to manage them at scale while keeping time and cost under control. The client was looking for a single solution to the following technology challenges:
- Terabytes of data residing in on-prem systems
- ML inference on the on-prem Spark system took over 11 days to run end-to-end for 5000 models
- Inference for all 5000 models needed to complete within 24 hours
- Setup time and cost needed to be reduced, since on-prem servers were in use
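The gap between the 11-day runtime and the 24-hour target can be sized with some quick arithmetic. The sketch below uses only the figures stated above. The assumptions that per-model runtime stays unchanged and that work scales linearly across workers are ours, so treat the result as a rough lower bound rather than a sizing recommendation.

```python
import math

MODELS = 5000
CURRENT_HOURS = 11 * 24   # "over 11 days", so this is a lower bound
TARGET_HOURS = 24

# Minimum end-to-end speedup needed to fit the 24-hour window.
required_speedup = CURRENT_HOURS / TARGET_HOURS

# Average time per model today, and the per-model budget under the target.
current_seconds_per_model = CURRENT_HOURS * 3600 / MODELS
budget_seconds_per_model = TARGET_HOURS * 3600 / MODELS

# Assumption: per-model runtime is unchanged and work scales linearly,
# so the speedup must come entirely from running models in parallel.
min_parallel_workers = math.ceil(required_speedup)

print(f"required speedup: {required_speedup:.1f}x")
print(f"current average per model: {current_seconds_per_model:.2f} s")
print(f"budget per model: {budget_seconds_per_model:.2f} s")
print(f"minimum parallel workers: {min_parallel_workers}")
```

In other words, meeting the target requires at least an 11x improvement, through parallelism, faster per-model inference, or both.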
The solution involved creating a model training and inference pipeline and platform that is easily adaptable to a variety of machine learning (ML) use cases. It re-engineered the training and inference pipeline, reducing turnaround time and enabling on-demand server provisioning on the cloud. Drawing on our experience in building machine learning models across industries, our team developed an implementation strategy for the client's use case using an ML platform framework that consisted of the following segments:
- ML Training, Inference, and Automation: Developed an automated training and inference pipeline in a configurable environment with minimum manual intervention along with a framework to allow users to bring their models for model training and inference. Re-engineered their legacy systems to build an optimized training and inference pipeline that would minimize their manual intervention and enhance the process.
- Model Lifecycle Management: Built a framework to manage the entire lifecycle of the machine learning models in production and monitor their performance. Models could be deployed, undeployed, and versioned, along with monitoring of models and their approval and rejection.
- Custom-built User Interface: Facilitates alerting and reporting on performance and error metrics.
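As an illustration of the model lifecycle management segment, the sketch below models versioning plus approval, rejection, deployment, and undeployment as a small state machine. The state names, the transition table, the `ModelRegistry` class, and the "churn" model name are all hypothetical; the actual platform's states and storage are not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical states and allowed transitions; the real platform may differ.
TRANSITIONS: Dict[str, Set[str]] = {
    "registered": {"approved", "rejected"},
    "approved": {"deployed"},
    "deployed": {"undeployed"},
    "undeployed": {"deployed"},
    "rejected": set(),
}


@dataclass
class ModelVersion:
    name: str
    version: int
    state: str = "registered"
    history: List[str] = field(default_factory=lambda: ["registered"])

    def move_to(self, new_state: str) -> None:
        """Apply a transition, refusing any that the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self.history.append(new_state)


class ModelRegistry:
    """In-memory registry that assigns increasing version numbers per model."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        model = ModelVersion(name=name, version=len(versions) + 1)
        versions.append(model)
        return model


registry = ModelRegistry()
v1 = registry.register("churn")   # hypothetical model name
v1.move_to("approved")
v1.move_to("deployed")
v2 = registry.register("churn")
v2.move_to("rejected")            # a rejected version can never be deployed
print(v1.version, v1.state, v2.version, v2.state)
```

Keeping the allowed transitions in one table makes the approval gate explicit: a version cannot reach "deployed" without first passing through "approved".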
The training pipeline uses a combination of SageMaker Training, SageMaker Batch Transform, and SageMaker Processing, while the inference pipeline uses SageMaker Processing. All components and steps are orchestrated through Airflow. Our control plane implementation automatically monitors the status of inference jobs and ensures all 5000 models are scored within the given SLA.
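One way to picture the control plane's SLA check is as a periodic report over job statuses. The sketch below is a simplified assumption of how such a check could work, not the actual implementation: the status strings, the `sla_report` function, and the wave-based projection are hypothetical, and a real control plane would read job states from the orchestrator rather than from a dictionary.

```python
import math
from typing import Dict


def sla_report(statuses: Dict[str, str], elapsed_hours: float,
               sla_hours: float, avg_job_hours: float, workers: int) -> dict:
    """Summarize job progress and estimate whether the SLA is still reachable."""
    pending = [m for m, s in statuses.items() if s in ("queued", "running")]
    failed = [m for m, s in statuses.items() if s == "failed"]
    # Failed jobs must be retried, so they count as remaining work.
    remaining = len(pending) + len(failed)
    # Conservative estimate: remaining jobs run in waves of `workers`
    # parallel jobs, and each wave takes a full average job duration.
    waves = math.ceil(remaining / workers)
    projected_hours = elapsed_hours + waves * avg_job_hours
    return {
        "remaining": remaining,
        "retry": sorted(failed),
        "projected_hours": projected_hours,
        "on_track": projected_hours <= sla_hours,
    }


# Hypothetical snapshot of four inference jobs, 20 hours into a 24-hour SLA.
statuses = {
    "model_a": "succeeded",
    "model_b": "running",
    "model_c": "failed",
    "model_d": "queued",
}
report = sla_report(statuses, elapsed_hours=20.0, sla_hours=24.0,
                    avg_job_hours=1.5, workers=2)
print(report)
```

If `on_track` turns false, a control plane could respond by adding workers or raising an alert. Which action the real system takes is not specified in this case study.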
Quantiphi’s ML platform allows data scientists to create models rapidly and efficiently, building in best practices and helping them meet ML deadlines. Moreover, the robust architecture allowed the client to build models at scale with ease while significantly reducing both operating and maintenance costs.
Conclusion
Looking at the advancements made in machine learning and automation, it's easy for organizations to adopt a "build it and they will come" mindset. However, there are several features, tools, and frameworks that need to be taken into consideration before investing in an ML project or program. We believe that MLOps and automation deployments should depend on three things - resources, environment, and practices. In order to get the most out of your investment, these components should not be overlooked when planning a new project or program rollout. Ensuring you have the right framework and tools will allow you to build ML best practices, reduce risk, and shorten time-to-market.
Reach out to our experts for a free consultation to learn how you can accelerate your strategic AI/ML projects and workloads to production with our MLOps solutions.