Business Impact

  • 5K

    Legacy models processed

  • 11 days to 1 day

    Reduction in scoring time

  • Reduced manual effort

Customer Key Facts

  • Location : North America
  • Industry : Information Technology & Services

Problem Context

A leading marketing data and technology services company, which provides the data foundation for the world’s best marketers, had terabytes of data residing in its on-premises systems and was compelled to run its ML operations manually. The company wanted a machine learning lifecycle management framework that would automate the management of training and inference jobs and reduce the turnaround time for the scoring that ran on its on-premises infrastructure.



  • The customer’s ML workloads involved modeling scripts run for many different model types
  • With 5,000+ legacy models and 200+ models per month, managing them at scale while optimizing time and cost was difficult
  • Scoring took almost 11 days each month on the customer’s on-premises infrastructure

Technologies Used

Amazon S3
Amazon SageMaker
Amazon RDS
Amazon EMR
Amazon CloudWatch
AWS CloudTrail
AWS CloudFormation

Building a Central ML Platform and Inference Pipelines for Running Legacy and New Models on Amazon EMR


Quantiphi developed a framework to manage the lifecycle of machine learning models in production and route incoming data to the different models. Quantiphi also built an automated end-to-end training and inference pipeline in a fully configured environment, enabling users to pre-process data and to build, train, tune, and score models with minimal manual intervention.


  • Enabled parallel processing for ~5,000 legacy models
  • Performed post-processing on ~13 TB of scored data on Amazon EMR
  • Built a user interface using Angular and Node.js for scheduling and managing the pipelines
