Business Impacts
- 5x improvement in inference time with GCP accelerators (TPUs)
- Optimized cost-to-performance ratio with GCP accelerators
- Up to 93% reduction in model training time using GCP TPUs
Customer Key Facts
- Country: USA
- Size: Startup
- Industry: Technology
- Rank: Named one of the 50 most promising AI startups in the world in 2023 by Forbes
- Customer Base: Global 2000 enterprises, government, and AI innovators (5 of the top 10 US banks use Snorkel AI)
- Website: www.snorkel.ai
Problem Context
Snorkel AI’s platform enables data scientists to bring high-quality models to production faster with an iterative, interactive data-centric AI approach powered by programmatic labeling and foundation models. The Snorkel AI Research team wanted to evaluate the impact of running transformer models (CLIP/Owl-ViT) on GCP accelerators. Snorkel AI aimed to improve latency and optimize costs by using GCP GPUs and TPUs in its preprocessing and training workflows.
Snorkel AI sought to achieve the following objectives:
1. Explore how GCP GPUs and TPUs could accelerate current ML workflows, reducing both inference and training time
2. Benchmark the results by cost incurred and time taken
Challenges
- Implementing a multi-TPU setup and benchmarking the captured metrics
- An effort- and resource-intensive project with a fixed end date
Technologies Used
Google Cloud Filestore
Google Cloud Compute
Vertex AI Workbench
Google Cloud TPU VM
Google Cloud Functions
Google Cloud Scheduler
Cloud VPC
Google Cloud Storage
Cloud Monitoring
Cloud Logs
Solution
Quantiphi worked with Snorkel AI to develop the solution in three phases:
- Phase One: Run tests on GCP GPUs using CLIP or DETIC
- Phase Two: Compare performance metrics for a single-model setup using a TPU VM/Node
- Phase Three: Compare performance metrics for a single-model setup using a multi-TPU configuration
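At its core, each phase compares the same workload across backends by timing repeated runs and deriving a relative speedup. The sketch below is an illustration only, not Quantiphi's actual harness: it uses a generic timing loop in pure Python, with `slow_backend` and `fast_backend` as hypothetical stand-ins for model calls on the baseline and accelerated setups.

```python
import time

def benchmark(fn, batches, warmup=2, iters=10):
    """Time fn over `iters` passes after `warmup` passes; return seconds per batch."""
    for _ in range(warmup):          # warm-up runs excluded from timing
        for b in batches:
            fn(b)
    start = time.perf_counter()
    for _ in range(iters):
        for b in batches:
            fn(b)
    elapsed = time.perf_counter() - start
    return elapsed / (iters * len(batches))

# Hypothetical stand-in workloads; a real comparison would invoke the
# model's inference call on the GPU and TPU backends instead.
def slow_backend(batch):
    return sum(x * x for x in batch)

def fast_backend(batch):
    return sum(batch)

batches = [list(range(1000))] * 5
baseline = benchmark(slow_backend, batches)
accelerated = benchmark(fast_backend, batches)
print(f"speedup: {baseline / accelerated:.1f}x")
```

In a real engagement the warm-up runs matter: first-call compilation on TPUs (e.g. XLA tracing) would otherwise inflate the measured latency of the accelerated setup.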
As part of the engagement, Quantiphi was successfully able to:
- Demonstrate that GCP accelerators can reduce model training time by up to 93% for key use cases
- Demonstrate that GCP accelerators can enable faster interactive workflows for key use cases, improving inference and training throughput
- Leverage GCP accelerators effectively to optimize the cost-to-performance ratio
- Benchmark the client's existing setup versus GCP accelerators (GPUs and TPUs) and provide a detailed report on the results
Results
- Developed a working understanding of running models on TPUs
- Migrated critical workloads to GCP
"Quantiphi has been an excellent partner as we explore how Google Cloud TPUs can accelerate AI/ML workloads and enable new interactive workflows for foundation model fine-tuning and training. The Quantiphi team was professional, well-organized, and kept the project on track to ensure completion on schedule. Not only did we have a great experience working with Quantiphi, the project was successful and we saw excellent results on inference and training throughputs."
Braden Hancock, Co-Founder and Head of Research, Snorkel AI