Case Study

Snorkel AI

ML Model Benchmarking on GCP for Snorkel AI

Technology

Business Impacts

5x improvement in inference time with GCP accelerators (TPUs)

Optimized cost-to-performance ratio with GCP accelerators

Up to 93% reduction in model training time using GCP TPUs

Customer Key Facts

  • Country: USA
  • Size: Startup
  • Industry: Technology
  • Rank: Named as one of the 50 most promising AI startups in the world in 2023 by Forbes
  • Customer Base: Global 2000 enterprises, government, and AI innovators (5 of the top 10 US banks use Snorkel AI)
  • Website: www.snorkel.ai

Problem Context

Snorkel AI’s platform enables data scientists to bring high-quality models to production faster with an iterative, interactive, data-centric AI approach powered by programmatic labeling and foundation models. The Snorkel AI Research team wanted to evaluate the impact of running transformer models (CLIP/Owl-ViT) on GCP accelerators, aiming to improve latency and optimize costs by using GPUs and TPUs in its preprocessing and training workflows on GCP.

Snorkel AI sought to achieve the following objectives:
1. Explore opportunities to leverage GCP GPUs and TPUs to accelerate current ML workflows, leading to faster inference and shorter training runs (an illustrative sketch follows this list)
2. Benchmark the results in terms of cost incurred and time taken
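
The engagement summary does not include Snorkel AI's actual benchmarking code. Purely as an illustration of the first objective, the sketch below runs CLIP inference on a Cloud TPU VM with PyTorch/XLA; the checkpoint name, dummy inputs, and use of torch_xla and transformers are assumptions made for this sketch, not the client's confirmed setup.

```python
# Illustrative sketch only: CLIP inference on a Cloud TPU VM via PyTorch/XLA.
# Assumes torch, torch_xla, transformers, and pillow are installed on a TPU VM;
# the checkpoint and inputs are placeholders, not Snorkel AI's actual workload.
import torch
import torch_xla.core.xla_model as xm
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = xm.xla_device()  # resolves to the TPU on a Cloud TPU VM

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # stand-in for a real image
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    xm.mark_step()  # force XLA to compile and execute the pending graph

# Image-to-text similarity scores, materialized back on the host
print(outputs.logits_per_image.softmax(dim=-1).cpu())
```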

Challenges

  • Implementing multi-TPU setups and benchmarking the captured metrics
  • An effort- and resource-intensive project with a fixed end date

Technologies Used

  • Google Cloud Filestore
  • Google Cloud Compute
  • Vertex AI Workbench
  • Google Cloud TPU VM
  • Google Cloud Functions
  • Google Cloud Scheduler
  • Cloud VPC
  • Google Cloud Storage
  • Cloud Monitoring
  • Cloud Logs

Solution

Quantiphi worked with Snorkel AI to develop the solution in three phases:

  • Phase One: Perform tests on GCP GPUs using CLIP or DETIC
  • Phase Two: Compare performance metrics for a single-model setup on a TPU VM/Node
  • Phase Three: Compare performance metrics for a single-model setup on a multi-TPU configuration (a simplified timing harness for such comparisons is sketched below)
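
The case study does not describe the measurement tooling itself; a minimal timing harness of the kind typically used for such per-phase comparisons might look like the sketch below. Warm-up iterations matter especially on TPUs, where the first runs include XLA compilation; the `step_fn` being timed is a hypothetical placeholder for one inference or training step.

```python
# Minimal benchmarking harness (an illustrative sketch, not Quantiphi's tooling).
import time
import statistics

def benchmark(step_fn, warmup=10, iters=100):
    """Time step_fn, discarding warm-up runs that include XLA compilation."""
    for _ in range(warmup):
        step_fn()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        step_fn()
        latencies.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "stdev_s": statistics.stdev(latencies),
    }

# Hypothetical usage: wrap one step per hardware setup and compare the
# resulting numbers across GPU, single-TPU, and multi-TPU runs.
# stats = benchmark(lambda: model_step(batch))
```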


As part of the engagement, Quantiphi was able to:

  • Demonstrate that GCP accelerators can reduce model training time by up to 93% for key use cases
  • Demonstrate that GCP accelerators enable faster interactive workflows for key use cases, ultimately leading to better inference and training throughputs
  • Leverage GCP accelerators effectively to optimize the cost-to-performance ratio (an illustrative calculation follows this list)
  • Benchmark the client's existing setup against GCP accelerators (GPUs and TPUs) and provide a detailed report on the results
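
Cost-to-performance comparisons of this kind reduce to simple arithmetic once sustained throughput and hourly price are measured. The sketch below illustrates that calculation with placeholder numbers; the rates shown are hypothetical, not actual GCP pricing or the benchmark's measured throughputs.

```python
# Cost-to-performance arithmetic (placeholder rates, not real GCP pricing).
def cost_per_1k_inferences(hourly_rate_usd: float, throughput_per_s: float) -> float:
    """USD spent per 1,000 inferences at a given sustained throughput."""
    inferences_per_hour = throughput_per_s * 3600
    return 1000 * hourly_rate_usd / inferences_per_hour

# Example: a hypothetical $4.50/hr accelerator sustaining 800 inferences/s
print(f"${cost_per_1k_inferences(4.50, 800):.4f} per 1,000 inferences")
```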

Results

  • Developed a working understanding of running transformer models on TPUs
  • Migrated critical workloads to GCP

"Quantiphi has been an excellent partner as we explore how Google Cloud TPUs can accelerate AI/ML workloads and enable new interactive workflows for foundation model fine-tuning and training. The Quantiphi team was professional, well-organized, and kept the project on track to ensure completion on schedule. Not only did we have a great experience working with Quantiphi, the project was successful and we saw excellent results on inference and training throughputs."“

Braden Hancock, Co-Founder and Head of Research, Snorkel AI
