Business Impact

  • 10TB+ of activity tracker data loaded into BigQuery
  • 100GB+ of genomic data (VCF files) loaded into BigQuery using the Healthcare API
  • Reduced querying time

Customer Key Facts

  • Country: USA
  • Industry: Healthcare & Life Sciences

Problem Context

A non-profit biotechnology research organization, focused on finding treatments for amyotrophic lateral sclerosis (ALS), wanted aggregated patient data in BigQuery that could be queried using a unique patient ID.

Challenges

  • Disparate patient data sources with multiple data types
  • Need for dataset-level and table-level access control
  • Inadequate technical expertise to develop, run, and maintain a cloud-based data lake
  • Need for a longitudinal patient profile to perform analytics
Technologies Used

Cloud Functions
Healthcare API
Google BigQuery
Dataflow
Cloud Data Fusion
Dataproc

Building a secure and scalable data lake on Google Cloud

Solution

Quantiphi built a secure and scalable data lake on Google Cloud with the following layers:
– A bronze layer that stores the raw data
– A silver layer that stores aggregated data, produced by scheduled SQL queries over the bronze layer. This aggregated layer serves as the consumption layer for analytics
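
The bronze-to-silver step can be illustrated with a minimal sketch. This is not the project's actual code: it uses Python's built-in sqlite3 module as a local stand-in for BigQuery, and the table names (bronze_activity, silver_patient_activity), the columns, and the sample rows are all hypothetical. In the real setup, a query of this shape would run on a schedule rather than on demand.

```python
import sqlite3


def build_silver_layer(conn):
    """Rebuild the aggregated (silver) table from the raw (bronze) table."""
    conn.execute("DROP TABLE IF EXISTS silver_patient_activity")
    conn.execute(
        """
        CREATE TABLE silver_patient_activity AS
        SELECT patient_id,
               COUNT(*)   AS reading_count,
               SUM(steps) AS total_steps
        FROM bronze_activity
        GROUP BY patient_id
        """
    )
    conn.commit()


def query_patient(conn, patient_id):
    """Look up one patient's aggregated row by unique patient ID."""
    return conn.execute(
        "SELECT patient_id, reading_count, total_steps "
        "FROM silver_patient_activity WHERE patient_id = ?",
        (patient_id,),
    ).fetchone()


# Hypothetical raw rows standing in for the bronze layer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bronze_activity (patient_id TEXT, recorded_at TEXT, steps INTEGER)"
)
conn.executemany(
    "INSERT INTO bronze_activity VALUES (?, ?, ?)",
    [
        ("P001", "2021-01-01", 1200),
        ("P001", "2021-01-02", 800),
        ("P002", "2021-01-01", 500),
    ],
)

build_silver_layer(conn)
print(query_patient(conn, "P001"))  # ('P001', 2, 2000)
```

Keeping the raw rows untouched in the bronze table means the silver table can be dropped and rebuilt at any time, which is one reason the two layers are kept separate.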

Results

  • Enabled data querying using a unique patient ID
  • Built a scalable ingestion pipeline to handle ad hoc file uploads
  • Created audit tables in BigQuery to monitor the data pipelines and log ingestion pipeline failures
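
As a rough illustration of the audit-table idea, the sketch below wraps an ingestion step and records one audit row per run, including the error message on failure. It is an assumption-laden example, not the project's implementation: the in-memory list audit_rows stands in for a BigQuery audit table, and the names run_with_audit, load_file, and the row fields are hypothetical.

```python
from datetime import datetime, timezone

# In-memory stand-in for an audit table (hypothetical schema).
audit_rows = []


def run_with_audit(pipeline_name, file_name, step):
    """Run one ingestion step and append an audit row describing the outcome."""
    started_at = datetime.now(timezone.utc).isoformat()
    try:
        step(file_name)
        status, error = "SUCCESS", None
    except Exception as exc:  # record the failure instead of losing it
        status, error = "FAILED", str(exc)
    audit_rows.append(
        {
            "pipeline": pipeline_name,
            "file": file_name,
            "started_at": started_at,
            "status": status,
            "error": error,
        }
    )
    return status


def load_file(file_name):
    """Hypothetical loader that only accepts CSV files."""
    if not file_name.endswith(".csv"):
        raise ValueError(f"unsupported file type: {file_name}")


run_with_audit("activity_ingest", "tracker_2021_01.csv", load_file)
run_with_audit("activity_ingest", "notes.docx", load_file)

for row in audit_rows:
    print(row["file"], row["status"], row["error"])
```

Querying such a table for rows with a FAILED status gives a simple view of which uploads need attention.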
