AWS • February 9, 2023

Comparing AWS Serverless with Confluent Kafka Standard: Select the Appropriate Platform to Stream Your Data in Real Time

In the modern data platform world, companies rely heavily on data-driven decisions. However, the question remains: Are we using the data at the right time? The answer may not always be ‘yes.’ Companies in every industry are quickly shifting from batch processing to real-time data streams to keep up with modern business requirements. Use cases such as fraud detection, contextual marketing triggers, and dynamic pricing rely on a real-time data feed. With current and comprehensive data, organizations gain the visibility to make informed operational decisions faster.

When deciding on real-time data ingestion solutions, customers look for the following factors:

  • Scalability
  • Ordering
  • Consistency and durability
  • Fault tolerance and data guarantees
  • Cost effectiveness

AWS MSK Serverless and Confluent Kafka Standard are widely considered to be among the best services for real-time data ingestion. This blog shares insights to help you decide which is the right fit for your requirements.

About AWS MSK Serverless

Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. It automatically provisions and scales capacity while managing the partitions in your topic, so you can stream data without thinking about right-sizing or scaling clusters. It offers a throughput-based pricing model, so you pay only for what you use. 

About Confluent Kafka Standard

Confluent Kafka Standard provides a truly cloud-native experience, completing Kafka with a holistic set of enterprise-grade features to enhance developer productivity, operate efficiently at scale, and meet your architectural requirements before moving to production. Underpinning the platform are a 99.99% uptime SLA and committer-driven expertise, with support and services from a team with over one million hours of technical experience with Kafka.

Let us now compare AWS MSK Serverless and Confluent Kafka Standard across different focus areas:

1. Architecture

| Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
| --- | --- | --- |
| Ease of Setup | As a serverless offering, runs Apache Kafka without requiring you to manage or scale cluster capacity | No setup is involved, as Confluent Kafka is serverless |
| Access Controls | Amazon MSK uses IAM to check whether the client is an authenticated identity and is authorized to interact with your cluster | Confluent Cloud role-based access control (RBAC) lets you control access to an organization, environment, cluster, or granular Kafka resources based on predefined roles and access permissions |
| Ecosystem Integration | Integrates with AWS services that process streaming data, such as VPC, Lambda, Glue Schema Registry, and Kinesis Data Analytics | Offers pre-built Kafka connectors, Confluent Hub, Schema Registry, and MQTT Proxy |
| Scalability | In serverless clusters, Amazon MSK automatically balances partitions | Self-balancing clusters automatically allocate resources to manage consumer lag as throughput scales up or down |
| Availability | Amazon MSK uses multi-AZ replication for high availability. Data replication is included at no additional cost | Standard clusters are designed for production workloads, with an uptime SLA of 99.95% for Single-Zone and 99.99% for Multi-Zone |

2. Platform Support

| Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
| --- | --- | --- |
| Offerings | AWS MSK and MSK Serverless are the two managed Kafka offerings provided by AWS | Self-managed and fully managed are the two Kafka offerings provided by Confluent |
| Connectors | Create custom plugins using MSK Connect to move data between source and destination systems | 120+ pre-built connectors for real-time integration between source and destination systems |
| Kafka Upgrades | MSK Serverless automatically upgrades clusters without requiring any input from customers | Upgrades in Confluent Kafka are broadly classified into three types: minor, major, and deprecation upgrades |
| Quota | Max ingress throughput: 200 MB/s; max egress throughput: 400 MB/s; ingress per partition: 5 MB/s; egress per partition: 10 MB/s; max partition size: 250 GB | Max ingress throughput: 250 MB/s; max egress throughput: 750 MB/s; ingress per partition: 5 MB/s; egress per partition: 15 MB/s; max partition size: 250 GB |
| Monitoring | Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your Amazon MSK cluster | Confluent Control Center is a web-based tool for managing and monitoring Apache Kafka®. Its user interface gives a quick overview of cluster health; lets you observe and control messages, topics, and Schema Registry; and lets you develop and run ksqlDB queries |
| Workloads | AWS-native workloads that need deeper integration with other AWS services are a good fit | Hybrid workloads, i.e., on-premises plus cloud or multi-cloud, are well suited to the Confluent Kafka platform |
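One way to read the Quota row is as a feasibility check: a workload fits only if its peak throughput stays under both the cluster-wide cap and the combined per-partition caps. The sketch below hard-codes the throughput figures from the table and assumes traffic is spread evenly across partitions. The `Quota` class, the `fits` helper, and the sample workload (220 MB/s in, 300 MB/s out, 60 partitions) are illustrative names and numbers, not part of either service's API.

```python
from dataclasses import dataclass

MB_PER_GB = 1000              # decimal units, assumed for simplicity
SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Quota:
    """Throughput limits in MB/s, copied from the Quota row above."""
    max_ingress: float
    max_egress: float
    ingress_per_partition: float
    egress_per_partition: float


MSK_SERVERLESS = Quota(200, 400, 5, 10)
CONFLUENT_STANDARD = Quota(250, 750, 5, 15)


def avg_mbps(gb_per_day: float) -> float:
    """Convert a daily volume in GB into an average rate in MB/s."""
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


def fits(quota: Quota, ingress_mbps: float, egress_mbps: float, partitions: int) -> bool:
    """Return True if the workload stays within the cluster and partition caps.

    Assumes load is balanced evenly across partitions (a simplification).
    """
    ingress_cap = min(quota.max_ingress, partitions * quota.ingress_per_partition)
    egress_cap = min(quota.max_egress, partitions * quota.egress_per_partition)
    return ingress_mbps <= ingress_cap and egress_mbps <= egress_cap


# Hypothetical peak workload: 220 MB/s in, 300 MB/s out, 60 partitions.
print(fits(MSK_SERVERLESS, 220, 300, 60))      # False: 220 MB/s exceeds the 200 MB/s ingress cap
print(fits(CONFLUENT_STANDARD, 220, 300, 60))  # True
print(f"{avg_mbps(100):.2f} MB/s")             # 1.16 MB/s average for 100 GB/day
```

Note that a daily volume can look small as an average rate (100 GB/day is roughly 1.16 MB/s), so it is the peak rate, not the daily total, that should be compared against these limits.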

3. Data Governance and Security

| Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
| --- | --- | --- |
| Compliance Certificates | HIPAA eligible, PCI, ISO, SOC 1/2/3, FedRAMP | SOC 1/2/3, ISO 27001, PCI, GDPR |
| Network Security | MSK provides strong network security using AWS VPC Peering, AWS Transit Gateway, and AWS PrivateLink to secure traffic flow across accounts and Regions | All Confluent Standard clusters are accessible through secure internet endpoints. All connections to Confluent Cloud are encrypted with TLS and require authentication using API keys, regardless of network configuration |
| Data Encryption | AWS KMS for encryption of data at rest and in transit | Bring your own key (BYOK) for at-rest encryption. Encryption of data in motion is available |
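The difference between the two authentication models (IAM for MSK, API keys over TLS for Confluent Cloud) shows up most concretely in client configuration. The sketch below builds the Java-client style properties for each. The property names follow the `aws-msk-iam-auth` library and the standard Kafka SASL/PLAIN settings as commonly documented, but they should be checked against the current vendor documentation. The helper names, endpoints, and credentials are hypothetical placeholders.

```python
from typing import Dict


def msk_iam_client_properties(bootstrap_servers: str) -> Dict[str, str]:
    """Client properties for IAM authentication against Amazon MSK.

    Assumes the aws-msk-iam-auth library is on the Java client's classpath.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "AWS_MSK_IAM",
        "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
        "sasl.client.callback.handler.class":
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
    }


def confluent_api_key_client_properties(
    bootstrap_servers: str, api_key: str, api_secret: str
) -> Dict[str, str]:
    """Client properties for API-key authentication over TLS (SASL/PLAIN)."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{api_key}" password="{api_secret}";'
    )
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


def to_properties_text(props: Dict[str, str]) -> str:
    """Render the settings in key=value form, one per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Placeholder endpoint; substitute the bootstrap address of a real cluster.
print(to_properties_text(msk_iam_client_properties("<msk-bootstrap-host>:9098")))
```

In the MSK case no secret appears in the configuration, since credentials are resolved from the client's IAM identity. In the Confluent case the API key and secret are part of the configuration and need to be stored securely.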

4. Pricing

| Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
| --- | --- | --- |
| Connector pricing | MSK Connect pricing breakdown | Confluent Cloud connector pricing, broken down by source |

Scenario:

Let’s assume the cluster has 5 topics with 20 partitions each. Your producers write on average 100 GB of data daily and your consumers read 200 GB of data daily. You also retain that data for 24 hours to ensure it is available for replay. In the above scenario, you would pay the following for a 31-day month:

Here’s the pricing breakdown for MSK Serverless (US East (Ohio) Region):

| Charges | Usage | Rate (USD) | Subtotal (USD) |
| --- | --- | --- | --- |
| Cluster-hours | 31 days * 24 hrs/day = 744 cluster-hours | 0.75/cluster-hr | 744 * 0.75 = 558.00 |
| Partition-hours | 31 days * 24 hrs/day * 5 * 20 = 74,400 partition-hours | 0.0015/partition-hr | 74,400 * 0.0015 = 111.60 |
| Data-in | 100 GB * 31 days = 3,100 GB | 0.10/GB-in | 3,100 * 0.10 = 310.00 |
| Data-out | 200 GB * 31 days = 6,200 GB | 0.05/GB-out | 6,200 * 0.05 = 310.00 |
| Storage | Average storage used = 100 GB-months | 0.10/GB-month | 100 * 0.10 = 10.00 |
| Total | | | 1,299.60 |

Here’s the pricing breakdown for Confluent Kafka Standard:

| Charges | Usage | Rate (USD) | Subtotal (USD) |
| --- | --- | --- | --- |
| Cluster-hours | 31 days * 24 hrs/day = 744 cluster-hours | 1.50/cluster-hr | 744 * 1.50 = 1,116.00 |
| Partition-hours | First 500 partitions at no additional cost | 0.0015/partition-hr | 0.00 |
| Data-in | 100 GB * 31 days = 3,100 GB | 0.13/GB-in | 3,100 * 0.13 = 403.00 |
| Data-out | 200 GB * 31 days = 6,200 GB | 0.06/GB-out | 6,200 * 0.06 = 372.00 |
| Storage | Average storage used = 100 GB-months | 0.10/GB-month | 100 * 0.10 = 10.00 |
| Total | | | 1,901.00 |
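The arithmetic behind both breakdowns can be reproduced with a small calculator, which also makes it easy to plug in your own volumes. This is a minimal sketch that uses only the rates quoted in the two tables above; the `RateCard` class and `monthly_cost` function are illustrative names, and actual rates vary by Region and over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Per-unit rates in USD, taken from the pricing tables above."""
    cluster_hour: float
    partition_hour: float
    free_partitions: int   # partitions included at no additional cost
    gb_in: float
    gb_out: float
    gb_month_storage: float


MSK_SERVERLESS = RateCard(0.75, 0.0015, 0, 0.10, 0.05, 0.10)
CONFLUENT_STANDARD = RateCard(1.50, 0.0015, 500, 0.13, 0.06, 0.10)


def monthly_cost(
    rates: RateCard,
    partitions: int,
    gb_in_per_day: float,
    gb_out_per_day: float,
    avg_storage_gb: float,
    days: int = 31,
) -> float:
    """Sum cluster, partition, data transfer, and storage charges for one month."""
    hours = days * 24
    billable_partitions = max(0, partitions - rates.free_partitions)
    total = (
        hours * rates.cluster_hour
        + hours * billable_partitions * rates.partition_hour
        + days * gb_in_per_day * rates.gb_in
        + days * gb_out_per_day * rates.gb_out
        + avg_storage_gb * rates.gb_month_storage
    )
    return round(total, 2)


# Scenario from the text: 5 topics x 20 partitions, 100 GB in and 200 GB out
# per day, 24-hour retention (about 100 GB stored on average).
msk = monthly_cost(MSK_SERVERLESS, 5 * 20, 100, 200, 100)
confluent = monthly_cost(CONFLUENT_STANDARD, 5 * 20, 100, 200, 100)
print(f"MSK Serverless:     {msk:.2f} USD")        # 1299.60 USD
print(f"Confluent Standard: {confluent:.2f} USD")  # 1901.00 USD
```

For this scenario, most of the gap comes from the cluster-hour rate (558 versus 1,116 USD), which outweighs the partition-hour charge that only MSK Serverless incurs at 100 partitions.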

Which should you select?

You should choose AWS MSK when:

  • You need more seamless and deeper integration with other native AWS services, such as AWS Lambda for MSK event sourcing or AWS Secrets Manager for the client credentials used in SASL/SCRAM authentication.
  • You want to lower costs, since the cluster-hour charges are considerably lower.
  • You need to provision a Kafka cluster quickly.
  • You need secure connectivity between your MSK cluster and the clients accessing it, using AWS PrivateLink, VPC Peering, or Transit Gateway.

You should choose Confluent Kafka when:

  • You have a hybrid or multi-cloud strategy; Confluent is available natively on the major public cloud providers (AWS, Azure, GCP).
  • You need to centralize Kafka management operations and get a quick overview using the Confluent Control Center.
  • You run on-premises deployments alongside cloud providers (e.g., AWS Outposts, including Wavelength, or Google’s Anthos), since Confluent is primarily built on top of Kubernetes.
  • You need a rich Kafka ecosystem, such as pre-built connectors, governance using stream lineage, and the ability to connect non-Java Kafka clients.

AWS MSK Serverless and Confluent Kafka Standard are both efficient, reliable, and among the market's best real-time data ingestion solutions. However, they cater to different uses and processes, so it is important to evaluate your use case and select the most suitable tool.

Quantiphi as an Amazon MSK Service Delivery Partner

Quantiphi is a designated service delivery partner that makes it easy for customers to migrate and build data streaming solutions on Amazon MSK. Customers can take advantage of the rich Amazon MSK integrations with other AWS services, address real-time analytics use cases, and realize the cost benefits sooner. To get started or learn more, get in touch with our experts.

Written by

Karthik Shetty & Sanchit Jain
