In the modern data platform world, companies rely heavily on data-driven decisions. However, the question remains: are we using the data at the right time? The answer is not always ‘yes.’ Companies in every industry are shifting from batch processing to real-time data streams to keep up with modern business requirements. Use cases such as fraud detection, contextual marketing triggers, and dynamic pricing depend on a real-time data feed. With current and comprehensive data, organizations gain the visibility to make informed operational decisions faster.
When deciding on real-time data ingestion solutions, customers look for the following factors:
- Scalability
- Ordering
- Consistency and durability
- Fault tolerance and data guarantees
- Cost effectiveness
AWS MSK Serverless and Confluent Kafka Standard are considered to be among the leading managed services for real-time data ingestion. This blog compares the two to help you make an informed decision on the right fit for your requirements.
About AWS MSK Serverless
Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. It automatically provisions and scales capacity while managing the partitions in your topic, so you can stream data without thinking about right-sizing or scaling clusters. It offers a throughput-based pricing model, so you pay only for what you use.
About Confluent Kafka Standard
Confluent Kafka Standard provides a cloud-native experience, completing Kafka with a holistic set of enterprise-grade features to enhance developer productivity, operate efficiently at scale, and meet your architectural requirements before moving to production. Underpinning the platform are an uptime SLA of up to 99.99% and committer-driven expertise, with support and services from a team with over one million hours of technical experience with Kafka.
Let us now compare AWS MSK Serverless and Confluent Kafka Standard across different focus areas:
1. Architecture
Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
---|---|---|
Ease of Setup | Run Apache Kafka without having to manage and scale cluster capacity as a serverless offering | No setup is involved as Confluent Kafka is serverless |
Access Controls | Amazon MSK uses IAM to check whether the client is an authenticated identity and is authorized to interact with your cluster | Confluent Cloud role-based access control (RBAC) lets you control access to an organization, environment, cluster, or granular Kafka resources based on predefined roles and access permissions |
Ecosystem integration | Integrates with AWS services that process streaming data, such as VPC, Lambda, Glue Schema Registry, and Kinesis Data Analytics | Offers pre-built Kafka connectors, Confluent Hub, Schema Registry, and MQTT Proxy |
Scalability | In serverless clusters, Amazon MSK automatically balances partitions | Automatic resource allocation to your cluster to manage consumer lag as throughput scales up or down with self-balancing clusters |
Availability | Amazon MSK uses multi-AZ replication for high availability. Data replication is included at no additional cost | Standard clusters are designed for production-ready features with an uptime SLA: 99.95% for Single-Zone and 99.99% for Multi-Zone |
2. Platform Support
Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
---|---|---|
Offerings | AWS MSK and MSK Serverless are the two offerings provided by AWS for managed Kafka | Self-managed and fully managed are the two offerings provided by Confluent for Kafka |
Connectors | Create custom plugins using MSK Connect to move data between source and destination systems | 120+ pre-built connectors for real-time integration between source and destination systems |
Kafka Upgrades | MSK Serverless automatically upgrades clusters without requiring any input from customers | Upgrades in Confluent Kafka are broadly classified into three types: minor, major, and deprecation upgrades |
Quota | Max ingress throughput: 200 MBps; Max egress throughput: 400 MBps; Ingress per partition: 5 MBps; Egress per partition: 10 MBps; Max partition size: 250 GB | Max ingress throughput: 250 MBps; Max egress throughput: 750 MBps; Ingress per partition: 5 MBps; Egress per partition: 15 MBps; Max partition size: 250 GB |
Monitoring | Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your Amazon MSK cluster | Confluent Control Center is a web-based tool for managing and monitoring Apache Kafka®. Its user interface gives you a quick overview of cluster health and lets you observe and control messages, topics, and Schema Registry, and develop and run ksqlDB queries |
Workloads | Workloads that are AWS-native and need deeper integration with other AWS services are a good fit | Hybrid workloads, i.e., on-prem combined with cloud or multi-cloud, are well suited to the Confluent Kafka platform |
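The per-partition quotas in the table above put a floor on how many partitions a workload needs for a given throughput. The following is a minimal Python sketch of that sizing arithmetic, using only the quota figures from the table and assuming traffic is spread evenly across partitions. The `QUOTAS` dictionary, the `min_partitions` function, and the 100/300 MBps target workload are illustrative assumptions, not part of either service's API.

```python
import math

# Quota figures copied from the table above, in MBps; the layout is illustrative.
QUOTAS = {
    "msk_serverless": {"max_in": 200, "max_out": 400, "part_in": 5, "part_out": 10},
    "confluent_standard": {"max_in": 250, "max_out": 750, "part_in": 5, "part_out": 15},
}

def min_partitions(service, ingress_mbps, egress_mbps):
    """Return the smallest partition count that satisfies the per-partition
    ingress and egress quotas, or None if the cluster-level quota is exceeded."""
    q = QUOTAS[service]
    if ingress_mbps > q["max_in"] or egress_mbps > q["max_out"]:
        return None
    by_ingress = math.ceil(ingress_mbps / q["part_in"])
    by_egress = math.ceil(egress_mbps / q["part_out"])
    return max(by_ingress, by_egress, 1)

# Hypothetical target workload: 100 MBps in, 300 MBps out.
for name in QUOTAS:
    print(name, min_partitions(name, 100, 300))
```

A `None` result signals that the workload does not fit within the cluster-level limits of that offering, regardless of partition count.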
3. Data Governance and Security
Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
---|---|---|
Compliance Certificates | HIPAA eligible, PCI, ISO, SOC 1/2/3, FedRAMP | SOC 1/2/3, ISO 27001, PCI, and GDPR |
Network Security | MSK provides strong network security using AWS VPC Peering, AWS Transit Gateway, and AWS PrivateLink to secure traffic across accounts and regions | All Confluent Standard clusters are accessible through secure internet endpoints. All connections to Confluent Cloud are encrypted with TLS and require authentication using API keys, regardless of network configuration |
Data Encryption | AWS KMS for encryption of data at rest and in transit | Bring your own key (BYOK) for at-rest encryption. Encryption of data in motion is available |
4. Pricing
Criteria Group | AWS MSK Serverless | Confluent Kafka Standard |
---|---|---|
Connector pricing | MSK Connect pricing breakdown | Connector pricing in Confluent Cloud, broken down per source |
Scenario:
Let’s assume the cluster has 5 topics with 20 partitions each. Your producers write 100 GB of data per day on average, and your consumers read 200 GB per day. You also retain that data for 24 hours so that it is available for replay. In this scenario, you would pay the following for a 31-day month:
Here’s the pricing breakdown for MSK Serverless (Ohio Region):
Charges | Usage | Rate(USD) | SubTotals |
---|---|---|---|
Cluster/hours | 31 days * 24 hrs/day = 744 cluster-hours | 0.75/cluster-hr | 744 * 0.75 = 558 |
Partition/hours | 31 days * 24 hrs/day * 5 * 20 = 74,400 partition-hours | 0.0015/partition-hr | 74,400 * 0.0015 = 111.60 |
Data-in | 100 GB x 31 days = 3,100 GB | 0.10/GB-in | 3,100 * 0.10 = 310 |
Data-out | 200 GB x 31 days = 6,200 GB | 0.05/GB-out | 6,200 * 0.05 = 310 |
Storage | Average storage used = 100 GB-months | 0.10/GB-month | 100 * 0.10 = 10 |
Total | 1299.60 USD |
Here’s the pricing breakdown for Confluent Kafka Standard:
Charges | Usage | Rate(USD) | SubTotals |
---|---|---|---|
Cluster/hours | 31 days * 24 hrs/day = 744 cluster-hours | 1.5/cluster-hr | 744 * 1.5 = 1116 |
Partition/hours | First 500 partitions at no additional cost | 0.0015/partition-hr | 0 |
Data-in | 100 GB x 31 days = 3,100 GB | 0.13/GB-in | 3,100 * 0.13 = 403 |
Data-out | 200 GB x 31 days = 6,200 GB | 0.06/GB-out | 6,200 * 0.06 = 372 |
Storage | Average storage used = 100 GB-months | 0.10/GB-month | 100 * 0.10 = 10 |
Total | 1901 USD |
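The line items in both tables follow the same arithmetic, so they can be recomputed whenever rates or volumes change. Below is a minimal Python sketch under the scenario's assumptions (5 topics with 20 partitions each, 100 GB in and 200 GB out per day, 24-hour retention, 31-day month). The rates are copied from the tables above; the `monthly_cost` function, its parameter names, and the treatment of storage as one day of retained ingress are assumptions made for illustration, not either vendor's billing logic.

```python
def monthly_cost(cluster_rate, partition_rate, in_rate, out_rate, storage_rate,
                 partitions=100, free_partitions=0,
                 gb_in_per_day=100, gb_out_per_day=200,
                 retention_days=1, days=31):
    """Return a dict of monthly line items in USD, plus their rounded total."""
    hours = days * 24
    # Only partitions beyond the free allowance are billed.
    billable_partitions = max(partitions - free_partitions, 0)
    items = {
        "cluster_hours": hours * cluster_rate,
        "partition_hours": hours * billable_partitions * partition_rate,
        "data_in": gb_in_per_day * days * in_rate,
        "data_out": gb_out_per_day * days * out_rate,
        # Assumption: average stored volume equals ingress retained for retention_days.
        "storage": gb_in_per_day * retention_days * storage_rate,
    }
    items["total"] = round(sum(items.values()), 2)
    return items

# Rates taken from the two pricing tables above.
msk = monthly_cost(0.75, 0.0015, 0.10, 0.05, 0.10)
confluent = monthly_cost(1.5, 0.0015, 0.13, 0.06, 0.10, free_partitions=500)

for name, bill in (("MSK Serverless", msk), ("Confluent Standard", confluent)):
    print(name, bill)
```

Changing `partitions`, `retention_days`, or the daily volumes shows how sensitive each bill is to the workload shape, for example how the comparison shifts once a cluster exceeds the 500 partitions included with Confluent Standard.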
What do you select?
You should choose AWS MSK when:
- You need seamless, deep integration with other native AWS services, such as AWS Lambda for MSK event sourcing or AWS Secrets Manager for the client credentials used in SASL/SCRAM authentication.
- You want to lower costs, since the cluster-hour rate is lower.
- You need to provision a Kafka cluster quickly.
- You need secure connectivity between your MSK cluster and the clients accessing it, using AWS PrivateLink, VPC Peering, or Transit Gateway.
You should choose Confluent Kafka when:
- You have a hybrid or multi-cloud strategy, since Confluent is available natively on the public cloud providers (AWS, Azure, GCP).
- You need to centralize Kafka management operations and get a quick overview using Confluent Control Center.
- You have an on-premises deployment with cloud providers (e.g., AWS Outposts including Wavelength, Google’s Anthos), since Confluent is primarily built on top of Kubernetes.
- You need a rich Kafka ecosystem, such as pre-built connectors, governance using stream lineage, and the ability to connect non-Java Kafka clients.
AWS MSK Serverless and Confluent Kafka Standard are both efficient, reliable, and among the market's best real-time data ingestion solutions. However, they cater to different uses and processes. Hence, evaluating the use case and selecting the most suitable tool is important.
Quantiphi as Amazon MSK Service Delivery Partner
Quantiphi is a designated service delivery partner that makes it easy for customers to migrate to and build data streaming solutions on Amazon MSK. Customers can take advantage of the rich Amazon MSK integrations with other AWS services, address real-time analytics use cases, and realize the cost benefits sooner. To get started or learn more, get in touch with our experts.