

In the current age of digital transformation, companies are looking to modernize their traditional ETL (extract, transform, and load) tools and techniques in the data marketplace. We see a paradigm shift from on-premises to cloud technologies: enterprises are modernizing how they acquire, process, and consume data in order to gain speed and agility. As more companies evaluate their options, they are realizing that migrating from PowerCenter to Informatica Intelligent Cloud Services (IICS) is cost-intensive and not completely automated. This is leading them to assess cost-effective, serverless data integration options that can modernize their data integration process.
Customers look for the following in data integration:
AWS Glue and Informatica are considered to be the best services for ETL migration. In this blog, we discuss insights that will help you make an informed decision about an ETL tool that will fit your requirements.
Informatica offers a portfolio of products for data integration, master data management, data quality, data cataloging, and API management. Informatica started out with on-premises tools and now offers cloud-based solutions. Informatica IICS is its cloud-based option for data integration. When you subscribe to Informatica Cloud, you connect to it through a web browser, where you can configure connections, create users, and create, run, schedule, and monitor tasks.
AWS Glue is a serverless data integration service provided by Amazon as part of Amazon Web Services. It is a fully managed ETL service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue can be used to efficiently organize, cleanse, validate, and format data for storage in a data warehouse or data lake. You can transform and move AWS Cloud data into your data store. Because AWS Glue is serverless, there is no infrastructure to set up or manage.
Let us compare Informatica and Glue across different dimensions or focus areas such as architecture, pricing, ETL development, and so on.
Criteria Group | Informatica | AWS Glue |
Deployment Options | Cloud and on-premises. A serverless option was recently launched, but specifics such as cost are lacking | Cloud only, serverless |
Ease of Setup | Cloud Informatica needs a secure agent to be installed in a compute instance, be it a local desktop or any cloud compute | No setup is involved as Glue is serverless |
Access Control | Cloud Informatica has features to add tags, offering fine-grained control at DB, table, and cell levels | It offers fine-grained access control. You can access your data at the database, table, column, row, and cell level using resource names and Lake Formation tag attributes |
Ecosystem integration | Currently attempting to fully integrate with internal products like Informatica Data Catalog, Informatica MDM, etc. Can read and write to AWS but needs licensed connectors, such as Redshift, Kinesis, Kafka | Integrates with AWS Ecosystem that has technology and tools for enterprises from IoT, Streaming, Analytics, Machine Learning |
Scalability | Scalable if deployed on cloud using secure agents, which are individually priced. Additional licenses must be purchased for scaling. Machines usually run 24 hours a day, even when you are not using them | Glue is serverless and scaling is handled automatically. You provide the number of Data Processing Units, and Glue automatically scales up or down based on the job’s capacity requirements |
Criteria Group | Informatica | AWS Glue |
Costs based on AWS Marketplace | $72K per year + additional costs for secure agents + additional cost for connectors + cost to run EC2 machines. It is difficult to get an accurate estimate, but most of our customers spend upwards of $500K on Informatica, regardless of usage | $0.44 per DPU per hour. 1 DPU provides 4 vCPUs and 16 GB of memory. Built-in connectors are free, and third-party connectors cost an average of 300 per connector per month. Dev tools are free of cost |
Pricing structure | Pricing based on Agents, Connectors. Additional licenses may need to be purchased if you initiate multiple connections to databases | DPU (Data Processing Unit) based pricing + additional pricing for Connectors in Connector Marketplace. No cost to bring open source JDBC connectors |
Total Cost of Ownership | Costs exclude any cloud or infrastructure costs such as EC2 instances required to run jobs | Costs include the infrastructure costs. No additional charges for running jobs |
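To make the DPU-based pricing concrete, here is a minimal sketch of the arithmetic, using only the two figures quoted in the table above ($0.44 per DPU-hour and the $72K annual Informatica base cost). The function names and the nightly 10-DPU workload are hypothetical, and the estimate deliberately ignores connectors, secure agents, EC2 costs, and any billing minimums, so treat it as a rough back-of-the-envelope aid rather than a quote.

```python
# Figures quoted in the pricing table; treat them as illustrative, not current list prices.
GLUE_RATE_PER_DPU_HOUR = 0.44        # USD per DPU-hour
INFORMATICA_BASE_PER_YEAR = 72_000   # USD per year, before agents, connectors, and EC2


def glue_job_cost(dpus: int, hours: float, rate: float = GLUE_RATE_PER_DPU_HOUR) -> float:
    """Estimated cost of one Glue job run: DPUs x hours x rate."""
    if dpus <= 0 or hours < 0:
        raise ValueError("dpus must be positive and hours non-negative")
    return dpus * hours * rate


def dpu_hours_for_budget(annual_budget: float, rate: float = GLUE_RATE_PER_DPU_HOUR) -> float:
    """Number of DPU-hours per year that a given budget buys at the DPU rate."""
    return annual_budget / rate


if __name__ == "__main__":
    # Hypothetical workload: a 10-DPU job that runs for 30 minutes every night.
    per_run = glue_job_cost(dpus=10, hours=0.5)
    print(f"One run: ${per_run:.2f}")
    print(f"One year of nightly runs: ${per_run * 365:.2f}")
    covered = dpu_hours_for_budget(INFORMATICA_BASE_PER_YEAR)
    print(f"DPU-hours bought by the base license amount: {covered:,.0f}")
```

The point of the comparison is that Glue cost tracks usage (DPUs multiplied by run time), while the license figure is fixed regardless of how many jobs run.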
Criteria Group | Informatica | AWS Glue |
Connect to different sources | Free connectors + licensed connectors | Free connectors + licensed connectors + connector marketplace + build your own connectors |
Build ETL pipelines | Offers an easy-to-use IDE for visually developing ETL jobs with over x transformations | Glue Studio Notebooks: notebook-style interactive development. Glue Studio: visual ETL in which code can be embedded. Glue DataBrew: no-code ETL for data analysts. In addition, you can use your local machine to spin up notebooks of your choice and connect to Glue to build your jobs |
Custom code | Custom code can be written as scripts in Informatica Cloud | Custom code can be written in the service itself and is easy to use |
Scheduling and Orchestrating | Informatica Cloud has its own scheduler and can also be triggered from a separate scheduler | Glue workflows can run on a schedule, on demand, or on an event trigger |
Batch and stream processing | Available. However, there are additional licensing requirements to achieve stream processing | Available. AWS Glue 3.0 introduces a performance-optimized Apache Spark 3.1 runtime for batch and stream processing |
Code Lock-in | Yes, objects are stored in a proprietary format as Informatica mappings | No, code is written in Open Source PySpark or Scala. Code generated by Glue Studio is easy to understand and modify without a visual interface |
Complex Transformations | Yes, Informatica offers a rich set of transformations to perform complex data processing | Yes, offers all transformations in Spark + Dynamic frames that allow customers to handle semi-structured data effectively. Not all transformations are available in the visual editor yet but can be easily coded using custom transformations |
Metadata-driven change data capture (example: modification timestamp in a table, primary keys, file modification) | Can be accomplished using programming techniques such as Informatica variables | Glue offers bookmarks that can capture changes from relational databases and S3 files. You can also use DMS, a separate AWS service, to capture changes in real time |
Log based change data capture | Need separate license | Need to use DMS or partner solution |
Event-Driven Workflows (trigger a workflow by listening to events) | Yes, but it has to be purchased as a separate license | In private preview. Glue jobs can be triggered by EventBridge notifications without any licensing. You will incur costs for using EventBridge |
Data Preparation (allow non-data-engineering teams to build ETL pipelines) | Enterprise Data Prep can do this with a separate license | Glue DataBrew can do this with a pay-as-you-go model. Glue Studio can automatically generate code from a visual graph. Additionally, customers can create “blueprints” to automatically generate workflows based on a template |
Automated code generation | PowerCenter supports Visio-based code generation. Cloud Integration product does not support any code generation capability | Jobs can be parameterized. Jobs can also be automatically created using Programmable Blueprints |
LakeHouse Architecture Support (Ability to easily synchronize data across purpose-built databases such as Document databases, NoSQL databases, etc) | Manual | Glue Elastic Views supports the creation of materialized views from sources and replicates them in real-time. This is in preview |
Reusability | Supports mapplets and worklets | Parameterized jobs, programmable blueprints |
Data Profiling | Supported via Enterprise Data Catalog | Supported using Glue DataBrew |
Built-In metadata catalog | No. Offered as a separate product | Yes, Glue Data Catalog is integrated with Glue |
Ability to self discover schema for semi-structured data | No, need schema pre-defined | Yes, supported via Glue Crawlers |
Mainframe support | Yes, licensed separately | No. Need to use third-party partner solutions or open source libraries |
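The metadata-driven change data capture row above relies on a simple idea: remember the latest modification timestamp already processed and, on the next run, pick up only rows changed after it. The sketch below illustrates that general high-water-mark technique in plain Python. It is not the Glue job bookmark API or an Informatica variable; `Row` and `extract_changes` are hypothetical names introduced for illustration.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical source row carrying a modification timestamp."""
    id: int
    payload: str
    modified_at: datetime


def extract_changes(
    rows: Iterable[Row], bookmark: Optional[datetime]
) -> tuple[list[Row], Optional[datetime]]:
    """Return rows modified strictly after `bookmark`, plus the new bookmark.

    A bookmark of None means "first run": every row counts as changed.
    If nothing changed, the bookmark is returned unchanged.
    """
    changed = [r for r in rows if bookmark is None or r.modified_at > bookmark]
    new_bookmark = max((r.modified_at for r in changed), default=bookmark)
    return changed, new_bookmark


if __name__ == "__main__":
    def ts(day: int) -> datetime:
        return datetime(2023, 1, day, tzinfo=timezone.utc)

    table = [Row(1, "a", ts(1)), Row(2, "b", ts(2))]

    first, bookmark = extract_changes(table, None)       # initial full load
    table.append(Row(3, "c", ts(5)))                     # a new row arrives later
    second, bookmark = extract_changes(table, bookmark)  # incremental run

    print([r.id for r in first], [r.id for r in second], bookmark)
```

A timestamp-based approach like this misses hard deletes and depends on the source reliably updating its modification column, which is one reason log-based capture is listed as a separate capability.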
Criteria Group | Informatica | AWS Glue |
Data Profiling | Yes, licensed separately via Informatica Data Quality | Yes, via Glue DataBrew, with a pay-as-you-go model |
Define Data Quality Rules | Yes, licensed separately via Informatica Data Quality | Supported via the open source testing and data quality solution Deequ |
Monitor results | Yes, licensed separately via Informatica Data Quality | Supported via the open source testing and data quality solution Deequ |
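To give a feel for what defining data quality rules and monitoring their results involves, here is a small plain-Python sketch of two common rule types, completeness and uniqueness, evaluated over a batch of records. It only illustrates the concept: it does not use the Deequ or Informatica Data Quality APIs, and `is_complete`, `is_unique`, and `run_checks` are hypothetical helpers.

```python
from typing import Callable

Record = dict[str, object]
Rule = Callable[[list[Record]], bool]


def is_complete(column: str) -> Rule:
    """Rule: every record has a non-null value in `column`."""
    def rule(rows: list[Record]) -> bool:
        return all(r.get(column) is not None for r in rows)
    return rule


def is_unique(column: str) -> Rule:
    """Rule: no two records share the same value in `column` (values must be hashable)."""
    def rule(rows: list[Record]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))
    return rule


def run_checks(rows: list[Record], rules: dict[str, Rule]) -> dict[str, bool]:
    """Evaluate each named rule and return a pass/fail report to monitor."""
    return {name: rule(rows) for name, rule in rules.items()}


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer": "acme"},
        {"order_id": 2, "customer": None},   # missing customer
        {"order_id": 2, "customer": "zen"},  # duplicate order_id
    ]
    report = run_checks(orders, {
        "order_id is complete": is_complete("order_id"),
        "order_id is unique": is_unique("order_id"),
        "customer is complete": is_complete("customer"),
    })
    for name, passed in report.items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

In a real pipeline, the same kind of report would be computed on the distributed dataset and stored per run so that quality trends can be tracked over time.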
Criteria Group | Informatica | AWS Glue |
Data Lineage | Yes, licensed separately | Partial lineage is available in DataBrew, but it is not complete |
Compliance Certificates | HIPAA, SOC 2, SOC 3, Privacy Shield | Third-party auditors assess the security and compliance of AWS Glue as part of multiple AWS compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others |
You should choose AWS Glue when:
You should choose Informatica when:
AWS Glue and Informatica are both efficient, reliable, and among the best data integration solutions on the market. However, they cater to different uses and processes, so it is important to evaluate your use case and select a suitable tool.
To learn more about Seamless Data Pre-processing and ETL Migration with AWS Glue, check out our discussion with Marcelo Silva, Startup Analytics GTM Spec – Business Development, AWS.