
AWS • August 4, 2023

AWS Glue or Informatica: Discover the Better Option for Integration and Modernized Analytics

In the current age of digital transformation, companies are looking to modernize their traditional ETL (extract, transform, and load) tools and techniques, and we are seeing a paradigm shift from on-premises to cloud technologies. Enterprises have begun modernizing the tools they use to acquire, process, and consume data in order to gain speed and agility. As more companies evaluate their options, they are realizing that migrating from Informatica PowerCenter to Informatica Intelligent Cloud Services (IICS) is costly and not fully automated. This is leading them to assess cost-effective, serverless data integration options that can modernize their data integration processes.

Customers look for the following in a data integration solution:

  • Scalable data integration solutions
  • Cost-effective solutions to optimize the IT expenses
  • Data pipelines built in open-source languages
  • Easy-to-use ETL tools that cater to multiple personas

AWS Glue and Informatica are widely regarded as leading services for ETL migration. In this blog, we compare the two to help you make an informed decision about the ETL tool that best fits your requirements.

About Informatica

Informatica offers a portfolio of products for data integration, master data management, data quality, data cataloging, and API management. Informatica started out with on-premises tools and now also offers cloud-based solutions; Informatica Intelligent Cloud Services (IICS) is its cloud-based data integration offering. When you subscribe to Informatica Cloud, you connect to it through a web browser, where you can configure connections, create users, and create, run, schedule, and monitor tasks.

About AWS Glue

AWS Glue is a serverless data integration service provided by Amazon Web Services. It is a fully managed ETL service that makes it simple and cost-effective to categorize, clean, and enrich your data and to move it reliably between various data stores and data streams. You can use AWS Glue to organize, cleanse, validate, and format data for storage in a data warehouse or data lake, and to transform and move data within the AWS Cloud into your data stores. Because AWS Glue is serverless, there is no infrastructure to set up or manage.

Let us compare Informatica and AWS Glue across five focus areas: architecture, pricing, ETL development, data quality, and data governance.

1. Architecture

| Criteria | Informatica | AWS Glue |
|---|---|---|
| Deployment options | Cloud and on-premises; recently launched a serverless option, but specifics such as cost are lacking | Cloud only; serverless |
| Ease of setup | Informatica Cloud needs a Secure Agent installed on a compute instance, whether a local desktop or a cloud machine | No infrastructure setup is required because Glue is serverless |
| Access control | Informatica Cloud lets you add tags, offering fine-grained control at the database, table, and cell levels | Fine-grained access control at the database, table, column, row, and cell levels, using resource names and AWS Lake Formation tag attributes |
| Ecosystem integration | Still working toward full integration with its own products, such as Informatica Data Catalog and Informatica MDM. Can read from and write to AWS, but needs licensed connectors for services such as Redshift, Kinesis, and Kafka | Integrates with the AWS ecosystem, which offers enterprise technology and tools for IoT, streaming, analytics, and machine learning |
| Scalability | Scalable when deployed on the cloud using Secure Agents, which are priced individually; scaling requires purchasing additional licenses. Machines usually run 24 hours a day, even when idle | Glue is serverless, so scaling is handled automatically. You specify the number of Data Processing Units (DPUs), and Glue scales up or down based on the job's capacity requirements |
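
To make the DPU-based scaling model concrete, the Python sketch below builds the keyword arguments for the boto3 Glue `create_job` call with Auto Scaling enabled. It is a minimal illustration under stated assumptions, not a recommended configuration: `build_glue_job_request` is a hypothetical helper, the job name, IAM role ARN, and script path are placeholders, and `G.1X` workers (1 DPU each) with Glue version 3.0 are example choices. With Auto Scaling turned on through the `--enable-auto-scaling` job parameter, `NumberOfWorkers` becomes the upper limit, and Glue adds or removes workers within it.

```python
from typing import Any, Dict


def build_glue_job_request(
    name: str,
    role_arn: str,
    script_location: str,
    max_workers: int,
    worker_type: str = "G.1X",  # G.1X: 1 DPU per worker (4 vCPUs, 16 GB memory)
    glue_version: str = "3.0",  # Auto Scaling requires Glue 3.0 or later
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's Glue create_job call (hypothetical helper)."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": glue_version,
        "WorkerType": worker_type,
        # With Auto Scaling enabled, this is the maximum number of workers Glue may use.
        "NumberOfWorkers": max_workers,
        "DefaultArguments": {"--enable-auto-scaling": "true"},
    }


# Placeholder values for illustration only.
request = build_glue_job_request(
    name="orders-etl",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/orders_etl.py",
    max_workers=10,
)
print(request["WorkerType"], request["NumberOfWorkers"])

# With AWS credentials configured, the job could then be created with:
#   import boto3
#   boto3.client("glue").create_job(**request)
```

Keeping the request as plain data makes it easy to review or version-control the capacity settings before anything is created in AWS.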

2. Pricing 

| Criteria | Informatica | AWS Glue |
|---|---|---|
| Costs based on AWS Marketplace | $72K per year, plus additional costs for Secure Agents, for connectors, and for running EC2 machines. An accurate estimate is difficult, but most of our customers spend upwards of $500K on Informatica, regardless of usage | $0.44 per DPU-hour; 1 DPU provides 4 vCPUs and 16 GB of memory. Built-in connectors are free, and third-party connectors cost an average of $300 per connector per month. Development tools are free of charge |
| Pricing structure | Based on agents and connectors. Additional licenses may need to be purchased if you open multiple connections to databases | DPU-based pricing, plus additional charges for connectors from the connector marketplace. Bringing your own open-source JDBC connectors costs nothing |
| Total cost of ownership | Costs exclude cloud and infrastructure costs, such as the EC2 instances required to run jobs | Costs include infrastructure; there are no additional infrastructure charges for running jobs |
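
As a rough illustration of DPU-based pricing, the sketch below estimates the cost of a single job run from the $0.44 per DPU-hour rate and the 4 vCPUs / 16 GB per DPU figures quoted in the table. The helpers `job_run_cost` and `capacity` are hypothetical, and per-second billing with a 1-minute minimum is an assumption based on how Glue 2.0 and later bill Spark jobs. Rates vary by AWS Region and change over time, so treat the output as an estimate and confirm against current AWS Glue pricing.

```python
from typing import Tuple

# Figures quoted in the pricing table; actual rates vary by AWS Region.
DPU_HOUR_RATE_USD = 0.44
VCPUS_PER_DPU = 4
MEMORY_GB_PER_DPU = 16
# Assumption: per-second billing with a 1-minute minimum (Glue 2.0 and later).
MIN_BILLED_SECONDS = 60


def job_run_cost(dpus: float, runtime_seconds: float) -> float:
    """Estimated cost in USD of one job run at the quoted DPU-hour rate."""
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    return dpus * (billed_seconds / 3600) * DPU_HOUR_RATE_USD


def capacity(dpus: int) -> Tuple[int, int]:
    """vCPUs and memory (GB) provided by the given number of DPUs."""
    return dpus * VCPUS_PER_DPU, dpus * MEMORY_GB_PER_DPU


if __name__ == "__main__":
    run = job_run_cost(dpus=10, runtime_seconds=30 * 60)  # 10 DPUs for 30 minutes
    print(f"Per run: ${run:.2f}")                 # Per run: $2.20
    print(f"30 daily runs: ${run * 30:.2f}")      # 30 daily runs: $66.00
    print(capacity(10))                           # (40, 160)
```

Because you pay only while a job runs, the estimate scales with runtime and DPUs rather than with always-on machines, which is the contrast the table draws with agent-based licensing.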

3. ETL Development

| Criteria | Informatica | AWS Glue |
|---|---|---|
| Connect to different sources | Free connectors + licensed connectors | Free connectors + licensed connectors + connector marketplace + build your own connectors |
| Build ETL pipelines | Offers an easy-to-use IDE for visually developing ETL jobs, with over x transformations | Glue Studio notebooks: notebook-style interactive development. Glue Studio: visual ETL in which code can be embedded. Glue DataBrew: no-code ETL for data analysts. In addition, you can spin up notebooks of your choice on your local machine and connect them to Glue to build your jobs |
| Custom code | Custom code can be written as scripts in Informatica Cloud | Custom code can be written in the service itself and is easy to use |
| Scheduling and orchestration | Informatica Cloud has its own scheduler, and jobs can also be triggered from a separate scheduler | Glue workflows can run jobs on a schedule, on demand, or in response to event triggers |
| Batch and stream processing | Available, but stream processing requires additional licenses | Available. AWS Glue 3.0 introduces a performance-optimized Apache Spark 3.1 runtime for batch and stream processing |
| Code lock-in | Yes, objects are stored in a proprietary format as Informatica mappings | No, code is written in open-source PySpark or Scala. Code generated by Glue Studio is easy to understand and can be modified without the visual interface |
| Complex transformations | Yes, Informatica offers a rich set of transformations for complex data processing | Yes, offers all Spark transformations plus DynamicFrames, which let customers handle semi-structured data effectively. Not all transformations are available in the visual editor yet, but they can easily be coded as custom transformations |
| Metadata-driven change data capture (for example, a modification timestamp in a table, primary keys, or file modification) | Can be accomplished using programming techniques such as Informatica variables | Glue job bookmarks track data already processed from relational databases and S3 files, so later runs pick up only new data. For real-time change capture, you can use AWS DMS, a separate AWS service |
| Log-based change data capture | Requires a separate license | Requires AWS DMS or a partner solution |
| Event-driven workflows (trigger a workflow by listening to events) | Yes, but it must be purchased as a separate license | In private preview. Glue jobs can be triggered by Amazon EventBridge notifications without any additional license, though you incur costs for using EventBridge |
| Data preparation (let non-data-engineering teams build ETL pipelines) | Enterprise Data Prep can do this with a separate license | Glue DataBrew can do this with a pay-as-you-go model. Glue Studio can automatically generate code from a visual graph. Customers can also create blueprints that automatically generate workflows from a template |
| Automated code generation | PowerCenter supports Visio-based code generation; the Cloud Integration product has no code generation capability | Jobs can be parameterized and can also be created automatically using programmable blueprints |
| Lakehouse architecture support (ability to easily synchronize data across purpose-built databases such as document and NoSQL databases) | Manual | Glue Elastic Views (in preview) creates materialized views from sources and replicates them in real time |
| Reusability | Supports mapplets and worklets | Parameterized jobs, programmable blueprints |
| Data profiling | Supported via Enterprise Data Catalog | Supported via Glue DataBrew |
| Built-in metadata catalog | No; offered as a separate product | Yes, the Glue Data Catalog is integrated with Glue |
| Self-discovery of schema for semi-structured data | No; the schema must be predefined | Yes, supported via Glue crawlers |
| Mainframe support | Yes, licensed separately | No; requires third-party partner solutions or open-source libraries (will provide the name) |
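
Metadata-driven change data capture based on a modification timestamp usually relies on a high-water mark: each run remembers the largest timestamp it has seen and extracts only rows modified after it. The Python sketch below shows that generic technique over in-memory rows. It is not how Glue job bookmarks are implemented internally, and `Row` and `extract_incremental` are hypothetical names; in a real pipeline the watermark must be persisted between runs, which is the bookkeeping Glue bookmarks handle for you.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class Row:
    id: int
    updated_at: datetime  # modification timestamp maintained by the source table


def extract_incremental(
    rows: Iterable[Row], watermark: Optional[datetime]
) -> Tuple[List[Row], Optional[datetime]]:
    """Return rows modified after `watermark`, plus the new watermark to persist.

    On the first run (watermark is None), every row is returned.
    """
    changed = [r for r in rows if watermark is None or r.updated_at > watermark]
    # If nothing changed, keep the previous watermark.
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return changed, new_watermark


source = [
    Row(1, datetime(2023, 8, 1, 9, 0)),
    Row(2, datetime(2023, 8, 2, 9, 0)),
]
batch, mark = extract_incremental(source, None)       # first run: full load
source.append(Row(3, datetime(2023, 8, 3, 9, 0)))     # a new row arrives
batch2, mark2 = extract_incremental(source, mark)     # second run: only the change
print([r.id for r in batch], [r.id for r in batch2])  # [1, 2] [3]
```

The strict `>` comparison means a row committed later with the same timestamp as the current watermark would be skipped; production pipelines often re-read a small overlap window and deduplicate to guard against this.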

4. Data Quality

| Criteria | Informatica | AWS Glue |
|---|---|---|
| Data profiling | Yes, licensed separately via Informatica Data Quality | Yes, via Glue DataBrew with a pay-as-you-go model |
| Define data quality rules | Yes, licensed separately via Informatica Data Quality | Supported via Deequ, an open-source data testing and data quality library |
| Monitor results | Yes, licensed separately via Informatica Data Quality | Supported via Deequ, an open-source data testing and data quality library |
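
To show what defining and monitoring data quality rules involves, here is a small, dependency-free Python sketch with completeness, uniqueness, and non-negativity checks. It only illustrates the idea: Deequ's actual API (written in Scala, with the PyDeequ wrapper for Python, running on Spark) is different, and every function name below is hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[List[Record]], bool]


def is_complete(column: str) -> Rule:
    """No record has a missing (None) value in `column`."""
    return lambda rows: all(r.get(column) is not None for r in rows)


def is_unique(column: str) -> Rule:
    """Every non-missing value in `column` appears only once."""
    def check(rows: List[Record]) -> bool:
        values = [r[column] for r in rows if r.get(column) is not None]
        return len(values) == len(set(values))
    return check


def is_non_negative(column: str) -> Rule:
    """Every non-missing value in `column` is >= 0."""
    return lambda rows: all(r[column] >= 0 for r in rows if r.get(column) is not None)


def run_checks(rows: List[Record], rules: Dict[str, Rule]) -> Dict[str, bool]:
    """Evaluate each named rule; storing these results per run lets you monitor quality over time."""
    return {name: rule(rows) for name, rule in rules.items()}


orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 2, "amount": None},
]
results = run_checks(orders, {
    "order_id is complete": is_complete("order_id"),
    "order_id is unique": is_unique("order_id"),
    "amount is non-negative": is_non_negative("amount"),
})
print(results)
```

Here the duplicate `order_id` and the negative `amount` cause two of the three checks to fail, which is the kind of signal a data quality monitor would surface after each load.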

5. Data Governance

| Criteria | Informatica | AWS Glue |
|---|---|---|
| Data lineage | Yes, licensed separately | Partial lineage is available in DataBrew, but it is not complete |
| Compliance certifications | HIPAA, SOC 2, SOC 3, Privacy Shield | Third-party auditors assess the security and compliance of AWS Glue as part of multiple AWS compliance programs, including SOC, PCI, FedRAMP, HIPAA, and others |

What should you select?

You should choose AWS Glue when - 

  • You have a cloud-first strategy 
  • Scalability is a priority
  • You need to handle semi-structured data along with relational data 
  • You want to lower costs
  • You want an open-source option that addresses scalability challenges 

You should choose Informatica when -

  • You have a hybrid cloud strategy
  • You need to handle mostly relational data and flat files
  • You need to keep a strong on-premises presence
  • Cost is not an issue for you while scaling up

AWS Glue and Informatica are both efficient, reliable, and among the best data integration solutions on the market. However, they cater to different use cases and processes, so it is important to evaluate your use case and select the tool that suits it.

To learn more about Seamless Data Pre-processing and ETL Migration with AWS Glue, check out our discussion with Marcelo Silva, Startup Analytics GTM Spec - Business Development, AWS.

Written by

Sanchit Jain and Anand Jha
