IBM DataStage as a Service
IBM Software
Reviews from AWS customers
0 AWS reviews
5 star: 0
4 star: 0
3 star: 0
2 star: 0
1 star: 0
External reviews
72 reviews
External reviews are not included in the AWS star rating for the product.
Blazingly Fast, Full-Featured ETL tool with Flexible Data Connections
What do you like best about the product?
DataStage is a full-featured and blazingly fast ETL tool. It handles many different types of data connection, and gives excellent options for parameterising processes to facilitate code promotion.
What do you dislike about the product?
The UI feels dated, and some "Stage" types (most notably "Hierarchical Stages") can be difficult to understand. There isn't a lot of online assistance from typical forums (fora?), and much of IBM's help is difficult to access as it's hidden behind their login requirements.
What problems is the product solving and how is that benefiting you?
DataStage helps us process huge volumes of data into our Data Warehouse (on a Netezza appliance) on a regular basis. We also use it for many of our system-to-system integrations. It handles many use cases that SSIS had previously struggled with, though this is partly due to being paired with further tooling that wasn't available to us when using SSIS.
Unmatched Performance and Reliability for Enterprise Data Workloads
What do you like best about the product?
The most impressive aspect of DataStage is its high-performance parallel processing engine, which allows it to handle massive enterprise data volumes with ease. By utilizing "pipelining" and "partitioning," the system can process different stages of a job simultaneously across multiple CPU nodes. This means that instead of waiting for one task to finish before the next begins, data flows through the pipeline like an assembly line, ensuring that even petabyte-scale workloads are completed within tight processing windows.
Furthermore, its visual design environment offers a sophisticated balance between simplicity and power. The drag-and-drop interface allows engineers to build complex ETL logic using pre-built "Stages" for joins, lookups, and transformations without needing to write manual code. However, it remains highly extensible for developers; if a specific requirement isn't met by a standard component, you can integrate custom Python scripts or SQL, making it flexible enough for both standard reporting and complex data science pipelines.
Finally, DataStage excels in enterprise-grade reliability and governance, which is why it remains a staple in highly regulated industries like finance and healthcare. It integrates seamlessly with metadata catalogs to provide end-to-end data lineage, allowing users to track exactly how data has changed from source to target. Combined with robust error-handling and "Reject Links" that capture bad data without crashing the entire job, it provides a level of stability and auditability that many lightweight or open-source tools struggle to match.
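The pipelining and "Reject Link" ideas described above can be illustrated with a minimal Python sketch. This is not DataStage code or its API: the stage functions (`read_rows`, `parse_stage`, `total_by_key`) and the sample rows are hypothetical, and Python generators only model rows flowing from stage to stage without waiting for the previous stage to finish; they do not run on multiple CPU nodes.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_rows(raw: Iterable[str]) -> Iterator[str]:
    """Source stage (hypothetical): hands rows downstream one at a time."""
    for line in raw:
        yield line


def parse_stage(rows: Iterable[str], rejects: List[str]) -> Iterator[Tuple[str, float]]:
    """Transform stage: parse 'key,amount' rows.

    Rows that fail to parse are appended to `rejects` (the stand-in for a
    reject link) instead of raising and stopping the whole job.
    """
    for row in rows:
        try:
            key, amount = row.split(",")
            yield key.strip(), float(amount)
        except ValueError:
            rejects.append(row)


def total_by_key(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Target stage: aggregate amounts per key as records arrive."""
    totals: Dict[str, float] = {}
    for key, amount in records:
        totals[key] = totals.get(key, 0.0) + amount
    return totals


raw = ["a,10", "b,2.5", "oops", "a,5", "c,not-a-number"]
rejects: List[str] = []

# Each row travels through all three stages before the next row is read.
totals = total_by_key(parse_stage(read_rows(raw), rejects))

print(totals)   # aggregated good rows
print(rejects)  # bad rows captured for later inspection
```

The two malformed rows end up in `rejects` while the remaining rows are still aggregated, which is the behaviour the review credits to reject links.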
What do you dislike about the product?
One of the most significant drawbacks of IBM DataStage is its prohibitive cost and complex licensing model, which often makes it inaccessible for small-to-medium businesses. Beyond the high initial purchase price, the "IBM Tax" includes ongoing maintenance and specialized infrastructure requirements that scale aggressively with data volume. Furthermore, because the tool is highly proprietary, organizations face heavy vendor lock-in; migrating logic out of DataStage to a modern, open-source-friendly stack like dbt or Airbyte is notoriously difficult and time-consuming.
From a technical standpoint, many engineers find the platform increasingly clunky and "legacy" compared to agile, cloud-native alternatives. While its parallel engine is powerful, it requires deep, specialized expertise to tune—settings like partition methods and buffer sizes are manual and unintuitive, leading to a steep learning curve for new hires. Additionally, while the newer "Next Gen" versions have improved, the ecosystem is still criticized for being batch-heavy, making it less agile for teams that require modern real-time streaming or "DataOps" automation.
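To show why the choice of partition method matters, here is a small Python sketch contrasting hash and round-robin partitioning. It is an illustration under assumptions, not DataStage configuration: the helper names and the sample rows are hypothetical, and `zlib.crc32` is used only as a deterministic stand-in for whatever hash function an engine applies.

```python
from collections import Counter
from typing import Dict, List, Tuple
from zlib import crc32

Row = Tuple[str, int]


def hash_partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Send each row to the partition chosen by a hash of its key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for key, value in rows:
        parts[crc32(key.encode()) % n].append((key, value))
    return parts


def round_robin_partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Deal rows out to partitions in turn, ignoring the key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def partitions_per_key(parts: List[List[Row]]) -> Dict[str, int]:
    """Count how many partitions each key appears in."""
    seen: Counter = Counter()
    for part in parts:
        for key in {k for k, _ in part}:
            seen[key] += 1
    return dict(seen)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]

hashed = hash_partition(rows, 3)
rr = round_robin_partition(rows, 3)

print(partitions_per_key(hashed))  # every key lives in exactly one partition
print(partitions_per_key(rr))      # keys may be spread over several partitions
print([len(p) for p in rr])        # round robin keeps partition sizes even
```

Hash partitioning keeps all rows for a key together, which keyed operations such as joins and aggregations need, but it can produce uneven partitions when keys are skewed. Round robin balances the load but scatters keys. Trade-offs like this are part of what makes tuning a manual exercise.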
What problems is the product solving and how is that benefiting you?
IBM DataStage primarily solves the challenge of data fragmentation and processing bottlenecks in massive enterprise environments. Large organizations often have data trapped in "silos" across legacy mainframes, modern cloud databases, and various third-party applications; DataStage provides a unified, high-performance bridge to extract and harmonize this information. Its parallel processing engine solves the "time problem" by breaking down petabyte-scale datasets into smaller chunks and processing them simultaneously, ensuring that critical business reports and data warehouses are updated within strict overnight windows rather than taking days to complete.
The primary benefit to you and your organization is data trust and operational efficiency. Because the platform includes built-in data quality and governance tools, it automatically cleanses and validates records as they move through the pipeline, reducing the risk of making business decisions based on "dirty" or inaccurate data. Furthermore, its "design once, run anywhere" architecture allows your team to build a data flow once and deploy it across on-premises servers or multiple cloud providers without rewriting code. This saves significant development time and future-proofs your infrastructure, allowing you to focus on gaining insights rather than troubleshooting manual data transfers.
Exceptional Performance and Connectivity with Intuitive Interface
What do you like best about the product?
Wide Connectivity, High Performance and Scalability, Intuitive Graphical Interface
What do you dislike about the product?
High Learning Curve, Infrastructure Dependency
What problems is the product solving and how is that benefiting you?
Complex data integration, Data transformation and cleaning
Data Integration and Quality with DataStage
What do you like best about the product?
Best data integration tool on the market with a wide range of connectors and advanced data integration and quality features.
What do you dislike about the product?
I quite like the platform as a whole, but I believe it can improve regarding data lineage (it should indeed improve now with the arrival of Manta to the IBM portfolio).
What problems is the product solving and how is that benefiting you?
Help our clients work with integrated, qualified, and reliable data.
IBM Datastage for ETL
What do you like best about the product?
IBM InfoSphere DataStage is a simple yet efficient tool for ETL processing.
It has a variety of stages to implement your designs and test them at runtime.
It has additional features compared to other ETL tools, which help with debugging and error handling.
What do you dislike about the product?
DataStage's UI takes a bit of a back seat compared to other ETL tools.
Stages could be categorised based on functionalities.
What problems is the product solving and how is that benefiting you?
It solves data integration problems across a variety of platforms and provides appropriate data formats to the end user,
such as JSON, files, text files, databases, and big data.
Good product
What do you like best about the product?
Its speed. It is very fast and responsive. Support is good.
What do you dislike about the product?
A little hard to use and implement. Has a few bugs.
What problems is the product solving and how is that benefiting you?
fast data integration and processing
Analyzing vendor data
What do you like best about the product?
We use it for two reasons: it costs less, and it is user-friendly.
What do you dislike about the product?
Customer support is excellent; however, the number of features could be improved.
We did not face any problems during its implementation and integration.
We do not use it frequently, as we do not rely on it alone, but we might in the future.
What problems is the product solving and how is that benefiting you?
I cannot disclose it because of the company's policy, but in brief we are using it to analyse multiple vendor data.
Data Stage review
What do you like best about the product?
- excellent performance in executing ETL processes for large amounts of data.
What do you dislike about the product?
- Lack of documentation and available knowledge for study and learning.
- Lack of support from the supplier (various problems with the product and also lack of support for functionalities like the quality stage).
- Interface is not at all intuitive and difficult to use.
What problems is the product solving and how is that benefiting you?
execution of ETL processes and data quality.
IBM InfoSphere DataStage
What do you like best about the product?
Ease of use, ease of implementation, compact product. Very good customer support team.
Great performance for large data volumes, allows parallelism
What do you dislike about the product?
It does not support code versioning without Git integration.
What problems is the product solving and how is that benefiting you?
Data Transformation, Data governance, interconnection of non-homogeneous origins, quick creation of interfaces between applications.
Creation and implementation of data quality rules
Using Datastage for ETL
What do you like best about the product?
We use InfoSphere DataStage for ETL in our organisation, as DataStage can easily handle large data volumes (TBs) and lets us transform our data easily. It's easy to design our jobs in DataStage and to run them.
What do you dislike about the product?
As a beginner, I found DataStage hard to use. There are so many functionalities that it takes time to get the hang of it. But once you start practicing, it becomes easy.
What problems is the product solving and how is that benefiting you?
Our organisation handles very large data volumes, and we need a powerful tool to extract, transform, and load them. DataStage solves this problem by handling it perfectly, and we are easily able to build our ETL jobs.