
    DataHub

    Sold by
    DataHub's vision is to bring clarity to your data through its next-generation multi-cloud metadata management platform. The technology is based on LinkedIn DataHub and Apache Gobblin, two successful open-source projects incubated at LinkedIn and battle-hardened in production at scale at major enterprises.

    Ratings and reviews

    4.1
    20 ratings (12 AWS reviews, 8 external reviews)
    Rating breakdown: 40%, 60%, 0%, 0%, 0%
    External reviews are from PeerSpot.

    Reviews (20)
    reviewer2866170

    Centralizes data knowledge for all teams but has faced heavy infrastructure and maintenance demands

    Reviewed on Jun 29, 2026
    Review provided by PeerSpot

    What is our primary use case?

    Since many developers were developing and creating data models, and there were no DDLs left in the company, we had to recreate all the descriptions of tables and clarify which columns meant what. We made it a place where all stakeholders in our company could log in and see which data were used for which data marts, which column values meant for which definitions, and how they were measured. We primarily used Data Hub for sharing information and increasing the data literacy of our company.

    What is most valuable?

    The data ingestion feature was valuable to me. If comments were added to the tables, ingestion would automatically gather them and load all the necessary metadata into Data Hub. Additionally, the data lineage graphs were really helpful in showing how data flowed to data marts and which columns were used for creating other columns. Those features were helpful but somewhat hard to manage.

    What needs improvement?

    I am not familiar with how people use Data Hub as a knowledge system for developing LLM models or RAG, so I am uncertain what improvements could be made in that sense. However, the way we used it, the deployment was quite heavy because it consisted of multiple components such as a graph DB, Elasticsearch, Kafka for data ingestion, and MySQL for metadata storage. This made Data Hub somewhat bulky on our server, and we had difficulty managing memory constraints and disk usage.

    We attempted to run the different components on different servers, but we could not find good manuals on how to set it up in a more production-ready manner. Guidelines or manuals on this would be helpful.

    For how long have I used the solution?

    I used Data Hub for about a year.

    What do I think about the stability of the solution?

    Other than out-of-memory issues, Data Hub was stable; we rarely had to restart services for any other reason. The overall service was stable, but it was very bulky.

    What do I think about the scalability of the solution?

    For growing data graphs or data lineage, scalability depended on our manpower. In that sense, Data Hub did not scale as well as we expected: entering and maintaining good metadata for all our tables was too much work for our data teams, so we had to be selective. As a result, we could not really benefit from its scalability.

    How are customer service and support?

    We were fortunate not to need customer service, but we also did not know that technical support for Data Hub was available. If we had been aware of that, we could have asked for support or guidance.

    How was the initial setup?

    The initial deployment of Data Hub itself was easy because we could get the Docker Compose YAML file and run the Docker command to get the service up. However, since Data Hub uses various types of components, we had trouble assigning a server to each component and connecting them over the network. We eventually overcame that problem.

    Which other solutions did I evaluate?

    We searched for other use cases and how other companies were developing their own solutions for using data, but we have not been able to use other products similar to Data Hub itself.

    What other advice do I have?

    I am not familiar with how Data Hub is priced, so I cannot answer that question. Since we were running all the components on one server, there were issues when data ingestion ran too frequently: sometimes the server ran out of memory or the disk filled up. We regularly had to handle maintenance around memory and disk storage, and we altered some ingestion strategies and schedules so that metadata from too many tables would not arrive at the same time. I would rate this product a 7.
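    The schedule change described above can be illustrated with a minimal sketch. It assumes tables are ingested in fixed-size batches with a fixed gap between batch start times; the function name, table names, batch size, and timings are hypothetical and not taken from the review or from Data Hub itself.

```python
from datetime import datetime, timedelta


def stagger_ingestion(tables, batch_size, start, gap_minutes):
    """Split tables into batches and give each batch its own start time."""
    schedule = []
    for i in range(0, len(tables), batch_size):
        batch = tables[i:i + batch_size]
        # Batch k starts k * gap_minutes after the first batch.
        run_at = start + timedelta(minutes=gap_minutes * (i // batch_size))
        schedule.append((run_at, batch))
    return schedule


# Hypothetical table names, used only for illustration.
tables = ["orders", "customers", "payments", "shipments", "returns"]
plan = stagger_ingestion(
    tables, batch_size=2, start=datetime(2026, 1, 1, 1, 0), gap_minutes=30
)
for run_at, batch in plan:
    print(run_at.strftime("%H:%M"), batch)
```

    Spreading batches out this way trades a longer overall ingestion window for a lower peak load on a single server.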

    Somashekar Venkataramaiah

    Centralized metadata has enabled us to build an enterprise catalog and streamline data discovery

    Reviewed on Jun 28, 2026
    Review provided by PeerSpot

    What is our primary use case?

    Our main use case for Data Hub is to build our enterprise data catalog within Visa using the open-source version.

    We use Data Hub to build pipelines and to construct our enterprise data catalog, so that we can see where the data comes from, how its lineage flows, and how metadata propagation occurs. Using this metadata and the descriptions of all the fields, we have built a layer on top that supports natural language querying, so people can find where and how the data originates.

    What is most valuable?

    The best features that Data Hub offers include metadata propagation and lineage propagation.

    These features have specifically helped my team and our workflows by enabling people to find the right data. We have different sets of data that include business data and application data. People who are new, including data analysts, machine learning scientists, or data scientists, can easily find the specific data they are looking for because it is all centralized in one place.

    Data Hub has positively impacted our organization by centralizing and co-locating all data through metadata, and we have made this our enterprise metadata catalog rather than having disorganized information across different teams. It has saved time for many data analysts and data scientists to find the right data.

    What needs improvement?

    I have no comments on how Data Hub can be improved at this time.

    For how long have I used the solution?

    I have been using Data Hub for the past four years.

    What other advice do I have?

    On a scale of one to ten, I rate Data Hub a nine.

    I chose nine out of ten because Data Hub is a single solution that we could adopt easily and build our platform on top of it. It provided all the features that we needed, which is why I gave that rating.

    Regarding Data Hub's AI capabilities, I have not explored its governance and security features, but I would like to explore them. I have not gone through the AI features of Data Hub concerning the accuracy and reliability of output.

    My advice to others looking into using Data Hub is that it is a fantastic tool for people who want to centralize and keep all the data discoverable in one single place. I would highly recommend using it. I give this review an overall rating of nine out of ten.

    Chakib Bekhouche

    Data mapping has improved metadata completeness and now supports faster business data discovery

    Reviewed on Jun 24, 2026
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Data Hub in these projects is the mapping of business glossary terms to real data for the first project and the calibration and enrichment of all the necessary information within a specific scope in the second project, which involves real data and the business glossary.

    Within the engineering teams of Renault, there was a lot of data without sufficient metadata, such as descriptions of tables and columns. The objective was to complete the definitions and descriptions of business data objects within the glossary and map these descriptions to the tables and columns that comprise the data sets of this engineering department to ensure a comprehensive experience when searching for data, providing adequate definitions and descriptions of the data used in this department.

    I use Data Hub with two of my clients. Renault, the car manufacturer, changed its data catalog from Zeenea to Data Hub, and my mission is to contribute to the enrichment of this data catalog by conducting workshops with data providers, data stewards, and all the stakeholders involved. The aim of this mission is to map real data to the definitions and descriptions of business data objects available in the company's glossary. My second mission was with Hitachi Rail, a company that provides rail services, where the mission involved benchmarking several data catalogs including OpenMetadata and Data Galaxy. Data Hub was chosen for its available functionalities, with the task of implementing this data catalog with a specific scope and then completing the usage of this data if everything works well.

    What is most valuable?

    I find that my main use case for Data Hub is easy to execute because the tool is user-friendly and its functionalities are simple to understand.

    The best feature that Data Hub offers in my experience is the ability to map between real data and data sets.

    The mapping feature helps my team and clients significantly because it addresses the lack of metadata information about the tables and columns used in the company's data lake, enriching the data catalog considerably through this mapping.

    Data Hub positively impacts my organization and clients by making it easier to search for data. It facilitates easier collaboration and helps save time. However, concerning data quality, it is not sufficiently equipped as it lacks components to evaluate the data quality level, which is a feature available in other data catalogs, indicating an area for improvement.

    What needs improvement?

    One aspect that could be improved is the ability to have more specific KPIs regarding the enrichment, completeness, and accuracy of the information.

    Data Hub can be improved in several ways, primarily by enhancing the data quality evaluation capabilities. Additionally, I would suggest improving the hierarchy of business glossary terms, as understanding the characteristics of each business data object can be challenging within the current structure of business glossary terms in Data Hub.

    For how long have I used the solution?

    I have been using Data Hub across these projects for just under six months.

    What do I think about the scalability of the solution?

    In my experience, Data Hub offers good scalability.

    How are customer service and support?

    The customer support for Data Hub is robust. I had full support and did not use it extensively, relying primarily on Slack for questions and the documentation, which was sufficient since I utilized the open-source version.

    What other advice do I have?

    I do not have information about Data Hub's AI capabilities. However, I can mention that the documentation of Data Hub is usable within an AI tool, specifically an LLM tool, which would simplify finding information in the documentation.

    I have conducted benchmarks with OpenMetadata and Data Galaxy, but I have never used them for a mission with my clients. Before choosing Data Hub, I evaluated all the principal tools on the market, including Castor, Data Galaxy, and OpenMetadata.

    I have no experience with pricing as I used the free license. My advice for others looking into using Data Hub is to consider the paid version for enhanced options related to data quality and the availability of KPIs regarding the completeness and accuracy of metadata, which results in a superior experience with this tool. I would rate this product an eight out of ten.

    Jueun Moon

    Cataloging data and business terms has reduced questions and speeds up KPI tracking

    Reviewed on Jun 24, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is for a catalog system because we are integrating all of the data sources to Snowflake and then we want to catalog and share business glossary terms with our company employees.

    For example, all of our data is in Snowflake, but the employees using Snowflake did not know what data was there: which tables exist, and which columns, metrics, and KPI definitions they contain. We use Data Hub to search the data in Snowflake and to identify who is using Snowflake.

    My main use case is covered.

    How has it helped my organization?

    Data Hub has positively impacted my organization because there are many data analysts in each team, and the time to Q&A has significantly decreased since we started using Data Hub. This improvement is also seen in our KPI tracking.

    I cannot provide specific time savings, but as an example, we used to have 100 user requests that required searching Snowflake tables to determine which tables should be used; now it is down to about 10 questions.

    What is most valuable?

    In my opinion, the best features Data Hub offers are the searching function and tagging function. If I add a tag for some of the tables or columns, it is very easy to find people who need that information.

    I am trying to use the tagging function for all of our data, but we are currently developing it, so we have covered almost 70% of our data.

    What needs improvement?

    We are using the free version of Data Hub with Docker Compose, so it is somewhat difficult to explore lineage. With the free version, we can only see table-level lineage and cannot search column-level lineage, which is why I would like column-level lineage to be added.

    Besides column-level lineage, more example documents showing which features matter most would be very useful for our company. There are many glossary terms and features in Data Hub, but I did not know which ones were essential for us.

    Additionally, I also have one more concern regarding using Docker Compose for Data Hub; the memory issues come up sometimes and consume a lot of memory resources, so I need a more efficient way to use Data Hub without these issues.

    For how long have I used the solution?

    I have been using Data Hub for almost one year.

    What other advice do I have?

    We are using private clouds in AWS, and we have deployed Data Hub on the AWS EC2 server with Docker Compose.

    The cloud provider we use is AWS.

    I did not purchase Data Hub through the AWS Marketplace; I am just using the EC2 server and deploying it with Docker Compose.

    My advice for others looking into using Data Hub is that if there is no catalog system or data dictionary system and if there are many KPIs or metrics within their company, then I recommend Data Hub to those kinds of teams.

    I give Data Hub an overall rating of 8.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    RohitJoshi1

    Metadata lineage tracking has improved governance and currently supports clear data observability

    Reviewed on Jun 22, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is data lineage tracking. With Data Hub, we track multiple ingestion sources and the different locations where the data resides in S3. We bring all that metadata into Data Hub to track lineage across the ingestion patterns and transformations we perform, and how data moves between tables, assets, and data pipelines. Whatever transformations we do with Spark, S3, and Snowflake are tracked via Data Hub. We have S3 buckets and Snowflake tables, and all of that lineage tracking is managed through the platform.

    My main use case is mostly covered as we used Data Hub for metadata tracking and lineage for whatever transformations that we do so that we can track each transformation down the line.

    What is most valuable?

    In my experience, the best features Data Hub offers include lineage tracking, which is mostly on the asset level, a good glossary, and good connector support.

    Regarding the glossary, we need a glossary of our products so that it is easy to track, for each product, what happened and when, how many assets are related, and so on. For asset integrations, Data Hub makes it easy to ingest the metadata of those assets from S3 via connectors. It has good connector support, although it is limited in some cases.

    Overall, Data Hub is a good tool. If we talk about lineage, metadata, and observability on some high level, including domain descriptions, PII classification, datasets, and keeping datasets in one place along with policies, it is good in that particular sense. We do have a plan based on project-to-project usage, but in some of the projects, we do use Data Hub as well.

    What needs improvement?

    I would like to add that for the connectors, there is sometimes limited support for using wildcards to get the items or assets ingested from sources like S3; it does not support very good wildcard filters. Additionally, Data Hub has a problem with column-level lineage support, especially regarding non-pro users or those without any plans. If I talk about the free features of Data Hub open source, those two I found could be improved during my use case.

    As mentioned, the main improvement needed is the limited wildcard support in the ingestion connectors, especially in the open-source version of Data Hub. Other than that, the UI is quite good.
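    One possible workaround for limited wildcard filtering is to pre-filter object keys before handing an explicit list to an ingestion job. The sketch below is an assumption, not the reviewer's method and not Data Hub's connector API: it uses only Python's standard `fnmatch` module, and the function name, keys, and patterns are hypothetical. Note that `*` in `fnmatch` also matches `/`, so a pattern like `raw/sales/*.parquet` matches nested paths.

```python
from fnmatch import fnmatchcase


def select_keys(keys, include, exclude=()):
    """Keep keys that match any include pattern and no exclude pattern."""
    selected = []
    for key in keys:
        if not any(fnmatchcase(key, pattern) for pattern in include):
            continue
        if any(fnmatchcase(key, pattern) for pattern in exclude):
            continue
        selected.append(key)
    return selected


# Hypothetical S3 object keys, used only for illustration.
keys = [
    "raw/sales/2026/part-0.parquet",
    "raw/sales/_tmp/part-0.parquet",
    "raw/hr/2026/part-0.parquet",
    "raw/sales/2026/readme.txt",
]
chosen = select_keys(keys, include=["raw/sales/*.parquet"], exclude=["*/_tmp/*"])
print(chosen)
```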

    For how long have I used the solution?

    I used Data Hub for one and a half years.

    What other advice do I have?

    My advice for others looking into using Data Hub is that it is a good tool if you want to capture all that metadata, lineage, keep track of governance, security, and observability. It just depends on how you want to use it; you can choose the open-source version or the paid version and subscription-based model. The paid versions have more features, but open-source Data Hub, which most people will try to go for, has some limitations, such as the missing column-level lineage with Spark. You need to consider those points, but overall, it is good. I would rate this product an 8 out of 10.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Gbytyqi Gbytyqi

    Data mesh has connected 2,000 colleagues and has made cross‑team collaboration transparent

    Reviewed on Jun 15, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub involves integrating our HR system or Active Directory, which automatically pulls in all 2,000 workers and groups them into their respective project squads and R&D teams. Each team gets its own team profile page in Data Hub, which helps solve the classic corporate headache of determining who to ask for specific information.

    When a team builds a data pipeline, a Kafka topic for telecom signals, or a dashboard, it is tagged explicitly with their team profile as the owner in Data Hub. This means that if a developer in Split, working in the same company, needs to find a specific network dataset, they do not waste days spamming Slack channels; they can simply look it up in Data Hub and find the team profile that owns it along with the direct contact info or Slack channel.

    Additionally, it enables us to run a data mesh model with 2,000 people, allowing one central IT team to manage everything while Data Hub facilitates splitting the company into logical domains such as electronic health, telecom networks, IoT, or smart cities.

    What is most valuable?

    The best features that Data Hub offers include the ability to centralize everything in one platform, such as creating profiles and organizing them into separate domains like engineering, health teams, supporting teams, and HR teams. This allows information to be shared across different domains.

    Utilizing the data mesh model enables the company to maximize functionality using a single solution. Data Hub supports collaboration between different teams and departments significantly, as evidenced when we created various data mesh modules and established different domains such as E-Health, telecom networks, and IoT. This allowed us to share datasets effectively, and with authenticated users, the communication and responses were much quicker.

    Among those features, I find the collaborative aspects the most valuable in my work because it has greatly improved our operations over the past year. We evaluated various licenses and methods to integrate data catalog platforms, ultimately deciding to move forward with Data Hub since it was more compatible with our company's security requirements. Compared to other tools, it received better support from the community, which is updated daily, allowing us to collaborate effectively through contact sharing.

    Data Hub has positively impacted my organization by functioning as an all-in-one solution. It uses data mesh and separates domains to manage privileged access based on user validation, allowing us to share data sets across the company, which informs everyone about internal regulations. Furthermore, it significantly aids new joiners in understanding the operations and knowing who works on specific projects, while also providing updates on changes occurring within various sectors and domains.

    The frequency and quality of updates or new features released for Data Hub have been impressive. This extensive community support was a key factor for us at Ericsson Nikola Tesla to choose Data Hub as our data catalog.

    What needs improvement?

    Regarding how Data Hub can be improved, I believe they should focus on enhancing their marketing efforts. Within our company, we were unaware of the Data Hub platform while searching for data catalog options that offered strong security and collaboration. Better marketing would help other companies learn about this effective solution.

    My rating of eight rather than a nine or ten pertains to the connections with different systems. Specifically, the integration with Slack and Azure, as well as how we link our HR system to Data Hub, could be improved for better compatibility.

    Integrating Data Hub with our existing tools and systems was not very easy, which is why my rating is an eight. We attempted to incorporate our HR system with Data Hub, aiming to set governance status for the 2,000 employees in our organization, but I did not complete this aspect before leaving the organization.

    For how long have I used the solution?

    I have been using Data Hub for at least six months at the company called Ericsson Nikola Tesla in Zagreb, which has a massive operation with an entire ICT and R&D division of around 2,000 workers.

    What do I think about the scalability of the solution?

    In terms of scalability, I believe Data Hub performs exceptionally well as more teams come on board, making it efficient for large organizations with approximately 2,000 employees. It adequately supports the scalability of data sets and the implementation of data mesh models.

    How was the initial setup?

    During implementation, the documentation and support resources from Data Hub were very helpful. I followed the guidelines, accessed each section, and understood the platform effectively, which made the initial setup easy.

    What other advice do I have?

    Data Hub is flexible, optimistic, and user-friendly in terms of its interface and experience. I rate Data Hub an eight on a scale of one to ten.

    The learning curve for new users adopting Data Hub is addressed through their learning section that guides users on how to navigate the platform. I found it quite simple and effective to follow.

    We purchased Data Hub through the AWS Marketplace.

    As for specific outcomes or metrics, I currently do not possess numbers since we are still in the early stages of implementing Data Hub within our company. However, the HR department reported significant time savings in completing tasks before and after adopting Data Hub, which has resulted in faster completion and better collaboration without interrupting others.

    Data Hub has worked for me personally: after we began integrating Data Hub into our company network at Ericsson Nikola Tesla, it proved to be incredibly helpful for easier access to information. By positioning team profiles at the center of Data Hub, it prevents the duplication of data sets, accelerates onboarding for new engineers, and fosters more connected and collaborative teams within our large employee base. Personally, it has helped me specify tasks and has contributed to the company's progress with the data catalog we chose.

    My advice for others considering using Data Hub is to understand how it works and explore its integration potential within their organization. Engaging with community support can also be beneficial, as the team's collaborative approach is impressive.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Alireza Khorami

    Centralizes data lineage and ownership and has improved our organization-wide data governance

    Reviewed on Jun 15, 2026
    Review provided by PeerSpot

    What is our primary use case?

    I use Data Hub for our data lineage, data management, data heritage, and our data dictionary. Our organization is quite large, with about 2,000 people working on different initiatives, and everyone wants to connect to a database somehow. As the data engineering team, we are responsible for connecting every single data source that we have, defining each one, and providing an accurate single source of truth for the data so everyone can have the same understanding of the data they are discussing. Since we are ingesting every database that the company has into our own data infrastructure through different tools, we needed to have a clear understanding of data quality, data lineage, and the data discovery part of the process.

    What is most valuable?

    The Helm chart of Data Hub is designed really well, which makes our deployment strategies lean and operational. The UI is excellent, and I really appreciated the ability to treat data gathering and data ingestion as GitHub workflows. Data Hub is one of the services that was truly scalable, at least in the open-source version, which is one of the things we valued since you could scale every part of the system it used, including its internal MySQL or metadata database, Elasticsearch, and the search capabilities. Everything about Data Hub was quite scalable due to its excellent Helm chart, as they really focused on the Kubernetes aspect.

    What needs improvement?

    We encountered some issues when we wanted to connect our streaming infrastructure to Data Hub, which was somewhat problematic.

    In our data streaming infrastructure, we had a database CDC'd through Kafka Connect to a Kafka topic, and at the end of the pipeline, it would go to either an OLAP or a data lakehouse. However, the problem with visualizing this data lineage was that while the connection between MySQL and Kafka worked, when we wanted to track data from Kafka to other services, we couldn't track everything back because the IDs were generated randomly and couldn't be connected. We had to fix this manually by stating where the data had gone, which was tedious.
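    The manual fix described above amounts to declaring, by hand, which upstream dataset feeds each downstream one. The sketch below models only that mapping in plain Python. The URN string follows the dataset URN form used by DataHub; the platform and dataset names are hypothetical, and sending these edges to Data Hub would require its SDK or ingestion tooling, which is not shown here.

```python
def dataset_urn(platform, name, env="PROD"):
    """Build a dataset URN string in the form DataHub uses for datasets."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical pipeline: MySQL table -> Kafka topic (CDC) -> OLAP table.
# Each key is a downstream dataset; the value lists its declared upstreams.
manual_edges = {
    dataset_urn("kafka", "cdc.shop.orders"): [dataset_urn("mysql", "shop.orders")],
    dataset_urn("clickhouse", "analytics.orders"): [
        dataset_urn("kafka", "cdc.shop.orders")
    ],
}


def upstream_chain(urn, edges):
    """Return every dataset reachable upstream of urn, starting with urn."""
    chain, stack, seen = [], [urn], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        chain.append(current)
        stack.extend(edges.get(current, []))
    return chain


chain = upstream_chain(dataset_urn("clickhouse", "analytics.orders"), manual_edges)
for urn in chain:
    print(urn)
```

    Keeping the mapping keyed by stable dataset names, rather than by generated IDs, is what lets the chain be followed back to the source.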

    Data Hub's GMS, or Generalized Metadata Service, is a good service that I used regularly, but the CLI changed considerably between versions. When I installed a different version, there wasn't enough consistency to ensure that commands I used would still work in future versions of the CLI, which was frustrating. I also recall that setting up Kafka without ZooKeeper was not possible, which was inconvenient, though I should verify this as I don't remember if they fixed it. At least from my recollection, when I wanted to set it up one and a half years ago, they did not have direct support for KRaft in their Helm chart.

    For how long have I used the solution?

    I have been using Data Hub for approximately one and a half years.

    Which solution did I use previously and why did I switch?

    Before adopting Data Hub, we considered moving forward with OpenMetadata but decided against it since it couldn't support MySQL version 5.

    How was the initial setup?

    The setup of Data Hub was quite straightforward. One aspect of the architecture I appreciated is that Data Hub relies heavily on cron jobs and jobs in Kubernetes. Whenever it needs to fix something, it initiates a job to repair its MySQL or its Elasticsearch. Operationally, I find it to be an excellent service, as they worked well on that aspect in the open-source version. However, the lack of out-of-the-box support for KRaft was somewhat problematic.

    Which other solutions did I evaluate?

    I evaluated OpenMetadata before choosing Data Hub. OpenMetadata's lack of support for more databases and data sources was the deciding factor, whereas with Data Hub we didn't encounter any such problems; it worked really well.

    What other advice do I have?

    Data Hub helped us by making it clear who owned which data and who needed to make changes to clean up the deprecated data models and infrastructure we had, which was the most significant benefit. Using Data Hub's tooling made the faults and bugs in our different data sources visible to us.

    I would recommend that organizations considering Data Hub adopt GitOps practices, as we implemented it where every single ingestion or transformation was triggered by GitLab CI/CD, making it straightforward for everyone to use. That was the most innovative approach we took by running every single ingestion job as a Cron job in Kubernetes through our GitOps.

    I would rate this product a nine out of ten.

    Henrique dos Anjos

    Data catalog has unified business terms and democratized access to our data lake

    Reviewed on Jun 12, 2026
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Data Hub is to implement a data catalog for one of the clients that the consultancy I work at is serving.

    A specific example of how the data catalog was used for that client is that it was used to define business terms and to explore the terms from the data glossary by adding definitions. It was also used to capture all the tables and fields that were connected to a data lake, allowing me to explore the entire production data lake and tag the tables and fields, segmenting these tables by domains such as sales tables and marketing tables.

    What is most valuable?

    Data Hub's best features include the tagging capability, domain segmentation, data exploration, and creation of a data glossary, which was very interesting to me. Additionally, the ease of plugging in new data sources is exceptional. Data Hub can be easily integrated with a data lake, and the environment can be explored through the metadata via Data Hub. I found the connection part straightforward.

    Data Hub had a positive impact on my organization by disclosing to the organization and to business users what existed in the data lake. The interface that the technical team has with the tables and fields is designed for professionals in the technical area. Having a data catalog helps provide a better interface for data discovery and data democratization within the organization since everyone should have access to what types of data the organization has, and that was the biggest impact.

    What needs improvement?

    I started using the quality part for consistency, but I had limited contact with it and we did not progress much.

    I believe the data quality module can always be improved by examining what is available in the market and making appropriate improvements to the tool. The data quality part is very important and it is not always fully leveraged as it should be. I also think that providing consulting or support with professionals who are qualified to use Data Hub would be interesting, along with providing training and certifications for the tool so that those who are implementing it can specialize increasingly in its features.

    For how long have I used the solution?

    I have been using Data Hub for around one year.

    What do I think about the stability of the solution?

    Data Hub is stable, and I did not have any stability problems when I was working with the tool.

    What do I think about the scalability of the solution?

    Scaling Data Hub is very easy; we were able to add users and new datasets very quickly and smoothly.

    Which solution did I use previously and why did I switch?

    I was not previously using a different solution. The implementation was already directly part of a data governance initiative and it was done directly with Data Hub, meaning there was no previous solution.

    What about the implementation team?

    I believe the consultancy has some kind of commercial relationship with Data Hub to promote and offer Data Hub as a data catalog solution.

    Which other solutions did I evaluate?

    Before choosing Data Hub, the consultancy worked with some tools such as Google's Dataplex and Microsoft Purview.

    What other advice do I have?

    My advice for others thinking about using Data Hub is to have the governance initiative well structured and to have all the documentation for data owners and data stewards in place, so you know who the points of contact will be when the tool starts being configured. This ensures that you have people responsible for doing reviews and approvals in the tool. I would rate this product an eight out of ten.

    Anhnq Ho

    Building a live data dictionary has improved governance while profiling of tables still needs work

    Reviewed on Jun 12, 2026
    Review from a verified AWS customer

    What is our primary use case?

    The main use case for Data Hub is using HubSpot for ingesting from our data by utilizing AWS services. Each year, we ingest based on the data files. A specific example of how my team uses Data Hub in our daily work is that our data engineering team leverages Data Hub to check and verify data limits. The PIT tool helps in understanding data mass via metadata and integrates business philosophy to enhance dashboard comprehension. Data Hub is also used regularly to monitor various aspects.

    What is most valuable?

    The Data Hub feature that stands out to me is its ability to ingest metadata from a variety of databases, including Oracle, MySQL, AWS databases, and MongoDB. Data Hub can also work with files, embedding dataset structures. It is easy to comprehend and helps our teams, such as BI, EI, and data governance, understand data and IT structures. Using Data Hub, we can build and maintain a live data dictionary, providing company-wide clarity on data.

    What needs improvement?

    Data Hub can be improved since the version we have in our company does not support profiling for the table side. It lacks functionality for checking the database and table side in production. Despite its robust support, the absence of table-profile support remains puzzling. An enhancement could include a customized Oracle Adapt connection for better handling data tables, especially for Oracle.

    For how long have I used the solution?

    I have been working in my current field for about four years.

    What do I think about the stability of the solution?

    Data Hub is stable for our needs, running reliably without issues of downtime.

    What do I think about the scalability of the solution?

    Data Hub's scalability is adequate. Sometimes, we need to scale up to ingest the database service from AWS, and when it is done, we scale it down.

    How are customer service and support?

    Customer support for Data Hub has been helpful. I often check the online forum for Data Hub or use AI chat to find solutions.

    Which solution did I use previously and why did I switch?

    We did not use another metadata management solution before Data Hub. Data Hub is the first solution we used.

    What was our ROI?

    We have seen a return on investment from using Data Hub, as it saves our data governance team time by collating metadata and viewing the live data dictionary. It is very helpful in building data quality for the company, leading to approximately thirty percent improvement in efficiency.

    What's my experience with pricing, setup cost, and licensing?

    Data Hub is deployed in our organization on-premises. It costs about zero since, if we win the setup, it probably results in no cost.

    Which other solutions did I evaluate?

    Before choosing Data Hub, we evaluated other options including Apigee and IBM. I do not remember the exact names, but we chose Data Hub because we had time to pilot it and found it to be the best choice at that moment.

    What other advice do I have?

    My advice for others looking into using Data Hub is that it is a simple way to ingest technical metadata, and it is an open-source solution worth trying out to optimize compared to other enterprise solutions. Regarding Data Hub's governance and security, I think it is a tool that supports the Data Hub team in improving their skills and work. I would rate this product a seven out of ten.

    Igor Uchoa

    Cataloging data lineage has clarified data origins and supports confident analytics

    Reviewed on Jun 10, 2026
    Review provided by PeerSpot

    What is our primary use case?

    I provide services related to Data Hub and data management. In total, I have used Data Hub for approximately two to two and a half years over my career. I tried to use it on several occasions in previous roles, and I even started an initiative in my team to have it installed and used broadly in the company because everyone was always lost when it came to data. They didn't know where the data came from, which transformations it underwent, or where it landed. These types of questions were always being asked, and I had the idea to use Data Hub, the open-source version, to give people these analytical capabilities and also integrate with Data Catalog and glossary. I tried to lead that initiative in the past, but then I left the company, so I couldn't see what the end result was.

    Right now, Data Hub is already in place, and I use it sometimes, not too frequently because it's maintained by another team, but I use it as an end-user. My company is migrating off the open-source version and is moving to the Atlan version, and I think they use the same tool, but Atlan has more functionalities because they contribute to their private repository.

    I used some built-in catalogs on Snowflake, for example, and Databricks, which serve well for that purpose inside their tools. In those cases, you could really see the lineage in Databricks or Snowflake, but at that time, there was no global capability. When we were evaluating Data Hub, there were other companies we were also checking, such as proprietary ones, including Atlan. Although I don't remember all of them, I haven't used those other private companies enough to be confident in my statements about the competitors.

    What is most valuable?

    I appreciate two things about Data Hub. One thing I appreciate a great deal is the ability that it has to automatically integrate with some of your tools and work as a receptor for receiving data information from your pipeline. For example, if the pipeline starts reading this table and writing these other tables, all this lineage information is sent to Data Hub automatically for some sources, while others only allow a pull-based approach. In those cases, the software needs to push the metrics from time to time to Data Hub, but it allows us to stream all these metrics to Data Hub, which is excellent and is almost seamlessly integrated with some tools that we had.

    The second one is the lineage itself, because Data Hub offers a tremendous amount of data lineage capabilities, even column-level lineage, which was very useful for us and is still very useful.

    I find Data Hub to be a very important data catalog tool for a company that values data.

    What needs improvement?

    Regarding what I dislike about Data Hub, I think the UI is minimalistic, and I sometimes found myself lost when looking for a specific dataset. I could find it in the search bar, but multiple results appeared with the same name, making it hard to tell whether each one was a pipeline, a table, or a schema. This UI aspect could use improvement. Additionally, there was an issue regarding configuration: back when I used it, we had to manually input a JSON with all the configurations for a given source, which caused problems that we had no visibility into. Mostly, the UI is something they could improve.

    For how long have I used the solution?

    In total, I can consider my career using Data Hub to be approximately two to two and a half years.

    What do I think about the stability of the solution?

    When I used Data Hub, I did not experience any lagging, crashing, or downtime. I only had some issues with the data refresh that sometimes resulted in outdated information, but other than that, it was good.

    What do I think about the scalability of the solution?

    Regarding scalability, I think it's good, but you just need to tune it properly to handle the amount of data and data sources you need. When I used it, we didn't have a lot of tables or data entities, and the major issues I experienced were related to collecting metrics. So, while I had some performance issues, they were not related to a lot of tables.

    How are customer service and support?

    There was no technical support available; we did everything fully on our own.

    Which solution did I use previously and why did I switch?

    I used the open-source version of Data Hub.

    How was the initial setup?

    I think the initial deployment was straightforward. When I used it, there were Helm charts already available, so we just had to tweak some minor things, and that was it; it was quite easy.

    What about the implementation team?

    Data Hub does require some maintenance on my end, such as updates to keep it working. For example, the connectors that are pull-based depend on the volume of the source, and they would fail sometimes. So we had to set up monitoring tools, and whenever they failed, we needed to take a look and maybe tune the pull mechanism to make it more scalable. Other than that, that was the only operational load problem I remember having.

    Which other solutions did I evaluate?

    We have significant contracts with Google, Amazon, and Microsoft. Our platform, which you can visit at peerspot.com, offers a comprehensive search bar where you can enter almost any product and probably find it, even if there are not a lot of reviews for it. Data Hub, as I remember, is a sort of metadata management and AI observability tool.

    What other advice do I have?

    Currently, I am an end user of Data Hub. I'm not sure about my company's relationship with Data Hub, but personally, I've used it as an enthusiastic user of the Data Catalog subject without any official partnerships. I would rate this product a nine out of ten.