Databricks Data Intelligence Platform
Databricks, Inc. External reviews
763 reviews
External reviews are not included in the AWS star rating for the product.
Efficient ETL and AI-Driven Data Validation
What do you like best about the product?
I like the AI-supported environment in Databricks, which I use extensively for ETL tasks and experimental AI/BI dashboards. It's really helpful for fixing code issues and handling logic implementation efficiently. The DLT feature is also a great addition for supporting streaming data. I find Delta Lake very useful for reliable data handling with its ACID transactions, schema enforcement, and versioned data. Notebooks make it easy to develop, test, and debug data logic interactively. I also appreciate the workflows for automating and scheduling pipelines, which improve reliability and reduce manual effort. Databricks is cost-effective compared to other platforms like Synapse and Snowflake, and it's easy to track versions and handle failures. The initial setup was straightforward, with workspace creation and cluster setup being fairly easy for my team.
What do you dislike about the product?
One issue from my personal experience with DLT: once I had set up one flow on a table, I could not create another flow that reuses the same table. From a business perspective, it's normal to use one table as the base for different reporting needs, each requiring different refresh timing.
What problems is the product solving and how is that benefiting you?
I perform ETL tasks and reporting with Databricks. It helps set up streaming data using DLT, and features like Delta Lake enhance data quality. Notebooks support interactive logic development, while workflows automate pipeline scheduling, reducing manual effort.
All-in-One Platform for Data Engineering, ML, AI, and Data Management
What do you like best about the product?
It brings all the tech stacks together in one platform—data engineering, machine learning, AI, and data management—so everything is in one place. It also includes advanced features that make the platform feel complete and capable.
What do you dislike about the product?
We need more open-source, direct connectors to both legacy and current-generation platforms to enable better data extraction. These connectors should support real-time extraction as well as real-time data rendering.
What problems is the product solving and how is that benefiting you?
It brings all types of data into one place, which makes data and access management easier. I can build data warehouses and then downstream the data to AI BI dashboards and ML models, which is very useful. Special features like the feature store, serving endpoints, AI BI dashboard, and Genie help me understand the data, work with it more effectively, and ultimately reach my goals.
Databricks Lakehouse Powerhouse with Unity Catalog and Fast Photon SQL
What do you like best about the product?
I really value how the platform brings data lakes and warehouses together into one place. It makes managing data much easier, and the SQL performance is very fast thanks to the Photon engine. I also like the collaborative notebooks because they allow me to work with both SQL and Python seamlessly in a single environment.
What do you dislike about the product?
The cost can be high, and the DBU billing system is quite complex to track. I also found that there is a significant learning curve when it comes to Spark and configuring clusters. For smaller, quick tasks, the setup time and technical overhead can sometimes feel like a bit too much.
What problems is the product solving and how is that benefiting you?
It solves the issue of having data scattered everywhere. I love that I can switch between SQL and Python in the same spot, and the processing speed is top-notch. It’s been a game-changer for building out our financial models quickly without the usual lag.
Empowers Collaborative Data Science with Minor Learning Curve
What do you like best about the product?
I use Databricks for a lot of things. The main ones are making sense of data, analyzing large chunks of it, and doing machine learning. Databricks makes these tasks very easy, especially for data projects. It's great for collaborating with friends and developing my Python code in notebooks. I like Databricks because it has strong capabilities for handling big data and is excellent for data work and machine learning. It's also easy to use when working with others, as many people can work on a project and share their findings.
What do you dislike about the product?
Databricks is very powerful, but there are some things that need improvement. It's hard for beginners to learn when working with Spark and setting up clusters; this was confusing for me at first. Sometimes the interface and settings can feel complicated. I think it would be helpful if there were clear setup instructions so new users could get started with Databricks easily.
What problems is the product solving and how is that benefiting you?
I use Databricks to make sense of data, collaborate with others, and develop Python code. It simplifies data engineering, machine learning, and handling data while allowing multiple people to work on notebooks simultaneously.
Intuitive Analytics with AI Genie, Needs Performance Tweaks
What do you like best about the product?
I really like the inclusion of the AI Genie in Databricks. It helps make data analytics easier and more intuitive for me. I am able to query my datasets using natural language, which it translates into SQL queries, generating visualizations and insights.
What do you dislike about the product?
I would like to have some flexibility around operational complexity and performance tuning.
What problems is the product solving and how is that benefiting you?
Databricks helps me fix data duplication and inconsistency with idempotent pipelines. I also use AI Genie to query datasets in natural language; it translates my questions into SQL queries and generates visualizations, making analytics easier.
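The idempotent-pipeline idea mentioned above can be sketched in plain Python (a minimal illustration with hypothetical names, not a Databricks API): re-running the same batch through a keyed upsert leaves the table unchanged, which is what removes duplicates.

```python
def upsert(table: dict, records: list, key: str) -> dict:
    """Idempotent upsert: replaying the same batch leaves the table unchanged."""
    for rec in records:
        table[rec[key]] = rec  # last write per key wins; duplicates collapse
    return table

events = {}
batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "a"}]
upsert(events, batch, "id")
upsert(events, batch, "id")  # replayed batch: no duplicate rows introduced
```

In Delta Lake the same effect is typically achieved with a `MERGE` on a business key, so retried or replayed batches converge to the same table state.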
Versatile Platform with Robust Data Governance
What do you like best about the product?
I personally like the Databricks UI, especially the dark mode. Technically, I find Unity Catalog's built-in lineage and governance very valuable. Auto-loader's incremental file processing with exactly-once guarantees and Delta Lake's ACID reliability are my personal favorites. Delta Lake's ACID transactions ensure our data pipelines either fully succeed or fully roll back, which prevents partial writes from corrupting tables. Time travel in Delta Lake allows us to query previous versions of our table for audits without needing separate snapshots. Unity Catalog's capability to auto-track lineage across our entire pipeline is critical for regulatory audits, and its role-based access control and column masking ensure data access is properly managed across teams. The workspace and notebook setup were straightforward, making the initial setup relatively easy.
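The exactly-once guarantee the review attributes to Auto Loader boils down to checkpointing which files have already been ingested. A minimal plain-Python sketch of that idea (hypothetical names; the real mechanism lives in Auto Loader's checkpoint, not user code):

```python
processed = set()  # stands in for Auto Loader's checkpoint of ingested files

def ingest_new_files(files, sink):
    """Append each file's data at most once, like incremental file listing."""
    for f in sorted(files):
        if f not in processed:
            sink.append(f)       # process the file exactly once
            processed.add(f)     # record it so reruns skip it

sink = []
ingest_new_files(["a.json", "b.json"], sink)
ingest_new_files(["a.json", "b.json", "c.json"], sink)  # only c.json is new
```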
What do you dislike about the product?
Migrating from hive_metastore to Unity Catalog is painful with limited tooling; UCX helps, but it's still a heavy lift. Databricks-to-dbt Cloud orchestration lacks a clean native handoff, forcing custom API polling code that's fragile and hard to debug. Cost visibility for Serverless SQL warehouses could be more granular; it's hard to attribute DBU spend to specific pipelines or dbt models without digging into system tables manually.
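The manual attribution described here, digging DBU spend out of system tables, amounts to grouping usage records by a pipeline tag. A plain-Python sketch with made-up record fields standing in for system-table columns (not the actual Databricks schema):

```python
# Hypothetical usage records, shaped loosely like billing system-table rows.
usage = [
    {"warehouse": "serverless-1", "tag": "pipeline_a", "dbus": 4.0},
    {"warehouse": "serverless-1", "tag": "pipeline_b", "dbus": 1.5},
    {"warehouse": "serverless-1", "tag": "pipeline_a", "dbus": 2.0},
]

def dbus_by_tag(rows):
    """Sum DBU consumption per pipeline tag to attribute spend."""
    totals = {}
    for r in rows:
        totals[r["tag"]] = totals.get(r["tag"], 0.0) + r["dbus"]
    return totals
```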
What problems is the product solving and how is that benefiting you?
Databricks replaced our fragmented data stack with one platform for ingestion, ETL, analytics, and governance. Unity Catalog handles regulatory lineage needs by auto-tracking data provenance.
Fast, Seamless Databricks for Big Data Pipelines and Analytics in One Place
What do you like best about the product?
What I love most about Databricks is how fast and connected everything is.
Compared to other platforms, it handles heavy big data pipelines without breaking a sweat. But the best part is how easy it is to use that data once it's processed.
Whether I need to build a quick analytics dashboard or train custom machine learning models specific to our data, it all connects seamlessly. It just takes the headache out of moving data around and lets you do everything in one place.
What do you dislike about the product?
If I had to choose what I dislike, it mainly comes down to the cost and how complex it can be.
First, it can get expensive very quickly. If you’re not careful about managing your computing clusters and shutting them down when you’re done, the bills can creep up on you.
Second, it can sometimes feel like overkill for simpler tasks. Since it’s built for massive data, having to dig through complicated error logs when something breaks can be a real headache compared to using lighter tools.
What problems is the product solving and how is that benefiting you?
The main problem Databricks helps me solve in my business is performance. We used to wait for hours for pipelines to run in ADF, and now we can get them done in minutes.
Streamlines Data Engineering with Minor Delays
What do you like best about the product?
I like using Databricks because it helps me create fast ETL pipelines and solve orchestration and storage issues. I appreciate Genie because it helps me gain fast insights from the data. It reduces my discovery time from days to hours and makes my quotes competitive for clients. The onboarding was smooth with intuitive features, which makes my job easier.
What do you dislike about the product?
I find Databricks clusters slow to spin up, with long wait times even for small tasks.
What problems is the product solving and how is that benefiting you?
Databricks lets me create fast processing ETL pipelines, solving orchestration and storage issues. Plus, with Unity Catalog, I manage governance smoothly without worrying about background complexities.
Databricks Streamlines End-to-End ETL with Unity Catalog and AI-Powered Debugging
What do you like best about the product?
What stands out to me is how Databricks simplifies the end-to-end ETL lifecycle. The platform’s steady integration of new features has noticeably reduced the friction of ingesting data from a wide range of source systems.
Unity Catalog (UC) has also been a game-changer for data administration. It offers a centralized, robust governance layer that makes managing complex environments feel much more intuitive and easier to control.
I’m especially impressed by the recent AI-driven updates. Genie Code has become an essential part of my workflow; it has dramatically improved my debugging speed and is already proving to be a valuable asset in my current UC migration project. Overall, the way Databricks blends traditional data engineering with assisted intelligence feels genuinely forward-thinking.
What do you dislike about the product?
While Auto Loader is powerful, there are still notable gaps in the Lakehouse Data Pipeline (LDP) around schema inference. Right now, when inferSchema is enabled, the inferred schema only applies to the first level of the hierarchy. In complex datasets with multi-nested fields, the lack of deep schema inference creates manual overhead and makes streaming CDC pipelines harder to build and maintain.
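The deep schema inference this review asks for can be pictured as a recursive walk over nested records. This plain-Python sketch (an illustration, not the Auto Loader implementation) descends into every level of the hierarchy rather than stopping at the first:

```python
def infer_schema(value):
    """Recursively infer a schema, descending into nested structs and arrays."""
    if isinstance(value, dict):
        return {k: infer_schema(v) for k, v in value.items()}
    if isinstance(value, list):
        return [infer_schema(value[0])] if value else []
    return type(value).__name__  # leaf: record the scalar type name

record = {"id": 1, "user": {"name": "a", "geo": {"lat": 1.0, "lon": 2.0}}}
schema = infer_schema(record)
```

The point of the sketch is the recursion: a first-level-only inference would stop at `user` and leave `geo` untyped.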
Lakeflow Connect feels like a step in the right direction, but the library of native connectors still seems incomplete compared to some competitors. And while the AI features (like Genie) are promising and genuinely interesting, they still come across as being in a “developing” stage—sometimes lacking the consistency you need for high-stakes production environments. I’d like to see these capabilities evolve from “innovative extras” into hardened, production-ready tools.
What problems is the product solving and how is that benefiting you?
The Problem: Data Silos & Inefficient Support Operations
In many organizations, critical institutional knowledge ends up scattered across disconnected systems such as MySQL (structured), Jira (transactional), and Confluence (unstructured). When information is fragmented this way, support teams struggle to find fast, accurate answers for incoming tickets. The result is higher MTTR (Mean Time to Resolution) and a lot of repetitive, manual effort.
The Solution: A Unified “Intelligence Platform”
Databricks addresses this by serving as a single fabric that connects these silos. In my work, I focus on using the Lakehouse Data Pipeline (LDP) to ingest and unify these different sources into one governed environment.
How this benefits my project:
I use Databricks for seamless ingestion, centralizing data from MySQL, Jira, and Confluence to build a comprehensive “Knowledge Base” without having to manage multiple, disparate ETL tools.
I also rely on native AI integration. With Mosaic AI Vector Search, I can convert the unified data into embeddings directly within the platform, which lets me build an AI Automation Agent for our ticketing system.
Finally, it supports automated solutioning. The agent can run vector matching on newly created tickets against the full historical knowledge base and then propose accurate, context-aware solutions to engineers right away.
The Impact
The biggest benefit for us is operational velocity. Databricks has shifted our data from a passive archive into an active, “intelligent” engine. It reduces time spent on manual research and helps us automate the first line of support, improving the accuracy of ticket resolutions while lowering the burden on our technical teams.
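The vector-matching step described above can be illustrated with a tiny cosine-similarity lookup in plain Python (hypothetical knowledge-base entries and toy 2-d vectors; Mosaic AI Vector Search does this at scale with real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy knowledge base: article id -> embedding vector (made up for illustration).
kb = {"reset-password": [1.0, 0.0], "vpn-outage": [0.0, 1.0]}

def best_match(ticket_vec):
    """Return the knowledge-base article most similar to a new ticket."""
    return max(kb, key=lambda k: cosine(ticket_vec, kb[k]))
```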
From 1 Hour to 10 Minutes: How Databricks Modernized Our Workflow
What do you like best about the product?
We used to use ADF to get data from SQL Server and then work on it in Databricks before putting it into Salesforce. The whole process took more than an hour because ADF added extra overhead.
Now everything happens inside Databricks. We transform the raw data in Databricks and push it into Salesforce, all in one place. This has made the whole process much faster; it now takes 10 minutes, a big improvement over what we had with ADF.
Delta Lake has also been really useful. It helps us keep track of changes and go back if something goes wrong. We can see what happened before and fix mistakes easily.
Delta Lake also makes sure the data is good before it goes into the pipeline. It stops bad data from getting in and causing problems later on in Salesforce. This makes the whole process more reliable and easier to maintain.
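The pre-load validation described here can be sketched in plain Python: split incoming rows into good and bad before loading (hypothetical column names; in Delta Lake this is typically declared as table constraints or expectations rather than written as user code):

```python
def validate(rows, required=("id", "email")):
    """Split rows into (good, bad) so only clean data reaches the sink."""
    good, bad = [], []
    for r in rows:
        ok = all(r.get(col) is not None for col in required)
        (good if ok else bad).append(r)  # quarantine bad rows for review
    return good, bad

rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
good, bad = validate(rows)
```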
What do you dislike about the product?
Databricks is really good at what it does, but sometimes it takes a while to get the cluster up and running, and the user interface can be slow at times. This can be annoying when we are in a hurry to get things done for Salesforce. The Salesforce connectors in Databricks can be a bit tricky to work with; they need to be set up just right and do not always work as we expect. This means extra work when we are troubleshooting problems or monitoring the Databricks-to-Salesforce pipelines.
What problems is the product solving and how is that benefiting you?
It is solving our performance and reliability issues by allowing us to extract, transform, and load the data into Salesforce all in one place, without ADF. This unified workflow has reduced our runtime from 1 hour to 10 minutes, giving us faster job completion and on-time Salesforce data updates. With Delta Lake features like ACID transactions and time travel, our data is more accurate and easier to recover when something goes wrong.