Databricks Data Intelligence Platform
Databricks, Inc. External reviews
763 reviews
External reviews are not included in the AWS star rating for the product.
Centralized Dashboard with Smooth, Cost-Saving Autoscaling
What do you like best about the product?
Everything is centralized in a single dashboard: Spark jobs, notebooks, and data pipelines. Autoscaling and auto-termination genuinely help keep costs under control, and it was a pleasant surprise that both run smoothly without any noticeable lag. Sharing notebooks with the team is straightforward and cuts down on a lot of back and forth.
What do you dislike about the product?
Finding older queries is really painful. Anything beyond a few weeks becomes hard to track down, which makes it difficult to keep my day-to-day data work flowing smoothly and to continue working without constant interruptions.
What problems is the product solving and how is that benefiting you?
We run ETL and ML workloads without having to worry too much about the underlying infrastructure. I can also manage inventory information, at least to some extent, without opening a bunch of different tabs. I spend less time troubleshooting clusters and more time actually working with the data.
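The autoscaling and auto-termination behavior praised above maps to a couple of fields in a cluster specification. A minimal sketch, assuming the standard Databricks Clusters API field names (`autoscale`, `autotermination_minutes`); the cluster name, runtime, and instance type below are example values only:

```python
# Hypothetical cluster spec illustrating autoscaling + auto-termination.
# Field names follow the Databricks Clusters API; values are examples.
cluster_spec = {
    "cluster_name": "etl-autoscaling-demo",   # hypothetical name
    "spark_version": "14.3.x-scala2.12",      # example runtime
    "node_type_id": "i3.xlarge",              # example instance type
    "autoscale": {
        "min_workers": 2,   # scale down toward 2 workers when idle
        "max_workers": 8,   # scale up to 8 workers under load
    },
    # Terminate the cluster after 30 idle minutes to stop billing.
    "autotermination_minutes": 30,
}

def estimate_max_nodes(spec: dict) -> int:
    """Worst-case node count: one driver plus max workers."""
    return 1 + spec["autoscale"]["max_workers"]

print(estimate_max_nodes(cluster_spec))  # 9
```

The cost control the reviewer describes comes from exactly these two knobs: the workspace only ever bills for nodes between `min_workers` and `max_workers`, and an idle cluster shuts itself down.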
Scalable, Unified Platform with a Steep Learning Curve
What do you like best about the product?
I use Databricks for my office projects, and I really like its ability to unify the entire data workflow in a single platform. It lets me seamlessly collaborate with data scientists and analysts, making it easy to ingest, clean, analyze, and model data. I appreciate its scalability and automation features, which save me time and reduce complexity when working with large datasets. I also like that it offers a scalable compute and storage solution, reducing infrastructure management overhead. The integration of shared notebooks and tools like Databricks Genie helps improve collaboration and speed up development.
What do you dislike about the product?
I haven't faced major issues with Databricks itself, but during my initial phase of using the platform, it wasn't very easy to get up to speed with all the features, tools, and configurations. Databricks evolves quickly and in the beginning, it was a bit challenging to match the pace of updates and fully leverage all its capabilities. The initial setup was moderately challenging. While the platform is well documented and user-friendly, getting familiar with all the features, configuring clusters and integrating it with our existing workflows required some learning and experimentation.
What problems is the product solving and how is that benefiting you?
Databricks solves scalability and performance issues, centralizing data from multiple sources and reducing silos. It simplifies collaboration among data professionals, offers scalable compute, and integrates advanced analytics, saving time and reducing complexity with large datasets.
Versatile Data Platform with Seamless Integration
What do you like best about the product?
What I like most about Databricks is that it's integratable with other platforms. I can literally set up a Databricks workspace using Azure data services from the Azure portal, and I can also use Databricks within AWS. It gives me the opportunity to integrate my Databricks notebooks into other environments and orchestration tools or ETL tools, like Azure Data Factory.
What do you dislike about the product?
For now, I've noticed that when I'm using Azure Databricks, particularly the Azure Databricks cluster, the session usually times out, and it's kind of frustrating. Most times when I'm working, I just go into another tab; every time I come back a minute or two later, it has timed out, and I have to sign in again. That experience can be frustrating, and I would like it to be looked into. I don't know if it's an issue with Databricks or an issue on the Azure side, from the Entra ID authentication part of things.
What problems is the product solving and how is that benefiting you?
I use Databricks to unify my data by managing governance within Unity Catalog, simplifying user access and report sharing.
Databricks Makes Collaboration and Reliable Data Pipelines Easy
What do you like best about the product?
I really enjoy working in the Databricks environment because it makes it easy to collaborate with others through shared notebooks. Delta Lake technology has also been great for ensuring data quality and reliability across our pipelines. It lets us manage data, build pipelines, and run AI/BI workloads all in one place.
What do you dislike about the product?
The interface is quite laggy at times, especially when I’m scrolling through a notebook or spinning up a cluster.
What problems is the product solving and how is that benefiting you?
Because it’s a unified, end-to-end platform covering everything from data ingestion and transformation to AI and BI insights, it enables faster analysis and helps convert complex datasets into actionable decisions more efficiently.
Robust Infrastructure and Easy Setup
What do you like best about the product?
I think the infrastructure in Databricks is really useful, especially for an admin handling workspaces and client issues. Token access was easy to use, which made setup possible and fairly simple. Daily use in my community involves about 120-200 people, and I find Databricks' infrastructure valuable in supporting my role effectively.
What do you dislike about the product?
I think it would be better if Databricks could send people alerts when any new features come to the market.
What problems is the product solving and how is that benefiting you?
I use Databricks to manage workspaces and client issues efficiently. It's easy to use, especially with token access management, facilitating smooth operations. The infrastructure and features make handling queries and workspace access straightforward, improving overall administration.
Unified Lakehouse Powerhouse: Fast, Scalable Analytics in One Databricks Workspace
What do you like best about the product?
What I like best about Databricks is the unified lakehouse platform. Everything—ingestion with Auto Loader/Lakeflow, Delta Live Tables pipelines, Spark transformations, SQL analytics, MLflow experiments, and governance via Unity Catalog—lives in one workspace. No more tool sprawl. Delta Lake delivers reliable ACID transactions, time travel, and schema evolution on massive datasets, while Photon makes queries fly. Serverless compute simplifies scaling, and collaboration in notebooks/repos is seamless for data teams.
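The time travel mentioned above boils down to the idea that every write produces a new, addressable table version. A toy Python illustration of that idea (the `VersionedTable` class is hypothetical; Delta Lake implements versioning via a transaction log on storage, not in-memory copies):

```python
# Toy model of table versioning / "time travel" (hypothetical class).
class VersionedTable:
    def __init__(self):
        self._versions = [[]]  # version 0: empty table

    def append(self, rows):
        # Each write produces a new immutable version of the table.
        self._versions.append(self._versions[-1] + list(rows))

    def read(self, version=None):
        # Reading "as of" an older version number = time travel.
        if version is None:
            version = len(self._versions) - 1
        return self._versions[version]

t = VersionedTable()
t.append([{"id": 1}])   # creates version 1
t.append([{"id": 2}])   # creates version 2

print(t.read())           # latest: both rows
print(t.read(version=1))  # time travel: only the first row
```

In Delta Lake the equivalent read is a `VERSION AS OF` / `TIMESTAMP AS OF` query; the sketch only shows why such a read is cheap when old versions are never mutated.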
What do you dislike about the product?
What I dislike most is the cost. It can spike quickly with poorly tuned jobs, forgotten clusters, or over-provisioning—DBU pricing adds up fast even with optimizations. Cold starts on interactive clusters slow quick prototyping, and it's overkill (and expensive) for tiny datasets or simple queries. The Spark/Delta learning curve is steep for newcomers, and heavy use creates some vendor-specific lock-in.
What problems is the product solving and how is that benefiting you?
Databricks solves data silos, unreliable lakes, and fragmented tooling by providing a governed lakehouse where raw data becomes clean, queryable assets for BI and AI. This benefits me by cutting infra firefighting so I focus on pipelines and quality; for the business, it means faster insights, better data reliability, easier AI adoption, and less tool sprawl—delivering real value from petabyte-scale data without constant re-architecture.
Great Governance and UI—Databricks Fits Our ETL Workflow Perfectly
What do you like best about the product?
I like the overall environment, especially the governance features and the way the UI is handled. I primarily use Databricks as my ETL platform, and it fits well with how I work. The SDP job management, governance, and lineage capabilities are also helpful.
What do you dislike about the product?
Sometimes there are glitches in the UI. For example, if I cancel something, it takes a bit longer for that change to be reflected in the UI.
What problems is the product solving and how is that benefiting you?
It addresses centralized database and lakehouse management through Unity Catalog. It has also helped solve governance needs and improved lineage tracking.
Comprehensive Platform with Room for Improvement
What do you like best about the product?
I find Databricks to be a one-stop solution because it incorporates various functionalities such as orchestrating pipelines. It also has an inbuilt AI called Genie, which helps in building jobs and other AI-related tasks. I appreciate that, compared to other providers like AWS and Azure, Databricks offers specific features they lack, allowing me to use the database simply and access everything in one place. The initial setup was quite easy because I could work from a single place to directly implement and update tables using the data lakehouse, which is easier compared to the others.
What do you dislike about the product?
I think Databricks could improve on the orchestration part. Even though it has orchestration capabilities for pipelines and jobs, it misses the ease of access that something like Airflow provides, which is specifically designed for orchestration. It would be helpful if Databricks adopted a pattern similar to Airflow's for better orchestration and job linking. I also feel the Genie part could be improved. While the Genie works well, the output duration can be lengthy, usually taking more than five to ten minutes to perform specific tasks. So, I would like to see improvements in that area as well.
What problems is the product solving and how is that benefiting you?
I use Databricks as a one-stop solution for various tasks. It orchestrates pipelines and utilizes an inbuilt AI, making it more feature-rich than alternatives like AWS or Azure. This allows me to streamline workflows without relying on multiple providers.
Seamless Integration, Needs Performance Tuning
What do you like best about the product?
I think the most useful part of Databricks is its single architecture where you can have everything, like a database and dashboard, all in one. Compared to other providers like Azure or AWS, where I would need multiple services, Databricks offers everything in a single service. This simplifies my work because I don't have to manage integration or network level details across different services. The convenience of having everything inside Databricks means I can avoid multiple network updates when connecting with tools like Power BI, which makes it a standout feature for me. Additionally, the initial setup after migrating from Snowflake was pretty easy since Databricks allows us to manage access and security within a single service.
What do you dislike about the product?
One thing that needs to be updated is Genie code. Genie code is helpful for generating code, but in the back end it consumes a lot of memory. For example, if I'm opening Databricks in Chrome, it takes at least one or two GB of memory, and it takes a lot of time to generate the response as well. Reducing that would be great. Also, on the pipeline side: take Airflow, which is specifically designed for orchestration. We use Airflow, and if I have thousands of jobs, I can see each and every job and what's happening. With Databricks, it's a tough job for me to see the successes and failures and to manage the charts. We have multiple options to monitor it in Databricks, but it's hard compared with Airflow.
What problems is the product solving and how is that benefiting you?
Databricks helps us consolidate data from different locations into a single database, simplifying master data management and making data access easier with integrated dashboards, improving our AI-powered sales and prospect tracking.
Genie Code and Inline Assistant Dramatically Boosted My Debugging Productivity
What do you like best about the product?
Genie code and the inline Assistant were the most helpful tools for me on my project. They helped me debug a 2k-line codebase and clearly explained why I wasn’t getting accurate data. It also provided a query to run in my source system (SQLMI). By running the discrepancy script in parallel on the source and target, I was able to debug the entire code much faster and improve my productivity. Overall, it cut my work time from about 8 hours down to around 1 hour.
What do you dislike about the product?
In Delta Sharing, there isn’t a catalog-level SELECT permission, and I sometimes think having that would be helpful. Also, when I use the Genie code inside a VM, it can make the website unresponsive at times. These are areas that could be improved.
What problems is the product solving and how is that benefiting you?
In one of our claims-processing migration projects, the client needed near real-time data availability for downstream applications. Previously, the architecture used Amazon Redshift as the data warehouse, with Jasper and Sisense consuming the data for reporting and analytics. However, that setup didn’t support real-time or near real-time streaming efficiently, which led to delays in data availability for downstream systems.
After migrating the platform to Databricks, we were able to substantially improve the data pipeline architecture. We implemented streaming along with optimized ETL pipelines, reducing the data refresh cycle to about 30 minutes. We also created a dedicated view that retains data from the previous run, so downstream systems always have a consistent dataset available while the next pipeline execution is still in progress.
Before, we struggled with delayed refresh cycles and a limited ability to meet near real-time data needs in our Redshift-based architecture. After moving to Databricks, we enabled faster ETL processing and improved near real-time data availability.
As a result, we reduced ETL refresh time to roughly 30 minutes and enabled near real-time access for downstream tools like Jasper and Sisense. Reliability also improved because the stable view continues to serve the previous run’s data during pipeline updates. Finally, the overall architecture became simpler by consolidating processing and analytics capabilities within Databricks.
Overall, Databricks helped us build a more scalable and efficient near real-time data processing platform, significantly improving the timeliness and reliability of analytics for the claims-processing workflow.
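The "dedicated view that retains the previous run's data" described above is a view-swap pattern: load the next run into a staging table, then atomically repoint a stable view once the run completes. A minimal sketch using SQLite as a stand-in (table and view names are hypothetical; on Databricks this would be `CREATE OR REPLACE VIEW` over Delta tables):

```python
import sqlite3

# Two staging tables hold alternating pipeline runs; a stable view always
# points at the last *completed* run, so readers never see a partial load.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE claims_run_a (id INTEGER, amount REAL)")
con.execute("CREATE TABLE claims_run_b (id INTEGER, amount REAL)")

def publish(run_table: str) -> None:
    # Repoint the stable view at the finished run.
    con.execute("DROP VIEW IF EXISTS claims_stable")
    con.execute(f"CREATE VIEW claims_stable AS SELECT * FROM {run_table}")

# Run 1 completes and is published.
con.execute("INSERT INTO claims_run_a VALUES (1, 100.0)")
publish("claims_run_a")

# Run 2 is still loading into the other table; readers keep seeing run 1.
con.execute("INSERT INTO claims_run_b VALUES (1, 100.0)")
con.execute("INSERT INTO claims_run_b VALUES (2, 250.0)")
rows_during_load = con.execute(
    "SELECT COUNT(*) FROM claims_stable").fetchone()[0]

# Run 2 finishes; swap the view and readers see the new data.
publish("claims_run_b")
rows_after = con.execute(
    "SELECT COUNT(*) FROM claims_stable").fetchone()[0]

print(rows_during_load, rows_after)  # 1 2
```

The design choice the reviewer describes falls out of this: downstream tools query only the view name, so a half-finished 30-minute refresh is invisible to them.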