Databricks Data Intelligence Platform
Intuitive, Limitless Analytics for End-to-End Data Pipelines
What do you like best about the product?
It’s very intuitive, and the breadth of data and analytics you can do with it is limitless.
You can create a medallion architecture, build data pipelines and jobs, design dashboards, set up data governance, and more.
What do you dislike about the product?
I feel like some features in newer releases can be a bit buggy at times, but after a while those things usually get better.
What problems is the product solving and how is that benefiting you?
We have a data and analytics platform and we use Databricks as our key vendor. Our relationship with them has been great and they’ve been super helpful the whole way.
Simplifies Data Engineering, Needs Better Tool Integration
What do you like best about the product?
I like the features of Genie, especially the new Genie Code, which makes it possible to get SQL-ready scripts just by chatting in natural language. This is fascinating, especially with the Unity Catalog governance layer on top of it. It accelerates both analysts' and engineers' jobs by helping build reports and get them ready efficiently, especially since it has access to most of the metadata. The documentation is also useful, suggesting SQL code that can be provisioned on the fly. Tying Genie to AI functions such as ai_query makes it a superpower.
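For readers unfamiliar with the AI-functions pattern mentioned here, a minimal sketch of calling ai_query from a notebook; the serving-endpoint and table names are placeholders, not anything from this review:

```python
# Minimal sketch of the ai_query pattern mentioned above. The serving
# endpoint and table names are placeholders, not from the review.
summaries = spark.sql("""
    SELECT
        ticket_id,
        ai_query(
            'databricks-meta-llama-3-3-70b-instruct',  -- endpoint name: an assumption
            CONCAT('Summarize this ticket in one sentence: ', body)
        ) AS summary
    FROM support.tickets.raw
""")
summaries.show(truncate=False)
```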
What do you dislike about the product?
Honestly, a ton of features could be improved, especially connectivity with other tools, particularly cloud platforms like Azure. As a Microsoft employee, I evangelize Databricks, but many of our clients use the Microsoft stack extensively. Sometimes these tools feel isolated from the rest of the stack. There's still a lot of work to be done to connect models provisioned in Azure with things like Unity Catalog, or governance that sits outside of both Databricks and Microsoft's stack. This feels like a disconnect, especially in highly regulated environments where on-prem systems need to interact with Databricks capabilities.
What problems is the product solving and how is that benefiting you?
Databricks simplifies provisioning services, streamlines data engineering, and speeds up workflow creation. It combines tools into one governed platform, making handling big data easier and faster. Its AI layer integrates well, reducing the need for multiple tools.
Genie Code Agent Mode Made Our Migration to Databricks Fast and Accurate
What do you like best about the product?
Genie Code (Databricks Assistant Agent) — I’m currently working on migrating existing workloads from ADF and SQLMI to Databricks. As part of that, I need to convert stored procedures and ADF dataflows into Databricks notebooks. Initially, we refactored all the code manually, but once Agent Mode was available in preview, we tried using it to convert the stored procedures and dataflows into Databricks PySpark code. I was impressed by the accuracy: it handled about 90% of the code conversion without errors, aside from some case-handling and similar adjustments.
Also, Lakeflow Connect helped me connect SharePoint and SFTP data to Databricks more easily.
What do you dislike about the product?
It’s not a major issue, but in my project the client asked us to generate table and column descriptions using AI in Unity Catalog. For each environment, these descriptions vary, and I have around 300 tables just in the Bronze zone. Having to click into each table and generate AI descriptions one by one is very time-consuming, and the results are not consistent across environments.
It would be much more efficient if we had an option to generate descriptions at the schema level, and if there were an information schema or system tables that stored table and column descriptions as metadata. That way, we could easily replicate them across environments. In some cases, clients also have source system documentation we could leverage to generate more accurate table and column descriptions.
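For what it's worth, Unity Catalog's information_schema does already expose table and column comments, which covers at least the replication half of this wish. A minimal sketch of copying table descriptions between environments, with hypothetical catalog and schema names:

```python
# Hedged sketch: replicate table comments from a dev catalog to a prod
# catalog via Unity Catalog's information_schema. Catalog and schema names
# ("dev", "prod", "bronze") are placeholders.
rows = spark.sql("""
    SELECT table_name, comment
    FROM dev.information_schema.tables
    WHERE table_schema = 'bronze' AND comment IS NOT NULL
""").collect()

for r in rows:
    escaped = r["comment"].replace("'", "\\'")  # escape quotes for the SQL literal
    spark.sql(f"COMMENT ON TABLE prod.bronze.{r['table_name']} IS '{escaped}'")
```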
What problems is the product solving and how is that benefiting you?
One of my main scenarios was migrating all the existing stored procedures and ADF dataflows into Databricks notebooks. Doing this manually took more than 6 hours to complete both the development and the validation. Later, we used Agent Mode Preview and converted over 80 medium-to-complex stored procedures and more than 20 ADF dataflows into Databricks notebooks. This saved more than 100 hours, and it also generated validation scripts for each table to close out unit testing.
Apart from the Agent Assistant, we also used external volumes. Previously, we relied on the Azure library for file processing in ADLS storage, but we ran into rate-limit issues, couldn't process in parallel, and sometimes the job would abort. After we created an external volume pointing to the required ADLS container, we achieved parallel processing and faster reads and writes instead of relying on custom Python code.
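A minimal sketch of the external-volume pattern described above; the storage account, container, and catalog names are placeholders, and the location must already fall under a Unity Catalog external location:

```python
# Hedged sketch of the external-volume setup described above. The ADLS URL
# and catalog/schema/volume names are placeholders; the LOCATION must fall
# under an existing Unity Catalog external location.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS main.bronze.landing
    LOCATION 'abfss://raw@mystorageacct.dfs.core.windows.net/landing'
""")

# Files are then readable through the governed /Volumes path, and Spark
# parallelizes the read across executors instead of looping through the
# Azure SDK in single-threaded Python.
df = spark.read.format("json").load("/Volumes/main/bronze/landing/events/")
df.write.mode("append").saveAsTable("main.bronze.events")
```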
Databricks: A True Unified Analytics & AI Platform That Boosts Speed and Reliability
What do you like best about the product?
What I like best about Databricks is how it finally delivered what every data engineer/data professional has been wishing for — a true unified analytics and AI platform.
I remember working across five different tools just to get a single pipeline from ingestion to reporting. Databricks collapsed all of that into one environment, and that changed everything for me.
Delta Lake was the first breakthrough. When it arrived around 2020, ACID transactions and time‑travel immediately eliminated the operational pain we used to consider “normal.” If a job corrupted a table, I could roll back to a previous version in seconds instead of spending hours restoring backups. That reliability alone saved multiple downstream failures.
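That rollback flow is only a couple of lines in practice; a sketch with a hypothetical table name:

```python
# Sketch of the Delta rollback flow described above (table name is a
# placeholder). DESCRIBE HISTORY lists versions; RESTORE rolls back in place.
spark.sql("DESCRIBE HISTORY main.gold.engine_metrics").show()

# Inspect a last-known-good version read-only first...
good = spark.read.option("versionAsOf", 41).table("main.gold.engine_metrics")
good.limit(10).show()

# ...then restore the table to that version.
spark.sql("RESTORE TABLE main.gold.engine_metrics TO VERSION AS OF 41")
```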
Before Delta existed, our pipelines relied heavily on overwrite patterns because there was no reliable way to apply updates or handle late‑arriving data safely. Overwrites were slow, expensive, and risky — especially for large tables. A single failure during overwrite could leave the table in a half-written, inconsistent state. Processing took longer, compute costs shot up, and recovery often meant manually rebuilding partitions from scratch.
The ROI became obvious as soon as we used Databricks end‑to‑end. Because one platform handles ingestion → transformation → ML → BI → governance, we retired entire categories of legacy tools and reduced operational overhead dramatically.
Then Genie arrived — and it genuinely transformed my day‑to‑day work.
I once needed a PySpark module for data quality checks. Genie generated the full logic — null checks, schema validation, aggregations — in seconds. Instead of spending 30 minutes writing boilerplate, I spent 3 minutes refining the logic. It shifted my focus from syntax to decisions.
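The kind of boilerplate being scaffolded there looks roughly like the following; this is a hand-written approximation, not Genie's actual output, and the schema and table are hypothetical:

```python
# A hand-written approximation of the data-quality boilerplate described
# above (not Genie's actual output; schema and table names are placeholders).
from pyspark.sql import DataFrame, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

EXPECTED_SCHEMA = StructType([
    StructField("engine_id", StringType(), False),
    StructField("temp_c", DoubleType(), True),
])

def run_quality_checks(df: DataFrame) -> dict:
    """Null checks, schema validation, and a basic aggregate sanity check."""
    report = {}
    # Schema validation: every expected column must be present.
    report["missing_columns"] = [
        f.name for f in EXPECTED_SCHEMA.fields if f.name not in df.columns
    ]
    # Null checks on required columns.
    report["null_engine_ids"] = df.filter(F.col("engine_id").isNull()).count()
    # Aggregate sanity check: temperatures should fall in a plausible range.
    report["out_of_range_temps"] = df.filter(
        (F.col("temp_c") < -60) | (F.col("temp_c") > 1200)
    ).count()
    return report

print(run_quality_checks(spark.table("main.silver.engine_telemetry")))
```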
Integrations are another strength. Connecting Databricks to S3, SQL Server, and especially Power BI has been seamless. Publishing Delta tables directly to BI models removed the need for brittle extracts and sped up refreshes. Unity Catalog made everything even cleaner with consistent permissions and lineage.
Performance is consistently strong when it matters — heavy joins, window functions, multi‑stage pipelines, or streaming workloads. Serverless compute starts instantly, and workloads scale predictably even under pressure.
Finally, onboarding surprised me. Features like serverless compute, natural‑language queries, AI‑generated code suggestions, and automatic comments make Databricks intuitive even for engineers new to Spark. It feels like the platform actively helps you learn.
In short: Databricks lets me work faster, recover instantly, integrate seamlessly, and scale confidently — all in one place. It’s the rare platform that improves both speed and reliability at the same time.
What do you dislike about the product?
What I dislike most about Databricks is the cost visibility and predictability.
Even as an experienced engineer, it can be difficult to get a straight, real‑time view of what a workflow will cost before running it. Photon vs. standard runtime, autoscaling behaviour, shuffle-heavy operations, DBUs—these can stack up quickly, and cost surprises happen unless you actively monitor and tune everything. A simple pipeline misconfiguration can quietly double your spend.
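One concrete mitigation is watching the billing system tables; a short sketch, assuming the system.billing schema is enabled for the workspace:

```python
# Hedged sketch of DBU monitoring via Unity Catalog system tables (assumes
# system.billing is enabled; columns per the system.billing.usage table).
spark.sql("""
    SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY usage_date, sku_name
    ORDER BY usage_date DESC, dbus DESC
""").show(truncate=False)
```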
Another challenge is the rapid pace of new features and changes.
Databricks innovates incredibly fast, which is great, but it also means features may land before documentation, best practices, or governance patterns are fully mature. Sometimes functionality behaves differently across runtimes or cloud providers, and staying on top of everything requires continuous learning and refactoring. This can create team friction and technical debt.
In short: Databricks is exceptional, but the cost model isn’t always transparent, and the rapid feature rollout can introduce operational complexity that teams must actively manage.
What problems is the product solving and how is that benefiting you?
Business: Before adopting Databricks, our aerospace analytics environment — particularly around customer engine health monitoring — suffered from the same challenges many traditional engineering organisations face.
We had multiple disconnected systems handling telemetry ingestion, fault-code processing, fleet analytics, and maintenance prediction. Data from engine sensors (FADEC, vibration, thermals, oil systems) arrived in different formats and needed heavy manual work just to normalise. Pipelines relied on full overwrites because our legacy setup didn’t support updates or late-arriving data, which made processing slow and expensive.
We struggled with slow ingestion of engine telemetry, inconsistent datasets across engineering teams, and long turnaround times for anomaly detection models.
Architecture challenge: Before using Databricks, we were operating in a fragmented data landscape.
We had multiple systems, disconnected storage layers, and a heavy reliance on overwrite‑based ETL jobs because our old data platform couldn’t support updates, late‑arriving data, or ACID guarantees. This meant pipelines were slow, error‑prone, and expensive. Rolling back bad data could take hours, and data inconsistencies across teams were common.
We struggled with siloed systems, slow pipelines, unreliable data, and high operational cost.
We struggled with manual overwrites and inconsistent data — but now we can use Delta Lake with ACID and time‑travel,
which has resulted in:
Instant rollback from data corruption scenarios
Reliable incremental processing instead of full overwrites (see the MERGE sketch after this list)
Consistent data consumed across engineering, BI, and ML teams
This reduced our telemetry pipeline processing window from hours to under 30 minutes for a fleet‑wide daily batch.
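A minimal sketch of that incremental pattern, with hypothetical table names:

```python
# Minimal sketch of incremental upserts replacing full overwrites (table
# names are placeholders). Late-arriving rows update in place, new rows
# insert, and untouched rows are left alone.
spark.sql("""
    MERGE INTO main.silver.telemetry AS t
    USING main.bronze.telemetry_updates AS u
      ON t.engine_id = u.engine_id AND t.reading_ts = u.reading_ts
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```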
We struggled with multiple tools and duplicated architectures — but now we have one unified Lakehouse,
which has resulted in:
A single platform for ingestion → transformation → ML → BI → governance
Removal of 3–5 legacy tools (ETL schedulers, BI extracts, legacy ML infra)
Lower maintenance and licensing overhead
We struggled with slow development cycles — but now we can leverage Genie for AI‑assisted engineering,
which has resulted in:
70–80% faster creation of PySpark modules
Automatic generation of schema checks, null checks, and DQ logic
More time spent on decisions, less on boilerplate code
For example, a data quality module that used to take 30 minutes now takes 2–3 minutes to scaffold.
We struggled with inconsistent governance — but now Unity Catalog gives us end‑to‑end visibility,
which has resulted in:
Faster onboarding (reduced from days to minutes)
Centralised permissions, lineage, and audit trails (see the GRANT sketch after this list)
Stronger compliance alignment
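In practice those centralised permissions reduce to a few SQL grants; a sketch with hypothetical catalog, table, and group names:

```python
# Sketch of centralised Unity Catalog grants (all names are placeholders).
# One grant applies across every workspace attached to the metastore.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `engineering`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `engineering`")
spark.sql("GRANT SELECT ON TABLE main.gold.fleet_kpis TO `bi-analysts`")
```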
We struggled to scale pipelines and ML workloads — but now we use distributed compute + Photon,
which has resulted in:
Large joins and window operations executing up to 10× faster
Stable handling of terabyte‑scale datasets
Predictable performance even under heavy workloads
Databricks Notebooks Make Collaboration Seamless Across Python, SQL, and Scala
What do you like best about the product?
Databricks collaborative notebooks are really useful and let me work in whatever language I need to meet my requirements effectively. The ability to mix Python, SQL, and even Scala within a single notebook makes collaboration and teamwork much smoother. I also appreciate how easily it integrates with other tools and cloud platforms, so it fits into my existing workflows with very little friction.
What do you dislike about the product?
I like their customer support, and the frequent updates are a big reason this has become my favorite for data management. I also appreciate how well it integrates with external tools like Power BI for reporting; it's really good.
What problems is the product solving and how is that benefiting you?
It simplifies cross-team collaboration and helps us work through large datasets without having to worry too much about infrastructure or analytics overhead. Calculations and reporting are fast, which has improved our development cycles and reduced the back-and-forth between the engineering and analytics teams.
Reimagining Data Workflows & Insights with Genie: NLQ spaces, Agent Mode, and Intelligent Coding
What do you like best about the product?
1) In our implementation, Genie Space is actively used to enable NLQ-based access across multiple data products like Finance, HR, Marketing, Sales, and Supply Chain (inventory, demand planning, and replenishment), reducing dependency on data teams for ad-hoc queries.
2) We designed separate Genie Spaces for each BU/team/data product, ensuring domain-level isolation while still supporting cross-functional querying where required (e.g., Finance + Sales joins).
Each Genie Space is carefully configured with curated data tables, business-level instructions, and semantic context, which significantly improves the accuracy of SQL generation.
3) We provide few-shot examples, guided prompts, and sample business questions tailored to each domain, helping Genie understand real business intent instead of generic query patterns.
4) In Chat Mode, business users directly ask questions in natural language, and Genie translates them into SQL and returns results, which has improved self-service analytics adoption (a minimal API sketch follows after this list).
5) In Agent Mode, Genie goes beyond SQL generation by creating a logical execution plan, breaking down complex queries into multiple steps before querying the underlying data.
6) We built a dedicated Anomaly Detector Genie Space, where users ask questions about cluster cost, performance issues, and inefficient workloads.
This anomaly-focused Genie analyzes long-running jobs, inefficient queries, and cluster utilization patterns, using historical workload data to identify optimization opportunities.
7) A key implementation is notebook-level analysis, where Genie highlights code issues, shows before vs after optimization, categorizes problems (performance, cost, inefficiency), and explains improvements clearly.
8) Genie also provides quantified recommendations, including expected cost savings (e.g., idle cluster reduction, query tuning impact) and workload-based optimization strategies, making it highly actionable for engineering teams.
9) We extended Genie into Genie Code integrated with Databricks AI Assistant, enabling an agentic development experience directly within our data engineering workflows.
Our team defined custom skills in Markdown (MD files) such as Coder, Tester, Mapper, and Data Generator, which are attached to Genie Code to modularize capabilities.
These skills are used to support end-to-end SDLC activities, including code generation, transformation logic creation, test case design, and synthetic data generation.
10) Genie Code operates by first creating a structured execution plan, outlining all required steps before starting any development activity.
It then breaks the plan into a detailed to-do list, executing each step sequentially (e.g., create notebook → write transformation → validate logic → optimize code).
11) During execution, Genie Code follows a human-in-the-loop model, asking for approvals at every step with options like allow once, always allow, or read-only execution.
The behavior of Genie Code is controlled through project-specific guidelines and instructions, ensuring it aligns with our coding standards, architecture patterns, and governance rules.
12) It acts as a co-developer within the workspace, assisting engineers in writing optimized code, validating logic, and ensuring best practices are followed consistently.
We are leveraging it for proactive development workflows, where Genie not only executes tasks but also suggests improvements and optimization opportunities during development itself.
This approach has enabled a “vibe coding” style of development, where engineers focus on intent while Genie handles structured execution, resulting in faster delivery, reduced manual effort, and improved overall code quality.
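For context on Chat Mode, a hedged sketch of driving a Genie Space programmatically, assuming the Genie Conversation REST API shape as of its public preview; the host, token, and space ID are placeholders:

```python
# Hedged sketch of the Genie Conversation API (public-preview shape assumed;
# host, token, and space ID are placeholders; business users normally just
# type questions in the Genie Space UI).
import requests

HOST = "https://<workspace-host>"        # placeholder
TOKEN = "<personal-access-token>"        # placeholder
SPACE_ID = "<genie-space-id>"            # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/genie/spaces/{SPACE_ID}/start-conversation",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"content": "What was replenishment spend by region last quarter?"},
)
resp.raise_for_status()
# The response carries conversation/message IDs to poll for the generated
# SQL and its results.
print(resp.json())
```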
What do you dislike about the product?
Context limitations across Genie Spaces; also, if I remember correctly, the number of tables that can be attached to a space is capped at 30.
Agent Mode's reasoning depth is good, but it is not fully autonomous.
Performance efficiency and latency could also be improved.
What problems is the product solving and how is that benefiting you?
1) Bridging business and data teams through NLQ
Databricks Genie solves the gap between business users and technical teams by enabling natural language access to data, reducing dependency on data engineers for everyday queries.
2) Eliminating data silos across domains
By integrating data from Finance, HR, Sales, and Supply Chain, it helps us analyze cross-domain datasets, improving decision-making for use cases like demand planning and inventory optimization.
3) Accelerating self-service analytics
With Genie Chat Mode converting NLQ to SQL, business users can independently fetch insights, significantly reducing turnaround time for reporting and analysis.
4) Handling complex analytical queries with Agent Mode
Genie Agent Mode solves complex query scenarios by breaking them into structured execution plans, which is especially useful for multi-step analytical and optimization problems.
5) Improving cost and performance visibility
Through our Anomaly Detector Genie Space, Databricks helps identify cluster inefficiencies, long-running jobs, and costly queries, giving clear visibility into platform usage.
6) Driving workload optimization and cost savings
The platform provides actionable recommendations like query tuning, cluster right-sizing, and idle resource reduction, helping us optimize cost based on actual workload patterns.
7) Enhancing code quality through notebook analysis
Genie analyzes notebook code and highlights performance issues with before/after comparisons, enabling developers to improve efficiency and follow best practices.
8) Supporting proactive development with Genie Code
Databricks enables an agentic development workflow, where Genie Code assists in planning, coding, testing, and executing tasks step-by-step, reducing manual effort.
9) Standardizing development using skill-based automation
By attaching custom skills (Coder, Tester, Mapper, Data Generator), we ensure consistent development practices and faster onboarding for new use cases.
10) Increasing overall productivity and faster delivery
Combining Genie Space and Genie Code, Databricks significantly improves developer productivity, reduces iteration cycles, and accelerates delivery of data solutions, while maintaining governance and control.
Essential Data Processing with Seamless Collaboration
What do you like best about the product?
I like how Databricks allows not just engineers, but also data managers, analysts, data scientists, and everyone else to work in a simplified and collaborative manner. That's something Databricks does well, setting it apart from competitors who are trying to offer similar capabilities. Many people have already adopted it, and it has become the de facto choice.
What do you dislike about the product?
I think lineage, the addition of business assets, and how the data translates to the business layer of a bank or any other organisation are where Databricks can improve. I don't see different departments getting connected in Databricks through a shared glossary or the business terms they use internally.
What problems is the product solving and how is that benefiting you?
I use Databricks to manage vast datasets from multiple sources; it helps organize infrastructure and access management, and it aids in some visualization tasks.
Revolutionized HR Analytics with Genie, Minor Cost Concerns
What do you like best about the product?
I really like the Genie feature on Databricks; it's great and integrates well with the ecosystem. Combining the lakehouse with Genie is simple and has transformed our HR analytics. We can ask questions in plain English about attrition and get instant, accurate responses. This removes the engineering bottleneck almost completely, allowing HR to access insights directly from Genie without waiting weeks for custom dashboards. It saves the engineering team loads of hours and accelerates decision-making. Plus, getting started with Databricks is seamless; we could set up the account and start running lakehouses in minutes.
What do you dislike about the product?
Setting up Genie requires meticulous planning and data curation to get excellent responses. If the semantic model isn't perfect, it can stumble. Cost management is tricky when multiple teams run open-ended queries all day. The metric views and serverless cost features make it better, but there's room for improvement.
What problems is the product solving and how is that benefiting you?
Databricks solves ingestion, transformation, governance, and data quality challenges, offering AI and BI tools for instant insights. With Genie, HR bypasses engineering bottlenecks, saving hours and accelerating decisions. It's simple to unify lakehouse with Genie for quick, accurate responses.
An all-in-one platform
What do you like best about the product?
It's an all-in-one platform for data engineers, analysts, data scientists, and business users.
What do you dislike about the product?
It’s easy to overspend, and there is a risk of vendor lock-in.
What problems is the product solving and how is that benefiting you?
Data engineering, model training and inference, GenAI.
Databricks solves the problem of having fragmented tools across the data and AI lifecycle. Traditionally, teams would need separate platforms for data engineering, analytics, machine learning, and AI — leading to silos, duplicated work, and governance challenges.
With Databricks, data engineering pipelines, model training and inference, and GenAI development all live in one unified environment. This means data engineers can build and orchestrate pipelines, data scientists can train and deploy models, and teams can develop and serve GenAI applications — without constantly moving data or context-switching between tools.
Powerful Warehousing, Collaborative, AI Debugging
What do you like best about the product?
As a growing data engineer, the community support and clear documentation of Databricks really help guide me through problems. I've been managing jobs and pipelines where failures are bound to happen, and debugging with the 'Diagnose this error with AI' feature has helped me hit faster failure-recovery SLAs. The UI is neat and makes it very easy to move between notebooks, SQL, and PySpark without much friction. Since I work with a team, collaboration is a must, and sharing notebooks and iterating with teammates feels easy. I really like that I can rely on ABAC policies to set up data quality and governance.
What do you dislike about the product?
I am not a hundred percent sure I would use the term dislike; I think it's just a personal preference. I sometimes feel the compute being used is a lot more than it should be for a simple query. Maybe the shuffle read/write that always gets involved when you're using Delta tables sometimes slows down the job.
What problems is the product solving and how is that benefiting you?
Databricks is helping our clients manage the lakehouse and warehouse architecture in a much more structured way. We use it as the landing layer from S3 and then process data through our medallion architecture (bronze, silver, and gold) before delivering it to the final products. It's been very effective for orchestrating daily jobs and pipelines. I also really like the asset bundles and how easily everything integrates with Git, which makes version control and deployments much smoother for the team. Databricks is likely to remain my go-to platform for data lakehouse and warehousing.
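That S3 landing-to-bronze step is typically a few lines with Auto Loader; a sketch with placeholder paths and table names:

```python
# Hedged sketch of an S3 landing-to-bronze ingestion with Auto Loader
# (bucket, paths, and table names are placeholders).
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/events")
    .load("s3://my-landing-bucket/events/")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/events")
    .trigger(availableNow=True)  # drain available files, then stop (daily-job style)
    .toTable("main.bronze.events"))
```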