Databricks Data Intelligence Platform
Databricks, Inc. External reviews
763 reviews
External reviews are not included in the AWS star rating for the product.
Feature-Rich Databricks: Genie, Lakehouse Connect, and Streaming Tables Shine
What do you like best about the product?
Databricks has many features compared to other platforms. The key ones I’ve noticed are Genie, Lakehouse Connect, and streaming tables.
What do you dislike about the product?
One thing I’ve noticed in Databricks is that we aren’t able to deploy alerts from one environment to another.
What problems is the product solving and how is that benefiting you?
Databricks addresses key data challenges like siloed tools, scalability limits, and complex governance in modern analytics.
From Hive Chaos to Unity Catalog - Worth Every DBU
What do you like best about the product?
Unity Catalog has been the single biggest value-add for our enterprise migration. We moved from a Hive Metastore architecture to Unity Catalog and gained centralized governance, lineage tracking, and fine-grained access control across all our data assets without bolting on third-party tools. For a multi-domain organization (finance, manufacturing, supply chain, procurement), having one catalog that enforces consistent naming and permissions across bronze, silver, gold, and platinum layers saved us weeks of manual policy work.
UI/UX: The notebook experience with inline Spark SQL and PySpark, combined with the workspace file browser, makes it straightforward for our team to develop and test transformations iteratively. The SQL editor for ad-hoc queries against Unity Catalog tables is clean and responsive.
Integrations: Native Delta Lake support means we don't manage format conversions. The Azure Key Vault integration via secret scopes (dbutils.secrets.get) keeps credentials out of code. ADF integration for orchestration in our V1 environment was seamless, and Databricks Asset Bundles (DAB) for V2 deployment give us a clean CI/CD path with databricks.yml configs targeting dev/qa/prod without custom scripting.
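The dev/qa/prod targeting described above can be expressed declaratively in a bundle's `databricks.yml`. The sketch below is hypothetical (bundle name, hosts, and variable are illustrative, not the reviewer's actual config), but it shows the shape of per-target overrides that Databricks Asset Bundles support:

```yaml
# Hypothetical databricks.yml sketch: one bundle, three deployment targets.
bundle:
  name: sales_etl

variables:
  num_workers:
    description: Cluster workers per environment
    default: 1

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-dev.azuredatabricks.net
  qa:
    workspace:
      host: https://adb-qa.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-prod.azuredatabricks.net
    variables:
      num_workers: 3
```

A single `databricks bundle deploy -t <target>` then picks up the right host and worker count without custom scripting.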
Performance: Switching to CTEs over temp views in our Gold notebooks reduced cluster memory pressure noticeably. The ability to right-size clusters per environment (1 worker for dev, 3 for production) with Standard_D4ds_v5 nodes keeps costs predictable while maintaining performance for our batch ETL workloads.
Pricing/ROI: The pay-as-you-go compute model paired with single-user security mode clusters means we're not over-provisioning. Consolidating our ETL, governance, and BI serving layer into one platform eliminated licensing for separate catalog, orchestration, and data quality tools.
AI/Intelligence (Genie): Genie Spaces have been an unexpected win. Our business analysts in finance and supply chain can ask natural language questions against curated Gold/Platinum tables without writing SQL. It reduced the number of ad-hoc report requests coming to the data team by giving domain users a self-service path that still respects Unity Catalog permissions.
Support/Onboarding: The documentation is thorough, and the skills-based approach to learning (bundles, Unity Catalog, jobs, SQL) maps well to how our team actually works. Onboarding new engineers to the V2 architecture took about half the time compared to V1 because the platform conventions (medallion architecture, asset bundles, catalog naming) are well-documented and consistent.
What do you dislike about the product?
UI/UX: The notebook editor still feels behind dedicated IDEs. No native multi-file search, limited refactoring support, and the git integration UI is clunky for teams managing dozens of notebooks across workflow bundles. We ended up doing all real development in VS Code and treating the Databricks workspace as a deployment target, which adds friction. The workspace file browser also doesn't handle folder structures well when you have 50+ notebooks organized by domain; there's no filtering, tagging, or favorites.
Integrations: Databricks Asset Bundles (DAB) are a step forward, but the documentation has gaps for complex multi-bundle deployments. We run a shared Global_Utilities bundle that other workflow bundles depend on, and getting cross-bundle references to work reliably across dev/qa/prod targets required significant trial and error. The ADF-to-Databricks integration works, but debugging failed pipeline runs means jumping between the ADF monitoring UI and Databricks job runs with no unified view. A tighter handshake between orchestration and compute monitoring would save hours of troubleshooting.
Performance: Cluster cold-start times remain a pain point for development workflows. Spinning up a single-node Standard_D4ds_v5 cluster takes 4-7 minutes, which breaks flow when you're iterating on notebook logic. Serverless compute helps but isn't available for all workload types yet, and the cost premium is hard to justify for dev/test environments.
Pricing/ROI: The DBU pricing model is opaque for capacity planning. Estimating monthly costs for a project with 30+ scheduled jobs, interactive development clusters, and SQL warehouse queries requires building custom spreadsheets because the built-in cost management tools don't give you a clear forecast by workflow or domain. We've been surprised by cost spikes from jobs that ran longer than expected with no easy way to set per-job budget alerts.
Support/Onboarding: Enterprise support response times are inconsistent. Critical issues with Unity Catalog permissions during our migration took 3-5 business days for initial triage, which stalled our deployment timeline. The community forums are helpful for common patterns, but for Unity Catalog edge cases (cross-catalog lineage, complex permission inheritance), the knowledge base is thin.
AI/Intelligence: Genie is promising but still rough for production use. It struggles with joins across more than 3-4 tables, sometimes generates incorrect SQL against our Gold layer, and there's no easy way to curate or correct its responses to improve accuracy over time. Our business users got excited, tried it, hit wrong answers on moderately complex questions, and lost trust. A feedback loop where domain experts can flag and correct Genie's outputs would make it genuinely production-ready.
What problems is the product solving and how is that benefiting you?
Data Governance Fragmentation → Unified Catalog We struggled with a Hive Metastore environment where table ownership, access control, and lineage were managed through a patchwork of manual documentation and custom scripts. After implementing Unity Catalog, we now have centralized governance across 4 catalog layers (bronze, silver, gold, platinum) spanning 6 business domains. What used to take a full-time data steward to track manually is now enforced automatically through catalog-level permissions and lineage. This cut our access provisioning time from days to under an hour per request.
Siloed ETL Logic → Standardized Medallion Architecture Before Databricks, our ETL pipelines were inconsistent — different teams wrote transformations differently, with no shared utilities or patterns. We built a standardized framework (Batch_Utilities.py) with reusable functions for schema validation, merge operations, data quality checks, and audit column management. Every notebook across all domains now follows the same 7-cell structure. This reduced new notebook development time from 2-3 days to roughly 4 hours, and onboarding a new developer to the pattern takes a single afternoon instead of a week.
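A shared-utilities framework like the `Batch_Utilities.py` described above might expose helpers along these lines. This is a hypothetical sketch written against plain Python rows so it stays self-contained; the real framework would operate on PySpark DataFrames, and the example schema and column names are assumptions:

```python
from datetime import datetime, timezone

# Assumed example schema for illustration only.
EXPECTED_SCHEMA = {"order_id": int, "amount": float}

def validate_schema(row: dict, expected: dict = EXPECTED_SCHEMA) -> list:
    """Return a list of schema violations for one row (empty list = valid)."""
    errors = []
    for column, col_type in expected.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], col_type):
            errors.append(f"bad type for {column}: {type(row[column]).__name__}")
    return errors

def add_audit_columns(row: dict, pipeline: str) -> dict:
    """Attach standard audit columns without mutating the input row."""
    return {
        **row,
        "_ingested_at": datetime.now(timezone.utc).isoformat(),
        "_pipeline": pipeline,
    }
```

Centralizing checks like these is what lets every notebook follow the same structure instead of re-implementing validation per team.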
Costly Report Refresh Failures → Reliable Pipeline Orchestration We had recurring issues with Power BI reports pulling stale or incomplete data because upstream jobs failed silently. With Databricks Jobs and metadata-driven pipeline tracking (pipeline status, start/end timestamps logged per run), we now catch failures at the transformation layer before they propagate to reports. Report data freshness issues dropped by approximately 80%, and our finance team stopped scheduling "data verification" meetings that used to consume 3-4 hours per week.
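Metadata-driven run tracking of this kind reduces to a simple pattern: log status and timestamps per run, and let downstream refreshes check for the last clean run. A minimal sketch, with illustrative names and a plain list standing in for what would really be a Delta table:

```python
# In production this run log would be a Delta table, not a Python list.
run_log = []

def record_run(pipeline: str, succeeded: bool, started: str, ended: str) -> None:
    """Append one pipeline run's status and timestamps to the run log."""
    run_log.append({
        "pipeline": pipeline,
        "status": "SUCCEEDED" if succeeded else "FAILED",
        "started": started,
        "ended": ended,
    })

def last_good_run(pipeline: str):
    """Latest successful run, or None -- the signal a report refresh checks."""
    runs = [r for r in run_log
            if r["pipeline"] == pipeline and r["status"] == "SUCCEEDED"]
    return runs[-1] if runs else None
```

A Power BI refresh gated on `last_good_run(...)` never picks up data from a silently failed upstream job.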
Multi-Environment Deployment Chaos → Asset Bundles Deploying notebooks across dev, QA, and production used to involve manual file copies and environment-specific config edits — error-prone and slow. Databricks Asset Bundles gave us declarative databricks.yml configs with variable substitution per target. A deployment that took 45 minutes of manual steps now runs in under 5 minutes via CLI. We deploy with confidence because the same bundle definition is validated before it hits production.
Self-Service Analytics Gap → Genie + Platinum Layer Business analysts in supply chain and finance were fully dependent on the data team for any ad-hoc analysis. By building denormalized Platinum tables optimized for reporting and exposing them through Genie Spaces, we enabled self-service querying in natural language. Early adoption has reduced ad-hoc report requests to the data team by roughly 30%, freeing up engineering capacity for new feature development.
Cost Visibility → Right-Sized Compute We were over-provisioning clusters because we had no clear view of actual utilization. By standardizing on Standard_D4ds_v5 nodes with environment-specific worker counts (1 for dev/QA, 3 for production) and single-user security mode, we reduced our monthly compute spend by approximately 25% compared to the shared cluster model we ran in V1.
Databricks Makes End-to-End Data Workflows Fast, Collaborative, and Easy
What do you like best about the product?
What I like most about Databricks is how it simplifies the entire data workflow. Instead of switching between multiple tools for data processing, analysis, and machine learning, everything is available in one place. The notebook environment makes collaboration really smooth; it feels natural to work with teammates, share code, and explain logic without extra effort.
Another thing I appreciate is the performance. Working with large datasets can usually be painful, but Databricks handles it efficiently in the background. You don’t have to worry much about managing clusters or optimizing everything manually; it just works most of the time, which lets you focus more on solving the actual problem rather than dealing with infrastructure.
What also stands out is the way it handles data governance and organization. With features like centralized access control and better visibility into data usage, it becomes much easier to manage data responsibly, especially in larger projects. Overall, it gives a good balance between power and ease of use, which is why I enjoy working with it.
What do you dislike about the product?
One thing I don’t particularly like about Databricks is that it can get expensive pretty quickly, especially if clusters are not managed properly. If you forget to terminate clusters or run heavy workloads without optimization, costs can spike without much visibility at first. For teams that are still learning or experimenting, this can become a concern.
Another downside is that debugging can sometimes feel a bit tricky, particularly when working with distributed jobs. Errors are not always straightforward, and tracing issues across multiple nodes can take extra time compared to working in a simpler local environment. It requires a certain level of experience to quickly understand and fix issues.
Also, while the platform is powerful, it has a bit of a learning curve for beginners. Concepts like cluster configuration, job scheduling, and data governance are not always very intuitive at the start. It takes some hands-on time before you feel fully comfortable navigating and using everything efficiently.
What problems is the product solving and how is that benefiting you?
What Databricks really solves is the problem of handling large-scale data without making the process overly complex. Earlier, working with big data meant dealing with multiple tools, managing infrastructure, and spending a lot of time just setting things up. Databricks simplifies all of that by bringing data engineering, analytics, and machine learning into one place, so the focus shifts more toward solving actual business problems instead of managing systems.
It also addresses performance and scalability issues. When working with huge volumes of data, traditional systems often struggle or slow down. Databricks handles this efficiently in the background, allowing workloads to scale without much manual effort. For me, this means I can process large datasets faster and run transformations or queries without constantly worrying about performance tuning.
Another big problem it solves is collaboration and data management. In many projects, teams struggle with version control, access management, and keeping data consistent. Databricks makes it easier to collaborate, track changes, and control who can access what. This helps me work more smoothly with others, reduces errors, and ensures that the data I’m using is reliable and well-governed.
Databricks Unifies Data, Analytics, and ML for Scalable Lakehouse Workflows
What do you like best about the product?
Databricks is especially helpful because it brings data engineering, analytics, and machine learning together in a single unified platform, which reduces the need to manage multiple separate tools. Built on Apache Spark, it can process massive datasets quickly and scale smoothly as workloads grow, making it a strong fit for big data use cases. It also supports collaborative notebooks where teams can work together in languages like Python and SQL, which makes it easier for data scientists and engineers to collaborate effectively.
With its lakehouse architecture powered by Delta Lake, Databricks combines the flexibility of data lakes with the reliability of data warehouses, helping ensure better data consistency and performance. In addition, it integrates with tools like MLflow to streamline the machine learning lifecycle end to end, from experimentation through deployment. Overall, Databricks simplifies complex data workflows, improves performance, and helps organizations build scalable data and AI solutions more efficiently.
What do you dislike about the product?
Databricks does have some limitations, although many of them feel more like trade-offs than outright negatives. A frequently cited drawback is cost: while the platform is flexible and scalable, expenses can rise quickly if clusters aren’t managed carefully. At the same time, that cost often reflects its ability to handle very large workloads efficiently when it’s properly optimized.
Another consideration is the learning curve, especially for beginners who aren’t familiar with Apache Spark or distributed systems. That complexity can be challenging at first, but it also comes with the benefit of powerful capabilities once you get comfortable with it. Some users also find that debugging and performance tuning are less straightforward than with simpler tools; however, Databricks offers detailed monitoring and optimization features that can make these tasks easier over time.
Finally, because it’s a managed platform, there can be a sense of reduced control compared with fully self-managed systems. In return, it removes much of the operational burden that comes with infrastructure management. Overall, while these areas may be seen as the “least helpful” aspects, they’re often balanced by the platform’s scalability, integration, and productivity gains.
What problems is the product solving and how is that benefiting you?
Databricks helps solve the challenge of fragmented data and disconnected workflows across multiple business verticals by providing a unified lakehouse platform. In my role as a data engineer, this allows me to consolidate data from different sources into a single, reliable system using Apache Spark for scalable processing and Delta Lake for ensuring data quality and consistency. This significantly reduces pipeline complexity, improves reliability, and enables faster delivery of clean, governed data to downstream teams. As a result, I’m able to support analytics and machine learning use cases more efficiently while minimizing operational overhead and improving overall productivity across the organization.
Intuitive Analytics with AI Genie, Needs Performance Tweaks
What do you like best about the product?
I really like the inclusion of the AI Genie in Databricks. It makes data analytics easier and more intuitive for me. I can query my datasets in natural language, and Genie translates that into SQL queries and generates visualizations and insights.
What do you dislike about the product?
I would like to have some flexibility around operational complexity and performance tuning.
What problems is the product solving and how is that benefiting you?
Databricks helps me fix data duplication and inconsistency with idempotent pipelines. I also use AI Genie to query datasets in natural language, which it translates into SQL queries while generating visualizations, making analytics easier.
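The idempotent-pipeline idea mentioned here boils down to upserting by key instead of appending, so a replayed batch cannot create duplicates. In Databricks this is typically a Delta Lake MERGE; the sketch below models it with a plain dict keyed by an assumed business key, purely for illustration:

```python
def idempotent_merge(target: dict, batch: list, key: str = "id") -> dict:
    """Upsert each incoming record by its key; replaying a batch is harmless."""
    for record in batch:
        target[record[key]] = record
    return target
```

Running the same batch twice leaves the target table unchanged, which is exactly the property that kills duplication at the pipeline level.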
Versatile Platform with Robust Data Governance
What do you like best about the product?
I personally like the Databricks UI, especially the dark mode. Technically, I find Unity Catalog's built-in lineage and governance very valuable. Auto-loader's incremental file processing with exactly-once guarantees and Delta Lake's ACID reliability are my personal favorites. Delta Lake's ACID transactions ensure our data pipelines either fully succeed or fully roll back, which prevents partial writes from corrupting tables. Time travel in Delta Lake allows us to query previous versions of our table for audits without needing separate snapshots. Unity Catalog's capability to auto-track lineage across our entire pipeline is critical for regulatory audits, and its role-based access control and column masking ensure data access is properly managed across teams. The workspace and notebook setup were straightforward, making the initial setup relatively easy.
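Delta Lake time travel, as used for the audits described above, rests on a simple idea: every commit produces a new table version that remains queryable (Delta exposes this as `SELECT ... VERSION AS OF n`). The toy model below is not the Delta API, just an illustration of the versioning semantics:

```python
class VersionedTable:
    """Toy model of time travel: each commit snapshots the table state."""

    def __init__(self):
        self._versions = [tuple()]  # version 0: empty table

    def commit(self, rows):
        """Append a new snapshot; earlier versions remain queryable."""
        self._versions.append(tuple(self._versions[-1]) + tuple(rows))

    def as_of(self, version: int):
        """Query the table as it existed at a given version."""
        return list(self._versions[version])

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1
```

Auditing "what did this table look like last quarter" then becomes a query against an older version rather than a separately maintained snapshot.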
What do you dislike about the product?
Migrating from hive_metastore to Unity Catalog is painful with limited tooling - UCX helps but it's still a heavy lift. Databricks-to-dbt Cloud orchestration lacks a clean native handoff, forcing custom API polling code that's fragile and hard to debug. Cost visibility for Serverless SQL warehouses could be more granular - it's hard to attribute DBU spend to specific pipelines or dbt models without digging into system tables manually.
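The "custom API polling code" this reviewer has to maintain usually looks like the generic loop below: call a status endpoint until it reports a terminal state, backing off between attempts. This is a hedged sketch, not the Databricks or dbt Cloud API; `fetch_status` is a stand-in for the real call:

```python
import time

def poll_until_done(fetch_status, timeout_s=600, initial_delay_s=1.0,
                    max_delay_s=30.0):
    """Poll fetch_status() until it returns a terminal state or we time out.

    fetch_status is any zero-argument callable standing in for a real
    job-status API request.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("SUCCESS", "FAILED", "CANCELLED"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff
    raise TimeoutError("job did not reach a terminal state in time")
```

The fragility the review describes comes from having to own this glue yourself: timeouts, backoff, and terminal-state lists all live in user code rather than in a native integration.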
What problems is the product solving and how is that benefiting you?
Databricks replaced our fragmented data stack with one platform for ingestion, ETL, analytics, and governance. Unity Catalog handles regulatory lineage needs by auto-tracking data provenance.
Great Infrastructure for Reliable Data Management
What do you like best about the product?
They have a really great and intuitive infrastructure that gives us everything we need for our data management.
What do you dislike about the product?
For any technical issues we run into, we consult their support team and get a solution.
What problems is the product solving and how is that benefiting you?
Overall, the reliability and features that Databricks provides are the most helpful and valuable for us, making our work easier and hassle-free.
Fast, Seamless Databricks for Big Data Pipelines, and Analytics in One Place
What do you like best about the product?
What I love most about Databricks is how fast and connected everything is.
Compared to other platforms, it handles heavy big data pipelines without breaking a sweat. But the best part is how easy it is to use that data once it's processed.
Whether I need to build a quick analytics dashboard or train custom machine learning models specific to our data, it all connects seamlessly. It just takes the headache out of moving data around and lets you do everything in one place.
What do you dislike about the product?
If I had to choose what I dislike, it mainly comes down to the cost and how complex it can be.
First, it can get expensive very quickly. If you’re not careful about managing your computing clusters and shutting them down when you’re done, the bills can creep up on you.
Second, it can sometimes feel like overkill for simpler tasks. Since it’s built for massive data, having to dig through complicated error logs when something breaks can be a real headache compared to using lighter tools.
What problems is the product solving and how is that benefiting you?
The main problem Databricks helps me solve in my business is performance. We used to wait for hours for pipelines to run in ADF, and now we can get them done in minutes.
Streamlines Data Engineering with Minor Delays
What do you like best about the product?
I like using Databricks because it helps me create fast ETL pipelines and solve orchestration and storage issues. I appreciate Genie because it helps me gain fast insights from the data. It reduces discovery time from days to hours and makes my quotes competitive for clients. The onboarding was smooth, with intuitive features that make my job easier.
What do you dislike about the product?
I find Databricks clusters slow to spin up, with long wait times even for small tasks.
What problems is the product solving and how is that benefiting you?
Databricks lets me create fast processing ETL pipelines, solving orchestration and storage issues. Plus, with Unity Catalog, I manage governance smoothly without worrying about background complexities.
Databricks Streamlines End-to-End ETL with Unity Catalog and AI-Powered Debugging
What do you like best about the product?
What stands out to me is how Databricks simplifies the end-to-end ETL lifecycle. The platform’s steady integration of new features has noticeably reduced the friction of ingesting data from a wide range of source systems.
Unity Catalog (UC) has also been a game-changer for data administration. It offers a centralized, robust governance layer that makes managing complex environments feel much more intuitive and easier to control.
I’m especially impressed by the recent AI-driven updates. Genie Code has become an essential part of my workflow; it has dramatically improved my debugging speed and is already proving to be a valuable asset in my current UC migration project. Overall, the way Databricks blends traditional data engineering with assisted intelligence feels genuinely forward-thinking.
What do you dislike about the product?
While Auto Loader is powerful, there are still notable gaps in the Lakehouse Data Pipeline (LDP) around schema inference. Right now, when inferSchema is enabled, the inferred schema only applies to the first level of the hierarchy. In complex datasets with multi-nested fields, the lack of deep schema inference creates manual overhead and makes streaming CDC pipelines harder to build and maintain.
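One partial workaround for the nesting limitation is `cloudFiles.schemaHints`, which lets you pin the nested fields that first-level inference misses. A minimal sketch (the paths, source layout, and hinted struct fields are hypothetical; this requires a Databricks runtime to run):

```python
# Auto Loader stream with explicit hints for nested fields that
# first-level-only inference would otherwise leave as strings.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/demo/raw/_schemas/events")  # hypothetical path
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.schemaHints",
            "payload STRUCT<order_id: BIGINT, amount: DECIMAL(10,2)>")  # hypothetical fields
    .load("/Volumes/demo/raw/events")  # hypothetical path
)
```

Hints like this keep the stream stable, but they reintroduce the manual schema maintenance that deep inference would eliminate.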
Lakeflow Connect feels like a step in the right direction, but the library of native connectors still seems incomplete compared to some competitors. And while the AI features (like Genie) are promising and genuinely interesting, they still come across as being in a “developing” stage—sometimes lacking the consistency you need for high-stakes production environments. I’d like to see these capabilities evolve from “innovative extras” into hardened, production-ready tools.
Lakeflow Connect feels like a step in the right direction, but the library of native connectors still seems incomplete compared to some competitors. And while the AI features (like Genie) are promising and genuinely interesting, they still come across as being in a “developing” stage—sometimes lacking the consistency you need for high-stakes production environments. I’d like to see these capabilities evolve from “innovative extras” into hardened, production-ready tools.
What problems is the product solving and how is that benefiting you?
The Problem: Data Silos & Inefficient Support Operations
In many organizations, critical institutional knowledge ends up scattered across disconnected systems such as MySQL (structured), Jira (transactional), and Confluence (unstructured). When information is fragmented this way, support teams struggle to find fast, accurate answers for incoming tickets. The result is higher MTTR (Mean Time to Resolution) and a lot of repetitive, manual effort.
The Solution: A Unified “Intelligence Platform”
Databricks addresses this by serving as a single fabric that connects these silos. In my work, I focus on using the Lakehouse Data Pipeline (LDP) to ingest and unify these different sources into one governed environment.
How this benefits my project:
I use Databricks for seamless ingestion, centralizing data from MySQL, Jira, and Confluence to build a comprehensive “Knowledge Base” without having to manage multiple, disparate ETL tools.
I also rely on native AI integration. With Mosaic AI Vector Search, I can convert the unified data into embeddings directly within the platform, which lets me build an AI Automation Agent for our ticketing system.
Finally, it supports automated solutioning. The agent can run vector matching on newly created tickets against the full historical knowledge base and then propose accurate, context-aware solutions to engineers right away.
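The matching step in that last point is conceptually just nearest-neighbour search over embeddings; in our stack, Mosaic AI Vector Search performs it at scale. The toy sketch below shows only the core idea, with hand-made vectors (the tickets, embeddings, and resolutions are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical knowledge-base entries: (ticket_id, embedding, resolution)
knowledge_base = [
    ("KB-101", [0.9, 0.1, 0.0], "Restart the ingest job with a larger driver."),
    ("KB-205", [0.1, 0.8, 0.3], "Rotate the expired service-principal secret."),
    ("KB-310", [0.0, 0.2, 0.9], "Repair the Delta table before re-running."),
]

def propose_solution(new_ticket_embedding):
    """Return the id and resolution of the most similar historical ticket."""
    best = max(knowledge_base, key=lambda kb: cosine(new_ticket_embedding, kb[1]))
    return best[0], best[2]

ticket_id, fix = propose_solution([0.85, 0.15, 0.05])  # closest match: KB-101
```

In production, the embeddings come from a model endpoint and the search runs against a managed vector index rather than an in-memory list, but the agent's matching logic is this same similarity ranking.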
The Impact
The biggest benefit for us is operational velocity. Databricks has shifted our data from a passive archive into an active, “intelligent” engine. It reduces time spent on manual research and helps us automate the first line of support, improving the accuracy of ticket resolutions while lowering the burden on our technical teams.