AWS Marketplace: Databricks Data Intelligence Platform Reviews

Kishor D.

Lake House platform review

October 17, 2023
Review provided by G2

What do you like best about the product?

It's ease of use and optimisation databricks offers for big data test cases

What do you dislike about the product?

Some times it's slow and initial learning should be encouraged

What problems is the product solving and how is that benefiting you?

Solved customer feedback through audio video then fed the data to Lakehouse platform to generate insights for a multinational company

Program Development

A Robust Solution for Big Data Management and Analytics

October 16, 2023
Review provided by G2

What do you like best about the product?

I really appreciate how Databricks Lakehouse Platform merges the strengths of data lakes and data warehouses, creating a unified space for handling data tasks. It’s like having the best of both worlds which simplifies data management and analytics significantly. The collaborative notebooks are a cherry on top, fostering teamwork and making iterative development smooth among our data teams. Plus, the high-speed analytics capabilities ensure that data queries and sharing are a breeze, which is crucial for our day-to-day decision-making processes.

What do you dislike about the product?

The initial hill to climb in terms of learning can be a bit steep, especially if you're new to big data platforms. It felt like a slow start, but once past that hurdle, things started to click. The cost factor can be a bit of a pinch, especially for smaller setups or projects on a tight budget. While there's a decent amount of guidance out there, the documentation can sometimes leave you wanting more, especially when you hit complex or unique challenges that require a deeper dive to navigate through.

What problems is the product solving and how is that benefiting you?

Databricks Lakehouse Platform is tackling the headache of juggling between data lakes and data warehouses. It's kind of bundled the two into one neat package, making data management and analytics way less complicated. This fusion is cutting down on the tech pile-up in our projects big time, making it simpler to switch between data engineering and data science chores. When it comes to number crunching, the platform's speedy analytics is making queries quick and data sharing reliable, which is big for our decision-making. The collaborative notebooks are a cool feature too, they're encouraging teamwork and making back-and-forth on development ideas smoother among our data teams. There’s a bit to chew on initially learning-wise and the pricing can sting a little, but the payback in simpler data operations and better teamwork is solid.

Hospital & Health Care

Experience with databricks

October 12, 2023
Review provided by G2

What do you like best about the product?

Capability to run the sql query using spark sql to read the data from containers , which also gives the flexibility to play with data. As we are using spark which is very fast and optimize the sql query which help in getting the outputs very even the volume of data is very huge.

What do you dislike about the product?

I have mostly used azure databricks which some time takes lot of time to get started and and some time it didn't even start.

What problems is the product solving and how is that benefiting you?

Interoperability and no vendor lock-in.
End-to-end support for machine learning and faster AI delivery.

Paras J.

Databricks - A breath of Fresh air in Big Data

October 11, 2023
Review provided by G2

What do you like best about the product?

The best part about the Databricks Lakehouse platform is the integration of traditional, tried and tested big data technologies with a UI that is welcoming, refined and revolutionary!

What do you dislike about the product?

The older ui for the platform had well separated elements for data science, data engineering and SQL Workspace. In an effort to combine them in the sidebar, the new UI tries too hard and ends up as a laggy and chaotic mess.

What problems is the product solving and how is that benefiting you?

the databricks lakehouse platform brings out the best of open source together, stitched beautifully in a notebook based UI that feels welcoming and way less intimidating than a traditional Spark distributions.
The platform has a solution for every data person, including but not limited to a Notebook that works with Scala, Python, R and SQL, a traditional SQL Editor, downloadable datasets and in house visualisations just a click away!

Pranshu G.

Data Lake but combined with Datawarehouse benefits

October 07, 2023
Review provided by G2

What do you like best about the product?

It offers ACID transactions which is a massive suppport for data consistency, along with this, the leveraging features such as Time travel and schema evolution comes real handy while builidng a scalable solution. In addition of all above,it reduce data storage costs all while not compromising on powerful distributed programming.

What do you dislike about the product?

With all the features combines, it truly is a powerful tool however, it can be a real challange for new users to master it. For BI users, analysts, who arent skilled with programming, may find it difficult to understand the workflow. Moreover, the community for this tool is currently relatively small and hence minimzing community support.

What problems is the product solving and how is that benefiting you?

The business requires to keep to update powerBI dashboard reports which are ever increasing day by day. As a solution, we are utilizing the lakehouse's ACID, and features such as Schema evolution to clean and transform data and build BI dashboard using the now cleaned data.
This solution has eliminated dependency on our already saturated datawarehouse resources. This has also helped in debugging as all data is processed and resides in one place.Last but not the least, this has reduced costs of our datawarehouse by 20%

Senthil K.

Databricks Genie Code - Agentic Applied AI for end-end SDL liefecycle

October 03, 2023
Review provided by G2

What do you like best about the product?

Genie Code

1) Genie Code automated our ETL processes, reducing manual effort and increasing efficiency. With Agentic’s SDL, we implemented CI/CD pipelines for faster, seamless updates and deployments.

2) Genie Code streamlined complex STTM mappings, improving accuracy and speed. Agentic’s real-time updates ensured mapping adjustments were made dynamically to align with changing transaction data.

3) We defined automated unit tests using SKILL.md, ensuring data transformations are validated before deployment. This reduced errors and ensured data quality, boosting confidence in our analytics.

4) Using Skills.md, we added custom extensions to Genie Code, such as integrating third-party data for enriched reports. This agility allowed us to quickly adapt to business needs and deliver new capabilities.

5) Agentic’s SDL enabled real-time data processing, providing immediate analytics for decision-making. Our marketing and sales teams now act on fresh data instantly, improving response times and overall efficiency.

What do you dislike about the product?

Hope it can be improved in next update -

Debugging issues in complex workflows can be time-consuming due to limited visibility into intermediate data transformations.

Genie Code lacks advanced error recovery mechanisms, making it difficult to manage failures in large-scale data pipelines.

As data volume increases, Genie Code’s performance can degrade, requiring significant manual adjustments to ensure smooth operation at scale.

What problems is the product solving and how is that benefiting you?

1) Scalable Processing - Built on Databricks' Spark-based architecture, Genie Code efficiently handles and scales processing for massive datasets, ensuring performance even with increasing data volumes.

2) Genie Code automates end-to-end ETL workflows, from data extraction to transformation and loading, streamlining data operations and eliminating manual tasks.

3) Real time collaboration - Genie Code enables real-time collaboration across teams by using shared notebooks, making it easier for data professionals to build and refine workflows collectively.

Financial Services

A Tool Box to the Modern Big Data Data Scientist

September 05, 2023
Review provided by G2

What do you like best about the product?

The upscale in storing and retrieving large quantities of data with its sdk to s3. In addition, great resources allocation support and additional tools such as clearml.

What do you dislike about the product?

The compatibility to pandas is lacking due to the fact that it is mainly used by me with pyspark which didnt allow an optimal usage for the various pandas libraries.

What problems is the product solving and how is that benefiting you?

Retrieving and querying a very large data warehouse on s3 (several hunders of T'). Performing basic filtering and quering on the data and running a ML experiment on huge amounts of data.

Felix V.

Great tool for data exploration and development, no so much for production pipelines

August 23, 2023
Review provided by G2

What do you like best about the product?

Easy to set up processes and iterate.
Shareability

What do you dislike about the product?

Not tailored for production integration
Hard to incorporate without being databricks aware, which leads to a vendor lock

What problems is the product solving and how is that benefiting you?

Gaining data visibility
Developing spark jobs towards production

Nabil Fegaiere1

A powerful solution that is easily integrated into a variety of platforms

August 21, 2023
Review provided by PeerSpot

What is our primary use case?

I am a Databricks service partner, and my customers use Azure Databricks and Data Factory.

What is most valuable?

It's very simple to use Databricks Apache Spark. It's really good for parallel execution to scale up the workload. In this context, the usage is more about virtual machines.

Using meta-stores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.

What needs improvement?

I would like more integration with SQL for using data in different workspaces. We use the user interface for some functionalities, while for others, we have to use SQL to create data sets and grant permissions. For example, when creating a cluster, we have to create it with some API or user interface. Creating a cluster with some properties using SQL grants the possibility of using SQL syntax. Integration with SQL will make Databricks easier to use by people who have experience with databases like Lakehouse, and they would be able to use the data lake and BI. More integration will help have one point of view for everyone using SQL syntax.

Integration with Kubernetes could also be good for minimizing the price because you can use Kubernetes instead of virtual machines. But that won't be easy.

For how long have I used the solution?

I have worked with the solution for four or five years, with some experience since 2016.

What do I think about the stability of the solution?

The solution is stable. The only problem with stability would be that people are not using it efficiently.

What do I think about the scalability of the solution?

The solution is good for scalability.

How was the initial setup?

When we have administration experience, the solution is not difficult to deploy. Technically, however, it's difficult because governance is more complex. For example, I have two warehouses on Databricks, which are clusters in this workspace, and we have to switch from workspace to workspace to have all this information. There is a system table that has all this, but I don't know if everyone can use these tables.

What's my experience with pricing, setup cost, and licensing?

Databricks are not costly when compared with other solutions' prices.

Which other solutions did I evaluate?

Databricks's functionalities are as good as solutions like Snowflake, BigQuery, and Redshift.

What other advice do I have?

People sometimes do not use the solution efficiently. They misunderstand databases, the usage of tables, and the performance. Many data engineers are very junior and don't have skills in that. Stability is more a customer problem than a problem with the product itself. One possible problem with the product is that there's no method to pause the usage of something. For example, we have to use the meta server or the data catalog in Synapse. But in Databricks, we have a choice to use a catalog or not, or Hive, which is always integrated, but we have to choose whether to use it or not. Many customers directly use the passes on Databricks, which causes performance and governance problems.

I can offer a lot of advice on Databricks, and one is to use meta stores like Unity Catalog or Hive Metastore. For incoming use cases, it's better to use Unity Catalog.

I rate Databricks a nine out of ten.

Rupal Sharma

Processes large data for data science and data analytics purposes

August 15, 2023
Review provided by PeerSpot

What is our primary use case?

It's mainly used for data science, data analytics, visualization, and industrial analytics.

What is most valuable?

Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours.

So that's why it's quite convenient to use for data science, for training machine learning models. By using more computing power, you can make it even faster.

What needs improvement?

There is room for improvement in visualization.

For how long have I used the solution?

I used it for two years. I worked with the latest update.

What do I think about the stability of the solution?

I would rate the stability a nine out of ten. I didn't face performance drops.

What do I think about the scalability of the solution?

I would rate the scalability an eight out of ten.

How are customer service and support?

Databrick's support is great. If we need any support, they are very quick with it. And they genuinely want you to use Databricks. So, whatever we ask them, they come up with multiple solutions to problem statements. That's really good.

Overall, the customer service and support are very good.

Which solution did I use previously and why did I switch?

I personally prefer using Databricks. However, we also considered using Snowflake, but the pricing was different. It's price per query.

So, as per your storage, a data scientist or a data analytics team needs to query again and again, which does not suit a data-heavy organization.

What was our ROI?

It's a good return on investment for Databricks from a delivery perspective. Delivered multiple dashboards. So, it's quite a good return on investment. And being a small organization, everyone can use Databricks, and cost-wise, it's also good for small organizations.

Which other solutions did I evaluate?

If the company is a startup, Databricks might be suitable. If a big company needs a lot of storage, Teradata might be best for them. It depends on the situation.

What other advice do I have?

Overall, I would rate the solution a eight out of ten. I would definitely recommend this solution for small organizations.

Databricks Data Intelligence Platform

Reviews from AWS customer

External reviews

Lake House platform review

A Robust Solution for Big Data Management and Analytics

Experience with databricks

Databricks - A breath of Fresh air in Big Data

Data Lake but combined with Datawarehouse benefits

Databricks Genie Code - Agentic Applied AI for end-end SDL liefecycle

A Tool Box to the Modern Big Data Data Scientist

Great tool for data exploration and development, no so much for production pipelines

A powerful solution that is easily integrated into a variety of platforms

What is our primary use case?

What is most valuable?

What needs improvement?

For how long have I used the solution?

What do I think about the stability of the solution?

What do I think about the scalability of the solution?

How was the initial setup?

What's my experience with pricing, setup cost, and licensing?

Which other solutions did I evaluate?

What other advice do I have?

Processes large data for data science and data analytics purposes

What is our primary use case?

What is most valuable?

What needs improvement?

For how long have I used the solution?

What do I think about the stability of the solution?

What do I think about the scalability of the solution?

How are customer service and support?

Which solution did I use previously and why did I switch?

What was our ROI?

Which other solutions did I evaluate?

What other advice do I have?