Visual workflows have streamlined daily ETL analysis and support collaborative project work
What is our primary use case?
My main use case for Dataiku is ETL pipelines, mainly for data analysis, and I mostly use SQL queries for that.
For ETL pipelines and data analysis, I had to create the output by combining a few datasets and then running SQL queries, applying filters, joining the tables, and so on; so I used Dataiku for that.
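The kind of query described above (combining datasets, joining tables, and applying filters) can be sketched as follows. This is a minimal, self-contained illustration that uses Python's built-in sqlite3 module rather than Dataiku itself; the table names `orders` and `customers`, their columns, the threshold, and the sample rows are all hypothetical.

```python
import sqlite3

def build_output(conn):
    """Join two hypothetical tables and keep only rows that pass a filter."""
    query = """
        SELECT c.name, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE o.amount >= 100
        ORDER BY o.amount DESC
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 250.0), (2, 40.0), (2, 120.0)]
)

result = build_output(conn)
print(result)
```

In a visual tool the join and the filter would typically be separate steps; here they are folded into one query for brevity.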
I primarily use Dataiku for analysis, and its visual recipes and SQL queries are enough for that. The only challenge so far is that Dataiku sometimes gets slow and lags a lot.
What is most valuable?
The best features Dataiku offers in my experience are its visual recipes, which are very easy to use for analysis.
The visual recipes are easy and useful for my analysis because the Sync recipe is very useful if I want to download a table from the cloud into the Dataiku database and schema. Other recipes such as the Prepare recipe are also very useful since you don't have to write code; it's all visual and very easy to use. Recipes such as Stack are also very useful as you don't have to write full SQL code for it, allowing you to speed up the process.
Dataiku has positively impacted my organization since we use it for most of our day-to-day work. It is very helpful for creating and managing ETL pipelines as a project flow, which makes it easy to go back to any step and make edits when something changes.
What needs improvement?
I have no suggestions for improvements because it's all good; it just sometimes lags a lot, and I don't know if the server is full or what, but it sometimes takes a lot of time while loading and refreshing the page.
No additional improvements come to mind, but performance could be optimized further to reduce waiting time. Dataiku is often down, and we sometimes have to wait five, ten, or fifteen minutes before it starts working again; during those times, we are unable to get our work done.
For how long have I used the solution?
I have been using Dataiku for four years, so my experience with it is quite extensive.
What do I think about the stability of the solution?
Dataiku is stable most of the time, but for around ten percent of the day it is usually down, and we are unable to work on it.
What do I think about the scalability of the solution?
Dataiku's scalability is good.
How are customer service and support?
I have never needed customer support from Dataiku.
Which solution did I use previously and why did I switch?
I have been using Dataiku for the last four years, and I have not used any other solution besides Dataiku.
What was our ROI?
It is a good return on investment since it helps save a lot of time, and it's easy for my teammates to work cross-functionally on the same project.
Which other solutions did I evaluate?
I did not evaluate other options before choosing Dataiku because the choice was managed by my organization, so Dataiku was the only tool available to me.
What other advice do I have?
My advice for others looking into Dataiku is that it is good software, and I would encourage them to use it since it is a very good tool for data analysis.
I have no additional thoughts about Dataiku; it works very well for my use cases, but it would be much better if performance were more stable, with fewer lags. I would rate my overall experience with Dataiku an 8 out of 10.
Visual workflows have streamlined healthcare analytics and have reduced reporting time significantly
What is our primary use case?
My main use case for Dataiku is mostly based on the client's data where we look into life sciences data, mostly aligned to claims, campaign measurement, campaign targeting, IQVIA, LAD Epsilon data, and Komodo for instance.
Apart from this, I'm basically working on creating an end-to-end pipeline as a bundled unit project, which has been the overall case. We primarily work on Next Best Engagement and Next Best Actions, more or less aligned to the healthcare side, while sometimes working on the consumer front and on the professionals front, meaning healthcare professionals (HCP).
A specific project I built in Dataiku was on HCP campaign measurement. Our day-to-day cycle starts with ingesting data from the client's S3 storage. We fetch the data, use visual recipes to bring it into Dataiku DSS, make preliminary changes, preprocess and prepare the data, and perform feature engineering to produce the final model-ready dataset.
We create multiple iterations of the model, and Dataiku is of great help here: it lets us try multiple modeling iterations with different hyperparameters, which saves a lot of time and gives everyone a visual overview of how the data is performing. Once the modeling is done, we push the data downstream through an API or use MLOps for productionization, either via a CI/CD pipeline or with simple scenario triggers such as sending an email once a job is done. This makes up most of our day-to-day activity.
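To make the idea of trying multiple modeling iterations with different hyperparameters concrete, here is a minimal grid-search sketch in plain Python. It is not how Dataiku runs its model iterations; the toy one-feature ridge model, the `lam` hyperparameter, and the data are all assumptions chosen only to keep the example self-contained.

```python
from itertools import product

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for y ~ w * x (single feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x against y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, valid, grid):
    """Try every hyperparameter combination; return (best_params, best_score)."""
    best_params, best_score = None, float("inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        w = fit_ridge_1d(*train, lam=params["lam"])
        score = mse(w, *valid)  # lower validation error is better
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical training and validation data (xs, ys).
train = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
valid = ([4.0, 5.0], [8.0, 10.0])

best_params, best_score = grid_search(train, valid, {"lam": [0.0, 1.0, 10.0]})
print(best_params, best_score)
```

Each combination is scored on held-out data, and only the best one is kept, which is the essence of comparing iterations side by side.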
What is most valuable?
The best features Dataiku offers include primarily the visual recipes, which ease data preparation greatly. It is very easy now to handle small tasks where you need to understand the shape of data; instead of writing a query, you can just use a visual recipe to create the views. You can also have multiple intermediate views, which is significantly helpful for larger streams, especially during reverse engineering.
Additionally, automation and scenario triggering have been a boon for me, as my projects often involve weekly or monthly reporting. Everything is set up so that we only need a human in the loop to check that everything runs properly, while time-based triggers automatically generate and send reports to stakeholders.
Furthermore, the integration capabilities and the ability for multiple team members to access the same projects concurrently enhance collaboration, making it quite beneficial for data scientists such as myself as we progress in our careers.
Dataiku has positively impacted my organization, specifically in one project where we performed migration from AWS to Dataiku, speeding up the solution by close to 40%. We completed tasks that used to take 10 days in just four days. Moreover, the architecture costs associated with AWS were reduced by almost 70%, which was a significant benefit and greatly impacted our operations. This success has enabled us to pitch Dataiku to clients, who have actively incorporated it into their daily work streams, resulting in a win-win situation.
The 70% cost reduction and 40% faster delivery came primarily from the ease of use in how we were creating architecture. Since we were migrating, we leveraged the opportunity to improve and enhance the architecture. The earlier AWS architecture was hampered by multiple services leading to high costs, but moving to Dataiku streamlined everything into one platform. Consequently, the delivery time for generating reports for stakeholders decreased from 10 days to three to four days.
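One way to relate the delivery figures quoted above: going from 10 days to 4 days means the work takes 40% of the original time, which is a 60% reduction in elapsed time, or a 2.5x speed-up. The short snippet below only makes that arithmetic explicit; the function names are illustrative.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) * 100 / before

def speedup_factor(before, after):
    """How many times faster the new duration is than the old one."""
    return before / after

days_before, days_after = 10, 4
reduction = percent_reduction(days_before, days_after)
factor = speedup_factor(days_before, days_after)
share_of_original = days_after * 100 / days_before

print(f"{reduction:.0f}% less time, {factor:.1f}x faster, "
      f"{share_of_original:.0f}% of the original duration")
```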
What needs improvement?
In terms of improvement, I cannot comment on the LLMs or the agentic view as I have not used them yet. However, I feel that better documentation is necessary. Since this is proprietary software, Dataiku should also build a stronger community where users can share knowledge. Although there is some community interaction, it is often challenging to find assistance when stuck.
For example, when I was new to Dataiku and trying to use an external optimization tool such as CPLEX, I struggled with resource directory linking to a project's notebook. Detailed documentation and community discussions could have significantly alleviated these issues for users such as myself.
For how long have I used the solution?
I have been using Dataiku for close to three and a half years.
What do I think about the stability of the solution?
There were a few challenges, but they were not technical problems on Dataiku's side; they were related to rapid updates. We work on version 10.2, and soon afterwards we are on version 11.4, which requires things to be redone. Support for projects created on older versions is something the team could look at; a backup proposition would help keep updates from hampering our work.
What other advice do I have?
While no particular feature surprised me, I found the plugins available in Dataiku to be very helpful. Not only can users leverage existing plugins, but we can also create our own plugins based on the rules we use daily. This feature is quite handy and extends beyond individual projects, as published plugins can be used by everyone across the board.
My advice to others considering Dataiku is to utilize the visual recipes, as they can significantly expedite project creation. Although the fundamental processes remain the same, leveraging elements in visual recipes can enhance efficiency, making it easier than writing code for basic queries, resulting in quicker execution. Dataiku encompasses everything from visualization to integrations and sharing the results, so once you dive in, it is important to familiarize yourself with the available features and make the most out of them.
I would rate this product a 9 out of 10.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Reusable preparation workflows have transformed recurring datasets and automate end to end projects
What is our primary use case?
My main use of Dataiku is data preparation. I use it a lot to process and transform my datasets by creating and applying recipes, and what I really value is being able to reuse recipes already created for preparation on another dataset.
I was preparing for my Core Dataiku certificate, and all of the modules were focused on data preparation. I load the data into Dataiku, then I use the recipes and tools to add columns, unpivot columns, delete, and transpose columns so I can format them. Then I create groups of recipes and I also reuse them by importing another dataset into Dataiku, which gives me the ability to save time. I don't have to redo all the previous processes since I already have a recipe for data preparation that I can reuse.
After data preparation, I had the opportunity to carry out an end-to-end project with Dataiku. This involved first the data preparation and then I went on to set up a model to predict a stock index. I used the machine learning models for this project.
Let's suppose I have datasets that I use every time, and each time I'm going to check the data formats to format a certain number of columns, for example dates, to see if they are in date format or not, delete certain columns, rename certain columns, transform the data, and clean them. If I've done all these steps once and I manage to put all these recipes into a group, next time it's an enormous time saver not to have to repeat these steps one by one, but to use directly the recipe or the group of recipes created.
I want to emphasize again the recipes and how we can reuse them with Dataiku. In most data projects, data preparation takes a huge amount of time for professionals, and sometimes unfortunately we repeat the same tasks. Dataiku really brings a solution to this in that we can create groups of recipes or recipes that we can reuse. The reuse of recipes is already a significant advantage. Additionally, Dataiku is one of the rare platforms that offers end-to-end services where we can carry out a data project from start to finish. For example, to carry out my personal project on predicting a stock index, I did practically everything on Dataiku, from data preparation to putting my model into production.
For example, we receive an Excel file from an external source every month, always in exactly the same format. We built the transformation only once, on the first file; when later files come in, we apply the existing recipes. This saved us an enormous amount of time because we were able to automate everything with Dataiku. As soon as a file comes in, the recipes are applied to it automatically. We no longer have to intervene at the end of each month and spend twenty to thirty minutes cleaning the files. Additionally, it ensures the validation of our data and consistency between the files.
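The pattern described here, defining preparation steps once and reapplying them to every new file with the same layout, can be sketched in plain Python. This is only an illustration of the idea, not Dataiku's recipe mechanism; the step helpers, the column names (`Date`, `amount`, `notes`), the date format, and the sample rows are all hypothetical.

```python
from datetime import datetime

# Each helper returns a step: a function that takes a row (dict)
# and returns a new, transformed row.
def rename(old, new):
    def step(row):
        row = dict(row)
        row[new] = row.pop(old)
        return row
    return step

def drop(column):
    def step(row):
        return {k: v for k, v in row.items() if k != column}
    return step

def parse_date(column, fmt="%d/%m/%Y"):
    def step(row):
        row = dict(row)
        row[column] = datetime.strptime(row[column], fmt).date()
        return row
    return step

def apply_steps(rows, steps):
    """Run the same ordered list of steps over every row."""
    out = []
    for row in rows:
        for step in steps:
            row = step(row)
        out.append(row)
    return out

# Defined once, then reused for every monthly file with the same layout.
PREPARATION = [rename("Date", "date"), parse_date("date"), drop("notes")]

january = [{"Date": "31/01/2024", "amount": "10", "notes": "x"}]
february = [{"Date": "29/02/2024", "amount": "12", "notes": "y"}]

jan_clean = apply_steps(january, PREPARATION)
feb_clean = apply_steps(february, PREPARATION)
print(jan_clean, feb_clean)
```

Because a malformed date raises an error in `parse_date`, reusing the same steps also acts as a basic consistency check between files.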
What is most valuable?
What I especially like about Dataiku are the recipes for data preparation, along with the ability to create groups of recipes and reuse them.
Once you get the hang of Dataiku, learning the features is also intuitive. It's really a very intuitive platform.
Dataiku is very scalable and very powerful. It adapts easily as our datasets grow.
What needs improvement?
Currently, Dataiku is a platform that is almost perfect, and I don't see how to improve it further. I don't have suggestions for potential improvements.
Maybe on the interface in general, the information can easily get lost. If we could summarize the tools bar in a more organized way than what we currently have, that would be helpful.
For how long have I used the solution?
I have been using Dataiku for a year and a half, and I obtained my Core Dataiku certification about a month ago.
What's my experience with pricing, setup cost, and licensing?
The licenses are a bit high for companies that are still hesitating to get started with using Dataiku. For my personal projects, I used the thirty-day free trial. Regarding my company, I did not have access to this pricing information.
Which other solutions did I evaluate?
Given our needs, and despite the cost of the licenses, Dataiku turned out to be by far our preferred tool compared to the others.
What other advice do I have?
Dataiku is really a very intuitive platform. It allows you to carry out data projects from end to end. We also have the opportunity to reuse templates, models, and recipes. That's one of the big advantages of using Dataiku.
In my personal projects, I came to enjoy using Dataiku, which is not the case with other tools. Because the platform is intuitive, I can easily find my way around it.
I find the documentation on Dataiku very informative and also very instructive.
I would tell others to go for it if Dataiku truly meets their needs. It's the best offer on the market with good documentation. My overall rating for Dataiku is nine out of ten.
Dataiku: A plug-in tool for Data Science
What do you like best about the product?
What I like most about Dataiku is how it brings the entire data workflow into one place. It allows teams to easily prepare data, build machine learning models, and deploy them without switching between multiple tools. The visual interface makes it easy to understand data pipelines, while still allowing advanced users to write code when needed. This balance between visual tools and coding flexibility makes collaboration between data scientists, analysts, and engineers much smoother. It helps teams move faster from raw data to real insights and production-ready models.
What do you dislike about the product?
One thing I dislike about Dataiku is that it can feel a bit heavy and complex, especially when working with very large datasets or many workflows. Sometimes the interface becomes slower, and managing multiple projects can get confusing. Also, while the visual tools are helpful, certain advanced customizations still require coding, which might be challenging for non-technical users. Overall, it’s a powerful platform, but there is a bit of a learning curve when you first start using it.
What problems is the product solving and how is that benefiting you?
Dataiku helps solve the problem of managing the entire data and machine learning workflow in one platform. Instead of using separate tools for data preparation, analysis, model building, and deployment, Dataiku brings everything together. This makes it easier to organize projects, track data pipelines, and collaborate with other team members.
For me, it has been helpful because it simplifies the process of turning raw data into useful insights and models. It also improves collaboration between technical and non-technical teams, since analysts can use the visual interface while data scientists can still write code when needed. Overall, it helps speed up the development process and makes data projects more structured and easier to manage.
Dataiku: User-Friendly Collaboration Across the Full Data Lifecycle
What do you like best about the product?
What I like most about Dataiku is its user-friendly interface and strong collaboration features. It makes it easy for data scientists, analysts, and engineers to work together on the same projects. I also appreciate that it supports the full data lifecycle, from data preparation to machine learning and deployment.
What do you dislike about the product?
One thing I dislike about Dataiku is that it can be quite demanding on system resources, especially when I’m working with large datasets. In addition, some of the more advanced features come with a learning curve, so it can take time to fully understand how to use them effectively.
What problems is the product solving and how is that benefiting you?
Dataiku addresses the challenge of fragmented data workflows by bringing data preparation, analysis, machine learning, and deployment together in a single platform. It also makes it easier for teams to collaborate and automate key processes. For me, this translates into time savings, better productivity, and data projects that are simpler to manage end to end.
A Unified Platform That Bridges Data Experts and Business Teams Seamlessly
What do you like best about the product?
Its greatest strength is enabling true collaboration between data experts and business teams on a single platform. It seamlessly bridges technical work like coding and ML engineering with visual and no-code interfaces. This breaks down silos, accelerates project delivery and ensures AI solutions are built with crucial business context, making them more impactful and sustainable.
What do you dislike about the product?
For smaller teams or simpler projects, Dataiku comes at a premium. The platform's extensive features come with inherent complexity, which can lead to a steeper learning curve. Its pricing model is often seen as enterprise-focused, potentially making it less accessible for startups or individual users who don't need its full collaborative scale.
What problems is the product solving and how is that benefiting you?
Dataiku solves the critical challenges of fragmented data science workflows. It provides a unified, collaborative platform that connects data preparation, experimentation and deployment into one governed environment. This directly benefits us by drastically reducing project lead times, improving model governance and reproducibility and enabling both technical and business users to contribute effectively to data-driven outcomes.
End-to-End Data Science Platform That Makes Collaboration Easy
What do you like best about the product?
What I like best about Dataiku is its end-to-end data science and machine learning platform that brings data preparation, analysis, model building, and deployment into a single environment. The visual workflows combined with code-based options make it accessible for both technical and non-technical users. It also supports strong collaboration between data scientists, analysts, and business teams, which helps speed up model development and improve decision-making.
What do you dislike about the product?
While Dataiku is a powerful platform, it can feel complex for first-time users because of its extensive feature set. The initial setup and learning curve may take time, especially for non-technical users. In some cases, performance can slow down when handling very large datasets, and the pricing structure may not be ideal for smaller teams or limited use cases.
What problems is the product solving and how is that benefiting you?
It solves the challenge of managing the entire data science and machine learning lifecycle in one platform. It brings together data preparation, analysis, model development, deployment, and monitoring, reducing the need for multiple disconnected tools. This benefits me by improving collaboration between teams, speeding up model development, and making it easier to turn data into actionable insights while maintaining consistency and governance across projects.
Effortless Data Collaboration with Robust Features
What do you like best about the product?
I like that Dataiku lets me handle data projects and build machine learning models by pulling in data from different sources, cleaning and organizing it, and experimenting with models all in one place. The combination of a visual interface with coding options makes it accessible for both technical and non-technical team members, smoothing out data project management. I love how it reduces repetitive tasks, decreases mistakes, and keeps complex projects organized and running smoothly. It's great that everyone on the team can contribute, no matter their technical skills, making data work easier and less stressful.
What do you dislike about the product?
One thing I’ve noticed about Dataiku is that it can feel a bit overwhelming at first because there are so many features and options. Working with really large datasets or complex workflows can sometimes be a little slow. I also think it could be a bit easier for new users to get started. Overall, it’s a great tool, but a little more guidance and smoother performance would make it even better.
What problems is the product solving and how is that benefiting you?
I use Dataiku to streamline data projects by integrating data sources, cleaning data, and building models in one platform. It allows team collaboration regardless of technical skills, saves time on repetitive tasks, reduces mistakes, and keeps complex projects organized.
Low-code projects have empowered non-technical teams and now need better integration and visuals
What is our primary use case?
My main use case for Dataiku is data science and AI projects.
We used Dataiku for a demand forecasting project where the objective is to forecast the demand for each product for the next three months.
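As a rough illustration of what forecasting demand for each product over the next three months means, here is a deliberately naive baseline in plain Python. It is not the model used in this project; the moving-average approach, the window size, the product names, and the sales history are all assumptions made for the sake of a self-contained example.

```python
from statistics import mean

def naive_forecast(history, horizon=3, window=3):
    """Forecast each product by repeating the mean of its last `window` months."""
    forecasts = {}
    for product, sales in history.items():
        level = mean(sales[-window:])
        forecasts[product] = [level] * horizon
    return forecasts

# Hypothetical monthly sales per product, oldest month first.
history = {
    "widget": [100, 110, 120, 130],
    "gadget": [40, 40, 46, 40],
}

forecast = naive_forecast(history)
print(forecast)
```

A baseline like this is mainly useful as a reference point against which a real forecasting model can be compared.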
What is most valuable?
The best features Dataiku offers include the ability for users to use the node without having to code and the functionality related to low-code/no-code.
Dataiku has positively impacted my organization by allowing non-technical users to adapt a data science project and to maintain a part of a data science project.
What needs improvement?
I think a pain point related to Dataiku is the visualization, which is not straightforward, and the integration, which is also not straightforward for non-technical users.
To improve Dataiku, the company could enhance the capabilities related to integration and visualization.
For how long have I used the solution?
I have been using Dataiku for three years.
What do I think about the stability of the solution?
Dataiku is stable.
What do I think about the scalability of the solution?
Dataiku's scalability could be better.
How are customer service and support?
I have never tried Dataiku's customer support.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
Before, we used a solution that I cannot mention, but the change is more related to using a more straightforward solution for non-technical users.
Before choosing Dataiku, I evaluated KNIME.
What was our ROI?
I have not seen any specific outcomes or metrics such as time saved, reduced costs, or improved project delivery.
I have not seen a return on investment with Dataiku in terms of time saved, money saved, or fewer employees needed.
What's my experience with pricing, setup cost, and licensing?
I am not the person involved in the process regarding pricing, setup cost, and licensing.
What other advice do I have?
My advice to others looking into using Dataiku is to use it principally to help and support non-technical users.
Dataiku is deployed in my organization on a public cloud on Amazon Web Services.
Amazon Web Services is our cloud provider.
I am not the person involved in the process of determining whether we purchased Dataiku through the AWS Marketplace.
My review rating for Dataiku is 7.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Flow-based demand forecasting has improved collaboration but still needs better visualization options
What is our primary use case?
My main use case for Dataiku is for data science and AI projects. I use Dataiku for a demand forecasting use case where the objective is to predict the demand for each product for the next four months. Demand forecasting is the primary focus where I use Dataiku.
What is most valuable?
The best features Dataiku offers that help me with my demand forecasting and data science projects include having a complete overview of the flow directly from the flowchart, allowing me to observe all the steps in a single overview, and the ability to use a no-code, low-code node.
Having that flow overview and the no-code, low-code nodes makes my work easier by allowing me to use a simple function without coding directly, meaning I can avoid using Python. In 80% of the project, we are using Python, but for very simple steps, we also use a low-code, no-code node, which can be simpler for users that are not technical and may want to do some preprocessing steps.
Dataiku has positively impacted my organization, but it is a tool that is very similar to others and it helps for what I mentioned before and not for other areas. The ability to use low-code or no-code nodes is more a convenience in that case, mainly for a non-technical user. We deliver this kind of solution for a client where the user is not so technical, and for this reason, it is better to have this kind of flow and tool.
What needs improvement?
Dataiku could enhance its visualization features: it is not possible to create visualizations directly, or to integrate a web app as simply as you can add a preprocessing step. Visualization and integration are the main areas I would like to see enhanced.
In my experience, Dataiku can be more stable.
For how long have I used the solution?
I have been using Dataiku for two years.
What do I think about the stability of the solution?
In my experience, Dataiku can be more stable.
What do I think about the scalability of the solution?
Dataiku is not one of the best solutions when it comes to scaling.
Which solution did I use previously and why did I switch?
We used a lot of other solutions before Dataiku and we switched only so that non-technical users can improve and maintain this kind of flow.
What other advice do I have?
My advice to others looking into using Dataiku is to use it for simple data science flows and to teach non-technical users how to build a data science project or flow. I would rate this product a 7 out of 10.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?