    WhyLabs AI Observatory: The Data and ML Monitoring Platform

    Model monitoring, data health, data drift detection, and AI observability.

    Ratings and reviews

    4.6
    28 ratings
    5 star: 86%
    4 star: 11%
    3 star: 4%
    2 star: 0%
    1 star: 0%
    1 AWS review | 27 external reviews
    External reviews are from G2.

    Reviews (28)
    Akashkhurana Hirana

    Monitoring multi-agent LLM workflows has become reliable and protects PII in real time

    Reviewed on Jun 29, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for WhyLabs was for LLM monitoring and observability. At that time, I had an AI application that I deployed on Vertex AI, and I used WhyLabs for the observability, logging, and monitoring of that application and the model.

    I can provide a specific example of how I used WhyLabs for monitoring my LLM application. It was a multi-agent system with around four agents involved, and each agent had around seven or eight tools that it could use or invoke. Whenever a user sent a query to the main agent, its responsibility was to delegate the request among the other sub-agents. The sub-agents could communicate with each other using the A2A protocol and call their tools. I monitored how the request progressed through the system. For instance, a user might send a request to one agent, which then transferred it to a third agent; the third agent used a tool, and the request then went to the seventh agent. In WhyLabs, I could easily monitor all this communication between the agents, the logging time, the request, the response, any errors, and any guardrails I wanted in my application.

    This was my only use case, and then WhyLabs got discontinued. WhyLabs was acquired by Apple in January or February 2025. The company then open-sourced their software so that anyone can use it. It is now open-source software available on GitHub where you can set it up yourself and use it.

    What is most valuable?

    WhyLabs's best features are real-time guardrails, PII (personally identifiable information) detection, hallucination mitigation, and monitoring. It has a centralized dashboard, so I can create a project, see an overall summary of the dashboards, and check the health metrics for WhyLabs or for the application on specific dates or at specific times. Additionally, it provides an alerting system. If there is an error or the system is down, it generates an alert via email.

    Out of all those features, I find the PII detection and the monitoring most valuable in my day-to-day work because it is very hard to monitor an LLM application. As I mentioned earlier, it was a multi-agent system, and a query can go from one agent to another very easily, which made it hard to debug how the request was progressing and how the data was flowing. Monitoring, PII detection, and guardrails are the three features most useful to me. The guardrails and PII detection are particularly useful when I do not want my PII data passed to the agents or to any LLM.

    WhyLabs has positively impacted my organization by reducing the time spent on errors and debugging. It has also improved the user experience. When the application is down, I receive alerts, which has saved my team a significant amount of time.

    What needs improvement?

    Regarding how WhyLabs can be improved, since it is not available in the market as of now, improvements cannot be made to the product itself. However, there is an open-source version that anyone can set up on their machine and try to accomplish the same things.

    I do not think there is anything else needed for improvement.

    For how long have I used the solution?

    I was using it in 2024 for around 1.5 years.

    What was our ROI?

    WhyLabs has saved my team 30 to 40% of its time.

    What other advice do I have?

    Regarding WhyLabs's AI capabilities, I believe its governance and security are totally secured. It was deployed in our on-premises infrastructure, so all the data remains in our infrastructure only. The guardrails and the PII detection work perfectly. I have not seen any scenario where it has not generated an alert for PII data or the guardrails have not worked, so it performed very well.

    In terms of WhyLabs's AI capabilities, I believe it is totally accurate. I used it for around 1.5 years, and it was the best software available. It was very good software, but it has been discontinued.

    My advice to others considering WhyLabs is that as of now it is open-source, and you can set it up on your own machine for free and use it. It has very good features. I would rate this product a 10 out of 10.

    Which deployment model are you using for this solution?

    On-premises

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Consulting

    I built a monitoring solution with Whylabs for multiple ML models for a client of mine.

    Reviewed on Nov 21, 2024
    Review provided by G2
    What do you like best about the product?
    The UI is very user friendly and the support team is extremely responsive and helpful.
    What do you dislike about the product?
    Some features such as deleting a profile or changing the data type of a feature can only be done through the API.
    What problems is the product solving and how is that benefiting you?
    Data and concept drift detection
    Model performance monitoring
    Internet

    WhyLabs helped us set up end-to-end monitoring of our ML projects

    Reviewed on Nov 19, 2024
    Review provided by G2
    What do you like best about the product?
    * The customer support is very helpful and proactive
    * The tool allows for easy ingestion of a large number of features and setting up initial monitoring on them
    * We can use it to monitor both input quality and model performance
    * The alerts can be raised to specific groups of users via specific channels (email/Slack), which is helpful
    What do you dislike about the product?
    * It can be challenging to set up the monitoring correctly when it comes to sensitivity; it requires a lot of trial and error
    * Some actions are not possible via UI and require specific API calls
    * Documentation can be hard to navigate
    What problems is the product solving and how is that benefiting you?
    Monitoring model performance and input data quality in one place.
    Houssam K.

    Excellent tool for ML Monitoring with many out-of-the-box solutions

    Reviewed on Nov 14, 2024
    Review provided by G2
    What do you like best about the product?
    Great to collaborate with; very responsive; really appreciate their OHs to help out with issues that pop up; many out-of-the-box solutions for different kinds of ML models which really helped us out given the wide variety of ML models we run at the company.
    What do you dislike about the product?
    Nothing major to mention! We got everything resolved and the team was very helpful.
    What problems is the product solving and how is that benefiting you?
    Data Drift and ML Monitoring
    Rafael S.

    Developed efficient solutions for optimizing ERP workflows through data analysis

    Reviewed on Sep 18, 2024
    Review provided by G2
    What do you like best about the product?
    One of the standout features of WhyLabs is its robust data observability capabilities. It provides continuous monitoring of data pipelines and ML models, allowing teams to quickly identify issues like data drift, model degradation, and training-serving skew. The platform's privacy-preserving integration ensures that data can be analyzed without moving or duplicating it, which is critical for maintaining security and privacy in sensitive industries like healthcare and finance.
    What do you dislike about the product?
    One potential drawback of WhyLabs is its relatively limited user reviews and feedback due to its newness in the market, making it harder for potential users to gauge its real-world performance at scale. This lack of detailed reviews can raise concerns about its maturity and support infrastructure. Additionally, since it's a newer platform, some advanced features might still be in development, and there could be steep learning curves for teams unfamiliar with observability tools in machine learning.
    What problems is the product solving and how is that benefiting you?
    Data quality issues: It helps detect and address data drift and data integrity problems early, which is crucial for maintaining accurate and reliable ML models.
    Biotechnology

    Reliable AI Monitoring with Some Complexity

    Reviewed on Sep 13, 2024
    Review provided by G2
    What do you like best about the product?
    I like the privacy-preserving solutions for scaling AI models. I like that WhyLabs offers responsive support and detailed documentation.
    What do you dislike about the product?
    I dislike that the platform might be overly technical for users who are not well-versed in AI or data science.
    What problems is the product solving and how is that benefiting you?
    WhyLabs helps me solve issues like data drift and performance degradation in my AI models. This is crucial because I am working with sensitive medical data.
    Consulting

    Self-Serve Observability Platform

    Reviewed on Mar 05, 2024
    Review provided by G2
    What do you like best about the product?
    WhyLabs is the second observability platform I have ever used. The core things I like about the platform are that it is easy to set up and to implement the features, that the checks and metrics were already pre-loaded so I did not need to do much configuration, and that monitoring was not difficult to get started with. It also integrates well with the serving and data libraries we used for the production tutorial setup.
    What do you dislike about the product?
    Nothing so far. I only experienced a stability issue once (sometime in 2022), but support was able to help me quickly fix it.
    What problems is the product solving and how is that benefiting you?
    Since 2022, I have sparsely used WhyLabs to monitor the quality of datasets for one client and 2 customers (because it was not their core requirement but a nice piece of their stack to have).
    whylogs seemed like the perfect choice for a consultant whose clients did not want to entirely release their data; I found that it captures only the profile and stats info instead of the raw data.

    Recently, I started testing out LLM security features with LangKit and I cannot believe how quick it is to use. I followed a workshop a few months ago that showed me how to detect jailbreak attempts and toxicity in LLM inputs and outputs using LangKit. I took that learning and now, on a client's project, we have tested out logging the telemetry data from the evaluation to WhyLabs. It looks good so far, so once I upgrade the pricing limit for this client, we plan to scale our usage here. Excited about this one.
    Federico G.

    Top notch features at an affordable price

    Reviewed on Mar 05, 2024
    Review provided by G2
    What do you like best about the product?
    I've used WhyLabs for a few weeks and I was extremely pleased with it!
    I will evaluate some dimensions of the tool that summarize my experience with it.

    Easy Data Ingestion:
    The ingestion API is straightforward to use and supports multiple connectors such as BigQuery, Databricks, and Spark, making data importation easy. WhyLabs' use of data profiling ensures fast and secure data processing, eliminating the need to upload entire datasets and making the whole process very secure, since your data doesn't leave your servers.


    Reliable Data Features:
    Whylabs delivers all standard feature metrics accurately. Tracking data and model drift is very straightforward using Monitors.
    Also, the platform supports custom metrics creation during or after ingestion.
    Grouping by variables (segments) works well but must be defined during ingestion. Then you can analyze dataset features and track model performance per segment.


    Flexible Monitors:
    The monitoring system in Whylabs is highly adaptable and user-friendly, covering multiple variables with ease.
    Monitors are easy to set up via the UI or JSON import, with summarized notifications for each monitor, keeping users informed without overwhelming them.
    Additionally, monitors are JSON serializable, which is very helpful since you can track them with version control.


    User-Friendly Usability:
    WhyLabs has a clean and intuitive UI, simplifying navigation for users.
    While some advanced features may require programming knowledge, most tasks can be accomplished within the UI.
    Thanks to data profiling, Whylabs delivers speedy performance without compromising on accuracy.


    Solid Documentation:
    The documentation provided by Whylabs is comprehensive and easy to understand, enabling users to make the most of the platform.


    Pricing:
    It's simply cheaper than its competition while having top notch features.


    Customer Support:
    They are always very helpful, answering all our questions and holding several calls to show us different use cases directly on the platform.


    Overall, Whylabs offers a straightforward, efficient and affordable solution for monitoring Machine Learning models, with easy data ingestion, reliable feature analysis, and flexible monitoring options.
    What do you dislike about the product?
    There are a few cons:
    - Dashboards are in beta, and while functional, they lack polish in terms of user interface. They are actively working on this, so it may already be fixed a few months after this review.
    - Defining groupings by variables must be done at ingestion time, limiting flexibility for post-ingestion analysis.

    That being said, they are very open to feedback and they may change or add features based on your needs.
    In our case, dashboards were important and they are working on them.
    What problems is the product solving and how is that benefiting you?
    It solves most of the ML model monitoring needs that ML models often have while being affordable.
    Zaid A.

    Simplifying Complexity in LLM Monitoring

    Reviewed on Mar 05, 2024
    Review provided by G2
    What do you like best about the product?
    WhyLabs is an exceptional observability tool for applications built with large language models. Its ease of use enabled us to integrate LangKit into our existing architecture. Additionally, they have an excellent customer support community.
    What do you dislike about the product?
    Building custom applications with large language models is a challenging experience. Better documentation about how to effectively use Whylabs in monitoring applications with LLMs would be helpful.
    What problems is the product solving and how is that benefiting you?
    WhyLabs is very helpful in monitoring applications built with large language models by providing a highly simplified observability tool. This has overcome the challenges we were facing with our productionized application.
    Information Technology and Services

    Monitoring LLMs for success!

    Reviewed on Mar 04, 2024
    Review provided by G2
    What do you like best about the product?
    The team behind WhyLabs is awesome. I like how easy it is to get started with their platform, their commitment to an open-source approach, and their active engagement with the AI community with regular workshops and education around cutting-edge monitoring and evaluation techniques.
    What do you dislike about the product?
    Having more flexibility in visualizations and easier ways to share them outside the product would be nice.
    What problems is the product solving and how is that benefiting you?
    I use WhyLabs to help keep a pulse on LLMs by monitoring valuable metrics such as jailbreak scores, sentiment, toxicity, and readability. This has helped me catch problems early and gives me some good metrics to compare when finetuning models or adjusting prompts.