PagerDuty Operations Cloud
Centralized alerting has streamlined on-call workflows and reduced incident response times
What is our primary use case?
PagerDuty Operations Cloud is part of our daily operational workflow. It sits between our monitoring tools and response teams, ensuring alerts reach the right people without delay. We use it for on-call scheduling, incident escalations, and coordinating responses across teams. Having everything centralized has reduced alert fatigue and helped us respond to issues more consistently, especially during off-hours and high-priority incidents.
What is most valuable?
The best features of PagerDuty Operations Cloud are intelligent alerting, on-call scheduling, automated escalations, and incident management. The platform makes it easy to ensure alerts reach the right person, and escalation policies prevent critical issues from being missed. We also rely heavily on its integrations with monitoring and collaboration tools and its real-time visibility into operations. Together, these features help our team respond faster, stay organized during an incident, and reduce service disruptions for our customers.
What needs improvement?
While PagerDuty Operations Cloud is strong overall, there are a few areas for improvement. The initial setup and configuration can be complex, especially for teams managing multiple services, escalation policies, and integrations. Some reporting and analytics features could offer more customization without requiring additional configurations. The mobile app works well for alerting, but managing more advanced settings is generally easier from the web interface. It would also be helpful to have more out-of-the-box workflow templates and automation recommendations to simplify onboarding for new teams.
To make the daily workflow smoother, simplifying the user interface for certain administrative tasks would be a significant improvement. Sometimes, navigating the settings to adjust on-call schedules or escalation policies can take a few extra steps, particularly for large environments. More customizable dashboards and easier reporting for non-technical stakeholders, along with additional guided recommendations for alert tuning, could help teams get even more value from the platform. These are relatively minor points, but addressing them would make an already great tool even more user-friendly.
I did not give PagerDuty Operations Cloud a perfect rating because there is still room for improvement in areas such as reporting flexibility, dashboard customization, and simplifying certain administrative tasks. Overall, it is a mature and dependable platform that positively impacts our work.
Intelligent alerts have protected revenue and now drive faster incident triage with AI guidance
What is our primary use case?
We work directly with merchants and need to trigger immediate alerts whenever there are 5xx errors, business errors such as 4xx issues, or payment failures. We have configured every alert in Datadog and some other monitoring tools that are integrated with PagerDuty. We receive alerts immediately, along with phone calls and Slack notifications, and then start our triage process.
One use case I can mention is when we have an auth rate dip. Whenever there is an auth rate dip, we run into revenue losses with the merchants or partners that PayPal currently works with. Since everything is integrated, PagerDuty Operations Cloud catches when there is an auth rate dip for particular merchants and immediately triggers a notification for us. We then immediately dive into what the problem is and figure out how to fix the issue with the help of engineering teams.
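As an illustration of the kind of check that could sit behind an auth rate dip alert, here is a minimal sketch. The `AuthWindow` type, the baseline value, and the `max_drop` threshold are all assumptions made for the example; the review does not describe how the actual signal is computed.

```python
from dataclasses import dataclass


@dataclass
class AuthWindow:
    """Authorization counts for one merchant over one time window (hypothetical shape)."""
    merchant: str
    attempted: int
    approved: int


def auth_rate(window: AuthWindow) -> float:
    """Return the share of attempted authorizations that were approved."""
    if window.attempted == 0:
        return 1.0  # no traffic: treat as healthy rather than divide by zero
    return window.approved / window.attempted


def is_auth_rate_dip(window: AuthWindow, baseline: float, max_drop: float = 0.05) -> bool:
    """Flag a dip when the rate falls more than max_drop below the baseline."""
    return auth_rate(window) < baseline - max_drop


# Example: 880 of 1000 approvals against an assumed 95% baseline
window = AuthWindow(merchant="merchant-a", attempted=1000, approved=880)
print(is_auth_rate_dip(window, baseline=0.95))
```

In practice the baseline would likely vary by merchant and season, which is why it is passed in rather than hard-coded.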
What is most valuable?
PagerDuty Operations Cloud is one of the best tools we have seen because it is already integrated with AI. We use it as our first-line tool: it is the first place we look, and it is where we get notified when there is an issue.
The best features are its ability to integrate with any tool and its analysis of all previously stored alerts. When an alert fires, we are immediately notified on Slack with historical data and, since it is integrated with AI, we receive suggestions on how the issue can be resolved, how it was resolved earlier, and who resolved it. These are the very best features we have seen on PagerDuty Operations Cloud.
Since we have historical data showing when an alert has triggered on a particular day, we can turn it into a problem incident and work with the relevant teams to get it fixed completely so it does not recur. We use that feature to record these kinds of repetitive issues.
It is very helpful that we can integrate with numerous monitoring tools such as Datadog, Splunk, and Kibana. Since we have integrated many other tools, I feel this is one of the features that PagerDuty Operations Cloud offers that makes it great.
What needs improvement?
Since PagerDuty Operations Cloud is already equipped with the latest technologies and connected with AI, I do not feel that anything more needs to be added. Even content summarization is already available. I do not have a concrete answer right now because the product already offers a large number of features and is in a highly mature state.
While PagerDuty has comment functionality, a chat option would be a potential addition.
For how long have I used the solution?
I have been using PagerDuty for the last nine years, but PagerDuty Operations Cloud for over one and a half years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is highly accurate and highly reliable in terms of alert triggering. We get almost no false alarms, and the very few we do see stem from our own internal signals. We do not have any complaints about PagerDuty Operations Cloud.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud definitely increases efficiency for us. Since we do not have much manual work with workflows and everything is automated, it definitely helps.
Which solution did I use previously and why did I switch?
We are using only PagerDuty and do not have any other tool in use. There is no other tool that can match PagerDuty Operations Cloud.
What was our ROI?
We definitely see an ROI. Work that earlier required multiple employees is now handled with AI, so we have reduced our staffing needs and save a lot of time and money as well.
Which other solutions did I evaluate?
There is no other tool that can match PagerDuty Operations Cloud.
What other advice do I have?
Earlier, PagerDuty Operations Cloud was just notifying incidents, but now it is showing historical data and we can see how it was resolved earlier and quickly get notes from that to resolve issues with the historical data and suggestions.
Earlier, when there was an auth rate dip or different signals that we received through Datadog or different platforms, we used to have some false alarms. Now, everything we are using is AI-based with agents that were configured with those signals. We have very accurately configured the AI using factors such as holiday seasons that will have high traffic, and everything was configured with historical data. We are getting very solid results and signals.
Since PagerDuty Operations Cloud has all the data and provides forward-looking resolution steps and information about which team was involved, PagerDuty AI helps us tremendously.
We definitely do not have any revenue loss since we are getting accurate signals and alerts and have a solution for all configured alerts.
Since it has all advanced features integrated with AI, I am really impressed with the ability to integrate with numerous monitoring tools very easily and the ease of onboarding any member to PagerDuty Operations Cloud. Setting up the alerts and everything is very easy with a number of monitoring tools. That is why I rated this product a nine out of ten. There is no other tool that can match PagerDuty Operations Cloud right now.
We have a number of layers of governance and security since we are a payment gateway. PagerDuty Operations Cloud has strong governance and security of its own, so it does not raise any additional security concerns for us.
Since it already has AI features, I would recommend PagerDuty Operations Cloud to others. I rate this solution a nine out of ten.
Unified alerting has improved incident response and enabled proactive, multi‑channel notifications
What is our primary use case?
I primarily use PagerDuty Operations Cloud for alert management and incident call rotations. In my earlier firm, we managed rotation shifts across three time zones: EMEA, APAC, and New York time. All rotation and shift management was handled through PagerDuty Operations Cloud schedules. Application monitoring was also updated through PagerDuty Operations Cloud. According to the schedule, we updated people's contact information so that in case of any issues, the contact would be transferred to the respective shift member. We also managed escalations with five layers of escalation. If a first team member missed an alert, it would go to the second team member after 10 minutes, then to the next person after five minutes, continuing according to the priority of the service.
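The escalation behaviour described above can be sketched as a simple timeline lookup. This is an illustration only, not PagerDuty's implementation: the responder names are placeholders, and the five-minute timeouts for levels three to five are an assumption, since the review only states the first two delays.

```python
from typing import List, Optional, Tuple

# (responder, minutes to wait for an acknowledgement before escalating)
EscalationLevel = Tuple[str, int]

# Five layers; only the 10- and 5-minute delays come from the review,
# the remaining timeouts are assumed for the example.
POLICY: List[EscalationLevel] = [
    ("level-1 on-call", 10),
    ("level-2 on-call", 5),
    ("level-3 on-call", 5),
    ("level-4 on-call", 5),
    ("level-5 on-call", 5),
]


def responder_at(policy: List[EscalationLevel], minutes_unacknowledged: int) -> Optional[str]:
    """Return who is being notified once an alert has gone unacknowledged this long."""
    elapsed = 0
    for responder, timeout in policy:
        elapsed += timeout
        if minutes_unacknowledged < elapsed:
            return responder
    return None  # every level timed out without an acknowledgement


for minute in (0, 10, 15, 30):
    print(minute, responder_at(POLICY, minute))
```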
What is most valuable?
The most valuable features I found were the integration capabilities and the notification system. We used the open-source tool Alertmanager, which triggers alerts based on health metrics from Prometheus and Splunk. PagerDuty Operations Cloud allowed us to integrate alerting seamlessly and notify users effectively, which helped the business significantly. Early detection of issues leads to better service provision. PagerDuty Operations Cloud provides multiple notification channels including SMS, phone calls, and email, which I found to be the best part of the platform.
Regarding the autonomous AI agents, I have not explored them because the AI trend started recently and I have been out of touch for the last seven or eight months. However, I have read about how AI integrates with the scheduling part. Previously, we had to manually update schedules every week, but with AI integration, we can write a prompt and build MCPs. Some firms I read about integrated an MCP they built in-house, and with the MCP, they can provide an Excel sheet or image, and PagerDuty Operations Cloud API can update everything without needing to manually access the platform.
We implemented automation through PagerDuty Operations Cloud for incident response. Previously, we had to manually update service level details, SLAs, notification mechanisms, and API keys. Now we can submit an Excel sheet or CSV file and make an API call using Python, which updates everything automatically. PagerDuty Operations Cloud also helps with analytics by showing how many alerts were triggered, how many were resolved, and which person handled which alert. This visualization helps us demonstrate to clients that we managed a certain number of alerts and reduced the alert count.
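A minimal sketch of the CSV-driven part of such an automation is shown below. The column names, sample rows, and payload keys are hypothetical and chosen only for illustration; the sketch stops at building the update payloads and does not make any API call, since the review does not say which endpoints or fields were used.

```python
import csv
import io
from typing import Dict, List

# Hypothetical input file: one row per service to update.
SAMPLE_CSV = """service_name,escalation_policy,ack_timeout_minutes
checkout-api,payments-oncall,10
search-api,platform-oncall,30
"""


def rows_to_updates(csv_text: str) -> List[Dict[str, object]]:
    """Turn CSV rows into update payloads (key names are assumptions)."""
    updates: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        updates.append({
            "name": row["service_name"].strip(),
            "escalation_policy": row["escalation_policy"].strip(),
            # store the timeout in seconds
            "acknowledgement_timeout": int(row["ack_timeout_minutes"]) * 60,
        })
    return updates


for update in rows_to_updates(SAMPLE_CSV):
    print(update)
```

Each payload would then be sent to the platform's API by whatever client the team uses; validating the rows before sending keeps a bad spreadsheet from overwriting live settings.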
What needs improvement?
PagerDuty Operations Cloud has been excellent so far. One area where it could go further is generative AI. Some organizations are using their own MCP engines, but if PagerDuty Operations Cloud provided in-house MCP tools integrated with GenAI, it would be better for end users. Having something provided by PagerDuty Operations Cloud itself that also integrates with in-house tools would make a difference. I am not certain whether this has been explored in the last six months, but this is an area PagerDuty Operations Cloud could improve.
For how long have I used the solution?
I used PagerDuty Operations Cloud for approximately three and a half to four years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud provided notifications early when issues occurred. We used PagerDuty Operations Cloud's status page as our first source of truth to check for existing or ongoing issues. If no issues were listed there, we reached out to a dedicated account manager who would connect us with the concerned team. We rarely encountered any operational issues with PagerDuty Operations Cloud because it was always working. We experienced only one or two latency issues, which were due to underlying cloud infrastructure issues rather than PagerDuty Operations Cloud itself.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud scaled well. We started with a beta phase with approximately 50 to 60 members, then moved to a development stage where we increased to 150 people. The platform remained stable as we expanded. We ran some instances on-premises, which required high security, and others in the cloud for client-facing deployments. We experienced no issues with scalability on either on-premises or cloud deployments, and integration was seamless in both cases.
How are customer service and support?
When making on-premises installations, we connected with PagerDuty Operations Cloud's technical support. They guided us on setup and what to take care of during installation. We had two or three calls with them, and they were very helpful throughout the process.
Which solution did I use previously and why did I switch?
Before PagerDuty Operations Cloud, we used Alertmanager, which triggered only email notifications and not calls or SMS. PagerDuty Operations Cloud introduced the calling mechanism and SMS capability, which was innovative compared to what we had seen with open-source tools.
How was the initial setup?
The initial setup process involved starting with PagerDuty Operations Cloud's cloud offering. We purchased a plan and set up our account. During actual deployment, we purchased a license with our own DNS, meaning instead of using pagerduty.com, we mapped our own subdomain to our environment. We then created licenses for individual users, starting with approximately 150 members from our technical support team and L1 engineers. We gradually increased our user count rather than immediately granting licenses to thousands of people because they would have received spam calls. We started with 50 to 60 members for a trial to understand how the system should behave and how we could optimize it.
What about the implementation team?
We handled the initial setup and installation of PagerDuty Operations Cloud ourselves, although we received support. When we signed up, the PagerDuty Operations Cloud team called to offer assistance. They set up a demo for our team, but we proceeded with the installation ourselves since we had prior knowledge before starting.
Which other solutions did I evaluate?
We evaluated other options before choosing PagerDuty Operations Cloud. We attempted to build our own solution using an existing open-source tool, but latency issues made it neither time-efficient nor cost-efficient. Since a stable product like PagerDuty Operations Cloud already existed, investing two to three years in building our own solution did not make sense. We also explored building a Python solution using Alertmanager before deciding on PagerDuty Operations Cloud.
What other advice do I have?
I have not explored the generative AI capabilities of PagerDuty Operations Cloud. PagerDuty Operations Cloud delivers very high performance when notifying users, especially in high-frequency trading environments where even a second of delay can result in billion or trillion dollar transaction losses. The notification service and seamless integration across different team layers provide significant value. Although open-source tools are available, they are not as effective as PagerDuty Operations Cloud.
Regarding alert fatigue and incident costs, when onboarding new clients in my previous project, I demonstrated our capabilities using incident management charts to showcase our skills. We showed clients how many alerts triggered daily, weekly, or monthly before PagerDuty Operations Cloud, and how we reduced them to monthly or bi-weekly intervals based on specific conditions. This data helped us acquire deals.
PagerDuty Operations Cloud improved my team's ability to focus on core tasks rather than routine issues. Initially, my team was exploring multiple notification and monitoring options and building their own tools. With PagerDuty Operations Cloud as an organization-level mandate, instead of managing different tools across ten teams, we now use one standard tool. This has allowed the team to focus on other important tasks since this major challenge has been resolved.
Regarding preventing costly incidents, I would emphasize business trust more than direct cost savings. We earned significant client trust by detecting issues early and informing clients promptly, allowing them to manage their side of any issues. On multiple occasions, we caught issues before business hours and clients were appreciative of our proactive approach.
I am not aware of the specific pricing and licensing details of PagerDuty Operations Cloud as that is managed by our management. From what I have heard, the business plan is not very expensive. I have not explored individual pricing since our organization was large with dedicated departments handling such decisions. My review rating for PagerDuty Operations Cloud is eight out of ten.
On-call alerts have ensured critical issues are addressed faster and teams focus on core work
What is our primary use case?
I usually use PagerDuty Operations Cloud for the notification of high-priority incidents within the infrastructure.
I also use it for escalating to the on-call members, scheduling the priority of incidents or issues within the infrastructure, and creating scheduled rotations for team members.
What is most valuable?
The most valuable feature of PagerDuty Operations Cloud is that even when my device is on silent, it still rings and lets me know that something has happened in the organization.
On-call schedules for team members are very helpful for finding out who is currently on call, so I can get help with incidents or route tickets to them. At the same time, it sends me push notifications, calls my mobile number, and sends emails to my address, so the multi-channel notification service of PagerDuty Operations Cloud is excellent.
From a user perspective, the most valuable part of PagerDuty Operations Cloud is the notification feature that keeps contacting me until I acknowledge the alert. High and critical incidents matter greatly to the organization because something is failing and I need to repair it as a priority so that we do not lose business.
PagerDuty Operations Cloud improved my team's ability to focus on core tasks rather than routine issues by having the notification feature that was very helpful to monitor and trigger high and critical issues directly to team members.
What needs improvement?
I am not using PagerDuty Operations Cloud's autonomous AI agents now because we have not gotten into that yet.
I have not used generative AI yet.
The integration with ServiceNow is very good: when I add notes in PagerDuty Operations Cloud, it sends an email and also posts them to the ServiceNow ticket.
PagerDuty Operations Cloud also provides me information about how many incidents with the same errors I have encountered, as it does have the analysis engine running with incoming tickets.
There is still some alert fatigue, and more granular root cause analysis could be done. Reducing false positive alerts and reporting the real number of issues would be beneficial.
I believe there is always room for improvement, and since technology is changing day by day, I will rate PagerDuty Operations Cloud as a nine.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for two-plus years, and I am still actively using it.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable, but I did have one issue where services were down for about ten to twelve minutes. I consider it highly stable and reliable overall.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is good and I have not encountered any problems with it.
How are customer service and support?
I did not have to reach customer service because the product has been stable and reliable, and I can say it is really good.
Which solution did I use previously and why did I switch?
I found PagerDuty Operations Cloud to be more stable than other solutions, so I directly went with PagerDuty Operations Cloud.
How was the initial setup?
Another team integrated PagerDuty Operations Cloud into the system and set it up.
We did refer to the PagerDuty Operations Cloud documents for setting up teams and creating schedules.
What about the implementation team?
Another team integrated PagerDuty Operations Cloud into the system and set it up.
What was our ROI?
Regarding cost saving, PagerDuty Operations Cloud provides the feature but is not really reducing the cost of other operations.
What's my experience with pricing, setup cost, and licensing?
I do not usually focus on pricing for PagerDuty Operations Cloud at the moment, but for smaller teams, I believe it is costlier, while for multi-million dollar companies, it is still affordable. For smaller teams who want to improve their operations, the cost is an issue.
Which other solutions did I evaluate?
I found PagerDuty Operations Cloud to be more stable than other solutions, so I directly went with PagerDuty Operations Cloud.
What other advice do I have?
I am satisfied with PagerDuty Operations Cloud and really appreciate the product, so I do not have any questions at the moment, but I do have interest in whether PagerDuty Operations Cloud has implemented agents to help with any issues that happen. I rate this product a nine overall.
On-call automation has reduced critical incident impact and ensures faster production responses
What is our primary use case?
As part of a cloud operations team, I was a user who set up the alerts. Important incidents or anomalies that needed immediate attention were routed through the APM tools that we integrated with PagerDuty Operations Cloud. Our cloud operations team supported the platform in rotational shifts. My role involved assigning people to shifts according to the shift roster, so that whenever an incident triggered, the right person would get the call. The primary use was supporting production operations and cloud activities.
Our environment consists of AWS infrastructure, Linux servers, Kubernetes clusters, and customer-facing applications. PagerDuty Operations Cloud was mainly used for incident management and alerting. We integrated it with AppDynamics, Instana, and CloudWatch, which monitored the platform, and PagerDuty Operations Cloud would then generate critical alerts and immediately notify the support team working the current shift. This platform really helped us manage production incidents beyond service outages, mostly high CPU utilization where we set alerts, application failures, pod issues in Kubernetes, and infrastructure-related alerts. We configured all kinds of alerts and ensured that they were routed to the correct on-call person, which helped us reduce response time in critical situations.
What is most valuable?
One of the best features of PagerDuty Operations Cloud is its on-call rotation scheduling and escalation management. If an engineer did not acknowledge an alert within a defined time frame, the incident was automatically escalated to the next person, support team, or manager of that specific team. Another useful feature was its integration capability. We were able to integrate PagerDuty Operations Cloud with monitoring and observability tools so that alerts were generated automatically within moments of an issue being detected in the environment. The mobile application was also very helpful because engineers could receive calls and notifications, acknowledge incidents, and track updates even when they were away from their laptops.
I also valued the centralized incident management dashboard that provides visibility into active incidents, response status, escalation history, and overall operational health. I used to get all the data accumulated there through the dashboard.
What needs improvement?
My experience with PagerDuty Operations Cloud has been positive overall. One area where I believe improvement can be made is reporting and dashboard customization to make it more user-friendly. The operations team often requires different views compared to the management team. Having more flexibility in generating custom reports would be helpful. Another improvement could be providing more advanced AI-driven collaboration capabilities to reduce unnecessary noise alerts and help the team focus on the most critical issues. Apart from these areas, the platform is very reliable and effective for managing production incidents and on-call operations.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for almost five to six years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud has been stable and performing well wherever our incident management or alerting was configured for production support. Timely notifications and incident responses were critical. PagerDuty Operations Cloud delivers alerts immediately through multiple channels which we configured, including mobile on-call notifications, email, SMS, and phone calls. Since PagerDuty Operations Cloud was integrated with our monitoring and observability tools, it helped ensure that critical incidents were captured and routed to the appropriate on-call team. During my usage, I did not encounter any significant outages or stability issues that impacted our operations due to PagerDuty Operations Cloud.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud is highly scalable and works well with small and large environments. The project I worked on was integrated with multiple application servers and cloud resources for monitoring. PagerDuty Operations Cloud handles all the alerts from different resources and routes them to the appropriate teams. As the infrastructure grows, new services, escalation policies, schedules, and teams can be added easily without requiring major changes to our existing setup. This makes it suitable for organizations that manage large cloud infrastructure and multiple support teams.
Which solution did I use previously and why did I switch?
When I joined this project, they had already implemented PagerDuty Operations Cloud. When I joined, the SOPs and testing were already in process. After a few days, when I was actually onboarded, many of the alerts were configured in PagerDuty Operations Cloud. I did not get the chance to work on different tools besides PagerDuty Operations Cloud.
How was the initial setup?
During the initial setup of PagerDuty Operations Cloud, when I joined the project, I got a Jira ticket listing a few of the servers where I needed to install PagerDuty agents so it could trigger any alerts or integrate with the server. I was mostly involved in the configuration part.
The setup was straightforward. PagerDuty Operations Cloud also helped us in this process. It was not directly integrated on the individual servers, but we integrated our monitoring tools and observability with PagerDuty Operations Cloud. The servers and applications were monitored through application monitoring tools such as Instana, Zabbix, and Splunk. Whenever critical alerts were generated, they would automatically forward to PagerDuty Operations Cloud through the configured integrations we set up with the application. PagerDuty Operations Cloud would notify the on-call engineers and follow different escalation policies if the alerts were not acknowledged within a specific time. Our flow was that we had EC2 instances, AWS servers, and CloudWatch alarms, and if any alert triggered, it would send through SNS, AWS Simple Notification Service, and then to PagerDuty Operations Cloud and the on-call engineer.
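To make the CloudWatch to SNS to PagerDuty flow more concrete, the sketch below maps a CloudWatch alarm notification, as delivered in an SNS message body, onto an event in the shape used by the PagerDuty Events API v2. This is an illustration under stated assumptions: the routing key is a placeholder, the fixed `critical` severity is a simplification, and a managed integration would normally perform this translation, so no code like this is implied to have been part of the reviewer's setup.

```python
import json
from typing import Any, Dict

# CloudWatch alarm states mapped to event actions; other states are rejected.
ACTION_BY_STATE = {"ALARM": "trigger", "OK": "resolve"}


def alarm_to_event(sns_message: str, routing_key: str) -> Dict[str, Any]:
    """Build an Events API v2 style event from a CloudWatch alarm SNS message."""
    alarm = json.loads(sns_message)
    state = alarm["NewStateValue"]
    if state not in ACTION_BY_STATE:
        raise ValueError(f"unhandled alarm state: {state}")
    return {
        "routing_key": routing_key,           # placeholder integration key
        "event_action": ACTION_BY_STATE[state],
        "dedup_key": alarm["AlarmName"],      # lets the OK event resolve the same incident
        "payload": {
            "summary": f'{alarm["AlarmName"]}: {alarm["NewStateReason"]}',
            "source": alarm.get("Region", "unknown"),
            "severity": "critical",           # assumed; could be derived per alarm
        },
    }


sample = json.dumps({
    "AlarmName": "high-cpu-web-01",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed",
    "Region": "US East (N. Virginia)",
})
print(json.dumps(alarm_to_event(sample, "EXAMPLE_ROUTING_KEY"), indent=2))
```

Using the alarm name as the deduplication key means a later `OK` notification for the same alarm resolves the incident that the `ALARM` notification opened.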
What about the implementation team?
We followed the documentation provided by PagerDuty Operations Cloud for the configuration part.
The documentation is full-fledged with proper details on how to configure it depending on the integration with any application monitoring tool. They specify what steps need to be followed. If integrating with servers, they mention which type of server, whether it is Windows or Linux, and accordingly, they have provided all the documents. The documentation is comprehensive and easy to understand, such that even a layperson can do the configuration part with the way they have provided the documentation.
What other advice do I have?
We are mostly not focused on using PagerDuty's autonomous AI agents because we work on cloud infrastructure where we do the deployments. We have not implemented AI in our cloud to that extent. Going forward, if our infrastructure becomes AI-based, we will definitely explore where PagerDuty Operations Cloud can help with that.
As of now, we do not use the generative AI capabilities of PagerDuty Operations Cloud. Our infrastructure is huge, and there is a dedicated developer team working on AI-related things. They are still running two POCs, which are being evaluated. Only if the results look good can we roll this out into production, because my application is customer-facing and we do not want anything to go wrong, such as an alert triggering unnecessarily or an AI-handled alert failing to notify us. That would ultimately cause us to miss our SLAs and SLOs, and all the other escalation matrices would come into the picture. That is why we are still in POCs, as it is critical.
That part is taken care of by a different team or mostly the clients themselves. My main role is to keep the environment always up and running, and all alerts should be properly centralized and customized accordingly.
PagerDuty Operations Cloud is basically where we receive alerts, and we integrate it with Slack and with on-call rotation shifts on cell phones. Prior to this, we relied mostly on application monitoring tools, emails, and Slack notifications. If the on-call person is not at their desk when an alert triggers and no one is there to acknowledge it, look into it, and take the necessary action, there will ultimately be customer impact. That is why we implemented PagerDuty Operations Cloud. Even if the on-call person is not near their laptop, they will get the call and can immediately acknowledge it and report to the team that we have received a P1 call for a specific environment or that the alert concerns a production issue. Another team member will immediately take action, so nothing is missed.
I did not encounter any issues that required contacting support for PagerDuty Operations Cloud. This review represents an overall rating of 9 out of 10.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Automation has improved incident workflows and response times but still offers unexplored features
What is our primary use case?
My main use case for PagerDuty Operations Cloud involves working in the AIOps team, which is an operations team. We have a monitoring tool called Checkmk, and we have integrated it with PagerDuty for incident management. We monitor many servers across different teams, including the Linux, network, Windows, and database teams. All of these servers are monitored in Checkmk, which tracks live CPU, memory, and file system usage. When a threshold is reached, Checkmk generates events, which we send to the PagerDuty Operations Cloud console. There, we set conditions so that if an event is critical or a warning, it is converted into an incident. We then route the incidents to the respective teams, who handle the details.
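The rule described above (critical and warning events become incidents and are routed to the owning team) can be sketched roughly as follows. This is a minimal illustration in plain Python, not PagerDuty's Event Orchestration syntax. The `Event` fields, the `ROUTES` table, and the fallback team name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from a host's owner group to the responder team.
ROUTES = {
    "linux": "Linux team",
    "network": "Network team",
    "windows": "Windows team",
    "database": "Database team",
}

# Severities that should open an incident, per the rule described in the review.
INCIDENT_SEVERITIES = {"critical", "warning"}


@dataclass
class Event:
    host: str
    group: str     # owner group reported by the monitoring tool (assumed field)
    severity: str  # for example "critical", "warning", or "ok"
    summary: str


def route_event(event: Event) -> Optional[dict]:
    """Return an incident dict for critical/warning events, otherwise None."""
    if event.severity.lower() not in INCIDENT_SEVERITIES:
        return None  # other events do not become incidents
    # Unknown groups fall back to a default team (an assumption, not from the review).
    team = ROUTES.get(event.group.lower(), "Operations team")
    return {
        "title": f"[{event.severity.upper()}] {event.host}: {event.summary}",
        "assigned_team": team,
    }
```

Calling `route_event(Event("db01", "database", "critical", "file system 95% full"))` would produce an incident assigned to the database team, while an event with severity `"ok"` is dropped.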
A specific example of an incident where PagerDuty Operations Cloud played a key role involves automations we created within PagerDuty. Rundeck is a job workflow tool where we can implement scripts or schedule jobs. If a server meets its threshold, it triggers PagerDuty Operations Cloud. We create scripts in Rundeck to handle issues, such as clearing a full file system. We utilize a feature called Automation Actions in PagerDuty Operations Cloud, and whenever an incident comes that matches specific conditions, that job will automatically run in Rundeck. This incident management cycle is effectively managed in PagerDuty Operations Cloud, allowing jobs to run and resolve incidents automatically, ensuring the server is healthy again.
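As a rough sketch of the auto-remediation loop described above: an incident that matches a condition selects a remediation job, the job runs, and the incident is resolved if the job succeeds. This does not use the real Automation Actions or Rundeck APIs. The job names, the matching conditions, and the `run_job` callable are hypothetical stand-ins.

```python
from typing import Callable, Optional

# Hypothetical table: a predicate on the incident title paired with a job name.
REMEDIATIONS = [
    (lambda title: "file system" in title.lower(), "cleanup-filesystem"),
    (lambda title: "service down" in title.lower(), "restart-service"),
]


def pick_job(incident_title: str) -> Optional[str]:
    """Return the first remediation job whose condition matches, else None."""
    for matches, job_name in REMEDIATIONS:
        if matches(incident_title):
            return job_name
    return None


def handle_incident(incident_title: str, run_job: Callable[[str], bool]) -> str:
    """Run the matching job; report 'resolved' on success, 'escalate' otherwise."""
    job = pick_job(incident_title)
    if job is None:
        return "escalate"  # no automation matched, so a human takes over
    return "resolved" if run_job(job) else "escalate"
```

Passing the job runner in as a parameter keeps the sketch independent of any particular job scheduler; in practice it would wrap whatever call actually starts the job.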
What is most valuable?
The best features PagerDuty Operations Cloud offers include Incident Workflows, which we use frequently to ease our team's work. These workflows trigger jobs in Rundeck based on certain conditions when incidents occur. We can create flows in Incident Workflow features and utilize Automation Actions, which allow us to run individual jobs in Rundeck. Additionally, Event Orchestration enables us to integrate various tools using integration keys with multiple applications. These features significantly simplify our daily operations within the team.
Integrations with other tools have been beneficial for our team as we receive requests from different teams to integrate their tools with PagerDuty Operations Cloud, enabling them to manage incidents. We have integrated AWS CloudWatch and Azure for monitoring, as well as CyberArk and Guardicore. If teams have specific requirements for integrating their tools, they approach us to create the necessary flows.
PagerDuty Operations Cloud has positively impacted our organization significantly. The response time has improved, and the team responds more quickly now. The PagerDuty Operations Cloud mobile application allows team members to acknowledge incidents via their mobile devices, where they can also receive calls when incidents trigger. The response time has become very quick.
I do not have precise numbers regarding the improvement in response time since using PagerDuty Operations Cloud, but I can share a story about a major incident with Checkmk. After we upgraded our Checkmk console, everything crashed, causing random events to be sent to PagerDuty Operations Cloud. We fixed the event flow from Checkmk using PagerDuty Operations Cloud's features. Furthermore, we have automated the restart of systems through PagerDuty Operations Cloud. If any server requires a restart, we trigger that job with just one click using Ansible, completing the task efficiently.
What needs improvement?
I do not see immediate improvements for PagerDuty Operations Cloud because there are numerous features we have yet to explore. As a product, it is continually upgrading its features, so we are focusing on how we can incorporate those into our use case.
Concerning PagerDuty Operations Cloud's AI capabilities, I am not certain as we currently do not use advanced AI-related features since our package offers limited access in that area. However, regarding governance and security, it appears very secure.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for three months.
What do I think about the stability of the solution?
PagerDuty Operations Cloud appears to be quite stable. I have not encountered any downtime or reliability issues, and it has consistently operated successfully.
What do I think about the scalability of the solution?
Regarding scalability, I think there is demand for PagerDuty Operations Cloud as we have been striving to mitigate manual tasks within our organization. We conduct demonstrations to illustrate how we can reduce manual work through our automations.
How are customer service and support?
The customer support for PagerDuty Operations Cloud is excellent. They have been very responsive. We have standing weekly calls to discuss any doubts, and there is a dedicated team, including an engineer and a PagerDuty Relations Manager, assigned to support us. They have been excellent in following up on the features we need to use.
What was our ROI?
I believe a return on investment is occurring because we are promoting PagerDuty Operations Cloud within our organization, aiming to involve more people and teams in using it. We continuously explore new features to facilitate ease of use among many people.
What other advice do I have?
There are many new features introduced in PagerDuty Operations Cloud. AI has been included, and specific features including Incident Workflows and Event Orchestrations have been implemented. One recent implementation in Incident Workflows is SLA tagging for incidents. We created a workflow to notify managers if an SLA has been breached beyond a certain time. This planning has helped us manage incidents more effectively.
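The SLA-breach notification logic described above can be illustrated with a small Python sketch. It is not an Incident Workflows definition. The four-hour limit, the incident dictionary fields, and the manager names are assumptions made for the example.

```python
from datetime import datetime, timedelta

SLA_LIMIT = timedelta(hours=4)  # assumed threshold; the review does not state one


def sla_breached(created_at: datetime, now: datetime, limit: timedelta = SLA_LIMIT) -> bool:
    """True when an incident has been open longer than the SLA limit."""
    return now - created_at > limit


def managers_to_notify(incidents, now: datetime, limit: timedelta = SLA_LIMIT):
    """Return the sorted, de-duplicated managers of unresolved, breached incidents.

    Each incident is a dict with 'id', 'created_at', 'status', and 'manager'
    (hypothetical field names).
    """
    return sorted({
        inc["manager"]
        for inc in incidents
        if inc["status"] != "resolved" and sla_breached(inc["created_at"], now, limit)
    })
```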
I have not utilized PagerDuty Operations Cloud's AI agents to address routine issues. However, for team productivity, we leverage escalation policies in PagerDuty Operations Cloud, assigning individual service directories to teams. Consequently, team members receive calls and messages based on their escalation hierarchy.
We have not utilized PagerDuty Operations Cloud's generative AI for decision-making, but the event analytics and operations console provide valuable insights. I can observe real-time data on incidents and alerts, which helps us address the inflow of events from integration keys. This information allows us to refine our planning and reduce event volumes from Checkmk.
I would highly recommend PagerDuty Operations Cloud as a reliable product. I do not have any negative experiences using PagerDuty Operations Cloud, and I believe it adds significant value to our environment if used properly.
When it comes to the accuracy and reliability of PagerDuty Operations Cloud's output, I find it quite reliable. It presents us with extensive data and analytics. The event flow we get from Checkmk provides much useful information, and we rely heavily on PagerDuty Operations Cloud for this analytics format.
I do not believe we have a business relationship with PagerDuty Operations Cloud beyond being a customer. We purchase memberships based on their plans and use them within our organization. I think we are not partners; rather, we simply resell their services internally.
My overall rating for PagerDuty Operations Cloud is seven out of ten.
Centralized incident workflows have reduced outage windows and improved response coordination
What is our primary use case?
PagerDuty is predominantly used for our enterprise notifications for all of the incident management processes, especially the major incident management. We have many applications and infrastructure components. Earlier, we used a solution that only provided text-based communication. When we wanted to look for something with multi-channel notification and correlation capability, that is where we leveraged PagerDuty Operations Cloud.
I am currently going through the governance process to get additional capabilities onboarded. GenAI is not yet enabled since I am from a regulated organization and had to secure approvals before enabling any AI-related components. Most probably in the next two or three months, we will be enabling GenAI, the SRE agent, and the AI capabilities of PagerDuty Operations Cloud.
What is most valuable?
The ease of use is one of the key strengths. Creating the escalation policies and notification channels per user is straightforward, and it is not a requirement that everyone has the same notification rules. Users have flexibility in getting the communication they need. Event orchestration is the other part which works well for us.
Primarily, we were able to get the right people at the right time through our escalation mechanism, which switches automatically from level one to level two. This helped us improve the overall MTTA, and the acknowledgment rate has drastically improved. For major incidents, we were able to triage everything within PagerDuty Operations Cloud itself instead of switching between multiple tools such as Teams or other orchestration platforms. With one solution, we are able to do the triaging, and that definitely reduced the average outage window.
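The automated switch from level one to level two can be pictured with a short sketch. It is a simplified model in Python, not PagerDuty's escalation policy configuration. The 15-minute delay is an assumed value, and `current_level` is a hypothetical helper.

```python
def current_level(minutes_since_trigger, acknowledged, escalation_delays=(15,)):
    """Return the 1-based escalation level being notified, or None once acknowledged.

    escalation_delays[i] is how many minutes level i+1 has to acknowledge
    before the incident moves to the next level (assumed values).
    """
    if acknowledged:
        return None  # escalation stops as soon as someone acknowledges
    level = 1
    elapsed = 0
    for delay in escalation_delays:
        elapsed += delay
        if minutes_since_trigger >= elapsed:
            level += 1  # the previous level ran out of time
        else:
            break
    return level
```

With the default single delay, an unacknowledged incident sits at level one for the first 15 minutes and at level two from then on.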
We do have automations in two main ways. One is the incoming automation where we have multiple monitoring tools and systems that generate events. We ingest them into PagerDuty Operations Cloud and then using event orchestration, we create all of the respective incidents, whether they are PagerDuty Operations Cloud-only incidents, ServiceNow incidents, or different methods we use. The other automation method is incident workflows where we are able to call out to respective endpoints for the remaining automations. This is growing at this point in time, but event orchestration is mainly what we use for the automation of the triaging.
Our MTTA used to be a two-digit figure, and it is now reduced to less than one hour.
Getting the right people on board whenever there is a major issue and dialing them individually took a longer time. Now with PagerDuty Operations Cloud, having all of the predefined rules and the orchestrations we can create, it is definitely bringing value. Bringing the right people at the right time and improving the restoration time so that we do not impact any of the business end-user services is where PagerDuty Operations Cloud definitely plays a key role in delivering the business value.
What needs improvement?
I have submitted a few enhancement requests. Dynamic scheduling is something I have been waiting on for almost three or four years, and I believe it is finally arriving in a few weeks. In any operations team, shift rosters are not static: people rotate between different shifts, and setting that up in PagerDuty Operations Cloud was a challenging task. Dynamic rosters are in the early access stage, and I believe they will address this issue. On the reporting side, there is a wide variety of reports, but the out-of-the-box reports could be matured further. We get customized reports through professional services, which is beneficial, but having them out-of-the-box would help further.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for almost four years.
What do I think about the stability of the solution?
There are not many issues except during Cloudflare or major AWS issues. Otherwise, we do not have any performance issues. The platform is performing well.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud is scalable, but it depends on the business model you choose. We are licensed per user, so we know how many users we can onboard to PagerDuty Operations Cloud. Everything else is definitely scalable, depending on what you agree with them at the contractual level. There is no challenge with that unless you have not calculated or forecasted your requirements.
How are customer service and support?
I used PagerDuty Operations Cloud support.
I would say they are pretty good, with regular support scoring eight or nine out of ten, and professional services scoring around nine out of ten. Both are pretty good for our business requirements.
Which solution did I use previously and why did I switch?
We were using different HP tools for all of the alerting and also a solution from OnSolve, earlier called TelAlert. Those solutions were distributed and not one central solution for incident management and alerts. Now it is centralized with one of our ITSM tools and PagerDuty Operations Cloud for both alerting and incident management.
How was the initial setup?
The initial setup was comparatively easy. We had to train the people because it was a new solution altogether. We got professional services support, and they helped us move forward. We did not have many challenges on the system level. Only user experience took more time as the team needed to learn how to use and operate the solution.
What about the implementation team?
I used PagerDuty Operations Cloud support.
What was our ROI?
From the pricing perspective, we got a good deal. When we selected the tool, we compared and evaluated the competitors, and we are satisfied with the pricing. We have not had a chance to calculate the ROI specific to the tool. But overall, across the end-to-end process where PagerDuty Operations Cloud is present, I think we are close to achieving the ROI.
Which other solutions did I evaluate?
We verified Twilio and two other solutions at that time.
What other advice do I have?
I would definitely advise them to do a PoC, integrate it with their existing ITSM tools or whatever else they rely on, and thoroughly verify one end-to-end test. They should also simulate a major incident and compare the metrics they track internally against what the new solution, PagerDuty Operations Cloud, could additionally provide. These two things should definitely be tested.
I would give PagerDuty Operations Cloud as a product an eight out of ten. The only reason I put eight instead of ten is the handling of enhancement requests and new features: the time to market has to be much faster than it is at this point. Some flexibility on customization should also be provided. My overall review rating for this product is eight out of ten.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Cloud platform has improved efficiency and time savings but still needs stronger AI integrations
What is our primary use case?
I have been using PagerDuty Operations Cloud for the past few months. My main use case is that I use it in my day-to-day applications. A specific example of how I use PagerDuty Operations Cloud in my application is that I use it for hosting my agent. Hosting my agent on PagerDuty Operations Cloud helps me with my day-to-day work by being efficient in terms of scalability and managing infrastructure. It has been pretty helpful.
What is most valuable?
PagerDuty Operations Cloud's best features include scalability, managing infrastructure, and managing other services. It helps me manage other services comprehensively, and I think it is pretty good overall. PagerDuty Operations Cloud has positively impacted my organization by being effective in terms of managing the system and in terms of scalability. A specific outcome that shows how PagerDuty Operations Cloud has helped my organization is that it has improved efficiency and helped in saving a lot of time.
What needs improvement?
I think PagerDuty Operations Cloud can be improved in terms of services, such as integration with AI.
For how long have I used the solution?
I have been working in my current field for the past three years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud has been stable based on what we have used.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud's scalability has been pretty good because we are able to spin up different resources based on the use case and load.
How are customer service and support?
We have not used customer support explicitly.
Which solution did I use previously and why did I switch?
I did not previously use a different solution.
What was our ROI?
I have seen a return on investment, as I mentioned earlier; there has been a lot of improvement in terms of time and cost.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing has been quite reasonable and cost-effective.
Which other solutions did I evaluate?
Before choosing PagerDuty Operations Cloud, I did not evaluate other options.
What other advice do I have?
I would rate PagerDuty Operations Cloud six out of ten because I believe you can add more features to make the platform even better. Regarding PagerDuty Operations Cloud's AI capabilities, I think its governance and security are pretty good, and the applications are quite secure. As for PagerDuty Operations Cloud's accuracy and reliability of output, I think the accuracy is pretty high and pretty good, and I believe it should be quite reliable, though I have not explored much on the recent AI capabilities.
I would definitely suggest PagerDuty Operations Cloud as a good platform, but it depends on your use case and the amount of scalability that you are looking for. PagerDuty Operations Cloud is pretty good and quite helpful. My overall rating for PagerDuty Operations Cloud is six out of ten.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Centralized alerts have improved incident response and now support flexible on-call workflows
What is our primary use case?
My main use case for PagerDuty Operations Cloud is on-call staff. For instance, when we have sites go down, we need somebody to investigate, so we require a text SMS or a phone call alert.
What is most valuable?
PagerDuty Operations Cloud offers several best features including cloud-based hosting, reliable performance, and flexible expandability.
Regarding the flexibility and expandability, you can scale up and down the amount of employees, add different paths to contacting people, and have monitoring capabilities, which has greatly helped my team.
PagerDuty Operations Cloud has positively impacted my organization with its very good interface and centralized operation. Having a centralized interface has made things easier by providing easy access administration.
What needs improvement?
PagerDuty Operations Cloud could be improved with clearer instructions for beginners.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for a year.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is very good; when we need to add or reduce employees, it can adjust.
How are customer service and support?
Customer support has been very good, and I can reach somebody anytime. I would rate customer support an eight on a scale of one to ten.
Which solution did I use previously and why did I switch?
Previously, we used just a custom alerting solution.
How was the initial setup?
We are testing AI and automation through PagerDuty Operations Cloud for incident response right now, but not too much has changed yet.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing has been fairly reasonable and not expensive.
Which other solutions did I evaluate?
Before choosing PagerDuty Operations Cloud, I did not evaluate other options and only considered some standard custom operator solutions.
What other advice do I have?
I would rate PagerDuty Operations Cloud an eight out of ten because it is pretty good, but it is not perfect yet. Regarding PagerDuty Operations Cloud's AI capabilities, I think its governance and security are pretty good with no issues, and I find its output to be pretty accurate and pretty stable. My advice to others looking into using PagerDuty Operations Cloud is to work out how many users you need and license accordingly. My overall review rating for PagerDuty Operations Cloud is eight out of ten.
Automated incident alerts have reduced response times and improved on-call efficiency
What is our primary use case?
My main use case for PagerDuty Operations Cloud is monitoring multiple platforms. In cloud operations, whenever any application or device has an issue, it triggers PagerDuty, and the on-call shift engineer can immediately check on it.
For a specific example of how I have used PagerDuty Operations Cloud to handle an incident, if any media connect flow breaks or applications deployed on cloud instances have issues, we start receiving alerts. This can be configured using PagerDuty schedules to inform the on-call engineer to take immediate action.
Regarding my main use case, I would add that we do not need to be on the dashboards or monitor manually. Instead, our APIs work in the back-end. They check for things working on the platform's cloud, and if any action is required, the appropriate team can be aligned using PagerDuty.
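To illustrate how a back-end check might raise an alert without anyone watching a dashboard, here is a hedged sketch that builds a trigger event in the shape used by the PagerDuty Events API v2. The review does not say which API its checks use, so this is an assumption. The routing key and the example summary are placeholders, and `build_trigger_event` is a hypothetical helper. The sketch only builds the request body and does not send it.

```python
import json

# Events API v2 endpoint; the caller would POST the JSON body here.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"

ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="critical", dedup_key=None):
    """Build the JSON-serializable body for a 'trigger' event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    event = {
        "routing_key": routing_key,  # integration key of the target service
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key is not None:
        # Events sharing a dedup_key are treated as the same alert.
        event["dedup_key"] = dedup_key
    return event


# Example body for a failed check (all values are placeholders).
body = json.dumps(
    build_trigger_event("ROUTING_KEY_PLACEHOLDER", "Flow health check failed", "flow-monitor")
)
```

Keeping the payload construction separate from the HTTP call makes the check logic easy to test without network access.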
What is most valuable?
The best features PagerDuty Operations Cloud offers are the policies and the insights. It provides a lot of data, which is required to drill down into the specific errors we have to work on. The escalation policy works well because it notifies the respective people in a timely manner.
I use those insights and escalation policies in my day-to-day work to review the root causes of why we are getting those issues and why in such high numbers. It helps us drill down into the specific areas where we need to improve in order to make the solution more robust. It also creates the required awareness. We can use PagerDuty Operations Cloud to notify the correct stakeholders who need to be involved when such issues occur, based on the rules we have defined.
PagerDuty Operations Cloud has positively impacted our organization by helping us save a lot of costs, and our responsiveness and actions towards any issues on the platform have improved manyfold. If any issue occurs in real time, we get paged, and the correct engineer starts working on it. Considering we have thousands of customers, it is not possible to monitor all of them manually, but PagerDuty Operations Cloud helps in defining which alerts are actionable and which can be ignored.
Regarding response times, it used to take around 10 to 15 minutes, and now that can be achieved within seconds. Similarly, for operations, around five to six engineers were required in a shift, and now that can be fulfilled using two to three engineers. That definitely saves a lot of effort and time and helps to improve operations as well.
What needs improvement?
PagerDuty Operations Cloud can be improved by integrating proper AI into it, so that several actions can be triggered from PagerDuty Operations Cloud itself instead of by writing code. That could be a good approach. Additionally, better alert grouping would be beneficial. Although alert grouping exists in PagerDuty Operations Cloud, it is not that effective.
Regarding PagerDuty Operations Cloud's AI capabilities, I am not aware of how the security and governance aspects are being handled. An AI tool from PagerDuty Operations Cloud would be helpful if it could read the correct metrics and perform actions automatically. Based on the data we feed it, we could establish use case scenarios.
Concerning PagerDuty Operations Cloud's AI capabilities, I think the accuracy and reliability of its output are still maturing. I would say it is going well. Still, a lot of work needs to be done to get this working smoothly. Whatever we have achieved so far is decent to use.
Customer support can be improved. At times, we get a delay in response. It takes time to get things back on track and to get the fulfillment done. That is something PagerDuty can work on.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for around seven to eight years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable.
What do I think about the scalability of the solution?
Regarding PagerDuty Operations Cloud's scalability, it is good. As and when you require it based on the license, you can immediately get it. That is not much of a problem for us.
How are customer service and support?
Customer support can be improved. At times, we get a delay in response. It takes time to get things back on track and to get the fulfillment done. That is something PagerDuty can work on. I would rate the customer support on a scale of one to ten as seven.
Which solution did I use previously and why did I switch?
We were using PagerDuty Operations Cloud only. We did not use any other solution; it was always PagerDuty Operations Cloud.
How was the initial setup?
Regarding my experience with pricing, setup cost, and licensing, pricing looks a little on the higher side, which can definitely be improved. Setup is quite easy and nice and convenient to use.
What about the implementation team?
PagerDuty Operations Cloud was purchased by a security team. They take on those licenses in bulk. It is being used. It is not us directly purchasing it, but there is a specific team for that.
What was our ROI?
I have seen a return on investment. Previously, this was handled by a team of 40 to 50 people. Now, in terms of licenses, we can operate similar functionality with fewer people using AI tools in place. Those actions get automatically performed, so not everyone needs licenses. This definitely saves cost.
What's my experience with pricing, setup cost, and licensing?
Regarding my experience with pricing, setup cost, and licensing, pricing looks a little on the higher side, which can definitely be improved.
Which other solutions did I evaluate?
Before choosing PagerDuty Operations Cloud, we evaluated Opsgenie, which looks good, but it is in the initial state. It will be interesting to see how they evolve soon. Datadog is good. Then there is Grafana On-call and GoAlert. There used to be a tool called One Time. That was quite decent as well. But trusting the legacy of PagerDuty Operations Cloud and its reliability, we went with PagerDuty Operations Cloud.
What other advice do I have?
My advice to others looking into using PagerDuty Operations Cloud is that it is quite a stable platform. You can use it and work on it easily. It is an effective tool. Considering what is available in the market right now, PagerDuty Operations Cloud definitely has an edge in functionality and robustness.
Concerning PagerDuty Operations Cloud's generative AI's effectiveness in providing insights for decision-making, it is good. These are new types of models, so it is getting tuned to our requirements and business requirements. Initially it is good, but it can improve.
For automations, we use PagerDuty Operations Cloud, which then triggers our tool. As soon as the tool is triggered with the specific information, the respective actions are performed. This has become more of an automation, and there is a team working on AI as well. They are looking to build this up, but it is a work in progress as of now.
I am not aware of much regarding the solution's alert reduction feature on preventing costly incidents in my organization.
PagerDuty Operations Cloud's AI functionality saves time and has improved my team's ability to focus on core tasks rather than routine issues. Repetitive actions, reporting, and similar tasks have become automatic, so a person does not need to sit and perform them. It has saved a lot of effort and time and reduced the monotony of the work. I would rate this product overall as an eight out of ten.