PagerDuty is a SaaS-based agile incident management platform that provides notifications, automatic escalations, on-call scheduling, and other functionality to help teams detect and fix infrastructure problems. It is headquartered in San Francisco with operations in Toronto, Atlanta, the United Kingdom, and Australia. PagerDuty is not a monitoring tool. Think of it as a service that’s designed to use machine learning and automation to alert users to disruptions and outages based on data from your existing monitoring tools.
Overview of PagerDuty
PagerDuty’s prescriptive machine learning analytics help organizations gain insight into their teams’ performance trends, the health of their operations, the business impact of an incident, and more. PagerDuty also engages the right people to accelerate issue resolution.
Other ways that PagerDuty can be used include:
- PagerDuty can be used to automate security incident response, critical event management, DevOps, and full-service ownership, among others.
- PagerDuty can enable your applications and detection tools such as intrusion detection systems and environmental sensors to call you if they detect an anomaly.
- PagerDuty can be used to initiate a call if a cron job or another critical system maintenance task fails.
- PagerDuty can be used to automatically escalate support tickets to ensure response SLAs are met.
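The cron use case in the list above can be sketched as a small wrapper that runs a job and reports a failure as an event. This is a minimal sketch, not PagerDuty's official client: the endpoint and field names follow the Events API v2 as commonly documented and should be checked against the current documentation, while `ROUTING_KEY`, `build_trigger_event`, `send_event`, and `run_and_report` are hypothetical names chosen for illustration.

```python
import json
import subprocess
import sys
import urllib.request

# Assumed Events API v2 endpoint; verify against the PagerDuty documentation.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
ROUTING_KEY = "YOUR_INTEGRATION_KEY"  # placeholder, not a real key


def build_trigger_event(routing_key, summary, source, severity="error"):
    """Return a dict shaped like an Events API v2 trigger event."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


def send_event(event, url=EVENTS_URL):
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_and_report(command, source, send=send_event):
    """Run `command`; on a non-zero exit, send a trigger event.

    Returns the command's exit code. `send` is injectable so the
    wrapper can be exercised without touching the network.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        summary = f"Job failed (exit {result.returncode}): {' '.join(command)}"
        print(summary, file=sys.stderr)
        # Truncate defensively so an unusually long command line stays short.
        send(build_trigger_event(ROUTING_KEY, summary[:1024], source))
    return result.returncode
```

In a crontab, a job would be wrapped by calling `run_and_report(["/path/to/job"], source="host-name")` from a small script, so a failing job raises an incident instead of failing silently.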
For most organizations, revenue and brand reputation depend on customer satisfaction. PagerDuty gives developers, DevOps, IT operations, and business leaders the insight to proactively manage events and incidents that may negatively impact customer experience across their IT environment.
Key features and capabilities include:
- Agile incident response: Automates work across teams, executes detailed playbooks, and accelerates resolutions.
- On-call management: Alerting and scheduling so your teams are ready and empowered to take fast action.
- Integration: Integrates with any tool in your ecosystem, aggregating and transforming any signal into real-time insight and action.
- Coordinated response: Drive coordinated business response with stakeholders in real time through proactive notifications and status dashboards.
- Event intelligence: Apply machine learning for full incident context, real-time triaging, and personalized recommendations.
- Analytics: Understand the systemic impact of issues on your customers, teams, and bottom line.
Systems Requirements and Installation
Because PagerDuty is a SaaS-based application, there are no on-premises system requirements other than a modern web browser with JavaScript enabled. There are also no installation requirements beyond the usual signup process using your email address and password, which is straightforward. Once signup is complete, you’ll be asked to install the PagerDuty mobile app (available for Android and iOS), and that’s about it.
Dashboards and Visualizations
PagerDuty provides infrastructure health visualization tools to identify patterns and trends. The PagerDuty dashboard displays the status of incidents and alerts across all of your monitoring tools. In addition, PagerDuty provides various pre-built dashboards, such as intelligent dashboards and status dashboards. These dashboards and their GUI are designed with simplicity in mind, which makes navigating them and configuring policies easy.
Alerts and Notifications
Alerts and notifications are among the key services provided by PagerDuty, and its alerting features balance effectiveness with ease of use. PagerDuty is designed to notify you of incidents promptly, wherever they occur, and to re-notify you when no action has been taken to resolve them. The goal is to elicit a response and accelerate issue resolution. This approach to alerts and notifications helps organizations keep mean time to resolution (MTTR) as low as possible.
The service can be configured to deliver alerts via email, SMS and voice call to any country, or the mobile apps (iOS and Android). Users can confirm that they’ve received an SMS by replying to it. You can configure PagerDuty to repeat the SMS or voice call if a notification goes unanswered. Notification methods can be customized so that incidents dynamically select the right notification behavior based on their severity, the time of day, or whether they occur during defined support hours.
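To make the idea of severity- and time-dependent notification behavior concrete, here is an illustrative sketch of such a rule. It is a toy model, not PagerDuty's configuration format or API: the severity names, the channel names, the 09:00 to 17:00 weekday support window, and the functions `in_support_hours` and `choose_channels` are all assumptions made for the example.

```python
from datetime import datetime, time

# Hypothetical support window: weekdays, 09:00 to 17:00 local time.
SUPPORT_START = time(9, 0)
SUPPORT_END = time(17, 0)


def in_support_hours(moment):
    """True if `moment` falls on a weekday inside the support window."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and SUPPORT_START <= moment.time() < SUPPORT_END


def choose_channels(severity, moment):
    """Pick notification channels from severity and time of day."""
    if severity == "critical":
        # Always use every channel for critical incidents.
        return ["push", "sms", "voice"]
    if severity == "high":
        # Add a voice call outside support hours, when people may be away.
        if in_support_hours(moment):
            return ["push", "sms"]
        return ["push", "sms", "voice"]
    # Low-severity incidents only send email, and only during support hours.
    return ["email"] if in_support_hours(moment) else []
```

For example, `choose_channels("high", datetime(2024, 1, 6, 3, 0))` falls on a Saturday night, so the sketch adds a voice call to the push and SMS notifications.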
PagerDuty can automatically group related alerts into a single incident to minimize noise while centralizing relevant context. Each PagerDuty service has associated notification and escalation rules, called escalation policies, which automatically route issues to the people best able to resolve them. For example, you might create an escalation policy for your security incident response team and use it for all services that integrate with SOC operations. This ensures that security issues are always forwarded to a security specialist.
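One way to picture an escalation policy is as an ordered list of levels, each with a time limit before the incident moves on to the next level. The sketch below is a simplified model of that idea, not PagerDuty's data model or API: `EscalationRule`, its fields, `current_level`, and the example team names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    # Minutes this level has to acknowledge before the next level is notified.
    escalate_after_minutes: int
    # Who is notified at this level (illustrative names only).
    targets: tuple


def current_level(rules, minutes_unacknowledged):
    """Return the index of the level being notified after the given wait.

    The incident starts at level 0 and moves up one level each time a
    level's time limit passes without acknowledgement. It stays on the
    last level once the policy is exhausted.
    """
    if not rules:
        raise ValueError("an escalation policy needs at least one rule")
    elapsed = 0
    for index, rule in enumerate(rules):
        elapsed += rule.escalate_after_minutes
        if minutes_unacknowledged < elapsed:
            return index
    return len(rules) - 1


# Example policy for a security incident response team.
SECURITY_POLICY = [
    EscalationRule(15, ("security-on-call",)),
    EscalationRule(30, ("security-secondary",)),
    EscalationRule(30, ("security-manager",)),
]
```

With this policy, an incident left unacknowledged for 20 minutes is at level 1, so `SECURITY_POLICY[current_level(SECURITY_POLICY, 20)].targets` gives the secondary responder.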
Reporting and Integration
PagerDuty can operate as a standalone service or be integrated with just about any monitoring service. You can integrate virtually any monitoring tool with PagerDuty via email or API. Commonly supported monitoring tools include Nagios, Pingdom, Amazon CloudWatch, Splunk, and Datadog, among others.
PagerDuty provides various exportable out-of-the-box reports, depending on your subscription plan. Summary metrics are also provided, including mean time to acknowledge, mean time to resolve, and the number of escalated incidents. The real power of PagerDuty lies in its ability to integrate with a large number of third-party infrastructure and application monitoring tools, including Datadog, to enrich and accelerate your incident response process. When you integrate PagerDuty with a third-party monitoring tool, an event in the tool can immediately trigger an incident in PagerDuty. The incident creates an alert, which kicks off your team’s incident response process so the right person can acknowledge and resolve it.
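The summary metrics mentioned above are simple averages over incident timestamps. The sketch below shows how mean time to acknowledge and mean time to resolve could be computed from exported incident data. It is an illustration only: the field names `created_at`, `acknowledged_at`, and `resolved_at`, the sample incidents, and the `mean_minutes` helper are assumptions, not the format of PagerDuty's reports.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_field, end_field):
    """Average minutes between two timestamps, skipping incomplete incidents.

    Returns None when no incident has the end timestamp set.
    """
    durations = [
        (incident[end_field] - incident[start_field]).total_seconds() / 60
        for incident in incidents
        if incident.get(end_field) is not None
    ]
    return mean(durations) if durations else None


# Made-up sample data for illustration.
INCIDENTS = [
    {
        "created_at": datetime(2024, 1, 8, 10, 0),
        "acknowledged_at": datetime(2024, 1, 8, 10, 4),
        "resolved_at": datetime(2024, 1, 8, 10, 30),
    },
    {
        "created_at": datetime(2024, 1, 8, 12, 0),
        "acknowledged_at": datetime(2024, 1, 8, 12, 10),
        "resolved_at": datetime(2024, 1, 8, 13, 0),
    },
    {
        # Still open: acknowledged but not yet resolved.
        "created_at": datetime(2024, 1, 8, 14, 0),
        "acknowledged_at": datetime(2024, 1, 8, 14, 1),
        "resolved_at": None,
    },
]

mtta = mean_minutes(INCIDENTS, "created_at", "acknowledged_at")  # 5.0 minutes
mttr = mean_minutes(INCIDENTS, "created_at", "resolved_at")      # 45.0 minutes
```

Here both metrics are measured from the moment the incident is created, and open incidents are excluded from the resolve average rather than counted as zero.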
Licensing and Price Plans
PagerDuty’s licensing model is based on per-user, per-month price plans (billed monthly or annually), and it is free for up to 5 users. The premium plans are Professional, Business, and Digital Operations, as shown in Table 1.0 below. However, the pricing model requires customers to move to higher tiers to unlock complete reporting and other essential add-on features, which may be costly for small and mid-sized organizations.
Plan | Free | Professional | Business | Digital Operations |
---|---|---|---|---|
Max users per account | 5 | Unlimited | Unlimited | Unlimited |
Features | On-call scheduling, unlimited API calls, 600+ out-of-box integrations | Free plan features + unlimited international phone/SMS notifications, schedules, escalation policies, SSO, ticketing integrations, email and chat support | Professional plan features + streamlined incident response, advanced admin features, ITSM integrations, unlimited data access | Business plan features + intelligent noise reduction, real-time context for faster triage, advanced analytics, customizable operations dashboard, live call routing |
SMS/Phone notifications | 100/month | Unlimited | Unlimited | Unlimited |
Price | $0 for up to 5 users | $19 per user/month | $39 per user/month | On request |
Table 1.0 | PagerDuty licensing and price plans
A 14-day free trial with full access to all the features is available with no credit card required. During the free trial, you can send phone and SMS notifications free of charge. After the 14-day trial, the account defaults to the Free plan unless you upgrade to the premium plan of your choice.
On the flip side, the reporting feature appears to be tailored toward operational use. Customers who want to use an external reporting solution will have to upgrade to the more expensive premium plans.
The Best PagerDuty Alternatives
- Opsgenie: A well-known on-call and alert management application that helps organizations operate always-on services. Opsgenie is one of PagerDuty’s major competitors. It integrates with many third-party monitoring and ticketing tools, such as Datadog, Slack, Jira, and Amazon CloudWatch. Key features include on-call management, alerting, reporting, and analytics. A free 14-day trial is available on request.
- xMatters: A service reliability platform that helps organizations and key teams, such as DevOps and SOC teams, automate workflows, receive actionable notifications, and keep infrastructure and application services available. The platform offers signal intelligence, workflow automation, on-call management, incident response, and analytics. A free online demo and a free trial are available on request.
- Splunk On-Call: Incident management software that captures essential remediation data from various sources to support effective incident response. Formerly VictorOps, it was acquired by Splunk and enhanced with better context, intelligent escalation policies, machine learning, and streamlined collaboration to extend alerting and messaging across all Splunk products. This allows organizations to reuse existing team contacts, schedules, and escalation policies to get Splunk alerts to the right teams and people. A free 14-day trial is available on request.
- Uptime: Like PagerDuty, Uptime is a SaaS-based incident management platform that monitors your infrastructure and provides alerts and notifications about critical events. Other use cases include uptime monitoring and a public status page for communicating downtime to the right people on your team. Price plans include a free version for your projects.
- Alertops: A SaaS-based response automation and incident alert management platform with the capabilities needed to deliver reduced downtime, faster time to resolution, operational efficiency, and a better customer experience. A free personalized demo and a 14-day free trial are available on request.
- ServiceNow Cloud Observability (formerly Lightstep): A unified monitoring, observability, distributed tracing, and incident response platform that aims to reduce noise and alert fatigue. Lightstep was acquired by ServiceNow in 2021 and was subsequently integrated with ServiceNow’s task automation, workflow management, and collaboration capabilities. The platform is built for DevOps teams and SREs, and it’s a good alternative to PagerDuty. A demo is available on request.
- Moogsoft: A SaaS-based artificial intelligence for IT operations (AIOps) platform that helps businesses analyze data, detect anomalies, diagnose the root cause of a problem, and send alerts. It uses AI to provide intelligent monitoring, detection, and reporting of incidents to ensure uninterrupted service delivery and an improved customer experience. A demo is available on request.
- OnPage: A SaaS-based incident alert management platform that delivers persistent alerts to mobile phones to inform teams of a critical situation and elicit a fast response. OnPage is used in hospitals, security operations centers (SOCs), DevOps, supply chain operations, and more to drive fast incident remediation. A free 7-day trial is available on request.
- Squadcast: An all-in-one incident management solution that provides actionable alerts, notification rules, escalations, schedules, runbooks, and more. Squadcast is a good PagerDuty alternative for Site Reliability Engineering (SRE) and DevOps teams. The software integrates with popular third-party performance monitoring and communication tools such as Datadog, New Relic, Papertrail, and Slack. A free trial is available on request.
- Datadog: A SaaS-based infrastructure monitoring service for cloud applications, servers, databases, tools, and services. The Datadog platform also includes an Incident Management module that lets teams manage their incident response workflows and resolve issues directly in Datadog. Its goal is to automate as much as possible of the process of analyzing alerts, creating incidents, and identifying the team needed to resolve them, without switching tools.
Concluding Remarks
PagerDuty is an excellent incident response and alerting service that is both straightforward and powerful. It keeps every member of your team in the loop regarding IT infrastructure status. MSPs and other organizations committed to keeping mean time to resolution (MTTR) as low as possible will find PagerDuty very appealing.
You should consider using PagerDuty if:
- Your organization needs customized notification rules that fire when your existing monitoring tools report crossed thresholds or unexpected anomalies.
- You need a centralized view of the overall health of your systems and operations, no matter how many tools, services, or applications your team is managing.
- You want to extend agile incident management workflows to your existing IT operations with on-call scheduling, automated escalations, incident tracking, and more.