This article was produced in collaboration with Statuspage as part of our supported channels program. We appreciate their support as it helps us invest in content for the greater Support Driven community.
Want to be part of the discussion? Join us in #incident-comms
If you’ve ever been tasked with writing a status update when things are on fire, you know that it’s hard. It’s often urgent and stressful, and it requires clear and accurate communication (often before you’re clear on the details yourself).
When we saw this tweet from Ben McCormack, Head of Support at FullStory, we knew we had to dig into the topic further:
What makes one-to-many communication so much harder than one-to-one communication? And how can we ease this angst for folks tasked with the job of getting the word out to customers when things go wrong?
While (thankfully) incidents are rare for FullStory, Ben and his team have developed helpful practices to ensure downtime doesn’t equal disaster for their support team or their customers. He was kind enough to sit down with us and share their techniques for combating stress and writer’s block during an incident:
1. Define the situations that warrant customer communication (before an incident strikes)
When things are on fire, you don’t want to waste time determining whether you should communicate the problem to customers. This ‘should we/shouldn’t we’ debate is harmful in a couple of ways: 1) If you end up determining that comms are needed, it keeps customers in the dark longer than necessary, and 2) It causes internal confusion that could have been sorted out before the incident, saving your team time and strife.
Ben’s team recognized this and decided to create a “Statuspage Constitution.” This constitution is essentially a document that lists the conditions that must be true in order for the team to post to Statuspage during an incident. The conditions focus on incident severity, expected duration, and customer impact:
- A core piece of FullStory functionality is broken, nonfunctional, or experiencing significant performance degradation.
- The incident persists over a non-negligible period of time.
- The incident impacts a large number of users/customers.
For each bullet point, the Statuspage Constitution includes further definition (e.g. what does “large number of customers” mean?) and examples of what might qualify or be disqualified. It’s a lot of work up front, but the added clarity lets them move fast if something comes up.
As soon as the team is alerted about an issue, they answer the constitution’s questions together in Slack. If they determine that communication is necessary, they spring into action and get the right people on the communication front lines.
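For teams that want to encode a checklist like this in tooling, the three conditions can be sketched as a simple yes/no check. This is only an illustration, not FullStory’s actual process: the `Incident` fields, the function name, and both thresholds are hypothetical, since the article does not say how “non-negligible” or “large number” are defined.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical summary of an incident, as known at triage time."""
    core_feature_degraded: bool  # core functionality broken or badly degraded
    duration_minutes: float      # how long the incident has persisted so far
    affected_customers: int      # rough count of impacted customers


# Assumed thresholds for illustration only; each team would define its own.
MIN_DURATION_MINUTES = 10
MIN_AFFECTED_CUSTOMERS = 100


def should_post_status_update(incident: Incident) -> bool:
    """Return True only when all three conditions hold."""
    return (
        incident.core_feature_degraded
        and incident.duration_minutes >= MIN_DURATION_MINUTES
        and incident.affected_customers >= MIN_AFFECTED_CUSTOMERS
    )


if __name__ == "__main__":
    outage = Incident(core_feature_degraded=True,
                      duration_minutes=25,
                      affected_customers=400)
    print(should_post_status_update(outage))
```

Writing the thresholds down as named constants mirrors the point of the constitution: the debate about what counts happens ahead of time, not in the middle of an incident.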
2. Make incident communication part of your on-call schedule
It’s easy to let communication fall by the wayside during incident response. The dev team is focused on fixing the problem, while the support team is trying to handle the surge in inbound tickets and emails.
The FullStory team ensures communication remains front and center by embedding their support team into their on-call process. There are always support team members on call alongside the dev team, and at least one ‘Incident Hugger’ gets paged when comms are needed. The Incident Hugger’s job is to stay heads-down on customer communication, allowing engineers to stay focused on resolving the incident.
3. Practice, practice, practice
When you work on a support team, communicating with customers on support tickets becomes second nature. Since incidents (hopefully) aren’t an everyday thing, status update practice is not inherent in the role.
As Ben told us:
To actually sit down and write a status update – even if you’re an expert at customer communication – is such a different type of communication and audience you’re trying to speak to. You can’t rely on your expertise in other types of comms.
That’s why setting aside time to practice writing incident updates is so crucial. Ben’s team recommends holding mock incidents or fire drills to get your team comfortable with writing under pressure. They are even working on a series of playbooks for incident response, which will include incident communication fire drills.
4. Breathe, and focus on your goals
Even the most practiced team will encounter stressful situations that they may not feel ready for. In these cases, Ben recommends pausing, taking a breath, and refocusing on your goal. For the FullStory team, the goal is to deliver quick and accurate information to customers while heading off the need for additional follow-up. If they are trying as hard as they can to meet this goal, the rest should fall into place.
Join the incident communication conversation
How do you ease the pain of incident communication? Have any of Ben’s tips worked well for you? We’d love to hear about it in the #incident-comms channel in the Support Driven Slack. Not a member of the Slack yet? Join here!