Better planning for maintenance windows
A few recent changes to our internal downtime process
I participated in my first maintenance window in 1999. Well, let’s say downtime. Or, if we want to be charitable, accident.
A fancy new switch was delivered to the office where I was working at the time, along with a bunch of cables, and my boss said to me, “Plug that into our network!”
You might be able to guess what happened next. I took down three subnets with a network loop. My boss walked in as I was pulling cables out and congratulated me for being the fastest sprinter back to the data center.
And then we had a meeting to talk about how we could keep that from happening again. We talked about spanning tree, about running configurations by a coworker before plugging things in, and about testing.
What I came to respect my boss for was not just that he didn’t hang me out to dry for an amateur mistake (and, hey, I was an amateur!). He also took the time, right away, to fix the process: educating me about what to do next time, with a focus on testing.
At Emma, we have policies in place to manage maintenance, like change management tickets and a weekly meeting among IT and engineering staff to discuss changes. With a growing office in Nashville and database folks in Portland, OR, we recently realized we needed a little more, both to make it clear to folks when a change was coming and to plan for when things didn’t go quite right.
So I set out to document the existing process and add a few extra steps for explicit communication. Here’s what I came up with:
- Create a ticket to track the change.
- Get in touch with folks implementing the change — who will make the change, who will restart, reload and test the affected services.
- Loop in our Community team.
- Choose a date suitable for all involved.
- Set up a pre-change meeting to discuss the changes and rollback plan with all team members — who might think of things that you forgot!
- Announce change to departments.
- Send an announcement to customers through Community.
- Conduct the change!
- Send an announcement that the change was completed through Community.
- Have a post-change meeting with Community to wrap up loose ends, and hopefully celebrate a successful change!
The things we added to the existing process were “Loop in our Community team” and “Have a post-change meeting with Community.” The Community team includes all of Emma’s post-sales support staff and our designers.
In the past, we’d just emailed folks in Community about what we were up to. Now, we’re making them part of our change team. This keeps them better informed, and it allows us to ask for their insight into testing and the impact we’re having on customers.
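If you like to keep this kind of checklist somewhere scriptable, here is a minimal sketch in Python. It is only an illustration: the `Step` class, the `new` flag and the helper names are made up for this example and are not part of any tooling we actually use. The step text comes straight from the list above, with the two added steps flagged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One item on the maintenance-window checklist (hypothetical structure)."""
    description: str
    new: bool = False   # True for steps added in the revised process
    done: bool = False  # flipped to True as the change team works down the list


def build_checklist() -> List[Step]:
    """Return the checklist in the order the steps are meant to happen."""
    return [
        Step("Create a ticket to track the change"),
        Step("Get in touch with folks implementing the change"),
        Step("Loop in our Community team", new=True),
        Step("Choose a date suitable for all involved"),
        Step("Set up a pre-change meeting to discuss the changes and rollback plan"),
        Step("Announce change to departments"),
        Step("Send an announcement to customers through Community"),
        Step("Conduct the change"),
        Step("Send an announcement that the change was completed through Community"),
        Step("Have a post-change meeting with Community", new=True),
    ]


def next_step(checklist: List[Step]) -> Optional[Step]:
    """Return the first step that is not done yet, or None when all are done."""
    for step in checklist:
        if not step.done:
            return step
    return None


if __name__ == "__main__":
    checklist = build_checklist()
    checklist[0].done = True  # pretend the ticket has been created
    upcoming = next_step(checklist)
    print("Next up:", upcoming.description if upcoming else "nothing, all done")
```

Keeping the steps in an ordered list means the "what's next?" question always has one answer, which is most of what a checklist is for.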
Our first crack at the new process was the second weekend in January. We implemented a few configuration changes and upgraded our production databases. Immediate feedback was good. Internally, folks working on the change reported a sense of calm, as well as satisfaction that the change went smoothly. Customers told us they appreciated our communication, but they would have liked a little more notice. Point taken.
And then we had the post-change meeting…
My next post will be all about that meeting. Hopefully, it will give y’all a peek into how we tend to handle suggestions for improvements at Emma.
