At Zalando, complex projects usually end in a roll-out. It could be a new feature, a new design or a completely new country we operate in, as happened with the Market Expansion program, through which we launched in six new markets in summer 2021: Lithuania, Slovenia, Slovakia, Estonia, Latvia and Croatia. Large-scale projects like these especially need careful preparation of the roll-out activities, because so many different teams and units are involved. But the aspects described in this article are also really useful for smaller roll-outs involving only a handful of teams.
I want to highlight the most important lessons from the two roll-outs (three countries each) of the Market Expansion program:
A roll-out is a team effort. To get the right people involved and briefed, each team should nominate a representative and a backup. Because of vacation plans, 24x7 schedules and specialized knowledge of certain components, it worked best in our case to trust the teams to decide who should represent them during the roll-out.
Pre-release checklist with owners, sign-offs by Heads
A roll-out plan contains not only the tasks for the roll-out itself, but also a list of everything that needs to happen beforehand. It is very important to assign a clear owner to each of these items rather than assume they will get done automagically. Additionally, the final sign-off by the Heads, greenlighting the work their teams had done, fed into the overall go/no-go decision. This mechanism was well received in such a large and distributed program.
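To make the idea of "every item has an owner, every Head signs off, and only then is it a go" concrete, here is a minimal Python sketch of how such a checklist could feed a go/no-go decision. It is an illustration under our own assumptions, not the tooling we actually used: the classes `ChecklistItem` and `SignOff`, the function `go_no_go`, and the example tasks and team names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One pre-release task; each item is expected to name an explicit owner."""
    task: str
    owner: str
    done: bool = False


@dataclass
class SignOff:
    """A Head's confirmation that their team's preparation is complete."""
    team: str
    head: str
    approved: bool = False


def go_no_go(checklist: list[ChecklistItem],
             sign_offs: list[SignOff]) -> tuple[bool, list[str]]:
    """Return (go, blockers): 'go' only if no blockers remain."""
    blockers: list[str] = []
    for item in checklist:
        if not item.owner:
            # Unowned items are the ones that silently never get done.
            blockers.append(f"no owner: {item.task}")
        elif not item.done:
            blockers.append(f"open ({item.owner}): {item.task}")
    for s in sign_offs:
        if not s.approved:
            blockers.append(f"missing sign-off from {s.head} ({s.team})")
    return (not blockers, blockers)


# Hypothetical example data, for illustration only.
checklist = [
    ChecklistItem("Routing config PR reviewed and approved", "team-routing", done=True),
    ChecklistItem("Payment methods configured", "team-payments"),
]
sign_offs = [
    SignOff("team-routing", "Head A", approved=True),
    SignOff("team-payments", "Head B"),
]
go, blockers = go_no_go(checklist, sign_offs)
```

Here `go` is `False` and `blockers` lists the open payments task and the missing sign-off from Head B. Returning the list of blockers, not just a boolean, gives the Launch Commander something concrete to chase before the go/no-go call.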
Clear schedule with times and owners
The most important part of the roll-out plan is the actual schedule of tasks that need to happen in order to go live. Here, too, it is really important to be clear about timings (time zones!) and, even more importantly, about who will execute each task. If a planned roll-out step involves pull requests that need to be merged (e.g. routing configurations or similar), they should already be prepared, reviewed and approved as part of the pre-release checklist.
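One way to avoid time zone confusion is to store every step in UTC and render it in each participant's local zone. The Python sketch below assumes exactly that; `Step`, `validate` and `render`, as well as the date, times, actions and owners, are hypothetical and only illustrate the idea. It also flags steps without an owner and steps that are out of order.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass
class Step:
    start_utc: datetime  # expected to be timezone-aware and in UTC
    action: str
    owner: str           # the person executing the step, not just a team


def validate(schedule: list[Step]) -> list[str]:
    """Flag steps that would be ambiguous on roll-out day."""
    problems: list[str] = []
    for i, step in enumerate(schedule):
        if step.start_utc.tzinfo is None:
            problems.append(f"step {i}: naive timestamp (time zone unknown)")
        if not step.owner:
            problems.append(f"step {i}: no owner")
    for i in range(1, len(schedule)):
        prev, cur = schedule[i - 1].start_utc, schedule[i].start_utc
        # Only compare aware timestamps; naive ones were already flagged.
        if prev.tzinfo is not None and cur.tzinfo is not None and cur < prev:
            problems.append(f"step {i}: starts before step {i - 1}")
    return problems


def render(schedule: list[Step], tz_name: str) -> list[str]:
    """Show the schedule in a participant's local time zone."""
    tz = ZoneInfo(tz_name)
    return [f"{s.start_utc.astimezone(tz):%H:%M %Z}  {s.action}  ({s.owner})"
            for s in schedule]


# Illustrative date and times only, not the real Market Expansion schedule.
schedule = [
    Step(datetime(2024, 6, 3, 6, 0, tzinfo=timezone.utc), "Merge routing PR", "alice"),
    Step(datetime(2024, 6, 3, 6, 30, tzinfo=timezone.utc), "Enable new country", "bob"),
]
```

With these values, `render(schedule, "Europe/Berlin")` shows the first step at 08:00 CEST, while `render(schedule, "Europe/Vilnius")` shows it at 09:00 EEST: the same moment, with no room for interpretation.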
Have a Launch Commander
This can be either the Technical Program Manager or one of the Principal Engineers involved. The Launch Commander guides the teams through the roll-out and moderates the roll-out chat and, if necessary, the situation room call. We also strongly recommend having a backup Launch Commander available.
Common chat room (tag everyone in advance)
To coordinate the roll-out, especially with everyone working remotely, a common chat room worked best in our experience. Every representative was tagged a few days in advance so that they would receive notifications from this chat. The Launch Commander coordinates the roll-out steps in this chat room, gets each checkpoint approved and only then greenlights the next step.
Situation room meeting in the representatives' calendars
In addition to the chat room, we added a Situation Room meeting to the calendars of everyone involved. If needed, people could join that call right away; luckily, we did not need it for either roll-out of the Market Expansion program. The invite also served a second purpose: it put the roll-out in everyone's calendar so that they could move or decline other meetings accordingly.
Although the teams are responsible for monitoring their own components during and after the roll-out, we also collected the most important dashboards in the roll-out plan, not least to watch the first real customer orders come in, which is a unique experience when going live in a new country.
Rollback or roll forward?
Another important question is what to do if reality does not stick to the plan and something unforeseen happens. Usually, a rollback strategy consists of exit criteria and the rollback steps themselves: if one or more of the criteria are met, the rollback is executed. For the Market Expansion roll-outs, we agreed beforehand with all involved teams and the 24x7 group that any incident during the roll-out would be handled like a live incident, following the normal, well-established process. Additionally, as we expected traffic to be rather low right after the roll-out, we chose a roll-forward strategy for incidents: we would have fixed them right away without rolling back, as only a limited number of customers would have been affected.
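The decision logic described above (check the exit criteria; if any are met, either roll back or, under a roll-forward strategy, fix in place) can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: `ExitCriterion`, `triggered`, `decide`, the metric name and the threshold are hypothetical, and real criteria would come from the teams' own dashboards. Treating a missing metric as triggered is a deliberate choice here, since a blind spot during a roll-out should not count as "all good".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExitCriterion:
    name: str
    metric: str
    threshold: float  # the criterion is met when the observed value exceeds this


def triggered(criteria: list[ExitCriterion], observed: dict[str, float]) -> list[str]:
    """Return the names of all criteria that are met (or cannot be evaluated)."""
    hits: list[str] = []
    for c in criteria:
        if c.metric not in observed:
            hits.append(f"{c.name} (no data)")
        elif observed[c.metric] > c.threshold:
            hits.append(c.name)
    return hits


def decide(criteria: list[ExitCriterion], observed: dict[str, float],
           strategy: str = "rollback") -> str:
    """Map triggered exit criteria to an action under the agreed strategy."""
    if strategy not in ("rollback", "roll-forward"):
        raise ValueError(f"unknown strategy: {strategy}")
    if not triggered(criteria, observed):
        return "continue"
    if strategy == "roll-forward":
        return "roll forward: fix in place via the normal incident process"
    return "roll back"


# Hypothetical criterion, for illustration only.
criteria = [ExitCriterion("checkout errors", "checkout_error_rate", 0.05)]
```

Agreeing on the `strategy` value before the roll-out, rather than debating it during an incident, is the point of the sketch: the same triggered criterion leads to a rollback in one setup and to a fix-forward in the other.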
Two successful, incident-free roll-outs showed how important the detailed preparation was and what a key role the roll-out plan played.