A little chaos ensued on the Kusama network, Polkadot’s canary-net. Here’s a quick summary of the cause and solution.
At around midday UK time on Saturday, a referendum ended on an upgrade proposal (ostensibly meant to update the Kusama blockchain to Kusama runtime version 1034). The associated upgrade happened, switching out the core logic of the Kusama blockchain with the new logic of the upgrade. However, due to a naming issue related to a recent change which split apart the Kusama logic from the provisional Polkadot logic, the upgrade inadvertently updated the chain not to the Kusama runtime but to the provisional Polkadot mainnet runtime! Due to a difference in the parameters of the Babe consensus algorithm between the two, the new runtime was incompatible with the Kusama client code. Babe stopped producing blocks and Kusama became bricked.
Kusama finalises blocks one block before the head of the chain. This means that it’s possible, with the support of 50% of the validators, to “roll back” a block in which some change occurs that causes the chain to halt, and change matters, perhaps by altering the transactions/extrinsic data that are fed into it. In our case, this provided no solution since the referendum would always end at the beginning of block #516559, causing the code to be changed. Rolling back just one block was therefore useless: there was no known transaction that could be constructed or removed which could avert the problematic change of code.
Thankfully it was also clear that we didn’t need to roll back much more than one block: the upgrade proposal referendum had few votes, and a single vote being switched from “approve” to “reject” on the upgrade proposal would be enough to ensure it did not get passed. This vote transaction could happen at any point up until block #516558, the block immediately before the votes are counted and the proposal executed (or not). In theory we only needed to roll back two blocks, though in practice a few more would be helpful to give a safety margin to ensure the needed transaction was included.
However, the niggle was that one of those two blocks had already been finalised by our finality gadget Grandpa. Rollbacks in such circumstances have been intentionally designed to be practically impossible, at least without the support of a vast super-majority of network participants and a deep knowledge of the codebase. Rolling back the Grandpa state would be difficult but not impossible, assuming we had the coordination of the Kusama validator community.
The final sticking point was the fact that Babe, our block production scheme, is designed with the assumption that at least a single block would be authored every Babe session. In Kusama, Babe sessions are every hour. No fix was ever going to be rolled out across the 130 validators of Kusama in the first hour after halting; indeed, by the time our initial analysis of the situation was complete, the network had already been halted for two hours. This meant that even if we could revert Grandpa and build an alternative block which averted the problematic upgrade, the chain would still be bricked shortly afterwards once the validator nodes attempted to build on the chain after more than one hour of stalling.
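To make the one-block-per-session assumption concrete, here is a minimal sketch of the check in Python. It is a toy model, not Babe's actual logic: the function name `first_empty_session`, the fixed one-hour windows counted from an arbitrary `start`, and the use of plain Unix-style timestamps are all assumptions made for illustration.

```python
from typing import Iterable, Optional

# The post says Kusama's Babe sessions last one hour.
SESSION_SECONDS = 3600


def first_empty_session(
    block_times: Iterable[int],
    start: int,
    end: int,
    session_seconds: int = SESSION_SECONDS,
) -> Optional[int]:
    """Return the index of the first *completed* session with no block.

    Sessions are modelled as fixed windows of `session_seconds` counted
    from `start` (a hypothetical simplification). Only sessions that have
    fully elapsed by `end` are checked. Returns None if every completed
    session contains at least one block.
    """
    completed = (end - start) // session_seconds
    # Set of session indices in which at least one block was authored.
    seen = {(t - start) // session_seconds for t in block_times if t >= start}
    for index in range(completed):
        if index not in seen:
            return index
    return None


# A healthy chain: one block in each of the first three sessions.
healthy = [10, 3700, 7300]
# A stalled chain: the last block lands in session 0, then nothing.
stalled = [10]

print(first_empty_session(healthy, start=0, end=3 * SESSION_SECONDS))
print(first_empty_session(stalled, start=0, end=3 * SESSION_SECONDS))
```

In this model the stalled chain already has an empty session once the first full hour without a block has gone by, which is the situation described above: by the time any fix could be deployed, the assumption had been violated.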
Inventing a DeLorean
In fact, we needed to think a bit more creatively. If we couldn’t author blocks as of now, then we would need to change things so that the missing blocks were all authored as Babe expected, at least one per hour. In fact we want more than one per hour in order to give validators a chance to notice that the sessions are going by and signal that they are online; without that signal, good validators would be kicked out and possibly slashed for being “offline”.
The answer was to roll back not just a few blocks, but also time itself, at least from the point of view of Babe. As of 9am GMT on the 7th of January 2020, the Kusama validators will believe that they have gone back in time to the 4th of January 2020 12:10:12 GMT, around 25 minutes (and 48 blocks) before the problematic upgrade event happens. This gives us time to reject the upgrade proposal and prevent Kusama from becoming a canary-shaped brick.
But, unless we want to live two and a half days in the past indefinitely (we don’t), then our avian DeLorean also needs a way to get Back to the Future. We can’t jump directly into the future like Doc and Marty; Babe would notice the lack of blocks produced between now and then and halt.
It’s Just a Jump to the Left…
Thankfully when you can control the external environment for a blockchain, all sorts of things become possible. Not only can you jump backwards and forwards in time but you can also do subtler things like speeding time up.
So, Kusama will return to the present day following its visit to the past, but will do so not through a single jump but instead by warping time itself, specifically by a factor of six, essentially creating a bubble between the real world and Kusama. Inside the bubble, time runs at six times the speed of time outside the bubble. This gives validators a chance to produce blocks and state that they are online (though they have only one sixth of the usual time, so they’d better be quick!). It keeps Babe happy as from the point of view of the chain, encased inside the bubble and unable to see out, things are happening perfectly normally.
Of course we can’t stay at warp speed forever or Kusama would zoom off into the future and leave us all behind. So we built in an automatic off-switch; once Kusama catches up to our present time and the clock inside the bubble is the same as the clock outside, then the bubble disappears and things return completely to normal.
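The bubble and its off-switch can be sketched as a small clock function. This is only an illustration under stated assumptions, not the code that was shipped: the names `FORK_TIME`, `REVIVE_TIME`, `WARP_FACTOR` and `chain_time` are hypothetical, the restart instant is taken from the round "9am GMT" figure above, and the behaviour before the restart (no warp applied) is a choice made for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the post; the constant names are hypothetical.
FORK_TIME = datetime(2020, 1, 4, 12, 10, 12, tzinfo=timezone.utc)  # where the chain "wakes up"
REVIVE_TIME = datetime(2020, 1, 7, 9, 0, 0, tzinfo=timezone.utc)   # real-world restart (round figure)
WARP_FACTOR = 6                                                    # bubble time runs six times faster


def chain_time(real_now: datetime) -> datetime:
    """Map a real-world instant to the time the chain believes it is.

    Before the restart no warp is applied (an assumption of this sketch).
    After it, the chain clock starts at FORK_TIME and advances WARP_FACTOR
    times faster than the real clock. Taking the minimum with the real
    clock is the off-switch: once the warped clock would overtake reality,
    the chain simply follows real time again.
    """
    if real_now < REVIVE_TIME:
        return real_now
    elapsed = real_now - REVIVE_TIME
    warped = FORK_TIME + elapsed * WARP_FACTOR
    return min(warped, real_now)


# One real hour after the restart, six hours have passed inside the bubble.
print(chain_time(REVIVE_TIME + timedelta(hours=1)))  # 2020-01-04 18:10:12+00:00
```

Because the warped clock gains five seconds on the real one for every real second that passes, the gap closes gradually rather than in one jump, and the exact moment the bubble disappears depends on the real restart instant.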
If it all works, then the upshot is that Kusama, at around 9pm GMT tomorrow, will return from its trip, with a “normal” history in place and none the wiser that anything out of the ordinary happened at all.