Shortly before 11:40 UTC we began a server-side data migration to support a new Proton feature. Unfortunately, a bug in this migration caused our servers to send some users' mail clients a signal instructing them to reset their local cache and redownload all data from the server. At 11:40 UTC this mass resynchronization pushed one of our storage layers to critical load, and it began struggling to serve requests. To reduce the load and protect other services such as Proton Drive and VPN, we took Mail and Calendar offline. We then restored the offline services gradually, and all services except Proton Mail Bridge were back by approximately 12:44 UTC.
We have reverted the change, so any clients that were offline at the time of the signal will not reset their cache, and most mail clients will have resynchronized very quickly. Proton Mail Bridge, however, is a special case. While it was not the root cause of the issue, it maintains a complete offline copy of all emails and attachments, so Bridge resynchronizations caused the most additional load. As a result, the best way to manage the recovery of Mail services was to slow down Bridge resynchronization, which took some time to put in place. Bridge service is now restored as well, though resynchronizations will take longer than usual. We will lift these limits once load on our infrastructure returns to normal levels.

As always, we will conduct a thorough post-mortem to identify exactly what happened and how, and to study how we can both avoid a recurrence and improve our response to future issues. We deeply appreciate your patience and apologize for the inconvenience.