Spreedly Subscriptions went into emergency maintenance mode for about 50 minutes today. We apologize for the incident and any inconvenience it caused. We are acutely aware that you depend on our availability, and we take that responsibility seriously. Ironically, what triggered the incident was a small part of a much larger plan to move data centers with zero downtime. This is not how we wanted to break the news of our new facilities.

To make the move, we needed to extend our MySQL replication chain to the new data center. In the process, a database dump intended for the primary database at the new data center was accidentally loaded onto the primary database at the production data center. Loading the dump dropped and reloaded tables while the site was still online. We refer to this timeframe as the “disaster window.” We caught the error when we noticed that the new database’s storage wasn’t filling. At that point, we went into emergency maintenance mode to begin recovering.
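One way to guard against this kind of mix-up is to make the load step verify its target before touching any data. The following is a minimal sketch of that idea, not a description of our tooling: the host names, the `PROTECTED_HOSTS` set, and the `check_load_target` function are all hypothetical and exist only for illustration.

```python
# Hypothetical host name, for illustration only.
PROTECTED_HOSTS = frozenset({"db-primary.old-dc.example.com"})


class UnsafeTargetError(Exception):
    """Raised when a dump load is aimed at a host it must not touch."""


def check_load_target(target_host, expected_host, protected_hosts=PROTECTED_HOSTS):
    """Return the normalized target host, or raise if the load is unsafe.

    The load is refused when the target is on the protected list, or when
    it differs from the host the operator said they intended to load.
    """
    target = target_host.strip().lower()
    expected = expected_host.strip().lower()
    protected = {host.strip().lower() for host in protected_hosts}

    if target in protected:
        raise UnsafeTargetError(f"{target} is a protected production host")
    if target != expected:
        raise UnsafeTargetError(f"target {target} does not match expected {expected}")
    return target
```

A wrapper script would call `check_load_target` with the host it is about to connect to and the host named in the change plan, and only start the import if no exception is raised.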

While in maintenance mode, we restored all data from the latest backup. That backup had originally been made to seed the new MySQL nodes at the new data center, so it was less than an hour old. Once it was imported, we brought the site back online. We then manually handled the tough work of restoring the data that had arrived via the API during the disaster window. With this done, we were fully recovered.
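For readers who want a picture of that last step, here is a minimal sketch of how one might pick out the API writes that arrived during the disaster window so they can be reviewed and reapplied in order. It assumes a log of incoming requests with timestamps is available, which is an assumption made for this example; the `ApiRequest` type and the `requests_to_replay` function are hypothetical, and our actual recovery was done by hand.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple

# HTTP methods that change data; reads do not need to be replayed.
WRITE_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})


class ApiRequest(NamedTuple):
    """Hypothetical record of one logged API request."""
    received_at: datetime
    method: str
    path: str


def requests_to_replay(
    requests: Iterable[ApiRequest],
    window_start: datetime,
    window_end: datetime,
) -> List[ApiRequest]:
    """Return write requests received inside the window, oldest first."""
    selected = [
        request
        for request in requests
        if window_start <= request.received_at <= window_end
        and request.method.upper() in WRITE_METHODS
    ]
    return sorted(selected, key=lambda request: request.received_at)
```

Sorting by arrival time matters because later writes may depend on earlier ones, for example an update to a record created a few minutes before.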

Again, our apologies for any issues this may have caused you and your business. In brighter news, the original task of enabling site-to-site database replication is now complete, which puts us one step closer to a very big upgrade for Spreedly and its customers.
