Outbound Mail Outage
Incident Report for MailChannels
Postmortem

MailChannels Cloud operates across several completely independent clusters so that a software configuration issue in one cluster cannot take down the entire service. Today, one of these clusters lost the ability to authenticate SMTP connections properly and returned 421 temporary-failure responses to SMTP clients. Any SMTP client that reached this cluster through our load balancers received this response and was unable to submit messages for delivery. Because of the way the load balancers work, some clients may have repeatedly reached the same failing cluster rather than being load-balanced to other, working clusters.

As an initial fix, we disabled the cluster with the authentication issue so that all SMTP connections were handled by working clusters. While the temporary failures were occurring, customers would have seen email delivery delays and growing queues.
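
For readers wondering why a 421 response leads to delays and growing queues rather than bounced mail, the following is a minimal sketch of how a sending client might treat SMTP reply codes. It is an illustration only, not a description of MailChannels software or of any particular mail server: the names classify_reply, backoff_delays, and OutboundQueue, and the retry intervals, are hypothetical. The only fact relied on is the SMTP convention that 4xx replies are transient and 5xx replies are permanent.

```python
from dataclasses import dataclass, field
from typing import List


def classify_reply(code: int) -> str:
    """Classify an SMTP reply code by its first digit (SMTP convention)."""
    if 200 <= code < 400:
        return "positive"    # 2xx/3xx: command accepted or in progress
    if 400 <= code < 500:
        return "transient"   # 4xx, e.g. 421: try again later
    if 500 <= code < 600:
        return "permanent"   # 5xx: do not retry, report failure to the sender
    raise ValueError(f"not an SMTP reply code: {code}")


def backoff_delays(attempts: int, base: float = 60.0, cap: float = 3600.0) -> List[float]:
    """Hypothetical retry schedule in seconds: doubles each attempt, up to a cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


@dataclass
class OutboundQueue:
    """Toy model of a sender's queue (illustrative, not a real MTA)."""
    pending: List[str] = field(default_factory=list)    # waiting for a retry
    delivered: List[str] = field(default_factory=list)  # accepted by the server
    bounced: List[str] = field(default_factory=list)    # permanently rejected

    def handle_reply(self, message_id: str, code: int) -> None:
        kind = classify_reply(code)
        if kind == "transient":
            # The message stays queued, so the queue grows while 421s persist.
            self.pending.append(message_id)
        elif kind == "permanent":
            self.bounced.append(message_id)
        else:
            self.delivered.append(message_id)
```

Under these assumptions, a message that receives a 421 stays in the sender's queue and is retried later, which is consistent with the delays and queue growth described above.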

We have identified the root cause of the authentication failures and have rolled out a permanent fix. We will also be improving our monitoring to ensure that our team will be notified more rapidly if a condition like this ever occurs in future.

Posted May 09, 2019 - 16:59 PDT

Resolved
The issue has been resolved. We will post a postmortem for the issue soon.
Posted May 09, 2019 - 13:29 PDT
Monitoring
Mail is being processed normally; however, we are operating with less redundancy than normal in our infrastructure.
Posted May 09, 2019 - 07:03 PDT
Investigating
Starting at approximately 1 AM Pacific time, we were alerted to mail flow issues in our outbound service and have been working to resolve them since.

As of 2:30 AM Pacific time, all mail flow has been redirected through one of the working clusters while we continue to investigate what caused the issues in the failed cluster.

There should be no mail flow issues at the moment. If you are still facing issues, please contact support@mailchannels.com and we will investigate.
Posted May 09, 2019 - 03:35 PDT
This incident affected: Outbound Filtering (SMTP Service).