SMTP Service Outage
Incident Report for MailChannels
Postmortem

Post Mortem - Outbound SMTP Service Degradation on Oct 24

On October 24th, between 15:32 and 17:25 GMT, there was an outage in our Outbound Filtering service. During the outage, our SMTP service did not accept connections from upstream servers, which caused queues to grow on customers' servers. To our knowledge, no email messages were lost. We have now identified the root cause of these delays.

What happened?

A configuration error was introduced into our policy system during a routine deployment of new software. The error degraded a core policy system, which caused the service to accept new SMTP connections from customers’ servers very slowly. Our monitoring systems notified us of the issue at 15:32 GMT, within one minute of the error being introduced. We quickly identified the root cause and started rolling back the deployment that caused the issue. By 17:25 GMT, all services were completely restored.

We are looking into how a failure like this can be avoided in the future and how we can recover faster when a new deployment causes issues.

Posted Oct 25, 2019 - 07:57 PDT

Resolved
We now consider this incident to be resolved. As mentioned earlier, a post-mortem will be provided. The outage began at 15h30 UTC and was fully resolved at 17h05 UTC.
Posted Oct 24, 2019 - 10:30 PDT
Monitoring
A fix has been implemented and we are closely monitoring things to ensure the service has indeed been fully restored. Once this issue has been completely resolved, we will be providing a post-mortem.
Posted Oct 24, 2019 - 10:24 PDT
Update
Our team is making good progress toward resolving the issue and mail is flowing in bursts; however, we are not ready to consider the systems fully operational.
Posted Oct 24, 2019 - 10:23 PDT
Update
We are continuing to investigate the root cause of this disruption. At present, connections are being handled extremely slowly. No email is being lost; however, messages are being queued on customers' servers in response to connection timeouts.
Posted Oct 24, 2019 - 09:28 PDT
Update
We have added extensive new capacity in an attempt to resume processing email. SMTP traffic is now being processed; however, extreme delays remain an issue. We are continuing to investigate.
Posted Oct 24, 2019 - 08:49 PDT
Update
We are continuing to investigate this issue.
Posted Oct 24, 2019 - 08:45 PDT
Update
We are not yet aware of the root cause, but have confirmed that SMTP is inaccessible at the present time. The web console continues to be available; however, email traffic is not being processed. To our knowledge, no messages have been lost; however, customers will notice increasing queues.
Posted Oct 24, 2019 - 08:42 PDT
Investigating
Severity Level: 1

The SMTP service is currently down. We are investigating and will provide updates as we have more information to share.
Posted Oct 24, 2019 - 08:37 PDT
This incident affected: Outbound Filtering (SMTP Service).