Degraded Outbound SMTP Service
Incident Report for MailChannels
Postmortem

We regret to inform you that a configuration error caused outgoing email sent through the MailChannels Outbound Filtering Service between 3:20 PM UTC and 6:20 PM UTC on July 13th to be quarantined instead of delivered. The resulting surge in load on our quarantine system caused delivery delays across the entire service.

We have now delivered all of the incorrectly quarantined emails. Messages that are released this way will not appear in the log search page on the console. If you need confirmation of delivery, please contact support@mailchannels.com.

Due to the delays introduced by this problem, some messages may have timed out of the queue. Non-delivery receipts are sent to the sender for all messages that time out this way. The timed-out messages will have the following error message:

"delivery temporarily suspended: conversation with
outbound-smtp-gateway.outbound.svc.cluster.local[] timed out
while sending end of data"
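For senders who collect non-delivery receipts programmatically, the following is a minimal Python sketch of how bounces carrying the error above could be picked out for resending. The function name `is_incident_timeout` is hypothetical, and the pattern assumes the bracketed address may be empty or filled in, since the report shows it only as "[]".

```python
import re

# Signature of the timeout error quoted in this report. The bracketed
# address is matched loosely because the report shows it only as "[]".
TIMEOUT_PATTERN = re.compile(
    r"delivery temporarily suspended: conversation with "
    r"outbound-smtp-gateway\.outbound\.svc\.cluster\.local\[[^\]]*\] "
    r"timed out while sending end of data"
)


def is_incident_timeout(ndr_text: str) -> bool:
    """Return True if the bounce text contains the timeout error above."""
    # Collapse line breaks and repeated spaces so wrapped bounces still match.
    normalized = " ".join(ndr_text.split())
    return TIMEOUT_PATTERN.search(normalized) is not None
```

Messages whose receipts match can then be queued for resending; receipts that do not match were bounced for an unrelated reason and need separate handling.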

We deeply regret the issues caused by this configuration error. To ensure that the same problem does not happen again, we have expanded our monitoring of quarantine traffic to detect unusual fluctuations. We have also made changes to strengthen the resilience of our configuration change process in this part of the system to prevent human error.
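As an illustration only, monitoring of this kind can be sketched as a check of each interval's quarantine rate against a rolling baseline. This is not MailChannels' actual monitoring: the class name `QuarantineRateMonitor`, the window size, the threshold, and the minimum-spread floor are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class QuarantineRateMonitor:
    """Flags intervals whose quarantine rate departs from the recent baseline."""

    def __init__(self, window: int = 12, threshold: float = 3.0,
                 min_spread: float = 0.01):
        self.history = deque(maxlen=window)  # recent normal quarantine rates
        self.threshold = threshold           # allowed deviation, in spreads
        self.min_spread = min_spread         # floor so a flat baseline still works

    def observe(self, quarantined: int, total: int) -> bool:
        """Record one interval and return True if it looks unusual."""
        rate = quarantined / total if total else 0.0
        alert = False
        if len(self.history) >= 3:
            baseline = mean(self.history)
            spread = max(stdev(self.history), self.min_spread)
            alert = abs(rate - baseline) > self.threshold * spread
        if not alert:
            # Only normal intervals feed the baseline, so a sustained
            # surge keeps alerting instead of becoming the new normal.
            self.history.append(rate)
        return alert
```

With a baseline of roughly 2% of messages quarantined per interval, an interval in which most messages are quarantined would be flagged immediately, and would keep being flagged until the rate returns to normal.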

Frequently Asked Questions

Did any of my email get lost silently - i.e. without the sender being notified?

A small number of emails may not have been released from the quarantine. If you need confirmation of delivery, please contact support@mailchannels.com.

Due to the delays introduced by this problem, some messages may have timed out of the queue. Senders will always receive a “non-delivery receipt” letting them know that one of their messages could not be delivered. To resolve this, resend the affected messages.

How can I be sure this won’t happen again?

MailChannels has fully identified the cause of this failure and we will make changes to ensure it cannot happen again.

If you have any further questions or concerns about this, please do not hesitate to reach out to us at support@mailchannels.com.

Posted Jul 14, 2021 - 13:41 PDT

Resolved
We have identified the root cause of this issue. A configuration error resulted in emails sent between 3:20 PM UTC and 6:20 PM UTC being inadvertently quarantined. The quarantine system became overloaded, which caused delivery delays to continue for some time after the configuration error was corrected. We are releasing incorrectly quarantined emails, but it may be several hours before all such messages are released. Messages that are released this way will not appear in the log search page on the console. If you need confirmation of delivery, please contact support@mailchannels.com.

Due to the delays introduced by this problem, some messages may have timed out of the queue. Non-delivery receipts are sent to the sender for all messages that time out this way. The timed-out messages will have the following error message:

"delivery temporarily suspended: conversation with
outbound-smtp-gateway.outbound.svc.cluster.local[] timed out
while sending end of data"

To resolve this, resend the email; it will then be delivered.

We will post a full postmortem soon.
Posted Jul 13, 2021 - 17:16 PDT
Update
Emails in our queue from earlier this morning are still slowly being delivered. It might take a few hours for all the queued emails to be delivered.
Posted Jul 13, 2021 - 16:31 PDT
Update
Emails in our queue from earlier this morning are slowly being delivered. It might take a few hours for all the queued emails to be delivered.
Posted Jul 13, 2021 - 14:42 PDT
Update
We are working on clearing out the backlog of emails from our queues.
Posted Jul 13, 2021 - 13:39 PDT
Update
We have scaled up our infrastructure to handle the emails that may have backed up. We are continuing to investigate and fix the root cause.
Posted Jul 13, 2021 - 12:54 PDT
Investigating
We are investigating increased latency in the delivery of outbound SMTP emails. We will update this page as more information becomes available.
Posted Jul 13, 2021 - 10:00 PDT
This incident affected: Outbound Filtering (SMTP Service).