Recent service issues with MailChannels Outbound Filtering
Incident Report for MailChannels
Postmortem

Between September 30th and October 2nd, our Outbound Filtering service periodically became overloaded because of a coordinated attack on our infrastructure by automated spam bots. The overload delayed email delivery for some customers. We have now identified the root cause of these delays and have made changes to prevent the same situation from recurring.

What happened?

A flood of invalid authentication attempts from automated spam bots placed unexpected, sudden load on our SMTP service. One of our internal components responded to the additional load by artificially slowing down its response rate to incoming connections. This “load shedding” behaviour led to client queues growing as incoming SMTP connections piled up waiting for service.
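To illustrate why slowing the response rate makes queues grow, here is a minimal sketch of the arithmetic. The function name and the rates are hypothetical and chosen only for illustration; they are not measurements from the incident.

```python
def backlog_over_time(arrivals_per_sec, served_per_sec, seconds):
    """Return the number of waiting connections at the end of each second.

    Hypothetical model: a fixed number of connections arrives every second
    and a fixed number is served every second.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        # The backlog grows by the shortfall between arrivals and service,
        # and can never drop below zero.
        backlog = max(0, backlog + arrivals_per_sec - served_per_sec)
        history.append(backlog)
    return history


# Illustrative numbers only: when a component sheds load and serves fewer
# connections than arrive, the backlog grows every second.
print(backlog_over_time(arrivals_per_sec=100, served_per_sec=60, seconds=3))
```

As long as the service rate stays below the arrival rate, the backlog in this model keeps growing, which matches the queue growth described above.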

Note that MailChannels is designed to scale automatically and, in most cases, a sudden increase in requests will result in additional capacity being brought online within seconds.

What have we done to mitigate this?

We have written new policies to detect and thwart the bot attacks before they can overwhelm our authentication capacity. We have also deployed more resources (database, MTA) so that we can handle these spurious requests without delaying legitimate authentication requests. Previously, we permanently rejected a connection when authentication failed. We now issue a temporary rejection instead, so that the sending mail server can retry delivery.
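As a rough sketch of these two ideas, the Python below counts recent authentication failures per client and picks between a temporary and a permanent SMTP reply. It is not our production code: the class name, thresholds, and the choice of reply codes are assumptions made for illustration. In SMTP, 4xx replies are temporary and invite a retry, while 5xx replies are permanent; the specific codes shown (454 and 535) come from the SMTP AUTH specification, RFC 4954.

```python
from collections import defaultdict, deque


class AuthFailureTracker:
    """Hypothetical sliding-window counter of failed logins per client IP."""

    def __init__(self, max_failures=5, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)

    def record_failure(self, client_ip, now):
        # Remember the time of each failed authentication attempt.
        self._failures[client_ip].append(now)

    def is_blocked(self, client_ip, now):
        failures = self._failures.get(client_ip)
        if failures is None:
            return False
        # Discard failures that have aged out of the window.
        while failures and now - failures[0] > self.window_seconds:
            failures.popleft()
        return len(failures) >= self.max_failures


def auth_failure_reply(temporary=True):
    """Return an SMTP reply line for a failed authentication (assumed codes)."""
    if temporary:
        # 4xx: the sending server is expected to retry later.
        return "454 4.7.0 Temporary authentication failure"
    # 5xx: the sending server is expected to give up.
    return "535 5.7.8 Authentication credentials invalid"
```

With a temporary reply, a legitimate mail server that hits a transient authentication problem keeps the message queued and retries, rather than bouncing it.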

We have also improved our monitoring of authentication problems and connection counts. Our infrastructure auto-scales resources based on a number of parameters. We have added new parameters so that the back-end MTAs auto-scale based on the number of connections before they go into stress mode.
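A connection-based scaling rule of this kind can be sketched as follows. The function and parameter names are hypothetical, and the numbers are illustrative; the idea is simply to set the per-MTA target below the point at which an MTA would enter stress mode, so that capacity is added first.

```python
def desired_mta_count(active_connections, target_connections_per_mta, min_instances=1):
    """Return how many MTA instances are needed for the current connection count.

    target_connections_per_mta is an assumed tuning value, set below the level
    at which a single MTA would start shedding load.
    """
    if target_connections_per_mta <= 0:
        raise ValueError("target_connections_per_mta must be positive")
    # Integer ceiling division: round up so no instance exceeds its target.
    needed = -(-active_connections // target_connections_per_mta)
    return max(min_instances, needed)


# Illustrative numbers only.
print(desired_mta_count(active_connections=1401, target_connections_per_mta=700))
```

Scaling on connection count reacts to a flood of short-lived connections even when other signals, such as CPU usage, have not yet risen.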

In the near future, we will add automated updates to the status page if we identify any issues with the authentication infrastructure.

We would like to reassure our customers that we are constantly working on new capabilities so that these issues do not happen again.

If you have any concerns, please do not hesitate to reach out to us.

Posted Oct 09, 2019 - 13:54 PDT

Resolved
Over the last few days, some of our valued Outbound Filtering service customers have been experiencing intermittent delays in the delivery of email. We apologize for the issue and assure you that we are working around the clock to identify the root cause and put a permanent fix in place.

Here’s what we know so far:

Delays are being caused, at least in part, by a flood of invalid authentication attempts from bots.
This issue is most pronounced between 06:00 and 13:00 UTC, when the bot attacks are most intense.

We are writing new policies to detect and thwart the bot attacks before they can overwhelm our authentication capacity. In the meantime, we are deploying more authentication resources so that we can handle these additional spurious requests without delaying legitimate authentication requests.

We would like to reassure our customers that we are doing everything we can to reduce the impact and that we are working around the clock to fix this issue permanently.

If you have any concerns, please do not hesitate to reach out to us.
Posted Oct 02, 2019 - 00:00 PDT