AWS Outage Making Canvas Inaccessible for Many Users

Incident Report for Instructure

Resolved

The delay in notifications being sent has been resolved, and all queued notifications have since been processed. Thank you for your patience while we used additional resources to fully process the queue. You should now see notifications being sent out as expected.

Posted Feb 28, 2017 - 19:37 MST

Update

We are confident users are not impacted by any further issues accessing Canvas at this time. However, we are continuing to monitor because we see about a 1.5-hour delay in notifications being sent. Throughout the incident today, a queue of Canvas notifications has been building, to be sent to users once AWS was up and running again for the affected region. We started sending those notifications again around 3:18 pm MT, but there is still a large queue. As a result, users may not receive notifications until an hour or two after they would normally have appeared.

Posted Feb 28, 2017 - 18:13 MST

Update

Our DevOps team is still monitoring to ensure everything is working well, though we have not seen any lingering issues that would affect your users in the last hour. We will add another update here when we're sure everything is good again.

Posted Feb 28, 2017 - 17:11 MST

Monitoring

Amazon has verified that uploads to their service should be working again; users should be seeing improved performance with their uploads to Canvas. Our DevOps team is continuing to monitor the situation, but we are not currently aware of any lingering issues that affect Canvas functionality.

Posted Feb 28, 2017 - 15:59 MST

Update

In our previous update, we mentioned there would still be areas of impaired functionality between Canvas and Amazon. The biggest area of impact right now is that uploads are not yet working. This includes student uploads to assignments, instructor grade uploads, and similar functions, as well as the ability of Canvas' background processes to upload files such as admin reports (which is required to generate a report at the account level). You may continue to see issues with this, and with other areas in Canvas, as Amazon works to fully restore all services.

Posted Feb 28, 2017 - 14:37 MST

Update

Canvas performance and service recovery continue to progress quickly. Although many users should now be able to access Canvas, there may still be areas of impaired functionality as we work through remaining issues.

Posted Feb 28, 2017 - 14:15 MST

Update

We are beginning to see positive indications of recovery and have successfully tested workflows that were previously failing. We are still awaiting full resolution, and we will provide updates as the situation continues to improve.

Posted Feb 28, 2017 - 13:54 MST

Update

AWS is still working through their recovery process. Unfortunately, the number of Amazon services that have been impacted has grown in the time it took to find the root cause, and it will be a significant effort on their side to recover all of the services. They are understandably starting with the most critical ones. Since Canvas depends on so many of their services, a full recovery may still take some time.

On our side, our DevOps team has moved on to other ideas about how to get Canvas from a "service disruption" state to a "degraded performance" state. We are also discussing plans for addressing similar circumstances in the future. Our options are limited due to the perniciousness of this incident, but we are considering all of them at this time.

Posted Feb 28, 2017 - 13:45 MST

Update

Amazon is continuing to work through their recovery process. On our side, our DevOps team has implemented a temporary change to ensure that tools and apps not hosted on AWS (Amazon Web Services) are still accessible to those who are able to access Canvas. This is an improvement over the complete service disruption we have had since 10:37 AM MST. However, the majority of Canvas users are still unable to access their Canvas site due to the AWS outage.

We will continue our efforts to ensure a good experience with Canvas for users once they are able to access the site again, and will provide an update on the overall issue within the next 30 minutes.

Posted Feb 28, 2017 - 13:05 MST

Update

As Amazon works to restore availability in their systems, our DevOps team continues their efforts to expedite the process to restore access to Canvas. We will provide a new update on their progress in 30 minutes or less.

Posted Feb 28, 2017 - 12:29 MST

Update

Amazon Web Services has informed us that they have identified the underlying root cause of the issue and they are beginning the remediation process. Our internal DevOps team continues to explore options to facilitate faster recovery.

Posted Feb 28, 2017 - 12:04 MST

Update

Amazon is still working to restore server access for sites that have been affected by their outage today, including many Canvas sites. They will keep us updated on their progress.

Posted Feb 28, 2017 - 11:52 MST

Identified

Amazon has narrowed the scope of their investigation and has identified a specific region impacted by the networking issue. They are actively working on a solution. Our own DevOps team is investigating options that may allow us to work around the problem. We will provide another update in 15 minutes.

Posted Feb 28, 2017 - 11:27 MST

Update

Amazon has updated their status page to indicate they are investigating increased error rates for their servers. They are working with us to provide updates on the issue; we will update this page with any new information. In the meantime, you can monitor their status page at https://status.aws.amazon.com/.

Posted Feb 28, 2017 - 11:18 MST

Update

Amazon Web Services is currently experiencing what appears to be a large-scale networking issue that has impacted Instructure along with many other companies. We are working with Amazon to diagnose the problem and are waiting for updates on their mitigation timeline. We will update you as soon as we have more information.

Posted Feb 28, 2017 - 11:03 MST

Investigating

Canvas is currently experiencing an outage that we are investigating. Our DevOps team has determined that this is an AWS (Amazon Web Services) outage. We will post updates as they become available.

Posted Feb 28, 2017 - 10:45 MST