Amazon Web Services brought down by a typo

After hours of disruptions for services such as project management tool Trello, news website Business Insider and image hoster Giphy, it turns out Amazon Web Services’ (AWS) outage on Tuesday was caused by the simplest of errors: a typo.

S3, Amazon’s popular web hosting and storage platform, crashed on Feb. 28 due to what the company called “high error rates,” but according to new information, an Amazon employee accidentally input the wrong command and took a larger number of servers offline than was intended.

“An authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process,” explains a Mar. 1 post on the AWS website. “Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”

The servers that were removed supported two other S3 subsystems. The index subsystem manages the metadata and location information of all S3 objects in the region and is necessary for processing GET, LIST, PUT, and DELETE requests. The placement subsystem relies on the index subsystem to allocate new storage.

Essentially, both subsystems had to be restarted, and because the restart took longer than expected, S3 in Amazon's US-EAST-1 (Northern Virginia) region was unable to service requests in the meantime.

Amazon says it is making several changes to its operations as a result of the incident, including limiting how much capacity the tool that took down the servers can remove and improving the recovery time of key S3 subsystems.

“We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level,” the company says. “This will prevent an incorrect input from triggering a similar event in the future. We are also auditing our other operational tools to ensure we have similar safety checks.”
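Amazon has not published the tool itself, so the following is only an illustrative sketch of the two safeguards described in the post: refusing a removal that would drop a subsystem below its minimum capacity, and splitting an approved removal into small batches. The names Subsystem, plan_removal, min_required and max_per_step are hypothetical and do not come from AWS.

```python
from __future__ import annotations

from dataclasses import dataclass


class CapacityError(Exception):
    """Raised when a removal request fails a safety check."""


@dataclass
class Subsystem:
    name: str
    active_servers: int
    min_required: int  # hypothetical minimum capacity level


def plan_removal(subsystem: Subsystem, requested: int,
                 max_per_step: int = 2) -> list[int]:
    """Return batch sizes for a gradual removal, or raise CapacityError.

    Illustrative only: this is an assumption about how such a guard
    could look, not Amazon's actual implementation.
    """
    if requested <= 0:
        raise CapacityError("requested removal must be positive")
    if max_per_step <= 0:
        raise CapacityError("max_per_step must be positive")

    # Safeguard 1: never go below the subsystem's minimum capacity.
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise CapacityError(
            f"removing {requested} servers would leave {subsystem.name} "
            f"with {remaining}, below its minimum of {subsystem.min_required}"
        )

    # Safeguard 2: remove capacity slowly, in small batches.
    batches = []
    left = requested
    while left > 0:
        step = min(max_per_step, left)
        batches.append(step)
        left -= step
    return batches


if __name__ == "__main__":
    index = Subsystem(name="index", active_servers=100, min_required=90)
    print(plan_removal(index, 5))   # a small, valid request
    try:
        plan_removal(index, 50)     # a mistyped, oversized request
    except CapacityError as err:
        print("rejected:", err)
```

In this sketch, a mistyped input that asks for far too many servers is rejected before anything is removed, while a valid request is carried out in small steps so an operator has time to notice a problem.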

Amazon apologizes for the inconvenience the outage caused for its customers, adding “we will do everything we can to learn from this event and use it to improve our availability even further.”
