Comments on: Amazon S3 Outage: We’ve All Been There https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/ networking | writing | teaching Wed, 27 Jul 2022 14:14:19 +0000

By: Kendrick https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-20084 Wed, 27 Jul 2022 14:14:19 +0000 http://networkphil.com/?p=2450#comment-20084 This is a great post

By: Oleg https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-224 Sat, 11 Mar 2017 16:54:23 +0000 http://networkphil.com/?p=2450#comment-224 I’ve been there. I got an embarrassing call from a vendor asking me why they see links to my core switch bouncing. This is when I realized that I forgot to do reload cancel after making some changes….
Good thing it was waay after hours.
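
The safety net described here is usually `reload in <minutes>` before a risky change, followed by `reload cancel` once the change is confirmed good. A minimal Python sketch of making the cancel step harder to forget might look like the following. `send` is a hypothetical callable that delivers one CLI line to the device; it is not tied to any particular automation library, and the confirmation prompt a real device shows after `reload in` is ignored here.

```python
from contextlib import contextmanager


@contextmanager
def scheduled_reload(send, minutes=10):
    """Arm a delayed reload before a risky change and cancel it on clean exit.

    `send` is a hypothetical callable that delivers one CLI line to the device.
    """
    send(f"reload in {minutes}")
    # If the body raises, the timer is left armed on purpose so the device
    # can fall back to its saved configuration.
    yield
    # Reached only when the body finished without an exception.
    send("reload cancel")
```

Wrapping the change in `with scheduled_reload(send):` means the cancel is issued automatically once the block completes without error, rather than relying on memory.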

By: Steve Winwood https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-223 Fri, 10 Mar 2017 23:33:09 +0000 http://networkphil.com/?p=2450#comment-223 Telnet’d to the VTP server (also core switch), created a new VLAN, then *thought* I’d telneted to the access switch where I wanted to change the VLAN on some ports, and made the VLAN membership change.

Was *actually* still logged onto the core switch so took down the internet and MPLS links for the data centre.

The clue was the helpdesk wallboard lighting up straight away, cue cycling back through the command history to work out what I’d done!
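
One way to guard against this kind of wrong-device change is to check the hostname in the CLI prompt before sending anything. The sketch below is only an illustration: it assumes an IOS-style prompt of the form `hostname#`, `hostname>` or `hostname(config-if)#`, and the function names and hostnames used are hypothetical.

```python
import re

# Hostname is everything before an optional "(config...)" mode and the final > or #.
_PROMPT = re.compile(r"^(?P<host>[^\s()#>]+)(\([^)]*\))?[>#]\s*$")


def prompt_hostname(prompt):
    """Extract the hostname from an IOS-style prompt such as 'core1(config-if)#'."""
    match = _PROMPT.match(prompt.strip())
    if match is None:
        raise ValueError(f"unrecognised prompt: {prompt!r}")
    return match.group("host")


def on_expected_device(prompt, expected_host):
    """Return True only if the prompt belongs to the device we meant to change."""
    return prompt_hostname(prompt).lower() == expected_host.lower()
```

A script would call `on_expected_device(prompt, "access7")` and refuse to continue on `False`, which is the check that is easy to skip when typing by hand.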

By: Phil Gervasi https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-221 Fri, 10 Mar 2017 14:54:50 +0000 http://networkphil.com/?p=2450#comment-221 In reply to Nick Moody.

I would never think the stretched L2 design was your idea hahaha! I think you’re right in both ways – those incidents definitely shave a little bit off our life spans, but I think they also make us better engineers 🙂

By: Nick Moody https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-220 Fri, 10 Mar 2017 14:48:01 +0000 http://networkphil.com/?p=2450#comment-220 It’s refreshing reading your article Phil, not many people care to talk about previous mistakes. The biggest outages I’ve caused in the past have not always been because I’ve broken something but because I’ve fixed something.

One example of that was an IPsec tunnel providing a backup layer 2 DCI link that had failed to establish. Unbeknown to me, spanning tree had not been configured properly on the switches at either DC, and almost immediately after I fixed the tunnel issue the link came up, caused a layer 2 loop and levelled both DCs. I had to ‘unfix’ the tunnel to break the loop and restore service.

I’m not sure if those types of incidents make us stronger engineers or just shave a bit more time off our life spans?

And no, the stretched L2 between the DCs was not my idea!

By: Charles https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-219 Fri, 10 Mar 2017 07:47:42 +0000 http://networkphil.com/?p=2450#comment-219 In reply to Charles.

*the storage took itself offline! (Android autocorrect has gone mad!!)

By: Charles https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-218 Fri, 10 Mar 2017 07:40:43 +0000 http://networkphil.com/?p=2450#comment-218 In reply to Charles.

*for some reason!

By: Charles https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-217 Fri, 10 Mar 2017 07:39:01 +0000 http://networkphil.com/?p=2450#comment-217 I had a change to create a new vlan at a large automated warehouse, console reasons I couldn’t remember if you had to add the vlan to the port channel or the switchports, so I had the bright idea to do it to them both at the same time: ‘int range t1/1-4,po10’…. It didn’t work: the port channel unbundled, and spanning tree reconverged while the port channel re-bundled. I thought I’d got away with it until a couple of the server guys who were monitoring the site during the change came over and asked me if the network was down. Turns out the site had some old LeftHand iSCSI storage for all the servers, connected via the network. When the network was partitioned, both storage clusters became active and data got corrupted, and then when the network reconverged the storage room it’s self offline! Took the server guys the rest of the day to restore stuff and bring the site back online.
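
The `int range t1/1-4,po10` command is the risky step in this story, because it applies the same change to a port channel and to physical ports in a single range. A rough pre-change check for that pattern could be sketched as below. It assumes that any range item starting with `po` is a port channel (the usual Cisco abbreviation), and it does not verify that the other ports are actually members of that bundle; the function name is hypothetical.

```python
def mixes_bundle_and_members(range_spec):
    """Return True if an interface range names both port channels and other ports.

    Assumption: items beginning with "po" (e.g. "po10") are port channels.
    """
    items = [item.strip().lower() for item in range_spec.split(",") if item.strip()]
    has_port_channel = any(item.startswith("po") for item in items)
    has_other = any(not item.startswith("po") for item in items)
    return has_port_channel and has_other
```

For the range in the comment, `mixes_bundle_and_members("t1/1-4,po10")` returns `True`, which a change script could treat as a reason to stop and ask.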

By: Phil Gervasi https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-216 Fri, 10 Mar 2017 00:30:12 +0000 http://networkphil.com/?p=2450#comment-216 In reply to Timothy Manito.

Oh that’s a good one! I’ve done that one myself, and I bet most network engineers have too. Thanks!

By: Timothy Manito https://networkphil.com/2017/03/08/amazon-s3-outage-weve-all-been-there/comment-page-1/#comment-215 Thu, 09 Mar 2017 23:29:58 +0000 http://networkphil.com/?p=2450#comment-215 Forgot to add the word “add” in the switchport trunk allowed vlan command on one of the MDF switches at a remote site. Luckily there was no production at that time, and there was an engineer there who could set up a workstation for me to RDP to.
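
For anyone who has not hit this one: `switchport trunk allowed vlan 30` replaces the trunk's whole allowed list with VLAN 30, while `switchport trunk allowed vlan add 30` extends it. A small Python sketch of a pre-change lint for the missing keyword is below. The function name and the sample lines are illustrative assumptions, and the check only looks at the text of the commands, not at the device.

```python
import re

# A bare VLAN list directly after "allowed vlan" replaces the existing list.
_BARE_LIST = re.compile(
    r"^\s*switchport\s+trunk\s+allowed\s+vlan\s+\d[\d,\-]*\s*$",
    re.IGNORECASE,
)


def find_overwriting_lines(config_lines):
    """Return the lines that would overwrite a trunk's allowed-VLAN list."""
    return [line.strip() for line in config_lines if _BARE_LIST.match(line)]


planned_change = [
    "interface Gi1/0/1",
    " switchport trunk allowed vlan 30",      # flagged: no "add"
    " switchport trunk allowed vlan add 40",  # not flagged
]
flagged = find_overwriting_lines(planned_change)
```

A flagged line is not always a mistake, since a bare list is valid when first configuring a trunk, so this is better used as a prompt for a second look than as a hard block.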
