What Your Business Can Learn From EB Games' Colossal Website Failure

Image: iStock

When EB Games took pre-orders for the Nintendo Mini NES Classic earlier this week, the website crashed under the sheer volume of traffic from eager buyers. This happened two days in a row, leaving a mob of angry customers in its wake. With all the hype around the classic console, you'd think EB Games would have expected the level of traffic to its online store and worked to ensure its website was reliable. We take a look at where EB Games might have gone wrong and what businesses can learn from this debacle.

People trying to buy the Nintendo Classic Mini NES from EB Games on Monday were pulling their hair out after the website crashed because the back-end servers couldn't support the volume of traffic coming through. Many customers who had their hopes up after adding the Mini NES to their cart ended up missing out on the pre-order because they were stuck in checkout hell - the website wouldn't let them complete the transaction. Others didn't even get that far, as the page failed to load at all.

So the next day, EB Games beefed up its servers. But that wasn't enough and the website crashed again on Tuesday. The company released a statement to appease angry customers who missed out on the pre-order due to the failures of the website:

"Despite juicing up our servers, our website just couldn't cope with the record traffic of tens of thousands of enthusiastic gamers. We were running 45 servers, each with 32 CPUs, for a total of 1440 CPUs handling the website.

"On a normal Tuesday, we have about 500,000 page views. Yesterday, we hit over 7,500,000."

EB Games hasn't given us any more detail about the hardware and software supporting its website, but we can make a few assumptions based on what happened during the crashes. Some users who visited EB Games online were greeted by the Microsoft Internet Information Services (IIS) welcome page, so it's safe to assume that's the web server in use. Other customers saw a browser error message suggesting EB Games hosts at least a portion of the website or web app on the Microsoft Azure public cloud platform.

Image: IIS welcome page/Lifehacker

First off, let's address the hardware. Forty-five servers should have been enough to handle 7.5 million requests spread over a day, given that they're not all concurrent. There are online retailers out there that handle more traffic with fewer servers.
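To put those numbers in perspective, here's a quick back-of-the-envelope calculation using the figures EB Games quoted. The 20x peak multiplier is purely an illustrative assumption for bursty launch traffic, not anything the company reported:

```python
# Rough arithmetic on the traffic figures from EB Games' statement.
# The peak multiplier is an illustrative assumption, not a reported number.

PAGE_VIEWS = 7_500_000
SECONDS_PER_DAY = 24 * 60 * 60
SERVERS = 45

avg_rps = PAGE_VIEWS / SECONDS_PER_DAY   # average requests per second
avg_per_server = avg_rps / SERVERS       # spread evenly across servers

# Launch-day traffic is bursty; assume peaks of 20x the average for illustration.
peak_rps = avg_rps * 20

print(f"average: {avg_rps:.0f} req/s ({avg_per_server:.1f} per server)")
print(f"assumed peak: {peak_rps:.0f} req/s")
```

Even an assumed peak in the thousands of requests per second is well within reach of a properly tuned 45-server fleet, which suggests the bottleneck was elsewhere.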

Now, assuming that the website is hosted in a public cloud environment, you'd think that you could just spin up capacity to meet the demands on the fly. After all, aren't public cloud services all about scaling up on-demand? Not always.

If you want a web app to be scalable, you need to build elasticity into it, and that has to be factored in during the development process. That may not have happened with EB Games online.

If the web app itself isn't scalable, there are other measures EB Games could have considered to handle the heavy traffic. On the web server front, perhaps EB Games should have used NGINX instead of IIS; the open source NGINX web server is particularly good at handling large numbers of concurrent connections. This might be a controversial suggestion, given the published benchmark reports claiming IIS outperforms NGINX, but there is also plenty of chatter among IT professionals questioning those results. It's a nuanced topic, and I'd love to hear from IT professionals who have used both IIS and NGINX about their real-world experiences in the comments section below.

So let's assume the EB Games web app isn't built for scaling and could have used a better web server. Overhauling all of this would take a long time, and the company needed temporary fixes to try to keep the website from crashing again the next day. EB Games has already said it beefed up its servers in preparation for day two of the traffic onslaught, but that still wasn't enough.

One thing the company could have done is implement more intelligent load balancing, spreading pre-order and checkout traffic across multiple back-end servers dedicated to that task alone. Each of these back-end servers should also have session limits in place to ensure it can respond successfully to users whose transactions are already in progress. Specific session-limiting rules could also be set using a web application firewall.
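One way to express that split is at the load balancer itself. The fragment below is a hypothetical NGINX sketch, not EB Games' actual configuration: checkout traffic is routed to a dedicated upstream pool, and `limit_conn` caps concurrent connections per client on the checkout path. Hostnames and limits are made up for illustration:

```nginx
# Hypothetical sketch: route checkout traffic to a dedicated backend pool
# and cap concurrent connections so in-progress transactions can finish.
# All hostnames and limits are illustrative.

limit_conn_zone $binary_remote_addr zone=percheckout:10m;

upstream browse_pool {
    server web1.internal:8080;
    server web2.internal:8080;
}

upstream checkout_pool {
    # Servers reserved for the pre-order/checkout path only.
    server checkout1.internal:8080;
    server checkout2.internal:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://browse_pool;
    }

    location /checkout {
        limit_conn percheckout 2;   # per-client concurrent connection cap
        proxy_pass http://checkout_pool;
    }
}
```

The point of the split is isolation: a flood of people browsing the Mini NES product page can't starve the servers that are busy taking money.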

EB Games might retort that it did have session limiting in place, but it seems the company based it on the total number of concurrent users. That's a big problem when you consider how many customers got in early to add the Mini NES to their carts only to get screwed over at checkout. One way EB Games could have prevented this was to prioritise the session IDs of users who had already started the buying process so they could complete the checkout, freeing up a spot for the next user.
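In code, that priority rule is a small amount of logic. This sketch is an illustration of the idea under the assumptions above, not EB Games' implementation: a fixed number of checkout slots, where a user who already holds a slot is always let back through, while brand-new arrivals are only admitted when a slot is free:

```python
# Sketch of session-aware admission control: users already mid-checkout keep
# priority over new arrivals. Illustrative only - not EB Games' actual code.

class CheckoutGate:
    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()  # session IDs currently allowed into checkout

    def admit(self, session_id):
        if session_id in self.active:
            return True                      # returning user: always let through
        if len(self.active) < self.capacity:
            self.active.add(session_id)      # new user: admit only if a slot is free
            return True
        return False                         # full: new users wait, existing ones don't

    def complete(self, session_id):
        self.active.discard(session_id)      # frees a slot for the next shopper

gate = CheckoutGate(capacity=2)
print(gate.admit("alice"), gate.admit("bob"))   # True True (slots filled)
print(gate.admit("carol"))                      # False (at capacity)
print(gate.admit("alice"))                      # True  (already in progress)
gate.complete("alice")
print(gate.admit("carol"))                      # True  (slot freed)
```

Compare this with a blanket cap on total concurrent users, which treats someone three clicks into paying the same as someone who just arrived - exactly the "stuck in checkout hell" experience customers described.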


We may never know exactly what happened behind the scenes at the EB Games online store, and I doubt the company will be releasing any more comments about the Mini NES pre-order debacle. Please bear in mind that our suggestions are based on what we think might have been happening in the back-end; this is just one possible scenario. But the incident has highlighted the importance of good solution architecture for web apps, especially for businesses that rely on online sales.

As customers continue to favour online shopping, your retail website serves as the face of the business. It might be the first thing customers interact with when they consider doing business with you, and first impressions matter.


We'd love to hear from the IT community about other ways EB Games could have handled the pre-order situation better. It'd be great to get some healthy discussion going on about good solution architecture for websites.

What are your guesses as to what EB Games' architecture is? What strategy do you think the company used to handle Tuesday's traffic? Let us know in the comments.


Comments

    Sounds like they were using their standard ecommerce platform for this pre-order campaign. It's quite likely this is an off-the-shelf product, possibly from an ERP vendor, that hooks into their own internal sales back end. Or maybe they developed it themselves, with no integration into their internal systems. Maybe it was completely outsourced, hosted by a cloud service provider. Maybe they're just lying: they didn't have 45 servers, they just had one. We can speculate about everything, really...

    My suggestion would be: work harder on your supply chain so pre-orders don't become an issue. That's the main problem here. Don't put yourself in a situation where you need to take millions of pre-order requests in one day.

    There's really no excuse for this type of failure anymore. Even in a critical load situation, an end user should never see the default error pages - flick them to a custom page or even a "we're a bit busy, but keep browsing" setup.

    Robust cloud installations should have the ability to scale 'hardware' up or down - number of servers, load balancers and so on - and all of this should have been tested and prepped before the sale.

    There are many ways to prepare for, mitigate or handle spikes; any one of them would have been useful here.

    It all looks and smells like a company that doesn't really care about its online presence and can't be bothered to do it right.

Join the discussion!