UPDATE: Console Failure Rates – Should We Really Panic?


With the release of the PS4 in the US on Friday, and the subsequent reports of systems failing or bricking within hours of use, I thought I’d have a dig into reports of console failure rates, the science behind product reliability, and take a look at what we’ve seen happen in the past with new technologies.  I’ve got a PS4 on order, lots of people are excited about the Xbox One release in a week, so should we be worried now about whether our shiny and expensive new purchases are going to last?

 

Hardware Failures

 

First of all, tech products are getting more complex.  There’s no argument against it.  Developments in electronics mean components and circuitry are getting smaller and more powerful, and consumers want their devices to do more and last longer.  As things get more complicated the chance of a failure increases and the reliability of a product decreases, and the longer a product is used, the higher the chance of component or system failure.  Reliability engineering is used to quantify the probability of failure and to help reduce that failure rate during development, while also ensuring things work as they should and increasing the overall product life.  If we wanted to express reliability mathematically, it would look like this:

 

Reliability Function:

$$R(t) = 1 - \int_0^t f(x)\,dx = \int_t^{\infty} f(x)\,dx$$

 

where f(x) is the failure probability density function and t is time.  But what does this really mean?  Reliability engineering has a great term, “the bathtub curve”, which is used to visualise a product’s lifecycle from release to the end, and can be used as a general rule for how things behave as a holistic system.

 


The “Bathtub Curve” (credit: weibull.com)

 

At the beginning of a product lifecycle there is an increased chance of failure mainly due to manufacturing flaws and defects.  As a product becomes established these issues are ironed out and the failure rate stabilises at a pretty low level.  Once the product nears the end of its life, you’re more likely to see it wear out simply because it gets old.  This is very normal, and manufacturers work with this scenario constantly, employing analysis and testing techniques throughout development which can be performed from an individual component level right up to full system level.
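As a rough illustration of the three phases described above, here is a minimal Python sketch that builds a bathtub-shaped failure rate out of three Weibull failure modes: one with a falling rate (early defects), one with a constant rate (random failures) and one with a rising rate (wear-out).  The shape and scale values are hypothetical, picked only to make the curve visible, and are not taken from any real console data.  It assumes the three modes act independently, so their failure rates add and their reliabilities multiply.

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) of a Weibull distribution (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_reliability(t, shape, scale):
    """R(t): probability that a unit is still working at time t."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical (shape, scale) pairs in months, chosen only to give a bathtub.
PHASES = {
    "early failures": (0.5, 2000.0),   # shape < 1: failure rate falls with age
    "random failures": (1.0, 500.0),   # shape = 1: constant failure rate
    "wear-out": (5.0, 60.0),           # shape > 1: failure rate rises with age
}


def bathtub_hazard(t):
    """Total failure rate: independent failure modes add their hazards."""
    return sum(weibull_hazard(t, k, s) for k, s in PHASES.values())


def bathtub_reliability(t):
    """Overall reliability: a unit survives only if it survives every mode."""
    return math.prod(weibull_reliability(t, k, s) for k, s in PHASES.values())


for month in (1, 6, 12, 24, 36, 48, 60, 72):
    print(
        f"month {month:>2}: failure rate {bathtub_hazard(month):.4f} per month, "
        f"reliability {bathtub_reliability(month):.3f}"
    )
```

With these made-up numbers the failure rate starts high, bottoms out around the one year mark and then climbs again as wear-out takes over, which is the bathtub shape in the chart.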

 

From a production perspective, once a batch of products has been manufactured there will be a standard set of inspections performed to ensure the quality of the product, in most cases using an AQL (Acceptable Quality Level) system.  This method provides a standard way of approaching each manufactured batch (or lot) so that a number of samples can be taken out at random and checked for acceptable quality, which negates the need for 100% inspection.  Handy when you’re making hundreds of thousands of units.  Within this process a certain percentage are allowed to “fail”, and by that I mean they don’t perform to the accepted standard, though they may still be functional.  This failure percentage can appear to be quite high if you’re not used to manufacturing and production environments, and could be up to 4.0% of the batch depending on how the failure is classed.

 


AQL Sample Size Selection Chart

 

As an example of what is an acceptable quality level: critical failures, such as those that could cause harm to the end user or fail to meet regulatory standards, are not accepted at all; major failures, meaning something the end user would not accept, might be accepted at up to 2.5%; and minor failures, classed as something the end user might not mind, might be accepted at up to 4.0%.  If the acceptable failure level is exceeded then in theory the whole batch should be QC checked or, in certain industries, discarded.  Please note that I’m generalising somewhat with these numbers; some product areas have higher standards (such as aircraft) and some have lower, and it all depends on the type and end use.
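To show how checking a random sample can stand in for 100% inspection, here is a small Python sketch of the underlying idea.  It assumes a simple binomial model and a hypothetical plan of 200 sampled units with up to 7 defects allowed; those two numbers are made up for illustration and are not read from the ISO 2859-1 tables.  It estimates how likely a batch is to pass for several true defect rates.

```python
from math import comb


def acceptance_probability(sample_size, accept_number, defect_rate):
    """Chance a lot passes: at most accept_number defects in the sample.

    Uses a binomial model, i.e. each sampled unit is defective
    independently with probability defect_rate.
    """
    return sum(
        comb(sample_size, k)
        * defect_rate**k
        * (1 - defect_rate) ** (sample_size - k)
        for k in range(accept_number + 1)
    )


SAMPLE_SIZE = 200    # hypothetical number of units pulled from the batch
ACCEPT_NUMBER = 7    # hypothetical maximum defects before the batch is rejected

for rate in (0.01, 0.015, 0.025, 0.04, 0.06):
    p = acceptance_probability(SAMPLE_SIZE, ACCEPT_NUMBER, rate)
    print(f"true defect rate {rate:.1%}: batch accepted with probability {p:.2f}")
```

The pattern to look for is that batches with a low defect rate pass almost every time, while the chance of passing drops away quickly as the true defect rate climbs past the level the plan was designed around.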

 

So what am I trying to say with this?  Simply, nothing is perfect.  No matter how well designed and manufactured something is, the more components and moving parts, the more chance it’s going to break on you.  This is why you’ll always find problems with the first iteration of new electronic equipment.  The question stops being about why individual units fail, and becomes more about how many of the total have failed, and from there, is there a serious issue that would warrant corrective action or (in the worst case), a full product recall?  Let’s take the example of the reports of failed PS4 units and do some basic calculations to work out if there’s a serious issue or not.

 

We have no way of knowing how many units have failed, how many have been reported to Sony directly, or how many units have been manufactured to date, so this is to demonstrate the principle and cannot be taken as accurate.  We know Sony’s sales forecast for the PS4 up to the end of the year is 3 million units (widely reported, but the link I used is here), and a suggested AQL for high-end electronic goods sold directly to the consumer is 1.5% for major failures (this might be 2.5% for selling to the wholesale sector, but we’ll go with the worst case scenario).  So 3,000,000 x 0.015 = 45,000 units would be the maximum “acceptable” number of failures over the next 6 weeks.  Let’s not assume those failures are spread evenly over the 42 days: I’d actually expect a higher percentage in the first couple of launch days and at Christmas, when consoles are switched on for the first time, and there are also staggered launches worldwide and stock shortages over the coming weeks.  We also have an estimate (not sure how reliable) of 1 million launch consoles for the US in the first week, which would mean 15,000 “acceptable” failures in that time frame, with the majority coming from launch day units because we know most stores and websites sold out at launch.  Take a minute to process that: 15,000 PS4s, that’s a lot of fancy dead tech!  But this is deemed an acceptable worst case because it falls in line with the complexity of the hardware, the precision tolerances of the manufacturing process, and the transit trauma of international freight.
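The arithmetic above is simple enough to put in a few lines of Python, which also makes it easy to swap in different assumptions.  The unit counts and percentages are the ones quoted in this article; the function name is just something I’ve made up for the example.

```python
def acceptable_failures(units, aql):
    """Maximum number of 'acceptable' failures for a given AQL fraction."""
    return round(units * aql)


FORECAST_UNITS = 3_000_000      # Sony's forecast to the end of the year
LAUNCH_WEEK_UNITS = 1_000_000   # estimated US launch-week consoles

AQL_CONSUMER = 0.015    # 1.5% major failures, direct to consumer
AQL_WHOLESALE = 0.025   # 2.5% major failures, wholesale

print(acceptable_failures(FORECAST_UNITS, AQL_CONSUMER))      # 45000
print(acceptable_failures(LAUNCH_WEEK_UNITS, AQL_CONSUMER))   # 15000
print(acceptable_failures(FORECAST_UNITS, AQL_WHOLESALE))     # 75000
```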

 

Container Freight

Crushing and collapsing container stacks.

 

So, are we seeing that many fail?  As of writing this, there are 334 one star reviews on Amazon.com for failed units (straight, PS4 only units), plus the media reported IGN and Reddit competition winner issues, the NeoGAF users, and a number of YouTube videos as well.  There are a lot, but we’d expect a lot because it’s new tech, and (so far) we’re not into the tens of thousands of reports that would hint at a serious manufacturing defect.  Large multinational companies do not ship tech that they’ll have to recall or replace en masse just because they’ve got a launch day to hit; the cost and share price impact is too much, especially as the units are selling at a loss initially.  You also have to think of the context.  You might say 334 upset Amazon users is a lot, particularly as there are only 927 reviews, but how many times have you seen someone who’s happy with a product report it?  You’d go back and complain if you got a bad coffee from Starbucks, but it takes a special type of person to go back and compliment them if it was good.  Same with the Amazon users.  It’s probably as little as 1% of their customers who’ve given their opinion, and the ones that are unlucky with D.O.A. units will be the most vocal, and rightly so because there’ll be a lot of disappointment out there.  I’d be gutted too.
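To put the Amazon numbers in context, here is a quick Python sketch of the reasoning.  The 1% review rate is my guess from above, not a measured figure, and the sketch assumes every one star review really is a dead console, so treat the result as a rough steer only.

```python
ONE_STAR_REVIEWS = 334   # one star reviews for failed units at time of writing
TOTAL_REVIEWS = 927      # all reviews at time of writing
REVIEW_RATE = 0.01       # assumption: about 1% of buyers leave any review

# Share of reviewers who reported a failure (the scary-looking number).
share_of_reviews = ONE_STAR_REVIEWS / TOTAL_REVIEWS

# If only 1% of buyers review, the reviews represent many more buyers.
implied_buyers = TOTAL_REVIEWS / REVIEW_RATE

# Reported failures as a share of all those implied buyers.
implied_failure_rate = ONE_STAR_REVIEWS / implied_buyers

print(f"share of reviews reporting a failure: {share_of_reviews:.1%}")
print(f"implied number of buyers: {implied_buyers:,.0f}")
print(f"reported failures per buyer: {implied_failure_rate:.2%}")
```

Under those assumptions the 36% of reviews shrinks to roughly 0.36% of buyers.  That only counts failures someone bothered to review, so the real figure will be higher, but it shows how far apart the share of bad reviews and the actual failure rate can be.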

 

If we take past console launches as examples, does this make any difference to the information we’re getting back about the PS4 launch?  Cast your mind back to 2005 and the Xbox 360 launch, and the number of failures reported at the time, mainly due to the Red Ring of Death (RROD).  Microsoft advised that the hardware failures were in line with electronics industry expectations of between 3% and 5% (much higher than the percentage I’ve worked to above), but independent surveys put it between 23% and 54%, though these are not fully reliable sources of information because they came either from a 3rd party warranty provider that didn’t deal with all the consoles, or from gamer opinion.  Whatever the cause of the RROD, which has not officially been confirmed by Microsoft, they felt compelled to introduce a 3 year warranty for all consoles, and collected, repaired and returned every faulty machine free of charge.  The release of revised hardware in the slimmer model removed the RROD indication system and reduced the warranty back to 12 months, near enough confirming the issue was resolved.  Did this impact sales?  No, the Xbox 360 has sold over 80 million units to date and anticipation for the Xbox One is very high.

 

Xbox 360 RROD

 

The PS3 had a smoother launch but wasn’t free from issues.  The most widely reported failure was the Yellow Light of Death (YLOD), which indicated a non-specific machine fault but has generally been assumed to be caused by solder connections on the motherboard cracking due to heat cycling within the machine.  BBC’s Watchdog reported the issue back in 2009, which prompted Sony to respond that around 0.5% of all PS3 failures could be attributed to the YLOD, and that it was a manufacturing defect.  This issue seems to have pretty much disappeared since the redesign to the slim model.  Alongside YLOD, but with no numbers to back it up, there was the issue of Blu-ray drive failures.  Purely conjecture here, but a proportion of people with the original launch models suffered Blu-ray failures, usually outside the 12 month warranty period.  As there was no indication that anything outside standard usage had caused the problem, Sony would repair these for a nominal fee, but it was frustrating that the original drives didn’t have a longer working life.  Again, did this impact sales?  No, the PS3 has also sold over 80 million units to date.

 

PS3 YLOD

 

What about failures outside the console industry for perspective?  The recently released Samsung Gear had Best Buy reporting that 30% were being returned by customers.  Was this due to faulty hardware, or the fact Samsung hadn’t patched its Android version on phones and tablets to accept the connection to the watch?  It’s not clear, but you’d hope the manufacturer had made it visible that the Galaxy S4 was the only phone it would work with at launch, which would suggest there’s something seriously wrong with the device.  There are also the MacBook Airs sold between June 2012 and June 2013 that might have a faulty flash drive.  Apple are repairing or replacing all affected units free of charge (see here for info if you’ve got one).  The first generation iPod Nanos were subject to a similar replacement notice due to faulty batteries that might overheat, and notices were also issued for the HDDs in some iMacs and the casings of some MacBooks.  A more sobering thought is aircraft design, where serious defects sometimes aren’t found until it’s too late.  de Havilland’s Comet was the world’s first jet airliner, entering service in 1952 on the route between London and Johannesburg.  The designers opted for square windows to make the look more appealing, not understanding how this would raise the stress levels at the corners of the windows and cause fatigue in the frame and, consequently, the planes to crash.  All these examples show that no matter how good the design or manufacture, you can’t account for everything and a percentage will fail, and in these cases that meant high failure percentages or critical failures.

 

To conclude this article, the only thing I can say is things aren’t looking disastrous for the PS4 yet.  The perceived high level of failures looks to be normal at this stage, and is probably exacerbated by the level of media connection we have now versus 7 years ago.  It’s so easy for us to Tweet, Facebook and Blog all our bad news, and by nature it’s the thing we all latch on to when we read it.  We’ve got an Xbox One launch in 6 days and there will be console failures there too.  These will make the headlines alongside the PS4 numbers, and most likely set off a fanboy argument over who’s the most/least reliable.  If you’ve got a new console on order, don’t worry just yet, and if it is dead on arrival, there’s an automatic 12 month warranty which means you’ll get it repaired or replaced at no cost.  Remember, the statistics and probability of failure are in your favour, so enjoy the build up and the hype, and if things go wrong it’s not the end of the world.

 

UPDATE: The Xbox One has launched worldwide with glittering events in many locations (and some controversy in the UK over the choice of representatives for the Microsoft brand), and surprise, surprise; there are reports of failures of the day one consoles.  Overwhelmingly the issue seems to be faulty disc drive units, with many users complaining of grinding sounds coming from the console when discs are being read.  There are reports of the “Green Screen of Death” when it comes to powering up the console for the first time, where the unit stalls on the pre-controller configuration screen, though there are a lot of users saying this does clear after time, or rebooting fixes it.  But where are we in terms of failure numbers?

 

Applying the same logic as the PS4 release (i.e., we have absolutely no clue and cannot get access to actual information), we can get a very small and unreliable steer from Amazon.com and their number of negative reviews.  There are 288 one star reviews there at the point of writing this, which is approximately 30% of the total number of people who’ve chosen to leave a review on the site.  Comparing this with the PS4 numbers from last week (and a check and update of the numbers today), it is very much the same.  Both consoles are experiencing failures, but at levels that should be within acceptable limits for the manufacturing process.  It’s also worth noting that the Xbox One sold 1 million consoles worldwide on launch day, the same number the PS4 managed, making the failure information we’re hearing about so far seem very normal.

 

The one difference between the Xbox One failures and the PS4’s is the response to consumers.  Microsoft is using its advance replacement service to send out new consoles to affected customers before they return their faulty one.  That’s a great piece of customer service, and considering how quickly it’s been implemented it strikes me as something they were prepared for (which makes sense when they know they’re expecting at least 1% to 2% to not work out of the box).  So far I’ve not seen anything on Sony’s customer service and their plans to replace consoles, so I can only assume they are working in their normal way with this.

 

Should this new information affect your decision on whether to go for one of the new consoles?  No.  A hardware failure could happen at any point, with some estimates saying at least 15% will die within the first two years, something I’ve experienced personally with the PS2 and the PS3, and most of us early adopters have seen the same.  The joy is always in the new technology and the new experiences it brings, and we should focus on the positive side – there are nearly 2 million people out there with working machines having fun, and I can’t wait to join them.

Written by Matt


Gamer, F1 fanatic, amateur DJ (out of practice), MGS obsessed, tech geek.


3 Comments

  1. Reason November 18, 2013 5:51 am

    Fact: Most people don’t go post a review for a defective product. They handle things offline. This could be very troubling that hundreds of verified users are reporting on Amazon. They are coming in every few mins, and these are verified purchasers. Who knows how many calls Sony is getting, especially when many people report 2 hr wait times to speak to someone.

  2. Chris November 18, 2013 8:55 am

    Great opinion piece Matt. Enjoyed reading it!

    • Matt November 18, 2013 7:02 pm

      Thanks Chris, was good researching it, and seen even more information since it was posted too. Think this debate is going to run and run…
