CIO Today

Thursday, April 24th 
Network Security

Google Blames Outage on Software Bug

January 27, 2014 1:30PM

Isolated events like this are not a problem and users will forgive the Google outage. It becomes a problem when a pattern develops. If it were to happen multiple times it could become a problem for Google. Gmail has become a very strategic product and it's unlikely that Google will experience many more of these outages, said analyst Greg Sterling.

If you are a hardcore Google user, you may have been tempted to pull out a few hairs last Friday as several of the company’s key services experienced a painful hiccup. Now, Google is shedding some light on the incident.

Specifically, users of Google services that require a login, such as Gmail, Google+, Calendar and Documents, were unable to access those services for about 25 minutes, according to Ben Treynor, Google's vice president of Engineering.

“For about 10 percent of users, the problem persisted for as much as 30 minutes longer,” he said on Friday. “Whether the effect was brief or lasted the better part of an hour, please accept our apologies -- we strive to make all of Google’s services available and fast for you, all the time, and we missed the mark today.”

What Really Happened?

Treynor reports that the issue has been resolved, and the company is now focused on correcting the bug that caused the outage, as well as putting more checks and monitors in place to ensure that this kind of problem doesn’t happen again. He then offered a technical explanation for what occurred and how it was fixed.

At 10:55 a.m. PST on Friday, Treynor explained, an internal system that generates configurations -- essentially, information that tells other systems how to behave -- encountered a software bug and generated an incorrect configuration. That configuration was sent to live services over the next 15 minutes and caused users’ requests for their data to be ignored; those services, in turn, generated errors.

“Users began seeing these errors on affected services at 11:02 a.m., and at that time our internal monitoring alerted Google’s Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new correct configuration at 11:14 a.m. and began sending it; errors subsided rapidly starting at this time,” Treynor said. “By 11:30 a.m. the correct configuration was live everywhere and almost all users’ service was restored.”
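The intervals implied by Treynor's timeline can be worked out directly. The short Python sketch below uses only the four timestamps quoted above; the event labels are names chosen here for illustration, not Google's terminology.

```python
from datetime import datetime

def at(hhmm: str) -> datetime:
    """Parse an HH:MM clock time (all times are a.m. PST on the same day)."""
    return datetime.strptime(hhmm, "%H:%M")

# Timestamps as reported in Treynor's account; labels are illustrative.
events = {
    "bad_config_generated": at("10:55"),
    "errors_visible": at("11:02"),
    "good_config_generated": at("11:14"),
    "fully_restored": at("11:30"),
}

def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named events."""
    delta = events[end] - events[start]
    return int(delta.total_seconds() // 60)

if __name__ == "__main__":
    print("bug to visible errors:", minutes_between("bad_config_generated", "errors_visible"))
    print("alert to corrected config:", minutes_between("errors_visible", "good_config_generated"))
    print("visible errors to full restoration:", minutes_between("errors_visible", "fully_restored"))
```

On these figures, 7 minutes passed between the bad configuration and the first visible errors, 12 minutes between the alert and the corrected configuration, and 28 minutes between the first visible errors and full restoration, which is roughly in line with the "about 25 minutes" most users experienced.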

Will Google See User Backlash?

With services once again working normally, Treynor said work is now focused on removing the source of failure that caused Friday’s outage, and on speeding up recovery when a problem does occur. He listed three steps Google is taking:

(1) Correcting the bug in the configuration generator to prevent recurrence, and auditing all other critical configuration generation systems to ensure they do not contain a similar bug; (2) adding additional input validation checks for configurations, so that a bad configuration generated in the future will not result in service disruption; and (3) adding additional targeted monitoring to more quickly detect and diagnose the cause of service failure.
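To make steps (2) and (3) concrete, here is a minimal Python sketch of what an input validation gate and a simple error-rate alert might look like. Google has not published its implementation, so everything here is an assumption made for illustration: the configuration fields (`version`, `backends`, `timeout_ms`), the function names, and the 5 percent alert threshold are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: these field names and types are illustrative
# assumptions, not Google's actual configuration format.
REQUIRED_FIELDS = {"version": int, "backends": list, "timeout_ms": int}

def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a generated config (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in config:
            problems.append(f"missing field: {field}")
        elif not isinstance(config[field], expected_type):
            problems.append(f"wrong type for {field}")
    if isinstance(config.get("backends"), list) and not config["backends"]:
        problems.append("backends list is empty")
    timeout = config.get("timeout_ms")
    if isinstance(timeout, int) and timeout <= 0:
        problems.append("timeout_ms must be positive")
    return problems

def choose_config_to_publish(
    candidate: Dict[str, Any], last_known_good: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Publish the candidate only if it validates; otherwise keep the old config."""
    problems = validate_config(candidate)
    if problems:
        return last_known_good, problems
    return candidate, []

def should_alert(errors: int, requests: int, threshold: float = 0.05) -> bool:
    """Targeted monitoring sketch: alert when the error rate exceeds the threshold."""
    if requests == 0:
        return False
    return errors / requests > threshold

if __name__ == "__main__":
    good = {"version": 2, "backends": ["pool-a"], "timeout_ms": 500}
    bad = {"version": 3, "backends": [], "timeout_ms": 500}
    published, problems = choose_config_to_publish(bad, good)
    print("published version:", published["version"], "problems:", problems)
    print("alert:", should_alert(errors=10, requests=100))
```

The design choice worth noting is that a rejected configuration falls back to the last known good one rather than being sent to live services, which is the behavior step (2) describes: a bad configuration should not result in a service disruption.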

We caught up with Greg Sterling, principal analyst at Sterling Market Intelligence, to get his take on the outage -- and its resolution. He told us that because Google has such a strong reputation as an engineering-driven company, it's surprising to many people when something like this happens.

“Again, isolated events like this are not a problem and users will forgive the outage,” Sterling said. “It becomes a problem when a pattern develops. If this were to happen multiple times it would start to become a problem for Google. Gmail has become a very strategic product for the company and it's unlikely that Google will experience many more of these outages.”

© Copyright 2000-2014 CIO Today. All rights reserved.