My pre-getting-out-of-bed ritual includes a check on the weather. This morning I see that in Upplands Vasby (the Stockholm suburb where I live) the high will be 1°C and the low -3°C. Which is fine, except the current temperature is below -8°C.

When automated calculations go wrong

And it is not just Weather Underground with this problem. The BBC is reporting similarly unusual conditions for Stockholm, with a forecast minimum temperature 6°C warmer than it already is.

Whoever wrote the system for generating or importing the forecasts must have left out a check to see whether the forecast matches the current reality. But then again, if they did have one, what could the system do when it finds a mismatch? Ask for the forecast to be re-run (a massive computation problem for the agencies that produce them), or try to come up with its own? Neither is really viable. I guess they just have to hope that the theory and the reality will converge during the day.
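For illustration, the kind of check described above might look something like the sketch below. It is only a guess at the shape of such a check, not how Weather Underground or the BBC actually work: the `DailyForecast` type, the `forecast_mismatch` function, and the 1°C tolerance are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class DailyForecast:
    """Hypothetical container for one day's forecast range, in degrees Celsius."""
    high_c: float
    low_c: float


def forecast_mismatch(forecast: DailyForecast, current_c: float,
                      tolerance_c: float = 1.0) -> float:
    """Return how far (in °C) the current reading lies outside the forecast range.

    Returns 0.0 when the reading is inside the range, or within the
    assumed tolerance of it.
    """
    if current_c < forecast.low_c - tolerance_c:
        return forecast.low_c - current_c
    if current_c > forecast.high_c + tolerance_c:
        return current_c - forecast.high_c
    return 0.0


# This morning's numbers: high 1°C, low -3°C, current reading about -8°C.
today = DailyForecast(high_c=1.0, low_c=-3.0)
gap = forecast_mismatch(today, current_c=-8.0)
if gap > 0:
    print(f"Forecast disagrees with reality by {gap:.1f}°C")
```

Detecting the mismatch is the easy part. What to do about it (flag the forecast, hide it, or just log it) is the hard question, and the sketch deliberately leaves that open.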

It’s an interesting problem from a technical point of view though. At the weekend I talked with a guy whose job is making sure that what you see on price comparison sites is correct. He doesn’t check that if someone says a TV is great, it really is. His problem is that when they scrape retailers’ websites for product details and prices, the results must match the products that internet users are actually looking for. Nothing would damage the credibility of a site faster than searching for details of a fridge and getting electric toothbrushes in the results.

The solutions, of course, cannot involve humans checking these things. The volume and variety of things that people look for on the internet mean this all has to run without human involvement. The job of the people is to build and tune machine learning systems that constantly improve the quality of the output.

This is one of the key things that I have learned about Cloud – it’s all about the machines. Providing services on demand to users at scale means automation. Everywhere. From the things people see – you don’t ring up AWS to get an EC2 instance provisioned. Through the back end platforms – the only time a human in Amazon should get involved in your bill is when it becomes so big that they feel you need your own account manager. All the way through to the management of the underlying infrastructure.

A metric I was given recently was that in a traditional data centre 1 person can manage about 50-300 servers. In a Hyperscale data centre like Google’s or Facebook’s it is well over 10,000 per admin. Why? Because of automation. As well as enabling the system to manage itself (I am sure they rarely see “routine” errors), automating everything also removes one of the biggest causes of issues in the first place – human error.

Of course this isn’t easy to do (or everyone would be at it). Cracking the problems of automation, governance, and machine learning has enabled the likes of Google, Amazon and Facebook to scale to the size they have, without collapsing under the need to recruit half the planet to be admins for their infrastructure.

The challenge then is bringing that capability to everyone else. That’s what the teams I work with are doing, and it is one of the reasons why I am (usually) happy to head into work each morning, even if it is -9°C.