Machine Learning is a Marvelously Executed Scam

I’ve joked periodically that machine learning (or “ML”) adherents claim to be able to sort through vast quantities of data and find absolutely anything except a viable business model.

I’d like to revisit that with a clarification: They also apparently can’t tell a story around what they do that isn’t completely batty.

At first, I thought this was something innocent—along the lines of “something that attracts people to the ML space also precludes attracting storytellers.”

I’ve since shifted my annoyance into what can only be termed “grudging admiration” around the sheer genius of the scam they’ve pulled off.

Isn’t that a bit over-the-top?

Not really!

The use cases we’ve seen for companies applying AI/ML to business problems are either highly specific (“you feed in your records of many millions of credit card transactions, and the magic technology finds the fraudulent ones”) or highly ridiculous (WeWork discovering that people drink coffee in the morning, so it should hire a barista).

In fact, if you visit AWS’s machine learning page and scroll down to the customer quotes, you’ll see that every one of them alludes to marginal process improvements.

There’s not a single customer testimonial on that page that says anything remotely close to “this technology transformed our business” or “we’d be sunk if not for this modern technical miracle.”

Customer testimonials around other AWS services give concrete numbers, say things like “this would not have been possible in our data centers,” and sing the services’ praises.

In ML, the overwhelming takeaway is “yeah, it was fine.”

The level of ML excitement on the part of AWS is in no way matched by its customers—at least, those customers that have businesses that don’t involve “selling ML onwards.”

That doesn’t make it a scam!

To use AI/ML effectively, you need three ingredients that everyone agrees upon.

The first is a vast quantity of data, for which you’ll invariably pay your cloud provider (or heaven forbid, your SAN vendor) an eye-wateringly large pile of money. The second is a lot of compute power—frequently and specifically GPUs, which are a specialized form of compute that of course costs significantly more. And the third is people who are trained in this arcane form of wizardry, who are a lot like regular software engineers except they cost a lot more money.

When you can’t spot the sucker at the poker table, wisdom holds that it’s you.

When you look at the business case for a company’s ML ventures, it’s reasonable to ask “did an ML algorithm write this?”

During the California Gold Rush, the people who turned an actual profit from the rush west weren’t the miners themselves; it was the merchants who supplied them.

It’s pretty clear that AWS and its cloud competitors are selling pickaxes into a frenzy.

But this is what customers want!

Let’s take the AWS base case for AI/ML offerings: their Rekognition service. If you click that link (at least, at the time of this writing), you’ll see this:

[Screenshot of the Amazon Rekognition product page]

No customer—not a single one—cares at all about that.

What they care about is this business problem: “I have a picture and I need to know if it’s a picture of a cat or not. As an added bonus, I’d like you to also tell me how confident you are in that assessment.”

Past that, the customer doesn’t care whether the service is powered by machine learning, an incredibly lucky/accurate coin toss, or an army of people who click yes or no very quickly (assuming that those people are allotted ample restroom breaks).

I repeat: The customer does not care.
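For what it’s worth, the customer’s entire problem statement fits in a few lines of code. Here’s a hedged sketch against boto3’s DetectLabels call (the bucket and file names are made up, and the little yes/no helper is mine, not anything AWS ships):

```python
# The customer's actual question: "is this a cat, and how sure are you?"
# The helper below is a hypothetical convenience wrapper, not an AWS API.

def is_probably_a_cat(labels, threshold=70.0):
    """Given DetectLabels output, answer 'is this a cat?' plus confidence."""
    for label in labels:
        if label["Name"] == "Cat" and label["Confidence"] >= threshold:
            return True, label["Confidence"]
    return False, 0.0

# The actual service call (requires AWS credentials; names are illustrative):
#
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_labels(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "maybe-cat.jpg"}},
#     MaxLabels=10,
#     MinConfidence=50,
# )
# print(is_probably_a_cat(response["Labels"]))
```

Note that nothing in that snippet requires the customer to know or care how the answer gets computed, which is rather the point.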

So why does every keynote about Rekognition and every other service like it belabor just how much machine learning goes into the service?

Honestly? I usually test out AWS services by standing them up myself to determine whether they’re good or not; I may soon be able to bypass that with a “machine learning” algorithm of my own: the more the marketing for a service talks about ML, the worse the service is.

It’s still early days; technology is built upon iterative improvements, you insufferable jackass.

Look, I don’t expect a machine learning algorithm to write a blog post like this; natural language is hard even without sarcasm factored in.

But let’s take a perfectly bounded problem space with vast amounts of data and clearly defined inputs and outputs: the AWS bill.

Why is the only machine-learning-powered service that AWS offers around the bill one that basically kinda sucks?

Don’t get me wrong, Compute Optimizer will look at your usage and supposedly identify some rightsizing opportunities. But rightsizing is usually nonsense, and it’s not touching things like “you take in 1 petabyte of data from the internet and pass 50 petabytes of data between AZs every month”—a behavior pattern that’s almost certainly wrong and vastly more impactful to your AWS spend.
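To put rough numbers on why that traffic pattern dwarfs any rightsizing win: cross-AZ data transfer has long been billed at $0.01 per GB in each direction (verify against current AWS pricing; this is back-of-the-envelope math, not a quote from anyone’s bill):

```python
# Illustrative arithmetic only. Assumes the long-standing cross-AZ rate of
# $0.01/GB charged in each direction ($0.02/GB round trip); internet
# ingress is free, so the 1 PB coming in costs nothing by comparison.
GB_PER_PB = 1_000_000  # AWS bills in decimal units

cross_az_pb_per_month = 50
monthly_cost = cross_az_pb_per_month * GB_PER_PB * 0.02

print(f"${monthly_cost:,.0f} per month")  # $1,000,000 per month
```

A pattern-matcher that flags “you’re spending a million dollars a month shuttling data between availability zones” would be worth rather more than one suggesting you drop an instance a size.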

The scattering of startups offering to use ML to optimize your AWS bill, fine-tune your observability, or unleash “AIOps” fare no better. They’re selling hype, not outcomes. (Shameless aside: Give us a ring at The Duckbill Group and we can get that AWS bill under control—no ML needed!)

And I’m sorry: I have a hard time accepting that the company that’s going to paint the way forward around extracting meaning from your data is the same one that literally has a product named AWS Systems Manager Session Manager, emails you six copies of the same marketing email, and greets brand-new users with a “look what’s changed in our console” banner ad.