The year was 2005 & I had had some excellent results predicting future news stories, trends, etc. & monetizing them. Of course, being a programmer, I did this programmatically.

Microsoft, Google, & Yahoo were all paying me. Paying me to pollute the internet. It would be many years before they’d assert more control &, frankly, as they took away our predictions and gave them to others, our algorithms had to stretch out into the “long tail” to find content.

At first we weren’t polluting; in fact, we were probably on track to become a legit journalistic establishment. What’s so funny? Oh, yeah, legit and journalism in the same sentence. So maybe we didn’t miss out on much, but the point is that once getting the scoop was no longer enough to get the traffic, because the powers-that-be would give it to someone else, we started scraping and aggregating and arbitraging and looking for a way to roll with the punches. The long tail was working, but it was pollution. We no longer felt good about what we were producing. In fact, no matter how much they paid us for being the only page on the internet to have content on a common misspelling of some celebrity’s name, it didn’t make up for the illegitimacy of it all. Despite the profitability, we ended up shutting those servers down.

The synthetic theft of our scoops via evolving “analytics” was a cruel taskmaster. Once the top page in the search results was Google News’ scrape of our own site while our own site was on page 22.75, the writing was on the wall. What are we to expect from companies whose business models literally began by plagiarizing the entire internet?

Now they serve up fabulous features like “image search” and “smart descriptions” that allow the world to browse your content w/o you ever knowing it. “No hits for you!” they say as the top of the search page is covered with their ads. Ads that they no longer share 1 penny of. Oh, too bad, so sad, right?

In the midst of that you can imagine how complex our own code was getting. Scooping major trends legitimately was easy. Did you know they pre-choose the names of hurricanes & tropical storms? Yep, I learned more about that than I ever wanted to know.

Anyway, all of this rambling is leading somewhere. During that time I was personally desperate to produce something legit. So of course I set out to analyze the stock market. I mean what could be more legit than that?

We all know that stocks are entirely “organic” right? That “fundamentals” drive the market. Wink, wink. Silly me…

So if you can search Wikipedia for the Farmers’ Almanac & scroll down to the section about accuracy… Don’t worry, we’re on track now. Realize, I was embarking on producing the legit Farmers’ Almanac of stocks! So whether you’ve found it yet or not, I’ll summarize, & since it is on the internet it must be true!

As it turns out, the publishers claim it’s 80-85% accurate. Go figure that, a range, 80 to 85, which of course means that the lowest it could ever be is 80% accurate. Wow, impressive. Then the next paragraph points out that actual studies have shown it to be a coin toss, 50% accuracy, finally pointing out that it does have a slightly higher accuracy rate than Punxsutawney Phil. (Ironically, when preparing for this I found the misspelled version first.) All that, and only a few days ago I bit my tongue as a small board I’m on discussed “the implications of the Farmer’s Almanac on our planning for 2020!”

People crave crystal balls. That’s why predictive analytics, machine learning, & just data in general are such a big part of the budget in so many places. As you might begin to guess, since I’m the guy that shut down profitable servers because they were illegitimate, you can imagine what I was about to discover about “predictive analytics.” You guessed it, a whole lot of it is pure bovine scatology.

I pulled data from every source I could find. I keyed & indexed it & maintained detailed hit and miss records. I spotted some trends: this contributor at Fool.com, that ticker at Forbes. I was about to strike gold! Only I wasn’t. In the end every pattern I thought I’d spotted was about as reliable as a coin toss. I was reminded of a college professor who, while teaching us how to determine probabilities, asked: “If you flip a coin 10 times and it comes up heads every time, what are the odds the next one will be heads?”
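
The professor’s question and those hit and miss records can both be put in numbers. Here is a minimal Python sketch, assuming every call is an independent 50/50 guess. The function names and the example counts (8 hits in 10 calls, 50 sources) are made up for illustration and are not the original records. For a fair coin, 10 heads in a row change nothing: the next flip is still 50/50. What the arithmetic does show is how easily a hot streak turns up when you watch enough sources.

```python
from math import comb


def tail_probability(hits: int, total: int, p: float = 0.5) -> float:
    """Chance of at least `hits` correct calls out of `total`
    when each call is an independent guess that is right with probability p."""
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(hits, total + 1)
    )


def chance_some_source_looks_good(sources: int, hits: int, total: int) -> float:
    """Chance that at least one of `sources` pure coin flippers
    scores `hits` or better out of `total` calls."""
    single = tail_probability(hits, total)
    return 1 - (1 - single) ** sources


if __name__ == "__main__":
    # Ten heads in a row from a fair coin: 1 chance in 1024.
    print(tail_probability(10, 10))
    # One coin flipper getting 8 or more of 10 calls right.
    print(tail_probability(8, 10))
    # At least one of 50 coin flippers getting 8 or more of 10 right.
    print(chance_some_source_looks_good(50, 8, 10))
```

With 50 sources who are all just flipping coins, the odds are better than 9 in 10 that at least one of them goes 8 for 10. A single impressive record, taken alone, says very little.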

Yet here I am, in a field where these trends are so lucrative. It’s not enough to make something work. No, we need to “predict the future!” Anyone that convinces corporate America that they know the future is rewarded handsomely! It used to be the stars, or tea leaves, or chicken bones. Even just numbers themselves. Any set of dice, stack of cards, and some complex description of why/how was enough. Now we have computers & computer models, which is how we know that polar bears are now extinct, the ice caps have melted, the coasts are flooded, & COVID-19 would have killed more people than the Spanish flu by now if we all hadn’t agreed to voluntary house arrest. This sounds cynical, but it’s not, and here’s why.

There are very important reasons to quantify historic trends. Knowledge is power & data is knowledge. The scientific method mandates that we analyze and repeat results and even then any hypothesis remains falsifiable with new data. These are good practices when done without false promises and shortcuts. Those who don’t learn from the past are doomed to repeat it.

Yet there’s another aspect to this. So much of what we call “computer science” or now “data science” short circuits actual science. Of course it shouldn’t, but it does, as soon as being “legit” isn’t a priority. As soon as a political and/or profit motivation, or even just wishful thinking or fear, gets mixed in…

Why is the Farmers’ Almanac so popular? Because people have seen droughts, and freezes, and floods, and they long for some sense of security. If we’re not careful, we become charlatan bait!

More than “predictive analytics” we need “data wisdom.” We need to identify which trends repeat reliably enough to guide our decisions and which are just irrational to pursue. The right balance here can be as elusive as the crystal ball that so many crave.

The sorcerer’s stone, the “Holy Grail”, the “goose that lays the golden egg…” I can’t say whether mankind will ever evolve beyond superstition; the trends all point to “probably not,” as does my “Magic 8-Ball®.” What I can tell you is this: if we value “legit” over anything else, honesty over flattery, practicality over politics, and if we use our influence to promote reality & embrace uncertainty, it will give us a rational edge over those that still consult mediums or fortune tellers.

It doesn’t mean you’ll always be right, or they will always be wrong. It’ll just mean that you know the difference between a data-based decision and a coin toss. It means that just because someone with your same first name won 100 million dollars “only” 6 states away in 2003, you won’t rush out and waste your money on a lottery ticket. It also doesn’t mean you’d lose if you did.

Data wisdom only happens to the degree that we insulate data science from the synthetic factors, like politics, profitability, & plain ole superstition. Doing this is not easy, but again, knowing what is legit and what’s just lip service is a big start.

Finally, a last piece of advice: if and when you are hoodwinked by a charlatan and they prove themselves to be a charlatan, the best data-driven decision you can make is to move on and stop listening to their advice. They won’t always be wrong, any more than a coin toss will always be heads. What they will do is inject persistent irrationality into your organization, which in turn raises the distinct possibility that Punxsutawney Phil would have served you better.

Thanks for listening.