REGARDING AI: STOP WAITING, START DOING

Data science, machine learning, and AI aren’t about perfection. They’re about getting your hands dirty with real-world data and solving actual problems.

If you’re waiting for pristine data, continually chasing the latest AI buzzword, or just assuming you aren’t ‘ready’ for AI, you’ll find yourself falling further and further behind the competition. Here’s how to approach AI without losing your mind (or your budget).

Embracing Messy Data & Avoiding ‘Shiny Object Syndrome’

In a perfect world, all our data would be clean, complete, and ready for analysis. But we don’t live in that world, nor will data ever be perfect.

Regarding bias: the world we live in is biased, which means some data we generate will be biased. This is just something we have to work with.

The truth is that signal is just as important as cleanliness, so strive for a balance between the two.

Missing Values Aren’t the End of the World

Does your data have gaps? Join the club! In many cases we have methods of dealing with missing values, each with its own strengths and considerations:

  • Imputation (fill missing values with statistical estimations/predictions)
  • Missing category (flag missing values as a new categorical variable)
  • Deletion (just remove the rows/columns)
  • Other advanced techniques (factorization, KNN, etc.)

Something we tend to overlook is that those missing values might actually tell you something valuable. For instance, in a customer churn analysis the “missingness” of data can be turned into a feature itself (the “missing category” approach in the second bullet). In one such analysis, customers with a missing ‘last purchase date’ were more likely to churn.
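To make the first three options concrete, here is a minimal sketch in plain Python. The customer records, column names, and helper functions are all made up for illustration; in practice you would more likely reach for a dataframe library. Note that it flags missingness before imputing, so the signal isn't lost once the gap is filled.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 1, "monthly_spend": 42.0, "last_purchase_days_ago": 12},
    {"id": 2, "monthly_spend": None, "last_purchase_days_ago": 40},
    {"id": 3, "monthly_spend": 18.5, "last_purchase_days_ago": None},
    {"id": 4, "monthly_spend": 77.0, "last_purchase_days_ago": 3},
]


def impute_median(rows, column):
    """Imputation: fill gaps in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def add_missing_flag(rows, column):
    """Missing category: record missingness as its own 0/1 feature."""
    return [{**r, f"{column}_missing": int(r[column] is None)} for r in rows]


def drop_incomplete(rows):
    """Deletion: keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Flag first, then impute, so the "missingness" signal survives the fill.
flagged = add_missing_flag(customers, "last_purchase_days_ago")
prepared = impute_median(flagged, "last_purchase_days_ago")
complete_only = drop_incomplete(customers)
```

Which option fits depends on why the values are missing; deletion is the simplest but throws away the most information.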

Synthetic Data Generation Is a Real Thing

Maybe your data lacks any predictive signal, or the data you’re analyzing is sparse and difficult to generate. Sometimes you just need more data.

Lucky for us, we can employ techniques like SMOTE (Synthetic Minority Over-sampling Technique) or use tools like generative AI to expand datasets synthetically, while adhering to common patterns or constraints. It’s like cloning your data, but without the ethical dilemmas.
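The core idea behind SMOTE is simple enough to sketch: pick a minority-class sample, pick one of its nearest minority-class neighbors, and create a new point somewhere on the line between them. The function below is a simplified illustration in plain Python, not the reference implementation; the name `smote_like`, its parameters, and the sample points are assumptions made for this example.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical two-feature minority class with only four examples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point lies between two real ones, the new data stays inside the region the minority class already occupies. It adds volume, not new information.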

Wrangling the Unruly Nature of Natural Language

Dealing with text data can be a bit like trying to herd cats. With methods like tokenization, lemmatization, and embeddings, you can turn that jumbled text into something useful for analysis.

Yes, natural language is messy by nature, but that doesn’t mean we can’t manage it.
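As a taste of the first step, here is a minimal sketch of tokenization followed by a bag-of-words count vector, which is only a crude stand-in for learned embeddings. Lemmatization and real embeddings need an NLP library or a trained model, so they are left out. The sample sentences and function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn text into a count vector over the vocabulary (bag of words)."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocabulary]


docs = [
    "The invoice is overdue!",
    "Invoice paid, thanks.",
    "Is the contract signed?",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Once text is a list of numbers, everything downstream (clustering, classification, similarity search) can treat it like any other feature set.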

The “Million Dollar AI Solution” Myth

If you think throwing money at AI will magically solve all your problems, think again. AI success isn’t about your budget size – it’s about your willingness to explore, fail quickly, and continuously learn.

An AI-generated image of a man standing in front of an AI Solutions vending machine

AI Isn’t a Vending Machine

You can’t just insert money, pick a few products, and expect a fully formed AI solution to pop out.

Truly competitive and unique AI solutions are more like scientific research than product implementations. Building them is messy, can be a bit unpredictable, and is sometimes frustrating. But it’s also incredibly rewarding when you crack the problem, both personally and professionally.

There is no ‘how-to guide’ for AI solutions that give you a competitive advantage (otherwise, they wouldn’t be competitive).

The “Shiny Chatbot” Syndrome

Let’s be real with ourselves for a second – not every problem needs a chatbot. Or a neural network. Or some expensive magical product. Focus on the actual business problem you’re trying to solve, not the trendiest tech buzzword. Don’t forget Occam’s razor just because of the buzz around generative AI: the simplest solution or explanation is usually the best one.

Real Solutions, Not Toys

An AI-generated image featuring Lego people in what looks like a giant warehouse

The image generated above might impress your friends, but will it actually improve your business? Probably not.

The allure of current-state AI is hard to resist. It can be a bit like being a kid in a tech candy store. “Ooh, look at that Llama 3 variant with a 0.2% higher MMLU score!”

Applied AI is a very different beast from theoretical research and academia. We’re in the business of solving problems while minimizing tech debt and ensuring safe and responsible use. That 0.2% increase in an MMLU score, or slightly better image generation quality, is likely the least of your concerns.

New model versions and slight improvements will keep arriving; however, constantly changing your approach may undermine your ability to ever realize value. Do your use cases actually require the latest and greatest model? Focus on answering the question of “should we” rather than “could we.”

Machine Learning: The Unsung Hero

While everyone’s excited over the latest generative AI and how cool it all is, good old-fashioned machine learning algorithms are still solving the majority of real problems.

Many believe they should use generative AI for challenges we’ve been solving for decades with much simpler techniques. Linear regression, anyone?

The challenge with using generative AI for every problem lies in the details of how generative models work and the specific requirements of a use case.

For example, in a recent project we needed to classify documents. We achieved extremely strong performance with a model we built on a basic VM. Compared with generative models, it analyzes new documents much faster, is more interpretable, and costs significantly less to train, host, and consume.
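The project's actual model isn't described here, so the following is only an illustration of the general idea: a toy multinomial naive Bayes classifier in plain Python. The class name, training snippets, and labels are all made up, and a real project would more likely use an established ML library. The point is how little machinery a classic text classifier needs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data: four tiny "documents" in two classes.
train_texts = [
    "invoice payment due amount",
    "payment received invoice total",
    "contract agreement signed parties",
    "agreement terms contract renewal",
]
train_labels = ["invoice", "invoice", "contract", "contract"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
first = model.predict("Please send the invoice payment")
second = model.predict("Signed contract terms")
```

Every score is a sum of per-word log probabilities, so you can inspect exactly which words pushed a document toward a label. That is the kind of interpretability that is much harder to get from a generative model.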

It’s All Built on Math & Stats

Every model, whether it’s a simple regression or a state-of-the-art transformer, is built on fundamental principles of statistics. Understanding the data, knowing how to preprocess it, and selecting the right features are all core challenges math and statistics help us solve.

The latest AI craze might obscure this fact, but the foundations remain the same. Techniques like hypothesis testing, confidence intervals, and probability distributions still play crucial roles in model development and evaluation. Ignoring these principles in favor of the latest shiny algorithm can lead to misinterpretations and unreliable results.
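For example, before swapping models over a small benchmark gain, a confidence interval can show whether the gain is distinguishable from noise. The sketch below uses a percentile bootstrap on paired per-example results. The two result lists are invented for illustration (model A correct on 160 of 200 examples, model B on 162), and the function name and defaults are assumptions.

```python
import random


def bootstrap_diff_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(B) - accuracy(A) on paired results."""
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample test examples with replacement, keeping A/B results paired.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        diffs.append(acc_b - acc_a)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-example outcomes (1 = correct, 0 = wrong) on 200 test items.
correct_a = [1] * 160 + [0] * 40                      # model A: 160 correct
correct_b = [1] * 158 + [0] * 2 + [1] * 4 + [0] * 36  # model B: 162 correct

lower, upper = bootstrap_diff_ci(correct_a, correct_b)
```

If the interval spans zero, the data doesn't show that one model is actually better than the other, however nice the headline number looks.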

Remember, people with a deep understanding of statistical methods are still immensely valuable, and that understanding often yields more robust and interpretable solutions. Fancy toys are fun, but solid foundations in math and statistics build lasting results.

Getting to Meaningful Results

So, how do you cut through the hype and actually gain value from AI? Your specific answer will depend on where you’re at on your ‘AI journey’, but here are some of our tips:

Invest in Discovery

Stop obsessing over arbitrary goals like “we need to implement X number of AI models” or “we need to start using generative AI.” Instead, focus on outcomes. What problem are you trying to solve? How will you measure success? What does ‘good’ look like? Start there, and the rest will follow.

Embrace Failure (Trust Us, It’ll Happen)

Remember when we said that experimenting and failing quickly are fundamental to AI? Well, some of your AI projects will fail, and that’s okay. Some failures will be harder to digest than others, but each failure is a learning opportunity.

What went wrong? What assumptions did you make? How can you do better next time? Failure is just a pit stop on the road to success.

The failure we shouldn’t be okay with is never getting started in the first place. Again, don’t let perfection be the enemy of good. Small, incremental wins can lead to massive improvements for the business.

Use What You’ve Got, Plan for What You Need

Don’t wait for perfect data to start your AI journey or assume you need an enterprise AI platform to find and prove value. Use what you have, but keep an eye on what you will need as your AI initiatives begin to scale.

A benefit of experimenting early and failing fast is that AI projects uncover gaps in your data strategy and operations. Use those insights to evolve your data collection and management practices. Determine which tools, methods, and customizations your analytics team wants in a collaborative platform. Use quick wins to build justification and alignment for a longer-term AI platform.

Keep the Humans in the Loop

AI isn’t a magic wand that makes humans obsolete. In fact, there is a reason you’re seeing more and more organizations using the term “co-pilot”: AI is meant to augment humans, not replace them.

Ignore the technological dystopia that many Silicon Valley companies (and sci-fi movies) seem to be living in. Focus on what is real and how your employees and AI tools can work better together. Many applications of AI (especially generative AI) should actually require a human in the loop (HITL).
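One common way to put a human in the loop is to auto-accept only high-confidence model outputs and send the rest to a review queue. Here is a minimal sketch, assuming the model exposes a confidence score. The `Prediction` type, the 0.9 threshold, and the queue names are all hypothetical and would need tuning for a real use case.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's score in [0, 1]; assumed to be available


def route(prediction, threshold=0.9):
    """Return 'auto' to accept the model output, 'human_review' otherwise."""
    return "auto" if prediction.confidence >= threshold else "human_review"


# Hypothetical model outputs for two documents.
predictions = [
    Prediction("doc-1", "invoice", 0.97),
    Prediction("doc-2", "contract", 0.62),
]

queues = {"auto": [], "human_review": []}
for p in predictions:
    queues[route(p)].append(p.item_id)
```

The threshold becomes a business decision rather than a purely technical one: lowering it saves reviewer time, raising it catches more mistakes.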

Engage critical stakeholders throughout the entire AI lifecycle. Demonstrate AI capabilities and provide organization-wide training to help minimize barriers to entry and increase adoption.

Wrapping Up

AI isn’t about perfection, big budgets, or shiny toys. It’s about solving real problems with the data you have, learning from your failures, and continuously improving.

Stop waiting for the perfect conditions and start exploring. The AI revolution is happening now, and it’s exciting and full of opportunities. If you aren’t adopting the “experiment now, expand later” mindset, your competitors are.

Now, go forth and build something truly valuable. Just please, don’t make another chatbot (unless you really, really need one).

This article was first published on LinkedIn Pulse. You can read the original here. Special thanks to Matt Adkins, Senior Associate Technical Consultant, for his contributions to this article.

About the author

Ben Prescott

Principal Technical Consultant

Ben Prescott is a Principal Data Scientist with over 15 years of experience across various managerial and individual contributor roles. His expertise spans data science, ML engineering, and AI, with a focus on cloud platforms, advanced analytics, and responsible AI practices. Ben has an aptitude for leveraging cutting-edge techniques that drive measurable business value, from developing prescriptive models for automotive manufacturers to implementing computer vision solutions for behavioral risk assessment. With a Master's in Data Science from Northwestern University and a background in Electronics Engineering, Ben brings a unique blend of technical depth and practical problem-solving to complex data challenges.
