REGARDING AI: STOP WAITING, START DOING
Data science, machine learning, and AI aren’t about perfection. They’re about getting your hands dirty with real-world data and solving actual problems.
If you’re waiting for pristine data, continually chasing the latest AI buzzword, or simply assuming you aren’t ‘ready’ for AI, you’ll find yourself falling further and further behind the competition. Here’s how to approach AI without losing your mind (or your budget).
Embracing Messy Data & Avoiding ‘Shiny Object Syndrome’
In a perfect world, all our data would be clean, complete, and ready for analysis. But we don’t live in that world, nor will data ever be perfect.
Regarding bias: the world we live in is biased, which means some data we generate will be biased. This is just something we have to work with.
The truth is that signal matters just as much as cleanliness; strive for a balance between the two.
Missing Values Aren’t the End of the World
Does your data have gaps? Join the club! Fortunately, we have several well-established methods for dealing with missing values, each with its own strengths and considerations:
- Imputation (fill missing values with statistical estimations/predictions)
- Missing category (flag missing values as a new categorical variable)
- Deletion (just remove the rows/columns)
- Other advanced techniques (factorization, KNN, etc.)
Something we tend to overlook is that those missing values might actually tell you something valuable. For instance, in a customer churn analysis, the “missingness” of data can be turned into a feature itself (bullet #2): it turned out that customers with a missing ‘last purchase date’ were more likely to churn.
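To make the options above concrete, here is a minimal sketch in pandas using a hypothetical churn dataset (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical customer data with gaps in 'last_purchase_days'
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "last_purchase_days": [30.0, None, 12.0, None],
    "churned": [0, 1, 0, 1],
})

# Option: flag missingness as its own feature (it may carry signal)
df["last_purchase_missing"] = df["last_purchase_days"].isna().astype(int)

# Option: impute with a statistical estimate (here, the median)
df["last_purchase_imputed"] = df["last_purchase_days"].fillna(
    df["last_purchase_days"].median()
)

# Option: simply drop rows with gaps (loses data, but sometimes acceptable)
df_dropped = df.dropna(subset=["last_purchase_days"])

print(df[["last_purchase_missing", "last_purchase_imputed"]])
print(len(df_dropped))  # 2 rows survive the deletion
```

Notice that the missingness flag is exactly the kind of feature the churn example above relied on.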
Synthetic Data Generation Is a Real Thing
Maybe your data lacks predictive signal, or the data you’re analyzing is sparse and difficult to collect. Sometimes you just need more data.
Lucky for us, we can employ techniques like SMOTE (Synthetic Minority Over-sampling Technique) or use tools like generative AI to expand datasets synthetically, while adhering to common patterns or constraints. It’s like cloning your data, but without the ethical dilemmas.
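To illustrate the core idea behind SMOTE without pulling in a dedicated library, here is a simplified sketch that interpolates between random pairs of minority-class points. Real SMOTE interpolates toward k-nearest neighbors, so treat this as a toy illustration on invented data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical imbalanced minority-class samples (2 features each)
minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]])

def smote_like_oversample(X, n_new, rng):
    """Create synthetic points by interpolating between random pairs of
    real minority samples -- the core idea behind SMOTE, simplified."""
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(len(X), size=2, replace=False)
        gap = rng.random()  # how far along the line between the two points
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)

new_points = smote_like_oversample(minority, n_new=5, rng=rng)
print(new_points.shape)  # (5, 2)
```

Because every synthetic point lies on a line segment between two real points, it stays inside the patterns and constraints the real data already exhibits.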
Wrangling the Unruly Nature of Natural Language
Dealing with text data can be a bit like trying to herd cats. With methods like tokenization, lemmatization, and embeddings, you can turn that jumbled text into something useful for analysis.
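As a toy illustration of tokenization, here is a regex-based sketch with a crude suffix-stripping step standing in for real lemmatization (which libraries like spaCy or NLTK handle properly):

```python
import re

text = "The cats were herding themselves, as cats do!"

# Tokenization: split raw text into word-level tokens
tokens = re.findall(r"[a-z']+", text.lower())

# A deliberately crude stand-in for lemmatization: strip a trailing 's'
# from longer tokens (real lemmatizers use vocabulary and morphology)
lemmas = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

print(tokens)
print(lemmas)
```

Even this crude pass collapses “cats” and “cat” into one unit, which is the point: normalization shrinks the vocabulary so downstream analysis sees signal instead of surface variation.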
Yes, natural language is messy by nature, but that doesn’t mean we can’t manage it.
The “Million Dollar AI Solution” Myth
If you think throwing money at AI will magically solve all your problems, think again. AI success isn’t about your budget size – it’s about your willingness to explore, fail quickly, and continuously learn.
AI Isn’t a Vending Machine
You can’t just insert money, press a button, and expect a fully formed AI solution to pop out.
Truly competitive and unique AI solutions are more like scientific research than product implementations. It’s messy, can be a bit unpredictable, and is sometimes frustrating. But it’s also incredibly rewarding when you crack the problem, both personally and professionally.
There is no ‘how-to guide’ for AI solutions that give you a competitive advantage (otherwise, they wouldn’t be competitive).
The “Shiny Chatbot” Syndrome
Let’s be real with ourselves for a second – not every problem needs a chatbot. Or a neural network. Or some expensive magical product. Focus on the actual business problem you’re trying to solve, not the trendiest tech buzzword. Don’t forget Occam’s razor just because of generative AI’s buzz – the simplest solution/explanation usually is the best one.
Real Solutions, Not Toys
A flashy AI-generated image might impress your friends, but will it actually improve your business? Probably not.
The allure of current-state AI is hard to resist. It can be a bit like being a kid in a tech candy store. “Ooh, look at that Llama 3 variant with a 0.2% higher MMLU score!”
Applied AI is a very different beast from theoretical and academic research. We’re in the business of solving problems while minimizing tech debt and ensuring safe and responsible use. That 0.2% increase in an MMLU score, or slightly better image generation quality, is likely the least of your concerns.
New versions of models and slight improvements will continue; however, constantly changing your approach may impact your ability to ever realize value. Do your use cases actually require the latest and greatest model? Focus on answering the question of “should we” rather than “could we.”
Machine Learning: The Unsung Hero
While everyone’s excited over the latest generative AI and how cool it all is, good old-fashioned machine learning algorithms are still solving the majority of real problems.
Many believe they should use generative AI for challenges we’ve been solving for decades with much simpler techniques. Linear regression, anyone?
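As a reminder of how little code a decades-old technique can take, here is an ordinary least squares fit on invented data (the spend and revenue numbers are hypothetical):

```python
import numpy as np

# Hypothetical data: ad spend (in $k) vs. monthly revenue (in $k)
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
revenue = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary least squares via lstsq: fit revenue ~ slope * spend + intercept
A = np.column_stack([spend, np.ones_like(spend)])
(slope, intercept), *_ = np.linalg.lstsq(A, revenue, rcond=None)

print(f"revenue ~ {slope:.2f} * spend + {intercept:.2f}")
```

Five lines of fitting code, an interpretable coefficient, and no GPU in sight.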
The challenge with using generative AI for every problem lies in the details of how generative models work and the specific requirements of a use case.
For example, in a recent project we needed to classify documents. We achieved extremely strong performance with a model built on a basic VM that analyzes new documents faster, is more interpretable, and costs significantly less to train, host, and consume than generative models.
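The project details above are specific to our work, but the spirit of a classic-ML classifier can be sketched with a from-scratch multinomial Naive Bayes on toy documents (all data and labels here are invented for illustration):

```python
import math
from collections import Counter, defaultdict

# Toy labeled "documents" -- hypothetical stand-ins for real project data
train = [
    ("invoice payment due total amount", "finance"),
    ("amount due pay invoice by friday", "finance"),
    ("meeting agenda notes discussion", "minutes"),
    ("notes from the team meeting discussion", "minutes"),
]

def train_naive_bayes(docs):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing --
    a decades-old technique that is fast, cheap, and interpretable."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in class_counts:
        # log prior + sum of smoothed log likelihoods per word
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train_naive_bayes(train)
print(classify("invoice amount due", *model))  # finance
```

The whole thing trains in microseconds on a laptop, and you can read the learned word counts directly to see why a prediction was made, exactly the interpretability advantage mentioned above.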
It’s All Built on Math & Stats
Every model, whether it’s a simple regression or a state-of-the-art transformer, is built on fundamental principles of statistics. Understanding the data, knowing how to preprocess it, and selecting the right features are all core challenges math and statistics help us solve.
The latest AI craze might obscure and shift focus from this fact, but the foundations still remain the same. Techniques like hypothesis testing, confidence intervals, and probability distributions still play crucial roles in model development and evaluation. Ignoring these principles in favor of the latest shiny algorithm can lead to misinterpretations and unreliable results.
Remember, there is still immense value in those with a deep understanding of statistical methods, and this will often provide more robust and interpretable solutions. Fancy toys are fun, but solid math/statistics foundations build lasting results.
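For instance, a confidence interval, one of those foundational tools, takes only a few lines of standard-library Python (the error measurements below are invented, and the 1.96 normal critical value is an approximation for small samples, where a t-distribution value would be more exact):

```python
import math
import statistics

# Hypothetical sample of model error measurements
sample = [2.1, 1.9, 2.4, 2.0, 2.3, 1.8, 2.2, 2.1]

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error

# Approximate 95% confidence interval using the normal critical value 1.96
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean={mean:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

Reporting the interval rather than just the point estimate is exactly the kind of statistical discipline that keeps model evaluation honest.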
Getting to Meaningful Results
So, how do you cut through the hype and actually gain value from AI? Your specific answer will depend on where you are on your ‘AI journey’, but here are some of our tips:
Invest in Discovery
Stop obsessing over arbitrary goals like “we need to implement X number of AI models” or “we need to start using generative AI.” Instead, focus on outcomes. What problem are you trying to solve? How will you measure success? What does ‘good’ look like? Start there, and the rest will follow.
Embrace Failure (Trust Us, It’ll Happen)
Remember when we said that a fundamental part of AI is experimenting and failing quickly? Well, some of your AI projects will fail, and that’s okay. Some failures will be harder to digest than others, but each one is a learning opportunity.
What went wrong? What assumptions did you make? How can you do better next time? Failure is just a pit stop on the road to success.
The failure we shouldn’t be okay with is never getting started in the first place. Again, don’t let perfect be the enemy of good. Small, incremental wins can lead to massive improvements for the business.
Use What You’ve Got, Plan for What You Need
Don’t wait for perfect data to start your AI journey or assume you need an enterprise AI platform to find and prove value. Use what you have, but keep an eye on what you will need as your AI initiatives begin to scale.
A benefit of experimenting early and failing fast is using AI projects to uncover gaps in your data strategy and operations. Use those insights to evolve your data collection and management practices. Determine what tools/methods/customization your analytics team desires in a collaborative platform. Use quick wins to build justification and alignment with a longer-term AI platform.
Keep the Humans in the Loop
AI isn’t a magic wand that makes humans obsolete. In fact, there is a reason you’re seeing more and more organizations use the term “co-pilot” – the intended future of AI is to augment humans, not replace them.
Ignore the technological dystopia that many Silicon Valley companies (and sci-fi movies) seem to be living in. Focus on what is real and how your employees and AI tools can work better together. Many applications of AI (especially generative AI) should actually be demanding a human-in-the-loop (HITL) requirement.
Engage critical stakeholders throughout the entire AI lifecycle. Demonstrate AI capabilities and provide organization-wide training to help minimize barriers to entry and increase adoption.
Wrapping Up
AI isn’t about perfection, big budgets, or shiny toys. It’s about solving real problems with the data you have, learning from your failures, and continuously improving.
Stop waiting for the perfect conditions and start exploring. The AI revolution is happening now, and it’s exciting and full of opportunities. If you aren’t taking advantage of the “experiment now, expand later” mindset, then your competitors are.
Now, go forth and build something truly valuable. Just please, don’t make another chatbot (unless you really, really need one).
This article was first published on LinkedIn Pulse. You can read the original here. Special thanks to Matt Adkins, Senior Associate Technical Consultant, for his contributions to this article.
About the author
Ben Prescott
Principal Technical Consultant
Ben Prescott is a Principal Data Scientist with over 15 years of experience across various managerial and individual contributor roles. His expertise spans data science, ML engineering, and AI, with a focus on cloud platforms, advanced analytics, and responsible AI practices. Ben has an aptitude for leveraging cutting-edge techniques that drive measurable business value, from developing prescriptive models for automotive manufacturers to implementing computer vision solutions for behavioral risk assessment. With a Master's in Data Science from Northwestern University and a background in Electronics Engineering, Ben brings a unique blend of technical depth and practical problem-solving to complex data challenges.