AI and Machine Learning Explained for Non-Technical TPMs
If you've been in or around tech for the past few years, you've sat in rooms where people throw around AI and machine learning like they're obvious—where engineers debate model architectures and product managers argue about training data while you nod along, hoping the conversation moves on before someone asks for your opinion.
That ends today.
AI is no longer a niche concern for a specialized engineering team. It's embedded in product decisions, infrastructure conversations, hiring discussions, and program planning at companies of all sizes. As a TPM—especially one who wants to manage AI programs or work at companies building AI products—you need a working understanding of what's actually happening under the hood.
Not the sci-fi version. The real version.
Start Here: What AI Actually Is
Artificial intelligence is the broad term for computer systems that perform tasks that, until recently, required human intelligence—things like recognizing images, understanding language, making predictions, or generating text.
The range of tasks that definition covers has expanded dramatically as the technology has improved, but in practice the term still means something specific: modern AI systems are pattern-matchers at scale. They find structure in data, and they use patterns from past examples to make predictions or generate outputs for new inputs.
What AI is not: a system that "understands" in the way humans do. There's no consciousness, no genuine comprehension. What exists instead are mathematical operations, often enormous numbers of them, that produce outputs that look like understanding because the model was trained on vast amounts of human-generated data.
That distinction matters for TPMs because it shapes what can go wrong. AI systems fail in ways that feel strange and unpredictable to humans because they don't generalize the way humans do; they generalize from the patterns in their training data. If the data was biased, skewed, or incomplete, the model's behavior will reflect that.
The Relationship Between AI, ML, and Deep Learning
These terms get used interchangeably, which causes confusion. Here's how they actually relate:
Artificial Intelligence (AI)
└── Machine Learning (ML)
    └── Deep Learning (DL)
Artificial Intelligence is the umbrella. Any system designed to do something that seems intelligent is technically an AI system.
Machine Learning is a specific subset of AI. Instead of programming a computer with explicit rules ("if the email contains 'click here,' mark it as spam"), you train a model on examples and let it find its own patterns ("here are 10,000 examples of spam and 10,000 examples of not-spam—figure out the difference"). The computer learns from data rather than from hand-coded rules.
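To make the contrast concrete, here is a toy sketch in Python. The first function is a hand-written rule; the other two "learn" from labeled examples by counting which words show up in spam versus not-spam. The example emails and the word-count scoring are hypothetical simplifications, far cruder than real spam filters, but the shift is the same: the behavior comes from the data, not from a rule someone typed in.

```python
from collections import Counter

# Traditional software: a person writes the rule explicitly.
def rule_based_is_spam(email: str) -> bool:
    return "click here" in email.lower()

# Machine learning (toy version): learn word counts from labeled examples.
def train_word_counts(examples):
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        counts = spam_counts if is_spam else ham_counts
        counts.update(text.lower().split())
    return spam_counts, ham_counts

def learned_is_spam(email, spam_counts, ham_counts):
    # Score each word by how much more often it appeared in spam than in not-spam.
    # A Counter returns 0 for words never seen in training.
    score = sum(spam_counts[w] - ham_counts[w] for w in email.lower().split())
    return score > 0

# Hypothetical labeled training data: (email text, is_spam).
EXAMPLES = [
    ("win a free prize now", True),
    ("claim your free gift card", True),
    ("meeting notes attached", False),
    ("lunch tomorrow with the team", False),
]
spam, ham = train_word_counts(EXAMPLES)
```

With this tiny training set, `learned_is_spam("free prize inside", spam, ham)` returns `True` even though no one wrote a rule about "free" or "prize", while `rule_based_is_spam` misses it because the phrase "click here" never appears.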
Deep Learning is a subset of machine learning that uses a specific type of model called a neural network—loosely inspired by the structure of the human brain (many interconnected nodes, organized in layers). Deep learning is what powers modern image recognition, natural language processing, and the large language models behind ChatGPT and Claude. It requires more data and more computing power than traditional machine learning, but it's dramatically more capable for certain tasks.
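For readers who want to see what "nodes organized in layers" means computationally, here is a minimal sketch of a tiny two-layer network's forward pass in plain Python. Each node takes a weighted sum of its inputs, adds a bias, and applies a simple nonlinear function; a layer is a group of such nodes, and "deep" means many layers stacked. The weights below are hand-picked and hypothetical. In a real network they are learned during training, and frameworks run the same arithmetic on matrices holding millions or billions of weights.

```python
import math

def relu(x):
    # A common nonlinearity: pass positive values through, zero out negatives.
    return max(0.0, x)

def sigmoid(x):
    # Squash any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases, activation):
    # Each node: weighted sum of every input, plus a bias, then the activation.
    # weights[j][i] connects input i to node j.
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical weights (normally learned, not hand-picked).
HIDDEN_W = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]  # 2 inputs -> 3 hidden nodes
HIDDEN_B = [0.0, 0.1, 0.2]
OUTPUT_W = [[1.0, -0.5, 0.7]]                       # 3 hidden nodes -> 1 output
OUTPUT_B = [-0.1]

def forward(features):
    hidden = dense_layer(features, HIDDEN_W, HIDDEN_B, relu)
    return dense_layer(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]
```

For the input `[1.0, 2.0]` this returns roughly 0.27, which a classifier might read as a 27% probability of the positive class.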
For most TPMs, the distinction that matters most is ML vs. traditional software. Managing an ML program is different because the output isn't determined by explicit logic—it's determined by patterns learned from data. That changes what can go wrong, what success looks like, and what the engineering process looks like.
Training vs. Inference: The Two Modes of ML
Every machine learning system has two distinct phases, and confusing them will make you look uninformed in design reviews. Here's the difference:
Training
Training is how a model learns. You feed it a large dataset, often thousands to millions of examples, and the model adjusts its internal parameters to get better at predicting the right answer.
Think of it like studying for an exam. You work through thousands of practice problems. Each time you get one wrong, you adjust your understanding. By the end, you've internalized patterns that help you answer new questions you haven't seen before.
Training is computationally expensive. It requires powerful hardware (often GPUs—specialized processors that are good at the parallel math that training requires) and can take hours, days, or even weeks to complete. It's done once (or periodically, when the model needs to be updated).
TPM implications:
· Training jobs are long-running, resource-intensive workloads. They have their own dependencies, failure modes, and cost profiles.
· If training data changes (new data added, old data cleaned, labeling corrected), you may need to retrain. That takes time and resources.
· Training runs can fail. Infrastructure issues, data quality problems, and hyperparameter choices can all cause a training run to produce a bad model or no model at all.
Inference
Inference is when a trained model actually does the thing it was trained to do. A user submits a photo and the model classifies it. A sentence is typed and the model generates the next word. A transaction is submitted and the model predicts whether it's fraudulent.
Inference is what happens in production, at scale, in real time. It's usually much faster and cheaper per request than training—but when you're handling millions of requests, cost and latency still matter enormously.
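Two numbers usually anchor inference planning: tail latency (how slow the slowest typical requests are) and cost at volume. The sketch below computes a nearest-rank 95th-percentile (p95) latency and a daily cost figure. The latency samples, the 200 ms budget, the request volume, and the per-1,000-request price are all hypothetical placeholders, not real benchmarks or vendor pricing.

```python
import math

def percentile(samples, pct):
    # Nearest-rank method: the smallest sample such that at least pct% of
    # all samples are less than or equal to it.
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def inference_report(latencies_ms, budget_ms, requests_per_day, cost_per_1k_requests):
    p95 = percentile(latencies_ms, 95)
    return {
        "p95_ms": p95,
        "within_budget": p95 <= budget_ms,
        "daily_cost": requests_per_day / 1000 * cost_per_1k_requests,
    }

# Hypothetical measurements from a load test, in milliseconds.
LATENCIES_MS = [40, 42, 45, 47, 50, 52, 55, 60, 120, 300]
report = inference_report(LATENCIES_MS, budget_ms=200,
                          requests_per_day=1_000_000, cost_per_1k_requests=0.50)
```

With these sample numbers, the p95 is 300 ms, so the service misses a 200 ms budget even though most requests finish in about 50 ms, and a million requests a day at $0.50 per thousand comes to $500 per day. Percentiles are used instead of averages because an average can look healthy while a meaningful slice of users still waits far too long.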
TPM implications:
· Inference has a latency budget: how long can the model take to respond before the user experience is unacceptable? Faster models often give up some accuracy, so someone has to make an explicit decision about that trade-off.
· Inference at scale requires infrastructure planning. Serving a model at 100 requests per day is a very different infrastructure problem from serving it at 100,000.
· Model performance can degrade over time if the real-world data drifts away from the training data. This is called model drift, and it's a risk your monitoring strategy needs to account for.
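Monitoring for drift usually means comparing the distribution of each input feature in production against what the model saw in training. One widely used score for this is the Population Stability Index (PSI). The sketch below is a minimal version for one numeric feature. The bin edges, the small floor that avoids log(0) for empty bins, and the sample values (days since last login) are hypothetical, and the 0.25 alert threshold is a commonly cited rule of thumb rather than a universal standard.

```python
import bisect
import math

def bin_shares(values, edges):
    # Put each value into a bin defined by the cut points in `edges`
    # (the first and last bins are open-ended), then convert counts to shares.
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # Floor tiny shares so an empty bin doesn't cause log(0) or division by zero.
    return [max(c / len(values), 1e-6) for c in counts]

def population_stability_index(training_values, production_values, edges):
    p = bin_shares(training_values, edges)
    q = bin_shares(production_values, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical "days since last login" samples.
TRAINING = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
PRODUCTION = [5, 6, 6, 7, 8, 8, 9, 10, 12, 15]
EDGES = [2, 4, 6, 8]
DRIFT_THRESHOLD = 0.25  # rule-of-thumb assumption; tune per model

psi = population_stability_index(TRAINING, PRODUCTION, EDGES)
drift_alert = psi > DRIFT_THRESHOLD
```

Comparing the training sample with itself gives a PSI of 0. The production sample, where users have drifted toward much longer gaps between logins, scores far above the threshold, which is the kind of signal that should trigger an investigation or a retraining decision.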
How Models Actually Learn: The Short Version
You don't need to understand backpropagation. But understanding the basic loop of machine learning will help you in engineering discussions.
The core loop is:
1. Feed the model a labeled example: An email with a label ("spam" or "not spam"). An image with a label ("cat" or "not cat"). A sentence with a label ("positive sentiment" or "negative sentiment").
2. The model makes a prediction using its current parameters.
3. Compare the prediction to the correct label and calculate how wrong it was (the "loss").
4. Adjust the model's parameters to reduce the loss.
5. Repeat, millions of times, across the entire training dataset.
After enough iterations, the model has adjusted its parameters to the point where it gets most of the training examples right—and, ideally, can generalize to new examples it hasn't seen.
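The five steps map almost line for line onto code. Below is a minimal sketch of that loop for a one-feature spam model, using logistic regression trained with stochastic gradient descent, which is one standard technique among many (neural networks follow the same loop with far more parameters). The feature, the data, the learning rate, and the number of passes (epochs) are hypothetical choices; the learning rate and epoch count are examples of the hyperparameters an engineer picks before training.

```python
import math

def sigmoid(z):
    # Squash any number into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, learning_rate=0.05):
    weight, bias = 0.0, 0.0  # the model's parameters, before any learning
    for _ in range(epochs):  # step 5: repeat across the whole dataset many times
        for links, label in examples:  # step 1: take one labeled example
            prediction = sigmoid(weight * links + bias)  # step 2: predict
            # Step 3: measure the error. For log loss, (prediction - label) is the
            # slope of the loss with respect to the model's raw score.
            error = prediction - label
            # Step 4: nudge each parameter in the direction that reduces the loss.
            weight -= learning_rate * error * links
            bias -= learning_rate * error
    return weight, bias

def log_loss(examples, weight, bias):
    # Average "how wrong" across the dataset; lower is better.
    total = 0.0
    for links, label in examples:
        p = min(max(sigmoid(weight * links + bias), 1e-12), 1 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)

# Hypothetical data: (number of links in an email, 1 = spam / 0 = not spam).
EXAMPLES = [(0, 0), (1, 0), (2, 0), (4, 0), (3, 1), (5, 1), (6, 1), (8, 1)]

weight, bias = train(EXAMPLES)
```

After training, the average loss is well below where it started, and the model assigns a high spam probability to emails with many links and a low one to emails with none. It still can't get every example right: the 3-link spam and the 4-link legitimate email overlap, which is the kind of messy data real models face.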
The key insight for TPMs: Model performance is not magical. It's a function of (1) the quality and quantity of training data, (2) the architecture of the model (how it's structured), and (3) the training process (how long it trains, with what settings). When a model isn't performing well, the root cause usually lives in one of those three places.
The Data Dependency: Why ML Programs Are Different
The single biggest difference between managing a traditional software program and an ML program is data dependence.
In traditional software, the output is determined by the code. If the code is right, the output is right. The data is something the software operates on, but it doesn't shape the software itself.
In machine learning, the data is the software in a meaningful sense. The model's behavior is determined by what it was trained on. You cannot separate the model from its training data.
This creates dependencies that traditional program management doesn't have to deal with:
Data collection: Where is the training data coming from? Is it labeled? How accurate are the labels? Is the volume sufficient?
Data quality: Is the data clean, balanced, and representative? Biased data produces biased models. Noisy data produces inaccurate models. Missing data can produce models that fail unexpectedly on edge cases.
Data labeling: For supervised learning (the most common type), someone has to label the training data. This is often expensive, slow, and done by external vendors or crowdsourced workers. Labeling quality varies. This is a program management dependency with its own timeline, budget, and quality risk.
Data pipelines: The infrastructure that moves data from its source to the model training job. If the pipeline breaks, the training data is stale or unavailable. Data pipelines are their own engineering workstream with their own risks and dependencies.
Data governance and privacy: Training data often contains information about real people. Privacy regulations (GDPR, CCPA, HIPAA depending on the domain) impose constraints on what data can be used, how it must be stored, and what rights users have over it. Legal and compliance need to be involved early.
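Some of these data risks can be checked mechanically before training starts. The sketch below runs three simple checks on a hypothetical churn dataset: missing values per feature, class balance, and raw agreement between two labelers on the same items. The field names, the rows, and whatever thresholds a team treats as "too many missing" or "too imbalanced" are assumptions. Real labeling programs often use chance-corrected agreement measures such as Cohen's kappa rather than raw agreement.

```python
from collections import Counter

# Hypothetical feature names for a churn model.
REQUIRED_FIELDS = ("days_since_login", "support_tickets", "monthly_spend")

def audit_dataset(rows, label_key="label"):
    # Count missing values per required feature.
    missing = Counter()
    for row in rows:
        for field in REQUIRED_FIELDS:
            if row.get(field) is None:
                missing[field] += 1
    # Check class balance: a very small minority class needs deliberate handling.
    class_counts = Counter(row[label_key] for row in rows)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "class_counts": dict(class_counts),
        "minority_share": min(class_counts.values()) / len(rows),
    }

def label_agreement(labels_a, labels_b):
    # Raw agreement: share of items where two labelers chose the same label.
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical rows; None marks a missing value.
ROWS = [
    {"days_since_login": 3, "support_tickets": 0, "monthly_spend": 40.0, "label": 0},
    {"days_since_login": None, "support_tickets": 2, "monthly_spend": 15.0, "label": 0},
    {"days_since_login": 30, "support_tickets": 5, "monthly_spend": None, "label": 1},
    {"days_since_login": 1, "support_tickets": 1, "monthly_spend": 60.0, "label": 0},
]
report = audit_dataset(ROWS)
agreement = label_agreement(["churn", "stay", "stay", "stay"],
                            ["churn", "churn", "stay", "stay"])
```

Here one of four rows is missing login data, one is missing spend, churners are only 25% of the data, and the two labelers agree on 75% of items. None of these numbers is automatically bad, but each is a fact a TPM can track on the dependency map instead of discovering it after a training run disappoints.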
Personal Note: Share an example from your own experience where a data dependency was the critical path on an ML program—what the dependency was, how you tracked it, and what happened when it slipped
Key Vocabulary for TPM Contexts
These are the terms you'll hear in meetings and need to follow without asking for a definition every time:
Model: The trained artifact—the mathematical function that takes inputs and produces outputs. "The model" in an ML conversation is the end product of the training process.
Feature: An input variable the model uses to make predictions. If you're predicting whether a customer will churn, features might include days since last login, number of support tickets, and average monthly spend.
Label / Ground truth: The correct answer used during training. For image classification, the label is the actual name of what's in the image. For fraud detection, the label is whether a transaction was actually fraudulent.
Accuracy / Precision / Recall: Different ways of measuring how well a model performs. Accuracy is the percentage of all predictions that are correct. Precision is the share of the model's positive predictions that are actually positive (how many flagged transactions were really fraud); recall is the share of actual positives the model catches (how much of the fraud it found). Precision and recall matter most when false positives and false negatives have different costs, as in fraud detection or medical diagnosis.
Overfitting: When a model learns the training data so well that it fails to generalize to new data, like a student who memorized the practice tests but can't answer different questions. Often a sign that the model is too complex for the amount of training data available.
Underfitting: The opposite—a model that's too simple to capture the patterns in the data. Performs poorly on both training and new data.
Hyperparameters: Settings that control how the model trains (learning rate, number of layers, etc.). These aren't learned from data—they're chosen by the engineer before training starts. Selecting good hyperparameters is part of the craft of ML engineering.
Baseline: The performance benchmark you're trying to beat. Usually a simple reference point: a heuristic rule, the previous model, or random chance. A model that doesn't beat the baseline isn't adding value.
A/B test: Running two versions (A and B) simultaneously with real users to measure which performs better. Standard practice for evaluating model improvements in production.
MLOps: The practice of operationalizing machine learning—managing the end-to-end lifecycle of models from development through deployment and monitoring. If your company has an MLOps team or MLOps tooling, they're responsible for the infrastructure that makes models production-ready.
Model drift: When a model's real-world performance degrades over time because the distribution of inputs has shifted away from the training data. A fraud model trained on 2022 data might perform poorly on 2025 fraud patterns.
Prompt: For large language models specifically (GPT, Claude, etc.), the input you provide to get a response. "Prompt engineering" is the practice of crafting inputs to get better outputs.
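Accuracy alone can mislead, which is why precision, recall, and a baseline travel together. The sketch below scores two sets of predictions on a hypothetical batch of 100 transactions, 2 of which are fraud: a trivial baseline that never flags anything, and a model that flags 4 transactions. The counts are invented for illustration. Returning 0.0 precision when nothing is flagged is a convention chosen here, since the ratio is otherwise undefined.

```python
def evaluate(predictions, labels):
    # predictions and labels are parallel lists of booleans (True = fraud).
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, l in pairs if p and l)
    false_pos = sum(1 for p, l in pairs if p and not l)
    false_neg = sum(1 for p, l in pairs if not p and l)
    correct = sum(1 for p, l in pairs if p == l)
    flagged = true_pos + false_pos
    actual_fraud = true_pos + false_neg
    return {
        "accuracy": correct / len(pairs),
        # Of the transactions flagged, how many were really fraud?
        "precision": true_pos / flagged if flagged else 0.0,
        # Of the real fraud, how much did we catch?
        "recall": true_pos / actual_fraud if actual_fraud else 0.0,
    }

# Hypothetical ground truth: the first 2 of 100 transactions are fraud.
LABELS = [True] * 2 + [False] * 98

# Baseline: never flag anything.
baseline = evaluate([False] * 100, LABELS)

# Model: catches 1 of the 2 frauds and wrongly flags 3 legitimate transactions.
model_predictions = [True, False] + [True] * 3 + [False] * 95
model = evaluate(model_predictions, LABELS)
```

The baseline scores 98% accuracy while catching no fraud at all; the model's accuracy is lower (96%) but it finds half the fraud, and one in four of its flags is real. Which trade-off is acceptable is a product and risk decision, not just a modeling one.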
What Makes Managing an AI/ML Program Different
If you're asked in an interview what's different about managing an ML program, here's your answer:
Longer feedback loops: In traditional software, you deploy code and can usually tell quickly whether it works. In ML, you train a model, evaluate it on a test set, deploy it, and then wait for real-world performance data to confirm it's working as expected. The cycle from decision to validated outcome is longer.
Non-deterministic output: Many models, especially generative ones, sample their outputs, so the same input can produce different results from one run to the next, and retraining can change a model's answers even when no code changed. This makes testing more complex: you can't just check "does this return the right value?" You have to think about distributions of outputs, edge cases, and confidence thresholds.
Data as a dependency: As described above. Data collection, labeling, cleaning, and governance are engineering workstreams with their own timelines, risks, and external dependencies.
Experiment-driven development: ML engineers run experiments—they try different architectures, different datasets, different training configurations. Not all experiments produce useful results. Your planning needs to account for the experimental nature of the work; traditional sprint velocity metrics don't translate cleanly.
Model monitoring is a production requirement: A traditional software deployment either works or it doesn't. An ML model can work technically but perform poorly (making bad predictions). You need to track both: is the service up? And is the model still making good predictions?
Responsible AI considerations: Bias, fairness, transparency, and explainability are not afterthoughts in AI programs—they're program requirements. If your model makes decisions that affect real people (credit scoring, content moderation, hiring screening, medical diagnosis), you need explicit processes for evaluating and mitigating bias.
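Testing a non-deterministic model means asserting on a distribution of outputs instead of a single exact answer. The sketch below stands in for a real model: it turns hypothetical scores for three sentiment labels into probabilities with a temperature-scaled softmax (a common sampling approach in generative models, where a lower temperature means more predictable output) and samples from them. The scores, the 0.7 pass threshold, and the trial count are assumptions; a fixed random seed keeps the test itself repeatable.

```python
import math
import random

# Hypothetical raw scores a model assigns to each label for one input.
SCORES = {"positive": 2.0, "neutral": 0.5, "negative": -1.0}

def sample_label(scores, temperature, rng):
    # Softmax with temperature: divide scores by the temperature, exponentiate,
    # and sample a label in proportion to the results.
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(list(scores), weights=weights, k=1)[0]

def pass_rate(scores, expected, temperature, trials=1000, seed=0):
    # Run the same input many times and measure how often the output is acceptable.
    rng = random.Random(seed)
    hits = sum(sample_label(scores, temperature, rng) == expected for _ in range(trials))
    return hits / trials

MIN_PASS_RATE = 0.7  # hypothetical acceptance threshold
rate_default = pass_rate(SCORES, "positive", temperature=1.0)
rate_cooler = pass_rate(SCORES, "positive", temperature=0.5)
```

With these scores, the expected label comes back roughly 79% of the time at temperature 1.0 and about 95% at 0.5, so the test clears a 70% bar but would never pass an "always returns positive" check. Deciding what pass rate is good enough is a product decision the TPM should make sure someone owns.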
Personal Note: Share your experience managing an AI or ML program, or working adjacent to one—what surprised you most, and what you wish you'd understood from the start
The Practical AI Landscape for TPMs
You'll encounter a few specific types of AI systems in your work. Here's a quick map:
Recommendation systems: Suggest products, content, or connections based on user behavior. Amazon's product recommendations, Netflix's content suggestions, LinkedIn's "People You May Know." Highly ML-dependent, with data pipelines from user behavior as the core input.
Natural language processing (NLP): Systems that understand or generate human language. Search engines, chatbots, sentiment analysis, document classification, translation. GPT and Claude are large language models—extremely capable NLP systems.
Computer vision: Systems that analyze images or video. Fraud detection using document images, content moderation for user-uploaded photos, manufacturing defect detection, medical imaging analysis.
Fraud and anomaly detection: Systems that identify unusual patterns in transactions, user behavior, or network traffic. Heavily used in financial services, e-commerce, and security.
Predictive analytics: Systems that forecast future outcomes—demand forecasting, churn prediction, resource capacity planning. Often the entry point for ML in organizations new to the technology.
Generative AI: Systems that produce new content, such as text (GPT, Claude), images (DALL-E, Stable Diffusion), and code (Copilot). This is the category driving most current AI investment and excitement. Text and code generators are built on large language models (LLMs); most current image generators are built on diffusion models.
Where to Go From Here
Understanding the conceptual framework is the starting point. The next steps in the AI track of this series will build on this foundation:
· How AI is Changing Program Management (Week 16): What this technology means for the TPM role specifically
· AI Tools Every TPM Should Be Using (Week 23): Practical tools to make your work better, right now
· Managing an AI/ML Program (Week 38): The specific challenges and frameworks for running these programs end-to-end
Key Takeaways
1. AI is pattern-matching at scale, not magic or consciousness. Understanding this framing will help you reason clearly about what can go wrong and where the limits are.
2. AI ⊃ ML ⊃ Deep Learning. Machine learning is the most relevant subset for most TPM contexts. Deep learning is what powers modern large language models and image recognition.
3. Training (learning from data) and inference (making predictions) are distinct phases with different cost profiles, timelines, and failure modes. Both need to be on your program radar.
4. Data is the core dependency in ML programs, more than code and more than infrastructure. Data quality, labeling, pipelines, and governance are engineering workstreams that belong on your dependency map from day one.
5. ML programs are experiment-driven, have longer feedback loops, and require ongoing monitoring in production. Traditional project management rhythms need to adapt to this reality.
6. Responsible AI is a program requirement, not a nice-to-have. Bias, fairness, and transparency need to be designed into the program from the start, not addressed after launch.
Next week: We pause the technical track and come back to stakeholder management's nearest neighbor—how to run a meeting that engineers actually find useful. Then in Week 9, we complete the Phase 1 foundation with a day in the life of a TPM.
Related Reading:
· How the Internet Actually Works: A TPM's Guide
· How AI is Changing Program Management (Coming)
· Managing an AI/ML Program: What's Different? (Coming)