99% of Executives Are Misled by AI Advice
As an executive, you’re bombarded with articles and advice on building AI products.
The problem is, a lot of this “advice” comes from other executives who rarely interact with the practitioners actually working with AI.
This disconnect leads to misunderstandings, misconceptions, and wasted resources.
A Case Study in Misleading AI Advice
An example of this disconnect in action comes from an interview with Jake Heller, head of product of Thomson Reuters CoCounsel (formerly Casetext).
During the interview, Jake made a statement about AI testing that was widely shared:
One of the things we learned is that after it passes 100 tests, the odds that it will pass a random distribution of 100K user inputs with 100% accuracy is very high.
This claim was then amplified by influential figures like Jared Friedman and Garry Tan of Y Combinator, reaching countless founders and executives:

The morning after this advice was shared, I received many emails from founders asking if they should aim for 100% test-pass rates.
If you’re not hands-on with AI, this advice might sound reasonable. But any practitioner would know it’s deeply flawed.
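A little arithmetic shows why. The sketch below is an illustration added here, not a calculation from the interview, and it rests on a strong simplifying assumption: every input passes or fails independently at the same fixed failure rate, which real user traffic rarely does. The function names are hypothetical.

```python
def prob_all_pass(failure_rate: float, n_inputs: int) -> float:
    """Chance that n_inputs independent inputs all pass at a fixed per-input failure rate."""
    return (1.0 - failure_rate) ** n_inputs


def failure_rate_upper_bound(n_passed: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the failure rate after n_passed passes and zero failures.

    Solves (1 - p) ** n_passed = 1 - confidence for p (the exact form of the "rule of three").
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passed)


if __name__ == "__main__":
    bound = failure_rate_upper_bound(100)
    print(f"After 100/100 passes, the failure rate could still be up to {bound:.1%} (95% confidence)")
    for p in (0.01, 0.001, 0.0001):
        print(
            f"failure rate {p:.2%}: P(pass 100 tests) = {prob_all_pass(p, 100):.3f}, "
            f"P(pass all 100K inputs) = {prob_all_pass(p, 100_000):.2e}"
        )
```

Under these assumptions, a clean 100-for-100 run only tells you the true failure rate is probably below roughly 3%. Even a model that fails just 1 input in 10,000 would pass all 100 tests about 99% of the time, yet would have well under a 0.01% chance of getting through 100K inputs without a single failure.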
“Perfect” Is Flawed
In AI, a perfect score is a red flag. It often means a model has inadvertently been trained on data or prompts that are too similar to the tests. Like a student who was given the answers before an exam, the model will look good on paper but is unlikely to perform well in the real world.
If you are sure your data is clean but you’re still getting 100% accuracy, chances are your test is too weak or isn’t measuring what matters. Tests that always pass don’t help you improve; they just give you a false sense of security.
Most importantly, when all your models have perfect scores, you lose the ability to differentiate between them. You won’t be able to identify why one model is better than another or strategize about how to make further improvements.
The goal of evaluations isn’t to pat yourself on the back for a perfect score.
It’s to uncover areas for improvement and ensure your AI is truly solving the problems it’s meant to address. By focusing on real-world performance and continuous improvement, you’ll be much better positioned to create AI that delivers genuine value. Evals are a big topic, and we’ll dive into them more in a future chapter.
Moving Forward
When you’re not hands-on with AI, it’s hard to separate hype from reality. Here are some key takeaways to keep in mind:
- Be skeptical of advice or metrics that sound too good to be true.
- Focus on real-world performance and continuous improvement.
- Seek advice from experienced AI practitioners who can communicate effectively with executives. (You’ve come to the right place!)
We’ll dive deeper into how to test AI, along with a data review toolkit, in a future chapter. First, we’ll look at the biggest mistake executives make when investing in AI.
The #1 Mistake Companies Make with AI
One of the first questions I ask tech leaders is how they plan to improve AI reliability, performance, or user satisfaction. If the answer is “We just bought XYZ tool for that, so we’re good,” I know they’re headed for trouble. Focusing on tools over processes is a red flag and the biggest mistake I see executives make when it comes to AI.
Improvement Requires Process
Assuming that buying a tool will solve your AI problems is like joining a gym but never actually going. You won’t see improvement just by throwing money at the problem. Tools are only the first step; the real work comes after. For example, the metrics that come built into many tools rarely correlate with what you actually care about. Instead, you need to design metrics that are specific to your business, along with tests to evaluate your AI’s performance.
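To make "specific to your business" concrete, here is a minimal sketch assuming a hypothetical real estate assistant whose answers should only quote prices that appear in the listing data it was given. The `Example` record, the price regex, and the `price_accuracy` metric are illustrative assumptions, not a built-in metric from any tool; in practice you would derive checks like this from reviewing your own data.

```python
import re
from dataclasses import dataclass

# Matches dollar amounts like "$450,000" or "$ 1,200,000"; requiring a leading digit avoids matching "$," alone.
PRICE_PATTERN = re.compile(r"\$\s?(\d[\d,]*)")


@dataclass
class Example:
    """One logged interaction: the assistant's answer plus the listing prices it was given."""
    answer: str
    listing_prices: set[int]


def quoted_prices(answer: str) -> set[int]:
    """Extract every dollar amount quoted in an answer, as integers."""
    return {int(amount.replace(",", "")) for amount in PRICE_PATTERN.findall(answer)}


def price_accuracy(examples: list[Example]) -> float:
    """Hypothetical metric: share of answers whose quoted prices all appear in the listing data."""
    if not examples:
        return 0.0
    correct = sum(quoted_prices(ex.answer) <= ex.listing_prices for ex in examples)
    return correct / len(examples)


if __name__ == "__main__":
    examples = [
        Example("The home on Elm St is listed at $450,000.", {450_000}),
        Example("It's priced at $475,000, a great deal!", {450_000}),  # quotes a price not in the data
        Example("Would you like to schedule a showing?", set()),  # quotes no prices, so it passes
    ]
    print(price_accuracy(examples))  # 2 of the 3 answers pass
```

A generic relevance score might rate the second answer as perfectly fine; a check tied to your business catches that the price is wrong.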
The data you get from these tests should also be reviewed regularly to make sure you’re on track. No matter what area of AI you’re working on, whether model evaluation, retrieval-augmented generation (RAG), or prompting strategies, the process is what matters most. Of course, there’s more to making improvements than just relying on tools and metrics. You also need to develop and follow processes.
Rechat’s Success Story
Rechat is a great example of how focusing on processes can lead to real improvements. The company decided to build an AI agent for real estate agents to help with a wide variety of tasks related to different aspects of the job. However, they were struggling with consistency. When the agent worked, it was great, but when it didn’t, it was a disaster. The team would make a change to address a failure mode in one place but end up causing issues in other areas. They were stuck in a cycle of whack-a-mole. They didn’t have visibility into their AI’s performance beyond “vibe checks,” and their prompts were becoming increasingly unwieldy.
When I came in to help, the first thing I did was apply a systematic approach, which is illustrated in Figure 2-1.

This is a virtuous cycle for systematically improving large language models (LLMs). The key insight is that you need both quantitative and qualitative feedback loops, and they need to be fast. You start with LLM invocations (both synthetic and human-generated), then simultaneously:
- Run unit tests to catch regressions and verify expected behaviors
- Collect detailed logging traces to understand model behavior
These feed into evaluation and curation (which needs to be increasingly automated over time). The eval process combines:
- Human review
- Model-based evaluation
- A/B testing
The results then inform two parallel streams:
- Fine-tuning with carefully curated data
- Prompt engineering improvements
Both of these feed into model improvements, which starts the cycle again. The dashed line around the edge emphasizes that this is a continuous, iterative process: you keep cycling through faster and faster to drive continuous improvement. By focusing on the processes outlined in this diagram, Rechat was able to reduce its error rate by over 50% without investing in new tools!
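The unit tests at the start of this cycle can begin as plain functions. The sketch below is a hypothetical illustration, not Rechat’s actual test suite: `generate_reply` is an assumed wrapper around your LLM call (stubbed so the snippet runs), and the two checks (replies never leak internal IDs, and replies stay a readable length) are example behaviors you might scope for your own product.

```python
import re
from typing import Callable, Optional

# Standard 8-4-4-4-12 hexadecimal UUID layout, used here as a proxy for internal database IDs.
UUID_PATTERN = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.IGNORECASE
)


def generate_reply(user_message: str) -> str:
    """Hypothetical stand-in for the real LLM call; swap in your own model invocation."""
    return "I found 3 listings in Austin under $500,000. Want me to email them to you?"


def check_no_internal_ids(reply: str) -> Optional[str]:
    """Users should never see raw database IDs in a reply."""
    if UUID_PATTERN.search(reply):
        return "reply leaks an internal ID"
    return None


def check_reply_length(reply: str, max_chars: int = 1_000) -> Optional[str]:
    """Replies should be non-empty and short enough to read on a phone."""
    if not 0 < len(reply.strip()) <= max_chars:
        return f"reply length {len(reply)} is out of range"
    return None


CHECKS: list[Callable[[str], Optional[str]]] = [check_no_internal_ids, check_reply_length]


def run_regression_suite(messages: list[str]) -> list[str]:
    """Run every check on the reply to every message and collect all failures."""
    failures = []
    for message in messages:
        reply = generate_reply(message)
        for check in CHECKS:
            problem = check(reply)
            if problem:
                failures.append(f"{message!r}: {problem}")
    return failures


if __name__ == "__main__":
    failures = run_regression_suite(["Find me homes in Austin under $500k", "Email my client the new listings"])
    print(f"{len(failures)} failure(s)")
    for failure in failures:
        print(" -", failure)
```

Collecting every failure instead of stopping at the first one makes it easier to see whether a prompt change fixed one failure mode while breaking another, the whack-a-mole pattern described above.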
Check out this ~15-minute video on how we implemented this process-first approach at Rechat.
Avoid the Red Flags
Instead of asking which tools you should invest in, you should be asking your team:
- What are our failure rates for different features or use cases?
- What categories of errors are we seeing?
- Does the AI have the proper context to help users? How is this being measured?
- What is the impact of recent changes to the AI?
The answers to each of these questions should involve appropriate metrics and a systematic process for measuring, reviewing, and improving them. If your team struggles to answer these questions with data and metrics, you are in danger of going off the rails!
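The first two questions can be answered, at least roughly, by tallying reviewed traces. The sketch below assumes a hypothetical log format in which a human reviewer has already labeled each trace with the feature it exercised, whether it failed, and an error category; the field, feature, and category names are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedTrace:
    """A logged interaction after a human reviewer has labeled it (hypothetical schema)."""
    feature: str                          # which feature or use case the request exercised
    failed: bool                          # reviewer's pass/fail judgment
    error_category: Optional[str] = None  # reviewer's label for what went wrong, if anything


def failure_rates(traces: list[ReviewedTrace]) -> dict[str, float]:
    """Failure rate per feature: failed traces divided by all traces for that feature."""
    totals = Counter(t.feature for t in traces)
    failures = Counter(t.feature for t in traces if t.failed)
    return {feature: failures[feature] / totals[feature] for feature in totals}


def top_error_categories(traces: list[ReviewedTrace], n: int = 3) -> list[tuple[str, int]]:
    """Most common error categories among failed traces; unlabeled failures count as 'uncategorized'."""
    counts = Counter(t.error_category or "uncategorized" for t in traces if t.failed)
    return counts.most_common(n)


if __name__ == "__main__":
    traces = [
        ReviewedTrace("listing_search", failed=False),
        ReviewedTrace("listing_search", failed=True, error_category="wrong price"),
        ReviewedTrace("email_drafting", failed=True, error_category="missing context"),
        ReviewedTrace("email_drafting", failed=True, error_category="missing context"),
        ReviewedTrace("email_drafting", failed=False),
    ]
    print(failure_rates(traces))  # listing_search fails 1 of 2 traces, email_drafting 2 of 3
    print(top_error_categories(traces))  # [('missing context', 2), ('wrong price', 1)]
```

Numbers like these are only as good as the human review behind them, which is why the review step matters more than the tally.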
Avoiding Jargon Is Critical
We’ve talked about why focusing on processes is better than just buying tools. But there’s one more thing that’s just as important: how we talk about AI. Using the wrong words can hide real problems and slow down progress. To focus on processes, we need to use clear language and ask good questions. That’s why we provide an AI communication cheat sheet for executives in the next section. That section helps you:
- Understand what AI can and can’t do
- Ask questions that lead to real improvements
- Ensure that everyone on your team can participate
Using this cheat sheet will help you talk about processes, not just tools. It’s not about knowing every tech word. It’s about asking the right questions to understand how well your AI is working and how to make it better. In the next chapter, we’ll share a counterintuitive approach to AI strategy that can save you time and resources in the long run.
AI Communication Cheat Sheet for Executives
Why Plain Language Matters in AI
As an executive, you can help your team understand AI concepts better by using simple language. This cheat sheet will show you how to avoid jargon and speak plainly about AI. This way, everyone on your team can work together more effectively.
At the end of this chapter, you’ll find a helpful glossary. It explains common AI terms in plain language.
Helps Your Team Understand and Work Together
Using simple words breaks down barriers. It makes sure everyone, no matter their technical skills, can join the conversation about AI projects. When people understand, they feel more involved and responsible. They are more likely to share ideas and spot problems when they know what’s going on.
Improves Problem-Solving and Decision Making
Focusing on actions instead of fancy tools helps your team tackle real challenges. When we remove confusing words, it’s easier to agree on goals and make good plans. Clear talk leads to better problem-solving because everyone can pitch in without feeling left out.
Reframing AI Jargon into Plain Language
Here’s how to translate common technical terms into everyday language that anyone can understand.
Examples of Common Terms, Translated
Changing technical terms into everyday words makes AI easy to understand. The following table shows how to say things more simply:
Instead of saying | Say |
“We’re implementing a RAG approach.” | “We’re making sure the AI always has the right information to answer questions well.” |
“We’ll use few-shot prompting and chain-of-thought reasoning.” | “We’ll give examples and encourage the AI to think before it answers.” |
“Our model suffers from hallucination issues.” | “Sometimes, the AI makes things up, so we need to check its answers.” |
“Let’s adjust the hyperparameters to optimize performance.” | “We can tweak the settings to make the AI work better.” |
“We need to prevent prompt injection attacks.” | “We should make sure users can’t trick the AI into ignoring our rules.” |
“Deploy a multimodal model for better results.” | “Let’s use an AI that understands both text and images.” |
“The AI is overfitting on our training data.” | “The AI is too focused on old examples and isn’t doing well with new ones.” |
“Consider using transfer learning techniques.” | “We can start with an existing AI model and adapt it for our needs.” |
“We’re experiencing high latency in responses.” | “The AI is taking too long to respond; we need to speed it up.” |
How This Helps Your Team
By using plain language, everyone can understand and join in. People from all parts of your company can share ideas and work together. This reduces confusion and helps projects move faster, because everyone knows what’s happening.
Strategies for Promoting Plain Language in Your Organization
Now let’s look at specific ways you can encourage clearer communication across your teams.
Lead by Example
Use simple words when you talk and write. When you make complex ideas easy to understand, you show others how to do the same. Your team will likely follow your lead when they see that you value clear communication.
Challenge Jargon When It Comes Up
If someone uses technical terms, ask them to explain in simple words. This helps everyone understand and shows that it’s fine to ask questions.
Example: If a team member says, “Our AI needs better guardrails,” you might ask, “Can you tell me more about that? How can we make sure the AI gives safe and appropriate answers?”
Encourage Open Conversation
Make it OK for people to ask questions and to say when they don’t understand. Let your team know it’s good to ask for clear explanations. This creates a friendly environment where ideas can be shared openly.
Conclusion
Using plain language in AI isn’t just about making communication easier; it’s about helping everyone understand, work together, and succeed with AI projects. As a leader, promoting clear talk sets the tone for your entire organization. By focusing on actions and challenging jargon, you help your team come up with better ideas and solve problems more effectively.
Glossary of AI Terms
Use this glossary to understand common AI terms in simple language.
Term | Short definition | Why it matters |
AGI (Artificial General Intelligence) | AI that can do any intellectual task a human can | While some define AGI as AI that’s as smart as a human in every way, this isn’t something you need to focus on right now. It’s more important to build AI solutions that solve your specific problems today. |
Agents | AI models that can perform tasks or run code without human help | Agents can automate complex tasks by making decisions and taking actions on their own. This can save time and resources, but you need to watch them carefully to make sure they are safe and do what you want. |
Batch Processing | Handling many tasks at once | If you can wait for AI answers, you can process requests in batches at a lower cost. For example, OpenAI offers batch processing that’s cheaper but slower. |
Chain of Thought | Prompting the model to think and plan before answering | When the model thinks first, it gives better answers but takes longer. This trade-off affects speed and quality. |
Chunking | Breaking long texts into smaller parts | Splitting documents helps you search them better. How you divide them affects your results. |
Context Window | The maximum amount of text the model can use at once | The model has a limit on how much text it can handle. You need to manage this so the important information fits. |
Distillation | Making a smaller, faster model from a big one | It lets you use cheaper, faster models with less delay (latency). But the smaller model might not be as accurate or powerful as the big one, so you trade some performance for speed and cost savings. |
Embeddings | Turning words into numbers that capture meaning | Embeddings let you search documents by meaning, not just exact words. This helps you find information even if different words are used, making searches smarter and more accurate. |
Few-Shot Learning | Teaching the model with only a few examples | By giving the model examples, you can guide it to behave the way you want. It’s a simple but powerful way to show the AI what is good or bad. |
Fine-Tuning | Adjusting a pretrained model for a specific job | It helps make the AI better for your needs by teaching it with your data, but it might become less good at general tasks. Fine-tuning works best for specific jobs where you need higher accuracy. |
Frequency Penalties | Settings that discourage the model from repeating words | Helps make AI responses more varied and interesting, avoiding boring repetition. |
Function Calling | Getting the model to trigger actions or code | Allows AI to interact with apps, making it useful for tasks like getting data or automating jobs. |
Guardrails | Safety rules to control model outputs | Guardrails help reduce the chance of the AI giving bad or harmful answers, but they are not perfect. It’s important to use them wisely and not rely on them completely. |
Hallucination | When AI makes up things that aren’t true | AIs sometimes make stuff up, and you can’t completely stop this. It’s important to be aware that mistakes can happen, so you should check the AI’s answers. |
Hyperparameters | Settings that affect how the model works | By adjusting these settings, you can make the AI work better. It often takes trying different options to find what works best. |
Hybrid Search | Combining search methods to get better results | By using both keyword and meaning-based search, you get better results. Using just one might not work well. Combining them helps people find what they’re looking for more easily. |
Inference | Getting an answer back from the model | When you ask the AI a question and it gives you an answer, that’s called inference. It’s the process of the AI making predictions or responses. Knowing this helps you understand how the AI works and the time or resources it might need to give answers. |
Inference Endpoint | Where the model is available for use | Lets you use the AI model in your apps or services. |
Latency | The time delay in getting a response | Lower latency means faster replies, improving user experience. |
Latent Space | The hidden way the model represents data inside it | Helps us understand how the AI processes information. |
LLM (Large Language Model) | A big AI model that understands and generates text | Powers many AI tools, like chatbots and content creators. |
Model Deployment | Making the model available online | Needed to put AI into real-world use. |
Multimodal | Models that handle different data types, like text and images | People use words, pictures, and sounds. When AI can understand all these, it can help users better. Using multimodal AI makes your tools more powerful. |
Overfitting | When a model learns training data too well but fails on new data | If the AI is too tuned to old examples, it might not work well on new stuff. Getting perfect scores on tests might mean it’s overfitting. You want the AI to handle new things, not just repeat what it learned. |
Pretraining | The model’s first learning phase on lots of data | It’s like giving the model a broad education before it starts specific jobs. This helps it learn general things, but you might need to adjust it later for your needs. |
Prompt | The input or question you give to the AI | Giving clear and detailed prompts helps the AI understand what you want. Just like talking to a person, good communication gets better results. |
Prompt Engineering | Designing prompts to get the best results | By learning how to write good prompts, you can make the AI give better answers. It’s like improving your communication skills to get the best results. |
Prompt Injection | A security risk where malicious instructions are added to prompts | Users might try to trick the AI into ignoring your rules and doing things you don’t want. Knowing about prompt injection helps you protect your AI system from misuse. |
Prompt Templates | Premade formats for prompts to keep inputs consistent | They help you communicate with the AI consistently by filling in blanks in a set format. This makes it easier to use the AI in different situations and helps you get consistent results. |
Rate Limiting | Limiting how many requests can be made in a time period | Prevents system overload, keeping services running smoothly. |
Reinforcement Learning from Human Feedback (RLHF) | Training AI using people’s feedback | It helps the AI learn from what people like or don’t like, making its answers better. But it’s a complex technique, and you might not need it right away. |
Reranking | Sorting results to pick the most important ones | When you have limited space (like a small context window), reranking helps you choose the most relevant documents to show the AI. This helps make sure the best information is used, improving the AI’s answers. |
Retrieval-Augmented Generation (RAG) | Providing relevant context to the LLM | A language model needs proper context to answer questions. Like a person, it needs access to information such as data, past conversations, or documents to give a good answer. Collecting and giving this info to the AI before asking it questions helps prevent mistakes or the AI saying, “I don’t know.” |
Semantic Search | Searching based on meaning, not just words | It lets you search based on meaning, not just exact words, using embeddings. Combining it with keyword search (hybrid search) gives even better results. |
Temperature | A setting that controls how creative AI responses are | Lets you choose between predictable or more imaginative answers. Adjusting temperature can affect the quality and usefulness of the AI’s responses. |
Token Limits | The maximum number of tokens (words or pieces of words) the model can handle | Affects how much information you can put in or get back. You need to plan your AI use within these limits, balancing detail and cost. |
Tokenization | Breaking text into small pieces the model understands | It allows the AI to process the text. Also, you pay for AI based on the number of tokens used, so knowing about tokens helps manage costs. |
Top-p Sampling | Choosing the next word only from the most likely options whose probabilities add up to a set threshold | Balances predictability and creativity in AI responses. The trade-off is between safe answers and more varied ones. |
Transfer Learning | Using knowledge from one task to help with another | You can start with a strong AI model someone else made and adjust it for your needs. This saves time and keeps the model’s general abilities while making it better for your tasks. |
Transformer | A type of AI model that uses attention to understand language | They are the main type of model used in generative AI today, like the ones that power chatbots and language tools. |
Vector Database | A special database for storing and searching embeddings | They store embeddings of text, images, and more, so you can search by meaning. This makes finding similar items faster and improves searches and recommendations. |
Zero-Shot Learning | When the model does a new task without training or examples | This means you don’t give any examples to the AI. While it’s good for simple tasks, not providing examples might make it harder for the AI to perform well on complex tasks. Giving examples helps, but takes up space in the prompt. You need to balance prompt space with the need for examples. |
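Temperature and top-p sampling, from the glossary above, are easier to picture with numbers. This minimal sketch applies both settings to made-up scores for four candidate next words, following the common textbook definitions: divide the scores by the temperature before a softmax, then keep the smallest set of top words whose probabilities add up to at least p. Real model APIs may differ in the details, and the function names are illustrative.

```python
import math
import random
from typing import Optional


def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw next-word scores into probabilities; low temperature sharpens, high flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: s / temperature for word, s in scores.items()}
    peak = max(scaled.values())  # subtracting the max keeps math.exp from overflowing
    exps = {word: math.exp(s - peak) for word, s in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most likely words until their probabilities reach p, then renormalize."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for word, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[word] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {word: prob / total for word, prob in kept.items()}


def sample_next_word(scores: dict[str, float], temperature: float = 1.0, top_p: float = 1.0,
                     rng: Optional[random.Random] = None) -> str:
    """Apply temperature, then top-p, then draw one word at random from what is left."""
    rng = rng if rng is not None else random.Random()
    probs = top_p_filter(softmax_with_temperature(scores, temperature), top_p)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


if __name__ == "__main__":
    scores = {"house": 2.0, "home": 1.5, "property": 0.5, "castle": -1.0}  # made-up scores
    print(softmax_with_temperature(scores, 0.5))  # "house" gets most of the probability
    print(softmax_with_temperature(scores, 2.0))  # probabilities are more evenly spread
    print(top_p_filter(softmax_with_temperature(scores, 1.0), 0.9))  # "castle" is dropped
    print(sample_next_word(scores, temperature=0.7, top_p=0.9, rng=random.Random(42)))
```

Lower temperature and lower top-p both push toward predictable answers; raising either lets less likely words through, which is where the "creativity" in the glossary comes from.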
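Hybrid search, from the glossary above, can also be sketched in a few lines. This example blends a crude keyword-overlap score with cosine similarity between embeddings. The three-number "embeddings" are made up by hand (a real system would get them from an embedding model and usually store them in a vector database), and the equal weighting (`alpha=0.5`) is an arbitrary assumption you would tune on your own data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors by angle: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear in the text (a crude stand-in for BM25)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(query: str, query_vec: list[float], docs: list[tuple[str, list[float]]],
                  alpha: float = 0.5, k: int = 2) -> list[str]:
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score; return the top k texts."""
    scored = [
        (alpha * cosine_similarity(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in docs
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    # Hand-made 3-number "embeddings"; a real system would get these from an embedding model.
    docs = [
        ("three bedroom house with a garden", [0.9, 0.1, 0.0]),
        ("spacious home and backyard for families", [0.8, 0.2, 0.1]),
        ("commercial office space downtown", [0.1, 0.9, 0.2]),
    ]
    print(hybrid_search("family house with garden", [0.85, 0.15, 0.05], docs))
```

The second listing shares none of the query’s words, so keyword search alone scores it the same as the office space (zero); its embedding similarity is what lifts it into the top results.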
Footnotes
- Diagram adapted from my blog post “Your AI Product Needs Evals.”
This post is an excerpt (chapters 1–3) of an upcoming report of the same title. The full report will be released on the O’Reilly learning platform on February 27, 2025.