So you have been successfully serving your customers for years. The business is solid. Then suddenly, customers start asking for “AI features.” You have 3 options:
- Ignore them and hope for the best
- Ship toys and gimmicks
- Work on AI features that provide real value to your customers
Let's explore option 3.
What value can AI bring to your SaaS?
It all starts with the customer. What job is the customer trying to get done? If you have been paying attention to them, you already know. So the #1 task is to identify those use cases. Don't think about AI for now. Think about the job they are trying to get done.
Once you have a list of use cases, it becomes possible to think about how AI can help solve them. Usually those use cases fall into 3 categories:
- Better search. You can probably replace your current keyword search with semantic search.
- Suggest and fill in data. Help users create content faster. Depending on the use case, this can be a simple text completion or a more complex process that generates structured data from the user's input; images can be generated the same way. See the sketch after this list.
- Audit and correct data. Help users identify and correct errors in their data.
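To make the second category concrete, here is a minimal sketch of “suggest and fill in,” assuming the OpenAI Python SDK. The model name, the invoice schema, and the `suggest_invoice_fields` helper are illustrative, not a prescription:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def suggest_invoice_fields(raw_note: str) -> dict:
    """Turn a free-text note into structured invoice data (hypothetical use case)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        response_format={"type": "json_object"},  # ask for machine-readable output
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract invoice fields from the user's note. "
                    'Reply as JSON: {"customer": str, "amount": number, "due_date": str}'
                ),
            },
            {"role": "user", "content": raw_note},
        ],
    )
    return json.loads(response.choices[0].message.content)

# suggest_invoice_fields("Bill Acme Corp $1,200 for March consulting, due end of month")
```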
The naive approach
So you have your list of use cases and a rough idea of which of the categories above each one falls into. After experimenting with ChatGPT and reading a lot of blog posts and articles about AI, it feels like you have a good understanding of what is possible.
Current models support 1M or more tokens of context, so large chunks of customer data can simply be dumped into the model to generate content. A prototype gets put together.
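The prototype usually looks something like this sketch (illustrative; it assumes the OpenAI Python SDK, and `all_customer_docs` stands in for whatever data export you have at hand):

```python
from openai import OpenAI

client = OpenAI()

def answer_question(question: str, all_customer_docs: list[str]) -> str:
    # Naive approach: dump the entire dataset into the prompt on every request.
    context = "\n\n".join(all_customer_docs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer using the data below.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```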
A question is asked, and the model gives something that seems to work. But the payload grows with the dataset and is re-sent on every request, so costs climb: the same tokens are processed over and over again. Latency follows.
Then quality degrades. The model isn’t a search engine; it’s a pattern matcher. When given too much context, it doesn’t pick the right pieces every time. It mixes things, misses key details, and confidently answers from the wrong part of the dataset.
In addition, when everything is in the prompt, a clear link between answer and source is lost. It becomes hard to say “this came from this document”. That makes results harder to trust, harder to debug, and harder to explain to users.
So yes, sometimes everything can be dumped into context. But what gets built is a very expensive, slightly confused data assistant instead of a system that knows how to properly look things up. That’s the gap embeddings fill.
A more robust approach
Don’t give the model everything. Give it just what it needs.
Treat the system as two steps: first, figure out what matters for the question; then let the model work on that smaller, relevant slice of data.
When a request comes in, you use embeddings (plus filters like tenant, type, etc.) to narrow the dataset down to the few pieces that actually matter for the query. Then you pass only that context to the model to generate the answer.
This keeps cost and latency under control, improves accuracy, and makes grounding explicit because you know which pieces of data were used.
In short: retrieve first, reason second.
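In practice the narrowing step often lives in the database. As one illustration, here is a sketch assuming Postgres with the pgvector extension and a hypothetical `documents` table; it also assumes the question has already been turned into an embedding (more on embeddings below):

```python
import psycopg  # psycopg 3

# Hypothetical schema: documents(id, tenant_id, doc_type, text, embedding vector)
NARROW_SQL = """
    SELECT id, text
    FROM documents
    WHERE tenant_id = %(tenant_id)s
      AND doc_type = %(doc_type)s
    ORDER BY embedding <=> %(query_vec)s::vector  -- cosine distance, nearest first
    LIMIT 5
"""

def narrow(conn: psycopg.Connection, tenant_id: str, doc_type: str,
           query_vec: list[float]) -> list[tuple]:
    # pgvector's text format for a vector literal is '[x,y,z]'
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    with conn.cursor() as cur:
        cur.execute(NARROW_SQL, {
            "tenant_id": tenant_id,
            "doc_type": doc_type,
            "query_vec": vec_literal,
        })
        return cur.fetchall()
```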
Embeddings are a way of turning data (text, images, even audio) into numbers that capture meaning. Once everything is in that numeric form, you can compare items by similarity, so things that “mean the same” (a sentence, a product photo, a diagram) end up close to each other. That’s what lets you search by intent instead of exact matches, regardless of whether the input is words or pixels.
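To make “close to each other” concrete, here is a minimal sketch that embeds two texts and compares them with cosine similarity, assuming the OpenAI embeddings endpoint (the model name is illustrative):

```python
import math
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    # One vector per input; the model name is illustrative.
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "refund my order" and "I want my money back" score much closer to each
# other than either does to "update my billing address", even though the
# first two share no keywords.
```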
Basic building blocks
We’re basically building a small system that assembles the right context before calling the model.
At a high level, we need three pieces:
- A way to find relevant data. Embeddings + metadata filters. This is our “search layer” that narrows the dataset down to what actually matters for the query.
- A way to select and shape context. This is where we decide what goes into the prompt: take the results, clean them up, maybe rank or expand them, and build a tight context packet.
- A model to reason and generate. The LLM takes that curated context and produces the answer.

One-liner: search → curate → generate.
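Putting the three pieces together, a minimal loop could look like the sketch below. It keeps the index in memory for illustration and assumes each document’s embedding was computed ahead of time; a real system would use a vector database and the tenant filters discussed above:

```python
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text  # illustrative model name
    ).data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def answer(question: str, docs: list[dict], tenant_id: str, top_k: int = 3) -> str:
    # 1. Search: metadata filter first, then rank by embedding similarity.
    candidates = [d for d in docs if d["tenant_id"] == tenant_id]
    q_vec = embed(question)
    candidates.sort(key=lambda d: cosine(q_vec, d["embedding"]), reverse=True)

    # 2. Curate: keep only the few most relevant pieces as a tight context packet.
    context = "\n\n".join(d["text"] for d in candidates[:top_k])

    # 3. Generate: the model reasons over the curated context only.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only this context:\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```

Because the context packet is small and you know exactly which documents went into it, cost stays flat, latency stays low, and every answer can be traced back to its sources.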