Every AI knowledge stack hinges on an embedding model—the thing that turns text into vectors you can search, cluster, and feed into RAG pipelines. The default answer is almost always OpenAI’s text-embedding-3-small, but it’s worth knowing exactly when you should ignore that default.
For the vast majority of projects, especially at personal or small-team scale, the 1,536-dimension embeddings from OpenAI are hard to beat. At $0.02 per million tokens, you’re looking at maybe $2 a month to power a serious AI memory tool like SuperMemory or a Notion AI–style knowledge base. Latency is sub-100ms, MTEB scores are excellent, and you don’t have to maintain any infrastructure. If your vector database of choice is pgvector, Pinecone, or Supabase, OpenAI’s endpoint just slots right in.
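For a sense of how little glue that takes, here's a minimal sketch using the official Python SDK; the helper function, sample texts, and the pgvector column mentioned in the comment are illustrative assumptions, not from the article:

```python
# pip install openai  (pgvector / Pinecone / Supabase clients are separate installs)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts with text-embedding-3-small (1,536 dims)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

vectors = embed(["Weekly review notes", "Project Alpha retro"])
print(len(vectors[0]))  # 1536 -- matches a pgvector column declared as vector(1536)
```

The resulting vectors drop straight into a `vector(1536)` column in pgvector, or into Pinecone or Supabase, with no extra plumbing.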
Where the default falls apart is when you need local inference or massive throughput. Nomic Embed v1.5 (768 dims) runs entirely on your own hardware via LM Studio, at zero cost per token and with sub-10ms latency. Privacy absolutely matters here: think sensitive notes, or a local Obsidian vault wired into a Tiago Forte–style second brain. If you're churning through 100M+ tokens a day in a high-volume dev loop, a local pipeline pays for itself by avoiding API fees and network jitter. Nomic scores slightly lower on general MTEB queries, but with a little domain fine-tuning it can outperform the hosted models on your specific data.
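The switch to local is mostly a base-URL change, since LM Studio exposes an OpenAI-compatible server. A hedged sketch, assuming LM Studio's default local port and an illustrative model identifier (use whatever name your instance actually reports):

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI wire format; the port and model
# name below are assumptions -- check your LM Studio instance for the real values.
local = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = local.embeddings.create(
    model="nomic-embed-text-v1.5",  # illustrative identifier for Nomic Embed v1.5
    input=["Vault note: migrate the second brain to the new sync backend"],
)
print(len(resp.data[0].embedding))  # 768 dims for Nomic Embed v1.5
```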
Then there are the specialty players: Cohere embed-v3 for multilingual coverage, and Voyage AI's voyage-code-3 or voyage-law-2 for domain-specific grunt work. They compete with OpenAI on price, but they're solving different problems. If your knowledge stack treats code or legal docs as first-class citizens, or if you're building a RAG system that has to work across languages, they're worth a serious look. The default isn't wrong; it's just not the only right answer anymore.
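For completeness, here's roughly what calling the specialty models looks like with the vendors' Python SDKs; the model identifiers, parameters, and environment variables are assumptions based on their public docs, not something the article specifies:

```python
import cohere
import voyageai

co = cohere.Client()    # assumes CO_API_KEY is set in the environment
vo = voyageai.Client()  # assumes VOYAGE_API_KEY is set in the environment

# Multilingual coverage: non-English documents embed the same way as English ones.
multilingual = co.embed(
    texts=["Résumé de la réunion produit"],
    model="embed-multilingual-v3.0",
    input_type="search_document",
).embeddings

# Code-aware embeddings for a RAG index over a repository.
code_vectors = vo.embed(
    ["def dedupe(xs): return list(dict.fromkeys(xs))"],
    model="voyage-code-3",
    input_type="document",
).embeddings
```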
The article breaks down the numbers and the edge cases—well worth a read if you’re building anything that stores and retrieves AI memories.
Read it →