How we index images for RAG (kapa.ai)

Kapa.ai describes how it improves RAG over images in technical documentation: it generates a text caption for each image once, at indexing time, stores it, and retrieves it like any other text at query time, rather than calling a multimodal model on every request. The company argues that query-time vision is too expensive, is constrained by payload limits, and often loses the fine details needed to read charts and tables, whereas ingestion-time transcription can preserve that load-bearing information. In experiments across three customer projects, indexing image captions reportedly performed measurably better than a text-only baseline, with only a small per-query cost increase (about 1%–6%), and the correct image was cited 94%–99% of the time.
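
A minimal Python sketch of the caption-at-index-time idea, for illustration only. Everything named here is an assumption rather than Kapa.ai's implementation: `Chunk`, `build_index`, `retrieve`, and `fake_captioner` are hypothetical, the captioner is a stub standing in for a single multimodal model call per image, and the word-overlap scorer is a toy replacement for whatever text retrieval a real system would use.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Chunk:
    text: str                        # searchable text: body text or an image caption
    image_url: Optional[str] = None  # set only on caption chunks, so an answer can cite the image


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(doc_text: str, image_urls: List[str],
                caption_image: Callable[[str], str]) -> List[Chunk]:
    """Indexing time: caption every image exactly once and store the caption as text."""
    chunks = [Chunk(text=doc_text)]
    for url in image_urls:
        chunks.append(Chunk(text=caption_image(url), image_url=url))
    return chunks


def retrieve(index: List[Chunk], query: str, k: int = 1) -> List[Chunk]:
    """Query time: plain text matching only, no vision model involved.

    Toy scorer (an assumption): rank chunks by the number of shared tokens.
    """
    q = tokenize(query)
    ranked = sorted(index, key=lambda c: len(q & tokenize(c.text)), reverse=True)
    return ranked[:k]


# Hypothetical stub captioner; a real pipeline would call a multimodal model here.
caption_calls = []

def fake_captioner(url: str) -> str:
    caption_calls.append(url)
    return ("Table of API rate limits: free plan 60 requests per minute, "
            "pro plan 600 requests per minute")


index = build_index(
    doc_text="Our API uses token authentication.",
    image_urls=["https://example.com/rate-limits.png"],
    caption_image=fake_captioner,
)
hits = retrieve(index, "What is the rate limit on the pro plan?")
print(hits[0].image_url)
```

The captioner runs only inside `build_index`, so repeated calls to `retrieve` add no vision cost, and the `image_url` kept on each caption chunk is what lets a generated answer cite the original image.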

June 02, 2026 19:35 Source: Hacker News