EcoSearch is an open, modular Retrieval-Augmented Generation (RAG) framework for efficient and interpretable question answering. It prioritises high-precision retrieval (dense semantic search + CrossEncoder reranking), minimises LLM usage, and introduces sentence-level oracle-span overlap for rigorous evaluation.
As a lightweight semantic retrieval framework for open-domain QA, EcoSearch emphasises transparency, reproducibility, and efficiency. Unlike pipelines that compare user queries directly to document chunks, it adopts a question-to-question retrieval approach. Documents are split into overlapping chunks; each chunk is summarised, and an LLM then generates a synthetic “guiding question” from it. All guiding questions are embedded with SentenceTransformers and indexed in FAISS.
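The chunking step can be illustrated with a minimal sketch. It assumes sentence-level chunks that share a fixed number of sentences with their neighbours and that record which sentence indices they cover. The names `Chunk`, `split_sentences`, and `make_chunks`, the regex splitter, and the default sizes are hypothetical and not taken from the EcoSearch code base. Summarisation, guiding-question generation, embedding, and FAISS indexing are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    sentence_indices: list[int]  # positions of the sentences in the source document
    text: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # A real pipeline would likely use a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def make_chunks(sentences: list[str], size: int = 4, overlap: int = 1) -> list[Chunk]:
    # Slide a window of `size` sentences, advancing by `size - overlap`,
    # so consecutive chunks share `overlap` sentences.
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(sentences), step):
        idx = list(range(start, min(start + size, len(sentences))))
        text = " ".join(sentences[i] for i in idx)
        chunks.append(Chunk(len(chunks), idx, text))
        if idx[-1] == len(sentences) - 1:
            break  # the last sentence is covered; avoid a redundant tail chunk
    return chunks
```

Keeping the sentence indices on each chunk is what later allows a retrieved chunk to be compared with a reference span at sentence granularity.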
At query time, the user question is embedded and matched against the pre-generated guiding questions (not raw chunks) to retrieve candidates with strong semantic alignment. Candidates are then reranked with a CrossEncoder for precise ordering. This pipeline reduces LLM dependency, improves retrieval transparency, and supports agentic user control over answer generation: the user can accept the best chunk, trigger a focused LLM answer from it, invoke a fallback that selects the best three consecutive sentences from the corpus, or declare “no answer” when appropriate.
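The query-time flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: a bag-of-words `embed` stands in for SentenceTransformers embeddings, a brute-force cosine scan stands in for the FAISS index, and `overlap_score` stands in for the CrossEncoder. All function names are hypothetical, and the fallback is assumed to score every window of three consecutive sentences against the query.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # Toy stand-in for a sentence embedding: lowercase bag of words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def overlap_score(query: str, passage: str) -> float:
    # Toy stand-in for a CrossEncoder relevance score.
    return cosine(embed(query), embed(passage))


def retrieve(query: str, guiding_questions: dict[int, str], top_k: int = 3) -> list[int]:
    # Stage 1: match the query against guiding questions, not raw chunks.
    # Keys of `guiding_questions` are chunk ids.
    q = embed(query)
    ranked = sorted(
        guiding_questions,
        key=lambda cid: cosine(q, embed(guiding_questions[cid])),
        reverse=True,
    )
    return ranked[:top_k]


def rerank(query: str, candidates: list[int], chunks: dict[int, str],
           score: Callable[[str, str], float] = overlap_score) -> list[int]:
    # Stage 2: reorder candidates by scoring the query against the chunk text.
    return sorted(candidates, key=lambda cid: score(query, chunks[cid]), reverse=True)


def best_window(query: str, sentences: list[str], width: int = 3,
                score: Callable[[str, str], float] = overlap_score) -> tuple[int, int]:
    # Fallback: return [start, end) of the best-scoring run of consecutive sentences.
    if not sentences:
        raise ValueError("sentences must not be empty")
    width = min(width, len(sentences))
    starts = range(len(sentences) - width + 1)
    best = max(starts, key=lambda s: score(query, " ".join(sentences[s:s + width])))
    return best, best + width
```

In a setup closer to the one described, `retrieve` would query the FAISS index over guiding-question embeddings and `score` would wrap a CrossEncoder; the control flow would stay the same.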
Evaluation focuses on oracle-span overlap (not recall@k). EcoSearch tracks sentence indices within chunks and compares retrieved spans against LLM-derived oracle spans, yielding fine-grained semantic and structural alignment signals and more interpretable diagnostics for retrieval quality.
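One plausible way to compute a sentence-level overlap score is shown below. The text does not give the exact formula, so this sketch assumes both spans are represented as sets of sentence indices and reports precision, recall, and Jaccard overlap; the function name `span_overlap` and the choice of these three measures are assumptions.

```python
from __future__ import annotations


def span_overlap(retrieved: set[int], oracle: set[int]) -> dict[str, float]:
    # Compare retrieved sentence indices with the oracle (reference) indices.
    hits = len(retrieved & oracle)
    union = len(retrieved | oracle)
    return {
        # share of retrieved sentences that belong to the oracle span
        "precision": hits / len(retrieved) if retrieved else 0.0,
        # share of oracle sentences that were retrieved
        "recall": hits / len(oracle) if oracle else 0.0,
        # symmetric overlap of the two spans
        "jaccard": hits / union if union else 0.0,
    }
```

For example, a retrieved span covering sentences 2 to 4 against an oracle span covering sentences 3 to 6 shares two sentences, so the retrieval is partly on target but misses half of the reference span. A chunk-level hit/miss signal would not show this.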