EcoSearch

EcoSearch is an open, modular Retrieval-Augmented Generation (RAG) framework for efficient and interpretable question answering. It prioritises high-precision retrieval (dense semantic search plus CrossEncoder reranking), minimises LLM usage, and evaluates retrieval with a sentence-level oracle-span overlap metric.

Abstract

EcoSearch is a modular, lightweight semantic retrieval framework for open-domain QA that emphasises transparency, reproducibility, and efficiency. Unlike pipelines that compare user queries directly with document chunks, EcoSearch uses question-to-question retrieval: documents are split into overlapping chunks, each chunk is summarised, and an LLM generates a synthetic “guiding question” for it. All guiding questions are embedded with SentenceTransformers and indexed in FAISS.
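The chunking step can be sketched in plain Python. This is a minimal sketch, assuming the document has already been split into sentences and that chunks are fixed-size sentence windows with a fixed overlap; the names `Chunk` and `chunk_sentences` and the default sizes are hypothetical, not EcoSearch's actual code. Each chunk keeps the global indices of its first and last sentence, which is what allows retrieved text to be compared with oracle spans at sentence level later on.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    start: int  # global index of the first sentence (inclusive)
    end: int    # global index one past the last sentence (exclusive)
    text: str


def chunk_sentences(sentences: list[str], size: int = 4, overlap: int = 1) -> list[Chunk]:
    """Split a sentence list into overlapping windows, keeping sentence indices.

    Consecutive chunks share `overlap` sentences; the final chunk may be shorter.
    (Hypothetical helper: window size and overlap are illustrative assumptions.)
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    stride = size - overlap
    chunks: list[Chunk] = []
    start = 0
    while start < len(sentences):
        end = min(start + size, len(sentences))
        chunks.append(Chunk(len(chunks), start, end, " ".join(sentences[start:end])))
        if end == len(sentences):  # last window reached the end of the document
            break
        start += stride
    return chunks
```

In the full pipeline each chunk's text would then be summarised and passed to an LLM to produce its guiding question; that step, and the SentenceTransformers/FAISS indexing, are not shown here.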

At query time, the user question is embedded and matched against the pre-generated guiding questions rather than the raw chunks, so candidates are retrieved by question-to-question similarity. A CrossEncoder then reranks these candidates by scoring each question-chunk pair jointly, giving a more precise ordering. Because the LLM is needed only to generate guiding questions offline and, optionally, to phrase a final answer, the pipeline reduces LLM dependency; it also makes retrieval decisions easier to inspect and gives the user agentic control over answer generation. The user can accept the best chunk, trigger a focused LLM answer from it, invoke a fallback that selects the best three consecutive sentences from the corpus, or declare “no answer” when appropriate.
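The query-time path can be sketched without external dependencies. This is a minimal sketch under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a SentenceTransformers model, `search` performs the exhaustive inner-product search over unit vectors that a flat inner-product FAISS index would perform on normalised embeddings, and the `score` callable stands in for a CrossEncoder. Guiding question `i` is assumed to belong to chunk `i`. `best_window` illustrates the three-consecutive-sentence fallback as a sliding window scored by the same callable; all function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def embed(text: str) -> Vector:
    """Hypothetical stand-in for a SentenceTransformers encoder: an L2-normalised bag of words."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {w: v / norm for w, v in counts.items()}


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two sparse vectors (cosine similarity when both are unit length)."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def build_index(guiding_questions: list[str]) -> list[Vector]:
    """Offline step: embed every guiding question once."""
    return [embed(q) for q in guiding_questions]


def search(query: str, index: list[Vector], k: int = 5) -> list[int]:
    """Return ids of the k guiding questions closest to the user question."""
    q = embed(query)
    scores = [dot(q, v) for v in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]


def rerank(query: str, candidates: list[int], chunks: list[str],
           score: Callable[[str, str], float]) -> list[int]:
    """Reorder candidate chunk ids by a joint (query, chunk) score, CrossEncoder-style."""
    return sorted(candidates, key=lambda i: score(query, chunks[i]), reverse=True)


def best_window(query: str, sentences: list[str],
                score: Callable[[str, str], float], width: int = 3) -> tuple[int, int]:
    """Fallback: (start, end) of the best-scoring run of `width` consecutive sentences."""
    if len(sentences) <= width:
        return 0, len(sentences)
    start = max(range(len(sentences) - width + 1),
                key=lambda s: score(query, " ".join(sentences[s:s + width])))
    return start, start + width
```

Scoring each (query, chunk) pair with a joint model is slower than comparing precomputed vectors, which is why a retrieve-then-rerank design applies the reranker only to the short candidate list returned by the first stage.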

Evaluation is based on oracle-span overlap rather than recall@k. EcoSearch tracks the sentence indices covered by each chunk and compares retrieved spans with LLM-derived oracle spans at sentence level. This yields finer-grained signals of semantic and structural alignment than a chunk-level hit rate, and more interpretable diagnostics of retrieval quality.
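The exact overlap formula is not spelled out here, so the following is only a minimal sketch, assuming spans are compared as sets of global sentence indices and scored with sentence-level precision, recall, F1 and Jaccard overlap. The function name `span_overlap` and the choice of these four scores are hypothetical.

```python
def span_overlap(retrieved: set[int], oracle: set[int]) -> dict[str, float]:
    """Sentence-level overlap between a retrieved span and an oracle span.

    Both arguments are sets of global sentence indices (an assumed representation).
    precision: share of retrieved sentences that lie inside the oracle span
    recall:    share of oracle sentences that were retrieved
    f1:        harmonic mean of precision and recall
    jaccard:   intersection over union of the two index sets
    """
    hits = len(retrieved & oracle)
    union = len(retrieved | oracle)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(oracle) if oracle else 0.0
    # When hits > 0, precision and recall are both positive, so no division by zero.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    jaccard = hits / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}
```

A chunk covering sentences `start` to `end - 1` converts to the set `set(range(start, end))`. For example, a retrieved span {3, 4, 5, 6} against an oracle span {5, 6, 7} gives precision 0.5, recall 2/3 and Jaccard 0.4.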

Why no public demo? EcoSearch calls metered third-party LLM APIs (OpenAI) and works with protected evaluation datasets. To avoid unintended costs and data leakage, interactive demos are offered one-to-one on request. Use the form below to request the white paper or a private walkthrough.

Roadmap snapshot:
• Stable web pipeline (internal).
• Mobile companion in progress (OCR capture, multi-page PDF assembly, local query).
• Ongoing: prompt stabilisation for guiding-question generation.

Request the EcoSearch White Paper