EcoSearch

EcoSearch is an open, modular Retrieval-Augmented Generation (RAG) framework designed to make question answering more efficient and interpretable. It combines dense semantic search, minimal reliance on large language models, and rigorous evaluation, offering a modern approach to intelligent information retrieval.

Abstract
EcoSearch is a modular, lightweight semantic-retrieval question answering system designed to offer a transparent, reproducible, and efficient pipeline for open-domain QA. Unlike typical pipelines that compare user queries directly to document chunks, EcoSearch introduces a question-to-question retrieval approach: documents are split into overlapping chunks, each chunk is summarised, and an LLM generates a synthetic “guiding question” from each summary. All guiding questions are embedded with SentenceTransformers, and the resulting corpus of question embeddings is indexed with FAISS. At query time, the user’s question is embedded and compared against the pre-generated questions by semantic similarity, retrieving the chunks whose guiding questions match best. These candidates are then re-ranked with a CrossEncoder for precise matching. This design reduces LLM usage, improves retrieval transparency, and allows for detailed evaluation: performance is measured by comparing retrieved chunks against ideal oracle spans identified by the LLM. It distinguishes EcoSearch from classic RAG pipelines by prioritising retrieval and by requiring explicit user action for fallback LLM answer generation rather than automated end-to-end generation. While EcoSearch delivers robust, interpretable retrieval and answer generation, we find that automatic question generation remains the key source of variability, as LLM outputs are sensitive to prompt design and inherent randomness. Addressing this remains a priority for future development.
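The question-to-question retrieval flow described above can be sketched in miniature. This is a dependency-free illustration, not EcoSearch’s implementation: `toy_embed` is a stand-in for a SentenceTransformers encoder, the brute-force cosine ranking stands in for a FAISS index, the hand-written guiding questions stand in for LLM-generated ones, and the CrossEncoder re-ranking stage is only indicated in a comment.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a SentenceTransformers encoder: a bag-of-words
    vector over a tiny fixed vocabulary (hypothetical, for illustration)."""
    vocab = ["what", "how", "faiss", "index", "embedding", "rerank",
             "question", "chunk", "retrieval", "semantic"]
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Offline phase: each chunk is paired with a "guiding question".
# EcoSearch generates these with an LLM; here they are written by hand.
corpus = [
    ("FAISS builds an index over dense vectors for fast search.",
     "how does faiss index embedding vectors"),
    ("CrossEncoders re-score query-passage pairs for precision.",
     "how does a crossencoder rerank retrieval candidates"),
    ("Documents are split into overlapping chunks before summarisation.",
     "how are documents split into chunk units"),
]
# In EcoSearch this corpus of question embeddings would go into FAISS.
index = [(toy_embed(question), chunk) for chunk, question in corpus]

def retrieve(user_question, top_k=2):
    """Query time: embed the user's question and compare it against the
    pre-generated guiding questions (question-to-question retrieval)."""
    qv = toy_embed(user_question)
    scored = sorted(index, key=lambda item: cosine(qv, item[0]), reverse=True)
    # A CrossEncoder would now re-rank (question, chunk) pairs; the
    # cosine ranking above stands in for that second stage here.
    return [chunk for _, chunk in scored[:top_k]]

print(retrieve("how does faiss index embedding vectors"))
```

The key design point the sketch mirrors is that the user’s question is never compared to raw chunk text, only to other questions, which keeps both sides of the similarity comparison in the same linguistic form.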