Deploying RAG in Bedrock vs. Local: WattBot 2025 Case Study
Many researchers are exploring retrieval-augmented generation (RAG) to build document-grounded, trustworthy AI tools, but it is often unclear how design choices around models, infrastructure, and deployment play out in practice. In this session, we present lessons learned from replicating the winning RAG system from the WattBot 2025 challenge. The challenge focuses on producing citation-backed energy and sustainability estimates for AI workloads from a fixed corpus of 30+ academic papers — or explicitly abstaining when evidence is missing. After a short overview of the winning approach, Nils Matteson and Blaise Enuh walk through how the system is implemented in practice, including:
- A cloud deployment using AWS Bedrock
- Local, open-source deployments (e.g., Hugging Face models on GB10 and Dell PowerEdge R7725 hardware)
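The core answer-or-abstain behavior described above can be sketched in a few lines of plain Python. This is a toy illustration, not the winning solution's actual pipeline: the scoring function, threshold, and names (`score`, `answer`) are all hypothetical stand-ins for a real embedding retriever plus LLM generation step.

```python
# Hypothetical sketch of a citation-backed, abstention-aware RAG answer step.
# The scoring, threshold, and names are illustrative only; the actual WattBot
# winning solution uses a real retriever and LLM, not keyword overlap.

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words found in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def answer(query: str, corpus: dict[str, str], threshold: float = 0.5):
    """Return (answer_text, citations), abstaining when no passage clears the threshold."""
    ranked = sorted(corpus.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    best_id, best_doc = ranked[0]
    if score(query, best_doc) < threshold:
        # Explicit abstention: no citation is better than a fabricated one.
        return ("I don't know: no supporting evidence in the corpus.", [])
    # A real system would have an LLM generate a grounded answer from the
    # retrieved passage; here we return the passage with its citation.
    return (best_doc, [best_id])

# Tiny stand-in corpus (two invented snippets, not real paper text).
corpus = {
    "paper_a": "training GPT-3 consumed an estimated 1287 MWh of electricity",
    "paper_b": "data center cooling accounts for a large share of energy use",
}
text, cites = answer("electricity consumed training GPT-3", corpus)
```

The design point this illustrates is the abstention path: rather than always generating, the system refuses when retrieval confidence is low, which is what makes the cited estimates trustworthy.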
The session compares performance, cost, and latency across environments and discusses the operational tradeoffs of each. It also includes a Streamlit-based interface demo for those looking to host their own RAG apps.
This work was conducted as part of ongoing AI infrastructure evaluation within the Research Cyberinfrastructure (RCI) office in DoIT.
Links
- GitHub: WattBot in Bedrock and Local
- Kaggle challenge: WattBot 2025
- Winning solution: KohakuBlueleaf/KohakuRAG
- Annual hackathon: Machine Learning Marathon, a 3-month AI/ML hackathon hosted by ML+X each fall. Reach out to Chris if you’d like to submit a project!