Deploying RAG in Bedrock vs. Local: WattBot 2025 Case Study

Tags: Videos, ML+X, UW-Madison, RAG, Retrieval, LLM, Cloud, AWS, Bedrock, Hugging Face, Foundation models, GenAI, Sustainability, Energy, GPU, Deep learning, NLP
Presenters: Nils Matteson, Blaise Enuh, Chris Endemann

Date: February 17, 2026

Many researchers are exploring retrieval-augmented generation (RAG) to build document-grounded, trustworthy AI tools, but it is often unclear how design choices around models, infrastructure, and deployment play out in practice. In this session, we present lessons learned from replicating the winning RAG system from the WattBot 2025 challenge. The challenge focuses on producing citation-backed energy and sustainability estimates for AI workloads from a fixed corpus of 30+ academic papers, or explicitly abstaining when evidence is missing. After a short overview of the winning approach, Nils Matteson and Blaise Enuh walk through how the system is implemented in practice, including:

  1. A cloud deployment using AWS Bedrock
  2. Local, open-source deployments (e.g., Hugging Face models on GB10 and Dell PowerEdge R7725 hardware)
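The core pattern in both deployments is the same: retrieve evidence from the fixed corpus, answer with a citation, or abstain when nothing relevant is found. Below is a minimal sketch of that retrieve-then-answer-or-abstain loop. The corpus snippets, citation keys, and overlap threshold are illustrative placeholders, not from the WattBot corpus; a real system would use embedding-based retrieval and an LLM (via Bedrock or Hugging Face) rather than keyword overlap.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Hypothetical (citation key, passage) pairs standing in for the paper corpus.
CORPUS = [
    ("paperA2021", "Training GPT-3 consumed an estimated 1287 MWh of electricity."),
    ("paperB2023", "BLOOM's training run emitted roughly 25 tonnes of CO2 equivalent."),
]

def answer(question: str, min_overlap: int = 2):
    """Return (passage, citation) for the best-matching passage, or None to
    abstain when no passage shares enough terms with the question."""
    q = tokenize(question)
    best_key, best_passage = max(CORPUS, key=lambda kp: len(q & tokenize(kp[1])))
    if len(q & tokenize(best_passage)) < min_overlap:
        return None  # abstain: the corpus lacks supporting evidence
    return best_passage, best_key
```

The abstention branch is what makes the system's answers auditable: an out-of-corpus question returns `None` instead of an unsupported guess, mirroring the challenge's requirement to abstain when evidence is missing.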

The session compares performance, cost, latency, and operational tradeoffs across environments. It also includes a Streamlit-based interface demo for those looking to host their own RAG apps.

This work was conducted as part of ongoing AI infrastructure evaluation within the Research Cyberinfrastructure (RCI) office in DoIT.