Deploying RAG in Bedrock vs. Local: WattBot 2025 Case Study

Tags: Videos, ML+X, UW-Madison, RAG, Retrieval, LLM, Cloud, AWS, Bedrock, Hugging Face, Foundation models, GenAI, Sustainability, Energy, GPU, Deep learning, NLP
Presenters: Nils Matteson, Blaise Enuh, Chris Endemann

Date: February 17, 2026

Many researchers are exploring retrieval-augmented generation (RAG) to build document-grounded, trustworthy AI tools, but it is often unclear how design choices around models, infrastructure, and deployment play out in practice. In this session, we present lessons learned from replicating the winning RAG system from the WattBot 2025 challenge. The challenge focuses on producing citation-backed energy and sustainability estimates for AI workloads from a fixed corpus of 30+ academic papers, or explicitly abstaining when evidence is missing. After a short overview of the winning approach, Nils Matteson and Blaise Enuh walk through how the system is implemented in practice, including:

  1. A cloud deployment using AWS Bedrock
  2. Local, open-source deployments (e.g., Hugging Face models on GB10 and Dell PowerEdge R7725 hardware)
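The core pattern in both deployments is the same: retrieve evidence from the fixed corpus, answer with a citation, or abstain when nothing relevant is found. Below is a minimal sketch of that retrieve-then-answer-or-abstain loop. The corpus snippets, citation keys, and overlap threshold are illustrative placeholders, not from the WattBot corpus; a real system would use embedding-based retrieval and an LLM (via Bedrock or Hugging Face) rather than keyword overlap.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Hypothetical (citation key, passage) pairs standing in for the paper corpus.
CORPUS = [
    ("paperA2021", "Training GPT-3 consumed an estimated 1287 MWh of electricity."),
    ("paperB2023", "BLOOM's training run emitted roughly 25 tonnes of CO2 equivalent."),
]

def answer(question: str, min_overlap: int = 2):
    """Return (passage, citation) for the best-matching passage, or None to
    abstain when no passage shares enough terms with the question."""
    q = tokenize(question)
    best_key, best_passage = max(CORPUS, key=lambda kp: len(q & tokenize(kp[1])))
    if len(q & tokenize(best_passage)) < min_overlap:
        return None  # abstain: the corpus lacks supporting evidence
    return best_passage, best_key
```

The abstention branch is what makes the system's answers auditable: an out-of-corpus question returns `None` instead of an unsupported guess, mirroring the challenge's requirement to abstain when evidence is missing.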

The session compares performance, cost, latency, and operational tradeoffs across environments. It also includes a Streamlit-based interface demo for those looking to host their own RAG apps.

This work was conducted as part of ongoing AI infrastructure evaluation within the Research Cyberinfrastructure (RCI) office in DoIT.