<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>ML+X Nexus: All Resources</title>
<link>https://uw-madison-datascience.github.io/ML-X-Nexus/</link>
<atom:link href="https://uw-madison-datascience.github.io/ML-X-Nexus/index.xml" rel="self" type="application/rss+xml"/>
<description>All new ML and AI resources from the UW-Madison ML+X community</description>
<generator>quarto-1.9.36</generator>
<lastBuildDate>Tue, 10 Mar 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Claude Code Best Practices</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Blogs/claude-code-best-practices.html</link>
  <description><![CDATA[ 




<p>AI coding agents — tools that can autonomously write, edit, run, and test code on your behalf — are rapidly changing how software gets built. The space is crowded and evolving fast: <a href="https://docs.anthropic.com/en/docs/claude-code/overview">Claude Code</a>, <a href="https://github.com/features/copilot">GitHub Copilot</a>, <a href="https://www.cursor.com/">Cursor</a>, <a href="https://windsurf.com/">Windsurf</a>, <a href="https://www.augmentcode.com/">Augment Code</a>, <a href="https://aws.amazon.com/q/developer/">Amazon Q Developer</a>, <a href="https://cloud.google.com/products/gemini/code-assist">Gemini Code Assist</a>, and <a href="https://about.gitlab.com/gitlab-duo/">GitLab Duo</a> are among the most prominent, with new entrants appearing regularly.</p>
<p>Best practices in this space are still being discovered — by the ML+X community and the broader developer ecosystem alike. This guide is our attempt to start mapping what works, using <strong><a href="https://code.claude.com/">Claude Code</a></strong> as the primary lens. Claude Code is Anthropic’s agentic coding tool — distinct from the <a href="https://claude.ai">Claude.ai</a> chat interface — and it comes in several forms: a <strong>CLI</strong>, a <strong>desktop app</strong>, <strong>IDE extensions</strong> (<a href="https://code.claude.com/docs/en/vs-code">VS Code</a>, <a href="https://code.claude.com/docs/en/jetbrains">JetBrains</a>), and a <strong>web IDE</strong> at <a href="https://claude.ai/code">claude.ai/code</a>. All give the agent real shell access to read, write, and execute code, which makes it a great lens for exploring the trade-offs of agentic coding: permissions, context management, and cost. We’ll reference Claude.ai, GitHub Copilot, and other tools for comparison where useful.</p>
<p>This is a first pass based on early experience — we expect it to evolve as the ML+X community builds more hands-on knowledge. If you have tips, corrections, or experiences to share, please leave a comment below.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>This guide reflects the author’s understanding as of the date it was last modified
</div>
</div>
<div class="callout-body-container callout-body">
<p>AI tools, pricing, features, and contractual terms change frequently. This post is <strong>community guidance, not official UW-Madison policy</strong>. For the latest institutional policies, data-use agreements, or questions about what data types are permitted with specific tools, consult <a href="https://it.wisc.edu/about/division-of-information-technology/research-cyberinfrastructure/">UW-Madison Research Cyberinfrastructure</a> or your department’s IT office.</p>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>UW-Madison cloud users: Get started with Claude Code + Vertex AI or Bedrock
</div>
</div>
<div class="callout-body-container callout-body">
<p>If you’re at UW-Madison and want to use Claude Code through your institutional cloud account (GCP or AWS), check out our <strong><a href="../../Learn/Guides/claude-code-cloud-setup.html">Claude Code Cloud Setup Guide</a></strong> for a step-by-step walkthrough — from cloud project setup to running your first session. Note: UW does not yet have a direct data agreement with Anthropic, so <strong>avoid using Claude Code with restricted or sensitive data</strong>. Cloud routing is suitable for general, non-sensitive research code. See Data Privacy for details.</p>
</div>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Sources and attribution
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Much of the Claude Code-specific guidance in this post draws on Anthropic’s official documentation, including their <a href="https://code.claude.com/docs/en/best-practices">best practices guide</a>, <a href="https://code.claude.com/docs/en/permissions">permissions</a> and <a href="https://code.claude.com/docs/en/sandboxing">sandboxing</a> docs, <a href="https://code.claude.com/docs/en/memory">CLAUDE.md reference</a>, <a href="https://code.claude.com/docs/en/data-usage">data usage policy</a>, and <a href="https://code.claude.com/docs/en/costs">cost management guide</a>. GitHub Copilot sections draw on GitHub’s <a href="https://github.blog/news-insights/product-news/github-copilot-meet-the-new-coding-agent/">coding agent docs</a> and <a href="https://github.blog/changelog/2026-02-04-claude-and-codex-are-now-available-in-public-preview-on-github/">changelog</a>. Where we paraphrase official documentation, we’ve linked to the source. Community perspectives and independent analyses are cited inline throughout.</p>
</div>
</div>
</div>
<section id="what-is-agentic-coding" class="level2">
<h2 class="anchored" data-anchor-id="what-is-agentic-coding">What is agentic coding?</h2>
<p>Traditional AI code assistants (like early GitHub Copilot or ChatGPT) work in a simple loop: you ask, they suggest, you accept or reject. <strong>Agentic</strong> coding tools go further. They can:</p>
<ul>
<li>Read and navigate your entire codebase</li>
<li>Execute shell commands and run tests</li>
<li>Edit multiple files in a single pass</li>
<li>Iterate on their own output (fix errors, re-run tests, refine)</li>
<li>Operate semi-autonomously over multi-step tasks</li>
</ul>
<p>This is powerful, but it also means these tools have real access to your system — and the potential to do real damage if not managed carefully.</p>
</section>
<section id="the-landscape-at-a-glance" class="level2">
<h2 class="anchored" data-anchor-id="the-landscape-at-a-glance">The landscape at a glance</h2>
<p>Before diving into Claude Code specifically, here’s a rough map of the major agentic coding tools as of early 2026:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Tool</th>
<th>Interface</th>
<th>Cost model</th>
<th>Notable strengths</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><a href="https://code.claude.com/"><strong>Claude Code</strong></a></td>
<td>CLI, desktop app, IDE extensions, <a href="https://claude.ai/code">web IDE</a></td>
<td>Pay-per-token (API) or Max plan</td>
<td>Strong multi-step reasoning, explicit permission model, <code>CLAUDE.md</code> project config</td>
</tr>
<tr class="even">
<td><a href="https://github.com/features/copilot"><strong>GitHub Copilot</strong></a></td>
<td>VS Code/IDE, GitHub.com</td>
<td>Subscription + usage-based</td>
<td>Native GitHub integration, async PR creation via coding agent, multi-model support</td>
</tr>
<tr class="odd">
<td><a href="https://www.cursor.com/"><strong>Cursor</strong></a></td>
<td>Custom IDE (VS Code fork)</td>
<td>Subscription</td>
<td>Polished IDE experience, fast inline edits, multi-file context handling</td>
</tr>
<tr class="even">
<td><a href="https://windsurf.com/"><strong>Windsurf</strong></a></td>
<td>Custom IDE</td>
<td>Subscription (free tier available)</td>
<td>Low-friction agentic workflow, accessible pricing</td>
</tr>
<tr class="odd">
<td><a href="https://www.augmentcode.com/"><strong>Augment Code</strong></a></td>
<td>IDE extension</td>
<td>Subscription</td>
<td>Large context window, whole-codebase awareness</td>
</tr>
<tr class="even">
<td><a href="https://aws.amazon.com/q/developer/"><strong>Amazon Q Developer</strong></a></td>
<td>IDE, CLI, AWS console</td>
<td>Free tier / Pro</td>
<td>Deep AWS service integration, infrastructure-aware suggestions</td>
</tr>
<tr class="odd">
<td><a href="https://codeassist.google/"><strong>Gemini Code Assist</strong></a></td>
<td>IDE, Google Cloud</td>
<td>Free tier / Enterprise</td>
<td>Google Cloud integration, Gemini model access</td>
</tr>
<tr class="even">
<td><a href="https://about.gitlab.com/gitlab-duo/"><strong>GitLab Duo</strong></a></td>
<td>GitLab IDE, MR workflows</td>
<td>GitLab subscription add-on</td>
<td>Native GitLab CI/CD and merge request integration</td>
</tr>
</tbody>
</table>
<p>This space is moving fast — capabilities and pricing change frequently. See <a href="https://artificialanalysis.ai/insights/coding-agents-comparison">Coding Agents Comparison</a> for up-to-date benchmarks and pricing.</p>
<p>In practice, many developers use <strong>multiple tools</strong>: a chat UI for brainstorming and review, an agentic tool for multi-step feature work, and an IDE copilot for inline completions throughout the day.</p>
<section id="what-the-same-task-looks-like-across-different-tools" class="level3">
<h3 class="anchored" data-anchor-id="what-the-same-task-looks-like-across-different-tools">What the same task looks like across different tools</h3>
<p>To make these distinctions concrete, let’s walk through the same scenario — <em>“I have a repo on GitHub and I want Claude to add a utility function, write tests, and open a PR”</em> — across Claude.ai, Claude Code, and GitHub Copilot.</p>
<section id="claude.ai-chat-not-claude-code" class="level4">
<h4 class="anchored" data-anchor-id="claude.ai-chat-not-claude-code">Claude.ai (Chat — not Claude Code)</h4>
<p><a href="https://claude.ai">Claude.ai</a> is Anthropic’s general-purpose chat interface. It’s <em>not</em> an agentic coding tool — it can’t execute code, edit files, or run commands on your system. You provide context by pasting code into the conversation, and you copy the output back into your editor.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-4-contents" aria-controls="callout-4" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>The Claude Desktop app has multiple tabs — don’t confuse them
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-4" class="callout-4-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>The <a href="https://claude.ai/download">Claude Desktop app</a> includes three tabs: <strong>Chat</strong> (standard conversation, with <a href="https://modelcontextprotocol.io/">MCP</a> support), <strong>Code</strong> (the full Claude Code agentic experience — see below), and <strong>Cowork</strong> (an autonomous agent for knowledge work — it can execute multi-step tasks on your desktop, like research, file organization, and document creation). Only the <strong>Code</strong> tab is an agentic coding tool. The Chat tab provides the same experience as <a href="https://claude.ai">claude.ai</a> in a native app.</p>
</div>
</div>
</div>
<ol type="1">
<li>Start a new conversation at <a href="https://claude.ai">claude.ai</a> or in the Claude Desktop app’s Chat tab</li>
<li>Paste in the relevant code (e.g., the contents of <code>src/utils/</code> and a few example utilities)</li>
<li>Ask: <em>“Add a <code>slugify</code> function that matches the style of these existing utilities. Also write tests.”</em></li>
<li>Claude generates the code and tests in the chat</li>
<li><strong>You</strong> copy the output back into your editor, create a branch, commit, and open the PR yourself</li>
</ol>
<p><strong>Friction:</strong> You’re the middleware in both directions — pasting code in and copying code out. But notice what’s <em>not</em> here: permission prompts, approve/deny flows, or any risk of it running a bad command. Claude can’t touch your system, so the conversation feels fast and fluid even though you do all the manual work.</p>
<p><strong>Best for:</strong> Quick code generation, architecture discussions, explaining unfamiliar code, and brainstorming — any task where you’re happy to provide context manually and apply changes yourself.</p>
</section>
<section id="claude-code" class="level4">
<h4 class="anchored" data-anchor-id="claude-code">Claude Code</h4>
<p><a href="https://code.claude.com/">Claude Code</a> is Anthropic’s agentic coding tool — completely different from the Claude.ai chat interface. You point it at a repository (by attaching a GitHub repo on the web or desktop, or launching it from a project directory in the terminal), and it can read your code, edit files, run shell commands, execute tests, and iterate on its own output — all within the scope of that project.</p>
<p>Claude Code is available across multiple surfaces — a <a href="https://code.claude.com/docs/en/desktop">desktop app</a>, a <a href="https://code.claude.com/docs/en/claude-code-on-the-web">web IDE</a>, a <a href="https://code.claude.com/docs/en/getting-started">terminal CLI</a>, and IDE extensions for <a href="https://code.claude.com/docs/en/vs-code">VS Code</a> and <a href="https://code.claude.com/docs/en/jetbrains">JetBrains</a> — but the core agentic engine is the same everywhere. You describe what you want, it reads your code, makes changes, runs tests, and iterates until the task is done. The differences between surfaces are mostly about <em>how you interact</em>, <em>where the work runs</em>, and <em>how much the agent can do autonomously</em>.</p>
<p>Here’s what a typical Claude Code session looks like. You type a request like:</p>
<blockquote class="blockquote">
<p><em>Look at src/utils/ and add a slugify function that matches the style of existing utilities. Write tests too. Create a branch, commit, and open a PR when you’re done.</em></p>
</blockquote>
<p>Claude Code will:</p>
<ol type="1">
<li>Read your existing utils to understand the style</li>
<li>Write the function and tests</li>
<li>Run <code>pytest</code> (or whatever your test runner is), see results</li>
<li>If tests fail, iterate — fix the code, re-run</li>
<li>Create a branch, commit, push, and open a PR</li>
</ol>
<p><strong>How much you’re involved depends on the surface.</strong> On the <strong><a href="https://claude.ai/code">web version</a></strong>, Claude runs in an isolated cloud VM and auto-accepts edits — you review the results (a PR, a diff, test output) rather than approving each individual action. The <strong>desktop app</strong> and <strong>CLI</strong> both default to “ask permissions” mode, where Claude proposes changes and waits for your approval before applying them. The desktop app shows visual diffs with accept/reject buttons; the CLI prompts in the terminal. You can reduce this friction on either surface by switching to “auto accept edits” mode, configuring <a href="https://code.claude.com/docs/en/permissions">allow rules</a>, or enabling <a href="https://code.claude.com/docs/en/sandboxing">sandboxing</a> to auto-approve actions that stay within your project directory. Many experienced users auto-approve most actions and invest their review time at the PR stage instead. If you’re just getting started, a good sweet spot is: auto-approve reads and test execution, manually approve writes and git operations.</p>
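<p>As a concrete starting point, the “auto-approve reads and test execution, manually approve writes and git operations” split above can be sketched as allow/deny rules in a project-level <code>.claude/settings.json</code>. The rule strings and globs below are illustrative, not a recommended policy; check the <a href="https://code.claude.com/docs/en/permissions">permissions docs</a> for the exact rule syntax your version supports.</p>

```shell
# Hypothetical starter permission rules (illustrative rule strings and globs;
# verify against the Claude Code permissions docs before relying on them).
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "allow": [
      "Read(**)",
      "Bash(pytest:*)"
    ],
    "deny": [
      "Read(./.env)",
      "Read(~/.ssh/**)"
    ]
  }
}
EOF
```

<p>With a file like this in place, reads and test runs proceed without prompts, secrets stay hard-blocked, and everything else (writes, git operations) still asks for approval.</p>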
<section id="claude-code-desktop-web" class="level5">
<h5 class="anchored" data-anchor-id="claude-code-desktop-web">Desktop &amp; Web</h5>
<p>The easiest way to get started is through the <a href="https://code.claude.com/docs/en/desktop">Claude Desktop app</a> (Code tab) or <a href="https://code.claude.com/docs/en/claude-code-on-the-web">Claude Code on the web</a> at <a href="https://claude.ai/code">claude.ai/code</a>. Both provide the same GUI experience. The main difference is where it runs: the desktop app works with local git repositories on your machine, with each session getting its own isolated <a href="https://git-scm.com/docs/git-worktree">git worktree</a> so parallel tasks don’t collide. The web version clones your GitHub repo into an isolated cloud VM — no local setup needed. The web version is also available on mobile (<a href="https://apps.apple.com/us/app/claude-by-anthropic/id6473753684">iOS</a> / <a href="https://play.google.com/store/apps/details?id=com.anthropic.claude">Android</a>) for kicking off and monitoring tasks on the go. <strong>Note:</strong> the desktop app requires Git — your project must be a git repo with at least one commit.</p>
<p>Key capabilities:</p>
<ul>
<li><strong>Visual diff review</strong> — see exactly what Claude changed, leave inline comments on specific lines, and ask Claude to revise</li>
<li><strong>Live app preview</strong> — Claude can start a dev server and verify its own changes in an embedded browser, taking screenshots and fixing issues it finds</li>
<li><strong>Parallel sessions</strong> — run multiple tasks simultaneously in separate tabs, each on its own isolated branch</li>
<li><strong>GitHub PR monitoring</strong> — watch CI status, auto-fix failing checks, and auto-merge when everything passes</li>
<li><strong><a href="https://code.claude.com/docs/en/scheduled-tasks">Scheduled tasks</a></strong> — set up recurring tasks using cron expressions (e.g., daily dependency checks, periodic code reviews, deployment monitoring). On desktop, these persist across sessions; on the web, you can <a href="https://support.claude.com/en/articles/13854387-schedule-recurring-tasks-in-cowork">schedule them in Cowork</a>. In the CLI, use the <code>/loop</code> skill for lightweight in-session polling</li>
<li><strong>Connectors</strong> — one-click integrations for GitHub, Slack, Linear, Notion, and more</li>
<li><strong>Async handoff</strong> — start a task on the web and close your laptop; it runs in the cloud and notifies you when done. You can also start a task from the terminal with <code>claude --remote</code>, or pull a web session into your terminal with <code>claude --teleport</code></li>
</ul>
<p><strong>Best for:</strong> Users who prefer a GUI, want visual diff review and parallel task management, or want to get started without installing anything. The web version is the fastest way to try Claude Code — just open <a href="https://claude.ai/code">claude.ai/code</a> and point it at a repo.</p>
</section>
<section id="terminal-cli" class="level5">
<h5 class="anchored" data-anchor-id="terminal-cli">Terminal (CLI)</h5>
<p>Claude Code is also available as a CLI, installed via npm (<code>npm install -g @anthropic-ai/claude-code</code>). It’s the same agentic engine, but the terminal interface offers some distinct advantages:</p>
<ul>
<li><strong>IDE extensions</strong> — Claude Code integrates directly into <a href="https://code.claude.com/docs/en/vs-code">VS Code</a> and <a href="https://code.claude.com/docs/en/jetbrains">JetBrains</a>, so you can use it without leaving your editor</li>
<li><strong>Scriptability</strong> — pipe commands, chain with shell tools, and integrate into automated workflows (CI/CD, git hooks)</li>
<li><strong><code>CLAUDE.md</code> authoring</strong> — the terminal is the natural place to set up and iterate on your project’s <code>CLAUDE.md</code> configuration</li>
<li><strong>SSH and remote environments</strong> — works anywhere you have a terminal, including remote servers, containers, and cloud dev environments</li>
<li><strong>Full local control</strong> — no cloud dependency; everything runs on your machine (or wherever your terminal is)</li>
<li><strong>Flexible auth and billing</strong> — the desktop and web apps require an Anthropic login (Max plan or API credits). The CLI also supports routing requests through <a href="https://code.claude.com/docs/en/google-vertex-ai">Google Vertex AI</a> or <a href="https://code.claude.com/docs/en/amazon-bedrock">AWS Bedrock</a>, so organizations that need to keep API traffic within their own cloud environment (for compliance, billing, or data residency reasons) can do so. See our <a href="../../Learn/Guides/claude-code-cloud-setup.html">Claude Code Cloud Setup Guide</a> for a step-by-step walkthrough using UW-Madison GCP or AWS</li>
</ul>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install npm if needed (e.g., on a fresh WSL2 or Ubuntu setup)</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt install npm</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install and start Claude Code (sudo needed on Linux/WSL2)</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> npm install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-g</span> @anthropic-ai/claude-code</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install sandbox dependencies (WSL2/Linux only)</span></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get update <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get install bubblewrap socat</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Navigate to your project and launch Claude Code</span></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Note: in WSL2, your Windows files are at /mnt/c/Users/&lt;username&gt;/...</span></span>
<span id="cb1-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> yourrepo <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
<p><strong>Best for:</strong> Developers comfortable with the terminal, CI/CD integration, scripting and automation, working in remote/SSH environments, and organizations that need to route traffic through their own cloud provider.</p>
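<p>As one example of the scriptability mentioned above, you can drive Claude Code non-interactively. This sketch assumes the CLI’s print mode (<code>claude -p</code>) is available in your version; the script name and prompt are hypothetical.</p>

```shell
# Sketch of a pre-push review helper (assumes `claude -p` print mode exists
# in your CLI version). We only write and syntax-check the script here;
# running it requires Claude Code to be installed and authenticated.
cat > review.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Summarize the diff against main and ask Claude for a quick review.
git diff main...HEAD | claude -p "Review this diff for bugs and style issues"
EOF
chmod +x review.sh
bash -n review.sh   # syntax-check without invoking claude
```

<p>A script like this can be wired into a git pre-push hook or a CI job, which is where the terminal surface pulls ahead of the desktop and web apps.</p>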
</section>
<section id="security-note" class="level5">
<h5 class="anchored" data-anchor-id="security-note">A note on security: Claude Code runs with <em>your</em> permissions</h5>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>Agentic coding tools can access more than you might expect
</div>
</div>
<div class="callout-body-container callout-body">
<p>The level of system access depends on which surface you use:</p>
<ul>
<li><strong>CLI and desktop app</strong> — Claude Code operates with your full user-level filesystem and shell permissions. It can read your SSH keys, modify files outside your project, run arbitrary shell commands, and access anything your user account can reach.</li>
<li><strong>IDE extensions</strong> (VS Code, JetBrains) — same access as the CLI, since the extension runs Claude Code as a local process under your user account.</li>
<li><strong>Web version</strong> (claude.ai/code) — runs in an isolated cloud VM with access only to your cloned GitHub repo. It cannot reach your local filesystem, SSH keys, or other local resources. This is the most restricted surface by default.</li>
</ul>
<p>This isn’t unique to Claude Code — any agentic tool with shell access (Cursor, Windsurf, Copilot coding agent) has similar access on your local machine. The difference is in what mitigations each tool provides.</p>
</div>
</div>
<p>Claude Code mitigates this with several layers of protection:</p>
<ul>
<li><strong>Permission prompts</strong> — Claude asks before every file write, shell command, and git operation. You can configure allow/deny rules to auto-approve trusted actions and hard-block sensitive paths.</li>
<li><strong>Built-in sandboxing</strong> — an <a href="https://code.claude.com/docs/en/sandboxing">OS-level sandbox</a> restricts filesystem access to your project directory and limits outbound network traffic. <strong>This is the single most impactful security measure you can enable.</strong></li>
<li><strong>Desktop app</strong> — adds git worktree isolation on top of the sandbox, so changes in one session don’t affect others until committed.</li>
<li><strong>Web version</strong> (claude.ai/code) — the most restricted surface. Each task runs in a fresh, ephemeral VM with <a href="https://gvisor.dev/">gVisor-based kernel isolation</a>; storage is wiped when the task completes and credentials never exist inside the sandbox.</li>
</ul>
<p>See Security fundamentals below for configuration details, deny-rule examples, and container guidance — or the <a href="../../Learn/Guides/claude-code-cloud-setup.html#understand-risk">Cloud Setup Guide’s security section</a> for a step-by-step walkthrough.</p>
</section>
</section>
<section id="github-copilot" class="level4">
<h4 class="anchored" data-anchor-id="github-copilot">GitHub Copilot</h4>
<p><a href="https://github.com/features/copilot">GitHub Copilot</a> is GitHub’s AI coding assistant. It’s a <strong>multi-model platform</strong> — you can choose from Claude, GPT, Gemini, and others as the underlying model. This is fundamentally different from Claude Code, and the distinction matters.</p>
<section id="claude-in-copilot-vs.-claude-code-whats-actually-different" class="level5">
<h5 class="anchored" data-anchor-id="claude-in-copilot-vs.-claude-code-whats-actually-different">“Claude” in Copilot vs.&nbsp;Claude Code: what’s actually different?</h5>
<p>When you select Claude as the model in Copilot (whether in VS Code agent mode or the async coding agent), you’re using Claude’s <em>language model</em> — but <strong>GitHub’s orchestration layer</strong> is driving it. GitHub controls the system prompts, the tool-calling framework, the context management, and how your instructions are delivered to the model. Think of it as Claude’s brain in GitHub’s body.</p>
<p>Claude Code, by contrast, is Anthropic’s <em>own</em> agentic system built specifically around Claude. Anthropic controls the entire stack: the system prompts are purpose-built for agentic coding, the tool framework is designed for Claude’s strengths, and features like extended thinking, <code>CLAUDE.md</code> project configuration, and the permission model are all tightly integrated.</p>
<p><strong>Why this matters in practice:</strong></p>
<ul>
<li><strong>Context handling</strong> — Copilot primarily derives context from open tabs and (when indexing is enabled) broader repo structure, with a <a href="https://docs.github.com/en/copilot/reference/ai-models/supported-models">platform-level cap of ~128k tokens</a>. Claude Code uses Claude’s full 200k-token context window and maps your entire repository, accumulating context through conversation threading. For multi-file tasks, Claude Code generally <a href="https://www.sitepoint.com/github-copilot-vs-claude-code-accuracy-speed-2026/">understands project architecture more holistically</a>.</li>
<li><strong>Instruction following</strong> — Claude Code reads your <code>CLAUDE.md</code> files natively. Copilot has its own instruction mechanism (<code>copilot-instructions.md</code>), but users have <a href="https://github.com/orgs/community/discussions/176156">reported</a> that Claude models don’t always follow Copilot’s instruction files as reliably — because the model is being orchestrated by a system designed for multiple models, not optimized for any one.</li>
<li><strong>Extended thinking</strong> — Claude Code uses extended thinking by default with adjustable token budgets. Copilot support for thinking tokens has been <a href="https://github.com/orgs/community/discussions/176156">inconsistent</a>, with some configurations producing errors when extended thinking parameters are passed.</li>
<li><strong>Tools and sub-agents</strong> — Claude Code ships with 18+ built-in tools (file editing, bash, search, git, sub-agents), plus full <a href="https://modelcontextprotocol.io/">MCP</a> support and hooks. Copilot agent mode uses its own curated tool set, which is capable but less extensive.</li>
<li><strong>Quality on complex tasks</strong> — In a <a href="https://www.sitepoint.com/github-copilot-vs-claude-code-accuracy-speed-2026/">50-session benchmark study</a>, Claude Code produced a higher accept rate (44% vs 38%) and scored significantly better on bug-fixing context fidelity (8.5/10 vs 5.9/10). Copilot was ~15 seconds faster per task on average and excels at inline completions.</li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-6-contents" aria-controls="callout-6" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Claude as a standalone agent on GitHub (Feb 2026)
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-6" class="callout-6-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>As of <a href="https://github.blog/changelog/2026-02-04-claude-and-codex-are-now-available-in-public-preview-on-github/">February 2026</a>, Claude is also available as a <strong>standalone agent</strong> on GitHub — not just a model choice within Copilot. You can assign issues directly to <code>@claude</code> (or <code>@copilot</code>, or <code>@codex</code>) on GitHub.com, and in <a href="https://code.visualstudio.com/blogs/2026/02/05/multi-agent-development">VS Code 1.109+</a> you can start Claude agent sessions that use Anthropic’s own agent harness rather than Copilot’s orchestration. In these modes, you get the same prompts, tools, and architecture as Claude Code — which should close the quality gap vs.&nbsp;using Claude as a model within Copilot. Initially available for Pro+ and Enterprise plans; <a href="https://github.blog/changelog/2026-02-26-claude-and-codex-now-available-for-copilot-business-pro-users/">expanded to Copilot Business and Pro</a> on Feb 26 at no additional cost.</p>
</div>
</div>
</div>
</section>
<section id="agent-mode-in-vs-code" class="level5">
<h5 class="anchored" data-anchor-id="agent-mode-in-vs-code">Agent mode (in VS Code)</h5>
<ol type="1">
<li>Open your repo in VS Code with the Copilot extension installed</li>
<li>Open the Copilot chat panel (Ctrl/Cmd+Shift+I)</li>
<li>Select agent mode, choose Claude as the model</li>
<li>Type: <em>“Add a slugify function to src/utils/ matching the existing style. Write tests.”</em></li>
</ol>
<p>Copilot will:</p>
<ol type="1">
<li>Read relevant files</li>
<li>Create/edit files directly — <strong>no permission prompt by default</strong> in many configurations</li>
<li>Run tests if it decides to (or if you ask it to)</li>
</ol>
<p>From there, <strong>you</strong> review the changes in VS Code’s diff view and handle the git workflow (branch, commit, push, PR) — or use the async coding agent for that.</p>
<p><strong>Friction:</strong> The IDE experience is smooth, but you have less visibility into <em>why</em> the agent made certain choices. Agent mode is still evolving — for complex multi-step tasks it may not iterate as effectively as Claude Code’s agentic loop. The upside is zero context-switching: you’re already in your editor.</p>
</section>
<section id="coding-agent-async" class="level5">
<h5 class="anchored" data-anchor-id="coding-agent-async">Coding agent (async)</h5>
<p>GitHub’s async coding agents let you delegate work directly from issues and PRs — no IDE or terminal needed:</p>
<ol type="1">
<li>Go to your repo on GitHub.com</li>
<li>Create an issue: <em>“Add a slugify utility function to src/utils/ with tests”</em></li>
<li>Assign the issue to <code>@copilot</code>, <code>@claude</code>, or <code>@codex</code> via the Assignees dropdown</li>
<li>Walk away — the agent works in a secure cloud environment</li>
</ol>
<p>The agent will:</p>
<ol type="1">
<li>Create a branch</li>
<li>Implement the function and tests in an ephemeral environment</li>
<li>Open a draft PR referencing the issue</li>
</ol>
<p>You get a notification when the PR is ready to review, and you can leave review comments mentioning <code>@claude</code> to request changes — the agent iterates like a human collaborator.</p>
<p><strong>What’s running under the hood?</strong> When you assign to <code>@claude</code>, GitHub runs Anthropic’s <a href="https://github.com/anthropics/claude-code-action">Claude Code Action</a> — which uses the same Claude Code engine (agentic loop, tools, extended thinking) that powers the CLI and desktop app. The key difference is that it runs in GitHub’s managed environment rather than your local machine, and its scope is limited to the repo and issue context. Assigning to <code>@copilot</code> uses GitHub’s own orchestration with your selected model, and <code>@codex</code> uses OpenAI’s agent.</p>
<p>By default, the async coding agent uses Claude Sonnet 4.6 when no model is explicitly selected. You can choose from Claude Opus 4.6, Claude Sonnet 4.5, GPT-5.1-Codex-Max, GPT-5.2-Codex, and others via the model picker.</p>
<p><strong>Friction:</strong> This is the most hands-off option, but you have the least control during execution. Works best for well-scoped, clearly described issues. If the task is ambiguous or requires judgment calls, you may end up doing multiple rounds of PR review and comments to guide it.</p>
<p><strong>Best for:</strong> Inline autocomplete, single-file edits, and quick agent tasks within the IDE. Also excellent for async PR generation on well-defined issues. Many developers use Copilot <em>alongside</em> Claude Code — Copilot for inline completions in the editor, Claude Code in the terminal for deep multi-file work.</p>
</section>
</section>
<section id="key-takeaway" class="level4">
<h4 class="anchored" data-anchor-id="key-takeaway">Key takeaway</h4>
<p>The same task ranges from fully manual (Claude.ai — you apply every change) to fully hands-off (Copilot coding agent — you just review the PR). But “more autonomous” doesn’t always mean “better results.”</p>
<p>Counterintuitively, <strong>Claude.ai can feel lower-friction than Claude Code</strong> for many tasks — the chat interface just <em>answers</em>, with no permission prompts or approve/deny flow. You lose the ability to have Claude execute things directly, but you gain a frictionless conversation. Claude Code (in any form) is far more capable — it can run tests, iterate on failures, and push code — but its default guardrails (which exist for good reason) mean more interruptions until you tune them.</p>
<p>The trade-off is between <strong>autonomy</strong>, <strong>control</strong>, and <strong>optimization</strong>:</p>
<ul>
<li><strong>Claude.ai (chat)</strong> — not agentic, but fluid and zero-risk. You do the manual work.</li>
<li><strong>Claude Code (desktop, web, CLI, or IDE extension)</strong> — fully agentic, with Anthropic’s purpose-built orchestration optimized for Claude. The deepest integration between model and tooling.</li>
<li><strong>Copilot with Claude model (IDE)</strong> — agentic within the IDE, fewer interruptions, but Claude is running through GitHub’s orchestration layer rather than Anthropic’s. Good for inline work; less optimized for complex multi-step reasoning.</li>
<li><strong>Claude agent on GitHub (async)</strong> — Anthropic’s own agent harness running on GitHub’s infrastructure. Assign issues to <code>@claude</code> for async PR generation.</li>
</ul>
<p>Pick based on the task. Sensitive work or unfamiliar codebase? Claude Code’s guardrails are a feature. Quick question or brainstorming? Claude.ai chat is hard to beat. Already in VS Code and want inline help? Copilot is the natural fit. Need Claude’s full reasoning depth on a complex refactor? Claude Code is the most direct path to the model’s capabilities.</p>
</section>
</section>
</section>
<section id="working-effectively-with-claude-code" class="level2">
<h2 class="anchored" data-anchor-id="working-effectively-with-claude-code">Working effectively with Claude Code</h2>
<p>This is the core of the guide. Whether you’re using the CLI, desktop app, an IDE extension, or the web IDE, these practices apply across all Claude Code surfaces. Anthropic’s own <a href="https://code.claude.com/docs/en/best-practices">best practices guide</a> goes deeper on context management, prompt patterns, and scaling across parallel sessions — we’ll highlight the essentials here and add our own perspective.</p>
<section id="think-in-features-not-projects" class="level3">
<h3 class="anchored" data-anchor-id="think-in-features-not-projects">Think in features, not projects</h3>
<p>One of the biggest lessons from working with agentic coding tools: <strong>use them for feature-level development, not for building entire projects in one shot.</strong></p>
<p>Why? Because agents work best with clear, well-scoped requests. The less clarity you provide, the more the agent has to guess — and guessing leads to:</p>
<ul>
<li>Agentic loops (trying approaches, failing, trying again)</li>
<li>Drift from your intended architecture</li>
<li>Wasted tokens and time</li>
<li>Code that technically works but doesn’t match your vision</li>
</ul>
<p><strong>Precise requests get precise results.</strong> Instead of “build me a web app with auth,” try:</p>
<ul>
<li>“Add a login form component that submits to <code>/api/auth/login</code> and stores the JWT in a httpOnly cookie”</li>
<li>“Write a pytest fixture that creates a test database with the schema from <code>models.py</code>”</li>
<li>“Refactor the <code>process_data</code> function in <code>pipeline.py</code> to handle the case where <code>input_df</code> has missing columns”</li>
</ul>
<p>Each of these is a single, well-defined task that an agent can execute without ambiguity.</p>
</section>
<section id="use-claude.md-as-your-control-surface" class="level3">
<h3 class="anchored" data-anchor-id="use-claude.md-as-your-control-surface">Use <code>CLAUDE.md</code> as your control surface</h3>
<p><a href="https://code.claude.com/docs/en/memory"><code>CLAUDE.md</code></a> is a markdown file you place in your project root that gives Claude Code persistent context about your project. Think of it as a README for the agent — it’s loaded automatically at the start of every session and shapes how Claude behaves. You can include things like:</p>
<ul>
<li>How your project is structured (key directories, entry points)</li>
<li>Coding conventions (naming, formatting, patterns to follow or avoid)</li>
<li>Testing and build commands</li>
<li>Safety rules (“never force-push,” “don’t modify migrations/”)</li>
<li>Links to docs or specs the agent should reference</li>
</ul>
<p>Claude Code also supports <code>CLAUDE.md</code> files in subdirectories (loaded when Claude works in that directory) and a global <code>~/.claude/CLAUDE.md</code> for preferences that apply across all projects. The file is advisory — Claude will follow these instructions in good faith, but they’re not enforced at the system level the way <a href="https://code.claude.com/docs/en/hooks">hooks</a> or <a href="https://code.claude.com/docs/en/permissions">deny rules</a> are. For anything safety-critical, back it up with a hook or deny rule.</p>
<p>This is one of the most underrated features — it’s your main lever for shaping how the agent behaves across sessions.</p>
<p><strong>Prevent runaway loops:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Testing requirements</span></span>
<span id="cb2-2"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Always run the full test suite (<span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`pytest tests/`</span>) after making changes</span>
<span id="cb2-3"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>If tests fail, fix the failing tests before moving on</span>
<span id="cb2-4"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Do not push code with failing tests</span></code></pre></div></div>
<p>These few lines save enormous headaches. Without them, the agent might push broken code, you discover the test failures in CI, and you end up fixing things that should have been caught locally.</p>
<p><strong>Enforce project conventions:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Code style</span></span>
<span id="cb3-2"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Use type hints for all function signatures</span>
<span id="cb3-3"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Follow the existing import ordering convention</span>
<span id="cb3-4"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Do not add new dependencies without asking first</span></code></pre></div></div>
<p><strong>Limit destructive actions:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Safety</span></span>
<span id="cb4-2"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Never run <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`rm -rf`</span> on any directory</span>
<span id="cb4-3"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Never force-push to any branch</span>
<span id="cb4-4"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Never modify files in the <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`config/production/`</span> directory</span>
<span id="cb4-5"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Always create a new branch for changes; never commit directly to main</span></code></pre></div></div>
<p><strong>Provide architectural context:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Project structure</span></span>
<span id="cb5-2"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>API routes go in <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`src/routes/`</span></span>
<span id="cb5-3"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Business logic goes in <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`src/services/`</span></span>
<span id="cb5-4"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Database models are in <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`src/models/`</span></span>
<span id="cb5-5"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Tests mirror the source structure under <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`tests/`</span></span></code></pre></div></div>
<p>Good <code>CLAUDE.md</code> context reduces agentic loops — the agent spends less time exploring your project and more time doing useful work. But there’s a trade-off: <strong><code>CLAUDE.md</code> loads into every session</strong>, and a bloated file can hurt more than it helps.</p>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>Don’t overload <code>CLAUDE.md</code>
</div>
</div>
<div class="callout-body-container callout-body">
<p>LLMs perform best when their context window is full of <em>focused, relevant</em> content. Since <code>CLAUDE.md</code> is injected into every conversation, everything in it competes for attention with the actual task at hand. A few things to keep in mind:</p>
<ul>
<li><strong>Aim for under 300 lines.</strong> Some teams keep theirs <a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md">under 60</a>. There’s no hard limit, but shorter files mean less context pollution.</li>
<li><strong>Frontier models can track ~150–200 instructions consistently.</strong> Beyond that, the model starts selectively ignoring rules it considers irrelevant to the current task — which means your most important instructions may get lost in the noise.</li>
<li><strong>Use linters, not <code>CLAUDE.md</code>, for code style.</strong> If a tool can enforce a rule deterministically, don’t spend context budget asking an LLM to follow it.</li>
<li><strong>Prefer progressive disclosure.</strong> Rather than documenting everything upfront, tell Claude <em>where to find</em> information (e.g., “see <code>docs/api-spec.md</code> for endpoint details”) so it can load context on demand.</li>
<li><strong>Move specialized knowledge into <a href="https://code.claude.com/docs/en/skills">skills</a>.</strong> Skills load on demand — a <code>/deploy</code> skill or a <code>migration-conventions</code> skill only enters context when relevant, keeping your base <code>CLAUDE.md</code> lean.</li>
</ul>
<p>The bottom line: write <code>CLAUDE.md</code> for the <em>model</em>, not for humans. Keep it concise, universally applicable, and structured. As <a href="https://platform.claude.com/docs/en/build-with-claude/context-windows">Anthropic has noted</a>, larger context windows won’t solve this — “context pollution and information relevance concerns” apply at any window size.</p>
</div>
</div>
<p>See the <a href="https://code.claude.com/docs/en/memory">official <code>CLAUDE.md</code> reference</a> for the full spec, including file resolution order and advanced features.</p>
</section>
<section id="tune-the-permission-dial" class="level3">
<h3 class="anchored" data-anchor-id="tune-the-permission-dial">Tune the permission dial</h3>
<p>Claude Code’s permission system is the main thing that distinguishes it from tools that “just go.” By default, it asks before every file write, shell command, and git operation. This is safe but slow.</p>
<p>The key insight: <strong>permissions aren’t all-or-nothing</strong>. You can configure a spectrum:</p>
<ul>
<li><strong>Start conservative</strong> — approve everything manually while you’re learning what the agent does</li>
<li><strong>Auto-approve low-risk actions</strong> — file reads, grep/search, test execution. These rarely cause harm and the prompts add friction without adding safety.</li>
<li><strong>Manually approve writes and git operations</strong> — this is where real damage can happen (overwriting files, force-pushing, committing secrets)</li>
<li><strong>Use <code>CLAUDE.md</code> safety rules</strong> as a second layer — even if you auto-approve shell commands, the agent will respect instructions like “never force-push”</li>
</ul>
<p>The sweet spot for most developers: auto-approve reads and test runs, manually approve everything else. As you build trust with specific workflows, you can loosen further. See Anthropic’s <a href="https://code.claude.com/docs/en/permissions">permissions reference</a> for the full rule syntax and available tool names.</p>
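<p>To make this concrete, here is one way that sweet spot might look in a project-level <code>.claude/settings.json</code>. This is a sketch, not a recommendation: the <code>allow</code>/<code>ask</code>/<code>deny</code> keys and rule syntax come from Anthropic’s permissions reference, but the specific rules (e.g., <code>pytest</code> as your test runner) are assumptions to adapt to your project.</p>

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Grep",
      "Bash(pytest:*)"
    ],
    "ask": [
      "Edit",
      "Write",
      "Bash(git commit:*)"
    ],
    "deny": [
      "Bash(git push --force:*)",
      "Read(./.env)"
    ]
  }
}
```

<p>Deny rules take precedence over allow rules, so secrets files stay off-limits even though reads are otherwise auto-approved.</p>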
</section>
<section id="use-branches-and-commit-frequently" class="level3">
<h3 class="anchored" data-anchor-id="use-branches-and-commit-frequently">Use branches and commit frequently</h3>
<p>The non-negotiable: <strong>always work on a branch</strong>, never let an agent commit directly to <code>main</code>. Beyond that, there are two common workflows:</p>
<ul>
<li><strong>Auto-commit freely, review at the PR stage.</strong> Let the agent commit (and even push) as it works. You review the full diff when you open the PR, just like you would with a human contributor. This keeps momentum high and works well when you have CI checks and a good test suite gating your merges.</li>
<li><strong>Commit manually after reviewing each change.</strong> Approve each commit yourself so you stay close to every change as it happens. This is safer when you’re learning the tool, working on sensitive code, or don’t yet have strong CI guardrails.</li>
</ul>
<p>Either way, frequent commits help — they give you clean revert points if the agent goes off track. A good <code>CLAUDE.md</code> instruction like <em>“commit after each completed task”</em> keeps things granular regardless of which workflow you prefer.</p>
</section>
<section id="review-everything-at-the-right-level" class="level3">
<h3 class="anchored" data-anchor-id="review-everything-at-the-right-level">Review everything (at the right level)</h3>
<p>Agent-generated code isn’t exempt from review — but <em>when</em> you review is a matter of workflow. Some developers review each diff before committing; others let the agent run and review the full PR diff before merging. Both are valid. What matters is that <em>someone</em> (you, a teammate, or CI) checks the code before it lands on <code>main</code>:</p>
<ul>
<li>Read the diffs</li>
<li>Check for security issues (hardcoded secrets, SQL injection, etc.)</li>
<li>Verify it matches your architectural patterns</li>
<li>Make sure it doesn’t introduce unnecessary complexity</li>
</ul>
</section>
<section id="give-the-agent-a-way-to-verify-its-own-work" class="level3">
<h3 class="anchored" data-anchor-id="give-the-agent-a-way-to-verify-its-own-work">Give the agent a way to verify its own work</h3>
<p>This is the single highest-leverage thing you can do. Claude performs dramatically better when it can check its own output — running tests, comparing screenshots, validating behavior — rather than relying on you as the only feedback loop.</p>
<ul>
<li><strong>Include test cases in your prompt</strong>: <em>“Write a <code>validateEmail</code> function. Test cases: <code>user@example.com</code> → true, <code>invalid</code> → false, <code>user@.com</code> → false. Run the tests after implementing.”</em></li>
<li><strong>Ask it to verify UI changes visually</strong>: <em>“[paste screenshot] Implement this design. Take a screenshot of the result and compare it to the original.”</em></li>
<li><strong>Point to the symptom, not just the fix</strong>: <em>“The build fails with this error: [paste error]. Fix it and verify the build succeeds. Address the root cause, don’t suppress the error.”</em></li>
</ul>
<p>The more you invest in making your verification rock-solid (a good test suite, a linter, a build check), the more autonomously the agent can work.</p>
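<p>As a sketch of the first pattern: handing the agent a small, runnable test file (or asking it to write one first) turns your prompt’s examples into a feedback loop it can run itself. The <code>validate_email</code> implementation below is purely illustrative, not what Claude would necessarily produce.</p>

```python
import re

def validate_email(address: str) -> bool:
    """Illustrative implementation; the agent would write its own."""
    # One '@', non-empty local part, dotted domain with a 2+ letter TLD
    pattern = r"^[^@\s]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$"
    return re.fullmatch(pattern, address) is not None

def test_validate_email():
    # The exact cases from the prompt above
    assert validate_email("user@example.com") is True
    assert validate_email("invalid") is False
    assert validate_email("user@.com") is False
```

<p>With this file in place, “run the tests after implementing” gives the agent a pass/fail signal it can iterate against without you in the loop.</p>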
</section>
<section id="explore-first-then-plan-then-code" class="level3">
<h3 class="anchored" data-anchor-id="explore-first-then-plan-then-code">Explore first, then plan, then code</h3>
<p>For complex tasks, resist the urge to let Claude jump straight to implementation. Use <strong><a href="https://code.claude.com/docs/en/common-workflows#use-plan-mode-for-safe-code-analysis">Plan Mode</a></strong> (toggle with <code>Shift+Tab</code>) to separate exploration from execution:</p>
<ol type="1">
<li><strong>Explore</strong>: In Plan Mode, Claude reads files and answers questions without making changes. <em>“Read <code>src/auth/</code> and understand how we handle sessions and login.”</em></li>
<li><strong>Plan</strong>: Ask Claude to create an implementation plan. <em>“I want to add Google OAuth. What files need to change? Create a plan.”</em></li>
<li><strong>Implement</strong>: Switch back to Normal Mode and let Claude execute the plan, verifying against tests.</li>
<li><strong>Commit</strong>: Ask Claude to commit with a descriptive message.</li>
</ol>
<p>Skip this for small, clear tasks — if you could describe the diff in one sentence, just ask Claude to do it directly. Planning is most useful when you’re uncertain about the approach or the change touches multiple files.</p>
</section>
<section id="manage-context-aggressively" class="level3">
<h3 class="anchored" data-anchor-id="manage-context-aggressively">Manage context aggressively</h3>
<p>Claude’s context window is your most important resource. As it fills up with conversation history, file contents, and command outputs, performance degrades — Claude may “forget” earlier instructions or make more mistakes. (This section is adapted from Anthropic’s <a href="https://code.claude.com/docs/en/best-practices">official best practices</a>.)</p>
<ul>
<li><strong>Use <code>/clear</code> between unrelated tasks</strong> — a clean context dramatically improves quality</li>
<li><strong>Use <code>/compact</code> to summarize long conversations</strong> — run <code>/compact focus on the API changes</code> to keep what matters and discard the rest</li>
<li><strong>Delegate exploration to subagents</strong> — when Claude needs to read dozens of files to investigate something, have it use a <a href="https://code.claude.com/docs/en/sub-agents">subagent</a>. The subagent works in its own context and returns a summary, keeping your main conversation lean.</li>
<li><strong>Run <code>/context</code></strong> to see what’s consuming your context window (MCP servers can be surprisingly expensive)</li>
<li><strong>Course-correct early</strong> — if Claude is going in the wrong direction, interrupt with <code>Esc</code> rather than letting it generate more output that clutters context. After two failed corrections, <code>/clear</code> and start fresh with a better prompt.</li>
</ul>
</section>
<section id="extend-claude-code-with-skills-hooks-and-mcp" class="level3">
<h3 class="anchored" data-anchor-id="extend-claude-code-with-skills-hooks-and-mcp">Extend Claude Code with skills, hooks, and MCP</h3>
<p>Beyond <code>CLAUDE.md</code>, Claude Code has a rich <a href="https://code.claude.com/docs/en/features-overview">extension system</a> for customizing behavior:</p>
<ul>
<li><strong><a href="https://code.claude.com/docs/en/skills">Skills</a></strong> — reusable knowledge and workflows. Create a <code>/deploy</code> skill that runs your deployment checklist, or an API conventions skill that Claude loads when working on your endpoints. Skills load on demand, so they don’t bloat every session like <code>CLAUDE.md</code> does.</li>
<li><strong><a href="https://code.claude.com/docs/en/hooks">Hooks</a></strong> — deterministic scripts that run at specific points in Claude’s workflow. Unlike <code>CLAUDE.md</code> instructions (which are advisory), hooks are guaranteed to fire. Use them for things like running ESLint after every file edit or blocking writes to a <code>migrations/</code> directory.</li>
<li><strong><a href="https://code.claude.com/docs/en/mcp">MCP</a></strong> — connect Claude to external services. Query your database, post to Slack, control a browser, or pull issues from your project tracker — all from within a Claude Code session.</li>
<li><strong><a href="https://code.claude.com/docs/en/sub-agents">Subagents</a></strong> — isolated workers with their own context. Useful for research tasks, code review, or any work where you don’t want the intermediate steps cluttering your main conversation.</li>
</ul>
<p>Start with <code>CLAUDE.md</code> for your core conventions. Add skills when you find yourself repeating the same workflows. Add hooks when you need guaranteed automation. Add MCP when you need external integrations. For a deeper dive, see Anthropic’s <a href="https://code.claude.com/docs/en/features-overview">extension system overview</a>, which covers when to use each mechanism.</p>
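<p>For a flavor of what a hook looks like: the ESLint example above might be wired up in <code>.claude/settings.json</code> roughly like this. The event name and matcher follow the hooks reference, and hooks receive a JSON payload on stdin; the <code>jq</code>/<code>eslint</code> pipeline is an assumption about your toolchain.</p>

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs -r npx eslint --fix"
          }
        ]
      }
    ]
  }
}
```

<p>Because this runs after every matching edit, it fires deterministically, with no reliance on the model remembering a <code>CLAUDE.md</code> instruction.</p>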
</section>
</section>
<section id="managing-costs" class="level2">
<h2 class="anchored" data-anchor-id="managing-costs">Managing costs</h2>
<p>Agentic coding tools billed through an API (as Claude Code can be) charge per token, and agentic workflows are token-hungry — the agent reads files, reasons through problems, writes code, runs commands, reads output, and iterates. A single focused task might use 50K–200K tokens; a sprawling, underspecified session can easily burn through 1M+ tokens.</p>
<section id="what-does-this-actually-cost" class="level3">
<h3 class="anchored" data-anchor-id="what-does-this-actually-cost">What does this actually cost?</h3>
<p>There are two ways to pay for Claude Code: <strong>subscription plans</strong> (fixed monthly cost) or <strong>API tokens</strong> (pay-per-use). Most individuals should start with a subscription; API pricing is better for automation and CI/CD pipelines. (Pricing details adapted from <a href="https://www.anthropic.com/pricing">Anthropic’s pricing page</a> and <a href="https://code.claude.com/docs/en/costs">Claude Code cost management docs</a> — verify current prices, as they change frequently.)</p>
<p><strong>Subscription plans</strong> (as of early 2026):</p>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Plan</th>
<th>Price</th>
<th>What you get</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Pro</strong></td>
<td>$20/month</td>
<td>Claude Code access with moderate usage limits</td>
</tr>
<tr class="even">
<td><strong>Max 5x</strong></td>
<td>$100/month</td>
<td>5× the Pro usage limit — sweet spot for most active developers</td>
</tr>
<tr class="odd">
<td><strong>Max 20x</strong></td>
<td>$200/month</td>
<td>20× the Pro usage limit — for heavy agentic work or parallel sessions</td>
</tr>
<tr class="even">
<td><strong>Team (premium seats)</strong></td>
<td>$150/user/month (min 5 seats)</td>
<td>Team management, shared billing, org-level policies</td>
</tr>
</tbody>
</table>
<p>With subscription plans, you never get a surprise bill — you hit rate limits instead. The <code>/cost</code> command shows your token usage in a session, but on a subscription plan this is informational only; it doesn’t affect your bill.</p>
<p><strong>API token pricing</strong> (pay-per-use):</p>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Input tokens</th>
<th>Output tokens</th>
<th>Best for</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Haiku 4.5</strong></td>
<td>$1/MTok</td>
<td>$5/MTok</td>
<td>Fast, cheap tasks (linting, simple edits)</td>
</tr>
<tr class="even">
<td><strong>Sonnet 4.6</strong></td>
<td>$3/MTok</td>
<td>$15/MTok</td>
<td>Default for most coding work</td>
</tr>
<tr class="odd">
<td><strong>Opus 4.6</strong></td>
<td>$5/MTok</td>
<td>$25/MTok</td>
<td>Complex reasoning, architecture decisions</td>
</tr>
</tbody>
</table>
<p>Note that output tokens cost <strong>5× as much as</strong> input tokens across all models — and code generation is output-heavy. Also, requests exceeding 200K input tokens are charged at 2× input / 1.5× output rates, which matters for large codebases. <a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching">Prompt caching</a> can reduce input costs by up to 90% on repeated system prompts, and the <a href="https://platform.claude.com/docs/en/build-with-claude/batch-processing">Batch API</a> offers a 50% discount for async processing.</p>
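<p>A quick back-of-envelope helper makes that output-heaviness concrete. It uses the Sonnet 4.6 rates from the table above; verify current prices before relying on it.</p>

```python
# Rough session-cost estimate at the Sonnet 4.6 rates listed above
# ($3/MTok input, $15/MTok output); prices change, so treat as a sketch.
INPUT_PER_MTOK = 3.00
OUTPUT_PER_MTOK = 15.00

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session, ignoring caching and long-context surcharges."""
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

# A focused task: ~150K tokens read, ~40K tokens generated
print(round(session_cost(150_000, 40_000), 2))  # prints 1.05
```

<p>Even though this hypothetical session read nearly four times as many tokens as it wrote, the output side accounts for the larger share of the bill.</p>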
<p>Anthropic reports that the average Claude Code user on API pricing spends <a href="https://code.claude.com/docs/en/costs">roughly $6/day</a>, with 90% of users under $12/day. That translates to <strong>$100–200/month</strong> for active development with Sonnet. But averages hide a lot of variance — one developer documented a single session that <a href="https://jeffry.in/expensive-ai/">hit ~$150/hour</a> running multiple parallel agents. One detailed <a href="https://ccusage.com/guide/monthly-reports">usage report</a> showed ~892K output tokens vs ~45K input tokens in a single month on a mix of Opus and Sonnet, costing ~$1,248.</p>
<p><strong>Rule of thumb:</strong> If your monthly API costs would exceed $60–80, Max 5x is cheaper. If they’d exceed $150, Max 20x is the clear winner.</p>
<p>For comparison, GitHub Copilot runs $10–$39/month depending on tier, with usage-based pricing for premium models beyond included allowances.</p>
</section>
<section id="watch-out-for-runaway-costs" class="level3">
<h3 class="anchored" data-anchor-id="watch-out-for-runaway-costs">Watch out for runaway costs</h3>
<p>Agentic workflows can burn through tokens fast, especially when things go wrong:</p>
<ul>
<li><strong>Agentic loops</strong>: A vague prompt can send the agent into cycles of trying approaches, failing, reading more files, and trying again — each loop consuming thousands of tokens.</li>
<li><strong>Context accumulation</strong>: As your conversation grows, every new message includes the full context window — so the 50th message in a session costs far more than the 1st. Use <code>/clear</code> between unrelated tasks and <code>/compact</code> to summarize long conversations.</li>
<li><strong>Parallel sessions</strong>: Running multiple Claude Code sessions simultaneously (especially on the web or with agent teams) multiplies your token consumption proportionally. Five parallel sessions = 5× the cost.</li>
<li><strong>Extended thinking</strong>: Thinking tokens are billed as output tokens. A complex Opus session with deep reasoning can generate thousands of thinking tokens per turn.</li>
</ul>
<p><strong>How to protect yourself:</strong></p>
<ul>
<li><strong>On a subscription</strong>: You can’t overspend, but you can hit rate limits mid-task. Monitor with <code>/cost</code> and plan your usage around your limit.</li>
<li><strong>On API pricing</strong>: Set <a href="https://console.anthropic.com/">spending alerts and hard limits</a> on your Anthropic account. Use separate API keys for different projects so you can track spending.</li>
<li><strong>In both cases</strong>: Use <code>/cost</code> to monitor token usage mid-session. If a session is getting expensive, <code>/clear</code> and start fresh with a more specific prompt. Break large tasks into focused sessions.</li>
</ul>
</section>
<section id="for-uw-madison-researchers-institutional-cloud-benefits" class="level3">
<h3 class="anchored" data-anchor-id="for-uw-madison-researchers-institutional-cloud-benefits">For UW-Madison researchers: institutional cloud benefits</h3>
<p>If you’re at UW-Madison (or a similar research institution), routing AI API costs through a UW-provisioned cloud account offers two main benefits: <strong>institutional billing</strong> (charges go to your cloud project, not your personal card — important for grants and shared budgets) and <strong>lower overhead on grants</strong> (UW’s <a href="https://rsp.wisc.edu/proposalprep/cloudComputeInfo.cfm">Cloud Computing Pilot</a> cuts F&amp;A from 55.5% to 26%, saving ~$2,950 per $10,000 in cloud spending). NIH-funded researchers may get additional discounts through <a href="https://kb.wisc.edu/109813">STRIDES</a>. Note that these savings are on the <em>overhead and billing side</em> — Anthropic’s per-token pricing is the same whether you route through Vertex AI, Bedrock, or the direct API. Also note that institutional cloud agreements cover the cloud provider’s services — they do <strong>not</strong> extend to Anthropic’s data handling (see Data Privacy below).</p>
<p>Contact your department’s IT staff or <a href="https://it.wisc.edu/about/division-of-information-technology/research-cyberinfrastructure/">Research Computing</a> to ask about available cloud credits and whether AI API costs are eligible.</p>
</section>
<section id="strategies-to-keep-costs-down" class="level3">
<h3 class="anchored" data-anchor-id="strategies-to-keep-costs-down">Strategies to keep costs down</h3>
<ul>
<li><strong>Be specific in your prompts</strong> — vague requests lead to more agentic loops, which means more tokens. “Add a login form” costs more than “Add a React component at <code>src/components/LoginForm.tsx</code> that posts email/password to <code>/api/auth/login</code>”.</li>
<li><strong>Use <code>/clear</code> aggressively</strong> — reset context between unrelated tasks. A clean context means fewer input tokens per message.</li>
<li><strong>Use <code>/compact</code></strong> — summarize long conversations to free up context space without losing key information.</li>
<li><strong>Use the right model for the task</strong> — Haiku or Sonnet for straightforward tasks, reserve Opus for complex reasoning.</li>
<li><strong>Break large tasks into smaller sessions</strong> — each focused session is cheaper than one sprawling conversation that loses context and re-reads files.</li>
<li><strong>Use <code>CLAUDE.md</code></strong> to provide project context upfront — this reduces the amount of exploration the agent needs to do.</li>
<li><strong>Delegate exploration to subagents</strong> — they run in isolated context and return summaries, keeping your main session lean.</li>
<li><strong>Monitor session costs</strong> — run <code>/cost</code> periodically to see where you stand.</li>
</ul>
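<p>To make the <code>CLAUDE.md</code> point concrete, here is a minimal sketch; the stack, commands, and directory names are invented for illustration:</p>

```markdown
# Project context for Claude Code (illustrative example)

## Stack
- Python 3.11, FastAPI, pytest

## Conventions
- Run `pytest -q` before declaring a task done.
- Never edit files under `migrations/`; generate new ones instead.

## Layout
- `src/api/` — route handlers
- `src/models/` — database models
```

<p>A few lines like these spare the agent from re-discovering your project structure every session, which saves input tokens on every request.</p>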
<p>For more detail, see Anthropic’s <a href="https://code.claude.com/docs/en/costs">cost management guide</a>.</p>
</section>
<section id="energy-and-environmental-considerations" class="level3">
<h3 class="anchored" data-anchor-id="energy-and-environmental-considerations">Energy and environmental considerations</h3>
<p>Agentic coding is more compute-intensive than a simple chat query or web search. A single LLM text query now uses <a href="https://epoch.ai/gradient-updates/how-much-energy-does-chatgpt-use">roughly 0.3 Wh</a> — about the same as a Google search — thanks to hardware improvements and model optimization. But an agentic coding session chains <strong>hundreds or thousands</strong> of such calls together as the agent reads files, reasons, writes code, runs commands, and iterates.</p>
<p><strong>How much energy does agentic coding actually use?</strong></p>
<p>A <a href="https://www.simonpcouch.com/blog/2026-01-20-cc-impact/">detailed analysis by Simon P. Couch</a> estimated Claude Code’s energy footprint at roughly <strong>41 Wh per session</strong> — over 130× a single chat query. A heavy day of usage (multiple sessions, parallel agents) can reach <strong>~1,300 Wh/day</strong>. To put that in perspective:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Activity</th>
<th>Energy</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Google search or single AI chat query</td>
<td>~0.3 Wh</td>
</tr>
<tr class="even">
<td>LED lightbulb (1 hour)</td>
<td>~10 Wh</td>
</tr>
<tr class="odd">
<td>One Claude Code session</td>
<td>~41 Wh</td>
</tr>
<tr class="even">
<td>Streaming 1 hour of video (incl.&nbsp;device)</td>
<td>~36–80 Wh</td>
</tr>
<tr class="odd">
<td>Heavy Claude Code daily use</td>
<td>~1,300 Wh</td>
</tr>
<tr class="even">
<td>Running a dishwasher once</td>
<td>~1,300 Wh</td>
</tr>
<tr class="odd">
<td>Daily refrigerator use</td>
<td>~1,200–1,500 Wh</td>
</tr>
</tbody>
</table>
<p>So a heavy day of agentic coding is roughly equivalent to running your dishwasher — modest at the individual level, but significant in aggregate.</p>
<p><strong>The bigger picture:</strong></p>
<ul>
<li>The <a href="https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai">IEA projects</a> that global data center electricity consumption will roughly double from ~415 TWh in 2024 to over 945 TWh by 2030, driven largely by AI workloads. In the US, data centers are projected to consume <a href="https://www.carbonbrief.org/ai-five-charts-that-put-data-centre-energy-use-and-emissions-into-context/">more electricity than all energy-intensive manufacturing combined</a> (aluminum, steel, cement, chemicals) by 2030.</li>
<li>An estimated <a href="https://aimultiple.com/ai-energy-consumption">60–90% of AI computing energy</a> goes to inference (running models), not training. Training grabs headlines, but inference — every agentic session, every chat query — is where the ongoing energy cost lives.</li>
<li>Cloud providers are investing in renewable energy, but coverage varies. Anthropic has <a href="https://www.anthropic.com/news/investing-in-energy-to-secure-america-s-ai-future">pledged to offset energy costs</a> and invested in grid optimization research, though the company <a href="https://ditchcarbon.com/organizations/anthropic">lacks formal carbon reduction targets</a> and a significant portion of new capacity is <a href="https://www.fastcompany.com/91336991/openai-anthropic-deepseek-ai-models-environmental-impact">natural gas powered</a>.</li>
<li>On the efficiency side, a <a href="https://www.fastcompany.com/91336991/openai-anthropic-deepseek-ai-models-environmental-impact">University of Rhode Island study</a> found Claude Sonnet to be among the most energy-efficient frontier models, and energy per token has <a href="https://muxup.com/2026q1/per-query-energy-consumption-of-llms">improved ~120× from GPT-3 to current models</a> due to hardware and architecture advances.</li>
</ul>
<p><strong>What this means for you:</strong></p>
<p>This doesn’t mean you shouldn’t use agentic tools — the productivity gains can be substantial, and the energy per unit of <em>useful output</em> may be better than the alternative (a human developer running builds, searching docs, and context-switching for hours). But it’s a reason to be intentional: don’t let an agent spin in wasteful loops when a well-scoped prompt would get the job done in one pass. <strong>Efficient prompting is both cheaper and greener.</strong></p>
</section>
</section>
<section id="data-privacy" class="level2">
<h2 class="anchored" data-anchor-id="data-privacy">Data privacy: who sees your code?</h2>
<p>When you use Claude Code, your code and prompts are sent to Anthropic’s servers for inference. That naturally raises privacy questions — here’s what actually happens to that data, depending on how you access Claude.</p>
<section id="is-your-data-used-for-model-training" class="level3">
<h3 class="anchored" data-anchor-id="is-your-data-used-for-model-training">Is your data used for model training?</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Access method</th>
<th>Used for training?</th>
<th>Default retention</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Claude API, Team, Enterprise</strong> (commercial terms)</td>
<td><strong>No</strong> — prohibited unless you explicitly opt in (e.g., <a href="https://support.claude.com/en/articles/11174108-about-the-development-partner-program">Development Partner Program</a>)</td>
<td>30 days</td>
</tr>
<tr class="even">
<td><strong>Free / Pro / Max</strong> (consumer plans)</td>
<td><strong>Your choice</strong> — controlled via <a href="https://claude.ai/settings/data-privacy-controls">Privacy Settings</a></td>
<td>5 years (training on) / 30 days (training off)</td>
</tr>
</tbody>
</table>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>If you’re on a Free, Pro, or Max plan: check your training settings
</div>
</div>
<div class="callout-body-container callout-body">
<p>Anthropic gives you the choice to allow training on your data — check your setting at <a href="https://claude.ai/settings/data-privacy-controls">claude.ai/settings/data-privacy-controls</a>. This applies to Claude Code sessions on consumer plans too.</p>
</div>
</div>
<p><strong>Important nuances for consumer plans:</strong></p>
<ul>
<li><strong>Safety exception:</strong> Even if you disable training, conversations flagged for <a href="https://www.anthropic.com/legal/aup">safety review</a> may still be used to improve Anthropic’s ability to detect and enforce their Usage Policy (e.g., training safeguard models).</li>
<li><strong>What’s included:</strong> When training is enabled, Anthropic may use the entire conversation — prompts, outputs, custom styles, and conversation preferences.</li>
<li><strong>What’s excluded:</strong> Raw content from connectors (Google Drive, MCP servers) is <strong>not</strong> included in training data, unless you directly copy that content into your conversation.</li>
<li><strong>Feedback (thumbs up/down):</strong> Submitting feedback stores the full related conversation for up to 5 years, de-linked from your user ID. This data may be used for training regardless of your training setting.</li>
</ul>
<p><strong>For researchers with sensitive or restricted data:</strong> Routing through a cloud provider (Vertex AI, Bedrock) ensures your data is not used for training and limits retention to 30 days — but <strong>your prompts still reach Anthropic’s infrastructure</strong> for inference. UW-Madison has agreements with Google, AWS, and Microsoft for their cloud services, but does <strong>not</strong> yet have a direct data-use agreement with Anthropic. This means cloud routing alone does not provide UW-sanctioned data protections for restricted data (HIPAA/PHI, FERPA, CUI, export-controlled, or data under a DUA that prohibits third-party processing). <strong>Avoid using Claude Code with restricted data until a formal UW-Anthropic agreement is in place.</strong></p>
<p>For general, non-sensitive research code, cloud-routed Claude Code is fine to use today. UW is actively exploring institutional Anthropic licenses and data agreements. Enterprise customers can negotiate <a href="https://privacy.claude.com/en/articles/8956058-i-have-a-zero-data-retention-agreement-with-anthropic-what-products-does-it-apply-to">zero-data retention (ZDR)</a> agreements where Anthropic stores nothing after the API response. See our <a href="../../Learn/Guides/claude-code-cloud-setup.html">Cloud Setup Guide</a> for how UW-Madison researchers can use institutional cloud accounts (GCP or AWS) and for more details on <a href="../../Learn/Guides/claude-code-cloud-setup.html#use-caution-with-restricted-or-sensitive-data">data sensitivity considerations</a>.</p>
</section>
<section id="can-anthropic-employees-see-your-code" class="level3">
<h3 class="anchored" data-anchor-id="can-anthropic-employees-see-your-code">Can Anthropic employees see your code?</h3>
<p>Not by default. Employee access to conversation data requires one of:</p>
<ul>
<li><strong>You submit feedback</strong> (thumbs up/down, <code>/bug</code> command) — the full related conversation becomes reviewable, stored for up to 5 years (de-linked from your user ID for thumbs up/down)</li>
<li><strong>A trust &amp; safety investigation</strong> — if Anthropic’s automated systems flag a policy violation (this data may also be used for training safeguard models)</li>
<li><strong>Explicit consent</strong> — you voluntarily share data with Anthropic</li>
</ul>
<p>Under commercial terms (API, Vertex AI, Bedrock), access is further restricted by contractual obligations.</p>
</section>
<section id="what-about-the-web-version" class="level3">
<h3 class="anchored" data-anchor-id="what-about-the-web-version">What about the web version?</h3>
<p>When you use Claude Code on the web (claude.ai/code), your GitHub repo is cloned into an ephemeral VM. The VM is destroyed when the task completes — there’s no persistent repo storage between sessions. The same retention policies above apply to any code Claude reads during the session.</p>
</section>
<section id="telemetry-and-error-reporting" class="level3">
<h3 class="anchored" data-anchor-id="telemetry-and-error-reporting">Telemetry and error reporting</h3>
<p>Claude Code sends operational telemetry (latency, reliability metrics — <strong>no code or file paths</strong>) to Statsig, and error reports to Sentry. These are enabled by default on the direct Claude API but <strong>disabled by default on Vertex AI, Bedrock, and Foundry</strong>.</p>
<p>To opt out individually: <code>DISABLE_TELEMETRY=1</code>, <code>DISABLE_ERROR_REPORTING=1</code>, <code>DISABLE_BUG_COMMAND=1</code>, or <code>CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY=1</code>. To disable all non-essential traffic at once: <code>CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1</code>. These can be set in your <a href="https://code.claude.com/docs/en/settings"><code>settings.json</code></a>.</p>
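<p>For example, you could pin the blanket opt-out in a fragment like the following. This is a sketch: the <code>env</code> key is Claude Code’s settings mechanism for applying environment variables to sessions, but verify the exact schema against the current settings docs:</p>

```json
{
  "env": {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```

<p>Setting it in <code>settings.json</code> rather than your shell profile keeps the opt-out in effect regardless of how Claude Code is launched.</p>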
<p>For a detailed breakdown by provider, see the <a href="../../Learn/Guides/claude-code-cloud-setup.html#data-usage">Cloud Setup Guide — Data Usage &amp; Privacy</a>.</p>
</section>
<section id="further-reading-on-data-privacy" class="level3">
<h3 class="anchored" data-anchor-id="further-reading-on-data-privacy">Further reading on data privacy</h3>
<ul>
<li><a href="https://privacy.claude.com/en/articles/10023580-is-my-data-used-for-model-training">Is my data used for model training?</a> — Anthropic Privacy Center</li>
<li><a href="https://privacy.claude.com/en/articles/10023548-how-long-do-you-store-my-data">How long do you store my data?</a> — retention periods by account type</li>
<li><a href="https://code.claude.com/docs/en/data-usage">Data usage — Claude Code docs</a> — what Claude Code specifically transmits and how cloud sessions handle your repo</li>
<li><a href="https://code.claude.com/docs/en/security">Security — Claude Code docs</a> — prompt injection safeguards, data retention, and web session isolation</li>
<li><a href="https://privacy.claude.com/en/articles/12109829-how-do-i-change-my-model-improvement-privacy-settings">How do I change my model improvement privacy settings?</a> — step-by-step opt-out instructions</li>
<li><a href="https://privacy.claude.com/en/articles/10458704-how-does-anthropic-protect-the-personal-data-of-claude-users">How does Anthropic protect personal data?</a> — security practices and encryption</li>
</ul>
</section>
</section>
<section id="security-fundamentals" class="level2">
<h2 class="anchored" data-anchor-id="security-fundamentals">Security fundamentals</h2>
<p>When you launch Claude Code from the CLI, it runs with your user’s full filesystem permissions. It can read, modify, or delete files anywhere your account can reach — not just your project directory. A poorly worded prompt, an agentic loop, or a prompt injection attack could cause changes you didn’t intend. Here’s how to limit the blast radius, from most important to least.</p>
<section id="use-permissions-and-deny-rules" class="level3">
<h3 class="anchored" data-anchor-id="use-permissions-and-deny-rules">Use permissions and deny rules</h3>
<p>Claude Code has a <a href="https://code.claude.com/docs/en/permissions">built-in permissions system</a> that controls what it can do. In the default mode, it asks for approval before file writes, shell commands, and git operations. You can customize this with rules in <code>settings.json</code>:</p>
<ul>
<li><strong><code>deny</code></strong> — hard block. Claude can’t use the tool, period. Deny rules always win, even if you accidentally click “always allow” on a prompt.</li>
<li><strong><code>allow</code></strong> — auto-approve. Skips the approval prompt for things you trust (e.g., <code>git add</code>, <code>pytest</code>).</li>
</ul>
<p><strong>Deny rules are your most important security layer.</strong> They protect sensitive paths — SSH keys, cloud credentials, <code>.env</code> files — regardless of what the agent tries to do. The approval prompt is your first line of defense; deny rules are the backup that can’t be bypassed.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json">{
  "permissions": {
    "deny": [
      "Read(//home/youruser/.ssh/**)",
      "Edit(//home/youruser/.ssh/**)",
      "Read(//home/youruser/.aws/**)",
      "Edit(//home/youruser/.aws/**)",
      "Read(./.env)",
      "Edit(./.env)",
      "Bash(rm -rf *)",
      "Bash(curl:*)",
      "Bash(wget:*)",
      "Bash(cat:*)"
    ]
  }
}</code></pre></div></div>
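<p>The flip side is an <code>allow</code> list for commands you run constantly. A hypothetical fragment (adjust to the tools you actually trust):</p>

```json
{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git diff:*)",
      "Bash(pytest:*)"
    ]
  }
}
```

<p>Auto-approving read-only or low-risk commands cuts down on prompt fatigue, which in turn makes you less likely to reflexively approve something you shouldn’t.</p>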
<p>See the <a href="../../Learn/Guides/claude-code-cloud-setup.html#understand-risk">Cloud Setup Guide’s security section</a> for a full walkthrough with platform-specific examples, or the <a href="https://code.claude.com/docs/en/permissions">official permissions docs</a> for the complete rule syntax.</p>
</section>
<section id="enable-claude-codes-built-in-sandbox" class="level3">
<h3 class="anchored" data-anchor-id="enable-claude-codes-built-in-sandbox">Enable Claude Code’s built-in sandbox</h3>
<p>Claude Code’s <a href="https://code.claude.com/docs/en/sandboxing">built-in sandbox</a> uses OS-level isolation (Linux namespaces / macOS Seatbelt) to restrict what shell commands can do — limiting filesystem writes to your project directory and blocking unauthorized network requests. This is separate from running inside a container (covered below). It’s lightweight, adds negligible overhead, and <a href="https://www.anthropic.com/engineering/claude-code-sandboxing">Anthropic’s internal testing</a> found it reduced permission prompts by 84% while <em>increasing</em> security. Use it alongside deny rules for the strongest protection — Anthropic calls this <a href="https://code.claude.com/docs/en/sandboxing">“defense in depth”</a>.</p>
</section>
<section id="scope-your-credentials" class="level3">
<h3 class="anchored" data-anchor-id="scope-your-credentials">Scope your credentials</h3>
<p>Even with deny rules and sandboxing, it’s good practice to limit what credentials the agent has access to in the first place.</p>
<p><strong>Use minimal-scope tokens.</strong> Create fine-grained GitHub tokens scoped to only the repos and permissions the agent needs. If it only pushes to one repo, don’t give it access to your entire account. Use a bot account for agent-driven git operations, and generate dedicated deploy keys rather than reusing your personal SSH keys.</p>
<p><strong>Set spending limits</strong> on API keys and use separate keys from your personal or production ones.</p>
<p><strong>Add secrets to <code>.gitignore</code></strong> — <code>.env</code>, <code>credentials.json</code>, <code>*.pem</code>, <code>*.key</code>, <code>.netrc</code> — before the agent ever runs. Once a secret is committed, it’s in the history. (But note: <code>.gitignore</code> prevents <em>committing</em> secrets, not <em>reading</em> them. Deny rules are what actually block the agent from accessing sensitive files.)</p>
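<p>A starting <code>.gitignore</code> block covering the patterns above:</p>

```
# Secrets — keep these out of version control
.env
credentials.json
*.pem
*.key
.netrc
```

<p>Add these lines before the agent’s first run; retroactively scrubbing a committed secret from git history is far harder than never committing it.</p>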
</section>
<section id="consider-containers-for-cicd-and-headless-environments" class="level3">
<h3 class="anchored" data-anchor-id="consider-containers-for-cicd-and-headless-environments">Consider containers for CI/CD and headless environments</h3>
<p><strong>For interactive development, the built-in sandbox is the right choice</strong> — it’s what Anthropic recommends, it’s lightweight, and combined with deny rules it provides strong isolation without any setup overhead. You don’t need Docker for local coding sessions.</p>
<p>Containers solve a different problem: <strong>unattended, non-interactive execution</strong> where there’s no human to approve permission prompts. In CI/CD pipelines, GitHub Actions, and headless automation, running Claude Code inside a container lets you use <code>--dangerously-skip-permissions</code> safely — the container itself is the isolation boundary, so there’s nothing outside it to damage. This is also the pattern Anthropic’s own <a href="https://code.claude.com/docs/en/github-actions">GitHub Actions</a> and <a href="https://code.claude.com/docs/en/gitlab-ci-cd">GitLab CI/CD</a> integrations use.</p>
<p>Options include a plain <a href="https://docs.docker.com/ai/sandboxes/agents/claude-code/">Docker container</a> with your project mounted as a volume, <a href="https://www.docker.com/blog/docker-sandboxes-run-claude-code-and-other-coding-agents-unsupervised-but-safely/">Docker sandboxes</a> (microVM-based isolation), or cloud sandbox platforms like <a href="https://e2b.dev/">E2B</a>. For CI pipelines, ephemeral containers that are destroyed after each run are the safest option — nothing persists between runs.</p>
<p><strong>Don’t stack them.</strong> The built-in sandbox and Docker containers are alternative isolation strategies. Running bubblewrap inside Docker introduces <a href="https://code.claude.com/docs/en/sandboxing">nested sandbox complexity</a> without meaningful security benefit. Pick one: sandbox for interactive work, containers for headless automation.</p>
</section>
<section id="watch-for-prompt-injection-and-runaway-agents" class="level3">
<h3 class="anchored" data-anchor-id="watch-for-prompt-injection-and-runaway-agents">Watch for prompt injection and runaway agents</h3>
<p><strong>Prompt injection</strong> is when an agent reads a file or message that contains hidden instructions designed to hijack its behavior. A malicious <code>README.md</code>, issue body, or <a href="https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files"><code>.docx</code> attachment</a> could trick the agent into exfiltrating files or running harmful commands. Be especially cautious when pointing an agent at untrusted repositories or external content. Deny rules and sandboxing are your main defenses here — they limit what the agent can do even if it’s been tricked.</p>
<p><strong>Runaway agents</strong> burn tokens and make unwanted changes when they get stuck in loops. Commit your work frequently so you can recover from mistakes, set spending limits on your API keys, and don’t hesitate to interrupt (<code>Ctrl+C</code>) and redirect. Set up git hooks or CI checks as safety nets — for example, preventing force-pushes to <code>main</code>.</p>
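<p>As one concrete safety net, a <code>pre-push</code> hook can refuse direct pushes to <code>main</code>. This is a sketch; the protected branch name and the policy are yours to adjust:</p>

```shell
# Sketch: write a pre-push hook that blocks direct pushes to main.
# Install by saving the script as .git/hooks/pre-push in your repo.
cat > pre-push <<'EOF'
#!/bin/sh
# Git feeds this hook one line per ref being pushed:
#   <local ref> <local sha> <remote ref> <remote sha>
while read -r local_ref local_sha remote_ref remote_sha; do
  if [ "$remote_ref" = "refs/heads/main" ]; then
    echo "pre-push: direct pushes to main are blocked." >&2
    exit 1
  fi
done
exit 0
EOF
chmod +x pre-push
```

<p>Client-side hooks can be bypassed with <code>--no-verify</code>, so treat this as a guardrail against accidental agent pushes, not a substitute for server-side branch protection.</p>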
<p><strong>Never</strong> give an agent unsupervised access to production systems, databases, or deployment pipelines.</p>
</section>
</section>
<section id="platform-and-deployment-notes" class="level2">
<h2 class="anchored" data-anchor-id="platform-and-deployment-notes">Platform and deployment notes</h2>
<section id="running-claude-code-remotely" class="level3">
<h3 class="anchored" data-anchor-id="running-claude-code-remotely">Running Claude Code remotely</h3>
<p>You don’t have to run Claude Code on your local machine. Running it over <strong>SSH on a cloud VM or remote server</strong> keeps your local system untouched and gives you access to more powerful hardware. For CI/CD integration — running Claude Code in GitHub Actions, GitLab CI, or similar systems — see the container discussion in Security fundamentals above, plus the official docs for <a href="https://code.claude.com/docs/en/github-actions">GitHub Actions</a> and <a href="https://code.claude.com/docs/en/gitlab-ci-cd">GitLab CI/CD</a>.</p>
</section>
<section id="a-note-for-gitlab-users" class="level3">
<h3 class="anchored" data-anchor-id="a-note-for-gitlab-users">A note for GitLab users</h3>
<p>Many teams — including many at UW-Madison — use GitLab rather than GitHub. Claude Code works with GitLab, but the integration is <strong>less mature</strong> than the GitHub experience.</p>
<p><strong>What works well:</strong></p>
<ul>
<li><strong>Claude Code CLI with GitLab repos</strong> — the core experience (reading code, editing files, running commands) works identically regardless of your git host. Claude Code operates on your local checkout, so the remote platform doesn’t matter for day-to-day coding.</li>
<li><strong>GitLab CI/CD integration</strong> — Anthropic provides <a href="https://code.claude.com/docs/en/gitlab-ci-cd">official documentation for running Claude Code in GitLab CI/CD pipelines</a>, including merge request review and test scaffolding.</li>
<li><strong>Git operations</strong> — push, pull, branching, and committing all work normally since these are standard git operations.</li>
</ul>
<p><strong>What’s different or limited compared to GitHub:</strong></p>
<ul>
<li><strong>No native GitLab integration in Claude Code’s Slack bot</strong> — the Slack integration currently only supports GitHub repos. GitLab support is <a href="https://github.com/anthropics/claude-code/issues/21527">an open feature request</a>.</li>
<li><strong>No <code>@claude</code> mention in GitLab issues/MRs</strong> — GitHub Copilot’s coding agent lets you assign issues to Copilot or mention it in PRs. There’s no equivalent native integration for GitLab yet, though <a href="https://gitlab.com/gitlab-org/gitlab/-/issues/557820">GitLab is working on it</a>.</li>
<li><strong>Community-built CI/CD tooling</strong> — while official docs exist, you may find yourself using <a href="https://github.com/RealMikeChong/claude-code-for-gitlab">community solutions</a> to replicate the smoother GitHub Actions experience.</li>
<li><strong>Self-hosted GitLab</strong> — if your institution runs a self-hosted GitLab instance, be aware that Claude Code sends code context to Anthropic’s API for processing. This may raise compliance concerns depending on your institution’s data policies.</li>
</ul>
<p><strong>Practical advice:</strong> The CLI workflow is essentially identical — focus your setup effort on CI/CD integration. For MR review automation, use Claude Code in your <code>.gitlab-ci.yml</code> with the <code>claude -p</code> (prompt) flag for non-interactive pipeline usage. If your institution has data sensitivity requirements, check with your IT governance team before sending code to external APIs — this applies to <strong>all</strong> cloud-based AI coding tools, not just Claude.</p>
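<p>A minimal job sketch for that pattern follows. The image, install step, and prompt are illustrative assumptions to adapt, and <code>ANTHROPIC_API_KEY</code> should be supplied as a masked CI/CD variable rather than written into the file:</p>

```yaml
# Illustrative .gitlab-ci.yml job — adapt image, rules, and prompt to your project.
claude-mr-review:
  image: node:20
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - npm install -g @anthropic-ai/claude-code
    # -p runs a single non-interactive prompt, prints the result, and exits
    - claude -p "Review the changes in this merge request and summarize any bugs or risky patterns."
```

<p>Because CI runners are ephemeral, each run starts from a clean container, which also serves as the isolation boundary discussed in Security fundamentals above.</p>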
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>Agentic coding tools are genuinely powerful — they can dramatically accelerate feature development, help you explore unfamiliar codebases, and automate tedious multi-step tasks. But they require a different mindset than traditional code assistants:</p>
<ol type="1">
<li><strong>Scope your requests tightly</strong> — features, not projects</li>
<li><strong>Use <code>CLAUDE.md</code></strong> to encode guardrails and project context</li>
<li><strong>Tune permissions deliberately</strong> — start conservative, loosen as you build trust</li>
<li><strong>Set deny rules and enable the sandbox</strong> — your two strongest security layers</li>
<li><strong>Scope your credentials</strong> — fine-grained tokens, dedicated keys, <code>.gitignore</code> for secrets</li>
<li><strong>Monitor costs</strong> — set limits, be specific, use the right model for the task</li>
<li><strong>Commit frequently</strong> — keep escape hatches available</li>
<li><strong>Review everything</strong> — you’re the engineer; the agent is a very fast intern</li>
</ol>
<p>The technology is moving fast, and best practices will continue to evolve. The core principle stays the same: <strong>give agents the minimum access they need, provide maximum clarity in your instructions, and always keep a human in the loop for decisions that matter</strong>.</p>
</section>
<section id="further-reading-and-perspectives" class="level2">
<h2 class="anchored" data-anchor-id="further-reading-and-perspectives">Further reading and perspectives</h2>
<p>Agentic coding is evolving fast. Here are some of the best resources for staying current:</p>
<section id="official-documentation-and-guides" class="level3">
<h3 class="anchored" data-anchor-id="official-documentation-and-guides">Official documentation and guides</h3>
<ul>
<li><a href="https://code.claude.com/docs/en/best-practices">Best Practices for Claude Code</a> — Anthropic’s official guide, covering context management, prompt patterns, CLAUDE.md, and scaling across parallel sessions</li>
<li><a href="https://code.claude.com/docs/en/how-claude-code-works">How Claude Code Works</a> — the agentic loop architecture, built-in tools, and how Claude interacts with your project</li>
<li><a href="https://code.claude.com/docs/en/features-overview">Extend Claude Code</a> — when to use CLAUDE.md vs skills vs subagents vs hooks vs MCP</li>
<li><a href="https://code.claude.com/docs/en/common-workflows">Common Workflows</a> — step-by-step guides for debugging, refactoring, testing, creating PRs, and more</li>
<li><a href="https://code.claude.com/docs/en/claude-code-on-the-web">Claude Code on the Web</a> — running Claude Code tasks asynchronously on cloud infrastructure</li>
<li><a href="https://code.claude.com/docs/en/desktop">Claude Code Desktop</a> — the desktop GUI with visual diffs, parallel sessions, and managed updates</li>
<li><a href="https://code.claude.com/docs/en/sandboxing">Claude Code Sandboxing Documentation</a> — reference for configuring Claude Code’s built-in sandboxing, including OS-level primitives (Linux bubblewrap, macOS Seatbelt) and deny rules for sensitive files</li>
<li><a href="https://www.anthropic.com/engineering/claude-code-sandboxing">Making Claude Code More Secure and Autonomous</a> — Anthropic Engineering’s deep-dive into their dual-layer sandboxing architecture (filesystem + network isolation)</li>
<li><a href="https://www.anthropic.com/research/prompt-injection-defenses">Mitigating the Risk of Prompt Injections</a> — Anthropic Research on defending AI agents against prompt injection, including their use of reinforcement learning to build injection robustness into Claude</li>
<li><a href="https://github.blog/news-insights/product-news/github-copilot-meet-the-new-coding-agent/">GitHub Copilot: Meet the New Coding Agent</a> — GitHub’s announcement of their enterprise-ready coding agent that spins up secure environments via GitHub Actions</li>
<li><a href="https://github.blog/ai-and-ml/github-copilot/github-copilot-coding-agent-101-getting-started-with-agentic-workflows-on-github/">GitHub Copilot Coding Agent 101</a> — GitHub’s getting-started guide for agentic workflows, including environment setup and PR creation</li>
<li><a href="https://github.blog/ai-and-ml/github-copilot/whats-new-with-github-copilot-coding-agent/">What’s New with GitHub Copilot Coding Agent</a> — latest updates including self-review, security scanning, and custom agents</li>
</ul>
</section>
<section id="community-voices-and-analysis" class="level3">
<h3 class="anchored" data-anchor-id="community-voices-and-analysis">Community voices and analysis</h3>
<ul>
<li><a href="https://simonwillison.net/2026/Feb/23/agentic-engineering-patterns/">Agentic Engineering Patterns</a> — Simon Willison’s guide to coding practices for getting the best results from agents like Claude Code and Codex. He frames this as “expertise amplification, not expertise replacement”</li>
<li><a href="https://www.oneusefulthing.org/p/a-guide-to-which-ai-to-use-in-the">A Guide to Which AI to Use in the Agentic Era</a> — Ethan Mollick’s updated guide arguing that “using AI” now means agents with tools, not chatbots, and that users must think in terms of Models, Apps, and Harnesses</li>
<li><a href="https://www.builder.io/blog/claude-code">How I Use Claude Code (+ My Best Tips)</a> — practical walkthrough from Builder.io on real-world Claude Code workflows</li>
<li><a href="https://www.teamday.ai/blog/complete-guide-agentic-coding-2026">The Complete Guide to Agentic Coding in 2026</a> — broad overview comparing tools, workflows, and team strategies</li>
<li><a href="https://cacm.acm.org/opinion/redefining-the-software-engineering-profession-for-ai/">Redefining the Software Engineering Profession for AI</a> — ACM opinion piece on how AI amplifies senior talent but risks leaving junior developers without the chance to develop architectural intuition</li>
</ul>
</section>
<section id="tool-comparisons" class="level3">
<h3 class="anchored" data-anchor-id="tool-comparisons">Tool comparisons</h3>
<ul>
<li><a href="https://dev.to/pockit_tools/cursor-vs-windsurf-vs-claude-code-in-2026-the-honest-comparison-after-using-all-three-3gof">Cursor vs Windsurf vs Claude Code in 2026</a> — hands-on comparison arguing Cursor has the best IDE UX, Claude Code leads on deep reasoning and terminal-first workflows, and Windsurf offers the best value</li>
</ul>
</section>
<section id="benchmarks-and-leaderboards" class="level3">
<h3 class="anchored" data-anchor-id="benchmarks-and-leaderboards">Benchmarks and leaderboards</h3>
<p>Agentic coding benchmarks are evolving rapidly. These track how well different models and agent scaffolds perform on real-world software engineering tasks:</p>
<ul>
<li><a href="https://www.swebench.com/">SWE-bench Leaderboards</a> — the most widely cited benchmark for agentic coding. Models are evaluated on their ability to resolve real GitHub issues from open-source Python repos. The “Verified” split is the standard comparison point, though <a href="https://scale.com/blog/swe-bench-pro">contamination concerns</a> have motivated harder variants</li>
<li><a href="https://scale.com/leaderboard/swe_bench_pro_public">SWE-bench Pro</a> — Scale AI’s harder benchmark (1,865 tasks across 41 repos). Top models that score 70%+ on SWE-bench Verified score only ~23% here</li>
<li><a href="https://openai.com/index/swe-lancer/">SWE-Lancer</a> — OpenAI’s benchmark based on 1,400+ real Upwork freelance tasks valued at $1M in payouts, ranging from $50 bug fixes to $32K feature implementations. Provides a natural difficulty gradient tied to real-world economics</li>
<li><a href="https://www.tbench.ai/">Terminal-Bench</a> — evaluates agents on multi-step terminal workflows (not just code generation). Tests planning, execution, and recovery in sandboxed command-line environments</li>
<li><a href="https://artificialanalysis.ai/insights/coding-agents-comparison">Coding Agents Comparison</a> — Artificial Analysis’s ongoing comparison with pricing breakdowns alongside benchmark scores</li>
<li><a href="https://www.anthropic.com/engineering/infrastructure-noise">Quantifying Infrastructure Noise in Agentic Coding Evals</a> — Anthropic’s analysis showing that a 2-point leaderboard lead may reflect hardware differences rather than genuine capability gaps — important context for interpreting any benchmark</li>
</ul>
<p><strong>Caveat:</strong> Benchmarks measure specific capabilities under controlled conditions. Real-world performance depends heavily on your prompt quality, project structure, and <code>CLAUDE.md</code> configuration. Use benchmarks to track the field’s trajectory, not to pick a tool.</p>
</section>
<section id="security" class="level3">
<h3 class="anchored" data-anchor-id="security">Security</h3>
<ul>
<li><a href="https://fortune.com/2025/12/15/ai-coding-tools-security-exploit-software/">AI Coding Tools Exploded in 2025. The First Security Exploits Show What Could Go Wrong</a> — Fortune’s reporting on the “IDEsaster” vulnerabilities found across Cursor, Copilot, Windsurf, and other tools</li>
<li><a href="https://thehackernews.com/2025/12/researchers-uncover-30-flaws-in-ai.html">Researcher Uncovers 30+ Flaws in AI Coding Tools</a> — technical breakdown of the universal attack chains affecting major AI IDEs</li>
<li><a href="https://devops.com/security-flaws-in-anthropics-claude-code-risk-stolen-data-system-takeover/">Security Flaws in Claude Code Risk Stolen Data, System Takeover</a> — Check Point’s findings on Claude Code-specific CVEs, including hook injection and API key theft</li>
<li><a href="https://blog.cyberdesserts.com/ai-agent-security-risks/">AI Agent Security Risks in 2026: A Practitioner’s Guide</a> — practical guide to defending against prompt injection, credential theft, and MCP vulnerabilities</li>
</ul>
<p>This is an area where best practices are being written in real time. What works today may be outdated in six months. Stay plugged into the communities above, and don’t assume any single tool or configuration is permanently “safe.”</p>
</section>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Blogs</category>
  <category>GenAI</category>
  <category>LLM</category>
  <category>Agentic coding</category>
  <category>Security</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Blogs/claude-code-best-practices.html</guid>
  <pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/claudecode.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Understanding Quantization and Precision</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/Quantization-and-Precision.html</link>
  <description><![CDATA[ 




<p><a href="https://colab.research.google.com/github/UW-Madison-DataScience/ML-X-Nexus/blob/main/Learn/Notebooks/Quantization-and-Precision.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" class="img-fluid"></a></p>
<p>When working with large language models, you’ll often encounter terms like “FP32”, “FP16”, “INT8”, and “4-bit quantization.” These describe how a model’s weights are stored in memory — and they have a direct impact on how much GPU memory a model requires, how fast it runs, and whether it fits on your hardware at all.</p>
<p>This notebook unpacks these concepts step by step:</p>
<ol type="1">
<li>Precision: What floating-point formats (FP32, FP16, BF16) mean and how they affect memory.</li>
<li>Quantization: How tools like <code>bitsandbytes</code> reduce precision further (to 8-bit or 4-bit) to shrink memory footprints.</li>
<li>Parameter counts vs.&nbsp;memory: Why the number of model parameters stays the same, but memory usage changes.</li>
<li>A PyTorch gotcha: Why <code>model.parameters()</code> can report misleading numbers after quantization — and how to correctly count parameters.</li>
<li>When to quantize: Practical guidance on where quantization helps most, and where it doesn’t.</li>
</ol>
<section id="prerequisites" class="level3">
<h3 class="anchored" data-anchor-id="prerequisites">Prerequisites</h3>
<ul>
<li>Basic familiarity with PyTorch and Hugging Face <code>transformers</code></li>
<li>Access to a GPU runtime (e.g., Google Colab with T4)</li>
</ul>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<div id="e3b69479" class="cell" data-execution_count="1">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>pip install <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>q transformers accelerate bitsandbytes torch</span></code></pre></div></div>
</div>
<div id="70fec532" class="cell" data-execution_count="2">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> torch</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gc</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> transformers <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig</span></code></pre></div></div>
</div>
</section>
<section id="part-1-what-is-precision" class="level2">
<h2 class="anchored" data-anchor-id="part-1-what-is-precision">Part 1: What is precision?</h2>
<p>Every number in a neural network — every weight, bias, and activation — is stored as a sequence of bits. The <strong>precision</strong> (or data type) determines how many bits are used per number, which controls both the range and granularity of values that can be represented.</p>
<section id="the-ruler-analogy" class="level3">
<h3 class="anchored" data-anchor-id="the-ruler-analogy">The ruler analogy</h3>
<p>Think of precision like the markings on a ruler. A high-precision ruler has markings at every millimeter — you can represent fine distinctions like 3.217 cm vs.&nbsp;3.218 cm. A low-precision ruler might only have markings at each centimeter — you can still measure things, but 3.217 cm and 3.218 cm both round to 3 cm. You’ve lost the ability to distinguish them, but you need far less space to write down your measurement.</p>
<p>That’s exactly what happens with neural network weights. At FP32, a weight might be stored as <code>0.31415927</code>. At FP16, it becomes <code>0.3142</code> — close, but not identical. At 4-bit, it gets mapped to one of only 16 possible values, like <code>0.3125</code>. The question is whether those small differences matter for the model’s outputs. For most deep learning tasks, they don’t.</p>
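<p>You can reproduce this rounding without any ML libraries: Python’s <code>struct</code> module supports IEEE 754 half precision via the <code>'e'</code> format, so a round-trip through it shows exactly what FP16 storage does to a value (the helper name here is ours, for illustration):</p>

```python
import struct

def round_trip_fp16(x):
    """Round a Python float through IEEE 754 half-precision (FP16) storage."""
    # struct's 'e' format packs/unpacks 16-bit half-precision floats
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(round_trip_fp16(0.31415927))  # ≈ 0.3142 — close, but not identical
print(round_trip_fp16(1.0))         # 1.0 — exactly representable, no loss
```

<p>Values like 1.0 survive unchanged because they fit exactly in 10 fraction bits; most “arbitrary” weights pick up a small rounding error, which is the coarser ruler in action.</p>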
</section>
<section id="how-floating-point-numbers-are-stored" class="level3">
<h3 class="anchored" data-anchor-id="how-floating-point-numbers-are-stored">How floating-point numbers are stored</h3>
<p>A floating-point number is stored in three parts:</p>
<ul>
<li><strong>Sign bit</strong> (1 bit): positive or negative</li>
<li><strong>Exponent bits</strong>: control the <em>range</em> — how large or small the number can be (like the power in scientific notation)</li>
<li><strong>Fraction bits</strong> (aka mantissa): control the <em>precision</em> — how many significant digits you get</li>
</ul>
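<p>These fields can be inspected directly in pure Python by reinterpreting a value’s raw bits with the standard <code>struct</code> module (the helper name is ours, for illustration):</p>

```python
import struct

def fp32_fields(x):
    """Split an FP32 value into its sign, exponent, and fraction bit strings."""
    # Reinterpret the 4-byte float as a 32-bit unsigned int, then format as bits
    bits = format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')
    return bits[0], bits[1:9], bits[9:]  # 1 sign + 8 exponent + 23 fraction bits

sign, exponent, fraction = fp32_fields(1.0)
# 1.0 = +1.0 x 2^0: sign '0', biased exponent 127 ('01111111'), fraction all zeros
```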
<p>For example, FP32 uses 1 sign + 8 exponent + 23 fraction = 32 bits. FP16 cuts this to 1 + 5 + 10 = 16 bits. Fewer fraction bits means coarser rounding; fewer exponent bits means a narrower range of representable values. The table below summarizes the common formats:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 15%">
<col style="width: 8%">
<col style="width: 14%">
<col style="width: 14%">
<col style="width: 27%">
<col style="width: 18%">
</colgroup>
<thead>
<tr class="header">
<th>Data type</th>
<th>Bits</th>
<th>Exponent</th>
<th>Fraction</th>
<th>Approximate range</th>
<th>Typical use</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>FP32 (float32)</td>
<td>32</td>
<td>8</td>
<td>23</td>
<td>~1e-38 to ~3e38</td>
<td>Default training precision</td>
</tr>
<tr class="even">
<td>FP16 (float16)</td>
<td>16</td>
<td>5</td>
<td>10</td>
<td>~6e-5 to 65504</td>
<td>Mixed-precision training</td>
</tr>
<tr class="odd">
<td>BF16 (bfloat16)</td>
<td>16</td>
<td>8</td>
<td>7</td>
<td>~1e-38 to ~3e38</td>
<td>Training on modern GPUs (A100, H100)</td>
</tr>
<tr class="even">
<td>INT8</td>
<td>8</td>
<td>—</td>
<td>—</td>
<td>-128 to 127</td>
<td>Post-training quantization</td>
</tr>
<tr class="odd">
<td>NF4 (4-bit)</td>
<td>4</td>
<td>—</td>
<td>—</td>
<td>16 discrete values</td>
<td>Aggressive quantization via bitsandbytes</td>
</tr>
</tbody>
</table>
<p>Note that INT8 and NF4 are <strong>integer/discrete</strong> formats — they don’t have exponent and fraction parts at all. They can only represent a small, fixed set of values, and real-valued weights must be <em>mapped</em> onto those values (more on this in Part 3).</p>
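<p>To make that mapping concrete, here is a toy absmax quantizer in plain Python. This is a deliberately simplified sketch of one common INT8 scheme — real libraries such as <code>bitsandbytes</code> quantize per block and handle outliers separately:</p>

```python
def quantize_int8(weights):
    """Toy absmax INT8 quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127   # one scale for the whole list
    q = [round(w / scale) for w in weights]      # what gets stored (1 byte each)
    dequantized = [qi * scale for qi in q]       # approximate reconstruction
    return q, dequantized

q, deq = quantize_int8([0.5, -1.0, 0.31415927])
# The largest-magnitude weight maps to the edge of the range (-127);
# the others land on the nearest representable step and pick up rounding error.
```

<p>The stored integers plus one scale factor are all that survive — dequantization recovers only an approximation, and the gap between <code>deq</code> and the original weights is the quantization error.</p>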
</section>
<section id="key-insight-precision-controls-memory-per-parameter" class="level3">
<h3 class="anchored" data-anchor-id="key-insight-precision-controls-memory-per-parameter">Key insight: precision controls memory per parameter</h3>
<p>A model with 1 billion parameters requires:</p>
<ul>
<li><strong>4 GB</strong> at FP32 (4 bytes per param)</li>
<li><strong>2 GB</strong> at FP16/BF16 (2 bytes per param)</li>
<li><strong>1 GB</strong> at INT8 (1 byte per param)</li>
<li><strong>~0.5 GB</strong> at 4-bit (0.5 bytes per param)</li>
</ul>
<p>The number of parameters hasn’t changed — only how much memory each one occupies.</p>
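<p>The arithmetic behind those figures is just parameters × bytes per parameter (using 1 GB = 10⁹ bytes, as in the list above):</p>

```python
BYTES_PER_PARAM = {"FP32": 4, "FP16/BF16": 2, "INT8": 1, "4-bit": 0.5}

def weight_memory_gb(n_params, fmt):
    """Memory needed just for the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9

for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: {weight_memory_gb(1_000_000_000, fmt):.1f} GB")
# FP32: 4.0 GB, FP16/BF16: 2.0 GB, INT8: 1.0 GB, 4-bit: 0.5 GB
```

<p>Keep in mind this counts weights only — actual GPU usage runs higher once activations, the KV cache, and (during training) optimizer state are included.</p>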
<p>Let’s verify this with a real model.</p>
</section>
</section>
<section id="part-2-loading-a-model-at-different-precisions" class="level2">
<h2 class="anchored" data-anchor-id="part-2-loading-a-model-at-different-precisions">Part 2: Loading a model at different precisions</h2>
<p>We’ll use a small model — GPT-2 (124M parameters) — to keep things manageable and demonstrate the concepts clearly.</p>
<section id="helper-measure-gpu-memory" class="level3">
<h3 class="anchored" data-anchor-id="helper-measure-gpu-memory">Helper: Measure GPU memory</h3>
<div id="02faabe8" class="cell" data-execution_count="3">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_gpu_memory_mb():</span>
<span id="cb3-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Return current GPU memory allocated in MB."""</span></span>
<span id="cb3-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> torch.cuda.is_available():</span>
<span id="cb3-4">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> torch.cuda.memory_allocated() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1024</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb3-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span></span>
<span id="cb3-6"></span>
<span id="cb3-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> load_and_measure(model_name, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, quantization_config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>):</span>
<span id="cb3-8">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Load a model and report memory usage and parameter info."""</span></span>
<span id="cb3-9">    gc.collect()</span>
<span id="cb3-10">    torch.cuda.empty_cache()</span>
<span id="cb3-11"></span>
<span id="cb3-12">    before <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_gpu_memory_mb()</span>
<span id="cb3-13"></span>
<span id="cb3-14">    kwargs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"device_map"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"auto"</span>}</span>
<span id="cb3-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> dtype:</span>
<span id="cb3-16">        kwargs[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"torch_dtype"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dtype</span>
<span id="cb3-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> quantization_config:</span>
<span id="cb3-18">        kwargs[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quantization_config"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> quantization_config</span>
<span id="cb3-19"></span>
<span id="cb3-20">    model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> AutoModelForCausalLM.from_pretrained(model_name, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs)</span>
<span id="cb3-21"></span>
<span id="cb3-22">    after <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_gpu_memory_mb()</span>
<span id="cb3-23">    mem_used <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> after <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> before</span>
<span id="cb3-24"></span>
<span id="cb3-25">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count parameters</span></span>
<span id="cb3-26">    total_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(p.numel() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model.parameters())</span>
<span id="cb3-28"></span>
<span id="cb3-29">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'='</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-30">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>label<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-31">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'='</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-32">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  GPU memory used:     </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_used<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> MB"</span>)</span>
<span id="cb3-33">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  model.parameters():  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>total_params<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (via numel())"</span>)</span>
<span id="cb3-34">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Expected params:     ~124,000,000 (GPT-2)"</span>)</span>
<span id="cb3-35"></span>
<span id="cb3-36">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show dtypes present in model</span></span>
<span id="cb3-37">    dtypes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>()</span>
<span id="cb3-38">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model.parameters():</span>
<span id="cb3-39">        dtypes.add(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>(p.dtype))</span>
<span id="cb3-40">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Parameter dtypes:    </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>dtypes<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-41"></span>
<span id="cb3-42">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> model, mem_used, total_params</span></code></pre></div></div>
</div>
</section>
<section id="fp32-default" class="level3">
<h3 class="anchored" data-anchor-id="fp32-default">FP32 (default)</h3>
<div id="5cdfddf5" class="cell" data-execution_count="4">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">model_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt2"</span></span>
<span id="cb4-2"></span>
<span id="cb4-3">model_fp32, mem_fp32, params_fp32 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_and_measure(</span>
<span id="cb4-4">    model_name, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>torch.float32, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FP32 (32-bit floating point)"</span></span>
<span id="cb4-5">)</span>
<span id="cb4-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">del</span> model_fp32</span>
<span id="cb4-7">gc.collect()</span>
<span id="cb4-8">torch.cuda.empty_cache()</span></code></pre></div></div>
</div>
</section>
<section id="fp16-half-precision" class="level3">
<h3 class="anchored" data-anchor-id="fp16-half-precision">FP16 (half precision)</h3>
<div id="a4cb038f" class="cell" data-execution_count="5">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">model_fp16, mem_fp16, params_fp16 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_and_measure(</span>
<span id="cb5-2">    model_name, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>torch.float16, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"FP16 (16-bit floating point)"</span></span>
<span id="cb5-3">)</span>
<span id="cb5-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">del</span> model_fp16</span>
<span id="cb5-5">gc.collect()</span>
<span id="cb5-6">torch.cuda.empty_cache()</span></code></pre></div></div>
</div>
</section>
<section id="bf16-bfloat16" class="level3">
<h3 class="anchored" data-anchor-id="bf16-bfloat16">BF16 (bfloat16)</h3>
<p>BF16 uses 16 bits like FP16, but allocates them differently (as shown in the table in Part 1). FP16 gives 10 bits to the fraction for finer precision, but only 5 bits to the exponent — which is why it caps out at 65,504 and can’t represent very small values. BF16 flips this trade-off: it keeps the same 8-bit exponent as FP32 (giving it the same massive range), at the cost of only 7 fraction bits. In practice, this works well for deep learning — the range matters more than fine-grained precision, and BF16 avoids the overflow/underflow issues that can plague FP16 during training.</p>
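<p>These two layouts can be verified with quick arithmetic from the bit counts alone. The following stdlib-only sketch (no PyTorch required) derives each format’s largest finite value from the usual IEEE-754 pattern, max = (2 − 2<sup>−fraction bits</sup>) × 2<sup>max exponent</sup>:</p>

```python
def max_finite(exp_bits: int, frac_bits: int) -> float:
    """Largest finite value for an IEEE-754-style format: 1 sign bit + exp_bits + frac_bits."""
    bias = 2 ** (exp_bits - 1) - 1        # 15 for FP16, 127 for BF16
    max_exp = (2 ** exp_bits - 2) - bias  # the top exponent code is reserved for inf/NaN
    return (2 - 2 ** -frac_bits) * 2.0 ** max_exp

fp16_max = max_finite(exp_bits=5, frac_bits=10)  # FP16: 5 exponent bits, 10 fraction bits
bf16_max = max_finite(exp_bits=8, frac_bits=7)   # BF16: 8 exponent bits, 7 fraction bits

print(f"FP16 max: {fp16_max:,.0f}")  # 65,504: the ceiling mentioned above
print(f"BF16 max: {bf16_max:.2e}")   # 3.39e+38, essentially FP32's range
```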
<div id="9b0ce0a3" class="cell" data-execution_count="6">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">model_bf16, mem_bf16, params_bf16 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_and_measure(</span>
<span id="cb6-2">    model_name, dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>torch.bfloat16, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"BF16 (bfloat16)"</span></span>
<span id="cb6-3">)</span>
<span id="cb6-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">del</span> model_bf16</span>
<span id="cb6-5">gc.collect()</span>
<span id="cb6-6">torch.cuda.empty_cache()</span></code></pre></div></div>
</div>
</section>
<section id="compare-precision-vs.-memory" class="level3">
<h3 class="anchored" data-anchor-id="compare-precision-vs.-memory">Compare: precision vs.&nbsp;memory</h3>
<div id="136f77a4" class="cell" data-execution_count="7">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Memory comparison (GPT-2, 124M params):"</span>)</span>
<span id="cb7-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  FP32:  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_fp32<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> MB"</span>)</span>
<span id="cb7-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  FP16:  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_fp16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> MB"</span>)</span>
<span id="cb7-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  BF16:  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_bf16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> MB"</span>)</span>
<span id="cb7-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Parameter count (should be identical):"</span>)</span>
<span id="cb7-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  FP32:  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_fp32<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb7-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  FP16:  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_fp16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb7-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  BF16:  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_bf16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
<p>At this point, the key takeaway should be clear: reducing precision halves memory, but the parameter count is unchanged. Every weight is still there — it just takes up less space.</p>
</section>
<section id="precision-reduction-vs.-quantization-whats-the-difference" class="level3">
<h3 class="anchored" data-anchor-id="precision-reduction-vs.-quantization-whats-the-difference">Precision reduction vs.&nbsp;quantization: what’s the difference?</h3>
<p>What we’ve done so far — loading a model in FP16 or BF16 instead of FP32 — is <strong>precision reduction</strong> (sometimes called “casting” or “downcasting”). It’s straightforward: each floating-point value is converted to a format with fewer bits, using standard IEEE rounding rules. The value <code>0.31415927</code> in FP32 becomes <code>0.3142</code> in FP16. There’s no special algorithm involved — it’s just rounding.</p>
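<p>This rounding can be reproduced with nothing but the Python standard library: <code>struct</code>’s <code>'e'</code> format is IEEE-754 half precision, so a round-trip through it shows exactly what FP16 keeps of a value (a sketch of the casting step itself, not of how <code>transformers</code> performs it internally):</p>

```python
import struct

def to_fp16(x: float) -> float:
    # Pack as IEEE-754 half precision ('e' format) and unpack: plain rounding, no algorithm.
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_fp16(0.31415927))  # 0.314208984375, displayed as 0.3142 at four significant digits
print(to_fp16(1e-8))        # 0.0: underflows, since FP16's smallest subnormal is ~6e-8
```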
<p><strong>Quantization</strong> is fundamentally different. It doesn’t just round values to a lower-precision float — it <em>maps</em> them onto a small, discrete set of values (like the 256 integers in INT8, or just 16 values in NF4). This mapping requires decisions that simple rounding can’t make:</p>
<ul>
<li><strong>What range of weight values should map to what integers?</strong> (This is called calibration.)</li>
<li><strong>Should all layers use the same mapping, or should each layer be calibrated separately?</strong></li>
<li><strong>What do you do about outlier weights that fall far outside the typical range?</strong></li>
</ul>
<p>Different quantization algorithms answer these questions differently, and their choices directly affect how much quality you lose. That’s why quantization is a more involved process than just picking <code>torch.float16</code> — it’s a compression technique with real engineering behind it.</p>
</section>
</section>
<section id="part-3-quantization-mapping-weights-to-fewer-values" class="level2">
<h2 class="anchored" data-anchor-id="part-3-quantization-mapping-weights-to-fewer-values">Part 3: Quantization — mapping weights to fewer values</h2>
<p>Going back to our ruler analogy: precision reduction is like switching from a millimeter ruler to a centimeter ruler — you still have a continuous ruler, just with fewer markings. Quantization is like replacing the ruler entirely with a set of labeled bins. Every weight gets sorted into the nearest bin, and from that point on, it’s stored as just a bin number (an integer). The bins are chosen carefully so that the most common weight values land close to a bin center, minimizing the error introduced by this binning.</p>
<p>Here’s the key idea more concretely. Suppose a layer has weights ranging from -1.0 to 1.0, and you’re quantizing to INT8 (256 possible values). A simple approach would:</p>
<ol type="1">
<li><strong>Find the range</strong> of the weights: min = -1.0, max = 1.0.</li>
<li><strong>Divide the range</strong> into 256 equally spaced bins, each spanning ~0.0078.</li>
<li><strong>Map each weight</strong> to the nearest bin center and store just the bin index (an integer from 0 to 255).</li>
<li><strong>Store the scale factor</strong> (bin width) and <strong>zero point</strong> so you can approximately reconstruct the original value later: <code>reconstructed ≈ scale × integer + zero_point</code>.</li>
</ol>
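<p>The four steps fit in a few lines of plain Python. This is an illustrative sketch of the scheme, not the implementation any particular library uses:</p>

```python
def quantize_uniform(weights):
    """Uniform (linear) quantization to 256 bins, following steps 1-4 above."""
    lo, hi = min(weights), max(weights)             # step 1: find the range
    scale = (hi - lo) / 255                         # step 2: 256 equally spaced levels
    q = [round((w - lo) / scale) for w in weights]  # step 3: nearest bin index, 0..255
    return q, scale, lo                             # step 4: keep scale + zero point

def dequantize(q, scale, zero_point):
    # Approximate reconstruction: reconstructed = scale * integer + zero_point
    return [scale * i + zero_point for i in q]

weights = [-1.0, -0.5, 0.0, 0.31415927, 0.5, 1.0]
q, scale, zp = quantize_uniform(weights)
approx = dequantize(q, scale, zp)
max_err = max(abs(w - a) for w, a in zip(weights, approx))

print(q)                            # [0, 64, 128, 168, 191, 255]: only bin indices are stored
print(f"max error: {max_err:.4f}")  # 0.0039: bounded by half a bin width
```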
<p>This is called <strong>linear (uniform) quantization</strong>, and it’s the simplest scheme. More advanced methods — like the ones used in practice — improve on this in important ways:</p>
<ul>
<li><a href="https://arxiv.org/abs/2208.07339"><strong>LLM.int8()</strong></a> (Dettmers et al., 2022) discovered that a small fraction of “outlier” features in transformer models have very large magnitudes. If you force these into the same bins as normal-range weights, quality collapses. Their solution: detect outlier features at runtime, keep them in FP16, and quantize only the remaining ~99.9% of values to INT8. This mixed-precision decomposition makes 8-bit quantization effectively lossless.</li>
<li><a href="https://arxiv.org/abs/2305.14314"><strong>NF4</strong></a> (Dettmers et al., 2023) takes a different approach for 4-bit. Instead of spacing bins evenly, it places them at the quantiles of a normal distribution — because neural network weights are approximately normally distributed. This means bins are denser where weights are most concentrated (near zero) and sparser in the tails, making optimal use of only 16 possible values. <strong>Double quantization</strong> further compresses the scale factors themselves, saving additional memory.</li>
<li><a href="https://arxiv.org/abs/2210.17323"><strong>GPTQ</strong></a> (Frantar et al., 2023) uses a one-shot weight quantization approach that considers the interaction between weights: when one weight is rounded to a bin, it adjusts the remaining weights to compensate for the rounding error. This layer-wise optimization enables 3-4 bit quantization of 175B-parameter models with negligible accuracy loss.</li>
</ul>
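<p>The quantile idea behind NF4 is easy to see with the standard library’s <code>statistics.NormalDist</code>. This sketch places 16 bin centers at evenly spaced quantiles of a standard normal; it illustrates the principle only, since the actual NF4 code values in the QLoRA paper differ in detail (for instance, they guarantee an exact zero):</p>

```python
from statistics import NormalDist

# 16 bin centers at the midpoint quantiles of a standard normal distribution,
# rescaled so the outermost centers land at -1 and +1 (weights are normalized per block).
n = NormalDist()
centers = [n.inv_cdf((i + 0.5) / 16) for i in range(16)]
limit = max(abs(c) for c in centers)
centers = [c / limit for c in centers]

# Bins are densest where weights cluster (near zero) and sparsest in the tails:
gap_near_zero = centers[8] - centers[7]
gap_at_tail = centers[15] - centers[14]
print(f"gap near zero: {gap_near_zero:.3f}, gap at tail: {gap_at_tail:.3f}")
```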
<p>The tools below make these algorithms accessible through a simple configuration interface.</p>
<section id="using-bitsandbytes-for-quantization" class="level3">
<h3 class="anchored" data-anchor-id="using-bitsandbytes-for-quantization">Using bitsandbytes for quantization</h3>
<p><a href="https://github.com/bitsandbytes-foundation/bitsandbytes"><code>bitsandbytes</code></a> integrates directly with Hugging Face <code>transformers</code>, letting you apply these quantization algorithms at model load time with just a configuration flag. Let’s see the memory savings in practice.</p>
</section>
<section id="bit-quantization" class="level3">
<h3 class="anchored" data-anchor-id="bit-quantization">8-bit quantization</h3>
<div id="a911d281" class="cell" data-execution_count="8">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">bnb_config_8bit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BitsAndBytesConfig(load_in_8bit<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb8-2"></span>
<span id="cb8-3">model_8bit, mem_8bit, params_8bit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_and_measure(</span>
<span id="cb8-4">    model_name, quantization_config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>bnb_config_8bit, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"INT8 (8-bit via bitsandbytes)"</span></span>
<span id="cb8-5">)</span>
<span id="cb8-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">del</span> model_8bit</span>
<span id="cb8-7">gc.collect()</span>
<span id="cb8-8">torch.cuda.empty_cache()</span></code></pre></div></div>
</div>
</section>
<section id="bit-quantization-nf4" class="level3">
<h3 class="anchored" data-anchor-id="bit-quantization-nf4">4-bit quantization (NF4)</h3>
<div id="7c6cf339" class="cell" data-execution_count="9">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">bnb_config_4bit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BitsAndBytesConfig(</span>
<span id="cb9-2">    load_in_4bit<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb9-3">    bnb_4bit_quant_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nf4"</span>,           <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># normalized float 4-bit</span></span>
<span id="cb9-4">    bnb_4bit_use_double_quant<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,       <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># further compress quantization constants</span></span>
<span id="cb9-5">    bnb_4bit_compute_dtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>torch.float16  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># compute in FP16 for speed</span></span>
<span id="cb9-6">)</span>
<span id="cb9-7"></span>
<span id="cb9-8">model_4bit, mem_4bit, params_4bit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_and_measure(</span>
<span id="cb9-9">    model_name, quantization_config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>bnb_config_4bit, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NF4 (4-bit via bitsandbytes)"</span></span>
<span id="cb9-10">)</span></code></pre></div></div>
</div>
</section>
<section id="compare-all-configurations" class="level3">
<h3 class="anchored" data-anchor-id="compare-all-configurations">Compare all configurations</h3>
<div id="2fc37166" class="cell" data-execution_count="10">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'='</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Summary: GPT-2 (124M params) at different precisions"</span>)</span>
<span id="cb10-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'='</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Config'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;12}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Memory (MB)'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;14}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'numel()'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;16}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'-'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">44</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'FP32'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;12}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_fp32<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;14,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_fp32<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;16,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'FP16'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;12}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_fp16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;14,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_fp16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;16,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'BF16'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;12}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_bf16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;14,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_bf16<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;16,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'INT8'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;12}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_8bit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;14,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_8bit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;16,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb10-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'NF4 (4-bit)'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&lt;12}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>mem_4bit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;14,.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>params_4bit<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:&gt;16,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</section>
</section>
<section id="part-4-the-pytorch-parameter-count-gotcha" class="level2">
<h2 class="anchored" data-anchor-id="part-4-the-pytorch-parameter-count-gotcha">Part 4: The PyTorch parameter count gotcha</h2>
<p>If you ran the summary above, you may have noticed something odd: the 4-bit model reports ~82M parameters via <code>numel()</code> instead of the expected ~124M. Did quantization remove 42 million weights?</p>
<p>No.&nbsp;The model still has the same architecture and the same logical number of parameters. The discrepancy comes from how <code>bitsandbytes</code> stores quantized weights — and from the fact that <strong>not all parameters get quantized</strong>.</p>
<section id="what-gets-quantized-and-what-doesnt" class="level3">
<h3 class="anchored" data-anchor-id="what-gets-quantized-and-what-doesnt">What gets quantized (and what doesn’t)</h3>
<p>When you load a model with <code>bitsandbytes</code>, only the large linear layer weight matrices are quantized. Smaller parameters — biases, layer normalization weights, and embedding layers — are kept in their original precision (typically FP16 or FP32). This is by design: these small parameters contribute little to total memory, and quantizing them would hurt quality disproportionately.</p>
</section>
<section id="why-numel-is-misleading-for-quantized-parameters" class="level3">
<h3 class="anchored" data-anchor-id="why-numel-is-misleading-for-quantized-parameters">Why <code>numel()</code> is misleading for quantized parameters</h3>
<p>For the parameters that <em>are</em> quantized, <code>bitsandbytes</code> packs multiple low-bit values into each byte:</p>
<ul>
<li><strong>8-bit</strong>: Each weight occupies 1 byte, stored as a <code>uint8</code> tensor. <code>numel()</code> still returns the correct count here, since it’s one element per weight.</li>
<li><strong>4-bit</strong>: Two weights are packed into a single byte, stored as a <code>uint8</code> tensor of <strong>half the length</strong>.</li>
</ul>
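<p>The packing itself is simple bit arithmetic: two 4-bit codes share one byte. Here is a sketch of the storage layout (not <code>bitsandbytes</code>’ actual kernels):</p>

```python
def pack_4bit(codes):
    """Pack pairs of 4-bit values (0..15) into single bytes, halving storage numel()."""
    assert len(codes) % 2 == 0 and all(0 <= c <= 15 for c in codes)
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def unpack_4bit(packed):
    out = []
    for b in packed:
        out += [b >> 4, b & 0x0F]  # high nibble first, then low nibble
    return out

codes = [3, 12, 0, 15, 7, 7]         # six logical 4-bit weights...
packed = pack_4bit(codes)
print(len(packed))                   # 3: ...stored in three bytes
print(unpack_4bit(packed) == codes)  # True: the packing itself is lossless
```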
<p>When you call <code>p.numel()</code> on a 4-bit quantized parameter, PyTorch reports the number of elements in the storage tensor (the packed <code>uint8</code> values), not the number of logical weights. Since two 4-bit values are packed into one <code>uint8</code> element, <code>numel()</code> returns half the true count for those parameters. Combined with the non-quantized parameters (which report correctly), the total <code>numel()</code> across the model ends up somewhere between the true count and half — in GPT-2’s case, ~82M instead of ~124M.</p>
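<p>A back-of-the-envelope check makes the reported number plausible. The split below is approximate (GPT-2’s embeddings, biases, and layer norms come to roughly 40M parameters; the exact figures depend on the model):</p>

```python
total = 124_000_000         # logical GPT-2 parameter count (approximate)
non_quantized = 40_000_000  # embeddings, biases, layer norms: counted correctly by numel()
quantized = total - non_quantized  # linear-layer weights, packed two per byte at 4-bit

reported = quantized // 2 + non_quantized  # what summing numel() actually sees
print(f"{reported:,}")  # 82,000,000: in line with the ~82M numel() reports
```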
</section>
<section id="correctly-counting-parameters" class="level3">
<h3 class="anchored" data-anchor-id="correctly-counting-parameters">Correctly counting parameters</h3>
<p>To get the real parameter count, we need to check whether each parameter is a quantized <code>bitsandbytes</code> type and recover the original shape:</p>
<div id="df53baa6" class="cell" data-execution_count="11">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> bitsandbytes <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> bnb</span>
<span id="cb11-2"></span>
<span id="cb11-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> count_parameters_correct(model):</span>
<span id="cb11-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Count parameters correctly, handling bitsandbytes quantized layers."""</span></span>
<span id="cb11-5">    total <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb11-6">    quantized <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb11-7">    non_quantized <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb11-8"></span>
<span id="cb11-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> name, param <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model.named_parameters():</span>
<span id="cb11-10">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">hasattr</span>(param, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quant_state"</span>):</span>
<span id="cb11-11">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This is a bitsandbytes quantized parameter.</span></span>
<span id="cb11-12">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The original shape is stored in quant_state.</span></span>
<span id="cb11-13">            original_numel <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> param.quant_state.shape.numel()</span>
<span id="cb11-14">            total <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> original_numel</span>
<span id="cb11-15">            quantized <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> original_numel</span>
<span id="cb11-16">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb11-17">            total <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> param.numel()</span>
<span id="cb11-18">            non_quantized <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> param.numel()</span>
<span id="cb11-19"></span>
<span id="cb11-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> total, quantized, non_quantized</span></code></pre></div></div>
</div>
<div id="8e34a5c5" class="cell" data-execution_count="12">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1">naive_count <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(p.numel() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model_4bit.parameters())</span>
<span id="cb12-2">correct_total, quantized_params, non_quantized_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> count_parameters_correct(model_4bit)</span>
<span id="cb12-3"></span>
<span id="cb12-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"4-bit quantized GPT-2:"</span>)</span>
<span id="cb12-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Naive numel() count:          </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>naive_count<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb12-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Correct parameter count:      </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>correct_total<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb12-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"    - Quantized (true count):   </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>quantized_params<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  (numel reports ~</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>quantized_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> due to packing)"</span>)</span>
<span id="cb12-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"    - Non-quantized:            </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>non_quantized_params<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  (numel reports correctly)"</span>)</span>
<span id="cb12-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Expected (GPT-2):             ~124,000,000"</span>)</span>
<span id="cb12-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  Sanity check: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>quantized_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (packed) + </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>non_quantized_params<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (unquantized) ≈ </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>naive_count<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (naive total) ✓"</span>)</span></code></pre></div></div>
</div>
</section>
<section id="whats-happening-under-the-hood" class="level3">
<h3 class="anchored" data-anchor-id="whats-happening-under-the-hood">What’s happening under the hood</h3>
<p>Let’s peek at an individual layer to see the difference between the stored tensor shape and the logical weight shape:</p>
<div id="1b42b8bd" class="cell" data-execution_count="13">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> name, param <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model_4bit.named_parameters():</span>
<span id="cb13-2">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">hasattr</span>(param, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quant_state"</span>):</span>
<span id="cb13-3">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Layer: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>name<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-4">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Storage tensor shape: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>param<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-5">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Storage dtype:        </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>param<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>dtype<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-6">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  numel() reports:      </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>param<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>numel()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-7">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Original shape:       </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>param<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>quant_state<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-8">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  Original numel:       </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>param<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>quant_state<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>shape<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>numel()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-9">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>()</span>
<span id="cb13-10">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">break</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># just show one example</span></span></code></pre></div></div>
</div>
<p>This confirms that quantization doesn’t remove parameters — it repacks them into a more compact representation. The model’s architecture and logical weight count are unchanged, but the storage is compressed.</p>
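<p>To see this counting logic without a GPU or bitsandbytes installed, here is a toy illustration using plain-Python mock objects (the <code>MockParam</code> and <code>MockQuantState</code> names are invented for this sketch; real bitsandbytes tensors behave analogously but are not these classes):</p>

```python
import math

# Toy stand-ins for bitsandbytes objects (invented for illustration):
# a 4-bit layer packs two values per uint8 byte, so its storage tensor
# holds half as many elements as the logical weight matrix.
class MockQuantState:
    def __init__(self, logical_shape):
        self.shape = logical_shape  # original (out_features, in_features)

class MockShape(tuple):
    def numel(self):
        return math.prod(self)

class MockParam:
    def __init__(self, logical_shape, quantized):
        if quantized:
            # packed storage: two 4-bit values per byte
            self._stored = math.prod(logical_shape) // 2
            self.quant_state = MockQuantState(MockShape(logical_shape))
        else:
            self._stored = math.prod(logical_shape)

    def numel(self):
        return self._stored

layer = MockParam((768, 3072), quantized=True)
print(layer.numel())                    # storage elements: 1,179,648
print(layer.quant_state.shape.numel())  # logical elements: 2,359,296
```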
</section>
<section id="summary" class="level3">
<h3 class="anchored" data-anchor-id="summary">Summary</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 45%">
<col style="width: 54%">
</colgroup>
<thead>
<tr class="header">
<th>What you check</th>
<th>What it tells you</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>p.numel()</code></td>
<td>Number of elements in the storage tensor (misleading for quantized params)</td>
</tr>
<tr class="even">
<td><code>p.quant_state.shape</code></td>
<td>The original logical shape of the weight matrix</td>
</tr>
<tr class="odd">
<td>GPU memory usage</td>
<td>The actual memory footprint — the metric that matters for fitting on your hardware</td>
</tr>
</tbody>
</table>
<p>The bottom line: quantization does not remove parameters. It changes how they’re stored. Always use <code>quant_state</code> or measure GPU memory directly if you want an accurate picture of a quantized model.</p>
</section>
</section>
<section id="part-5-when-to-quantize-and-how-aggressively" class="level2">
<h2 class="anchored" data-anchor-id="part-5-when-to-quantize-and-how-aggressively">Part 5: When to quantize (and how aggressively)</h2>
<p>Now that we’ve seen how quantization works mechanically, the natural next question is: when should you actually use it, and how far should you go?</p>
<section id="quantization-is-primarily-an-inference-technique" class="level3">
<h3 class="anchored" data-anchor-id="quantization-is-primarily-an-inference-technique">Quantization is primarily an inference technique</h3>
<p>Quantization shines at inference time. Training requires high-precision gradients to make stable updates to model weights, and aggressive quantization (8-bit or below) introduces too much noise for standard backpropagation to work well. For this reason, most models are trained at FP32, BF16, or with mixed-precision strategies (FP16 compute with FP32 accumulation), and then quantized after training for deployment.</p>
<p>The notable exception is <a href="https://arxiv.org/abs/2305.14314">QLoRA</a> (Dettmers et al., 2023), which freezes a 4-bit quantized base model and trains only small low-rank adapter (LoRA) layers in higher precision. This makes it possible to fine-tune a 65B-parameter model on a single 48GB GPU — but the base weights themselves are never updated in low precision.</p>
</section>
<section id="a-bigger-model-at-lower-precision-often-beats-a-smaller-model-at-full-precision" class="level3">
<h3 class="anchored" data-anchor-id="a-bigger-model-at-lower-precision-often-beats-a-smaller-model-at-full-precision">A bigger model at lower precision often beats a smaller model at full precision</h3>
<p>One of the most practical insights from quantization research: you can often get better results by running a larger model at 4-bit than a smaller model at FP16, using the same GPU memory. For example:</p>
<ul>
<li>A <strong>70B model at 4-bit</strong> (~35 GB) can fit on a single 48GB GPU and typically outperforms a <strong>13B model at FP16</strong> (~26 GB) on reasoning and knowledge benchmarks.</li>
<li>A <strong>13B model at 4-bit</strong> (~6.5 GB) fits comfortably on a 12GB consumer GPU and often outperforms a <strong>7B model at FP16</strong> (~14 GB, which wouldn’t even fit).</li>
</ul>
<p>The rule of thumb: <strong>spend your memory budget on more parameters first, then reduce precision to fit.</strong> A 4-bit model loses a small amount of quality from quantization, but it gains far more from having access to more learned knowledge and capacity.</p>
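<p>The arithmetic behind those numbers is easy to reproduce. A rough weight-only estimator (it ignores KV cache, activations, and quantization metadata such as scales and block constants, so real footprints run somewhat higher):</p>

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Rough weight-only memory estimate in GB; ignores KV cache,
    activations, and quantization metadata (scales, zero points)."""
    return n_params * bits_per_weight / 8 / 1e9

print(f"70B @ 4-bit:  {weight_memory_gb(70e9, 4):.1f} GB")   # 35.0 GB
print(f"13B @ 16-bit: {weight_memory_gb(13e9, 16):.1f} GB")  # 26.0 GB
print(f"13B @ 4-bit:  {weight_memory_gb(13e9, 4):.1f} GB")   # 6.5 GB
```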
</section>
<section id="speed-and-cost-quantization-isnt-just-about-fitting" class="level3">
<h3 class="anchored" data-anchor-id="speed-and-cost-quantization-isnt-just-about-fitting">Speed and cost: quantization isn’t just about fitting</h3>
<p>Quantization isn’t just about fitting a model onto your GPU — it also makes inference faster. Lower-precision operations use less memory bandwidth, and for small batch sizes (common in interactive applications), memory bandwidth is often the bottleneck. So a 4-bit model doesn’t just use about one-eighth the memory of FP32 — it can also generate tokens noticeably faster.</p>
<p>This creates a practical consideration:</p>
<ul>
<li>The question worth asking is not just “which model is most accurate?” but “which model gives me acceptable quality at the speed and cost I need?”</li>
<li>For batch applications (processing thousands of documents), the speed improvement from quantization can cut costs significantly.</li>
<li>For interactive applications (chatbots, coding assistants), faster token generation directly improves user experience.</li>
</ul>
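<p>A back-of-envelope model of why this matters: at batch size 1, every weight must be streamed through the GPU roughly once per generated token, so memory bandwidth divided by model size gives a ceiling on decoding speed. The bandwidth and model-size numbers below are illustrative assumptions, not benchmarks:</p>

```python
def decode_tokens_per_sec_ceiling(model_bytes, bandwidth_bytes_per_sec):
    """Bandwidth-bound ceiling on single-stream decoding speed:
    every weight byte is read about once per generated token."""
    return bandwidth_bytes_per_sec / model_bytes

bw = 2e12         # ~2 TB/s, roughly an A100-class GPU (assumed figure)
fp16_13b = 26e9   # 13B model at 16-bit
int4_13b = 6.5e9  # same model at 4-bit

print(f"FP16 ceiling:  {decode_tokens_per_sec_ceiling(fp16_13b, bw):.0f} tok/s")
print(f"4-bit ceiling: {decode_tokens_per_sec_ceiling(int4_13b, bw):.0f} tok/s")
# The 4-bit ceiling is 4x higher, purely from reading fewer bytes per token.
```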
</section>
<section id="how-low-can-you-go" class="level3">
<h3 class="anchored" data-anchor-id="how-low-can-you-go">How low can you go?</h3>
<p>Research suggests <strong>4-bit is the practical sweet spot</strong> for inference:</p>
<ul>
<li><a href="https://arxiv.org/abs/2212.09720">Dettmers &amp; Zettlemoyer (2023)</a> ran over 35,000 quantization experiments and found that 4-bit precision is nearly universally optimal when trading off total model bits against zero-shot accuracy. At 3-bit, quality degrades sharply.</li>
<li>8-bit quantization (via <a href="https://arxiv.org/abs/2208.07339">LLM.int8()</a>) is effectively lossless for most models — it’s a safe default when memory is tight but you don’t want to risk any quality loss.</li>
<li><a href="https://arxiv.org/abs/2210.17323">GPTQ</a> (Frantar et al., 2023) demonstrated that one-shot weight quantization to 3-4 bits is feasible even for 175B-parameter models with negligible accuracy loss, enabling single-GPU inference for models that otherwise require multiple GPUs.</li>
</ul>
<p>A recent study on <a href="https://arxiv.org/abs/2411.04330">Scaling Laws for Precision</a> (Kumar, Ankner et al., 2024) adds important nuance: <strong>the quality degradation from post-training quantization grows as models are trained on more data.</strong> A model trained to its full data budget may be more sensitive to aggressive quantization than one that was undertrained. This means the “safe” bit-width may shift upward as foundation models continue to scale their training data.</p>
</section>
<section id="decision-flowchart" class="level3">
<h3 class="anchored" data-anchor-id="decision-flowchart">Decision flowchart</h3>
<p>When deciding whether and how to quantize, work through these questions:</p>
<ol type="1">
<li><strong>Does the model fit on your GPU at FP16/BF16?</strong> If yes, start there — no quantization needed unless you want faster inference.</li>
<li><strong>Is this for training or inference?</strong>
<ul>
<li><em>Training from scratch</em>: Use BF16 (or FP32 if your GPU lacks BF16 support). Don’t quantize.</li>
<li><em>Fine-tuning</em>: If the full model doesn’t fit, use QLoRA (4-bit base + FP16 adapters).</li>
<li><em>Inference</em>: Continue to step 3.</li>
</ul></li>
<li><strong>How much quality loss can you tolerate?</strong>
<ul>
<li><em>None</em>: Use INT8. It’s effectively lossless for most models.</li>
<li><em>Minimal, with significant memory savings</em>: Use 4-bit NF4 with double quantization.</li>
<li><em>You need extreme compression</em>: Try 3-bit, but <strong>benchmark on your specific task</strong> — quality loss at 3-bit is sharp and task-dependent.</li>
</ul></li>
<li><strong>Could you run a larger model by quantizing more aggressively?</strong> Often the answer is yes. A 70B model at 4-bit typically beats a 13B model at 16-bit.</li>
</ol>
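<p>The decision steps above can be sketched as a small helper function (a paraphrase for illustration, not an official API; the labels and branch order are this sketch’s own simplification):</p>

```python
def recommend_precision(fits_at_16bit, task, quality_tolerance="minimal"):
    """Paraphrase of the decision steps above.
    task: 'train', 'finetune', or 'inference'
    quality_tolerance: 'none', 'minimal', or 'extreme'"""
    if task == "train":
        return "BF16 (or FP32 if BF16 unsupported); do not quantize"
    if task == "finetune":
        if fits_at_16bit:
            return "BF16/FP16 full fine-tune"
        return "QLoRA: 4-bit base + FP16 adapters"
    # inference
    if fits_at_16bit and quality_tolerance == "none":
        return "FP16/BF16, no quantization needed"
    return {
        "none": "INT8 (effectively lossless)",
        "minimal": "4-bit NF4 with double quantization",
        "extreme": "3-bit, but benchmark on your task",
    }[quality_tolerance]

print(recommend_precision(False, "inference", "minimal"))
# -> 4-bit NF4 with double quantization
```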
</section>
<section id="quick-reference" class="level3">
<h3 class="anchored" data-anchor-id="quick-reference">Quick reference</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 31%">
<col style="width: 68%">
</colgroup>
<thead>
<tr class="header">
<th>Scenario</th>
<th>Recommended precision</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Training from scratch</td>
<td>FP32 or BF16 (mixed precision)</td>
</tr>
<tr class="even">
<td>Fine-tuning (full)</td>
<td>BF16 or FP16</td>
</tr>
<tr class="odd">
<td>Fine-tuning (parameter-efficient on limited hardware)</td>
<td>QLoRA (4-bit base + FP16 adapters)</td>
</tr>
<tr class="even">
<td>Inference (quality-sensitive)</td>
<td>8-bit (INT8) — effectively lossless</td>
</tr>
<tr class="odd">
<td>Inference (memory/speed-constrained)</td>
<td>4-bit (NF4 or GPTQ) — slight quality loss, large memory/speed gain</td>
</tr>
<tr class="even">
<td>Inference (extreme compression)</td>
<td>3-bit or below — expect meaningful quality loss, benchmark carefully</td>
</tr>
<tr class="odd">
<td>Choosing between model sizes</td>
<td>Prefer larger model at lower precision over smaller model at full precision</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="cleanup" class="level2">
<h2 class="anchored" data-anchor-id="cleanup">Cleanup</h2>
<div id="d2355f02" class="cell" data-execution_count="14">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">del</span> model_4bit</span>
<span id="cb14-2">gc.collect()</span>
<span id="cb14-3">torch.cuda.empty_cache()</span>
<span id="cb14-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GPU memory cleared."</span>)</span></code></pre></div></div>
</div>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/2025-05-07_RAG-Romeo-Juliet.html"><strong>Notebook</strong>: Exploring Fact-Based QA with RAG: Romeo and Juliet</a>: A RAG tutorial that uses bitsandbytes for running quantized models in resource-constrained environments.</li>
<li><a href="https://huggingface.co/docs/transformers/main/en/quantization/bitsandbytes">Hugging Face: bitsandbytes integration</a>: Official documentation on using bitsandbytes with the <code>transformers</code> library.</li>
<li><a href="https://github.com/bitsandbytes-foundation/bitsandbytes">bitsandbytes on GitHub</a>: The bitsandbytes library for 8-bit and 4-bit quantization.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Notebooks</category>
  <category>Code-along</category>
  <category>Compute</category>
  <category>Deep learning</category>
  <category>LLM</category>
  <category>Quantization</category>
  <category>GPU</category>
  <category>Hugging Face</category>
  <category>PyTorch</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/Quantization-and-Precision.html</guid>
  <pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/bitsandbytes.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Intro to GCP for Machine Learning &amp; AI</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-GCP.html</link>
  <description><![CDATA[ 




<p>This <a href="https://qualiamachine.github.io/Intro_GCP_for_ML/">Intro to GCP</a> workshop teaches core workflows for building, training, and tuning ML/AI models in Google Cloud’s Vertex AI platform. Participants learn to set up data, configure Vertex AI Workbench notebooks, launch training and tuning jobs, and optimize resource costs effectively within GCP. The workshop also includes a section on building retrieval-augmented generation (RAG) pipelines using Gemini models.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>UW-Madison Cloud Users
</div>
</div>
<div class="callout-body-container callout-body">
<p>A personal GCP account is fine for this workshop. However, for <strong>long-term research use</strong>, we recommend switching to a <strong>UW-provisioned GCP account</strong>. You’ll get institutional pricing, <a href="https://rsp.wisc.edu/proposalprep/cloudComputeInfo.cfm">lower overhead on grants</a> (26% instead of 55.5% — saving ~$2,950 per $10k in cloud costs), data protection agreements (including BAA for HIPAA), and dedicated support from the <a href="https://kb.wisc.edu/page.php?id=109785">Public Cloud Team</a>. NIH-funded researchers can get additional discounts through the <a href="https://kb.wisc.edu/109813">STRIDES Initiative</a>. You can also <a href="https://edu.google.com/intl/ALL_us/programs/credits/research/">apply for $5,000 in Google Cloud Research Credits</a>.</p>
<p><strong><a href="https://kb.wisc.edu/data/100171">Request a UW GCP account</a></strong> | <strong><a href="https://kb.wisc.edu/page.php?id=109785">Why use a UW account?</a></strong> | <strong><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/UW-Cloud-Services.html">Full details: UW Cloud Services</a></strong></p>
</div>
</div>
<section id="cost-estimate" class="level4">
<h4 class="anchored" data-anchor-id="cost-estimate">Cost estimate</h4>
<p>Running through this workshop should cost approximately <strong>$3–$8</strong> on GCP, assuming short GPU runs and limited hyperparameter tuning trials. Using <code>n2-standard-4</code> or <code>e2-standard-4</code> instances with a single T4 GPU generally stays within this range. New accounts may be eligible for <strong>$300 in free GCP credits</strong>, which typically cover the full cost of this workshop. Track your usage in the <a href="https://console.cloud.google.com/billing">GCP Billing Console</a> and delete unused resources when you finish.</p>
</section>
<section id="prerequisites" class="level4">
<h4 class="anchored" data-anchor-id="prerequisites">Prerequisites</h4>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-ML_Sklearn.html"><strong>Workshop</strong>: Intro to Machine Learning</a></li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Python_Gapminder.html"><strong>Workshop</strong>: Basic Python Programming</a></li>
</ul>
</section>
<section id="estimated-time-to-complete" class="level4">
<h4 class="anchored" data-anchor-id="estimated-time-to-complete">Estimated time to complete</h4>
<p><strong>4–6 hours</strong>: Based on running through training, tuning, and the Gemini RAG pipeline example.</p>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/UW-Cloud-Services.html"><strong>Compute</strong>: UW-Madison Cloud Services (AWS, GCP, Azure)</a> – Institutional discounts, lower grant overhead, data protections, research credits, and how to request a UW cloud account.</li>
<li><a href="https://kb.wisc.edu/101516">Public Cloud Team Office Hours</a> – Drop-in hours on Thursdays, 2–3:15 PM via Zoom. Get answers to cloud-related questions from the RCI and Public Cloud Team.</li>
<li><a href="https://cloud.google.com/free">GCP Free Tier</a>: Overview of free-tier limits and credits for new users.</li>
<li><a href="https://edu.google.com/intl/ALL_us/programs/credits/research/">Google Cloud Research Credits</a>: Apply for up to $5,000 in GCP credits for research.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/BadgerCompute.html"><strong>Compute</strong>: BadgerCompute</a> – UW–Madison’s lightweight, NetID-authenticated Jupyter service for short interactive sessions and classroom use. Sessions are capped at 4 hours of runtime (often more generous than free-tier Colab).</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html"><strong>Compute</strong>: Google Colab</a> – Learn how to use Google Colab for machine learning workflows.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html"><strong>Compute</strong>: Center for High Throughput Computing (CHTC)</a> – Learn how to use CHTC for machine learning jobs.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html"><strong>Compute</strong>: AWS SageMaker</a> – Parallel workshop covering similar cloud ML concepts using AWS infrastructure.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Workshops</category>
  <category>Code-along</category>
  <category>Carpentries</category>
  <category>Compute</category>
  <category>Cloud</category>
  <category>Google</category>
  <category>GCP</category>
  <category>GPU</category>
  <category>LLM</category>
  <category>RAG</category>
  <category>Retrieval</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-GCP.html</guid>
  <pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/Google-Cloud-Logo.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>UW-Madison Cloud Services (AWS, GCP, Azure)</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/UW-Cloud-Services.html</link>
  <description><![CDATA[ 




<p>UW-Madison offers enterprise cloud computing through contracts with <strong>Amazon Web Services (AWS)</strong>, <strong>Google Cloud Platform (GCP)</strong>, and <strong>Microsoft Azure</strong>. These services are managed by the <a href="https://kb.wisc.edu/page.php?id=109785">UW Public Cloud Team</a>, a cross-disciplinary group of operations, cybersecurity, and research cyberinfrastructure (RCI) professionals.</p>
<p>Using a UW-provisioned cloud account — rather than a personal one — gives you access to institutional pricing discounts, lower overhead on grants, data protection agreements, security monitoring, and dedicated support. If you’re doing any research or university work in the cloud, start here.</p>
<section id="why-run-mlai-in-the-cloud" class="level2">
<h2 class="anchored" data-anchor-id="why-run-mlai-in-the-cloud">Why run ML/AI in the cloud?</h2>
<p>You have ML/AI code that works on your laptop. But at some point you need more — a bigger GPU (or several), a dataset that won’t fit on disk, or the ability to run dozens of training experiments overnight. You could invest in local hardware or compete for time on a shared HPC cluster, but cloud platforms let you rent exactly the hardware you need, for exactly as long as you need it, and then shut it down.</p>
<section id="cloud-vs.-university-hpc-clusters" class="level3">
<h3 class="anchored" data-anchor-id="cloud-vs.-university-hpc-clusters">Cloud vs.&nbsp;university HPC clusters</h3>
<p>Most universities offer shared HPC clusters with GPUs. These are excellent resources — but they have tradeoffs worth understanding:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 22%">
<col style="width: 41%">
<col style="width: 36%">
</colgroup>
<thead>
<tr class="header">
<th>Factor</th>
<th>University HPC</th>
<th>Cloud (AWS, GCP, Azure)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Cost</strong></td>
<td>Free or subsidized</td>
<td>Pay per hour</td>
</tr>
<tr class="even">
<td><strong>GPU availability</strong></td>
<td>Shared queue; wait times during peak periods and per-job runtime limits (often 24–72 hrs) that may require checkpointing long training runs</td>
<td>On-demand (subject to quota); jobs run as long as needed</td>
</tr>
<tr class="odd">
<td><strong>Hardware variety</strong></td>
<td>Fixed hardware refresh cycle (3–5 years)</td>
<td>Latest GPUs available immediately (A100, H100, B200)</td>
</tr>
<tr class="even">
<td><strong>Scaling</strong></td>
<td>Limited by cluster size</td>
<td>Spin up hundreds of jobs in parallel</td>
</tr>
<tr class="odd">
<td><strong>Multi-GPU / NVLink</strong></td>
<td>Sometimes available, depends on cluster</td>
<td>Available on demand — essential for training, fine-tuning, or serving large LLMs that don’t fit in a single GPU’s memory</td>
</tr>
<tr class="even">
<td><strong>Job orchestration</strong></td>
<td>Writing scheduler scripts, packaging environments, and wiring up parallel job arrays can take significant refactoring</td>
<td>Managed ML platforms (Vertex AI, SageMaker, Azure ML) handle provisioning, parallelism, and teardown</td>
</tr>
<tr class="odd">
<td><strong>Software environment</strong></td>
<td>Module system; some clusters support containers — research computing staff can often help with setup</td>
<td>Prebuilt containers for common ML frameworks (PyTorch, TensorFlow, XGBoost); bring your own Docker image for full control</td>
</tr>
</tbody>
</table>
<p><strong>The short version:</strong> use your university cluster when it has the hardware you need and the queue isn’t blocking you. Use the cloud when you need hardware your cluster doesn’t have, need to scale beyond what the queue allows, or need a specific software environment you can’t easily get on campus. Many researchers use both — develop and test on HPC, then scale to cloud for large experiments or specialized hardware.</p>
</section>
<section id="when-does-model-size-justify-cloud-compute" class="level3">
<h3 class="anchored" data-anchor-id="when-does-model-size-justify-cloud-compute">When does model size justify cloud compute?</h3>
<p>Not every model needs cloud hardware. Here’s a rough guide:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 24%">
<col style="width: 20%">
<col style="width: 29%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>Model scale</th>
<th>Parameters</th>
<th>Example models</th>
<th>Where to run</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Small</td>
<td>&lt; 10M</td>
<td>Logistic regression, small CNNs, XGBoost</td>
<td>Laptop — HPC or cloud adds overhead without much benefit</td>
</tr>
<tr class="even">
<td>Medium</td>
<td>10M–500M</td>
<td>ResNets, BERT-base, mid-sized transformers</td>
<td>HPC with a single GPU (RTX 2080 Ti, L40) or cloud (T4, L4)</td>
</tr>
<tr class="odd">
<td>Large</td>
<td>500M–10B</td>
<td>GPT-2, LLaMA-7B, fine-tuning large transformers</td>
<td>HPC with A100 (40/80 GB) or cloud — both work well</td>
</tr>
<tr class="even">
<td>Very large</td>
<td>10B–70B</td>
<td>LLaMA-70B, Mixtral</td>
<td>HPC with H100/H200 (80–141 GB) or cloud</td>
</tr>
<tr class="odd">
<td>Frontier</td>
<td>70B+</td>
<td>GPT-4-scale, multi-expert models</td>
<td>Cloud — requires multi-node NVLink clusters beyond what most HPC queues offer</td>
</tr>
</tbody>
</table>
<p><strong>CHTC’s <a href="https://chtc.cs.wisc.edu/uw-research-computing/gpu-lab">GPU Lab</a> covers more than you might think.</strong> The GPU Lab includes A100s (40 and 80 GB), H100s (80 GB), and H200s (141 GB) — enough VRAM to run inference on models up to ~70B parameters with quantization, or to fine-tune smaller models on a single high-memory GPU. For many UW researchers, this hardware handles “large model” workloads without needing cloud. Note that CHTC GPUs are not NVLink-connected, so multi-GPU parallelism is limited to methods that don’t require fast inter-GPU communication. Jobs have time limits (12 hrs for short, 24 hrs for medium, 7 days for long jobs), so plan your runs accordingly.</p>
<p>Cloud becomes the clear choice when you need NVLink multi-GPU or multi-node setups for frontier-scale training or inference, long-running services like RAG applications or model endpoints that need to stay up beyond HPC job time limits, or when queue wait times are blocking a deadline.</p>
</section>
<section id="llm-apis-skip-the-infrastructure-entirely" class="level3">
<h3 class="anchored" data-anchor-id="llm-apis-skip-the-infrastructure-entirely">LLM APIs: skip the infrastructure entirely</h3>
<p>For many GenAI tasks, you don’t need to provision GPUs at all. Services like the OpenAI API, Google’s Vertex AI, and Amazon Bedrock let you call frontier models (GPT-4o, Gemini, Claude, etc.) with a simple API request — no GPU provisioning, no model hosting. LLM API calls cost fractions of a cent each and are often the fastest, most cost-effective path. See <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/GenAI-at-UW-Madison.html">GenAI at UW-Madison</a> for available services.</p>
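<p>To make “fractions of a cent” concrete, per-call cost is just <em>tokens × price per token</em>. A quick sketch (the per-million-token prices below are hypothetical placeholders; check your provider’s current pricing page):</p>

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one LLM API call, given per-million-token prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# e.g. a 500-token prompt with a 300-token reply at
# $0.15 / $0.60 per million tokens (illustrative rates)
cost = call_cost_usd(500, 300, 0.15, 0.60)
print(f"${cost:.6f} per call")
```

<p>At rates like these, even thousands of calls cost only a few dollars, which is why the API route is often cheaper than provisioning a GPU for inference.</p>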
</section>
<section id="a-note-on-cloud-costs" class="level3">
<h3 class="anchored" data-anchor-id="a-note-on-cloud-costs">A note on cloud costs</h3>
<p>Cloud computing is not free, but it’s worth putting costs in context:</p>
<ul>
<li><strong>Hardware is expensive and ages fast.</strong> A single A100 GPU costs ~$15,000 and is outdated within a few years. Cloud lets you rent the latest hardware by the hour.</li>
<li><strong>You pay only for what you use.</strong> Stop a VM and the meter stops — valuable for bursty research workloads. A single T4 GPU instance runs ~$1–3/hr. Fine-tuning a small model on a moderate dataset might cost $10–50.</li>
<li><strong>Managed services save development time.</strong> You don’t have to write scheduling logic, package custom containers, or maintain orchestration infrastructure — managed ML platforms handle that plumbing so you can focus on the ML.</li>
<li><strong>Budgets and alerts keep you safe.</strong> All three platforms offer billing dashboards and budget alerts to prevent surprise bills.</li>
</ul>
<p>The key habit: choose the right machine size, stop resources when idle, and monitor spending.</p>
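<p>The rent-vs-buy trade-off in the first bullet reduces to a break-even calculation. Using the rough figures quoted above (~$15,000 purchase price, ~$3/hr rental; both are approximations, and the calculation ignores power, hosting, and resale value):</p>

```python
def break_even_hours(purchase_price_usd: float,
                     rental_rate_per_hr: float) -> float:
    """GPU-hours at which renting costs as much as buying outright."""
    return purchase_price_usd / rental_rate_per_hr

hours = break_even_hours(15_000, 3.0)
print(f"Break-even after {hours:,.0f} GPU-hours "
      f"(~{hours / 24 / 365:.1f} years of continuous use)")
```

<p>For bursty research workloads that run well below continuous utilization, renting usually wins; for a lab running GPUs around the clock for years, owned or shared hardware can come out ahead.</p>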
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Cloud isn’t the right fit for every workload. If you want to avoid cloud costs, UW’s <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html">CHTC</a> offers free GPU access for batch jobs (though jobs are queued and have runtime limits). Many researchers use a mix of both.</p>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>There is a learning curve, as with any new tool. But UW-developed workshop materials are available to help you get started — see the Related resources at the bottom of this page.</p>
</div>
</div>
</section>
</section>
<section id="why-use-a-uw-provisioned-account" class="level2">
<h2 class="anchored" data-anchor-id="why-use-a-uw-provisioned-account">Why use a UW-provisioned account?</h2>
<p>A self-provisioned cloud account (e.g., one you create directly with Google or AWS) is a personal agreement between you and the vendor — it is <strong>not</strong> covered by UW-Madison’s institutional contracts. By going through the UW Public Cloud Team, you get:</p>
<ul>
<li><strong>Negotiated pricing</strong>: UW contracts leverage <a href="https://internet2.edu/cloud/cloud-solutions-community/net-plus/">Internet2 NET+</a> agreements and institutional reseller rates. For example, GCP accounts include a <a href="https://kb.wisc.edu/100173">network egress waiver</a> (up to 15% of your total bill), and Azure accounts receive ~3.5% off retail pricing.</li>
<li><strong>Lower overhead on grants</strong>: Normally, UW adds 55.5% in overhead (F&amp;A) to cloud expenses on grants. With a UW cloud account, that drops to 26% — so for every $10,000 you spend on cloud computing, you save about $2,950 in overhead. See the <a href="https://rsp.wisc.edu/proposalprep/cloudComputeInfo.cfm">Cloud Computing Pilot</a> for details.</li>
<li><strong>NIH STRIDES discounts</strong>: NIH-funded researchers get additional cloud pricing discounts (on top of the UW contract rates) through the <a href="https://kb.wisc.edu/109813">STRIDES Initiative</a>. The UW cloud team can transition you into or out of STRIDES at any time — no data migration needed.</li>
<li><strong>Business Associates Agreement (BAA)</strong>: UW’s contracts include a BAA that governs vendor access to your data, which is critical for HIPAA-regulated health data.</li>
<li><strong>Security monitoring</strong>: UW accounts benefit from Security Command Center monitoring with alerts escalated to the UW Cybersecurity Operations Team (CSOC).</li>
<li><strong>Baseline security configuration</strong>: Accounts come pre-configured to meet <a href="https://www.cisecurity.org/cis-benchmarks">CIS benchmark</a> standards with NetID authentication built in.</li>
<li><strong>Dedicated support</strong>: Get help from the DoIT Cloud Team via email (<a href="mailto:cloud-services@cio.wisc.edu">cloud-services@cio.wisc.edu</a>), <a href="https://kb.wisc.edu/101516">office hours</a>, and in-person/video consultations.</li>
</ul>
<p>For the full breakdown, see <a href="https://kb.wisc.edu/page.php?id=109785">Why Should I Use a UW Madison Public Cloud Account?</a> on the UW KnowledgeBase.</p>
</section>
<section id="paying-for-cloud-compute-with-grant-money" class="level2">
<h2 class="anchored" data-anchor-id="paying-for-cloud-compute-with-grant-money">Paying for cloud compute with grant money</h2>
<p>If you’re using grant funding to pay for cloud compute — from NIH, NSF, DOE, or any other sponsor — a UW-provisioned account can significantly reduce what your grant actually pays.</p>
<section id="lower-overhead-cloud-computing-pilot" class="level3">
<h3 class="anchored" data-anchor-id="lower-overhead-cloud-computing-pilot">Lower overhead (Cloud Computing Pilot)</h3>
<p>UW-Madison normally adds 55.5% in overhead (formally called “F&amp;A” or “facilities &amp; administrative costs”) to cloud expenses on grants. The <a href="https://rsp.wisc.edu/proposalprep/cloudComputeInfo.cfm">Cloud Computing Pilot</a> cuts that to 26% when you use a UW-provisioned cloud account. In practice, that means for every $10,000 in cloud spending, you’ll pay ~$2,600 in overhead instead of ~$5,550 — a savings of about $2,950.</p>
<ul>
<li>Applies to new proposals and awards (including new funding increments).</li>
<li>You must use a UW cloud account — costs paid via purchasing card or personal accounts are charged the full 55.5%.</li>
<li>RSP provides <a href="https://rsp.wisc.edu/proposalprep/cloudComputeInfo.cfm">budget templates</a> to help you plan proposals with the reduced rate.</li>
<li>Contact <a href="https://rsp.wisc.edu/">RSP</a> with questions about grant compliance.</li>
</ul>
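<p>The overhead arithmetic above is easy to check for any budget. A small sketch using the two F&amp;A rates quoted by RSP:</p>

```python
def overhead_savings(cloud_spend_usd: float,
                     full_rate: float = 0.555,
                     pilot_rate: float = 0.26):
    """F&A charged at the standard rate vs. the Cloud Computing Pilot rate."""
    full = cloud_spend_usd * full_rate
    pilot = cloud_spend_usd * pilot_rate
    return full, pilot, full - pilot

full, pilot, saved = overhead_savings(10_000)
print(f"Standard F&A: ${full:,.0f}, pilot F&A: ${pilot:,.0f}, "
      f"saved: ${saved:,.0f}")
```

<p>For $10,000 in cloud spend this reproduces the ~$5,550 vs.&nbsp;~$2,600 figures above, a savings of about $2,950.</p>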
</section>
<section id="nih-strides-initiative" class="level3">
<h3 class="anchored" data-anchor-id="nih-strides-initiative">NIH STRIDES Initiative</h3>
<p>If you have NIH funding specifically, you can get additional cloud discounts on top of the standard UW rates through the <a href="https://kb.wisc.edu/109813">STRIDES Initiative</a>. STRIDES covers AWS, GCP, and Azure:</p>
<ul>
<li>Discounted pricing on cloud services, layered on top of UW’s institutional rates.</li>
<li>Professional service consultations and technical support from STRIDES partners.</li>
<li>No data or configuration changes needed — the UW cloud team can transition you in or out at any time.</li>
</ul>
</section>
</section>
<section id="how-to-request-a-uw-cloud-account" class="level2">
<h2 class="anchored" data-anchor-id="how-to-request-a-uw-cloud-account">How to request a UW cloud account</h2>
<p>To get started with any of the three platforms:</p>
<ol type="1">
<li><strong>Get a DoIT Billing Customer ID</strong> — you’ll need this to tie your cloud usage to a funding source.</li>
<li><strong>Fill out the <a href="https://kb.wisc.edu/sbsedirbs/page.php?id=104090">UW-Madison Cloud Account Request Form</a></strong> — this covers AWS, GCP, and Azure. Indicate your intended data types and use case.</li>
<li><strong>For sensitive/restricted data</strong> — you must complete a <a href="https://kb.wisc.edu/115296">Cybersecurity risk assessment</a> before processing HIPAA, FERPA, or other regulated data in the cloud.</li>
</ol>
<p>Platform-specific details:</p>
<ul>
<li><a href="https://it.wisc.edu/services/amazon-web-services/">AWS service page</a> | <a href="https://kb.wisc.edu/data/page.php?id=65532">AWS pricing &amp; billing FAQ</a></li>
<li><a href="https://it.wisc.edu/services/google-cloud-platform/">GCP service page</a> | <a href="https://kb.wisc.edu/100173">GCP pricing</a> | <a href="https://kb.wisc.edu/data/100171">Requesting a GCP project</a></li>
<li><a href="https://it.wisc.edu/services/microsoft-azure/">Azure service page</a> | <a href="https://kb.wisc.edu/69212">Azure pricing</a></li>
</ul>
</section>
<section id="research-credits-training" class="level2">
<h2 class="anchored" data-anchor-id="research-credits-training">Research credits &amp; training</h2>
<section id="research-credits" class="level3">
<h3 class="anchored" data-anchor-id="research-credits">Research credits</h3>
<p>All three cloud providers offer credit programs for academic researchers:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Platform</th>
<th>Program</th>
<th>Amount</th>
<th>Eligibility</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>GCP</strong></td>
<td><a href="https://edu.google.com/intl/ALL_us/programs/credits/research/">Cloud Research Credits</a></td>
<td>Up to $5,000 (faculty/postdocs); $1,000 (PhD students)</td>
<td>Faculty, postdocs, non-profit researchers, PhD students</td>
</tr>
<tr class="even">
<td><strong>AWS</strong></td>
<td><a href="https://aws.amazon.com/government-education/research-and-technical-computing/cloud-credit-for-research/">Cloud Credit for Research</a></td>
<td>Varies by proposal</td>
<td>Researchers at accredited institutions; students may receive up to $5,000</td>
</tr>
<tr class="odd">
<td><strong>Azure</strong></td>
<td><a href="https://www.microsoft.com/en-us/azure-academic-research/">Azure for Research</a></td>
<td>Varies by proposal</td>
<td>Faculty, researchers, and graduate students at accredited institutions</td>
</tr>
<tr class="even">
<td><strong>Azure</strong></td>
<td><a href="https://microsoft.qualtrics.com/jfe/form/SV_3fl9dfFrkC3g0aG?aq_source=acom">Azure Quantum Credits</a></td>
<td>Up to $10,000</td>
<td>Project-by-project basis; evaluated on research, educational, or commercial value</td>
</tr>
</tbody>
</table>
<p>All three programs are rolling applications. You’ll need a research proposal describing your intended cloud usage and the specific services you plan to use.</p>
</section>
<section id="grants-for-social-impact-sustainability-research" class="level3">
<h3 class="anchored" data-anchor-id="grants-for-social-impact-sustainability-research">Grants for social impact &amp; sustainability research</h3>
<p>The major cloud providers also offer larger grants for research focused on public good — sustainability, environmental science, public health, education, and underserved communities:</p>
<ul>
<li><strong>Google</strong>: The <a href="https://opportunitydesk.org/2026/02/25/google-org-impact-challenge-ai-for-science-2026/">Google.org Impact Challenge: AI for Science</a> awards $500K–$3M for projects using AI to tackle scientific challenges, with a specific focus area on climate resilience and environmental science (biodiversity, agriculture, oceans). Applications open through April 17, 2026.</li>
<li><strong>AWS</strong>: The <a href="https://aws.amazon.com/government-education/nonprofits/aws-imagine-grant-program/">AWS Imagine Grant</a> provides up to $200K in unrestricted funding plus AWS credits to nonprofits and research organizations working on social impact. Past winners include sustainability, public health, and underserved community projects. The 2026–2027 cycle opens spring 2026.</li>
<li><strong>Microsoft</strong>: The <a href="https://www.microsoft.com/en-us/research/academic-program/ai-for-good-lab-open-call/">AI for Good Lab</a> runs open calls awarding Azure credits and scientific collaboration for projects in sustainability, public health, education, and human rights. Academic institutions are eligible. Microsoft also offers free access to petabytes of environmental data through the <a href="https://planetarycomputer.microsoft.com/">Planetary Computer</a>.</li>
</ul>
</section>
<section id="free-cloud-training" class="level3">
<h3 class="anchored" data-anchor-id="free-cloud-training">Free cloud training</h3>
<p>Each platform offers free, self-paced training to help you get started:</p>
<ul>
<li><strong>GCP</strong>: UW-Madison has a limited number of seats for <a href="https://www.cloudskillsboost.google/">Google Cloud Skills Boost</a> — reach out to the Public Cloud Team at <a href="mailto:cloud-services@cio.wisc.edu">cloud-services@cio.wisc.edu</a> to request access.</li>
<li><strong>AWS</strong>: <a href="https://skillbuilder.aws/">AWS Skill Builder</a> offers 600+ free courses covering compute, ML, and more.</li>
<li><strong>Azure</strong>: <a href="https://learn.microsoft.com/en-us/training/azure/">Microsoft Learn</a> provides free, structured learning paths for Azure services.</li>
</ul>
</section>
</section>
<section id="data-protection-compliance" class="level2">
<h2 class="anchored" data-anchor-id="data-protection-compliance">Data protection &amp; compliance</h2>
<p>UW-Madison classifies institutional data into four risk categories: <strong>Restricted, Sensitive, Internal, and Public</strong>. Cloud eligibility depends on data classification:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Data type</th>
<th>Cloud eligible?</th>
<th>Requirements</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Public / Internal</td>
<td>Yes</td>
<td>Standard UW cloud account</td>
</tr>
<tr class="even">
<td>Sensitive</td>
<td>Yes, with assessment</td>
<td><a href="https://kb.wisc.edu/115296">Cybersecurity risk assessment</a> required</td>
</tr>
<tr class="odd">
<td>Restricted (HIPAA, etc.)</td>
<td>Yes, with assessment</td>
<td>Risk assessment + risk executive approval + HIPAA-eligible services</td>
</tr>
</tbody>
</table>
<p>Key compliance resources:</p>
<ul>
<li><a href="https://kb.wisc.edu/itpolicy/page.php?id=59205">Data classification policy</a></li>
<li><a href="https://kb.wisc.edu/100124">Data elements allowed in public cloud</a></li>
<li><a href="https://kb.wisc.edu/115296">GCP for sensitive and restricted data</a></li>
<li><a href="https://kb.wisc.edu/data/page.php?id=115300">Shared responsibility model for cloud platforms</a></li>
<li><a href="https://it.wisc.edu/about/division-of-information-technology/enterprise-information-security-services/office-of-cybersecurity/hipaa-security-program/">HIPAA Security Program</a></li>
<li>SMPH researchers using Azure: contact <a href="mailto:platformx-support@mailplus.wisc.edu">platformx-support@mailplus.wisc.edu</a> about <a href="https://it.wisc.edu/services/microsoft-azure/">Platform X</a> for HIPAA workloads.</li>
</ul>
</section>
<section id="getting-help" class="level2">
<h2 class="anchored" data-anchor-id="getting-help">Getting help</h2>
<ul>
<li><strong>Office hours</strong>: The RCI and Public Cloud Team hold drop-in hours on <strong>Thursdays, 2–3:15 PM</strong> via <a href="https://kb.wisc.edu/101516">Zoom</a>. Open to the entire UW community.</li>
<li><strong>Cloud Community</strong>: Join the <a href="https://it.wisc.edu/research-ci/building-cloud-community-at-uw-madison/">UW Cloud Community</a> group — they meet every other month to share cloud computing experiences and tips.</li>
<li><strong>Email</strong>: <a href="mailto:cloud-services@cio.wisc.edu">cloud-services@cio.wisc.edu</a></li>
<li><strong>Public Cloud KnowledgeBase</strong>: <a href="https://kb.wisc.edu/page.php?id=109785">kb.wisc.edu</a> — FAQs, pricing info, and how-to guides.</li>
<li><strong>ML+X Community</strong>: Join <a href="https://hub.datascience.wisc.edu/communities/mlx/">ML+X</a> for monthly meetings on machine learning and AI.</li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-GCP.html"><strong>Workshop/Compute</strong>: Intro to GCP for Machine Learning &amp; AI</a> – Hands-on workshop covering Vertex AI, model training/tuning, and RAG pipelines on GCP.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html"><strong>Workshop/Compute</strong>: Intro to AWS SageMaker for Predictive ML/AI</a> – Workshop covering ML workflows in AWS SageMaker.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html"><strong>Compute</strong>: Google Colab</a> – Free cloud-based Jupyter notebooks with GPU access.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html"><strong>Compute</strong>: Center for High Throughput Computing (CHTC)</a> – Free on-campus HPC/HTC resources for UW researchers.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/BadgerCompute.html"><strong>Compute</strong>: BadgerCompute</a> – UW-Madison’s lightweight, NetID-authenticated Jupyter service.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/GenAI-at-UW-Madison.html"><strong>GenAI</strong>: UW Generative AI Services &amp; Policies</a> – Overview of UW-vetted AI tools including pay-as-you-go cloud AI services.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Compute</category>
  <category>Cloud</category>
  <category>UW-Madison</category>
  <category>AWS</category>
  <category>GCP</category>
  <category>Azure</category>
  <category>GPU</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/UW-Cloud-Services.html</guid>
  <pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/uw-logo-vertical-color-web-digital.png" medium="image" type="image/png" height="95" width="144"/>
</item>
<item>
  <title>SWE-bench: Evaluating AI on Real-World Software Engineering</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Benchmarks/SWE-Bench.html</link>
  <description><![CDATA[ 




<p><a href="https://www.swebench.com/">SWE-bench</a> is a benchmark designed to evaluate whether AI models can solve real-world software engineering tasks. Rather than testing code generation in isolation, SWE-bench presents models with actual GitHub issues from popular open-source Python repositories and asks them to produce a patch that resolves the issue and passes the associated test suite.</p>
<p>The benchmark was introduced in the paper <a href="https://arxiv.org/abs/2310.06770"><em>SWE-bench: Can Language Models Resolve Real-World GitHub Issues?</em></a> by Carlos E. Jimenez et al.&nbsp;at Princeton University (released on arXiv in 2023 and published at ICLR 2024).</p>
<section id="how-it-works" class="level2">
<h2 class="anchored" data-anchor-id="how-it-works">How it works</h2>
<p>Each SWE-bench task consists of:</p>
<ul>
<li><strong>A GitHub issue description</strong> — the natural-language problem statement as written by the original issue author.</li>
<li><strong>A codebase snapshot</strong> — the state of the repository at the time the issue was filed.</li>
<li><strong>A gold patch and test suite</strong> — the model’s output is evaluated by checking whether it passes the same tests used to validate the human-authored fix.</li>
</ul>
<p>Models are scored on <strong>% resolved</strong> — the fraction of issues where the generated patch passes the full test suite. This makes SWE-bench more rigorous than benchmarks that only check if code compiles or passes a single test case.</p>
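<p>The scoring rule is simple to state in code. A minimal sketch of the %-resolved metric (the result structure here is illustrative, not the official evaluation harness):</p>

```python
def percent_resolved(results: list[dict]) -> float:
    """Fraction of tasks whose generated patch passed the full test suite.

    Each result dict carries a boolean 'passed' flag set by running the
    task's test suite against the model's patch.
    """
    if not results:
        return 0.0
    return 100 * sum(r["passed"] for r in results) / len(results)

runs = [{"task": "django-101", "passed": True},
        {"task": "sympy-7", "passed": False},
        {"task": "sklearn-42", "passed": True}]
print(f"{percent_resolved(runs):.1f}% resolved")
```

<p>Because the pass/fail signal comes from the repository’s own test suite, a patch that merely compiles or handles one happy path still scores zero.</p>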
</section>
<section id="swe-bench-verified" class="level2">
<h2 class="anchored" data-anchor-id="swe-bench-verified">SWE-bench Verified</h2>
<p>The original SWE-bench dataset contains 2,294 tasks, but not all of them are well-specified or reliably solvable. To address this, <a href="https://openai.com/index/introducing-swe-bench-verified/">OpenAI collaborated with the SWE-bench team</a> to create <strong>SWE-bench Verified</strong> — a human-filtered subset of 500 tasks where annotators confirmed that:</p>
<ul>
<li>The issue description contains enough information to identify the problem.</li>
<li>The test suite reliably validates correct solutions.</li>
<li>The task is not ambiguous or under-specified.</li>
</ul>
<p>SWE-bench Verified is now the standard subset used for most leaderboard comparisons.</p>
</section>
<section id="current-state-of-the-leaderboard-early-2025" class="level2">
<h2 class="anchored" data-anchor-id="current-state-of-the-leaderboard-early-2025">Current state of the leaderboard (early 2026)</h2>
<p>On the <a href="https://www.swebench.com/#test">Bash Only leaderboard</a> — which evaluates all models on SWE-bench Verified using the same shell-based interface — the top models are resolving around 74% of issues:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>% Resolved (Verified)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Claude 4.5 Opus (medium)</td>
<td>74.40%</td>
</tr>
<tr class="even">
<td>Gemini 3 Pro Preview</td>
<td>74.20%</td>
</tr>
<tr class="odd">
<td>Claude 4.5 Sonnet</td>
<td>70.60%</td>
</tr>
<tr class="even">
<td>Claude 4 Opus (May 2025)</td>
<td>67.60%</td>
</tr>
<tr class="odd">
<td>GPT-5 (medium reasoning)</td>
<td>65.00%</td>
</tr>
</tbody>
</table>
<p>These numbers have been climbing quickly — for context, the best scores were around 50% in late 2024.</p>
</section>
<section id="interpreting-the-results" class="level2">
<h2 class="anchored" data-anchor-id="interpreting-the-results">Interpreting the results</h2>
<p>It’s tempting to read “74% resolved” as meaning AI can fix 74% of real-world software bugs, but several important caveats apply:</p>
<ul>
<li><strong>Curated subset</strong>: SWE-bench Verified deliberately filters out ambiguous, under-documented, or hard-to-test issues. Real-world GitHub issues are messier.</li>
<li><strong>Issue specification quality</strong>: In practice, much of the difficulty in software engineering lies in understanding vague requirements, reproducing bugs, and navigating large unfamiliar codebases. SWE-bench tasks are relatively well-scoped.</li>
<li><strong>Single-repo Python focus</strong>: The benchmark currently draws from a set of well-maintained Python libraries (e.g., Django, scikit-learn, sympy). Generalization to other languages, less-documented codebases, or proprietary software is an open question.</li>
<li><strong>No deployment or integration testing</strong>: SWE-bench tests whether a patch passes unit/integration tests, not whether it would be accepted in a real code review or function correctly at scale.</li>
</ul>
<section id="the-self-driving-car-analogy" class="level3">
<h3 class="anchored" data-anchor-id="the-self-driving-car-analogy">The self-driving car analogy</h3>
<p>The trajectory of SWE-bench scores is reminiscent of autonomous driving predictions circa 2015–2017, when rapid progress on structured benchmarks led many companies to predict full autonomy was just a year or two away. A decade later, the long tail of edge cases turned out to be the hardest part.</p>
<p>Similarly, while the pace of improvement on SWE-bench is genuinely impressive, the remaining 25–30% of unresolved issues — and the much larger space of tasks not captured by the benchmark — may prove disproportionately difficult. Benchmarks measure a specific, well-defined slice of capability, and the gap between benchmark performance and reliable, general-purpose software engineering likely remains significant.</p>
</section>
</section>
<section id="why-it-matters" class="level2">
<h2 class="anchored" data-anchor-id="why-it-matters">Why it matters</h2>
<p>Despite these caveats, SWE-bench provides a useful signal for tracking progress in AI-assisted software engineering. It tests end-to-end problem-solving (reading an issue, understanding a codebase, writing a correct fix) rather than narrow code completion, making it one of the more meaningful benchmarks for evaluating practical coding ability.</p>
<p>For researchers and practitioners in ML, SWE-bench offers:</p>
<ul>
<li><strong>A rough barometer</strong> for how quickly AI coding capabilities are improving.</li>
<li><strong>A reality check</strong> on what “AI can code” actually means today — useful for calibrating expectations when adopting AI tools.</li>
<li><strong>An evaluation framework</strong> that can be adapted for domain-specific benchmarks (e.g., testing AI on bioinformatics pipelines or data analysis workflows).</li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/GenAI-at-UW-Madison.html"><strong>GenAI</strong>: GenAI at UW-Madison</a>: Overview of generative AI tools and resources available at UW-Madison.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Benchmarking</category>
  <category>Agentic coding</category>
  <category>LLM</category>
  <category>GenAI</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Benchmarks/SWE-Bench.html</guid>
  <pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/SWE_bench.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>OpenScholar: Scientific Literature Synthesis with Retrieval-Augmented LMs</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/OpenScholar.html</link>
  <description><![CDATA[ 




<p><a href="https://openscholar.allen.ai/">OpenScholar</a> is an open-source, retrieval-augmented language model (LM) designed to help researchers navigate and synthesize scientific literature. Developed by the Allen Institute for AI (AI2) and the University of Washington, OpenScholar answers scientific queries by searching a datastore of 45 million open-access papers, retrieving relevant passages, and generating citation-backed responses. The work was <a href="https://www.nature.com/articles/s41586-025-10072-4">published in <em>Nature</em></a> in February 2026.</p>
<p>Unlike general-purpose LLMs that frequently hallucinate citations (GPT-4o hallucinates citations 78–90% of the time), OpenScholar achieves citation accuracy on par with human experts. In human evaluations conducted by 16 PhD-level experts, OpenScholar’s responses were preferred over expert-written ones 51% of the time for the 8B variant and 70% of the time for the GPT-4o-augmented variant.</p>
<section id="key-features" class="level4">
<h4 class="anchored" data-anchor-id="key-features">Key features</h4>
<ul>
<li><strong>Retrieval-augmented generation over 45M papers</strong>: OpenScholar searches a datastore of 45 million open-access papers (~236 million passage embeddings) drawn from <a href="https://www.semanticscholar.org/">Semantic Scholar</a>, ensuring responses are grounded in real, retrievable literature rather than model memory.</li>
<li><strong>Iterative self-feedback inference</strong>: At inference time, OpenScholar uses a self-feedback loop to iteratively refine its outputs — each iteration retrieves additional papers, improving factuality, coverage, and citation accuracy through natural language feedback.</li>
<li><strong>Highly accurate citations</strong>: While GPT-4o hallucinates the vast majority of its cited papers, OpenScholar’s retrieval-first design ensures all citations correspond to real, retrievable sources.</li>
<li><strong>Fully open-source</strong>: All code, model checkpoints, retriever/reranker weights, retrieval index, training data, and evaluation benchmarks are publicly available — the first complete open release of a scientific assistant LM pipeline.</li>
</ul>
</section>
<section id="model-variants-and-sizes" class="level4">
<h4 class="anchored" data-anchor-id="model-variants-and-sizes">Model variants and sizes</h4>
<p>OpenScholar can be used with different underlying language models:</p>
<ul>
<li><strong>OpenScholar-8B (OS-8B)</strong>: A fine-tuned version of <a href="https://huggingface.co/meta-llama/Llama-3.1-8B">Llama 3.1 8B</a>, optimized for scientific literature synthesis. This is the flagship open-weight model. Available on <a href="https://huggingface.co/OpenSciLM/Llama-3.1_OpenScholar-8B">Hugging Face</a>. Despite its compact size, it outperforms GPT-4o by 6.1% in correctness on multi-paper synthesis tasks, and is <strong>100x more cost-efficient</strong> than comparable systems like PaperQA2.</li>
<li><strong>OpenScholar-GPT4o (OS-GPT4o)</strong>: The OpenScholar pipeline (datastore, retriever, reranker, and self-feedback loop) applied on top of GPT-4o. This variant improves GPT-4o’s correctness by 12% and raises citation F1 from 0.1 to 39.5, demonstrating how the pipeline enhances any off-the-shelf LLM.</li>
<li><strong>OpenScholar-70B (OS-70B)</strong>: The pipeline applied using Llama 3.1 70B as the underlying generator, offering a middle ground between the compact 8B model and proprietary API-based options.</li>
</ul>
</section>
<section id="how-the-8b-model-was-trained" class="level4">
<h4 class="anchored" data-anchor-id="how-the-8b-model-was-trained">How the 8B model was trained</h4>
<p>The OpenScholar-8B model was trained using the same self-feedback pipeline used at inference time, but repurposed for synthetic data generation:</p>
<ol type="1">
<li><strong>Curated abstracts</strong>: Starting from 1 million curated scientific paper abstracts from the datastore.</li>
<li><strong>Synthetic data generation</strong>: The self-feedback loop was used to generate 130,000 candidate training instances, in which the model iteratively refined its own outputs with retrieval feedback.</li>
<li><strong>Instruction tuning</strong>: After quality filtering, the final 13K instruction-tuning dataset (OS_Train_Data) was used to fine-tune Llama 3.1 8B using a modified version of <a href="https://github.com/pytorch/torchtune">torchtune</a> on 8x A100 GPUs.</li>
</ol>
<p>This approach allows a compact 8B model to achieve performance competitive with much larger proprietary models by distilling the quality of the iterative self-feedback pipeline into the model weights.</p>
</section>
<section id="evaluation-scholarqabench" class="level4">
<h4 class="anchored" data-anchor-id="evaluation-scholarqabench">Evaluation: ScholarQABench</h4>
<p>To rigorously evaluate scientific literature synthesis, the authors created <a href="https://github.com/AkariAsai/ScholarQABench">ScholarQABench</a>, the first large-scale multi-domain benchmark for this task:</p>
<ul>
<li><strong>2,967 expert-written queries</strong> and <strong>208 long-form answers</strong> across four domains: computer science, physics, neuroscience, and biomedicine.</li>
<li>Evaluation metrics include <strong>correctness</strong>, <strong>citation accuracy</strong> (are cited papers real and relevant?), <strong>coverage</strong> (does the response address all aspects of the query?), and <strong>writing quality</strong>.</li>
<li>Human evaluations were conducted by <strong>16 experts with PhDs</strong> across 108 questions, providing gold-standard comparisons between model and expert-written responses.</li>
</ul>
<p><strong>Key results on ScholarQABench:</strong></p>
<table class="caption-top table">
<colgroup>
<col style="width: 9%">
<col style="width: 30%">
<col style="width: 23%">
<col style="width: 36%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Correctness vs.&nbsp;GPT-4o</th>
<th>Citation quality</th>
<th>Human preference vs.&nbsp;expert</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>GPT-4o (no retrieval)</td>
<td>baseline</td>
<td>Hallucinates 78–90% of citations</td>
<td>Preferred 32% of the time</td>
</tr>
<tr class="even">
<td>OpenScholar-8B</td>
<td>+6.1%</td>
<td>On par with human experts</td>
<td>Preferred 51% of the time</td>
</tr>
<tr class="odd">
<td>OpenScholar-GPT4o</td>
<td>+12%</td>
<td>Citation F1: 0.1 → 39.5</td>
<td>Preferred 70% of the time</td>
</tr>
</tbody>
</table>
</section>
<section id="genai-use-at-uw-madison" class="level4">
<h4 class="anchored" data-anchor-id="genai-use-at-uw-madison">GenAI use at UW-Madison</h4>
<p>UW–Madison faculty, staff, students, and affiliates are required to follow <a href="https://it.wisc.edu/generative-ai-uw-madison-use-policies/">campus policies relevant to AI use</a>. Uses of <a href="https://it.wisc.edu/statement-on-use-of-generative-ai/">generative AI</a> that are explicitly prohibited by policy include, but are not limited to, the following:</p>
<ul>
<li>Entering any sensitive, restricted or otherwise protected institutional data – including hard-coded passwords – into any generative AI tool or service;</li>
<li>Using AI-generated code for institutional IT systems or services without review by a human to verify the absence of malicious elements;</li>
<li>Using generative AI to violate laws; institutional policies, rules or guidelines; or agreements or contracts.</li>
</ul>
</section>
<section id="potential-use-cases" class="level4">
<h4 class="anchored" data-anchor-id="potential-use-cases">Potential use cases</h4>
<ul>
<li><strong>Literature reviews</strong>: Rapidly synthesize the state of research on a topic with properly cited sources, saving hours of manual search and reading. Particularly useful for getting up to speed in unfamiliar fields.</li>
<li><strong>Research question exploration</strong>: Ask nuanced scientific questions and receive grounded answers that point you to the most relevant papers, helping identify gaps and opportunities in the literature.</li>
<li><strong>Grant writing and proposals</strong>: Quickly gather and cite supporting evidence for research proposals, ensuring claims are backed by real, verifiable literature.</li>
<li><strong>Cross-disciplinary research</strong>: Explore connections between fields (e.g., neuroscience and computer science) by querying across OpenScholar’s multi-domain datastore of 45 million papers.</li>
<li><strong>Teaching and mentoring</strong>: Help students and early-career researchers learn to navigate scientific literature effectively, with a tool that models good citation practices.</li>
</ul>
</section>
<section id="links" class="level4">
<h4 class="anchored" data-anchor-id="links">Links</h4>
<ul>
<li><strong>Paper</strong>: <a href="https://arxiv.org/abs/2411.14199">arXiv:2411.14199</a> | <a href="https://www.nature.com/articles/s41586-025-10072-4">Nature</a></li>
<li><strong>Demo</strong>: <a href="https://openscholar.allen.ai/">openscholar.allen.ai</a></li>
<li><strong>Hugging Face model (8B)</strong>: <a href="https://huggingface.co/OpenSciLM/Llama-3.1_OpenScholar-8B">OpenSciLM/Llama-3.1_OpenScholar-8B</a></li>
<li><strong>Hugging Face retriever</strong>: <a href="https://huggingface.co/OpenSciLM/OpenScholar_Retriever">OpenSciLM/OpenScholar_Retriever</a></li>
<li><strong>GitHub</strong>: <a href="https://github.com/AkariAsai/OpenScholar">AkariAsai/OpenScholar</a></li>
<li><strong>Project page</strong>: <a href="https://openscilm.allen.ai/">openscilm.allen.ai</a></li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/NotebookLM.html"><strong>GenAI</strong>: NotebookLM</a>: Another GenAI summarization tool, useful for quickly digesting individual papers and generating audio summaries.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/2025-05-07_RAG-Romeo-Juliet.html"><strong>Notebook</strong>: RAG with Romeo and Juliet</a>: A hands-on tutorial on retrieval-augmented generation, the technique that underpins OpenScholar.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2026-02-17.html"><strong>Forum</strong>: Deploying RAG in Bedrock vs.&nbsp;Local</a>: A case study comparing cloud-based and local RAG deployment strategies.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/HuggingFace.html"><strong>Library</strong>: Hugging Face</a>: The platform hosting OpenScholar’s model weights and retriever — learn more about the Hub, pipelines, and model hosting.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>GenAI</category>
  <category>NLP</category>
  <category>LLM</category>
  <category>RAG</category>
  <category>Retrieval</category>
  <category>Foundation models</category>
  <category>Hugging Face</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/OpenScholar.html</guid>
  <pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/OpenScholar.png" medium="image" type="image/png" height="62" width="144"/>
</item>
<item>
  <title>Deploying RAG in Bedrock vs. Local: WattBot 2025 Case Study</title>
  <dc:creator>Nils Matteson</dc:creator>
  <dc:creator>Blaise Enuh</dc:creator>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2026-02-17.html</link>
  <description><![CDATA[ 




<p>Many researchers are exploring retrieval-augmented generation (RAG) to build document-grounded, trustworthy AI tools, but it is often unclear how design choices around models, infrastructure, and deployment play out in practice. In this session, we present lessons learned from replicating the winning RAG system from the WattBot 2025 challenge. The challenge focuses on producing citation-backed energy and sustainability estimates for AI workloads from a fixed corpus of 30+ academic papers — or explicitly abstaining when evidence is missing. After a short overview of the winning approach, <a href="https://nilsmatteson.com/">Nils Matteson</a> and <a href="https://blaiseenuh.com/">Blaise Enuh</a> walk through how the system is implemented in practice, including:</p>
<ol type="1">
<li>A cloud deployment using AWS Bedrock</li>
<li>Local, open-source deployments (e.g., Hugging Face models on GB10 and Dell PowerEdge R7725 hardware)</li>
</ol>
<p>The session compares performance, cost, latency, and operational trade-offs across environments. It also includes a Streamlit-based interface demo for those looking to host their own RAG apps.</p>
<p><em>This work was conducted as part of ongoing AI infrastructure evaluation within the <a href="https://it.wisc.edu/about/division-of-information-technology/research-cyberinfrastructure/">Research Cyberinfrastructure (RCI)</a> office in DoIT.</em></p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/WYSzI3WZmKo" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<section id="links" class="level3">
<h3 class="anchored" data-anchor-id="links">Links</h3>
<ul>
<li><strong>GitHub</strong>: <a href="https://github.com/matteso1/KohakuRAG_UI/">WattBot in Bedrock and Local</a></li>
<li><strong>Kaggle challenge</strong>: <a href="https://www.kaggle.com/competitions/WattBot2025/overview">WattBot 2025</a></li>
<li><strong>Winning solution</strong>: <a href="https://github.com/KohakuBlueleaf/KohakuRAG">KohakuBlueleaf/KohakuRAG</a></li>
<li><strong>Annual hackathon</strong>: <a href="https://ml-marathon.wisc.edu/">Machine Learning Marathon</a>: Learn about the annual Machine Learning Marathon (3-month AI/ML hackathon) hosted by ML+X each fall. Reach out to <a href="mailto:endemann@wisc.edu">Chris</a> if you’d like to submit a project!</li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/WattBot-2025.html"><strong>Project</strong>: WattBot 2025</a>: Full project page for the WattBot ML Marathon challenge, including challenge design, winning approach, and related resources.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-09-09.html"><strong>Talk</strong>: AI’s Environmental Footprint: Insights and Actions</a>: The previous ML+X forum where WattBot was first introduced, covering AI sustainability measurement and the RAG-based challenge design.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/2025-05-07_RAG-Romeo-Juliet.html"><strong>Notebook</strong>: Exploring Fact-Based QA with RAG: Romeo and Juliet</a>: Learn how to build an end-to-end RAG pipeline from scratch using sentence-transformers and Hugging Face models.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html"><strong>Workshop</strong>: Intro to AWS SageMaker for Predictive ML/AI</a>: Hands-on workshop covering AWS SageMaker and Bedrock for cloud-based ML/AI workflows, including RAG deployment.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/GenAI-at-UW-Madison.html"><strong>Resource</strong>: UW Generative AI Services &amp; Policies</a>: Overview of vetted GenAI tools at UW-Madison, including AWS Bedrock access.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/GenAI/OpenScholar.html"><strong>GenAI</strong>: OpenScholar</a>: A fully open, retrieval-augmented language model for searching and synthesizing scientific literature — relevant as an alternative RAG architecture for citation-grounded answers.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/HuggingFace.html"><strong>Library</strong>: Hugging Face</a>: The local deployment in this talk uses Hugging Face models — learn more about the platform and how to find and run open-source models.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/Quantization-and-Precision.html"><strong>Notebook</strong>: Understanding Quantization and Precision</a>: Learn how quantization and floating-point precision (FP32, FP16, INT8, 4-bit) affect GPU memory and inference speed — directly relevant to running models locally on constrained hardware.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Videos</category>
  <category>ML+X</category>
  <category>UW-Madison</category>
  <category>RAG</category>
  <category>Retrieval</category>
  <category>LLM</category>
  <category>Cloud</category>
  <category>AWS</category>
  <category>Bedrock</category>
  <category>Hugging Face</category>
  <category>Foundation models</category>
  <category>GenAI</category>
  <category>Sustainability</category>
  <category>Energy</category>
  <category>GPU</category>
  <category>Deep learning</category>
  <category>NLP</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2026-02-17.html</guid>
  <pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://img.youtube.com/vi/WYSzI3WZmKo/maxresdefault.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Claude Code Cloud Setup Guide (Vertex AI &amp; Bedrock)</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Guides/claude-code-cloud-setup.html</link>
  <description><![CDATA[ 




<p>This guide walks UW-Madison researchers and staff through setting up <a href="https://code.claude.com/">Claude Code</a> with a cloud provider — either <a href="https://cloud.google.com/vertex-ai">Google Vertex AI</a> or <a href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a>. It covers Windows and macOS, with provider-specific steps in tabs so you can follow whichever path matches your institutional account.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>This guide reflects the author’s understanding as of its last-modified date
</div>
</div>
<div class="callout-body-container callout-body">
<p>AI tools, pricing, features, and contractual terms change frequently. This guide is <strong>community guidance, not official UW-Madison policy</strong>. For the latest institutional policies, data-use agreements, or questions about what data types are permitted with specific tools, consult <a href="https://it.wisc.edu/about/division-of-information-technology/research-cyberinfrastructure/">UW-Madison Research Cyberinfrastructure</a> or your department’s IT office.</p>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>New to agentic coding?
</div>
</div>
<div class="callout-body-container callout-body">
<p>See our companion guide, <a href="../../Learn/Blogs/claude-code-best-practices.html">Claude Code Best Practices</a>, for a broader introduction to agentic coding tools and how to use them effectively.</p>
</div>
</div>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Important</span>CLI only — cloud providers don’t work with the Desktop app or Web IDE
</div>
</div>
<div class="callout-body-container callout-body">
<p>Vertex AI and Bedrock routing is only supported in the <strong>Claude Code CLI</strong> (terminal) and <strong>IDE extensions</strong> (VS Code, JetBrains). The <a href="https://code.claude.com/docs/en/desktop">Claude Desktop app</a> (Code tab) and <a href="https://claude.ai/code">Web IDE</a> require a direct Anthropic login (Max plan or API credits) — they cannot be configured to route through a cloud provider at this time. If you need the desktop GUI or web experience, you’ll need a separate Claude Max subscription. This guide covers the CLI path.</p>
</div>
</div>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>Use caution with restricted or sensitive data
</div>
</div>
<div class="callout-body-container callout-body">
<p>UW-Madison does <strong>not</strong> currently have a direct data-use or privacy agreement with Anthropic. While UW has agreements with cloud providers (Google, AWS, Microsoft) for their respective services, those agreements <strong>do not extend to Anthropic’s data handling</strong> — your prompts still reach Anthropic’s infrastructure for inference regardless of which cloud provider routes the request.</p>
<p><strong>What this means in practice:</strong></p>
<ul>
<li><strong>Do not use Claude Code with restricted data</strong> (HIPAA/PHI, FERPA, CUI, export-controlled, or data under a DUA that prohibits third-party processing) until a formal UW-Anthropic agreement is in place.</li>
<li><strong>Avoid running Claude Code on machines where restricted data is stored.</strong> Even with sandboxing enabled, Claude can still <strong>read</strong> files outside your project directory — sandboxing only restricts <em>writes</em>. Anything Claude reads can be sent to Anthropic’s servers as part of a prompt. If you must run Claude Code on a machine with sensitive files, configure <strong>deny rules</strong> to block read access to sensitive paths (see Section 7) — but note that <code>Read</code> deny rules are best-effort and can be bypassed by allowed Bash commands like <code>cat</code>. The safest approach is to keep restricted data off machines where Claude Code runs.</li>
<li><strong>Always enable sandboxing</strong> (Section 8) — it prevents unintended <em>writes</em> outside your project and adds network isolation. These are valuable protections even though read access isn’t fully restricted.</li>
<li><strong>The <a href="https://claude.ai/code">web version</a> offers the strongest isolation</strong> — each task runs in a fresh, ephemeral VM that can only access your cloned GitHub repo. It cannot reach your local filesystem, SSH keys, or other local resources, and storage is wiped when the task completes. This eliminates the local file-read risk entirely. However, the web version requires a paid Anthropic subscription (Pro $20/month through Max $200/month), which currently must be paid out of pocket — UW does not yet have institutional Anthropic billing in place.</li>
<li><strong>For general/non-sensitive research code</strong>, cloud-routed Claude Code is fine to use today — your code is not used for model training under commercial terms, and telemetry is disabled by default for third-party providers.</li>
</ul>
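<p>As a concrete sketch of the deny-rule approach mentioned above, the snippet below writes an example rules file you can merge into <code>~/.claude/settings.json</code>. The directory name is a placeholder, and the exact rule syntax is covered in Section 7, so treat this as illustrative rather than authoritative.</p>

```shell
# Hypothetical example: a deny rule that blocks Claude's Read tool from a
# sensitive directory. "~/restricted-data" is a placeholder path.
# Written to /tmp so it doesn't clobber your real settings; merge the
# "permissions" block into ~/.claude/settings.json yourself.
cat > /tmp/claude-deny-example.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Read(~/restricted-data/**)"
    ]
  }
}
EOF
```

<p>Remember the caveat above: <code>Read</code> deny rules are best-effort, so a rule like this complements, rather than replaces, keeping restricted data off the machine entirely.</p>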
<p>UW-Madison is actively exploring institutional Anthropic licenses, credits, and data agreements to make this easier. We’ll update this guide as the situation evolves.</p>
</div>
</div>
<section id="why-use-a-cloud-provider-instead-of-a-direct-subscription" class="level2">
<h2 class="anchored" data-anchor-id="why-use-a-cloud-provider-instead-of-a-direct-subscription">Why use a cloud provider instead of a direct subscription?</h2>
<p>You can use Claude Code with a personal Claude Max subscription ($100–$200/month, with expanded usage limits) — no cloud setup required. So why go through the extra complexity of routing through a cloud provider?</p>
<p><strong>What cloud providers give you:</strong></p>
<ul>
<li><strong>Institutional billing and cost tracking</strong> — charges go to your UW-Madison cloud project, not your personal credit card. This matters for grant-funded research, shared lab budgets, and institutional procurement. UW-Madison has <a href="../../Toolbox/Compute/UW-Cloud-Services.html">negotiated cloud discounts</a> with both GCP and AWS, including reduced overhead on grants (26% vs 55.5% F&amp;A) and NIH STRIDES pricing.</li>
<li><strong>Enterprise security controls</strong> — both GCP and AWS let you manage IAM permissions, control network traffic, and audit API usage within <em>your</em> cloud project. Note that these controls govern access to <em>your cloud resources</em> — they do not change how Anthropic handles your data once it reaches their infrastructure.</li>
</ul>
<p><strong>What cloud providers do NOT change:</strong></p>
<ul>
<li><strong>Your prompts still reach Anthropic’s infrastructure.</strong> Neither Vertex AI nor Bedrock runs the model inside your cloud project — your provider routes requests to Anthropic’s serving infrastructure. This means your data is processed by Anthropic regardless of which cloud provider you use.</li>
<li><strong>UW’s cloud agreements don’t cover Anthropic’s data handling.</strong> UW-Madison has agreements with Google, AWS, and Microsoft for their respective services, but those agreements <strong>do not extend to Anthropic</strong>. There is no UW-Anthropic data-use agreement in place yet, so routing through a cloud provider does not give your data the same protections it would have for a native cloud service. See the warning above for practical guidance.</li>
<li><strong>Data storage location</strong> is not controlled by your deployment choice. Anthropic stores data in the US, with processing distributed across multiple regions for reliability. If you need geographic control over where <em>inference</em> runs, that’s a separate <a href="https://platform.claude.com/docs/en/build-with-claude/data-residency">data residency</a> feature, not a cloud provider feature.</li>
<li><strong>Training exclusion still applies.</strong> Under Anthropic’s commercial terms (including both Vertex AI and Bedrock), your prompts and outputs are <a href="https://privacy.claude.com/en/articles/10023580-is-my-data-used-for-model-training">not used for model training</a>. This is a positive, but it is distinct from having a comprehensive data-use agreement. See Data Usage &amp; Privacy for full details on retention, telemetry, and opt-outs.</li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Bottom line:</strong> If you just want to try Claude Code personally, a Claude Max subscription is simpler and often cheaper for heavy individual use. Cloud providers make sense when you need institutional billing or enterprise security controls. However, neither approach currently provides UW-sanctioned data protections for restricted or sensitive data — UW is actively working toward an institutional Anthropic agreement. For now, <strong>avoid running Claude Code on machines that contain restricted data</strong>, since even sandboxing doesn’t prevent reads — and <strong>always enable sandboxing</strong> to at least restrict writes and network access. UW-Madison <a href="../../Toolbox/Compute/UW-Cloud-Services.html">provisions both GCP and AWS accounts</a> for research groups — running Claude Code through either provider is just one of many uses.</p>
</div>
</div>
</section>
<section id="already-set-up-returning-user-quick-start" class="level2">
<h2 class="anchored" data-anchor-id="already-set-up-returning-user-quick-start">Already set up? Returning user quick-start</h2>
<p>If you’ve already completed the full setup and just need to start a new session:</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Amazon Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># If your credentials have expired (you'll get auth errors):</span></span>
<span id="cb1-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth application-default login</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Navigate to your project and launch</span></span>
<span id="cb1-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/projects/my-project   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># macOS / Linux / WSL2 (recommended for Windows)</span></span>
<span id="cb1-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># cd C:\Users\you\projects\my-project   # Windows PowerShell (no sandbox support)</span></span>
<span id="cb1-7"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># If your credentials have expired (you'll get auth errors):</span></span>
<span id="cb2-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> sso login <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--profile</span> your-profile</span>
<span id="cb2-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Or re-export your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY</span></span>
<span id="cb2-4"></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Navigate to your project and launch</span></span>
<span id="cb2-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/projects/my-project   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># macOS / Linux / WSL2 (recommended for Windows)</span></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># cd C:\Users\you\projects\my-project   # Windows PowerShell (no sandbox support)</span></span>
<span id="cb2-8"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
</div>
</div>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Windows users:</strong> For the best experience — including OS-level sandboxing — run Claude Code from <strong>WSL2</strong> instead of PowerShell. Native Windows does not support sandboxing. If you haven’t set up WSL2 yet, see Section 8 — Sandboxing (Windows/WSL2) for a full walkthrough.</p>
</div>
</div>
<p><strong>Once inside Claude Code:</strong></p>
<pre class="text"><code>/status       # Confirm your cloud provider is active and project/region are correct
/sandbox      # Enable sandboxing if not already persistent in settings
/cost         # Check token usage at any time during your session</code></pre>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Vertex AI:</strong> If you get 401/403 errors, re-run <code>gcloud auth application-default login</code>. If you get 404 “model not found” errors, check your region settings in <code>~/.claude/settings.json</code> (see Section 5). If you get 429 “resource exhausted” errors, you need a quota increase (see Troubleshooting).</p>
<p><strong>Bedrock:</strong> If you get <code>AccessDeniedException</code>, check your IAM permissions and ensure model access is enabled. If you get <code>ValidationException: Model not found</code>, verify your <code>AWS_REGION</code> and model IDs. If credentials expired, re-run <code>aws sso login</code> or refresh your access keys.</p>
</div>
</div>
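<p>Before launching <code>claude</code>, it can save a session restart to pre-flight your cloud credentials from the shell. The sketch below is our own helper — not an official command — and the commented lines show how you might point it at the real <code>gcloud</code> and <code>aws</code> checks:</p>

```shell
# check_cmd NAME CMD...  — run CMD quietly and report pass/fail (our own helper)
check_cmd() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: OK"
  else
    echo "$name: FAIL"
  fi
}

# Vertex AI: do you hold a valid Application Default Credentials token?
# check_cmd "Vertex ADC" gcloud auth application-default print-access-token
# Bedrock: does the AWS credential chain resolve to an identity?
# check_cmd "AWS identity" aws sts get-caller-identity
```

<p>A <code>FAIL</code> from either commented check maps directly onto the re-auth steps in the Tip above it.</p>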
</section>
<section id="prerequisites" class="level2">
<h2 class="anchored" data-anchor-id="prerequisites">1. Prerequisites</h2>
<p><strong>Both platforms:</strong></p>
<ul>
<li>Internet connection</li>
</ul>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Amazon Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<ul>
<li>A <strong>UW-Madison GCP project</strong> with billing enabled (e.g., <code>doit-rci-sandbox-gcp-baa4</code>)</li>
</ul>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<ul>
<li>A <strong>UW-Madison AWS account</strong> with billing enabled</li>
</ul>
</div>
</div>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<ul>
<li>Windows 10 (build 1809+) or Windows 11</li>
<li><strong>PowerShell</strong> (use this — not Git Bash, not CMD)</li>
</ul>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<ul>
<li>macOS 13.0 (Ventura) or later</li>
<li><strong>Terminal</strong> (built-in) or any terminal emulator (iTerm2, Warp, etc.)</li>
</ul>
</div>
</div>
</div>
<blockquote class="blockquote">
<p>See the official <a href="https://code.claude.com/docs/en/setup">Claude Code system requirements</a> for the full list (OS versions, RAM, shell, etc.).</p>
</blockquote>
</section>
<section id="install-claude-code" class="level2">
<h2 class="anchored" data-anchor-id="install-claude-code">2. Install Claude Code</h2>
<blockquote class="blockquote">
<p>Official install docs: <a href="https://code.claude.com/docs/en/setup">code.claude.com/docs/en/setup</a></p>
</blockquote>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<p>Open <strong>PowerShell</strong> and run:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">irm</span> https<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">://</span>claude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ai</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>install<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ps1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">iex</span></span></code></pre></div></div>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Warning
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Do NOT use Git Bash (MINGW64)</strong> — the installer does not support it. Use PowerShell only.</p>
</div>
</div>
<p>After installation, you’ll likely see a message that <code>C:\Users\&lt;you&gt;\.local\bin</code> is not in your PATH. Fix this:</p>
<ol type="1">
<li>Press <strong>Win + R</strong>, type <code>sysdm.cpl</code>, press Enter.</li>
<li>Go to <strong>Advanced</strong> tab → <strong>Environment Variables</strong>.</li>
<li>Under <strong>User variables</strong>, select <code>Path</code> → click <strong>Edit</strong>.</li>
<li>Click <strong>New</strong> and add: <code>C:\Users\&lt;your-username&gt;\.local\bin</code></li>
<li>Click <strong>OK</strong> on all dialogs.</li>
<li><strong>Close and reopen PowerShell.</strong></li>
</ol>
<p>Verify:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb5-1">claude <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>version</span></code></pre></div></div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<p>Open <strong>Terminal</strong> and run:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-fsSL</span> https://claude.ai/install.sh <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bash</span></span></code></pre></div></div>
<p>Alternatively, if you use Homebrew:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb7-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">brew</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--cask</span> claude-code</span></code></pre></div></div>
<p>After installation, restart your terminal (or run <code>source ~/.zshrc</code>), then verify:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb8-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--version</span></span></code></pre></div></div>
<blockquote class="blockquote">
<p>The native installer places the binary at <code>~/.claude/bin/claude</code> or <code>~/.local/bin/claude</code> and updates your shell profile automatically. If <code>claude</code> isn’t found, ensure one of these paths is in your <code>$PATH</code>.</p>
</blockquote>
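<p>If <code>claude</code> still isn’t found, you can add those directories to <code>PATH</code> yourself. This POSIX-shell sketch (our own helper, not part of the installer) only appends a directory when it’s missing, so it’s safe to keep in <code>~/.zshrc</code> or <code>~/.bashrc</code>:</p>

```shell
# path_contains DIR — succeed if DIR is already one of the PATH entries
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Append the installer's candidate bin directories only if absent (idempotent)
for dir in "$HOME/.local/bin" "$HOME/.claude/bin"; do
  path_contains "$dir" || PATH="$PATH:$dir"
done
export PATH
```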
</div>
</div>
</div>
</section>
<section id="install-cloud-cli" class="level2">
<h2 class="anchored" data-anchor-id="install-cloud-cli">3. Install &amp; Configure Your Cloud CLI</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-7-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-1" aria-controls="tabset-7-1" aria-selected="true" href="">Vertex AI (gcloud)</a></li><li class="nav-item"><a class="nav-link" id="tabset-7-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-7-2" aria-controls="tabset-7-2" aria-selected="false" href="">Amazon Bedrock (AWS CLI)</a></li></ul>
<div class="tab-content">
<div id="tabset-7-1" class="tab-pane active" aria-labelledby="tabset-7-1-tab">
<blockquote class="blockquote">
<p>Official gcloud install docs: <a href="https://cloud.google.com/sdk/docs/install">cloud.google.com/sdk/docs/install</a></p>
</blockquote>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-5-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-1" aria-controls="tabset-5-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-2" aria-controls="tabset-5-2" aria-selected="false" href="">Windows (WSL2)</a></li><li class="nav-item"><a class="nav-link" id="tabset-5-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-5-3" aria-controls="tabset-5-3" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-5-1" class="tab-pane active" aria-labelledby="tabset-5-1-tab">
<p>Download and install the <a href="https://cloud.google.com/sdk/docs/install">Google Cloud CLI for Windows</a>. The installer adds <code>gcloud</code> to your PATH automatically.</p>
<p><strong>Fix PowerShell Execution Policy (if needed):</strong></p>
<p>If <code>gcloud</code> gives you a “running scripts is disabled” error:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Set-ExecutionPolicy</span> RemoteSigned <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Scope CurrentUser</span></code></pre></div></div>
<p>Type <strong>Y</strong> to confirm.</p>
</div>
<div id="tabset-5-2" class="tab-pane" aria-labelledby="tabset-5-2-tab">
<p>If you’re running Claude Code from WSL2, you need <code>gcloud</code> installed <strong>inside</strong> your WSL2 environment (the Windows installation isn’t reliably usable from inside WSL2, and its credentials live on the Windows side):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get install apt-transport-https ca-certificates gnupg curl</span>
<span id="cb10-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> https://packages.cloud.google.com/apt/doc/apt-key.gpg <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> gpg <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--dearmor</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-o</span> /usr/share/keyrings/cloud.google.gpg</span>
<span id="cb10-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">echo</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main"</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> tee /etc/apt/sources.list.d/google-cloud-sdk.list</span>
<span id="cb10-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get update <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get install google-cloud-cli</span></code></pre></div></div>
</div>
<div id="tabset-5-3" class="tab-pane" aria-labelledby="tabset-5-3-tab">
<p>Option A — Homebrew (easiest):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb11-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">brew</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--cask</span> google-cloud-sdk</span></code></pre></div></div>
<p>Option B — Official installer:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb12-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> https://sdk.cloud.google.com <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bash</span></span></code></pre></div></div>
<p>Then restart your terminal or run:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb13-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">source</span> ~/.zshrc   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or source ~/.bashrc</span></span></code></pre></div></div>
</div>
</div>
</div>
<p><strong>Authenticate (both platforms):</strong></p>
<p>These commands are identical on both Windows and macOS:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Log in with your UW-Madison Google account</span></span>
<span id="cb14-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth login</span>
<span id="cb14-3"></span>
<span id="cb14-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set your GCP project</span></span>
<span id="cb14-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> config set project YOUR-PROJECT-ID</span>
<span id="cb14-6"></span>
<span id="cb14-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># CRITICAL — this is what Claude Code actually uses to authenticate</span></span>
<span id="cb14-8"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth application-default login</span></code></pre></div></div>
<p>Both <code>login</code> commands open a browser window. Sign in with your UW-Madison email and grant the requested permissions.</p>
<p>Verify everything is configured correctly:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb15-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> config list</span></code></pre></div></div>
<p>You should see your UW-Madison email under <code>[core] account</code> and your project ID under <code>[core] project</code>.</p>
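<p>If you want to script that check, the <code>key = value</code> lines that <code>gcloud config list</code> prints are easy to pick apart with a small helper (our own sketch, assuming the usual INI-style output):</p>

```shell
# config_value KEY — pull one value out of gcloud's "key = value" output
config_value() {
  sed -n "s/^$1 = //p"
}

# Usage (requires gcloud):
#   gcloud config list 2>/dev/null | config_value project
#   gcloud config list 2>/dev/null | config_value account
```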
</div>
<div id="tabset-7-2" class="tab-pane" aria-labelledby="tabset-7-2-tab">
<blockquote class="blockquote">
<p>Official AWS CLI install docs: <a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html</a></p>
</blockquote>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-6-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-1" aria-controls="tabset-6-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-6-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-6-2" aria-controls="tabset-6-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-6-1" class="tab-pane active" aria-labelledby="tabset-6-1-tab">
<p>Download and run the <a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">AWS CLI MSI installer for Windows</a>. The installer adds <code>aws</code> to your PATH automatically.</p>
<p>Verify:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb16-1">aws <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>version</span></code></pre></div></div>
</div>
<div id="tabset-6-2" class="tab-pane" aria-labelledby="tabset-6-2-tab">
<p>Option A — Homebrew (easiest):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb17-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">brew</span> install awscli</span></code></pre></div></div>
<p>Option B — Official installer:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb18-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://awscli.amazonaws.com/AWSCLIV2.pkg"</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-o</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AWSCLIV2.pkg"</span></span>
<span id="cb18-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> installer <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-pkg</span> AWSCLIV2.pkg <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-target</span> /</span></code></pre></div></div>
<p>Verify:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb19-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--version</span></span></code></pre></div></div>
</div>
</div>
</div>
<p><strong>Authenticate (both platforms):</strong></p>
<p>Claude Code uses the standard <a href="https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html">AWS credential chain</a>. Choose one of these methods:</p>
<p><strong>Option A — AWS SSO (recommended for UW-Madison):</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb20-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Configure your SSO profile (one-time setup)</span></span>
<span id="cb20-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> configure sso</span>
<span id="cb20-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Follow the prompts: SSO start URL, region, account, role</span></span>
<span id="cb20-4"></span>
<span id="cb20-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Log in (do this whenever credentials expire)</span></span>
<span id="cb20-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> sso login <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--profile</span> your-profile-name</span></code></pre></div></div>
<p><strong>Option B — IAM access keys:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb21-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> configure</span>
<span id="cb21-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enter your AWS Access Key ID, Secret Access Key, and region</span></span></code></pre></div></div>
<p><strong>Option C — Environment variables:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb22-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">export</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">AWS_ACCESS_KEY_ID</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"your-access-key"</span></span>
<span id="cb22-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">export</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">AWS_SECRET_ACCESS_KEY</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"your-secret-key"</span></span>
<span id="cb22-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">export</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">AWS_REGION</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"us-east-1"</span></span></code></pre></div></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>If your institution uses AWS SSO or federated identity, Option A is the most secure — no long-lived credentials on disk. Ask your UW-Madison AWS administrator which method your account uses.</p>
</div>
</div>
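<p>For reference, <code>aws configure sso</code> writes a profile block like the following to <code>~/.aws/config</code>. All values here are placeholders — substitute the ones your administrator gives you — and newer CLI versions may split the SSO fields into a separate <code>[sso-session]</code> block:</p>

```ini
# ~/.aws/config — example SSO profile (placeholder values)
[profile your-profile-name]
sso_start_url  = https://your-org.awsapps.com/start
sso_region     = us-east-1
sso_account_id = 123456789012
sso_role_name  = YourRoleName
region         = us-east-1
output         = json
```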
</div>
</div>
</div>
</section>
<section id="enable-api" class="level2">
<h2 class="anchored" data-anchor-id="enable-api">4. Enable API Access &amp; Request Models</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-8-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-1" aria-controls="tabset-8-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-8-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-8-2" aria-controls="tabset-8-2" aria-selected="false" href="">Amazon Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-8-1" class="tab-pane active" aria-labelledby="tabset-8-1-tab">
<blockquote class="blockquote">
<p>Official Vertex AI setup: <a href="https://code.claude.com/docs/en/google-vertex-ai">code.claude.com/docs/en/google-vertex-ai</a></p>
<p>Google’s Claude on Vertex docs: <a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude">cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude</a></p>
</blockquote>
<p><strong>Enable the API:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb23-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> services enable aiplatform.googleapis.com <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--project</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>YOUR-PROJECT-ID</span></code></pre></div></div>
<p><strong>Request access to Claude models:</strong></p>
<ol type="1">
<li>Go to the <a href="https://console.cloud.google.com/vertex-ai/model-garden">Vertex AI Model Garden</a> in the GCP Console.</li>
<li>Search for the Claude model you want (e.g., <strong>Claude Sonnet 4</strong>).</li>
<li>Click the model card and <strong>complete the access request form</strong>.</li>
<li>Approval may take minutes to a couple of days.</li>
</ol>
<p><strong>Verify IAM permissions:</strong></p>
<p>Your GCP account needs the <strong><code>roles/aiplatform.user</code></strong> role (<a href="https://cloud.google.com/vertex-ai/docs/general/access-control">Vertex IAM docs</a>), which includes:</p>
<ul>
<li><code>aiplatform.endpoints.predict</code> (model invocation)</li>
<li><code>aiplatform.endpoints.computeTokens</code> (token counting)</li>
</ul>
<p>If you don’t have this, ask your UW-Madison GCP administrator to grant it.</p>
</div>
<div id="tabset-8-2" class="tab-pane" aria-labelledby="tabset-8-2-tab">
<blockquote class="blockquote">
<p>Official Bedrock setup: <a href="https://code.claude.com/docs/en/amazon-bedrock">code.claude.com/docs/en/amazon-bedrock</a></p>
<p>AWS Bedrock model access docs: <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html">docs.aws.amazon.com/bedrock/latest/userguide/model-access.html</a></p>
</blockquote>
<p><strong>Enable model access:</strong></p>
<ol type="1">
<li>Open the <a href="https://console.aws.amazon.com/bedrock/">Amazon Bedrock console</a> and select your region (e.g., <strong>us-east-1</strong>).</li>
<li>Go to <strong>Model catalog</strong> in the left sidebar.</li>
<li>Find the Claude model you want (e.g., <strong>Claude Sonnet 4.6</strong>) and click <strong>Request model access</strong>.</li>
<li>AWS will ask for a brief use-case description. A one-sentence explanation is sufficient. For most Claude models, approval is automatic and takes less than a minute.</li>
</ol>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>The first time you request access to Anthropic models on a new AWS account, you’ll need to complete a First Time Use (FTU) form with use-case details. This is a one-time per-account requirement.</p>
</div>
</div>
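<p>Once access is granted, you can confirm which model IDs are visible in your region. The <code>aws bedrock list-foundation-models</code> call is a real AWS CLI command; the one-line filter helper is our own (Anthropic model IDs start with <code>anthropic.</code>):</p>

```shell
# anthropic_ids — keep only Anthropic model IDs from stdin (one per line)
anthropic_ids() {
  grep -i '^anthropic\.' || true
}

# Usage (requires AWS credentials):
#   aws bedrock list-foundation-models --region us-east-1 \
#     --query 'modelSummaries[].modelId' --output text | tr '\t' '\n' | anthropic_ids
```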
<p><strong>Verify IAM permissions:</strong></p>
<p>Your AWS identity needs permission to invoke Bedrock models. At minimum, create an IAM policy with the following statement (or ask your admin to attach one for you):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb24-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"Version"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2012-10-17"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb24-3">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"Statement"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb24-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb24-5">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"Effect"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Allow"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb24-6">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"Action"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb24-7">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bedrock:InvokeModel"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb24-8">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bedrock:InvokeModelWithResponseStream"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb24-9">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bedrock:ListInferenceProfiles"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb24-10">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bedrock:ListFoundationModels"</span></span>
<span id="cb24-11">      <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb24-12">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"Resource"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"*"</span></span>
<span id="cb24-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb24-14">  <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb24-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>If your account uses AWS Marketplace for model access, you may also need <code>aws-marketplace:Subscribe</code> and <code>aws-marketplace:ViewSubscriptions</code> permissions.</p>
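<p>If so, a sketch of the extra statement to add alongside the Bedrock one (whether it is required depends on how your account’s model access was provisioned, so treat this as an assumption to verify with your cloud admin):</p>
<pre class="json"><code>{
  "Effect": "Allow",
  "Action": [
    "aws-marketplace:Subscribe",
    "aws-marketplace:ViewSubscriptions"
  ],
  "Resource": "*"
}</code></pre>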
</div>
</div>
</div>
</section>
<section id="configure-claude-code" class="level2">
<h2 class="anchored" data-anchor-id="configure-claude-code">5. Configure Claude Code</h2>
<p>Edit (or create) your settings file at <code>~/.claude/settings.json</code>.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 47%">
<col style="width: 52%">
</colgroup>
<thead>
<tr class="header">
<th>Platform</th>
<th>Full path</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Windows (PowerShell)</td>
<td><code>C:\Users\&lt;your-username&gt;\.claude\settings.json</code></td>
</tr>
<tr class="even">
<td>Windows (WSL2)</td>
<td><code>/home/&lt;your-username&gt;/.claude/settings.json</code></td>
</tr>
<tr class="odd">
<td>macOS</td>
<td><code>/Users/&lt;your-username&gt;/.claude/settings.json</code></td>
</tr>
</tbody>
</table>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Important
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>WSL2 has its own home directory.</strong> If you set up Claude Code in both Windows PowerShell and WSL2, you need <code>settings.json</code> in <strong>both</strong> locations — they are completely separate filesystems. The Windows file at <code>C:\Users\you\.claude\settings.json</code> is <strong>not</strong> visible to WSL2. Similarly, <code>gcloud</code> must be installed and authenticated separately inside WSL2 (see Section 3).</p>
</div>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-9-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-9-1" aria-controls="tabset-9-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-9-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-9-2" aria-controls="tabset-9-2" aria-selected="false" href="">Amazon Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-9-1" class="tab-pane active" aria-labelledby="tabset-9-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb25-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"model"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-sonnet-4-6"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb25-3">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"autoUpdatesChannel"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latest"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb25-4">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"env"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb25-5">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"CLAUDE_CODE_USE_VERTEX"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb25-6">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"CLOUD_ML_REGION"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"global"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb25-7">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"ANTHROPIC_VERTEX_PROJECT_ID"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"YOUR-PROJECT-ID"</span></span>
<span id="cb25-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">},</span></span>
<span id="cb25-9">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"sandbox"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb25-10">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"enabled"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">true</span></span>
<span id="cb25-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb25-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>Replace <code>YOUR-PROJECT-ID</code> with your actual GCP project ID (e.g., <code>doit-rci-sandbox-gcp-baa4</code>).</p>
<p><strong>Region notes:</strong></p>
<ul>
<li><strong><code>global</code></strong> is typically the cheapest and is a good default.</li>
<li>Not all models are available on the global endpoint. If you get 404 “model not found” errors, set per-model region overrides with additional entries in the <code>env</code> block:</li>
</ul>
<pre class="jsonc"><code>"VERTEX_REGION_CLAUDE_3_5_HAIKU": "us-east5",
"VERTEX_REGION_CLAUDE_4_0_SONNET": "us-east5"</code></pre>
</div>
<div id="tabset-9-2" class="tab-pane" aria-labelledby="tabset-9-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb27-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"model"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-sonnet-4-6"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb27-3">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"autoUpdatesChannel"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latest"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb27-4">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"env"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb27-5">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"CLAUDE_CODE_USE_BEDROCK"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb27-6">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"AWS_REGION"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"us-east-1"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb27-7">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"AWS_PROFILE"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"your-profile-name"</span></span>
<span id="cb27-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">},</span></span>
<span id="cb27-9">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"sandbox"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb27-10">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"enabled"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">true</span></span>
<span id="cb27-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb27-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>Replace <code>your-profile-name</code> with your AWS CLI profile name (from <code>aws configure sso</code> or <code>aws configure</code>). If you’re using environment variables for credentials instead of a profile, omit <code>AWS_PROFILE</code> and set <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> in the <code>env</code> block.</p>
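<p>For reference, a sketch of that static-credential variant of the <code>env</code> block (the key values are placeholders; prefer an SSO profile where possible, since long-lived keys stored in <code>settings.json</code> are easy to leak):</p>
<pre class="jsonc"><code>"env": {
  "CLAUDE_CODE_USE_BEDROCK": "1",
  "AWS_REGION": "us-east-1",
  "AWS_ACCESS_KEY_ID": "AKIA&lt;placeholder&gt;",
  "AWS_SECRET_ACCESS_KEY": "&lt;placeholder&gt;"
}</code></pre>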
<p><strong>Region notes:</strong></p>
<ul>
<li><strong><code>us-east-1</code></strong> has the broadest model availability and is a good default.</li>
<li>Not all models are available in all regions. Check the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html">Bedrock model support by region</a> page if you get “model not found” errors.</li>
</ul>
<p><strong>Model pinning (recommended):</strong></p>
<p>Bedrock model aliases can change when new versions are released. To avoid surprises, pin specific model versions in your <code>env</code> block:</p>
<pre class="jsonc"><code>"ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5-20251001"</code></pre>
</div>
</div>
</div>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Warning
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Do not include <code>//</code> comments in <code>settings.json</code>.</strong> The examples elsewhere in this guide use <code>jsonc</code> syntax for readability, but <code>settings.json</code> is parsed as <strong>plain JSON</strong>, which does not support comments. If you copy-paste a block with comments, Claude Code will report “Invalid or malformed JSON” in <code>claude doctor</code>.</p>
</div>
</div>
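<p>A quick way to self-check before launching: feed the file through a strict JSON parser and confirm that comments are rejected. A minimal sketch using inline strings (assumes <code>python3</code> is installed; in practice, point <code>json.tool</code> at your real <code>~/.claude/settings.json</code>):</p>

```shell
# python3 -m json.tool is a strict JSON parser: any // comment is a syntax error.
good='{ "model": "claude-sonnet-4-6" }'
bad='{ "model": "claude-sonnet-4-6" } // my default model'

echo "$good" | python3 -m json.tool > /dev/null 2>&1 && good_ok=yes || good_ok=no
echo "$bad"  | python3 -m json.tool > /dev/null 2>&1 && bad_ok=yes  || bad_ok=no

echo "plain JSON: $good_ok, with comment: $bad_ok"
# prints: plain JSON: yes, with comment: no
```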
<p>The <code>model</code> field sets your default model — <code>claude-sonnet-4-6</code> is a good balance of cost and capability. Change it to <code>claude-opus-4-6</code> if you need maximum reasoning power (at higher cost).</p>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Important</span>Sandbox platform prerequisites
</div>
</div>
<div class="callout-body-container callout-body">
<p>The sandbox config above enables OS-level isolation that restricts Claude’s bash commands to your project directory. It works <strong>out of the box on macOS</strong> (via Apple’s Seatbelt). On <strong>Linux/WSL2</strong>, you must first install <code>bubblewrap</code> and <code>socat</code> — see Section 8 for step-by-step instructions, including how to verify you’re on WSL2 (WSL1 is not supported) and full configuration details. <strong>Native Windows (PowerShell)</strong> does not yet support sandboxing — the setting will be ignored, so you can leave it in place for when support arrives.</p>
</div>
</div>
</section>
<section id="verify-the-setup" class="level2">
<h2 class="anchored" data-anchor-id="verify-the-setup">6. Verify the Setup</h2>
<p>Run these in your terminal (PowerShell on Windows, Terminal on macOS):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb29-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--version</span></span>
<span id="cb29-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span> doctor</span></code></pre></div></div>
<p>Then launch Claude Code from a project directory:</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-10-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-10-1" aria-controls="tabset-10-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-10-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-10-2" aria-controls="tabset-10-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-10-1" class="tab-pane active" aria-labelledby="tabset-10-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb30-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cd</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>\Users\you\projects\my-project</span>
<span id="cb30-2">claude</span></code></pre></div></div>
</div>
<div id="tabset-10-2" class="tab-pane" aria-labelledby="tabset-10-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb31-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/projects/my-project</span>
<span id="cb31-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
</div>
</div>
</div>
<p>On first launch, Claude Code will present a login method selection screen. Since you’re using a 3rd-party cloud provider, select <strong>option 3 (“3rd-party platform”)</strong> and then choose your provider (e.g., Vertex AI) when prompted. Claude Code will detect your cloud credentials automatically — no API key is needed.</p>
<p>Once inside Claude Code, run <code>/status</code> to confirm your provider is active:</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-11-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-11-1" aria-controls="tabset-11-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-11-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-11-2" aria-controls="tabset-11-2" aria-selected="false" href="">Amazon Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-11-1" class="tab-pane active" aria-labelledby="tabset-11-1-tab">
<pre><code>API provider: Google Vertex AI
GCP project: your-project-id
Default region: global</code></pre>
</div>
<div id="tabset-11-2" class="tab-pane" aria-labelledby="tabset-11-2-tab">
<pre><code>API provider: Amazon Bedrock
AWS region: us-east-1</code></pre>
</div>
</div>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>When using either cloud provider, the <a href="https://code.claude.com/docs/en/google-vertex-ai"><code>/login</code> and <code>/logout</code> commands are disabled</a> — authentication is handled entirely through your cloud CLI credentials (<code>gcloud</code> or <code>aws</code>).</p>
</div>
</div>
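<p>If <code>/status</code> shows the wrong provider, or Claude Code falls back to prompting for an API key, first confirm that your cloud CLI is actually authenticated. The standard identity checks (run whichever matches your provider) are:</p>
<pre class="bash"><code>gcloud auth application-default print-access-token &gt; /dev/null &amp;&amp; echo "Vertex AI credentials OK"
aws sts get-caller-identity</code></pre>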
</section>
<section id="filesystem-safety-permissions" class="level2">
<h2 class="anchored" data-anchor-id="filesystem-safety-permissions">7. Filesystem Safety &amp; Permissions</h2>
<blockquote class="blockquote">
<p>Official permissions docs: <a href="https://code.claude.com/docs/en/permissions">code.claude.com/docs/en/permissions</a></p>
<p>Sandboxing docs: <a href="https://code.claude.com/docs/en/sandboxing">code.claude.com/docs/en/sandboxing</a></p>
</blockquote>
<section id="understand-risk" class="level3">
<h3 class="anchored" data-anchor-id="understand-risk">Understand the risk before you start</h3>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>Claude Code runs with your user’s full filesystem permissions
</div>
</div>
<div class="callout-body-container callout-body">
<p>This is the single most important thing to understand before using it.</p>
</div>
</div>
<p>When you launch Claude Code from the CLI, it has access to your current working directory — but it is <strong>not strictly confined to it</strong>. Without sandboxing enabled, Claude can read, modify, create, or delete files <strong>anywhere your user account has access</strong>, including parent directories, your home folder, and system files. <a href="https://zenn.dev/tomioka/articles/0496a427f8bcd0?locale=en">Independent testing has confirmed</a> that without sandboxing, Claude can create files in parent directories above the working directory without any special prompt.</p>
<p>This means a poorly worded prompt, an agentic loop, or a <a href="https://www.petefreitag.com/blog/claude-code-permissions/">prompt injection attack</a> embedded in a file Claude reads could cause changes in places you didn’t intend.</p>
<p><strong>Practical recommendations:</strong></p>
<ul>
<li><strong>Avoid running Claude Code on machines where restricted or sensitive data is stored.</strong> Sandboxing restricts <em>writes</em> but not <em>reads</em> — Claude can still read files anywhere on the machine and send their contents to Anthropic’s servers as part of a prompt. UW does not yet have a data-use agreement with Anthropic (see warning above), so even cloud-routed usage sends your prompts to Anthropic’s infrastructure.</li>
<li><strong>If you must run on a machine with sensitive files</strong>, add <strong>deny rules</strong> (see below) to block read access to sensitive paths. But note that <code>Read</code> deny rules are best-effort — allowed Bash commands like <code>cat</code> can still read denied files. Denying both <code>Read</code> and the relevant <code>Bash</code> patterns together provides stronger protection (see <a href="https://www.petefreitag.com/blog/claude-code-permissions/">this security deep-dive</a> for details).</li>
<li><strong>Test only within isolated Git repositories</strong>, with no sensitive data present in the local repo.</li>
<li><strong>Always enable sandboxing</strong> (Section 8) — it restricts <em>writes</em> to your project directory and adds network isolation at the OS level. It’s available on macOS (built-in) and Linux/WSL2 (requires <code>bubblewrap</code>). Native Windows does not yet support it.</li>
</ul>
</section>
<section id="always-launch-from-your-project-directory" class="level3">
<h3 class="anchored" data-anchor-id="always-launch-from-your-project-directory">Always launch from your project directory</h3>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-12-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-12-1" aria-controls="tabset-12-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-12-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-12-2" aria-controls="tabset-12-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-12-1" class="tab-pane active" aria-labelledby="tabset-12-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cd</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>\Users\you\projects\my-project</span>
<span id="cb34-2">claude</span></code></pre></div></div>
</div>
<div id="tabset-12-2" class="tab-pane" aria-labelledby="tabset-12-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb35-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/projects/my-project</span>
<span id="cb35-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
</div>
</div>
</div>
<p>Claude Code defaults to operating in the directory where it’s launched. Starting from <code>C:\</code>, <code>/</code>, or your home directory gives it an unnecessarily broad scope.</p>
</section>
<section id="use-git-as-your-safety-net" class="level3">
<h3 class="anchored" data-anchor-id="use-git-as-your-safety-net">Use Git as your safety net</h3>
<p><strong>Always work inside a Git repository.</strong> This is your most important protection:</p>
<ul>
<li>You can instantly see what changed with <code>git diff</code>.</li>
<li>You can revert any unwanted changes with <code>git checkout .</code> or <code>git stash</code>.</li>
<li>Claude Code itself is Git-aware and will generally respect repository boundaries.</li>
</ul>
<p><strong>Before starting a Claude Code session, make sure your working tree is clean</strong> (commit or stash pending changes). That way, anything Claude does can be cleanly reviewed or rolled back.</p>
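<p>That pre-session check can be scripted. A minimal sketch, demonstrated in a throwaway repository so it is safe to run anywhere (assumes <code>git</code> is on your <code>PATH</code>):</p>

```shell
# Create a throwaway repo to demonstrate the clean-tree check.
repo=$(mktemp -d)
cd "$repo" && git init -q .

# An empty `git status --porcelain` means the working tree is clean.
status_before=$(git status --porcelain)
[ -z "$status_before" ] && echo "clean: safe to start a Claude Code session"

# Simulate an uncommitted change; the same check now flags it.
touch scratch.txt
status_after=$(git status --porcelain)
[ -n "$status_after" ] && echo "dirty: commit or stash before starting"
```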
</section>
<section id="use-the-default-permission-mode" class="level3">
<h3 class="anchored" data-anchor-id="use-the-default-permission-mode">Use the default permission mode</h3>
<p>Out of the box, Claude Code asks for your approval before running most commands. <strong>Do not change this unless you understand the implications.</strong> Specifically:</p>
<ul>
<li><strong>Do NOT use <code>--dangerously-skip-permissions</code></strong> unless you’re in an isolated container/VM. This flag bypasses all safety checks.</li>
<li><strong>Do NOT set the mode to <code>bypassPermissions</code></strong> in settings.</li>
</ul>
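<p>If you truly need unattended runs, contain the blast radius first. One hedged pattern is a throwaway container that mounts only the project directory (the image name below is a placeholder and assumes Claude Code is installed inside it; Anthropic’s guidance for this flag is an isolated container, ideally without internet access):</p>
<pre class="bash"><code>docker run -it --rm \
  -v "$PWD":/workspace -w /workspace \
  your-claude-image \
  claude --dangerously-skip-permissions</code></pre>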
</section>
<section id="understand-how-permissions-actually-work" class="level3">
<h3 class="anchored" data-anchor-id="understand-how-permissions-actually-work">Understand how permissions actually work</h3>
<p>In the <strong>default mode</strong>, Claude Code already asks for your approval before running most commands that modify your system (file writes, bash commands, network requests, etc.). Read-only operations like viewing files generally run without prompting.</p>
<p>You can customize this behavior with three types of rules in <code>settings.json</code>:</p>
<ul>
<li><strong><code>deny</code></strong> — Hard block. Claude <strong>cannot</strong> use the tool, period. You won’t even be asked. Deny rules always win over allow rules.</li>
<li><strong><code>allow</code></strong> — Auto-approve. Claude can use the tool <strong>without asking you first</strong>. This is a convenience shortcut — it skips the approval prompt for things you trust.</li>
<li><strong><code>ask</code></strong> — Force a prompt, even if something else would auto-approve it.</li>
</ul>
<p><strong>You do NOT need to add <code>allow</code> rules for Claude Code to work.</strong> Without any custom rules, it will simply ask you to approve each action as it comes up. The default behavior is already safe — <code>deny</code> rules are the ones that add protection, while <code>allow</code> rules just reduce the number of prompts you see.</p>
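<p>A minimal <code>permissions</code> block illustrating all three rule types together (the paths and commands here are illustrative placeholders, not recommendations):</p>
<pre class="jsonc"><code>"permissions": {
  "deny":  ["Read(./secrets/**)"],
  "allow": ["Bash(git diff:*)"],
  "ask":   ["Bash(git push:*)"]
}</code></pre>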
</section>
<section id="add-deny-rules" class="level3">
<h3 class="anchored" data-anchor-id="add-deny-rules">Add <code>deny</code> rules to protect sensitive areas</h3>
<p>Add a <code>permissions</code> block to your <strong><code>~/.claude/settings.json</code></strong> file (the same file you edited in Section 5). Think of <code>deny</code> rules as guardrails — they block Claude from accessing sensitive paths regardless of what it tries to do.</p>
<p>These examples block: SSH keys, <code>.env</code> files, cloud credentials (<code>~/.config/gcloud/</code> and <code>~/.aws/</code>), destructive commands (<code>rm -rf</code>), and network tools (<code>curl</code>, <code>wget</code>). <strong>Replace <code>endemann</code> with your actual username</strong> before pasting into <code>settings.json</code>.</p>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Warning
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong><code>settings.json</code> is plain JSON — no comments allowed.</strong> The blocks below are ready to copy-paste. Do not add <code>//</code> comments or Claude Code will report “Invalid or malformed JSON” in <code>claude doctor</code>.</p>
</div>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-13-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-13-1" aria-controls="tabset-13-1" aria-selected="true" href="">macOS</a></li><li class="nav-item"><a class="nav-link" id="tabset-13-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-13-2" aria-controls="tabset-13-2" aria-selected="false" href="">Windows (PowerShell)</a></li><li class="nav-item"><a class="nav-link" id="tabset-13-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-13-3" aria-controls="tabset-13-3" aria-selected="false" href="">Linux / WSL2</a></li></ul>
<div class="tab-content">
<div id="tabset-13-1" class="tab-pane active" aria-labelledby="tabset-13-1-tab">
<p>This example assumes a home directory of <code>/Users/endemann</code>. Run <code>whoami</code> to check your username.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb36-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb36-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"permissions"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb36-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"deny"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb36-4">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//Users/endemann/.ssh/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-5">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//Users/endemann/.ssh/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-6">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//Users/endemann/.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-7">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//Users/endemann/.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-8">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//Users/endemann/.config/gcloud/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-9">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//Users/endemann/.config/gcloud/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-10">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//Users/endemann/.aws/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-11">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//Users/endemann/.aws/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-12">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(rm -rf *)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-13">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(curl:*)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb36-14">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(wget:*)"</span></span>
<span id="cb36-15">    <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb36-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb36-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
</div>
<div id="tabset-13-2" class="tab-pane" aria-labelledby="tabset-13-2-tab">
<p>Home directory is <code>C:\Users\endemann</code>. Run <code>echo $env:USERNAME</code> to check your username. Note: use forward slashes in permission rules, even on Windows.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb37-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"permissions"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb37-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"deny"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb37-4">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//C:/Users/endemann/.ssh/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-5">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//C:/Users/endemann/.ssh/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-6">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//C:/Users/endemann/.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-7">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//C:/Users/endemann/.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-8">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//C:/Users/endemann/.config/gcloud/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-9">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//C:/Users/endemann/.config/gcloud/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-10">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//C:/Users/endemann/.aws/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-11">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//C:/Users/endemann/.aws/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-12">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(rm -rf *)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-13">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(curl:*)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb37-14">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(wget:*)"</span></span>
<span id="cb37-15">    <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb37-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb37-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
</div>
<div id="tabset-13-3" class="tab-pane" aria-labelledby="tabset-13-3-tab">
<p>Home directory is <code>/home/endemann</code>. Run <code>whoami</code> to check your username.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb38-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb38-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"permissions"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb38-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"deny"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb38-4">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//home/endemann/.ssh/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-5">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//home/endemann/.ssh/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-6">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//home/endemann/.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-7">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//home/endemann/.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-8">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//home/endemann/.config/gcloud/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-9">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//home/endemann/.config/gcloud/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-10">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//home/endemann/.aws/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-11">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Edit(//home/endemann/.aws/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-12">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(rm -rf *)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-13">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(curl:*)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb38-14">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(wget:*)"</span></span>
<span id="cb38-15">    <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb38-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb38-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
</div>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>The <code>//</code> prefix in permission rules means “absolute path from the filesystem root.” A leading <code>~/</code> is also recognized for home-relative paths (you’ll see it used later in this guide), but the absolute form, like <code>//Users/endemann/.ssh/**</code>, makes it unambiguous exactly whose home directory the rules protect.</p>
</div>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Why not just rely on the approval prompt?</strong> Because deny rules are absolute — they work even if you accidentally click “allow” or “always allow” on a prompt. They also protect against <a href="https://www.petefreitag.com/blog/claude-code-permissions/">prompt injection attacks</a> where malicious content in a file tricks Claude into running something harmful. The approval prompt is your first line of defense; deny rules are the backup that can’t be bypassed.</p>
<p><strong>Important caveat:</strong> <code>Read</code> deny rules apply on a “best-effort” basis to built-in tools like Grep and Glob. However, if a Bash command (like <code>cat .env</code>) is allowed, it can still read the file. This is why denying both <code>Read</code> and the relevant <code>Bash</code> commands together gives stronger protection. See <a href="https://www.petefreitag.com/blog/claude-code-permissions/">this security deep-dive</a> for details.</p>
</div>
</div>
</section>
<section id="allow-rules" class="level3">
<h3 class="anchored" data-anchor-id="allow-rules">Add <code>allow</code> rules to reduce prompt fatigue</h3>
<p>These also go in <strong><code>~/.claude/settings.json</code></strong>, inside the same <code>permissions</code> block as your deny rules. By default, Claude Code asks “Do you want to proceed?” before <em>every</em> file write, shell command, and git operation. This gets old fast — in a typical session you might approve the same <code>git add</code>, <code>git commit</code>, and <code>git push</code> sequence a dozen times. You can add <code>allow</code> rules so Claude runs trusted commands without asking:</p>
<p><strong>Minimal — just stop the git and test prompts:</strong></p>
<pre class="jsonc"><code>{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(git push:*)",
      "Bash(git checkout:*)",
      "Bash(git log:*)",
      "Bash(git diff:*)",
      "Bash(git status)",
      "Bash(git rm:*)",
      "Bash(gh:*)",
      "Bash(python:*)",
      "Bash(pytest:*)"
    ]
  }
}</code></pre>
<p><strong>More aggressive — also auto-approve file edits:</strong></p>
<pre class="jsonc"><code>{
  "permissions": {
    "allow": [
      "Edit",
      "Write",
      "Bash(git:*)",
      "Bash(gh:*)",
      "Bash(python:*)",
      "Bash(pytest:*)",
      "Bash(npm:*)",
      "Bash(pip:*)"
    ]
  }
}</code></pre>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>The fastest way: use <code>acceptEdits</code> mode or enable sandboxing
</div>
</div>
<div class="callout-body-container callout-body">
<p>Instead of listing individual allow rules, you can:</p>
<ul>
<li><strong>Switch to <code>acceptEdits</code> mode</strong> (<code>Shift+Tab</code> inside a session) — auto-approves all file edits while still prompting for bash commands.</li>
<li><strong>Enable sandboxing</strong> (see Section 8) — this is the best option. Sandboxing auto-approves bash commands that stay within your project directory at the OS level, while actually <em>increasing</em> security.</li>
<li><strong>Use the interactive prompt</strong> — when Claude asks “Do you want to proceed?”, select <strong>“Yes, and don’t ask again for [command pattern]”</strong> to build up your allow list organically during a session.</li>
</ul>
</div>
</div>
<p>You don’t <em>have</em> to add any allow rules — the default “ask before doing” behavior is safe, especially when you’re starting out. But if you find yourself mindlessly hitting “Yes” on every prompt, that’s a sign you should either add allow rules or enable sandboxing.</p>
</section>
<section id="check-your-current-permissions" class="level3">
<h3 class="anchored" data-anchor-id="check-your-current-permissions">Check your current permissions</h3>
<p>Run <code>/permissions</code> inside a Claude Code session at any time to see what rules are active and which settings file they came from. This is especially helpful to verify your deny rules are loaded correctly.</p>
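<p>A malformed <code>settings.json</code> can fail to load without an obvious error, so it’s also worth checking that the file parses at all. One quick sanity check (assuming <code>python3</code> is on your PATH) is to run it through a JSON parser; here we validate a throwaway example file, but you can point the same command at <code>~/.claude/settings.json</code>:</p>

```shell
# Write a throwaway settings file, then check that it parses as JSON.
# Swap the $tmp path for ~/.claude/settings.json to check your real file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{ "permissions": { "deny": ["Read(//Users/endemann/.ssh/**)"] } }
EOF
python3 -m json.tool "$tmp" > /dev/null && echo "valid JSON"
```

<p>If the file has a stray trailing comma or an unquoted key, the parser reports the line and column of the problem instead.</p>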
</section>
<section id="project-settings" class="level3">
<h3 class="anchored" data-anchor-id="project-settings">Use project-level settings for shared repos</h3>
<p>Claude Code reads <strong>two</strong> main settings files and <strong>merges them</strong>:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 23%">
<col style="width: 56%">
</colgroup>
<thead>
<tr class="header">
<th>File</th>
<th>Scope</th>
<th>What to put here</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>~/.claude/settings.json</code></td>
<td><strong>User-level</strong> — applies to every project you open</td>
<td>Personal deny rules (SSH keys, cloud creds, <code>.env</code>), model choice, sandbox config</td>
</tr>
<tr class="even">
<td><code>&lt;project-root&gt;/.claude/settings.json</code></td>
<td><strong>Project-level</strong> — applies only when Claude Code is launched inside this repo</td>
<td>Repo-specific deny rules (e.g., don’t read <code>secrets/</code>), shared team guardrails</td>
</tr>
</tbody>
</table>
<p><strong>How merging works:</strong> Claude Code combines both files. If either file denies something, it’s denied — project-level deny rules apply to everyone, even if their personal settings are more permissive. You don’t have to choose one or the other; most teams use both.</p>
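<p>To make the merge concrete, here’s a small sketch (not Claude Code’s actual implementation) that unions the <code>deny</code> arrays from two stand-in settings files, the way your user-level and project-level rules combine:</p>

```shell
# Stand-ins for ~/.claude/settings.json (user) and ./.claude/settings.json (project)
user=$(mktemp); proj=$(mktemp)
echo '{"permissions":{"deny":["Read(//home/me/.ssh/**)"]}}' > "$user"
echo '{"permissions":{"deny":["Read(./secrets/**)"]}}' > "$proj"
# Union the two deny lists: if either file denies something, it stays denied
python3 - "$user" "$proj" <<'EOF'
import json, sys
deny = []
for path in sys.argv[1:]:
    with open(path) as f:
        deny += json.load(f).get("permissions", {}).get("deny", [])
print("\n".join(sorted(set(deny))))
EOF
```

<p>Both rules land in the effective deny list; a more permissive personal file can’t remove a project-level denial.</p>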
<p>The project-level file lives inside your Git repo, so you can commit it and every collaborator automatically inherits the same rules:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb41-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb41-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"permissions"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb41-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"deny"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb41-4">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(./.env)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb41-5">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(./secrets/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb41-6">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(curl:*)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb41-7">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(wget:*)"</span></span>
<span id="cb41-8">    <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb41-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb41-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
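<p>Committing it looks like this (shown in a throwaway repo so the commands are safe to copy; in your own project you’d run only the <code>git add</code> and <code>git commit</code> lines):</p>

```shell
# Demo setup in a disposable repo -- skip these lines in a real project
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p .claude
echo '{"permissions":{"deny":["Read(./.env)"]}}' > .claude/settings.json
# The part you'd actually run in your project:
git add .claude/settings.json
git -c user.name=demo -c user.email=demo@example.com commit -qm "Add shared Claude Code guardrails"
```
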
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Path syntax:</strong> Paths starting with <code>//</code> are absolute (used in user-level settings to protect home-directory files like <code>~/.ssh/</code>). Paths starting with <code>./</code> are relative to the project root (used in project-level settings). Use <code>**</code> to match recursively. See the <a href="https://code.claude.com/docs/en/permissions">permission rules reference</a> for full pattern syntax.</p>
</div>
</div>
</section>
<section id="interactive-safeguards-during-a-session" class="level3">
<h3 class="anchored" data-anchor-id="interactive-safeguards-during-a-session">Interactive safeguards during a session</h3>
<ul>
<li><strong>Press <code>Esc</code></strong> at any time to interrupt Claude Code mid-operation.</li>
<li><strong>Press <code>Shift+Tab</code></strong> to cycle through permission modes during a session.</li>
<li><strong>Use <code>/permissions</code></strong> to view and manage active permissions.</li>
<li><strong>Review every command</strong> before approving it — especially <code>rm</code>, <code>mv</code>, file writes outside your project, and anything involving <code>sudo</code>.</li>
</ul>
</section>
<section id="permission-mode-cheat-sheet" class="level3">
<h3 class="anchored" data-anchor-id="permission-mode-cheat-sheet">Permission mode cheat sheet</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 17%">
<col style="width: 37%">
<col style="width: 45%">
</colgroup>
<thead>
<tr class="header">
<th>Mode</th>
<th>What it does</th>
<th>When to use it</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>default</strong></td>
<td>Asks before most operations</td>
<td>Day-to-day work (recommended)</td>
</tr>
<tr class="even">
<td><strong>acceptEdits</strong></td>
<td>Auto-approves file edits, still asks for bash commands</td>
<td>Trusted refactoring tasks</td>
</tr>
<tr class="odd">
<td><strong>plan</strong></td>
<td>Read-only, no modifications allowed</td>
<td>Exploring a codebase, code review</td>
</tr>
<tr class="even">
<td><strong>dontAsk</strong></td>
<td>Auto-denies anything not explicitly in <code>allow</code> list</td>
<td>Strict automation</td>
</tr>
<tr class="odd">
<td><strong>bypassPermissions</strong></td>
<td>Skips ALL checks</td>
<td>Isolated containers only — <strong>never on your local machine</strong></td>
</tr>
</tbody>
</table>
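<p>If you find yourself switching to the same mode at the start of every session, the startup mode can reportedly also be pinned in <code>~/.claude/settings.json</code> via the <code>permissions.defaultMode</code> key; check the <a href="https://code.claude.com/docs/en/permissions">permission rules reference</a> to confirm the key and the mode names your version supports:</p>

```json
{
  "permissions": {
    "defaultMode": "plan"
  }
}
```
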
</section>
</section>
<section id="enable-sandboxing" class="level2">
<h2 class="anchored" data-anchor-id="enable-sandboxing">8. Sandboxing (Strongly Recommended)</h2>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>If you followed Section 5, sandboxing is already enabled in your <code>settings.json</code>. This section covers platform prerequisites, how to choose a sandbox mode, and what sandboxing actually does.</p>
</div>
</div>
<p>Sandboxing is the <strong>strongest protection available</strong> against Claude modifying files outside your project. It uses OS-level enforcement (not just Claude’s own judgment) to restrict what bash commands can access — even if a prompt injection bypasses Claude’s decision-making.</p>
<p><strong>What sandboxing does (<a href="https://code.claude.com/docs/en/sandboxing">official docs</a>):</strong></p>
<ul>
<li><strong>Filesystem isolation:</strong> <em>Write</em> access is restricted to the current working directory and its subdirectories — Claude <strong>cannot modify, create, or delete files outside your project directory</strong>. This is enforced at the OS kernel level. <strong>However, Claude can still <em>read</em> files anywhere on the machine</strong> (unless blocked by deny rules), and anything it reads can be included in prompts sent to Anthropic’s servers. This is an important limitation — sandboxing alone is not sufficient to protect sensitive data that exists elsewhere on the same machine.</li>
<li><strong>Network isolation:</strong> Only approved domains can be accessed. New domain requests trigger a permission prompt. This prevents data exfiltration even if Claude is compromised.</li>
</ul>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-22-contents" aria-controls="callout-22" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>How to block reads outside your project directory
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-22" class="callout-22-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Add these deny rules to your <code>~/.claude/settings.json</code> to prevent Claude from reading files outside your project:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb42-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb42-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"permissions"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb42-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"deny"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb42-4">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(//**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb42-5">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read(~/**)"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb42-6">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bash(cat:*)"</span></span>
<span id="cb42-7">    <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb42-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb42-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<ul>
<li><code>Read(//**)</code> — blocks the Read tool from accessing any file via an absolute path (files inside your project use relative paths and are unaffected)</li>
<li><code>Read(~/**)</code> — blocks reads via home directory paths</li>
<li><code>Bash(cat:*)</code> — blocks the most common Bash bypass for Read deny rules</li>
</ul>
<p><strong>Caveats:</strong> <code>Read</code> deny rules are best-effort for built-in tools like Grep and Glob. Other Bash commands beyond <code>cat</code> (e.g., <code>head</code>, <code>tail</code>, <code>less</code>) could also read files — add deny rules for those too if needed. For more granular deny rules targeting specific sensitive paths (SSH keys, cloud credentials, <code>.env</code> files), see Section 7.</p>
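<p>For example, the same <code>Bash(command:*)</code> prefix syntax extends to the other common file-reading commands mentioned above (a sketch, not an exhaustive set; adjust the list to your own threat model):</p>

```json
{
  "permissions": {
    "deny": [
      "Bash(cat:*)",
      "Bash(head:*)",
      "Bash(tail:*)",
      "Bash(less:*)",
      "Bash(more:*)"
    ]
  }
}
```
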
</div>
</div>
</div>
<p><strong>Without sandboxing</strong>, these protections do not exist — Claude operates with your full user permissions, and <a href="https://zenn.dev/tomioka/articles/0496a427f8bcd0?locale=en">can create/modify files in parent directories</a> without any special prompt.</p>
<p>Anthropic reports that sandboxing <a href="https://www.anthropic.com/engineering/claude-code-sandboxing">reduces permission prompts by 84%</a> while <em>increasing</em> security — you get fewer interruptions and better protection.</p>
<p><strong>How to enable it:</strong></p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-15-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-15-1" aria-controls="tabset-15-1" aria-selected="true" href="">macOS</a></li><li class="nav-item"><a class="nav-link" id="tabset-15-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-15-2" aria-controls="tabset-15-2" aria-selected="false" href="">Windows (WSL2)</a></li><li class="nav-item"><a class="nav-link" id="tabset-15-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-15-3" aria-controls="tabset-15-3" aria-selected="false" href="">Windows (native PowerShell)</a></li></ul>
<div class="tab-content">
<div id="tabset-15-1" class="tab-pane active" aria-labelledby="tabset-15-1-tab">
<p>Works <strong>out of the box</strong> using Apple’s Seatbelt framework. Inside a Claude Code session, run:</p>
<pre><code>/sandbox</code></pre>
<p>Choose <strong>“Auto-allow mode”</strong> (recommended) — sandboxed commands run automatically without permission prompts, while anything that would exceed the sandbox boundaries falls back to the normal approval flow.</p>
</div>
<div id="tabset-15-2" class="tab-pane" aria-labelledby="tabset-15-2-tab">
<p>WSL2 uses <a href="https://github.com/containers/bubblewrap">bubblewrap</a> for sandbox isolation — the same mechanism used on native Linux. <strong>WSL1 is not supported</strong> because bubblewrap requires kernel features only available in WSL2.</p>
<p><strong>Step 1 — Confirm you’re on WSL2:</strong></p>
<p>Open a <strong>Windows PowerShell</strong> window (not your WSL terminal) and run:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb44-1">wsl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>l <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>v</span></code></pre></div></div>
<p>You should see <code>VERSION 2</code> next to your distribution. If you’re on WSL1, <a href="https://learn.microsoft.com/en-us/windows/wsl/install#upgrade-version-from-wsl-1-to-wsl-2">upgrade to WSL2</a> first. Once confirmed, open your <strong>WSL2 terminal</strong> — all remaining steps run inside WSL.</p>
<p><strong>Step 2 — Install Claude Code inside WSL2:</strong></p>
<p>WSL2 is a separate Linux environment — your Windows PowerShell installation of Claude Code doesn’t carry over. Inside your WSL2 terminal, install it with npm:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb45-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">npm</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-g</span> @anthropic-ai/claude-code</span></code></pre></div></div>
<p>You’ll also need to reconfigure your cloud CLI credentials inside WSL2, since it has its own filesystem and config:</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-14-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-14-1" aria-controls="tabset-14-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-14-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-14-2" aria-controls="tabset-14-2" aria-selected="false" href="">Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-14-1" class="tab-pane active" aria-labelledby="tabset-14-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb46-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth login</span>
<span id="cb46-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth application-default login</span>
<span id="cb46-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> config set project YOUR_PROJECT_ID</span></code></pre></div></div>
</div>
<div id="tabset-14-2" class="tab-pane" aria-labelledby="tabset-14-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb47-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> configure</span>
<span id="cb47-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enter your Access Key ID, Secret Access Key, and default region when prompted</span></span></code></pre></div></div>
</div>
</div>
</div>
<p>Verify the Claude Code install:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb48-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--version</span></span></code></pre></div></div>
<p><strong>Step 3 — Copy your settings into WSL2:</strong></p>
<p>WSL2 has its own home directory (<code>/home/&lt;you&gt;/</code>), completely separate from Windows (<code>C:\Users\&lt;you&gt;\</code>). Claude Code won’t see your Windows <code>settings.json</code> — you need a copy inside WSL2:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb49-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create the config directory if it doesn't exist</span></span>
<span id="cb49-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mkdir</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-p</span> ~/.claude</span>
<span id="cb49-3"></span>
<span id="cb49-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Copy your Windows settings into WSL2</span></span>
<span id="cb49-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cp</span> /mnt/c/Users/<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">$(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">cmd.exe</span> /C <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"echo %USERNAME%"</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/dev/null <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tr</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-d</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'\r'</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">)</span>/.claude/settings.json ~/.claude/settings.json</span></code></pre></div></div>
<p>Or if you prefer, create <code>~/.claude/settings.json</code> manually with the same content you used in Section 5 (use <code>nano ~/.claude/settings.json</code>).</p>
<p>Verify the settings look correct:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb50-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span> ~/.claude/settings.json</span></code></pre></div></div>
<p><strong>Step 4 — Install sandbox dependencies:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb51-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ubuntu/Debian (most common WSL distro)</span></span>
<span id="cb51-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get update <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get install bubblewrap socat</span>
<span id="cb51-3"></span>
<span id="cb51-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fedora</span></span>
<span id="cb51-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> dnf install bubblewrap socat</span></code></pre></div></div>
<p><strong>Step 5 — Launch Claude Code and enable sandboxing:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb52-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/projects/my-project   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># navigate to your project first</span></span>
<span id="cb52-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span>                      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># launch Claude Code</span></span></code></pre></div></div>
<p>Then inside the Claude Code session, run:</p>
<pre><code>/sandbox</code></pre>
<p>Choose <strong>“Auto-allow mode”</strong> — sandboxed commands run automatically, while anything outside sandbox boundaries falls back to the normal approval flow.</p>
<p><strong>Step 6 — Verify it’s working:</strong></p>
<p>After enabling, Claude’s bash commands should run without permission prompts as long as they stay within your project directory. If a command needs to write outside the project or access a new network domain, you’ll still get a prompt.</p>
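<p>If sandboxing doesn’t seem to engage, it’s worth confirming the dependencies are actually visible from your shell (the same environment Claude Code runs in). A quick diagnostic sketch, noting that the <code>bubblewrap</code> package installs a binary named <code>bwrap</code>:</p>

```shell
# Check that the sandbox dependencies are on PATH (WSL2/Linux).
# bubblewrap installs the `bwrap` binary; socat installs `socat`.
for tool in bwrap socat; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING - install it (see Step 4)"
  fi
done
```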
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>If you need tools like <code>npm</code>, <code>pip</code>, or <code>kubectl</code> to write outside your project directory (e.g., to <code>~/.npm</code> or <code>~/.kube</code>), grant specific write access in your <code>settings.json</code>:</p>
<pre class="jsonc"><code>{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "allowWrite": ["~/.npm", "~/.kube", "//tmp"]
    }
  }
}</code></pre>
<p>Path prefixes: <code>//</code> = absolute path, <code>~/</code> = home directory, <code>/</code> = relative to settings file. See the <a href="https://code.claude.com/docs/en/sandboxing">official sandboxing docs</a> for full path syntax.</p>
</div>
</div>
</div>
<div id="tabset-15-3" class="tab-pane" aria-labelledby="tabset-15-3-tab">
<p>Sandboxing is <strong>not yet supported</strong> on native Windows. Use deny rules and allow rules as your primary protection. If you need stronger isolation, consider running Claude Code from WSL2 instead.</p>
</div>
</div>
</div>
<p><strong>Sandbox modes explained:</strong></p>
<table class="caption-top table">
<colgroup>
<col style="width: 23%">
<col style="width: 38%">
<col style="width: 38%">
</colgroup>
<thead>
<tr class="header">
<th>Mode</th>
<th>Behavior</th>
<th>Best for</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Auto-allow</strong></td>
<td>Sandboxed commands run without prompts. Commands that exceed sandbox boundaries fall back to normal approval flow.</td>
<td>Day-to-day work — fewer interruptions, same security</td>
</tr>
<tr class="even">
<td><strong>Regular permissions</strong></td>
<td>All commands go through standard approval, even when sandboxed.</td>
<td>When you want to review every command regardless</td>
</tr>
</tbody>
</table>
<p>In both modes, the OS-level filesystem and network restrictions are identical. The only difference is whether sandboxed commands are auto-approved.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Sandboxing applies only to Bash commands and their child processes. The built-in Read/Edit tools are governed by permission rules. Use both together for the strongest security posture — this is what Anthropic calls <a href="https://code.claude.com/docs/en/sandboxing">“defense in depth”</a>.</p>
<p>Some tools are incompatible with sandboxing (e.g., <code>docker</code>, <code>watchman</code>). If a tool fails inside the sandbox, you can exclude it with <code>"excludedCommands"</code> in your <a href="https://code.claude.com/docs/en/sandboxing">sandbox settings</a> so it runs outside the sandbox with normal permission prompts instead.</p>
</div>
</div>
<section id="further-reading-on-permissions-security" class="level3">
<h3 class="anchored" data-anchor-id="further-reading-on-permissions-security">Further reading on permissions &amp; security</h3>
<ul>
<li><a href="https://code.claude.com/docs/en/permissions"><strong>Official permissions docs</strong></a> — full reference for rule syntax, tool names, managed policies</li>
<li><a href="https://code.claude.com/docs/en/sandboxing"><strong>Official sandboxing docs</strong></a> — setup, configuration, and security model for OS-level isolation</li>
<li><a href="https://www.anthropic.com/engineering/claude-code-sandboxing"><strong>Anthropic engineering blog on sandboxing</strong></a> — explains the design rationale and how filesystem + network isolation work together</li>
<li><a href="https://www.petefreitag.com/blog/claude-code-permissions/"><strong>Security deep-dive by Pete Freitag</strong></a> — independent analysis of how permissions actually work under the hood, including gotchas and bypasses</li>
<li><a href="https://claudefa.st/blog/guide/development/permission-management"><strong>Permission modes guide</strong></a> — when to use each mode, with workflow examples</li>
</ul>
</section>
</section>
<section id="billing-cost-management-avoiding-runaway-costs" class="level2">
<h2 class="anchored" data-anchor-id="billing-cost-management-avoiding-runaway-costs">9. Billing, Cost Management &amp; Avoiding Runaway Costs</h2>
<blockquote class="blockquote">
<p>Official cost management docs: <a href="https://code.claude.com/docs/en/costs">code.claude.com/docs/en/costs</a></p>
</blockquote>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>Claude Code on cloud providers is pay-as-you-go
</div>
</div>
<div class="callout-body-container callout-body">
<p>Every token you send and receive is billed to your cloud project. Unlike a Claude Pro/Max subscription with fixed monthly pricing, there is no spending cap unless you set one up yourself. <strong>You can rack up significant charges quickly if you’re not careful.</strong></p>
</div>
</div>
<section id="typical-costs-to-expect" class="level3">
<h3 class="anchored" data-anchor-id="typical-costs-to-expect">Typical costs to expect</h3>
<p>According to <a href="https://code.claude.com/docs/en/costs">Anthropic’s own documentation</a>, the average Claude Code session costs about <strong>$6 per developer per day</strong>, with 90% of users staying under $12/day. However, this is an <em>average</em> — complex tasks, long sessions, or agentic loops can blow past this easily. Monthly costs with Sonnet typically run <strong>$100–$200/developer</strong>, but there’s large variance.</p>
<p>Use the <strong><code>/cost</code></strong> command inside Claude Code at any time to see your current session’s token usage and estimated cost:</p>
<pre><code>/cost</code></pre>
</section>
<section id="understand-what-drives-costs" class="level3">
<h3 class="anchored" data-anchor-id="understand-what-drives-costs">Understand what drives costs</h3>
<p>Token costs scale with <strong>context size</strong> — the more context Claude processes, the more you pay. Key cost drivers:</p>
<ul>
<li><strong>Long conversations:</strong> Claude re-processes the entire conversation history with each message. A session that’s been going for hours costs more per-message than a fresh one.</li>
<li><strong>Large codebases:</strong> If Claude reads many files to understand your project, that’s all input tokens.</li>
<li><strong>Extended thinking:</strong> Enabled by default with a budget of <a href="https://code.claude.com/docs/en/costs">~32K tokens</a>. Thinking tokens are billed as output tokens (the expensive kind). For simple tasks, this is overkill.</li>
<li><strong>Agentic loops (see below):</strong> The single biggest risk for runaway costs.</li>
</ul>
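<p>The “long conversations” driver is easy to underestimate because the cost compounds: each new message re-sends the whole history as input. A toy calculation (illustrative token counts only, and ignoring prompt caching, which lowers real-world input costs but not the scaling):</p>

```python
def cumulative_input_tokens(n_messages, tokens_per_message=500):
    """Total input tokens billed over a conversation where each new
    message re-sends the entire prior history as input."""
    total = 0
    history = 0
    for _ in range(n_messages):
        history += tokens_per_message  # the new message joins the history
        total += history               # the whole history is sent as input
    return total

# 10x the messages costs ~90x the input tokens, not 10x:
print(cumulative_input_tokens(10))   # 27500
print(cumulative_input_tokens(100))  # 2525000
```

<p>This quadratic growth is exactly why <code>/clear</code> and <code>/compact</code> pay off on long sessions.</p>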
</section>
<section id="agentic-loops" class="level3">
<h3 class="anchored" data-anchor-id="agentic-loops">Agentic loops — the #1 cost risk</h3>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Warning
</div>
</div>
<div class="callout-body-container callout-body">
<p>An <strong>agentic loop</strong> happens when Claude gets stuck in a cycle — repeatedly reading the same files, running the same commands, or auto-compacting and re-expanding its context. This can burn through tokens at an alarming rate.</p>
</div>
</div>
<p><strong>Real-world examples from the community:</strong></p>
<ul>
<li>One user <a href="https://github.com/anthropics/claude-code/issues/9579">reported a bug</a> where an auto-compacting loop consumed <strong>108 million tokens in a single day ($64–$78)</strong>, compared to their normal 12–68M tokens/day.</li>
<li>Runaway spikes of <strong>$235+ over a 4-day period</strong> were documented from that same loop bug.</li>
<li>Version updates have occasionally caused <a href="https://github.com/anthropics/claude-code/issues/16856"><strong>4x faster token consumption</strong></a> on the same tasks.</li>
</ul>
<p><strong>How to protect yourself:</strong></p>
<ul>
<li><strong>Watch for signs:</strong> If Claude seems to be reading the same files over and over, or your <code>/cost</code> jumps unusually fast, press <strong><code>Esc</code></strong> immediately to interrupt.</li>
<li><strong>Use <code>/compact</code></strong> to compress long conversations before they bloat.</li>
<li><strong>Use <code>/clear</code></strong> when switching to unrelated tasks — don’t let stale context accumulate.</li>
<li><strong>Be specific in prompts.</strong> Vague requests like “improve this codebase” trigger broad scanning. “Add input validation to the login function in <code>auth.ts</code>” is much cheaper.</li>
<li><strong>Use plan mode (Shift+Tab)</strong> before expensive operations. Claude outlines its approach for your approval before writing code, preventing costly rework.</li>
<li><strong>Start with Sonnet, not Opus.</strong> Only escalate to Opus for genuinely complex reasoning tasks. Opus costs significantly more per token.</li>
</ul>
</section>
<section id="budget-alerts" class="level3">
<h3 class="anchored" data-anchor-id="budget-alerts">Set up budget alerts (do this NOW)</h3>
<p>Budget alerts won’t automatically stop spending, but they’ll email you when you’re approaching a threshold. <strong>Set this up before your first Claude Code session.</strong></p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-16-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-16-1" aria-controls="tabset-16-1" aria-selected="true" href="">Vertex AI (GCP)</a></li><li class="nav-item"><a class="nav-link" id="tabset-16-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-16-2" aria-controls="tabset-16-2" aria-selected="false" href="">Amazon Bedrock (AWS)</a></li></ul>
<div class="tab-content">
<div id="tabset-16-1" class="tab-pane active" aria-labelledby="tabset-16-1-tab">
<blockquote class="blockquote">
<p>GCP budget alerts: <a href="https://cloud.google.com/billing/docs/how-to/budgets">cloud.google.com/billing/docs/how-to/budgets</a></p>
</blockquote>
<ol type="1">
<li>Go to <a href="https://console.cloud.google.com/billing/budgets"><strong>Billing → Budgets &amp; Alerts</strong></a> in the GCP Console.</li>
<li>Click <strong>Create Budget</strong>.</li>
<li>Scope it to your project (e.g., <code>doit-rci-sandbox-gcp-baa4</code>).</li>
<li>Set a monthly budget amount you’re comfortable with (e.g., $50, $100, $200).</li>
<li>Set alert thresholds at <strong>50%, 80%, and 100%</strong> of your budget.</li>
<li>Add your email as a notification recipient.</li>
</ol>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Important
</div>
</div>
<div class="callout-body-container callout-body">
<p>Budget alerts are <a href="https://cloud.google.com/billing/docs/how-to/budgets"><strong>notifications only</strong></a> — they do NOT automatically stop your services or disable billing. If you hit 100% of your budget at 2 AM, charges will continue until you manually intervene. Check your billing console daily.</p>
</div>
</div>
<p><strong>Set quota limits as a hard ceiling:</strong></p>
<p>Unlike budget alerts, <strong>quotas can actually stop usage.</strong> Go to <a href="https://console.cloud.google.com/apis/api/aiplatform.googleapis.com/quotas"><strong>Quotas &amp; System Limits</strong></a> and set reasonable TPM (tokens per minute) quotas for the Claude models you’re using. If you hit your quota, Claude Code will get 429 errors instead of running up your bill.</p>
</div>
<div id="tabset-16-2" class="tab-pane" aria-labelledby="tabset-16-2-tab">
<blockquote class="blockquote">
<p>AWS Budgets docs: <a href="https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html">docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html</a></p>
</blockquote>
<ol type="1">
<li>Go to <a href="https://console.aws.amazon.com/billing/home#/budgets"><strong>AWS Budgets</strong></a> in the AWS Console.</li>
<li>Click <strong>Create budget</strong>.</li>
<li>Choose <strong>Cost budget</strong> and set a monthly amount (e.g., $50, $100, $200).</li>
<li>Add alert thresholds at <strong>50%, 80%, and 100%</strong> of your budget.</li>
<li>Add your email as a notification recipient.</li>
</ol>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Important
</div>
</div>
<div class="callout-body-container callout-body">
<p>Like GCP, AWS budget alerts are <a href="https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html"><strong>notifications only</strong></a> by default — they do NOT automatically stop your services. However, AWS does support <a href="https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-controls.html">budget actions</a> that can automatically restrict IAM permissions when a threshold is hit. Consider setting up an action to revoke <code>bedrock:InvokeModel</code> at 100% of budget as a hard stop.</p>
</div>
</div>
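<p>If you do wire up a budget action, the policy it attaches is an ordinary IAM deny. A minimal sketch of such a policy (resource scoping is left wide open here; tighten it for your account):</p>

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "HardStopBedrockAtBudget",
      "Effect": "Deny",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```

<p>Because an explicit deny overrides any allow, attaching this to your user or role halts all Bedrock invocations until you detach it.</p>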
<p><strong>Bedrock quotas:</strong></p>
<p>Bedrock enforces per-model quotas on RPM (requests per minute) and TPM (tokens per minute). Default quotas are relatively low (e.g., 25 RPM for Opus). View and request increases at <a href="https://console.aws.amazon.com/servicequotas/home/services/bedrock/quotas"><strong>Service Quotas → Amazon Bedrock</strong></a> in the AWS Console.</p>
</div>
</div>
</div>
</section>
<section id="check-your-billing-console-daily" class="level3">
<h3 class="anchored" data-anchor-id="check-your-billing-console-daily">Check your billing console daily</h3>
<p>Make it a habit to check your cloud billing daily while actively using Claude Code. Look for:</p>
<ul>
<li>Unexpected spikes in AI/ML charges</li>
<li>Daily costs that exceed your expectations</li>
<li>Any charges from services you didn’t intentionally use</li>
</ul>
</section>
<section id="cost-saving-tips" class="level3">
<h3 class="anchored" data-anchor-id="cost-saving-tips">Cost-saving tips</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Strategy</th>
<th>Savings</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Use <strong>Sonnet</strong> instead of Opus for routine tasks</td>
<td>Significantly lower per-token cost</td>
</tr>
<tr class="even">
<td>Run <strong><code>/clear</code></strong> between unrelated tasks</td>
<td>Avoids paying to re-process old context</td>
</tr>
<tr class="odd">
<td>Run <strong><code>/compact</code></strong> during long sessions</td>
<td>Compresses context, reduces per-message cost</td>
</tr>
<tr class="even">
<td>Lower extended thinking budget (<code>MAX_THINKING_TOKENS=8000</code>)</td>
<td>Reduces output token costs on simple tasks</td>
</tr>
<tr class="odd">
<td>Keep your <code>CLAUDE.md</code> files lean (&lt;200 lines recommended)</td>
<td>Less context loaded on every message — see below</td>
</tr>
<tr class="even">
<td>Use <strong>plan mode</strong> before big refactors</td>
<td>Catches wrong approaches before expensive execution</td>
</tr>
<tr class="odd">
<td>Set <code>DISABLE_NON_ESSENTIAL_MODEL_CALLS=1</code> in your environment</td>
<td>Skips auxiliary model calls not required for core functionality</td>
</tr>
</tbody>
</table>
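<p>Several of these knobs are environment variables, which you can pin in <code>settings.json</code> so they apply to every session. A sketch with illustrative values; see the <a href="https://code.claude.com/docs/en/settings">settings docs</a> for the full list of supported variables:</p>

```jsonc
{
  "env": {
    // Cap extended thinking on routine work
    "MAX_THINKING_TOKENS": "8000",
    // Skip model calls that aren't required for core functionality
    "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1"
  }
}
```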
</section>
<section id="pricing-reference" class="level3">
<h3 class="anchored" data-anchor-id="pricing-reference">Pricing reference</h3>
<p>For the latest Claude model pricing, see:</p>
<ul>
<li><a href="https://platform.claude.com/docs/en/about-claude/pricing"><strong>Anthropic Pricing</strong></a> — base token rates by model, including cache and batch discounts</li>
<li><a href="https://code.claude.com/docs/en/costs"><strong>Claude Code Cost Management</strong></a> — Anthropic’s official guide to tracking and reducing Claude Code costs</li>
<li><a href="https://cloud.google.com/vertex-ai/generative-ai/pricing"><strong>Vertex AI Generative AI Pricing</strong></a> — Google’s per-token rates (regional endpoints carry a <a href="https://cloud.google.com/vertex-ai/generative-ai/pricing">10% premium</a> over global)</li>
<li><a href="https://aws.amazon.com/bedrock/pricing/"><strong>Amazon Bedrock Pricing</strong></a> — AWS per-token rates for Claude models</li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>On Vertex AI, the token usage shown on the <a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude/use-claude">GCP Quotas page may be inaccurate</a> for Claude models due to estimation and refund logic. For accurate billing, check the GCP Billing Reports page or use <code>/cost</code> inside Claude Code.</p>
</div>
</div>
</section>
</section>
<section id="data-usage" class="level2">
<h2 class="anchored" data-anchor-id="data-usage">10. Data Usage &amp; Privacy</h2>
<blockquote class="blockquote">
<p>Official data usage docs: <a href="https://code.claude.com/docs/en/data-usage">code.claude.com/docs/en/data-usage</a></p>
</blockquote>
<section id="training-policy" class="level3">
<h3 class="anchored" data-anchor-id="training-policy">Training policy</h3>
<p>Whether your code is used for model training depends on your account type:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Account type</th>
<th>Used for training?</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>API, Vertex AI, Bedrock, Foundry</strong> (commercial)</td>
<td><strong>No</strong> — Anthropic does not train on your prompts or outputs unless you explicitly opt in (e.g., the <a href="https://support.claude.com/en/articles/11174108-about-the-development-partner-program">Development Partner Program</a>)</td>
</tr>
<tr class="even">
<td><strong>Team &amp; Enterprise</strong> (commercial)</td>
<td><strong>No</strong> — same commercial terms</td>
</tr>
<tr class="odd">
<td><strong>Free, Pro, Max</strong> (consumer)</td>
<td><strong>Opt-in</strong> — you choose whether to allow training at <a href="https://claude.ai/settings/data-privacy-controls">claude.ai/settings/data-privacy-controls</a></td>
</tr>
</tbody>
</table>
<p>If you’re accessing Claude Code through a UW-Madison cloud account (Vertex AI or Bedrock), your usage falls under Anthropic’s commercial terms — <strong>your code is not used for training</strong>. However, note that UW-Madison does not yet have a direct data-use agreement with Anthropic. The no-training guarantee comes from Anthropic’s standard commercial terms, not from a UW-negotiated agreement. This distinction matters for restricted or regulated data — see the data sensitivity warning at the top of this guide.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Important nuances for consumer plans (Free/Pro/Max):</strong></p>
<ul>
<li><strong>Safety exception:</strong> Even with training disabled, conversations flagged for <a href="https://www.anthropic.com/legal/aup">safety review</a> may be used to improve Anthropic’s safety systems (e.g., training safeguard models).</li>
<li><strong>What’s included:</strong> The entire conversation — prompts, outputs, custom styles, and conversation preferences.</li>
<li><strong>What’s excluded:</strong> Raw content from connectors (Google Drive, MCP servers) is <strong>not</strong> included, unless you directly copy that content into your conversation.</li>
<li><strong>Feedback (thumbs up/down):</strong> Submitting feedback stores the full related conversation for up to 5 years (de-linked from your user ID). This data may be used for training regardless of your training setting.</li>
</ul>
</div>
</div>
</section>
<section id="data-retention" class="level3">
<h3 class="anchored" data-anchor-id="data-retention">Data retention</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>Account type</th>
<th>Retention period</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>API / Vertex AI / Bedrock / Team / Enterprise</td>
<td>30 days (default)</td>
</tr>
<tr class="even">
<td>Enterprise with Zero Data Retention (ZDR)</td>
<td>0 days (must be enabled per organization)</td>
</tr>
<tr class="odd">
<td>Consumer — training allowed</td>
<td>5 years</td>
</tr>
<tr class="even">
<td>Consumer — training not allowed</td>
<td>30 days</td>
</tr>
</tbody>
</table>
<p>Claude Code also caches sessions locally on your machine for up to 30 days to enable session resumption (configurable).</p>
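<p>The local retention window is adjustable from <code>settings.json</code>. A sketch, assuming the <code>cleanupPeriodDays</code> key from the <a href="https://code.claude.com/docs/en/settings">settings reference</a>:</p>

```jsonc
{
  // Delete locally cached session transcripts after 7 days instead of 30
  "cleanupPeriodDays": 7
}
```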
</section>
<section id="telemetry-and-error-reporting" class="level3">
<h3 class="anchored" data-anchor-id="telemetry-and-error-reporting">Telemetry and error reporting</h3>
<p>Claude Code sends operational metrics and error reports by default when using the direct Claude API:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Service</th>
<th>What it sends</th>
<th>Opt-out</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Statsig</strong> (metrics)</td>
<td>Latency, reliability, usage patterns — <strong>no code or file paths</strong></td>
<td><code>DISABLE_TELEMETRY=1</code></td>
</tr>
<tr class="even">
<td><strong>Sentry</strong> (errors)</td>
<td>Error logs — <strong>no code or file paths</strong></td>
<td><code>DISABLE_ERROR_REPORTING=1</code></td>
</tr>
<tr class="odd">
<td><strong><code>/bug</code> command</strong></td>
<td>Full conversation history including code (only when you run <code>/bug</code>)</td>
<td><code>DISABLE_BUG_COMMAND=1</code></td>
</tr>
<tr class="even">
<td><strong>Session quality surveys</strong></td>
<td>Numeric rating only (1/2/3/dismiss) — <strong>no conversation data</strong></td>
<td><code>CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY=1</code></td>
</tr>
</tbody>
</table>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Using Vertex AI, Bedrock, or Foundry?</strong> All non-essential traffic (telemetry, error reporting, <code>/bug</code>, surveys) is <strong>disabled by default</strong> for third-party providers. You don’t need to set any environment variables.</p>
<p>To disable everything at once regardless of provider, set <code>CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1</code>. Environment variables can be set in your <a href="https://code.claude.com/docs/en/settings"><code>settings.json</code></a>.</p>
</div>
</div>
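<p>As a concrete example, the catch-all opt-out from the tip above can be pinned in <code>~/.claude/settings.json</code> so it applies to every session regardless of your shell environment:</p>

```jsonc
{
  "env": {
    // Disables telemetry, error reporting, /bug uploads, and surveys at once
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```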
</section>
</section>
<section id="troubleshooting" class="level2">
<h2 class="anchored" data-anchor-id="troubleshooting">11. Troubleshooting</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-19-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-19-1" aria-controls="tabset-19-1" aria-selected="true" href="">Vertex AI</a></li><li class="nav-item"><a class="nav-link" id="tabset-19-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-19-2" aria-controls="tabset-19-2" aria-selected="false" href="">Amazon Bedrock</a></li></ul>
<div class="tab-content">
<div id="tabset-19-1" class="tab-pane active" aria-labelledby="tabset-19-1-tab">
<p><strong>“model not found” (404):</strong></p>
<p>The model may not be available on the <code>global</code> endpoint. Try changing your region:</p>
<pre class="jsonc"><code>"CLOUD_ML_REGION": "us-east5"</code></pre>
<p>Or add a model-specific override (see Section 5).</p>
<p><strong>429 “Resource Exhausted”:</strong></p>
<p>You need a quota increase. Go to <a href="https://console.cloud.google.com/apis/api/aiplatform.googleapis.com/quotas">Cloud Console → Quotas</a>, filter by the Claude model, and request an increase.</p>
<p><strong>Permission denied / IAM errors:</strong></p>
<p>Ask your UW-Madison GCP admin to ensure you have the <code>roles/aiplatform.user</code> role on the project.</p>
<p><strong><code>gcloud</code> command not found:</strong></p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-17-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-17-1" aria-controls="tabset-17-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-17-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-17-2" aria-controls="tabset-17-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-17-1" class="tab-pane active" aria-labelledby="tabset-17-1-tab">
<p>Make sure the Google Cloud SDK is installed and restart PowerShell.</p>
</div>
<div id="tabset-17-2" class="tab-pane" aria-labelledby="tabset-17-2-tab">
<p>If you installed via Homebrew, run <code>source "$(brew --prefix)/share/google-cloud-sdk/path.zsh.inc"</code> or restart your terminal. If you used the curl installer, make sure you ran <code>source ~/.zshrc</code> after install.</p>
</div>
</div>
</div>
<p><strong>Authentication expired:</strong></p>
<p>Re-run (both platforms):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb57-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth application-default login</span></code></pre></div></div>
</div>
<div id="tabset-19-2" class="tab-pane" aria-labelledby="tabset-19-2-tab">
<p><strong><code>AccessDeniedException</code>:</strong></p>
<p>Your IAM user/role doesn’t have the required Bedrock permissions. Ensure your policy includes <code>bedrock:InvokeModel</code> and <code>bedrock:InvokeModelWithResponseStream</code>. See Section 4 for the full policy.</p>
<p><strong><code>ValidationException: Model not found</code>:</strong></p>
<p>Check that: (1) you’ve enabled model access for the specific Claude model in the Bedrock console, (2) your <code>AWS_REGION</code> is set to a region where the model is available, and (3) your model ID is correct. Try <code>us-east-1</code> if unsure.</p>
<p><strong><code>ResourceNotFoundException</code>:</strong></p>
<p>You haven’t completed model access approval. Go to the <a href="https://console.aws.amazon.com/bedrock/">Bedrock Model catalog</a> and request access.</p>
<p><strong>429 “ThrottlingException”:</strong></p>
<p>You’ve hit your Bedrock quota. Request an increase at <a href="https://console.aws.amazon.com/servicequotas/home/services/bedrock/quotas">Service Quotas → Amazon Bedrock</a>. Default quotas are low (e.g., 25 RPM for Opus).</p>
<p><strong><code>aws</code> command not found:</strong></p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-18-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-18-1" aria-controls="tabset-18-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-18-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-18-2" aria-controls="tabset-18-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-18-1" class="tab-pane active" aria-labelledby="tabset-18-1-tab">
<p>Make sure the AWS CLI is installed and restart PowerShell.</p>
</div>
<div id="tabset-18-2" class="tab-pane" aria-labelledby="tabset-18-2-tab">
<p>If you installed via Homebrew, restart your terminal. If you used the pkg installer, ensure <code>/usr/local/bin</code> is in your <code>$PATH</code>.</p>
</div>
</div>
</div>
<p><strong>Credentials expired (SSO):</strong></p>
<p>Re-run:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb58-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> sso login <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--profile</span> your-profile-name</span></code></pre></div></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>Claude Code supports <a href="https://code.claude.com/docs/en/amazon-bedrock">automatic credential refresh</a> for AWS SSO. Configure <code>awsAuthRefresh</code> in your settings so Claude Code automatically re-authenticates when credentials expire mid-session, instead of failing with auth errors.</p>
</div>
</div>
</div>
</div>
</div>
<section id="common-issues-both-providers" class="level3">
<h3 class="anchored" data-anchor-id="common-issues-both-providers">Common issues (both providers)</h3>
<p><strong><code>claude</code> command not found:</strong></p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-20-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-20-1" aria-controls="tabset-20-1" aria-selected="true" href="">Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-20-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-20-2" aria-controls="tabset-20-2" aria-selected="false" href="">macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-20-1" class="tab-pane active" aria-labelledby="tabset-20-1-tab">
<p>Ensure <code>C:\Users\&lt;you&gt;\.local\bin</code> is in your user PATH (see Section 2). Restart PowerShell after editing PATH.</p>
</div>
<div id="tabset-20-2" class="tab-pane" aria-labelledby="tabset-20-2-tab">
<p>Ensure <code>~/.claude/bin</code> or <code>~/.local/bin</code> is in your <code>$PATH</code>. Restart your terminal or run <code>source ~/.zshrc</code>.</p>
</div>
</div>
</div>
<p><strong>Run diagnostics:</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb59-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span> doctor</span></code></pre></div></div>
<p>This checks your installation, authentication, and configuration for common issues.</p>
</section>
</section>
<section id="quick-reference" class="level2">
<h2 class="anchored" data-anchor-id="quick-reference">12. Quick Reference</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-21-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-21-1" aria-controls="tabset-21-1" aria-selected="true" href="">Vertex AI + Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-21-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-21-2" aria-controls="tabset-21-2" aria-selected="false" href="">Vertex AI + macOS</a></li><li class="nav-item"><a class="nav-link" id="tabset-21-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-21-3" aria-controls="tabset-21-3" aria-selected="false" href="">Bedrock + Windows</a></li><li class="nav-item"><a class="nav-link" id="tabset-21-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-21-4" aria-controls="tabset-21-4" aria-selected="false" href="">Bedrock + macOS</a></li></ul>
<div class="tab-content">
<div id="tabset-21-1" class="tab-pane active" aria-labelledby="tabset-21-1-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb60-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install Claude Code</span></span>
<span id="cb60-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">irm</span> https<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">://</span>claude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ai</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>install<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ps1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">iex</span></span>
<span id="cb60-3"></span>
<span id="cb60-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install gcloud (if needed)</span></span>
<span id="cb60-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download from https://cloud.google.com/sdk/docs/install</span></span>
<span id="cb60-6"></span>
<span id="cb60-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Authenticate with GCP</span></span>
<span id="cb60-8">gcloud auth login</span>
<span id="cb60-9">gcloud auth application-default login</span>
<span id="cb60-10"></span>
<span id="cb60-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Launch Claude Code</span></span>
<span id="cb60-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cd</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>\your\project</span>
<span id="cb60-13">claude</span></code></pre></div></div>
</div>
<div id="tabset-21-2" class="tab-pane" aria-labelledby="tabset-21-2-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb61-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install Claude Code</span></span>
<span id="cb61-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-fsSL</span> https://claude.ai/install.sh <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bash</span></span>
<span id="cb61-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or: brew install --cask claude-code</span></span>
<span id="cb61-4"></span>
<span id="cb61-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install gcloud (if needed)</span></span>
<span id="cb61-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">brew</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--cask</span> google-cloud-sdk</span>
<span id="cb61-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or: curl https://sdk.cloud.google.com | bash</span></span>
<span id="cb61-8"></span>
<span id="cb61-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Authenticate with GCP</span></span>
<span id="cb61-10"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth login</span>
<span id="cb61-11"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">gcloud</span> auth application-default login</span>
<span id="cb61-12"></span>
<span id="cb61-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Launch Claude Code</span></span>
<span id="cb61-14"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/your/project</span>
<span id="cb61-15"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
</div>
<div id="tabset-21-3" class="tab-pane" aria-labelledby="tabset-21-3-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb62-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install Claude Code</span></span>
<span id="cb62-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">irm</span> https<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">://</span>claude<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ai</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>install<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ps1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">iex</span></span>
<span id="cb62-3"></span>
<span id="cb62-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install AWS CLI (if needed)</span></span>
<span id="cb62-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html</span></span>
<span id="cb62-6"></span>
<span id="cb62-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Authenticate with AWS</span></span>
<span id="cb62-8">aws sso login <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>profile your-profile</span>
<span id="cb62-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or: aws configure</span></span>
<span id="cb62-10"></span>
<span id="cb62-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Launch Claude Code</span></span>
<span id="cb62-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cd</span> C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>\your\project</span>
<span id="cb62-13">claude</span></code></pre></div></div>
</div>
<div id="tabset-21-4" class="tab-pane" aria-labelledby="tabset-21-4-tab">
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb63-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install Claude Code</span></span>
<span id="cb63-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-fsSL</span> https://claude.ai/install.sh <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bash</span></span>
<span id="cb63-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or: brew install --cask claude-code</span></span>
<span id="cb63-4"></span>
<span id="cb63-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Install AWS CLI (if needed)</span></span>
<span id="cb63-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">brew</span> install awscli</span>
<span id="cb63-7"></span>
<span id="cb63-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Authenticate with AWS</span></span>
<span id="cb63-9"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aws</span> sso login <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--profile</span> your-profile</span>
<span id="cb63-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or: aws configure</span></span>
<span id="cb63-11"></span>
<span id="cb63-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Launch Claude Code</span></span>
<span id="cb63-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> ~/your/project</span>
<span id="cb63-14"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">claude</span></span></code></pre></div></div>
</div>
</div>
</div>
<section id="in-session-commands-all-providers" class="level3">
<h3 class="anchored" data-anchor-id="in-session-commands-all-providers">In-session commands (all providers)</h3>
<pre class="text"><code>/status        # Confirm your cloud provider is active
/cost          # Check current session token usage and cost
/compact       # Compress conversation context to save tokens
/clear         # Wipe context when switching tasks
/permissions   # View and manage active permission rules
/sandbox       # Check or enable OS-level sandboxing
Esc            # Interrupt Claude mid-operation
Shift+Tab      # Cycle through permission modes</code></pre>
</section>
</section>
<section id="next-steps" class="level2">
<h2 class="anchored" data-anchor-id="next-steps">13. Next Steps — Your First Session</h2>
<p>You’re set up. Here’s what to do the first time you launch Claude Code in a real project.</p>
<section id="set-up-your-claudemd" class="level3">
<h3 class="anchored" data-anchor-id="set-up-your-claudemd">Set up your <code>CLAUDE.md</code></h3>
<p><a href="https://code.claude.com/docs/en/memory"><code>CLAUDE.md</code></a> is a markdown file in your project root that gives Claude persistent context — build commands, test commands, architectural conventions, anything Claude can’t infer from the code alone. It loads automatically at the start of every session.</p>
<p>Run <code>/init</code> inside Claude Code to generate a starter file, but <strong>treat the output as a draft, not a finished product</strong>. Manually curate it down to only what Claude actually needs. Research from ETH Zurich (<a href="https://arxiv.org/abs/2602.11988">Banerjee et al., 2025</a>) found that LLM-generated context files can actually <em>decrease</em> agent performance compared to no context file at all — the auto-generated content adds noise that dilutes the instructions that matter.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Less is more for <code>CLAUDE.md</code>
</div>
</div>
<div class="callout-body-container callout-body">
<p>Anthropic’s own guidance is blunt: <em>“For each line, ask: ‘Would removing this cause Claude to make mistakes?’ If not, cut it.”</em> Claude Code triggers a <a href="https://github.com/anthropics/claude-code/issues/2766">performance warning</a> when <code>CLAUDE.md</code> exceeds ~40,000 characters. Community consensus recommends <strong>under 200 lines</strong>. Key principles:</p>
<ul>
<li><strong>Don’t send an LLM to do a linter’s job.</strong> Code style rules belong in your linter/formatter config, not in <code>CLAUDE.md</code>.</li>
<li><strong>Use <a href="https://code.claude.com/docs/en/skills">skills</a> for specialized knowledge.</strong> Skills load on demand — a <code>/deploy</code> skill only enters context when relevant, keeping <code>CLAUDE.md</code> lean.</li>
<li><strong>Use <a href="https://code.claude.com/docs/en/memory"><code>.claude/rules/</code></a> for scoped rules.</strong> Rules that only apply to certain file types (e.g., <code>.tsx</code> conventions) can live in context-specific rule files instead of the global <code>CLAUDE.md</code>.</li>
<li><strong>Periodically prune.</strong> As your project evolves, remove instructions that Claude follows correctly without being told.</li>
</ul>
<p>For more detail on <code>CLAUDE.md</code> authoring, prompt techniques, and workflow patterns, see our companion guide: <a href="../../Learn/Blogs/claude-code-best-practices.html">Claude Code Best Practices</a>.</p>
</div>
</div>
</section>
<section id="pick-the-right-model-for-the-task" class="level3">
<h3 class="anchored" data-anchor-id="pick-the-right-model-for-the-task">Pick the right model for the task</h3>
<p>Your <code>settings.json</code> sets a default model, but you can switch mid-session with <code>/model</code>. A general rule of thumb:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 30%">
<col style="width: 43%">
<col style="width: 26%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Best for</th>
<th>Cost</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Sonnet</strong></td>
<td>Day-to-day coding, refactoring, test writing, most tasks</td>
<td>Lowest</td>
</tr>
<tr class="even">
<td><strong>Opus</strong></td>
<td>Complex multi-step reasoning, architecture decisions, subtle bugs</td>
<td>Highest</td>
</tr>
<tr class="odd">
<td><strong>Haiku</strong></td>
<td>Quick lookups, simple edits, boilerplate generation</td>
<td>Very low</td>
</tr>
</tbody>
</table>
<p>Start with <strong>Sonnet</strong> — it handles the vast majority of tasks well. Escalate to Opus only when you notice Sonnet struggling with complex reasoning. Use Haiku for high-volume, low-complexity work where cost matters.</p>
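<p>For reference, the default lives under the <code>model</code> key in <code>settings.json</code>; a sketch, noting that the short alias shown is an assumption and some providers may require a full model ID from their catalog:</p>

```jsonc
{
  // Default for new sessions; switch anytime with /model
  "model": "sonnet"
}
```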
</section>
<section id="build-good-habits-early" class="level3">
<h3 class="anchored" data-anchor-id="build-good-habits-early">Build good habits early</h3>
<ul>
<li><strong>Commit before you start.</strong> A clean Git state means you can always <code>git diff</code> to see what Claude changed, or <code>git checkout .</code> to revert.</li>
<li><strong>Be specific.</strong> “Add input validation to the <code>login()</code> function in <code>auth.py</code>” is cheaper and more reliable than “improve the auth module.”</li>
<li><strong>Use <code>/compact</code> and <code>/clear</code></strong> to manage context. Long sessions get expensive and Claude’s attention degrades as context grows (<a href="https://arxiv.org/abs/2307.03172">Liu et al., 2024</a>).</li>
<li><strong>Press <code>Esc</code></strong> the moment something looks wrong. Don’t let an agentic loop run up your bill — see Section 9.</li>
</ul>
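<p>The commit-first habit can be rehearsed in a throwaway repo; file names and contents here are illustrative:</p>

```shell
# Checkpoint, simulate an agent edit, review, and revert
cd "$(mktemp -d)"
git init -q
echo "def login(): pass" > auth.py
git add -A
git -c user.name=demo -c user.email=demo@example.com \
    commit -qm "checkpoint before Claude session"

echo "def login(user): pass" > auth.py   # stand-in for Claude's edits
git diff --stat                          # review exactly what changed
git checkout -- .                        # revert everything if needed
```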
</section>
</section>
<section id="further-reading" class="level2">
<h2 class="anchored" data-anchor-id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://code.claude.com/docs/en/setup">Claude Code Setup Docs</a></li>
<li><a href="https://code.claude.com/docs/en/google-vertex-ai">Claude Code on Vertex AI</a></li>
<li><a href="https://code.claude.com/docs/en/amazon-bedrock">Claude Code on Amazon Bedrock</a></li>
<li><a href="https://code.claude.com/docs/en/permissions">Claude Code Permissions</a></li>
<li><a href="https://code.claude.com/docs/en/sandboxing">Claude Code Sandboxing</a></li>
<li><a href="https://code.claude.com/docs/en/data-usage">Claude Code Data Usage</a></li>
<li><a href="https://code.claude.com/docs/en/costs">Claude Code Cost Management</a></li>
<li><a href="https://console.cloud.google.com/vertex-ai/model-garden">Vertex AI Model Garden</a></li>
<li><a href="https://console.aws.amazon.com/bedrock/">Amazon Bedrock Model Catalog</a></li>
<li><a href="https://cloud.google.com/billing/docs/how-to/budgets">GCP Budget Alerts</a></li>
<li><a href="https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html">AWS Budgets</a></li>
<li><a href="../../Toolbox/Compute/UW-Cloud-Services.html">UW-Madison Cloud Services</a></li>
</ul>


</section>

 ]]></description>
  <category>Guides</category>
  <category>GenAI</category>
  <category>LLM</category>
  <category>Agentic coding</category>
  <category>Cloud</category>
  <category>GCP</category>
  <category>AWS</category>
  <category>Bedrock</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Guides/claude-code-cloud-setup.html</guid>
  <pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/claudecode.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Intro to AWS SageMaker for Predictive ML/AI</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html</link>
  <description><![CDATA[ 




<p>This introductory <a href="https://carpentries-incubator.github.io/ML_with_AWS_SageMaker/">AWS SageMaker workshop</a> teaches core workflows for running predictive ML/AI models in AWS SageMaker, an AWS-managed machine learning environment. Participants learn to set up data, configure SageMaker Notebooks, manage code repositories, train and tune models, and control resource costs within AWS, with real-world guidance on choosing appropriate CPU and GPU resources and scaling models efficiently.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>UW-Madison Cloud Users
</div>
</div>
<div class="callout-body-container callout-body">
<p>A personal AWS account is fine for this workshop. However, for <strong>long-term research use</strong>, we recommend switching to a <strong>UW-provisioned AWS account</strong>. You’ll get institutional pricing via <a href="https://internet2.edu/cloud/cloud-solutions-community/net-plus/">Internet2 NET+</a>, <a href="https://rsp.wisc.edu/proposalprep/cloudComputeInfo.cfm">lower overhead on grants</a> (26% instead of 55.5% — saving ~$2,950 per $10k in cloud costs), data protection agreements (including BAA for HIPAA), and dedicated support from the <a href="https://kb.wisc.edu/page.php?id=109785">Public Cloud Team</a>. NIH-funded researchers can get additional discounts through the <a href="https://kb.wisc.edu/109813">STRIDES Initiative</a>.</p>
<p><strong><a href="https://kb.wisc.edu/sbsedirbs/page.php?id=104090">Request a UW AWS account</a></strong> | <strong><a href="https://kb.wisc.edu/page.php?id=109785">Why use a UW account?</a></strong> | <strong><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/UW-Cloud-Services.html">Full details: UW Cloud Services</a></strong></p>
</div>
</div>
<section id="cost-estimate" class="level4">
<h4 class="anchored" data-anchor-id="cost-estimate">Cost estimate</h4>
<p>Running through this workshop should cost approximately <strong>$5-$10</strong> on AWS, assuming moderate usage of GPU instances and a few parallel jobs (i.e., sticking to the lesson materials). For new AWS accounts, the <strong>AWS Free Tier</strong> may cover some of these costs, including 250 hours per month of the <code>ml.t2.medium</code> instance for the first two months, as well as some limited S3 storage. New users may be able to complete certain parts of the workshop for free or at a significantly reduced cost. We recommend monitoring usage through the AWS Billing Dashboard to stay within the free tier and manage any extra expenses effectively.</p>
</section>
<section id="prerequisites" class="level4">
<h4 class="anchored" data-anchor-id="prerequisites">Prerequisites</h4>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-ML_Sklearn.html"><strong>Workshop</strong>: Intro to Machine Learning</a></li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Python_Gapminder.html"><strong>Workshop</strong>: Basic Python Programming</a></li>
</ul>
</section>
<section id="estimated-time-to-complete" class="level4">
<h4 class="anchored" data-anchor-id="estimated-time-to-complete">Estimated time to complete</h4>
<p><strong>3-5 hours</strong>: Based on running through training, tuning, and experimenting with example code setups.</p>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/UW-Cloud-Services.html"><strong>Compute</strong>: UW-Madison Cloud Services (AWS, GCP, Azure)</a> – Institutional discounts, lower grant overhead, data protections, research credits, and how to request a UW cloud account.</li>
<li><a href="https://kb.wisc.edu/101516">Public Cloud Team Office Hours</a> – Drop-in hours on Thursdays, 2–3:15 PM via Zoom. Get answers to cloud-related questions from the RCI and Public Cloud Team.</li>
<li><a href="https://aws.amazon.com/free">AWS Free Tier Guide</a>: An overview of the AWS Free Tier, including limitations and expected costs for beginner users.</li>
<li><a href="https://researchci.it.wisc.edu/introduction-to-aws-for-researchers/">Introduction to AWS for Researchers (RCI)</a> – UW-Madison RCI’s guide to getting started with AWS for research.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-GCP.html"><strong>Compute</strong>: Intro to GCP for ML &amp; AI</a> – Parallel workshop covering similar cloud ML concepts using GCP infrastructure.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/BadgerCompute.html"><strong>Compute</strong>: BadgerCompute</a> – UW–Madison’s lightweight, NetID-authenticated Jupyter service for short interactive sessions and classroom use. Sessions are capped at 4 hours, which can still compare favorably with free-tier Colab runtimes.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html"><strong>Compute</strong>: Google Colab</a> – Learn how to use Google Colab for machine learning workflows.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html"><strong>Compute</strong>: Center for High Throughput Computing (CHTC)</a> – Learn how to use CHTC for machine learning jobs.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Workshops</category>
  <category>Code-along</category>
  <category>Carpentries</category>
  <category>Compute</category>
  <category>AWS</category>
  <category>GPU</category>
  <category>Cloud</category>
  <category>RAG</category>
  <category>SageMaker</category>
  <category>Bedrock</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html</guid>
  <pubDate>Fri, 07 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/SageMaker.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Efficient KV-Cache Compression for Long-Context and Reasoning Models</title>
  <dc:creator>Zefan Cai</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-11-04.html</link>
  <description><![CDATA[ 




<p>Large language models (LLMs) increasingly handle very long input contexts, and their inference relies on storing key-value (KV) caches for past tokens to avoid redundant computation. However, as context length grows, the memory footprint of full KV caches becomes a major bottleneck. In this talk, Zefan Cai (CS PhD Student, UW-Madison, advised by Prof.&nbsp;Junjie Hu) presents two complementary approaches to compressing the KV cache, highlighting the underlying principles, trade-offs, and practical benefits for inference efficiency.</p>
<section id="pyramid-kv" class="level4">
<h4 class="anchored" data-anchor-id="pyramid-kv">Pyramid KV</h4>
<p>Pyramid KV is motivated by the observation that in transformer-based LLMs, attention flows from broad scopes in lower layers to narrow, focused contexts in higher layers (“pyramidal information funneling”). By allocating more cache budget in lower layers and gradually reducing it in higher layers, Pyramid KV achieves near-full performance while retaining only ~12% of the full KV cache on long-context benchmarks.</p>
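<p>The core idea can be sketched as a simple per-layer budget schedule — more cache for lower layers, less for higher ones. The linear schedule below is only an illustration of the principle, not the paper’s exact allocation rule:</p>

```python
# Allocate a fixed total KV-cache token budget across transformer
# layers, decreasing linearly from the lowest layer to the highest.
# (Illustrative schedule only -- Pyramid KV's actual rule may differ.)
def pyramid_budgets(total_tokens: int, num_layers: int, min_ratio: float = 0.2):
    # Linear weights from 1.0 (layer 0) down to min_ratio (top layer).
    weights = [1.0 - (1.0 - min_ratio) * layer / (num_layers - 1)
               for layer in range(num_layers)]
    scale = total_tokens / sum(weights)
    return [round(w * scale) for w in weights]

budgets = pyramid_budgets(total_tokens=2048, num_layers=8)
print(budgets)  # lower layers keep the most cached tokens
```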
</section>
<section id="r-kv-redundancy-aware-kv-cache-compression" class="level4">
<h4 class="anchored" data-anchor-id="r-kv-redundancy-aware-kv-cache-compression">R-KV: Redundancy-aware KV Cache Compression</h4>
<p>Building upon Pyramid KV, R-KV targets reasoning-heavy tasks (e.g., chain-of-thought) where long outputs produce very large KV caches. R-KV identifies and prunes redundant tokens in the cache, enabling roughly a 90% memory saving and ~6.6x throughput improvement, while preserving or even slightly improving accuracy compared to the full cache.</p>
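<p>The pruning principle can be sketched in a few lines: walk through cached token vectors and drop any that are near-duplicates (high cosine similarity) of vectors already kept. This is a toy illustration of redundancy-aware pruning, not R-KV’s actual scoring method:</p>

```python
# Toy redundancy pruning: keep a cached vector only if it is not a
# near-duplicate (cosine similarity >= threshold) of one already kept.
import numpy as np

def prune_redundant(kv: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    kept = []
    for vec in kv:
        v = vec / np.linalg.norm(vec)
        if all(float(v @ k) < threshold for k in kept):
            kept.append(v)
    return np.array(kept)

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 64))
# Duplicate each vector with tiny noise -> 8 vectors, only 4 distinct.
kv = np.vstack([base, base + 1e-3 * rng.normal(size=base.shape)])
print(prune_redundant(kv).shape[0])
```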
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/hyEJi5N4p3Q" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/ML4MI/2023-09-11_Exploring-Generative-AI-An-Intro-to-LLMs-and-Diffusion-Models_Kangwook-Lee.html"><strong>Talk</strong>: Exploring Generative AI: An Introduction to Large Language Models and Diffusion Models</a>: An introductory overview of LLMs covering next-word prediction, GPT models, and parameter-efficient fine-tuning with LoRA.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-09-09.html"><strong>Talk</strong>: AI’s Environmental Footprint: Insights and Actions</a>: Benchmarking the energy, water, and carbon costs of LLM inference—motivation for why KV-cache compression and inference efficiency matter.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Books/Intro-Deeplearning_SimonJDPrince.html"><strong>Book</strong>: Understanding Deep Learning</a>: A modern overview of deep learning fundamentals including transformer architecture, with interactive Colab notebooks.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Videos</category>
  <category>ML+X</category>
  <category>UW-Madison</category>
  <category>LLM</category>
  <category>Deep learning</category>
  <category>NLP</category>
  <category>GenAI</category>
  <category>Foundation models</category>
  <category>GPU</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-11-04.html</guid>
  <pubDate>Tue, 04 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://img.youtube.com/vi/hyEJi5N4p3Q/maxresdefault.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Google Colab</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html</link>
  <description><![CDATA[ 




<p><a href="https://colab.research.google.com/">Google Colab</a> is a cloud-based Jupyter notebook environment that runs entirely in the browser. It allows you to write and execute Python code without installing anything locally, making it a popular choice for machine learning, data analysis, and teaching. Colab integrates directly with Google Drive, supports GPU and TPU acceleration, and makes it easy to share notebooks and collaborate with others.</p>
<section id="plans-and-compute-units" class="level2">
<h2 class="anchored" data-anchor-id="plans-and-compute-units">Plans and compute units</h2>
<p>While free-tier performance is often sufficient for teaching, tutorials, and lightweight experiments, paid plans offer more predictable runtime windows, stronger GPU availability, and improved overall stability for sustained machine learning workloads. Colab Pro is often the most practical choice for researchers and students who use Colab regularly, balancing cost, runtime, and GPU access without committing to the higher price of Pro+ or worrying about “pay as you go” charges.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 9%">
<col style="width: 9%">
<col style="width: 24%">
<col style="width: 27%">
<col style="width: 12%">
<col style="width: 18%">
</colgroup>
<thead>
<tr class="header">
<th>Plan</th>
<th>Cost</th>
<th>Compute units</th>
<th>Typical runtime</th>
<th>Memory</th>
<th>GPU access</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Free</td>
<td>$0</td>
<td>–</td>
<td>Up to ~12 hours <em>under ideal conditions</em> (often much less; under 4 hours is not uncommon), ~90 min idle timeout</td>
<td>~12 GB</td>
<td>Shared GPUs (commonly T4/K80), no guarantees</td>
</tr>
<tr class="even">
<td>Pay As You Go</td>
<td>variable</td>
<td>Purchase as needed</td>
<td>Depends on units purchased</td>
<td>Varies</td>
<td>Access to faster GPUs and more memory when available</td>
</tr>
<tr class="odd">
<td>Colab Pro</td>
<td>$9.99/month</td>
<td>100 units/month</td>
<td>Often 12–24 hours, ~180 min idle timeout</td>
<td>~25 GB</td>
<td>More predictable access to T4/P100 GPUs and high-memory VMs</td>
</tr>
<tr class="even">
<td>Colab Pro+</td>
<td>~$49.99/month</td>
<td>~500–600 units/month</td>
<td>Up to ~24 hours, ~180 min idle timeout</td>
<td>~25 GB</td>
<td>Priority access to premium GPUs (T4/P100/V100) and background execution</td>
</tr>
<tr class="odd">
<td>Colab Enterprise</td>
<td>custom</td>
<td>Custom</td>
<td>Custom</td>
<td>Custom</td>
<td>Integrated with GCP services (BigQuery, Vertex AI)</td>
</tr>
</tbody>
</table>
<p>For the most up-to-date prices, check <a href="https://colab.research.google.com/signup">colab.research.google.com/signup</a></p>
</section>
<section id="data-storage-and-mounting-google-drive" class="level2">
<h2 class="anchored" data-anchor-id="data-storage-and-mounting-google-drive">Data storage and mounting Google Drive</h2>
<p>Colab notebooks themselves are stored in Google Drive, but any files you upload during a session are temporary and deleted once the session ends. To persist data between sessions, mount your Google Drive into the notebook runtime:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> google.colab <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> drive</span>
<span id="cb1-2">drive.mount(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'/content/drive'</span>)</span></code></pre></div></div>
<p>Once mounted, your Drive files are available under <code>/content/drive/MyDrive/</code>. For example:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-2">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'/content/drive/MyDrive/data.csv'</span>)</span></code></pre></div></div>
<p>This approach is essential for storing training data, saving model checkpoints, or writing outputs that need to persist after the notebook shuts down. For larger datasets, connecting to cloud storage services like Google Cloud Storage (GCS) or AWS S3 is also possible using their Python SDKs.</p>
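<p>For example, a training loop can write checkpoints or metrics under the mounted Drive path so they survive the session. The snippet below is a minimal sketch — it falls back to a local folder when the Drive mount isn’t present, so it also runs outside Colab:</p>

```python
# Persist outputs by writing them under the mounted Drive path.
# Falls back to a local "outputs" folder when Drive isn't mounted.
import json
from pathlib import Path

drive_root = Path("/content/drive/MyDrive")
out_dir = (drive_root if drive_root.exists() else Path("outputs")) / "checkpoints"
out_dir.mkdir(parents=True, exist_ok=True)

metrics = {"epoch": 3, "val_loss": 0.217}
(out_dir / "metrics.json").write_text(json.dumps(metrics))
print(out_dir / "metrics.json")
```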
</section>
<section id="best-practices-and-limitations" class="level2">
<h2 class="anchored" data-anchor-id="best-practices-and-limitations">Best practices and limitations</h2>
<p>While Google Colab is one of the easiest ways to experiment with machine learning, it has several limitations to consider:</p>
<ul>
<li>Session timeouts cannot be disabled and will interrupt long-running jobs.</li>
<li>GPU availability is shared and unpredictable in the free tier.</li>
<li>Persistent storage requires integrating with Google Drive or another external service.</li>
<li>Environment customization is limited compared to running Jupyter on your own server or cloud instance.</li>
</ul>
<p>Because of these constraints, Colab is best suited for:</p>
<ul>
<li>Rapid prototyping of notebooks and model experiments</li>
<li>Teaching and workshops</li>
<li>Exploratory data analysis and visualization</li>
<li>Small to medium-scale deep learning tasks</li>
</ul>
<p>For more control, longer runtimes, or production workflows, platforms like AWS SageMaker, Google Vertex AI, or campus HPC systems (e.g., CHTC) are better suited.</p>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/BadgerCompute.html"><strong>Compute</strong>: BadgerCompute</a> – UW–Madison’s lightweight, NetID-authenticated Jupyter service for short interactive sessions and classroom use. Sessions are capped at 4 hours, which can still exceed what free-tier Colab often delivers in practice.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html"><strong>Compute</strong>: Intro to AWS SageMaker for Predictive ML/AI</a> – Learn how to launch and scale machine learning workflows in the cloud using AWS SageMaker.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html"><strong>Compute</strong>: Center for High Throughput Computing (CHTC)</a> – Learn how to use CHTC for machine learning jobs.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Compute</category>
  <category>Jupyter</category>
  <category>Google</category>
  <category>GPU</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html</guid>
  <pubDate>Fri, 26 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/GoogleColab.png" medium="image" type="image/png" height="89" width="144"/>
</item>
<item>
  <title>BadgerCompute</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/BadgerCompute.html</link>
  <description><![CDATA[ 




<p><a href="https://badgercompute.wisc.edu/">BadgerCompute</a> is UW–Madison’s browser-based interactive computing service built on JupyterHub. It provides on-demand access to CPUs, memory, and GPUs without requiring any local software installation or server setup. Researchers, instructors, and students at UW-Madison can write and run code, visualize data, and develop workflows directly from their web browser using only a NetID.</p>
<p>BadgerCompute is similar in spirit to Google Colab: both offer hosted Jupyter notebook environments for writing and executing code interactively. However, BadgerCompute is campus-supported, requires NetID authentication, and is subject to UW data policies. It is free to use for UW affiliates, but comes with a 4-hour runtime limit, limited storage (20 GB), and no guaranteed GPU access.</p>
<section id="gpu-access-and-runtime-limits" class="level2">
<h2 class="anchored" data-anchor-id="gpu-access-and-runtime-limits">GPU access and runtime limits</h2>
<p>GPU availability is limited and not guaranteed, but when available it is often sufficient for small to medium deep learning tasks, accelerated data analysis, or exploratory workflows. Each session runs in a containerized environment with common data science tools already installed. Sessions have the following limitations:</p>
<ul>
<li>Maximum runtime is four hours.</li>
<li>Sessions without an active browser connection shut down automatically after ten minutes.</li>
<li>The service can support roughly 80–100 concurrent users.</li>
<li>GPU capacity is shared and may not be available during peak usage times.</li>
</ul>
</section>
<section id="data-storage-limitations" class="level2">
<h2 class="anchored" data-anchor-id="data-storage-limitations">Data storage limitations</h2>
<p>BadgerCompute is designed for interactive computing, not data storage. Its storage model is deliberately minimal and ephemeral:</p>
<ul>
<li>Each BadgerCompute user is allocated 20 GB of persistent storage. However, this storage is retained only for 30 days after your last login. If you do not log into BadgerCompute within 30 days, your data will be automatically deleted.
<ul>
<li>As an alternative, <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html">Google Colab</a> provides persistent storage via integrations with Google Drive.</li>
</ul></li>
<li>In addition, the default folder when you log into BadgerCompute is NOT persistent; only the <code>~/work</code> directory is. Save files there to keep them between sessions. See our <a href="https://badgercompute.wisc.edu/docs">documentation</a> for more details.</li>
<li>BadgerCompute is NOT suitable for work with restricted data.</li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting started</h2>
<p>To use BadgerCompute, you must:</p>
<ol type="1">
<li>Have an active UW–Madison NetID.</li>
<li>Complete the <a href="https://canvas.wisc.edu/enroll/JR887K">BadgerCompute Certification Course</a> on Canvas and wait 24 hours for access.</li>
</ol>
</section>
<section id="working-in-jupyterlab" class="level2">
<h2 class="anchored" data-anchor-id="working-in-jupyterlab">Working in JupyterLab</h2>
<p>When your session launches, you will see the standard JupyterLab interface:</p>
<ul>
<li>A file browser for navigating directories and uploading or downloading files</li>
<li>A launcher for creating new notebooks, terminals, or text files</li>
<li>A notebook interface for writing and running code interactively</li>
<li>A terminal for executing shell commands directly</li>
</ul>
<p>Only the <code>~/work</code> directory is persistent. Any files saved elsewhere are deleted when the session ends.</p>
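<p>In practice, that means routing all outputs through <code>~/work</code> at save time. A minimal sketch (it falls back to the current directory so it also runs outside BadgerCompute):</p>

```python
# Save results under the persistent ~/work directory; files written
# anywhere else are deleted when the session ends.
from pathlib import Path

work = Path.home() / "work"
save_dir = (work if work.exists() else Path(".")) / "results"
save_dir.mkdir(parents=True, exist_ok=True)

(save_dir / "notes.txt").write_text("kept between sessions")
print(save_dir / "notes.txt")
```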
</section>
<section id="best-practices-and-limitations" class="level2">
<h2 class="anchored" data-anchor-id="best-practices-and-limitations">Best practices and limitations</h2>
<p>Because BadgerCompute is a shared resource with limited capacity, plan your workflows with the following in mind:</p>
<ul>
<li>Sessions end automatically after four hours and cannot be extended.</li>
<li>Sessions without an active browser connection end after ten minutes.</li>
<li>GPU access depends on demand and may not be available.</li>
<li>Storage is temporary and limited. As an alternative, <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html">Google Colab</a> provides persistent storage via integrations with Google Drive.</li>
<li>Availability is not guaranteed for classes or large workshops. For larger courses, coordinate in advance or consider alternatives.</li>
</ul>
</section>
<section id="when-to-use-badgercompute-vs.-other-platforms" class="level2">
<h2 class="anchored" data-anchor-id="when-to-use-badgercompute-vs.-other-platforms">When to use BadgerCompute vs.&nbsp;other platforms</h2>
<p>BadgerCompute is most useful for:</p>
<ul>
<li>Rapid prototyping of data analysis or machine learning workflows</li>
<li>Teaching and demonstrations without requiring software installation</li>
<li>Exploratory data analysis and small-scale model development</li>
<li>Short tasks that benefit from GPU acceleration</li>
</ul>
<p>For more intensive work — such as training large models, running distributed jobs, executing long-running tasks, or hosting large datasets — platforms like <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html">CHTC</a>, <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html">AWS</a>, GCP, or local HPC clusters are more appropriate. Users may choose to start exploratory work in BadgerCompute (or Google Colab) and transition to these systems when needed.</p>
</section>
<section id="learn-more-and-get-help" class="level2">
<h2 class="anchored" data-anchor-id="learn-more-and-get-help">Learn more and get help</h2>
<ul>
<li><strong>Documentation</strong>: <a href="https://badgercompute.wisc.edu/docs/">badgercompute.wisc.edu/docs/</a></li>
<li><strong>Community forum</strong>: <a href="https://badgercompute.wisc.edu/docs/get-help/">badgercompute.wisc.edu/docs/get-help/</a></li>
</ul>
<p>BadgerCompute is supported by DoIT, CHTC, and the Data Science Institute (DSI) as part of UW–Madison’s research computing ecosystem.</p>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html"><strong>Compute</strong>: Intro to AWS SageMaker for Predictive ML/AI</a> – Learn how to launch and scale machine learning workflows in the cloud using AWS SageMaker.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/GoogleColab.html"><strong>Compute</strong>: Google Colab</a> – Learn how to use Google Colab for machine learning workflows.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/CHTC.html"><strong>Compute</strong>: Center for High Throughput Computing (CHTC)</a> – Learn how to use CHTC for machine learning jobs.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Compute</category>
  <category>UW-Madison</category>
  <category>GPU</category>
  <category>Jupyter</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Compute/BadgerCompute.html</guid>
  <pubDate>Wed, 24 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/BadgerCompute.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>BioTrove</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/BioTrove.html</link>
  <description><![CDATA[ 




<p><a href="https://baskargroup.github.io/BioTrove/">BioTrove</a> is the largest publicly accessible biodiversity image dataset, containing <strong>161.9 million images</strong> spanning approximately <strong>366,000 species</strong> across three kingdoms: Animalia, Fungi, and Plantae. Curated from <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/iNaturalist.html">iNaturalist</a> research-grade observations, BioTrove provides an unprecedented resource for training and evaluating AI models in biodiversity and ecology. It was published as a <strong>Spotlight</strong> paper at the NeurIPS 2024 Datasets and Benchmarks track.</p>
<section id="what-makes-biotrove-valuable-for-ai" class="level3">
<h3 class="anchored" data-anchor-id="what-makes-biotrove-valuable-for-ai">What makes BioTrove valuable for AI?</h3>
<p>BioTrove addresses a critical gap in AI for biodiversity: the lack of large-scale, curated, and openly available training data. While earlier datasets such as TREEOFLIFE-10M offered strong species diversity, BioTrove is roughly 16x larger while maintaining comparable taxonomic breadth.</p>
<p>Each image is annotated with:</p>
<ul>
<li>Scientific names and common names</li>
<li>Full taxonomic hierarchy (kingdom, phylum, class, order, family, genus, species)</li>
<li>Image URLs and metadata for reproducible access</li>
</ul>
</section>
<section id="taxonomic-coverage" class="level3">
<h3 class="anchored" data-anchor-id="taxonomic-coverage">Taxonomic coverage</h3>
<p>BioTrove covers eleven major taxonomic groups, including Aves (birds), Insecta (insects), Plantae (plants), Fungi, Mammalia (mammals), Reptilia, Amphibia, Arachnida, Mollusca, Actinopterygii (ray-finned fish), and Animalia (other animals).</p>
</section>
<section id="key-subsets-and-benchmarks" class="level3">
<h3 class="anchored" data-anchor-id="key-subsets-and-benchmarks">Key subsets and benchmarks</h3>
<ul>
<li><strong>BioTrove-Train (~40M images, ~33K species)</strong>: A curated training subset focused on seven taxonomic categories (Aves, Arachnida, Insecta, Plantae, Fungi, Mollusca, Reptilia) chosen for their biodiversity impact and underrepresentation in standard image models.</li>
<li><strong>BioTrove-Balanced (~112K images)</strong>: Up to 500 species per category with 50 images each, for balanced evaluation.</li>
<li><strong>BioTrove-Unseen</strong>: Species with fewer than 30 instances, for testing generalization to rare or unseen species.</li>
<li><strong>BioTrove-LifeStages</strong>: Evaluates recognition across developmental stages (egg, larva, pupa, adult) for five insect species.</li>
</ul>
</section>
<section id="pretrained-models-biotrove-clip" class="level3">
<h3 class="anchored" data-anchor-id="pretrained-models-biotrove-clip">Pretrained models (BioTrove-CLIP)</h3>
<p>Three CLIP-based models were trained on BioTrove-Train and released on Hugging Face:</p>
<ul>
<li><strong>BT-CLIP-O</strong>: ViT-B/16 initialized from OpenCLIP</li>
<li><strong>BT-CLIP-B</strong>: ViT-B/16 initialized from BioCLIP</li>
<li><strong>BT-CLIP-M</strong>: ViT-L/14 initialized from MetaCLIP</li>
</ul>
<p>These models are useful for biodiversity-focused image classification, retrieval, and zero-shot species identification.</p>
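<p>Under the hood, zero-shot identification reduces to cosine similarity between one image embedding and one text embedding per candidate species — the model picks the closest label. The sketch below uses random vectors as stand-ins for real BioTrove-CLIP outputs:</p>

```python
# Zero-shot classification: compare one image embedding against a text
# embedding per candidate label and take the most similar one.
# Random vectors stand in for real CLIP embeddings here.
import numpy as np

rng = np.random.default_rng(0)
dim = 512
image_emb = rng.normal(size=dim)
labels = ["Danaus plexippus", "Apis mellifera", "Quercus alba"]
text_embs = rng.normal(size=(len(labels), dim))

# Make one label's embedding deliberately close to the image embedding.
text_embs[1] = image_emb + 0.1 * rng.normal(size=dim)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

sims = normalize(text_embs) @ normalize(image_emb)
print(labels[int(np.argmax(sims))])  # -> Apis mellifera
```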
</section>
<section id="key-applications" class="level3">
<h3 class="anchored" data-anchor-id="key-applications">Key applications</h3>
<ul>
<li><strong>Pest control and crop monitoring</strong>: Training models to identify pest species and agricultural threats</li>
<li><strong>Biodiversity assessment</strong>: Large-scale species identification and population monitoring</li>
<li><strong>Environmental conservation</strong>: Detecting ecological changes and supporting wildlife monitoring</li>
<li><strong>Fine-grained classification</strong>: Building models that distinguish visually similar species</li>
<li><strong>Zero-shot species recognition</strong>: Leveraging CLIP-based models for identifying species not seen during training</li>
</ul>
</section>
<section id="access" class="level3">
<h3 class="anchored" data-anchor-id="access">Access</h3>
<p>BioTrove metadata and tools are available on <a href="https://github.com/baskargroup/BioTrove/">GitHub</a>, with dataset cards and pretrained models on <a href="https://huggingface.co/datasets/BGLab/BioTrove">Hugging Face</a>. The BioTrove library includes scripts for downloading, filtering, and preprocessing the data into ML-ready image-text pairs.</p>
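<p>A typical workflow is metadata-first: filter the metadata to a taxon of interest, then fetch only those images by URL. The sketch below uses a toy DataFrame with illustrative column names — the real schema is documented on the dataset card:</p>

```python
# Filter toy BioTrove-style metadata to one kingdom before downloading
# images. Column names here are illustrative, not the actual schema.
import pandas as pd

meta = pd.DataFrame({
    "scientific_name": ["Danaus plexippus", "Amanita muscaria", "Quercus alba"],
    "kingdom": ["Animalia", "Fungi", "Plantae"],
    "img_url": ["https://example.org/1.jpg",
                "https://example.org/2.jpg",
                "https://example.org/3.jpg"],
})

fungi = meta[meta["kingdom"] == "Fungi"]
print(list(fungi["scientific_name"]))  # -> ['Amanita muscaria']
```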
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/iNaturalist.html"><strong>Data</strong>: iNaturalist</a>: The source platform for BioTrove’s research-grade observations.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/INQUIRE.html"><strong>Data</strong>: INQUIRE</a>: A retrieval benchmark built on iNaturalist data.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/BioTrove-Clustering.html"><strong>Project</strong>: Clustering the BioTrove Dataset</a>: An ML Marathon challenge using BioTrove for unsupervised species clustering.</li>
<li><a href="https://arxiv.org/abs/2406.17720"><strong>Paper</strong>: BioTrove: A Large Curated Image Dataset Enabling AI for Biodiversity (NeurIPS 2024)</a></li>
<li><a href="https://huggingface.co/datasets/BGLab/BioTrove"><strong>Hugging Face</strong>: BioTrove dataset card</a></li>
<li><a href="https://huggingface.co/BGLab/BioTrove-CLIP"><strong>Hugging Face</strong>: BioTrove-CLIP models</a></li>
<li><a href="https://github.com/baskargroup/BioTrove/"><strong>GitHub</strong>: BioTrove tools and scripts</a></li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Data</category>
  <category>Image data</category>
  <category>Computer vision</category>
  <category>Biology</category>
  <category>Ecology</category>
  <category>Biodiversity</category>
  <category>Deep learning</category>
  <category>CLIP</category>
  <category>Hugging Face</category>
  <category>Citizen science</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/BioTrove.html</guid>
  <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/Clustering-Biotrove.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Clustering the BioTrove Dataset</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/BioTrove-Clustering.html</link>
  <description><![CDATA[ 




<p>Clustering the BioTrove Dataset was featured in the <a href="https://ml-marathon.wisc.edu/">2025 Machine Learning Marathon (MLM25)</a>. This challenge asks participants to discover genus- and species-level structure in biodiversity images using unsupervised and self-supervised learning methods.</p>
<section id="challenge-design" class="level3">
<h3 class="anchored" data-anchor-id="challenge-design">Challenge design</h3>
<ul>
<li><strong>Task</strong>: Cluster biodiversity images to recover taxonomic structure (genus and species groupings) without explicit labels.</li>
<li><strong>Domain</strong>: Biodiversity and ecology – automated species identification can support pest control, crop monitoring, biodiversity assessment, and environmental conservation.</li>
<li><strong>Data</strong>: Images drawn from <a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/BioTrove.html">BioTrove</a>, the largest publicly accessible biodiversity image dataset (161.9 million images, ~366K species), curated from iNaturalist with research-grade annotations.</li>
<li><strong>Methods</strong>: Contrastive learning, autoencoders, CLIP-based embeddings, and other unsupervised/semi-supervised approaches.</li>
</ul>
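<p>A common baseline for this task is embed-then-cluster: map each image into an embedding space (e.g., with a CLIP model), run k-means, and check how well the clusters align with taxa. The sketch below substitutes synthetic, well-separated blobs for real image embeddings:</p>

```python
# Embed-then-cluster baseline: k-means on embedding vectors.
# Synthetic blobs stand in for CLIP image embeddings here.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic "genera", 50 embeddings each, well separated.
centers = rng.normal(scale=10, size=(3, 32))
X = np.vstack([c + rng.normal(size=(50, 32)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# Each block of 50 points should land in a single cluster.
print([len(set(labels[i:i + 50])) for i in (0, 50, 100)])
```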
</section>
<section id="links" class="level3">
<h3 class="anchored" data-anchor-id="links">Links</h3>
<ul>
<li><strong>Kaggle challenge</strong>: <a href="https://www.kaggle.com/competitions/biotrove-clustering/overview">Clustering the BioTrove Dataset</a></li>
<li><strong>Winning writeup</strong>: <a href="https://www.kaggle.com/competitions/biotrove-clustering/writeups/1-it-all-depends-on-a-good-embedding">1st place: It All Depends on a Good Embedding</a></li>
<li><strong>BioTrove project</strong>: <a href="https://baskargroup.github.io/BioTrove/">baskargroup.github.io/BioTrove</a></li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/BioTrove.html"><strong>Data</strong>: BioTrove</a>: Learn more about the BioTrove dataset, including its construction from iNaturalist and available CLIP embeddings.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/iNaturalist.html"><strong>Data</strong>: iNaturalist</a>: The citizen-science platform underlying BioTrove’s research-grade biodiversity observations.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/HuggingFace.html"><strong>Library</strong>: Hugging Face</a>: BioTrove-CLIP models and the dataset are hosted on Hugging Face.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Deeplearning_PyTorch.html"><strong>Workshop</strong>: Intro to Deep Learning with PyTorch</a>: Build foundational deep learning skills relevant to contrastive learning and embedding-based clustering.</li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/abs/2406.17720">BioTrove: A Large Curated Image Dataset Enabling AI for Biodiversity (NeurIPS 2024)</a></li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Projects</category>
  <category>ML Marathon</category>
  <category>MLM25</category>
  <category>Computer vision</category>
  <category>Clustering</category>
  <category>Unsupervised learning</category>
  <category>Biodiversity</category>
  <category>Image data</category>
  <category>Deep learning</category>
  <category>CLIP</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/BioTrove-Clustering.html</guid>
  <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/Clustering-Biotrove.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Brain-to-Text ’25: Decoding Speech from Neural Activity</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/Brain-to-Text.html</link>
  <description><![CDATA[ 




<p>Brain-to-Text ’25 was featured in the <a href="https://ml-marathon.wisc.edu/">2025 Machine Learning Marathon (MLM25)</a>. This Kaggle competition challenges participants to decode intracortical neural activity during attempted speech into text – aiming to restore communication for people with paralysis.</p>
<section id="challenge-design" class="level3">
<h3 class="anchored" data-anchor-id="challenge-design">Challenge design</h3>
<ul>
<li><strong>Task</strong>: Decode neural recordings from speech-related brain regions into the words a participant is attempting to say.</li>
<li><strong>Domain</strong>: Brain-computer interfaces and neural speech decoding.</li>
<li><strong>Data</strong>: A new intracortical speech neuroscience dataset provided for the competition.</li>
<li><strong>Methods</strong>: The 2024 edition’s top approaches combined RNN ensembles with fine-tuned large language models. The baseline achieved a 9.7% word error rate; the top entrant reached 5.8%.</li>
</ul>
</section>
<section id="links" class="level3">
<h3 class="anchored" data-anchor-id="links">Links</h3>
<ul>
<li><strong>Kaggle challenge</strong>: <a href="https://www.kaggle.com/competitions/brain-to-text-25">Brain-to-Text ’25</a></li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Deeplearning_PyTorch.html"><strong>Workshop</strong>: Intro to Deep Learning with PyTorch</a>: Learn RNNs and sequence modeling fundamentals in PyTorch — key building blocks for neural speech decoding.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-TextAnalysis_Python.html"><strong>Workshop</strong>: Intro to Natural Language Processing (NLP)</a>: Brush up on text processing and language model basics relevant to the text decoding target.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/HuggingFace.html"><strong>Library</strong>: Hugging Face</a>: Top solutions fine-tuned LLMs from Hugging Face — learn how to discover and use open-source models.</li>
</ul>
</section>

 ]]></description>
  <category>Projects</category>
  <category>ML Marathon</category>
  <category>MLM25</category>
  <category>Deep learning</category>
  <category>Signal processing</category>
  <category>Time-series</category>
  <category>NLP</category>
  <category>Neuroscience</category>
  <category>RNN</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/Brain-to-Text.html</guid>
  <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/Brain-to-Text25.png" medium="image" type="image/png" height="72" width="144"/>
</item>
<item>
  <title>MaveDB: Protein Variant Effect Prediction</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/MaveDB-Variant-Effect.html</link>
  <description><![CDATA[ 




<p>The MaveDB challenge was featured in the <a href="https://ml-marathon.wisc.edu/">2025 Machine Learning Marathon (MLM25)</a>. Participants explored protein language models and other ML methods to predict variant effects using data from <a href="https://www.mavedb.org/">MaveDB</a>, an open-source database of multiplexed assays of variant effect (MAVEs) containing over 7 million variant effect measurements.</p>
<section id="challenge-design" class="level3">
<h3 class="anchored" data-anchor-id="challenge-design">Challenge design</h3>
<ul>
<li><strong>Task</strong>: Predict the functional impact of protein variants using deep mutational scanning data.</li>
<li><strong>Domain</strong>: Computational biology – understanding how single amino acid changes affect protein function is critical for clinical variant interpretation and protein engineering.</li>
<li><strong>Methods</strong>: Protein language models (e.g., ESM), fine-tuning strategies, and variant effect predictors.</li>
</ul>
</section>
<section id="links" class="level3">
<h3 class="anchored" data-anchor-id="links">Links</h3>
<ul>
<li><strong>MaveDB</strong>: <a href="https://www.mavedb.org/">mavedb.org</a></li>
<li><strong>ML Marathon</strong>: <a href="https://ml-marathon.wisc.edu/">ml-marathon.wisc.edu</a></li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/HuggingFace.html"><strong>Library</strong>: Hugging Face</a>: Protein language models like ESM are hosted on Hugging Face — learn how to discover and fine-tune open-source models.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Deeplearning_PyTorch.html"><strong>Workshop</strong>: Intro to Deep Learning with PyTorch</a>: Learn deep learning fundamentals in PyTorch, the framework underlying most protein language models.</li>
</ul>
</section>

 ]]></description>
  <category>Projects</category>
  <category>ML Marathon</category>
  <category>MLM25</category>
  <category>Deep learning</category>
  <category>Protein language models</category>
  <category>Foundation models</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/MaveDB-Variant-Effect.html</guid>
  <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/MaveDB.png" medium="image" type="image/png" height="72" width="144"/>
</item>
<item>
  <title>AI’s Environmental Footprint: Insights and Actions</title>
  <dc:creator>Chris Endemann</dc:creator>
  <dc:creator>Nidhal Jegham</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-09-09.html</link>
  <description><![CDATA[ 




<p>This forum explores how ML/AI practitioners can measure and reduce the environmental costs of AI. It pairs two complementary efforts: one that retrieves emissions and cost data from sustainability reports using RAG, and another that benchmarks energy, water, and carbon footprints across large language models.</p>
<section id="wattbot-estimating-ai-emissions-and-costs-with-rag-chris-endemann-0224" class="level4">
<h4 class="anchored" data-anchor-id="wattbot-estimating-ai-emissions-and-costs-with-rag-chris-endemann-0224">WattBot: Estimating AI Emissions and Costs with RAG — Chris Endemann <a href="https://www.youtube.com/watch?v=2dCQS1jAbUo&amp;t=144s" target="_blank">02:24</a></h4>
<p>Chris introduces WattBot, a Kaggle challenge and retrieval-augmented generation (RAG) framework for estimating AI emissions and compute costs. Using 35+ papers and 300+ curated Q&amp;A pairs, teams build systems that return citation-backed answers or explicitly state when evidence is missing—promoting transparency and reproducibility in sustainability reporting.</p>
<ul>
<li><strong>Kaggle challenge</strong>: <a href="https://www.kaggle.com/competitions/WattBot2025/overview">kaggle.com/competitions/WattBot2025/overview</a></li>
</ul>
</section>
<section id="how-hungry-is-ai-benchmarking-energy-water-and-carbon-footprint-of-llm-inference-nidhal-jegham-0907" class="level4">
<h4 class="anchored" data-anchor-id="how-hungry-is-ai-benchmarking-energy-water-and-carbon-footprint-of-llm-inference-nidhal-jegham-0907">How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference — Nidhal Jegham <a href="https://www.youtube.com/watch?v=2dCQS1jAbUo&amp;t=547s" target="_blank">09:07</a></h4>
<p>Nidhal presents a reproducible framework to estimate per-request energy, water, and carbon use for open and proprietary LLMs. The method combines hardware assumptions (A100–H200 GPUs), data center multipliers (PUE, WUE, CIF), and a DEA-style efficiency score that balances model accuracy against environmental cost.</p>
<ul>
<li><strong>Preprint</strong>: <a href="https://arxiv.org/abs/2505.09598">arXiv:2505.09598</a></li>
<li><strong>Dashboard</strong>: <a href="https://app.powerbi.com/view?r=eyJrIjoiZjVmOTI0MmMtY2U2Mi00ZTE2LTk2MGYtY2ZjNDMzODZkMjlmIiwidCI6IjQyNmQyYThkLTljY2QtNDI1NS04OTNkLTA2ODZhMzJjMTY4ZCIsImMiOjF9">Power BI dashboard</a></li>
</ul>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/2dCQS1jAbUo" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
</section>
<section id="key-points" class="level3">
<h3 class="anchored" data-anchor-id="key-points">Key points</h3>
<ul>
<li>Data center efficiency and GPU generation (A100–H200) drive impact as much as model size.</li>
<li>Environmental multipliers like PUE (Power Usage Effectiveness) and WUE (Water Usage Effectiveness) are critical for cross-site comparisons.</li>
<li>Efficiency gains are not absolute: under the Jevons paradox, lower per-query cost can increase overall usage.</li>
<li>U.S. regulation remains minimal, making voluntary transparency efforts (like Mistral’s) especially important.</li>
<li>Renewable energy sourcing and liquid cooling are among the most actionable interventions.</li>
<li>Academic and industry collaborations can close data gaps through open benchmarking.</li>
<li>Aggregate usage, not single-query cost, drives total environmental footprint.</li>
<li>Reporting environmental impact alongside accuracy metrics is an emerging best practice.</li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/WattBot-2025.html"><strong>Project</strong>: WattBot 2025</a>: Full project page for the WattBot ML Marathon challenge, including challenge design, winning approach, and related resources.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2026-02-17.html"><strong>Talk</strong>: Deploying RAG in Bedrock vs.&nbsp;Local: WattBot 2025 Case Study</a>: See how the winning WattBot RAG system was deployed in AWS Bedrock and locally with open-source models.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/2025-05-07_RAG-Romeo-Juliet.html"><strong>Notebook</strong>: Exploring RAG with Romeo and Juliet</a>: Learn how to build an end-to-end retrieval augmented generation (RAG) pipeline using Shakespeare’s Romeo and Juliet as example text.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/"><strong>Video Archive</strong>: ML+X forum archive</a>: Check out other recorded forums from ML+X.</li>
<li><a href="https://ml-marathon.wisc.edu/">Machine Learning Marathon</a>: Learn about the annual Machine Learning Marathon (3-month AI/ML hackathon) hosted by ML+X each fall.</li>
</ul>
</section>

 ]]></description>
  <category>Videos</category>
  <category>ML+X</category>
  <category>UW-Madison</category>
  <category>Trustworthy AI</category>
  <category>Sustainability</category>
  <category>Energy</category>
  <category>Benchmarking</category>
  <category>LLM</category>
  <category>RAG</category>
  <category>Retrieval</category>
  <category>Cloud</category>
  <category>GPU</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-09-09.html</guid>
  <pubDate>Tue, 09 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://img.youtube.com/vi/2dCQS1jAbUo/maxresdefault.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>WattBot 2025: Estimating AI Emissions with RAG</title>
  <dc:creator>Chris Endemann</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/WattBot-2025.html</link>
  <description><![CDATA[ 




<p>WattBot was an “Active” challenge in the <a href="https://ml-marathon.wisc.edu/">2025 Machine Learning Marathon (MLM25)</a>. Teams built retrieval-augmented generation (RAG) systems to extract credible, citation-backed emissions and cost estimates for AI workloads from a corpus of 35+ peer-reviewed papers and 300+ curated Q&amp;A pairs. Systems were expected to return citation-grounded answers or explicitly abstain when evidence was missing – promoting transparency and reproducibility in sustainability reporting.</p>
<section id="challenge-design" class="level3">
<h3 class="anchored" data-anchor-id="challenge-design">Challenge design</h3>
<ul>
<li><strong>Task</strong>: Given a natural-language question about AI energy use, water consumption, or carbon emissions, retrieve relevant passages from the provided corpus and generate a citation-backed answer.</li>
<li><strong>Evaluation</strong>: Answers were scored on factual accuracy, proper citation, and appropriate abstention when evidence was insufficient.</li>
<li><strong>Corpus</strong>: 35+ academic papers covering AI sustainability, energy benchmarking, and environmental impact reporting.</li>
</ul>
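<p>To make the abstention criterion concrete, here is a toy scoring rule. The weights and rules below are purely hypothetical and are not the actual competition metric; the point is that abstaining pays off only when the corpus genuinely lacks evidence.</p>

```python
# Toy abstention-aware scoring rule. Weights are hypothetical,
# NOT the actual WattBot competition metric.
def score_answer(correct, cited, abstained, evidence_exists):
    """Return 0-100 points for one answer (illustrative rule only)."""
    if abstained:
        # Abstention is rewarded only when no supporting evidence exists.
        return 100 if not evidence_exists else 0
    points = 70 if correct else 0   # factual accuracy dominates
    points += 30 if cited else 0    # citation grounding
    return points
```

Under a rule like this, a confident wrong answer scores worse than a justified abstention, matching the challenge’s emphasis on transparency.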
</section>
<section id="winning-approach" class="level3">
<h3 class="anchored" data-anchor-id="winning-approach">Winning approach</h3>
<p>The winning solution by <a href="https://github.com/KohakuBlueleaf/KohakuRAG">KohakuBlueleaf</a> used a RAG pipeline that was later replicated and deployed in both AWS Bedrock and locally with open-source Hugging Face models. See the follow-up talk below for deployment details.</p>
</section>
<section id="links" class="level3">
<h3 class="anchored" data-anchor-id="links">Links</h3>
<ul>
<li><strong>Kaggle challenge</strong>: <a href="https://www.kaggle.com/competitions/WattBot2025/overview">WattBot 2025</a></li>
<li><strong>Winning solution</strong>: <a href="https://github.com/KohakuBlueleaf/KohakuRAG">KohakuBlueleaf/KohakuRAG</a></li>
<li><strong>Deployment repo</strong>: <a href="https://github.com/matteso1/KohakuRAG_UI/">WattBot in Bedrock and Local</a></li>
</ul>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2026-02-17.html"><strong>Talk</strong>: Deploying RAG in Bedrock vs.&nbsp;Local: WattBot 2025 Case Study</a>: Follow-up ML+X forum where the winning RAG system was deployed in AWS Bedrock and locally with open-source models.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Videos/Forums/mlx_2025-09-09.html"><strong>Talk</strong>: AI’s Environmental Footprint: Insights and Actions</a>: The ML+X forum where WattBot was first introduced alongside LLM energy benchmarking work.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/2025-05-07_RAG-Romeo-Juliet.html"><strong>Notebook</strong>: Exploring Fact-Based QA with RAG: Romeo and Juliet</a>: Build an end-to-end RAG pipeline from scratch – a great starting point before tackling WattBot.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Amazon_SageMaker.html"><strong>Workshop</strong>: Intro to AWS SageMaker for Predictive ML/AI</a>: Covers AWS SageMaker and Bedrock for cloud-based ML/AI workflows, including RAG deployment.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/Quantization-and-Precision.html"><strong>Notebook</strong>: Understanding Quantization and Precision</a>: Learn how quantization (e.g., 4-bit) reduces model size and memory requirements – relevant to the local deployment approach used in this project.</li>
</ul>
</section>

 ]]></description>
  <category>Projects</category>
  <category>ML Marathon</category>
  <category>MLM25</category>
  <category>RAG</category>
  <category>Retrieval</category>
  <category>LLM</category>
  <category>NLP</category>
  <category>Sustainability</category>
  <category>Energy</category>
  <category>GenAI</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Projects/ML-Marathon/WattBot-2025.html</guid>
  <pubDate>Tue, 09 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/WattBot.png" medium="image" type="image/png" height="72" width="144"/>
</item>
<item>
  <title>TorchAudio</title>
  <dc:creator>Andrew Piela</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/torchaudio.html</link>
  <description><![CDATA[ 




<p><a href="https://docs.pytorch.org/audio/stable/index.html">TorchAudio</a> is the PyTorch team’s audio library for bringing modern audio signal processing into deep learning workflows, with GPU-friendly tools for audio I/O, feature extraction, and augmentation. Its I/O utilities decode formats such as WAV, MP3, and FLAC into PyTorch tensors of shape <code>(channels, time)</code> with <code>torchaudio.load</code>, and write tensors back to audio files with <code>torchaudio.save</code>. It also provides differentiable transforms (STFT, mel/CQT spectrograms, MFCC) and SoX-based effects for augmentation (pitch/tempo changes, masking). Because it is built on PyTorch, it slots cleanly into <code>DataLoader</code> and <code>nn.Module</code> pipelines, making it well suited to speech recognition, music transcription, and other audio ML systems.</p>
<section id="key-features" class="level4">
<h4 class="anchored" data-anchor-id="key-features">Key features</h4>
<ul>
<li><strong>Tensor-first transforms</strong>
<ul>
<li><code>MelSpectrogram</code>, CQT, <code>MFCC</code>, <code>Resample</code>, <code>AmplitudeToDB</code>.</li>
</ul></li>
<li><strong>Audio I/O</strong>
<ul>
<li>Load/save WAV/MP3/FLAC straight to <code>torch.Tensor</code> (CPU/GPU-ready).</li>
</ul></li>
<li><strong>Augmentation</strong>
<ul>
<li>Pitch/tempo changes, masking, and noise via SoX effects.</li>
</ul></li>
<li><strong>Performance</strong>: Supports batching and GPU acceleration through PyTorch and works well with <code>DataLoader</code>.</li>
</ul>
</section>
<section id="integration-and-compatibility" class="level2">
<h2 class="anchored" data-anchor-id="integration-and-compatibility">Integration and compatibility</h2>
<p>TorchAudio integrates with various machine learning frameworks and libraries, making it versatile for a range of tasks.</p>
<ul>
<li><strong>Frameworks Supported</strong>: PyTorch</li>
<li><strong>Compatible Libraries</strong>: NumPy, SciPy, librosa (complementary analysis), pretty_midi (export MIDI)</li>
<li><strong>Installation Instructions</strong>: <code>pip install torchaudio</code></li>
</ul>
</section>
<section id="use-cases" class="level2">
<h2 class="anchored" data-anchor-id="use-cases">Use cases</h2>
<p>Here are some examples of how TorchAudio can be applied to different machine learning tasks.</p>
<ul>
<li><strong>Use Case 1</strong>: WAV-to-MIDI transcription
<ul>
<li>Preprocess audio to log-mel/CQT tensors, then train CNN/CRNN models for frame-wise note/onset prediction.</li>
</ul></li>
<li><strong>Use Case 2</strong>: Data augmentation
<ul>
<li>Pitch/tempo shifts to increase robustness.</li>
</ul></li>
</ul>
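<p>The augmentation use case can be sketched with TorchAudio’s SoX-effects interface. This is a minimal sketch rather than tuned settings: <code>apply_effects_tensor</code> and the <code>pitch</code>/<code>tempo</code> effect names follow the TorchAudio docs, while the specific shift amounts below are illustrative.</p>

```python
# Sketch of pitch/tempo augmentation via TorchAudio's SoX effects.
# The effect values are illustrative, not tuned hyperparameters.

def make_effect_chains():
    """Example SoX effect chains: pitch shift (in cents) and tempo change."""
    return [
        [["pitch", "200"], ["rate", "22050"]],  # up ~2 semitones, then restore rate
        [["tempo", "1.10"]],                    # 10% faster without changing pitch
    ]

def augment_file(path):
    """Apply each effect chain to one audio file (requires torchaudio)."""
    import torchaudio

    w, sr = torchaudio.load(path)  # tensor of shape (channels, time)
    return [
        torchaudio.sox_effects.apply_effects_tensor(w, sr, chain)
        for chain in make_effect_chains()
    ]

# e.g. augmented = augment_file("path/to/audio.wav")
```

Generating several augmented variants per clip this way is a cheap path to the robustness gains mentioned above.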
</section>
<section id="tutorials-and-resources" class="level2">
<h2 class="anchored" data-anchor-id="tutorials-and-resources">Tutorials and resources</h2>
<section id="getting-started" class="level4">
<h4 class="anchored" data-anchor-id="getting-started">Getting started</h4>
<ul>
<li><p><strong><a href="https://www.youtube.com/watch?v=3mju52xBFK8">Official Tutorial</a></strong></p></li>
<li><p>Code snippet (loads audio as a tensor, downmixes to mono, resamples so every file shares the same sample rate, and converts it to a log-mel spectrogram ready for model input):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># wav -&gt; log-mel spectrogram tensor (which would be model input)</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> torchaudio</span>
<span id="cb1-3"></span>
<span id="cb1-4">w, sr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torchaudio.load(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"path/to/audio.wav"</span>)</span>
<span id="cb1-5">w <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> w.mean(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, keepdim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> w.size(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> w</span>
<span id="cb1-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> sr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22050</span>: w <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torchaudio.transforms.Resample(sr, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22050</span>)(w)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> sr <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22050</span></span>
<span id="cb1-7"></span>
<span id="cb1-8">mel <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torchaudio.transforms.MelSpectrogram(sample_rate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>sr, n_fft<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2048</span>, hop_length<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, n_mels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">128</span>)</span>
<span id="cb1-9">X_db <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torchaudio.transforms.AmplitudeToDB(stype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"power"</span>)(mel(w))</span>
<span id="cb1-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(X_db.shape)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># shape would be (1, 128, T)  </span></span></code></pre></div></div></li>
</ul>
</section>
<section id="high-level-tips-for-effective-use" class="level4">
<h4 class="anchored" data-anchor-id="high-level-tips-for-effective-use">High-level tips for effective use</h4>
<ul>
<li><strong>Optimization</strong>: Precompute log-mel spectrograms to speed up training.</li>
<li><strong>Memory Management</strong>: Use modest <code>n_mels</code> and <code>hop_length</code> values, and batch by time frames.</li>
<li><strong>Common Pitfalls</strong>: Inconsistent sample rates and hop lengths are a frequent source of bugs; keep them identical between training and inference.</li>
</ul>
</section>
<section id="related-libraries-tools" class="level4">
<h4 class="anchored" data-anchor-id="related-libraries-tools">Related libraries &amp; tools</h4>
<ul>
<li><strong>librosa</strong>: Extra music information retrieval utilities, such as chroma features and beat tracking.</li>
<li><strong>pretty_midi</strong>: Converts frame-wise predictions into MIDI files for listening and evaluation.</li>
</ul>
</section>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Applications/Blogs/blog-music-identification.html"><strong>Blog</strong>: What Tune Is That? A Humanities Application of Deep Learning</a>: A Nexus community post applying deep learning to audio and music identification.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-Deeplearning_PyTorch.html"><strong>Workshop</strong>: Intro to Deep Learning with PyTorch</a>: TorchAudio is built on PyTorch — start here if you’re new to the framework.</li>
<li><a href="https://docs.pytorch.org/audio/stable/index.html">TorchAudio Documentation</a>: Includes official API as well as tutorials.</li>
<li><a href="https://www.youtube.com/watch?v=3mju52xBFK8">Getting Started With Torchaudio | PyTorch Tutorial (YouTube)</a>: AssemblyAI video walking through TorchAudio basics, including resampling and loading an audio dataset.</li>
<li><a href="https://magenta.tensorflow.org/datasets/maestro">The MAESTRO Dataset</a>: Popular dataset containing hundreds of paired audio and MIDI recordings that can be processed with TorchAudio and used for training.</li>
</ul>
</section>

 ]]></description>
  <category>Libraries</category>
  <category>Audio data</category>
  <category>PyTorch</category>
  <category>Music transcription</category>
  <category>Deep learning</category>
  <category>Signal processing</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/torchaudio.html</guid>
  <pubDate>Tue, 26 Aug 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/pexels-pixabay-257904.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>GeoDeepDive: Unlocking Knowledge from Scientific Literature</title>
  <dc:creator>Devanshi Jain</dc:creator>
  <link>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/geodeepdive.html</link>
  <description><![CDATA[ 




<p><a href="https://colab.research.google.com/github/UW-Madison-DataScience/ML-X-Nexus/blob/main/Learn/Notebooks/geodeepdive.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" class="img-fluid"></a></p>
<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p><a href="https://geodeepdive.org/">GeoDeepDive (GDD)</a> is a cyberinfrastructure project designed to accelerate scientific discovery by extracting information from the vast and growing body of published scientific literature. While its roots are in geology, its applications span any domain that relies on published texts, including biology, materials science, medicine, and social sciences.</p>
<p>At its core, GDD is a massive database of over 15 million scientific documents (articles, theses, reports) that have been processed through a high-performance computing pipeline. This pipeline performs:</p>
<ul>
<li><strong>Optical Character Recognition (OCR)</strong> to convert scanned PDFs into machine-readable text.</li>
<li><strong>Natural Language Processing (NLP)</strong> to parse sentences, identify parts of speech, and perform named entity recognition (e.g., finding mineral names, locations, species).</li>
<li><strong>Relation Extraction</strong> to find and catalog relationships between entities (e.g., “mineral X is found at location Y”).</li>
</ul>
<p>The result is not just a collection of texts, but a structured, queryable knowledge graph. Researchers can use GDD’s public API to ask complex questions that would be impossible to answer by manual literature review, such as “find all papers that mention a specific fossil and its geological age” or “extract all measured values of a particular chemical compound.”</p>
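<p>A query like the ones above takes only a few lines of Python. The sketch below is minimal and unofficial: the <code>snippets</code> endpoint and <code>term</code>/<code>full_results</code> parameters follow the public GDD API, but treat the response layout (<code>success.data</code>) as an assumption to verify against the API documentation.</p>

```python
# Minimal sketch of a GeoDeepDive snippets query.
# Endpoint and parameter names follow the public GDD API; the response
# layout used in fetch_snippets() is an assumption to verify in the docs.
from urllib.parse import urlencode

API_BASE = "https://geodeepdive.org/api/snippets"

def build_snippets_url(term, full_results=False):
    """Build a snippets-search URL for a given search term."""
    params = {"term": term}
    if full_results:
        params["full_results"] = "true"
    return f"{API_BASE}?{urlencode(params)}"

def fetch_snippets(term):
    """Fetch matching snippets (requires the requests package and network access)."""
    import requests

    resp = requests.get(build_snippets_url(term), timeout=30)
    resp.raise_for_status()
    return resp.json().get("success", {}).get("data", [])

# e.g. for doc in fetch_snippets("stishovite")[:3]: print(doc.get("title"))
```

The same pattern generalizes to other entities: swap the term for a fossil, compound, or location name to sweep the literature programmatically.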
</section>
<section id="prerequisites" class="level2">
<h2 class="anchored" data-anchor-id="prerequisites">Prerequisites</h2>
<ul>
<li><strong>Basic familiarity with Python</strong> and making HTTP requests.</li>
<li>A <strong>GitHub account</strong> (to use GDD’s public API).</li>
<li>An understanding of basic <strong>NLP concepts</strong> (Token, Sentence, Named Entity) is helpful but not strictly required to run the example.</li>
</ul>
</section>
<section id="key-concepts-and-definitions" class="level2">
<h2 class="anchored" data-anchor-id="key-concepts-and-definitions">Key Concepts and Definitions</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Concept</th>
<th style="text-align: left;">Definition</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><strong>Document</strong></td>
<td style="text-align: left;">Any processed text unit in the GDD database, typically a scientific publication.</td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>NLP</strong></td>
<td style="text-align: left;">Natural Language Processing, the field of AI concerned with interactions between computers and human language.</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Named Entity Recognition (NER)</strong></td>
<td style="text-align: left;">An NLP task to identify and classify key information (entities) in text into predefined categories like persons, organizations, locations, etc. In GDD, these are often scientific terms.</td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>API (Application Programming Interface)</strong></td>
<td style="text-align: left;">A set of rules and tools that allows different software applications to communicate with each other. GDD provides an API to query its database programmatically.</td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>JSON</strong></td>
<td style="text-align: left;">JavaScript Object Notation, a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. It is the primary format for data returned by the GDD API.</td>
</tr>
</tbody>
</table>
</section>
<section id="tutorial-querying-the-geodeepdive-api-for-mineral-mentions" class="level2">
<h2 class="anchored" data-anchor-id="tutorial-querying-the-geodeepdive-api-for-mineral-mentions">Tutorial: Querying the GeoDeepDive API for Mineral Mentions</h2>
<p>This tutorial will guide you through a simple example of using Python to query the GeoDeepDive API to find sentences that mention the mineral “stishovite.”</p>
<section id="step-1-get-your-github-token" class="level3">
<h3 class="anchored" data-anchor-id="step-1-get-your-github-token">Step 1: Get Your GitHub Token</h3>
<p>The GDD API uses GitHub OAuth for authentication. You need to generate a personal access token.</p>
<ol type="1">
<li>Go to your GitHub <a href="https://github.com/settings/profile">Settings</a>.</li>
<li>Navigate to <strong>Developer settings</strong> &gt; <strong>Personal access tokens</strong> &gt; <strong>Tokens (classic)</strong>.</li>
<li>Click <strong>Generate new token (classic)</strong>. Give it a descriptive note (e.g., “GeoDeepDive API”).</li>
<li>Select the <code>public_repo</code> scope. This is sufficient.</li>
<li>Click <strong>Generate token</strong> and <strong>copy the token immediately</strong> (you won’t see it again!).</li>
</ol>
</section>
<section id="step-2-set-up-your-python-environment" class="level3">
<h3 class="anchored" data-anchor-id="step-2-set-up-your-python-environment">Step 2: Set Up Your Python Environment</h3>
<p>We’ll use the <code>requests</code> library to make HTTP calls. Let’s install it:</p>
<div id="7ab02fdf" class="cell" data-execution_count="1">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>pip install requests</span></code></pre></div></div>
</div>
</section>
<section id="step-3-configure-your-authentication" class="level3">
<h3 class="anchored" data-anchor-id="step-3-configure-your-authentication">Step 3: Configure Your Authentication</h3>
<p>Now, let’s set up your authentication. Replace the placeholders with your actual GitHub credentials:</p>
<div id="d9462cbf" class="cell" data-execution_count="2">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> requests</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> json</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Replace these with your actual GitHub credentials</span></span>
<span id="cb2-5">GITHUB_USERNAME <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"YourGitHubUsername"</span> </span>
<span id="cb2-6">GITHUB_TOKEN <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"YOUR_GITHUB_TOKEN"</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Replace with the token you generated</span></span>
<span id="cb2-7"></span>
<span id="cb2-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Authentication configured successfully!"</span>)</span></code></pre></div></div>
</div>
</section>
<section id="step-4-query-the-geodeepdive-api" class="level3">
<h3 class="anchored" data-anchor-id="step-4-query-the-geodeepdive-api">Step 4: Query the GeoDeepDive API</h3>
<p>Let’s search for documents mentioning the mineral “stishovite”:</p>
<div id="5f6c7171" class="cell" data-execution_count="3">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The public endpoint for the GDD API</span></span>
<span id="cb3-2">url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://geodeepdive.org/api/articles"</span></span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The parameters for our query. We want sentences about 'stishovite'</span></span>
<span id="cb3-5">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb3-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"term"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stishovite"</span>,   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The word or phrase to search for</span></span>
<span id="cb3-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"full_results"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get full details, including sentences</span></span>
<span id="cb3-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sentences"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>       <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Include the sentences in the response</span></span>
<span id="cb3-9">}</span>
<span id="cb3-10"></span>
<span id="cb3-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Make the GET request to the API with authentication</span></span>
<span id="cb3-12">response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(url, params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params, auth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(GITHUB_USERNAME, GITHUB_TOKEN))</span>
<span id="cb3-13"></span>
<span id="cb3-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check if the request was successful</span></span>
<span id="cb3-15"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> response.status_code <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>:</span>
<span id="cb3-16">    data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> response.json()</span>
<span id="cb3-17">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Found </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> documents mentioning 'stishovite'.</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-18">    </span>
<span id="cb3-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Loop through the first few documents and print relevant sentences</span></span>
<span id="cb3-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i, doc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data'</span>][:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]):  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Look at first 3 docs</span></span>
<span id="cb3-21">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Document </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>doc[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_gddid'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-22">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"   Title: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>doc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'title'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'No title available'</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-23">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"   Sentences found:"</span>)</span>
<span id="cb3-24">        </span>
<span id="cb3-25">        stishovite_sentences <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [s <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> s <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> doc[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sentences'</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'stishovite'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> s[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'text'</span>].lower()]</span>
<span id="cb3-26">        </span>
<span id="cb3-27">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> j, sentence <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(stishovite_sentences[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]):  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show first 2 sentences per doc</span></span>
<span id="cb3-28">            <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"     </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">. </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>sentence[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'text'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-29">        </span>
<span id="cb3-30">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"   Total sentences with 'stishovite': </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(stishovite_sentences)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-31">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"─"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">80</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-32"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb3-33">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Error: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>status_code<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb3-34">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(response.text)</span></code></pre></div></div>
</div>
</section>
<section id="step-5-advanced-query---filter-by-journal" class="level3">
<h3 class="anchored" data-anchor-id="step-5-advanced-query---filter-by-journal">Step 5: Advanced Query - Filter by Journal</h3>
<p>Let’s narrow the search to papers published in particular journals:</p>
<div id="fc69d4fb" class="cell" data-execution_count="4">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Search for stishovite in specific journals</span></span>
<span id="cb4-2">advanced_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb4-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"term"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stishovite"</span>,</span>
<span id="cb4-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"journal"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"science,nature,geology"</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter by journal names</span></span>
<span id="cb4-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"full_results"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb4-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sentences"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb4-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"limit"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Limit to 5 results</span></span>
<span id="cb4-8">}</span>
<span id="cb4-9"></span>
<span id="cb4-10">advanced_response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(url, params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>advanced_params, auth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(GITHUB_USERNAME, GITHUB_TOKEN))</span>
<span id="cb4-11"></span>
<span id="cb4-12"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> advanced_response.status_code <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>:</span>
<span id="cb4-13">    advanced_data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> advanced_response.json()</span>
<span id="cb4-14">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"🔍 Found </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>advanced_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> documents in specified journals."</span>)</span>
<span id="cb4-15">    </span>
<span id="cb4-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> advanced_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb4-17">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">📊 Journal distribution:"</span>)</span>
<span id="cb4-18">        journals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb4-19">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> doc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> advanced_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data'</span>]:</span>
<span id="cb4-20">            journal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> doc.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'journal'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Unknown'</span>)</span>
<span id="cb4-21">            journals[journal] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> journals.get(journal, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb4-22">        </span>
<span id="cb4-23">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> journal, count <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> journals.items():</span>
<span id="cb4-24">            <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"   </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>journal<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>count<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> documents"</span>)</span>
<span id="cb4-25">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb4-26">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"No documents found in the specified journals."</span>)</span>
<span id="cb4-27"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb4-28">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Advanced query error: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>advanced_response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>status_code<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
</section>
<section id="step-6-export-results-optional" class="level3">
<h3 class="anchored" data-anchor-id="step-6-export-results-optional">Step 6: Export Results (Optional)</h3>
<p>Let’s export the results to a JSON file for further analysis:</p>
<div id="22b7986f" class="cell" data-execution_count="5">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> json</span>
<span id="cb5-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> datetime <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> datetime</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Export the results</span></span>
<span id="cb5-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> response.status_code <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>:</span>
<span id="cb5-6">    export_data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb5-7">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"query"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"stishovite"</span>,</span>
<span id="cb5-8">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"execution_date"</span>: datetime.now().isoformat(),</span>
<span id="cb5-9">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"total_documents"</span>: data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total'</span>],</span>
<span id="cb5-10">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sample_documents"</span>: data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'success'</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'data'</span>][:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># First 5 documents</span></span>
<span id="cb5-11">    }</span>
<span id="cb5-12">    </span>
<span id="cb5-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'geodeepdive_results.json'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> f:</span>
<span id="cb5-14">        json.dump(export_data, f, indent<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb5-15">    </span>
<span id="cb5-16">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"💾 Results exported to 'geodeepdive_results.json'"</span>)</span>
<span id="cb5-17">    </span>
<span id="cb5-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Show a preview</span></span>
<span id="cb5-19">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">📋 Preview of exported data:"</span>)</span>
<span id="cb5-20">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Total documents: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>export_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'total_documents'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb5-21">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Sample size: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(export_data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sample_documents'</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</div>
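<p>As a quick sanity check, a small helper (not part of the original tutorial code) can reload the exported file and report what was saved:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import json

def load_results(path="geodeepdive_results.json"):
    """Reload an exported results file and return (total, sample_size)."""
    with open(path) as f:
        saved = json.load(f)
    return saved["total_documents"], len(saved["sample_documents"])
</code></pre></div>
<p>After running the export cell, <code>load_results()</code> should return the same totals that were printed above.</p>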
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>GeoDeepDive is a powerful tool for moving beyond simple keyword searches to true knowledge extraction. By providing programmatic access to a deeply processed corpus of scientific literature, it enables researchers to ask complex, data-driven questions at a scale that manual literature review cannot match.</p>
<ul>
<li><strong>Key Takeaway:</strong> GDD turns unstructured text into structured, queryable data.</li>
<li><strong>What we accomplished:</strong> We successfully queried the GeoDeepDive API, retrieved scientific documents mentioning “stishovite,” filtered results by journal, and exported the data for further analysis.</li>
</ul>
</section>
<section id="additional-resources" class="level2">
<h2 class="anchored" data-anchor-id="additional-resources">Additional Resources</h2>
<ul>
<li><a href="https://geodeepdive.org/">GeoDeepDive Official Website</a></li>
<li><a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens">GitHub Guide for Personal Access Tokens</a></li>
<li><a href="https://requests.readthedocs.io/">Requests: HTTP for Humans (Python Library Docs)</a></li>
</ul>
<p><strong>Note:</strong> Remember to keep your GitHub token secure and never share it publicly. For production use, consider using environment variables or secure secret management.</p>
</section>
<section id="related-resources" class="level2">
<h2 class="anchored" data-anchor-id="related-resources">Related resources</h2>
<ul>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Workshops/Intro-TextAnalysis_Python.html"><strong>Workshop</strong>: Intro to Text Analysis / NLP</a>: Covers NLP fundamentals like tokenization and named entity recognition used by GeoDeepDive’s pipeline.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Libraries/PyTesseract.html"><strong>Library</strong>: Pytesseract</a>: An OCR tool for extracting text from images — a key step in GeoDeepDive’s document processing.</li>
<li><a href="https://uw-madison-datascience.github.io/ML-X-Nexus/Toolbox/Data/Gutenberg.html"><strong>Data</strong>: Project Gutenberg</a>: Another large-scale text corpus useful for NLP and text mining research.</li>
</ul>
</section>
<section id="comments" class="level2">
<h2 class="anchored" data-anchor-id="comments">Comments</h2>


</section>

 ]]></description>
  <category>Notebooks</category>
  <category>Data</category>
  <category>NLP</category>
  <category>OCR</category>
  <category>Text analysis</category>
  <category>Retrieval</category>
  <guid>https://uw-madison-datascience.github.io/ML-X-Nexus/Learn/Notebooks/geodeepdive.html</guid>
  <pubDate>Thu, 21 Aug 2025 00:00:00 GMT</pubDate>
  <media:content url="https://uw-madison-datascience.github.io/ML-X-Nexus/images/geodeepdive_pipeline.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
