Claude Code Best Practices
AI coding agents — tools that can autonomously write, edit, run, and test code on your behalf — are rapidly changing how software gets built. The space is crowded and evolving fast: Claude Code, GitHub Copilot, Cursor, Windsurf, Augment Code, Amazon Q Developer, Gemini Code Assist, and GitLab Duo are among the most prominent, with new entrants appearing regularly.
Best practices in this space are still being discovered — by the ML+X community and the broader developer ecosystem alike. This guide is our attempt to start mapping what works, using Claude Code as the primary lens. Claude Code is Anthropic’s agentic coding tool — distinct from the Claude.ai chat interface — and it comes in several forms: a CLI, a desktop app, IDE extensions (VS Code, JetBrains), and a web IDE at claude.ai/code. All give the agent real shell access to read, write, and execute code, which makes it a great lens for exploring the trade-offs of agentic coding: permissions, context management, and cost. We’ll reference Claude.ai, GitHub Copilot, and other tools for comparison where useful.
This is a first pass based on early experience — we expect it to evolve as the ML+X community builds more hands-on knowledge. If you have tips, corrections, or experiences to share, please leave a comment below.
AI tools, pricing, features, and contractual terms change frequently. This post is community guidance, not official UW-Madison policy. For the latest institutional policies, data-use agreements, or questions about what data types are permitted with specific tools, consult UW-Madison Research Cyberinfrastructure or your department’s IT office.
If you’re at UW-Madison and want to use Claude Code through your institutional cloud account (GCP or AWS), check out our Claude Code Cloud Setup Guide for a step-by-step walkthrough — from cloud project setup to running your first session. Note: UW does not yet have a direct data agreement with Anthropic, so avoid using Claude Code with restricted or sensitive data. Cloud routing is suitable for general, non-sensitive research code. See Data Privacy for details.
Much of the Claude Code-specific guidance in this post draws on Anthropic’s official documentation, including their best practices guide, permissions and sandboxing docs, CLAUDE.md reference, data usage policy, and cost management guide. GitHub Copilot sections draw on GitHub’s coding agent docs and changelog. Where we paraphrase official documentation, we’ve linked to the source. Community perspectives and independent analyses are cited inline throughout.
What is agentic coding?
Traditional AI code assistants (like early GitHub Copilot or ChatGPT) work in a simple loop: you ask, they suggest, you accept or reject. Agentic coding tools go further. They can:
- Read and navigate your entire codebase
- Execute shell commands and run tests
- Edit multiple files in a single pass
- Iterate on their own output (fix errors, re-run tests, refine)
- Operate semi-autonomously over multi-step tasks
This is powerful, but it also means these tools have real access to your system — and the potential to do real damage if not managed carefully.
The landscape at a glance
Before diving into Claude Code specifically, here’s a rough map of the major agentic coding tools as of early 2026:
| Tool | Interface | Cost model | Notable strengths |
|---|---|---|---|
| Claude Code | CLI, desktop app, IDE extensions, web IDE | Pay-per-token (API) or Max plan | Strong multi-step reasoning, explicit permission model, CLAUDE.md project config |
| GitHub Copilot | VS Code/IDE, GitHub.com | Subscription + usage-based | Native GitHub integration, async PR creation via coding agent, multi-model support |
| Cursor | Custom IDE (VS Code fork) | Subscription | Polished IDE experience, fast inline edits, multi-file context handling |
| Windsurf | Custom IDE | Subscription (free tier available) | Low-friction agentic workflow, accessible pricing |
| Augment Code | IDE extension | Subscription | Large context window, whole-codebase awareness |
| Amazon Q Developer | IDE, CLI, AWS console | Free tier / Pro | Deep AWS service integration, infrastructure-aware suggestions |
| Gemini Code Assist | IDE, Google Cloud | Free tier / Enterprise | Google Cloud integration, Gemini model access |
| GitLab Duo | GitLab IDE, MR workflows | GitLab subscription add-on | Native GitLab CI/CD and merge request integration |
This space is moving fast — capabilities and pricing change frequently. See Coding Agents Comparison for up-to-date benchmarks and pricing.
In practice, many developers use multiple tools: a chat UI for brainstorming and review, an agentic tool for multi-step feature work, and an IDE copilot for inline completions throughout the day.
What the same task looks like across different tools
To make these distinctions concrete, let’s walk through the same scenario — “I have a repo on GitHub and I want Claude to add a utility function, write tests, and open a PR” — across Claude.ai, Claude Code, and GitHub Copilot.
Claude.ai (Chat — not Claude Code)
Claude.ai is Anthropic’s general-purpose chat interface. It’s not an agentic coding tool — it can’t execute code, edit files, or run commands on your system. You provide context by pasting code into the conversation, and you copy the output back into your editor.
The Claude Desktop app includes three tabs: Chat (standard conversation, with MCP support), Code (the full Claude Code agentic experience — see below), and Cowork (an AI companion that watches your screen and offers suggestions). Only the Code tab is an agentic coding tool. The Chat tab provides the same experience as claude.ai in a native app.
- Start a new conversation at claude.ai or in the Claude Desktop app’s Chat tab
- Paste in the relevant code (e.g., the contents of `src/utils/` and a few example utilities)
- Ask: “Add a `slugify` function that matches the style of these existing utilities. Also write tests.”
- Claude generates the code and tests in the chat
- You copy the output back into your editor, create a branch, commit, and open the PR yourself
Friction: You’re the middleware in both directions — pasting code in and copying code out. But notice what’s not here: permission prompts, approve/deny flows, or any risk of it running a bad command. Claude can’t touch your system, so the conversation feels fast and fluid even though you do all the manual work.
Best for: Quick code generation, architecture discussions, explaining unfamiliar code, and brainstorming — any task where you’re happy to provide context manually and apply changes yourself.
Claude Code
Claude Code is Anthropic’s agentic coding tool — completely different from the Claude.ai chat interface. You point it at a repository (by attaching a GitHub repo on the web or desktop, or launching it from a project directory in the terminal), and it can read your code, edit files, run shell commands, execute tests, and iterate on its own output — all within the scope of that project.
Claude Code is available across multiple surfaces — a desktop app, a web IDE, a terminal CLI, and IDE extensions for VS Code and JetBrains — but the core agentic engine is the same everywhere. You describe what you want, it reads your code, makes changes, runs tests, and iterates until the task is done. The differences between surfaces are mostly about how you interact, where the work runs, and how much the agent can do autonomously.
Here’s what a typical Claude Code session looks like. You type a request like:
```
Look at src/utils/ and add a slugify function that matches the style of existing utilities. Write tests too. Create a branch, commit, and open a PR when you’re done.
```
Claude Code will:
- Read your existing utils to understand the style
- Write the function and tests
- Run `pytest` (or whatever your test runner is) and see the results
- If tests fail, iterate — fix the code, re-run
- Create a branch, commit, push, and open a PR
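The output of such a run might look something like the sketch below. This is a hypothetical result, not what Claude will literally produce; the exact style depends on the conventions it finds in your `src/utils/`:

```python
import re


def slugify(text: str) -> str:
    """Convert text to a URL-safe slug: lowercase, hyphen-separated."""
    text = text.lower().strip()
    # Collapse any run of non-alphanumeric characters into a single hyphen
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Already-slugged  ") == "already-slugged"
    assert slugify("Multiple   spaces") == "multiple-spaces"
```

The point is less the code itself than the loop around it: Claude writes both pieces, runs the test, and only then moves on to the git workflow.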
How much you’re involved depends on the surface. On the web version, Claude runs in an isolated cloud VM and auto-accepts edits — you review the results (a PR, a diff, test output) rather than approving each individual action. The desktop app and CLI both default to “ask permissions” mode, where Claude proposes changes and waits for your approval before applying them. The desktop app shows visual diffs with accept/reject buttons; the CLI prompts in the terminal. You can reduce this friction on either surface by switching to “auto accept edits” mode, configuring allow rules, or enabling sandboxing to auto-approve actions that stay within your project directory. Many experienced users auto-approve most actions and invest their review time at the PR stage instead. If you’re just getting started, a good sweet spot is: auto-approve reads and test execution, manually approve writes and git operations.
Desktop & Web
The easiest way to get started is through the Claude Desktop app (Code tab) or Claude Code on the web at claude.ai/code. Both provide the same GUI experience. The main difference is where it runs: the desktop app works with local git repositories on your machine, with each session getting its own isolated git worktree so parallel tasks don’t collide. The web version clones your GitHub repo into an isolated cloud VM — no local setup needed. The web version is also available on mobile (iOS / Android) for kicking off and monitoring tasks on the go. Note: the desktop app requires Git — your project must be a git repo with at least one commit.
Key capabilities:
- Visual diff review — see exactly what Claude changed, leave inline comments on specific lines, and ask Claude to revise
- Live app preview — Claude can start a dev server and verify its own changes in an embedded browser, taking screenshots and fixing issues it finds
- Parallel sessions — run multiple tasks simultaneously in separate tabs, each on its own isolated branch
- GitHub PR monitoring — watch CI status, auto-fix failing checks, and auto-merge when everything passes
- Scheduled tasks — set up recurring tasks using cron expressions (e.g., daily dependency checks, periodic code reviews, deployment monitoring). On desktop, these persist across sessions; on the web, you can schedule them in Cowork. In the CLI, use the `/loop` skill for lightweight in-session polling
- Connectors — one-click integrations for GitHub, Slack, Linear, Notion, and more
- Async handoff — start a task on the web and close your laptop; it runs in the cloud and notifies you when done. You can also start a task from the terminal with `claude --remote`, or pull a web session into your terminal with `claude --teleport`
Best for: Users who prefer a GUI, want visual diff review and parallel task management, or want to get started without installing anything. The web version is the fastest way to try Claude Code — just open claude.ai/code and point it at a repo.
Terminal (CLI)
Claude Code is also available as a CLI, installed via npm (npm install -g @anthropic-ai/claude-code). It’s the same agentic engine, but the terminal interface offers some distinct advantages:
- IDE extensions — Claude Code integrates directly into VS Code and JetBrains, so you can use it without leaving your editor
- Scriptability — pipe commands, chain with shell tools, and integrate into automated workflows (CI/CD, git hooks)
- `CLAUDE.md` authoring — the terminal is the natural place to set up and iterate on your project’s `CLAUDE.md` configuration
- SSH and remote environments — works anywhere you have a terminal, including remote servers, containers, and cloud dev environments
- Full local control — no cloud dependency; everything runs on your machine (or wherever your terminal is)
- Flexible auth and billing — the desktop and web apps require an Anthropic login (Max plan or API credits). The CLI also supports routing requests through Google Vertex AI or AWS Bedrock, so organizations that need to keep API traffic within their own cloud environment (for compliance, billing, or data residency reasons) can do so. See our Claude Code Cloud Setup Guide for a step-by-step walkthrough using UW-Madison GCP or AWS
```bash
# Install npm if needed (e.g., on a fresh WSL2 or Ubuntu setup)
sudo apt install npm

# Install Claude Code globally (sudo needed on Linux/WSL2)
sudo npm install -g @anthropic-ai/claude-code

# Install sandbox dependencies (WSL2/Linux only)
sudo apt-get update && sudo apt-get install bubblewrap socat

# Navigate to your project and launch Claude Code
# Note: in WSL2, your Windows files are at /mnt/c/Users/<username>/...
cd yourrepo && claude
```

Best for: Developers comfortable with the terminal, CI/CD integration, scripting and automation, working in remote/SSH environments, and organizations that need to route traffic through their own cloud provider.
A note on security: Claude Code runs with your permissions
The level of system access depends on which surface you use:
- CLI and desktop app — Claude Code operates with your full user-level filesystem and shell permissions. It can read your SSH keys, modify files outside your project, run arbitrary shell commands, and access anything your user account can reach.
- IDE extensions (VS Code, JetBrains) — same access as the CLI, since the extension runs Claude Code as a local process under your user account.
- Web version (claude.ai/code) — runs in an isolated cloud VM with access only to your cloned GitHub repo. It cannot reach your local filesystem, SSH keys, or other local resources. This is the most restricted surface by default.
This isn’t unique to Claude Code — any agentic tool with shell access (Cursor, Windsurf, Copilot coding agent) has similar access on your local machine. The difference is in what mitigations each tool provides.
Claude Code mitigates this with several layers of protection:
- Permission prompts — by default, Claude asks before every file write, shell command, and git operation. You can configure allow/deny rules to auto-approve trusted actions and hard-block sensitive paths like `~/.ssh`, `~/.aws`, and `.env` files.
- Built-in sandboxing — Claude Code’s OS-level sandbox (Linux bubblewrap / macOS Seatbelt) restricts filesystem access to your project directory and limits outbound network traffic to approved domains. Credentials (git credentials, signing keys) are never placed inside the sandbox. Anthropic reports this reduces permission prompts by 84% while increasing security — and it’s faster than Docker (~3x less overhead).
- Desktop app — adds git worktree isolation on top of the OS-level sandbox. Each session gets its own isolated copy of the repo (stored in `.claude/worktrees/`), so changes in one session don’t affect others until committed.
- Web version (claude.ai/code) — the most restricted surface. Each task runs in a fresh, ephemeral VM with gVisor-based kernel isolation. Claude can only access the cloned repo; storage is wiped when the task completes. Credentials are handled by a proxy service and never exist inside the sandbox.

If you take one action from this section: enable the built-in sandbox. It provides strong filesystem and network isolation with minimal setup and near-zero performance overhead — there’s no reason not to use it, and it’s the single most impactful security measure available. See Security fundamentals later in this guide, or the Cloud Setup Guide’s security section for a step-by-step walkthrough.
GitHub Copilot
GitHub Copilot is GitHub’s AI coding assistant. It’s a multi-model platform — you can choose from Claude, GPT, Gemini, and others as the underlying model. This is fundamentally different from Claude Code, and the distinction matters.
“Claude” in Copilot vs. Claude Code: what’s actually different?
When you select Claude as the model in Copilot (whether in VS Code agent mode or the async coding agent), you’re using Claude’s language model — but GitHub’s orchestration layer is driving it. GitHub controls the system prompts, the tool-calling framework, the context management, and how your instructions are delivered to the model. Think of it as Claude’s brain in GitHub’s body.
Claude Code, by contrast, is Anthropic’s own agentic system built specifically around Claude. Anthropic controls the entire stack: the system prompts are purpose-built for agentic coding, the tool framework is designed for Claude’s strengths, and features like extended thinking, CLAUDE.md project configuration, and the permission model are all tightly integrated.
Why this matters in practice:
- Context handling — Copilot primarily derives context from open tabs and (when indexing is enabled) broader repo structure, with a platform-level cap of ~128k tokens. Claude Code uses Claude’s full 200k-token context window and maps your entire repository, accumulating context through conversation threading. For multi-file tasks, Claude Code generally understands project architecture more holistically.
- Instruction following — Claude Code reads your
CLAUDE.mdfiles natively. Copilot has its own instruction mechanism (copilot-instructions.md), but users have reported that Claude models don’t always follow Copilot’s instruction files as reliably — because the model is being orchestrated by a system designed for multiple models, not optimized for any one. - Extended thinking — Claude Code uses extended thinking by default with adjustable token budgets. Copilot support for thinking tokens has been inconsistent, with some configurations producing errors when extended thinking parameters are passed.
- Tools and sub-agents — Claude Code ships with 18+ built-in tools (file editing, bash, search, git, sub-agents), plus full MCP support and hooks. Copilot agent mode uses its own curated tool set, which is capable but less extensive.
- Quality on complex tasks — In a 50-session benchmark study, Claude Code produced a higher accept rate (44% vs 38%) and scored significantly better on bug-fixing context fidelity (8.5/10 vs 5.9/10). Copilot was ~15 seconds faster per task on average and excels at inline completions.
As of February 2026, Claude is also available as a standalone agent on GitHub — not just a model choice within Copilot. You can assign issues directly to @claude (or @copilot, or @codex) on GitHub.com, and in VS Code 1.109+ you can start Claude agent sessions that use Anthropic’s own agent harness rather than Copilot’s orchestration. In these modes, you get the same prompts, tools, and architecture as Claude Code — which should close the quality gap vs. using Claude as a model within Copilot. Initially available for Pro+ and Enterprise plans; expanded to Copilot Business and Pro on Feb 26 at no additional cost.
Agent mode (in VS Code)
- Open your repo in VS Code with the Copilot extension installed
- Open the Copilot chat panel (Ctrl/Cmd+Shift+I)
- Select agent mode, choose Claude as the model
- Type: “Add a slugify function to src/utils/ matching the existing style. Write tests.”
Copilot will:
- Read relevant files
- Create/edit files directly — no permission prompt by default in many configurations
- May run tests if it decides to (or you can ask it to)
- You review the changes in VS Code’s diff view
- You handle the git workflow (branch, commit, push, PR) — or use the async coding agent for that
Friction: The IDE experience is smooth but you have less visibility into why the agent made certain choices. Agent mode is still evolving — for complex multi-step tasks it may not iterate as effectively as Claude Code’s agentic loop. The upside is zero context-switching: you’re already in your editor.
Coding agent (async)
GitHub’s async coding agents let you delegate work directly from issues and PRs — no IDE or terminal needed:
- Go to your repo on GitHub.com
- Create an issue: “Add a slugify utility function to src/utils/ with tests”
- Assign the issue to `@copilot`, `@claude`, or `@codex` via the Assignees dropdown
- Walk away — the agent works in a secure cloud environment
The agent will:
- Create a branch
- Implement the function and tests in an ephemeral environment
- Open a draft PR referencing the issue
- You get a notification when the PR is ready to review
- You can leave review comments mentioning `@claude` to request changes — the agent iterates like a human collaborator
What’s running under the hood? When you assign to @claude, GitHub runs Anthropic’s Claude Code Action — which uses the same Claude Code engine (agentic loop, tools, extended thinking) that powers the CLI and desktop app. The key difference is that it runs in GitHub’s managed environment rather than your local machine, and its scope is limited to the repo and issue context. Assigning to @copilot uses GitHub’s own orchestration with your selected model, and @codex uses OpenAI’s agent.
By default, the async coding agent uses Claude Sonnet 4.6 when no model is explicitly selected. You can choose from Claude Opus 4.6, Claude Sonnet 4.5, GPT-5.1-Codex-Max, GPT-5.2-Codex, and others via the model picker.
Friction: This is the most hands-off option, but you have the least control during execution. Works best for well-scoped, clearly described issues. If the task is ambiguous or requires judgment calls, you may end up doing multiple rounds of PR review and comments to guide it.
Best for: Inline autocomplete, single-file edits, and quick agent tasks within the IDE. Also excellent for async PR generation on well-defined issues. Many developers use Copilot alongside Claude Code — Copilot for inline completions in the editor, Claude Code in the terminal for deep multi-file work.
Key takeaway
The same task ranges from fully manual (Claude.ai — you apply every change) to fully hands-off (Copilot coding agent — you just review the PR). But “more autonomous” doesn’t always mean “better results.”
Counterintuitively, Claude.ai can feel lower-friction than Claude Code for many tasks — the chat interface just answers, with no permission prompts or approve/deny flow. You lose the ability to have Claude execute things directly, but you gain a frictionless conversation. Claude Code (in any form) is far more capable — it can run tests, iterate on failures, and push code — but its default guardrails (which exist for good reason) mean more interruptions until you tune them.
The trade-off is between autonomy, control, and optimization:
- Claude.ai (chat) — not agentic, but fluid and zero-risk. You do the manual work.
- Claude Code (desktop, web, CLI, or IDE extension) — fully agentic, with Anthropic’s purpose-built orchestration optimized for Claude. The deepest integration between model and tooling.
- Copilot with Claude model (IDE) — agentic within the IDE, fewer interruptions, but Claude is running through GitHub’s orchestration layer rather than Anthropic’s. Good for inline work; less optimized for complex multi-step reasoning.
- Claude agent on GitHub (async) — Anthropic’s own agent harness running on GitHub’s infrastructure. Assign issues to `@claude` for async PR generation.
Pick based on the task. Sensitive work or unfamiliar codebase? Claude Code’s guardrails are a feature. Quick question or brainstorming? Claude.ai chat is hard to beat. Already in VS Code and want inline help? Copilot is hard to beat. Need Claude’s full reasoning depth on a complex refactor? Claude Code is the most direct path to the model’s capabilities.
Working effectively with Claude Code
This is the core of the guide. Whether you’re using the CLI, desktop app, an IDE extension, or the web IDE, these practices apply across all Claude Code surfaces. Anthropic’s own best practices guide goes deeper on context management, prompt patterns, and scaling across parallel sessions — we’ll highlight the essentials here and add our own perspective.
Think in features, not projects
One of the biggest lessons from working with agentic coding tools: use them for feature-level development, not for building entire projects in one shot.
Why? Because agents work best with clear, well-scoped requests. The less clarity you provide, the more the agent has to guess — and guessing leads to:
- Agentic loops (trying approaches, failing, trying again)
- Drift from your intended architecture
- Wasted tokens and time
- Code that technically works but doesn’t match your vision
Precise requests get precise results. Instead of “build me a web app with auth,” try:
- “Add a login form component that submits to `/api/auth/login` and stores the JWT in an httpOnly cookie”
- “Write a pytest fixture that creates a test database with the schema from `models.py`”
- “Refactor the `process_data` function in `pipeline.py` to handle the case where `input_df` has missing columns”
Each of these is a single, well-defined task that an agent can execute without ambiguity.
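To make that concrete, the second request above might yield something like the sketch below. The in-memory SQLite database and the `users` table are hypothetical stand-ins for whatever schema actually lives in `models.py`:

```python
import sqlite3


def make_test_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a minimal schema.

    In a real suite you would wrap this in a @pytest.fixture so that
    each test gets a fresh connection; the schema here is a placeholder.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    return conn
```

Because the request named the inputs (`models.py`) and the desired artifact (a pytest fixture), the agent has almost nothing left to guess.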
Use CLAUDE.md as your control surface
CLAUDE.md is a markdown file you place in your project root that gives Claude Code persistent context about your project. Think of it as a README for the agent — it’s loaded automatically at the start of every session and shapes how Claude behaves. You can include things like:
- How your project is structured (key directories, entry points)
- Coding conventions (naming, formatting, patterns to follow or avoid)
- Testing and build commands
- Safety rules (“never force-push,” “don’t modify migrations/”)
- Links to docs or specs the agent should reference
Claude Code also supports CLAUDE.md files in subdirectories (loaded when Claude works in that directory) and a global ~/.claude/CLAUDE.md for preferences that apply across all projects. The file is advisory — Claude will follow these instructions in good faith, but they’re not enforced at the system level the way hooks or deny rules are. For anything safety-critical, back it up with a hook or deny rule.
This is one of the most underrated features — it’s your main lever for shaping how the agent behaves across sessions.
Prevent runaway loops:
```markdown
## Testing requirements
- Always run the full test suite (`pytest tests/`) after making changes
- If tests fail, fix the failing tests before moving on
- Do not push code with failing tests
```

This single instruction saves enormous headaches. Without it, the agent might push broken code, you discover test failures in CI, and then you’re spending time fixing things that should have been caught locally.
Enforce project conventions:
```markdown
## Code style
- Use type hints for all function signatures
- Follow the existing import ordering convention
- Do not add new dependencies without asking first
```

Limit destructive actions:
```markdown
## Safety
- Never run `rm -rf` on any directory
- Never force-push to any branch
- Never modify files in the `config/production/` directory
- Always create a new branch for changes; never commit directly to main
```

Provide architectural context:
```markdown
## Project structure
- API routes go in `src/routes/`
- Business logic goes in `src/services/`
- Database models are in `src/models/`
- Tests mirror the source structure under `tests/`
```

The more context you provide in `CLAUDE.md`, the fewer agentic loops the agent needs to figure out your project. See the official CLAUDE.md reference for the full spec, including file resolution order and advanced features.
Tune the permission dial
Claude Code’s permission system is the main thing that distinguishes it from tools that “just go.” By default, it asks before every file write, shell command, and git operation. This is safe but slow.
The key insight: permissions aren’t all-or-nothing. You can configure a spectrum:
- Start conservative — approve everything manually while you’re learning what the agent does
- Auto-approve low-risk actions — file reads, grep/search, test execution. These rarely cause harm and the prompts add friction without adding safety.
- Manually approve writes and git operations — this is where real damage can happen (overwriting files, force-pushing, committing secrets)
- Use `CLAUDE.md` safety rules as a second layer — even if you auto-approve shell commands, the agent will respect instructions like “never force-push”
The sweet spot for most developers: auto-approve reads and test runs, manually approve everything else. As you build trust with specific workflows, you can loosen further. See Anthropic’s permissions reference for the full rule syntax and available tool names.
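Concretely, these rules live in a settings file (project-level `.claude/settings.json`, or `~/.claude/settings.json` for global defaults). A sketch of that sweet-spot configuration is below; treat the specific rule strings as illustrative and check the permissions reference for the current syntax and tool names:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Grep",
      "Bash(pytest:*)"
    ],
    "deny": [
      "Read(./.env)",
      "Read(~/.ssh/**)",
      "Bash(git push --force:*)"
    ]
  }
}
```

With this in place, reads, searches, and test runs proceed without prompts, writes and git operations still ask, and the deny list hard-blocks the paths and commands you never want touched regardless of mode.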
Use branches and commit frequently
The non-negotiable: always work on a branch, never let an agent commit directly to main. Beyond that, there are two common workflows:
- Auto-commit freely, review at the PR stage. Let the agent commit (and even push) as it works. You review the full diff when you open the PR, just like you would with a human contributor. This keeps momentum high and works well when you have CI checks and a good test suite gating your merges.
- Commit manually after reviewing each change. Approve each commit yourself so you stay close to every change as it happens. This is safer when you’re learning the tool, working on sensitive code, or don’t yet have strong CI guardrails.
Either way, frequent commits help — they give you clean revert points if the agent goes off track. A good CLAUDE.md instruction like “commit after each completed task” keeps things granular regardless of which workflow you prefer.
Review everything (at the right level)
Agent-generated code isn’t exempt from review — but when you review is a matter of workflow. Some developers review each diff before committing; others let the agent run and review the full PR diff before merging. Both are valid. What matters is that someone (you, a teammate, or CI) checks the code before it lands on main:
- Read the diffs
- Check for security issues (hardcoded secrets, SQL injection, etc.)
- Verify it matches your architectural patterns
- Make sure it doesn’t introduce unnecessary complexity
Give the agent a way to verify its own work
This is the single highest-leverage thing you can do. Claude performs dramatically better when it can check its own output — running tests, comparing screenshots, validating behavior — rather than relying on you as the only feedback loop.
- Include test cases in your prompt: “Write a `validateEmail` function. Test cases: `user@example.com` → true, `invalid` → false, `user@.com` → false. Run the tests after implementing.”
- Ask it to verify UI changes visually: “[paste screenshot] Implement this design. Take a screenshot of the result and compare it to the original.”
- Point to the symptom, not just the fix: “The build fails with this error: [paste error]. Fix it and verify the build succeeds. Address the root cause, don’t suppress the error.”
The more you invest in making your verification rock-solid (a good test suite, a linter, a build check), the more autonomously the agent can work.
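Spelling out test cases gives the agent an unambiguous target to iterate against. For the `validateEmail` prompt above, it might produce something like this sketch (the regex is deliberately simple and illustrative, not a full RFC 5322 validator):

```python
import re

# Minimal pattern: local@domain.tld, where the domain can't start with a dot
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s.][^@\s]*\.[^@\s]+$")


def validate_email(address: str) -> bool:
    """Return True if the address looks like a plausible email."""
    return bool(_EMAIL_RE.match(address))


# The test cases from the prompt become the agent's self-check
assert validate_email("user@example.com") is True
assert validate_email("invalid") is False
assert validate_email("user@.com") is False
```

If any assertion fails, the agent sees the failure and revises, with no round-trip through you.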
Explore first, then plan, then code
For complex tasks, resist the urge to let Claude jump straight to implementation. Use Plan Mode (toggle with Shift+Tab) to separate exploration from execution:
- Explore: In Plan Mode, Claude reads files and answers questions without making changes. “Read `src/auth/` and understand how we handle sessions and login.”
- Plan: Ask Claude to create an implementation plan. “I want to add Google OAuth. What files need to change? Create a plan.”
- Implement: Switch back to Normal Mode and let Claude execute the plan, verifying against tests.
- Commit: Ask Claude to commit with a descriptive message.
Skip this for small, clear tasks — if you could describe the diff in one sentence, just ask Claude to do it directly. Planning is most useful when you’re uncertain about the approach or the change touches multiple files.
Manage context aggressively
Claude’s context window is your most important resource. As it fills up with conversation history, file contents, and command outputs, performance degrades — Claude may “forget” earlier instructions or make more mistakes. (This section is adapted from Anthropic’s official best practices.)
- Use `/clear` between unrelated tasks — a clean context dramatically improves quality
- Use `/compact` to summarize long conversations — run `/compact focus on the API changes` to keep what matters and discard the rest
- Delegate exploration to subagents — when Claude needs to read dozens of files to investigate something, have it use a subagent. The subagent works in its own context and returns a summary, keeping your main conversation lean.
- Run `/context` to see what’s consuming your context window (MCP servers can be surprisingly expensive)
- Course-correct early — if Claude is going in the wrong direction, interrupt with `Esc` rather than letting it generate more output that clutters context. After two failed corrections, `/clear` and start fresh with a better prompt.
Extend Claude Code with skills, hooks, and MCP
Beyond CLAUDE.md, Claude Code has a rich extension system for customizing behavior:
- Skills — reusable knowledge and workflows. Create a `/deploy` skill that runs your deployment checklist, or an API conventions skill that Claude loads when working on your endpoints. Skills load on demand, so they don’t bloat every session like `CLAUDE.md` does.
- Hooks — deterministic scripts that run at specific points in Claude’s workflow. Unlike `CLAUDE.md` instructions (which are advisory), hooks are guaranteed to fire. Use them for things like running ESLint after every file edit or blocking writes to a `migrations/` directory.
- MCP — connect Claude to external services. Query your database, post to Slack, control a browser, or pull issues from your project tracker — all from within a Claude Code session.
- Subagents — isolated workers with their own context. Useful for research tasks, code review, or any work where you don’t want the intermediate steps cluttering your main conversation.
Start with CLAUDE.md for your core conventions. Add skills when you find yourself repeating the same workflows. Add hooks when you need guaranteed automation. Add MCP when you need external integrations. For a deeper dive, see Anthropic’s extension system overview, which covers when to use each mechanism.
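As an illustration of the hooks mechanism, here is a hypothetical `settings.json` fragment that lints after every file edit. The event name (`PostToolUse`), matcher, and the `tool_input.file_path` field reflect the hooks documentation at the time of writing — verify them against the current schema before relying on this:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx eslint"
          }
        ]
      }
    ]
  }
}
```

Because hooks are plain commands, anything scriptable — formatters, custom validators, path guards — can be wired in the same way.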
Managing costs
Agentic coding tools that use API tokens (like Claude Code) charge per token, and agentic workflows are token-hungry — the agent reads files, reasons through problems, writes code, runs commands, reads output, and iterates. A single focused task might use 50K–200K tokens. A sprawling, underspecified session can easily burn through 1M+ tokens.
What does this actually cost?
There are two ways to pay for Claude Code: subscription plans (fixed monthly cost) or API tokens (pay-per-use). Most individuals should start with a subscription; API pricing is better for automation and CI/CD pipelines. (Pricing details adapted from Anthropic’s pricing page and Claude Code cost management docs — verify current prices, as they change frequently.)
Subscription plans (as of early 2026):
| Plan | Price | What you get |
|---|---|---|
| Pro | $20/month | Claude Code access with moderate usage limits |
| Max 5x | $100/month | 5× the Pro usage limit — sweet spot for most active developers |
| Max 20x | $200/month | 20× the Pro usage limit — for heavy agentic work or parallel sessions |
| Team (premium seats) | $150/user/month (min 5 seats) | Team management, shared billing, org-level policies |
With subscription plans, you never get a surprise bill — you hit rate limits instead. The /cost command shows your token usage in a session, but on a subscription plan this is informational only; it doesn’t affect your bill.
API token pricing (pay-per-use):
| Model | Input tokens | Output tokens | Best for |
|---|---|---|---|
| Haiku 4.5 | $1/MTok | $5/MTok | Fast, cheap tasks (linting, simple edits) |
| Sonnet 4.6 | $3/MTok | $15/MTok | Default for most coding work |
| Opus 4.6 | $5/MTok | $25/MTok | Complex reasoning, architecture decisions |
Note that output tokens cost 5× more than input tokens across all three models — and code generation is output-heavy. Also, requests exceeding 200K input tokens are charged at 2× input / 1.5× output rates, which matters for large codebases. Prompt caching can reduce input costs by up to 90% on repeated system prompts, and the Batch API offers a 50% discount for async processing.
Anthropic reports that the average Claude Code user on API pricing spends roughly $6/day, with 90% of users under $12/day. That translates to $100–200/month for active development with Sonnet. But averages hide a lot of variance — a heavy month with Opus and parallel sessions can hit $5,000+. One detailed usage report showed ~892K output tokens vs ~45K input tokens in a single month on a mix of Opus and Sonnet, costing ~$1,248.
Rule of thumb: If your monthly API costs would exceed $60–80, Max 5x is cheaper. If they’d exceed $150, Max 20x is the clear winner.
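The table above reduces to simple arithmetic — a hypothetical Python sketch using the Sonnet row’s rates (illustrative only; verify current pricing before budgeting):

```python
# Back-of-envelope session cost at per-million-token rates.
# Default rates copied from the Sonnet 4.6 row above: $3/MTok in, $15/MTok out.

def session_cost_usd(input_tokens: int, output_tokens: int,
                     in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Cost of one session, ignoring caching and long-context surcharges."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A "single focused task" from the text (~150K tokens, mostly input):
focused = session_cost_usd(input_tokens=120_000, output_tokens=30_000)  # ~$0.81

# Break-even against the $100/month Max 5x plan at that per-task cost:
tasks_to_break_even = 100 / focused  # ~123 focused tasks per month
```

At roughly $0.81 per focused task, a hundred-plus tasks a month is where the subscription starts winning — consistent with the $60–80 rule of thumb above once you account for larger, messier sessions.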
For comparison, GitHub Copilot runs $10–$39/month depending on tier, with usage-based pricing for premium models beyond included allowances.
Watch out for runaway costs
Agentic workflows can burn through tokens fast, especially when things go wrong:
- Agentic loops: A vague prompt can send the agent into cycles of trying approaches, failing, reading more files, and trying again — each loop consuming thousands of tokens. One developer documented an orchestrator agent flow that hit $150/hour.
- Context accumulation: As your conversation grows, every new message includes the full context window — so the 50th message in a session costs far more than the 1st. Use `/clear` between unrelated tasks and `/compact` to summarize long conversations.
- Parallel sessions: Running multiple Claude Code sessions simultaneously (especially on the web or with agent teams) multiplies your token consumption proportionally. Five parallel sessions = 5× the cost.
- Extended thinking: Thinking tokens are billed as output tokens. A complex Opus session with deep reasoning can generate thousands of thinking tokens per turn.
How to protect yourself:
- On a subscription: You can’t overspend, but you can hit rate limits mid-task. Monitor with `/cost` and plan your usage around your limit.
- On API pricing: Set spending alerts and hard limits on your Anthropic account. Use separate API keys for different projects so you can track spending.
- In both cases: Use `/cost` to monitor token usage mid-session. If a session is getting expensive, `/clear` and start fresh with a more specific prompt. Break large tasks into focused sessions.
For UW-Madison researchers: institutional cloud benefits
If you’re at UW-Madison (or a similar research institution), routing AI API costs through a UW-provisioned cloud account offers two main benefits: institutional billing (charges go to your cloud project, not your personal card — important for grants and shared budgets) and lower overhead on grants (UW’s Cloud Computing Pilot cuts F&A from 55.5% to 26%, saving ~$2,950 per $10,000 in cloud spending). NIH-funded researchers may get additional discounts through STRIDES. Note that these savings are on the overhead and billing side — Anthropic’s per-token pricing is the same whether you route through Vertex AI, Bedrock, or the direct API. Also note that institutional cloud agreements cover the cloud provider’s services — they do not extend to Anthropic’s data handling (see Data Privacy below).
Contact your department’s IT staff or Research Computing to ask about available cloud credits and whether AI API costs are eligible.
Strategies to keep costs down
- Be specific in your prompts — vague requests lead to more agentic loops, which means more tokens. “Add a login form” costs more than “Add a React component at `src/components/LoginForm.tsx` that posts email/password to `/api/auth/login`”
- Use `/clear` aggressively — reset context between unrelated tasks. A clean context means fewer input tokens per message.
- Use `/compact` — summarize long conversations to free up context space without losing key information
- Use the right model for the task — Haiku or Sonnet for straightforward tasks, reserve Opus for complex reasoning
- Break large tasks into smaller sessions — each focused session is cheaper than one sprawling conversation that loses context and re-reads files
- Use `CLAUDE.md` to provide project context upfront — this reduces the amount of exploration the agent needs to do
- Delegate exploration to subagents — they run in isolated context and return summaries, keeping your main session lean
- Monitor session costs — run `/cost` periodically to see where you stand
For more detail, see Anthropic’s cost management guide.
Energy and environmental considerations
Agentic coding is more compute-intensive than a simple chat query or web search. A single LLM text query now uses roughly 0.3 Wh — about the same as a Google search — thanks to hardware improvements and model optimization. But an agentic coding session chains hundreds or thousands of such calls together as the agent reads files, reasons, writes code, runs commands, and iterates.
How much energy does agentic coding actually use?
A detailed analysis by Simon P. Couch estimated Claude Code’s energy footprint at roughly 41 Wh per session — over 130× a single chat query. A heavy day of usage (multiple sessions, parallel agents) can reach ~1,300 Wh/day. To put that in perspective:
| Activity | Energy |
|---|---|
| Google search or single AI chat query | ~0.3 Wh |
| LED lightbulb (1 hour) | ~10 Wh |
| One Claude Code session | ~41 Wh |
| Streaming 1 hour of video (incl. device) | ~36–80 Wh |
| Heavy Claude Code daily use | ~1,300 Wh |
| Running a dishwasher once | ~1,300 Wh |
| Daily refrigerator use | ~1,200–1,500 Wh |
So a heavy day of agentic coding is roughly equivalent to running your dishwasher — modest at the individual level, but significant in aggregate.
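The multipliers in the table follow directly from the estimates — a quick arithmetic check (all figures are the estimates quoted above, not measurements):

```python
# Sanity-check the energy comparisons using the table's own estimates.
chat_query_wh = 0.3   # single AI chat query or Google search
session_wh = 41       # one Claude Code session (Couch's estimate)
heavy_day_wh = 1300   # heavy daily use

queries_per_session = session_wh / chat_query_wh    # ~137, i.e. "over 130x"
sessions_per_heavy_day = heavy_day_wh / session_wh  # ~32 sessions' worth
```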
The bigger picture:
- The IEA projects that global data center electricity consumption will roughly double from ~415 TWh in 2024 to over 945 TWh by 2030, driven largely by AI workloads. In the US, data centers are projected to consume more electricity than all energy-intensive manufacturing combined (aluminum, steel, cement, chemicals) by 2030.
- An estimated 60–90% of AI computing energy goes to inference (running models), not training. Training grabs headlines, but inference — every agentic session, every chat query — is where the ongoing energy cost lives.
- Cloud providers are investing in renewable energy, but coverage varies. Anthropic has pledged to offset energy costs and invested in grid optimization research, though the company lacks formal carbon reduction targets and a significant portion of new capacity is natural gas powered.
- On the efficiency side, a University of Rhode Island study found Claude Sonnet to be among the most energy-efficient frontier models, and energy per token has improved ~120× from GPT-3 to current models due to hardware and architecture advances.
What this means for you:
This doesn’t mean you shouldn’t use agentic tools — the productivity gains can be substantial, and the energy per unit of useful output may be better than the alternative (a human developer running builds, searching docs, and context-switching for hours). But it’s a reason to be intentional: don’t let an agent spin in wasteful loops when a well-scoped prompt would get the job done in one pass. Efficient prompting is both cheaper and greener.
Data privacy: who sees your code?
When you use Claude Code, your code and prompts are sent to Anthropic’s servers for inference. That naturally raises privacy questions — here’s what actually happens to that data, depending on how you access Claude.
Is your data used for model training?
| Access method | Used for training? | Default retention |
|---|---|---|
| Claude API, Team, Enterprise (commercial terms) | No — prohibited unless you explicitly opt in (e.g., Development Partner Program) | 30 days |
| Free / Pro / Max (consumer plans) | Your choice — controlled via Privacy Settings | 5 years (training on) / 30 days (training off) |
Anthropic gives you the choice to allow training on your data — check your setting at claude.ai/settings/data-privacy-controls. This applies to Claude Code sessions on consumer plans too.
Important nuances for consumer plans:
- Safety exception: Even if you disable training, conversations flagged for safety review may still be used to improve Anthropic’s ability to detect and enforce their Usage Policy (e.g., training safeguard models).
- What’s included: When training is enabled, Anthropic may use the entire conversation — prompts, outputs, custom styles, and conversation preferences.
- What’s excluded: Raw content from connectors (Google Drive, MCP servers) is not included in training data, unless you directly copy that content into your conversation.
- Feedback (thumbs up/down): Submitting feedback stores the full related conversation for up to 5 years, de-linked from your user ID. This data may be used for training regardless of your training setting.
For researchers with sensitive or restricted data: Routing through a cloud provider (Vertex AI, Bedrock) ensures your data is not used for training and limits retention to 30 days — but your prompts still reach Anthropic’s infrastructure for inference. UW-Madison has agreements with Google, AWS, and Microsoft for their cloud services, but does not yet have a direct data-use agreement with Anthropic. This means cloud routing alone does not provide UW-sanctioned data protections for restricted data (HIPAA/PHI, FERPA, CUI, export-controlled, or data under a DUA that prohibits third-party processing). Avoid using Claude Code with restricted data until a formal UW-Anthropic agreement is in place. For general, non-sensitive research code, cloud-routed Claude Code is fine to use today. UW is actively exploring institutional Anthropic licenses and data agreements. Enterprise customers can negotiate zero-data retention (ZDR) agreements where Anthropic stores nothing after the API response. See our Cloud Setup Guide for how UW-Madison researchers can use institutional cloud accounts (GCP or AWS) and for more details on data sensitivity considerations.
Can Anthropic employees see your code?
Not by default. Employee access to conversation data requires one of:
- You submit feedback (thumbs up/down, `/bug` command) — the full related conversation becomes reviewable, stored for up to 5 years (de-linked from your user ID for thumbs up/down)
- A trust & safety investigation — if Anthropic’s automated systems flag a policy violation (this data may also be used for training safeguard models)
- Explicit consent — you voluntarily share data with Anthropic
Under commercial terms (API, Vertex AI, Bedrock), access is further restricted by contractual obligations.
What about the web version?
When you use Claude Code on the web (claude.ai/code), your GitHub repo is cloned into an ephemeral VM. The VM is destroyed when the task completes — there’s no persistent repo storage between sessions. The same retention policies above apply to any code Claude reads during the session.
Telemetry and error reporting
Claude Code sends operational telemetry (latency, reliability metrics — no code or file paths) to Statsig, and error reports to Sentry. These are enabled by default on the direct Claude API but disabled by default on Vertex AI, Bedrock, and Foundry.
To opt out individually, set `DISABLE_TELEMETRY=1`, `DISABLE_ERROR_REPORTING=1`, `DISABLE_BUG_COMMAND=1`, or `CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY=1`. To disable all non-essential traffic at once, set `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1`. These can be set in your `settings.json`.
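For example, a minimal `settings.json` that turns off all non-essential traffic — the `env` key applies environment variables to every Claude Code session (check the current settings schema; this is a sketch):

```json
{
  "env": {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```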
For a detailed breakdown by provider, see the Cloud Setup Guide — Data Usage & Privacy.
Further reading on data privacy
- Is my data used for model training? — Anthropic Privacy Center
- How long do you store my data? — retention periods by account type
- Data usage — Claude Code docs — what Claude Code specifically transmits and how cloud sessions handle your repo
- Security — Claude Code docs — prompt injection safeguards, data retention, and web session isolation
- How do I change my model improvement privacy settings? — step-by-step opt-out instructions
- How does Anthropic protect personal data? — security practices and encryption
Security fundamentals
When you launch Claude Code from the CLI, it runs with your user’s full filesystem permissions. It can read, modify, or delete files anywhere your account can reach — not just your project directory. A poorly worded prompt, an agentic loop, or a prompt injection attack could cause changes you didn’t intend. Here’s how to limit the blast radius, from most important to least.
Use permissions and deny rules
Claude Code has a built-in permissions system that controls what it can do. In the default mode, it asks for approval before file writes, shell commands, and git operations. You can customize this with rules in settings.json:
- `deny` — hard block. Claude can’t use the tool, period. Deny rules always win, even if you accidentally click “always allow” on a prompt.
- `allow` — auto-approve. Skips the approval prompt for things you trust (e.g., `git add`, `pytest`).
Deny rules are your most important security layer. They protect sensitive paths — SSH keys, cloud credentials, .env files — regardless of what the agent tries to do. The approval prompt is your first line of defense; deny rules are the backup that can’t be bypassed.
```json
{
  "permissions": {
    "deny": [
      "Read(//home/youruser/.ssh/**)",
      "Edit(//home/youruser/.ssh/**)",
      "Read(//home/youruser/.aws/**)",
      "Edit(//home/youruser/.aws/**)",
      "Read(./.env)",
      "Edit(./.env)",
      "Bash(rm -rf *)",
      "Bash(curl:*)",
      "Bash(wget:*)",
      "Bash(cat:*)"
    ]
  }
}
```

See the Cloud Setup Guide’s security section for a full walkthrough with platform-specific examples, or the official permissions docs for the complete rule syntax.
Enable Claude Code’s built-in sandbox
Claude Code’s built-in sandbox uses OS-level isolation (Linux namespaces / macOS Seatbelt) to restrict what shell commands can do — limiting filesystem writes to your project directory and blocking unauthorized network requests. This is separate from running inside a container (covered below). It’s lightweight, adds negligible overhead, and Anthropic’s internal testing found it reduced permission prompts by 84% while increasing security. Use it alongside deny rules for the strongest protection — Anthropic calls this “defense in depth”.
Scope your credentials
Even with deny rules and sandboxing, it’s good practice to limit what credentials the agent has access to in the first place.
Use minimal-scope tokens. Create fine-grained GitHub tokens scoped to only the repos and permissions the agent needs. If it only pushes to one repo, don’t give it access to your entire account. Use a bot account for agent-driven git operations, and generate dedicated deploy keys rather than reusing your personal SSH keys.
Set spending limits on API keys and use separate keys from your personal or production ones.
Add secrets to `.gitignore` — `.env`, `credentials.json`, `*.pem`, `*.key`, `.netrc` — before the agent ever runs. Once a secret is committed, it’s in the history. (But note: `.gitignore` prevents committing secrets, not reading them. Deny rules are what actually block the agent from accessing sensitive files.)
Use containers for CI/CD and headless environments
For local development, deny rules + the built-in sandbox (above) are the right approach. But for CI/CD pipelines, team environments, and headless automation, running Claude Code inside a container is often simpler — the container itself is the isolation boundary, so you can use `--dangerously-skip-permissions` safely, since there’s nothing outside the container to damage.
Options include a plain Docker container with your project mounted as a volume, Docker sandboxes (microVM-based isolation), or cloud sandbox platforms like E2B. For CI pipelines, ephemeral containers that are destroyed after each run are the safest option — nothing persists between runs.
Note: the built-in sandbox and Docker containers are alternative isolation strategies, not layers to stack. Running bubblewrap inside Docker introduces nested sandbox complexity without meaningful security benefit.
Watch for prompt injection and runaway agents
Prompt injection is when an agent reads a file or message that contains hidden instructions designed to hijack its behavior. A malicious `README.md`, issue body, or `.docx` attachment could trick the agent into exfiltrating files or running harmful commands. Be especially cautious when pointing an agent at untrusted repositories or external content. Deny rules and sandboxing are your main defenses here — they limit what the agent can do even if it’s been tricked.
Runaway agents burn tokens and make unwanted changes when they get stuck in loops. Commit your work frequently so you can recover from mistakes, set spending limits on your API keys, and don’t hesitate to interrupt (Ctrl+C) and redirect. Set up git hooks or CI checks as safety nets — for example, preventing force-pushes to main.
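As one example of such a safety net, here is a hypothetical pre-push hook that refuses any direct push to main — a simpler (stricter) policy than blocking only force-pushes, and a local complement to branch-protection rules on your git host:

```python
#!/usr/bin/env python3
# Hypothetical pre-push hook sketch: save as .git/hooks/pre-push, chmod +x.
# git feeds one line per ref on stdin:
#   <local_ref> <local_sha> <remote_ref> <remote_sha>
import sys

def push_allowed(remote_ref: str) -> bool:
    """Block direct pushes to main; allow all other refs."""
    return remote_ref != "refs/heads/main"

def main() -> int:
    for line in sys.stdin:
        fields = line.split()
        if len(fields) == 4 and not push_allowed(fields[2]):
            print("Blocked: direct pushes to main are not allowed.",
                  file=sys.stderr)
            return 1  # any nonzero exit status aborts the push
    return 0

# In the real hook file, end with: sys.exit(main())
```

Hooks like this stop both you and an agent from landing changes on main without review — though note an agent with shell access could delete the hook, so host-side branch protection remains the stronger guarantee.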
Never give an agent unsupervised access to production systems, databases, or deployment pipelines.
Platform and deployment notes
Running Claude Code remotely
You don’t have to run Claude Code on your local machine. Running it over SSH on a cloud VM or remote server keeps your local system untouched and gives you access to more powerful hardware. For CI/CD integration — running Claude Code in GitHub Actions, GitLab CI, or similar systems — see the container discussion in Security fundamentals above, plus the official docs for GitHub Actions and GitLab CI/CD.
A note for GitLab users
Many teams — including many at UW-Madison — use GitLab rather than GitHub. Claude Code works with GitLab, but the integration is less mature than the GitHub experience.
What works well:
- Claude Code CLI with GitLab repos — the core experience (reading code, editing files, running commands) works identically regardless of your git host. Claude Code operates on your local checkout, so the remote platform doesn’t matter for day-to-day coding.
- GitLab CI/CD integration — Anthropic provides official documentation for running Claude Code in GitLab CI/CD pipelines, including merge request review and test scaffolding.
- Git operations — push, pull, branching, and committing all work normally since these are standard git operations.
What’s different or limited compared to GitHub:
- No native GitLab integration in Claude Code’s Slack bot — the Slack integration currently only supports GitHub repos. GitLab support is an open feature request.
- No `@claude` mention in GitLab issues/MRs — GitHub Copilot’s coding agent lets you assign issues to Copilot or mention it in PRs. There’s no equivalent native integration for GitLab yet, though GitLab is working on it.
- Community-built CI/CD tooling — while official docs exist, you may find yourself using community solutions to replicate the smoother GitHub Actions experience.
- Self-hosted GitLab — if your institution runs a self-hosted GitLab instance, be aware that Claude Code sends code context to Anthropic’s API for processing. This may raise compliance concerns depending on your institution’s data policies.
Practical advice: The CLI workflow is essentially identical — focus your setup effort on CI/CD integration. For MR review automation, use Claude Code in your `.gitlab-ci.yml` with the `claude -p` (prompt) flag for non-interactive pipeline usage. If your institution has data sensitivity requirements, check with your IT governance team before sending code to external APIs — this applies to all cloud-based AI coding tools, not just Claude.
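A hypothetical sketch of such a job — the job name, image tag, and prompt are placeholders, and the install command and flags should be verified against the current Claude Code docs:

```yaml
# Hypothetical .gitlab-ci.yml job: run Claude Code non-interactively on MRs.
# ANTHROPIC_API_KEY must be configured as a masked CI/CD variable.
claude-mr-review:
  image: node:20
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - npm install -g @anthropic-ai/claude-code
    # -p runs a single prompt and exits, suitable for pipelines
    - >
      claude -p "Review the changes on this branch relative to
      origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME. Summarize risks
      and suggest fixes."
```

Pair this with an ephemeral runner (destroyed after each job) so nothing persists between pipeline runs.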
Summary
Agentic coding tools are genuinely powerful — they can dramatically accelerate feature development, help you explore unfamiliar codebases, and automate tedious multi-step tasks. But they require a different mindset than traditional code assistants:
- Scope your requests tightly — features, not projects
- Use `CLAUDE.md` to encode guardrails and project context
- Tune permissions deliberately — start conservative, loosen as you build trust
- Set deny rules and enable the sandbox — your two strongest security layers
- Scope your credentials — fine-grained tokens, dedicated keys, `.gitignore` for secrets
- Monitor costs — set limits, be specific, use the right model for the task
- Commit frequently — keep escape hatches available
- Review everything — you’re the engineer; the agent is a very fast intern
The technology is moving fast, and best practices will continue to evolve. The core principle stays the same: give agents the minimum access they need, provide maximum clarity in your instructions, and always keep a human in the loop for decisions that matter.
Further reading and perspectives
Agentic coding is evolving fast. Here are some of the best resources for staying current:
Official documentation and guides
- Best Practices for Claude Code — Anthropic’s official guide, covering context management, prompt patterns, CLAUDE.md, and scaling across parallel sessions
- How Claude Code Works — the agentic loop architecture, built-in tools, and how Claude interacts with your project
- Extend Claude Code — when to use CLAUDE.md vs skills vs subagents vs hooks vs MCP
- Common Workflows — step-by-step guides for debugging, refactoring, testing, creating PRs, and more
- Claude Code on the Web — running Claude Code tasks asynchronously on cloud infrastructure
- Claude Code Desktop — the desktop GUI with visual diffs, parallel sessions, and managed updates
- Claude Code Sandboxing Documentation — reference for configuring Claude Code’s built-in sandboxing, including OS-level primitives (Linux bubblewrap, macOS Seatbelt) and deny rules for sensitive files
- Making Claude Code More Secure and Autonomous — Anthropic Engineering’s deep-dive into their dual-layer sandboxing architecture (filesystem + network isolation)
- Mitigating the Risk of Prompt Injections — Anthropic Research on defending AI agents against prompt injection, including their use of reinforcement learning to build injection robustness into Claude
- GitHub Copilot: Meet the New Coding Agent — GitHub’s announcement of their enterprise-ready coding agent that spins up secure environments via GitHub Actions
- GitHub Copilot Coding Agent 101 — GitHub’s getting-started guide for agentic workflows, including environment setup and PR creation
- What’s New with GitHub Copilot Coding Agent — latest updates including self-review, security scanning, and custom agents
Community voices and analysis
- Agentic Engineering Patterns — Simon Willison’s guide to coding practices for getting the best results from agents like Claude Code and Codex. He frames this as “expertise amplification, not expertise replacement”
- A Guide to Which AI to Use in the Agentic Era — Ethan Mollick’s updated guide arguing that “using AI” now means agents with tools, not chatbots, and that users must think in terms of Models, Apps, and Harnesses
- How I Use Claude Code (+ My Best Tips) — practical walkthrough from Builder.io on real-world Claude Code workflows
- The Complete Guide to Agentic Coding in 2026 — broad overview comparing tools, workflows, and team strategies
- Redefining the Software Engineering Profession for AI — ACM opinion piece on how AI amplifies senior talent but risks leaving junior developers without the chance to develop architectural intuition
Tool comparisons
- Cursor vs Windsurf vs Claude Code in 2026 — hands-on comparison arguing Cursor has the best IDE UX, Claude Code leads on deep reasoning and terminal-first workflows, and Windsurf offers the best value
Benchmarks and leaderboards
Agentic coding benchmarks are evolving rapidly. These track how well different models and agent scaffolds perform on real-world software engineering tasks:
- SWE-bench Leaderboards — the most widely cited benchmark for agentic coding. Models are evaluated on their ability to resolve real GitHub issues from open-source Python repos. The “Verified” split is the standard comparison point, though contamination concerns have motivated harder variants
- SWE-bench Pro — Scale AI’s harder benchmark (1,865 tasks across 41 repos). Top models that score 70%+ on SWE-bench Verified score only ~23% here
- SWE-Lancer — OpenAI’s benchmark based on 1,400+ real Upwork freelance tasks valued at $1M in payouts, ranging from $50 bug fixes to $32K feature implementations. Provides a natural difficulty gradient tied to real-world economics
- Terminal-Bench — evaluates agents on multi-step terminal workflows (not just code generation). Tests planning, execution, and recovery in sandboxed command-line environments
- Coding Agents Comparison — Artificial Analysis’s ongoing comparison with pricing breakdowns alongside benchmark scores
- Quantifying Infrastructure Noise in Agentic Coding Evals — Anthropic’s analysis showing that a 2-point leaderboard lead may reflect hardware differences rather than genuine capability gaps — important context for interpreting any benchmark
Caveat: Benchmarks measure specific capabilities under controlled conditions. Real-world performance depends heavily on your prompt quality, project structure, and CLAUDE.md configuration. Use benchmarks to track the field’s trajectory, not to pick a tool.
Security
- AI Coding Tools Exploded in 2025. The First Security Exploits Show What Could Go Wrong — Fortune’s reporting on the “IDEsaster” vulnerabilities found across Cursor, Copilot, Windsurf, and other tools
- Researcher Uncovers 30+ Flaws in AI Coding Tools — technical breakdown of the universal attack chains affecting major AI IDEs
- Security Flaws in Claude Code Risk Stolen Data, System Takeover — Check Point’s findings on Claude Code-specific CVEs, including hook injection and API key theft
- AI Agent Security Risks in 2026: A Practitioner’s Guide — practical guide to defending against prompt injection, credential theft, and MCP vulnerabilities
This is an area where best practices are being written in real time. What works today may be outdated in six months. Stay plugged into the communities above, and don’t assume any single tool or configuration is permanently “safe.”
Comments