Beyond prompt engineering: shell scripts, skills, and memory that make AI agents useful
Move past chatbot prompts. Shell scripts, reusable skills, and persistent memory create the infrastructure that makes AI agents faster, cheaper, and smarter across every project.
Most prompt engineering advice stops at the chatbot: write better instructions, add role and context, specify your output format. That advice is correct and incomplete. The real multiplier is not a better prompt—it is the infrastructure you build around the AI agent before you ever type a word. Shell scripts that automate commits, linting, deploys, and project scaffolding. Reusable skill files that give the agent domain knowledge without re-explaining your conventions every session. Persistent memory that carries decisions, patterns, and project context across conversations so the agent compounds what it knows instead of starting from zero. The payoff is concrete: faster task execution, fewer tokens burned on repeated context, and a tighter loop from research to shipped code. This post covers the infrastructure layer that separates people who use AI from people who build systems around it.
Why prompt templates are not enough
Structured prompts work. The five-part pattern—role, context, task, format, tone—consistently produces better output than a vague question typed into a chat window. If you haven’t internalized that discipline yet, AI-assisted design workflows: what actually works covers the fundamentals and which tools are worth your time in 2026.
But even a perfect prompt is a one-shot artifact. You write it, run it, get output, and next session you start over. The context you painstakingly described? Gone. The project conventions you specified? Re-explained from scratch. The commit-lint-test-push sequence you ran manually last time? Manual again. Prompt templates help with the last mile of a single interaction. They do nothing for the infrastructure that makes every interaction faster.
The gap between a good prompt writer and an AI-powered practitioner is not prompt quality—it is everything that surrounds the prompt. The scripts that automate repetitive shell operations. The skill files that preload domain knowledge. The persistent context that means today’s session starts where yesterday’s left off. That infrastructure is what compounds.
Shell scripts that make AI agents repeatable
The tasks you repeat across every project—committing code, running lint checks, scaffolding folder structures, setting up environments, validating before deploy—are the lowest-hanging automation targets. A shell script that runs in one command replaces a sequence you’d otherwise re-explain to the agent or execute manually every time.
Commit workflows. A single script that runs the linter, executes the tests, stages changes, formats the commit message to your convention, and pushes. Instead of five separate commands or asking the agent to remember your commit message format, you run ./scripts/commit.sh and the entire sequence fires in order. The agent can call it too—one tool invocation instead of a multi-step conversation about your git workflow.
```bash
#!/bin/bash
# commit.sh — lint, test, commit, push in one pass
set -e  # abort on the first failing step

if [ -z "$1" ]; then
  echo "Usage: ./scripts/commit.sh \"commit message\"" >&2
  exit 1
fi

npm run lint:fix
npm run test -- --bail
git add -A
git commit -m "$(cat <<EOF
$1

$(git diff --cached --stat)
EOF
)"
git push origin HEAD
```
Environment setup. A script that creates the standard project folder structure—deploy configs, security review checklists, error handling patterns, testing configurations, rule files for data handling—installs dependencies, and initializes git. When you start a new project, this runs once and every convention is in place. When an AI agent works in that project, the structure is already there to guide its output.
```bash
#!/bin/bash
# scaffold.sh — standardized project structure
set -e  # stop if any step fails

mkdir -p config/{deploy,security,rules}
mkdir -p src/{components,utils,hooks,styles}
mkdir -p tests/{unit,integration,e2e}
mkdir -p docs/{decisions,reviews}
cp templates/.eslintrc.json .
cp templates/.prettierrc .
cp templates/AGENTS.md .
npm install
git init && git add -A && git commit -m "scaffold: initial project structure"
```
Pre-deploy validation. A script that runs lint checks, type checking, security scans, bundle analysis, and any project-specific gates before you deploy. No more forgetting a step. No more asking the agent “did you run the linter?” when you can make linting a precondition of the deploy script itself.
```bash
#!/bin/bash
# pre-deploy.sh — gate checks before shipping
set -e
echo "Running type check..." && npx tsc --noEmit
echo "Running linter..." && npm run lint
echo "Running tests..." && npm run test -- --bail
echo "Running security scan..." && npm audit --audit-level=high
echo "Building..." && npm run build
echo "All checks passed. Ready to deploy."
```
These scripts are small—ten to thirty lines each. Their value is not complexity but consistency. When every project starts from the same automated baseline, onboarding a new contributor—human or AI—takes minutes instead of hours. As I covered in my guide to GitHub Actions CI/CD for designers, the same principle applies at the pipeline level: automate the repeatable, reserve human attention for the judgment calls.
How do reusable skills reduce token cost and improve output quality?
A skill, in the context of AI-assisted development, is a structured document—typically markdown—that gives an agent domain-specific knowledge, coding conventions, and workflow instructions it can load on demand. Instead of re-explaining your design system’s component API shape, your deployment pipeline’s CI checks, or your content authoring rules in every conversation, you write it once as a skill file and the agent loads it when relevant.
The economics are straightforward. A skill file might be 500–1,500 tokens to load. Without it, you spend 500–2,000 tokens re-explaining the same context in every session, and the agent still gets details wrong because your ad-hoc explanation varies each time. Over ten sessions, a skill file saves thousands of tokens and produces more consistent output because the instructions are identical every time.
Concrete examples of skills I maintain:
- Design system conventions. Component API shapes, naming patterns, token usage rules, accessibility requirements. The agent generates components that match the system without me specifying “use Tailwind v4 theme tokens, not arbitrary values” in every prompt.
- Deployment pipeline. Which CI checks run, how to structure PR descriptions, what the branch naming convention is, which tests are required before merge. The agent writes PRs that pass review on the first try.
- Content authoring. Frontmatter requirements, internal linking rules, tag canonicalization, SEO patterns. The agent writes blog post drafts that conform to the site’s content architecture without a style guide pasted into every conversation.
- Security and error handling. Input validation patterns, error response formats, logging conventions, authentication flow requirements. The agent writes defensive code by default.
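As a sketch, the first of those skills might look like the file below. The specific section names and rules are illustrative, not a definitive template—the point is that the conventions are written once, in one place, in a form the agent can load:

```markdown
# Skill: design system conventions

## Component APIs
- Props are typed with TypeScript interfaces, not prop-types.
- Boolean props default to `false` and are named `is*` or `has*`.

## Styling
- Use Tailwind v4 theme tokens, never arbitrary values.

## Accessibility
- Interactive components require visible focus states and full keyboard support.
```

Keep each skill narrow—one file per domain—so the agent loads only what the current task needs.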
The skill ecosystem extends beyond your own files. Open-source repositories of agent skills—for accessibility auditing, framework-specific authoring patterns, performance optimization checklists—are starting to appear. You can grab a well-maintained WCAG audit skill, drop it into your project, and immediately give the agent the ability to run accessibility reviews against a standard you didn’t have to write yourself. The same pattern applies to framework skills (Astro, Next.js, SvelteKit), testing patterns, and infrastructure conventions.
In practice, skills map to features in the tools you’re already using. Cursor has project-level .cursor/skills/ and .cursor/rules/. Claude Code reads AGENTS.md and CLAUDE.md at the project root. Codex uses .codex/skills/. The naming varies; the pattern is identical: structured context that loads automatically so you spend tokens on the work, not on re-establishing what the agent should already know.
Persistent memory across sessions—the context that compounds
Every new AI session starts from zero. The agent does not know what you decided yesterday, which architectural approach you chose and why, what you tried and rejected, or which patterns you established three sessions ago. Without persistent memory, each session is isolated—you pay the context cost again, and decisions that should build on each other don’t.
Persistent memory solves this at three levels.
Session summaries and decision logs. At the end of a working session, capture the key decisions, trade-offs considered, and patterns established in a structured markdown file. Next session, the agent loads that file and starts with full context. This is not complicated—it’s a decisions/ folder with timestamped entries that the agent reads before starting work. The discipline of writing them pays for itself within two or three sessions.
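A helper script keeps the discipline cheap. This is a hypothetical sketch (the `log_decision` function and file layout are assumptions, not a standard tool) that appends a timestamped entry to the decisions/ folder described above:

```shell
#!/bin/bash
# log-decision.sh — append a timestamped entry to the decision log (illustrative sketch)
set -e

log_decision() {
  mkdir -p docs/decisions
  # One markdown file per day; entries accumulate within it
  local file="docs/decisions/$(date +%Y-%m-%d).md"
  printf '## %s — decision\n\n%s\n\n' "$(date +%H:%M)" "$1" >> "$file"
  echo "Logged to $file"
}

# Example usage at the end of a session
log_decision "Use CSS custom properties from the token system, not Sass variables"
```

Because the entries are plain markdown in a predictable location, the agent can read the last few files at session start without any special tooling.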
Project-level rule files. AGENTS.md, .cursor/rules/, project-specific convention files—these encode decisions that apply to the entire project. “Always use CSS custom properties from the design token system, never arbitrary hex values.” “Commit messages follow conventional commits format.” “All API responses use the standard error envelope.” These are not prompts—they are persistent instructions the agent loads automatically on every session. They function as institutional memory that no team member (human or AI) has to memorize.
Structured context documents. A single file—call it CONTEXT.md or product-context.md—that describes the product, its users, technical constraints, and current priorities. Every agent session that loads this file starts with the same foundational understanding. When priorities shift, you update one file and every future session reflects the change. This is the AI equivalent of onboarding documentation, except the agent actually reads it every time.
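A minimal CONTEXT.md skeleton might look like the following—the headings are a suggested shape, and the parenthetical placeholders are where your project's actual details go:

```markdown
# Product context

**Product:** (one-sentence description of what you're building)
**Users:** (who it serves and what they need)
**Constraints:** (stack, performance budgets, compliance requirements)
**Current priorities:** (what this sprint or cycle is focused on)

## Standing instructions
- Read docs/decisions/ before proposing architectural changes.
- Project conventions live in AGENTS.md; follow them without being asked.
```

When priorities shift, you edit this one file and every subsequent session inherits the change.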
The compounding effect is the point. Session one, you establish conventions and the agent follows them inconsistently. Session five, the conventions are encoded in rules files, the last three sessions’ decisions are logged, and the agent produces output that reflects accumulated project context. Session twenty, the agent knows your project better than a new team member would after a week—because it has read every decision, every convention, and every pattern you’ve established. The infrastructure carries the knowledge forward.
The tighter loop—research, spec, plan, test, build, iterate
Shell scripts, skills, and persistent memory are not isolated improvements. Together, they compress the development loop. Here’s what the full cycle looks like on a concrete task—say, building a new feature component for a design system.
Research. Load the relevant skill (design system conventions, component patterns), feed context from persistent memory (what components exist, what the last sprint decided about API consistency), and ask the agent to synthesize requirements. The agent already knows your token system, your accessibility standards, and your naming conventions. You’re not spending the first ten minutes of the conversation on setup—you’re starting at the problem.
Spec. Generate a structured component spec using the project conventions the agent already has loaded. Props table, states, edge cases, accessibility requirements, responsive behavior. The spec follows your format because the skill file defines the format. No back-and-forth about “actually, we use TypeScript interfaces, not prop-types.”
Plan. Break the work into tasks. The agent references your project scaffolding to understand where the component lives, which test directories to target, and what the PR template looks like. The plan is grounded in your actual project structure because the scaffold script already created it.
Test. Run automated checks via shell scripts before writing implementation code. Type checking, linting, existing test suites—all fire with one command. If the tests surface a constraint you missed in the spec, update the spec before writing code. The shell script catches the gap; you don’t discover it during code review.
Build. Implement with the agent having full project context loaded. The agent writes code that uses your design tokens, follows your component API patterns, and handles errors according to your conventions—because all of that is loaded from skills and rules, not improvised from a prompt.
Iterate. Save decisions to persistent memory. If the implementation revealed a new pattern worth codifying—a responsive breakpoint strategy, an animation convention, a data-fetching pattern—add it to the relevant skill file. Update the decision log. Next session starts further ahead than this one did.
The key metric is not “how fast did the agent write code.” It is how much of the session was spent on the decisions that require human judgment—information architecture, interaction design, product trade-offs—versus how much was spent re-establishing context the agent should already have. The infrastructure handles the second category so you can focus on the first. As I described in how I embed in engineering teams, the principle is the same whether the collaborator is a junior engineer or an AI agent: reduce the overhead of getting someone up to speed so they can contribute meaningful work faster.
Key Takeaways
- The real AI leverage is infrastructure, not individual prompts—shell scripts, skills, and persistent memory compound across every session and project
- Shell scripts make repetitive workflows (commit, lint, deploy, scaffold) into one-command operations that both humans and AI agents can run consistently
- Reusable skill files give AI agents domain knowledge—design system conventions, deployment rules, content patterns—without re-explaining context every session
- Persistent memory across sessions means the agent compounds project knowledge over time instead of starting from zero
- The tighter the research-spec-plan-test-build-iterate loop, the more time you spend on design judgment and the less on setup and context re-establishment