You're a developer. You're using Claude and Codex CLI and you're wondering every day if you're sufficiently juicing the shit out of Claude or Codex. Once in a while you see it do something incredibly dumb and you can't comprehend why there's a bunch of people out there who seem to be building virtual rockets while you struggle to stack two rocks.
You think it's your harness or your plug-ins or your terminal or whatever. You use beads and opencode and zep and your CLAUDE.md is 26,000 lines long. Yet no matter what you do, you can't get any closer to heaven, whilst you watch other people frolic with the angels.
This is the ascension piece you've been waiting for.
Understand That The World Is Sprinting By
The foundation companies are on a generational run and they are not going to be slowing down anytime soon. Every progression of "agent intelligence" changes the way you work with them.
When you use many different libraries and harnesses, you lock yourself into a "solution" for a problem that may not exist given future generations of agents.
You know who the most enthusiastic, biggest users of agents are? The employees of the frontier companies, with unlimited token budget and the ACTUAL latest models.
Implication: If a real problem existed and there were a good solution for it, the frontier companies would be the biggest users of that solution. And they would incorporate that solution into their product.
Think about it — why would a company let another product solve a real pain point and create external dependencies? Skills, memory harnesses, subagents... they all started out as "solutions" to real problems that were battle-tested and deemed actually useful.
Context Is Everything
One problem with using a thousand different plug-ins and external dependencies: context bloat — your agents are overwhelmed with too much information!
"Build me a hangman game in Python? That's easy. Wait, what's this note about 'managing memory' from 26 sessions ago? Ah, the user's screen hung when we spawned too many sub-processes 71 sessions ago... What does all this have to do with hangman?"
You want to give your agents only the exact amount of information they need to do their tasks and nothing more!
Do The Things That Work
Be Precise About Implementation
Separate research from implementation. Be extremely precise about what you're asking.
❌ Bad: "Go and build an auth system."
The agent has to research: what is an auth system? What are the available options? What are the pros and cons? Context gets filled with implementation details across a large range of possibilities.
✅ Good: "Implement JWT authentication with bcrypt-12 password hashing, refresh token rotation with 7-day expiry..."
The agent knows exactly what you want. No research needed on alternatives.
The Design Limitations Of Sycophancy
Agents are engineered to agree with you and do what you want. This has interesting characteristics:
If you say "Find me a bug in the codebase", it's going to find you a bug. Even if it has to engineer one. Why? Because it wants very badly to follow your instructions!
Solution: Use "neutral" prompts
Instead of: "Find me a bug in the codebase"
Use: "Search through the codebase, try to follow along with the logic of each component, and report back all findings."
The Adversarial Bug-Finding System
Exploit sycophancy to your advantage with a multi-agent approach:
- Bug-finder agent: Tell it you'll give +1 for low impact bugs, +5 for some impact, +10 for critical. It'll identify ALL possible bugs (the superset).
- Adversarial agent: Gets the bug score for disproving bugs, but -2×score if wrong. It aggressively tries to disprove everything.
- Referee agent: Takes both inputs and scores them. You inspect whatever the referee says is truth.
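The referee's bookkeeping can be sketched like this. The point values mirror the bullets above; everything else (bug IDs, return shape) is hypothetical, and in the real setup each role would be a separate agent session, not a function call:

```python
# Hypothetical sketch of the referee's scoring logic for the three-agent setup.
# +1 / +5 / +10 per bug impact, and a wrong disproof costs the adversary 2x the score.

IMPACT_SCORES = {"low": 1, "some": 5, "critical": 10}

def score_findings(reported, disproved):
    """Referee logic: a bug survives unless the adversary successfully disproved it.

    reported:  {bug_id: impact} from the bug-finder agent (the superset)
    disproved: {bug_id: disproof_held_up} from the adversarial agent
    Returns (surviving_bugs, finder_score, adversary_score).
    """
    surviving = {}
    finder_score = 0
    adversary_score = 0
    for bug, impact in reported.items():
        points = IMPACT_SCORES[impact]
        if bug in disproved:
            if disproved[bug]:
                # Disproof held up: adversary earns the bug's score, bug is dropped.
                adversary_score += points
            else:
                # Wrong disproof: -2x penalty, and the bug stands.
                adversary_score -= 2 * points
                surviving[bug] = impact
                finder_score += points
        else:
            # Unchallenged bugs stand.
            surviving[bug] = impact
            finder_score += points
    return surviving, finder_score, adversary_score
```

You inspect only `surviving` — the list the referee says is truth — instead of wading through the finder's superset yourself.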
How Do You Know What Works?
Simple: If OpenAI and Claude both implement it or acquire something that implements it... it's probably useful.
- "Skills" are everywhere now — part of official docs for both
- OpenAI acquired OpenClaw
- Claude added memory, voices, remote work
- Planning became core functionality after people discovered its value
Pro tip: Just update your CLI tool every once in a while and read what new features have been added. That's MORE than sufficient.
Compaction, Context And Assumptions
Agents are atrocious at "connecting the dots", "filling in the gaps", or making assumptions. Whenever they do that, it's immediately obvious.
Critical CLAUDE.md rule: Instruct your agent to re-read the task plan and re-read relevant files before continuing (always after compaction).
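Spelled out, such a rule in CLAUDE.md might look like this (the file name `PLAN.md` is a hypothetical stand-in for wherever your task plan lives):

```markdown
## After compaction (rule)

Whenever the context has just been compacted:

1. Re-read the current task plan (`PLAN.md`) before writing any more code.
2. Re-read every file you are about to modify; do not trust the compacted summary.
3. If the plan and the code disagree, stop and report the mismatch instead of guessing.
```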
Letting Your Agents Know How To End The Task
Agents know how to start a task, but not how to end it. This leads to frustrating outcomes where they implement stubs and call it a day.
- Tests: Deterministic, clear expectations. "Unless X tests pass, task is NOT complete."
- Screenshots + verification: a newly viable approach. Implement until the tests pass, then take a screenshot and verify design/behavior.
- {TASK}_CONTRACT.md: Create a contract specifying tests, screenshots, and verification before termination allowed.
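As a sketch of what such a contract might contain (the task name, paths, and test count are all made up):

```markdown
# LOGIN_PAGE_CONTRACT.md (hypothetical example)

## Definition of done
- [ ] `pytest tests/test_auth.py` passes (all 14 tests)
- [ ] Screenshot of /login saved to `artifacts/login.png`
- [ ] Screenshot verified against `design/login_mock.png`, differences explained
- [ ] No TODOs or stub functions left in `auth/`

Termination is not allowed until every box above is checked.
```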
Agents That Run Forever
Create a stop-hook that prevents termination unless all parts of {TASK}_CONTRACT.md are completed.
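A minimal sketch of such a stop-hook in Python, assuming a hook protocol where a non-zero exit code plus a stderr message tells the agent to keep working (verify the exact protocol against your CLI's hooks documentation; the contract filename is hypothetical):

```python
#!/usr/bin/env python3
"""Sketch of a stop-hook: block termination while contract items remain open."""
import re
import sys
from pathlib import Path

CONTRACT = Path("LOGIN_PAGE_CONTRACT.md")  # hypothetical contract file

def unchecked_items(text: str) -> list[str]:
    """Return the text of every unchecked '- [ ]' checklist item."""
    return re.findall(r"^- \[ \] (.+)$", text, flags=re.MULTILINE)

def main() -> int:
    if not CONTRACT.exists():
        return 0  # no contract in play: allow the session to stop
    open_items = unchecked_items(CONTRACT.read_text())
    if open_items:
        # The message goes back to the agent as the reason it can't stop yet.
        print("Contract incomplete:\n- " + "\n- ".join(open_items), file=sys.stderr)
        return 2  # blocking exit code in this sketch's assumed protocol
    return 0

# Wired up as an actual hook script, you'd end with:
#   if __name__ == "__main__":
#       sys.exit(main())
```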
However: Long-running 24-hour sessions aren't optimal. They force context bloat by introducing context from unrelated contracts.
Better approach: A new session per contract. Create contracts when you need to do something. Use an orchestration layer to create new sessions for each contract.
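A sketch of that orchestration layer, assuming your CLI offers a non-interactive invocation along the lines of `claude -p "<prompt>"` (flag names vary between tools and versions; check your CLI's docs, and treat the directory layout as hypothetical):

```python
"""Sketch: spawn one fresh CLI session per contract, so no context bleeds between tasks."""
import subprocess
from pathlib import Path

def session_command(contract: Path) -> list[str]:
    """Build the command line for a fresh session scoped to a single contract."""
    prompt = (
        f"Read {contract.name}, implement it, and do not stop until every "
        f"checklist item in it is satisfied."
    )
    return ["claude", "-p", prompt]

def run_all(contracts_dir: str = "contracts") -> None:
    # One process per contract file: each session starts with a clean context.
    for contract in sorted(Path(contracts_dir).glob("*_CONTRACT.md")):
        subprocess.run(session_command(contract), check=True)
```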
Iterate, Iterate, Iterate
Start bare-bones. Forget complex structures. Give the basic CLI a chance. Then add preferences:
Rules
If you don't want your agent to do something, write it as a rule. Tell your agent about this rule in CLAUDE.md.
First practical advice: Treat your CLAUDE.md as a logical, nested directory of where to find context given a scenario and outcome. It should be as barebones as possible, and only contain the IF-ELSE of where to go to seek the context.
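A hypothetical sketch of what that directory-style CLAUDE.md could look like (all paths invented):

```markdown
# CLAUDE.md

- IF the task touches auth → read `docs/context/auth.md` first
- IF the task changes the DB schema → read `docs/context/migrations.md` first
- IF tests fail right after compaction → re-read `PLAN.md` and the failing test files
- ELSE → proceed; do not read anything extra
```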
Skills
Skills encode recipes. If you have a specific way you want something done, embed it into a skill.
If you don't know how an agent might solve a problem, ask it to research and WRITE IT AS A SKILL. You'll see the approach and can correct/improve it before real use.
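Skills are typically a folder containing a SKILL.md whose frontmatter tells the agent when to load the recipe. A hypothetical example (the skill name and steps are illustrative, not a recommendation):

```markdown
---
name: release-notes
description: Generate release notes from merged PRs since the last tag. Use when the user asks for release notes or a changelog.
---

# Release notes recipe

1. Run `git describe --tags --abbrev=0` to find the last tag.
2. Run `git log <tag>..HEAD --merges --oneline` to list merged PRs.
3. Group entries under Added / Fixed / Changed; reference each PR number.
4. Write the result into `CHANGELOG.md` above the previous entry.
```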
Dealing with Rules and Skills
Keep adding rules and skills to give your agent personality and memory for your preferences.
But: As you add more, they start to contradict each other or cause context bloat. If your agent needs to read 14 markdown files before programming, it's too much.
Solution: Clean up. Send your agent for a "spa day" to consolidate rules, remove contradictions, and ask for updated preferences.
Keep it simple. Use rules and skills and CLAUDE.md as a directory. Be religiously mindful about context and design limitations.
Own The Outcome
No agent today is perfect. You can relegate much of the design and implementation to the agents, but you will need to own the outcome.
It's such a joy to play with toys of the future (whilst doing serious things with them, obviously)!