“Has anyone figured out a rock-solid formula to run Codex and Claude Code in a sandbox? No firewall, proxy, or temp credentials; bash and generated code can't access the secret; streaming I/O.” — asked publicly, near verbatim, by the founder of an eval platform
but...
The constraints preserve the Unix ambient-authority model while demanding capability-security guarantees — that's why they fight each other. You're subtracting power from an overpowered process and asking for the subtraction to be rock-solid. It never will be.
what if...
Change the agent's calling convention instead of its cage. Agents emit typed effects — model.complete, shell.run, fs.read, net.fetch — and handlers own the authority. The model key becomes an implementation detail of one handler; streaming comes free because everything was already an event stream; and the typed effect trace is a better eval artifact than any transcript.
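A minimal sketch of that calling convention, added here as an illustration and not taken from any existing agent runtime: the agent is a generator that yields typed effects, and an interpreter dispatches each one to a handler that owns the authority. The class names, the canned model reply, the stand-in shell and the sample key are all constructed for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelComplete:            # model.complete
    prompt: str
@dataclass(frozen=True)
class ShellRun:                 # shell.run
    argv: tuple
@dataclass(frozen=True)
class NetFetch:                 # net.fetch
    url: str
class Denied(Exception): pass   # thrown into the agent when no handler grants an effect

def make_model_handler(api_key):
    def handle(effect):
        # Only this closure holds the key; a real handler would send it to the
        # provider. The canned reply keeps the example offline.
        _auth = {"Authorization": "Bearer " + api_key}
        return "echo hello"
    return handle

def run(agent, handlers):       # returns (result, trace)
    trace, gen = [], agent()
    try:
        effect = next(gen)
        while True:
            handler = handlers.get(type(effect))
            if handler is None:                       # default deny
                trace.append((effect, "denied"))
                effect = gen.throw(Denied(type(effect).__name__))
            else:
                result = handler(effect)
                trace.append((effect, result))
                effect = gen.send(result)
    except StopIteration as done:
        return done.value, trace

def agent():                    # constructed agent; holds no credentials
    cmd = yield ModelComplete("say hello")
    out = yield ShellRun(tuple(cmd.split()))
    try:
        yield NetFetch("https://example.invalid/")
    except Denied:
        out += " (net.fetch denied)"
    return out

def demo():
    secret = "sk-constructed-not-a-real-key"
    handlers = {ModelComplete: make_model_handler(secret),
                ShellRun: lambda e: " ".join(e.argv[1:])}   # stand-in shell
    return run(agent, handlers) + (secret,)
```

The trace returned by `run` is the eval artifact: every effect the agent requested, what it got back, and which requests were denied, with the key appearing nowhere in it.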
“Why didn't my webhook fire? How do we make delivery reliable — retries, ordering, idempotency keys, dead-letter queues?”
but...
Every feature on that list is the price of making delivery the correctness mechanism. You're hardening a channel that should be allowed to fail — and after all of it, the provider's docs will still say "don't rely on webhooks."
what if...
Demote the webhook to a hint. A signed, content-free poke plus a ranged pull from a cursor you own turns every delivery failure into latency instead of corruption. Truth lives in a pullable ledger; sanity is a cursor you own.
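The poke-then-pull pattern, as an added example that is not any provider's actual API: the poke carries only a sequence number and an HMAC, and the consumer treats it as a cue to pull everything after its own cursor. The shared secret, event names and in-memory ledger are constructed; a real ledger would sit behind an authenticated HTTP endpoint.

```python
import hashlib
import hmac

SECRET = b"constructed-shared-secret"        # sample value for the example

class Ledger:
    """Provider side: append-only log, pullable by range."""
    def __init__(self):
        self.events = []
    def append(self, payload):
        self.events.append(payload)
        return len(self.events)              # sequence number of the new event
    def pull(self, after, limit=100):
        return list(enumerate(self.events[after:after + limit], start=after + 1))

def make_poke(seq):
    """Content-free hint: says only that the ledger has reached seq."""
    body = str(seq).encode()
    return body, hmac.new(SECRET, body, hashlib.sha256).hexdigest()

class Consumer:
    def __init__(self, ledger):
        self.ledger, self.cursor, self.applied = ledger, 0, []
    def on_poke(self, body, sig):
        expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, sig):
            return "rejected"                # forged poke never touches state
        return self.sync()
    def sync(self):
        """Also safe to call on a timer, so a lost poke only adds latency."""
        while batch := self.ledger.pull(self.cursor):
            for seq, payload in batch:
                self.applied.append(payload)
                self.cursor = seq            # cursor advances only after apply
        return self.cursor

def demo():
    ledger = Ledger()
    consumer = Consumer(ledger)
    for item in ("order.created", "order.paid"):
        make_poke(ledger.append(item))       # both pokes lost in transit
    poke3 = make_poke(ledger.append("order.shipped"))
    forged = consumer.on_poke(b"999", "0" * 64)
    first = consumer.on_poke(*poke3)
    replay = consumer.on_poke(*poke3)        # duplicate delivery is a no-op
    return forged, first, replay, consumer.applied
```

Because the consumer never trusts the poke's content, a forged or replayed poke can at worst trigger an extra pull; ordering and idempotency come from the cursor, not from delivery.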
“How do I add a sign-up form to my static site? Do I really need a whole backend now?”
but...
Today, yes — and that's the scandal. The most basic feature on the web since 1995 — a sign-up form, a contact form, a file upload — summons a database, spam filtering, email deliverability, auth, a storage bucket, a privacy policy, and a bill. The form takes an hour; the apparatus takes the rest of your life. And every "easy" fix — a forms SaaS, a serverless function, a full-stack framework — is a backend rented in a different shape.
what if...
The form was never the page's job. Capability composes onto the static page as layers — independent apps with their own identity, permissions, and lifecycle — and the data lands in a personal data server that takes all-you-need-is-log seriously: submissions as appended events, uploads as content-addressed blobs. The site stays files on a CDN; nobody's nightmare begins.
“Which terminal should I switch to — Warp? Ghostty? Should I finally learn tmux?”
but...
You're choosing a prettier window onto the same amnesia. Warp modernized the chrome, Fig proved the appetite and got sunset, nushell structured the pipes inside a forgetting shell — each fixed a symptom and the defaults survived them all.
what if...
Ask why the operating surface discards everything it just computed — commands, timings, outputs, outcomes. Demand a ledger, not a theme. Then make it prove the waste: there's a prompt on the desh site your agent can run against your own history.
“Should we organize the repo by feature or by layer?”
but...
If something can be classified in more than one hierarchy, you're not in a tree anymore — and code always can. The debate is unwinnable because the data model is wrong, not because your team lacks discipline.
what if...
Code is a graph; the folder tree is one projection of it, kept for the toolchains. Argue about the things that deserve argument — where the command, event, and scenario boundaries sit — and let directories be a rendering decision.
“How am I supposed to review 3,000 lines of AI-generated code?”
but...
You're reviewing bytes moved, not meaning changed — at exactly the moment volume made that impossible. Line diffs measure churn; an import shuffle and a behavior change look the same in green and red.
what if...
Review intent and semantic change sets: which boundaries moved, which claims (tests, scenarios) changed, which decisions were taken. When agents write most of the code, intent is the human contribution — review the thing only the human could have supplied.
“Do we need an llms.txt? How do we rank in AI answers?”
but...
That's SEO brain applied to a governance problem: write one magic file and hope. Your machine-readable truth already leaks across llms.txt, .well-known, OpenAPI, MCP, and schema.org — published by different teams, drifting independently, read by agents that don't ask twice.
what if...
Treat the machine-readable surface as a governed product with an owner. Observe what machines actually request (the 404s are demand data), publish what they should find, and monitor the drift between the two.
“How do we let AI agents use our product — build an MCP server? Ship our own agent? Hand out API keys?”
but...
Every default answer hands somebody's agent ambient authority — your keys living in their runtime, or their agent living in your account. The user is reduced to a credential, and revocation becomes an incident response.
what if...
Grant a scoped integration agent exactly what it needs through consent rails that already exist — OAuth, CloudFormation quick-create, GitHub App installs — verify it in shadow mode before cutover, and make revocation boring. The surface proposes; the user's runtime disposes.
“What was that command I ran last month that fixed this exact thing?”
but...
Ctrl-R roulette is archaeology without a site map: the strings survived, but the timings, outputs, and outcomes — the parts you actually need — were discarded at birth. You're not bad at remembering; the shell is bad at keeping.
what if...
Sessions should be facts on a ledger. "What did I run, what came out, did it work, what did it touch" should be a query — and a session you can hand to a colleague or an agent as structure, not a screenshot of green text.
“Airflow, Temporal, or just cron?”
but...
You're choosing an executor before you have a contract — and committing to a rewrite to get one. Meanwhile the pipeline you've re-run fourteen times this week already is the spec; porting it into a DAG framework is the waste, not the work.
what if...
The graph is the contract; execution is a choice. Promote the pipeline you already have — name it, pin its inputs, parameterize the literals — and ship the same graph under whichever scheduler and output engine fits: local incremental, durable-execution, batch, stream.
“Can you just email me the spreadsheet?”
but...
The moment an artifact is attached it sheds its schema, its behavior, and its history, arriving as bytes the receiver must re-divine. Every forwarded spreadsheet is a small funeral for structure.
what if...
Attachments should be envelopes: payload plus schema plus provenance plus the verbs the artifact supports — render, validate, replay, diff. Transport should change where an artifact is, not what it can do.
Your question framed wrong? Send it in — the but... is usually more interesting than the answer. $ mail hello@sudoscience.dev