frequently asked questions

# the questions are real. the framings are negotiable.

Collected from the wild: things developers and teams actually ask — in standups, in forums, in public posts. Each one is answered the way the lab actually thinks: a but... for where the framing fights you, a what if... for the reframe, and the investigation it routes to.

“Has anyone figured out a rock-solid formula to run Codex and Claude Code in a sandbox? No firewall, proxy, or temp credentials; bash and generated code can't access the secret; streaming I/O.” — asked publicly, near verbatim, by the founder of an eval platform

but...

The constraints preserve the Unix ambient-authority model while demanding capability-security guarantees — that's why they fight each other. You're subtracting power from an overpowered process and asking for the subtraction to be rock-solid. It never will be.

what if...

Change the agent's calling convention instead of its cage. Agents emit typed effects — model.complete, shell.run, fs.read, net.fetch — and handlers own the authority. The model key becomes an implementation detail of one handler; streaming comes free because everything was already an event stream; and the typed effect trace is a better eval artifact than any transcript.

EZAF →DESH →

“Why didn't my webhook fire? How do we make delivery reliable — retries, ordering, idempotency keys, dead-letter queues?”

but...

Every feature on that list is the price of making delivery the correctness mechanism. You're hardening a channel that should be allowed to fail — and after all of it, the provider's docs will still say "don't rely on webhooks."

what if...

Demote the webhook to a hint. A signed, content-free poke plus a ranged pull from a cursor you own turns every delivery failure into latency instead of corruption. Truth lives in a pullable ledger; sanity is a cursor you own.

Webpoke →

“How do I add a sign-up form to my static site? Do I really need a whole backend now?”

but...

Today, yes — and that's the scandal. The most basic feature on the web since 1995 — a sign-up form, a contact form, a file upload — summons a database, spam filtering, email deliverability, auth, a storage bucket, a privacy policy, and a bill. The form takes an hour; the apparatus takes the rest of your life. And every "easy" fix — a forms SaaS, a serverless function, a full-stack framework — is a backend rented in a different shape.

what if...

The form was never the page's job. Capability composes onto the static page as layers — independent apps with their own identity, permissions, and lifecycle — and the data lands in a personal data server that takes all-you-need-is-log seriously: submissions as appended events, uploads as content-addressed blobs. The site stays files on a CDN; nobody's nightmare begins.

Layers →DESH →

“Which terminal should I switch to — Warp? Ghostty? Should I finally learn tmux?”

but...

You're choosing a prettier window onto the same amnesia. Warp modernized the chrome, Fig proved the appetite and got sunset, nushell structured the pipes inside a forgetting shell — each fixed a symptom and the defaults survived them all.

what if...

Ask why the operating surface discards everything it just computed — commands, timings, outputs, outcomes. Demand a ledger, not a theme. Then make it prove the waste: there's a prompt on the desh site your agent can run against your own history.

DESH →

“Should we organize the repo by feature or by layer?”

but...

If something can be classified in more than one hierarchy, you're not in a tree anymore — and code always can. The debate is unwinnable because the data model is wrong, not because your team lacks discipline.

what if...

Code is a graph; the folder tree is one projection of it, kept for the toolchains. Argue about the things that deserve argument — where the command, event, and scenario boundaries sit — and let directories be a rendering decision.

Unfiled →

“How am I supposed to review 3,000 lines of AI-generated code?”

but...

You're reviewing bytes moved, not meaning changed — at exactly the moment volume made that impossible. Line diffs measure churn; an import shuffle and a behavior change look the same in green and red.

what if...

Review intent and semantic change sets: which boundaries moved, which claims (tests, scenarios) changed, which decisions were taken. When agents write most of the code, intent is the human contribution — review the thing only the human could have supplied.

Unfiled →EZAF →

“Do we need an llms.txt? How do we rank in AI answers?”

but...

That's SEO brain applied to a governance problem: write one magic file and hope. Your machine-readable truth already leaks across llms.txt, .well-known, OpenAPI, MCP, and schema.org — published by different teams, drifting independently, read by agents that don't ask twice.

what if...

Treat the machine-readable surface as a governed product with an owner. Observe what machines actually request (the 404s are demand data), publish what they should find, and monitor the drift between the two.

instagrate-me →

“How do we let AI agents use our product — build an MCP server? Ship our own agent? Hand out API keys?”

but...

Every default answer hands somebody's agent ambient authority — your keys living in their runtime, or their agent living in your account. The user is reduced to a credential, and revocation becomes an incident response.

what if...

Grant a scoped integration agent exactly what it needs through consent rails that already exist — OAuth, CloudFormation quick-create, GitHub App installs — verify it in shadow mode before cutover, and make revocation boring. The surface proposes; the user's runtime disposes.

EZAF →Webpoke →

“What was that command I ran last month that fixed this exact thing?”

but...

Ctrl-R roulette is archaeology without a site map: the strings survived, but the timings, outputs, and outcomes — the parts you actually need — were discarded at birth. You're not bad at remembering; the shell is bad at keeping.

what if...

Sessions should be facts on a ledger. "What did I run, what came out, did it work, what did it touch" should be a query — and a session you can hand to a colleague or an agent as structure, not a screenshot of green text.

DESH →

“Airflow, Temporal, or just cron?”

but...

You're choosing an executor before you have a contract — and committing to a rewrite to get one. Meanwhile the pipeline you've re-run fourteen times this week already is the spec; porting it into a DAG framework is the waste, not the work.

what if...

The graph is the contract; execution is a choice. Promote the pipeline you already have — name it, pin its inputs, parameterize the literals — and ship the same graph under whichever scheduler and output engine fits: local incremental, durable-execution, batch, stream.

DESH →

“Can you just email me the spreadsheet?”

but...

The moment an artifact is attached it sheds its schema, its behavior, and its history, arriving as bytes the receiver must re-divine. Every forwarded spreadsheet is a small funeral for structure.

what if...

Attachments should be envelopes: payload plus schema plus provenance plus the verbs the artifact supports — render, validate, replay, diff. Transport should change where an artifact is, not what it can do.

Avoidant →

“Why does every conversation with one person live in a single thread?”

but...

Because messaging modeled Person ↔ Person → messages when most traffic was where are you? and ok. Replies added edges between messages but not durable context objects — no per-topic status, summary, next action, or lifecycle. Project management, logistics, and long collaborations get flattened into one transcript.

what if...

Messages belong to contags — context tags with lifecycles — that belong to relationships. Apple, Google, and carriers could ship contag inboxes as opt-in UI tomorrow; #tags work as plain-text fallback until then.

Contags →

“How do I stop my Cloudflare bill from surprise-spiking — especially when an agent loops?”

but...

Dashboard billing is hours to days behind. Monthly invoice alerts are archaeology — they don't stop a runaway agent burning Workers invocations, D1 reads, and AI tokens right now. Delegating compute without a spend kill switch is ambient financial authority.

what if...

Put enforceable limits at the spend proxy: count operations in near-real time, estimate cost from a learned model, trip the circuit when a limit spec is violated — absolute caps, rate limits, SKU-scoped bounds, burn-rate projections — before the bill, before the loop.

CloudCap →

“Why am I typing production commands into a one-line terminal prompt?”

but...

The most important text you write all day is often typed into the worst editor you use all day — shell prompts, browser textareas, AI compose boxes. They lack undo, history, linting, and review-before-submit while your IDE already has all of that. The gap isn't Vim nostalgia; it's that keystrokes are the wrong abstraction.

what if...

Treat every input as an editing request: surface → user-owned editor session → validated apply/submit effect. The IDE becomes the capability handler for text — agents propose buffers, users review, handlers apply with provenance.

EZAF →DESH →

Your question framed wrong? Send it in — the but... is usually more interesting than the answer. $ mail hello@sudoscience.dev