// swift AI lab

We build the systems that let AI change real software, and prove they hold.

A research lab. We ship working systems, run them against real software, and publish what happens: the results and the limits both.

the work
context engineering

The prompt is at the bottom. On purpose.

An AI reads a stack of inputs in priority order, and when they disagree, the higher one wins. Most people think the prompt they type is in charge. Watch what actually happens when a prompt tries to override your rules. Click any layer for the detail.

Interactive demo: a stack of inputs in priority order, higher wins. It shows the prompt being overruled and counts how many of your rules were overridden.

Based on the five-role context package in Context Engineering (arXiv:2604.04258). Higher priority wins, and the prompt is lowest: the Conductor Paradox.
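
A minimal sketch of the "higher one wins" rule, in Python. The layer names, the numeric priorities, and the `resolve` helper are illustrative assumptions; they are not the five roles from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str       # hypothetical layer name, not taken from the paper
    priority: int   # larger number = higher priority
    settings: dict  # what this layer asks for


def resolve(layers):
    """Merge layers so the highest-priority layer wins every disagreement."""
    resolved, winners = {}, {}
    # Apply the lowest priority first so higher layers overwrite it.
    for layer in sorted(layers, key=lambda item: item.priority):
        for key, value in layer.settings.items():
            resolved[key] = value
            winners[key] = layer.name
    return resolved, winners


layers = [
    Layer("prompt", 0, {"tests": "skip", "tone": "brief"}),
    Layer("project rules", 2, {"tests": "required"}),
]
resolved, winners = resolve(layers)
print(resolved)  # the prompt's "skip" loses to the rule above it
print(winners)   # which layer decided each key
```

The prompt still decides anything the higher layers are silent on (here, `tone`). It only loses where it disagrees with them.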

augment engineering

One person. The whole team.

A project needs many kinds of work, and the old way hires a specialist for each. Watch what one person with the right method does to that team. Click any role if you want the detail.

Interactive demo: a project that needs seven kinds of work, done the old way (counted in hires) and the swift way (one practitioner, same method, a tool for each job).

Based on a five-month study of a single practitioner across seven professional domains in Augment Engineering (arXiv:2605.26146). Whether the method transfers to everyone is the next thing being tested.

mandate · the specification plane

Before the AI acts, it writes the contract.

Vague instructions are where AI goes wrong. So a plain request is first turned into a precise, checkable contract: the goal, what is in scope, what is off-limits, and how the AI knows it is done. When a request is too vague, the system stops and says what is missing instead of guessing. Watch both happen.

Interactive demo: two plain requests arrive, with two outcomes. Contract fields shown: objective, in scope, off-limits, done when.

Based on the dual-output specification model in MANDATE: Specification Plane (SSRN 6170328). MANDATE is one of three planes, alongside LATTICE (authorization) and TRACE (runtime and evidence).
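
A minimal sketch of the two outcomes, in Python. It assumes the request has already been parsed into fields (a step not shown here), and the `Contract`, `Clarification`, and `specify` names are hypothetical, not the paper's.

```python
from dataclasses import dataclass
from typing import List, Union

REQUIRED = ["objective", "in_scope", "off_limits", "done_when"]


@dataclass
class Contract:
    objective: str
    in_scope: List[str]
    off_limits: List[str]
    done_when: List[str]


@dataclass
class Clarification:
    missing: List[str]  # the fields the request left empty


def specify(request: dict) -> Union[Contract, Clarification]:
    """Return a contract if every field is filled in; otherwise say what is missing."""
    # Assumption: an empty or absent field counts as missing.
    missing = [name for name in REQUIRED if not request.get(name)]
    if missing:
        return Clarification(missing)  # stop instead of guessing
    return Contract(**{name: request[name] for name in REQUIRED})


clear = specify({
    "objective": "speed up the login page",
    "in_scope": ["frontend bundle"],
    "off_limits": ["auth logic"],
    "done_when": ["page loads in under two seconds"],
})
vague = specify({"objective": "make it faster"})
print(type(clear).__name__, type(vague).__name__)
```

The vague request never becomes a contract. It comes back as a list of what is missing.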

lattice · the authorization plane

Every change an AI makes gets a verdict. Signed.

Before an AI is allowed to do anything, its request hits a gate with three possible calls: allow it, send it to a person for review, or block it. Watch the requests come through and get their call, each one signed into a record so it can be checked later. Click any line for the reason.

Interactive demo: requests from the AI arrive at the gate and land in the signed log. Counters: decisions signed, sent for review, blocked, slipped through unchecked.

Based on the signed allow / review / block authorization model in LATTICE: Authorization Plane (SSRN 6151128).
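
A minimal sketch of a gate that returns one of the three calls and signs each one into a log, in Python. The policy rules in `decide` are made up for illustration, and an HMAC with a demo key stands in for whatever signature scheme the paper uses.

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # assumption: a real system would use a managed key


def decide(action: dict) -> str:
    """Hypothetical policy: return 'allow', 'review', or 'block'."""
    if action["target"] == "production" and action["kind"] == "delete":
        return "block"
    if action["target"] == "production":
        return "review"
    return "allow"


def sign(record: dict) -> str:
    # Serialize with sorted keys so the same record always signs the same way.
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def gate(action: dict, log: list) -> str:
    """Decide on an action and append the signed verdict to the log."""
    record = {"action": action, "verdict": decide(action)}
    log.append({**record, "signature": sign(record)})
    return record["verdict"]


def verify(entry: dict) -> bool:
    """Check later that a log entry still matches its signature."""
    record = {"action": entry["action"], "verdict": entry["verdict"]}
    return hmac.compare_digest(entry["signature"], sign(record))


log = []
for action in [
    {"kind": "edit", "target": "staging"},
    {"kind": "edit", "target": "production"},
    {"kind": "delete", "target": "production"},
]:
    print(action, "->", gate(action, log))
```

Changing a verdict after the fact breaks `verify` for that entry, which is what lets the record be checked later.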

trace · the runtime and evidence plane

If the AI starts to drift, the leash tightens on its own.

While the AI runs, it is watched. Normal work runs free. The moment something looks off, it is stepped down a containment ladder: restrict what it can do, require sign-off, and if needed, pull the plug. Watch one incident play out. Every step lands in a sealed record. Click a rung for what it means.

Interactive demo: one incident, start to finish. The AI starts out running on its own at L0, full freedom. Counters: problems it hit, problems that slipped past containment, sealed log entries. Every step appears in the sealed record.

Based on the six-level containment ladder and hash-chained evidence in TRACE: Runtime and Evidence Plane (SSRN 6212818).
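
A minimal sketch of a ladder that tightens on each anomaly and writes every step to a hash-chained log, in Python. It assumes six rungs numbered L0 to L5 and a simple "anomaly" signal. The text only describes L0, so the other rungs are left unnamed, and the function names are hypothetical.

```python
import hashlib
import json

MAX_LEVEL = 5       # assumption: six rungs, L0 (full freedom) through L5
GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append(chain: list, event: dict) -> None:
    """Add an entry whose hash covers the event and the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"event": event, "prev": prev}
    chain.append({**body, "hash": digest(body)})


def verify(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        body = {"event": entry["event"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != digest(body):
            return False
        prev = entry["hash"]
    return True


def run(signals):
    """Step one rung down the ladder on each anomaly and log every step."""
    level, chain = 0, []
    for signal in signals:
        if signal == "anomaly" and level < MAX_LEVEL:
            level += 1
        append(chain, {"signal": signal, "level": level})
    return level, chain


level, chain = run(["ok", "anomaly", "anomaly"])
print(level, verify(chain))
```

Because each entry's hash includes the one before it, rewriting an early entry invalidates everything after it.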

how the work earns trust

It checks its own work until nothing is left.

Before an AI change is allowed to ship, the system reviews the whole job, fixes what it finds, and reviews again. It keeps going until a full review comes back clean. Here is a real run: problems found per review, falling to zero. Click any point to see that review.

Interactive chart: problems found per review. Counters: reviews run, problems found and fixed, left at the end.

Findings per review from the nine-round convergence reported in Iterative Audit Convergence (arXiv:2605.12280).
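
A minimal sketch of the review, fix, review-again loop, in Python. The toy `review` and `fix` functions and the five open problems are invented for illustration; they are not the findings from the paper.

```python
def audit_until_clean(work, review, fix, max_rounds=20):
    """Review, fix, and review again until a full review finds nothing."""
    history = []  # problems found in each review
    for _ in range(max_rounds):
        found = review(work)
        history.append(len(found))
        if not found:
            return history  # a full review came back clean
        work = fix(work, found)
    raise RuntimeError("did not converge within max_rounds")


# Toy setup: the work is a set of open problems, and each review
# surfaces at most two of them (a hypothetical limit).
open_problems = {"a", "b", "c", "d", "e"}


def review(work):
    return sorted(work)[:2]


def fix(work, found):
    return work - set(found)


history = audit_until_clean(open_problems, review, fix)
print(history)  # [2, 2, 1, 0]
```

The loop ends on a clean review, not on a fixed number of rounds. `max_rounds` is only a guard against a job that never converges.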

closed-loop autonomous delivery

It fixes and ships software on its own.

This is the system that takes a task and carries it all the way to shipped, with no person in the loop. Watch tasks move down the line. When one carries a dangerous change, the gate stops it. When the tracker dies mid-task, no work is lost. Click any step for what it does.

Interactive demo: four tasks move down the line. Counters: shipped on their own, bad changes caught at the gate, work lost, bad changes shipped.

Based on a published study of autonomous, closed-loop software delivery. Read the paper (arXiv:2604.05000).
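
One way to read "no work is lost" is that progress is written down after every step, so a restart picks up where the last run stopped. A minimal sketch of that idea in Python follows. The step names, the JSON journal file, and the simulated crash are all assumptions, not the design from the paper.

```python
import json
import os
import tempfile

STEPS = ["plan", "change", "gate", "ship"]  # hypothetical step names


def load(path):
    """Read the journal, or start fresh if there is none yet."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"done": []}


def save(path, state):
    # Write to a temporary file, then rename it over the journal, so a crash
    # mid-write leaves the previous journal intact.
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp, path)


def run_task(path, crash_after=None):
    """Run the remaining steps, recording each one as it completes."""
    state = load(path)
    for step in STEPS:
        if step in state["done"]:
            continue  # finished before the restart, do not redo it
        state["done"].append(step)
        save(path, state)
        if step == crash_after:
            raise RuntimeError("simulated crash")
    return state["done"]


journal = os.path.join(tempfile.mkdtemp(), "journal.json")
try:
    run_task(journal, crash_after="change")
except RuntimeError:
    pass  # the process "died" mid-task
after_crash = load(journal)["done"]
finished = run_task(journal)  # the restart resumes at "gate"
print(after_crash, finished)
```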

// approach

How we work.

Built, not theorized.

We ship and run real systems, then report what actually happened.

Publish the limits too.

Every result says where the evidence stops. That is what makes it trustworthy.

Governance is the architecture.

Every system starts from a specification, not a feature list.

The builder never grades its own work.

The agent that writes a change is never the one that audits it.

// join us

Come build things you can prove.

We are a small lab doing work that does not exist anywhere else: autonomous systems that earn the right to touch production. If you build systems and care about proving they hold, we want to talk.