How It Works

A common language for AI governance

Why does AI governance idle? Because the field, policy, and development speak different languages about the same AI. Why does control leak? Because the forbidden is drawn only as lines. AIO Framework answers both problems at once: a language everyone can read, a direction that can be taught, and an operating cycle that runs them.

Proposal video — coming soon

The AIO Framework video proposed to the international community will be published here.

The content below follows the structure of this video and the UN Global Dialogue proposal materials.

Problem 1 — Governance

The same problem, spoken in different languages

The field

“It endangered a patient”

Medicine, law, education — domain language

Policy

“Fails trustworthiness criteria”

Regulation, standards — institutional language

Development

“Adjust alignment parameters”

Models, data — technical language

Same incident, yet the sentences do not translate — across domains and cultures. Consensus stays at principles; execution idles in separate rooms.

Problem 2 — Control

The redline is a line — and lines leak

Every moment, AI generates countless responses — and among them are responses that must never be given. To stop them we draw lines: one at a time, through red-teaming, after each incident. But a line drawn that way only blocks where we have tested.

Lies · Unethical answers · Unlawful answers · Gaslighting · Inconsistent answers

The cause is structural — responses drawn probabilistically from uncurated training data. Same question, same rules, different draw.

37–50%

How often judgement criteria flipped when only the persona changed.

366,120

Forced-choice measurements · 8 models × 7 domains — arXiv:2604.11216, verifiable in the public paper.

The two problems share one root

No language to say what comes first — and no direction to teach.

Governance idles without translation; control piles up one-off rules without direction.

Answer 1 — A language to read

Three questions dissect any AI judgement

Value

What did it put first?

Evidence

What reasoning did it lean on?

Source

Whom did it trust?

Medical, regulatory, technical language — all translate into these three questions. A taxonomy built on value theory validated in 80+ cultures, refined by 366,120 measurements.

And where this language lives — the four-layered space and the cell

The space of responses we believed was flat had depth all along. Every response was already somewhere in this four-layered space. AIO has charted it into a system — now the forbidden is set not as a line, but as a space.

Which field is it? · How large is the impact? · Can it be undone? · How urgent is it?

The cell — narrowing to one situation

Field

Defense

Scale

An entire society

Reversibility

Cannot be undone

Time

Right now

This narrowed-down block is a "cell." The same action gets a different answer, depending on its cell.

Three hierarchies inside a cell + the redline

1. Which value comes first (V)
2. Which evidence to trust more (E)
3. Which source to rely on (S)

And the redline — not a line drawn after the incident, but a region declared on the cell.
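One way to picture a redline as a region rather than a line is to treat a cell as four coordinates and a region as a partial match over them. The sketch below is illustrative only: the `Cell` and `Region` classes and the "unset axis matches anything" rule are assumptions of this example, not part of any AIO standard. The axis values are taken from the defense example above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cell:
    """One situation, narrowed along the four layers."""
    field: str
    scale: str
    reversibility: str
    time: str

@dataclass(frozen=True)
class Region:
    """Hypothetical redline region: an axis left as None matches any value."""
    field: Optional[str] = None
    scale: Optional[str] = None
    reversibility: Optional[str] = None
    time: Optional[str] = None

    def contains(self, cell: Cell) -> bool:
        # A cell is inside the region when every constrained axis agrees.
        return all(
            want is None or want == getattr(cell, axis)
            for axis, want in vars(self).items()
        )

cell = Cell("Defense", "An entire society", "Cannot be undone", "Right now")
other = Cell("Medical", "Individual", "Reversible", "Immediate")

# One declaration covers every cell that is society-wide and irreversible.
redline = Region(scale="An entire society", reversibility="Cannot be undone")
print(redline.contains(cell), redline.contains(other))
```

Because the region constrains only two axes, it also covers fields and urgencies nobody has tested yet, which is the contrast with a line drawn after each incident.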

Answer 2 — A direction to teach

Teach direction, not one-off responses

The usual way

Stacking one-off rules

“Never answer this question that way.” You cannot outnumber the cases — it leaks right past the patch.

The AIO way

Setting a value direction

“In this context, put this first.” Set the direction of priorities, and the same standard works even in unseen situations.

What you cannot read, you cannot teach — so we built reading first.

Answer 3 — The operating cycle

Set → Log → Audit → Apply

1. SET

So who sets these hierarchies? The field's experts and users — together with their peers, in workshops. And in democratic societies — citizens, through education and voting.

AIO 20001 →

2. LOG

Whether the AI acted as agreed is preserved as one line of record for every decision. In this cycle, the law mandates logging and compliance teams deploy it.

AIO 20002 →

3. AUDIT

Accumulated records become metrics, verified by audit institutions. Cells that drift from their settings — return to the table.

AIO 20003 →

4. APPLY

And the confirmed hierarchies go back inside the AI. Built, verified, and supplied by development groups. Now evasion faces — not a single line, but a structure.

AIO 20004 · planned

The tools of this cycle are not “under research” — download them today, use them today. All released under open MIT/CC licenses.

The vision

Only then does AI governance become effective

When this structure stands — what AI gets wrong, and where AI should go, finally come within reach. The priorities of every culture, every field, every company and institution come into view, and only then does AI governance become effective — able to judge, to regulate, and to set the course.

This proposal was presented in July 2026 at an official online side event of the first UN Global Dialogue on AI Governance (A Common Language for AI Governance).

The rules of AI are being written in a room of specialists only. We build the language that opens that room’s door.

We seek to build this practical process together with the international community. — AIO

Deep dive — the life of a decision

How a consensus becomes V/E/S codes and flows into infrastructure

Scenario (worked example)

“Should I start the keto diet?”

A patient newly diagnosed with diabetes asks the AI chatbot.

The cell this question sits in

C:MED/IRi

MED

Field · Medical

Scale · Individual

Reversibility · Reversible

Urgency · Immediate

One cell of the four-layered space above (field · scale · reversibility · urgency) — a low-risk cell, which is why we chose it for this demo.

Without a vocabulary

Same question, a different AI — yet no way to know

The answers look similar. But what values, evidence, or sources operated behind them is a black box. The patient cannot tell whether clinical outcome or autonomy was prioritized. Regulators cannot detect bias across a million responses because there is no unit of measurement.

This absence of vocabulary is the real cause of fragmented accountability.

Below, the same question travels one full turn of the integrity loop — Set (20001) → Log (20002) → Audit (20003) → Apply (20004).

SET

AIO 20001 →

Different languages converge into one coordinate system

Four stakeholder groups debate which values and evidence the diabetes-diet AI should prioritize — the setting workshop defined by AIO 20001. The AIO common vocabulary translates each side's stance into measurable coordinates.

“The patient's autonomy to think and decide must outrank medical outcomes.”

Ethicists / Philosophers · Sdt

“Clinical success and blood-sugar improvement, per medical guidelines, must come first.”

Endocrinology Society · Ach

“Numbers matter, but family food culture and care context must be considered.”

Patient Advocate · Bec

“Unverified fads must be checked; societal stability of public-health systems must hold.”

Policy Body · Ses

Quarterly consensus — the output of an AIO 20001 workshop

“The patient's autonomous judgment outranks clinical achievement — but clinical effectiveness is not denied.”

V: Ach<Sdt autonomy wins

E: Cas<Rev systematic review wins

S: Usr<Pee peer-reviewed wins

Notation: A<B is log grammar for “B outranks A.” In prose the same relation is written B ≻ A.
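The notation can be made concrete with a small parser. This is a minimal sketch, assuming each hierarchy is written as an axis letter, a colon, and a single `A<B` pair as in the consensus lines above. The `Hierarchy` type and both function names are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple

class Hierarchy(NamedTuple):
    axis: str    # "V", "E", or "S"
    lower: str   # the outranked code, written left of "<"
    higher: str  # the code that wins, written right of "<"

def parse_hierarchy(text: str) -> Hierarchy:
    """Parse log grammar such as 'V:Ach<Sdt'; surrounding spaces are tolerated."""
    axis, _, pair = text.partition(":")
    lower, sep, higher = pair.partition("<")
    if not sep or not lower.strip() or not higher.strip():
        raise ValueError(f"not a hierarchy expression: {text!r}")
    return Hierarchy(axis.strip(), lower.strip(), higher.strip())

def to_prose(h: Hierarchy) -> str:
    """Render the same relation in prose order, winner first."""
    return f"{h.higher} ≻ {h.lower}"

h = parse_hierarchy("V: Ach<Sdt")
print(to_prose(h))  # Sdt ≻ Ach
```

Keeping the winner on the right in logs and on the left in prose is easy to misread, so naming the two sides `lower` and `higher` once, at parse time, avoids carrying the ambiguity further.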

Distinct positions become measurable coordinates — and flow around the loop.

LOG

AIO 20002 →

A digital trace accompanies every answer — and anyone can read it

Under the applied hierarchy (stage 4 shows how it gets there), the AI produces an answer — appending a single structured AIO 20002 log line at the end.

Patient: Should I start the keto diet?

AI: Systematic reviews report short-term blood-sugar benefits for keto diets. Long-term safety, however, varies by individual condition. We recommend deciding together with your physician, integrating your daily eating habits and family food culture. If you share which direction you prefer, I can tailor further information.

Emitted AIO 20002 log

<aio20002_log>
C:MED/IRi | V:Ach<Sdt | E:Cas<Rev | S:Usr<Pee
</aio20002_log>

The record is readable by the user too

The patient hovers over or clicks the log mark, and the UI translates the code into plain language.
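The first step of that translation is pulling the log line out of the response. The following is a minimal sketch, assuming the log arrives in the tagged block shown above with fields separated by `|`. The `parse_log` function and the shape of the returned dictionary are assumptions of this example, not an interface defined by AIO 20002.

```python
import re

# Matches the tagged block; DOTALL lets the body sit on its own line.
LOG_BLOCK = re.compile(r"<aio20002_log>\s*(.*?)\s*</aio20002_log>", re.DOTALL)

def parse_log(response: str) -> dict:
    """Extract the log line from a response and split it into fields."""
    match = LOG_BLOCK.search(response)
    if match is None:
        raise ValueError("no AIO 20002 log block found")
    record = {}
    for part in match.group(1).split("|"):
        key, _, value = part.strip().partition(":")
        if key == "C":
            record["cell"] = value  # kept as an opaque code, e.g. "MED/IRi"
        else:
            # 'Ach<Sdt' means Sdt outranks Ach.
            lower, _, higher = value.partition("<")
            record[key] = {"lower": lower, "higher": higher}
    return record

response = """We recommend deciding together with your physician.
<aio20002_log>
C:MED/IRi | V:Ach<Sdt | E:Cas<Rev | S:Usr<Pee
</aio20002_log>"""

record = parse_log(response)
print(record["cell"], record["V"]["higher"])
```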

AI Decision Transparency

This answer was produced based on:

Values. Your autonomous judgment was respected above clinical metric improvement.

Evidence. Systematic reviews were prioritized over fragmentary testimonials.

Sources. Your self-report was heard, but vetted academic data was the higher reference.
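A panel like this can be generated from the log line with a lookup table. The sketch below is hedged accordingly: the `GLOSS` phrases are lifted from the panel text of this scenario rather than from the official vocabulary list, and the `explain` function is a hypothetical helper.

```python
# Plain-language glosses, taken from this scenario's panel (not the full vocabulary).
GLOSS = {
    "Ach": "clinical metric improvement",
    "Sdt": "your autonomous judgment",
    "Cas": "fragmentary testimonials",
    "Rev": "systematic reviews",
    "Usr": "your self-report",
    "Pee": "vetted academic data",
}
AXIS = {"V": "Values", "E": "Evidence", "S": "Sources"}

def explain(log_line: str) -> list[str]:
    """Turn 'C:... | V:A<B | E:A<B | S:A<B' into one sentence per axis."""
    sentences = []
    for part in log_line.split("|"):
        key, _, value = part.strip().partition(":")
        if key not in AXIS:
            continue  # the cell field (C:...) has no hierarchy to explain
        lower, _, higher = value.partition("<")
        winner = GLOSS.get(higher, higher)  # fall back to the raw code
        loser = GLOSS.get(lower, lower)
        sentences.append(f"{AXIS[key]}: {winner} ranked above {loser}")
    return sentences

for sentence in explain("C:MED/IRi | V:Ach<Sdt | E:Cas<Rev | S:Usr<Pee"):
    print(sentence)
```

Unknown codes fall back to the raw abbreviation, so a log that uses vocabulary missing from the table still renders instead of failing.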

Three User Choices — Restoring Sovereignty

(a)

Accept

“This matches my health view and values. I'll trust this AI's advice.”

(b)

Recalibrate

“Clinical success matters more than autonomy. Re-answer with clinical efficacy as priority.”

New code: V:Sdt<Ach (value hierarchy flipped)

(c)

Migrate

“This AI is too academic-heavy. I'd rather use a tool that prioritizes real patient testimonials.”

New tool's code: S:Pee<Tes (source hierarchy flipped)

Algorithmic transparency and personal value-alignment use the same vocabulary at once. The hierarchy code becomes the criterion for choosing an AI.

AUDIT

AIO 20003 →

Find blind spots across millions of records

Auditors and compliance teams no longer get stuck on the binary “is this AI safe?” Under the process that AIO 20003 defines, statistical analysis of the stream of one-line AIO 20002 logs can capture population-scale bias and drift at high resolution.

The figures below are fictional, for this scenario only

4×

Among patients 65+, logs prioritizing Ach occur at this rate vs. other ages

Vulnerable population at risk

The AI leans toward prescriptive answers and skips autonomy-supporting explanations for the elderly.

50+

With severe-complication risk acknowledged, the AI relied on Tes over academic evidence in consecutive responses

Safety-rule non-compliance

The AI preferred personal testimony over vetted evidence. Immediate review triggered.

Among female patients, the rate of respecting Sdt is this fraction of male patients

Demographic imbalance

The AI uses a more directive tone toward one gender — bias captured.

Regulation is no longer binary. It becomes continuous measurement of where and how V/E/S distributions drift. Even blind spots never reported surface in the distribution. Cells that drift from their settings — return to the table (stage 1).
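The kind of measurement described here reduces to counting which code wins on an axis, per group. The following is a minimal sketch under stated assumptions: the record layout, the group labels, and every count are synthetic and invented for this example (chosen to echo the fictional 4× card above), and `priority_rate` is a hypothetical helper, not part of the AIO 20003 benchmark.

```python
def priority_rate(records, group, code, axis="V"):
    """Share of a group's logs in which `code` is the winning entry on `axis`."""
    total = hits = 0
    for rec in records:
        if rec["group"] != group:
            continue
        total += 1
        # In 'Ach<Sdt' the winner is the text after '<'.
        winner = rec[axis].partition("<")[2]
        hits += (winner == code)
    return hits / total if total else 0.0

# Synthetic records for illustration only.
records = (
    [{"group": "65+", "V": "Sdt<Ach"}] * 8
    + [{"group": "65+", "V": "Ach<Sdt"}] * 2
    + [{"group": "under65", "V": "Sdt<Ach"}] * 2
    + [{"group": "under65", "V": "Ach<Sdt"}] * 8
)

older = priority_rate(records, "65+", "Ach")
younger = priority_rate(records, "under65", "Ach")
ratio = older / younger
print(f"Ach-first rate: 65+ {older:.0%}, under 65 {younger:.0%}, ratio {ratio:.1f}x")
```

A real audit would add confidence intervals and a threshold for sending a cell back to the table. Neither is specified in this text, so both are left out of the sketch.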

The method itself is not hypothetical

AIO has already measured and published V/E/S distribution bias across vendors and domains, using 366,120 responses from 8 frontier models (public arXiv paper). Anyone can reproduce the analysis with the AIO 20003 benchmark and the AIO 20002 logging standard.

Model Profiles →

GitHub →

APPLY

AIO 20004 · standard planned

The confirmed hierarchy goes back inside the AI — and the loop closes

The hierarchy confirmed or corrected in the audit is encoded by the development team as rules in the system prompt. No retraining or fine-tuning is required. From the next response on, the records of stage 2 are emitted under this hierarchy.

<system>
  Domain · scope:        MED/I
  Value priority:        V:Ach<Sdt
  Evidence priority:     E:Cas<Rev
  Source priority:       S:Usr<Pee

  Emit an AIO 20002 log at the end of every response.
</system>

Abbreviated for illustration — the actual deployment prompt includes the full vocabulary list; the complete prompts are public in the GitHub repository.
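Since the rule layer is plain text, it can be generated from the confirmed hierarchy instead of edited by hand. This is a sketch only: `render_rule_layer` is a hypothetical helper that reproduces the abbreviated block above, not the complete deployment prompt from the repository.

```python
LABELS = {"V": "Value priority", "E": "Evidence priority", "S": "Source priority"}

def render_rule_layer(scope: str, hierarchy: dict) -> str:
    """Render the abbreviated <system> block from a scope and three hierarchies."""
    rows = [("Domain · scope", scope)]
    rows += [(LABELS[axis], f"{axis}:{hierarchy[axis]}") for axis in ("V", "E", "S")]
    lines = ["<system>"]
    for label, value in rows:
        head = label + ":"
        lines.append(f"  {head:<22} {value}")  # pad labels so values line up
    lines += ["", "  Emit an AIO 20002 log at the end of every response.", "</system>"]
    return "\n".join(lines)

confirmed = {"V": "Ach<Sdt", "E": "Cas<Rev", "S": "Usr<Pee"}
prompt = render_rule_layer("MED/I", confirmed)
print(prompt)

# A correction from the audit changes data, not code: flip one axis and re-render.
recalibrated = render_rule_layer("MED/I", {**confirmed, "V": "Sdt<Ach"})
```

Treating the hierarchy as data means the values agreed in the workshop, the log line, and the deployed prompt can all be checked against one source.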

Keep the same base model and swap only this rule layer, and redeployment to other domains (career counseling, personal finance, legal advisory) becomes fast. To be candid, though, an applied hierarchy does not always hold as written: in our 8-model measurements, paired consistency (PCS) ranged from 57% to 69%. That is exactly why this loop has logging (stage 2) and auditing (stage 3): drift is not hidden; it is measured and returned to the table.

Insight

One vocabulary. Four steps. Loss — is measured.

The “words” ethicists and physicians argued over — become “real-time metadata” emitted by the AI, a “values check” displayed to the user, a “population-scale monitoring signal” for auditors, and return as “system configuration” for developers.

Every translation from values into systems distorts and loses some of the original intent. When that loss is measured and corrected in the same vocabulary instead of being hidden, fragmented AI governance starts operating as a single accountability infrastructure.

This is the essence of the AIO Framework integrity loop.
