// PAGE — DEMO

Demo

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

This page walks through the actual evidence Si-Chip produced when it shipped itself as v0.4.0 on 2026-04-30. v0.4.0 is the first Si-Chip release at the v2_tightened (= standard) gate: Rounds 18 and 19 are the two consecutive v2_tightened passes that unlocked the promotion. The historical v0.1.0 evidence (Rounds 1 and 2; v1_baseline; 2026-04-28) is preserved in chapter 8 as a comparison baseline.

// CHAPTER 01 //

1. Two consecutive `v2_tightened` passes (v0.4.0)

Round	T1 pass_rate	T2 pass_k	trigger_F1	metadata_tokens	per_invocation_footprint	wall_clock_p95 (s)	router_floor
18 (live)	1.0 best cell, 0.9719 mean	1.0 best cell	1.0	94 (Stage 8 frozen)	4726	17.0028	composer_2/fast
19 (replay)	same (cache replay)	same (cache replay)	1.0	94	4726	17.5726	composer_2/fast

Round 18 was the first dogfood-side invocation of evals/si-chip/runners/real_llm_runner.py, calling claude-haiku-4-5 and claude-sonnet-4-6 through Veil litellm-local at 127.0.0.1:8086. It cost $0.20 for 640 calls and took 16 minutes of wall-clock time. Honest k=4 sampling freed T2_pass_k from the 0.5478 lower bound of the deterministic SHA-256 PROXY, where it had been stuck since Round 1. Round 19 replayed the cache at a 100% hit rate ($0; ~20 ms wall-clock) for the second consecutive v2 pass.
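The live-then-replay pattern can be sketched as a response cache keyed by a SHA-256 digest of (model, prompt, sample index). This is a minimal sketch under stated assumptions, not the real_llm_runner.py implementation. `ReplayCache`, `pass_k`, the toy model and the grader are hypothetical. pass_k is assumed to mean "all k samples of a prompt pass", which may differ from Si-Chip's T2 definition.

```python
import hashlib
import json
from typing import Callable, Dict, List

# Hypothetical sketch only: not the real_llm_runner.py implementation.
LLM = Callable[[str, str], str]  # (model, prompt) -> completion text


class ReplayCache:
    """Responses keyed by SHA-256 of (model, prompt, sample index)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.live_calls = 0  # requests that actually reached the model
        self.hits = 0        # requests answered from the cache at $0

    @staticmethod
    def key(model: str, prompt: str, sample_idx: int) -> str:
        payload = json.dumps([model, prompt, sample_idx])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str, sample_idx: int, llm: LLM) -> str:
        digest = self.key(model, prompt, sample_idx)
        if digest in self.store:
            self.hits += 1
        else:
            self.live_calls += 1
            self.store[digest] = llm(model, prompt)
        return self.store[digest]


def pass_k(cache: ReplayCache, model: str, prompts: List[str],
           grade: Callable[[str, str], bool], llm: LLM, k: int = 4) -> float:
    """Assumed definition: a prompt counts only if all k samples are graded as passing."""
    passed = 0
    for prompt in prompts:
        samples = [cache.complete(model, prompt, i, llm) for i in range(k)]
        passed += all(grade(prompt, s) for s in samples)
    return passed / len(prompts)


# Toy stand-ins for a real model and grader (hypothetical).
def fake_llm(model: str, prompt: str) -> str:
    return "TRIGGER" if "skill" in prompt else "SKIP"


def grade(prompt: str, output: str) -> bool:
    return output == "TRIGGER"


cache = ReplayCache()
prompts = ["please use the skill", "just chat"]
live = pass_k(cache, "claude-haiku-4-5", prompts, grade, fake_llm)    # 8 live calls
replay = pass_k(cache, "claude-haiku-4-5", prompts, grade, fake_llm)  # 8 cache hits
print(live, replay, cache.live_calls, cache.hits)  # 0.5 0.5 8 8
```

The sample index is part of the key. This keeps the k samples distinct, so a replay round returns all k recorded answers without a single live call.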

// CHAPTER 02 //

2. The 8-axis value vector (Round 19; v0.4.0)

v0.4.0 broke the byte-identity of value_vector by adding an 8th axis, eager_token_delta, per the Q4 user decision. This is the first break of §6.1 since v0.1.0.

Axis	Round 19	Direction
`task_delta`	+0.95	improvement vs no-ability baseline (real-LLM)
`token_delta`	+0.0	unchanged at v0.4.0 ship
`latency_delta`	+0.0	unchanged at v0.4.0 ship
`context_delta`	+0.0	unchanged at v0.4.0 ship
`path_efficiency_delta`	null	not measured this round
`routing_delta`	+1.0	improvement (full trigger_F1)
`governance_risk_delta`	0.0	unchanged
`eager_token_delta` (NEW @ v0.4.0)	reported in the `token_tier` block	per-session decomposition of EAGER tokens

Decision rule (spec §6.2): task_delta = +0.95 >= +0.10 → keep.
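The 8 axes and the quoted keep rule map onto a small record type. This is a hedged illustration. `ValueVector` and `meets_keep_rule` are hypothetical names. The shape of `eager_token_delta` is assumed, because this page only says the `token_tier` block describes it. Only the keep branch of §6.2 is modelled.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ValueVector:
    """Hypothetical 8-axis value vector; field names follow the table above."""
    task_delta: float
    token_delta: float
    latency_delta: float
    context_delta: float
    path_efficiency_delta: Optional[float]  # None when not measured this round
    routing_delta: float
    governance_risk_delta: float
    # NEW @ v0.4.0; its contents live in the token_tier block (dict shape is an assumption)
    eager_token_delta: Optional[dict] = None


KEEP_THRESHOLD = 0.10  # spec §6.2: task_delta >= +0.10 -> keep


def meets_keep_rule(v: ValueVector) -> bool:
    """Only the keep branch quoted on this page; other §6.2 outcomes are not modelled."""
    return v.task_delta >= KEEP_THRESHOLD


round_19 = ValueVector(task_delta=0.95, token_delta=0.0, latency_delta=0.0,
                       context_delta=0.0, path_efficiency_delta=None,
                       routing_delta=1.0, governance_risk_delta=0.0)
print(len(fields(ValueVector)), meets_keep_rule(round_19))  # 8 True
```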

// CHAPTER 03 //

3. Real-LLM router-test sweep (8-cell MVP @ v0.4.0)

Round 18 ran the 8-cell MVP matrix through real_llm_runner.py against two real models, plus a cache replay and a deterministic baseline. Best-cell pass_rate reaches 1.0 in all 4 trigger_basic cells, and near_miss_FP_rate is 0.0 across the entire matrix.

model	thinking_depth	scenario_pack	T1 pass_rate (best)
claude-haiku-4-5	fast	trigger_basic	1.0
claude-haiku-4-5	fast	near_miss	1.0 (FP=0)
claude-haiku-4-5	default	trigger_basic	1.0
claude-haiku-4-5	default	near_miss	1.0 (FP=0)
claude-sonnet-4-6	fast	trigger_basic	1.0
claude-sonnet-4-6	fast	near_miss	1.0 (FP=0)
claude-sonnet-4-6	default	trigger_basic	1.0
claude-sonnet-4-6	default	near_miss	1.0 (FP=0)
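The two routing metrics in the table can be computed from per-prompt outcomes, roughly as follows. These definitions are assumptions, not taken from the Si-Chip spec. trigger_F1 is read as the F1 score of "skill fired" against "skill should fire". near_miss_FP_rate is read as the share of near-miss prompts (which should not trigger) on which the skill fired anyway.

```python
from typing import List, Tuple


def trigger_f1(results: List[Tuple[bool, bool]]) -> float:
    """F1 of 'skill fired' against 'skill should fire'; results are (expected, fired) pairs."""
    tp = sum(1 for expected, fired in results if expected and fired)
    fp = sum(1 for expected, fired in results if not expected and fired)
    fn = sum(1 for expected, fired in results if expected and not fired)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def near_miss_fp_rate(fired: List[bool]) -> float:
    """Share of near-miss prompts (which should NOT trigger) where the skill fired anyway."""
    return sum(fired) / len(fired)


print(trigger_f1([(True, True), (True, True), (False, False)]))  # 1.0
print(near_miss_fp_rate([False] * 20))                           # 0.0
```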

router_floor = composer_2/fast (the cheapest tuple where both packs reach the v2_tightened pass_rate >= 0.82 hard threshold; recorded in router_floor_report.yaml).

Si-Chip does not train router models (spec §5.2). The sweep evaluates existing model × thinking-depth combinations to find the cheapest cell that meets the gate, exactly as v0.1.0 did. Only the harness backend changed, from a deterministic SHA-256 simulation to a real-LLM cache.
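Here is a minimal sketch of the selection rule, assuming each cell's pass_rate is already measured. It enumerates the 2 × 2 × 2 MVP matrix, keeps the (model, thinking_depth) tuples that clear 0.82 on both scenario packs, and takes the cheapest one. The costs are invented placeholders and `router_floor` is a hypothetical helper. The output therefore illustrates the rule only; it does not reproduce router_floor_report.yaml.

```python
from itertools import product
from typing import Dict, Optional, Tuple

MODELS = ("claude-haiku-4-5", "claude-sonnet-4-6")
DEPTHS = ("fast", "default")
PACKS = ("trigger_basic", "near_miss")
PASS_RATE_FLOOR = 0.82  # v2_tightened hard threshold quoted above

Cell = Tuple[str, str, str]     # (model, thinking_depth, scenario_pack)
ModelDepth = Tuple[str, str]    # (model, thinking_depth)

# 2 models x 2 thinking depths x 2 scenario packs = the 8-cell MVP matrix.
MVP_CELLS = list(product(MODELS, DEPTHS, PACKS))


def router_floor(pass_rate: Dict[Cell, float],
                 cost: Dict[ModelDepth, float]) -> Optional[ModelDepth]:
    """Cheapest (model, depth) tuple that clears the floor on every scenario pack."""
    eligible = [
        md for md in product(MODELS, DEPTHS)
        if all(pass_rate[(*md, pack)] >= PASS_RATE_FLOOR for pack in PACKS)
    ]
    return min(eligible, key=lambda md: cost[md], default=None)


# Invented relative costs, for illustration only.
COST = {("claude-haiku-4-5", "fast"): 1.0, ("claude-haiku-4-5", "default"): 2.0,
        ("claude-sonnet-4-6", "fast"): 3.0, ("claude-sonnet-4-6", "default"): 6.0}
rates = {cell: 1.0 for cell in MVP_CELLS}  # every cell at 1.0, as in the table
print(len(MVP_CELLS), router_floor(rates, COST))  # 8 ('claude-haiku-4-5', 'fast')
```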

// CHAPTER 04 //

4. Cross-platform sync (drift = 0; v0.4.0)

Tree	Files	SHA-of-SKILL.md	Drift
Source `.agents/skills/si-chip/`	21 (incl. `DESIGN.md`)	identical	n/a (canonical)
Mirror `.cursor/skills/si-chip/`	20 (no `DESIGN.md`)	identical	DRIFT_ZERO
Mirror `.claude/skills/si-chip/`	20 (no `DESIGN.md`)	identical	DRIFT_ZERO
Tarball `docs/skills/si-chip-0.4.0.tar.gz`	21 (incl. `DESIGN.md`)	identical (extracted)	reproducible

Three-tree summary verdict: ALL_TREES_DRIFT_ZERO. The tarball's SHA-256 is 2cfcce00f989faf2467014e638b0ea1fa67870b5a1ee6b0531942be5a4be21ab (83060 bytes). It is built with deterministic options, pinned via --mtime '2026-04-30 00:00:00 UTC'.
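The DRIFT_ZERO and reproducible claims can be checked along the lines sketched below. The sketch assumes plain SHA-256 file digests and Python's standard `tarfile`/`gzip` modules. The helper names and toy trees are hypothetical, and this is not the script that produced the published tarball. Mirrors are compared file by file against the source, with `DESIGN.md` treated as source-only. The archive fixes entry order, owner, mode, mtime and the gzip header timestamp, so repeated builds produce identical bytes.

```python
import gzip
import hashlib
import io
import os
import shutil
import tarfile
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable


def tree_digest(root: Path, exclude: Iterable[str] = ()) -> Dict[str, str]:
    """Relative POSIX path -> SHA-256 hex digest for every file under root."""
    skip = set(exclude)
    digests = {}
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root).as_posix()
        if p.is_file() and rel not in skip:
            digests[rel] = hashlib.sha256(p.read_bytes()).hexdigest()
    return digests


def drift(source: Path, mirror: Path, source_only: Iterable[str] = ("DESIGN.md",)) -> Dict[str, str]:
    """Empty result means DRIFT_ZERO; otherwise maps each drifting path to a reason."""
    want, have = tree_digest(source, source_only), tree_digest(mirror)
    report = {}
    for rel in sorted(set(want) | set(have)):
        if rel not in have:
            report[rel] = "missing in mirror"
        elif rel not in want:
            report[rel] = "extra in mirror"
        elif want[rel] != have[rel]:
            report[rel] = "content differs"
    return report


def deterministic_tar_gz(root: Path, prefix: str, mtime: int) -> bytes:
    """Sorted entries, pinned mtime/owner/mode and a zeroed gzip timestamp."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for p in sorted(root.rglob("*")):
            if not p.is_file():
                continue
            info = tar.gettarinfo(str(p), arcname=f"{prefix}/{p.relative_to(root).as_posix()}")
            info.mtime, info.mode = mtime, 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            with p.open("rb") as fh:
                tar.addfile(info, fh)
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


PINNED = int(datetime(2026, 4, 30, tzinfo=timezone.utc).timestamp())  # 2026-04-30 00:00:00 UTC

with tempfile.TemporaryDirectory() as tmp:
    src, mirror = Path(tmp, "source"), Path(tmp, "mirror")
    (src / "scripts").mkdir(parents=True)
    (src / "SKILL.md").write_text("skill body\n")
    (src / "DESIGN.md").write_text("design notes\n")
    (src / "scripts" / "run.py").write_text("print('hi')\n")
    shutil.copytree(src, mirror, ignore=shutil.ignore_patterns("DESIGN.md"))
    clean = drift(src, mirror)                   # {} -> DRIFT_ZERO
    (mirror / "SKILL.md").write_text("edited\n")
    dirty = drift(src, mirror)                   # {'SKILL.md': 'content differs'}
    first = deterministic_tar_gz(src, "si-chip", PINNED)
    os.utime(src / "SKILL.md", (1, 1))           # on-disk mtime changes; the archive must not
    reproducible = first == deterministic_tar_gz(src, "si-chip", PINNED)
```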

// CHAPTER 05 //

5. The 14 spec invariants (v0.4.0)

At v0.4.0, a default run of python tools/spec_validator.py --json reports verdict PASS across 14 BLOCKERs. These are the 8 historical invariants, REACTIVATION_DETECTOR_EXISTS, 2 v0.3.0 additions and 3 v0.4.0 additions (8 + 1 + 2 + 3 = 14):

[BLOCKER] BAP_SCHEMA: PASS                                  (§2.1)
[BLOCKER] R6_KEYS: PASS (37 keys per §3.1; ignores method-tag suffixes)
[BLOCKER] THRESHOLD_TABLE: PASS (30 cells per §4.1)
[BLOCKER] ROUTER_MATRIX_CELLS: PASS (mvp=8, intermediate=16, full=96)
[BLOCKER] VALUE_VECTOR_AXES: PASS (version-aware: 7 ≤ v0.3.0; 8 @ v0.4.0+)
[BLOCKER] PLATFORM_PRIORITY: PASS (Cursor -> Claude Code -> Codex)
[BLOCKER] DOGFOOD_PROTOCOL: PASS (8 steps + 6 evidence files; 7 when round_kind=ship_prep)
[BLOCKER] FOREVER_OUT_LIST: PASS (4 items; re-affirmed in §14.6/§18.7/§19.6/§20.6/§21.6/§22.7/§23.7)
[BLOCKER] REACTIVATION_DETECTOR_EXISTS: PASS (all 6 §6.4 trigger ids)
[BLOCKER] CORE_GOAL_FIELD_PRESENT: PASS                     (§14, v0.3.0)
[BLOCKER] ROUND_KIND_TEMPLATE_VALID: PASS                   (§15, v0.3.0)
[BLOCKER] TOKEN_TIER_DECLARED_WHEN_REPORTED: PASS           (§18, v0.4.0)
[BLOCKER] REAL_DATA_FIXTURE_PROVENANCE: PASS                (§19, v0.4.0)
[BLOCKER] HEALTH_SMOKE_DECLARED_WHEN_LIVE_BACKEND: PASS     (§21, v0.4.0)
verdict: PASS

In --strict-prose-count mode, the validator passes against any v0.2.0, v0.3.0 or v0.4.0 spec, because the §13.4 prose was reconciled to 37 / 30 at v0.2.0. The historical spec_v0.1.0.md prose still states 28 / 21, so against that spec the validator intentionally fails R6_KEYS and THRESHOLD_TABLE. Both verdicts are pinned under §13.4 of the v0.1.0 ship report as regression checks.
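The version-aware axis check and the strict prose-count comparison can be sketched as below. The function names and dict-based results are illustrative assumptions, not the actual interface of tools/spec_validator.py. The aggregation assumes a run fails as soon as any BLOCKER fails. The v0.1.0 case reproduces the intended R6_KEYS / THRESHOLD_TABLE failure described above.

```python
from typing import Dict


def expected_value_vector_axes(spec_version: str) -> int:
    """VALUE_VECTOR_AXES is version-aware: 7 axes up to v0.3.0, 8 from v0.4.0 on."""
    version = tuple(int(part) for part in spec_version.lstrip("v").split("."))
    return 8 if version >= (0, 4, 0) else 7


def strict_prose_count(prose: Dict[str, int], actual: Dict[str, int]) -> Dict[str, bool]:
    """Each check passes only if the count stated in prose matches the structured table."""
    return {name: prose.get(name) == count for name, count in actual.items()}


def verdict(blockers: Dict[str, bool]) -> str:
    """Assumed aggregation: the run passes only if every BLOCKER passes."""
    failed = sorted(name for name, ok in blockers.items() if not ok)
    return "PASS" if not failed else "FAIL: " + ", ".join(failed)


ACTUAL = {"R6_KEYS": 37, "THRESHOLD_TABLE": 30}
print(expected_value_vector_axes("v0.3.0"), expected_value_vector_axes("v0.4.0"))  # 7 8
print(verdict(strict_prose_count({"R6_KEYS": 37, "THRESHOLD_TABLE": 30}, ACTUAL)))  # PASS
print(verdict(strict_prose_count({"R6_KEYS": 28, "THRESHOLD_TABLE": 21}, ACTUAL)))  # FAIL: R6_KEYS, THRESHOLD_TABLE
```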

// CHAPTER 06 //

6. Reproduce the numbers locally

git clone https://github.com/YoRHa-Agents/Si-Chip.git
cd Si-Chip

# 1. Spec validator (14 invariants — verdict PASS)
python tools/spec_validator.py --json

# 2. Re-aggregate the included simulated baseline runs
python .agents/skills/si-chip/scripts/aggregate_eval.py \
  --runs-dir evals/si-chip/baselines/with_si_chip \
  --baseline-dir evals/si-chip/baselines/no_ability \
  --skill-md .agents/skills/si-chip/SKILL.md \
  --templates-dir templates \
  --out /tmp/metrics_report.yaml

# 3. Confirm the v2_tightened packaging gate (metadata=94, body=4646, pass=true)
python .agents/skills/si-chip/scripts/count_tokens.py \
  --file .agents/skills/si-chip/SKILL.md --both \
  --budget-meta 100 --budget-body 5000 --json

# 4. (optional) Replay the Round 18 real-LLM cache at $0
#    See .agents/skills/si-chip/scripts/real_llm_runner_quickstart.md
python evals/si-chip/runners/real_llm_runner.py --help
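Step 3's pass/fail logic reduces to two budget comparisons. The sketch below takes the token counts as given, since count_tokens.py does the actual counting, and uses a hypothetical `packaging_gate` helper. It also shows why the same skill fails the v3_strict metadata ceiling of <= 80 that blocks promotion (chapter 7).

```python
from typing import NamedTuple


class GateResult(NamedTuple):
    meta_ok: bool
    body_ok: bool

    @property
    def passed(self) -> bool:
        return self.meta_ok and self.body_ok


def packaging_gate(meta_tokens: int, body_tokens: int,
                   budget_meta: int, budget_body: int) -> GateResult:
    """Each token count must be at or under its budget."""
    return GateResult(meta_tokens <= budget_meta, body_tokens <= budget_body)


v2 = packaging_gate(94, 4646, budget_meta=100, budget_body=5000)
# v3_strict caps metadata at 80; the v3 body budget is not stated here, so 5000 is a placeholder.
v3 = packaging_gate(94, 4646, budget_meta=80, budget_body=5000)
print(v2.passed, v3.passed, v3.meta_ok)  # True False False
```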

// CHAPTER 07 //

7. What ships next (deferred from v0.4.0)

v3_strict promotion (2 fresh rounds at v3 thresholds; the only remaining blocker is metadata_tokens = 94 against the v3 ceiling of <= 80).
Codex .codex/ native runtime (still bridge-only at v0.4.0 per spec §11.2; to be re-evaluated once v3_strict is earned).
Broader IDE coverage (OpenCode / Copilot CLI / Gemini CLI; deferred per spec §11.2).
Multi-tenant hosted API surface (deferred per spec §11.2).

See CHANGELOG.md for the full v0.4.0 + v0.4.1 release notes (v0.4.1 is the doc-only patch that produced this Pages tree).

// CHAPTER 08 //

8. Historical baseline — v0.1.0 ship (2026-04-28; v1_baseline)

The original Si-Chip release shipped at v1_baseline with deterministic SHA-256 PROXY runners. Its evidence is preserved here for regression checks and reproducibility.

Round	pass_rate	trigger_F1	metadata_tokens	per_invocation_footprint	wall_clock_p95 (s)	half_retire	router_floor
1	0.85	0.89	78	4071	1.47	keep	composer_2/default
2	0.85	0.89	78	3598 (-11.6%)	1.47	keep	composer_2/default

Round 2 also populated R4_near_miss_FP_rate = 0.05 and slimmed the SKILL.md body from 2493 to 2020 tokens (-18.97%). The 7-axis value vector for Round 1 is preserved below. Note that task_delta = +0.35 >= +0.10 triggers the keep rule per spec §6.2:

value_vector:           # v0.1.0 — v0.3.0 used 7 axes
  task_delta: 0.35
  token_delta: -1.71
  latency_delta: -0.55
  context_delta: -1.71
  path_efficiency_delta: null
  routing_delta: 0.8934
  governance_risk_delta: 0.0
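The percentages above can be re-derived with plain arithmetic. Both reductions are exactly 473 tokens. This suggests, though the page does not state it, that the entire footprint saving came from the body trim.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means a reduction."""
    return (after - before) / before * 100


footprint = pct_change(4071, 3598)   # per_invocation_footprint, Round 1 -> Round 2
body = pct_change(2493, 2020)        # SKILL.md body tokens, Round 1 -> Round 2
print(f"{footprint:.1f}% {body:.2f}%")  # -11.6% -18.97%
print(4071 - 3598, 2493 - 2020)         # 473 473
```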
