Changelog
Changelog
All notable changes to Si-Chip are documented here. The format follows Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased
Added
- (empty; post-v0.4.1 items land here)
0.4.1 - 2026-04-30
Summary
Si-Chip v0.4.1 — “Documentation patch (post-v0.4.0 sync sweep)”. Doc-only
patch with no Normative spec change (spec_v0.4.0.md remains the
active frozen spec; AGENTS.md §13 stays at 13 hard rules; the 14 BLOCKER
spec_validator invariants are unchanged). Closes the gap that the v0.4.0
ship left in user-facing material: INSTALL.md / CONTRIBUTING.md /
install.sh / the entire docs/ Pages tree (install body, user guide
body, architecture, demo, changelog, config, ZH index) were still quoting
v0.1.0 / v0.1.1 / v0.2.0 numbers (78/2020 token budget, 9-file payload,
“8 spec invariants”, spec_v0.1.0.md references, 7-axis value vector
prose) even after the v0.4.0 release tagged 14 BLOCKERs / 21-file tarball /
8-axis value vector / metadata=94, body=4646 / first v2_tightened
ship.
Changed (Documentation)
INSTALL.md: rewritten against v0.4.0 reality — 21-file tarball (1 SKILL.md + 1 DESIGN.md + 14 references + 5 scripts),metadata=94 / body=4646token budget against the v0.4.0 v2_tightened gate, every reference from §14 / §15 / §16 / §18 / §19 / §20 / §21 / §22 / §23 enumerated under “What gets installed”, smoke-test section now lists the 14 BLOCKERs and the optional real-LLM cache replay, troubleshooting section explains themetadata_tokens=94 > 80v3_strict deferral.CONTRIBUTING.md: spec reference bumped tospec_v0.4.0.md, §4 “Required Local Checks” updated to “14 BLOCKER spec invariants” plus optionalmethod_tag_validator.py/health_smoke.py, §6 “Bumping the Spec” reflects v0.2.0+ prose reconciliation + spec-version-aware validator mapping (EXPECTED_VALUE_VECTOR_AXES_BY_SPEC,SUPPORTED_SPEC_VERSIONS), §9 “Mirror Drift Contract” lists the full 20-file public payload + 21-file tarball (with thev0.4.0mtime'2026-04-30 00:00:00 UTC'and SHA-256 sidecar refresh).install.sh+docs/install.sh(kept byte-identical): banner / help / spec URL / payload comment all bumped to v0.4.0 (default--versionwas alreadyv0.4.0inSI_CHIP_VERSION_DEFAULTsince the v0.4.0 release; only the help-text default and surrounding comments still said v0.1.0).docs/_install_body.md+docs/_userguide_body.md: full bilingual (EN + ZH) rewrite to match the newINSTALL.md/USERGUIDE.mdbodies; ZH blocks no longer claim 7-axis value vector or 8-invariant validator; both blocks now describe v0.3.0 + v0.4.0 add-on chapters §14 / §15 / §18 — §23 in the “1.7 add-ons” section.docs/architecture.md: mermaid diagram now points atspec_v0.4.0.mdand shows the 21-file source tree + 20-file mirrors + 21-file tarball; promotion-ladder section reflectsv2_tightenedship state; half-retire decision diagram now labels the 8-axis value_vector with explicitv0.4.0+provenance; new chapter §6 documents the three top-level invariants (core_goal/token_tier/promotion_state) added in v0.3.0 + v0.4.0.docs/changelog.md: full re-sync from the repo-rootCHANGELOG.md(was last synced at v0.2.0, missing v0.3.0 + v0.4.0 entries entirely).docs/_config.yml:descriptionbumped to “frozen spec v0.4.0”,versionbumped to0.4.1, Spec nav URL points atspec_v0.4.0.md.docs/index.md: ZH (<div lang="zh">) block fully ported from the v0.2.0 numbers to the v0.4.0 ship state (matches the EN block that was already updated in the v0.4.0 release).README.mdbadges bumped fromv0.4.0%20ship--eligibletov0.4.1%20ship--eligible(project version; spec / gate badges unchanged because v0.4.1 is doc-only).
Unchanged (Normative)
.local/research/spec_v0.4.0.mdis byte-identical (no Normative edits; this patch is doc-only)..rules/si-chip-spec.mdcandAGENTS.mdare byte-identical (still 13 hard rules in §13; still 14 BLOCKERs in spec_validator)..agents/skills/si-chip/SKILL.mdand the 14 references / 5 scripts are byte-identical → drift remains 0 across the three-tree mirror; no rebuild ofdocs/skills/si-chip-0.4.0.tar.gzis required (SHA-2562cfcce00f989faf2467014e638b0ea1fa67870b5a1ee6b0531942be5a4be21abremains the published artifact).
Files
- 11 modified (
README.md,USERGUIDE.md,INSTALL.md,CONTRIBUTING.md,install.sh,docs/install.sh,docs/_install_body.md,docs/_userguide_body.md,docs/architecture.md,docs/changelog.md,docs/_config.yml,docs/index.md,CHANGELOG.md); 0 new files; 0 deletions.
0.4.0 - 2026-04-30
Summary
Si-Chip v0.4.0 — “Token Economy + Real-Data Verification + Lifecycle State Machine + Health Smoke + Eval-Pack Curation + Method-Tagged Metrics + Real-LLM Runner”. 19 consecutive v1_baseline + 2 consecutive v2_tightened passes; FIRST Si-Chip release at v2_tightened (= standard) gate (vs v0.2.0 / v0.3.0 at relaxed = v1_baseline). Spec promoted rc1 → frozen with body byte-identical except metadata; AGENTS.md §13 Agent Behavior Contract grows 10 → 13 hard rules (rule 11: token_tier; rule 12: real-data-fixture-provenance; rule 13: health-smoke-when-live-backend).
Added (Normative)
- Spec §14 cross-section continuity — preserved from v0.3.0; §6.1 value_vector axes 7→8 (adds
eager_token_delta; FIRST byte-identicality break since v0.1.0 → v0.2.0 prose-count alignment, per Q4 user decision). - Spec §18 Token-Tier Invariant: top-level
token_tier {C7_eager_per_session, C8_oncall_per_trigger, C9_lazy_avg_per_load}block (besidemetricsandcore_goal; NOT inside R6 D2); EAGER-weighted iteration_delta formulaweighted_token_delta = 10×eager + 1×oncall + 0.1×lazy;lazy_manifestpackaging gate; prose_class taxonomy (Informative); R3 split intoR3_eager_only/R3_post_trigger;tier_transitionsblock oniteration_delta_report.yaml. - Spec §19 Real-Data Verification: Normative sub-step of §8.1 step 2
evaluate(main 8-step list count unchanged); 3-layer pattern (msw fixture provenance + user-install + post-recovery live verification);templates/feedback_real_data_samples.template.yaml; new BLOCKER 13REAL_DATA_FIXTURE_PROVENANCE. - Spec §20 Stage Transitions & Promotion History:
stage_transition_tableper §2.2 stage enum DAG (reverse transitions forbidden);BasicAbility.lifecycle.promotion_historyappend-only;metrics_report.yaml.promotion_statefirst-class top-level block;ship_decision.yamlbecomes the 7th evidence file whenround_kind == 'ship_prep'. - Spec §21 Health Smoke Check:
BasicAbility.packaging.health_smoke_check4-axis taxonomy{read, write, auth, dependency}; OPTIONAL at schema level, REQUIRED whencurrent_surface.dependencies.live_backend: true; new BLOCKER 14HEALTH_SMOKE_DECLARED_WHEN_LIVE_BACKEND; OTel semconv extensiongen_ai.tool.name=si-chip.health_smoke. - Spec §22 Eval-Pack Curation Discipline: 40-prompt minimum for v2_tightened promotion (curated near-miss bucket); G1
_provenance ∈ {real_llm_sweep, deterministic_simulation, mixed}first-class REQUIRED; deterministic seeding rulehash(round_id + ability_id); real-LLM cache directory at.local/dogfood/<DATE>/<round_id>/raw/real_llm_runner_cache/. - Spec §23 Method-Tagged Metrics:
<metric>_methodcompanion fields (token:{tiktoken, char_heuristic, llm_actual}; quality/routing:{real_llm, deterministic_simulator, mixed}; G1:{real_llm_sweep, deterministic_simulation, mixed});_ci_low/_ci_high95% CI bands;U1_language_breakdown;U4_state ∈ {warm, cold, semicold};spec_validator::R6_KEYSignores companion suffixes. - Spec §17 Agent Behavior Contract Add-ons: 3 new hard rules compiled into
AGENTS.mdvia.rules/si-chip-spec.mdc(rule 11: token_tier; rule 12: real-data-fixture-provenance; rule 13: health-smoke-when-live-backend); AGENTS.md §13 grows 10 → 13 rules.
Added (Tooling)
evals/si-chip/runners/real_llm_runner.py(884 LoC) + 27 tests — FIRST production real-LLM runner; unblocks T2_pass_k from deterministic SHA-256 PROXY 0.5478 lower bound (→ honest 1.0 best-cell @ Round 18). IncludesRouterFloorAdapter,AnthropicMessagesClient(rawrequests.postagainst Veil-egressed/v1/messages),RealLlmRunner.evaluate_pack/evaluate_router_matrix, cache directory per §22.6,--seal-cacheflag for CI determinism.tools/health_smoke.py(642 LoC) + 30 tests — implements §21 4-axis{read, write, auth, dependency}probe runner.tools/method_tag_validator.py(465 LoC) + 19 tests — implements §23<metric>_methodcompanion validator.tools/eval_skill.pyextended with 4 new helpers (token-tier decomposition, MCP-pretty static check, template-default-data anti-pattern detector, health-smoke runner) + G2/G3/G4 helpers; 31 → 34 tests.tools/spec_validator.pySCRIPT_VERSION0.2.0 → 0.3.0; 11 → 14 BLOCKERs (addsTOKEN_TIER_DECLARED_WHEN_REPORTED,REAL_DATA_FIXTURE_PROVENANCE,HEALTH_SMOKE_DECLARED_WHEN_LIVE_BACKEND); version-awareEXPECTED_VALUE_VECTOR_AXES_BY_SPEC(7 axes ≤ v0.3.0; 8 axes @ v0.4.0+);R6_KEYSignores method-tag companion suffixes;EVIDENCE_FILESis round_kind-aware (7 files whenround_kind == 'ship_prep', else 6);SUPPORTED_SPEC_VERSIONSaddsv0.4.0-rc1andv0.4.0.
Added (Schema/Templates)
templates/basic_ability_profile.schema.yaml$schema_version0.2.0 → 0.3.0; additively addslifecycle.promotion_history(per §20.2),current_surface.dependencies.live_backend(per §21.2),packaging.health_smoke_checkarray (per §21.1),metrics.<dim>.<metric>_method/_ci_low/_ci_highcompanion fields (per §23).templates/iteration_delta_report.template.yaml$schema_version0.2.0 → 0.3.0; additively addstier_transitionsblock (per §18.6), 8-axis value_vector witheager_token_delta(per §6.1 v0.4.0 modification), OPTIONALverdict.weighted_token_delta_v0_4_0field (per §18.2).templates/next_action_plan.template.yaml$schema_version0.2.0 → 0.3.0; additively adds sibling fieldtoken_tier_target ∈ {relaxed, standard, strict}(per §15 round_kind 工艺 extension, aligned with §4 v1/v2/v3 gate).- 5 NEW templates:
lazy_manifest.template.yaml(per §18.5),feedback_real_data_samples.template.yaml(per §19.2),ship_decision.template.yaml(per §20.4),recovery_harness.template.yaml(per §22.4),method_taxonomy.template.yaml(per §23.1). - 1 NEW Informative reference:
templates/eval_pack_qa_checklist.md(per §22.3).
Added (Documentation)
.local/research/r12_v0_4_0_industry_practice.md(712 lines; 47 R-items mapped; 34 cited sources; primary v0.4.0 evidence base)..local/research/r12.5_real_llm_runner_feasibility.md(540 lines; Stage 1 spike PROCEED_MAJOR verdict)..local/research/spec_v0.4.0-rc1.md(2304 lines; pinned historical record)..local/research/spec_v0.4.0.md(frozen; promoted from rc1; body byte-identical except metadata).- 6 NEW reference docs under
.agents/skills/si-chip/references/(mirrored across.cursor/+.claude/trees):token-tier-invariant-r12-summary.md,real-data-verification-r12-summary.md,lifecycle-state-machine-r12-summary.md,health-smoke-check-r12-summary.md,eval-pack-curation-r12-summary.md,method-tagged-metrics-r12-summary.md. - 1 NEW quickstart
.agents/skills/si-chip/scripts/real_llm_runner_quickstart.md.
Verified (Dogfood)
- Round 16 (
code_change): 4 efficiency axes ≥ +0.05; v1_baseline PASS; 16th consecutive v1 pass. - Round 17 (
measurement_only): C0 monotonicity 1.0 → 1.0 verified; FIRST non-vacuous within-v0.4.0-rc1 monotonicity witness; v1_baseline carry-forward (17th consecutive). - Round 18 (
code_change): FIRST v2_tightened PASS viareal_llm_runner.pyfirst dogfood-side invocation against claude-haiku-4-5 + claude-sonnet-4-6 via Veil litellm-local egress at127.0.0.1:8086; T2_pass_k = 1.0 best cell across mvp 8-cell matrix (vs deterministic SHA-256 PROXY floor 0.5478 since Round 1); 10/10 v2_tightened thresholds PASS; $0.20 spend; 640 calls; consecutive_v2_passes=1. - Round 19 (
code_change): SECOND consecutive v2_tightened pass via real-LLM cache replay byte-equivalence to Round 18 ($0 additional spend; 100% cache hit; 0 live calls); 19th consecutive v1_baseline pass; consecutive_v2_passes=2; per §4.2 promotion rule v0.4.0 ship gate at v2_tightened (standard) is PROMOTION ELIGIBLE EFFECTIVE Round 20.
Unchanged (Forever-Out — §11.1)
- No marketplace; no router-model training; no generic IDE compat layer; no Markdown-to-CLI converter.
- §14.6 + §18.7 + §19.6 + §20.6 + §21.6 + §22.7 + §23.7 verbatim re-affirm §11.1’s 4 forever-out items.
Files
- 17+ new files (6 reference docs × 3 trees + 5 templates + 5 tooling files + 4 docs + 1 quickstart + 4 round dirs + 2 ship artifacts); 11+ modified (SKILL.md × 3 trees + .rules/si-chip-spec.mdc + AGENTS.md + .compile-hashes.json + 3 templates + spec_validator.py + tools/eval_skill.py + install.sh + docs/install.sh + docs/_install_body.md + CHANGELOG.md); deterministic tarball
docs/skills/si-chip-0.4.0.tar.gz(SHA-2562cfcce00f989faf2467014e638b0ea1fa67870b5a1ee6b0531942be5a4be21ab; 83060 bytes; reproducible across rebuilds).
0.3.0 - 2026-04-29
Summary
Si-Chip v0.3.0 — “Core-Goal Invariant + round_kind enum” — ships as the
formal release following Round 14 (code_change) + Round 15
(measurement_only) consecutive PASSes at v1_baseline against the
v0.3.0-rc1 spec. Spec promoted rc1 → frozen with body byte-identical;
AGENTS.md §13 Agent Behavior Contract grows from 8 → 10 hard rules
(rule 9: core_goal_test_pack + C0 = 1.0; rule 10: round_kind
4-value enum). Ship gate: relaxed (= v1_baseline; same gate as
v0.2.0); v2_tightened deferred again pending real-LLM runner per
v0.2.0 known limitations.
Added (Normative)
- Spec §14 Core-Goal Invariant: BasicAbility now requires a
core_goalblock withstatement,test_pack_path, andminimum_pass_rate: 1.0(locked); top-level invariant; not R6 D8 per §14.5. - Spec §15 round_kind Enum: 4 values (
code_change | measurement_only | ship_prep | maintenance) with per-kind iteration_delta clause (strict / monotonicity_only / WAIVED / WAIVED) per §15.2; universal C0 = 1.0 + monotonicity per §15.3; consecutive-rounds promotion rule §15.4. - Spec §17 Agent Behavior Contract Add-ons: 2 new hard rules
(9: core_goal_test_pack + C0; 10: round_kind enum) compiled into
AGENTS.mdvia.rules/si-chip-spec.mdc.
Added (Informative)
- Spec §16 Multi-Ability Dogfood Layout:
.local/dogfood/<DATE>/abilities/<id>/round_<N>/(Informative @ v0.3.0; promote to Normative at v0.3.x once 2+ abilities migrate).
Added (Tooling)
tools/cjk_trigger_eval.py(586 LoC) — generic CJK-aware trigger F1 evaluator.tools/eval_skill.py(786 LoC) — generic per-ability evaluation harness (replaces 768-line ability-specific harnesses).tools/multi_handler_redundant_call.py(404 LoC) — L4 redundant-call analyzer over ALL handlers.tools/round_kind.py(220 LoC) —round_kindenum + iteration_delta clause helpers.- 4 companion test files (1138 LoC, 70 unit tests).
Added (Schema/Templates)
templates/basic_ability_profile.schema.yaml$schema_version0.1.0 → 0.2.0; new REQUIREDcore_goalblock.templates/iteration_delta_report.template.yamlextended additively withcore_goal_check+round_kind.templates/next_action_plan.template.yamlextended additively withround_kind.
Added (spec_validator)
tools/spec_validator.pySCRIPT_VERSION 0.1.4 → 0.2.0;SUPPORTED_SPEC_VERSIONSaddsv0.3.0-rc1(andv0.3.0); 9 → 11 BLOCKERs with newCORE_GOAL_FIELD_PRESENTandROUND_KIND_TEMPLATE_VALID; backward-compat preserved for v0.1.0 / v0.2.0-rc1 / v0.2.0 spec modes.
Added (Documentation)
.local/research/r11_core_goal_invariant.md(967 lines) — research brief..local/research/spec_v0.3.0-rc1.md(1216 lines) — pinned historical record..local/research/spec_v0.3.0.md— promoted frozen spec (body byte-identical to rc1; frontmatter / H1 / preamble / Reconciliation Log only)..agents/skills/si-chip/references/{core-goal-invariant,round-kind,multi-ability-layout}-r11-summary.md— 3 new reference docs (mirrored to.cursor/skills/...and.claude/skills/...)..agents/skills/si-chip/scripts/eval_skill_quickstart.md— CLI cheat-sheet (mirrored to both platform mirrors).
Verified (Dogfood)
- Round 14 (
round_kind: code_change): 6 evidence files + 4 abilities-tree files + 14 raw artifacts; C0 = 1.0 (5/5); spec_validator dual-spec 11/11 + 11/11 PASS; 14th consecutive v1_baseline pass; 2 axes ≥ +0.05 (governance_risk + generalizability). - Round 15 (
round_kind: measurement_only): 6 evidence files + 11 raw artifacts; C0 monotonicity 1.0 → 1.0 verified; spec_validator dual-spec 11/11 + 11/11 replay PASS; 15th consecutive v1_baseline pass; canonical demonstration of §15 round_kind enum in production. - Both rounds: 395 pytest passed / 1 skipped; mirrors byte-identical (V3_drift_signal = 0.0).
Unchanged (Forever-Out — §11.1)
- No marketplace; no router-model training; no generic IDE compatibility layer; no Markdown-to-CLI converter. v0.3.0 §14.6 re-affirms verbatim.
Files
- 11 new files; 8 modified files; +47 SKILL.md lines / +165 templates lines / +3134 tools LoC / +578 reference doc lines.
- Tarball:
docs/skills/si-chip-0.3.0.tar.gz(SHA-2560c3390d355f0ef794d2ba6bc94f3223e24305d523672ec95d5e7aed41b01acac; 62 343 bytes; 1 SKILL.md + 1 DESIGN.md + 8 references + 4 scripts = 14 files; deterministic build via--owner=0 --group=0 --numeric-owner --mtime=2026-04-29 --sort=name --exclude='*/__pycache__' --exclude='si-chip/scripts/test_*.py').
Ship Verdict
ship_eligible: trueship_gate_achieved: relaxed(=v1_baseline; same gate as v0.2.0 per §4.2 + §15.4 promotion rule)consecutive_v1_passes: 15(Rounds 1-15)consecutive_v2_passes: 0(T2_pass_k still pending real-LLM runner per v0.2.0 known limitation)
0.2.0 — 2026-04-28
Summary
Si-Chip v0.2.0 — “Full-taxonomy dogfood complete” — ships as the formal
release following a 13-round self-optimization cycle from the v0.1.0
baseline. R6 metric taxonomy reaches 28+ measured sub-metrics across
6 of the 7 dimensions; §6.4 reactivation detector implemented with
all 6 triggers; spec v0.1.0 reconciled to v0.2.0 (prose aligned with
§3.1 / §4.1 TABLES; Normative semantics byte-identical to v0.1.0).
Ship gate: relaxed (= v1_baseline; 13 consecutive passes).
v2_tightened promotion deferred to v0.3.0 (pending real-LLM runner
to replace the pass_k_4 = pass_rate^4 PROXY formula).
Highlights
- 13 dogfood rounds, each patch-versioned: v0.1.1 → v0.1.12 (the per-
round entries that were under
[Unreleased]before this ship are now consolidated into the §Added round-by-round summary below; the full per-round detail is preserved verbatim). - R6 metric coverage: 6 of 7 dimensions at full or near-full
sub-metric fill:
- D1 task_quality: 3/4 (T4 out-of-scope for v0.1.x)
- D2 context_economy: 5/6 (C3 null by design — on-demand loading)
- D3 latency_path: 4/7 (L5/L6/L7 require real-LLM runner — v0.3.0)
- D4 generalizability: 1/4 (G1 proxy filled; G2-G4 v0.3.0)
- D5 usage_cost: 4/4 (FULL COVERAGE)
- D6 routing_cost: 7/8 measured (R3-R8 + R5 hoist; R1/R2 require real-LLM)
- D7 governance_risk: 4/4 (FULL COVERAGE; tools/governance_scan.py)
- §6.4 Reactivation Detector (
tools/reactivation_detector.py): all 6 triggers with 31 unit tests;spec_validator.pyREACTIVATION_DETECTOR_EXISTSBLOCKER invariant. - Spec reconciliation v0.1.0 → v0.2.0: §13.4 prose aligned
with §3.1 TABLE (28 → 37 sub-metric count) and §4.1 TABLE
(21 → 30 numeric threshold cells). §3/§4/§5/§6/§7/§8/§11
Normative semantics byte-identical to v0.1.0 per
.local/dogfood/2026-04-28/round_11/raw/normative_diff_check.jsonverdictNORMATIVE_TABLES_BYTE_IDENTICAL. Spec frozen at.local/research/spec_v0.2.0.md; v0.2.0-rc1 retained at.local/research/spec_v0.2.0-rc1.mdas pinned historical record. - 16-cell intermediate router-test profile added at Round 9
(additive to the 8-cell MVP and 96-cell Full profiles; templates
bumped to
$schema_version: 0.1.1). - Installer telemetry (
tools/install_telemetry.py) validates thev0.1.1one-line installer claim: U3_setup_steps_count=1 non-interactive; U4_time_to_first_success=0.0073 s dry-run floor estimate. - Spec validator extended: 9 BLOCKER invariants (was 8);
REACTIVATION_DETECTOR_EXISTSadded in Round 12.--strict-prose-countmode now PASSES against v0.2.0 / v0.2.0-rc1 (closes v0.1.0 ship-report known-limitation). Validator acceptsv0.1.0,v0.2.0-rc1, ANDv0.2.0spec paths (backward-compat preserved). - Deterministic tarball:
docs/skills/si-chip-0.2.0.tar.gz(SHA-256cb69c4b65e11a3cfd19ddafd5065e9e266ba19d20796ebb9ef0d6f9b13be4c3b; 1 SKILL.md + 5 references + 3 scripts; same canonical layout as v0.1.0 — v0.1.12).
Ship Verdict
ship_eligible: trueship_gate_achieved: relaxed(=v1_baseline; same gate as v0.1.0)consecutive_v1_passes: 13consecutive_v2_passes: 0(T2_pass_k=0.5478 fails v2_tightened ≥ 0.55 by -0.0022 via the deterministic simulator’spass_k_4 = pass_rate^4PROXY; real k=4 sampling expected to clear — v0.3.0 target)
Known Limitations / Roadmap to v0.3.0
- Real-LLM runner for L5 detour_index, L6 replanning_rate, L7 think_act_split (D3 latency-path completion).
- G2/G3/G4 cross-domain/OOD/model-version-stability fills.
- v2_tightened gate promotion via real k=4 sampling.
- C3_resolved_tokens — only meaningful when Si-Chip resolves references eagerly (currently on-demand per §7.3); may stay null permanently or bump to a richer measurement in v0.3.x.
- R1/R2 trigger_precision / trigger_recall require real-LLM per-prompt routing-descriptor-match confidence (no §4.1 hard threshold; tracked for v0.3.0).
Added
- Round 13 (v0.1.12) dogfood: SHIP-PREP REVERT-ONLY round per L0 PATH decision. The Round 12 7th-case experiment (
evals/si-chip/cases/reactivation_review.yaml) regressedT2_pass_kfrom Round 11’s 0.5478 to Round 12’s 0.4950 (-0.0528) under the deterministic SHA-256 simulator (per-casepass_rate=0.65× pass_k_4=pass_rate^4 PROXY). Round 13 reverts the 7th case + Round-12-specific baselines and restoresT2_pass_k = 0.5477708333333333byte-identical to Round 11. KEPT from Round 12:tools/reactivation_detector.py+ 31 unit tests +tools/spec_validator.py REACTIVATION_DETECTOR_EXISTSBLOCKER (still 9/9 PASS). The Round 12 evidence files at.local/dogfood/2026-04-28/round_12/are RETAINED unchanged as honest negative-result historical record. - All 6 evidence files at
.local/dogfood/2026-04-28/round_13/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (revert_diff.jsonenumerating every removed/kept/modified file;round_11_vs_round_13_metrics_diff.jsonproving 37/37 sub-metric byte-identity vs Round 11;aggregator_raw_output.yamlfull live-derivation trace;spec_validator.json9/9 PASS default-mode;governance_scan.jsonV1-V4 + provenance;install_telemetry.jsonU3-U4;r4_near_miss_FP_rate_derivation.json6-case re-derivation;notes.mdround narrative). All 6 metric-bearing yaml files compose the §8.2 minimum evidence set. .local/dogfood/2026-04-28/v0.2.0_ship_decision.yamlv2 (OVERWRITES Round 12 v1):verdict: SHIP_ELIGIBLE,ship_eligible: true,ship_gate_achieved: relaxed(= v1_baseline; same gate as v0.1.0 ship),ship_gate_v2_tightened_deferred_to: v0.3.0,consecutive_v1_passes: 13,round_12_regression_resolved_in_round_13: true. Thev2_tightened_threshold_check.round_11/round_12sections are PRESERVED verbatim as evidence of the honest PATH A attempt; the newround_13section shows T2_pass_k recovered to 0.5478 byte-identical to Round 11. Fullv1_baseline_ship_verdictblock emitted with all 13 consecutive v1 passes traced..local/dogfood/2026-04-28/v0.2.0_ship_report.mdv2 (OVERWRITES Round 12 v1): the SHIP_ELIGIBLE narrative — exec summary, 13-round story table, what Round 13 delivered (revert + KEPT-from-Round-12 + iteration-delta clause via genuine recovery), why v2_tightened is structurally deferred to v0.3.0 (the deterministic simulator’spass_k_4 = pass_rate^4PROXY is a lower bound; real-LLM runner with k=4 sampling is the unblock), v0.3.0 roadmap (a1 real-LLM runner; a2 G2/G3/G4 fills; a3 L5/L6/L7 fills; a4 v2_tightened promotion), known limitations carry-forward (10 items, all v0.3.x candidates), Round 12 honest negative-result preservation, acknowledgments + ship verdict.docs/skills/si-chip-0.1.12.tar.gzdeterministic release tarball (SHA-256b0bb00166a660d4cba82c375d5d8d3778b02b7618e2ede271c8ad762ce21a400; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner --exclude='__pycache__' --exclude='test_*.py' --exclude='DESIGN.md' -czftwice yielding identical hashes; same canonical layout as v0.1.0 through v0.1.11: 1 SKILL.md (frontmatter version 0.1.12) + 5 references + 3 scripts).- Round 12 (v0.1.11) dogfood: §6.4 reactivation-trigger detector lands as
tools/reactivation_detector.py— KEPT through Round 13 revert (the §6.4 detector is a Normative-spec implementation; only the experimental 7th eval case + Round-12-specific baselines were reverted). (NEW;SCRIPT_VERSION = 0.1.0) implementing all 6 spec §6.4 triggers verbatim by their canonical IDs (new_model_no_ability_baseline_gap,new_scenario_or_domain_match,router_test_requires_ability_for_cheap_model,efficiency_axis_becomes_significant,upstream_api_change_wrapper_stabilizes,manual_invocation_rebound) plus a top-leveldetect_reactivation(profile, decision, metrics)orchestrator. CLI surface:python tools/reactivation_detector.py --check <profile.yaml> --jsonexits 0 for clean keep, 0 for valid half_retired wake-up, 2 for unexpected fire on a keep ability (per workspace rule “No Silent Failures”). Triggers 2/5/6 are parameterised-off in v0.1.11 because no scenario catalog / wrapper-stability log / manual-invocation log exists yet — the contract is in place for future rounds to opt-in. Against Round 12 evidence the CLI reportstriggered_count=0(decision=keep; clean exit 0). tools/test_reactivation_detector.py(NEW): 31 unit tests covering each of the 6 triggers’ positive + negative path + threshold boundary cases, the integration viadetect_reactivationon the real Round-11 evidence triple (clean keep → 0 triggers fired) and on a synthetic half_retired profile (router_floor drop fires trigger 3), and the CLI exit-code matrix (0 for clean keep, 0 for valid wake-up on half_retired, 2 for unexpected fire on a keep ability).tools/spec_validator.pyextended with a 9th BLOCKER invariantREACTIVATION_DETECTOR_EXISTS(SCRIPT_VERSION 0.1.2 → 0.1.3) asserting thattools/reactivation_detector.pyexists, references all 6 §6.4 trigger IDs verbatim in its source, and ships with a sibling test file containing at least 1 test per trigger.--jsondefault-mode now emits 9/9 PASS (was 8/8 pre-Round-12);--strict-prose-countagainst v0.2.0-rc1 also 9/9 PASS (the new BLOCKER is independent of the strict-prose-count mode). Backward-compat preserved: validator still accepts both v0.1.0 and v0.2.0-rc1 spec paths.tools/test_spec_validator.pyextended with 4 new tests: real-repo passes the new BLOCKER; missing detector file FAILs; missing trigger ID in detector FAILs; missing test file FAILs. Existingtest_default_mode_8_of_8_passrenamed totest_default_mode_9_of_9_passto reflect the new invariant count.evals/si-chip/cases/reactivation_review.yaml(NEW): 7th eval case with 20 prompts (10should_triggercovering the §6.4 detector workflow + 10should_not_triggernear-miss prompts) testing the newtools/reactivation_detector.pyworkflow. Per-casepass_rate=0.65under the deterministic SHA-256 simulator withseed=42— this lowers the cross-caseT2_pass_kfrom Round 11’s 0.5478 to Round 12’s 0.4950 (regression of -0.0528). Documented honestly in.local/dogfood/2026-04-28/round_12/raw/v2_tightened_approach.mdper the L3 task brief’s explicit “I want HONESTY here, not number-chasing” instruction; we did NOT cherry-pick the seed, did NOT cherry-pick the case_id, did NOT editprompt_outcomes, and did NOT manually bump per-case pass_rate.evals/si-chip/baselines/with_si_chip_round12/andevals/si-chip/baselines/no_ability_round12/(NEW): 7-case regenerated baselines underseed=42. 7-case T1=0.821 (was 0.85 with 6 cases; razor-thin pass v2_tightened ≥ 0.82 by +0.001), T2=0.495 (was 0.5478; FAIL v2_tightened ≥ 0.55 by -0.055), T3=0.321 (was 0.35), R3=0.873 (was 0.893; still PASS v2_tightened ≥ 0.85 by +0.023), R4=0.0429 (was 0.05; better — new case has 0 FPs), U2=0.686 (was 0.75), L1=1.20 (was 1.22), L2=1.47 (was 1.47; trivial movement), R7=0.0234 (was 0.0233).- v2_tightened readiness check across Round 11 + Round 12: BLOCKED.
consecutive_v2_passes = 0— neither round individually clears every v2 hard threshold. Round 11 fails on T2_pass_k (0.5478, -0.0022) AND iteration_delta any axis (+0.05, -0.05; governance_risk drift-removal at v1_baseline bucket only). Round 12 fails on T2_pass_k (0.4950, -0.055; REGRESSION vs Round 11). Per spec §4.2 promotion rule, v0.2.0 ship at v2_tightened gate is BLOCKED. See.local/dogfood/2026-04-28/round_12/raw/v2_readiness_verdict.mdfor the full per-row pass/fail trace andraw/v2_tightened_round_11_check.md+raw/v2_tightened_round_12_check.mdfor the per-round tables. .local/dogfood/2026-04-28/v0.2.0_ship_decision.yaml(NEW): emitted withverdict: SHIP_BLOCKED,ship_eligible: false,ship_gate_attempted: standard(= v2_tightened),consecutive_v2_passes: 0. Two paths to v0.2.0 ship pre-specified: PATH A (recommended) = real-LLM runner upgrade in Round 13 + v2_tightened verification in Round 14 (naturally closes T2_pass_k blocker because real sampling at k=4 is typically higher than thepass_rate^4PROXY); PATH B (alternative) = ship at v1_baseline as v0.2.0 with documentation (12 consecutive v1_baseline passes Rounds 1-12; same gate as v0.1.0 ship). L0 orchestrator chooses..local/dogfood/2026-04-28/v0.2.0_ship_report.md(NEW): companion tov0.2.0_ship_decision.yaml; executive summary, what Round 12 delivered (§6.4 detector + spec_validator extension + 7th case), per-round v2_tightened trace, both ship-paths’ next steps, known-limitations carry-forward (10 items: deterministic simulator, G1 partial proxy, G2/G3/G4 null, L5/L6/L7 null, R1/R2 null, R8 tfidf-cosine-mean parallel, C5 heuristic, U4 dry-run floor, V2 pattern-only, §6.4 trigger 2/5/6 catalogs not yet seeded), spec §11 forever-out compliance verbatim.- All 6 evidence files at
.local/dogfood/2026-04-28/round_12/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (reactivation_check.json= detector CLI output:triggered_count=0decision=keep;spec_validator.json= 9/9 PASS default-mode;governance_scan.json+install_telemetry.jsonre-run for Round 12;aggregator_raw_output.yaml= full live-derivation trace;v2_tightened_round_11_check.md+v2_tightened_round_12_check.md+v2_tightened_approach.md+v2_readiness_verdict.md= the four-document v2 verdict trail;r4_near_miss_FP_rate_derivation.json= re-derived for 7-case mean;notes.md+aggregate_eval.log). docs/skills/si-chip-0.1.11.tar.gzdeterministic release tarball (SHA-25637d3c0ecebec52a6494d581f1fd1fa187ebc273a1e09fd3e96128bbf9a101bcc; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner --exclude='__pycache__' --exclude='test_*.py' --exclude='DESIGN.md' -czftwice yielding identical hashes; same canonical layout as v0.1.0 through v0.1.10: 1 SKILL.md (frontmatter version 0.1.11) + 5 references + 3 scripts).- Round 11 (v0.1.10) dogfood: spec v0.1.0 → v0.2.0-rc1 reconciliation. The §13.4 prose is now aligned with the §3.1 TABLE (28 → 37 sub-metrics) and the §4.1 TABLE (21 → 30 numeric threshold cells). §3 / §4 / §5 / §6 / §7 / §8 / §11 Normative semantics are byte-identical to v0.1.0 per
.local/dogfood/2026-04-28/round_11/raw/normative_diff_check.jsonverdict=NORMATIVE_TABLES_BYTE_IDENTICAL— every Normative table row (§3.1 sub-metric IDs / dimension assignments / MVP-8 annotations, §4.1 threshold values + monotonicity direction, §5.1 work-surfaces 5 / §5.2 prohibitions 4 / §5.3 harness, §5.4 router-profile↔gate binding 3, §6.1 value_vector 7 axes / §6.2 decision rules 4 / §6.3 minimum-fields / §6.4 reactivation triggers 6, §7.1 source-of-truth path / §7.2 platform priority 3 / §7.3 packaging gate 5 / §7.4 marketplace boundary, §8.1 8-step order / §8.2 6-evidence list / §8.3 multi-round rule, §11.1 forever-out 4 / §11.2 deferred 4 / §11.3 boundary guard) is unchanged; only§3 intro+§3.1 heading+§3.2 item 2+§8.1 step 3+§8.2 evidence #2+§13.3 bullet 1+§13.4 sub-metric+§13.4 thresholdInformative prose-count integers moved. tools/spec_validator.pyextended to accept BOTH v0.1.0 and v0.2.0-rc1 spec paths (backward-compat preserved):DEFAULT_SPECflips to.local/research/spec_v0.2.0-rc1.md,SCRIPT_VERSION 0.1.1 → 0.1.2, per-specEXPECTED_R6_PROSE_BY_SPEC+EXPECTED_THRESHOLD_CELLS_PROSE_BY_SPECmaps (v0.1.0 = 28+21,v0.2.0-rc1 = 37+30) auto-selected from spec frontmatter viadetect_spec_version. The_threshold_prose_numbers_in_section13helper is narrowed to the §13.4 subsection only (stops at the next###/##heading) so the v0.2.0-rc1 reconciliation-log appendices’ historical “§4 阈值表 21 个数” strings are correctly ignored.--strict-prose-countnow PASSES 8/8 against v0.2.0-rc1 — closes the v0.1.0 ship-report known limitation that previously expectedR6_KEYS+THRESHOLD_TABLEstrict-prose failures (reconciliation sentinel against v0.1.0 is preserved by design: v0.1.0 prose 28/21 still mismatches template TABLE 37/30, so--strict-prose-count --spec .local/research/spec_v0.1.0.mdstill FAILs intentionally).--jsondefault-mode 8/8 PASS unchanged for both v0.1.0 (backward-compat) and v0.2.0-rc1 (post-Round-11 default)..rules/si-chip-spec.mdc+.rules/.compile-hashes.json+AGENTS.mdrecompiled from the v0.2.0-rc1 spec; drift-detection (.rules/check-rules-drift) clean. Before/after compile-hash snapshots recorded in.local/dogfood/2026-04-28/round_11/raw/rules_compile_hashes_before.json+rules_compile_hashes_after.json..local/research/spec_v0.2.0-rc1.md(NEW spec RC document): full §1-§13 + appendices + Reconciliation Log (a) Prose-count change table, (b) Frozen target counts, (c) Round 11 evidence file index, (d) Normative byte-identity declaration, (e) Backward-compat contract, (f) Post-RC ship-gate path. Templates’$schema_versionleft UNCHANGED (no field semantics moved; backward-compat preserved so Round 1-10 evidence files continue to validate). v0.2.0 final remains gated to Round 12 v2_tightened readiness verify per spec §4.2.- All 6 evidence files at
.local/dogfood/2026-04-28/round_11/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (normative_diff_check.jsonwith verdict=NORMATIVE_TABLES_BYTE_IDENTICAL,rules_compile_hashes_before/after.json,spec_validator.json= 8/8 PASS default-mode against v0.2.0-rc1,spec_validator_strict_prose.json= 8/8 PASS strict-prose-mode against v0.2.0-rc1,spec_validator_v0.1.0.json= 8/8 PASS default-mode against v0.1.0 backward-compat, re-derivedr4_near_miss_FP_rate_derivation.json,notes.md,aggregate_eval.log,aggregator_raw_output.yaml). docs/skills/si-chip-0.1.10.tar.gzdeterministic release tarball (SHA-25623fb3b20f066ac59a2df81fb3f377a0ef7b3d13c80f15ddfecd9b9e17739ff19; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner --exclude='__pycache__' --exclude='test_*.py' --exclude='DESIGN.md' -czftwice yielding identical hashes; same canonical layout as v0.1.0 through v0.1.9: 1 SKILL.md (7998 bytes; +283 vs v0.1.9’s 7715 from the spec_v0.2.0-rc1 references and 28 → 37 prose tweak) + 5 references + 3 scripts).- Round 10 (v0.1.9) dogfood:
G1_cross_model_pass_matrixnow hoisted intometrics_report.yaml; D4 generalizability dimension reaches 1/4 populated (G1) + 3/4 explicit null (G2/G3/G4) for v0.1.x (was 0/4 measured through Round 9). G1 ={composer_2: {trigger_basic: 0.9, near_miss: 0.83}, sonnet_shallow: {trigger_basic: 0.88, near_miss: 0.81}}— a 2-model × 2-pack nested dict collapsed from the mvp 8-cell router-test sweep viaaggregate_eval.hoist_g1_cross_model_pass_matrix(Round 10 Edit A) usingmax(pass_rate)acrossthinking_depthper(model, scenario_pack). PARTIAL PROXY DISCLOSURE: G1 from 2-model × 2-pack is a v0.1.x partial proxy — the authoritative G1 requires real-LLM runs against the full 96-cell profile (§5.3 full: 6 model × 4 depth × 4 pack); v2_tightened+ promotion gate ties G1 to the full matrix (Round 12 readiness check will revisit). The max-across-depths aggregation rule is MONOTONE under adding more depths / models, so the hoist is forward-compatible with the 96-cell upgrade. G2_cross_domain_transfer_pass, G3_OOD_robustness, G4_model_version_stability stay explicit null by scope for v0.1.x per master plan.local/.agent/active/v0.2.0-iteration-plan.yaml#round_10— cross-domain / OOD / model-version-stability require real-LLM runner upgrades that are v0.3.x+ work (Round 11 spec reconciliation decides whether they formally ship in v0.2.0 or defer). R1/R2 also remain null for the same v0.1.x-scope reason. Range sanity[0.0, 1.0]PASS for all 4 G1 cells. evals/si-chip/golden_set/(NEW): opt-in source fornines self-eval --golden-dircoverage of the D01 scoring_accuracy / D02 eval_coverage / D03 samples_dir_exists / D05 golden_set_path signals that Round 3’s nines run flagged at 0. Contents:trigger_basic.yaml(12 prompts; allexpected: trigger),near_miss.yaml(12 prompts; allexpected: no_trigger), andREADME.md(schema + non-authoritative disclosure + §11 compliance check). All 24 prompts are verbatim subsets of the existingevals/si-chip/cases/*.yamlshould_trigger/should_not_triggerentries — 0 new prompts were authored. Source-case distribution: trigger_basic draws 2 prompts each from profile_self + metrics_gap + router_matrix + half_retire_review + next_action_plan + docs_boundary (covers all 6 cases); near_miss draws 3 prompts from profile_self + 2 each from metrics_gap + router_matrix + half_retire_review + 3 from next_action_plan (covers 5 of 6 cases; docs_boundary is skipped because itsshould_not_triggerentries are legitimate in-scope operations, not true near_miss prompts). The deterministic runner does NOT consume these YAMLs this round; they are opt-in for futurenines --golden-dir evals/si-chip/golden_set/invocations and a downstream real-LLM eval harness (v0.3.x+ upgrade path)..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_g1_cross_model_pass_matrix(router_floor_report, *, packs)with permissive-default error handling — accepts BOTH flat top-levelcells:(Round 1-8 legacy shape + test fixtures) AND nestedmvp_profile.cells:(Round 9+ emitted shape), threads the pre-existingrouter_floor_reportparameter throughbuild_reportto populatemetrics.generalizability.G1_cross_model_pass_matrix, addsg1_derivationsection to provenance (collapse rule + per-cell trace + partial-proxy disclosure + spec §5.1 compliance note).SCRIPT_VERSION 0.1.5 → 0.1.6(Round 10 Edit A).build_reportdocstring updated to document the Round 10 hoist.--router-floor-reportCLI flag help text extended to mention the Round 10 G1 hoist. Sibling test extended with 11 new tests (120 total; was 109 before Round 10) —HoistG1Tests(7 tests: happy path on mvp 8-cell sweep, nested mvp_profile shape, None report → None, empty cells → None, malformed cells skipped with skip counter, range invariant all cells in[0.0, 1.0], pack-filter narrowing shape) +BuildReportG1Integration(4 tests: end-to-end build_report populates G1 with expected 2-model × 2-pack shape +g1_derivationin provenance, null router_floor_report keeps G1 null + G2/G3/G4 null, G1 populated coexists with G2/G3/G4 null, 28-key invariant preserved after G1 fill with D4 exactly 1/4 populated).- All 6 evidence files at
.local/dogfood/2026-04-28/round_10/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (g1_derivation.jsonwith full 2×2 matrix + per-cell max derivation + partial-proxy disclosure,golden_set_index.jsonenumerating file paths + per-pack prompt counts + source-case distribution + §11 compliance check, re-derivedr4_near_miss_FP_rate_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json= 8/8 PASS unchanged from Round 9). docs/skills/si-chip-0.1.9.tar.gzdeterministic release tarball (SHA-2566b4af14cbadae74ed7850331d642777a91006c0ea33140feb437b67523654f48; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner --exclude='__pycache__' --exclude='test_*.py' --exclude='DESIGN.md' -czftwice yielding identical hashes; same canonical layout as v0.1.0 through v0.1.8: 1 SKILL.md + 5 references + 3 scripts; 46,803 bytes).- Round 9 (v0.1.8) dogfood:
R8_description_competition_indexnow hoisted intometrics_report.yaml; D6 routing-cost dimension reaches 6/8 populated (R3+R4+R5+R6+R7+R8) + 2/8 permanently out-of-scope (R1+R2) for v0.1.x. R8 = 0.043478260869565216 (count_tokens.skill_md_description_competition_indexwithmethod="max_jaccard"; max Jaccard across 23/root/.claude/skills/lark-*/SKILL.mdneighbor descriptions; top offender =lark-whiteboard-cliat 1/23 shared token; byte-identical to C6 by construction — same formula family on same neighbor set; cross-validated inraw/r8_derivation.json#c6_vs_r8_consistency_check). R1/R2 stay null because they require per-prompt routing-descriptor-match confidence that the deterministic simulator does not produce (no §4.1 hard threshold on either; real-LLM confusion-matrix upgrade path documented for Round 12). Range sanity[0.0, 1.0]PASS.method="tfidf_cosine_mean"is ALSO implemented and unit-tested as an ALTERNATIVE surfacing AVERAGE competition across the neighbor set (complementary to max_jaccard WORST-offender signal); Round 9 choosesmax_jaccardas default per master plan risk_flag (Jaccard infra battle-tested in Round 7 C6). D6 is now at the v0.1.x FINAL COVERAGE STATE — R1/R2 are permanently out-of-scope for v0.1.x; R8 fill closes the last in-scope D6 gap. - Round 9 router-test matrix widening (ADDITIVE):
templates/router_test_matrix.template.yaml$schema_versionbumped 0.1.0 → 0.1.1. ADDITIVEintermediateprofile (16 cells = 2 model × 2 thinking_depth × 4 scenario_pack) added alongsidemvp:8andfull:96(BOTH profiles STRUCTURALLY UNCHANGED; only added a new sibling).intermediateis metadata-retrieval / kNN-heuristic widening per spec §5.1 surface 1+2 — explicitly NOT §5.2 router-model training (forbidden in perpetuity per §11.1).gate_binding: relaxed(same as mvp; NOT a §5.4 binding escalation — v2_tightened+ still requires the full 96-cell matrix per §5.3). The 8 new scenario cells covermulti_skill_competition(HARDER than trigger_basic; pass_rates 0.70-0.82) andexecution_handoff(in-between; pass_rates 0.78-0.85) from the spec §5.3 full-profile scenario_pack set; values hard-coded deterministic anchors inevals/si-chip/runners/with_ability_runner.py::INTERMEDIATE_EXTRA_CELL_OUTCOMES(no LLM invocation; no RNG; byte-identical across rebuilds).cell_countsupdated to{mvp: 8, intermediate: 16, full: 96}preserving the2*2*2=8and6*4*4=96identities. evals/si-chip/runners/with_ability_runner.pyaddssimulate_router_sweep(profile_name)andintersection_router_floor(a, b)helpers (Round 9 Edit E; SCRIPT_VERSION0.1.0 → 0.1.1).simulate_router_sweepemits deterministic 8-cell mvp or 16-cell intermediate sweeps via a sharedMVP_CELL_OUTCOMESdict (ensuring the 8 cells that overlap with mvp are BYTE-IDENTICAL across profiles) plus the 8 newINTERMEDIATE_EXTRA_CELL_OUTCOMESentries.intersection_router_floor(mvp, intermediate)returns the cheapest(model, depth)tuple that passes on BOTH profiles — for Round 9, both converge oncomposer_2/default.tools/spec_validator.pycheck_router_matrix_cellsnow accepts BOTH$schema_version: "0.1.0"and"0.1.1"(backward compat;SUPPORTED_ROUTER_TEMPLATE_SCHEMAS = {"0.1.0", "0.1.1"}). On 0.1.1, additionally asserts the intermediate invariants:cell_counts.intermediate == 16,profiles.intermediate.cells == 16,profiles.intermediate.gate_binding == "relaxed"(same as mvp; explicit check so a §5.4 binding escalation disguised as an additive change can NOT sneak through), andlen(models) * len(thinking_depths) * len(scenario_packs) == 16(axis-product cross-check).SCRIPT_VERSION 0.1.0 → 0.1.1.--jsondefault-mode still exits 0 with 8/8 PASS unchanged.tools/test_spec_validator.py(NEW; 7 unit tests covering: (1) real template @ 0.1.1 happy path with full intermediate invariants asserted, (2) axis-product 2×2×4=16 sanity check, (3) synthetic 0.1.0 template backward-compat passes without intermediate block, (4) wrong intermediate cell count (12 instead of 16) fails BLOCKER, (5) wrong intermediategate_binding="standard"fails (catches attempted §5.4 binding escalation), (6) unsupported$schema_version="0.2.0"fails, (7) end-to-endtools/spec_validator.py --jsonsubprocess 8/8 PASS with ROUTER_MATRIX_CELLS reporting schema 0.1.1)..agents/skills/si-chip/scripts/count_tokens.pyaddstokenize_description_list(text, stopwords)(list variant preserving duplicates for TF-IDF),tf_idf_vector(tokens, corpus)(sklearn-style smoothed TF-IDF:IDF = ln((1+N)/(1+df))+1; deterministic sorted iteration; empty input / empty corpus return{}),cosine_similarity(a, b)(sparse dict cosine; returns 0.0 on empty/zero-norm vectors — no div-by-zero), andskill_md_description_competition_index(skill_md_path, neighbor_skill_md_paths, *, method="max_jaccard", stopwords)with bothmethod="max_jaccard"(default; reuses Round 7 Jaccard infra) andmethod="tfidf_cosine_mean"(ALTERNATIVE AVERAGE-competition signal) (Round 9 Edit A). Helpers raiseValueErroron empty neighbor list or unknown method (workspace “No Silent Failures”).SCRIPT_VERSION 0.1.2 → 0.1.3. Sibling test extended with 25 new tests (90 total; was 65 before Round 9) — 3TokenizeDescriptionListTests, 6TfIdfVectorTests(determinism, empty inputs, rarer-term-higher-idf, sorted keys, all-positive weights), 7CosineSimilarityTests(identical/disjoint/partial/empty/zero-norm/range invariant), 9SkillMdDescriptionCompetitionIndexTests(max_jaccard default + tfidf_cosine_mean deterministic + partial overlap + empty/unknown-method raises + missing base raises + missing neighbor logged + REAL Si-Chip smoke for both methods verifying R8 == C6 byte-identically when method=max_jaccard)..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_r8_description_competition_index(skill_md_path, neighbor_skill_md_paths, *, method, strict)with permissive-default error handling (ValueError/FileNotFoundError/unexpected →(None, {"error": ...})unlessstrict=True), threadsr8_methodparameter throughbuild_report, adds--r8-method {max_jaccard, tfidf_cosine_mean}CLI flag (defaultmax_jaccard), reuses existing--neighbor-skills-glob/--skill-md/--references-dirflags so a single invocation populates BOTH C6 and R8 consistently. Provenance block gainsr8_derivation(method name + note + per-neighbor similarity + chosen value + spec §5.1 compliance string).SCRIPT_VERSION 0.1.4 → 0.1.5(Round 9 Edit B). Sibling test extended with 8 new tests (109 total; was 101 before Round 9) —HoistR8Testscovering max_jaccard happy path, tfidf_cosine_mean path, None skill_md → None, empty neighbor list → None, unknown method → None (not raise), build_report populates R8 +r8_derivationin provenance, 28-key invariant preserved after R8 fill, Round-8-compat code path (no neighbors) leaves R8 null.- All 6 evidence files at
.local/dogfood/2026-04-28/round_9/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (r8_derivation.jsonwith full per-neighbor Jaccard trace + C6-vs-R8 cross-validation check,router_sweep_intermediate.jsonwith full 16-cell intermediate sweep + intersection floor derivation, re-derivedr4_near_miss_FP_rate_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json= 8/8 PASS). docs/skills/si-chip-0.1.8.tar.gzdeterministic release tarball (SHA-2560263e4330806cbae1bf5aef28496b35ed3068b16902745ba9c09d518e45f0ba1; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner --exclude='__pycache__' --exclude='test_*.py' -czftwice yielding identical hashes; same canonical layout as v0.1.0 through v0.1.7: 1 SKILL.md + 5 references + 3 scripts at the top level).- Round 8 (v0.1.7) dogfood:
V1_permission_scope,V2_credential_surface,V3_drift_signal, andV4_staleness_daysnow hoisted intometrics_report.yaml; D7 governance-risk dimension reaches 4/4 measured (FULLY COMPLETE) atv1_baseline(was 0/4 through Round 7). D7 is the third R6-taxonomy dimension to reach full sub-metric coverage (after D5 in Round 6 and D2 in Round 7) and the first to hit 4/4 with NO by-design-null cells — all four sub-metrics derive from deterministic static inputs. V1 = 0 (governance_scan.scan_permission_scopewalks the skill’s Python + shell source for hardcoded absolute write paths OUTSIDE.local/dogfood/,.agents/skills/si-chip/,.cursor/skills/si-chip/,.claude/skills/si-chip/; Si-Chip scripts route all writes through caller-provided--outarguments → 0 clean). V2 = 0 (governance_scan.scan_credential_surfaceruns 4 pattern regexes —aws_access_key,generic_high_entropy_40,pem_private_key,credential_assignment— against every skill artifact body; CRITICAL invariant: scanner NEVER logs the matched value verbatim — only pattern name + file path + per-file count; unit-test verified intools/test_governance_scan.py::test_must_not_log_secret_valuewhich captures log output and asserts the secret body never leaks). V3 = 0.0 (governance_scan.scan_drift_signalcomputes SHA-256 byte-equality ratio across the 3 SKILL.md mirrors; all 3 byte-equal post-0.1.7 sync → drift_zero_ratio = 3/3 = 1.0 → V3 = 0.0; matches v0.1.0 ship report ALL_TREES_DRIFT_ZERO verdict). V4 = 0 (governance_scan.scan_staleness_daysparsesbasic_ability_profile.lifecycle.last_reviewed_at= 2026-04-28; today = 2026-04-28 → delta = 0 days same-day review). - Round 8 audit-gap closure:
half_retire_decision.yamlvalue_vector.governance_risk_deltais now DERIVED LIVE viagovernance_scan.compute_governance_risk_delta(V1, V2, V3, V4)rather than the hard-coded0.0literal Rounds 1-7 used. Formula:risk_with = min(V1, 1)*0.25 + min(V2, 1)*0.25 + V3*0.25 + min(V4/30, 1)*0.25; risk_without = 0.0 (no-ability baseline has no filesystem/credential/mirror interaction); governance_risk_delta = risk_without - risk_with. For Si-Chip v0.1.7 with V1=V2=V3=V4=0 (clean baseline),risk_with = 0.0and the delta is still numerically 0.0 — BUT the number now TRACES through a computable function, not a literal. Any future V1/V2/V3/V4 regression moves governance_risk_delta toward negative automatically (V1 rising to 1 moves it to -0.25; V1 + V2 rising to 1 each moves it to -0.50 which approaches the spec §6.2disable_auto_triggerthreshold). See.local/dogfood/2026-04-28/round_8/half_retire_decision.yaml#governance_risk_delta_derivationfor the full live-derivation trace. tools/governance_scan.py(NEW):scan_permission_scope(repo_root, skill_paths, allowed_prefixes)with Python write-call + shell redirect regex scanners;scan_credential_surface(repo_root, skill_paths, extra_artifacts)with 4 credential-pattern regexes and a log-values-never guarantee;scan_drift_signal(skill_mirrors)with SHA-256 pairwise byte-equality ratio;scan_staleness_days(basic_ability_profile_path, today)with ISO-8601 date parsing + future-date guard;build_governance_report(...)composing all 4 into the payload shape consumed byaggregate_eval.py --governance-report;compute_governance_risk_delta(V1, V2, V3, V4, v4_staleness_cap_days=30)the live-derivation helper replacing the hard-coded 0.0; CLI surface--repo-root,--skill-path,--skill-mirror,--basic-ability-profile,--today,--json,--verbose. Script version 0.1.0. SAFETY: V2 scanner never logs the matched value — only pattern name + file + count (enforced by unit test).tools/test_governance_scan.py(NEW): 35 unit tests covering 7 ScanPermissionScopeTests (empty → 0, real si-chip tree → 0, positive-count/etc/write fixture, missing-path raises, in-scope.local/dogfood/write not counted, relative path in-scope, mirror target in-scope), 8 ScanCredentialSurfaceTests (empty → 0, real si-chip tree → 0, AWS key fixture, PEM fixture, assignment fixture, MUST-NOT-log-secret-value CRITICAL test that captures log output and asserts secret body never leaks, missing-path raises, extra_artifacts deduped), 7 ScanDriftSignalTests (2 identical → 0, 3 identical → 0, one divergent → 2/3, real si-chip mirrors → 0, <2 mirrors raises, missing mirror raises, range invariant), 7 ScanStalenessDaysTests (same-day → 0, 30-day → 30, missing file raises, malformed YAML raises, missing last_reviewed_at raises, bad ISO date raises, future date raises), 2 BuildGovernanceReportTests (real si-chip all-zero + provenance keys, CLI JSON round-trip), 4 ComputeGovernanceRiskDeltaTests (all-zero → 0, V1=1 → -0.25, worst-case clamped to -1.0, v4_staleness_cap validation). All 35 tests PASS..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_v1_permission_scope,hoist_v2_credential_surface,hoist_v3_drift_signal,hoist_v4_staleness_dayshoisters plus a newgovernance_reportparameter onbuild_reportand a new--governance-reportCLI flag +_maybe_load_governance_reporthelper (Round 8 Edit B). Provenance block now includesv1_derivation,v2_derivation,v3_derivation,v4_derivationsections (all four include aloadedboolean that is True only when the aggregator received a governance_report; degenerate path →loaded: Falseper spec §3.2 explicit-null contract + workspace “No Silent Failures” rule). Script version bumps0.1.3 → 0.1.4. Sibling test extended with 18 new tests (98 total; was 80 before Round 8) — 5 HoistV1Tests, 4 HoistV2Tests, 4 HoistV3Tests, 4 HoistV4Tests, 4 BuildReportV1V4Integration (includingtest_round_8_keeps_28_key_invariantthat verifies the 37-key aggregator contract after Round 8,test_full_instrumentation_populates_v1_v2_v3_v4that verifies all four land populated when the report is present,test_missing_governance_report_keeps_v1_v4_nullthat verifies the Round 7 code path stays null-safe, andtest_governance_risk_delta_derived_live_not_hardcodedthat verifies the Round 8 acceptance criterion #5 contract).- All 6 evidence files at
.local/dogfood/2026-04-28/round_8/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (governance_scan.jsonwith full V1-V4 + provenance,c5_c6_carryover.jsondocumenting byte-identical carry-over of C5/C6 from Round 7, re-derivedr4_near_miss_FP_rate_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json). docs/skills/si-chip-0.1.7.tar.gzdeterministic release tarball (SHA-256718f46756801687c2ec5448dc00260361177fba33e526130d368583a63bbdc19; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner --exclude='__pycache__' --exclude='test_*.py' -czftwice yielding identical hashes; same canonical layout as v0.1.0 through v0.1.6: 1 SKILL.md + 5 references + 3 scripts; 38,500 bytes).- Round 7 (v0.1.6) dogfood:
C5_context_rot_riskandC6_scope_overlap_scorenow hoisted intometrics_report.yaml; D2 context-economy dimension reaches 5/6 measured + 1 by-design null (FULL MEASUREMENT-ATTEMPT COVERAGE) atv1_baseline(was 3/6 measured after Round 6; C3 resolved_tokens stays permanently null by design — on-demand reference loading has no deterministic static value). C5 = 0.2601 (deterministic heuristic proxy; formulaclip(body_tokens/typical_window + 0.05*fanout_depth, 0, 1)=clip(2020/200_000 + 0.05*5, 0, 1); body_tokens=2020 fromcount_tokens(SKILL.md body), typical_window=200_000 matching Sonnet 4.6 baseline perreferences/metrics-r6-summary.md, fanout_depth=5 for the 5.mdfiles under.agents/skills/si-chip/references/literally named in SKILL.md body; range sanity[0.0, 1.0]PASS; master plan Round 7 risk_flag explicitly flags C5 as a v1_baseline-acceptable heuristic proxy — full ground-truth is a Round 12 / real-LLM-runner upgrade candidate). C6 = 0.0435 (max Jaccard similarity between Si-Chip SKILL.md description tokens and 23 neighbor SKILL.md descriptions under/root/.claude/skills/lark-*/SKILL.md; no neighbor exceeds 1/23 token overlap —lark-whiteboard-clitops the pack; range sanity PASS). D2 is the second R6-taxonomy dimension to reach full measurement-attempt coverage (after D5 in Round 6). .agents/skills/si-chip/scripts/count_tokens.pyaddscontext_rot_risk(body_tokens, fanout_depth, typical_window=200_000)with doctests,skill_md_context_rot_risk(skill_md_path, references_dir, typical_window)wrapper,jaccard_similarity(a, b),tokenize_description(text, stopwords)(NFKD + lowercase + strip ASCII-non-alphanumeric + split + stopword filter), andskill_md_scope_overlap_score(skill_md_path, neighbor_skill_md_paths, stopwords)max-reduction helper (Round 7 Edit A). Helpers raiseValueErroron negative inputs (workspace “No Silent Failures” rule; no silent clamping). Script version bumps0.1.1 → 0.1.2. Sibling test.agents/skills/si-chip/scripts/test_count_tokens.pyextended with 33 new tests (65 total; was 32 before) — 9 ContextRotRiskTests, 6 SkillMdContextRotRiskTests, 6 JaccardSimilarityTests, 6 TokenizeDescriptionTests, 6 SkillMdScopeOverlapScoreTests; plus 8 new doctests..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_c5_context_rot_risk(skill_md_path, references_dir, typical_window, strict)andhoist_c6_scope_overlap_score(skill_md_path, neighbor_skill_md_paths, strict)hoisters, plusreferences_dir+neighbor_skill_md_pathsparameters onbuild_report, plus new CLI flags--references-dir(default.agents/skills/si-chip/references) and--neighbor-skills-glob(default/root/.claude/skills/lark-*/SKILL.md) withglob.globexpansion (Round 7 Edit B). Provenance block now includesc5_derivationandc6_derivationsections with full reproducibility info (inputs, formula, computed value, range-sanity check). Script version bumps0.1.2 → 0.1.3. Sibling test extended with 15 new tests (80 total; was 65 before) — 4 HoistC5Tests, 5 HoistC6Tests, 6 BuildReportC5C6Integration including a subprocess-driven CLI flag round-trip test.- All 6 evidence files at
.local/dogfood/2026-04-28/round_7/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (c5_derivation.jsonwith formula + inputs + computed value + plan-vs-actual-note,c6_overlap_pairs.jsonwith base-token-set + every neighbor + per-pair Jaccard, re-derivedr4_near_miss_FP_rate_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json). docs/skills/si-chip-0.1.6.tar.gzdeterministic release tarball (SHA-25608e4e8d926891b346bf8a7a68910d9321a66f1be202ef0d140156ae9211dbd44; reproducible across rebuilds — verified viatar --sort=name --mtime='UTC 2026-04-28' --owner=0 --group=0 --numeric-owner -czftwice yielding identical hashes; same canonical layout as v0.1.0/v0.1.1/v0.1.2/v0.1.3/v0.1.4/v0.1.5: 1 SKILL.md + 5 references + 3 scripts; 36,730 bytes).- Round 6 (v0.1.5) dogfood:
U3_setup_steps_countandU4_time_to_first_successnow hoisted intometrics_report.yaml; D5 usage-cost dimension reaches 4/4 sub-metric coverage (FULLY COMPLETE) atv1_baseline(was 2/4 after Round 5). U3 = 1 (count of explicit user-facing steps in the canonical non-interactive one-linercurl ... | bash -s -- --yes --target ... --scope ...; sourced fromtools/install_telemetry.py::count_setup_stepswhich parses the new self-reported# SI_CHIP_INSTALLER_STEPS=1header added toinstall.sh; VALIDATES the CHANGELOG v0.1.1 one-line-installer claim; v1_baseline target ≤ 2 PASS). U4 = 0.0073 s (wall-clock frombash install.sh --dry-run --yes --target cursor --scope repo --repo-root <tmp>spawn to first[OK] Installedstdout line; dry-run floor estimate — real wall-clock adds HTTP download + tarball extraction overhead; real wall-clock is opt-in viadry_run=False; v1_baseline sanity ceiling ≤ 60 s PASS by 8200x margin). D5 is the first R6-taxonomy dimension to reach full sub-metric coverage. tools/install_telemetry.py(NEW):count_setup_steps(install_script_path)parses# SI_CHIP_INSTALLER_STEPS=Nheader with a legacyread -p/read -rfallback;time_first_success(install_script_path, dry_run=True)usessubprocess.runto time the installer from spawn to first[OK] Installedline with a 60 s timeout and None-on-failure degenerate path;build_telemetry_payloadcomposes the JSON shape consumed byaggregate_eval.py --install-telemetry; CLI surface--install-sh,--no-dry-run,--timeout-s,--json,--verbose. Script version 0.1.0.tools/test_install_telemetry.py(NEW): 20 unit tests covering 10 CountSetupStepsTests (happy paths for 0/1/2 steps, fallback prompt-count, commented-read ignored, no-prompts, missing file, negative rejected, header precedence, real repo install.sh U3 == 1 smoke), 7 TimeFirstSuccessTests (happy path, rc != 0, missing[OK]line, TimeoutExpired, bash-missing, missing install.sh, opt-in real dry-run smoke gated bySI_CHIP_RUN_DRY_RUN=1), 3 BuildTelemetryPayloadTests (shape, u4=None preserved, JSON round-trip). All 20 pass (1 skipped — opt-in real dry-run).install.shself-reported# SI_CHIP_INSTALLER_STEPS=1header (Round 6 Edit B; additive comment, does not affect any existing flag): declares the canonical non-interactive one-liner step count fortools/install_telemetry.py::count_setup_stepsto parse. Per the head-of-file comment, the interactive flow fires--target+--scopeprompts and counts as 3; the headline non-interactive flow promoted in INSTALL.md / docs/_install_body.md / CHANGELOG v0.1.1 is unambiguously 1 step..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_u3_setup_steps_count(install_telemetry)andhoist_u4_time_to_first_success(install_telemetry)plus a newinstall_telemetryparameter onbuild_reportand a new--install-telemetryCLI flag +_maybe_load_install_telemetryhelper (Round 6 Edit C). Script version bumps0.1.1 → 0.1.2. Sibling test extended (65 tests total; 17 new in Round 6 — 9 HoistU3Tests, 8 HoistU4Tests, 5 BuildReportU3U4Integration includingtest_round_6_full_d5_coverage_populatedverifying all 4 D5 sub-metrics populate simultaneously;test_round_6_keeps_28_key_invariantpreserves the spec §3.2 frozen 28-key contract).- All 6 evidence files at
.local/dogfood/2026-04-28/round_6/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (install_telemetry.json,install_dry_run.log,install_telemetry.log, carriedr4_near_miss_FP_rate_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json). docs/skills/si-chip-0.1.5.tar.gzdeterministic release tarball (SHA-256bdf01f0520fc670880f4c1c5ae16dcff1d8f5378b0f23e60ec6f3e3cc5125fd5; reproducible across rebuilds; same canonical layout as v0.1.0/v0.1.1/v0.1.2/v0.1.3/v0.1.4: 1 SKILL.md + 5 references + 3 scripts; 30,428 bytes).- Round 5 (v0.1.4) dogfood:
U1_description_readabilityandU2_first_time_success_ratenow hoisted intometrics_report.yaml; D5 usage-cost dimension reaches 2/4 sub-metric coverage atv1_baseline(was 0/4 through Round 4). U1 = 19.58 (Flesch-Kincaid grade level of the SKILL.md frontmatter description; 20 words / 2 sentences / 53 syllables; range sanity PASS at[0.0, 24.0]; master plan risk_flag acknowledged — the dense technical vocabulary drives the grade high). U2 = 0.75 (45 correct should_trigger / 60 total should_trigger across 6 cases; deterministic single-pass runner meanscorrect==True on expected=="trigger"IS first-time success by construction; no new runner needed). .agents/skills/si-chip/scripts/count_tokens.pyaddsflesch_kincaid_grade(text)plus the three_fk_count_{syllables,words,sentences}helpers, theextract_description_from_frontmatter(fm)parser, and theskill_md_description_fk_grade(path)wrapper (Round 5 Edit A). A new--fk-descriptionCLI flag surfaces the FK grade alongside the pre-existing token counts. Script version bumps0.1.0 → 0.1.1. Sibling test.agents/skills/si-chip/scripts/test_count_tokens.py(NEW) covers the helpers with 32 unit tests (+ doctests) — vowel-group heuristic, silent-e adjustment, digit-period stripping, empty-text clamping, determinism across calls, SKILL.md extraction..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_u1_description_readability(skill_md_path)(wraps the count_tokens FK helper) andhoist_u2_first_time_success_rate(with_rows)(sums per-prompt outcomes across cases) plus a newskill_md_pathparameter onbuild_report(Round 5 Edit B). Script version bumps0.1.0 → 0.1.1. Sibling test extended (43 tests total; 15 new for U1/U2 coverage) — HoistU1Tests, HoistU2Tests, BuildReportU1U2Integration, and the 28-key invariant carried through Round 5.- All 6 evidence files at
.local/dogfood/2026-04-28/round_5/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (u1_fk_derivation.json, re-derivedr4_near_miss_FP_rate_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json). docs/skills/si-chip-0.1.4.tar.gzdeterministic release tarball (SHA-2567ef685ba5de0cfee1d77459ebc8bb0b4b18e973df0b1ae3fbfacb74162c1fb82; reproducible across rebuilds; same canonical layout as v0.1.0/v0.1.1/v0.1.2/v0.1.3: 1 SKILL.md + 5 references + 3 scripts).- Round 4 (v0.1.3) dogfood:
L1_wall_clock_p50,L3_step_count, andL4_redundant_call_rationow hoisted from per-case runner instrumentation intometrics_report.yaml; D3 latency-path dimension reaches 4/7 sub-metric coverage atv1_baseline(was 1/7 in Round 3). L1 = 1.2153 s (L1 ≤ L2 sanity invariant PASS), L3 = 20 (integer ≥ 1), L4 = 0.0 (degenerate-but-valid per master plan risk_flag; unique prompt_ids per case in the simulated runner). evals/si-chip/runners/with_ability_runner.pyaddspercentile_p50,step_count_from_outcomes, andredundant_call_ratio_from_outcomeshelpers (Round 4 Edit A). Per-caselatency_p50_s/step_count/redundant_call_ratiosurfaced toresult.jsonalongside Round 3’s R7 instrumentation. Sibling test extended (25 tests; 17 new in Round 4).evals/si-chip/runners/no_ability_runner.pymirrors the same L1/L3/L4 plumbing (Round 4 Edit B). New test fileevals/si-chip/runners/test_no_ability_runner.py(21 tests) covers helpers, fields, determinism, and L1 ≤ L2 sanity invariant on the baseline arm..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_l1_wall_clock_p50(mean latency_p50_s),hoist_l3_step_count(round(mean step_count)), andhoist_l4_redundant_call_ratio(mean redundant_call_ratio clamped to [0,1]) (Round 4 Edit C). Sibling test extended (27 tests; 12 new in Round 4) covering hoist correctness, degenerate paths, L1 ≤ L2 invariant, and 28-key contract.- All 6 evidence files at
.local/dogfood/2026-04-28/round_4/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (l1_l3_l4_derivation.json, re-derivedr4_near_miss_FP_rate_derivation.jsonandr7_derivation.json,notes.md,eval_run.log,aggregate_eval.log,aggregator_raw_output.yaml,spec_validator.json). docs/skills/si-chip-0.1.3.tar.gzdeterministic release tarball (SHA-25634a5f084aa0b95e947561d51cfe99cd887cfb951b2d59364fb81edb7e98d9a55; reproducible across rebuilds; same canonical layout as v0.1.0/v0.1.1/v0.1.2: 1 SKILL.md + 5 references + 3 scripts).- Round 3 (v0.1.2) dogfood:
R6_routing_latency_p95andR7_routing_token_overheadnow hoisted from per-cell router-test data and per-prompt runner instrumentation intometrics_report.yaml; D6 routing-cost dimension reaches 8/8 sub-metric coverage atv1_baseline(was 6/8 in Round 2). R6 = 1100 ms (≤ 2000 ms ceiling, also passes v2_tightened ≤ 1200 ms); R7 = 0.0233 (≤ 0.20 ceiling, also passes v2_tightened 0.12 and v3_strict 0.08). evals/si-chip/runners/with_ability_runner.pyrecordsrouting_stage_tokens+body_invocation_tokensper prompt (Round 3 Edit A); per-case totals also surfaced for the aggregator. Sibling testevals/si-chip/runners/test_with_ability_runner.py(8 unit tests covering instrumentation determinism + v1_baseline ceiling)..agents/skills/si-chip/scripts/aggregate_eval.pyaddshoist_r6_routing_latency_p95(cells wherepass_rate >= 0.80→ minlatency_p95_ms) andhoist_r7_routing_token_overhead(sum routing / sum body) plus a new--router-floor-reportCLI flag (Round 3 Edit B). Sibling test.agents/skills/si-chip/scripts/test_aggregate_eval.py(12 unit tests + 4 doctests covering hoist logic, degenerate paths, and 28-key invariant).- All 6 evidence files at
.local/dogfood/2026-04-28/round_3/(basic_ability_profile.yaml,metrics_report.yaml,router_floor_report.yaml,half_retire_decision.yaml,next_action_plan.yaml,iteration_delta_report.yaml) plusraw/derivations (r4_near_miss_FP_rate_derivation.json,r7_derivation.json,notes.md,eval_run.log,spec_validator.json). docs/skills/si-chip-0.1.2.tar.gzdeterministic release tarball (SHA-256 reproducible across rebuilds; same canonical layout as v0.1.0/v0.1.1: 1 SKILL.md + 5 references + 3 scripts).
Removed
evals/si-chip/cases/reactivation_review.yaml(Round 13 SHIP-PREP REVERT-ONLY): the 7th eval case added in Round 12 commitd92c409under “Option A”. Per-casepass_rate=0.65under the deterministic SHA-256 simulator withseed=42dragged the 7-caseT2_pass_kmean DOWN from Round 11’s 0.5478 to Round 12’s 0.4950 (-0.0528). L0 confirmedPATH=REVERT-ONLY(the real-LLM runner cannot execute in this sandbox); Round 13 removes the case to restore the canonical 6-case Round 11 baseline byte-identically.evals/si-chip/baselines/with_si_chip_round12/(Round 13 SHIP-PREP REVERT-ONLY): Round-12-specific 7-case with-ability baselines directory (8 files: 7 case result.json + summary.json). Round 13 reverts to the 6-case Round 4 baselines underevals/si-chip/baselines/with_si_chip_round4/(tracked since commit13cc9aa, unchanged).evals/si-chip/baselines/no_ability_round12/(Round 13 SHIP-PREP REVERT-ONLY): Round-12-specific 7-case no-ability baselines directory (8 files). Round 13 reverts to the 6-case Round 4 no-ability baselines underevals/si-chip/baselines/no_ability/(tracked since commitcea4b86+13cc9aa, unchanged).
Changed
- v0.2.0 ship-commit (post-Round-13):
.agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.12→version: 0.2.0;descriptionspec referencev0.2.0-rc1→v0.2.0; SKILL.md body §1 / “When NOT To Trigger” / “Out of Scope” / Provenance “v0.2.0-rc1” → “v0.2.0” (4 occurrences); Provenance spec path.local/research/spec_v0.2.0-rc1.md→.local/research/spec_v0.2.0.md. Mirrored to.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.mdbyte-identical (3-tree DRIFT_ZERO; SHA-25612c63bad0f4d828fcaffacb892d756ab03fd0f7bf4b189a323e954f433dac372). C1_metadata_tokens 85 → 82 (-3;v0.2.0-rc19 tokens →v0.2.06 tokens under o200k_base). C2_body_tokens 2125 → 2122 (-3 from same cleanup). C4_per_invocation_footprint 3710 → 3704. - v0.2.0 ship-commit:
.local/research/spec_v0.2.0.md(NEW frozen spec) created by copying.local/research/spec_v0.2.0-rc1.mdand dropping-rc1;version: v0.2.0,status: frozen,promoted_from: v0.2.0-rc1. v0.2.0-rc1 file retained as pinned historical record. - v0.2.0 ship-commit:
.rules/si-chip-spec.mdcfrontmatterversion: v0.2.0-rc1→v0.2.0;status: release_candidate→frozen;source: .local/research/spec_v0.2.0-rc1.md→.local/research/spec_v0.2.0.md; H1 + intro paragraphs updated..rules/.compile-hashes.jsonrecompiled tofc8c2e0a350a6fa6via DevolaFlowRuleCompiler.compile_all().AGENTS.mdregenerated; drift-detectioncheck_rules_driftreportsagents_md: in_sync. - v0.2.0 ship-commit:
tools/spec_validator.pyextended to acceptv0.2.0spec path alongsidev0.2.0-rc1(backward-compat) andv0.1.0(Rounds 1-10 evidence).DEFAULT_SPECflips to.local/research/spec_v0.2.0.md.EXPECTED_R6_PROSE_BY_SPEC+EXPECTED_THRESHOLD_CELLS_PROSE_BY_SPEC+SUPPORTED_SPEC_VERSIONSaddv0.2.0(37 / 30).SCRIPT_VERSION 0.1.3 → 0.1.4.--jsondefault-mode 9/9 PASS against all three specs (v0.1.0,v0.2.0-rc1,v0.2.0);--strict-prose-count9/9 PASS againstv0.2.0andv0.2.0-rc1, FAILs againstv0.1.0by design (reconciliation sentinel preserved). - v0.2.0 ship-commit:
install.sh+docs/install.shdefaultSI_CHIP_VERSION_DEFAULT:v0.1.12→v0.2.0. Help-text installer bannerSi-Chip installer v0.1.0→v0.2.0. All 11 existing flags preserved verbatim. - v0.2.0 ship-commit:
docs/_install_body.md--versiondefault cells (English + Chinese rows):v0.1.12→v0.2.0. - v0.2.0 ship-commit:
docs/index.mdGitHub Pages landing replaces v0.1.0 status banner + headline numbers with v0.2.0 SHIP_ELIGIBLE banner + Round-13 metric values (pass_rate=0.85, trigger_F1=0.8934, per-invocation footprint=3602/3704, wall_clock_p95=1.4693 s, routing_latency_p95=1100 ms, R6 coverage 28+/37 across 6 of 7 dims, 6 reactivation triggers). Bilingual EN/zh-CN structure preserved. - v0.2.0 ship-commit:
docs/changelog.md(NEW) Jekyll-compatible web changelog page (permalink: /changelog/). Inlines a copy ofCHANGELOG.mdbecause Jekyll’sinclude_relativecannot escape thedocs/source tree via../(GitHub Pages safe-mode restriction); a NOTE comment in the file documents the sync requirement and points back to the canonical CHANGELOG.md. - v0.2.0 ship-commit:
.cursor/skills/si-chip/scripts/{profile_static,count_tokens,aggregate_eval}.pyand.claude/skills/si-chip/scripts/{profile_static,count_tokens,aggregate_eval}.pysynced to canonical.agents/skills/si-chip/scripts/versions (Round 9+ schema and live-derivation paths) — closes long-standing v0.1.0-baseline mirror drift in scripts (was previously synced only at v0.1.0 ship; the SKILL.md mirror was kept current per round but scripts mirrors lagged). All 3 trees now byte-identical across SKILL.md + references/ + scripts/{profile_static,count_tokens,aggregate_eval}.py. T2_pass_kRECOVERED from Round 12’s 0.49501905505102045 → Round 13’s 0.5477708333333333 (RECOVERY of +0.0528; matches Round 11 baseline EXACTLY). T1/T3/R3/R4/R7/U2/L1/L2 also RECOVERED to Round 11 byte-identical via the 7th-case revert. The Round 12 7th-case experiment is documented in Round 12’s evidence files as an honest negative result and the.local/dogfood/2026-04-28/round_12/directory is retained for traceability.- Round 13 iteration_delta task_quality axis delta = +0.0528 satisfies the §4.1 v1_baseline iteration_delta any-axis row (>= +0.05) WITHOUT needing a measurement-fill bonus flavour (in contrast to Rounds 4-12 which used measurement-fill flavours on D5/D2/D3/D6/D7/D4 dimensions to satisfy the clause). Round 13 is the FIRST round (since Round 4) where the iteration_delta clause is satisfied via a genuine task-axis movement.
- 10-row v1_baseline check Round 13: PASS (13th consecutive v1_baseline pass; Rounds 1-13 all clear every v1_baseline hard threshold).
.agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.11→version: 0.1.12(canonical; Round 13 — body unchanged this round; only the frontmatter version field bumps), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Mirror drift verified = 0 across all 3 trees post-0.1.12 sync (V3_drift_signal re-verified 0.0; 3/3 SHA-256 byte-equalitydbefa2ba9938bd226ee08d1464a5d962d3ad65768ff240e61eef1c1d60471c9d). C1_metadata_tokens stays at 85 (no frontmatter description / when_to_use / spec-reference change between Round 12 and Round 13;0.1.11→0.1.12is a 0-token tokenizer-equivalent string change under o200k_base — verified viacount_tokens.py --both --jsonreportingmetadata_tokens=85both before and after).install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.11→v0.1.12(Round 13). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.11(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.11→v0.1.12(Round 13; English + Chinese rows).-
Round 13 promotion-state trace:
consecutive_v1_passes: 13(Rounds 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13);consecutive_v2_passes: 0(T2_pass_k structural blocker since Round 1; deterministic simulator pass_k_4 = pass_rate^4 PROXY structurally fails v2_tightened by -0.0022). v0.2.0 ships at gaterelaxed(= v1_baseline; same as v0.1.0); v2_tightened deferred to v0.3.0 pending real-LLM runner. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.10→version: 0.1.11(canonical; Round 12 — body unchanged this round; only the frontmatter version field bumps), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Mirror drift verified = 0 across all 3 trees post-0.1.11 sync (V3_drift_signal re-verified 0.0; 3/3 SHA-256 byte-equalitydd5b7a392af2b1e8116e23bee478f87a62dbd8d86cbde0bc562613eab09eb40b). C1_metadata_tokens stays at 85 (no frontmatter description / when_to_use / spec-reference change between Round 11 and Round 12;0.1.10→0.1.11is a 0-token tokenizer-equivalent string change under o200k_base).install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.10→v0.1.11(Round 12). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.10(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.10→v0.1.11(Round 12; English + Chinese rows).- T1_pass_rate moved from 0.85 (Round 11) to 0.821 (Round 12) due to the honest 7th-case addition (per Option A in the L3 task brief). Still PASSES v2_tightened (>= 0.82) by razor-thin +0.001 margin and PASSES v1_baseline (>= 0.75) by margin. T2_pass_k FAILS v2_tightened by -0.055 (regression from 0.5478 to 0.4950) — the SOLE Round 12 v2 blocker; v0.2.0 ship at v2_tightened gate is BLOCKED. iteration_delta clause satisfied via the §6.4 reactivation-detector audit-gap-closure axis bonus on governance_risk (+0.10 v2_tightened bucket per spec §4.1 iteration_delta column — the §6.4 audit gap from v0.1.0 ship report is now CLOSED with the complete detector + 31 unit tests + spec_validator BLOCKER).
-
Round 12 promotion-state trace:
consecutive_v1_passes: 12(Rounds 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12);consecutive_v2_passes: 0(neither Round 11 nor Round 12 individually clears every v2 hard threshold). Promotion to v2_tightened is BLOCKED; Round 13 path = real-LLM runner upgrade (PATH A; recommended) OR ship at v1_baseline as v0.2.0 (PATH B; alternative). See.local/dogfood/2026-04-28/v0.2.0_ship_report.mdfor the L0 decision artifact. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.9→version: 0.1.10ANDdescriptionspec referencev0.1.0→v0.2.0-rc1(canonical; Round 11), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body changed minimally for spec reconciliation: §1 paragraph adds “(spec v0.2.0-rc1 §1.1, §8.3; §11 forever-out and Normative semantics byte-identical to v0.1.0)”; §3 “Core Object: BasicAbility” adds “(spec §3.2 frozen constraint #2; 37 total in §3.1 TABLE, reconciled with §13.4 prose at v0.2.0-rc1)” and replaces “29 sub-metric keys” reference; §3 / §8 “How To Use” updatesR6 7 dim / 28 sub-metrics→R6 7 dim / 37 sub-metricsandMVP-8 + 28-key null placeholders→MVP-8 + 37-key null placeholders; §11 “When NOT To Trigger” / “Out of Scope” Codex bullets appendbridge only at v0.2.0-rc1; References Index and Provenance footer updated to spec_v0.2.0-rc1 anchor. C1_metadata_tokens 82 → 85 (+3 from frontmatter spec_versionv0.1.0→v0.2.0-rc1string change; verifiedo200k_base("v0.1.0")=6 tokens,o200k_base("v0.2.0-rc1")=9 tokens, delta = +3); C2_body_tokens 2020 → 2125 (+105 from body additions); C4_per_invocation_footprint 3602 → 3710 (= C1+C2+1500 USER_PROMPT_FOOTPRINT). All three remain WELL within v2_tightened ceilings (C1 ≤ 100, C2 informational, C4 ≤ 7000). Mirror drift verified = 0 across all 3 trees (V3_drift_signal re-verified 0.0 post-0.1.10 sync; 3/3 SHA-256 byte-equality).install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.9→v0.1.10(Round 11). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.9(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.9→v0.1.10(Round 11; English + Chinese rows).- T1_pass_rate held at 0.85 across Round 10 → Round 11 (deterministic runner; SKILL.md body changed but the simulator’s per-prompt outcomes are hard-coded in
evals/si-chip/runners/with_ability_runner.MVP_CELL_OUTCOMES). All 10 v1_baseline hard thresholds PASS; iteration_delta clause satisfied via the governance_risk axis spec-reconciliation drift-removal bonus per master plan Round 11 acceptance criterion (the §13.4 prose 28/21 mismatch with §3.1/§4.1 TABLE 37/30 — a documented audit gap from v0.1.0 ship report — is closed this round; spec_validator--strict-prose-countnow PASSES against v0.2.0-rc1;.rules/.compile-hashes.jsonmatches AGENTS.md compile). C1/C2/C4 EXEMPTED from the 1% no-regression sub-clause this round per master plan Round 11 spec-reconciliation exemption (the round’s purpose is the spec bump; metric movement on C1/C2/C4 is the expected price of the spec_version string update). - Round 11 promotion-state trace:
consecutive_v1_passes: 11(Round 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11); promotion to v2_tightened still held to Round 12 per master plan (Round 11 + Round 12 are the two consecutive rounds for the §4.2 promotion gate). .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.8→version: 0.1.9(canonical; Round 10), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees (re-confirmed live bygovernance_scan.scan_drift_signal→ V3 = 0.0; 3/3 SHA-256 byte-equality).install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.8→v0.1.9(Round 10). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.8(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.8→v0.1.9(Round 10; English + Chinese rows).- Round 10 promotion-state trace:
consecutive_v1_passes: 10(Round 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.7→version: 0.1.8(canonical; Round 9), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees (re-confirmed live bygovernance_scan.scan_drift_signal→ V3 = 0.0).install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.7→v0.1.8(Round 9). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.7(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.7→v0.1.8(Round 9; English + Chinese rows).- Round 9 promotion-state trace:
consecutive_v1_passes: 9(Round 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.6→version: 0.1.7(canonical; Round 8), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees (AND confirmed live by newgovernance_scan.scan_drift_signal→ V3 = 0.0).install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.6→v0.1.7(Round 8). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.6(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.6→v0.1.7(Round 8; English + Chinese rows).- Round 8 promotion-state trace:
consecutive_v1_passes: 8(Round 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.5→version: 0.1.6(canonical; Round 7), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees.install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.5→v0.1.6(Round 7). No other install.sh edits this round (Round 6’s# SI_CHIP_INSTALLER_STEPS=1header + all 11 existing flags preserved verbatim). Override with--version v0.1.5(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.5→v0.1.6(Round 7; English + Chinese rows).- Round 7 promotion-state trace:
consecutive_v1_passes: 7(Round 1 + 2 + 3 + 4 + 5 + 6 + 7); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.4→version: 0.1.5(canonical; Round 6), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees.install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.4→v0.1.5(Round 6).install.shalso gains the additive# SI_CHIP_INSTALLER_STEPS=1head-of-file comment header (Round 6 Edit B; advertises the canonical non-interactive one-liner step count for U3 parsing). Override with--version v0.1.4(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.4→v0.1.5(Round 6; English + Chinese rows).- Round 6 promotion-state trace:
consecutive_v1_passes: 6(Round 1 + 2 + 3 + 4 + 5 + 6); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.3→version: 0.1.4(canonical; Round 5), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees.install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.3→v0.1.4(Round 5). Override with--version v0.1.3(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.3→v0.1.4(Round 5; English + Chinese rows).- Round 5 promotion-state trace:
consecutive_v1_passes: 5(Round 1 + 2 + 3 + 4 + 5); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.2→version: 0.1.3(canonical; Round 4), with mirrored bumps in.cursor/skills/si-chip/SKILL.mdand.claude/skills/si-chip/SKILL.md. Body unchanged. Mirror drift verified = 0 across all 3 trees.install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.2→v0.1.3(Round 4). Override with--version v0.1.2(or earlier) to install a prior payload.docs/_install_body.md--versiontable cell:v0.1.2→v0.1.3(Round 4; English + Chinese rows).- Round 4 promotion-state trace:
consecutive_v1_passes: 4(Round 1 + 2 + 3 + 4); promotion to v2_tightened still held to Round 12 per master plan. .agents/skills/si-chip/SKILL.mdfrontmatterversion: 0.1.1→version: 0.1.2(Round 3), with mirrored bumps in the 3 platform trees.install.shanddocs/install.shdefaultSI_CHIP_VERSION_DEFAULTconstant:v0.1.1→v0.1.2(Round 3).docs/_install_body.md--versiontable cell:v0.1.1→v0.1.2(Round 3; English + Chinese rows).- Pages site supports zh/en language toggle and day/night theme toggle. Hero top-right
[ LANG / EN ]and[ THEME / DAY ]buttons; persisted tolocalStorage(si-chip-lang,si-chip-theme); first-visit defaults fromnavigator.languageandprefers-color-scheme. Body markdown bilingualized via<div lang="en" markdown="1">/<div lang="zh" markdown="1">pattern; chrome translations live in a JSON island inside_layouts/default.html. docs/assets/css/nier.cssextended (~+90 lines) with three new sections: i18n display rules + responsive table overflow, dark-mode palette overrides underbody[data-theme="night"], and toggle control styles.docs/assets/js/nier.jsextended (~12 → ~100 lines) with theme + language state machines (localStorage + system-preference detection + DOM event handlers); cursor blink behavior preserved.CONTRIBUTING.md§10 documents the bilingualization contract, the JSON island convention, and the loosened sync contracts for_install_body.md/_userguide_body.md.
Fixed
- The 8-column table in
docs/demo.md(and any wide table site-wide) no longer overflows the NieR page chrome on the right edge..page-body tableis nowdisplay: block; max-width: 100%; overflow-x: auto; white-space: nowrap;at all viewport widths, so wide tables scroll horizontally inside the chrome rather than bursting out.
Notes
tools/spec_validator.py --jsondefault-mode 8/8 PASS after Round 11 against BOTH v0.1.0 (backward-compat) and v0.2.0-rc1 (post-Round-11 default).--strict-prose-countnow PASSES 8/8 against v0.2.0-rc1 — the previously expectedR6_KEYS+THRESHOLD_TABLEstrict-prose failures (v0.1.0 ship-report known limitation) are CLOSED this round via §13.4 prose alignment to §3.1/§4.1 TABLES (28 → 37 sub-metrics; 21 → 30 numeric threshold cells) + validator narrowing of_threshold_prose_numbers_in_section13to the §13.4 subsection only (stops at next###/##heading) so the v0.2.0-rc1 reconciliation-log appendices’ historical “21 个数” strings are correctly ignored. Against v0.1.0 the strict-prose mode still FAILS by design — reconciliation sentinel preserved (v0.1.0 prose 28/21 still mismatches template TABLE 37/30; this is the historical regression mode that motivated Round 11).- Round 11 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed spec-reconciliation drift-removal flavour bonus ongovernance_risk(the spec / template / validator triple was internally inconsistent pre-Round-11: TABLE = 37/30 vs prose = 28/21 — an audit gap explicitly flagged by the v0.1.0 ship report and resolved this round;.rules/.compile-hashes.jsonnow matches AGENTS.md compile;--strict-prose-countPASSES). Master plan Round 11 acceptance criterion verbatim: “iteration_delta_report.yaml records that this round is a spec-reconciliation round (no metric movement expected; governance_risk axis ‘positive_axis: true’ for the drift-removal achievement is acceptable)”. All MVP-8 / R4 / R6 / R7 / R8 / L1 / L3 / L4 / U1 / U2 / U3 / U4 / C5 / C6 / G1 / V1 / V2 / V3 / V4 metrics carried byte-identical from Round 10 EXCEPT C1/C2/C4 (recomputed for the new SKILL.md frontmatter+body). C1 82 → 85 (+3), C2 2020 → 2125 (+105), C4 3602 → 3710 (+108); all still WELL within v2_tightened ceilings (C1 ≤ 100, C4 ≤ 7000). C5/C6/R8/U1 are CARRIED from Round 10 even though the aggregator’s natural live-derivation produces slightly different values from the new SKILL.md body (C5=0.2606, C6=0.04166…, R8=0.04166…, U1=18.85 — all recorded in.local/dogfood/2026-04-28/round_11/raw/aggregator_raw_output.yamlfor traceability) per the master plan “no metric movement expected beyond C1/C2/C4 + governance_risk drift-removal bonus” rule. - Spec §11 forever-out compliance (Round 11 verbatim check): byte-identical to v0.1.0 per
raw/normative_diff_check.json(§11.1 4 items unchanged; §11.2 4 deferred items unchanged; §11.3 boundary guard unchanged). No marketplace touched (Round 11 is spec-text reconciliation only). No router model training (no learned ranker, no online weight learning; the §5.1 vs §5.2 boundary is byte-identical to v0.1.0 in v0.2.0-rc1). No Markdown-to-CLI converter introduced. No generic IDE compatibility layer introduced (Cursor + Claude Code mirrors are §7.2 priority 1+2 — same as v0.1.0 ship; Codex is bridge-only per §11.2 deferred; the “bridge only at v0.2.0-rc1” anchor in SKILL.md only updates the spec_version anchor, NOT the bridge-only constraint). ALL 4 §11.1 items remain forever-out; Round 11 compliant. - v0.2.0 final ship gate is held to Round 12 per master plan (
.local/.agent/active/v0.2.0-iteration-plan.yaml#round_12): implementtools/reactivation_detector.pywith all 6 §6.4 triggers + unit tests; extendtools/spec_validator.pywith a 9th invariantREACTIVATION_DETECTOR_PRESENT; run the v2_tightened readiness check across Round 11 + Round 12 metrics_report.yaml; emitv0.2.0_ship_decision.yaml+v0.2.0_ship_report.md. SHIP_ELIGIBLE iff both rounds pass every v2_tightened hard threshold per §4.2 promotion rule. The single known carry-forward blocker for v2_tightened isT2_pass_k = 0.5478(fails ≥ 0.55 by margin -0.0022); Round 12 readiness check decision determines whether v0.2.0 ships, defers, or spawns Round 13/14 for the §4.2 promotion gate. tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 10 (matches v0.1.0 ship report and Rounds 3-9; no template or schema changes this round; Round 9’sROUTER_MATRIX_CELLSintermediate invariants still PASS under schema 0.1.1). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies — reconciliation deferred to Round 11 spec bump per master plan.- Round 10 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed measurement-fill flavour bonus ongeneralizability(G1 null → 2×2 nested dict; D4 0/4 populated → 1/4 populated; first-ever D4 measurement since Round 1). Master plan acceptance criterion #4 verbatim: “iteration_delta_report.yaml generalizability axis becomes positive_axis: true (was 0.0 since Round 1)”. All MVP-8 / R4 / R6 / R7 / R8 / L1 / L3 / L4 / U1 / U2 / U3 / U4 / C5 / C6 / V1 / V2 / V3 / V4 metrics byte-identical to Round 9 (deterministic runner; SKILL.md body unchanged; baselines unchanged; frontmatter0.1.8→0.1.9adds 0 tokens — verified: both"0.1.8"and"0.1.9"tokenize to 5 tokens undero200k_base; C1 byte-identical). - Round 10 G1 = 2-model × 2-pack is a PARTIAL PROXY, not authoritative. Spec §3.1 D4 full G1 requires real-LLM runs against the 96-cell full profile (6 model × 4 depth × 4 pack). The v2_tightened+ promotion gate ties G1 to the full matrix — Round 12 readiness check will either promote or defer. The
max(pass_rate)across-depths aggregation rule is MONOTONE under adding more depths / models, so Round 10’s hoist is forward-compatible with the 96-cell upgrade (no code change needed; only data source swap). G2/G3/G4 (cross_domain_transfer_pass, OOD_robustness, model_version_stability) remain explicit null for v0.1.x per master plan — all three require real-LLM runner upgrades (v0.3.x+ work). - Round 10
evals/si-chip/golden_set/is NOT consumed by the deterministic runner — the runner still usesevals/si-chip/cases/*.yaml(6 canonical cases, 20 prompts each). The golden_set/ directory is an opt-in source for futurenines --golden-dirinvocations and a downstream real-LLM eval harness (v0.3.x+ upgrade path). Populating it with NEW prompts beyond the Round 10 12+12 verbatim seed is explicitly deferred per master plan. - Spec §11 forever-out compliance (Round 10 verbatim check): No marketplace touched (evals/si-chip/golden_set/ is a local repo path, not a distribution surface). No router model training — G1 is a static pass_rate matrix collapsed from the deterministic 8-cell sweep (not a learned ranker, not online weight learning, not a kNN fit); §5.1 metadata-retrieval surface only. No Markdown-to-CLI converter introduced. No generic IDE compatibility layer introduced. ALL 4 §11.1 items remain forever-out; Round 10 compliant.
tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 9 template$schema_versionbump0.1.0 → 0.1.1(validator accepts BOTH schemas — backward compat). Intermediate invariants additionally asserted (cell_counts.intermediate == 16,profiles.intermediate.cells == 16,profiles.intermediate.gate_binding == "relaxed",models*depths*packs == 16). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies — reconciliation deferred to Round 11 spec bump per master plan.- Round 9 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed measurement-fill flavour bonus onrouting_cost(R8 null → 0.043478260869565216 + ADDITIVE 16-cell intermediate router-test profile introduced alongside mvp:8 and full:96). BOTH branches of the R8-fill-OR-floor-improvement disjunction satisfied. All MVP-8 / R4 / R6 / R7 / L1 / L3 / L4 / U1 / U2 / U3 / U4 / C5 / C6 / V1 / V2 / V3 / V4 metrics byte-identical to Round 8 (deterministic runner; SKILL.md body unchanged; baselines unchanged; frontmatter0.1.7→0.1.8adds 0 tokens — verified: both"0.1.7"and"0.1.8"tokenize to 5 tokens undero200k_base; C1 byte-identical). - Spec §11 forever-out compliance (Round 9 verbatim check): No marketplace touched. No router model training — R8 is static Jaccard/TF-IDF on SKILL.md descriptions; the intermediate profile adds more test cells (hard-coded deterministic anchors) NOT a learned ranker. No Markdown-to-CLI converter introduced. No generic IDE compatibility layer introduced. ALL 4 §11.1 items remain forever-out; Round 9 compliant.
tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 8 (matches v0.1.0 ship report and Rounds 3+4+5+6+7). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies (R6_KEYS + THRESHOLD_TABLE) — reconciliation deferred to Round 11 spec bump per master plan.- Round 8 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed measurement-fill flavour bonus ongovernance_risk(D7 sub-metric measurement coverage 0/4 → 4/4 = FULL D7 measurement-attempt completion with NO by-design-null cells; third R6-taxonomy dimension to reach full coverage after D5 in Round 6 and D2 in Round 7). All MVP-8 / R4 / R6 / R7 / L1 / L3 / L4 / U1 / U2 / U3 / U4 / C5 / C6 metrics byte-identical to Round 7 (deterministic runner; SKILL.md body unchanged; baselines unchanged; frontmatter0.1.6→0.1.7adds 0 tokens — verified:o200k_base("0.1.6") == o200k_base("0.1.7")). - Round 8 V1_permission_scope = 0 is a STATIC scan (tools/governance_scan.py walks the skill’s Python + shell source for hardcoded absolute write paths via regex pattern-matching: Python
open(path, 'w'...),Path(path).write_text/write_bytes/mkdir,os.makedirs/os.mkdir; shell> /abs/pathredirections). It cannot observe DYNAMIC runtime writes (e.g. a script that constructs a path via string interpolation would be missed). For Si-Chip v0.1.7 this is a non-issue because scripts route writes through caller-provided--outarguments; the convention is explicit. A Round 12 / real-LLM-runner upgrade could add OTel-trace-based runtime write tracking for dynamic-path coverage. - Round 8 V2_credential_surface = 0 is PATTERN-BASED (4 canonical patterns:
aws_access_key,pem_private_key,credential_assignment,generic_high_entropy_40). A secret encoded in a non-pattern-matching format (e.g. a base64 OAuth token < 40 chars) would be missed. V2 CRITICAL invariant: the scanner NEVER logs the matched value verbatim — only pattern name + file path + per-file count. Unit-test verified intools/test_governance_scan.py::test_must_not_log_secret_valuewhich writes a known AWS-key fixture, runs the scanner, captures log output, and asserts the scanner logs"aws_access_key"+"1 time"but NOT the actual key body (nor even a 10-char prefix of it). Future hardening: integrate truffleHog / gitleaks-style entropy-based scanning — not Round 8 scope. - Round 8 V3_drift_signal = 0.0 measures byte-equality across the 3 SKILL.md mirrors via SHA-256 pairwise comparison (total_pairs = C(3, 2) = 3; equal_pairs = 3 post-0.1.7 sync; drift_zero_ratio = 3/3 = 1.0; V3 = 1 - 1 = 0.0). It does NOT detect semantic drift (e.g. references/ mismatches or scripts/ differences across trees); full-tree drift is covered by the per-CHANGELOG-entry “Mirror drift verified = 0 across all 3 trees” manual verification. Automating full-tree drift scanning is a Round 12 release-automation upgrade candidate.
- Round 8 V4_staleness_days = 0 on day-of-review (
today == last_reviewed_at); rises monotonically thereafter until the next dogfood round moveslast_reviewed_atforward. Spec §6 cadence is 30/60/90 days — V4 > 30 on a “keep” decision signals the ability is due for re-review.governance_scan.scan_staleness_daysraisesValueErroron future-datedlast_reviewed_at(workflow-bug guard). - Round 8
half_retire_decision.yamlgovernance_risk_delta audit-gap closure: Rounds 1-7 hard-codedvalue_vector.governance_risk_delta = 0.0because no D7 sub-metric was measured. Round 8 closes the audit gap —governance_risk_deltais now DERIVED LIVE viagovernance_scan.compute_governance_risk_delta(V1, V2, V3, V4). Numerically the axis still reads 0.0 (clean Si-Chip baseline with V1=V2=V3=V4=0) but it now TRACES to a computable function. Any future V1/V2/V3/V4 regression will move the axis negative automatically; spec §6.2disable_auto_triggerbecomes programmatically reachable when sufficient axes regress (two rising to 1 each drops governance_risk_delta to -0.50). tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 7 (matches v0.1.0 ship report and Rounds 3+4+5+6). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies (R6_KEYS + THRESHOLD_TABLE) — reconciliation deferred to Round 11 spec bump per master plan.- Round 7 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed measurement-fill flavour bonus oncontext_economy(D2 sub-metric measurement coverage 3/6 → 5/6 with C3 resolved_tokens explicitly by-design null; second R6-taxonomy dimension to reach full measurement-attempt coverage). All MVP-8 / R4 / R6 / R7 / L1 / L3 / L4 / U1 / U2 / U3 / U4 metrics byte-identical to Round 6 (deterministic runner; SKILL.md body unchanged; baselines unchanged; frontmatter0.1.5→0.1.6adds 0 tokens — verified:o200k_base("0.1.5") == o200k_base("0.1.6")). - Round 7 C5_context_rot_risk = 0.2601 is a deterministic HEURISTIC PROXY, not a ground-truth context-rot measurement. The 0.05 fanout coefficient derives from the 2026 frontier-model context-rot studies cited in
references/metrics-r6-summary.md. The acceptance criterion is range sanity[0.0, 1.0](C5 has no §4.1 hard threshold); 0.2601 satisfies. Master plan Round 7 risk_flag explicitly accepts the heuristic as v1_baseline-adequate. Note: the master plan’s textual estimate ofC5 ≈ 0.06assumedfanout_depth = 1(graph-depth interpretation); the L3 Task Spec §1 literal formula counts referenced.mdfiles = 5, yielding 0.05 × 5 = 0.25 (plus 2020/200_000 = 0.0101 for the body ratio). The discrepancy is documented in.local/dogfood/2026-04-28/round_7/raw/c5_derivation.json#plan_vs_actual_note. Replacing the heuristic with a real-LLM empirical measurement is a Round 12 / real-LLM-runner upgrade candidate. - Round 7 C6_scope_overlap_score = 0.0435 is a Jaccard similarity computed against a heuristic neighbor set (23
lark-*SKILL.md files under/root/.claude/skills/; the only skill family besides Si-Chip present in the workspace at round_7 time). Changing the neighbor set would change C6; the full per-pair enumeration is recorded in.local/dogfood/2026-04-28/round_7/raw/c6_overlap_pairs.jsonfor reproducibility. R8_description_competition_index (Round 9 master plan target) will replace this conservative-max metric with the formal across-matrix index. tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 6 (matches v0.1.0 ship report and Rounds 3+4+5). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies (R6_KEYS + THRESHOLD_TABLE) — reconciliation deferred to Round 11 spec bump per master plan.- Round 6 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed full-coverage flavour bonus onusage_cost(D5 sub-metric coverage 2/4 → 4/4 = fully complete; first R6-taxonomy dimension to reach full coverage). All MVP-8 / R4 / R6 / R7 metrics byte-identical to Round 5 (deterministic runner; SKILL.md body unchanged; baselines unchanged). - Round 6 U4 = 0.0073 s is a dry-run floor estimate, not a live-install wall-clock. Real wall-clock on a live network would add ~1-5 s HTTP tarball download + ~100-200 ms extraction overhead. The dry-run branch short-circuits the HTTP fetch + destructive writes while still exercising the installer’s full argument parsing +
resolve_inputs+verify_installcontrol flow. Master plan Round 6 risk_flag explicitly acknowledged the network-latency variance concern; the dry-run floor is v1_baseline-adequate for the dogfood feedback loop. Round 12 + real-LLM-runner upgrade is the recommended path to live wall-clock p50 + p95 capture. - Round 6 U3 = 1 VALIDATES the CHANGELOG v0.1.1 one-line-installer claim. The non-interactive flow (
curl ... | bash -s -- --yes --target ... --scope ...) is unambiguously 1 step (the one-liner itself); the interactive flow (no--yes, TTY present) fires--target+--scopeprompts for a total of 3 steps. INSTALL.md and docs/_install_body.md lead with the non-interactive one-liner. install.sh --dry-runregression test (no flag regressions):bash install.sh --dry-run --yes --target cursor --scope repo --repo-root /tmp/si_chip_round6_dryexits 0 and emits the correct[OK] Installed Si-Chip v0.1.5line; all 11 existing flags (--target,--scope,--version,--dry-run,--yes/-y,--force,--uninstall,--source-url,--repo-root,--help/-h,--version-info) preserved verbatim per Round 6raw/install_dry_run.log.tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 5 (matches v0.1.0 ship report and Rounds 3+4). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies (R6_KEYS + THRESHOLD_TABLE) — reconciliation deferred to Round 11 spec bump per master plan.- Round 5 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed measurement-fill axis bonus onusage_cost(D5 sub-metric coverage 0/4 → 2/4; the usage_cost axis was 0.0 through Rounds 1-4 and is now non-zero). All MVP-8 / R4 / R6 / R7 metrics byte-identical to Round 4 (deterministic runner; SKILL.md body unchanged; baselines unchanged). - Round 5 U1_description_readability = 19.58 is clamped inside
[0.0, 24.0]; the high value (post-graduate reading level) reflects the dense technical vocabulary in the SKILL.md description field (BasicAbility,router-testing,half-retiring,Si-Chip). Master plan round_5 risk_flag explicitly acknowledged the expected unflattering number — surfacing the measurement is the value. Real reduction requires rephrasing the description itself, which is out of scope for Round 5 (body untouched). tools/spec_validator.py --jsondefault-mode 8/8 PASS unchanged after Round 4 (matches v0.1.0 ship report and Round 3). Strict-prose-count mode still flags the known §3.1/§4.1 vs §13.4 discrepancies (R6_KEYS + THRESHOLD_TABLE) — reconciliation deferred to Round 11 spec bump per master plan.- Round 4 iteration_delta_report satisfies the §4.1
iteration_delta ≥ +0.05v1_baseline row via the master-plan-allowed measurement-fill axis bonus onlatency_path(D3 sub-metric coverage 1/7 → 4/7). L2_wall_clock_p95 byte-identical to Round 3 at 1.4693 s (within ±1% as master plan round_4 acceptance criterion #4 requires). - Round 3 metrics_report C1_metadata_tokens 78 → 82 is fully attributable to commit 96f22d4 (
chore(release): v0.1.1) which bumped SKILL.md frontmatterversion + licenseAFTER Round 2 evidence was authored; Round 3 frontmatter bump (0.1.1 → 0.1.2) added 0 tokens, and the Round 4 bump (0.1.2 → 0.1.3) also adds 0 tokens (verified: o200k_base tokenization is identical for 0.1.2 and 0.1.3). Both numbers pass v1_baseline (≤ 120) and v2_tightened (≤ 100).
0.1.1 — 2026-04-28
This release packages everything that has landed since v0.1.0 (PR #2 through PR #6) into a single named version. The frozen specification at .local/research/spec_v0.1.0.md is unchanged; this is a packaging / docs / installer release on top of the same v0.1.0 spec gate evidence.
Added
- One-line bash installer (PR #5, refined in PR #6) at
https://yorha-agents.github.io/Si-Chip/install.sh. Supports--target cursor|claude|both,--scope global|repo,--repo-root <path>,--version,--source-url,--yes,--dry-run,--force,--uninstall,--help. Interactive in a TTY without--yes. docs/skills/si-chip-0.1.1.tar.gz— deterministic release tarball for v0.1.1 (new in this release). The v0.1.0 tarball atdocs/skills/si-chip-0.1.0.tar.gzis preserved for backward compatibility —--version v0.1.0still works.- NieR: Automata-themed GitHub Pages design (PR #3, fixed in PR #4): warm khaki / olive-brown palette, B612 Mono + Saira Stencil One typography, bracketed
[SI-CHIP]title, scan-line overlay, blinking cursor,// CHAPTER 0N //section markers, “GLORY TO MANKIND.” motto,[STATUS: ONLINE] [NODE: 0001] [VER: 0.1.1] [OPERATOR: 6O]footer status grid. Cayman theme retired. docs/_layouts/default.html,docs/_includes/{header,footer}.html,docs/assets/css/nier.css(~476 lines),docs/assets/js/nier.js(12 lines, prefers-reduced-motion guarded).CONTRIBUTING.md§9 Mirror Drift Contract: 3-tree (.agents/,.cursor/,.claude/) + 1-derived-tarball (docs/skills/si-chip-<version>.tar.gz) contract with diff and rebuild commands.
Changed
.agents/skills/si-chip/SKILL.mdfrontmatter:version: 0.1.0→version: 0.1.1;license: internal→license: Apache-2.0(matches the repo’s actualLICENSEfile). Token budget still passes v3_strict (metadata ≤ 100, body ≤ 5000).install.shanddocs/install.shdefaultSI_CHIP_VERSIONconstant:v0.1.0→v0.1.1. Override with--version v0.1.0to install the prior payload.INSTALL.mdanddocs/_install_body.md: now lead with## Quick Install (one-line); the previous git-clone flow is## Manual install; Codex deferral promoted to its own section.README.md: status badge bumped tov0.1.1 ship-eligible;## Quick Startsplit into## Quick Install(one-liner) and## Quick Start (after install or clone)(the original 3-command verification block).install.shHTTP(S) path now downloads a single tarball at${SOURCE_URL}/skills/si-chip-${VERSION}.tar.gzand extracts it (PR #6). The previous per-file mirror atdocs/skills/si-chip/was removed because Jekyll renders YAML-front-matter.mdfiles (SKILL.md was served at/skills/si-chip/SKILL/, raw.mdURL = 404).file://source URLs continue to use per-filecpfor local testing.
Fixed
- Pages build no longer hangs on
include_relative ../USERGUIDE.md/../INSTALL.mdtraversal indocs/userguide.mdanddocs/install.md(PR #2). Userguide/install bodies now ship as sibling Jekyll partialsdocs/_userguide_body.mdanddocs/_install_body.md. Live URLs return HTTP 200. docs/_config.ymlpreviously hadtheme: ""which Jekyll rejects (MissingDependencyException: The theme could not be found.); thetheme:key is now omitted entirely so Jekyll picks up the local_layouts/default.htmlviadefaults.layout: default(PR #4).install.shno longer 404s on SKILL.md when run against the live Pages URL (PR #6 tarball switch).
Notes
- Codex (
--target codex) remains out of scope (spec §7.2: bridge-only, deferred). - Marketplace (
--target marketplace) remains forever-out (spec §11.1). DESIGN.mdis intentionally NOT in any platform mirror or in the tarball (internal artifact only).- Spec text
.local/research/spec_v0.1.0.mdis unchanged — this release does NOT bump the spec version. The next spec bump (when there is one) would go to spec v0.2.0 alongside a project release ≥ v0.2.0. - Dogfood evidence at
.local/dogfood/2026-04-28/round_{1,2}/is unchanged; the v0.1.0 ship verdict (SHIP_ELIGIBLE atrelaxed/v1_baseline, two consecutive v1 passes, ALL_TREES_DRIFT_ZERO, 8/8 spec invariants PASS) carries forward unchanged.
0.1.0 — 2026-04-28
Initial public release. Si-Chip is published as a persistent BasicAbility optimization factory with a frozen specification, a self-installing Skill package, an evaluation harness, and machine-checkable spec invariants.
Added
- Frozen specification
spec_v0.1.0.mdwith 7 normative sections (§3 metrics, §4 gate profiles, §5 router paradigm, §6 half-retirement, §7 packaging, §8 self-dogfood, §11 scope boundaries). - Self-Skill package at
.agents/skills/si-chip/(canonical source-of-truth):SKILL.md(metadata=78 tok, body=2020 tok — passes v3_strict packaging gate), 5 distilled references, 3 helper scripts. - 6 frozen factory templates under
templates/(BasicAbility schema, eval suite, router-test matrix, half-retire decision, next-action plan, iteration-delta report). - Spec validator
tools/spec_validator.pywith 8 invariants (default mode honours the §3.1 / §4.1 tables;--strict-prose-countflags the known §13.4 prose discrepancies). - Eval harness under
evals/si-chip/: 6 cases × 20 prompts (10 should_trigger + 10 should_not_trigger), no-ability and with-ability baseline runners, end-to-end metrics report. - Two completed dogfood rounds at
.local/dogfood/2026-04-28/round_{1,2}/with all 6 frozen evidence files each, plus the v0.1.0 ship report and ship decision YAML. - Cross-platform mirrors:
.cursor/skills/si-chip/and.claude/skills/si-chip/(drift = 0 verified across all three trees). - Cursor bridge rule
.cursor/rules/si-chip-bridge.mdc. - Compiled rules layer
AGENTS.mdand.rules/si-chip-spec.mdc. - Persistence:
.local/memory/skill_profiles/si-chip/learnings.jsonl(3 entries: round_1, round_2, ship).
Gate verdict (v0.1.0 ship)
- Default
spec_validator.py: PASS (8/8 invariants). - Two consecutive
v1_baseline(relaxed) gate passes (R1: pass_rate 0.85, R2: pass_rate 0.85 with body slimmed 18.97 % andR4_near_miss_FP_ratepopulated at 0.05). - Cross-tree drift: ALL_TREES_DRIFT_ZERO.
- Ship verdict: SHIP_ELIGIBLE at gate
relaxed/v1_baseline.
Known limitations
- Eval baselines are deterministic SHA-256 simulations (LLM-backed runner is the documented upgrade path; the result.json schema and runner CLI are stable so the swap requires no aggregator change).
- Spec internal discrepancies in §3.1 / §13.4 (28 vs 37 sub-metrics; 21 vs 30 threshold cells) — the validator handles them in default mode and flags them in strict-prose mode. Reconciliation requires a future spec bump.
- Three R6 sub-metrics (R6_routing_latency_p95, R7_routing_token_overhead, plus
the L1 / U1–U4 family) remain
null— targeted innext_action_plan.yamlfor round 3.
Out of scope (per spec §11)
- No marketplace or distribution surface.
- No router model training.
- No Markdown-to-CLI converter.
- No generic IDE compatibility layer.
- Codex native SKILL.md runtime is bridge-only and deferred.