Testing¶
just test— pytest with coverage (terminal report + missing lines).just check— ruff lint + format check +mypy --strict.just check-all—check+ lock-file consistency + all pre-commit hooks (CI parity).
Rules¶
- No public function/class without a test. Coverage must not regress on any PR.
- Global coverage floor: 90%. Enforced by
fail_under = 90inpyproject.toml(applied bypytest-covagainst combined line + branch coverage). - Per-module gates enforced by
scripts/check_coverage.pyagainst thecoverage.xmlthatpytest --cov-report=xmlproduces. The script runs locally viajust testand in CI after the test step on every OS in the matrix, so platform-specific regressions surface at PR time. Current gates (line coverage):
| Prefix | Gate | Aspirational target |
|---|---|---|
<global> (line + branch via fail_under) |
90.0% | rise with reality |
src/yaya/kernel/ |
93.5% | 95% |
src/yaya/core/ |
90.0% | 90% |
src/yaya/cli/ |
85.0% | 85% |
src/yaya/plugins/agent_tool/ |
95.0% | 95% |
src/yaya/plugins/llm_echo/ |
95.0% | 95% |
src/yaya/plugins/llm_openai/ |
79.5% | 85% |
src/yaya/plugins/mcp_bridge/ |
90.0% | 90% |
src/yaya/plugins/memory_sqlite/ |
89.5% | 90% |
src/yaya/plugins/strategy_react/ |
95.0% | 95% |
src/yaya/plugins/tool_bash/ |
95.0% | 95% |
src/yaya/plugins/web/ |
84.0% | 85% |
Gates tagged below their aspirational target track the ratchet
convention "reality minus 0.5" — they rise as coverage improves
and never drop without a governance issue. The tables in
scripts/check_coverage.py are the source of truth; the table here
is a mirror for readers who are not reading the script.
Ratchet policy¶
- After any merge to
mainthat raises a prefix's coverage abovegate + 0.5, bump its gate inscripts/check_coverage.py(and the globalfail_underinpyproject.toml) tofloor(actual − 0.5)rounded down to the nearest 0.5. Do this in the same PR or an immediate follow-up. - Gates never retreat. Lowering a gate — or relaxing the global
floor — requires a dedicated issue labelled
governanceand owner sign-off. - If current reality is below an aspirational target (e.g.
plugins/web/below 85%), the gate sits at reality minus 0.5; a follow-up issue raises coverage first, then the gate. Never set a gate you cannot meet today. - The
ratchet->column incheck_coverage.py's output names the value a gate could be raised to right now. Watch for it in PR logs — free ratchet opportunities are ratchet opportunities missed. - Prefer integration over mocks. Reach for real objects,
tmp_path, and recorded fixtures beforeunittest.mock. - Tests must fail before they pass. Write the failing test first, then the implementation (TDD).
- Feature PRs: each observable behavior is covered by a BDD scenario
in
specs/<slug>.specwhoseTest:selector names the test function.scripts/check_specs.shrunsagent-spec lifecycle, and pytest-bdd rejects unbound.featurescenarios. See agent-spec.md. - Test layout mirrors
src/. A file atsrc/yaya/core/foo.pyhas its tests attests/core/test_foo.py. - No network, no time, no randomness in tests without an explicit seam.
- Every test finishes in ≤30s. Enforced by
pytest-timeout. If you need longer, mark the test@pytest.mark.slowand justify it in the PR body. - Tests must pass in random order.
pytest-randomlyshuffles every run. Order-dependent tests are bugs.
Stack¶
pytest+pytest-asyncio(auto mode).pytest-cov— branch coverage, threshold enforced.pytest-httpx— mock httpx calls; the only way to touch the network in tests.pytest-timeout— 30s default per test.pytest-randomly— shuffled execution order.hypothesis— property-based tests for pure logic (seetests/core/test_updater_properties.py).syrupy— snapshot tests for stable outputs like--help. Refresh withpytest --snapshot-update; commit the resulting__snapshots__/changes.typer.testing.CliRunner(via conftestrunnerfixture) — CLI end-to-end.
Markers¶
Registered in pyproject.toml:
| Marker | Meaning | Default |
|---|---|---|
unit |
fast, isolated, pure logic | implicit default |
integration |
touches fs / subprocess / local network | explicit |
slow |
>1s; skippable via -m "not slow" |
explicit |
Use module-level pytestmark = pytest.mark.unit to tag a whole file.
--strict-markers fails the suite on typos.
Test kinds (when to reach for each)¶
- Example tests: one concrete input → one expected output. Default.
- Property tests (hypothesis): assert an invariant over a generated input space. Use for pure functions (parsers, comparators, transforms).
- Snapshot tests (syrupy): pin the exact shape of a stable output.
Only snapshot outputs that are platform-agnostic — e.g. a JSON
payload from
yaya --json foo. Do NOT snapshot rich-rendered help text; rich picks different box-drawing glyphs on Windows vs POSIX, and snapshots drift. Refresh deliberately, not reflexively. - CLI tests:
runner.invoke(cli_app, [...])+ assert exit code, JSON shape, stderr vs stdout routing. See cli.md. - BDD tests (pytest-bdd): Gherkin scenarios in
tests/bdd/features/*.featurebound to step definitions intests/bdd/test_*.py. Each scenario inspecs/<slug>.specCompletion Criteria has a matching.featurescenario; changing the scenario text without updating the step definition causes pytest to fail withStepDefinitionNotFoundError.scripts/check_feature_sync.pyverifies.featureand.specstay aligned; it runs injust checkand CI. Step-by-step conversion procedure lives in bdd-workflow.md — new agents authoring or migrating a spec read that first. See agent-spec.md for how.specauthoring and BDD execution relate.
Isolation fixtures¶
Two autouse fixtures in tests/conftest.py:
_isolate_state_dir— redirectsXDG_DATA_HOMEintotmp_pathso the updater's state files never touch~/.local._no_auto_update— setsYAYA_NO_AUTO_UPDATE=1so the toast stays silent in CLI tests (override in a specific test if you need it).
Pre-commit¶
Pre-commit hooks run on every commit. Never use --no-verify. If a hook
fails, fix the cause before the final commit of the PR.