Guard the click-capture pipeline with an automated end-to-end check
Template tests / tests (push) Waiting to run
Template tests / tests (pull_request) Waiting to run

Add tests/checks/test_click_capture_selftest.sh: runs the real Electron
STEPFORGE_CLICK_SELFTEST session and asserts every scenario passes — 3/3
markers at 0.00% offset, 8/8 burst clicks kept on finish, the first armed
click captured (warmup click ignored), and the debounce (4/4). Picked up
automatically by tests/run_test.sh. Skips cleanly when the host has no
capture environment so it never falsely fails CI, but fails the suite on any
real regression in click->screenshot->step behavior.

Document the guard in ARCHITECTURE.md and CHANGELOG.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Iisyourdad
2026-06-12 09:25:39 -05:00
parent f2c5831315
commit aafed7eca7
3 changed files with 82 additions and 3 deletions
+7 -3
View File
@@ -156,9 +156,13 @@ Reliability rules that keep "one click → one step" true under load:
one. one.
`STEPFORGE_CLICK_SELFTEST=1 npm start` exercises the whole pipeline in a `STEPFORGE_CLICK_SELFTEST=1 npm start` exercises the whole pipeline in a
real Electron session: it reports steps-per-click and marker offsets, then real Electron session across four scenarios — marker accuracy (0.00%
runs a fast-burst-then-finish scenario that must save every click. offset), a fast-burst-then-finish that must save every click, the
`STEPFORGE_CAPTURE_LOG=1` prints one diagnostic line per click decision. warm-before-arm first click, and the ~200ms debounce. It runs automatically
as `tests/checks/test_click_capture_selftest.sh` (skipped only when the host
has no capture environment), so a regression in click→screenshot→step
behavior fails the suite. `STEPFORGE_CAPTURE_LOG=1` prints one diagnostic
line per click decision.
## Security Rules ## Security Rules
+5
View File
@@ -36,6 +36,11 @@ Keep-a-Changelog conventions; versions follow semver.
screenshot is late" — while every later click was fine. Now frames are screenshot is late" — while every later click was fine. Now frames are
buffering by the time the window tucks away, so the first click is buffering by the time the window tucks away, so the first click is
served a pre-click frame like the rest. served a pre-click frame like the rest.
- The whole click→screenshot→step pipeline is guarded end to end by
`tests/checks/test_click_capture_selftest.sh`, which runs a real Electron
session and asserts marker accuracy, no dropped burst clicks, the first
click of a session captured, and the debounce — so this behavior fails
the suite if it ever regresses.
### Added ### Added
+70
View File
@@ -0,0 +1,70 @@
#!/usr/bin/env bash
# Workflow check: run the full click-capture pipeline end to end in a real
# Electron session (STEPFORGE_CLICK_SELFTEST) and assert every scenario
# passes. This guards the click→screenshot→step behavior — exact markers,
# one step per click, fast bursts not dropped on finish, the first click of a
# session captured (warm-before-arm), and the ~200ms debounce — against
# regressions that unit tests alone can't catch because they don't exercise
# the live capture stream and window timing.
#
# Scenarios and their pass lines (see app/main.js STEPFORGE_CLICK_SELFTEST):
# steps: 3 of 3, each marker "off by 0.00% of screen"
# burst: 8 of 8 (fast clicks + immediate finish, none lost)
# arm: warmup click ignored, first armed click captured
# debounce: 4 of 4 (40ms burst collapses to 1, three 300ms clicks kept)
#
# If the environment can't run a desktop capture at all (no display/stream),
# the scenarios never print, so the check skips rather than failing CI.
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
cd "$ROOT_DIR"
TMP_ROOT="$(mktemp -d)"
trap 'rm -rf "$TMP_ROOT"' EXIT
LOG_FILE="$TMP_ROOT/selftest.log"
set +e
STEPFORGE_DATA_DIR="$TMP_ROOT/data" STEPFORGE_CLICK_SELFTEST=1 \
timeout 120s npm start >"$LOG_FILE" 2>&1
set -e
# The self-test always prints this first line once it begins; without it the
# app never reached the scenarios (couldn't launch / no capture environment).
if ! grep -q 'CLICK-SELFTEST source:' "$LOG_FILE"; then
echo "click capture selftest SKIPPED (no capture environment on this host)"
exit 0
fi
fail() {
echo "click capture selftest FAILED: $1" >&2
echo "----- selftest output -----" >&2
grep -E 'CLICK-SELFTEST' "$LOG_FILE" >&2 || true
exit 1
}
# Any scenario that detected a problem prints FAIL or an ERROR line.
if grep -Eq 'CLICK-SELFTEST.*(FAIL|ERROR)' "$LOG_FILE"; then
fail "a scenario reported FAIL/ERROR"
fi
# Per-scenario positive assertions (deterministic with synthetic clicks).
grep -q 'CLICK-SELFTEST steps: 3 of 3' "$LOG_FILE" \
|| fail "marker scenario did not capture 3 of 3 clicks"
# All three markers must land exactly on the injected click positions.
marker_ok="$(grep -c 'CLICK-SELFTEST marker [0-9]*: off by 0.00% of screen' "$LOG_FILE" || true)"
[[ "$marker_ok" -eq 3 ]] \
|| fail "expected 3 markers at 0.00% offset, found $marker_ok"
grep -q 'CLICK-SELFTEST burst: 8 of 8' "$LOG_FILE" \
|| fail "burst scenario lost clicks on finish"
grep -q 'CLICK-SELFTEST arm:.*OK' "$LOG_FILE" \
|| fail "arm scenario did not capture the first armed click"
grep -q 'CLICK-SELFTEST debounce: 4 of 4 expected OK' "$LOG_FILE" \
|| fail "debounce scenario did not collapse the burst / keep deliberate clicks"
echo "click capture selftest OK (markers, burst, arm, debounce all verified)"