Rearchitect click capture: strict click-time frames, off-main-process recorder, exact marker coordinates
Template tests / tests (push) Successful in 1m50s

Implements the architecture change from ai_prompts/prompt3.md:

- New app/click-frames.js: shared timestamped frame ring + strict
  click-to-frame pairing (never a frame whose grab started after the
  click); legacy slack behavior kept behind capture.strictClickFrames=false.
- New stream capture backend (app/stream-backend.js + hidden worker
  window): per-display desktop media streams sampled into ring buffers
  and PNG-encoded entirely off the main process, so click delivery is
  never starved by capture work. Auto-degrades to the legacy in-process
  frame loop when streams cannot start or the worker stops answering.
- Clicks are paired with their frame at event time (eager pairing in
  enqueueClickCapture); only the storing is serialized, so slow encodes
  cannot skew later clicks in a fast burst.
- Linux watcher: restored event-time root coordinates from
  xinput test-xi2 and merge raw/regular twin events structurally.
- Replaced the 40ms time debounce with source-aware duplicate
  suppression: fast legitimate clicks are never dropped.
- New app/coords.js: physical-to-DIP conversion with multi-monitor and
  scale-factor handling; Windows keeps screenToDipPoint.
- STEPFORGE_CLICK_SELFTEST end-to-end hook: 3/3 clicks become steps via
  the stream backend with 0.00% marker offset on this host.
- Tests rewritten/added: strict selection, coords, stream backend,
  Linux coordinate parsing, twin merge, burst clicking (126 passing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Iisyourdad
2026-06-11 21:33:31 -05:00
parent c6d0e9e356
commit a0b69f8cc7
14 changed files with 2109 additions and 162 deletions
+35
View File
@@ -102,6 +102,41 @@ IPC API (`stepforge.*`), and `app/main.js` routes calls into `core/`. Screen
capture uses Electron's `desktopCapturer` (full screen, window) and an
overlay window for region selection; hotkeys use `globalShortcut`.
## Click-Capture Pipeline
Workflow recording must behave like one click → one step, with the
screenshot showing the screen *at* the click and the marker on the exact
click position. Three pieces make that hold:
1. **OS click events** (`app/capture.js`): a low-level mouse hook on Windows
(`CLICK x y button unixMs` lines), an `xinput test-xi2 --root` watcher on
X11. The Linux parser carries event-time `root:` coordinates and merges
raw/regular twin blocks structurally — there is no time-based debounce
that could drop fast clicks, only suppression of identical duplicate
deliveries. Physical coordinates convert to DIP via
`screen.screenToDipPoint` on Windows or display-geometry math in
`app/coords.js` elsewhere (multi-monitor and scale-factor aware).
2. **Frame recorders**: while recording, a hidden worker window
(`app/stream-backend.js` + `app/renderer/capture-worker.js`) samples a
desktop media stream per display into a timestamped ring buffer —
entirely off the main process, so click delivery is never delayed by
capture work, and PNG encoding happens in the worker. If streams can't
start (portal-less Wayland), or the worker stops answering, the service
degrades to the legacy in-process `desktopCapturer` loop.
3. **Click ↔ frame pairing** (`app/click-frames.js`, shared by the main
process, the worker, and tests): each click is paired *at event time*
with the newest frame captured at or before its hook timestamp. In strict
mode (`capture.strictClickFrames`, default on) a frame whose grab started
after the click is never used — when nothing qualifies, the service takes
an explicit fresh shot instead of passing a post-click frame off as the
click-time screen. Storing is serialized per click; pairing is not, so
slow encodes never skew later clicks.
`STEPFORGE_CLICK_SELFTEST=1 npm start` exercises the whole pipeline in a
real Electron session and reports steps-per-click and marker offsets.
## Security Rules
- Zero network code paths: no sockets, no telemetry, no update or license