Rearchitect click capture: strict click-time frames, off-main-process recorder, exact marker coordinates
Template tests / tests (push) Successful in 1m50s
Implements the architecture change from ai_prompts/prompt3.md:

- New app/click-frames.js: shared timestamped frame ring + strict
  click-to-frame pairing (never a frame whose grab started after the
  click); legacy slack behavior kept behind capture.strictClickFrames=false.
- New stream capture backend (app/stream-backend.js + hidden worker
  window): per-display desktop media streams sampled into ring buffers and
  PNG-encoded entirely off the main process, so click delivery is never
  starved by capture work. Auto-degrades to the legacy in-process frame
  loop when streams cannot start or the worker stops answering.
- Clicks are paired with their frame at event time (eager pairing in
  enqueueClickCapture); only the storing is serialized, so slow encodes
  cannot skew later clicks in a fast burst.
- Linux watcher: restored event-time root coordinates from xinput test-xi2
  and merge raw/regular twin events structurally.
- Replaced the 40ms time debounce with source-aware duplicate suppression:
  fast legitimate clicks are never dropped.
- New app/coords.js: physical-to-DIP conversion with multi-monitor and
  scale-factor handling; Windows keeps screenToDipPoint.
- STEPFORGE_CLICK_SELFTEST end-to-end hook: 3/3 clicks become steps via the
  stream backend with 0.00% marker offset on this host.
- Tests rewritten/added: strict selection, coords, stream backend, Linux
  coordinate parsing, twin merge, burst clicking (126 passing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
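
The strict pairing rule from the first bullet can be sketched roughly as below. This is only an illustration of the rule as the commit message states it, not the contents of app/click-frames.js: the function name `pickFrameForClick`, the frame fields `startedAt` / `capturedAt`, and the 600 ms / 300 ms defaults (borrowed from the constants this commit removes) are assumptions.

```javascript
// Hypothetical sketch of click-to-frame pairing; names and defaults are assumptions.
function pickFrameForClick(frames, { clickAt, strict = true, maxAgeMs = 600, startSlackMs = 300 }) {
  const usable = frames.filter((f) => {
    if (!f || !Number.isFinite(f.capturedAt)) return false;
    const startedAt = Number.isFinite(f.startedAt) ? f.startedAt : f.capturedAt;
    if (f.capturedAt <= clickAt) {
      // Grab finished before the click: usable while it is still fresh.
      return clickAt - f.capturedAt <= maxAgeMs;
    }
    // Grab finished after the click. Strict mode only accepts it when the
    // grab *started* at or before the click; legacy mode allows some slack.
    return strict ? startedAt <= clickAt : startedAt <= clickAt + startSlackMs;
  });
  // Prefer the most recently completed frame among the usable ones.
  usable.sort((a, b) => b.capturedAt - a.capturedAt);
  return usable[0] || null;
}

// Usage: a grab that started 10 ms after the click is rejected in strict
// mode and accepted with capture.strictClickFrames=false style slack.
const frames = [
  { startedAt: 900, capturedAt: 950 },
  { startedAt: 1010, capturedAt: 1060 },
];
const strictPick = pickFrameForClick(frames, { clickAt: 1000 });
const legacyPick = pickFrameForClick(frames, { clickAt: 1000, strict: false });
```

Returning `null` when nothing qualifies mirrors the behavior the diff describes: the caller then takes an explicit fresh-shot fallback instead of passing off a post-click frame.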
@@ -0,0 +1,17 @@
+'use strict';
+
+const { contextBridge, ipcRenderer } = require('electron');
+
+/**
+ * Bridge for the hidden capture-worker window. The worker only ever talks to
+ * the StreamCaptureBackend in the main process: commands in (start streams,
+ * frame requests), events out (stream health, PNG-encoded frames).
+ */
+contextBridge.exposeInMainWorld('captureWorkerBridge', {
+  onCommand(fn) {
+    ipcRenderer.on('capture-worker:command', (_event, msg) => fn(msg));
+  },
+  send(msg) {
+    ipcRenderer.send('capture-worker:event', msg);
+  },
+});
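
For orientation, the worker page on the other side of this bridge might be wired up along the following lines. The worker script is not part of this excerpt, so everything here is a guess at its shape: `attachWorker`, the `{ type, ... }` message fields, and the `startStreams` / `encodeFrame` helpers are hypothetical. Only the two bridge methods, `onCommand` and `send`, come from the preload above.

```javascript
// Hypothetical worker-side wiring; message shapes and helper names are assumptions.
function attachWorker(bridge, { startStreams, encodeFrame }) {
  bridge.onCommand(async (msg) => {
    try {
      if (msg.type === 'start') {
        // Commands in: start one desktop stream per requested source.
        await startStreams(msg.sources);
        bridge.send({ type: 'health', ok: true });
      } else if (msg.type === 'frame-request') {
        // Events out: a PNG-encoded frame, tagged with the request it answers.
        const png = await encodeFrame(msg.displayId, msg.clickAt);
        bridge.send({ type: 'frame', requestId: msg.requestId, png });
      }
    } catch (err) {
      bridge.send({ type: 'health', ok: false, reason: String((err && err.message) || err) });
    }
  });
}

// In the hidden worker window this would be called roughly as:
//   attachWorker(window.captureWorkerBridge, { startStreams, encodeFrame });
```

Passing the bridge in as a parameter, rather than reading `window.captureWorkerBridge` directly, keeps the handler testable outside Electron.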
+395
-111
@@ -6,53 +6,74 @@ const { desktopCapturer, screen, BrowserWindow, nativeImage, Tray, Menu, Notific
 const { expandPlaceholders } = require('../core/placeholders');
 const raster = require('../core/raster');
 const { encodePng } = require('../core/png');
+const {
+  selectFrameForClick,
+  frameUsableForClick,
+  pointInBounds,
+  DEFAULT_MAX_AGE_MS,
+  DEFAULT_START_SLACK_MS,
+} = require('./click-frames');
+const { physicalToDip } = require('./coords');
 
 /**
- * Capture service: full-screen, active-window, and region capture via
- * Electron's desktopCapturer, plus a click-marker annotation at the cursor
- * position and a capture session (start/pause/resume/finish).
+ * Capture service: full-screen, active-window, and region capture, plus a
+ * click-marker annotation at the click position and a capture session
+ * (start/pause/resume/finish).
  *
  * A session captures continuously, with three triggers layered by what the
  * platform supports:
- *   - click-capture via an OS adapter (xinput on X11, PowerShell on Windows),
+ *   - click-capture via an OS adapter (xinput on X11, a low-level mouse hook
+ *     on Windows),
  *   - a global hotkey (unreliable on some Wayland compositors),
  *   - interval auto-capture as the always-works fallback.
  *
+ * Click captures are served from one of two frame recorders:
+ *   - the stream backend (app/stream-backend.js): a hidden worker window
+ *     samples a desktop media stream per display into a timestamped ring
+ *     buffer, entirely off the main process. This is the preferred path —
+ *     the main-process event loop stays free, so OS click events arrive on
+ *     time, and the tight sampling cadence keeps a genuinely fresh pre-click
+ *     frame available for every click;
+ *   - the legacy in-process frame loop below, kept as the fallback when
+ *     streams can't start (portal-less Wayland, exotic drivers).
+ *
+ * Either way the pairing rule is the same (click-frames.js): in strict mode
+ * a click only ever gets a frame captured at or before the click — never one
+ * whose grab started after it.
+ *
  * Note: under Wayland/WSLg, screen capture may require portal support; all
  * failures surface as { ok: false, reason } instead of crashing.
  */
 
-// Dedupe duplicate watcher events for one physical click while still
-// allowing intentionally fast clicking.
-const CLICK_DEBOUNCE_MS = 40;
-// Idle gap between frame-loop grabs. Must stay well above zero: grabbing
-// back-to-back starves the main-process event loop, which delays delivery
-// of click events from the OS watcher by whole seconds. The frame history
-// plus hook-side click timestamps tolerate the coarser cadence.
+// Suppress only *duplicate deliveries* of one physical press (same button,
+// same coordinates, a few ms apart). This deliberately replaces the old
+// time-only debounce: real humans double-click ~50-100ms apart, and any
+// purely temporal cutoff eventually drops a legitimate fast click, which
+// reads as "my click didn't register". One hook/watcher event = one click.
+const CLICK_EVENT_DUPLICATE_MS = 8;
+// How long a Linux raw button event waits for its regular twin (the
+// representation that carries root coordinates) before firing without them.
+const LINUX_CLICK_TWIN_MS = 25;
+// Idle gap between legacy frame-loop grabs. Must stay well above zero:
+// grabbing back-to-back starves the main-process event loop, which delays
+// delivery of click events from the OS watcher by whole seconds. (The
+// stream backend exists precisely because of this constraint.)
 const FRAME_LOOP_IDLE_MS = 200;
 // A buffered frame older than this is too stale to pass off as "the screen
-// at the instant of the click".
-const CLICK_FRAME_MAX_AGE_MS = 600;
+// at the instant of the click". Shared with click-frames.js.
+const CLICK_FRAME_MAX_AGE_MS = DEFAULT_MAX_AGE_MS;
 // How long a click waits for the in-flight grab before falling back to a
 // one-off fresh shot.
 const CLICK_FRAME_WAIT_MS = 2000;
-// A loop grab that started at most this long after the click still shows
-// the screen the user clicked on (UI reactions render slower than this).
-const CLICK_FRAME_START_SLACK_MS = 300;
+// Balanced (non-strict) mode only: a loop grab that started at most this
+// long after the click is still accepted. Strict mode never does this.
+const CLICK_FRAME_START_SLACK_MS = DEFAULT_START_SLACK_MS;
 const CLICK_CAPTURE_HIDE_DELAY_MS = 25;
-// Frames now hold raw images (~20MB each at 2880x1800), so keep the history
+// Frames hold raw images (~20MB each at 2880x1800), so keep the history
 // window wide enough to outlast any processing hiccup but the count low.
 const RECENT_FRAME_RETENTION_MS = 4000;
 const RECENT_FRAME_LIMIT = 4;
 
-function pointInBounds(point, bounds) {
-  if (!point || !bounds) return false;
-  return point.x >= bounds.x
-    && point.x <= bounds.x + bounds.width
-    && point.y >= bounds.y
-    && point.y <= bounds.y + bounds.height;
-}
-
 function hasBinary(name) {
   try {
     execFileSync('which', [name], { stdio: 'pipe' });
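
The duplicate-suppression rule described in the CLICK_EVENT_DUPLICATE_MS comment in the hunk above can be tried out in isolation with a small stand-alone model. The factory name `createClickDeduper` and its closure-based shape are hypothetical; the 8 ms default is the value of the constant in this commit.

```javascript
// Hypothetical stand-alone model of the duplicate check; not the commit's method.
function createClickDeduper(windowMs = 8) {
  const lastByButton = new Map();
  return function isDuplicate(at, point, button = 'mouse') {
    const last = lastByButton.get(button);
    lastByButton.set(button, { at, point });
    if (!last) return false;
    // Outside the window (or out of order): always a distinct click.
    if (at < last.at || at - last.at >= windowMs) return false;
    // Inside the window: duplicate only when it is the same event, meaning
    // same coordinates, or neither delivery carried coordinates.
    if (point && last.point) return point.x === last.point.x && point.y === last.point.y;
    return !point && !last.point;
  };
}

const isDuplicate = createClickDeduper();
const results = [
  isDuplicate(1000, { x: 10, y: 10 }, 'button-1'), // first press
  isDuplicate(1003, { x: 10, y: 10 }, 'button-1'), // same press delivered again
  isDuplicate(1005, { x: 40, y: 10 }, 'button-1'), // fast click somewhere else
  isDuplicate(1060, { x: 40, y: 10 }, 'button-1'), // double-click on the same spot
];
```

Only the second call is treated as a duplicate; the fast click at a different position and the 55 ms double-click both pass, which is the case a purely time-based 40 ms debounce could get wrong.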
@@ -63,11 +84,14 @@ function hasBinary(name) {
 }
 
 class CaptureService {
-  constructor({ store, settings, getWindow, notify }) {
+  constructor({ store, settings, getWindow, notify, screenApi = screen }) {
     this.store = store;
     this.settings = settings;
     this.getWindow = getWindow;
     this.notify = notify;
+    // Injectable for tests; the click/coordinate paths must never reach for
+    // the global `screen` directly so coordinate handling stays testable.
+    this.screen = screenApi;
     this.session = null; // { guideId, paused, count, intervalSec }
     this.intervalTimer = null;
     this.clickWatcher = null;
@@ -76,14 +100,17 @@ class CaptureService {
     this.frameWaiters = [];
     this.latestFrame = null;
     this.clickWatcherBuf = '';
-    this.clickWatcherPendingPress = false;
     this.clickWatcherErrTail = '';
+    this.linuxEvent = null; // event block currently being parsed
+    this.pendingRawClick = null; // raw press waiting for its coordinate twin
     this.clickQueue = Promise.resolve();
     this.frameLoopInFlight = false;
     this.frameLoopGrabStartedAt = null;
     this.recentFrames = [];
     this.shooting = false;
-    this.lastClickCaptureByButton = new Map();
+    this.lastClickEventByButton = new Map();
+    this.streamBackend = null;
+    this.streamBackendStarting = false;
   }
 
   state() {
@@ -96,10 +123,24 @@ class CaptureService {
           intervalSec: this.session.intervalSec || 0,
           clickCapture: Boolean(this.clickWatcher),
           clickCaptureAvailable: this.clickCaptureAvailable(),
+          clickFrameSource: this.streamBackend ? 'stream' : (this.frameLoopRunning ? 'loop' : 'idle'),
+          strictClickFrames: this.strictClickFrames(),
         }
       : { active: false, clickCaptureAvailable: this.clickCaptureAvailable() };
   }
 
+  /**
+   * Strict is the default: a stored step must never show the screen *after*
+   * its click (a frame whose grab started post-click can already contain the
+   * click's effects). The setting exists as an explicit escape hatch for
+   * machines where capture is too slow to keep pre-click frames buffered —
+   * there, the legacy slack heuristics trade accuracy for fewer fresh-shot
+   * fallbacks.
+   */
+  strictClickFrames() {
+    return this.settings.get('capture.strictClickFrames') !== false;
+  }
+
   clickCaptureAvailable() {
     if (this._clickAvail === undefined) {
       this._clickAvail = process.platform === 'win32' || (process.platform === 'linux' && hasBinary('xinput'));
@@ -223,22 +264,23 @@ class CaptureService {
     const wasPaused = this.session.paused;
     this.session.paused = typeof force === 'boolean' ? force : !this.session.paused;
     // Starting/resuming tucks the window away again for clean shots (after
-    // a brief delay so the user sees it happen) and starts the frame loop
-    // that serves click captures. Pausing stops the loop and discards the
-    // buffered frame, so a resume can never serve a pre-pause screen.
+    // a brief delay so the user sees it happen) and starts the frame
+    // recorder that serves click captures. Pausing stops it and discards
+    // buffered frames, so a resume can never serve a pre-pause screen.
     if (wasPaused && !this.session.paused) {
       const win = this.getWindow();
       const arm = () => {
         if (!this.session || this.session.paused) return;
         if (this.hiddenForSession && win && !win.isDestroyed() && win.isVisible()) win.hide();
         if (this.settings.get('capture.captureOutsideClicks') !== false && this.clickCaptureAvailable()) {
-          this.startFrameLoop();
+          this.startClickFrameBackend().catch(() => {});
         }
       };
       if (this.hiddenForSession && win && !win.isDestroyed()) setTimeout(arm, 400);
       else arm();
     } else if (!wasPaused && this.session.paused) {
       this.stopFrameLoop();
+      this.stopClickFrameBackend();
     }
     if (this.rebuildTrayMenu) this.rebuildTrayMenu();
     this.notify('capture:state', this.state());
@@ -251,6 +293,7 @@ class CaptureService {
     }
     this.stopClickWatcher();
     this.stopFrameLoop();
+    this.stopClickFrameBackend();
     this.destroySessionTray();
     this.session = null;
     if (this.hiddenForSession) {
@@ -269,7 +312,7 @@ class CaptureService {
   userIsInApp() {
     const win = this.getWindow();
     if (!win || win.isDestroyed() || !win.isVisible() || win.isMinimized()) return false;
-    const cur = screen.getCursorScreenPoint();
+    const cur = this.screen.getCursorScreenPoint();
     const b = win.getBounds();
     return cur.x >= b.x && cur.x <= b.x + b.width && cur.y >= b.y && cur.y <= b.y + b.height;
   }
@@ -283,14 +326,18 @@ class CaptureService {
       return { ok: false, reason: 'skipped — StepForge is focused' };
     }
 
-    // Clicks are served from the frame loop: the buffered frame was grabbed
-    // at (or moments before) the click instant, so the background matches
-    // what the user clicked on. A click that lands while a grab is in
-    // flight waits for that frame instead of being dropped, so fast
+    // Clicks are served from the frame recorder: the chosen frame was
+    // captured at (or moments before) the click instant, so the background
+    // matches what the user clicked on. A click that lands while a grab is
+    // in flight waits for that frame instead of being dropped, so fast
     // clicking still yields one step per click.
     if (trigger === 'click') {
       const clickAt = clickMeta && Number.isFinite(clickMeta.at) ? clickMeta.at : Date.now();
-      const frame = await this.frameForClick(clickPos, clickAt);
+      // Prefer the frame the click was paired with at event time (see
+      // enqueueClickCapture); ask now only when no eager pairing happened.
+      const frame = clickMeta && clickMeta.framePromise
+        ? await clickMeta.framePromise
+        : await this.frameForClick(clickPos, clickAt);
       if (!this.session || this.session.paused) return { ok: false, reason: 'no active capture session' };
       if (frame) {
         const result = this.storeFrameAsStep(this.session.guideId, frame.mode, frame, clickPos);
@@ -335,11 +382,14 @@ class CaptureService {
   // ---- click-triggered capture --------------------------------------------
 
   /**
-   * Continuous screen-grab loop that runs while recording. It keeps the most
-   * recent frame in `latestFrame` so a click can be served from a frame
-   * grabbed at (or moments before) the instant of the click — a fresh grab
-   * started after the click would land hundreds of ms late and show the
-   * click's effects instead of what the user clicked on.
+   * Fallback frame recorder: a continuous screen-grab loop in the main
+   * process, used only when the stream backend can't run. It keeps the most
+   * recent frames buffered so a click can be served from a frame grabbed at
+   * (or moments before) the instant of the click — a fresh grab started
+   * after the click would land hundreds of ms late and show the click's
+   * effects instead of what the user clicked on. Its cadence is capped at
+   * FRAME_LOOP_IDLE_MS because tighter grabbing here starves the event loop
+   * and delays the very click events it serves.
    */
   startFrameLoop() {
     if (this.frameLoopRunning) return;
@@ -416,45 +466,67 @@ class CaptureService {
   }
 
   /**
-   * Freshest frame usable for a click capture: the buffered frame when it's
-   * recent enough, otherwise the next frame the loop delivers. Null when the
-   * loop isn't running or can't deliver in time.
+   * Frame representing the screen at the instant of one click.
+   *
+   * Order of preference:
+   *   1. the stream backend's ring buffer (off-main-process, tight cadence);
+   *   2. the legacy loop's buffered frames;
+   *   3. waiting for the loop grab that was already in flight when the user
+   *      clicked.
+   * Selection semantics live in click-frames.js. In strict mode every path
+   * obeys the same rule — never a frame whose grab started after the click —
+   * and when nothing qualifies this returns null so the caller takes the
+   * *explicit* fresh-shot fallback rather than silently passing a post-click
+   * frame off as the click-time screen.
    */
   async frameForClick(clickPos = null, clickAt = Date.now()) {
     const mode = this.settings.get('capture.mode') || 'fullscreen';
     const grabMode = mode === 'region' ? 'fullscreen' : mode;
     const clickTime = Number.isFinite(clickAt) ? clickAt : Date.now();
-    // Fast clicks can move to another monitor before the buffered frame is
-    // consumed; only reuse frames from the clicked display.
-    const usable = (f, { allowInFlight = false } = {}) => {
-      const sameDisplay = !clickPos || pointInBounds(clickPos, f && f.display && f.display.bounds);
-      const startedAt = Number.isFinite(f && f.startedAt) ? f.startedAt : (f && f.capturedAt);
-      const completedBeforeClick = Number.isFinite(f && f.capturedAt) && f.capturedAt <= clickTime;
-      // A grab that began within the slack window after the click still
-      // shows the click-instant screen (UI reactions take longer than the
-      // slack to render), and it beats the alternative — a fresh shot that
-      // both starts later and stalls the loop for every queued click.
-      const startedNearClick = Number.isFinite(startedAt)
-        && startedAt <= clickTime + CLICK_FRAME_START_SLACK_MS;
-      const timingMatches = completedBeforeClick
-        ? clickTime - f.capturedAt <= CLICK_FRAME_MAX_AGE_MS
-        : allowInFlight && startedNearClick;
-      return Boolean(f)
-        && f.mode === grabMode
-        && timingMatches
-        && sameDisplay;
+    const strict = this.strictClickFrames();
+    const opts = {
+      clickAt: clickTime,
+      clickPos,
+      mode: grabMode,
+      strict,
+      maxAgeMs: CLICK_FRAME_MAX_AGE_MS,
+      startSlackMs: CLICK_FRAME_START_SLACK_MS,
     };
-    const buffered = [...this.recentFrames, this.latestFrame]
-      .filter((f, i, arr) => f && arr.indexOf(f) === i && usable(f))
-      .sort((a, b) => b.capturedAt - a.capturedAt)[0];
+
+    if (this.streamBackend && this.streamBackend.isActive() && grabMode === 'fullscreen') {
+      const frame = await this.streamBackend.frameForClick({ clickPos, clickAt: clickTime, strict });
+      if (frame) return frame;
+      // No qualifying frame (or the backend just went unhealthy): fall
+      // through to the loop buffer / fresh-shot fallbacks below.
+    }
+
+    const buffered = selectFrameForClick(
+      [...this.recentFrames, this.latestFrame].filter((f, i, arr) => f && arr.indexOf(f) === i),
+      opts,
+    );
     if (buffered) return buffered;
-    // As long as the loop is running, the next grab is at most one idle gap
-    // away — wait for it rather than racing it with a one-off shot.
     if (!this.frameLoopRunning) return null;
+
+    if (strict) {
+      // Only a grab already in flight when the user clicked can still
+      // qualify: its pixels predate the click even though it completes
+      // after. Any grab starting later is post-click by definition, so
+      // don't wait around for one — return immediately and let the caller
+      // take the fresh-shot fallback.
+      const inFlightStartedBeforeClick = this.frameLoopInFlight
+        && Number.isFinite(this.frameLoopGrabStartedAt)
+        && this.frameLoopGrabStartedAt <= clickTime;
+      if (!inFlightStartedBeforeClick) return null;
+      const next = await this.nextFrame(CLICK_FRAME_WAIT_MS);
+      return frameUsableForClick(next, { ...opts, allowInFlight: true }) ? next : null;
+    }
+
+    // Balanced (legacy) mode: wait for the next loop frame and accept it if
+    // its grab started within the slack window after the click.
     const deadline = Date.now() + CLICK_FRAME_WAIT_MS;
     while (this.frameLoopRunning && Date.now() < deadline) {
       const next = await this.nextFrame(Math.max(1, deadline - Date.now()));
-      if (usable(next, { allowInFlight: true })) return next;
+      if (frameUsableForClick(next, { ...opts, allowInFlight: true })) return next;
       if (next && Number.isFinite(next.startedAt)
         && next.startedAt > clickTime + CLICK_FRAME_START_SLACK_MS) {
         // Grabs only get later from here; let the fresh-shot path handle it.
@@ -464,11 +536,79 @@ class CaptureService {
     return null;
   }
 
+  // ---- click-frame backends -------------------------------------------------
+
+  /**
+   * Bring up the frame recorder for a recording run. The stream backend is
+   * the architecture path (capture entirely off the main process); the
+   * in-process frame loop is the fallback when streams can't start — and the
+   * automatic degradation target if the worker stops answering mid-session.
+   */
+  async startClickFrameBackend() {
+    const mode = this.settings.get('capture.mode') || 'fullscreen';
+    // The worker streams screens; window-mode grabs need the loop's
+    // source-filtering logic.
+    if (this.settings.get('capture.streamCapture') === false || mode === 'window') {
+      this.startFrameLoop();
+      return;
+    }
+    if (this.streamBackend || this.streamBackendStarting) return;
+    this.streamBackendStarting = true;
+    try {
+      // eslint-disable-next-line global-require
+      const { StreamCaptureBackend, createElectronHost } = require('./stream-backend');
+      const backend = new StreamCaptureBackend({
+        createHost: createElectronHost,
+        onUnhealthy: () => this.degradeToFrameLoop(),
+      });
+      const displays = this.screen.getAllDisplays();
+      const sources = await desktopCapturer.getSources({
+        types: ['screen'],
+        thumbnailSize: { width: 1, height: 1 }, // ids only — skip thumbnail work
+      });
+      const ok = await backend.start({
+        displays,
+        sources: sources.map((s) => ({ id: s.id, display_id: s.display_id })),
+        sampleMs: this.settings.get('capture.frameSampleMs') || 100,
+      });
+      if (!ok || !this.session || this.session.paused) {
+        backend.stop();
+        if (this.session && !this.session.paused) this.startFrameLoop();
+        return;
+      }
+      this.streamBackend = backend;
+      this.notify('capture:state', this.state());
+    } catch {
+      if (this.session && !this.session.paused) this.startFrameLoop();
+    } finally {
+      this.streamBackendStarting = false;
+    }
+  }
+
+  stopClickFrameBackend() {
+    if (!this.streamBackend) return;
+    const backend = this.streamBackend;
+    this.streamBackend = null;
+    backend.stop();
+  }
+
+  /**
+   * The worker stopped answering frame requests. Capture must not silently
+   * stop mid-session: drop the backend and run the in-process loop for the
+   * rest of the recording.
+   */
+  degradeToFrameLoop() {
+    this.streamBackend = null;
+    console.error('[stepforge] stream capture backend unhealthy — falling back to in-process frame loop');
+    if (this.session && !this.session.paused) this.startFrameLoop();
+    this.notify('capture:state', this.state());
+  }
+
   startClickWatcher() {
     this.stopClickWatcher();
     try {
       this.clickWatcherBuf = '';
-      this.clickWatcherPendingPress = false;
+      this.linuxEvent = null;
       if (process.platform === 'linux' && hasBinary('xinput')) {
         // Stream raw button events from the X server; one capture per press.
         // xinput block-buffers stdout when piped, so a press event can sit
@@ -660,7 +800,8 @@ public static class SFMouseHook {
    * the session to interval captures, and tell the UI.
    */
   handleClickWatcherLoss(reason) {
-    this.clickWatcherPendingPress = false;
+    this.linuxEvent = null;
+    this.discardPendingRawClick();
     const detail = [reason, this.clickWatcherErrTail].filter(Boolean).join(' — ');
     console.error(`[stepforge] click watcher stopped${detail ? `: ${detail}` : ''}`);
     if (!this.session) return;
@@ -677,8 +818,9 @@ public static class SFMouseHook {
       this.clickWatcher = null;
     }
     this.clickWatcherBuf = '';
-    this.clickWatcherPendingPress = false;
-    this.lastClickCaptureByButton.clear();
+    this.linuxEvent = null;
+    this.discardPendingRawClick();
+    this.lastClickEventByButton.clear();
   }
 
   /**
@@ -698,29 +840,58 @@ public static class SFMouseHook {
   processClickWatcherData(text, platform = process.platform) {
     const lines = String(text).split(/\r?\n/);
     if (platform === 'linux') {
-      // xinput prints each event as a multi-line block: an "EVENT type …
-      // (RawButtonPress)" header followed by a "detail: N" line carrying the
-      // button number. Fire on the detail line so scroll-wheel ticks (X11
-      // reports them as buttons 4-7) neither create steps nor debounce away
-      // the real clicks that follow them.
+      // xinput test-xi2 --root prints each event as a multi-line block:
+      //
+      //   EVENT type 4 (ButtonPress)       EVENT type 15 (RawButtonPress)
+      //       device: 11 (10)                  device: 11 (11)
+      //       detail: 1                        detail: 1
+      //       root: 644.52/343.55              valuators: …
+      //
+      // Regular (non-raw) blocks carry the event-time root coordinates —
+      // exactly what the click marker needs, because a cursor read at parse
+      // time drifts whenever delivery is delayed or the pointer keeps
+      // moving after the click. Raw blocks have no coordinates, but on many
+      // servers they are the only representation delivered for the root
+      // window, so both kinds must fire. One physical press can produce
+      // *both* representations; that duplication is resolved structurally
+      // in fireLinuxClick (raw press briefly waits for its regular twin and
+      // they merge into one click), never by a time-only debounce that
+      // could swallow legitimate fast clicks.
       for (const line of lines) {
         if (!line) continue;
-        if (/RawButtonPress|ButtonPress/.test(line)) {
-          if (this.clickWatcherPendingPress) this.onOsClick();
-          this.clickWatcherPendingPress = true;
+        const header = /EVENT type \d+ \(([A-Za-z]+)\)/.exec(line);
+        if (header) {
+          this.finishLinuxEvent();
+          const name = header[1];
+          this.linuxEvent = /ButtonPress$/.test(name)
+            ? { name, raw: /^Raw/.test(name), button: null, at: Date.now(), fired: false }
+            : null;
           continue;
         }
-        if (!this.clickWatcherPendingPress) continue;
-        const detail = line.match(/detail:\s*(\d+)/);
+        const ev = this.linuxEvent;
+        if (!ev || ev.fired) continue;
+        const detail = /detail:\s*(\d+)/.exec(line);
         if (detail) {
-          this.clickWatcherPendingPress = false;
-          const button = Number(detail[1]);
-          if (button < 4 || button > 7) this.onOsClick(Date.now(), null, `button-${button}`);
-        } else if (line.includes('EVENT type')) {
-          // Next event arrived without a detail line in between — treat the
-          // pending press as a plain click rather than dropping it.
-          this.clickWatcherPendingPress = false;
-          this.onOsClick();
+          ev.button = Number(detail[1]);
+          if (ev.button >= 4 && ev.button <= 7) {
+            // Scroll-wheel ticks (X11 buttons 4-7) are not clicks.
+            this.linuxEvent = null;
+          } else if (ev.raw) {
+            // Raw blocks never carry coordinates; this one is complete.
+            ev.fired = true;
+            this.linuxEvent = null;
+            this.fireLinuxClick(ev.at, null, ev.button, { raw: true });
+          }
+          continue;
         }
+        const root = /root:\s*(-?[\d.]+)\/(-?[\d.]+)/.exec(line);
+        if (root && !ev.raw && ev.button != null) {
+          ev.fired = true;
+          this.linuxEvent = null;
+          this.fireLinuxClick(ev.at, {
+            x: Math.round(parseFloat(root[1])),
+            y: Math.round(parseFloat(root[2])),
+          }, ev.button, { raw: false });
+        }
       }
       return;
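
The block format described in the comment in the hunk above can be exercised with a small stateless parser. This is a simplified sketch rather than the watcher code itself: `parseXinputPresses` is a hypothetical helper that takes a complete chunk of `xinput test-xi2 --root` output, so it ignores the streaming, timestamping, and twin-merging concerns the real method handles. The regular expressions are the ones used in the diff.

```javascript
// Hypothetical stand-alone parser for a complete chunk of xinput output.
function parseXinputPresses(text) {
  const presses = [];
  let ev = null; // press block currently being parsed
  for (const line of String(text).split(/\r?\n/)) {
    const header = /EVENT type \d+ \(([A-Za-z]+)\)/.exec(line);
    if (header) {
      // Only (Raw)ButtonPress blocks matter; releases and motion are skipped.
      ev = /ButtonPress$/.test(header[1])
        ? { raw: /^Raw/.test(header[1]), button: null, point: null }
        : null;
      if (ev) presses.push(ev);
      continue;
    }
    if (!ev) continue;
    const detail = /detail:\s*(\d+)/.exec(line);
    if (detail) {
      ev.button = Number(detail[1]);
      continue;
    }
    // Only regular (non-raw) blocks carry event-time root coordinates.
    const root = /root:\s*(-?[\d.]+)\/(-?[\d.]+)/.exec(line);
    if (root && !ev.raw) {
      ev.point = { x: Math.round(parseFloat(root[1])), y: Math.round(parseFloat(root[2])) };
    }
  }
  // Scroll-wheel ticks (X11 buttons 4-7) are not clicks.
  return presses.filter((p) => p.button === null || p.button < 4 || p.button > 7);
}

const sample = [
  'EVENT type 15 (RawButtonPress)',
  '    device: 11 (11)',
  '    detail: 1',
  'EVENT type 4 (ButtonPress)',
  '    device: 11 (10)',
  '    detail: 1',
  '    root: 644.52/343.55',
  'EVENT type 4 (ButtonPress)',
  '    detail: 5',
  '    root: 1.00/2.00',
  'EVENT type 5 (ButtonRelease)',
  '    detail: 1',
].join('\n');
const parsed = parseXinputPresses(sample);
```

For the sample, the parser yields the raw press without coordinates and its regular twin with rounded root coordinates; the scroll tick and the release are dropped. Merging the two twins into one click is the separate step that fireLinuxClick performs in the commit.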
@@ -737,27 +908,127 @@ public static class SFMouseHook {
     }
   }
 
+  /**
+   * A new event header arrived while a press block was still open: the block
+   * ended without the line we fire on. Old xinput builds sometimes omit
+   * detail lines entirely — treat such a press as a plain click rather than
+   * dropping it.
+   */
+  finishLinuxEvent() {
+    const ev = this.linuxEvent;
+    this.linuxEvent = null;
+    if (!ev || ev.fired) return;
+    if (ev.button == null) {
+      this.onOsClick(ev.at, null, 'mouse');
+    } else if (!ev.raw) {
+      // Regular press whose root line never showed up — fire without
+      // coordinates; onOsClick falls back to a cursor read.
+      this.fireLinuxClick(ev.at, null, ev.button, { raw: false });
+    }
+  }
+
+  /**
+   * Funnel for parsed Linux button presses. Raw and regular blocks for the
+   * same physical press are merged here: a raw press (no coordinates) is
+   * held for LINUX_CLICK_TWIN_MS; if the regular twin (with root
+   * coordinates) arrives inside that window the pair fires once, with the
+   * raw block's earlier timestamp and the regular block's coordinates.
+   * Distinct presses always fire — there is no time-based dropping.
+   */
+  fireLinuxClick(at, osPoint, button, { raw = false } = {}) {
+    const pending = this.pendingRawClick;
+    if (raw) {
+      // Two raw presses can't be one click — release the held one first.
+      this.flushPendingRawClick();
+      const entry = { button, at, timer: null };
+      entry.timer = setTimeout(() => {
+        if (this.pendingRawClick !== entry) return;
+        this.pendingRawClick = null;
+        this.onOsClick(entry.at, null, `button-${entry.button}`);
+      }, LINUX_CLICK_TWIN_MS);
+      if (entry.timer.unref) entry.timer.unref();
+      this.pendingRawClick = entry;
+      return;
+    }
+    if (pending && pending.button === button) {
+      // The regular twin of the held raw press: one physical click.
+      this.pendingRawClick = null;
+      clearTimeout(pending.timer);
+      this.onOsClick(Math.min(pending.at, at), osPoint, `button-${button}`);
+      return;
+    }
+    this.onOsClick(at, osPoint, `button-${button}`);
+  }
+
+  /** Fire the held raw press immediately (its twin is not coming). */
+  flushPendingRawClick() {
+    const pending = this.pendingRawClick;
+    if (!pending) return;
+    this.pendingRawClick = null;
+    clearTimeout(pending.timer);
+    this.onOsClick(pending.at, null, `button-${pending.button}`);
+  }
+
+  discardPendingRawClick() {
+    if (!this.pendingRawClick) return;
+    clearTimeout(this.pendingRawClick.timer);
+    this.pendingRawClick = null;
+  }
+
   onOsClick(at = Date.now(), osPoint = null, button = 'mouse') {
     if (!this.session || this.session.paused) return;
     const clickAt = Number.isFinite(at) ? at : Date.now();
-    const debounceKey = button || 'mouse';
-    const last = this.lastClickCaptureByButton.get(debounceKey) || 0;
-    if (clickAt >= last && clickAt - last < CLICK_DEBOUNCE_MS) return;
-    this.lastClickCaptureByButton.set(debounceKey, clickAt);
+    // Source-aware dedupe, not a debounce: each hook/watcher event is one
+    // click however fast it follows the previous one. Only an *identical*
+    // event a few ms later — duplicate delivery of one physical press — is
+    // suppressed.
+    if (this.isDuplicateClickEvent(clickAt, osPoint, button)) return;
     // Prefer the position the watcher sampled with the button-down event
-    // (physical px -> DIP); otherwise read the cursor synchronously,
-    // right now, so the marker lands where the user clicked even if the
-    // shot itself takes a moment to grab. (Clicks on StepForge itself are
+    // (physical px -> DIP); otherwise read the cursor synchronously, right
+    // now, so the marker lands where the user clicked even if the shot
+    // itself takes a moment to grab. (Clicks on StepForge itself are
     // filtered by the cursor-position check in sessionCapture, not by
     // window focus — WSLg reports focus unreliably.)
-    let clickPos = null;
-    if (osPoint) {
-      clickPos = typeof screen.screenToDipPoint === 'function'
-        ? screen.screenToDipPoint(osPoint)
-        : osPoint;
+    let clickPos = osPoint ? this.osPointToDip(osPoint) : null;
+    if (!clickPos) clickPos = this.screen.getCursorScreenPoint();
+    this.enqueueClickCapture(clickPos, clickAt, button || 'mouse');
+  }
+
+  isDuplicateClickEvent(at, osPoint, button) {
+    const key = button || 'mouse';
+    const last = this.lastClickEventByButton.get(key);
+    this.lastClickEventByButton.set(key, { at, osPoint });
+    if (!last) return false;
+    if (at < last.at || at - last.at >= CLICK_EVENT_DUPLICATE_MS) return false;
+    // Same button within a few ms: duplicate only if it is the *same* event
+    // (same coordinates, or neither delivery carried coordinates).
+    if (osPoint && last.osPoint) {
+      return osPoint.x === last.osPoint.x && osPoint.y === last.osPoint.y;
     }
-    if (!clickPos) clickPos = screen.getCursorScreenPoint();
-    this.enqueueClickCapture(clickPos, clickAt, debounceKey);
+    return !osPoint && !last.osPoint;
   }
|
||||
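The dedupe rule above can be tried in isolation. The following is a minimal standalone sketch, not the code from this commit: the `ClickDeduper` class name is hypothetical, and the 8 ms window is an assumed value, since the real `CLICK_EVENT_DUPLICATE_MS` constant is defined outside this hunk.

```javascript
'use strict';

// Assumed value for illustration only; the real CLICK_EVENT_DUPLICATE_MS
// constant is defined outside the hunk shown above.
const DUPLICATE_WINDOW_MS = 8;

// Hypothetical standalone restatement of isDuplicateClickEvent().
class ClickDeduper {
  constructor(windowMs = DUPLICATE_WINDOW_MS) {
    this.windowMs = windowMs;
    this.lastByButton = new Map();
  }

  isDuplicate(at, osPoint, button) {
    const key = button || 'mouse';
    const last = this.lastByButton.get(key);
    // Always remember the newest event, duplicate or not.
    this.lastByButton.set(key, { at, osPoint });
    if (!last) return false;
    if (at < last.at || at - last.at >= this.windowMs) return false;
    // Inside the window: a duplicate only if it is the same event.
    if (osPoint && last.osPoint) {
      return osPoint.x === last.osPoint.x && osPoint.y === last.osPoint.y;
    }
    return !osPoint && !last.osPoint;
  }
}

const dedupe = new ClickDeduper();
const results = [
  dedupe.isDuplicate(1000, { x: 10, y: 10 }, 'button-1'), // first event
  dedupe.isDuplicate(1003, { x: 10, y: 10 }, 'button-1'), // same press, delivered twice
  dedupe.isDuplicate(1006, { x: 40, y: 10 }, 'button-1'), // fast, but a different spot
  dedupe.isDuplicate(1050, { x: 40, y: 10 }, 'button-1'), // same spot, outside the window
];
console.log(results); // [ false, true, false, false ]
```

Only the second event is suppressed. The third arrives just 3 ms later but carries different coordinates, so it still counts as a click; that is the difference from a pure time debounce.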
|
||||
/**
|
||||
* Physical (OS event) pixels -> DIP. Windows exposes the canonical
|
||||
* conversion; on Linux/X11 it is reconstructed from display geometry (see
|
||||
* app/coords.js). Without this, the click marker drifts on any display
|
||||
* scaled away from 100% and on secondary monitors.
|
||||
*/
|
||||
osPointToDip(osPoint) {
|
||||
if (this.screen && typeof this.screen.screenToDipPoint === 'function') {
|
||||
try {
|
||||
const dip = this.screen.screenToDipPoint(osPoint);
|
||||
if (dip && Number.isFinite(dip.x) && Number.isFinite(dip.y)) return dip;
|
||||
} catch { /* fall through to manual conversion */ }
|
||||
}
|
||||
try {
|
||||
const displays = this.screen && typeof this.screen.getAllDisplays === 'function'
|
||||
? this.screen.getAllDisplays()
|
||||
: [];
|
||||
const dip = physicalToDip(osPoint, displays);
|
||||
if (dip) return dip;
|
||||
} catch { /* no display geometry available */ }
|
||||
return osPoint;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -765,9 +1036,20 @@ public static class SFMouseHook {
    * still being stored queues behind it instead of being dropped by the
    * "capture already in progress" guard. The marker position was already
    * read at click time, so a queued step still circles the right spot.
+   *
+   * Crucially, only the *storing* is serialized. The click is paired with
+   * its frame right here, at event time: behind a slow store or PNG encode
+   * the queue can run seconds late, and a frame request issued that late
+   * could find the click-time frame already evicted from the ring buffer.
+   * Eager pairing keeps one-click-one-frame semantics intact no matter how
+   * fast the user clicks or how slow the encoder is.
    */
   enqueueClickCapture(clickPos, clickAt = Date.now(), button = 'mouse') {
     const clickMeta = { at: Number.isFinite(clickAt) ? clickAt : Date.now(), button: button || 'mouse' };
+    if (this.session && !this.session.paused && !this.userIsInApp()) {
+      clickMeta.framePromise = this.frameForClick(clickPos, clickMeta.at)
+        .catch(() => null);
+    }
     this.clickQueue = this.clickQueue
       .then(() => this.sessionCapture('click', clickPos, clickMeta))
       .catch(() => {});
|
||||
@@ -795,8 +1077,10 @@ public static class SFMouseHook {
   storeFrameAsStep(guideId, mode, frame, clickPos = null) {
     if (!frame) return { ok: false, reason: 'no capture frame available' };
     const annotations = [];
-    const cursor = clickPos || frame.cursor;
-    if (mode !== 'window' && this.settings.get('capture.clickMarker')) {
+    // The click position (DIP, read at event time) wins over the frame's
+    // grab-time cursor; stream-backend frames carry no cursor at all.
+    const cursor = clickPos || frame.cursor || null;
+    if (cursor && mode !== 'window' && this.settings.get('capture.clickMarker')) {
       const fx = (cursor.x - frame.display.bounds.x) / frame.display.bounds.width;
       const fy = (cursor.y - frame.display.bounds.y) / frame.display.bounds.height;
       if (fx >= 0 && fx <= 1 && fy >= 0 && fy <= 1) {
|
||||
@@ -837,8 +1121,8 @@ public static class SFMouseHook {
 
   /** Grab the screen/window image as { image, display } or throw. */
   async grab(mode, cursorPoint = null) {
-    const cursor = cursorPoint || screen.getCursorScreenPoint();
-    const display = screen.getDisplayNearestPoint(cursor);
+    const cursor = cursorPoint || this.screen.getCursorScreenPoint();
+    const display = this.screen.getDisplayNearestPoint(cursor);
     const { width, height } = display.size;
     const scale = display.scaleFactor || 1;
     // Ask for both kinds: some compositors (WSLg/Wayland portals) expose no
|
||||
|
||||
@@ -0,0 +1,162 @@
|
||||
'use strict';
|
||||
|
||||
/**
|
||||
* Click ↔ frame correlation logic, shared by the main process and the
|
||||
* capture-worker renderer (loaded there via a plain <script> tag, hence the
|
||||
* UMD-style export at the bottom and the total absence of dependencies).
|
||||
*
|
||||
* The model: a recorder keeps a ring buffer of timestamped frames, each with
|
||||
* { startedAt, capturedAt } — when the grab began and when it completed.
|
||||
* A click carries its own hook-time timestamp. Pairing the two answers
|
||||
* "what did the screen look like when the user clicked?".
|
||||
*
|
||||
* Strict mode encodes the product requirement (Folge-like recording): a step
|
||||
* must show the screen *at or before* the click, never after it. A frame
|
||||
* whose grab started after the click can already contain the click's effects
|
||||
* (menus opened, pages navigated), so strict mode rejects it outright — the
|
||||
* caller falls back to an explicit fresh shot instead of silently passing a
|
||||
* post-click frame off as the click-time screen. Balanced mode keeps the old
|
||||
* slack-window behavior for platforms where capture is too slow to keep a
|
||||
* pre-click frame buffered.
|
||||
*/
|
||||
|
||||
const DEFAULT_FRAME_LIMIT = 6;
|
||||
const DEFAULT_RETENTION_MS = 4000;
|
||||
// A frame older than this is too stale to pass off as "the screen at the
|
||||
// instant of the click".
|
||||
const DEFAULT_MAX_AGE_MS = 600;
|
||||
// Balanced mode only: a grab that began within this window after the click
|
||||
// is accepted on the assumption that UI reactions render slower than this.
|
||||
const DEFAULT_START_SLACK_MS = 300;
|
||||
|
||||
function pointInBounds(point, bounds) {
|
||||
if (!point || !bounds) return false;
|
||||
return point.x >= bounds.x
|
||||
&& point.x <= bounds.x + bounds.width
|
||||
&& point.y >= bounds.y
|
||||
&& point.y <= bounds.y + bounds.height;
|
||||
}
|
||||
|
||||
/**
|
||||
* Ring buffer of recent frames, bounded by both count and age. Frames are
|
||||
* raw images (potentially tens of MB each), so eviction is eager and an
|
||||
* optional onEvict hook lets callers release native resources (e.g.
|
||||
* ImageBitmap.close() in the capture worker).
|
||||
*/
|
||||
class FrameRing {
|
||||
constructor({ limit = DEFAULT_FRAME_LIMIT, retentionMs = DEFAULT_RETENTION_MS, now = Date.now, onEvict = null } = {}) {
|
||||
this.limit = limit;
|
||||
this.retentionMs = retentionMs;
|
||||
this.now = now;
|
||||
this.onEvict = onEvict;
|
||||
this.items = [];
|
||||
}
|
||||
|
||||
push(frame) {
|
||||
if (!frame) return null;
|
||||
this.items.push(frame);
|
||||
this.prune();
|
||||
return frame;
|
||||
}
|
||||
|
||||
prune() {
|
||||
const cutoff = this.now() - this.retentionMs;
|
||||
while (this.items.length
|
||||
&& (this.items.length > this.limit || !(this.items[0].capturedAt >= cutoff))) {
|
||||
const evicted = this.items.shift();
|
||||
if (this.onEvict) this.onEvict(evicted);
|
||||
}
|
||||
}
|
||||
|
||||
frames() {
|
||||
return [...this.items];
|
||||
}
|
||||
|
||||
latest() {
|
||||
return this.items.length ? this.items[this.items.length - 1] : null;
|
||||
}
|
||||
|
||||
clear() {
|
||||
const dropped = this.items;
|
||||
this.items = [];
|
||||
if (this.onEvict) for (const f of dropped) this.onEvict(f);
|
||||
}
|
||||
}
|
||||
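The two eviction bounds of the ring (count and age) are easiest to see with a fake clock. This is a rough sketch with a condensed `MiniRing` stand-in (a hypothetical name) that copies only the prune condition from `FrameRing`; the limit of 3 and the 1000 ms retention are illustrative values, not the defaults.

```javascript
'use strict';

// Condensed stand-in for FrameRing: same prune condition, nothing else.
class MiniRing {
  constructor({ limit, retentionMs, now, onEvict }) {
    Object.assign(this, { limit, retentionMs, now, onEvict, items: [] });
  }

  push(frame) {
    this.items.push(frame);
    const cutoff = this.now() - this.retentionMs;
    // Evict from the oldest end while over the count limit or too old.
    while (this.items.length
      && (this.items.length > this.limit || !(this.items[0].capturedAt >= cutoff))) {
      this.onEvict(this.items.shift());
    }
  }
}

let clock = 0; // fake clock so the run is deterministic
const evicted = [];
const ring = new MiniRing({
  limit: 3,
  retentionMs: 1000,
  now: () => clock,
  onEvict: (frame) => evicted.push(frame.id),
});

for (const [id, t] of [['a', 0], ['b', 100], ['c', 200], ['d', 300]]) {
  clock = t;
  ring.push({ id, capturedAt: t });
}
// 'a' was dropped by the count limit when 'd' arrived.

clock = 1250;
ring.push({ id: 'e', capturedAt: 1250 });
// The cutoff is now 250: 'b' and 'c' are evicted, 'd' (300) and 'e' remain.

const kept = ring.items.map((frame) => frame.id);
console.log({ evicted, kept }); // { evicted: [ 'a', 'b', 'c' ], kept: [ 'd', 'e' ] }
```

Note that the condition is written as `!(capturedAt >= cutoff)` rather than `capturedAt < cutoff`, so a frame with a missing or non-numeric `capturedAt` is evicted instead of lingering in the buffer.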
|
||||
/**
|
||||
* Whether one frame may represent one click.
|
||||
*
|
||||
* Strict mode accepts only:
|
||||
* - a frame completed at or before the click (and not older than maxAgeMs), or
|
||||
* - when allowInFlight is set, a frame whose grab *started* at or before the
|
||||
* click — its pixels predate the click's effects even though encoding
|
||||
* finished after.
|
||||
* A frame whose grab started after the click is never acceptable in strict
|
||||
* mode, no matter how close: that is exactly the "screenshot shows the menu
|
||||
* already open" failure.
|
||||
*
|
||||
* Balanced mode additionally accepts in-flight frames that started within
|
||||
* startSlackMs after the click (the legacy heuristic).
|
||||
*/
|
||||
function frameUsableForClick(frame, {
|
||||
clickAt,
|
||||
clickPos = null,
|
||||
mode = null,
|
||||
strict = true,
|
||||
allowInFlight = false,
|
||||
maxAgeMs = DEFAULT_MAX_AGE_MS,
|
||||
startSlackMs = DEFAULT_START_SLACK_MS,
|
||||
} = {}) {
|
||||
if (!frame) return false;
|
||||
if (mode && frame.mode !== mode) return false;
|
||||
// Fast clicks can move to another monitor before a buffered frame is
|
||||
// consumed; only reuse frames from the clicked display.
|
||||
if (clickPos && frame.display && !pointInBounds(clickPos, frame.display.bounds)) return false;
|
||||
|
||||
const clickTime = Number.isFinite(clickAt) ? clickAt : Date.now();
|
||||
const capturedAt = frame.capturedAt;
|
||||
const startedAt = Number.isFinite(frame.startedAt) ? frame.startedAt : capturedAt;
|
||||
|
||||
const completedBeforeClick = Number.isFinite(capturedAt) && capturedAt <= clickTime;
|
||||
if (completedBeforeClick) return clickTime - capturedAt <= maxAgeMs;
|
||||
|
||||
if (!allowInFlight || !Number.isFinite(startedAt)) return false;
|
||||
if (strict) return startedAt <= clickTime;
|
||||
return startedAt <= clickTime + startSlackMs;
|
||||
}
|
||||
|
||||
/**
|
||||
* Best already-buffered frame for a click: the newest frame that qualifies
|
||||
* under frameUsableForClick. Buffered frames are by definition completed, so
|
||||
* in-flight acceptance never applies here. Returns null when nothing
|
||||
* qualifies and the caller must wait for the in-flight grab or fall back to
|
||||
* a fresh shot.
|
||||
*/
|
||||
function selectFrameForClick(frames, opts = {}) {
|
||||
let best = null;
|
||||
for (const frame of frames || []) {
|
||||
if (!frameUsableForClick(frame, { ...opts, allowInFlight: false })) continue;
|
||||
if (!best || frame.capturedAt > best.capturedAt) best = frame;
|
||||
}
|
||||
return best;
|
||||
}
|
||||
|
||||
const api = {
|
||||
FrameRing,
|
||||
frameUsableForClick,
|
||||
selectFrameForClick,
|
||||
pointInBounds,
|
||||
DEFAULT_FRAME_LIMIT,
|
||||
DEFAULT_RETENTION_MS,
|
||||
DEFAULT_MAX_AGE_MS,
|
||||
DEFAULT_START_SLACK_MS,
|
||||
};
|
||||
|
||||
/* eslint-disable no-undef */
|
||||
if (typeof module === 'object' && module.exports) {
|
||||
module.exports = api;
|
||||
} else if (typeof self !== 'undefined') {
|
||||
self.StepForgeClickFrames = api;
|
||||
} else if (typeof window !== 'undefined') {
|
||||
window.StepForgeClickFrames = api;
|
||||
}
|
||||
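As a worked example of the pairing rule in this file, the sketch below runs four hypothetical frames against one click. The `usable` helper is a condensed copy of the timing branch of `frameUsableForClick` (the display, mode, and finite-number checks are left out), using the default 600 ms max age and 300 ms slack; the frame timestamps are made up.

```javascript
'use strict';

// Condensed copy of the timing rule in frameUsableForClick, for
// illustration only (display/mode/NaN checks omitted).
function usable(frame, {
  clickAt, strict = true, allowInFlight = false, maxAgeMs = 600, startSlackMs = 300,
}) {
  if (frame.capturedAt <= clickAt) return clickAt - frame.capturedAt <= maxAgeMs;
  if (!allowInFlight) return false;
  return strict
    ? frame.startedAt <= clickAt
    : frame.startedAt <= clickAt + startSlackMs;
}

const clickAt = 10_000;
const stale = { startedAt: 9_000, capturedAt: 9_050 };        // finished 950 ms before
const fresh = { startedAt: 9_900, capturedAt: 9_940 };        // finished 60 ms before
const straddling = { startedAt: 9_990, capturedAt: 10_030 };  // began before, ended after
const late = { startedAt: 10_100, capturedAt: 10_140 };       // began after the click

const verdicts = {
  stale: usable(stale, { clickAt }),
  fresh: usable(fresh, { clickAt }),
  straddlingBuffered: usable(straddling, { clickAt }),
  straddlingInFlight: usable(straddling, { clickAt, allowInFlight: true }),
  lateStrict: usable(late, { clickAt, allowInFlight: true }),
  lateBalanced: usable(late, { clickAt, strict: false, allowInFlight: true }),
};
console.log(verdicts);
```

The last two entries are the whole point of strict mode: the frame that began 100 ms after the click is rejected under `strict` and accepted only by the balanced slack window.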
@@ -0,0 +1,110 @@
|
||||
'use strict';
|
||||
|
||||
const { pointInBounds } = require('./click-frames');
|
||||
|
||||
/**
|
||||
* Coordinate-space conversion between physical (OS event) pixels and
|
||||
* Electron DIP points.
|
||||
*
|
||||
* Why this exists: OS-level click hooks report *physical* pixels (the X11
|
||||
* root window space on Linux, virtual-screen pixels on Windows), while
|
||||
* everything Electron-side — display bounds, cursor reads, the click-marker
|
||||
* math in storeFrameAsStep — is in DIP. Mixing the two spaces is exactly the
|
||||
* bug that makes the red marker drift on scaled displays: at 150% scaling a
|
||||
* physical click at (1500, 900) is the DIP point (1000, 600), and a marker
|
||||
* drawn at the physical values lands well below-right of the real click.
|
||||
*
|
||||
* On Windows, Electron exposes screen.screenToDipPoint() and the capture
|
||||
* service prefers it. On Linux/X11 there is no such API, so we reconstruct
|
||||
* the mapping from display geometry: each display's DIP bounds plus its
|
||||
* scaleFactor give its physical rectangle, and a physical point inside that
|
||||
* rectangle maps back linearly. With mixed-DPI multi-monitor X11 setups the
|
||||
* origin reconstruction is an approximation (X11 itself has a single global
|
||||
* coordinate space), but it is exact for the overwhelmingly common cases:
|
||||
* single display at any scale, and multi-display with a uniform scale.
|
||||
*/
|
||||
|
||||
/** Physical-pixel rectangle a display occupies, derived from DIP bounds. */
|
||||
function physicalBoundsOf(display) {
|
||||
const bounds = display && display.bounds;
|
||||
if (!bounds) return null;
|
||||
const scale = display.scaleFactor || 1;
|
||||
return {
|
||||
x: Math.round(bounds.x * scale),
|
||||
y: Math.round(bounds.y * scale),
|
||||
width: Math.round(bounds.width * scale),
|
||||
height: Math.round(bounds.height * scale),
|
||||
};
|
||||
}
|
||||
|
||||
function centerDistanceSq(point, rect) {
|
||||
const cx = rect.x + rect.width / 2;
|
||||
const cy = rect.y + rect.height / 2;
|
||||
return (point.x - cx) ** 2 + (point.y - cy) ** 2;
|
||||
}
|
||||
|
||||
/**
|
||||
* Display whose physical rectangle contains the point, or the nearest one
|
||||
* (clicks on the very edge of a screen can round to one pixel outside it).
|
||||
*/
|
||||
function displayForPhysicalPoint(point, displays) {
|
||||
if (!point || !Array.isArray(displays) || !displays.length) return null;
|
||||
let nearest = null;
|
||||
let nearestDist = Infinity;
|
||||
for (const display of displays) {
|
||||
const phys = physicalBoundsOf(display);
|
||||
if (!phys) continue;
|
||||
if (pointInBounds(point, phys)) return display;
|
||||
const dist = centerDistanceSq(point, phys);
|
||||
if (dist < nearestDist) {
|
||||
nearestDist = dist;
|
||||
nearest = display;
|
||||
}
|
||||
}
|
||||
return nearest;
|
||||
}
|
||||
|
||||
/**
|
||||
* Convert a physical-pixel point (OS click hook) to DIP. Returns null when
|
||||
* no display geometry is available — the caller should then fall back to a
|
||||
* live cursor read rather than guessing.
|
||||
*/
|
||||
function physicalToDip(point, displays) {
|
||||
if (!point || !Number.isFinite(point.x) || !Number.isFinite(point.y)) return null;
|
||||
const display = displayForPhysicalPoint(point, displays);
|
||||
if (!display) return null;
|
||||
const phys = physicalBoundsOf(display);
|
||||
const scale = display.scaleFactor || 1;
|
||||
return {
|
||||
x: display.bounds.x + (point.x - phys.x) / scale,
|
||||
y: display.bounds.y + (point.y - phys.y) / scale,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Display whose DIP bounds contain the point, or the nearest one. Used to
|
||||
* route a click to the capture stream of the monitor it landed on.
|
||||
*/
|
||||
function displayForDipPoint(point, displays) {
|
||||
if (!point || !Array.isArray(displays) || !displays.length) return null;
|
||||
let nearest = null;
|
||||
let nearestDist = Infinity;
|
||||
for (const display of displays) {
|
||||
if (!display || !display.bounds) continue;
|
||||
if (pointInBounds(point, display.bounds)) return display;
|
||||
const dist = centerDistanceSq(point, display.bounds);
|
||||
if (dist < nearestDist) {
|
||||
nearestDist = dist;
|
||||
nearest = display;
|
||||
}
|
||||
}
|
||||
return nearest;
|
||||
}
|
||||
|
||||
module.exports = {
|
||||
physicalBoundsOf,
|
||||
displayForPhysicalPoint,
|
||||
displayForDipPoint,
|
||||
physicalToDip,
|
||||
pointInBounds,
|
||||
};
|
||||
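The 150% example from the header comment can be checked by hand. The sketch below uses a condensed `toDip` helper (a hypothetical name) that applies the same arithmetic as `physicalToDip` to a display already known to contain the point; both display layouts are invented for illustration.

```javascript
'use strict';

// Condensed version of the physicalToDip arithmetic; display lookup is
// omitted and the caller passes the containing display directly.
function toDip(point, display) {
  const scale = display.scaleFactor || 1;
  const physX = Math.round(display.bounds.x * scale);
  const physY = Math.round(display.bounds.y * scale);
  return {
    x: display.bounds.x + (point.x - physX) / scale,
    y: display.bounds.y + (point.y - physY) / scale,
  };
}

// Hypothetical layouts: a single display at 150%, and a second display to
// the right of a 1920-DIP-wide primary in a uniform 200% setup.
const single = { bounds: { x: 0, y: 0, width: 1280, height: 800 }, scaleFactor: 1.5 };
const secondary = { bounds: { x: 1920, y: 0, width: 1920, height: 1080 }, scaleFactor: 2 };

const a = toDip({ x: 1500, y: 900 }, single);
const b = toDip({ x: 4000, y: 500 }, secondary);
console.log(a, b); // { x: 1000, y: 600 } { x: 2000, y: 250 }
```

For the secondary display the physical origin is 1920 × 2 = 3840, so the click at physical x = 4000 sits 160 physical pixels (80 DIP) into that display, which gives DIP x = 2000.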
@@ -98,6 +98,62 @@ function createWindow() {
|
||||
}
|
||||
}, 1500);
|
||||
}
|
||||
// Dev-only self-test: exercise the full click-capture pipeline — resume
|
||||
// session, wait for the frame recorder, inject OS-level clicks the way
|
||||
// the watcher would, and verify one stored step per click.
|
||||
if (process.env.STEPFORGE_CLICK_SELFTEST) {
|
||||
setTimeout(async () => {
|
||||
try {
|
||||
const guide = store.createGuide({ title: 'click selftest' });
|
||||
capture.startSession(guide.guideId, { intervalSec: 0 });
|
||||
capture.togglePause(false);
|
||||
mainWindow.hide();
|
||||
// Arm the frame recorder directly: this host may lack the click
|
||||
// watcher binary (xinput), which normally gates the recorder, but
|
||||
// the recorder itself must still be testable end to end.
|
||||
await capture.startClickFrameBackend();
|
||||
// Let the stream backend (or the fallback loop) come up and buffer.
|
||||
await new Promise((res) => setTimeout(res, 3000));
|
||||
console.log('CLICK-SELFTEST source:', capture.state().clickFrameSource);
|
||||
const clicks = [
|
||||
{ x: 200, y: 150 },
|
||||
{ x: 400, y: 300 },
|
||||
{ x: 600, y: 450 },
|
||||
];
|
||||
for (const point of clicks) {
|
||||
capture.onOsClick(Date.now(), point, 'button-1');
|
||||
await new Promise((res) => setTimeout(res, 120)); // fast clicking
|
||||
}
|
||||
// Wait for the queue to drain (encodes can take seconds on WSLg).
|
||||
await capture.clickQueue;
|
||||
await new Promise((res) => setTimeout(res, 500));
|
||||
const stepIds = store.getGuide(guide.guideId).stepsOrder;
|
||||
const steps = store.listSteps(guide.guideId);
|
||||
const markers = stepIds.map((id) => (steps.get(id).annotations || []).length);
|
||||
console.log('CLICK-SELFTEST steps:', stepIds.length, 'of', clicks.length,
|
||||
'markers:', JSON.stringify(markers));
|
||||
// Marker accuracy: each oval's center (fractional) must match the
|
||||
// injected click position relative to the display bounds.
|
||||
const { bounds } = screen.getPrimaryDisplay();
|
||||
stepIds.forEach((id, i) => {
|
||||
const a = (steps.get(id).annotations || [])[0];
|
||||
if (!a) return;
|
||||
const center = { x: a.x + a.w / 2, y: a.y + a.h / 2 };
|
||||
const expected = {
|
||||
x: (clicks[i].x - bounds.x) / bounds.width,
|
||||
y: (clicks[i].y - bounds.y) / bounds.height,
|
||||
};
|
||||
const offBy = Math.hypot(center.x - expected.x, center.y - expected.y);
|
||||
console.log(`CLICK-SELFTEST marker ${i}: off by ${(offBy * 100).toFixed(2)}% of screen`);
|
||||
});
|
||||
capture.finishSession();
|
||||
} catch (err) {
|
||||
console.log('CLICK-SELFTEST ERROR', err.message);
|
||||
} finally {
|
||||
app.quit();
|
||||
}
|
||||
}, 1500);
|
||||
}
|
||||
// Dev-only self-test: exercise the exact hotkey-session capture path
|
||||
// (hide window -> grab -> showInactive) several times, then exit.
|
||||
if (process.env.STEPFORGE_CAPTURE_SELFTEST) {
|
||||
|
||||
@@ -0,0 +1,11 @@
|
||||
<!doctype html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>StepForge capture worker</title>
|
||||
<!-- Shared click↔frame selection logic; sets window.StepForgeClickFrames. -->
|
||||
<script src="../click-frames.js" defer></script>
|
||||
<script src="capture-worker.js" defer></script>
|
||||
</head>
|
||||
<body><!-- hidden window; frames live in JS, nothing renders here --></body>
|
||||
</html>
|
||||
@@ -0,0 +1,199 @@
|
||||
'use strict';
|
||||
|
||||
/**
|
||||
* Capture worker: runs in a hidden renderer window and owns all continuous
|
||||
* screen capture during a recording session.
|
||||
*
|
||||
* Per display it opens a desktop media stream (the desktopCapturer source id
|
||||
* comes from the main process) and samples it on a fixed cadence into a
|
||||
* timestamped ring buffer of ImageBitmaps. Sampling and PNG encoding happen
|
||||
* entirely in this process, so the main-process event loop — which must stay
|
||||
* responsive to deliver OS click events on time — never blocks on capture
|
||||
* work. ImageBitmaps are GPU-backed and cheap to create from a <video>
|
||||
* element, which is what lets the cadence be much tighter than the old
|
||||
* 200ms main-process desktopCapturer loop.
|
||||
*
|
||||
* On a frame request the worker applies the shared strict selection rule
|
||||
* (newest frame captured at or before the click; never one whose grab
|
||||
* started after it), encodes that single frame to PNG, and ships the bytes
|
||||
* to the main process.
|
||||
*/
|
||||
|
||||
/* global StepForgeClickFrames, captureWorkerBridge */
|
||||
|
||||
(() => {
|
||||
const FALLBACK_SAMPLE_MS = 100;
|
||||
// Tight cadence means more frames per second; keep enough of them to
|
||||
// bridge any encode/IPC hiccup without hoarding GPU memory.
|
||||
const FALLBACK_FRAME_LIMIT = 8;
|
||||
const FALLBACK_RETENTION_MS = 2000;
|
||||
|
||||
const streams = new Map(); // displayId(string) -> stream state
|
||||
|
||||
function send(msg) {
|
||||
try {
|
||||
captureWorkerBridge.send(msg);
|
||||
return true;
|
||||
} catch (err) {
|
||||
// Either the main process is gone or the payload didn't survive the
|
||||
// bridge; log it — a silently dropped frame-response would otherwise
|
||||
// look like a worker hang from the main process.
|
||||
console.error('capture-worker send failed:', err && err.message, 'type:', msg && msg.type);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
async function startStream(cmd) {
|
||||
const key = String(cmd.displayId);
|
||||
stopStream(key);
|
||||
const display = cmd.display || {};
|
||||
const scale = display.scaleFactor || 1;
|
||||
const bounds = display.bounds || { width: 1280, height: 720 };
|
||||
const physWidth = Math.round(bounds.width * scale);
|
||||
const physHeight = Math.round(bounds.height * scale);
|
||||
const state = {
|
||||
displayId: cmd.displayId,
|
||||
media: null,
|
||||
video: null,
|
||||
timer: null,
|
||||
sampling: false,
|
||||
ring: new StepForgeClickFrames.FrameRing({
|
||||
limit: cmd.frameLimit || FALLBACK_FRAME_LIMIT,
|
||||
retentionMs: cmd.retentionMs || FALLBACK_RETENTION_MS,
|
||||
onEvict: (frame) => {
|
||||
if (frame && frame.bitmap && frame.bitmap.close) frame.bitmap.close();
|
||||
},
|
||||
}),
|
||||
};
|
||||
streams.set(key, state);
|
||||
try {
|
||||
// The chromeMediaSource constraint set is Electron's documented bridge
|
||||
// from a desktopCapturer source id to a live media stream.
|
||||
state.media = await navigator.mediaDevices.getUserMedia({
|
||||
audio: false,
|
||||
video: {
|
||||
mandatory: {
|
||||
chromeMediaSource: 'desktop',
|
||||
chromeMediaSourceId: cmd.sourceId,
|
||||
minWidth: physWidth,
|
||||
maxWidth: physWidth,
|
||||
minHeight: physHeight,
|
||||
maxHeight: physHeight,
|
||||
maxFrameRate: 30,
|
||||
},
|
||||
},
|
||||
});
|
||||
const video = document.createElement('video');
|
||||
video.muted = true;
|
||||
video.srcObject = state.media;
|
||||
state.video = video;
|
||||
await video.play();
|
||||
const sampleMs = cmd.sampleMs || FALLBACK_SAMPLE_MS;
|
||||
state.timer = setInterval(() => sampleFrame(state), sampleMs);
|
||||
// Buffer a frame immediately so a click right after "Start recording"
|
||||
// already has something captured before it.
|
||||
await sampleFrame(state);
|
||||
send({ type: 'stream-ready', displayId: cmd.displayId });
|
||||
} catch (err) {
|
||||
stopStream(key);
|
||||
send({ type: 'stream-error', displayId: cmd.displayId, reason: String(err && err.message || err) });
|
||||
}
|
||||
}
|
||||
|
||||
async function sampleFrame(state) {
|
||||
if (state.sampling || !state.video || state.video.readyState < 2) return;
|
||||
state.sampling = true;
|
||||
// startedAt/capturedAt bracket the grab so strict selection can tell
|
||||
// pre-click frames from post-click ones.
|
||||
const startedAt = Date.now();
|
||||
try {
|
||||
const bitmap = await createImageBitmap(state.video);
|
||||
state.ring.push({
|
||||
mode: 'fullscreen',
|
||||
bitmap,
|
||||
width: bitmap.width,
|
||||
height: bitmap.height,
|
||||
startedAt,
|
||||
capturedAt: Date.now(),
|
||||
});
|
||||
} catch {
|
||||
// A failed sample only means a slightly older best frame.
|
||||
} finally {
|
||||
state.sampling = false;
|
||||
}
|
||||
}
|
||||
|
||||
function stopStream(key) {
|
||||
const state = streams.get(key);
|
||||
if (!state) return;
|
||||
if (state.timer) clearInterval(state.timer);
|
||||
if (state.media) {
|
||||
for (const track of state.media.getTracks()) {
|
||||
try { track.stop(); } catch { /* already stopped */ }
|
||||
}
|
||||
}
|
||||
state.ring.clear();
|
||||
streams.delete(key);
|
||||
}
|
||||
|
||||
async function handleFrameRequest(cmd) {
|
||||
const state = streams.get(String(cmd.displayId));
|
||||
const reply = (extra) => send({ type: 'frame-response', requestId: cmd.requestId, ...extra });
|
||||
if (!state) return reply({ ok: false, reason: 'no stream for display' });
|
||||
// One last sample: if the compositor delivered a newer video frame since
// the previous tick, a sub-millisecond grab here cannot worsen the match.
// Selection below is completion-based (capturedAt <= clickAt), so a frame
// finished after the click is rejected; it only benefits later requests.
|
||||
await sampleFrame(state);
|
||||
const frame = StepForgeClickFrames.selectFrameForClick(state.ring.frames(), {
|
||||
clickAt: cmd.clickAt,
|
||||
mode: 'fullscreen',
|
||||
strict: cmd.strict !== false,
|
||||
});
|
||||
if (!frame) return reply({ ok: false, reason: 'no frame at or before the click' });
|
||||
try {
|
||||
const canvas = new OffscreenCanvas(frame.width, frame.height);
|
||||
canvas.getContext('2d').drawImage(frame.bitmap, 0, 0);
|
||||
const blob = await canvas.convertToBlob({ type: 'image/png' });
|
||||
const png = await blob.arrayBuffer();
|
||||
return reply({
|
||||
ok: true,
|
||||
png: new Uint8Array(png),
|
||||
width: frame.width,
|
||||
height: frame.height,
|
||||
startedAt: frame.startedAt,
|
||||
capturedAt: frame.capturedAt,
|
||||
});
|
||||
} catch (err) {
|
||||
return reply({ ok: false, reason: String(err && err.message || err) });
|
||||
}
|
||||
}
|
||||
|
||||
/** Health/diagnostic snapshot of every stream. */
|
||||
function reportStats(cmd) {
|
||||
const stats = {};
|
||||
for (const [key, state] of streams) {
|
||||
stats[key] = {
|
||||
frames: state.ring.frames().length,
|
||||
latestCapturedAt: state.ring.latest() ? state.ring.latest().capturedAt : null,
|
||||
videoReadyState: state.video ? state.video.readyState : null,
|
||||
videoSize: state.video ? `${state.video.videoWidth}x${state.video.videoHeight}` : null,
|
||||
sampling: state.sampling,
|
||||
};
|
||||
}
|
||||
send({ type: 'stats', requestId: cmd && cmd.requestId, stats });
|
||||
}
|
||||
|
||||
captureWorkerBridge.onCommand((msg) => {
|
||||
if (!msg || typeof msg !== 'object') return;
|
||||
if (msg.type === 'start-stream') startStream(msg);
|
||||
else if (msg.type === 'stop-stream') stopStream(String(msg.displayId));
|
||||
else if (msg.type === 'frame-request') {
|
||||
// A request must always produce a response — an unanswered click
|
||||
// counts toward backend unhealthiness in the main process.
|
||||
handleFrameRequest(msg).catch((err) => {
|
||||
console.error('capture-worker frame-request failed:', err && err.message);
|
||||
send({ type: 'frame-response', requestId: msg.requestId, ok: false, reason: String(err && err.message || err) });
|
||||
});
|
||||
} else if (msg.type === 'stats-request') reportStats(msg);
|
||||
});
|
||||
})();
|
||||
@@ -0,0 +1,294 @@
|
||||
'use strict';
|
||||
|
||||
const path = require('node:path');
|
||||
const { displayForDipPoint } = require('./coords');
|
||||
|
||||
/**
|
||||
* Off-main-process click-frame backend.
|
||||
*
|
||||
* The legacy design ran desktopCapturer.getSources() in a 200ms loop on the
|
||||
* main process. That had two structural problems this backend removes:
|
||||
* - every grab (and the occasional PNG encode) blocked the main-process
|
||||
* event loop, which delayed delivery of OS click events — the very events
|
||||
* the loop existed to serve — by up to whole seconds under load;
|
||||
* - getSources() is a heavy thumbnail API, so the loop had to idle 200ms
|
||||
* between grabs, leaving clicks to be matched against frames that could
|
||||
* be hundreds of ms stale.
|
||||
*
|
||||
* Here, a hidden worker window opens a desktop media *stream* per display
|
||||
* and samples it on a tight cadence into a timestamped ring buffer — all in
|
||||
* the worker's renderer process. On click, the main process sends only a tiny
|
||||
* IPC request carrying the hook-time click timestamp; the worker picks the
|
||||
* newest frame captured at or before that instant (strict semantics from
|
||||
* click-frames.js), PNG-encodes it off the main process, and ships the bytes
|
||||
* back. The main process never grabs or encodes a frame while recording.
|
||||
*
|
||||
* Failure handling: the backend is an optimization, never a single point of
|
||||
* failure. If streams don't come up (Wayland portals, WSLg quirks) start()
|
||||
* reports false and the capture service falls back to the legacy loop; if
|
||||
* frame requests start timing out mid-session, the backend declares itself
|
||||
* unhealthy once and the service degrades the same way.
|
||||
*/
|
||||
|
||||
const DEFAULT_SAMPLE_MS = 100;
|
||||
// Generous on purpose: the worker selects the frame the moment the request
|
||||
// arrives (that pins the click↔frame pairing), but PNG-encoding a 4K-class
|
||||
// frame can take seconds on software-rendered hosts (WSLg, VMs). A slow
|
||||
// reply is still the *correct* frame; only a worker that never answers
|
||||
// should count as unhealthy.
|
||||
const DEFAULT_FRAME_TIMEOUT_MS = 10_000;
|
||||
const DEFAULT_START_TIMEOUT_MS = 8000;
|
||||
// Consecutive frame-request timeouts before the backend declares itself
|
||||
// unhealthy and the capture service degrades to the in-process loop.
|
||||
const MAX_CONSECUTIVE_FAILURES = 2;
|
||||

class StreamCaptureBackend {
  /**
   * @param {object} opts
   * @param {(onEvent: (msg) => void) => Promise<{send,destroy}>} opts.createHost
   *   Factory for the worker transport (the hidden BrowserWindow in
   *   production, a fake in tests).
   * @param {(reason: string) => void} [opts.onUnhealthy]
   * @param {number} [opts.frameTimeoutMs] How long one frame request may stay unanswered.
   * @param {number} [opts.startTimeoutMs] How long start() waits for a first ready stream.
   */
  constructor({ createHost, onUnhealthy = null, frameTimeoutMs = DEFAULT_FRAME_TIMEOUT_MS, startTimeoutMs = DEFAULT_START_TIMEOUT_MS } = {}) {
    this.createHost = createHost;
    this.onUnhealthy = onUnhealthy;
    this.frameTimeoutMs = frameTimeoutMs;
    this.startTimeoutMs = startTimeoutMs;
    this.host = null;
    this.active = false;
    this.requests = new Map(); // requestId -> { resolve, timer, display }
    this.streams = new Map(); // displayId(string) -> { display, ready, failed }
    this.nextRequestId = 1;
    this.consecutiveFailures = 0;
    this.startWaiters = [];
  }

  isActive() {
    return this.active;
  }

  /**
   * Spin up the worker and one stream per display that has a matching screen
   * source. Resolves true when at least one stream is delivering frames.
   */
  async start({ displays = [], sources = [], sampleMs = DEFAULT_SAMPLE_MS, retentionMs = null, frameLimit = null } = {}) {
    if (this.host) return this.active;
    const pairs = pairDisplaysToSources(displays, sources);
    if (!pairs.length) return false;
    try {
      this.host = await this.createHost((msg) => this.handleWorkerEvent(msg));
    } catch {
      this.host = null;
      return false;
    }
    for (const { display, sourceId } of pairs) {
      this.streams.set(String(display.id), { display, ready: false, failed: false });
      this.hostSend({
        type: 'start-stream',
        displayId: display.id,
        sourceId,
        // The worker needs the physical pixel size to request a full-res
        // stream; bounds stay in DIP for marker math back in the main process.
        display: {
          id: display.id,
          bounds: display.bounds,
          scaleFactor: display.scaleFactor || 1,
        },
        sampleMs,
        retentionMs,
        frameLimit,
      });
    }
    const anyReady = await this.waitForStreams();
    this.active = anyReady;
    if (!anyReady) this.stop();
    return this.active;
  }

  /**
   * Resolves true as soon as one stream reports ready; false on timeout, when
   * every stream failed, or when the backend is stopped while waiting.
   */
  waitForStreams() {
    return new Promise((resolve) => {
      const finish = (ok) => {
        clearTimeout(timer);
        this.startWaiters = this.startWaiters.filter((w) => w !== check);
        resolve(ok);
      };
      const check = () => {
        const states = [...this.streams.values()];
        if (states.some((s) => s.ready)) return finish(true);
        // No streams left means stop() ran while we were waiting: settle now
        // instead of holding the caller until the start timeout fires.
        if (!states.length || states.every((s) => s.failed)) return finish(false);
        return null;
      };
      const timer = setTimeout(() => finish(false), this.startTimeoutMs);
      this.startWaiters.push(check);
      check();
    });
  }

  hostSend(msg) {
    if (!this.host) return;
    try {
      this.host.send(msg);
    } catch {
      // A dead host surfaces as request timeouts → unhealthy → degrade.
    }
  }

  handleWorkerEvent(msg) {
    if (!msg || typeof msg !== 'object') return;
    if (msg.type === 'stream-ready' || msg.type === 'stream-error') {
      const stream = this.streams.get(String(msg.displayId));
      if (stream) {
        stream.ready = msg.type === 'stream-ready';
        stream.failed = msg.type === 'stream-error';
      }
      for (const check of [...this.startWaiters]) check();
      return;
    }
    if (msg.type === 'frame-response') {
      const pending = this.requests.get(msg.requestId);
      if (!pending) return; // late reply after timeout, already handled
      this.requests.delete(msg.requestId);
      clearTimeout(pending.timer);
      // Any answer, even "no qualifying frame", proves the worker is alive.
      this.consecutiveFailures = 0;
      if (!msg.ok || !msg.png) {
        pending.resolve(null);
        return;
      }
      pending.resolve({
        mode: 'fullscreen',
        png: Buffer.from(msg.png),
        size: { width: msg.width, height: msg.height },
        display: pending.display,
        startedAt: msg.startedAt,
        capturedAt: msg.capturedAt,
        source: 'stream',
      });
    }
  }

  /**
   * Frame for one click, selected in the worker under the given strictness.
   * Resolves null when no frame qualifies (caller falls back) and also on
   * timeout, which additionally counts toward unhealthiness.
   */
  frameForClick({ clickPos = null, clickAt = Date.now(), strict = true } = {}) {
    if (!this.active || !this.host) return Promise.resolve(null);
    const displays = [...this.streams.values()].filter((s) => s.ready).map((s) => s.display);
    const display = clickPos ? displayForDipPoint(clickPos, displays) : (displays[0] || null);
    if (!display) return Promise.resolve(null);
    const requestId = this.nextRequestId++;
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.requests.delete(requestId);
        resolve(null);
        this.noteFailure();
      }, this.frameTimeoutMs);
      this.requests.set(requestId, { resolve, timer, display });
      this.hostSend({
        type: 'frame-request',
        requestId,
        displayId: display.id,
        clickAt,
        strict,
      });
    });
  }

  noteFailure() {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures < MAX_CONSECUTIVE_FAILURES) return;
    const notify = this.onUnhealthy;
    this.stop();
    if (notify) notify('frame requests timing out');
  }

  stop() {
    this.active = false;
    // A restarted backend starts with a clean health record; without this,
    // a single timeout after a restart would already trip the unhealthy path.
    this.consecutiveFailures = 0;
    for (const [, pending] of this.requests) {
      clearTimeout(pending.timer);
      pending.resolve(null);
    }
    this.requests.clear();
    this.streams.clear();
    for (const check of [...this.startWaiters]) check();
    this.startWaiters = [];
    if (this.host) {
      try { this.host.destroy(); } catch { /* already gone */ }
      this.host = null;
    }
  }
}
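The backend only forwards `clickAt` and `strict`; the frame choice itself happens in the worker, which is not part of this file. As a rough sketch of what strict selection could look like, assuming a ring of frames that carry the `startedAt` / `capturedAt` timestamps seen in the frame response (the helper name `selectFrameForClick` and the `slackMs` option are hypothetical, not taken from the commit):

```javascript
'use strict';

// Hypothetical helper, not the worker's actual code. Frames are assumed to be
// plain objects with startedAt / capturedAt timestamps in milliseconds.
function selectFrameForClick(frames, clickAt, { strict = true, slackMs = 0 } = {}) {
  // Strict mode: a frame whose grab started after the click may already show
  // the click's effect, so it never qualifies. The non-strict branch with
  // slackMs is only an illustrative stand-in for the legacy behavior.
  const latestAllowedStart = strict ? clickAt : clickAt + slackMs;
  let best = null;
  for (const frame of frames) {
    if (frame.startedAt > latestAllowedStart) continue;
    if (best === null || frame.startedAt > best.startedAt) best = frame;
  }
  return best; // null means "no qualifying frame"; the caller falls back
}

const ring = [
  { startedAt: 900, capturedAt: 905 },
  { startedAt: 990, capturedAt: 996 },
  { startedAt: 1005, capturedAt: 1011 },
];
console.log(selectFrameForClick(ring, 1000)); // the frame started at 990
console.log(selectFrameForClick(ring, 800)); // null
```

Selecting by `startedAt` rather than `capturedAt` is the conservative choice here: a grab that began before the click cannot contain what the click caused, even if it finished afterwards.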

/** Match each display to its desktopCapturer screen source by display_id. */
function pairDisplaysToSources(displays, sources) {
  const screens = (sources || []).filter((s) => s && typeof s.id === 'string' && s.id.startsWith('screen:'));
  const pairs = [];
  const used = new Set();
  for (const display of displays || []) {
    let source = screens.find((s) => !used.has(s.id) && String(s.display_id) === String(display.id));
    if (!source && displays.length === 1 && screens.length === 1) {
      // Single display, single source: some platforms leave display_id empty.
      source = screens[0];
    }
    if (!source) continue;
    used.add(source.id);
    pairs.push({ display, sourceId: source.id });
  }
  return pairs;
}
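To make the pairing rules concrete, here is a small standalone sketch. The function body is repeated from above only so the snippet runs on its own; the display and source objects are made-up sample data shaped like what `screen.getAllDisplays()` and `desktopCapturer.getSources()` are expected to return.

```javascript
'use strict';

// Copy of pairDisplaysToSources from above, repeated so this runs standalone.
function pairDisplaysToSources(displays, sources) {
  const screens = (sources || []).filter((s) => s && typeof s.id === 'string' && s.id.startsWith('screen:'));
  const pairs = [];
  const used = new Set();
  for (const display of displays || []) {
    let source = screens.find((s) => !used.has(s.id) && String(s.display_id) === String(display.id));
    if (!source && displays.length === 1 && screens.length === 1) {
      source = screens[0]; // single display, single source: display_id may be empty
    }
    if (!source) continue;
    used.add(source.id);
    pairs.push({ display, sourceId: source.id });
  }
  return pairs;
}

// Sample data (invented for illustration): two monitors plus one window source.
const twoDisplays = [{ id: 1 }, { id: 2 }];
const mixedSources = [
  { id: 'screen:0:0', display_id: '2' },
  { id: 'window:55:0', display_id: '' }, // window sources are ignored
  { id: 'screen:1:0', display_id: '1' },
];
const matched = pairDisplaysToSources(twoDisplays, mixedSources).map((p) => p.sourceId);

// One display whose only screen source reports an empty display_id.
const fallback = pairDisplaysToSources([{ id: 42 }], [{ id: 'screen:0:0', display_id: '' }]);

// Two displays but no usable display_id: nothing is guessed.
const unmatched = pairDisplaysToSources(twoDisplays, [{ id: 'screen:0:0', display_id: '' }]);

console.log(matched, fallback.length, unmatched.length);
```

With these inputs the two displays pair by `display_id`, the lone display takes the lone screen source, and the ambiguous multi-display case yields no pairs, which makes `start()` return false.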

/**
 * Production worker host: a hidden BrowserWindow running the capture-worker
 * page. Lazy-required Electron so this module stays loadable under node for
 * unit tests.
 */
async function createElectronHost(onEvent) {
  // eslint-disable-next-line global-require
  const { BrowserWindow, ipcMain } = require('electron');
  const win = new BrowserWindow({
    show: false,
    width: 320,
    height: 240,
    skipTaskbar: true,
    webPreferences: {
      preload: path.join(__dirname, 'capture-worker-preload.js'),
      contextIsolation: true,
      nodeIntegration: false,
      // The worker must keep sampling while hidden: throttling a hidden
      // window is exactly the wrong default for a frame recorder.
      backgroundThrottling: false,
    },
  });
  const listener = (event, msg) => {
    if (event.sender === win.webContents) onEvent(msg);
  };
  ipcMain.on('capture-worker:event', listener);
  try {
    await win.loadFile(path.join(__dirname, 'renderer', 'capture-worker.html'));
  } catch (err) {
    ipcMain.removeListener('capture-worker:event', listener);
    if (!win.isDestroyed()) win.destroy();
    throw err;
  }
  return {
    send(msg) {
      if (!win.isDestroyed()) win.webContents.send('capture-worker:command', msg);
    },
    destroy() {
      ipcMain.removeListener('capture-worker:event', listener);
      if (!win.isDestroyed()) win.destroy();
    },
  };
}
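The constructor docs mention that `createHost` is "a fake in tests". A minimal sketch of such a test double follows, assuming only the message shapes visible in this file (`start-stream`, `frame-request`, `stream-ready`, `frame-response`). The names `makeFakeHostFactory`, `silent`, and `demo` are hypothetical, and the repository's real test fake may look different.

```javascript
'use strict';

// Hypothetical test double for the worker transport. It returns the same
// { send, destroy } shape as createElectronHost, without touching Electron.
function makeFakeHostFactory({ silent = false } = {}) {
  const sent = []; // every command the backend sent, for assertions
  async function createHost(onEvent) {
    let destroyed = false;
    return {
      send(msg) {
        if (destroyed) return;
        sent.push(msg);
        if (silent) return; // simulate a worker that never answers
        // Reply asynchronously, like a real IPC round trip would.
        setImmediate(() => {
          if (destroyed) return;
          if (msg.type === 'start-stream') {
            onEvent({ type: 'stream-ready', displayId: msg.displayId });
          } else if (msg.type === 'frame-request') {
            onEvent({
              type: 'frame-response',
              requestId: msg.requestId,
              ok: true,
              png: Uint8Array.from([0x89, 0x50, 0x4e, 0x47]), // PNG magic bytes only
              width: 2,
              height: 2,
              startedAt: msg.clickAt - 20,
              capturedAt: msg.clickAt - 10,
            });
          }
        });
      },
      destroy() {
        destroyed = true;
      },
    };
  }
  return { createHost, sent };
}

// Drives the fake directly and collects what it would report to the backend.
async function demo() {
  const events = [];
  const { createHost, sent } = makeFakeHostFactory();
  const host = await createHost((msg) => events.push(msg));
  host.send({ type: 'start-stream', displayId: 1, sourceId: 'screen:1:0' });
  host.send({ type: 'frame-request', requestId: 7, displayId: 1, clickAt: 1000, strict: true });
  await new Promise((resolve) => setImmediate(resolve)); // let the replies land
  host.destroy();
  return { events, sent };
}
```

Passing `createHost` from such a factory as `opts.createHost` should let a unit test exercise `start()` and `frameForClick()` under plain node; with `silent: true` the same double could drive the timeout and unhealthy path.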

module.exports = {
  StreamCaptureBackend,
  createElectronHost,
  pairDisplaysToSources,
  DEFAULT_SAMPLE_MS,
  DEFAULT_FRAME_TIMEOUT_MS,
  MAX_CONSECUTIVE_FAILURES,
};