Root cause of 'I clicked many times but only got two screenshots':
finishing/pausing a session called backend.stop(), which cancelled every
in-flight frame request to null. Clicks whose PNG had not finished
encoding yet were then dropped — only the first few survived.
Fixes:
- Stream backend now *drains* on stop: it stops accepting new requests but
keeps the worker alive until frames already selected for queued clicks
finish encoding. stop({ immediate: true }) keeps the old abandon-now
behavior for an unhealthy worker.
- Two-stage worker reply: a fast 'frame-selected' ack pins the pairing and
proves liveness; the slow PNG payload follows. A slow encode (seconds on
software-rendered hosts) is no longer mistaken for a dead worker, which
had been forcing the post-click fresh-shot fallback (late screenshots).
- Queued clicks carry their guide id and are stored even if the session
ends while they wait in the queue.
- The tray gesture that stops a session is discarded by matching its
recorded screen position, not a time window — a fast workflow click near
the stop is no longer collateral damage. (Replaces the earlier grace
window, which dropped whole bursts.)
- A click on a display with no ready stream resolves null so the caller
fresh-shots the correct monitor instead of returning another screen.
- STEPFORGE_CAPTURE_LOG=1 prints one line per click decision; the
second-instance handler now surfaces the running window instead of
exiting silently.
- Self-test gains a fast-burst-then-finish scenario (8/8 saved) and the
marker/coordinate checks remain at 0.00% offset.
Tests: 133 unit + all repo checks passing.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
8.8 KiB
Architecture
StepForge is split into a dependency-free Node.js core and a thin
Electron desktop shell. All product logic lives in core/ and
exporters/ and runs (and is tested) headlessly with plain node. The shell
in app/ only provides the window, the canvas UI, screen capture, hotkeys,
and the clipboard.
Repository Layout
app/ Electron shell: main process, preload bridge, renderer UI
core/ Dependency-free domain logic (schema, store, archive, search,
placeholders, render AST, png/gif/pdf/zip primitives, locks,
snapshots, settings)
exporters/ One module per output format, all consuming the Render AST
scripts/ bootstrap / verify / build / package scripts (sh + ps1)
tests/
run_test.sh entrypoint — runs every tests/checks/test_*.sh
checks/ shell wrappers that invoke the node test suites
unit/ node:test workflow suites
fixtures/ test images and guides
examples/ sample guide + sample exports
assets/ app icon and packaged static assets
build/ agent_audit.md, build_report.md, artifacts_manifest.json
docs/ file-format and data-model documentation
vendor/ reserved for vendored deps (reuse only; nothing fetched)
Data Model
Internal working storage is folder-based; sharing/backups use a
single-file zip archive (.sfgz).
<data root>/ (~/.local/share/stepforge, %APPDATA%/stepforge,
settings/ or $STEPFORGE_DATA_DIR)
app-settings.json
placeholders.json global placeholders
templates/<format>/<name>.template.json
library/
folders.json folder tree + guide->folder mapping
guides/<guide-id>/
guide.json guide metadata (schema below)
steps/<step-id>/
step.json
original.png never mutated after capture
working.png crop target; annotations stay vector JSON
history/snapshots/*.zip automated + manual backups
index/search-index.json inverted full-text index
temp/ previews; cleaned on close
shared-links/ linked-guide registry
guide.json— schemaVersion, guideId, title, descriptionHtml, placeholders, flags (focusedViewDefault, ...), stepsOrder, favorite, linkedSource, exportProfiles, createdAt/updatedAt.step.json— stepId, parentStepId, kind (image | empty | content), status (todo | in-progress | done), title, descriptionHtml, hidden, skipped, focusedView {enabled, zoom, panX, panY}, image paths + size,annotations[](normalized scene graph, coordinates in 0..1 fractions of the image), textBlocks[], codeBlocks[], tableBlocks[], links[].
All writes are atomic (write to *.tmp, fsync, rename). Deleting a guide
moves it to library/trash/ first.
Annotation Scene Graph
Annotations are stored as normalized JSON, never as an editor-library blob.
Coordinates are fractions of the image (resolution-independent). Types:
rect, oval, line, arrow, text, tooltip, number, blur, highlight, magnify, cursor. The same geometry is rendered by the editor canvas (HTML5 canvas)
and by the export rasterizer (core/raster.js), so what you see is what
exports.
Render Pipeline
guide.json + step.json + settings
│ core/renderast.js (placeholder expansion, numbering, filtering
▼ hidden/skipped, focused-view geometry)
Render AST ──► exporters/json.js .json + steps-<title>/ images
──► exporters/markdown.js .md + steps-<title>/ images
──► exporters/html-simple.js single self-contained .html
──► exporters/html-rich.js checkboxes + floating TOC
──► exporters/pdf.js native PDF writer (core/pdf.js)
──► exporters/gif.js GIF89a encoder (core/gif.js)
──► exporters/image-bundle.js annotated PNGs + metadata
──► exporters/docx.js zip+XML (core/zip.js)
──► exporters/pptx.js zip+XML (core/zip.js)
Image-bearing exporters rasterize annotations with core/raster.js on top of
PNG pixels decoded by core/png.js. Every exporter accepts a template object
(per-format settings persisted under settings/templates/, shareable as
.sfglt zip files).
Shell / Core Boundary
The renderer never touches the filesystem. app/preload.js exposes a typed
IPC API (stepforge.*), and app/main.js routes calls into core/. Screen
capture uses Electron's desktopCapturer (full screen, window) and an
overlay window for region selection; hotkeys use globalShortcut.
Click-Capture Pipeline
Workflow recording must behave like one click → one step, with the screenshot showing the screen at the click and the marker on the exact click position. Three pieces make that hold:
-
OS click events (
app/capture.js): a low-level mouse hook on Windows (CLICK x y button unixMslines), anxinput test-xi2 --rootwatcher on X11. The Linux parser carries event-timeroot:coordinates and merges raw/regular twin blocks structurally — there is no time-based debounce that could drop fast clicks, only suppression of identical duplicate deliveries. Physical coordinates convert to DIP viascreen.screenToDipPointon Windows or display-geometry math inapp/coords.jselsewhere (multi-monitor and scale-factor aware). -
Frame recorders: while recording, a hidden worker window (
app/stream-backend.js+app/renderer/capture-worker.js) samples a desktop media stream per display into a timestamped ring buffer — entirely off the main process, so click delivery is never delayed by capture work, and PNG encoding happens in the worker. If streams can't start (portal-less Wayland), or the worker stops answering, the service degrades to the legacy in-processdesktopCapturerloop. -
Click ↔ frame pairing (
app/click-frames.js, shared by the main process, the worker, and tests): each click is paired at event time with the newest frame captured at or before its hook timestamp. In strict mode (capture.strictClickFrames, default on) a frame whose grab started after the click is never used — when nothing qualifies, the service takes an explicit fresh shot instead of passing a post-click frame off as the click-time screen. Storing is serialized per click; pairing is not, so slow encodes never skew later clicks.
Reliability rules that keep "one click → one step" true under load:
- The worker reply is two-stage. It acknowledges frame selection within milliseconds (proving liveness and pinning the pairing), then ships the PNG whenever the encode finishes — seconds later on software-rendered hosts. A slow payload is never mistaken for a dead worker; only a missing ack degrades the backend.
- Stopping drains. Finishing or pausing a recording keeps the worker alive until frames already selected for queued clicks finish encoding. Without this, ending a session right after a fast click burst cancelled every still-encoding frame and those clicks vanished (the "I clicked ten times but only got two screenshots" bug).
- Queued clicks outlive the session. A click registered while recording carries its guide id and still becomes a step if the session ends while it waits in the store queue. The lone exception is the tray gesture that stopped the session, discarded by matching its recorded screen position.
- A click is never served another monitor's frame. If the clicked display has no ready stream the backend returns null and the caller fresh-shots the correct screen, rather than circling a point on the wrong one.
STEPFORGE_CLICK_SELFTEST=1 npm start exercises the whole pipeline in a
real Electron session: it reports steps-per-click and marker offsets, then
runs a fast-burst-then-finish scenario that must save every click.
STEPFORGE_CAPTURE_LOG=1 prints one diagnostic line per click decision.
Security Rules
- Zero network code paths: no sockets, no telemetry, no update or license checks, no remote fonts in exports.
- Archive imports validate every entry name against path traversal and
absolute paths before extraction (
core/zip.js). - Linked guides use sidecar
*.lock-sfgzlock files; conflicts surface a keep-editing / discard dialog and last-write-wins is documented. - Renderer runs with
contextIsolation: true,nodeIntegration: false, andsandboxenabled; only the preload API is exposed.
Workflow
- Change core/exporter logic together with its workflow tests in
tests/unit/. - Put shell checks in
tests/checks/so the shared runner picks them up. - Run
bash tests/run_test.shlocally. - Open a pull request so CI can verify on PR open.