README

A local HTTP server skill that turns any HTML file or static website into a collaboratory between a human in a browser and a coding agent in a terminal.

The agent starts the server pointed at a directory, tells the user the URL, and the user opens it. Every page served gets a tiny inspector script injected into it. When the user hovers or clicks a DOM element, the inspector reports the full context — selector, attributes, text, computed styles, bounding rect, parent chain — back to the server. The agent then polls a handful of REST endpoints to know exactly what the user is looking at, what they just interacted with, and (when enabled) can even run JavaScript inside the page to drive the UI.

This skill is the HTML-side counterpart to content-factory: same Node CJS + stdlib http stack, same workspace conventions under .codi_output/, same start/stop shell script pattern.


Table of contents

  1. What this skill does
  2. Directory structure
  3. Workflow
  4. Configuration
  5. Server API reference
  6. Injected inspector client
  7. Design decisions
  8. Adapting for similar use cases
  9. Testing

What this skill does

  1. Serves a user-supplied directory over HTTP on 127.0.0.1:<random-port>.
  2. Injects <script src="/__inspect/inspector.js"></script> before </body> on every HTML response.
  3. The injected inspector draws a hover outline, lets the user click to lock a selection, and streams selection events and user interactions to the server via POST /__inspect/ingest.
  4. The server exposes a REST API the coding agent polls to read:
    • What element is currently selected (full context)
    • A ring buffer of recent user interactions (click, input, navigation, scroll)
    • Current page URL, title, viewport
  5. Optional (on by default, disable with --no-eval): the agent can push JavaScript to the server, which the in-page inspector long-polls, runs in the page context, and returns the result to the agent.
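
The injection in item 2 can be sketched as a small string transform. This is an illustration only: `injectInspector` is a hypothetical name, the real logic lives in scripts/lib/injector.cjs and may differ, and the double-injection guard is an assumption. The append-when-`</body>`-is-missing fallback follows the behavior described under Design decisions.

```javascript
'use strict';

// Hypothetical helper; scripts/lib/injector.cjs may be implemented differently.
const INSPECTOR_TAG = '<script src="/__inspect/inspector.js"></script>';

function injectInspector(html) {
  // Assumption: skip pages that already carry the tag.
  if (html.includes(INSPECTOR_TAG)) return html;

  // Find the last closing body tag, case-insensitively.
  const matches = [...html.matchAll(/<\/body\s*>/gi)];
  if (matches.length === 0) {
    // No </body> at all: append the tag to the end of the response.
    return html + INSPECTOR_TAG;
  }
  const last = matches[matches.length - 1];
  return html.slice(0, last.index) + INSPECTOR_TAG + html.slice(last.index);
}

console.log(injectInspector('<html><body><p>Hi</p></body></html>'));
// <html><body><p>Hi</p><script src="/__inspect/inspector.js"></script></body></html>
```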

Directory structure

html-live-inspect/
├── README.md                           # this file
├── index.ts                            # exports template + staticDir
├── template.ts                         # SKILL.md body (wrapped in TS literal)
├── evals/
│   └── evals.json                      # trigger + behavior evals
├── scripts/
│   ├── package.json                    # zero-dep Node package manifest
│   ├── start-server.sh                 # launcher — prints JSON {url,pid,apiBase}
│   ├── stop-server.sh                  # graceful shutdown via pid file
│   ├── server.cjs                      # HTTP entrypoint + static + injection
│   ├── lib/
│   │   ├── http-utils.cjs              # sendJson, sendText, readBody, status helpers
│   │   ├── workspace.cjs               # pid/log/state files under .codi_output/
│   │   ├── selection-store.cjs         # latest selection + history
│   │   ├── event-log.cjs               # ring buffer (seq-based, default 500)
│   │   ├── eval-bridge.cjs             # long-poll queue for agent→browser eval
│   │   └── injector.cjs                # <script> tag injection into HTML
│   ├── routes/
│   │   ├── health-routes.cjs           # GET /api/health, GET /api/page
│   │   ├── selection-routes.cjs        # GET /api/selection[, /history]
│   │   ├── events-routes.cjs           # GET /api/events?since=N, DELETE /api/events
│   │   ├── dom-routes.cjs              # GET /api/dom?selector=...
│   │   ├── eval-routes.cjs             # POST /api/eval (+ internal pull/push)
│   │   └── ingest-routes.cjs           # POST /__inspect/ingest (client → server)
│   └── client/
│       └── inspector.js                # injected browser-side overlay + capture
└── tests/
    └── integration/
        └── server.test.js              # boot server, hit API, assert shapes

Workflow

  1. Agent invokes scripts/start-server.sh --site-dir <path>. The launcher picks a random port, writes pid/log files under <project>/.codi_output/html-live-inspect/, and prints a JSON line {"url":"http://localhost:PORT","apiBase":"http://localhost:PORT/api","pid":1234}.
  2. User opens the URL in a browser. The first HTML response has the inspector script injected. The browser loads /__inspect/inspector.js from the server.
  3. User hovers and clicks elements. The inspector draws an outline, captures a full context snapshot on click, and POSTs it to /__inspect/ingest.
  4. Agent polls GET /api/selection to know what the user is looking at, and GET /api/events?since=N to catch up on recent interactions.
  5. Agent drives the UI (if eval is enabled) by calling POST /api/eval with a JavaScript body. The server queues it; the inspector’s long-poll loop picks it up, runs it inside the page, and posts the result back.
  6. Agent shuts down the server with scripts/stop-server.sh. The workspace directory stays if a --project-dir was given, otherwise it is wiped from /tmp/.
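
For step 1, an agent written in Node might pick the JSON line out of the launcher's stdout as sketched below. `parseLauncherOutput` is a hypothetical helper, and the assumption that other log lines could surround the JSON line is mine, not something the launcher documents.

```javascript
'use strict';

// Hypothetical helper: find the first stdout line that parses as JSON
// and carries both url and apiBase.
function parseLauncherOutput(stdout) {
  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{')) continue;
    let parsed;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // not JSON after all, keep scanning
    }
    if (typeof parsed.url === 'string' && typeof parsed.apiBase === 'string') {
      return parsed;
    }
  }
  throw new Error('launcher did not print a JSON line with url and apiBase');
}

const sample = 'starting\n{"url":"http://localhost:49234","apiBase":"http://localhost:49234/api","pid":12345}\n';
console.log(parseLauncherOutput(sample).apiBase); // http://localhost:49234/api
```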

Configuration

Environment variables honored by server.cjs:

| Variable | Default | Purpose |
| --- | --- | --- |
| HLI_PORT | random high port | bind port |
| HLI_HOST | 127.0.0.1 | bind address |
| HLI_SITE_DIR | required | directory served as static root |
| HLI_WORKSPACE | /tmp/html-live-inspect-workspace | pid/log/state root |
| HLI_ALLOW_EVAL | 1 | 0 disables /api/eval and in-page eval |
| HLI_EVENT_BUFFER | 500 | ring buffer capacity for events |
| HLI_IDLE_TIMEOUT_MS | 1800000 | auto-shutdown when idle (30 min) |
| HLI_OWNER_PID | none | parent PID to watch for exit |

Most of these can also be set through flags on start-server.sh: --site-dir, --host, --port, --workspace, --no-eval, --idle-timeout, --foreground/--background.
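
One way the environment variables could be turned into a config object is sketched below. `readConfig` is a hypothetical function, not the code in server.cjs. Using port 0 so the OS assigns a free port is an assumed stand-in for "random high port"; the other defaults are taken from the Configuration section.

```javascript
'use strict';

// Hypothetical sketch; server.cjs may parse its environment differently.
function readConfig(env) {
  if (!env.HLI_SITE_DIR) throw new Error('HLI_SITE_DIR is required');

  // Parse an integer, falling back when the variable is unset or malformed.
  const intOr = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isFinite(n) ? n : fallback;
  };

  return {
    port: intOr(env.HLI_PORT, 0), // assumption: 0 lets the OS pick a free port
    host: env.HLI_HOST || '127.0.0.1',
    siteDir: env.HLI_SITE_DIR,
    workspace: env.HLI_WORKSPACE || '/tmp/html-live-inspect-workspace',
    allowEval: env.HLI_ALLOW_EVAL !== '0', // only "0" disables eval
    eventBuffer: intOr(env.HLI_EVENT_BUFFER, 500),
    idleTimeoutMs: intOr(env.HLI_IDLE_TIMEOUT_MS, 1800000),
    ownerPid: intOr(env.HLI_OWNER_PID, null),
  };
}

console.log(readConfig({ HLI_SITE_DIR: '/abs/site', HLI_ALLOW_EVAL: '0' }).allowEval); // false
```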


Server API reference

Base URL: http://127.0.0.1:<port>. All responses are JSON unless noted.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/health | {status, uptimeMs, siteDir, allowEval} |
| GET | /api/page | {url, title, viewport, userAgent, lastUpdateMs} |
| GET | /api/selection | Full context of the currently locked element, or null if none |
| GET | /api/selection/history?limit=N | Previous selections (max 50) |
| GET | /api/events?since=<seq>&limit=N | {events:[...], nextSeq, dropped} |
| DELETE | /api/events | Clear the event ring buffer |
| GET | /api/dom?selector=<css> | Ask the page to resolve the selector and return context |
| POST | /api/eval | Body {js, timeoutMs?}; responds with {ok, result?, error?} (403 if eval disabled) |
| GET | /__inspect/inspector.js | The injected client script (served raw JS) |
| GET | /__inspect/eval-pull | Internal long-poll for the inspector, not for agent use |
| POST | /__inspect/eval-push | Internal result callback from the inspector |
| POST | /__inspect/ingest | Internal sink for selection/event updates from the inspector |

Selection payload shape:

{
  "seq": 42,
  "timestamp": 1713110400000,
  "selector": "main > section:nth-of-type(2) > button.primary",
  "tag": "button",
  "id": "submit",
  "classes": ["btn", "primary"],
  "attributes": {"type": "submit", "aria-label": "Send"},
  "text": "Send message",
  "outerHTMLSnippet": "<button id=\"submit\" ...>...</button>",
  "boundingRect": {"x": 120, "y": 480, "width": 140, "height": 44},
  "computedStyles": {"display": "inline-block", "color": "rgb(255,255,255)", ...},
  "parentChain": [
    {"tag": "section", "id": "", "classes": ["hero"], "selector": "main > section:nth-of-type(2)"},
    ...
  ],
  "childrenCount": 1,
  "pageUrl": "http://localhost:1234/index.html",
  "pageTitle": "Demo"
}

Injected inspector client

scripts/client/inspector.js is the brain of the browser side. It:

  • Draws a dashed outline on mouseover via a single absolutely-positioned overlay <div> — does not touch the page’s own styles.
  • Locks the selection on click (or Alt+click to unlock).
  • Builds a stable CSS selector path using :nth-of-type indices.
  • Snapshots a curated set of computed styles (box model, typography, color, flex/grid) — not the full getComputedStyle object, which is huge.
  • Captures interactions (click, input, submit, scroll, navigation) and POSTs them to /__inspect/ingest with a monotonic sequence number.
  • Long-polls /__inspect/eval-pull; when a task arrives, runs it in a controlled Function() scope, captures the result (or error), and POSTs back to /__inspect/eval-push.
  • Exposes window.__HLI__ with helpers for debugging but nothing the page can exploit to exfiltrate data — the inspector is isolated per origin and only communicates with its own server.
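
The :nth-of-type selector path can be sketched as below. This is a simplified illustration, not the code in inspector.js: `buildSelector` and the `el` fixture helper are hypothetical, and ids and classes are ignored. It runs on real DOM elements and on plain objects exposing the same tagName / parentElement / children shape, which is what lets it run under Node here.

```javascript
'use strict';

// Hypothetical sketch of a :nth-of-type selector path.
function buildSelector(element) {
  const parts = [];
  // Walk up until the root; the root element itself is left out of the path.
  for (let node = element; node && node.parentElement; node = node.parentElement) {
    const tag = node.tagName.toLowerCase();
    const sameTag = Array.from(node.parentElement.children)
      .filter((sibling) => sibling.tagName === node.tagName);
    // Add an index only when the tag alone would be ambiguous among siblings.
    parts.unshift(sameTag.length > 1
      ? `${tag}:nth-of-type(${sameTag.indexOf(node) + 1})`
      : tag);
  }
  return parts.join(' > ');
}

// Minimal stand-in for DOM nodes so the example runs outside a browser.
function el(tagName, children = []) {
  const node = { tagName, children, parentElement: null };
  for (const child of children) child.parentElement = node;
  return node;
}

const button = el('BUTTON');
el('BODY', [el('MAIN', [el('SECTION'), el('SECTION', [button])])]);
console.log(buildSelector(button)); // main > section:nth-of-type(2) > button
```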

Design decisions

  • Zero npm deps — Node stdlib http/fs/path only. Matches content-factory and eliminates install friction.
  • Single-process server, single shared state — the server is expected to run one inspected site at a time, so in-memory state is fine. Pid + log files let a second invocation detect and replace a stale instance.
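  • Long-poll instead of WebSocket for eval: one code path and no upgrade handshake, so it works anywhere plain HTTP requests do. The 25 s long-poll timeout is a server-side choice (fetch has no default timeout of its own), short enough that intermediaries are unlikely to drop the request as idle.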
  • Injection via regex on response body — simple and resilient. For responses that are missing </body>, the injector appends to the end.
  • script-src is NOT tightened — if a user’s HTML already has a strict CSP, the injected inspector may be blocked. Documented as a known limitation; the user can add 'self' to their script-src or set the --csp-relax flag to rewrite their Content-Security-Policy header (off by default).
  • Eval is opt-out, not opt-in — the user explicitly opted for this during skill design. Disable with --no-eval for demos to untrusted stakeholders.
  • No auth on localhost — bound to 127.0.0.1 by default. Exposing to a LAN requires --host 0.0.0.0 AND --allow-remote, and prints a loud warning.
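
The long-poll decision can be illustrated with a small in-memory queue. This is a sketch under assumptions, not the contents of lib/eval-bridge.cjs: the `EvalBridge` class, its method names, and the callback style are hypothetical, and the agent-side eval timeout is left out.

```javascript
'use strict';

// Hypothetical long-poll bridge between POST /api/eval (agent) and the
// /__inspect/eval-pull and /__inspect/eval-push requests (browser).
class EvalBridge {
  constructor() {
    this.nextId = 1;
    this.queue = [];          // tasks waiting for the browser to pull
    this.waiter = null;       // a parked long-poll, if any
    this.pending = new Map(); // task id -> agent callback
  }

  // Agent side: queue a task, or hand it straight to a parked poll.
  enqueue(js, onResult) {
    const task = { id: this.nextId++, js };
    this.pending.set(task.id, onResult);
    if (this.waiter) this.release(task);
    else this.queue.push(task);
    return task.id;
  }

  // Browser side: deliver a queued task now, or park until one arrives.
  pull(deliver, timeoutMs = 25000) {
    if (this.queue.length > 0) return deliver(this.queue.shift());
    if (this.waiter) this.release(null); // a newer poll replaces the older one
    const timer = setTimeout(() => this.release(null), timeoutMs);
    this.waiter = { deliver, timer };
    return undefined;
  }

  // Answer the parked poll with a task, or with null on timeout.
  release(task) {
    const { deliver, timer } = this.waiter;
    this.waiter = null;
    clearTimeout(timer);
    deliver(task);
  }

  // Browser side: report the outcome of a task back to the agent.
  push(id, outcome) {
    const onResult = this.pending.get(id);
    if (!onResult) return false;
    this.pending.delete(id);
    onResult(outcome);
    return true;
  }
}

const bridge = new EvalBridge();
bridge.pull((task) => bridge.push(task.id, { ok: true, result: 2 }));
bridge.enqueue('return 1 + 1;', (outcome) => console.log(outcome)); // { ok: true, result: 2 }
```

A null delivery stands for the empty response a timed-out poll would get, after which the inspector would simply poll again.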

Adapting for similar use cases

To build a variant (e.g. a PDF inspector, a canvas inspector):

  1. Duplicate the directory under src/templates/skills/<new-name>/.
  2. Replace scripts/client/inspector.js with the target-specific capture logic.
  3. Adjust scripts/lib/injector.cjs for the new content type — the regex that finds an injection point will differ.
  4. Re-export and register in src/templates/skills/index.ts and src/core/scaffolder/skill-template-loader.ts.

The server core (server.cjs, routes/, lib/http-utils.cjs, lib/workspace.cjs, lib/event-log.cjs, lib/selection-store.cjs, lib/eval-bridge.cjs) is reusable as-is.


Testing

  • Evals: run the 6 cases in evals/evals.json after any change to the description. Two negatives are required to keep false triggers down.
  • Integration test: tests/integration/server.test.js boots the server against a fixture directory, hits every API endpoint, asserts response shapes, and checks that HTML responses carry the injected script tag. Run with npx vitest run src/templates/skills/html-live-inspect/tests/.
  • Manual smoke test: run bash scripts/start-server.sh --site-dir ./fixtures, then open the printed URL, click elements, and curl $apiBase/selection.

SKILL.md

Overview

HTML Live Inspect turns any local HTML file or static website into a shared workspace between the human (in a browser) and the coding agent (in a terminal). You start a tiny Node HTTP server pointed at a directory, give the user the URL, and every page the server returns is silently augmented with a DOM inspector. When the user hovers or clicks an element, a full context snapshot is pushed to the server. The agent reads a handful of REST endpoints to know exactly what the user is looking at and what they just did. When enabled, the agent can also run JavaScript inside the live page to drive the UI, highlight things, fill forms, or trigger actions — a true collaboratory.

This skill is the HTML twin of codi-content-factory: same Node CJS + stdlib http stack, same start/stop shell scripts, same workspace layout under .codi_output/.


When to Activate

  • The user asks you to open a local HTML file or folder so you can see what they click.
  • The user says “start html live inspect”, “open this html in collaboratory mode”, “let me show you in the browser”, “watch what I select”, “inspect this page with me”.
  • The user wants you to drive a local web page (click, fill, read state) while they watch in a browser.
  • The user is designing HTML and wants you to know which element they mean without pasting selectors.

Skip When

  • The user wants to test a live remote URL — use codi-webapp-testing (Playwright-backed).
  • The user wants to generate social cards or slides — use codi-content-factory.
  • The user wants to build frontend components from scratch — use codi-frontend-design.

Skill assets

| Asset | Purpose |
| --- | --- |
| ${CLAUDE_SKILL_DIR}[[/scripts/start-server.sh]] | Start the server; prints JSON {url, apiBase, pid} |
| ${CLAUDE_SKILL_DIR}[[/scripts/stop-server.sh]] | Stop the server gracefully via pid file |
| ${CLAUDE_SKILL_DIR}[[/scripts/server.cjs]] | Node HTTP entrypoint (zero deps) |
| ${CLAUDE_SKILL_DIR}[[/scripts/client/inspector.js]] | Injected browser-side overlay and capture script |
| ${CLAUDE_SKILL_DIR}[[/scripts/routes/selection-routes.cjs]] | /api/selection handlers |
| ${CLAUDE_SKILL_DIR}[[/scripts/routes/events-routes.cjs]] | /api/events handlers |
| ${CLAUDE_SKILL_DIR}[[/scripts/routes/dom-routes.cjs]] | /api/dom handler |
| ${CLAUDE_SKILL_DIR}[[/scripts/routes/eval-routes.cjs]] | /api/eval handler (agent → page JS) |
| ${CLAUDE_SKILL_DIR}[[/scripts/routes/health-routes.cjs]] | /api/health and /api/page handlers |

Agent workflow

Step 1 — Start the server

Run:

bash ${CLAUDE_SKILL_DIR}[[/scripts/start-server.sh]] --site-dir <absolute-path-to-html-or-folder>

Flags:

| Flag | Default | Purpose |
| --- | --- | --- |
| --site-dir <path> | required | File or directory to serve |
| --host <host> | 127.0.0.1 | Bind address (localhost only by default) |
| --port <port> | random high port | Fixed port |
| --workspace <dir> | /tmp/html-live-inspect-workspace | pid/log/state root |
| --no-eval | eval enabled | Disable the /api/eval endpoint and in-page executor |
| --idle-timeout <ms> | 1800000 | Auto-shutdown after idle (30 min) |
| --foreground | background | Run attached (for debugging) |

The launcher prints one JSON line on stdout:

{"url":"http://localhost:49234","apiBase":"http://localhost:49234/api","pid":12345,"siteDir":"/abs/path","allowEval":true}

Capture url and apiBase. Tell the user the URL and ask them to open it in their browser.

Step 2 — Wait for the user to select something

The inspector supports three selection modes:

  • Plain click — replaces the single current selection. Read with GET /api/selection.
  • Cmd/Ctrl + click — toggles membership in a multi-selection set. Each selected element gets an orange overlay, and a badge in the corner shows the count. Read the set with GET /api/selections.
  • Alt + click — clears both the single selection and the entire multi-selection set.

After the user tells you they have selected, choose the right endpoint:

# Single selection (most common)
curl -s "$apiBase/selection"

# Multi-selection set — returns {count, selections: [...]}
curl -s "$apiBase/selections"

When count > 0, prefer /api/selections and apply operations to all of them at once (see Step 5).

Step 3 — Read context and act

The selection payload contains everything you need to reason about the element the user picked:

  • selector — stable CSS selector (use for further DOM queries via /api/dom?selector=... or /api/eval)
  • tag, id, classes, attributes — semantic identity
  • text, outerHTMLSnippet — content (truncated to 500 chars / 2 KB)
  • computedStyles — curated subset (box model, typography, color, flex/grid) — not the full raw style object
  • boundingRect — x, y, width, height in viewport coordinates
  • parentChain — up to 5 ancestors (tag, id, classes, selector)
  • childrenCount — direct children count
  • pageUrl, pageTitle — context
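
A sketch of condensing the payload into a one-line summary that cites the selector is shown below. `summarizeSelection` is a hypothetical helper, and which fields to include and the 60-character text cut-off are assumptions.

```javascript
'use strict';

// Hypothetical helper: condense a selection payload into one line.
function summarizeSelection(sel) {
  if (!sel) return 'No element is selected.';
  const classes = (sel.classes || []).join(', ') || 'none';
  const text = (sel.text || '').trim().slice(0, 60); // assumption: 60 chars is enough
  const parts = [
    `<${sel.tag}> matched by ${sel.selector}`,
    `text: "${text}"`,
    `classes: ${classes}`,
  ];
  if (sel.boundingRect) {
    parts.push(`size: ${sel.boundingRect.width}x${sel.boundingRect.height}`);
  }
  return parts.join('; ');
}

console.log(summarizeSelection({
  tag: 'button',
  selector: 'main > section:nth-of-type(2) > button.primary',
  classes: ['btn', 'primary'],
  text: 'Send message',
  boundingRect: { x: 120, y: 480, width: 140, height: 44 },
}));
// <button> matched by main > section:nth-of-type(2) > button.primary; text: "Send message"; classes: btn, primary; size: 140x44
```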

Step 4 — Catch up on recent interactions

curl -s "$apiBase/events?since=0"

Returns a ring buffer of recent events (clicks, inputs, form submits, navigations, scroll positions). Each event has a monotonic seq — poll with ?since=<last-seq> to get only new events. The response also has dropped (events lost to buffer overflow) and nextSeq.
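
The since / nextSeq / dropped semantics can be illustrated with a small ring buffer. This is a sketch rather than the contents of lib/event-log.cjs: the `EventLog` class is hypothetical, nextSeq is assumed to be the value to pass as since on the next poll, and the DELETE (clear) operation is left out.

```javascript
'use strict';

// Hypothetical seq-numbered ring buffer; slot index is seq modulo capacity.
class EventLog {
  constructor(capacity = 500) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.lastSeq = 0; // seq of the newest event; 0 means nothing logged yet
  }

  append(event) {
    this.lastSeq += 1;
    // Overwrites the oldest slot once the buffer is full.
    this.slots[this.lastSeq % this.capacity] = { ...event, seq: this.lastSeq };
    return this.lastSeq;
  }

  read(since = 0, limit = this.capacity) {
    const oldest = Math.max(1, this.lastSeq - this.capacity + 1);
    // Events newer than `since` that were overwritten before being read.
    const dropped = Math.max(0, oldest - 1 - since);
    const events = [];
    for (let seq = Math.max(since + 1, oldest);
      seq <= this.lastSeq && events.length < limit; seq += 1) {
      events.push(this.slots[seq % this.capacity]);
    }
    const nextSeq = events.length > 0 ? events[events.length - 1].seq : since;
    return { events, nextSeq, dropped };
  }
}

const log = new EventLog(3);
for (const type of ['click', 'input', 'scroll', 'click', 'submit']) log.append({ type });
const page = log.read(0);
// Seqs 3, 4, 5 remain; nextSeq is 5; seqs 1 and 2 were dropped.
console.log(page.events.map((e) => e.seq), page.nextSeq, page.dropped);
```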

Step 5 — Drive the page (optional, if allowEval is true)

curl -s -X POST "$apiBase/eval" \
  -H "Content-Type: application/json" \
  -d '{"js":"document.querySelector(\"#submit\").click(); return true;"}'

The body’s js string is wrapped in a Function() and executed in the page context. The return value (or thrown error) comes back as {ok, result, error}. Use this to click buttons, fill inputs, toggle state, read computed values the static snapshot does not capture, or highlight elements for the user.

Powerful things to do with eval:

  • Highlight the element you are talking about:
    const el = document.querySelector(sel);
    el.style.outline = '3px solid magenta';
    setTimeout(() => el.style.outline = '', 2000);
  • Scroll an element into view before the user looks:
    document.querySelector(sel).scrollIntoView({behavior:'smooth', block:'center'});
  • Read live values not in the static snapshot:
    return {value: document.querySelector('#name').value, scroll: window.scrollY};
  • Drive a form end-to-end on behalf of the user for a demo.

Eval times out after 10 seconds by default. Override with {"js":"...", "timeoutMs": 30000}.
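
The wrapping described in this step can be sketched as follows. `runEvalTask` is a hypothetical name and the error formatting is an assumption; the inspector's actual executor may differ, for example in how it treats promises, which this sketch does not await.

```javascript
'use strict';

// Hypothetical sketch of running an eval body and shaping the outcome.
function runEvalTask(js) {
  try {
    // new Function gives the snippet its own function scope, so a top-level
    // `return` works; in a browser it still sees page globals like document.
    const fn = new Function(js);
    return { ok: true, result: fn() };
  } catch (err) {
    // Covers both syntax errors (from new Function) and runtime throws.
    return { ok: false, error: String(err && err.message ? err.message : err) };
  }
}

console.log(runEvalTask('return 1 + 1;'));            // { ok: true, result: 2 }
console.log(runEvalTask('throw new Error("boom");')); // { ok: false, error: 'boom' }
```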

Applying an operation to the multi-selection set:

# Build the body with jq so quotes inside the selectors are escaped correctly
selectors=$(curl -s "$apiBase/selections" | jq -c '.selections | map(.selector)')
body=$(jq -cn --argjson sels "$selectors" \
  '{js: ("const sels=" + ($sels | tojson) + "; sels.forEach(s=>document.querySelectorAll(s).forEach(el=>el.style.color=\"red\")); return sels.length;")}')
curl -s -X POST "$apiBase/eval" -H "Content-Type: application/json" -d "$body"
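
Selectors often contain double quotes (for example a[href="#x"]), so splicing them into a JSON string by hand is fragile. An agent running Node could instead build the body with JSON.stringify, as in this sketch; `buildColorAllBody` is a hypothetical helper, and only the {js} body shape comes from this document.

```javascript
'use strict';

// Hypothetical helper: build the POST /api/eval body for recoloring every
// element matched by a list of selectors.
function buildColorAllBody(selectors, color) {
  const js = [
    // JSON.stringify output is also a valid JavaScript literal.
    `const sels = ${JSON.stringify(selectors)};`,
    `const color = ${JSON.stringify(color)};`,
    'sels.forEach((s) => document.querySelectorAll(s).forEach((el) => { el.style.color = color; }));',
    'return sels.length;',
  ].join('\n');
  // The outer stringify escapes the quotes and newlines inside js.
  return JSON.stringify({ js });
}

const body = buildColorAllBody(['a[href="#x"]', '.btn.primary'], 'red');
console.log(JSON.parse(body).js.split('\n')[0]);
// const sels = ["a[href=\"#x\"]",".btn.primary"];
```

The resulting string can be sent as the request body of POST /api/eval as-is.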

Similarity / query selection (Option C pattern):

When the user clicks one element and says “apply to all like this”, do not ask them to click every one. Instead:

  1. Read GET /api/selection to get the clicked element’s tag + classes + parent context.
  2. Propose 2-3 querySelectorAll variants that might match what they want, e.g.:
    • all siblings with the same tag: .hero > p
    • all elements with the same class: .btn.primary
    • all descendants matching a pattern: .card h3
  3. Preview the match count for each with POST /api/eval { js: "return window.__HLI__.previewQuery('.btn.primary');" }.
  4. Highlight the best match set briefly to confirm visually: POST /api/eval { js: "return window.__HLI__.highlight('.btn.primary', 2000);" }.
  5. Get user confirmation in chat.
  6. Apply the operation with a single eval over the full querySelectorAll.
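
Step 2 can be sketched as a function that derives candidate queries from the selection payload. `proposeQueries` is hypothetical. It assumes parentChain lists the nearest ancestor first, and it does not escape unusual class names (in a browser, CSS.escape would cover that).

```javascript
'use strict';

// Hypothetical helper: derive querySelectorAll candidates from a selection.
function proposeQueries(sel) {
  const candidates = [];
  const classes = sel.classes || [];
  if (classes.length > 0) {
    const classPart = '.' + classes.join('.');
    candidates.push({ why: 'same classes', query: classPart });
    candidates.push({ why: 'same tag and classes', query: sel.tag + classPart });
  }
  // Assumption: parentChain[0] is the immediate parent.
  const parent = (sel.parentChain || [])[0];
  if (parent && parent.selector) {
    candidates.push({
      why: 'same tag under the same parent',
      query: `${parent.selector} > ${sel.tag}`,
    });
  }
  return candidates;
}

const candidates = proposeQueries({
  tag: 'button',
  classes: ['btn', 'primary'],
  parentChain: [{ tag: 'section', selector: 'main > section:nth-of-type(2)' }],
});
console.log(candidates.map((c) => c.query));
// [ '.btn.primary', 'button.btn.primary', 'main > section:nth-of-type(2) > button' ]
```

Each candidate would still go through the preview, highlight, and confirmation steps before anything is applied.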

The inspector exposes these helpers on window.__HLI__:

  • previewQuery(selector) → match count (or -1 on invalid selector)
  • highlight(selectorOrList, ms) → briefly outlines all matches in magenta
  • listSelections() → array of selectors currently in the multi set
  • describe(selector) → full snapshot for a CSS selector

Step 6 — Stop the server

bash ${CLAUDE_SKILL_DIR}[[/scripts/stop-server.sh]]

The server also shuts itself down after 30 minutes without activity.


Server API reference

All endpoints return JSON. Paths are relative to ${url} from the start script output; ${apiBase} is the same URL with /api already appended.

| Method | Path | Body | Response |
| --- | --- | --- | --- |
| GET | /api/health | | {status, uptimeMs, siteDir, allowEval, version} |
| GET | /api/page | | {url, title, viewport, userAgent, lastUpdateMs} |
| GET | /api/selection | | Single current selection, or null |
| GET | /api/selection/history?limit=N | | Previous single-click selections |
| GET | /api/selections | | Multi-selection set: {count, selections:[...]} |
| DELETE | /api/selections | | Clear the multi-selection set |
| GET | /api/events?since=<seq>&limit=N | | {events, nextSeq, dropped} |
| DELETE | /api/events | | {ok: true} |
| GET | /api/dom?selector=<css> | | Selection-shape object for the first match, or null |
| POST | /api/eval | {js, timeoutMs?} | {ok, result, error}; 403 if disabled |

The ingest, pull, and push endpoints under /__inspect/* are internal — they are used by the injected client, never by the agent.


Conventions

  • Always serve an absolute path. Pass --site-dir with the full path so the server does not depend on where it was started.
  • One inspected site at a time. If you need a second, stop the first (the workspace tracks a single active server via pid file).
  • Bind to localhost only. Do not pass --host 0.0.0.0 unless the user explicitly asks for LAN access — and tell them it is unauthenticated.
  • Wait for user interaction. Do not poll /api/selection in a tight loop. Either wait for the user to say “I clicked” or poll once every few seconds.
  • Cite the selector, not the description. When referring to what the user selected in your replies, include the exact selector from the payload so the user knows you are looking at the same element.
  • Always echo the URL to the user. The user cannot interact with a server they do not know how to open.

Output contract

When activated, you must:

  1. Start the server and emit the url to the user in plain text.
  2. Confirm which directory is being served and whether eval is enabled.
  3. Explain in one line that the user can hover + click to select.
  4. Poll /api/selection once the user confirms they have selected, and summarize the element in 2-3 sentences (tag + role + text + key classes).
  5. Stop the server with the stop script when the user is done, or when you are handing off to another skill.

Constraints

  • Do NOT start the server without an explicit --site-dir from the user.
  • Do NOT bind to any interface other than 127.0.0.1 unless the user explicitly asks.
  • Do NOT poll APIs in a busy loop — respect the server.
  • Do NOT run /api/eval with code the user has not seen — show the JS first, get confirmation, then execute.
  • Do NOT ask the user to click every element when a similarity query could match them all — propose a selector, preview the count, confirm, then apply (Option C pattern in Step 5).
  • Do NOT leave the server running across unrelated tasks. Stop it when done.
  • Do NOT use this skill to inspect remote production sites — use codi-webapp-testing (Playwright) for that.