Operations Runbook
This runbook covers the public site, docs, local preview, and integration evidence for AgentHub. It is intentionally secret-free: production host paths, rollback commands, keys, and private logs belong in the private operator workspace.
Daily Checks
| Check | Command or route | Expected result |
|---|---|---|
| Public home | https://hub.vectorcontrol.tech/zh and /en | Current homepage, no old hero copy, no old purple accent |
| Docs entry | /zh/docs and /en/docs | Styled docs page with sidebar, TOC, and body content |
| Deep docs | /zh/docs/workflows, /en/docs/desktop, /zh/docs/hub-edge | Correct body, localized navigation, no 404 |
| Discovery | /robots.txt, /sitemap.xml, /llms.txt | New routes present and free of secrets |
| Login entry | /zh/login or nav login button | Redirects to TokenDance ID rather than implementing a separate product login |
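The sitemap half of the Discovery row can be scripted once /sitemap.xml has been fetched. This is a minimal sketch, assuming sitemap entries are absolute URLs on https://hub.vectorcontrol.tech without trailing slashes. The helper names sitemapLocs and missingRoutes are hypothetical, and the sketch does not cover the "free of secrets" half of the check.

```typescript
// Assumption: sitemap entries are absolute URLs on this host, without trailing slashes.
const SITE = "https://hub.vectorcontrol.tech";

// Hypothetical helper: extracts <loc> values from a flat, generated sitemap.
// A regex is enough for that narrow case; it is not a general XML parser.
function sitemapLocs(xml: string): Set<string> {
  const locs = new Set<string>();
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.add(match[1]);
  }
  return locs;
}

// Returns the routes from the daily-check list that the sitemap does not contain.
function missingRoutes(xml: string, routes: string[]): string[] {
  const locs = sitemapLocs(xml);
  return routes.filter((route) => !locs.has(SITE + route));
}
```

Feeding it the deep docs routes from the table (for example /zh/docs/hub-edge) turns "new routes present" into a list of concrete gaps.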
Append a cache-busting query string when validating a recent deploy, replacing YYYYMMDDTHHMMSS with the current timestamp:
curl.exe -I "https://hub.vectorcontrol.tech/zh/docs/desktop?v=YYYYMMDDTHHMMSS"
curl.exe "https://hub.vectorcontrol.tech/llms.txt?v=YYYYMMDDTHHMMSS"
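The timestamp only needs to be unique per validation run so that neither the CDN nor the browser serves a cached copy. A minimal sketch of building it, assuming a UTC timestamp; the helper names pad2, cacheBustStamp, and cacheBustUrl are hypothetical.

```typescript
// Pads a date component to two digits.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Formats a Date as YYYYMMDDTHHMMSS. Using UTC is an assumption; any unique stamp works.
function cacheBustStamp(now: Date): string {
  return (
    String(now.getUTCFullYear()) +
    pad2(now.getUTCMonth() + 1) +
    pad2(now.getUTCDate()) +
    "T" +
    pad2(now.getUTCHours()) +
    pad2(now.getUTCMinutes()) +
    pad2(now.getUTCSeconds())
  );
}

// Appends v=<stamp>, using "&" when the URL already carries a query string.
function cacheBustUrl(url: string, now: Date = new Date()): string {
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + "v=" + cacheBustStamp(now);
}
```

For example, cacheBustUrl("https://hub.vectorcontrol.tech/llms.txt") yields a URL that can be passed straight to the curl.exe commands above.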
Local Build Gate
Run these from the registered release source worktree:
pnpm build
pnpm test
pnpm lint
Expected result:
- the static export exists in out/;
- localized docs pages export to both out/en/docs/... and out/zh/docs/...;
- tests cover the docs registry, i18n routes, search index, nav, footer, and hero behavior;
- lint reports no new warnings.
Docs Release Gate
Every new docs route must update:
| File or system | Required update |
|---|---|
| English MDX | src/content/docs/<slug>.mdx |
| Chinese MDX | src/content/docs/zh/<slug>.mdx |
| Navigation | src/lib/docs-data.ts groups and labels |
| TOC | PAGE_HEADINGS and PAGE_HEADINGS_ZH |
| Search | src/lib/search-index.ts body text |
| Discovery | public/sitemap.xml and public/llms.txt |
| Reader docs | README Documentation IA and changelog |
| Tests | Docs registry/search assertions where useful |
If any of these are missing, the route may still render, but search, the sidebar, prev/next links, crawler discovery, or zh/en parity will drift.
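One way to catch that drift before release is to collect the slugs each surface knows about and diff them. This is a hypothetical sketch: the Surface names and the assumption that each registry can be flattened into a plain list of slugs are illustrative, not the actual exports of src/lib/docs-data.ts or src/lib/search-index.ts.

```typescript
// Hypothetical names for the slug-bearing surfaces in the Docs Release Gate table.
type Surface = "mdxEn" | "mdxZh" | "nav" | "toc" | "tocZh" | "search" | "sitemap";

// For every slug seen on any surface, list the surfaces that do not include it.
// Slugs known to every surface are left out of the result.
function findDrift(surfaces: Record<Surface, string[]>): Map<string, Surface[]> {
  const names = Object.keys(surfaces) as Surface[];
  const allSlugs = new Set<string>();
  for (const name of names) {
    for (const slug of surfaces[name]) allSlugs.add(slug);
  }

  const drift = new Map<string, Surface[]>();
  allSlugs.forEach((slug) => {
    const missing = names.filter((name) => surfaces[name].indexOf(slug) === -1);
    if (missing.length > 0) drift.set(slug, missing);
  });
  return drift;
}
```

Each reported slug comes with the surfaces that lack it, which maps onto the rows of the table above; how each surface is loaded into a slug list depends on the real module exports and is left open here.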
Visual QA Gate
For UI or docs layout changes, check:
- desktop viewport around 1440 x 900;
- mobile viewport around 390 x 844;
- /zh, /en, /zh/docs, and /en/docs;
- the changed docs route;
- theme toggle state when the page includes the Desktop mock;
- language switch state when mock copy is visible.
Look for:
- text wrapping or overflowing inside pills, buttons, nav, or footer;
- stale purple accents instead of TokenDance Blue;
- hard black footer backgrounds;
- excessive hover shadows on the Desktop mock;
- focus rectangles inside the non-interactive mock preview;
- old docs body missing styles.
Product Runtime Signals
When validating AgentHub product behavior rather than only the public site, record public-safe signals instead of private logs:
| Area | Signal | Healthy shape |
|---|---|---|
| Hub session | auth/session check | TokenDance ID subject maps to a Hub-local session; denied actions have request ids |
| Project routing | task target selection | target Edge is authorized before work is queued |
| Edge presence | heartbeat or health shape | Edge reports reachable state, version, workspace policy, and runtime inventory |
| Run lifecycle | event stream | created -> preparing -> running -> completed/failed/cancelled with monotonic events |
| Runtime adapter | adapter readiness | mock passes first; real CLI reports installed/authenticated/available or a clear runtime_unavailable |
| Artifacts and diff | review surfaces | relative paths, base/target metadata, approval id, and no private absolute paths |
| Queue/integration | async work | webhook/card path acknowledges quickly and slow work moves to a queue/retry state |
| Audit | action record | actor, project, target, run id, action, result, timestamp, and redacted failure reason |
Use this table for smoke reports and issue triage. Do not paste full prompts, raw provider output, private file content, access tokens, or host-specific logs into public docs.
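Redaction is easiest to enforce at the point where a record is serialized for a public report. The sketch below models the Audit row's fields and scrubs the failure reason before it leaves the operator workspace. The field names, the result values, and the regex patterns (Windows drive paths, a few common POSIX roots, bearer tokens, and credential-looking key=value pairs) are assumptions, not an exhaustive secret scanner.

```typescript
// Assumed shape of a public-safe audit record (field names are illustrative).
interface AuditRecord {
  actor: string;     // placeholder or opaque id
  project: string;
  target: string;
  runId: string;
  action: string;
  result: "ok" | "denied" | "failed";
  timestamp: string; // ISO 8601
  failureReason?: string;
}

// Illustrative patterns only; extend them for the path and token formats you see.
const REDACTIONS: Array<[RegExp, string]> = [
  // Windows absolute paths such as C:\Users\...
  [/\b[A-Za-z]:\\[^\s"']+/g, "<path>"],
  // POSIX absolute paths under a few common private roots
  [/(^|[\s"'=(])\/(?:home|Users|root|var|opt|srv|etc)\/[^\s"']*/g, "$1<path>"],
  // Authorization header values
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, "Bearer <token>"],
  // key=value pairs whose key looks like a credential, e.g. access_token=...
  [/\b([A-Za-z_]*(?:token|secret|api[_-]?key|password))=[^\s&]+/gi, "$1=<redacted>"],
];

function redact(text: string): string {
  return REDACTIONS.reduce((out, [pattern, replacement]) => out.replace(pattern, replacement), text);
}

// Returns a copy that is safe to paste into a public issue or smoke report.
function toPublicAudit(record: AuditRecord): AuditRecord {
  if (record.failureReason === undefined) return { ...record };
  return { ...record, failureReason: redact(record.failureReason) };
}
```

Public routes such as /zh/docs are left untouched because only the listed private roots are treated as host paths.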
Live Smoke Shape
A live smoke result should record:
site: hub.vectorcontrol.tech
version: cache-busting timestamp
routes: /zh, /en, /zh/docs, /en/docs, changed deep docs route
checks: body keyword, status code, discovery files, no stale old hero
result: pass/fail
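A small typed record keeps smoke reports in this shape and makes the pass/fail rule explicit. In this hypothetical sketch a route passes only when it returns HTTP 200, contains its expected body keyword, and shows no stale old hero; the field names and that rule are assumptions, and response bodies are deliberately not stored.

```typescript
// Hypothetical per-route result; only public-safe booleans and status codes are kept.
interface SmokeCheck {
  route: string;
  status: number;
  keywordFound: boolean;   // expected body keyword present
  staleHeroFound: boolean; // old hero copy still served
}

interface SmokeResult {
  site: string;
  version: string; // the cache-busting timestamp used for this run
  checks: SmokeCheck[];
}

// Assumed pass rule for a single route.
function checkPassed(c: SmokeCheck): boolean {
  return c.status === 200 && c.keywordFound && !c.staleHeroFound;
}

// Renders the result in the key: value shape listed above.
// An empty check list counts as a failure rather than a vacuous pass.
function formatSmoke(r: SmokeResult): string {
  const allPassed = r.checks.length > 0 && r.checks.every(checkPassed);
  return [
    `site: ${r.site}`,
    `version: ${r.version}`,
    `routes: ${r.checks.map((c) => c.route).join(", ")}`,
    `checks: ${r.checks.map((c) => `${c.route} ${c.status} ${checkPassed(c) ? "ok" : "fail"}`).join("; ")}`,
    `result: ${allPassed ? "pass" : "fail"}`,
  ].join("\n");
}
```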
Public changelog entries should describe what changed, not where the server files live.
Failure Triage
| Failure | First action |
|---|---|
| Old homepage still visible | Check the deploy source, CDN/browser cache, and whether the wrong worktree was built |
| Docs page empty or plain text | Confirm MDX export, static route, CSS assets, and browser cache |
| New docs page missing from sidebar | Update src/lib/docs-data.ts and tests |
| Search cannot find the new page | Update src/lib/search-index.ts |
| Sitemap misses a route | Update public/sitemap.xml and rerun public-surface checks |
| llms.txt stale | Update key links and the current status text |
| Login page looks standalone | Keep the static site login as a TokenDance ID redirect shell only |
| Web route cannot reach an Edge target | Check Hub authorization, Edge presence, target id, and audit denial before changing the Web UI |
| Event stream stops mid-run | Check run state, queue backlog, adapter process state, and schema validation failures |
| unauthorized_target | Confirm user/project membership and device target binding |
| workspace_outside_allowlist | Confirm the workspace is registered for the selected Edge |
| runtime_unavailable | Verify the mock runtime first, then local CLI install/auth/profile compatibility |
| Real runtime fails | Reproduce with the mock first, then inspect local CLI auth and Edge adapter logs |
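The workspace_outside_allowlist decision is essentially a path-containment test. This minimal sketch assumes absolute POSIX paths and a per-Edge list of registered workspace roots; the normalizePosix and checkWorkspace helpers and the example root /srv/workspaces/demo are hypothetical. It collapses "." and ".." segments before comparing, so /srv/workspaces/demo/../other and /srv/workspaces/demo-evil are both rejected.

```typescript
type WorkspaceCheck = { ok: true } | { ok: false; code: "workspace_outside_allowlist" };

// Collapses empty, "." and ".." segments of an absolute POSIX path.
// Purely lexical: it does not resolve symlinks, which a real Edge would also need to do.
function normalizePosix(path: string): string {
  const parts: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") parts.pop();
    else parts.push(part);
  }
  return "/" + parts.join("/");
}

// Accepts the workspace only if it is a registered root or sits below one.
function checkWorkspace(requested: string, allowlist: string[]): WorkspaceCheck {
  if (!requested.startsWith("/")) return { ok: false, code: "workspace_outside_allowlist" };
  const target = normalizePosix(requested);
  const inside = allowlist.some((root) => {
    const base = normalizePosix(root);
    return base === "/" || target === base || target.startsWith(base + "/");
  });
  return inside ? { ok: true } : { ok: false, code: "workspace_outside_allowlist" };
}
```

Comparing against base + "/" rather than a bare prefix is what keeps sibling directories with a shared name prefix out of the allowlist.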
Runtime Incident Triage
Use this table when product behavior fails after the public site itself is healthy.
| Symptom | Severity | First check | Owner | Public-safe evidence | Private evidence |
|---|---|---|---|---|---|
| Edge unreachable | High | Edge health URL from the selected target | Edge Server | status code, request id, target id | local process logs and host details |
| Web target unauthorized | High | Hub audit/request id for the target route | Hub Server | error code, request id, project id placeholder | authorization trace and member records |
| Workspace rejected | Medium | selected workspace against Edge allowlist | Edge Server | workspace_outside_allowlist, target id | local allowlist and path details |
| Runtime unavailable | Medium | mock runtime first, then CLI install/auth state | Edge adapter | runtime_unavailable, adapter id | local CLI output and credential state |
| Event stream stalls | Medium | last monotonic event and queue/run state | Edge + Hub | run id, last event type, request id | adapter logs and queue traces |
| Feishu/Lark callback slow | High | callback acknowledgement latency | Integration Gateway | event type, action id, latency bucket | raw provider payload and retry logs |
| Audit missing | High | Hub action record for the task/run | Hub Server | actor placeholder, run id, action, result | full audit row and internal correlation ids |
Public issues should stop at the error code, request id, route class, and redacted identifiers. Private evidence can include local logs, host paths, queue names, callback payloads, and raw runtime output only inside the operator workspace.
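For the Event stream stalls row, the first check is the last monotonic event and whether the run is still in a non-terminal state. The sketch below validates a stream against the Run lifecycle signal (created, preparing, running, then completed, failed, or cancelled). The RunEvent shape with a per-run seq number, the allowance for repeated same-state events, failed or cancelled being reachable before running, and the stall threshold are all assumptions.

```typescript
type RunState = "created" | "preparing" | "running" | "completed" | "failed" | "cancelled";

// Assumed event shape: a per-run sequence number plus the state it reports.
interface RunEvent {
  seq: number;
  state: RunState;
  at: number; // epoch milliseconds
}

// Allowed forward transitions (assumption: failed/cancelled may happen before running).
const NEXT: Record<RunState, RunState[]> = {
  created: ["preparing", "failed", "cancelled"],
  preparing: ["running", "failed", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

interface StreamReport {
  problems: string[];
  lastEvent?: RunEvent; // the last monotonic event to quote in a triage note
  stalled: boolean;     // non-terminal and silent for longer than stallMs
}

function checkStream(events: RunEvent[], now: number, stallMs: number): StreamReport {
  const problems: string[] = [];
  if (events.length > 0 && events[0].state !== "created") {
    problems.push(`stream starts at ${events[0].state}, not created`);
  }
  for (let i = 1; i < events.length; i++) {
    const prev = events[i - 1];
    const cur = events[i];
    if (cur.seq <= prev.seq) problems.push(`non-monotonic seq ${prev.seq} -> ${cur.seq}`);
    // Repeated events in the same state (e.g. progress while running) are allowed.
    if (cur.state !== prev.state && NEXT[prev.state].indexOf(cur.state) === -1) {
      problems.push(`illegal transition ${prev.state} -> ${cur.state}`);
    }
  }
  const lastEvent = events.length > 0 ? events[events.length - 1] : undefined;
  const stalled =
    lastEvent !== undefined && NEXT[lastEvent.state].length > 0 && now - lastEvent.at > stallMs;
  return { problems, lastEvent, stalled };
}
```

Only the run id, lastEvent.state, and the problem strings belong in a public issue; adapter logs and queue traces stay private.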
Status Words
Use conservative language:
| Word | Meaning |
|---|---|
| Live | Public site or route is reachable |
| Preview-ready | Works in local or controlled preview with clear evidence |
| Contract shaped | Interface is documented but the public SDK/package may still change |
| In progress | Implementation exists or is actively being integrated |
| In development | Planned or partially implemented; do not sell as available |
When unsure, choose the more conservative word and link to the roadmap or changelog.