§9 structured logging + OTLP observability (todo §9); structured, OTLP-native logging on @larvit/log (2.3.0, pinned; itself zero-dependency — the one new runtime dep). New pure src/logger.ts: createLogger() builds one app Log tagged service.name=plainpages (level/format/OTLP from config, injectable stdout/stderr); requestLogger() clones it per request (own root trace, inheriting level/format/streams/OTLP) into a "request" span, adopting an inbound W3C traceparent so a request continues an upstream proxy's distributed trace (malformed ⇒ fresh trace; clone honours a passed traceparent while dropping the parent's, unlike parentLog). app.ts builds the per-request log at the top of the handler and on res "close" (fires on completion AND abort, unlike "finish") emits one access line (method/path-without-query/status/ms/requestId, guarded) then end()s to flush the span (fire-and-forget .catch — a flaky collector never crashes a served request); the catch-all 500 + Ory-unreachable re-mint now log via reqLog.error/warn; static.ts mid-stream error takes an injected onError. server.ts builds the app logger, logs discovery/listen/shutdown, end()-flushes on SIGTERM/SIGINT (re-entry-guarded). bootstrap.ts events go structured (the human first-run banner stays raw). Config (environment-agnostic, fail-loud): LOG_LEVEL (info), LOG_FORMAT (text; prod compose → json), OTLP_ENDPOINT (unset ⇒ console-only; set ⇒ export logs + spans to an OTel Collector), OTLP_PROTOCOL (http/json|http/protobuf). compose: base sets LOG_FORMAT=json, dev override flips it to text. Tests-first: logger.test.ts (service.name/severity/level-gate/format, OTLP-only-when-endpoint, a stubbed-fetch proof it POSTs /v1/logs, requestLogger context-merge/own-root-trace/traceparent-continue/malformed-ignored), config.test.ts (4 toggles + validation), app.test.ts (live request emits the JSON access line), compose.test.ts (prod json / dev text). 
Stability-reviewer: APPROVE, no Critical/High (addressed both yellow nits — guarded access line + "finish"→"close" so aborted requests log; shutdown re-entry guard — and the green ones). README (config table, new Observability section, Status, Layout, runtime-deps) + AGENTS (deps) updated. typecheck + 326 units green (317 → 326).
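
The traceparent handling described above (adopt a well-formed inbound header, fall back to a fresh trace otherwise) can be sketched as below. This is an illustration only, not the code in `src/logger.ts`: `parseTraceparent` and `TraceContext` are hypothetical names, and the sketch accepts only the four-field W3C form (`version-traceid-parentid-flags`).

```typescript
// Hypothetical helper for illustration; not the project's requestLogger().
interface TraceContext {
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase only.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | string[] | undefined): TraceContext | undefined {
  // A missing header, or one delivered as an array, is treated as "no upstream trace".
  if (typeof header !== "string") return undefined;
  // A duplicated header that was joined into "a, b" fails the regex and is rejected too.
  const m = TRACEPARENT.exec(header.trim());
  if (!m) return undefined;
  const [, version = "", traceId = "", parentId = "", flags = ""] = m;
  if (version === "ff") return undefined; // version ff is invalid per the spec
  // All-zero ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

A caller would continue the upstream trace when this returns a context and start a new root trace when it returns `undefined`.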

2026-06-20 02:11:10 +02:00
parent a8a018f3e5
commit a9e3dedbb4
17 changed files with 325 additions and 20 deletions


@@ -128,7 +128,7 @@ everything via Docker.
- [x] `compose.yml` prod: Ory + Postgres, secrets via env, no source mount. → The base file was already the full prod stack (web + Postgres + Kratos/Keto/Hydra + migrations + the one-shot bootstrap; `.:/app` lives only in the dev override), built during §3. **The real gap, now closed:** it set `REQUIRE_SECURE_SECRETS=true` but never wired `CSRF_SECRET` into `web`, so `docker compose -f compose.yml up` couldn't boot. Added `CSRF_SECRET: ${CSRF_SECRET:-dev-insecure-csrf-secret}` — env-supplied with the throwaway as the only fallback; `config.ts`'s existing `REQUIRE_SECURE_SECRETS` logic rejects that throwaway, so a forgotten prod secret **fails loud** (verified all three paths: prod-unset→reject, prod-set→real secret, dev→throwaway + toggle off → boots). Used `:-` not `:?` because compose interpolates the base file per-file *before* merging the override (confirmed empirically), so a `:?` in the base would also break the zero-config dev `docker compose up`. Tests-first: extended `compose.test.ts` (secret-via-env + no-source-mount + the prod/dev toggle split + postgres-creds-via-env). README prod section corrected (dropped the stale "_(… Ory + Postgres — planned)_"). typecheck + 310 units green.
- [x] Security headers; secure/HttpOnly/SameSite cookies; CSRF; clock-skew tolerance. → Cookies (HttpOnly · SameSite=Lax · Secure-when-`SECURE_COOKIES`, `src/cookie.ts`), the signed double-submit CSRF (`src/csrf.ts`), and JWT clock-skew leeway (`JWT_CLOCK_SKEW_SEC`, applied to exp+nbf in `validateClaims`) all landed in §4 — the open gap was **response security headers**, now closed. New pure `src/security-headers.ts` (`securityHeaders({secure})`): a strict CSP for the zero-JS core — `default-src 'self'`, `script-src 'self'` with **no** `'unsafe-inline'` (an injected `<script>` can't run; core ships none, a plugin may still serve its own `/public/<id>/*.js`), `style-src` adds `'unsafe-inline'` for the partials' inline `style=`, `img-src 'self' data:`, `frame-ancestors 'none'`, `object-src 'none'`; **`form-action` deliberately omitted** (the themed login POSTs to Kratos' often-cross-origin action URL) — plus `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`, `Cross-Origin-Opener-Policy: same-origin`, and HSTS only when `secureCookies` (https; ignored on dev http). Wired in `app.ts`: precomputed once at boot, `res.setHeader`'d at the very top of the handler before any branch, so **every** response (page/json/redirect/static/error/plugin) inherits them via `writeHead`'s merge; a plugin overrides per-route via `RouteResult.headers`. Verified no view/CSS loads cross-origin (no `<script>` anywhere, no external fonts/CDNs), so `default-src 'self'` breaks nothing. Tests-first: `security-headers.test.ts` (strict defaults, `script-src` has no `'unsafe-inline'`, HSTS-only-on-secure) + an `app.test.ts` integration (the page **and** a static asset both carry the headers; HSTS toggles with `SECURE_COOKIES`). Stability-reviewer on the diff: **APPROVE, no Critical/High** (Low: a CDN/absolute branding logo would be CSP-blocked → documented the same-origin-logo constraint). README Status + Production + Layout updated. 
typecheck + 312 units green.
- [x] Optional revocation denylist for instant role/session revoke. → Closes the documented ~10m role/session lag for security-critical revoke, **off by default** (`REVOCATION_DENYLIST`, zero hot-path cost + zero behaviour change when off). New pure `src/denylist.ts` (`createDenylist({ttlSec})`): an in-memory, auto-evicting `Map<sub, revokedAt>`: `revoke(sub)` records the current time, `isRevoked(sub, iat)` rejects a subject's tokens minted **at/before** the revoke (`iat <= revokedAt`; missing `iat` fails closed), so a *fresh* re-login (iat after the revoke) passes while a downgrade lands immediately. Entries self-evict after `REVOCATION_TTL_SEC` (default 900 ≥ the 10m tokenizer TTL + skew), so it stays a bounded cache like JWKS — **no database, Keto stays off the hot path**. Wired: `jwt-middleware.ts` takes the denylist in `VerifyOptions` and throws `TokenError(expired)` on a revoked sub, so `resolveSession` routes it through the existing §4 re-mint (live session → fresh post-revoke JWT with current Keto roles; dead/deactivated → cleared cookie). `app.ts` merges it into `authOptions` (the same `resolveSession` hot-path call) and hands a bound `revoke` to the Users + Roles admin deps; `admin-users.ts` revokes on **deactivate/delete**, `admin-roles.ts` revokes a direct `user:` member on **assign/unassign** (a `group:`/whole-role change is transitive → left to lag, documented). `server.ts` builds it only when the toggle is on. Tests-first: `denylist.test.ts` (iat semantics, cutoff-advance, TTL eviction), `jwt-middleware.test.ts` (revoked→expired→re-mint, fresh passes), `config.test.ts` (toggle + posint TTL), `app.test.ts` (hot-path reject + fresh-login pass; admin deactivate/role-assign/unassign record the revoke). Stability-reviewer on the diff: **APPROVE, no Critical/High/Medium** (addressed its one Low: a comment noting whole-role delete lags like a group change). 
Per the §9 security-headers precedent, covered by unit + app-HTTP integration (no new browser E2E — no new user-facing page; the operator toggle + handler paths are exercised directly). README (Auth trade-off + a new "Instant revoke" subsection, config table, Layout) updated. typecheck + 317 units green.
- [ ] Structured logging / basic observability. Use @larvit/log for OTLP compatibility; dig into how to use it properly.
- [x] Structured logging / basic observability. Use @larvit/log for OTLP compatibility; dig into how to use it properly. → Structured, OTLP-native logging on **`@larvit/log`** (2.3.0, pinned; itself zero-dependency — the one new runtime dep, justified by this item). New pure `src/logger.ts`: `createLogger({format,level,otlpEndpoint,otlpProtocol,stdout,stderr})` → one app `Log` tagged `service.name=plainpages` (the OTLP resource attr Loki/Tempo group by); `requestLogger(appLog,{requestId,traceparent})` **clones** it per request (own root trace — *not* nested under one app-lifetime span — inheriting level/format/streams/OTLP) into a "request" span, **adopting** an inbound W3C `traceparent` so a request continues an upstream proxy's distributed trace (malformed/duplicate ⇒ fresh trace; verified `clone` honours a passed `traceparent` while dropping the parent's, unlike `parentLog`). Wired: `app.ts` builds the per-request log at the top of the handler and on `res` **"close"** (fires on both completion *and* abort/truncation, unlike "finish", so aborted/static-stream-error requests are still logged) emits one access line (`method`/`path` — query dropped, may carry tokens — `status`/`ms`/`requestId`, guarded by try/catch) then `end()`s to flush the span (fire-and-forget `.catch`, so a flaky collector never crashes a served request); the catch-all 500 + the Ory-unreachable re-mint now log via `reqLog.error`/`warn`; `static.ts`'s mid-stream error takes an injected `onError` (default console.error for standalone use). `server.ts` builds the app logger from config, logs discovery/listen/shutdown, and `end()`-flushes on SIGTERM/SIGINT (re-entry-guarded). `bootstrap.ts` events go structured; the human first-run banner stays a raw console.log (UX, not a log event). 
Config (environment-agnostic, fail-loud): `LOG_LEVEL` (info), `LOG_FORMAT` (text; prod compose → json), `OTLP_ENDPOINT` (unset ⇒ console-only; set ⇒ export logs + spans to an OTel Collector → Loki/Tempo), `OTLP_PROTOCOL` (http/json|http/protobuf). compose: base sets `LOG_FORMAT=json` (prod pipelines), dev override flips it to `text`. Tests-first: `logger.test.ts` (service.name/severity-routing/level-gate/format, level-none silent, OTLP-only-when-endpoint, a stubbed-global-fetch proof it POSTs `/v1/logs`, requestLogger context-merge / own-root-trace / traceparent-continue / malformed-ignored), `config.test.ts` (the 4 toggles + enum/URL validation), `app.test.ts` (a live request emits the JSON access line), `compose.test.ts` (prod json / dev text). Per the §9 security-headers/denylist precedent: unit + app-HTTP integration, **no new browser E2E** (no new user-facing page) — and live-boot-verified (dev text+colour, prod json, access lines for page/static/404, graceful-shutdown line). Stability-reviewer on the diff: **APPROVE, no Critical/High** — addressed both yellow nits (access line guarded + switched "finish"→"close" so aborted requests log; shutdown re-entry guard) and the green ones (README collector-outage stderr note, double-`end()` guard). README (config table, new **Observability** section, Status, Layout, runtime-deps) + AGENTS (deps) updated. **Deferred:** threading a traced `ctx.log`/`log.fetch` into plugin + Ory clients for child spans on upstream calls → a future `apiVersion` minor bump (RequestContext field), with the other deferred contract changes. typecheck + **326 units** green (317 → 326).
- [ ] JWT signing-key rotation runbook.
- [ ] Refresh README `Layout` + drop `_(planned)_` markers as pieces land.
- [ ] Run the architecture and the product reviewer agents on the _whole_ project, not just the latest changes, and address their issues.
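
The `iat` semantics of the revocation denylist item above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `src/denylist.ts`: the injectable `now` clock and the evict-on-lookup strategy are assumptions (the item only says entries self-evict after the TTL), and times are epoch seconds to match JWT `iat`.

```typescript
// Illustrative sketch; the real createDenylist() may differ in shape and eviction strategy.
interface Denylist {
  revoke(sub: string): void;
  isRevoked(sub: string, iat: number | undefined): boolean;
}

function createDenylist(opts: { ttlSec: number; now?: () => number }): Denylist {
  // Assumption: an injectable clock (epoch seconds) so the TTL is testable.
  const now = opts.now ?? (() => Math.floor(Date.now() / 1000));
  const revokedAt = new Map<string, number>();

  return {
    revoke(sub) {
      // Re-revoking the same subject advances the cutoff to the current time.
      revokedAt.set(sub, now());
    },
    isRevoked(sub, iat) {
      const at = revokedAt.get(sub);
      if (at === undefined) return false;
      // Assumption: lazy eviction on lookup once the entry outlives the TTL.
      if (now() - at > opts.ttlSec) {
        revokedAt.delete(sub);
        return false;
      }
      // A token without iat cannot be proven fresh, so it fails closed.
      if (iat === undefined) return true;
      // Tokens minted at or before the revoke are rejected; later ones pass.
      return iat <= at;
    },
  };
}
```

With lazy eviction, an entry that is never looked up again stays in the map, so a real implementation might also sweep periodically; the sketch leaves that out.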