JWKS fetch + cache + rotation (todo §4); cachingJwks: TTL cache + rotation-on-miss reload (throttled, last-good on error), createJwksProvider routes file/base64/http + primes at boot

This commit is contained in:
2026-06-18 10:01:40 +02:00
parent c8b56b85eb
commit 24eb6b1c68
5 changed files with 154 additions and 18 deletions

View File

@@ -82,7 +82,7 @@ everything via Docker.
- [x] SSO buttons → Kratos OIDC flows. **Render per configured provider only**: derive the list from Kratos' enabled OIDC providers (no creds ⇒ no button); hide the whole SSO section when none are configured. No code change needed to add/remove a provider — config only. → `flow-view.ts` now collects the login/registration flow's `oidc`-group submit nodes into `FlowView.sso` (`{label, logo, name, value}` per provider; `logo` = provider initial, lucide ships no brand marks) instead of skipping them — so the button list *is* Kratos' live provider list (none configured ⇒ `sso: []` ⇒ no section; activate/remove a provider purely via the §3 OIDC env). `auth-card.ejs` gained a submit-provider branch: a provider with `name`/`value` renders `<button type="submit" name=… value=…>` (posts `provider=<id>` to the same Kratos form, sharing its csrf hidden input); `href` still ⇒ `<a>`, neither ⇒ inert button. `auth.ejs` forwards `sso: { providers: flow.sso }`. Removed the mockup-only `body:not(:has(#sso-toggle:checked)) .sso{display:none}` rule from `auth.css` (`#sso-toggle` is a "remove for production" preview control in `html-css-foundation/Auth.html`) — visibility is now purely server-side. Tests-first: `flow-view.test.ts` (oidc→sso matrix + `sso:[]` when none), `auth-card.test.ts` (submit-provider markup), `app.test.ts` (live `/login` renders the SSO submit button in the form). README **Social sign-in (SSO)** updated (dropped the §4 forward-ref). typecheck + 181 units green. Boot-verified end-to-end: a real Kratos with the OIDC env emitted `{group:oidc, name:provider, value:google}``buildFlowView` derived `[{label:"Sign in with google", logo:"G", name:"provider", value:"google"}]`; clean-clone `/login` renders no `.sso` section; torn down.
- [x] Login completion: read roles from Keto → write `metadata_public` projection → tokenize → set JWT cookie. → `src/login.ts` (`completeLogin`/`readRoles`/`sessionCookie`, `SESSION_COOKIE`), wired into `app.ts` at `GET /auth/complete` — where `kratos.yml` now lands the browser after a successful login (`login.after.default_browser_return_url`). The route: `whoami(cookie)` → identity (id/email; no session ⇒ 303 `/login`); `readRoles` lists `Role:*#members@user:<id>` from Keto (one paged read, sorted/de-duped; group→role transitivity is §5); projects `{roles}` onto the identity; then `whoami(tokenize_as: plainpages)` → the signed JWT, stored as `plainpages_jwt` (HttpOnly + SameSite=Lax + 30d, `secure` deferred to §9). `server.ts` builds the kratos-admin + keto clients and passes all three to `createApp`. **Design bug caught in live boot-verify + fixed:** the projection had to move `metadata_admin``metadata_public` — Kratos *strips admin metadata* from the session the tokenizer reads, so `metadata_admin` yielded `roles:[]`; `metadata_public` is carried (and the user already reads these coarse roles in their own JWT, so nothing leaks). Touched `kratos-admin.ts` (`updateMetadataAdmin``updateMetadataPublic`, `/metadata_public` patch), the tokenizer jsonnet, and the kratos.yml/README rationale. Tests-first: `login.test.ts` (readRoles paging/dedup; completeLogin order whoami→project→tokenize; no-session⇒null; missing email⇒null; no-JWT⇒throw; cookie flags) + `app.test.ts` integration (`/auth/complete` projects roles, sets `plainpages_jwt`, 303→`/`; no session ⇒ 303 `/login`, no cookie) + `kratos.test.ts` (after-login URL + jsonnet metadata_public). Boot-verified the whole chain live: real admin login → `/auth/complete` → JWT `{sub, email, roles:["admin"], expiat=600}`, identity re-projected `metadata_public:{roles:["admin"]}` from Keto (wiped first to prove the write); no-session ⇒ 303 `/login`; torn down. The full-stack login Playwright E2E is owned by §8. typecheck + 189 units green.
- [x] JWT middleware: verify signature via cached JWKS, validate `exp`/`iss`/`aud` (+clock skew), build context (user, roles). → `src/jwt-middleware.ts` (`authenticate`/`verifyToken`/`validateClaims`/`claimsToUser`) is the per-request hot path that never calls Ory: read the `plainpages_jwt` cookie → `decodeJws` the `kid` → resolve the verify key from the cached JWKS → `verifyJws` (§0 signature/alg-confusion guards) → validate claims → project the `User` (`sub`→id, email, roles). `src/jwks.ts` (`JwksProvider`, `loadJwks`, `staticJwks`) is the key-by-`kid` seam: `loadJwks` reads the mounted `file://` tokenizer key (dev default + prod mount) or a `base64://` inline set; `staticJwks` picks by `kid`, falling back to the sole key when a token carries none — **HTTP fetch + TTL cache + rotation-on-miss is the next §4 item (line 85)**; the interface lets it drop in without touching callers. Claim checks: `exp` required + `nbf` honoured, both with a 60s clock-skew leeway; `iss`/`aud` are **opt-in** — validated only when `JWT_ISSUER`/`JWT_AUDIENCE` are pinned (new optional `config.ts` fields), because the Kratos tokenizer sets neither (a clean clone must still verify). `authenticate` **fails closed**: any bad/expired/malformed token ⇒ `null` (anonymous), so the route renders signed-out and the §2 permission gate denies. Wired into `app.ts` — verify once per request (after the static short-circuit, before routing/hooks), thread `user` into both the base and route `RequestContext`, and feed `ctx.roles` (was `[]`) into the dashboard nav; `server.ts` loads the mounted JWKS at boot + passes the pinned iss/aud. Tests-first: `jwt-middleware.test.ts` (key-by-kid across a rotated set, exp/nbf + skew, iss/aud only-when-configured, bad-sig/unknown-kid, claimsToUser sub/email/roles, authenticate fail-closed matrix), `jwks.test.ts` (kid select/sole-key/miss + file/base64/reject-http), `config.test.ts` (iss/aud optional), `app.test.ts` (a verified cookie authorizes the gated `/demo/secret`; no-cookie/expired ⇒ 403). typecheck + 199 units + 7 E2E green; boot-smoked server.ts loading the mounted key. The live-stack token-refresh/timeout E2E is the §4 line 90 item; the full login E2E is §8.
- [ ] JWKS fetch + cache + rotation handling.
- [x] JWKS fetch + cache + rotation handling.`src/jwks.ts`: `cachingJwks(load, opts)` self-refreshing provider behind the existing `JwksProvider.getKey` seam (drop-in, callers untouched) — holds keys for `ttlMs` (5m), reloads on the next lookup past TTL, and on a `kid` miss reloads **once more** (rotation-on-miss → a freshly-prepended key verifies without a restart, README zero-downtime rotation), throttled by `minRefetchMs` (60s) so a stream of bogus kids can't hammer the source. A reload failure keeps the last-good set (transient resilience); only a cold cache propagates the error (→ middleware fails closed). Concurrent loads coalesce on one in-flight promise. `createJwksProvider(jwksUrl)` routes by scheme + primes at boot (fail loud): `base64://` → immutable `staticJwks`; `file://` → re-readable cache (rotation by remount/edit); `http(s)://` → new `fetchJwks` (Accept JSON, non-2xx throws). `server.ts` now `await createJwksProvider(config.jwksUrl)` (top-level await already present) — replaces `staticJwks(loadJwks(...))`. Tests-first (`jwks.test.ts`: TTL cache+expiry, rotation-on-miss + throttle, last-good-on-error vs cold-load-propagates, scheme routing + http prime/cache + fail-loud on non-2xx/missing-file/bad-scheme). README **Layout** line updated; the **JWT signing key & rotation** + flow-diagram cache notes already described this. typecheck + 203 units green; boot-smoked the file:// prime path. Guards/re-mint/logout/CSRF are the next §4 items.
- [ ] Guards: `requireSession` (validate JWT), `can(role)` (claim, in-process), `check(relation, object)` (live Keto).
- [ ] Session re-mint on TTL expiry (re-read roles from Keto).
- [ ] Logout: revoke Kratos session + clear cookie.