10 Commits

Author SHA1 Message Date
Julien Herr a29e9ab372 feat: WebSub Atom support, HTML processing via linkedom, W3C badges
WebSub / PubSubHubbub:
- Hub now accepts both /rss/:id and /atom/:id topic URLs
- WebSubSubscription stores format ("rss" | "atom")
- notifySubscribers sends RSS or Atom XML with correct Content-Type
- verifyAndStoreSubscription sends correct topic URL per format
- CI: add docs/** to paths-ignore so docs-only changes skip the deploy
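
The format handling described above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: `formatFromTopicPath` and `contentTypeFor` are hypothetical helper names, and the exact Content-Type strings are an assumption based on the registered RSS and Atom media types.

```typescript
// Hypothetical sketch: derive the feed format from a hub topic path and
// pick the matching Content-Type for subscriber notifications.
type FeedFormat = "rss" | "atom";

// Accepts only /rss/:id and /atom/:id; anything else is not a valid topic.
function formatFromTopicPath(pathname: string): FeedFormat | null {
  const match = /^\/(rss|atom)\/[^/]+$/.exec(pathname);
  return match ? (match[1] as FeedFormat) : null;
}

// Assumed media types; the real notifySubscribers may use other parameters.
function contentTypeFor(format: FeedFormat): string {
  return format === "atom"
    ? "application/atom+xml; charset=utf-8"
    : "application/rss+xml; charset=utf-8";
}
```

Storing the format on the subscription, as the commit does, means the hub does not have to re-parse the topic URL every time it notifies a subscriber.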

HTML processing (linkedom + escape-html):
- New html-processor.ts: body extraction, script/iframe/object removal,
  event handler + javascript: URL stripping, mso-* style cleanup,
  plain text → <pre> with HTML escaping via escape-html
- feed-generator.ts and entries.ts use processEmailContent
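
As an illustration of the plain text → <pre> step only, here is a dependency-free sketch. The commit itself uses the escape-html package; `escapeHtmlText` below is a hypothetical inline stand-in that escapes the same five characters, and `plainTextToPre` is an assumed name, not necessarily what html-processor.ts exports.

```typescript
// Hypothetical stand-in for escape-html: escapes & < > " and '.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtmlText(input: string): string {
  return input.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Wrap a plain-text email body so feed readers keep its line breaks
// while any markup-like characters are rendered literally.
function plainTextToPre(text: string): string {
  return `<pre>${escapeHtmlText(text)}</pre>`;
}
```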

Admin UI:
- W3C validation badges (Atom + RSS) on feed detail page

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-22 21:12:22 +02:00
Julien Herr afed4464cf fix(feed): use permalink URL as Atom entry id, strip mso-* inline styles
- Entry <id> was a non-URL string (timestamp + base64 snippet), which
  is invalid per the Atom spec; now uses the entry permalink URL which
  is both valid and stable across feed regeneration
- Strip mso-* properties from inline style attributes in extracted body
  content to eliminate the feed validator DangerousStyleAttr warning
  caused by Microsoft Office HTML in newsletter emails
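
The mso-* cleanup could look something like the sketch below. `stripMsoDeclarations` is a hypothetical name, and splitting on ";" is a simplifying assumption: it would mis-handle semicolons inside quoted strings or url(...) values, which the real implementation may treat differently.

```typescript
// Hypothetical sketch: drop mso-* declarations from an inline style value,
// keeping every other declaration in its original order.
function stripMsoDeclarations(style: string): string {
  return style
    .split(";")
    .map((declaration) => declaration.trim())
    .filter((declaration) => declaration.length > 0 && !/^mso-/i.test(declaration))
    .join("; ");
}
```

If the result is an empty string, the caller would presumably remove the style attribute altogether rather than emit style="".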

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 18:43:06 +02:00
Julien Herr 4428f35dd4 fix(feed): add Atom link in emails page, fix HTML stripping, use request URL for self-link
- Add Atom Feed URL to the Feed Details card in the emails page
- Fix extractBodyContent to handle emails without a closing </body> tag
  (regex now falls back to capturing everything after the opening <body>)
- Use the actual request URL origin for atom:link rel="self" in RSS/Atom
  feeds, guaranteeing it always matches the document location regardless
  of how DOMAIN is configured
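
A rough sketch of the fallback described for extractBodyContent, assuming a regex-based approach as the message says. The patterns below are illustrative and may differ from the ones in the repository.

```typescript
// Sketch of body extraction with a fallback for a missing </body> tag.
function extractBodyContent(html: string): string {
  // Normal case: capture everything between <body ...> and </body>.
  const closed = /<body\b[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  if (closed) return (closed[1] ?? "").trim();

  // Fallback: no closing tag, so take everything after the opening <body>.
  const open = /<body\b[^>]*>([\s\S]*)$/i.exec(html);
  if (open) return (open[1] ?? "").trim();

  // No <body> at all: assume the input is already a fragment.
  return html;
}
```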

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 18:41:21 +02:00
Julien Herr bcc9640591 fix(feed): correct feed link, canonical id, and strip html wrapper from content
- link: computed as /admin/feeds/:id/emails instead of stale site_url from KV
- id: computed dynamically from baseUrl instead of stale feed_url from KV
- item description/content: strip <html><head><body> wrapper via extractBodyContent()
  so feed readers receive a body fragment, not a full HTML document
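
For illustration, computing both values from a base URL instead of reading them from KV might look like this. `feedLinks` is a hypothetical helper, and using /rss/:id as the canonical id path is an assumption; only the /admin/feeds/:id/emails link path comes from the message above.

```typescript
// Hypothetical sketch: derive feed link and id from the current base URL
// so a stale domain stored in KV can no longer leak into the feed.
function feedLinks(baseUrl: string, feedId: string): { link: string; id: string } {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const safeId = encodeURIComponent(feedId);
  return {
    link: `${base}/admin/feeds/${safeId}/emails`,
    id: `${base}/rss/${safeId}`, // assumed canonical path
  };
}
```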

Fixes RSS validator warnings: SelfDoesntMatchLocation (stale KV domain) and
InvalidHTML (full HTML document inside <description>/<content:encoded>).
Adds 8 tests covering extractBodyContent and the new feed/atom link assertions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 18:28:13 +02:00
Julien Herr e93bbb8d3e feat: store email attachments in R2 and expose as RSS enclosures
Attachments from incoming emails are uploaded to an optional Cloudflare R2
bucket and exposed as <enclosure> elements in RSS and <link rel="enclosure">
in Atom feeds, served at /files/{id}/{filename} with immutable caching.

R2 is opt-in: if ATTACHMENT_BUCKET is not bound the feature is a no-op.
Attachments are cleaned up from R2 on email/feed deletion and during
size-based feed trimming. Adds MockR2 to the test setup.
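
A minimal sketch of the opt-in behaviour and the file URL shape, under stated assumptions: `BucketLike` is a hypothetical stand-in for the R2 bucket binding (only a `put` method is modelled), and `storeAttachment`, `attachmentPath`, and the Cache-Control value are illustrative, not taken from the repository.

```typescript
// Hypothetical minimal view of the bucket binding used by this sketch.
interface BucketLike {
  put(key: string, value: ArrayBuffer | string): Promise<unknown>;
}

interface Env {
  ATTACHMENT_BUCKET?: BucketLike; // optional: the feature is a no-op without it
}

// Assumed header value for "immutable caching".
const IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable";

// Build the public path /files/{id}/{filename}, encoding both segments.
function attachmentPath(id: string, filename: string): string {
  return `/files/${encodeURIComponent(id)}/${encodeURIComponent(filename)}`;
}

// Returns false (and does nothing) when no bucket is bound.
async function storeAttachment(
  env: Env,
  key: string,
  data: ArrayBuffer | string,
): Promise<boolean> {
  if (!env.ATTACHMENT_BUCKET) return false;
  await env.ATTACHMENT_BUCKET.put(key, data);
  return true;
}
```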

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 09:09:37 +02:00
Julien Herr 4d8d1fdc82 refactor: extract buildFeed helper, add generateAtomFeed
2026-05-21 07:34:29 +02:00
Julien Herr 5308544672 refactor: simplify quick-win code after review
- Make feedId required in generateRssFeed (removes dead /emails/ fallback)
- Hoist loop-invariant conditional and remove intermediate variable
- Extract normalizeAllowedSenders() so JSON and form paths share same logic
- Move escapeHtml to src/utils/html.ts for reuse by admin.ts
- Parallelize the two independent KV puts in feed creation
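
One way a shared normalizeAllowedSenders() might work is sketched below. The input shapes (an array from the JSON path, a comma- or newline-separated string from the form path) and the lowercasing and de-duplication are assumptions made for illustration.

```typescript
// Sketch: one normalizer for both the JSON path (string[]) and the
// form path (comma- or newline-separated string). Shapes are assumed.
function normalizeAllowedSenders(input: string | string[] | undefined): string[] {
  const parts = Array.isArray(input) ? input : (input ?? "").split(/[\n,]/);
  const seen = new Set<string>();
  for (const part of parts) {
    const sender = part.trim().toLowerCase();
    if (sender) seen.add(sender); // skip blanks, collapse duplicates
  }
  return Array.from(seen);
}
```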

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 23:55:17 +02:00
Julien Herr 54e7a1bfa0 feat: parse <author> from From header in RSS items
Parse the From header into name + email parts so the feed library
renders proper RFC 2822 format (email (Name)) in <author> elements.
Also passes feedId to the generator so item links can point to the
upcoming /entries/:feedId/:receivedAt route.
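
A simplified sketch of the From parsing, assuming the common `Name <email>` and bare-address forms only. `parseFromHeader` and `formatRssAuthor` are hypothetical names; in the commit the feed library does the final rendering, and a full RFC 2822 address parser would handle many more cases (comments, groups, encoded words).

```typescript
interface ParsedFrom {
  name?: string;
  email: string;
}

// Handles `Name <email>`, `"Quoted, Name" <email>`, `<email>` and a bare address.
function parseFromHeader(from: string): ParsedFrom {
  const match = /^\s*"?([^"<]*?)"?\s*<([^<>\s]+)>\s*$/.exec(from);
  if (!match) return { email: from.trim() };
  const email = match[2] ?? from.trim();
  const name = (match[1] ?? "").trim();
  return name ? { name, email } : { email };
}

// The `email (Name)` form that the message above describes for <author>.
function formatRssAuthor(parsed: ParsedFrom): string {
  return parsed.name ? `${parsed.email} (${parsed.name})` : parsed.email;
}
```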

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 23:50:54 +02:00
Julien Herr 3ed9d2ee22 chore: apply Prettier formatting to entire codebase
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-20 22:01:53 +02:00
Young Lee 8839aac24b Set up initial project and files
2025-02-27 14:51:38 -08:00