11 Commits

Author SHA1 Message Date
Julien Herr 44fcbfc4f6 fix(favicon): fall back to apex domain when subdomain hosts no icon
Senders on a subdomain that hosts no favicon (e.g. mail.example.com) left
feeds blank because both the direct /favicon.ico and the DuckDuckGo lookup
were tried only against the full subdomain. Resolution now walks up to the
apex via Domain.parents() and caches the result under the original sender
domain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:49:43 +02:00
Julien Herr 4d3a94d1ec fix(confirmation): flag code-based OTP signups with no clickable link
Detect verification-code signups (e.g. "your verification code is
371404") whose only link is a mailto. These cleared the keyword
threshold but were dropped because the detector required an http(s)
candidate link. A code path now raises the flag/badge/banner when a
verification keyword sits next to an OTP-style code; the code is never
extracted or surfaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:46:14 +02:00
Julien Herr 3f35435610 fix(confirmation): recognize localized subscribe CTAs in weak link signals
The weak link-signal vocabulary was English-only, so a genuine double
opt-in whose confirm button reads "Je m'inscris…" over an opaque tracking
redirect scored 0 on every link and was missed. Make the weak vocab
multilingual (FR/DE/ES) to match the confirmation keywords.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:35:10 +02:00
Julien Herr a353de1342 fix(favicon): raise max icon size to 256 KB for hi-res PNGs
DuckDuckGo serves hi-res PNG favicons that legitimately exceed the old
100 KB cap, causing them to be rejected and negatively cached.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:30:20 +02:00
Julien Herr fd3ff8c40a feat(admin): show email count and last-email date per feed
Surface each feed's email count on its Emails button and a "Last email …"
freshness line under the title, in both dashboard views. The values are
projected into feeds:list (kept to a single KV read) via the Feed aggregate,
so toListItemDTO now maps the whole aggregate through its intention-revealing
accessors instead of threading scalar projections. Also fixes long titles
overflowing into the Feed ID column in the table view.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:18:38 +02:00
Julien Herr e258206384 fix(feed): advertise WebSub hub in RSS/Atom body
Readers like FreshRSS discover the hub from <atom:link rel="hub"> in the
feed XML, not the HTTP Link header. Without it they never subscribe and
only refresh on cache expiry (~30 min) instead of receiving an instant
push when a new email arrives.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:04:33 +02:00
Julien Herr 7297e06b94 fix(feed): escape bare ampersands in entry HTML attribute URLs
linkedom escapes & in text nodes but not in attribute values, so URLs
with query strings (?a=1&b=2) serialized with bare ampersands. Valid XML
inside the feed CDATA, but the W3C validator parses the embedded HTML and
warns "Named entity expected. Got none." on <description>/<content:encoded>
(RSS) and <summary>/<content> (Atom). Escape every & not already starting
a valid entity; covers all three formats via processEmailContent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 22:49:57 +02:00
Julien Herr 5f13126b35 fix(favicon): short TTL for negative favicon cache entries
A failed favicon lookup was cached for a full week (same TTL as a
success), so a transient miss (e.g. the icon not yet indexed upstream)
blacklisted the domain for days. Cache negatives for 6 hours instead so
the next email retries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 22:44:35 +02:00
Julien Herr bb9fce72ff fix(confirmation): detect confirm emails whose CTA hint is in the link text
Weak subscribe/subscription signals are now matched on the link href OR its
visible text (matched once, not additively), so a double opt-in email whose
button reads "Yes, subscribe me…" over an opaque tracking-redirect href is no
longer missed. Adds a regression test with anonymized fixture data.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 22:36:16 +02:00
Julien Herr b6b160a186 fix(release): set GitHub Release title to the tag
--notes-file (unlike --generate-notes) leaves the release name blank; pass
--title so releases keep a heading.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 19:02:54 +02:00
Julien Herr a9814ca063 chore: open 0.4.0 develop cycle
2026-05-25 19:01:44 +02:00
26 changed files with 683 additions and 87 deletions
+1 -1
@@ -77,5 +77,5 @@ jobs:
TAG_NAME: ${{ github.ref_name }}
BUNDLE_PATH: ${{ steps.bundle.outputs.path }}
run: |
gh release create "$TAG_NAME" --notes-file release-notes.md --verify-tag || true
gh release create "$TAG_NAME" --title "$TAG_NAME" --notes-file release-notes.md --verify-tag || true
gh release upload "$TAG_NAME" "$BUNDLE_PATH" --clobber
+51
@@ -12,6 +12,57 @@ verbatim as the GitHub Release notes — so what you write here is what ships.
## [Unreleased]
### Added
- The admin dashboard now shows each feed's email count on its **Emails** button
and a **"Last email …"** freshness line under the feed title, in both the list
and table views. Both values are projected into `feeds:list`, so the dashboard
stays a single KV read; they backfill on a feed's next email or save.
### Fixed
- Per-feed favicons now resolve for senders on a subdomain that hosts no icon of
its own (e.g. `mail.example.com`): the lookup walks up to the apex domain
(`example.com`) and uses its favicon, caching it under the original sender
domain. Previously both the direct `/favicon.ico` and the DuckDuckGo lookup
were tried only against the full subdomain, leaving such feeds blank.
- Subscription-confirmation detection now flags code-based signup verifications
(OTP) that have no link to click — e.g. "Your verification code is 371404",
whose only link is a `mailto:` support address. These cleared the keyword
threshold but were dropped because the detector required an http(s) candidate
link. A code path now raises the flag/badge/banner when a verification keyword
sits next to an OTP-style code; the code itself is never extracted or surfaced.
- Subscription-confirmation detection now recognizes localized "subscribe" CTAs.
The weak link-signal vocabulary was English-only (`subscrib`),
so a genuine double opt-in whose confirm button reads "Je m'inscris…" over an
opaque tracking redirect scored 0 on every link and was missed. The weak vocab
is now multilingual (FR/DE/ES) to match the confirmation keywords.
- Per-feed favicons no longer fail for senders whose DuckDuckGo icon is a
hi-res PNG: the maximum accepted favicon size is raised from 100 KB to 256 KB,
so legitimate large icons (~107 KB and up) are cached instead of rejected.
A domain that was already negatively cached only re-fetches once that entry's
TTL expires (and something — a new email or a favicon request — retriggers
the fetch); delete its `icon:<domain>` KV key to force an immediate refresh.
- Admin dashboard table view: long feed titles no longer overflow into the Feed
ID column — the title/description cell now shrinks so its text ellipsises.
- RSS and Atom feeds now advertise the WebSub hub inside the feed body
(`<atom:link rel="hub">`), not just in the HTTP `Link` header. Readers like
FreshRSS discover the hub from the XML, so they can now subscribe and receive
an instant push when a new email arrives instead of waiting up to the cache
`max-age` (30 min) to refresh.
- Subscription-confirmation detection now recognises a confirm email whose CTA
button carries the subscribe/subscription hint only in its visible text (e.g.
"Yes, subscribe me to this mailing list.") over an opaque tracking-redirect
href — previously the link scored zero and the email was missed.
- Sender favicons now recover from a transient miss: a failed favicon lookup is
cached negatively for 6 hours instead of a full week, so a domain whose icon
was momentarily unavailable (e.g. not yet indexed upstream) is retried on the
next email instead of staying blank for days.
- Feed entry HTML now escapes bare ampersands in attribute URLs (e.g. query
strings like `?a=1&b=2`), clearing the W3C feed validator's "Named entity
expected. Got none." warning and improving interoperability with stricter
feed readers.
## [0.3.1] - 2026-05-25
### Fixed
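Reviewer sketch for the ampersand entry in the changelog above. The escaping code itself is not part of the visible diff, so this is only a minimal illustration of "escape every & not already starting an entity". The name `escapeBareAmpersands` and the regular expression are assumptions, and the check is purely syntactic: anything shaped like `&name;`, `&#38;` or `&#x26;` is left alone, without validating the name against the HTML entity list.

```typescript
// Hypothetical helper, not the project's actual implementation.
// Matches an "&" that does NOT start something shaped like a named (&amp;),
// decimal (&#38;) or hexadecimal (&#x26;) character reference.
const BARE_AMPERSAND =
  /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g;

function escapeBareAmpersands(html: string): string {
  return html.replace(BARE_AMPERSAND, "&amp;");
}

// Query-string separators inside an attribute get escaped...
console.log(
  escapeBareAmpersands('<a href="https://example.com/?a=1&b=2">x</a>'),
);
// ...while existing references are left untouched, so the call is idempotent.
console.log(escapeBareAmpersands("Tom &amp; Jerry &#38; &#x26;"));
```

Because already-escaped references never match, running the helper twice gives the same result as running it once.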
+2 -2
@@ -1,12 +1,12 @@
{
"name": "kill-the-news",
"version": "0.3.1",
"version": "0.4.0-develop",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "kill-the-news",
"version": "0.3.1",
"version": "0.4.0-develop",
"license": "MIT",
"dependencies": {
"@hono/zod-openapi": "^1.4.0",
+1 -1
@@ -1,6 +1,6 @@
{
"name": "kill-the-news",
"version": "0.3.1",
"version": "0.4.0-develop",
"description": "Convert email newsletters into private RSS feeds using Cloudflare Workers",
"main": "dist/worker.js",
"scripts": {
+3 -1
@@ -236,7 +236,9 @@ async function storeEmail(
...(inlineIds.length > 0 ? { inlineAttachmentIds: inlineIds } : {}),
...(messageId ? { messageId } : {}),
dedupHash,
...(confirmationLinks
// null = not a confirmation; [] = a code-based confirmation (flag it, no
// link to surface). Both an empty and a populated array mean "detected".
...(confirmationLinks !== null
? { confirmation: { links: confirmationLinks } }
: {}),
};
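Reviewer sketch of the three-state value handled in the hunk above: `null` means "not a confirmation", `[]` means "code-based confirmation with nothing to click", and a populated array carries the links. The type alias and the function name `confirmationField` are hypothetical, and the sketch assumes the value is only ever `null` or an array.

```typescript
// Hypothetical reduction of the stored record to its optional field.
type ConfirmationLinks = string[] | null;

function confirmationField(
  links: ConfirmationLinks,
): { confirmation?: { links: string[] } } {
  // Explicit null check: both [] and a populated array mean "detected".
  return links !== null ? { confirmation: { links } } : {};
}

console.log(JSON.stringify(confirmationField(null)));
console.log(JSON.stringify(confirmationField([])));
console.log(JSON.stringify(confirmationField(["https://example.com/confirm"])));
```

Under that assumption an empty array is already truthy in JavaScript, so the explicit `!== null` reads mainly as a statement of intent for the empty-array case.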
+14 -2
@@ -31,8 +31,20 @@ export const STATS_KEY = "stats:counters";
/** Default TTL for a cached per-domain favicon (seconds). */
export const ICON_TTL_SECONDS = 7 * 24 * 60 * 60; // 1 week
/** Maximum accepted favicon size (bytes); larger responses are rejected. */
export const MAX_ICON_BYTES = 100 * 1024; // 100 KB
/**
* TTL for a *negative* favicon cache entry (seconds). Kept short so a transient
* miss (e.g. DuckDuckGo not having indexed the domain yet) self-heals within
* hours instead of blacklisting the domain for a full week.
*/
export const ICON_NEGATIVE_TTL_SECONDS = 6 * 60 * 60; // 6 hours
/**
* Maximum accepted favicon size (bytes); larger responses are rejected.
* DuckDuckGo serves hi-res (often 144×144) PNG favicons that legitimately
* exceed 100 KB, so the cap is generous; KV's value limit (25 MB) is the only
* hard constraint, even after base64 inflation.
*/
export const MAX_ICON_BYTES = 256 * 1024; // 256 KB
/** Timeout for an outbound favicon fetch (milliseconds). */
export const ICON_FETCH_TIMEOUT_MS = 5000;
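Reviewer arithmetic for the "even after base64 inflation" remark in the comment above. This is a sketch that assumes the icon bytes are stored as standard padded base64 inside the cached record, and that the 25 MB figure cited in the comment is read as 25 MiB; `base64Length` is a hypothetical helper.

```typescript
const MAX_ICON_BYTES = 256 * 1024; // value from the diff above
const KV_VALUE_LIMIT_BYTES = 25 * 1024 * 1024; // assumption: 25 MiB

// Standard padded base64 emits 4 output characters per 3 input bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const worstCaseEncoded = base64Length(MAX_ICON_BYTES);
console.log(worstCaseEncoded); // 349528
console.log(worstCaseEncoded < KV_VALUE_LIMIT_BYTES); // true
```

A maximum-size icon therefore encodes to roughly 341 KiB, well under the limit assumed here, which matches the comment's claim that the cap is generous.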
+88
@@ -159,6 +159,94 @@ describe("detectConfirmation", () => {
expect(result![0]).toBe("https://news.example.com/subscribe/abc123");
});
it("detects a confirm email whose CTA link carries the weak signal only in its text (opaque tracking href)", () => {
// Real-world Mailchimp double opt-in: the subject/body clearly confirm, but
// the button's href is an opaque base64 tracking redirect (no signal) and its
// visible text — "Yes, subscribe me…" — is only a weak signal. The link must
// still qualify as a candidate so the email is flagged.
const result = detectConfirmation({
subject: "Action Required | Please Confirm Your Subscription",
text: "Please confirm your mailing list subscription (double opt-in) by clicking the button below. You won't be subscribed if you don't click the confirmation link above.",
links: [
{
href: "https://click.example.com/track/click/00000000/list.example.com?p=eyJzIjoiQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUEiLCJ2",
text: "Yes, subscribe me to this mailing list.",
},
],
});
expect(result).not.toBeNull();
expect(result![0]).toContain("click.example.com");
});
it("detects a French confirm email whose CTA text is a localized 'subscribe' over an opaque tracking href", () => {
// Real-world double opt-in: subject/body clearly confirm, but the
// button's href is an opaque provider redirect (proc.php?…&act=csub — no
// signal) and its visible text "Je m'inscris…" is the French equivalent of
// "subscribe" (a weak signal). The weak vocab must be multilingual like the
// confirmation keywords, otherwise the link scores 0 and the email is missed.
const result = detectConfirmation({
subject: "[Action requise] Confirme ton inscription",
text: "Avant de confirmer ton inscription, clique ici.",
links: [
{
href: "https://email.example.com/proc.php?nl=1&f=36&s=abc&act=csub",
text: "Je m'inscris sur la liste d'attente",
},
{ href: "https://www.example.com/", text: "Notre site" },
],
});
expect(result).not.toBeNull();
expect(result![0]).toContain("proc.php");
});
// ── Code-based signup confirmations (OTP) with no clickable link ─────────────
// Some signups send a verification *code* to enter manually — there is nothing
// to click. We still flag these (empty links: detected but no actionable link),
// but never extract or surface the code itself.
it("flags an OTP signup email whose only link is a mailto", () => {
const result = detectConfirmation({
subject: "❄️ Ton code de vérification est 371404",
text: "Salut ! Entre le code de vérification ci-dessous lorsqu'il te sera demandé : 371404. Tu n'as rien demandé ?",
links: [
{
href: "mailto:hey@example.com?subject=Acc%C3%A8s+frauduleux",
text: "contacter le support",
},
],
});
expect(result).toEqual([]);
});
it("flags a code email via a body keyword + code pattern when there are no links", () => {
const result = detectConfirmation({
subject: "Welcome to Acme",
text: "Your verification code is 246810. Enter it to finish signing up.",
links: [],
});
expect(result).toEqual([]);
});
it("does not flag a transactional email with a big number but no code-near-code-word", () => {
const result = detectConfirmation({
subject: "Order confirmed",
text: "Your order 12345678 ships Monday.",
links: [
{ href: "https://shop.example.com/track/12345678", text: "Track" },
],
});
expect(result).toBeNull();
});
it("does not flag a newsletter with numbers but no verification keyword", () => {
const result = detectConfirmation({
subject: "Your 2026 wrapped: 4567 minutes listened",
text: "Here is your year in review with code 9999 highlights.",
links: [{ href: "https://music.example.com/wrapped", text: "See more" }],
});
expect(result).toBeNull();
});
it("dedupes a confirmation link repeated in the body", () => {
const result = detectConfirmation({
subject: "Confirm your subscription",
+57 -15
@@ -5,8 +5,11 @@
* the link-signal patterns, the scoring weights and the threshold.
*
* Returns the ranked candidate confirmation links (top 3) when the combined score
* clears the threshold AND at least one candidate link exists; otherwise null.
* Only http(s) links are ever considered or returned.
* clears the threshold AND at least one candidate link exists. When the email is a
* code-based signup verification (a verification keyword next to an OTP-style code,
* with no clickable link — e.g. "your verification code is 371404") it returns an
* empty array: detected, but nothing to click. Returns null when not a confirmation.
* Only http(s) links are ever considered or returned; the code is never extracted.
*/
export interface DetectConfirmationInput {
@@ -46,11 +49,20 @@ const STRONG_LINK_SIGNALS = [
"activation",
];
// Weak URL signals: ambiguous subscribe/subscription words that also appear in
// ordinary "manage subscription" footers. Worth only +1 so they cannot, on their
// own (with a stray body keyword), cross the threshold and cry wolf — but still
// let a genuine "confirm your subscription" subject + a bare /subscribe link pass.
const WEAK_LINK_SIGNALS = ["subscription", "subscribe"];
// Weak signals: ambiguous subscribe/subscription words that also appear in
// ordinary "manage subscription" footers. Matched on the link href OR its visible
// text (a CTA button often reads "Yes, subscribe me…" / "Je m'inscris…" over an
// opaque tracking redirect). Worth only +1 — and only once, never href+text
// additively — so they cannot, on their own (with a stray body keyword), cross
// the threshold and cry wolf, yet still let a genuine "confirm your subscription"
// email pass. Multilingual like KEYWORDS (EN / FR / DE / ES) — extend per language.
const WEAK_LINK_SIGNALS = [
"subscrib", // EN: subscribe / subscription (unsubscribe is caught by NEGATIVE first)
"inscri", // FR: s'inscrire / inscription / je m'inscris
"anmeld", // DE: anmelden / anmeldung
"suscrib", // ES: suscribir / suscripción
"inscrib", // ES: inscribirse / inscripción
];
// Negative patterns: a link matching any of these is NEVER a candidate, and these
// tokens are stripped from text before keyword scanning (kills the unsubscribe
@@ -67,6 +79,21 @@ const NEGATIVE = [
const THRESHOLD = 3;
// A verification code (OTP) sitting next to a code-ish word, in either order and
// within a short window — "your verification code is 371404" / "371404 is your
// code". This is the signup-by-code case that has no link to click. Run on the
// already-normalized (lowercased, diacritics-stripped) subject/body. We only test
// for presence to raise the flag; the code value is never captured or surfaced.
const CODE_WORDS = "code|codigo|otp|verif";
const CODE_PROXIMITY = 48;
const CODE_PATTERN = new RegExp(
`(?:${CODE_WORDS})[\\s\\S]{0,${CODE_PROXIMITY}}?\\b\\d{4,8}\\b|\\b\\d{4,8}\\b[\\s\\S]{0,${CODE_PROXIMITY}}?(?:${CODE_WORDS})`,
);
function hasVerificationCode(text: string): boolean {
return CODE_PATTERN.test(text);
}
function normalize(s: string): string {
return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}
@@ -85,7 +112,8 @@ function linkScore(href: string, text: string): number {
if (matchesAny(h, NEGATIVE) || matchesAny(t, NEGATIVE)) return 0;
let score = 0;
if (matchesAny(h, STRONG_LINK_SIGNALS)) score += 2;
else if (matchesAny(h, WEAK_LINK_SIGNALS)) score += 1;
else if (matchesAny(h, WEAK_LINK_SIGNALS) || matchesAny(t, WEAK_LINK_SIGNALS))
score += 1;
if (matchesAny(t, KEYWORDS)) score += 2;
return score;
}
@@ -105,18 +133,32 @@ export function detectConfirmation(
.filter((l) => l.score > 0)
.sort((a, b) => b.score - a.score);
if (candidates.length === 0) return null;
const subject = stripNegatives(normalize(input.subject));
const text = stripNegatives(normalize(input.text));
const subjectScore = matchesAny(subject, KEYWORDS) ? 2 : 0;
const bodyScore = matchesAny(text, KEYWORDS) ? 1 : 0;
const bestLinkScore = candidates[0].score;
if (subjectScore + bodyScore + bestLinkScore < THRESHOLD) return null;
// Link path: a clickable confirm/verify/subscribe link clears the threshold.
if (candidates.length > 0) {
const bestLinkScore = candidates[0].score;
if (subjectScore + bodyScore + bestLinkScore >= THRESHOLD) {
// Dedupe by href before capping, so a link repeated in the body never
// wastes one of the three surfaced slots.
return [...new Set(candidates.map((c) => c.href))].slice(0, 3);
}
}
// Dedupe by href before capping, so a link repeated in the body never wastes
// one of the three surfaced slots.
return [...new Set(candidates.map((c) => c.href))].slice(0, 3);
// Code path: an OTP-style signup verification with no link to click. Requires
// both a verification keyword (subject or body) and a code-near-code-word
// pattern, so a stray number or a lone keyword cannot cry wolf. Flag it with
// an empty link list — detected, but nothing actionable to surface.
if (
(subjectScore > 0 || bodyScore > 0) &&
(hasVerificationCode(subject) || hasVerificationCode(text))
) {
return [];
}
return null;
}
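Reviewer sketch exercising the code-proximity pattern from the hunk above in isolation. The three constants are copied from the diff; the sample strings are assumed to be already normalized (lowercased, diacritics stripped), as the source comment requires. Note that the pattern alone is not the whole decision: the detector also requires a verification keyword in the subject or body, and that keyword list is not visible in this diff.

```typescript
const CODE_WORDS = "code|codigo|otp|verif";
const CODE_PROXIMITY = 48;
const CODE_PATTERN = new RegExp(
  `(?:${CODE_WORDS})[\\s\\S]{0,${CODE_PROXIMITY}}?\\b\\d{4,8}\\b` +
    `|\\b\\d{4,8}\\b[\\s\\S]{0,${CODE_PROXIMITY}}?(?:${CODE_WORDS})`,
);

// Pre-normalized samples; the boolean is what the pattern alone reports.
const samples: Array<[string, boolean]> = [
  ["your verification code is 246810.", true], // word, then code
  ["371404 is your code", true], // code, then word
  ["ton code de verification est 371404", true], // localized wording
  ["your order 12345678 ships monday.", false], // number, no code word
  // Matches the pattern; per the tests above the email is still not
  // flagged, because no verification keyword is present.
  ["here is your year in review with code 9999 highlights.", true],
];

for (const [text, expected] of samples) {
  console.log(CODE_PATTERN.test(text) === expected, text);
}
```

The last sample shows why the code path needs both conditions: proximity of a number to "code" is common in ordinary newsletters, so the keyword gate is what keeps it from crying wolf.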
+49
@@ -200,6 +200,34 @@ describe("Feed.removeEmails", () => {
});
});
describe("Feed.emailCount / lastEmailAt", () => {
it("reports zero and undefined for an empty feed", () => {
const feed = Feed.reconstitute(FID, state(), { emails: [] });
expect(feed.emailCount).toBe(0);
expect(feed.lastEmailAt).toBeUndefined();
});
it("counts emails and reports the newest receivedAt (index head)", () => {
const feed = Feed.reconstitute(FID, state(), {
emails: [
entry({ key: "k2", receivedAt: 2000 }),
entry({ key: "k1", receivedAt: 1000 }),
],
});
expect(feed.emailCount).toBe(2);
expect(feed.lastEmailAt).toBe(2000);
});
it("tracks the latest email after ingest", () => {
const feed = Feed.reconstitute(FID, state(), {
emails: [entry({ key: "old", receivedAt: 1000 })],
});
feed.ingest(entry({ key: "new", receivedAt: 5000 }), { maxBytes: 10_000 });
expect(feed.emailCount).toBe(2);
expect(feed.lastEmailAt).toBe(5000);
});
});
describe("Feed events", () => {
it("records FeedCreated on create and drains it once", () => {
const feed = Feed.create(FID, createInput(), { mailboxId: MBOX });
@@ -333,6 +361,27 @@ describe("FeedRepository.load / save round-trip", () => {
]);
});
it("projects email count and last-email timestamp into feeds:list", async () => {
const repo = new FeedRepository(mockEnv().EMAIL_STORAGE);
const created = Feed.create(FID, createInput({ title: "Proj" }), {
mailboxId: MBOX,
});
await repo.save(created);
let listed = await repo.listFeeds();
expect(listed[0].emailCount).toBe(0);
expect(listed[0].lastEmailAt).toBeUndefined();
created.ingest(entry({ key: "feed:opaque-feed-id:1", receivedAt: 4242 }), {
maxBytes: 1_000_000,
});
await repo.saveMetadata(created);
listed = await repo.listFeeds();
expect(listed[0].emailCount).toBe(1);
expect(listed[0].lastEmailAt).toBe(4242);
});
it("returns null when the feed has no config", async () => {
const repo = new FeedRepository(mockEnv().EMAIL_STORAGE);
expect(await repo.load(FeedId.unchecked("missing"))).toBeNull();
+13
@@ -190,6 +190,19 @@ export class Feed {
return [...this._metadata.emails];
}
/** Number of emails currently in the index. */
get emailCount(): number {
return this._metadata.emails.length;
}
/**
* Received timestamp (ms) of the most recent email, or undefined when the
* feed has none. The index is maintained newest-first (ingest unshifts).
*/
get lastEmailAt(): number | undefined {
return this._metadata.emails[0]?.receivedAt;
}
/** Per-sender one-click unsubscribe links (copy). */
unsubscribeUrls(): Record<string, string> {
return { ...(this._metadata.unsubscribe ?? {}) };
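Reviewer sketch of why `lastEmailAt` can simply read the head of the index. `MiniFeed` and `IndexEntry` are hypothetical miniatures, not the real aggregate; the only behavior carried over from the diff is that ingest unshifts, keeping the index newest-first.

```typescript
interface IndexEntry {
  key: string;
  receivedAt: number; // milliseconds
}

// Hypothetical miniature of the aggregate: only the newest-first index.
class MiniFeed {
  private readonly emails: IndexEntry[] = [];

  ingest(entry: IndexEntry): void {
    this.emails.unshift(entry); // newest entry always lands at index 0
  }

  get emailCount(): number {
    return this.emails.length;
  }

  get lastEmailAt(): number | undefined {
    return this.emails[0]?.receivedAt; // undefined for an empty feed
  }
}

const feed = new MiniFeed();
console.log(feed.emailCount, feed.lastEmailAt); // 0 undefined
feed.ingest({ key: "old", receivedAt: 1000 });
feed.ingest({ key: "new", receivedAt: 5000 });
console.log(feed.emailCount, feed.lastEmailAt); // 2 5000
```

In this sketch the head is the most recently ingested entry, which equals the largest `receivedAt` only if emails are ingested in arrival order.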
+39
@@ -22,4 +22,43 @@ describe("Domain", () => {
).toBe(true);
expect(Domain.parse("a.com")!.matches(Domain.parse("b.com")!)).toBe(false);
});
describe("parents", () => {
it("yields the domain itself and each parent, most-specific first", () => {
expect(
Domain.parse("mail.example.com")!
.parents()
.map((d) => d.value),
).toEqual(["mail.example.com", "example.com"]);
});
it("stops at the two-label registrable domain", () => {
expect(
Domain.parse("a.b.c.example.com")!
.parents()
.map((d) => d.value),
).toEqual([
"a.b.c.example.com",
"b.c.example.com",
"c.example.com",
"example.com",
]);
});
it("returns just the domain when it is already two labels", () => {
expect(
Domain.parse("example.com")!
.parents()
.map((d) => d.value),
).toEqual(["example.com"]);
});
it("returns the single label as-is", () => {
expect(
Domain.parse("localhost")!
.parents()
.map((d) => d.value),
).toEqual(["localhost"]);
});
});
});
+16
@@ -18,6 +18,22 @@ export class Domain {
return this.value === other.value;
}
/**
* This domain plus each parent domain down to the two-label registrable
* level, most-specific first: `a.b.example.com` →
* `[a.b.example.com, b.example.com, example.com]`. Lets a lookup fall back to
* the apex when a sending subdomain (e.g. `mail.example.com`) hosts no asset
* of its own. A single-label value is returned unchanged.
*/
parents(): Domain[] {
const labels = this.value.split(".");
const result: Domain[] = [];
for (let i = 0; i + 2 <= labels.length; i++) {
result.push(new Domain(labels.slice(i).join(".")));
}
return result.length ? result : [this];
}
toString(): string {
return this.value;
}
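Reviewer sketch: a string-only version of the walk above, handy for checking edge cases without the `Domain` class. `parentHosts` is a hypothetical name; the loop is the same as in the diff.

```typescript
// Standalone, string-only variant of the parents() walk shown above.
function parentHosts(host: string): string[] {
  const labels = host.split(".");
  const result: string[] = [];
  // Stop once fewer than two labels would remain.
  for (let i = 0; i + 2 <= labels.length; i++) {
    result.push(labels.slice(i).join("."));
  }
  return result.length > 0 ? result : [host];
}

console.log(parentHosts("mail.example.com")); // ["mail.example.com", "example.com"]
console.log(parentHosts("localhost")); // ["localhost"]
console.log(parentHosts("news.example.co.uk"));
// ["news.example.co.uk", "example.co.uk", "co.uk"]
```

The last call suggests that, because the walk counts labels rather than consulting a public-suffix list, a sender under a multi-label suffix would also be tried against the suffix itself (`co.uk`). That may be harmless for a favicon lookup, but it is worth keeping in mind.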
+68 -2
@@ -1,4 +1,4 @@
import { describe, it, expect } from "vitest";
import { describe, it, expect, vi } from "vitest";
import { http, HttpResponse } from "msw";
import { server, createMockEnv } from "../test/setup";
import {
@@ -6,7 +6,12 @@ import {
extractEmailDomain,
getCachedIcon,
} from "./favicon-fetcher";
import { MAX_ICON_BYTES } from "../config/constants";
import { IconRepository } from "./icon-repository";
import {
ICON_NEGATIVE_TTL_SECONDS,
ICON_TTL_SECONDS,
MAX_ICON_BYTES,
} from "../config/constants";
const iconKey = (domain: string) => `icon:${domain}`;
import type { Env } from "../types";
@@ -71,6 +76,28 @@ describe("cacheFaviconForDomain", () => {
expect(icon?.contentType).toBe("image/x-icon");
});
it("falls back to the apex domain when the subdomain has no icon", async () => {
const env = createMockEnv() as unknown as Env;
server.use(
http.get("https://mail.acme.test/favicon.ico", () =>
HttpResponse.error(),
),
http.get("https://icons.duckduckgo.com/ip3/mail.acme.test.ico", () =>
HttpResponse.text("", { status: 404 }),
),
http.get("https://acme.test/favicon.ico", () =>
imageResponse(PNG, "image/vnd.microsoft.icon"),
),
);
await cacheFaviconForDomain("mail.acme.test", env);
// Cached under the original sender domain, so reads still hit.
const icon = await getCachedIcon("mail.acme.test", env);
expect(icon?.contentType).toBe("image/vnd.microsoft.icon");
expect(new Uint8Array(icon!.bytes)).toEqual(PNG);
});
it("writes a negative entry when no icon is found", async () => {
const env = createMockEnv() as unknown as Env;
server.use(
@@ -89,6 +116,45 @@ describe("cacheFaviconForDomain", () => {
expect(await getCachedIcon("nope.test", env)).toBeNull();
});
it("gives a negative entry a short TTL so transient misses self-heal", async () => {
const env = createMockEnv() as unknown as Env;
const put = vi.spyOn(IconRepository.prototype, "put");
server.use(
http.get("https://transient.test/favicon.ico", () =>
HttpResponse.text("", { status: 404 }),
),
http.get("https://icons.duckduckgo.com/ip3/transient.test.ico", () =>
HttpResponse.text("", { status: 404 }),
),
);
await cacheFaviconForDomain("transient.test", env);
expect(put).toHaveBeenCalledWith(
"transient.test",
expect.any(String),
ICON_NEGATIVE_TTL_SECONDS,
);
put.mockRestore();
});
it("gives a positive entry the full TTL", async () => {
const env = createMockEnv() as unknown as Env;
const put = vi.spyOn(IconRepository.prototype, "put");
server.use(
http.get("https://hit.test/favicon.ico", () => imageResponse(PNG)),
);
await cacheFaviconForDomain("hit.test", env);
expect(put).toHaveBeenCalledWith(
"hit.test",
expect.any(String),
ICON_TTL_SECONDS,
);
put.mockRestore();
});
it("rejects oversized responses as negative", async () => {
const env = createMockEnv() as unknown as Env;
const big = new Uint8Array(MAX_ICON_BYTES + 1);
+21 -11
@@ -1,10 +1,12 @@
import { Env } from "../types";
import {
ICON_FETCH_TIMEOUT_MS,
ICON_NEGATIVE_TTL_SECONDS,
ICON_TTL_SECONDS,
MAX_ICON_BYTES,
} from "../config/constants";
import { IconRepository } from "./icon-repository";
import { Domain } from "../domain/value-objects/domain";
import { EmailAddress } from "../domain/value-objects/email-address";
import { logger } from "./logger";
@@ -64,16 +66,23 @@ async function fetchIconFrom(
async function resolveIcon(
domain: string,
): Promise<{ buffer: ArrayBuffer; contentType: string } | null> {
const candidates = [
`https://${domain}/favicon.ico`,
`https://icons.duckduckgo.com/ip3/${domain}.ico`,
];
for (const url of candidates) {
try {
const icon = await fetchIconFrom(url);
if (icon) return icon;
} catch {
// Try the next candidate; network/timeout errors must never propagate.
// Walk the sending subdomain up to its apex so a sender like
// `mail.example.com` falls back to `example.com`'s favicon.
const hosts = Domain.parse(domain)
?.parents()
.map((d) => d.value) ?? [domain];
for (const host of hosts) {
const candidates = [
`https://${host}/favicon.ico`,
`https://icons.duckduckgo.com/ip3/${host}.ico`,
];
for (const url of candidates) {
try {
const icon = await fetchIconFrom(url);
if (icon) return icon;
} catch {
// Try the next candidate; network/timeout errors must never propagate.
}
}
}
return null;
@@ -102,7 +111,8 @@ export async function cacheFaviconForDomain(
}
: { data: null, contentType: "" };
await repo.put(domain, JSON.stringify(record), ICON_TTL_SECONDS);
const ttl = icon ? ICON_TTL_SECONDS : ICON_NEGATIVE_TTL_SECONDS;
await repo.put(domain, JSON.stringify(record), ttl);
} catch (error) {
logger.warn("Favicon cache failed", { domain, error: String(error) });
}
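Reviewer sketch of the lookup order and TTL choice that the two hunks above produce, reduced to pure functions so they can be checked without network access. `candidateUrls` and `ttlFor` are hypothetical names; the URL shapes and TTL constants are copied from the diff.

```typescript
const ICON_TTL_SECONDS = 7 * 24 * 60 * 60; // 1 week
const ICON_NEGATIVE_TTL_SECONDS = 6 * 60 * 60; // 6 hours

// For each host (most-specific first): direct favicon, then DuckDuckGo.
function candidateUrls(hosts: string[]): string[] {
  const urls: string[] = [];
  for (const host of hosts) {
    urls.push(`https://${host}/favicon.ico`);
    urls.push(`https://icons.duckduckgo.com/ip3/${host}.ico`);
  }
  return urls;
}

// A hit is cached for the full TTL, a miss only for the short negative TTL.
function ttlFor(found: boolean): number {
  return found ? ICON_TTL_SECONDS : ICON_NEGATIVE_TTL_SECONDS;
}

console.log(candidateUrls(["mail.example.com", "example.com"]));
console.log(ttlFor(true), ttlFor(false)); // 604800 21600
```

Each extra host adds two sequential attempts, so a deeply nested sender domain that finds no icon anywhere makes proportionally more outbound requests before the negative entry is written.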
+22
@@ -130,6 +130,17 @@ describe("generateRssFeed", () => {
expect(result).toContain(`${BASE_URL}/rss/${FEED_ID}`);
});
it("advertises the WebSub hub in the RSS body", () => {
const result = generateRssFeed(
mockFeedConfig,
mockEmails,
BASE_URL,
FEED_ID,
);
expect(result).toContain('rel="hub"');
expect(result).toContain(`${BASE_URL}/hub`);
});
it("includes email entries as <item> elements", () => {
const result = generateRssFeed(
mockFeedConfig,
@@ -280,6 +291,17 @@ describe("generateAtomFeed", () => {
expect(result).toContain(`${BASE_URL}/atom/${FEED_ID}`);
});
it("advertises the WebSub hub in the Atom body", () => {
const result = generateAtomFeed(
mockFeedConfig,
mockEmails,
BASE_URL,
FEED_ID,
);
expect(result).toContain('rel="hub"');
expect(result).toContain(`${BASE_URL}/hub`);
});
it("includes rss alternate link", () => {
const result = generateAtomFeed(
mockFeedConfig,
+4
@@ -35,6 +35,10 @@ function buildFeed(
// Public "website" for this feed: its own read URL (never the inbound address
// or an auth-gated admin path, so the feed output leaks neither).
link: `${baseUrl}/rss/${feedId}`,
// WebSub hub advertised in the feed body (<atom:link rel="hub">). Readers like
// FreshRSS discover the hub here, not from the HTTP Link header, so without it
// they never subscribe and only refresh on cache expiry.
hub: `${baseUrl}/hub`,
language: feedConfig.language,
updated: new Date(),
generator: "kill-the-news",
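Reviewer sketch of the element a reader is expected to find once `hub` is set. It is hand-rolled for illustration and is not the output of the feed library used by the project; `hubLinkElement` and `escapeXmlAttr` are hypothetical helpers. In an RSS document the `atom:` prefix additionally needs the Atom namespace declared on the root element, and a native Atom feed would typically use an unprefixed `<link>` instead.

```typescript
// Hypothetical helper: escape characters that are unsafe in an XML attribute.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical builder for the hub advertisement inside the feed body.
function hubLinkElement(baseUrl: string): string {
  const href = escapeXmlAttr(`${baseUrl}/hub`);
  return `<atom:link href="${href}" rel="hub"/>`;
}

console.log(hubLinkElement("https://feeds.example.com"));
// <atom:link href="https://feeds.example.com/hub" rel="hub"/>
```

The tests in the diff above assert on exactly these two substrings, `rel="hub"` and `${BASE_URL}/hub`, for both the RSS and the Atom output.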
+32 -11
@@ -1,7 +1,8 @@
import { describe, it, expect } from "vitest";
import { fromConfigDTO, toConfigDTO, toListItemDTO } from "./feed-mapper";
import { FeedId } from "../domain/value-objects/feed-id";
import type { FeedConfig } from "../types";
import { Feed } from "../domain/feed.aggregate";
import type { FeedConfig, FeedMetadata } from "../types";
const fullConfig: FeedConfig = {
title: "News",
@@ -16,6 +17,13 @@ const fullConfig: FeedConfig = {
expires_at: 3000,
};
const feedFrom = (metadata: FeedMetadata) =>
Feed.reconstitute(
FeedId.unchecked("a.b.42"),
fromConfigDTO(fullConfig),
metadata,
);
describe("feed-mapper", () => {
it("round-trips a full config DTO through domain state unchanged", () => {
expect(toConfigDTO(fromConfigDTO(fullConfig))).toEqual(fullConfig);
@@ -32,11 +40,8 @@ describe("feed-mapper", () => {
expect(state.blockedSenders).toEqual([]);
});
it("projects the feeds:list item from domain state", () => {
const item = toListItemDTO(
FeedId.unchecked("a.b.42"),
fromConfigDTO(fullConfig),
);
it("projects the feeds:list item from an empty feed aggregate", () => {
const item = toListItemDTO(feedFrom({ emails: [] }));
expect(item).toEqual({
id: "a.b.42",
title: "News",
@@ -45,17 +50,33 @@ describe("feed-mapper", () => {
expires_at: 3000,
pendingConfirmation: false,
hasNativeFeed: false,
emailCount: 0,
lastEmailAt: undefined,
});
});
it("projects hasNativeFeed when passed", () => {
it("projects pendingConfirmation and hasNativeFeed from metadata", () => {
const item = toListItemDTO(
FeedId.unchecked("a.b.42"),
fromConfigDTO(fullConfig),
true,
true,
feedFrom({
emails: [],
pendingConfirmation: true,
nativeFeeds: { "n@x.com": [{ url: "https://x/rss", type: "rss" }] },
}),
);
expect(item.pendingConfirmation).toBe(true);
expect(item.hasNativeFeed).toBe(true);
});
it("projects email count and the newest email's timestamp", () => {
const item = toListItemDTO(
feedFrom({
emails: [
{ key: "k2", subject: "b", receivedAt: 1700000000000 },
{ key: "k1", subject: "a", receivedAt: 1600000000000 },
],
}),
);
expect(item.emailCount).toBe(2);
expect(item.lastEmailAt).toBe(1700000000000);
});
});
+18 -15
@@ -1,6 +1,6 @@
import { FeedConfig, FeedListItem } from "../types";
import { FeedState } from "../domain/feed-state";
import { FeedId } from "../domain/value-objects/feed-id";
import { Feed } from "../domain/feed.aggregate";
/**
* The translation seam between the Feed aggregate's domain state (camelCase) and
@@ -44,20 +44,23 @@ export function toConfigDTO(state: FeedState): FeedConfig {
};
}
/** Domain state → the projection cached in the global `feeds:list` registry. */
export function toListItemDTO(
id: FeedId,
state: FeedState,
pendingConfirmation = false,
hasNativeFeed = false,
): FeedListItem {
/**
* The Feed aggregate → the projection cached in the global `feeds:list` registry.
* Unlike the config DTO, the list item is a read-model view: it folds in the
* aggregate's metadata-derived signals (pending confirmation, native feed,
* email count/last-received) alongside the config fields, so it reads the whole
* aggregate through its intention-revealing accessors.
*/
export function toListItemDTO(feed: Feed): FeedListItem {
return {
id: id.value,
title: state.title,
description: state.description,
mailbox_id: state.mailboxId,
expires_at: state.expiresAt,
pendingConfirmation,
hasNativeFeed,
id: feed.id.value,
title: feed.title,
description: feed.description,
mailbox_id: feed.mailboxId.value,
expires_at: feed.expiresAt,
pendingConfirmation: feed.pendingConfirmation,
hasNativeFeed: feed.hasNativeFeed(),
emailCount: feed.emailCount,
lastEmailAt: feed.lastEmailAt,
};
}
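Review note: the diff does not show how the aggregate computes `emailCount` and `lastEmailAt`, only that `toListItemDTO` reads them. A minimal sketch of one way to derive both from the email index follows. The `EmailRef` type and the `projectEmailActivity` helper are hypothetical names introduced for illustration. Taking the maximum `receivedAt` is also an assumption; the real accessor may instead rely on the index being ordered.

```typescript
// Hypothetical shape of one entry in the feed's email index (mirrors the
// fields used by the mapper test: key, subject, receivedAt in ms).
interface EmailRef {
  key: string;
  subject: string;
  receivedAt: number;
}

// Hypothetical helper: derives the two list-item fields from the index.
// Uses max(receivedAt) so the result does not depend on index ordering.
function projectEmailActivity(emails: readonly EmailRef[]): {
  emailCount: number;
  lastEmailAt: number | undefined;
} {
  let lastEmailAt: number | undefined;
  for (const email of emails) {
    if (lastEmailAt === undefined || email.receivedAt > lastEmailAt) {
      lastEmailAt = email.receivedAt;
    }
  }
  return { emailCount: emails.length, lastEmailAt };
}

const activity = projectEmailActivity([
  { key: "k2", subject: "b", receivedAt: 1700000000000 },
  { key: "k1", subject: "a", receivedAt: 1600000000000 },
]);
console.log(activity.emailCount, activity.lastEmailAt);
```

With an empty index this yields `emailCount: 0` and `lastEmailAt: undefined`, which matches the values the "empty feed aggregate" test expects in the projected list item.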
+3 -24
@@ -87,14 +87,7 @@ export class FeedRepository {
await Promise.all([
this.putConfig(feed.id, toConfigDTO(feed.state())),
this.putMetadata(feed.id, feed.toMetadataSnapshot()),
this.upsertListEntry(
toListItemDTO(
feed.id,
feed.state(),
feed.pendingConfirmation,
feed.hasNativeFeed(),
),
),
this.upsertListEntry(toListItemDTO(feed)),
this.putInboundIndex(feed.mailboxId, feed.id),
]);
}
@@ -108,14 +101,7 @@ export class FeedRepository {
async saveMetadata(feed: Feed): Promise<void> {
await Promise.all([
this.putMetadata(feed.id, feed.toMetadataSnapshot()),
this.upsertListEntry(
toListItemDTO(
feed.id,
feed.state(),
feed.pendingConfirmation,
feed.hasNativeFeed(),
),
),
this.upsertListEntry(toListItemDTO(feed)),
]);
}
@@ -127,14 +113,7 @@ export class FeedRepository {
async saveConfig(feed: Feed): Promise<void> {
await Promise.all([
this.putConfig(feed.id, toConfigDTO(feed.state())),
this.upsertListEntry(
toListItemDTO(
feed.id,
feed.state(),
feed.pendingConfirmation,
feed.hasNativeFeed(),
),
),
this.upsertListEntry(toListItemDTO(feed)),
this.putInboundIndex(feed.mailboxId, feed.id),
]);
}
+19
@@ -104,6 +104,25 @@ describe("processEmailContent — attribute sanitization", () => {
const result = processEmailContent(html);
expect(result).toContain("https://example.com");
});
it("escapes bare ampersands in attribute URLs (W3C feed-valid HTML)", () => {
const html =
'<body><a href="https://example.com/?a=1&b=2&utm_source=x">link</a></body>';
const result = processEmailContent(html);
expect(result).toContain(
"https://example.com/?a=1&amp;b=2&amp;utm_source=x",
);
expect(result).not.toMatch(/&(?!amp;)/);
});
it("does not double-escape existing entities", () => {
const html =
'<body><p>Tom &amp; Jerry &#39; &lt;tag&gt;</p><a href="https://x.com/?q=a&amp;b">l</a></body>';
const result = processEmailContent(html);
expect(result).toContain("Tom &amp; Jerry");
expect(result).not.toContain("&amp;amp;");
expect(result).toContain("?q=a&amp;b");
});
});
describe("processEmailContent — mso style cleanup", () => {
+13 -1
@@ -159,6 +159,18 @@ function isPlainText(content: string): boolean {
return !/<[a-z][\s\S]*>/i.test(content);
}
// linkedom escapes `&` in text nodes but not in attribute values, so a URL like
// `?a=1&b=2` serializes with bare ampersands. That's valid XML inside the feed's
// CDATA, but the W3C feed validator parses the embedded HTML and warns
// ("Named entity expected. Got none."). Escape every `&` that doesn't already
// start an entity-shaped reference (named, decimal, or hex), so `&amp;` and
// `&#39;` stay intact.
function escapeBareAmpersands(html: string): string {
return html.replace(
/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x[0-9a-fA-F]+);)/g,
"&amp;",
);
}
function rewriteCidSrc(
el: Element,
cidMap: Map<string, AttachmentData>,
@@ -261,5 +273,5 @@ export function processEmailContent(
// Full documents expose a <body>; bodyless fragments are serialized directly
// so that sanitization and cid rewriting still apply to their nodes.
const body = document.querySelector("body");
return body ? body.innerHTML : document.toString();
return escapeBareAmpersands(body ? body.innerHTML : document.toString());
}
+70
@@ -1389,6 +1389,76 @@ describe("Admin Routes", () => {
expect(body).toContain("pill-confirmation");
});
it("dashboard shows email count badge and last-email line in both views", async () => {
const authCookie = await loginAndGetCookie();
const repo = FeedRepository.from(mockEnv as unknown as Env);
const feedId = FeedId.generate();
const mailboxId = MailboxId.unchecked("count.dash.07");
const feed = Feed.create(
feedId,
{
title: "Counted Feed",
language: "en",
allowedSenders: [],
blockedSenders: [],
},
{ mailboxId },
);
await repo.save(feed);
for (let i = 0; i < 2; i++) {
const emailKey = repo.newEmailKey(feedId);
await repo.putEmail(emailKey, {
subject: `Email ${i}`,
from: "newsletter@example.com",
content: "<p>hi</p>",
receivedAt: Date.now(),
headers: {},
});
feed.ingest(
{ key: emailKey, subject: `Email ${i}`, receivedAt: Date.now() },
{ maxBytes: 1_000_000 },
);
}
await repo.saveMetadata(feed);
for (const view of ["table", "list"]) {
const res = await request(`/admin?view=${view}`, {
headers: { Cookie: authCookie },
});
expect(res.status).toBe(200);
const body = await res.text();
expect(body).toContain('class="button-count">2<');
expect(body).toContain("Last email");
}
});
it("dashboard shows 'No emails yet' for a feed with zero emails", async () => {
const authCookie = await loginAndGetCookie();
const repo = FeedRepository.from(mockEnv as unknown as Env);
const feedId = FeedId.generate();
const feed = Feed.create(
feedId,
{
title: "Empty Feed",
language: "en",
allowedSenders: [],
blockedSenders: [],
},
{ mailboxId: MailboxId.unchecked("empty.dash.08") },
);
await repo.save(feed);
const res = await request("/admin?view=list", {
headers: { Cookie: authCookie },
});
const body = await res.text();
expect(body).toContain("No emails yet");
expect(body).toContain('class="button-count">0<');
});
it("feed emails page shows confirmation-banner when pendingConfirmation is true", async () => {
const authCookie = await loginAndGetCookie();
const repo = FeedRepository.from(mockEnv as unknown as Env);
+15 -1
@@ -14,6 +14,8 @@ import {
CheckIcon,
FeedFormats,
ExpiryBadge,
LastEmail,
EmailCountBadge,
} from "./admin/ui";
import { FeedRepository } from "../infrastructure/feed-repository";
import { FeedId } from "../domain/value-objects/feed-id";
@@ -628,7 +630,7 @@ app.get("/", async (c) => {
height="20"
loading="lazy"
/>
<div>
<div class="feed-title-cell-text">
<strong class="truncate" title={titleHover}>
{titleDisplay}
</strong>
@@ -641,6 +643,10 @@ app.get("/", async (c) => {
{descDisplay}
</div>
)}
<LastEmail
at={feed.lastEmailAt}
count={feed.emailCount}
/>
</div>
{feed.pendingConfirmation && (
<ConfirmationPill feedId={feed.id} />
@@ -683,6 +689,7 @@ app.get("/", async (c) => {
tabindex={-1}
>
Emails
<EmailCountBadge count={feed.emailCount} />
</span>
</>
) : (
@@ -698,6 +705,7 @@ app.get("/", async (c) => {
class="button button-small"
>
Emails
<EmailCountBadge count={feed.emailCount} />
</a>
</>
)}
@@ -780,6 +788,10 @@ app.get("/", async (c) => {
<span title={descHover}>{descDisplay}</span>
</p>
)}
<LastEmail
at={feed.lastEmailAt}
count={feed.emailCount}
/>
</div>
<div style="margin-bottom: var(--spacing-md);">
@@ -819,6 +831,7 @@ app.get("/", async (c) => {
tabindex={-1}
>
Emails
<EmailCountBadge count={feed.emailCount} />
</span>
</>
) : (
@@ -834,6 +847,7 @@ app.get("/", async (c) => {
class="button button-small"
>
Emails
<EmailCountBadge count={feed.emailCount} />
</a>
</>
)}
+35
@@ -325,3 +325,38 @@ export const ExpiryBadge = ({ expiresAt }: { expiresAt: number }) => {
</span>
);
};
// ── Email activity ──────────────────────────────────────────────────────────────
function formatRelativeTime(ts: number): string {
const diff = Date.now() - ts;
if (diff < 60_000) return "just now";
const m = Math.floor(diff / 60_000);
if (m < 60) return `${m}m ago`;
const h = Math.floor(m / 60);
if (h < 24) return `${h}h ago`;
const d = Math.floor(h / 24);
if (d < 30) return `${d}d ago`;
const mo = Math.floor(d / 30);
if (mo < 12) return `${mo}mo ago`;
return `${Math.floor(mo / 12)}y ago`;
}
// Count badge rendered inside the "Emails" button. Omitted for legacy feeds
// whose count hasn't been projected into feeds:list yet (backfills on next save).
export const EmailCountBadge = ({ count }: { count?: number }) =>
count === undefined ? null : <span class="button-count">{count}</span>;
// Muted "last email" freshness line for the feed title block. Shows "No emails
// yet" for empty feeds; renders nothing when the timestamp isn't projected yet.
export const LastEmail = ({ at, count }: { at?: number; count?: number }) => {
if (count === 0) {
return <span class="feed-activity muted">No emails yet</span>;
}
if (at === undefined) return null;
return (
<span class="feed-activity muted" title={new Date(at).toLocaleString()}>
Last email {formatRelativeTime(at)}
</span>
);
};
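Review note: `formatRelativeTime` reads `Date.now()` directly, which makes its bucket boundaries awkward to unit-test. The sketch below keeps the same arithmetic but adds an optional `now` parameter. That parameter is a hypothetical addition for illustration and is not part of this diff.

```typescript
// Same bucketing as the diff; `now` is a hypothetical injectable clock
// (defaults to Date.now()) so the boundaries can be tested deterministically.
function formatRelativeTime(ts: number, now: number = Date.now()): string {
  const diff = now - ts;
  if (diff < 60_000) return "just now"; // also covers future timestamps
  const m = Math.floor(diff / 60_000);
  if (m < 60) return `${m}m ago`;
  const h = Math.floor(m / 60);
  if (h < 24) return `${h}h ago`;
  const d = Math.floor(h / 24);
  if (d < 30) return `${d}d ago`;
  const mo = Math.floor(d / 30); // months are approximated as 30 days
  if (mo < 12) return `${mo}mo ago`;
  return `${Math.floor(mo / 12)}y ago`; // so a "year" is 360 days here
}

const NOW = 1_700_000_000_000; // fixed reference instant for the examples
const DAY = 86_400_000;

console.log(formatRelativeTime(NOW - 59_000, NOW)); // "just now"
console.log(formatRelativeTime(NOW - 5 * 60_000, NOW)); // "5m ago"
console.log(formatRelativeTime(NOW - 3 * 3_600_000, NOW)); // "3h ago"
console.log(formatRelativeTime(NOW - 29 * DAY, NOW)); // "29d ago"
console.log(formatRelativeTime(NOW - 45 * DAY, NOW)); // "1mo ago"
console.log(formatRelativeTime(NOW - 400 * DAY, NOW)); // "1y ago"
```

Because months are counted as 30 days, the label switches from "11mo ago" to "1y ago" at 360 days rather than 365. That is likely fine for a muted freshness line, but worth knowing when writing assertions.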
+27
@@ -77,6 +77,33 @@
gap: var(--spacing-sm);
}
/* Let the title/description text shrink so .truncate ellipsizes instead of
overflowing into the next column. Flex items default to min-width:auto. */
.feed-title-cell-text {
flex: 1;
min-width: 0;
}
/* "Last email …" freshness line under the feed title. */
.feed-activity {
display: block;
margin-top: 4px;
font-size: var(--font-size-sm);
}
/* Count badge inside the "Emails" button (always on the orange primary button,
incl. its faded disabled variant, so a light-on-dark badge fits both modes). */
.button-count {
display: inline-block;
margin-left: 6px;
padding: 0 6px;
border-radius: 999px;
background: rgba(255, 255, 255, 0.22);
font-size: var(--font-size-xs);
font-weight: var(--font-weight-semibold);
line-height: 1.5;
}
.feed-description {
font-size: var(--font-size-md);
color: var(--color-text-secondary);
+2
@@ -111,6 +111,8 @@ export interface FeedListItem {
expires_at?: number; // Cached from FeedConfig to avoid per-feed KV reads
pendingConfirmation?: boolean; // Projected from FeedMetadata for the dashboard
hasNativeFeed?: boolean; // Projected from FeedMetadata for the dashboard pill
emailCount?: number; // Projected email index size (dashboard "Emails" count)
lastEmailAt?: number; // Projected receivedAt (ms) of the most recent email
}
// Cumulative monitoring counters (persisted as a KV singleton)