# email-top-tasks
## When to invoke
Trigger this skill when the user asks any of:
- "list the top N tasks from my email"
- "what do I need to act on from the last X days"
- "summarize my email inbox into action items"
- anything that couples email_store data + local LM Studio + ranked task list
## Pre-flight (non-negotiable)
- Confirm Postgres is reachable: host=192.168.1.33, port=54522, db=email_store. Credentials are in `~/Developer/EMail CLI/email-store/.env`.
- Confirm LM Studio is reachable: `curl -s http://192.168.1.235:1234/v1/models`.
- If the user names a model other than `google/gemma-4-e4b`, set the `MODEL` env var.
## Known infra pitfalls (learned the hard way — do NOT re-discover)
| Symptom | Root cause | Fix |
|---|---|---|
| `NotImplementedError: RotatingKVCache Quantization NYI` | LM Studio's KV-cache quantisation toggle does not support Gemma's sliding-window rotating cache once total tokens exceed ~5000. | Keep each request's prompt + max_tokens ≤ ~4600. Split into cohorts. |
| Output empty, `reasoning_tokens` equals `completion_tokens` | Gemma's thinking phase consumes the entire output budget before producing any content. | Add a third message `{"role":"assistant","content":"1."}` as a prefill — the model resumes from "1." and skips the think phase. Prepend "1." back when parsing. |
| "The number of tokens to keep from the initial prompt is greater than the context length" | Target model loaded with a smaller context than requested. | Either shrink the input or pick a model loaded with ≥10k context (gemma-4-e4b is). |
| "Model loading was stopped due to insufficient system resources" | The machine cannot hold the model. | Fall back to a smaller loaded model (gemma-4-e4b, gpt-oss-20b, qwen3-vl-8b). |
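The prefill workaround from the table can be sketched as below — a minimal sketch assuming an OpenAI-compatible `/v1/chat/completions` endpoint; the function names are illustrative, not taken from run.py:

```python
import json
import urllib.request

LM_STUDIO = "http://192.168.1.235:1234/v1/chat/completions"

def build_prefilled_payload(system: str, user: str,
                            model: str = "google/gemma-4-e4b") -> dict:
    """Append an assistant prefill of "1." so the model resumes the
    numbered list directly instead of spending budget on a think phase."""
    return {
        "model": model,
        "max_tokens": 2300,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": "1."},  # prefill: model continues from here
        ],
    }

def parse_completion(raw: str) -> str:
    """The response text omits the prefilled "1.", so prepend it before parsing."""
    return "1." + raw

def post(payload: dict) -> dict:
    """Send one request to the LM Studio server and return the decoded JSON."""
    req = urllib.request.Request(
        LM_STUDIO,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Whether a trailing assistant message is honored as a continuation prefill depends on the loaded chat template; verify against the actual server before relying on it.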
## Pipeline shape
Three Postgres queries → three LM Studio calls → merge + dedup + renumber.
- Cohort A (`work`/`personal`/`travel`/`invoice`/`education`/`other` with non-empty `ai_action_items`): up to 50 tasks.
- Cohort B (upcoming `subscriptions`, `v_email_transaction_amounts` big TX, `notification` emails with `pay|due|bill|invoice|refund|tax` action items): up to 50 tasks.
- Cohort C (`email_threads.is_awaiting_reply=true` filtered against a newsletter regex, plus `notification` emails with `rsvp|confirm|verify|deliver|renew|expire|deadline` action items): up to 40 tasks.
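The keyword filters in cohorts B and C can be expressed as case-insensitive regexes. A sketch, assuming action items and subjects arrive as plain strings; the newsletter-exclusion pattern here is a hypothetical placeholder, not the real pattern from the cohort-C SQL:

```python
import re

# Keyword filters from the cohort definitions above (case-insensitive).
COHORT_B_KEYWORDS = re.compile(r"pay|due|bill|invoice|refund|tax", re.I)
COHORT_C_KEYWORDS = re.compile(r"rsvp|confirm|verify|deliver|renew|expire|deadline", re.I)

# Hypothetical placeholder — the real exclusion regex lives in the cohort-C query.
NEWSLETTER_RE = re.compile(r"newsletter|unsubscribe|digest", re.I)

def excluded_as_newsletter(subject: str) -> bool:
    """Drop awaiting-reply threads whose subject looks like a newsletter."""
    return bool(NEWSLETTER_RE.search(subject))

def has_cohort_c_action(action_item: str) -> bool:
    """Keep notification emails whose action item hits a cohort-C keyword."""
    return bool(COHORT_C_KEYWORDS.search(action_item))
```

In Postgres the same patterns map to `~*` (case-insensitive regex match) in the WHERE clause.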
Dedup by the lowercase alphanumeric prefix of each task's first 50 characters, keep the first 130 survivors, and renumber 1..130.
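The merge step can be sketched as follows — a minimal sketch assuming each cohort yields a list of plain task strings (`merge_cohorts` is an illustrative name, not from run.py):

```python
import re

def merge_cohorts(*cohorts: list[str], limit: int = 130) -> list[str]:
    """Dedup by the lowercase alphanumeric prefix of the first 50 chars,
    keep the first `limit` survivors, and renumber them 1..limit."""
    seen: set[str] = set()
    merged: list[str] = []
    for cohort in cohorts:
        for task in cohort:
            key = re.sub(r"[^a-z0-9]", "", task[:50].lower())
            if key in seen:
                continue
            seen.add(key)
            merged.append(task)
    return [f"{i}. {t}" for i, t in enumerate(merged[:limit], 1)]
```

Because cohorts are concatenated in A, B, C order, a task that appears in two cohorts survives in the earlier one.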
## Budget per call (keep total tokens ≤ 4600)
- input ≈ 1100–1700 tokens
- max_tokens = 2300
- reasoning_tokens = 0 ← guaranteed via the prefill trick

## Reference script
`~/.claude/skills/email-top-tasks/run.py` — copy it and adjust `N_TASKS`, `WINDOW_DAYS`, and the three SQL slices to tune the output.
## Invocation

```bash
MODEL=google/gemma-4-e4b python3 ~/.claude/skills/email-top-tasks/run.py
# Output goes to stdout and /tmp/top130_tasks.txt
```

## Measured performance (baseline, 2026-04-20, David's LM Studio @ 192.168.1.235, gemma-4-e4b)
Full 3-call pipeline, 130 tasks output:
| Metric | Value |
|---|---|
| Wall time (3 sequential calls) | 150.90 s |
| Cohort A usage | prompt=2491, completion=1995, total=4486 |
| Cohort B usage | prompt=2376, completion=1919, total=4295 |
| Cohort C usage | prompt=1886, completion=1819, total=3705 |
| Total tokens processed | 12 486 |
| Completion tokens only | 5 733 |
| Completion throughput | ≈ 38 tok/s |
| End-to-end throughput | ≈ 82.7 tok/s |
| Tasks emitted | 130 (100 % of target) |
Per-call average ≈ 50 s. If the user wants faster wall time, issue the 3 calls in parallel — they are independent, and the server can serve them back-to-back; expect ~55–70 s end-to-end.
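The parallel variant can be sketched with a thread pool — assuming a `run_cohort` callable that wraps one LM Studio request (an illustrative name, not from run.py):

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline_parallel(run_cohort, prompts):
    """Issue the cohort calls concurrently; pool.map preserves the order
    of `prompts`, so the downstream merge and renumbering stay deterministic."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(run_cohort, prompts))
```

The actual speedup depends on whether the server accepts concurrent requests or queues them; the measured ~55–70 s suggests partial overlap rather than full parallelism.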
## Open issue surfaced while running
`email_threads.is_awaiting_reply` in this project is heavily polluted with newsletter / security-alert subjects (e.g. "New login from Chrome", "Cricut Craftfest", "Koala x Bluey collection"). The classification logic in `scripts/reconstruct_threads.py` needs tighter filters. The skill works around it with a regex exclusion in the cohort-C query, but the data pipeline itself should be fixed.