Condensed prose, not code. This page shows every major LLM call, what goes in, the actual prompt shape, what tool/client is used, what must come out, and then every major processing step after generation. Goal: make the full factory legible enough to debug where quality is slipping.
| Layer | What it does | Main code |
|---|---|---|
| Preflight | Optional research bootstrap + audience persona generation | content_cli.py, research_integration.py, persona_generator.py |
| Phase 1 | 9 writing agents generate the article ingredients | core/dispatcher.py + core/prose_runner.py |
| Phase 2 | Assembler turns components into article text; humanizer modifies it | core/assembler.py, core/humanizer.py |
| Phase 3 | Audience/persona evaluation scores the result | content_evaluator.py, core/pipeline.py |
| Review | Slop audit, manual R/Y/G review, publishing decision | core/slop_auditor.py + review site |
Input: the content brief, derived audience, and a planned 3-cone query set: macro, mezzo, micro.
Actual prompt shape:
System: “You are a research analyst. Search the open web and return a concise JSON with: sources (array of {title,url}), keyFindings (array), and 140-char summary.”
User: “Research request (focus): query”
Output requirement: cached Perplexity response, stored as assistant_text plus raw response. This is not yet article prose; it is research fuel.
What changes after this step: the run gets external research context and suggested queries. If quota is exhausted, the system degrades to a minimal response instead of failing hard.
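The degrade-instead-of-fail behavior can be sketched like this (function and field names are hypothetical stand-ins, not the actual research_integration.py API; `client=None` simulates exhausted quota):

```python
import json

def run_research(query: str, client=None) -> dict:
    """Return research fuel, degrading to a minimal stub when quota is gone."""
    fallback = {"sources": [], "keyFindings": [], "summary": "", "degraded": True}
    if client is None:  # quota exhausted or Perplexity unreachable
        return fallback
    try:
        raw = client.search(query)  # hypothetical wrapper method
        return {**json.loads(raw), "degraded": False}
    except Exception:
        return fallback
```

Downstream agents can check the `degraded` flag instead of crashing on a missing research bundle.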
Input: brief, primary audience label, and a compact summary of the research bundle.
Actual prompt shape:
“You are designing audience personas to evaluate an article… Create 12 personas grouped as core (3), expanded (6), adjacent (3)… Each persona should include name, role, goals, pains, content preferences, evaluation criteria, success signal. Return a single JSON object.”
Tool(s) called: OpenRouter only. Then an anti-persona generator creates contrarian readers for stress tests.
Output requirement: JSON personas file in workspace/current/evaluation/personas.json.
What changes after this step: the evaluator gets real-ish readers instead of static rubric-only scoring.
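Sanity-checking that personas file might look roughly like this (the snake_case field keys are assumptions; the prompt only lists the fields in prose):

```python
REQUIRED_FIELDS = {"name", "role", "goals", "pains", "content_preferences",
                   "evaluation_criteria", "success_signal"}
GROUP_SIZES = {"core": 3, "expanded": 6, "adjacent": 3}

def validate_personas(payload: dict) -> list:
    """Flag deviations from the 3/6/3 grouping and required persona fields."""
    problems = []
    for group, expected in GROUP_SIZES.items():
        personas = payload.get(group, [])
        if len(personas) != expected:
            problems.append(f"{group}: expected {expected}, got {len(personas)}")
        for persona in personas:
            missing = REQUIRED_FIELDS - set(persona)
            if missing:
                problems.append(f"{persona.get('name', '?')}: missing {sorted(missing)}")
    return problems
```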
All nine agents are fed a readable context block, not just raw JSON. The prompt builder puts the content brief at the top, then product/audience metadata, SEO queries, previous output, and critic feedback where relevant. Soul text is inserted before the agent prompt when available.
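That ordering can be sketched as follows (a simplified stand-in for the real prompt builder, not its actual signature or labels):

```python
def build_context_block(brief: str, metadata: str, seo_queries: list,
                        previous_output: str = "", critic_feedback: str = "",
                        soul: str = "") -> str:
    """Assemble the readable context block: soul first when available, then
    brief, product/audience metadata, SEO queries, prior output, critic notes."""
    parts = []
    if soul:
        parts.append(f"SOUL:\n{soul}")
    parts.append(f"CONTENT BRIEF:\n{brief}")
    parts.append(f"PRODUCT/AUDIENCE:\n{metadata}")
    parts.append("SEO QUERIES:\n" + "\n".join(f"- {q}" for q in seo_queries))
    if previous_output:
        parts.append(f"PREVIOUS OUTPUT:\n{previous_output}")
    if critic_feedback:
        parts.append(f"CRITIC FEEDBACK:\n{critic_feedback}")
    return "\n\n".join(parts)
```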
Input: content brief + topic metadata.
Actual prompt:
“Define a clear, testable goal for the article based on the brief. Use the framework: When [Audience] reads this on [Platform] they will [Feel/Do] because [Payoff]. Apply Brooks' Warm Heart / Cool Head test. Propose a one-sentence success metric.”
Output requirement: JSON with goalStatement, warmHeart, coolHead, successMetric, context.
Input: brief + audience description.
Actual prompt:
“Synthesize a human audience model from the topic + niche… Produce 20 jobs-to-be-done, 3 top drivers, and 4 avatars with lived context, trigger event, shower-thought question, rewards, and hates. Avoid generic personas. Make them feel real.”
Output requirement: JSON only: JTBD list, top drivers, 4 avatars.
Input: brief + audience + any product context.
Actual prompt:
“Select ideal authorial voices and synthesize a style profile. List 5–7 apex writers; apply King's honesty and Adams' attention filters. Select top-3 with 30-word style synopses. Produce a concise voiceProfile capturing diction, pacing, tone, and POV guidance.”
Output requirement: JSON with apex writers, top 3, and voice profile.
Input: brief + audience model + soul/voice profile.
Generator prompt:
“Generate titles + hooks that match the audience and the writer SOUL… Produce 10 titles and 10 hooks… Optimize for signal, taste, edge… Avoid generic scaffolding like ‘Have you ever wondered…’ or ‘In today’s world…’”
Critic prompt:
“Score each title/hook combo on CRAP: Clear, Relevant, At-a-glance, Provocative. Return JSON: {score, top3, improvements}.”
Output requirement: top 3 title/hook combos plus critic scores. Loop stops at score ≥ 85 or round 3.
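The generator–critic loop here (the same shape recurs for fragments, the draft, and the editor pass) reduces to a few lines; a sketch, with `generate`/`critique` as stand-ins for the actual agent calls:

```python
def refine(generate, critique, threshold=85, max_rounds=3):
    """Run generator -> critic rounds until the score clears the threshold
    or the round budget is spent; return the best candidate seen."""
    feedback = None
    best = None
    for _ in range(max_rounds):
        candidate = generate(feedback)   # agent call, fed prior critic notes
        review = critique(candidate)     # e.g. {"score": 90, "top3": [...]}
        if best is None or review["score"] > best[1]:
            best = (candidate, review["score"])
        if review["score"] >= threshold:
            break
        feedback = review                # feed notes into the next round
    return best
```

The later draft and editor loops differ only in threshold (85 vs. 88) and in what the critic JSON carries.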
Input: brief, audience, voice, any research already available.
Generator prompt:
“Produce 40–50 fragments (anecdotes, stats, quotes, steps) aligned to the goal. Label Story / Science / Steps. Score each for relevance/impact and dedupe near-duplicates.”
Critic prompt:
“Check fragment set for diversity, specificity, and voice match. Penalize repetition and generic phrasing. Return JSON: {score, repetitions, gaps, fix_brief}.”
Output requirement: fragment set plus critic notes. Loop stops at score ≥ 85 or round 3.
Input: brief, audience, and the external research bundle if Perplexity was available.
Actual prompt:
“Plan and perform research; highlight gaps to fill. Three-cone research: 3 seminal (macro), 5 recent (mezzo), 15 social (micro). Each with 140-char summaries and links when available. Identify gaps; propose targeted queries.”
Output requirement: JSON with macro / mezzo / micro research arrays, gaps, and context.
Input: all previous planning ingredients.
Actual prompt:
“Propose and rank 3 outline structures best suited to goal and audience. Consider templates like Listicle, Problem–Solution, Myth–Buster (or better fits). Build outlines with H2/H3 headings and bullet summaries. Rank by fit and explain ranking.”
Output requirement: JSON with 3 ranked structures and rationale.
Input: topic assignment, audience summary, soul, selected title/hook, research bullets, prior audience notes if revising.
Writer prompt (core excerpt):
“Write the canonical longform blog post… Length must land 1800–2800 words… Open with a scene or concrete claim. Name the promise. Build an argument spine. Use one memorable framework. Include counterpressure. End with a landing… Return JSON only with title, dek, article_markdown, word_count, section_outline, notes_for_next_iteration.”
Critic prompt:
“Review the draft for clarity, structure, accuracy, and voice match. Score 0–100 with penalties for AI-smell. Return JSON: {score, must_fix, should_fix, notes, revision_brief}.”
Output requirement: canonical markdown draft + revision notes. Loop stops at score ≥ 85 or round 3.
Input: draft plus all earlier context.
Editor prompt:
“Apply persuasion overlay, emotional arc, and readability proofing. Persuasion: contrast, repetition, vivid examples; mark placements. Emotional arc: map curiosity → insight → resolution. Readability: clarity, flow, jargon trimming; suggest cuts/additions.”
Critic prompt:
“Evaluate edited output for flow, voice consistency, and AI-smell. Score 0–100 and flag any robotic phrasing. Return JSON: {score, robotic_lines, must_fix, notes}.”
Output requirement: polished markdown plus editor notes. Loop stops at score ≥ 88 or round 3.
What changes: picks the chosen title, pulls article_markdown from draft_creator, and writes the initial article text.
Rules now: editor telemetry is no longer appended into article text. CTA, persuasion overlay, emotional arc, readability, timestamps, and editor suggestions are written to the article_meta.json sidecar instead.
Known struggle: duplicate H1 can still happen if the draft already contains the title and the assembler prepends another top title.
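A guard for that duplicate-H1 case might look like this (a sketch; the assembler's actual check may differ):

```python
def ensure_single_h1(title: str, body_md: str) -> str:
    """Prepend the chosen title only when the draft doesn't already open with it."""
    first_line = body_md.lstrip().split("\n", 1)[0].strip()
    if first_line == f"# {title}":
        return body_md  # draft already carries the H1; don't double it
    return f"# {title}\n\n{body_md}"
```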
What changes: swaps inline source IDs like [ABC-123] for titled markdown links, then appends a references section if references were used.
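The ID-to-link swap plus appended references section can be sketched as follows (the `[ABC-123]` ID shape and the `refs` mapping are assumptions about the data format):

```python
import re

def link_sources(text: str, refs: dict) -> str:
    """Replace [ABC-123]-style source IDs with titled markdown links,
    then append a references section if any were resolved."""
    used = []
    def repl(match):
        ref = refs.get(match.group(1))
        if not ref:
            return match.group(0)  # unknown IDs are left untouched
        used.append(match.group(1))
        return f"[{ref['title']}]({ref['url']})"
    out = re.sub(r"\[([A-Z]+-\d+)\]", repl, text)
    if used:
        lines = [f"- [{refs[i]['title']}]({refs[i]['url']})"
                 for i in dict.fromkeys(used)]  # de-dupe, keep first-use order
        out += "\n\n## References\n" + "\n".join(lines)
    return out
```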
Input: the assembled article plus the brief.
Actual prompt shape:
“You extract likely jargon for an article and recommend concise alternatives. Return JSON only following the schema: likely_jargon [{term, reason, suggested_alternatives}], forbidden_rewrites, borderline_terms… Brief: … Article excerpt (first 1200 chars): …”
Tool(s) called: OpenRouter via JSON guard, plus heuristic word list fallback.
Output requirement: list of suspect terms and simpler replacements; then the processor rewrites matching terms in the article.
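The rewrite step with its heuristic fallback might be sketched like this (the word list and mapping shape are illustrative, not the shipped list):

```python
import re

FALLBACK_JARGON = {"leverage": "use", "utilize": "use", "synergy": "combined effect"}

def rewrite_jargon(article: str, llm_mapping=None) -> str:
    """Apply LLM-suggested replacements when present; otherwise fall back
    to the heuristic word list. Whole-word, case-insensitive matches only."""
    mapping = llm_mapping or FALLBACK_JARGON
    for term, simpler in mapping.items():
        article = re.sub(rf"\b{re.escape(term)}\b", simpler, article, flags=re.I)
    return article
```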
Input: the full article, voice signature, optional quantified profile deltas.
Actual system prompt shape:
“You are a careful style editor making TARGETED adjustments to match a quantitative voice profile… Apply ONLY the specific edits listed below… Do NOT change meaning, facts, section headings, list structure, or formatting… Do NOT compress, summarize, or shorten.”
Output requirement: full adjusted text only. If output compresses too much, the original text is kept.
Known struggle: this pass has been implicated in over-compression and weird connective damage in some runs.
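One plausible shape for the keep-the-original guard mentioned above (the 0.9 ratio is an assumption, not the shipped threshold):

```python
def guarded_style_edit(original: str, edited: str, min_ratio: float = 0.9) -> str:
    """Accept the voice-adjusted text only if it hasn't shrunk past the
    compression guard; otherwise keep the original untouched."""
    if len(edited) < min_ratio * len(original):
        return original  # over-compressed: discard the style pass
    return edited
```

Logging which runs trip this guard would be one way to localize the over-compression struggle noted above.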
What changes: regex substitutions strip banned patterns (“As an AI”, “In conclusion”, “Let’s dive in”, etc.), then sentence variety/voice checks try to reduce robotic rhythm.
Known struggle: this sentence-variety logic likely contributes to the broken “X, and Y. Z, and …” connective damage in bad Kenro runs.
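The banned-pattern strip in this pass can be sketched with a small sample of patterns (the real humanizer list is longer):

```python
import re

BANNED_PATTERNS = [
    r"^As an AI.*?[.!]\s*",
    r"\bIn conclusion,?\s*",
    r"\bLet's dive in[.!]?\s*",
]

def strip_banned(text: str) -> str:
    """Regex-strip known AI-smell phrases from the article text."""
    for pattern in BANNED_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE | re.MULTILINE)
    return text
```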
What changes: if human feedback is enabled, the terminal asks for a review signal before continuing. This was the original justification for some editor-note plumbing.
What changes: if enabled, compares article to voice samples and can harmonize style. Currently usually off in production runs.
Input: full article plus agent contributions, persona profile, human context, voice context.
Actual prompt excerpt:
“You are reading a content article after iteration X of development. Your job is to react in-character as this persona (first-person voice). This is not a generic rubric… Tell us what you felt as you read: where you leaned in, where you zoned out, where you rolled your eyes… Return JSON with overall_score, detailed_scores, reaction, strengths, weaknesses, recommendation.”
Output requirement: persona-by-persona JSON evaluations, aggregated into overall score, gate score, trend analysis, and per-agent feedback history.
What changes: rolls up scores, improvement trend, cumulative averages, and narrative metrics. Phase 3 currently passes if quality passes and audience score is at least 20. This is one reason internal final scores can look over-optimistic.
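That permissive gate reduces to something like this (function name is hypothetical; the 20-point floor is the behavior described above):

```python
def phase3_passes(quality_ok: bool, audience_score: float, floor: float = 20) -> bool:
    """Current Phase 3 gate: quality must pass AND audience score >= 20/100.
    The low floor is why internal final scores can look over-optimistic."""
    return quality_ok and audience_score >= floor
```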
What changes: scores the article against 100+ slop rules and remnant patterns. It can run in audit mode, clean mode, and JSON mode.
Rules summary: LLM Tell, Corporate, Filler, AI Phrase, Overwrought, Remnant. Hard-fail on any pipeline leakage. Also checks duplicate H1, target drift, and trailing scaffolding.
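The rule families can be sketched as a simple pattern counter (the patterns shown are illustrative examples, not the 100+ shipped rules):

```python
import re

RULE_FAMILIES = {
    "LLM Tell": [r"\bdelve\b", r"\btapestry\b"],
    "Filler": [r"\bit's worth noting\b"],
    "Remnant": [r"\[EDITOR NOTE", r"notes_for_next_iteration"],
}

def audit(text: str) -> dict:
    """Count hits per rule family; any Remnant hit would hard-fail the audit."""
    return {family: sum(len(re.findall(p, text, re.I)) for p in patterns)
            for family, patterns in RULE_FAMILIES.items()}
```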
Important current state: this exists, but is not yet fully baked into the pipeline loop. It is a backstop, not the primary gate.
The reported final_score: 100 is not trustworthy as a real editorial signal.

This is the condensed factory map. It is intentionally prose-first, but every step above is grounded in the current repo prompts and orchestration code.