Enterprise-grade research with multi-source synthesis, citation tracking, and verification. 8-phase pipeline with auto-continuation.
reference/methodology.md
# Deep Research Methodology: 8-Phase Pipeline

## Overview

This document contains the detailed methodology for conducting deep research. The 8 phases represent a comprehensive approach to gathering, verifying, and synthesizing information from multiple sources.

---

## Phase 1: SCOPE - Research Framing

**Objective:** Define research boundaries and success criteria

**Activities:**
1. Decompose the question into core components
2. Identify stakeholder perspectives
3. Define scope boundaries (what's in/out)
4. Establish success criteria
5. List key assumptions to validate

**Ultrathink Application:** Use extended reasoning to explore multiple framings of the question before committing to scope.

**Output:** Structured scope document with research boundaries

---

## Phase 2: PLAN - Strategy Formulation

**Objective:** Create an intelligent research roadmap

**Activities:**
1. Identify primary and secondary sources
2. Map knowledge dependencies (what must be understood first)
3. Create search query strategy with variants
4. Plan triangulation approach
5. Estimate time/effort per phase
6. Define quality gates

**Graph-of-Thoughts:** Branch into multiple potential research paths, then converge on optimal strategy.

**Output:** Research plan with prioritized investigation paths

---

## Phase 3: RETRIEVE - Parallel Information Gathering

**Objective:** Systematically collect information from multiple sources using parallel execution for maximum speed

**CRITICAL: Execute ALL searches in parallel using a single message with multiple tool calls**

### Query Decomposition Strategy

Before launching searches, decompose the research question into 5-10 independent search angles:

1. **Core topic (semantic search)** - Meaning-based exploration of main concept
2. **Technical details (keyword search)** - Specific terms, APIs, implementations
3. **Recent developments (date-filtered)** - What's new in last 12-18 months (use current date from Step 0)
4. **Academic sources (domain-specific)** - Papers, research, formal analysis
5. **Alternative perspectives (comparison)** - Competing approaches, criticisms
6. **Statistical/data sources** - Quantitative evidence, metrics, benchmarks
7. **Industry analysis** - Commercial applications, market trends
8. **Critical analysis/limitations** - Known problems, failure modes, edge cases

### Parallel Execution Protocol

**Step 0: Get the current date**

Before ANY searches, retrieve today's date using Bash: `date +%Y-%m-%d`
Use the returned year for all date-filtered queries and recency checks. Do NOT assume a year from training data.

**Step 1: Launch ALL searches concurrently (single message)**

**CRITICAL: Use correct tool and parameters to avoid errors**

**Primary: search-cli (multi-provider, always use first)**
- Unified CLI aggregating Brave, Serper, Exa, Jina, and Firecrawl
- Auto-detects best provider per query type (academic, news, general, people)
- JSON output for structured processing: `search "query" --json`
- Modes: general, news, academic, scholar, patents, people, images, extract, scrape
- Example: `search "quantum computing 2025" -m academic --json -c 15`
- For page content extraction: `search "URL" -m extract --json`
- For scraping: `search "URL" -m scrape --json`
- Run via Bash tool: `search "query" --json -c 10`

**Fallback: WebSearch (if search-cli fails or is unavailable)**
- Built-in Claude web search, no setup required
- Parameters: `query` (required), optional `allowed_domains`, `blocked_domains`
- Use when: search-cli returns errors, rate-limited, or for domain-restricted queries

**Optional: Exa MCP (if configured, for semantic/neural search)**
- Tool name: `mcp__Exa__exa_search`
- Use for semantic exploration alongside search-cli keyword results

**NEVER mix parameter styles** - this causes "Invalid tool parameters" errors.
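For scripted pipelines, the documented search-cli flags can be assembled programmatically rather than hand-typed, which avoids the parameter-mixing errors warned about above. A minimal sketch (the `build_search_cmd` helper is hypothetical; only the `-m`, `-c`, and `--json` flags come from the list above):

```python
import shlex
from typing import Optional

def build_search_cmd(query: str, mode: Optional[str] = None,
                     count: int = 10, json_out: bool = True) -> list:
    """Assemble a search-cli invocation from the documented flags."""
    cmd = ["search", query]
    if mode:                      # -m: news, academic, scholar, patents, ...
        cmd += ["-m", mode]
    if json_out:                  # --json: structured output for parsing
        cmd.append("--json")
    cmd += ["-c", str(count)]     # -c: result count
    return cmd

# Render as a shell-safe string for the Bash tool
print(shlex.join(build_search_cmd("quantum computing 2025", mode="academic", count=15)))
# → search 'quantum computing 2025' -m academic --json -c 15
```

Building the argv list once and serializing it with `shlex.join` keeps quoting consistent across all parallel calls.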
**Step 2: Spawn parallel deep-dive agents**

Use Task tool with general-purpose agents (3-5 agents) for:
- Academic paper analysis (PDFs, detailed extraction)
- Documentation deep dives (technical specs, API docs)
- Repository analysis (code examples, implementations)
- Specialized domain research (requires multi-step investigation)

**Sub-agent output format:** Require all sub-agents to return structured evidence, not free text:
```json
{"claim": "specific claim text", "evidence_quote": "exact quote from source", "source_url": "https://...", "source_title": "...", "confidence": 0.85}
```
This prevents synthesis fatigue when merging results from 3-5 agents.

**Evidence persistence (v3.0):** After each retrieval batch, persist evidence immediately:
```bash
# Register the source first (returns stable source_id)
python scripts/citation_manager.py register-source --json '{"raw_url": "...", "title": "..."}' --dir [folder]

# Then persist each evidence span from that source
python scripts/evidence_store.py add --json '{"source_id": "...", "quote": "exact text", "evidence_type": "direct_quote", "locator": "page 5"}' --dir [folder]
```
Evidence must not live only in model context — it must be persisted to `evidence.jsonl` before synthesis begins. This ensures continuation agents and claim-support verification can access the full evidence trail.
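A lightweight schema check before persisting sub-agent output catches malformed records early, instead of at synthesis time. A sketch (field names come from the sub-agent output format above; the validator itself is hypothetical, not part of the skill's scripts):

```python
# Required fields per the sub-agent structured-evidence format
REQUIRED_FIELDS = {"claim", "evidence_quote", "source_url", "source_title", "confidence"}

def validate_evidence(record: dict) -> list:
    """Return a list of problems; an empty list means the record is safe to persist."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    conf = record.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number in [0, 1]")
    if not str(record.get("source_url", "")).startswith(("http://", "https://")):
        problems.append("source_url must be an absolute URL")
    return problems
```

Running this on every record before the `evidence_store.py add` call keeps `evidence.jsonl` clean for downstream claim-support verification.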
**Example parallel execution (using search-cli via Bash):**
```
[Single message with multiple Bash tool calls]
- Bash: search "quantum computing 2026 state of the art" --json -c 10
- Bash: search "quantum computing limitations challenges" --json -c 10
- Bash: search "quantum computing commercial applications 2026" -m news --json -c 10
- Bash: search "quantum computing vs classical comparison" --json -c 10
- Bash: search "quantum error correction research" -m academic --json -c 10
- Task(subagent_type="general-purpose", description="Analyze quantum computing papers", prompt="Deep dive into quantum computing academic papers from [CURRENT_YEAR], extract key findings and methodologies")
- Task(subagent_type="general-purpose", description="Industry analysis", prompt="Analyze quantum computing industry reports and market data, identify commercial applications")
- Task(subagent_type="general-purpose", description="Technical challenges", prompt="Extract technical limitations and challenges from quantum computing research")
```

**Example parallel execution (using Exa MCP - if available):**
```
[Single message with multiple tool calls]
- mcp__Exa__exa_search(query="quantum computing state of the art", type="neural", num_results=10, start_published_date="[use current year from Step 0]")
- mcp__Exa__exa_search(query="quantum computing limitations", type="keyword", num_results=10)
- mcp__Exa__exa_search(query="quantum computing commercial", type="auto", num_results=10, start_published_date="[use current year from Step 0]")
- mcp__Exa__exa_search(query="quantum error correction", type="neural", num_results=10, include_domains=["arxiv.org"])
- Task(subagent_type="general-purpose", description="Academic analysis", prompt="Analyze quantum computing academic papers")
```

**Step 3: Collect and organize results**
As results arrive:
1. Extract key passages with source metadata (title, URL, date, credibility)
2. Track information gaps that emerge
3. Follow promising tangents with additional targeted searches
4. Maintain source diversity (mix academic, industry, news, technical docs)
5. Monitor for quality threshold (see FFS pattern below)

### First Finish Search (FFS) Pattern

**Adaptive completion based on quality threshold:**

**Quality gate:** Proceed to Phase 4 when FIRST threshold reached:
- **Quick mode:** 10+ sources with avg credibility >60/100 OR 2 minutes elapsed
- **Standard mode:** 15+ sources with avg credibility >60/100 OR 5 minutes elapsed
- **Deep mode:** 25+ sources with avg credibility >70/100 OR 10 minutes elapsed
- **UltraDeep mode:** 30+ sources with avg credibility >75/100 OR 15 minutes elapsed

**Continue background searches:**
- If threshold reached early, continue remaining parallel searches in background
- Additional sources used in Phase 5 (SYNTHESIZE) for depth and diversity
- Allows fast progression without sacrificing thoroughness

### Quality Standards

**Source diversity requirements:**
- Minimum 3 source types (academic, industry, news, technical docs)
- Temporal diversity (mix of recent 12-18 months + foundational older sources)
- Perspective diversity (proponents + critics + neutral analysis)
- Geographic diversity (not just US sources)

**Credibility tracking:**
- Score each source 0-100 using source_evaluator.py
- Flag low-credibility sources (<40) for additional verification
- Prioritize high-credibility sources (>80) for core claims

**Techniques:**
- Use search-cli for all searches (primary tool, multi-provider)
- Fall back to WebSearch if search-cli fails or is rate-limited
- Use WebFetch for deep dives into specific sources (secondary)
- Use Exa search (via WebSearch with type="neural") for semantic exploration
- Use Grep/Read for local documentation
- Execute code for computational analysis (when needed)
- Use Task tool to spawn parallel retrieval agents (3-5 agents)
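The FFS quality gate above reduces to a simple predicate over source count, average credibility, and elapsed time. A sketch (the threshold numbers come from the mode table above; the function name and data layout are illustrative):

```python
# Per-mode thresholds from the FFS table: (min_sources, min_avg_credibility, max_minutes)
FFS_THRESHOLDS = {
    "quick":     (10, 60, 2),
    "standard":  (15, 60, 5),
    "deep":      (25, 70, 10),
    "ultradeep": (30, 75, 15),
}

def ffs_gate_passed(mode: str, credibilities: list, minutes_elapsed: float) -> bool:
    """True when EITHER the source-quality bar OR the time budget is reached."""
    min_sources, min_cred, max_minutes = FFS_THRESHOLDS[mode]
    if minutes_elapsed >= max_minutes:          # time budget hit: proceed regardless
        return True
    if len(credibilities) < min_sources:        # not enough sources yet
        return False
    return sum(credibilities) / len(credibilities) > min_cred
```

Because the gate is an OR of two conditions, early-finishing search batches can trigger progression to Phase 4 while slower searches continue in the background.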
**Output:** Organized information repository with source tracking, credibility scores, and coverage map

---

## Phase 4: TRIANGULATE - Cross-Reference Verification

**Objective:** Validate information across multiple independent sources

**Activities:**
1. Identify claims requiring verification
2. Cross-reference facts across 3+ sources
3. Flag contradictions or uncertainties
4. Assess source credibility
5. Note consensus vs. debate areas
6. Document verification status per claim

**Quality Standards:**
- Core claims must have 3+ independent sources
- Flag any single-source information
- Note recency of information
- Identify potential biases

**Output:** Verified fact base with confidence levels

---

## Phase 4.5: OUTLINE REFINEMENT - Dynamic Evolution (WebWeaver 2025)

**Objective:** Adapt research direction based on evidence discovered

**Problem Solved:** Prevents "locked-in" research when evidence points to different conclusions or uncovers more important angles than initially planned.

**When to Execute:**
- **Standard/Deep/UltraDeep modes only** (Quick mode skips this)
- After Phase 4 (TRIANGULATE) completes
- Before Phase 5 (SYNTHESIZE)

**Activities:**

1. **Review Initial Scope vs. Actual Findings**
- Compare Phase 1 scope with Phase 3-4 discoveries
- Identify unexpected patterns or contradictions
- Note underexplored angles that emerged as critical
- Flag overexplored areas that proved less important
2. **Evaluate Outline Adaptation Need**

**Signals for adaptation (ANY triggers refinement):**
- Major findings contradict initial assumptions
- Evidence reveals more important angle than originally scoped
- Critical subtopic emerged that wasn't in original plan
- Original research question was too broad/narrow based on evidence
- Sources consistently discuss aspects not in initial outline

**Signals to keep current outline:**
- Evidence aligns with initial scope
- All key angles adequately covered
- No major gaps or surprises

3. **Refine Outline (if needed)**

**Update structure to reflect evidence:**
- Add sections for unexpected but important findings
- Demote/remove sections with insufficient evidence
- Reorder sections based on evidence strength and importance
- Adjust scope boundaries based on what's actually discoverable

**Example adaptation:**
```
Original outline:
1. Introduction
2. Technical Architecture
3. Performance Benchmarks
4. Conclusion

Refined after Phase 4 (evidence revealed security as critical):
1. Introduction
2. Technical Architecture
3. **Security Vulnerabilities (NEW - major finding)**
4. Performance Benchmarks (demoted - less critical than expected)
5. **Real-World Failure Modes (NEW - pattern emerged)**
6. Synthesis & Recommendations
```

4. **Targeted Gap Filling (if major gaps found)**

If outline refinement reveals critical knowledge gaps:
- Launch 2-3 targeted searches for newly identified angles
- Quick retrieval only (don't restart full Phase 3)
- Time-box to 2-5 minutes
- Update triangulation for new evidence only
5. **Document Adaptation Rationale**

Record in methodology appendix:
- What changed in the outline
- Why it changed (evidence-driven reasons)
- What additional research was conducted (if any)

**Quality Standards:**
- Adaptation must be evidence-driven (cite specific sources that prompted the change)
- No more than 50% outline restructuring (if more is needed, the research question was severely mis-scoped)
- Retain the original research question's core (don't drift into a different topic entirely)
- New sections must have supporting evidence already gathered

**Output:** Refined outline that accurately reflects the evidence landscape, ready for synthesis

**Anti-Pattern Warning:**
- ❌ DON'T adapt the outline based on speculation or "what would be interesting"
- ❌ DON'T add sections without supporting evidence already in hand
- ❌ DON'T completely abandon the original research question
- ✅ DO adapt when evidence clearly indicates a better structure
- ✅ DO document the rationale for changes
- ✅ DO stay within the original topic scope

---

## Phase 5: SYNTHESIZE - Deep Analysis

**Objective:** Connect insights and generate novel understanding

**Activities:**
1. Identify patterns across sources
2. Map relationships between concepts
3. Generate insights beyond source material
4. Create conceptual frameworks
5. Build argument structures
6. Develop evidence hierarchies

**Ultrathink Integration:** Use extended reasoning to explore non-obvious connections and second-order implications.

**Output:** Synthesized understanding with insight generation

---

## Phase 6: CRITIQUE - Quality Assurance

**Objective:** Rigorously evaluate research quality

**Activities:**
1. Review for logical consistency
2. Check citation completeness
3. Identify gaps or weaknesses
4. Assess balance and objectivity
5. Verify claims against sources
6. Test alternative interpretations
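Item 5 (verifying claims against sources) can be partially automated against the evidence persisted in Phase 3. A sketch, assuming evidence records carry a `claim` field as in the sub-agent output format; the exact-match rule here is a deliberate simplification of real claim-support verification:

```python
import json

def unsupported_claims(claims: list, evidence_jsonl: str) -> list:
    """Return claims with no persisted evidence record backing them.

    evidence_jsonl: the text of an evidence.jsonl file (one JSON object per line).
    """
    supported = set()
    for line in evidence_jsonl.splitlines():
        if line.strip():
            supported.add(json.loads(line).get("claim"))
    return [c for c in claims if c not in supported]
```

Any claim returned by this check should either be re-researched (Critical Gap Loop-Back) or cut before Phase 7.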
**Red Team Questions:**
- What's missing?
- What could be wrong?
- What alternative explanations exist?
- What biases might be present?
- What counterfactuals should be considered?

**Persona-Based Critique (Deep/UltraDeep only):**
Simulate 2-3 specific critic personas relevant to the topic:
- "Skeptical Practitioner" — Would someone doing this daily trust these findings?
- "Adversarial Reviewer" — What would a peer reviewer reject?
- "Implementation Engineer" — Can these recommendations actually be executed?

**Critical Gap Loop-Back:**
If critique identifies a critical knowledge gap (not just a writing issue), return to Phase 3 with targeted "delta-queries" before proceeding to Phase 7. Time-box to 3-5 minutes. This prevents publishing reports with known blind spots.

**Output:** Critique report with improvement recommendations

---

## Phase 7: REFINE - Iterative Improvement

**Objective:** Address gaps and strengthen weak areas

**Activities:**
1. Conduct additional research for gaps
2. Strengthen weak arguments
3. Add missing perspectives
4. Resolve contradictions
5. Enhance clarity
6. Verify revised content

**Output:** Strengthened research with addressed deficiencies

---

## Phase 8: PACKAGE - Report Generation

**Objective:** Deliver professional, actionable research

**Activities:**
1. Structure report with clear hierarchy
2. Write executive summary
3. Develop detailed sections
4. Create visualizations (tables, diagrams)
5. Compile full bibliography
6. Add methodology appendix
**Output:** Complete research report ready for use

---

## Advanced Features

### Graph-of-Thoughts Reasoning

Rather than linear thinking, branch into multiple reasoning paths:
- Explore alternative framings in parallel
- Pursue tangential leads that might be relevant
- Merge insights from different branches
- Backtrack and revise as new information emerges

### Parallel Agent Deployment

Use Task tool to spawn sub-agents for:
- Parallel source retrieval
- Independent verification paths
- Competing hypothesis evaluation
- Specialized domain analysis

### Adaptive Depth Control

Automatically adjust research depth based on:
- Information complexity
- Source availability
- Time constraints
- Confidence levels

### Citation Intelligence

Smart citation management:
- Track provenance of every claim
- Link to original sources
- Assess source credibility
- Handle conflicting sources
- Generate proper bibliographies
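The credibility thresholds from Phase 3 (flag sources under 40, prioritize sources over 80 for core claims) can drive a minimal source registry for bibliography generation. A sketch (the class and method names are illustrative and not the API of the skill's `citation_manager.py`):

```python
class SourceRegistry:
    """Track sources with credibility scores and emit a markdown bibliography."""

    def __init__(self):
        self.sources = []  # list of (title, url, credibility) tuples

    def register(self, title: str, url: str, credibility: int) -> str:
        """Record a source and classify it by the Phase 3 credibility thresholds."""
        self.sources.append((title, url, credibility))
        if credibility < 40:
            return "flag-for-verification"   # low credibility: verify before use
        if credibility > 80:
            return "core-claim-eligible"     # high credibility: usable for core claims
        return "supporting"

    def bibliography(self) -> str:
        """Markdown list, highest-credibility sources first."""
        lines = [f"- [{t}]({u}) (credibility {c}/100)"
                 for t, u, c in sorted(self.sources, key=lambda s: -s[2])]
        return "\n".join(lines)
```

Sorting the bibliography by credibility makes it easy for a reader to see which sources carry the report's core claims.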