{"kind":"Skill","metadata":{"namespace":"community","name":"thinking-kepner-tregoe","version":"0.1.0"},"spec":{"description":"Systematic rational process for complex problem analysis, decision making, and risk assessment. Use for high-stakes engineering decisions, root cause analysis beyond 5 Whys, and multi-factor evaluations requiring structured criteria.","files":{"SKILL.md":"---\nname: thinking-kepner-tregoe\ndescription: Systematic rational process for complex problem analysis, decision making, and risk assessment. Use for high-stakes engineering decisions, root cause analysis beyond 5 Whys, and multi-factor evaluations requiring structured criteria.\n---\n\n# Kepner-Tregoe Rational Process\n\n## Overview\n\nThe Kepner-Tregoe (KT) methodology, developed by Charles Kepner and Benjamin Tregoe in the 1950s, provides four integrated analytical processes for rational thinking. Unlike heuristic approaches, KT offers rigorous frameworks for separating fact from speculation and making defensible decisions.\n\n**Core Principle:** Separate what you know from what you assume. Use structured comparison to reveal truth.\n\n## When to Use\n\n- Complex engineering problems with multiple potential causes\n- High-stakes decisions requiring documented rationale\n- Root cause analysis when 5 Whys yields ambiguous results\n- Evaluating alternatives with competing criteria\n- Pre-implementation risk assessment\n- Incident response requiring systematic triage\n- Architecture decisions with long-term implications\n\nDecision flow:\n\n```text\nComplex problem? → yes → Multiple concerns/unclear priority? → yes → Start with SA\n                                                            ↘ no → Known single problem? → yes → PA\n                                                                                         ↘ no → Decision needed? 
→ yes → DA\n                                                                                                                 ↘ no → Implementation risk? → yes → PPA\n               ↘ no → Simpler frameworks may suffice\n```\n\n## The Four Processes\n\n| Process | Purpose | Key Question |\n|---------|---------|--------------|\n| **SA** - Situation Analysis | Clarify and prioritize | \"What's going on?\" |\n| **PA** - Problem Analysis | Find root cause | \"Why did this happen?\" |\n| **DA** - Decision Analysis | Evaluate alternatives | \"What should we do?\" |\n| **PPA** - Potential Problem Analysis | Anticipate risks | \"What could go wrong?\" |\n\n---\n\n## Process 1: Situation Analysis (SA)\n\nUse when facing multiple concerns, unclear priorities, or overwhelm.\n\n### Purpose\n\nBreak complex situations into manageable components, set priorities, and plan approach.\n\n### Steps\n\n#### Step 1: List All Concerns\n\nBrainstorm everything that needs attention:\n\n```markdown\nConcerns:\n- Production API latency increased 3x\n- New feature deployment blocked\n- Team velocity dropped 40%\n- Customer complaints about checkout errors\n- Database connection pool exhaustion\n- Unclear requirements for Q2 roadmap\n```\n\n#### Step 2: Separate and Clarify\n\nFor each concern, ask: \"Is this one issue or multiple?\"\n\n```markdown\n\"Production performance issues\" →\n  - API latency (response time)\n  - Database connections (resource exhaustion)\n  - Memory usage (potential leak)\n```\n\n#### Step 3: Set Priority\n\nUse Timing, Impact, Trend (TIT):\n\n| Concern | Timing | Impact | Trend | Priority |\n|---------|--------|--------|-------|----------|\n| API latency | Urgent | High | Worsening | P0 |\n| DB connections | Urgent | Critical | Stable | P0 |\n| Checkout errors | Soon | High | Worsening | P1 |\n| Velocity drop | Soon | Medium | Stable | P2 |\n\n**Timing:** When must action be taken?\n**Impact:** What's the consequence of inaction?\n**Trend:** Is it getting better, 
worse, or stable?\n\n#### Step 4: Plan Approach\n\nFor each prioritized concern, determine:\n\n- Which KT process applies (PA, DA, PPA)?\n- Who should be involved?\n- What information is needed?\n\n### SA Template\n\n```markdown\n# Situation Analysis: [Context]\nDate: [Date]\n\n## Concerns Inventory\n| # | Concern | Clarification Needed? |\n|---|---------|----------------------|\n| 1 | [Concern] | [Yes/No - details] |\n\n## Priority Matrix\n| Concern | Timing | Impact | Trend | Priority | Next Process |\n|---------|--------|--------|-------|----------|--------------|\n| | | | | | SA/PA/DA/PPA |\n\n## Action Plan\n| Priority | Concern | Process | Owner | Deadline |\n|----------|---------|---------|-------|----------|\n| P0 | | | | |\n```\n\n---\n\n## Process 2: Problem Analysis (PA)\n\nUse when you need to find the root cause of a deviation from expected performance.\n\n### Purpose\n\nSystematically identify the true cause by comparing what IS happening vs. what IS NOT.\n\n### Key Concept: The IS/IS-NOT Matrix\n\nThe power of PA lies in specification through contrast. A problem exists in a specific context; understanding its boundaries reveals the cause.\n\n### Steps\n\n#### Step 1: State the Problem\n\nBe precise about the deviation:\n\n```text\nBAD: \"The system is slow\"\nGOOD: \"API response time increased from 200ms to 800ms for the /checkout endpoint starting Monday 9 AM\"\n```\n\n#### Step 2: Specify the Problem (IS/IS-NOT)\n\n| Dimension | IS | IS NOT | Distinction |\n|-----------|-----|---------|-------------|\n| **WHAT** | | | |\n| What object has the problem? | /checkout endpoint | /cart, /product, /user endpoints | Only payment-related |\n| What is the defect? | 4x latency increase | Errors, timeouts, data corruption | Performance only |\n| **WHERE** | | | |\n| Where is the object when observed? | Production US-East | Production EU, US-West, staging | Single region |\n| Where on the object? 
| Database query phase | Auth, validation, response serialization | DB layer |\n| **WHEN** | | | |\n| When was it first observed? | Monday 9:00 AM | Before Monday, after 6 PM | Business hours |\n| When in lifecycle/pattern? | During checkout submit | During browsing, cart add | Write operations |\n| **EXTENT** | | | |\n| How many objects affected? | ~30% of checkout requests | 100% of requests | Intermittent |\n| How much of object affected? | 600ms additional latency | Complete failure | Degradation |\n| Is it growing/spreading? | Stable since Tuesday | Getting worse | Plateaued |\n\n#### Step 3: Identify Distinctions\n\nFor each IS/IS-NOT pair, ask: \"What's unique or distinctive about the IS side?\"\n\n```markdown\nDistinctions identified:\n- Only /checkout endpoint (payment processing)\n- Only US-East region (specific DB replica)\n- Only during business hours (load-related?)\n- Only ~30% of requests (specific query pattern?)\n- Started Monday 9 AM (deployment? config change?)\n```\n\n#### Step 4: Identify Changes\n\nWhat changed in, on, around, or about the distinctions?\n\n```markdown\nChanges near Monday 9 AM:\n- Payment provider SDK updated (Sunday night deploy)\n- Database index rebuild scheduled (Sunday maintenance)\n- New fraud detection rules enabled (Monday 8:45 AM)\n```\n\n#### Step 5: Generate Possible Causes\n\nCombine distinctions and changes:\n\n```markdown\nPossible causes:\n1. Fraud detection rules causing additional DB queries\n2. Payment SDK making synchronous external calls\n3. Index rebuild affected checkout-related queries\n```\n\n#### Step 6: Test Against Specification\n\nFor each possible cause, verify it explains ALL IS and IS-NOT:\n\n| Possible Cause | Explains IS? | Explains IS-NOT? 
| Verdict |\n|----------------|--------------|------------------|---------|\n| Fraud rules | ✓ Only checkout | ✓ Only write ops | ✓ Possible |\n| Payment SDK | ✓ Only checkout | ✗ Would affect all regions | ✗ Ruled out |\n| Index rebuild | ✓ DB layer | ✗ Would affect all queries | ✗ Ruled out |\n\n#### Step 7: Verify True Cause\n\nDesign verification to confirm or rule out:\n\n```markdown\nVerification plan for \"Fraud detection rules\":\n1. Check timing: Rules enabled 8:45 AM (matches)\n2. Check scope: Rules only on checkout (matches)\n3. Test: Disable rules in canary, measure latency\n4. Examine: Query logs for fraud check queries\n```\n\n### IS/IS-NOT Template\n\n```markdown\n# Problem Analysis: [Problem Statement]\nDate: [Date]\n\n## Problem Specification\n\n### What\n| Question | IS | IS NOT | Distinction |\n|----------|-----|---------|-------------|\n| What object has the problem? | | | |\n| What specifically is wrong? | | | |\n\n### Where\n| Question | IS | IS NOT | Distinction |\n|----------|-----|---------|-------------|\n| Where is the problem observed? | | | |\n| Where on the object is it? | | | |\n\n### When\n| Question | IS | IS NOT | Distinction |\n|----------|-----|---------|-------------|\n| When first observed? | | | |\n| Any pattern to occurrence? | | | |\n\n### Extent\n| Question | IS | IS NOT | Distinction |\n|----------|-----|---------|-------------|\n| How many/much affected? | | | |\n| Is it changing? | | | |\n\n## Distinctions Summary\n1. [Unique characteristic]\n2. 
[Unique characteristic]\n\n## Changes Near Distinctions\n| Change | When | What Changed |\n|--------|------|--------------|\n| | | |\n\n## Possible Causes\n| # | Cause | Based on |\n|---|-------|----------|\n| 1 | | Distinction + Change |\n\n## Cause Testing\n| Cause | Explains IS | Explains IS-NOT | Verdict |\n|-------|-------------|-----------------|---------|\n| | | | |\n\n## Verification Plan\n- [ ] [Test to confirm/rule out most likely cause]\n\n## Confirmed Root Cause\n[Cause with evidence]\n```\n\n---\n\n## Process 3: Decision Analysis (DA)\n\nUse when choosing among alternatives with multiple criteria.\n\n### Purpose\n\nMake systematic, defensible decisions by separating MUSTS from WANTS and scoring alternatives objectively.\n\n### Steps\n\n#### Step 1: Clarify the Decision\n\nState the decision clearly:\n\n```text\n\"Select a message queue system for order processing\"\n\"Choose deployment strategy for the new auth service\"\n```\n\n#### Step 2: Develop Objectives\n\nList what the decision must accomplish:\n\n```markdown\nObjectives:\n- Handle 10K messages/second throughput\n- Provide at-least-once delivery guarantees\n- Support multiple consumer groups\n- Minimize operational overhead\n- Stay within $5K/month budget\n- Integrate with existing monitoring\n```\n\n#### Step 3: Classify as MUST vs WANT\n\n**MUST:** Non-negotiable requirements (pass/fail)\n**WANT:** Desirable attributes (weighted scoring)\n\n| Objective | MUST/WANT | Weight (1-10) |\n|-----------|-----------|---------------|\n| 10K msg/sec throughput | MUST | - |\n| At-least-once delivery | MUST | - |\n| Under $5K/month | MUST | - |\n| Multiple consumer groups | WANT | 9 |\n| Low operational overhead | WANT | 8 |\n| Existing monitoring integration | WANT | 6 |\n| Strong community/docs | WANT | 5 |\n| Team familiarity | WANT | 4 |\n\n#### Step 4: Generate Alternatives\n\nList viable options:\n\n```markdown\nAlternatives:\nA. Apache Kafka (self-managed)\nB. AWS SQS + SNS\nC. 
RabbitMQ (self-managed)\nD. Amazon MSK (managed Kafka)\n```\n\n#### Step 5: Screen Against MUSTs\n\n| Alternative | 10K msg/sec | At-least-once | Under $5K | MUST Score |\n|-------------|-------------|---------------|-----------|------------|\n| Kafka | ✓ Yes | ✓ Yes | ✓ Yes | PASS |\n| SQS+SNS | ✓ Yes | ✓ Yes | ✓ Yes | PASS |\n| RabbitMQ | ✗ ~5K limit | ✓ Yes | ✓ Yes | FAIL |\n| MSK | ✓ Yes | ✓ Yes | ✗ ~$8K | FAIL |\n\nRabbitMQ and MSK eliminated—don't meet MUSTs.\n\n#### Step 6: Score Against WANTs\n\nRate each alternative 1-10 on each WANT:\n\n| WANT (Weight) | Kafka | SQS+SNS |\n|---------------|-------|---------|\n| Consumer groups (9) | 10 | 7 |\n| Low ops overhead (8) | 4 | 9 |\n| Monitoring integration (6) | 7 | 10 |\n| Community/docs (5) | 10 | 8 |\n| Team familiarity (4) | 3 | 8 |\n\n#### Step 7: Calculate Weighted Scores\n\n| WANT | Weight | Kafka Score | Kafka Weighted | SQS Score | SQS Weighted |\n|------|--------|-------------|----------------|-----------|--------------|\n| Consumer groups | 9 | 10 | 90 | 7 | 63 |\n| Low ops overhead | 8 | 4 | 32 | 9 | 72 |\n| Monitoring | 6 | 7 | 42 | 10 | 60 |\n| Community | 5 | 10 | 50 | 8 | 40 |\n| Team familiarity | 4 | 3 | 12 | 8 | 32 |\n| **TOTAL** | | | **226** | | **267** |\n\nSQS+SNS scores higher on weighted WANTs.\n\n#### Step 8: Assess Risks (→ feeds into PPA)\n\nBefore final decision, consider adverse consequences:\n\n| Alternative | Risk | Likelihood | Severity |\n|-------------|------|------------|----------|\n| SQS+SNS | Message ordering challenges | Medium | High |\n| SQS+SNS | Vendor lock-in | High | Medium |\n| Kafka | Operational complexity | High | High |\n\n#### Step 9: Make Decision\n\nConsider scores AND risks to make final choice. Document rationale.\n\n### DA Template\n\n```markdown\n# Decision Analysis: [Decision Statement]\nDate: [Date]\nDecision Maker: [Name]\n\n## Objectives\n| Objective | MUST/WANT | Weight |\n|-----------|-----------|--------|\n| | | |\n\n## Alternatives\n1. 
[Option A]\n2. [Option B]\n3. [Option C]\n\n## MUST Screening\n| Alternative | MUST 1 | MUST 2 | MUST 3 | Pass/Fail |\n|-------------|--------|--------|--------|-----------|\n| | | | | |\n\n## WANT Scoring\n| WANT (Weight) | Alt A | Alt B | Alt C |\n|---------------|-------|-------|-------|\n| (w) | score | score | score |\n| **Weighted Total** | | | |\n\n## Risk Assessment\n| Alternative | Risk | L | S | Mitigation |\n|-------------|------|---|---|------------|\n| | | | | |\n\n## Decision\n**Selected:** [Alternative]\n**Rationale:** [Why this choice given scores and risks]\n```\n\n---\n\n## Process 4: Potential Problem Analysis (PPA)\n\nUse after making a decision or before implementation to anticipate and prevent problems.\n\n### Purpose\n\nIdentify what could go wrong with a planned action and develop preventive/contingent actions.\n\n### Steps\n\n#### Step 1: State the Plan\n\nDescribe what will be implemented:\n\n```markdown\nPlan: Migrate order service from monolith to microservice\nTimeline: 4 weeks\nKey changes: New service, message queue, database split\n```\n\n#### Step 2: Identify Potential Problems\n\nWalk through the plan and ask \"What could go wrong?\":\n\n```markdown\nPotential problems:\n1. Message queue loses orders during migration\n2. New service has undiscovered bugs in production\n3. Database sync fails, causing data inconsistency\n4. Rollback needed but unclear how to reverse\n5. Performance degradation under load\n6. 
Team lacks Kafka operational knowledge\n```\n\n#### Step 3: Assess Each Potential Problem\n\nRate probability (P) and seriousness (S):\n\n| Potential Problem | Probability | Seriousness | P×S |\n|-------------------|-------------|-------------|-----|\n| Lost orders | Medium | Critical | HIGH |\n| Undiscovered bugs | High | High | HIGH |\n| Data sync failure | Medium | Critical | HIGH |\n| Rollback unclear | Medium | High | MEDIUM |\n| Performance issues | Medium | Medium | MEDIUM |\n| Kafka knowledge gap | High | Medium | MEDIUM |\n\n#### Step 4: Identify Likely Causes\n\nFor high P×S problems, determine probable causes:\n\n```markdown\nProblem: Message queue loses orders\nLikely causes:\n- Consumer crashes before acknowledgment\n- Queue overflow during peak\n- Network partition between services\n- Misconfigured dead letter queue\n```\n\n#### Step 5: Develop Preventive Actions\n\nActions that reduce the probability of each cause occurring:\n\n| Cause | Preventive Action | Owner |\n|-------|-------------------|-------|\n| Consumer crash | Implement idempotent processing with transactional outbox | Dev team |\n| Queue overflow | Configure auto-scaling, set appropriate limits | Platform |\n| Network partition | Deploy in same availability zone initially | Infra |\n| DLQ misconfigured | Pre-production DLQ testing with failure injection | QA |\n\n#### Step 6: Develop Contingent Actions\n\nActions that reduce the impact IF the problem occurs:\n\n| Problem | Contingent Action | Trigger |\n|---------|-------------------|---------|\n| Lost orders | Replay from audit log, manual reconciliation | Order count mismatch \u003e 0.1% |\n| Data sync failure | Activate sync monitor, pause writes, manual fix | Sync lag \u003e 5 minutes |\n| Performance issues | Activate circuit breaker, failover to monolith | p99 \u003e 2s for 5 min |\n\n#### Step 7: Build Monitoring/Triggers\n\nDefine how you'll detect problems and when to activate contingent actions.\n\n### PPA Template\n\n```markdown\n# Potential 
Problem Analysis: [Plan Name]\nDate: [Date]\n\n## Plan Summary\n[Brief description of what will be implemented]\n\n## Potential Problems\n| # | Problem | P (H/M/L) | S (H/M/L) | Priority |\n|---|---------|-----------|-----------|----------|\n| 1 | | | | |\n\n## High-Priority Problem Analysis\n\n### Problem: [Name]\n**Likely Causes:**\n1. [Cause]\n\n**Preventive Actions:**\n| Cause | Action | Owner | Due |\n|-------|--------|-------|-----|\n| | | | |\n\n**Contingent Actions:**\n| Trigger | Action | Owner |\n|---------|--------|-------|\n| | | |\n\n## Monitoring Plan\n| What to Monitor | Threshold | Alert | Response |\n|-----------------|-----------|-------|----------|\n| | | | |\n\n## Review Schedule\n- Pre-implementation review: [Date]\n- Post-implementation check: [Date]\n```\n\n---\n\n## Integrating the Four Processes\n\nTypical flow for complex situations:\n\n```text\n1. SA: \"We have multiple issues after the release\"\n   → Separate concerns, prioritize\n   → P0: Production errors (needs PA)\n   → P1: Architecture decision (needs DA)\n   → P2: Future release risks (needs PPA)\n\n2. PA: Investigate production errors\n   → IS/IS-NOT analysis\n   → Identify root cause\n   → Feeds solution options into DA\n\n3. DA: Choose solution approach\n   → Define objectives\n   → Score alternatives\n   → Select best option\n   → Risk assessment feeds PPA\n\n4. PPA: Plan implementation\n   → Identify what could go wrong\n   → Preventive and contingent actions\n   → Monitoring plan\n```\n\n## Integration with Other Thinking Skills\n\n| Skill | Integration Point |\n|-------|-------------------|\n| **thinking-pre-mortem** | Use as input to PPA—pre-mortem identifies problems, PPA develops mitigations |\n| **thinking-inversion** | Use in PA—invert \"what would cause this?\" to identify possible causes |\n| **thinking-first-principles** | Use in DA—challenge MUST criteria, are they truly fundamental? 
|\n| **thinking-debiasing** | Apply checklist when scoring DA alternatives, evaluating PA causes |\n| **thinking-systems** | Use in SA—understand how concerns interconnect, avoid siloed analysis |\n| **tools-debugging-root-cause** | PA complements debugging—PA for systematic cause identification, debugging for code-level investigation |\n\n## Verification Checklist\n\n- [ ] Used appropriate KT process for the situation type\n- [ ] SA: All concerns listed, separated, and prioritized with TIT criteria\n- [ ] PA: IS/IS-NOT fully specified across all four dimensions\n- [ ] PA: Each possible cause tested against specification\n- [ ] DA: MUST/WANT clearly separated, MUSTs are truly non-negotiable\n- [ ] DA: Weighted scores calculated, not just intuition\n- [ ] PPA: High P×S problems have both preventive and contingent actions\n- [ ] PPA: Triggers defined for contingent action activation\n- [ ] Analysis documented for future reference and team alignment\n\n## Key Questions by Process\n\n### Situation Analysis\n\n- \"What are ALL the concerns we're facing?\"\n- \"Is this one problem or several?\"\n- \"What's the timing, impact, and trend?\"\n- \"Which process should we use for each concern?\"\n\n### Problem Analysis\n\n- \"What specifically IS happening vs IS NOT?\"\n- \"What's unique about where/when this occurs?\"\n- \"What changed in, on, or around the distinctions?\"\n- \"Does this cause explain BOTH the IS and IS-NOT?\"\n\n### Decision Analysis\n\n- \"What are the MUST-have requirements?\"\n- \"How important is each WANT relative to others?\"\n- \"How well does each alternative satisfy each objective?\"\n- \"What risks come with each alternative?\"\n\n### Potential Problem Analysis\n\n- \"What could go wrong with this plan?\"\n- \"What would cause each problem?\"\n- \"How can we prevent the cause?\"\n- \"If it happens anyway, how do we minimize 
damage?\"\n"},"import":{"commit_sha":"a31e22d4445ad8fef7cd771d32af537aebb68c49","imported_at":"2026-05-22T21:14:39Z","license_text":"MIT License\n\nCopyright (c) 2025 TJ Boudreaux\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","owner":"tjboudreaux","repo":"tjboudreaux/cc-thinking-skills","source_url":"https://github.com/tjboudreaux/cc-thinking-skills/tree/a31e22d4445ad8fef7cd771d32af537aebb68c49/skills/thinking-kepner-tregoe"}},"content_hash":[172,229,222,53,104,240,14,83,162,191,206,71,161,6,152,163,91,136,116,149,239,145,165,86,234,216,42,65,212,96,161,220],"trust_level":"unsigned","yanked":false}
