SKILL.md

When to Activate

Apply TDD to:

New features
Bug fixes
Refactors that change behavior
Any production code change

Exceptions — require explicit human approval before skipping:

Throwaway prototypes (must be deleted, not committed)
Auto-generated code (scaffolding, code-gen output)
Configuration files with no executable logic

Thinking “skip TDD just this once”? Stop. That is rationalization.

Skip When (routing to another skill)

User wants to backfill tests for existing code only — use codi-test-suite
User is debugging an existing failing test — use codi-debugging
User wants dead-code cleanup without behavior change — use codi-refactoring
User is planning the feature, not implementing it yet — use codi-plan-writer

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. If code was written before the test, delete it entirely and start over.

No exceptions:

Do not keep it as “reference”
Do not “adapt” it while writing tests
Do not look at it while writing tests
Delete means delete

Implement fresh from tests.

RED Phase

Write one failing test that describes the next required behavior.

Requirements:

One behavior per test
Clear name: 'does X when Y'
Tests real code usage — minimal mocking

Good:

test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };

  const result = await retryOperation(operation);

  expect(result).toBe('success');
  expect(attempts).toBe(3);
});

Clear name. Tests real behavior. One thing.

Bad:

test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(3);
});

Vague name. Tests mock call count, not actual retry logic. If retryOperation calls the mock differently, this test breaks without proving the behavior is correct.

Verify RED

This step is MANDATORY and cannot be skipped.

Run the test and confirm all four conditions:

Test fails (does not error out)
Failure message matches the expected assertion
Failure is caused by the missing feature, not a typo or import error
No other tests were broken

npm test path/to/test.test.ts

If the test passes immediately: it tests existing behavior or is written wrong — fix the test before continuing.

If the test errors out: resolve the error (import, syntax, type) and rerun until it fails correctly.

GREEN Phase

Write the minimal code to make the failing test pass.

Good:

async function retryOperation<T>(fn: () => T | Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error('unreachable');
}

Just enough to pass the test.

Bad:

async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: {
    maxRetries?: number;
    backoff?: 'linear' | 'exponential';
    onRetry?: (attempt: number, error: unknown) => void;
    timeout?: number;
  }
): Promise<T> {
  // ... full implementation with all options
}

Over-engineered. The test requires three retries — nothing more.

Do not add features, refactor other code, or “improve” beyond what the test requires.

Verify GREEN

This step is MANDATORY and cannot be skipped.

Run the full test suite and confirm:

The target test passes
All other tests still pass
Output is clean — no errors, no warnings

npm test

If the target test fails: fix the implementation, not the test.

If other tests fail: fix the regression before continuing.

REFACTOR Phase

Only after the test suite is fully green:

Remove duplication
Improve naming
Extract helper functions

Keep tests green. Do not add behavior.

Good Tests

Quality	Good	Bad
Minimal	One thing. “and” in name? Split it.	`test('validates email and domain and whitespace')`
Clear	Name describes behavior	`test('test1')` or `test('test works')`
Shows intent	Demonstrates desired API usage	Tests implementation details

Example: Bug Fix

Bug: Empty email accepted

RED

test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});

Verify RED

$ npm test
FAIL: expected 'Email required', got undefined

GREEN

function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}

Verify GREEN

$ npm test
PASS

REFACTOR Extract validation for multiple fields if needed. Keep tests green.

Rationalization Table

Rationalization	Reality
”Too simple to test”	Simple code breaks. Writing the test takes 30 seconds.
”I’ll test after”	Tests written after passing code pass immediately and prove nothing.
”Tests after achieve the same goals”	Tests-after answer “What does this do?” Tests-first answer “What should this do?"
"Already manually tested all the edge cases”	Manual testing is ad-hoc. No record, cannot re-run, easy to miss cases under pressure.
”Deleting X hours of work is wasteful”	Sunk cost fallacy. Keeping unverifiable code is the waste.
”I’ll keep it as reference and write tests first”	You will adapt it. That is testing after. Delete means delete.
”I need to explore the design first”	Fine — explore in a throwaway branch. Then delete it and start with TDD.
”The test is hard to write”	Listen to that signal. Hard to test means hard to use. Simplify the design.
”TDD will slow me down”	TDD is faster than debugging production. The slowdown is an illusion.
”Manual testing is faster”	Manual does not prove edge cases. You will re-test every change manually forever.
”The existing codebase has no tests”	You are improving it. Add a test for the behavior you are changing.

Red Flags — Stop and Restart

Any of these thoughts or situations require deleting the code and restarting from a failing test:

Code was written before the test
Test was created after the implementation
Test passes immediately on first run
Cannot explain why the test failed
Plan to “add tests later”
Thinking “just this once”
“I already manually tested it”
“Tests after achieve the same purpose”
“It is about the spirit, not the ritual”
“Keep as reference” or “adapt existing code”
“Already spent X hours — deleting is wasteful”
“TDD is dogmatic; I am being pragmatic”
“This case is different because…”

All of these mean: delete the code and start over with a failing test.

Verification Checklist

Before marking any task complete:

Every new function has a test
Watched each test fail before implementing
Each test failed for the expected reason (missing feature, not a typo)
Wrote minimal code to make each test pass
All tests pass
Test output is clean — no errors, no warnings
Tests use real code (mocks only when unavoidable)
Edge cases and error paths are covered

Cannot check all boxes? TDD was skipped. Start over.

When Stuck

Problem	Solution
Uncertain how to test the behavior	Write the desired API call first — start from the assertion, work backwards
Test is too complicated to write	The design is too complex — simplify the interface before continuing
Everything requires mocking	Code is too coupled — apply dependency injection to break the coupling
Test setup is enormous (arrange phase > 20 lines)	Extract setup helpers; if still complex, simplify the design

When adding mocks or test utilities, read ${CLAUDE_SKILL_DIR}[[/references/testing-anti-patterns.md]] to avoid common pitfalls such as testing mock behavior instead of real behavior, adding test-only methods to production classes, and mocking without understanding dependencies.

Integration

Use codi-debugging when bugs surface during TDD — write a failing test reproducing the bug first, then follow the TDD cycle. The test proves the fix and prevents regression.

Use codi-verification before claiming any task complete — the verification checklist above must pass in full.