testing v1.0.0 · Updated Apr 17, 2026 · by Justin Adams

Verify Before Done

Forces Claude Code to run typecheck and the full test suite before marking any task complete. Eliminates 'it should work' hand-waving.

Claude Code Cursor
$ curl -fsSL https://www.cendis.ai/library/skill/verify-before-done/install | sh

What it does

Blocks the agent from declaring a task done until typecheck and the full test suite have actually run and passed. If anything fails, the agent fixes it without asking — no “should be good” without proof.

When to use it

  • Before every commit, in any project with a real test suite
  • When you’re a non-technical reviewer who can’t spot type errors in a diff
  • In CI/CD-heavy workflows where failing locally is cheaper than failing in the pipeline
  • Anywhere an agent has previously claimed “all done” only for CI to fail

How it works

Add to your CLAUDE.md:

## Verify Before Done

Never mark a task complete without proving it works:

1. Run `{{typecheck_command}}` from the project root
2. Run `{{test_command}}` from the project root
3. For user-facing changes, run E2E tests (`{{e2e_command}}`)
4. Report results clearly:
   - List any type errors with file and line
   - List any test failures with test name and assertion
5. If all pass: confirm "Ready to commit"
6. If anything fails: fix all issues before marking the task done — do not ask, just fix
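
The steps above are instructions for the agent, but the same gate can be sketched as a script, for example to re-check the agent's report yourself. This is a minimal Python sketch and not part of the skill: `run_gate`, `report`, and `StepResult` are hypothetical names, and the commands you pass in stand in for `{{typecheck_command}}` and `{{test_command}}`. It runs every step even after a failure so that all problems are reported together, and it treats a missing executable as a failure.

```python
import shlex
import subprocess
from typing import NamedTuple


class StepResult(NamedTuple):
    name: str      # label for the step, e.g. "typecheck"
    command: str   # the command line that was run
    passed: bool   # True only if the command exited with status 0
    output: str    # combined stdout and stderr, for the report


def run_gate(steps: list[tuple[str, str]], cwd: str = ".") -> list[StepResult]:
    """Run each (name, command) step from cwd and record whether it passed."""
    results = []
    for name, command in steps:
        try:
            proc = subprocess.run(
                shlex.split(command),
                cwd=cwd,
                capture_output=True,
                text=True,
            )
            passed = proc.returncode == 0
            output = proc.stdout + proc.stderr
        except OSError as exc:
            # Command not found or not executable: count it as a failed step.
            passed = False
            output = str(exc)
        results.append(StepResult(name, command, passed, output))
    return results


def report(results: list[StepResult]) -> str:
    """Summarize the results; only an all-pass run ends in 'Ready to commit'."""
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(f"{status}  {r.name}: {r.command}")
        if not r.passed and r.output.strip():
            lines.append(r.output.rstrip())
    if all(r.passed for r in results):
        lines.append("Ready to commit")
    else:
        lines.append("Not done: fix the failures above")
    return "\n".join(lines)
```

For a Python project, `print(report(run_gate([("typecheck", "mypy ."), ("test", "pytest")])))` would run both checks from the current directory and print the verdict.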

Configure for your stack

Stack       typecheck           test        e2e
Bun + TS    bun run typecheck   bun test    bun run test:e2e
Node + TS   tsc --noEmit        npm test    npx playwright test
pnpm + TS   pnpm typecheck      pnpm test   pnpm test:e2e
Python      mypy .              pytest
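
Filling in the placeholders by hand is usually enough, but the substitution can also be scripted. The sketch below is a hedged illustration, not something the skill ships: the `STACKS` keys (`bun-ts`, `node-ts`, `pnpm-ts`, `python`) and the `fill_template` function are hypothetical names, while the commands themselves are copied from the table above. Placeholders with no command for the chosen stack (the e2e step for Python) are left untouched so they stay visible.

```python
import re

# Commands copied from the table above. The stack keys are made up for this
# sketch. Python has no e2e entry in the table, so none is defined here.
STACKS = {
    "bun-ts": {
        "typecheck_command": "bun run typecheck",
        "test_command": "bun test",
        "e2e_command": "bun run test:e2e",
    },
    "node-ts": {
        "typecheck_command": "tsc --noEmit",
        "test_command": "npm test",
        "e2e_command": "npx playwright test",
    },
    "pnpm-ts": {
        "typecheck_command": "pnpm typecheck",
        "test_command": "pnpm test",
        "e2e_command": "pnpm test:e2e",
    },
    "python": {
        "typecheck_command": "mypy .",
        "test_command": "pytest",
    },
}

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, stack: str) -> str:
    """Replace {{name}} placeholders with the commands for the given stack.

    Placeholders the stack has no command for are returned unchanged.
    """
    commands = STACKS[stack]
    return PLACEHOLDER.sub(
        lambda m: commands.get(m.group(1), m.group(0)),
        template,
    )
```

For example, `fill_template("1. Run `{{typecheck_command}}` from the project root", "node-ts")` yields the step with `tsc --noEmit` substituted in.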

Example

Without this skill: Agent finishes a feature, says “Done, implementation looks correct.” You merge. CI fails on a type error in a file the agent forgot it had touched.

With this skill: Agent finishes the feature, runs typecheck (catches the cross-file type error), fixes it, runs tests, reports “Typecheck clean, 142 tests passed. Ready to commit.”

Why it matters

LLM-generated code looks plausible at a glance. A clean typecheck and a passing test suite are how you know the code actually does what it claims. Treating these checks as a hard gate, not a courtesy, turns “looks done” into “is done.” This is especially important when the human reviewer doesn’t read code and is relying on the agent’s report.