February 2026

The 8 Guardrails You Need Before Letting AI Write Production Code

Starting at the end.

I’ve been thinking about how LLMs fit into our workflow, and one thing keeps popping into my head: Users don’t care how we wrote the code. They just care if our software works.

For argument's sake, let's look at a non-exhaustive list of the fundamental qualities of "good software":

  • UX: Accessibility, clarity, ease of use, usable on any device with any input
  • Speed & Performance: "Things happen" as fast as possible
  • Reliability & Uptime: Up-and-working, as much as possible
  • Features that solve real problems: Well thought out solutions to customer problems [1]
  • Data integrity & Security: No data loss, no data leaks

So… How do you ensure code generated by LLMs satisfies those requirements?

LLMs can generate pretty damn good code quickly - which is a genuine needle-mover in our industry - but fast does not equal "good enough for paying customers".

What’s the missing link?

Fast code output + ? = Great Software

I think the ? is actually two things. Test infrastructure matters so much more now, but systems thinking - translating real-world customer problems into technical solutions at a level that is worth paying for - is still a done-by-humans task.

So the missing link is:

Fast Code Output + Comprehensive Test Infrastructure + Translating Customer Problems into Technical Solutions = Great Software

JIRA tickets will no longer be:

Implement this code in this file -> Coder writes Code

They will be:

Our customer has this problem -> Coder plans solution -> Ensures tests capture new feature expectations.

How broad does test infrastructure need to be for quality AI-driven development?

I’ve coined COMPUTED for the automated test infrastructure you need: the 8-factor guardrails to put around your application code to ensure quality. These feedback loops will guide frontier models to produce results that are high enough in quality that you’ll forget about application code. A minimal sketch of running all eight as a single gate follows the list.

  • Compliance: Coding standards, linting, formatting, visual diffs
  • Observability: Logs, metrics, traces, monitoring
  • Mutations: Test LLM-generated tests by injecting bugs
  • Performance: Speed, load, and scalability benchmark tools for comparing existing code against new candidate code
  • Unit: Isolated component testing
  • Typing: Type safety & annotations - a new era for types: very good for constraining outputs and for as-the-LLM-reads-code discovery
  • E2E: Full user workflows
  • Data Integrity: Consistency & validation
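
To make the acronym concrete, here's a hypothetical Node script that runs all eight checks as one gate. Every npm script name below is a placeholder for whichever tools you pick from the lists later in this post.

```ts
// run-computed.ts - hypothetical gate; every script name is a placeholder
import { execSync } from "node:child_process";

const checks: [name: string, command: string][] = [
  ["C (Compliance)", "npm run lint"],
  ["O (Observability)", "npm run check:telemetry"],
  ["M (Mutations)", "npx stryker run"],
  ["P (Performance)", "npm run bench"],
  ["U (Unit)", "npm run test:unit"],
  ["T (Typing)", "npx tsc --noEmit"],
  ["E (E2E)", "npm run test:e2e"],
  ["D (Data Integrity)", "npm run test:db"],
];

for (const [name, command] of checks) {
  console.log(`[COMPUTED] ${name}: ${command}`);
  // execSync throws on a non-zero exit code, so the first failing
  // guardrail stops the gate and the LLM's change doesn't ship.
  execSync(command, { stdio: "inherit" });
}
```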

How do COMPUTED tests help deliver a good product that satisfies what customers need?

UX

  • E2E: Validates complete user journeys work end-to-end
  • Compliance: Enforces accessibility & design standards, and captures visual regressions

Speed & Performance

  • Performance: Directly measures and prevents slowdowns
  • Observability: Identifies performance bottlenecks in production

Reliability & Uptime

  • Unit: Catches bugs before they reach production
  • Observability: Enables fast incident detection & resolution
  • Mutation: Ensures generated tests actually catch failures

Features that Solve Real Problems

  • E2E: Validates features work in real-world scenarios
  • Unit + Typing: Enables confident, rapid feature development
  • Mutation: Strong tests = ship faster with less risk

Data Integrity & Security

  • Data Integrity: Validates consistency, constraints, migrations
  • Compliance: Enforces security standards & vulnerability checks
  • Typing: Prevents data type errors that cause corruption

When do I write application code?

I think it’s pretty clear that hand-writing application code is going away, and even in setting up COMPUTED tests, LLMs will do a lot of the work for you - the goal is to iterate towards confidence in LLM-generated code now. Learn to Stop Worrying and Love the Bomb.

Imagine a world where you stopped writing application code today: you would still have a tonne of work implementing COMPUTED - along with SRE and ops, in-depth feature capture, planning, and testing. That’s enough to keep us all busy for a long time.

P.S. [1] The standout point here is feature delivery, and the architectural design of those features: well thought-out technical solutions to real-world problems are still the crucial gap for a human to bridge, right now and, I believe, going forward:

  1. Human spends more time ideating on ambitious and thorough solutions
  2. Human spends less time coding. LLMs can build multiple options where resource limitations might’ve meant “one and done” attempts, so multivariate-testing everything becomes possible. Capture analytics on deployed code with observability monitoring, Sentry, UserPilot, etc.
  3. COMPUTED tests ensure as much LLM-generated code as possible is ready to ship.

How can I start implementing COMPUTED today?

In the JavaScript ecosystem, here is a list of tools you can get started with; you may be using many of them already:

C - Compliance

  • Linting: ESLint, Biome, Rome (deprecated but influential), TSLint (deprecated)
  • Formatting: Prettier, dprint, Biome formatter, oxfmt (Oxc)
  • Visual Diffs: Percy, Chromatic, Vizdiff, Applitools Eyes, BackstopJS
  • Code Standards: SonarQube, CodeClimate, DeepSource
  • Accessibility: axe-core, eslint-plugin-jsx-a11y, Pa11y, Lighthouse CI
  • Security Scanning: Snyk, npm audit, Socket, OWASP Dependency-Check
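
To give a flavour of the C layer, here's a minimal ESLint 9 flat-config sketch. It assumes eslint-plugin-jsx-a11y 6.9+, which ships flat-config presets, and the rule choices are purely illustrative.

```js
// eslint.config.mjs - minimal flat-config sketch (ESLint 9+);
// assumes eslint-plugin-jsx-a11y >= 6.9 for the flat-config preset
import js from "@eslint/js";
import jsxA11y from "eslint-plugin-jsx-a11y";

export default [
  js.configs.recommended,
  // keeps accessibility inside the Compliance feedback loop
  jsxA11y.flatConfigs.recommended,
  {
    rules: {
      // illustrative choice: surface stray debug logging in review
      "no-console": "warn",
    },
  },
];
```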

O - Observability

  • Logging: Winston, Pino, Bunyan, Log4js, console (structured)
  • Metrics: Prometheus client, StatsD, OpenTelemetry
  • Tracing: OpenTelemetry, Jaeger client, Zipkin, Datadog APM, New Relic
  • Monitoring: Sentry, Datadog, New Relic, Grafana, LogRocket, FullStory
  • APM: Elastic APM, AppDynamics, Dynatrace
  • Error Tracking: Sentry, Rollbar, Bugsnag, TrackJS
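
If you're starting from zero on the O, structured logging is the cheapest win. Here's a minimal Pino sketch; the service name and fields are placeholders.

```js
// logger.js - minimal structured-logging sketch with Pino;
// the service name and fields are placeholders
import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  base: { service: "checkout-api" }, // attached to every log line
});

// Structured fields first, human-readable message second:
// machines aggregate the JSON, humans read the message.
logger.info({ orderId: "ord_123", durationMs: 42 }, "order placed");
logger.error({ err: new Error("payment declined") }, "payment failed");
```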

M - Mutations

  • Mutation Testing: StrykerJS (Stryker Mutator), mutode
  • Fault Injection: Chaos Monkey (Netflix), Gremlin (more infrastructure-level)
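
A minimal StrykerJS config sketch showing the key idea: set a break threshold so the build fails when too many injected bugs survive. The globs and numbers are placeholders, and it assumes the Vitest runner plugin is installed.

```js
// stryker.config.mjs - mutation-testing sketch; globs and thresholds are placeholders
// assumes @stryker-mutator/core and @stryker-mutator/vitest-runner are installed
export default {
  testRunner: "vitest",
  mutate: ["src/**/*.ts", "!src/**/*.test.ts"],
  reporters: ["clear-text", "progress"],
  // "break" fails the run when the mutation score drops below 50%,
  // i.e. when too many injected bugs survive the test suite
  thresholds: { high: 80, low: 60, break: 50 },
};
```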

P - Performance

  • Benchmarking: Benchmark.js, tinybench, Vitest bench, autocannon (HTTP)
  • Load Testing: k6, Artillery, Gatling, Apache JMeter, autocannon (also in Benchmarking)
  • Profiling: Clinic.js, 0x, Node.js --inspect, Chrome DevTools
  • Bundle Analysis: Webpack Bundle Analyzer, Rollup Plugin Visualizer, Bundle Buddy
  • Lighthouse: Lighthouse CI, PageSpeed Insights API, WebPageTest
  • Performance Monitoring: SpeedCurve, Calibre, DebugBear
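
For load testing, a k6 script with thresholds turns "prevent slowdowns" into a pass/fail CI gate. The URL and numbers below are placeholders; k6 scripts are JavaScript (recent k6 versions also accept TypeScript).

```js
// load-test.js - k6 sketch (run with `k6 run load-test.js`);
// the URL and numbers are placeholders
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 20, // 20 concurrent virtual users
  duration: "30s",
  thresholds: {
    // fail the run (and the CI job) if p95 latency regresses past 500ms
    http_req_duration: ["p(95)<500"],
  },
};

export default function () {
  const res = http.get("https://staging.example.com/api/health");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```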

U - Unit

  • Test Runners: Vitest, Jest, Mocha, AVA, uvu, node:test (native)
  • Assertion Libraries: Chai, expect (Jest/Vitest), assert (Node.js native)
  • Mocking: Sinon, Jest mocks, Vitest mocks, testdouble.js
  • Test Utilities: Testing Library, Enzyme (legacy React)
  • Snapshot Testing: Jest snapshots, Vitest snapshots
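
A minimal Vitest example, with the helper under test inlined so the file stands alone; applyDiscount is a hypothetical function.

```ts
// price.test.ts - Vitest sketch; applyDiscount is a hypothetical helper,
// inlined here so the file is self-contained
import { describe, expect, it } from "vitest";

function applyDiscount(price: number, rate: number): number {
  if (rate < 0 || rate > 1) throw new RangeError("rate must be in [0, 1]");
  return price * (1 - rate);
}

describe("applyDiscount", () => {
  it("applies a percentage discount", () => {
    expect(applyDiscount(100, 0.2)).toBe(80);
  });

  it("rejects rates outside [0, 1]", () => {
    expect(() => applyDiscount(100, 1.5)).toThrow(RangeError);
  });
});
```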

T - Typing

  • Type Systems: TypeScript, Flow (Meta)
  • Runtime Validation: Zod, Yup, Joi, Ajv, io-ts, Valibot, ArkType
  • Type Checking: tsc (TypeScript compiler), Flow CLI
  • JSDoc: TypeScript + JSDoc, better-docs
  • Schema Validation: JSON Schema + Ajv, TypeBox, Effect Schema
  • API Contract: OpenAPI/Swagger validators, tRPC
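
Zod is a good illustration of why typing matters more now: the runtime schema and the static type come from one definition, so an LLM editing either side gets immediate feedback. The fields below are illustrative.

```ts
// user-schema.ts - runtime validation sketch with Zod; fields are illustrative
import { z } from "zod";

export const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  createdAt: z.coerce.date(), // accepts ISO strings, yields a Date
});

// The static type is derived from the runtime schema,
// so the two can never drift apart.
export type User = z.infer<typeof UserSchema>;

const result = UserSchema.safeParse({ id: "not-a-uuid", email: "nope" });
if (!result.success) {
  // precise, machine-readable feedback - exactly what an LLM loop needs
  console.error(result.error.issues);
}
```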

E - E2E

  • Browser Automation: Playwright, Cypress, Puppeteer, Selenium WebDriver
  • Testing Frameworks: Playwright Test, Cypress, WebdriverIO, Nightwatch.js, TestCafe
  • Visual Regression: Percy, Chromatic, BackstopJS (also in Compliance)
  • API Testing: Supertest, Postman/Newman, Pactum
  • Mobile: Appium, Detox, Maestro (React Native)
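
A minimal Playwright Test sketch; the URL, labels, and copy are placeholders for your own signup flow.

```ts
// signup.spec.ts - Playwright Test sketch; URL, labels, and copy are placeholders
import { test, expect } from "@playwright/test";

test("a visitor can sign up", async ({ page }) => {
  await page.goto("https://staging.example.com/signup");
  await page.getByLabel("Email").fill("new-user@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Create account" }).click();
  // assert the user-visible outcome, not the implementation
  await expect(page.getByRole("heading", { name: "Welcome" })).toBeVisible();
});
```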

D - Data Integrity

  • Database Migrations: Prisma Migrate, Knex.js, TypeORM migrations, Sequelize migrations, db-migrate
  • Validation: Zod, Yup, Joi, class-validator, Superstruct
  • Schema Management: Prisma, TypeORM, Mongoose schemas, Drizzle ORM
  • Seed Data: Prisma seed, Knex seed, @faker-js/faker (successor to Faker.js)
  • Backup/Integrity: pg_dump (PostgreSQL), mysqldump, MongoDB tools
  • Consistency Checks: Database constraint testing, referential integrity validators
  • Transaction Testing: Database transaction rollback testing in tests
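
One way to test constraints against a real database is Testcontainers plus node-postgres. The sketch below assumes Docker is available locally, and the users table is illustrative.

```ts
// db-constraints.test.ts - sketch using Testcontainers + node-postgres;
// assumes Docker is available; the users table is illustrative
import { afterAll, beforeAll, expect, it } from "vitest";
import { PostgreSqlContainer, type StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { Client } from "pg";

let container: StartedPostgreSqlContainer;
let client: Client;

beforeAll(async () => {
  container = await new PostgreSqlContainer("postgres:16").start();
  client = new Client({ connectionString: container.getConnectionUri() });
  await client.connect();
  await client.query(
    "CREATE TABLE users (id serial PRIMARY KEY, email text UNIQUE NOT NULL)",
  );
}, 60_000); // pulling the image can take a while on first run

afterAll(async () => {
  await client.end();
  await container.stop();
});

it("rejects duplicate emails at the database level", async () => {
  await client.query("INSERT INTO users (email) VALUES ($1)", ["a@example.com"]);
  await expect(
    client.query("INSERT INTO users (email) VALUES ($1)", ["a@example.com"]),
  ).rejects.toThrow(/unique/i); // Postgres unique_violation
});
```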

Honorable mentions for multi-category tools:

  • Nx: Monorepo tooling with linting, testing, and build orchestration
  • Turborepo: Monorepo with caching and pipeline orchestration
  • GitHub Actions / GitLab CI: Can orchestrate all COMPUTED checks
  • Husky + lint-staged: Pre-commit hooks for compliance (a config sketch follows this list)
  • Docker / Testcontainers: Consistent test environments for data integrity
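
As a starting point, the Husky + lint-staged combination can run the C layer on every commit. A minimal lint-staged config sketch; the glob patterns and commands are just examples.

```js
// .lintstagedrc.mjs - run Compliance checks on staged files only;
// assumes a Husky pre-commit hook that calls `npx lint-staged`
export default {
  "*.{ts,tsx}": ["eslint --fix", "prettier --write"],
  "*.{css,md,json}": ["prettier --write"],
};
```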