February 2026

The 8 Guardrails You Need Before Letting AI Write Production Code

Starting at the end.

I’ve been thinking about how LLMs fit into our workflow, and one thing keeps popping into my head: Users don’t care how we wrote the code. They just care if our software works.

For argument's sake, let's look at a non-exhaustive list of the fundamental qualities of "good software":

  • UX: Accessibility, clarity, ease of use, usable on any device with any input
  • Speed & Performance: "Things happen" as fast as possible
  • Reliability & Uptime: Up-and-working, as much as possible
  • Features that solve real problems: Well thought out solutions to customer problems [1]
  • Data integrity & Security: No data loss, no data leaks

So… How do you ensure code generated by LLMs satisfies those requirements?

LLMs can generate pretty damn good code quickly - which is a genuine needle-mover in our industry - but fast does not equal "good enough for paying customers".

What’s the missing link?

Fast code output + ? = Great Software

I think the ? is actually two things. Test infrastructure matters so much more now, but systems thinking - translating real-world customer problems into technical solutions at a level that is worth paying for - is still a done-by-humans task.

So the missing link is:

Fast Code Output + Comprehensive Test Infrastructure + Translating Customer Problems into Technical Solutions = Great Software

JIRA tickets will no longer be:

Implement this code in this file -> Coder writes Code

They will be:

Our customer has this problem -> Coder plans solution -> Ensures tests capture new feature expectations.

How broad does test infrastructure need to be for quality AI-driven development?

I’ve coined COMPUTED for the automated test infrastructure you need: the 8-factor guardrails to put around your application code to ensure quality. These feedback loops will guide frontier models to produce results that are high enough in quality that you’ll forget about application code. A minimal sketch of running all eight as a single gate follows the list.

  • Compliance: Coding standards, linting, formatting, visual diffs
  • Observability: Logs, metrics, traces, monitoring
  • Mutations: Test LLM-generated tests by injecting bugs
  • Performance: Speed, load, and scalability benchmark tools for comparing existing code against new candidate code
  • Unit: Isolated component testing
  • Typing: Type safety & annotations - a new era for types: very good for constraining outputs and for as-the-LLM-reads-code discovery
  • E2E: Full user workflows
  • Data Integrity: Consistency & validation
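
To make the acronym concrete, here's a hypothetical Node script that runs all eight checks as one gate. Every npm script name below is a placeholder for whichever tools you pick from the lists later in this post.

```ts
// run-computed.ts - hypothetical gate; every script name is a placeholder
import { execSync } from "node:child_process";

const checks: [name: string, command: string][] = [
  ["C (Compliance)", "npm run lint"],
  ["O (Observability)", "npm run check:telemetry"],
  ["M (Mutations)", "npx stryker run"],
  ["P (Performance)", "npm run bench"],
  ["U (Unit)", "npm run test:unit"],
  ["T (Typing)", "npx tsc --noEmit"],
  ["E (E2E)", "npm run test:e2e"],
  ["D (Data Integrity)", "npm run test:db"],
];

for (const [name, command] of checks) {
  console.log(`[COMPUTED] ${name}: ${command}`);
  // execSync throws on a non-zero exit code, so the first failing
  // guardrail stops the gate and the LLM's change doesn't ship.
  execSync(command, { stdio: "inherit" });
}
```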

How do COMPUTED tests help deliver a good product that satisfies what customers need?

UX

  • E2E: Validates complete user journeys work end-to-end
  • Compliance: Enforces accessibility & design standards, and captures visual regressions

Speed & Performance

  • Performance: Directly measures and prevents slowdowns
  • Observability: Identifies performance bottlenecks in production

Reliability & Uptime

  • Unit: Catches bugs before they reach production
  • Observability: Enables fast incident detection & resolution
  • Mutation: Ensures generated tests actually catch failures

Features that Solve Real Problems

  • E2E: Validates features work in real-world scenarios
  • Unit + Typing: Enables confident, rapid feature development
  • Mutation: Strong tests = ship faster with less risk

Data Integrity & Security

  • Data Integrity: Validates consistency, constraints, migrations
  • Compliance: Enforces security standards & vulnerability checks
  • Typing: Prevents data type errors that cause corruption

When do I write application code?

I think it’s pretty clear that hand-writing application code is going away, and even in setting up COMPUTED tests, LLMs will do a lot of the work for you - the goal is to iterate towards confidence in LLM-generated code now. Learn to Stop Worrying and Love the Bomb.

Imagine a world where you stopped writing application code today: you would still have a tonne of work implementing COMPUTED - along with SRE and ops, in-depth feature capture, planning, and testing. That’s enough to keep us all busy for a long time.

P.S. [1] The standout point here is feature delivery, and the architectural design of those features: well thought-out technical solutions to real-world problems are still the crucial gap for a human to bridge, right now and, I believe, going forward:

  1. Human spends more time ideating on ambitious and thorough solutions
  2. Human spends less time coding. LLMs can build multiple options where resource limitations might’ve meant “one and done” attempts, so multivariate-testing everything becomes possible. Capture analytics on deployed code with observability monitoring, Sentry, UserPilot, etc.
  3. COMPUTED tests ensure as much LLM-generated code as possible is ready to ship.

How can I start implementing COMPUTED today?

In the JavaScript ecosystem, here is a list of tools you can get started with; you may be using many of them already:

C - Compliance

  • Linting: ESLint, Biome, Rome (deprecated but influential), TSLint (deprecated)
  • Formatting: Prettier, dprint, Biome formatter, oxfmt (Oxc)
  • Visual Diffs: Percy, Chromatic, Vizdiff, Applitools Eyes, BackstopJS
  • Code Standards: SonarQube, CodeClimate, DeepSource
  • Accessibility: axe-core, eslint-plugin-jsx-a11y, Pa11y, Lighthouse CI
  • Security Scanning: Snyk, npm audit, Socket, OWASP Dependency-Check
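
To give a flavour of the C layer, here's a minimal ESLint 9 flat-config sketch. It assumes eslint-plugin-jsx-a11y 6.9+, which ships flat-config presets, and the rule choices are purely illustrative.

```js
// eslint.config.mjs - minimal flat-config sketch (ESLint 9+);
// assumes eslint-plugin-jsx-a11y >= 6.9 for the flat-config preset
import js from "@eslint/js";
import jsxA11y from "eslint-plugin-jsx-a11y";

export default [
  js.configs.recommended,
  // keeps accessibility inside the Compliance feedback loop
  jsxA11y.flatConfigs.recommended,
  {
    rules: {
      // illustrative choice: surface stray debug logging in review
      "no-console": "warn",
    },
  },
];
```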

O - Observability

  • Logging: Winston, Pino, Bunyan, Log4js, console (structured)
  • Metrics: Prometheus client, StatsD, OpenTelemetry
  • Tracing: OpenTelemetry, Jaeger client, Zipkin, Datadog APM, New Relic
  • Monitoring: Sentry, Datadog, New Relic, Grafana, LogRocket, FullStory
  • APM: Elastic APM, AppDynamics, Dynatrace
  • Error Tracking: Sentry, Rollbar, Bugsnag, TrackJS
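
If you're starting from zero on the O, structured logging is the cheapest win. Here's a minimal Pino sketch; the service name and fields are placeholders.

```js
// logger.js - minimal structured-logging sketch with Pino;
// the service name and fields are placeholders
import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  base: { service: "checkout-api" }, // attached to every log line
});

// Structured fields first, human-readable message second:
// machines aggregate the JSON, humans read the message.
logger.info({ orderId: "ord_123", durationMs: 42 }, "order placed");
logger.error({ err: new Error("payment declined") }, "payment failed");
```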

M - Mutations

  • Mutation Testing: StrykerJS (Stryker Mutator), mutode
  • Fault Injection: Chaos Monkey (Netflix), Gremlin (more infrastructure-level)
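
A minimal StrykerJS config sketch showing the key idea: set a break threshold so the build fails when too many injected bugs survive. The globs and numbers are placeholders, and it assumes the Vitest runner plugin is installed.

```js
// stryker.config.mjs - mutation-testing sketch; globs and thresholds are placeholders
// assumes @stryker-mutator/core and @stryker-mutator/vitest-runner are installed
export default {
  testRunner: "vitest",
  mutate: ["src/**/*.ts", "!src/**/*.test.ts"],
  reporters: ["clear-text", "progress"],
  // "break" fails the run when the mutation score drops below 50%,
  // i.e. when too many injected bugs survive the test suite
  thresholds: { high: 80, low: 60, break: 50 },
};
```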

P - Performance

  • Benchmarking: Benchmark.js, tinybench, Vitest bench, autocannon (HTTP)
  • Load Testing: k6, Artillery, Gatling, Apache JMeter, autocannon (also in Benchmarking)
  • Profiling: Clinic.js, 0x, Node.js --inspect, Chrome DevTools
  • Bundle Analysis: Webpack Bundle Analyzer, Rollup Plugin Visualizer, Bundle Buddy
  • Lighthouse: Lighthouse CI, PageSpeed Insights API, WebPageTest
  • Performance Monitoring: SpeedCurve, Calibre, DebugBear
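
For load testing, a k6 script with thresholds turns "prevent slowdowns" into a pass/fail CI gate. The URL and numbers below are placeholders; k6 scripts are JavaScript (recent k6 versions also accept TypeScript).

```js
// load-test.js - k6 sketch (run with `k6 run load-test.js`);
// the URL and numbers are placeholders
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 20, // 20 concurrent virtual users
  duration: "30s",
  thresholds: {
    // fail the run (and the CI job) if p95 latency regresses past 500ms
    http_req_duration: ["p(95)<500"],
  },
};

export default function () {
  const res = http.get("https://staging.example.com/api/health");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```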

U - Unit

  • Test Runners: Vitest, Jest, Mocha, AVA, uvu, node:test (native)
  • Assertion Libraries: Chai, expect (Jest/Vitest), assert (Node.js native)
  • Mocking: Sinon, Jest mocks, Vitest mocks, testdouble.js
  • Test Utilities: Testing Library, Enzyme (legacy React)
  • Snapshot Testing: Jest snapshots, Vitest snapshots
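
A minimal Vitest example, with the helper under test inlined so the file stands alone; applyDiscount is a hypothetical function.

```ts
// price.test.ts - Vitest sketch; applyDiscount is a hypothetical helper,
// inlined here so the file is self-contained
import { describe, expect, it } from "vitest";

function applyDiscount(price: number, rate: number): number {
  if (rate < 0 || rate > 1) throw new RangeError("rate must be in [0, 1]");
  return price * (1 - rate);
}

describe("applyDiscount", () => {
  it("applies a percentage discount", () => {
    expect(applyDiscount(100, 0.2)).toBe(80);
  });

  it("rejects rates outside [0, 1]", () => {
    expect(() => applyDiscount(100, 1.5)).toThrow(RangeError);
  });
});
```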

T - Typing

  • Type Systems: TypeScript, Flow (Meta)
  • Runtime Validation: Zod, Yup, Joi, Ajv, io-ts, Valibot, ArkType
  • Type Checking: tsc (TypeScript compiler), Flow CLI
  • JSDoc: TypeScript + JSDoc, better-docs
  • Schema Validation: JSON Schema + Ajv, TypeBox, Effect Schema
  • API Contract: OpenAPI/Swagger validators, tRPC
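
Zod is a good illustration of why typing matters more now: the runtime schema and the static type come from one definition, so an LLM editing either side gets immediate feedback. The fields below are illustrative.

```ts
// user-schema.ts - runtime validation sketch with Zod; fields are illustrative
import { z } from "zod";

export const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  createdAt: z.coerce.date(), // accepts ISO strings, yields a Date
});

// The static type is derived from the runtime schema,
// so the two can never drift apart.
export type User = z.infer<typeof UserSchema>;

const result = UserSchema.safeParse({ id: "not-a-uuid", email: "nope" });
if (!result.success) {
  // precise, machine-readable feedback - exactly what an LLM loop needs
  console.error(result.error.issues);
}
```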

E - E2E

  • Browser Automation: Playwright, Cypress, Puppeteer, Selenium WebDriver
  • Testing Frameworks: Playwright Test, Cypress, WebdriverIO, Nightwatch.js, TestCafe
  • Visual Regression: Percy, Chromatic, BackstopJS (also in Compliance)
  • API Testing: Supertest, Postman/Newman, Pactum
  • Mobile: Appium, Detox, Maestro (React Native)
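
A minimal Playwright Test sketch; the URL, labels, and copy are placeholders for your own signup flow.

```ts
// signup.spec.ts - Playwright Test sketch; URL, labels, and copy are placeholders
import { test, expect } from "@playwright/test";

test("a visitor can sign up", async ({ page }) => {
  await page.goto("https://staging.example.com/signup");
  await page.getByLabel("Email").fill("new-user@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Create account" }).click();
  // assert the user-visible outcome, not the implementation
  await expect(page.getByRole("heading", { name: "Welcome" })).toBeVisible();
});
```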

D - Data Integrity

  • Database Migrations: Prisma Migrate, Knex.js, TypeORM migrations, Sequelize migrations, db-migrate
  • Validation: Zod, Yup, Joi, class-validator, Superstruct
  • Schema Management: Prisma, TypeORM, Mongoose schemas, Drizzle ORM
  • Seed Data: Prisma seed, Knex seed, @faker-js/faker (successor to Faker.js)
  • Backup/Integrity: pg_dump (PostgreSQL), mysqldump, MongoDB tools
  • Consistency Checks: Database constraint testing, referential integrity validators
  • Transaction Testing: Database transaction rollback testing in tests
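
One way to test constraints against a real database is Testcontainers plus node-postgres. The sketch below assumes Docker is available locally, and the users table is illustrative.

```ts
// db-constraints.test.ts - sketch using Testcontainers + node-postgres;
// assumes Docker is available; the users table is illustrative
import { afterAll, beforeAll, expect, it } from "vitest";
import { PostgreSqlContainer, type StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { Client } from "pg";

let container: StartedPostgreSqlContainer;
let client: Client;

beforeAll(async () => {
  container = await new PostgreSqlContainer("postgres:16").start();
  client = new Client({ connectionString: container.getConnectionUri() });
  await client.connect();
  await client.query(
    "CREATE TABLE users (id serial PRIMARY KEY, email text UNIQUE NOT NULL)",
  );
}, 60_000); // pulling the image can take a while on first run

afterAll(async () => {
  await client.end();
  await container.stop();
});

it("rejects duplicate emails at the database level", async () => {
  await client.query("INSERT INTO users (email) VALUES ($1)", ["a@example.com"]);
  await expect(
    client.query("INSERT INTO users (email) VALUES ($1)", ["a@example.com"]),
  ).rejects.toThrow(/unique/i); // Postgres unique_violation
});
```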

Honorable mentions for multi-category tools:

  • Nx: Monorepo tooling with linting, testing, and build orchestration
  • Turborepo: Monorepo with caching and pipeline orchestration
  • GitHub Actions / GitLab CI: Can orchestrate all COMPUTED checks
  • Husky + lint-staged: Pre-commit hooks for compliance (a config sketch follows this list)
  • Docker / Testcontainers: Consistent test environments for data integrity
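
As a starting point, the Husky + lint-staged combination can run the C layer on every commit. A minimal lint-staged config sketch; the glob patterns and commands are just examples.

```js
// .lintstagedrc.mjs - run Compliance checks on staged files only;
// assumes a Husky pre-commit hook that calls `npx lint-staged`
export default {
  "*.{ts,tsx}": ["eslint --fix", "prettier --write"],
  "*.{css,md,json}": ["prettier --write"],
};
```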