jshez

June 2025

Your Test Data Is Hiding Bugs

I shipped a user directory feature that worked perfectly in development. Every name fit nicely in its card, the layout was clean, and my product manager was thrilled. Two days after launch, our support inbox exploded with complaints about “broken profiles” and “cut-off text.”

The problem? My test data was all “John Smith” and “Jane Doe” – nice, short, ASCII names that fit perfectly in my 200-pixel-wide cards. Real users had names like “María Fernández (González)” and “Αλέξανδρος Παπαδόπουλος” that broke my layout completely. My test data hadn’t just failed to catch the bug – it had actively hidden it.

This wasn’t a one-off mistake. Over the years, I’ve watched “User 1” and “test@email.com” hide performance problems, internationalization bugs, and missing error states that only surfaced when real users started clicking around. The pattern was always the same: my generic test data made everything look fine, while real data revealed fundamental problems with my app.

Your test data is probably hiding bugs right now. If you can write a function that returns a random number, you can create realistic seed data that will surface these problems before your users do.

What Makes Fake Data Actually Useful?

The problem with most fake data isn’t that it’s fake – it’s that it’s boring. When you populate your user table with five records that all look like “User 1”, “User 2”, “User 3”, you’re not testing anything real.

Good fake data has three qualities: it’s realistic, it’s varied, and it tells a story. Instead of “John Smith”, you want “María González” who works at “Quantum Computing Solutions” and has an email like “m.gonzalez@qcs-innovations.com”. You want users with different name lengths, different character sets, different company structures.
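To make “varied” concrete, here’s a minimal, dependency-free sketch. The name and company pools and the pick and makeUser helpers are my own illustrative assumptions, not part of any library. The only point is that a random index into a deliberately mixed pool already gives you different lengths and character sets.

```javascript
// Hypothetical pools, deliberately mixed in length and script.
const NAMES = [
  'Li Na',
  'María González',
  'Priyanka Chakraborty',
  'Αλέξανδρος Παπαδόπουλος',
  "Siobhán O'Connor",
];
const COMPANIES = [
  'Acme',
  'Quantum Computing Solutions',
  'Artificial Intelligence Research Laboratory',
];

// Pick one element using any function that returns a number in [0, 1).
function pick(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Build one user from the pools above.
function makeUser(random = Math.random) {
  return { name: pick(NAMES, random), company: pick(COMPANIES, random) };
}

const users = Array.from({ length: 20 }, () => makeUser());
console.log(users.slice(0, 3));
```

Passing the random function as a parameter is a small design choice: it lets you swap in a seeded generator later if you want the same data on every run.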

The goal isn’t to trick anyone – it’s to surface problems before your users do.

Here’s something else that took me years to understand: realistic fake data reveals UX flaws that completely random fake data can’t. When your test user is “Priyanka Chakraborty” instead of “User 1”, you immediately notice that your name field cuts off after 15 characters. When your fake company is “Artificial Intelligence Research Laboratory” instead of “Company”, you see that your card layout breaks with longer text. Random gibberish might stress-test your backend, but realistic data stress-tests your user experience. Those awkward line breaks, truncated labels, and cramped layouts that make your app feel broken? They hide when your test data is all the same length and complexity.
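One way to catch these problems before you even open the browser is to audit your seed data against the limits your UI assumes. The sketch below is illustrative only: the 15-character name limit echoes the example above, while the company limit, the field names, and the findOverflows helper are assumptions of mine.

```javascript
// Hypothetical display budgets, in characters, per field.
const LIMITS = { name: 15, company: 30 };

// Count code points rather than UTF-16 code units.
function visibleLength(text) {
  return Array.from(text).length;
}

// Return one entry per field value that exceeds its budget.
function findOverflows(records, limits = LIMITS) {
  const overflows = [];
  for (const record of records) {
    for (const [field, max] of Object.entries(limits)) {
      const value = record[field] ?? '';
      const length = visibleLength(value);
      if (length > max) {
        overflows.push({ field, value, length, max });
      }
    }
  }
  return overflows;
}

const overflows = findOverflows([
  { name: 'User 1', company: 'Company' },
  {
    name: 'Priyanka Chakraborty',
    company: 'Artificial Intelligence Research Laboratory',
  },
]);
console.log(overflows);
```

The “User 1” record passes silently; the realistic record is flagged on both fields. Character counts are only a rough proxy for rendered width, so treat this as an early warning rather than a replacement for looking at the page.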

But there’s an even deeper layer here. When you use realistic, coherent data – complete user profiles with job titles, team memberships, credit balances, and company types – you start spotting information architecture problems that totally random faker data masks. You’ll notice when a user’s job title is missing from a team directory, or when someone’s credit balance is mysteriously exposed on a public profile page, or when your admin panel shows “undefined” where a department name should be. Realistic fake data makes these gaps obvious because your brain expects the information to make sense together. With random data, missing pieces just look like more randomness.

The Tools That Make It Simple (And Domain-Specific)

Here’s where most developers get intimidated. They think creating realistic fake data means building some complex data generation system. But tools like Faker.js make it almost trivial.

import { faker } from '@faker-js/faker';

// One realistic-looking user record
const user = {
  id: faker.string.uuid(),
  firstName: faker.person.firstName(),
  lastName: faker.person.lastName(),
  email: faker.internet.email(),
  company: faker.company.name(),
  avatar: faker.image.avatar(),
  bio: faker.lorem.paragraph(),
  createdAt: faker.date.past(),
};

That’s it. One function call gives you a realistic name. Another gives you a plausible company. The email addresses look real, the dates make sense, and the UUIDs are properly formatted.

But here’s where Faker.js really shines: you can feed it vocabulary that speaks your domain’s language. Instead of generic companies like “Smith LLC”, you can hand faker.helpers.arrayElement a curated list of terms specific to your industry:

// For a healthcare app
const medicalProvider = {
  facilityName: faker.helpers.arrayElement([
    'Regional Medical Center',
    'Community Health Clinic',
    'Specialty Care Associates',
    'Family Medicine Group',
  ]),
  department: faker.helpers.arrayElement([
    'Cardiology', 'Pediatrics', 'Emergency Medicine',
    'Orthopedics', 'Internal Medicine',
  ]),
  specialty: faker.helpers.arrayElement([
    'Interventional Cardiology', 'Pediatric Surgery',
    'Sports Medicine', 'Endocrinology',
  ]),
};

// For a B2B SaaS app
const saasCustomer = {
  companyType: faker.helpers.arrayElement([
    'Software Development', 'Digital Marketing Agency',
    'E-commerce Platform', 'Financial Services',
  ]),
  role: faker.helpers.arrayElement([
    'Engineering Manager', 'Product Owner',
    'DevOps Engineer', 'Technical Lead',
  ]),
  // Between two and four distinct technologies per customer
  techStack: faker.helpers.arrayElements([
    'React', 'Node.js', 'Python', 'AWS', 'Docker', 'Kubernetes',
  ], { min: 2, max: 4 }),
};

This approach creates fake data that doesn’t just look real – it looks real in your specific context. When your healthcare app shows “Dr. Sarah Chen, Interventional Cardiology, Regional Medical Center” instead of “User 1, Department A, Company B”, you immediately spot when job titles wrap poorly, when department names get truncated, or when your search functionality fails to find specialists correctly. The domain-specific vocabulary makes edge cases obvious because it mirrors exactly what your real users will input.

Want to get even more sophisticated? You can make the data internally consistent and domain-aware:

const firstName = faker.person.firstName();
const lastName = faker.person.lastName();
const company = faker.company.name();
// Turn the company name into a domain-safe slug: lowercase, letters and digits only
const domain = company.toLowerCase().replace(/[^a-z0-9]/g, '');
const email = faker.internet.email({
  firstName,
  lastName,
  provider: `${domain}.com`,
});

Now your users have email addresses that actually match their companies, so any feature keyed on the email domain, whether that’s domain parsing or tying a user to their company during authentication, gets exercised with plausible input instead of “test@email.com”.
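One caveat with the snippet above: the regular expression drops every character outside a-z and 0-9. A company like “Müller & Söhne” becomes “mllershne”, and a name written entirely in a non-Latin script produces an empty domain. Here’s one possible refinement, as a sketch. The companyToDomain helper and its “example” fallback are assumptions of mine, not part of Faker.js.

```javascript
// Derive a plausible ASCII domain from a company name.
function companyToDomain(company, fallback = 'example') {
  const slug = company
    .normalize('NFKD')                // split letters from their accents
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]/g, '');       // keep only ASCII letters and digits
  // Fall back when nothing survives, e.g. for non-Latin names.
  return `${slug || fallback}.com`;
}

console.log(companyToDomain('Quantum Computing Solutions'));
console.log(companyToDomain('Müller & Söhne'));
console.log(companyToDomain('Αλφα'));
```

The result can be passed as the provider option in the faker.internet.email call shown earlier.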

Why Volume Matters More Than You Think

Here’s something I learned the hard way: five fake users will never catch the bugs that 500 fake users will. Performance issues, UI layout problems, database query inefficiencies – they all hide when you’re working with tiny datasets.

I now create thousands of records by default. It takes maybe 30 seconds to generate, but it instantly reveals problems that would have bitten me later:

  • Does your pagination actually work?
  • What happens when user names are really long?
  • How does your search perform with realistic data volumes?
  • Do your database indexes actually help?

The difference between testing with 10 records and 10,000 records is night and day.
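Here’s a rough sketch of what “thousands by default” can look like. It’s dependency-free so it runs anywhere; the name pools, the generateUsers and paginate helpers, and the seed value are illustrative assumptions, and the in-memory paginate function is only a stand-in for whatever paginated query your app actually runs.

```javascript
// Small seeded PRNG (mulberry32) so every run yields the same dataset.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical name pools with mixed lengths and scripts.
const FIRST_NAMES = ['Li', 'María', 'Priyanka', 'Αλέξανδρος'];
const LAST_NAMES = ['Ng', 'González', 'Chakraborty', 'Παπαδόπουλος'];

function generateUsers(count, seed = 42) {
  const random = mulberry32(seed);
  const pick = (items) => items[Math.floor(random() * items.length)];
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    name: `${pick(FIRST_NAMES)} ${pick(LAST_NAMES)}`,
  }));
}

// In-memory stand-in for a paginated query; pages are 1-based.
function paginate(items, page, pageSize) {
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

console.time('generate 10,000 users');
const users = generateUsers(10_000);
console.timeEnd('generate 10,000 users');

const pageSize = 30;
const lastPage = Math.ceil(users.length / pageSize);
console.log(lastPage, paginate(users, lastPage, pageSize).length);
```

With 10,000 records and 30 per page, the last page is page 334 and holds only 10 users. A partially filled final page like that is exactly the case a ten-record dataset never reaches.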

Creating Data That Tells Stories

Random data is good, but patterned data is better. I like to create fake data that represents realistic user journeys:

// createUser() and createProject() stand in for your app's own helpers
// that insert a record and return it.

// Create some power users with lots of activity
for (let i = 0; i < 50; i++) {
  const user = createUser();
  const projectCount = faker.number.int({ min: 10, max: 50 });
  for (let j = 0; j < projectCount; j++) {
    createProject(user.id);
  }
}

// Create some new users with minimal activity
for (let i = 0; i < 200; i++) {
  const user = createUser();
  const projectCount = faker.number.int({ min: 0, max: 2 });
  for (let j = 0; j < projectCount; j++) {
    createProject(user.id);
  }
}

This gives you data that reflects reality: most users are new and haven’t done much, while a minority (here, 50 out of 250) are power users who stress-test your system.
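If you end up with more than two kinds of users, the same idea can be written as data instead of copy-pasted loops. This is a sketch under assumptions: the COHORTS table, randomInt, and buildSeedPlan are hypothetical names, and the plan only decides how many projects each user gets. Actually inserting the records is still up to your own helpers.

```javascript
// Hypothetical cohorts: how many users, and how many projects each may have.
const COHORTS = [
  { label: 'power', users: 50, minProjects: 10, maxProjects: 50 },
  { label: 'new', users: 200, minProjects: 0, maxProjects: 2 },
];

// Random integer between min and max, both inclusive.
function randomInt(min, max, random = Math.random) {
  return min + Math.floor(random() * (max - min + 1));
}

// Expand the cohort table into one planned entry per user.
function buildSeedPlan(cohorts = COHORTS, random = Math.random) {
  const plan = [];
  let nextId = 1;
  for (const cohort of cohorts) {
    for (let i = 0; i < cohort.users; i++) {
      plan.push({
        userId: nextId++,
        cohort: cohort.label,
        projectCount: randomInt(cohort.minProjects, cohort.maxProjects, random),
      });
    }
  }
  return plan;
}

const plan = buildSeedPlan();
console.log(plan.length, plan[0], plan[plan.length - 1]);
```

Adding a third group, say dormant users with old data and no recent activity, then becomes one more line in the table.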

Making Your Team Love Your Fake Data

Nothing frustrates a designer or product manager like opening a staging environment and seeing “User 1” everywhere. Good fake data makes your app feel real, which helps everyone evaluate it properly.

I’ve seen design reviews completely change tone when we switched from generic test data to realistic fake data. Suddenly, everyone could imagine real people using the app. Edge cases became obvious. The conversation shifted from “Does this work?” to “Does this work well?”

Make your fake data readable and professional. Use real-looking names, companies, and addresses. If you’re building a B2B app, create fake companies that sound like places people actually work. If you’re building a consumer app, create users who feel like real people.

The Simple Recipe That Works

Here’s my standard approach to fake seed data:

  1. Start with realistic individual records – Use Faker.js or similar tools to create data that looks real
  2. Create lots of it – Generate hundreds or thousands of records, not dozens
  3. Add patterns – Mix power users with casual users, old data with new data
  4. Make it consistent – Ensure related data actually relates (matching email domains to company names, etc.)
  5. Test edge cases – Intentionally create some records with very long names, empty fields, special characters

The whole process takes maybe an hour to set up initially, then runs automatically every time you reset your development database.
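Step 5 of the recipe is the one random generators rarely cover on their own, so I’d write those records by hand. A minimal sketch, with every record and the withEdgeCases helper being illustrative assumptions:

```javascript
// Hand-written records that sit at the edges on purpose.
const EDGE_CASE_USERS = [
  // Very long name
  { name: 'Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.', company: 'Acme' },
  // Empty and missing fields
  { name: '', company: null },
  // Non-Latin script
  { name: 'Αλέξανδρος Παπαδόπουλος', company: 'Quantum Computing Solutions' },
  // Quotes and markup-like characters
  { name: "Conan O'Brien <b>Jr.</b>", company: 'Smith & Sons "Holdings"' },
  // Stray whitespace
  { name: '  Jane   Doe  ', company: 'Company' },
];

// Put the edge cases first so they show up on page one of every list.
function withEdgeCases(generatedUsers) {
  return [...EDGE_CASE_USERS, ...generatedUsers];
}

const seedUsers = withEdgeCases([{ name: 'María González', company: 'Acme' }]);
console.log(seedUsers.length);
```

Placing them first is a deliberate choice: edge cases buried on page 40 of a list are almost as hidden as edge cases that don’t exist.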

Beyond Just Users and Posts

Don’t limit yourself to basic entities. Create fake data for everything:

  • Financial transactions with realistic amounts and patterns
  • Geographic data with real city/state combinations
  • Time-series data that follows realistic usage patterns
  • File uploads with varied file sizes and types
  • API logs with realistic response times and error rates

The more realistic your fake data, the more confident you can be that your app will handle real data gracefully.
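As one example from that list, here’s a sketch of time-series data with a weekly rhythm. The baseline numbers, the 20% noise band, and the dailyEventCounts helper are all assumptions chosen for illustration; swap in whatever pattern your own product actually sees.

```javascript
// Hypothetical baselines: weekdays are busier than weekends.
const WEEKDAY_BASE = 1000;
const WEEKEND_BASE = 300;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// One { date, events } entry per day, starting at startIso (YYYY-MM-DD, UTC).
function dailyEventCounts(startIso, days, random = Math.random) {
  const series = [];
  const start = new Date(`${startIso}T00:00:00Z`);
  for (let i = 0; i < days; i++) {
    const date = new Date(start.getTime() + i * MS_PER_DAY);
    const dayOfWeek = date.getUTCDay(); // 0 = Sunday, 6 = Saturday
    const isWeekend = dayOfWeek === 0 || dayOfWeek === 6;
    const base = isWeekend ? WEEKEND_BASE : WEEKDAY_BASE;
    // Scale by a random factor between 0.8 and 1.2 so the curve is not flat.
    const noise = 0.8 + random() * 0.4;
    series.push({
      date: date.toISOString().slice(0, 10),
      events: Math.round(base * noise),
    });
  }
  return series;
}

// Two weeks starting on a Monday.
const series = dailyEventCounts('2025-06-02', 14);
console.log(series);
```

Charts, rollups, and “compared to last week” widgets look very different against a curve with weekend dips than against uniformly random numbers.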

Start Simple, Get Sophisticated

You don’t need to build the perfect fake data system on day one. Start with basic Faker.js calls to replace your “test1”, “test2”, “test3” placeholders. Even that small change will immediately make your app feel more real and catch more bugs.

As you get comfortable, you can get more sophisticated. Add relationships between records. Create data that follows realistic patterns. Build seed scripts that create entire realistic scenarios.

What I’ve found over and over with fake data is that it’s not hard – it just requires thinking beyond the happy path. The tools are simple, the concepts are straightforward, and the payoff is immediate.

Good fake data has made me a better developer. It’s caught bugs I never would have found, helped me optimize queries I didn’t know were slow, and made every demo and design review more productive. If you’re still using “test123” as your go-to fake data, I can’t recommend this strongly enough: take an hour to set up something realistic. Your future self will thank you.