When possible, I try to use real data for both volume and heterogeneity testing.

It helps reveal unknowns in the problem space that synthetic data might miss.



This is very important and requires some foresight when the real data is personally identifiable information, protected health information, etc.

It's possible, but it requires designing a safe way to run pre-production code that touches production data. In practice, that means you'd better be sure you're only doing reads, not writes, and that you're running your code in the production environment with all the same controls as your production code.
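One way to make "only reads" something the driver enforces rather than a promise: open the connection read-only. A minimal sketch, using SQLite as a stand-in because its URI `mode=ro` option is easy to show. With a server database you would more likely connect as a read-only role (or, in PostgreSQL, set `default_transaction_read_only`). The function name `open_read_only` is just illustrative.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite database so that any write raises an error.

    SQLite's URI filename option mode=ro opens the file read-only; a
    missing file is an error rather than silently creating a new database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


if __name__ == "__main__":
    # Hypothetical path to a copy of production data.
    conn = open_read_only("prod_copy.db")
    print(conn.execute("SELECT name FROM sqlite_master").fetchall())
    conn.close()
```

An accidental `INSERT` or `UPDATE` through this connection fails with `sqlite3.OperationalError` instead of touching the data.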


You are right. I have a pre-production environment with a copy of production data and a script that scrambles names and personal info.
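For anyone wondering what such a scrambling script might look like, here is a minimal sketch of one common approach (keyed-hash pseudonymization). This is not necessarily what the parent does. The field names in `PII_FIELDS` and the key handling are assumptions. The tokens are deterministic, so the same person gets the same token across tables and joins keep working, and missing values stay missing, so you don't lose the heterogeneity you're testing for.

```python
import hashlib
import hmac

# Assumption: in a real script the key comes from outside the copied
# database (e.g. an environment variable); hard-coded here for the sketch.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical set of columns treated as personal information.
PII_FIELDS = {"name", "email", "phone"}


def pseudonymize(value: str, field: str) -> str:
    """Map a value to a stable token: same input and key -> same token."""
    digest = hmac.new(SECRET_KEY, f"{field}:{value}".encode("utf-8"), hashlib.sha256)
    return f"{field}_{digest.hexdigest()[:12]}"


def scramble_record(record: dict) -> dict:
    """Return a copy with PII fields tokenized; None values are kept as None."""
    return {
        key: pseudonymize(str(val), key) if key in PII_FIELDS and val is not None else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    row = {"id": 42, "name": "Jane Doe", "email": "jane@example.com", "city": None}
    print(scramble_record(row))
```

A keyed hash (rather than a plain hash) matters here: without the key, common names and emails could be recovered by hashing a list of guesses.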


I try to do UX design with real data too. Is that what you mean by heterogeneity?


Not quite UX-focused, but related.

I meant data heterogeneity: the variety of formats, edge cases, and data quality you encounter in production. Real user data often has inconsistencies, missing fields, unexpected formats, etc. that synthetic test data tends to miss.

This helps surface integration issues and performance bottlenecks early.
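A cheap way to see that heterogeneity before writing tests is to profile the "shape" of each field's values across a sample. Below is a minimal sketch of that idea; the field name `birth_date` and the shape rules are assumptions chosen for illustration. Digits collapse to `9` and ASCII letters to `a`, so `1990-04-12` and `12/04/1990` show up as two distinct formats, and absent keys, `None`, and empty strings all count as missing.

```python
import re
from collections import Counter, defaultdict


def value_shape(value) -> str:
    """Collapse a value to a coarse pattern: digits -> '9', ASCII letters -> 'a'."""
    if value is None or value == "":
        return "<missing>"
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "a", text)


def profile(records, fields):
    """For each field, count how many records take each shape."""
    shapes = defaultdict(Counter)
    for rec in records:
        for field in fields:
            # A key absent from the record is treated like None (missing).
            shapes[field][value_shape(rec.get(field))] += 1
    return shapes


if __name__ == "__main__":
    rows = [
        {"id": 1, "birth_date": "1990-04-12"},
        {"id": 2, "birth_date": "12/04/1990"},
        {"id": 3, "birth_date": None},
        {"id": 4},
        {"id": 5, "birth_date": "1990-4-12"},
    ]
    for field, counts in profile(rows, ["birth_date"]).items():
        print(field, counts.most_common())
```

The rare shapes at the bottom of each list are usually the ones worth turning into test cases.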



