The importance of realistic data in tests
Business Intelligence (BI) systems, and any other system that consumes data it does not control, can be tested in many ways: extract, transform, load (ETL) tests, query tests, performance tests, and so on. In this article, I will concentrate on the data itself, and more specifically on using realistic data in tests. Testing should be done under conditions as close as possible to those in which real users will use the system in production. One of the keys to success here is the data used during the testing process.

Some applications, like alarm clocks, use only data they produce themselves. Others, like weather apps, use only predefined data. Those cases are relatively straightforward. But when your application or system uses many types of data, including various external data and sometimes unstructured data, as big data systems do, the data might be corrupted or unexpected (in other words, the code can’t handle it). However, it is not only unexpected data that might cause data integrity