Domain Integrity Aware Data Generation

In the world of data-driven applications, data is for sure a king. And when it comes to testing those applications, the importance of meaningful, structured, and interrelated test data cannot be overestimated.

Why not test apps with real data? Because in this case, sensitive information will be revealed which is totally unacceptable. Thus, accurate, relevant, and consistent test data is of crucial importance today.

Why not use meaningless sample data then? Unfortunately, this approach suffers from some significant drawbacks. Low-quality test data is of little or no use to businesses when it comes to ensuring that the application adheres to the intended business logic. And that's not to mention the dangers of a potential loss of sensitive production data in case the testing was performed with meaningless test data.

Data Generator for SQL Server allows generating loads of realistic test data preserving domain-level data integrity and thus ensuring database data accuracy, consistency, and relevance.

What is domain integrity?

There are four main types of Database Data Integrity:

  • Domain Integrity
  • Entity Integrity
  • Referential Integrity
  • User-Defined Integrity

Domain integrity refers to the accuracy and consistency of data in a database. The domain represents a set of rules that define acceptable values that a column can contain. They include constraints and other measures that determine the format, type, and amount of data entered.

The example of domain data integrity. The Age of a user can be entered in a table in a number of different ways. In case the Age column only accepts integers (INT), the values twenty-two, eleven, and VII will not be accepted by the database. This way data is protected by using domain-level data integrity.

The example of domain data integrity

Populating primary key columns

For primary key columns, Data Generator for SQL Server can generate sequential IDs from 1 to n, where n=127 for tinyint, 32,767 for smallint, 2,147,483,647 for integer, and 9,223,372,036,854,775,807 for bigint.

Like PRIMARY KEY constraints, UNIQUE constraints guarantee that the values in a column are unique. However, a UNIQUE KEY unlike a PRIMARY KEY can accept one null value per a table column. Many of the generators have the Set values to be unique option. When this option is enabled, Data Generator for SQL Server makes sure that the values generated for the column are unique.

You can also set a value distribution mode for the generated values. If you select Random by timestamp, the generated IDs will be inconsequent and every generation thereafter will populate new random values. In case you select Random by seed, IDs will be generated based on the specified seed value and when repopulating the column, the sequence will be preserved.

Populating primary keys columns

Populating foreign keys columns

A FOREIGN KEY constraint is defined on a child table and references the PRIMARY KEY column or a set of columns, in the parent table. Foreign key relationships in a database help keep the related data consistent.

Data Generator for SQL Server automatically assigns the Foreign Key generator to the columns that have foreign key constraints and generates foreign key values for a child table column based on the values contained in the parent table column. If necessary you can switch the Foreign Key (auto assigned) Generator to Foreign Key (manually assigned) Generator and select a parent column. This way, you can create a single-column foreign key.

If Data Generator detects a FK between two columns, it automatically assigns the Foreign Key (auto assigned) Generator. For other cases, you can manually define a parent column using the Foreign Key (manually assigned) Generator.

Populating foreign keys columns

Preserving domain integrity for DATE columns

Let's consider a worked example of how to use Data Generator to preserve domain integrity for values in DATE columns. Suppose, ShippedDate can not be prior to OrderDate which in turn can not be before today. To configure OrderDate we set Range to Offset from now, then select the Use offset details checkbox and set Min to -7 and Max to -1. Next, we need to configure ShippedDate to be no later than 7 days after OrderDate. So, in the Range box, we select Offset from column and in the Column box, we specify OrderDate. Then, we need to provide the offset details, namely the range of days. And since the order can be received and the goods might not be dispatched yet, we select the Include Null values checkbox. In this way, the values generated for the ShippedDate column will realistic, consistent, and related.

Preserving domain integrity for DATE columns

Populating tables with check constraints

A CHECK constraint specifies a requirement that must be met by each value in a column. If you set a CHECK constraint on a column it will allow only certain values for this column. If you set a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row.

Thus, it is immensely important to take CHECK constraints into account when generating data for columns and/or tables with CHECK constraints defined. Otherwise, the database data won't be consistent and accurate.

dbForge Data Generator for SQL Servers supports most of the existing CHECK constraints. When the tool automatically assigns generators to the columns, it sets the generator parameters to take account of any check constraints defined on a database.

In the screenshot below, the check constraint defines the acceptable range of values between 0 and 100, however, the range is also limited by the Product Quantity generator (which allows values from 1 to 99). Therefore, as a result of generation, only the values from 1 to 99 will be generated.

Populating tables with check constraints

Populating tables with a circular reference

Circular reference can be defined as a condition when one table in a database references a second table, which in turn references the first table, resulting in a closed loop. A special case of circular reference is the self-referencing table. A foreign key column in a self-referencing table references its own primary key column thus creating a loop in the table.

dbForge Data Generator for SQL Server smartly detects dependencies within a database. When the tool spots a circular reference within a table to be populated, it throws a warning, and data generation is not possible in this case. When detecting a circular dependency between two tables, Data Generator displays a warning message as well, however, it allows changing the generator from the Foreign Key Generator for any other valid one. For such cases, data generation is possible.

Populating tables with a circular reference