Referential Integrity through the ages
The real-time synchronization of a file or database on one platform (e.g. a mainframe running z/OS or z/VSE) with a database on another platform (e.g. Linux, Unix or Windows), or between two databases on comparable platforms, is always subject to one overriding principle:
The transaction that carries the change must be logically coherent and error-free up to its completion (COMMIT), and it must arrive in the target database.
In IT, the term referential integrity (RI) is used for this. In relational databases, RI is applied whenever the consistency and integrity of the data must be ensured.
Detailed treatises on the importance of RI can be found in abundance on the Internet. In these articles (1, 2, 3) you will find some references.
Transaction security (TS) is another term that matters.
In computer science, ‘transaction processing is information processing that is divided into individual, indivisible operations called transactions. Each transaction must succeed or fail as a complete unit; it can never be only partially complete’ (see here).
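The "succeed or fail as a complete unit" rule can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table, the account IDs and the transfer function are made up for this illustration and have nothing to do with tcVISION itself.

```python
import sqlite3

# Hypothetical in-memory table; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one indivisible transaction."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises an exception.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )


try:
    transfer(conn, 1, 2, 500)  # the debit would violate the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the credit that already ran has been rolled back as well

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

After the failed transfer both balances are unchanged, although the first UPDATE had already been executed. The transaction is never left half applied.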
When developing tcVISION, special attention was paid to the referential integrity and transaction-based transmission of data from the very beginning.
In addition to safeguarding RI and TS, this also includes restarting replications after system malfunctions such as transmission disruptions in the network or database errors.
During real-time replication between two or more databases, so-called LUW-based processing should be used, in particular to ensure that the transactions are applied in the correct chronological order. Only committed Logical Units of Work (LUWs) are processed. This form of processing is the default in the tcVISION solution.
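The idea of LUW-based processing can be sketched in a few lines of Python. This is only an illustration under assumed names: the record layout (`luw`, `op`, `key`) and the function `committed_luws` are hypothetical and do not reflect tcVISION's internal formats or interfaces.

```python
from collections import defaultdict


def committed_luws(log_records):
    """Yield (luw_id, changes) for committed LUWs only, in commit order."""
    pending = defaultdict(list)  # changes buffered per open LUW
    for rec in log_records:
        luw_id, op = rec["luw"], rec["op"]
        if op == "COMMIT":
            # Only now may the buffered changes be passed on.
            yield luw_id, pending.pop(luw_id, [])
        elif op == "ROLLBACK":
            # Rolled-back work is discarded and never reaches the target.
            pending.pop(luw_id, None)
        else:
            pending[luw_id].append(rec)
    # LUWs still open at the end of the input are not emitted.


# Hypothetical change log with interleaved LUWs A, B, C and D.
log = [
    {"luw": "A", "op": "INSERT", "key": 1},
    {"luw": "B", "op": "INSERT", "key": 2},
    {"luw": "B", "op": "COMMIT"},
    {"luw": "C", "op": "UPDATE", "key": 2},
    {"luw": "C", "op": "ROLLBACK"},
    {"luw": "A", "op": "COMMIT"},
    {"luw": "D", "op": "INSERT", "key": 3},  # never committed
]

applied = [(luw, [c["key"] for c in changes]) for luw, changes in committed_luws(log)]
```

In this sketch LUW B is passed on before LUW A because it was committed first, while the rolled-back LUW C and the unfinished LUW D never reach the target.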
However, there are situations in which LUW-based processing is neither necessary nor desirable.
When change data is captured on the source database (e.g. Db2, IMS, Adabas, IDMS, Datacom or another supported database) and streamed into a Big Data / Hadoop environment, the user can switch off referential integrity for the target system.
Please also read the blog entry Impact of Data Streaming on traditional Mainframe IT.
In addition to streaming output, there are also customer-specific applications in which an active RI for the target system is not desired or necessary.
An example of such a customer-specific application is a process in which an extremely high number of INSERTs (over 90 million records per hour) is applied to a database on the z/OS mainframe. tcVISION captures this volume in real time and applies it to the target database on the target system via parallel instances. For this purpose, an instance (tcVISION process) on the target system receives the data from the mainframe and, with the LUW manager active, tags it with a so-called slot identifier and writes it to an output pipe.
Pipes are storage targets that accept internal data of a tcVISION data process and simultaneously make it available as input to subsequent tcVISION processes.
Several processing scripts can then read this pipe in parallel. Each processing script applies only certain LUWs (Logical Units of Work) to the target database: it is told which slot IDs it should process and skips all other IDs, because they are processed by other scripts.
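The slot mechanism can be pictured with the following Python sketch. It is not tcVISION code: the pipe is modelled as a plain list, the round-robin slot assignment is an assumption made for the example, and the names `tag_with_slots` and `run_script` are hypothetical.

```python
def tag_with_slots(luws, slot_count):
    """Attach a slot ID to each LUW (round-robin, an assumed strategy)."""
    return [{"slot": i % slot_count, "luw": luw} for i, luw in enumerate(luws)]


def run_script(pipe, my_slots):
    """Read the whole pipe, apply only LUWs in my_slots, skip all others."""
    applied = []
    for entry in pipe:
        if entry["slot"] not in my_slots:
            continue  # another script is responsible for this slot
        applied.append(entry["luw"])  # stands in for applying to the target
    return applied


# Hypothetical pipe content: six LUWs distributed over three slots.
pipe = tag_with_slots([f"luw-{n}" for n in range(1, 7)], slot_count=3)

script_a = run_script(pipe, my_slots={0})
script_b = run_script(pipe, my_slots={1, 2})
```

Every script sees the complete pipe but works on a disjoint subset of it, so each LUW is applied exactly once. Within one script the order is preserved; across scripts running in parallel it is not.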
However, it is important to know that this procedure can no longer guarantee the referential integrity of the data in the target database. All LUWs are imported correctly, but the chronological order in which the LUWs are applied is no longer guaranteed. In this example only new records (INSERTs) are processed.
The procedure described above can also be used for so-called BULK LOAD scripts. To load a target database (initially or periodically), the data can likewise be applied via several parallel instances. One conceivable method is processing by key range, with each key range assigned a specific slot identifier and the maintenance scripts processing the records based on that slot identifier.
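A key-range assignment of this kind might look like the following Python sketch. The range boundaries, the record layout and the helper names are assumptions chosen for illustration and are not taken from tcVISION.

```python
from bisect import bisect_right

# Hypothetical lower bounds of three key ranges:
# slot 0: 0..999, slot 1: 1000..1999, slot 2: 2000 and above.
RANGE_STARTS = [0, 1000, 2000]


def slot_for_key(key):
    """Return the slot ID of the key range that contains `key`."""
    if key < RANGE_STARTS[0]:
        raise ValueError(f"key {key} is below the first key range")
    return bisect_right(RANGE_STARTS, key) - 1


def partition(records):
    """Group records by slot so that each parallel load handles one group."""
    by_slot = {slot: [] for slot in range(len(RANGE_STARTS))}
    for rec in records:
        by_slot[slot_for_key(rec["key"])].append(rec)
    return by_slot


records = [{"key": k} for k in (5, 999, 1000, 1500, 2000, 42000)]
by_slot = partition(records)
slot_sizes = {slot: len(recs) for slot, recs in by_slot.items()}
```

Because the ranges do not overlap, each record falls into exactly one slot, and the parallel load scripts never touch the same key.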
Of course, both procedures (with and without consideration of referential integrity) can be used together. Change records without the special slot ID are skipped by the processes in which RI is not active, and processes with RI process the records in the correct chronological order.
tcVISION is an extremely flexible solution for all kinds of requirements and applications that exchange data between an IBM mainframe and systems and databases in the open world, or between systems and databases within the open world. The aim is always to guarantee data integrity (referential integrity and transaction security). However, as this blog shows, there are also requirements that call for a different approach. Here, too, tcVISION offers suitable solutions.
The integration of tcVISION with different database systems, cloud systems and big data is also dealt with elsewhere in this blog. Various videos dealing with the integration of tcVISION into these worlds are available from B.O.S. Software GmbH or on YouTube.
We will be happy to show you how you can master the real-time integration of your company data with the tcVISION solution efficiently, with low latency and without programming effort.
The use of the various tcVISION technologies for the integration of your data sources guarantees a considerable acceleration in the implementation of your projects.