Change Data Capture (CDC) – the foundation of a heterogeneous IT
This blog has already covered the approaches needed to successfully operate a heterogeneous IT landscape on a transparent, shared data basis. A heterogeneous IT landscape that includes a mainframe running z/OS or z/VSE is a particular challenge. The reasons are obvious and have been described several times in this blog:
- Historically grown databases
- Non-relational database systems
- Record and data structures that are not compatible with modern databases
As a rule, the mainframe-based databases are indispensable for a company and usually also the basis for the expansion of IT in the direction of new systems and technologies.
The question arises as to how to integrate these databases into a heterogeneous IT.
The first step in expanding a homogeneous IT into a heterogeneous one is the initial load (BULK) of the new databases with the existing mainframe data. From this point onwards, the data on the mainframe and the data on the other platform(s) must be kept in sync.
Two concepts are possible here:
- Unidirectional replication: the mainframe data is constantly updated and replicated to the new platform(s) for analytics, data warehouse, reporting, Cloud or BigData.
- Bidirectional replication: both the mainframe data and the same data on the other platform(s) are updated and must be replicated to the respective partner platform(s). This bidirectional replication must ensure that all changes are detected in the source and applied to the target, but are not replicated back to the source as new changes.
For both concepts, the data on each platform is as up to date as possible, which is a basic requirement for a company's timely decision-making and market response.
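The loop-prevention requirement of bidirectional replication can be illustrated with a small sketch. It assumes that every captured change carries an origin tag and that replicated changes are applied without being captured again. The `Change` and `Platform` classes and the `replicate` function are hypothetical names chosen for this illustration; they are not tcVISION's API, and the source does not describe how tcVISION implements this internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    op: str                 # "INSERT", "UPDATE" or "DELETE"
    key: str                # record key
    value: Optional[dict]   # new record image, None for DELETE
    origin: str             # platform where the change was first made


@dataclass
class Platform:
    name: str
    rows: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)  # captured changes awaiting replication

    def local_write(self, op, key, value=None):
        """A change made by a local application: apply it and capture it."""
        self._apply(op, key, value)
        self.outbox.append(Change(op, key, value, origin=self.name))

    def apply_replicated(self, change):
        """A change arriving from the partner: apply it, but do not capture it again."""
        if change.origin == self.name:
            return  # our own change coming back: drop it
        self._apply(change.op, change.key, change.value)

    def _apply(self, op, key, value):
        if op == "DELETE":
            self.rows.pop(key, None)
        else:
            self.rows[key] = value


def replicate(source, target):
    """Ship all captured changes from source to target; return how many were sent."""
    sent = len(source.outbox)
    for change in source.outbox:
        target.apply_replicated(change)
    source.outbox.clear()
    return sent


mainframe = Platform("mainframe")
linux = Platform("linux")
mainframe.local_write("INSERT", "C001", {"name": "Smith"})
linux.local_write("INSERT", "C002", {"name": "Jones"})
replicate(mainframe, linux)
replicate(linux, mainframe)
```

After one round in each direction both platforms hold the same two records, and a further round sends nothing, because applied changes never enter the outbox of the receiving side.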
Regular loading (BULK) of the databases is far too time-consuming, error-prone and, above all, not timely.
The solution to this problem is to focus on the changed data and on how it is identified: Change Data Capture.
Change Data Capture (CDC) is the method used to carry out the data replication process.
Wikipedia describes CDC as follows:
"CDC is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources."
tcVISION is a solution with a large variety of CDC processors for different databases and platforms.
All the points listed below therefore fully apply to tcVISION.
tcVISION CDC records the changes to a data store (file or database) and replicates these changes (UPDATE, DELETE, INSERT) to one or more target systems.
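To make the idea of replicating UPDATE, DELETE and INSERT operations concrete, here is a minimal sketch of the apply side. It assumes change records arrive as plain dictionaries and that the target is a SQLite table named `customer`; the record layout, the table and the function names are all assumptions made for this example and do not reflect tcVISION's actual formats or interfaces.

```python
import sqlite3

# Columns of the hypothetical target table that a change record may set.
ALLOWED_COLUMNS = {"name", "city"}


def apply_change(conn, change):
    """Apply one captured change to the hypothetical target table `customer`."""
    op, key = change["op"], change["key"]
    data = change.get("data") or {}
    if not set(data) <= ALLOWED_COLUMNS:
        raise ValueError(f"unexpected columns: {sorted(set(data) - ALLOWED_COLUMNS)}")
    if op == "INSERT":
        conn.execute(
            "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
            (key, data.get("name"), data.get("city")),
        )
    elif op == "UPDATE":
        # Only the changed columns travel with the record.
        assignments = ", ".join(f"{col} = ?" for col in data)
        conn.execute(
            f"UPDATE customer SET {assignments} WHERE id = ?",
            (*data.values(), key),
        )
    elif op == "DELETE":
        conn.execute("DELETE FROM customer WHERE id = ?", (key,))
    else:
        raise ValueError(f"unknown operation: {op}")


def demo():
    """Replay a short change stream and return the resulting target rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
    changes = [
        {"op": "INSERT", "key": "C001", "data": {"name": "Smith", "city": "Bonn"}},
        {"op": "INSERT", "key": "C002", "data": {"name": "Jones", "city": "Ulm"}},
        {"op": "UPDATE", "key": "C001", "data": {"city": "Berlin"}},
        {"op": "DELETE", "key": "C002"},
    ]
    with conn:  # apply the whole batch in one transaction
        for change in changes:
            apply_change(conn, change)
    return conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
```

Calling `demo()` replays four changes in order, so the target ends up with a single customer whose city has been updated. Applying changes in their original sequence matters: swapping the UPDATE and the first INSERT would leave the target in a different state.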
For a mainframe environment, it is paramount that CDC is performed with low overhead and that the processing and transformation of the change data is performed on the target platform (Linux, Unix, Windows).
In the case of bidirectional replication involving a mainframe as target and CDC on a non-mainframe platform, only the import (apply) of the change data should be done on the mainframe and all other work steps (processing and transformation) on the source.
Why is CDC - versus copying the data - so important?
tcVISION's CDC offers a number of advantages:
Synchronous data processing
CDC is a real-time or near-real-time form of replication and guarantees the timeliness of data for all business processes.
Our blog on this topic:
Real time or near real time, that is often the question
Improved basis for decision-making
Productive data can be replicated in real time (or low latency) for analytical purposes. These can be environments for a Data Warehouse, Cloud systems or BigData.
The change data captured with CDC is transmitted over the network (WAN) in compressed form. Because only the changes are sent, and those in compressed form, transmission costs are significantly reduced.
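The effect of sending only compressed changes can be sketched with a quick comparison. The example below uses JSON and zlib purely as stand-ins for a wire format, and the 10,000 generated records and 25 modifications are made-up figures; none of this describes tcVISION's actual transfer format.

```python
import json
import zlib


def encode(records):
    """Serialize and compress a set of records for transmission."""
    return zlib.compress(json.dumps(records, sort_keys=True).encode("utf-8"))


def decode(payload):
    """Reverse of encode(): decompress and deserialize."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


def changed_records(old, new):
    """Return only inserted or updated records (deletes omitted for brevity)."""
    return {key: rec for key, rec in new.items() if old.get(key) != rec}


# Hypothetical data set: 10,000 customer records, 25 of which change.
before = {f"C{i:05d}": {"name": f"Customer {i}", "balance": i * 10} for i in range(10_000)}
after = {key: dict(rec) for key, rec in before.items()}
for i in range(0, 10_000, 400):
    after[f"C{i:05d}"]["balance"] += 1

full_payload = encode(after)                             # copying the entire data set
delta_payload = encode(changed_records(before, after))   # sending the changes only

print(f"full copy:   {len(full_payload)} bytes")
print(f"change data: {len(delta_payload)} bytes")
```

The exact byte counts depend on the data, but the change payload stays proportional to the number of modified records rather than to the size of the database.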
The advantages of change data capture compared to methods such as ETL (Extract, Transform, Load) or simply copying entire data sets are therefore obvious.
Another important consideration when implementing a synchronization solution is latency. How quickly must the change data be captured and processed after it has been created? So the question is: does the change data need to be captured in real time / near real time, or is log processing sufficient?
Real-Time / Near Real-Time
| Platform | Change data source | Data stores |
|---|---|---|
| z/OS | Logstreams | CICS, Shared VSAM, tcVISION Logstreams |
| z/OS | Active Logs | Db2, IMS, ADABAS, DATACOM, IDMS |
| z/VSE | tcVISION collector | VSAM, Db2, DLI |
| z/VSE | Active Logs | ADABAS, DATACOM, IDMS |
| Windows, Unix, Linux | Active Logs | Db2, MS SQL-Server, Oracle, MySQL/MariaDB, ADABAS, PostgreSQL and more |

Log processing
| Platform | Change data source | Data stores |
|---|---|---|
| z/OS | Archive Logs* | Db2, IMS, ADABAS, DATACOM, IDMS |
| z/VSE | Archive Logs* | ADABAS, DATACOM, IDMS |
| Windows, Unix, Linux | Archive Logs* | Db2, MS SQL-Server, Oracle, MySQL/MariaDB, ADABAS, PostgreSQL and more |
*Archive logs can either be processed on the mainframe or on a Windows, Unix, Linux platform.
tcVISION is an extremely flexible, cross-system solution for real-time, bidirectional data synchronization and replication based on change data:
- Data exchange becomes a single-step operation.
- The use of middleware or message queuing is not necessary.
- The data exchange takes place in raw format in compressed form and is reduced to the delta of change data.
- Data can be moved unidirectionally or bidirectionally in real-time, time-controlled or event-controlled.