By Peter M. Horbach

Change Data Capture (CDC) – the foundation of a heterogeneous IT

In this blog we have already dealt with the different approaches needed to successfully operate a heterogeneous IT with a transparent, shared database. A heterogeneous IT that includes a mainframe running the z/OS or z/VSE operating system is a particular challenge. The reasons are obvious and have already been described several times in this blog:

Historically grown databases
Non-relational database systems
Record and data structures that are not compatible with modern databases

As a rule, the mainframe-based databases are indispensable for a company and usually also the basis for the expansion of IT in the direction of new systems and technologies.

The question arises as to how to integrate these databases into a heterogeneous IT.

The first step in expanding a homogeneous IT into a heterogeneous one is the bulk load of the new databases with the existing mainframe data. From this point onwards, the data on the mainframe and the data on the other platform(s) must be kept in sync.

Two concepts are common here:

Master-Slave

The mainframe data is constantly updated and replicated on the new platform(s) for analytics, data warehouse, reporting, Cloud or BigData.

Master-Master

Both the mainframe data and the same data from the other platform(s) are updated and must be replicated to the respective partner platform(s). This bidirectional replication must ensure that all changes are detected in the source and applied to the target, but are no longer replicated back to the source as changes.
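The requirement that applied changes are not replicated back can be illustrated with a minimal Python sketch. This is not tcVISION's implementation; the `Change` and `Platform` classes, the `origin` tag and the `replicate` function are hypothetical names chosen for illustration. Each change carries the name of the platform where it was first made, and the capture step forwards only changes that originated locally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    op: str                # "INSERT", "UPDATE" or "DELETE"
    key: str
    value: Optional[str]
    origin: str            # platform where the change was first made (assumed tag)


@dataclass
class Platform:
    name: str
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # every applied change is logged

    def write(self, change: Change) -> None:
        """Apply a change to the local data and record it in the log."""
        if change.op == "DELETE":
            self.rows.pop(change.key, None)
        else:
            self.rows[change.key] = change.value
        self.log.append(change)

    def capture(self) -> list:
        """Return locally originated changes and clear the log.

        Changes that arrived via replication are skipped, so they are
        never sent back to the platform they came from.
        """
        captured = [c for c in self.log if c.origin == self.name]
        self.log.clear()
        return captured


def replicate(source: Platform, target: Platform) -> int:
    """Send captured changes from source to target; return how many were sent."""
    sent = 0
    for change in source.capture():
        target.write(change)
        sent += 1
    return sent
```

After one change on each side and one replication pass in each direction, both platforms hold the same rows, and a further pass finds nothing left to send.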

With both concepts, the data on each platform is as up to date as possible. This is a basic requirement for timely decision-making and for a company's ability to react to the market.

Regular bulk loading of the databases is far too time-consuming, error-prone and, above all, not timely.

The solution to this problem is to focus on the change data and on how it is determined: Change Data Capture.

Change Data Capture (CDC) is the method to carry out the data replication process.

Wikipedia describes CDC as follows:
"CDC is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources."

tcVISION is a solution with a large variety of CDC processors for different databases and platforms.

All the points listed below therefore fully apply to tcVISION.

tcVISION CDC records the changes to a data store (file or database) and replicates these changes (UPDATE, DELETE, INSERT) to one or more target systems.
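As a rough illustration of this principle, the following Python sketch applies a stream of captured changes to several targets. It is a simplified model, not tcVISION code: the change-record layout (`op`, `key`, `data`) and the function names are assumptions made for this example, and each target is represented by a plain dictionary.

```python
def apply_change(target: dict, change: dict) -> None:
    """Apply one captured change to a target store (modeled as a dict)."""
    op, key = change["op"], change["key"]
    if op in ("INSERT", "UPDATE"):
        target[key] = change["data"]   # write the new row image
    elif op == "DELETE":
        target.pop(key, None)          # ignore rows that are already gone
    else:
        raise ValueError(f"unknown operation: {op}")


def replicate(changes: list, targets: list) -> None:
    """Apply every change, in capture order, to every target."""
    for change in changes:
        for target in targets:
            apply_change(target, change)
```

Applying the changes in the order in which they were captured matters: an UPDATE followed by a DELETE of the same key must not be reversed on the target.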

For a mainframe environment, it is paramount that CDC is performed with low overhead and that the processing and transformation of the change data is performed on the target platform (Linux, Unix, Windows).

In the case of bidirectional replication in which the mainframe is the target and CDC runs on a non-mainframe platform, only the import (apply) of the change data should be done on the mainframe. All other work steps (processing and transformation) should be performed on the non-mainframe source platform.

Why is CDC - versus copying the data - so important?

tcVISION's CDC offers a number of advantages:

Synchronous data processing

CDC is a real-time or near-real-time form of replication and ensures the timeliness of data for all business processes.
Our blog on this topic:
Real time or near real time, that is often the question

Improved basis for decision-making

Production data can be replicated in real time (or with low latency) for analytical purposes. Targets can be environments for a Data Warehouse, Cloud systems or BigData.

Cost reduction

The data determined with CDC is transmitted over the network (WAN) in compressed form. Costs are significantly reduced because only the changes, rather than entire data sets, are transmitted.
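The effect on transfer volume can be sketched in a few lines of Python. The table contents, the change records and the use of JSON with zlib are all assumptions made for this illustration; they say nothing about the wire format or compression that tcVISION actually uses.

```python
import json
import zlib


def compressed_size(records: list) -> int:
    """Size in bytes of the records after JSON encoding and zlib compression."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return len(zlib.compress(payload))


# Hypothetical table of 10,000 customer rows (made-up data).
table = [{"id": i, "name": f"customer-{i}", "balance": i * 10} for i in range(10_000)]

# Three changes captured since the last synchronization (also made up).
changes = [
    {"op": "UPDATE", "key": 17, "data": {"id": 17, "name": "customer-17", "balance": 999}},
    {"op": "INSERT", "key": 10_000, "data": {"id": 10_000, "name": "customer-10000", "balance": 0}},
    {"op": "DELETE", "key": 4_711},
]

full_bytes = compressed_size(table)      # what a bulk copy would send
delta_bytes = compressed_size(changes)   # what a change-only transfer would send
print(f"full copy: {full_bytes} bytes, changes only: {delta_bytes} bytes")
```

The exact numbers depend on the data, but the change-only payload grows with the number of changes, not with the size of the table.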

The advantages of change data capture compared to methods such as ETL (Extract, Transform, Load) or simply copying entire data sets are therefore obvious.

Another important consideration when implementing a synchronization solution is latency. How quickly must the change data be determined and processed after it has been created? The question is whether the change data needs to be determined in real time or near real time, or whether determining it later through log processing is sufficient.

Real-Time / Near Real-Time

Operating system	Method	Source
z/OS	Logstreams	CICS, Shared VSAM, tcVISION Logstreams
z/OS	Active Logs	Db2, IMS, ADABAS, DATACOM, IDMS
z/VSE	tcVISION collector	VSAM, Db2, DLI
z/VSE	Active Logs	ADABAS, DATACOM, IDMS
Windows, UNIX, Linux	Active Logs	Db2, MS SQL-Server, Oracle, MySQL/MariaDB, ADABAS, PostgreSQL and more

Log processing

Operating system	Method	Source
z/OS	Archive Logs*	Db2, IMS, ADABAS, DATACOM, IDMS
z/VSE	Archive Logs*	ADABAS, DATACOM, IDMS
Windows, UNIX, Linux	Archive Logs*	Db2, MS SQL-Server, Oracle, MySQL/MariaDB, ADABAS, PostgreSQL and more

*Archive logs can be processed either on the mainframe or on a Windows, UNIX or Linux platform.

tcVISION is an extremely flexible, cross-system solution for real-time, bidirectional data synchronization and replication based on change data:

Data exchange becomes a single-step operation.
The use of middleware or message queuing is not necessary.
The data is exchanged in compressed raw format and is reduced to the delta of the change data.
Data can be moved unidirectionally or bidirectionally in real-time, time-controlled or event-controlled.

If you want to find out more, get in touch with us or subscribe to our newsletter.

Peter M. Horbach has more than 40 years of IT experience and works in the area of data synchronization and replication. He manages the international partner business for BOS Software and writes for our blog.
