by Peter M. Horbach

Apache Kafka – Technology for Today and Tomorrow

The age of data processing began when IBM introduced the mainframe in 1964. Data was captured, stored, analyzed, and became invaluable for companies. The mainframe was the central data space.

With the launch of PCs (Windows, OS/2) and workstation systems (Unix) and their integration into enterprise IT in the 1980s, IT became heterogeneous. Different technologies could communicate and exchange data with each other.

Data volumes and requirements are increasing rapidly. With the Internet of Things (IoT), more and more information has to be stored and evaluated.

The age of Cloud solutions and Big Data has arrived. Data is no longer simply transferred from A to B; it is sent as messages or streamed continuously.

The open source solution Apache Kafka has established itself as a solution for this kind of data exchange.

The ease of use, flexibility, and scalability in modern cluster-based systems make Apache Kafka increasingly popular.

According to Wikipedia, Kafka “aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a massively scalable pub/sub message queue designed as a distributed transaction log.”

Originally, Apache Kafka was developed by LinkedIn. Since 2012, the system has been part of the Apache Software Foundation. In 2014, Confluent was founded.

BOS Software is a “Verified Partner” of Confluent (see the blog: B.O.S. Software and Confluent – the Merging of Mainframe, Open System, and BigData).

Apache Kafka has become one of the most important platforms for highly scalable systems and processing large volumes of data, and is very popular in modern IT systems. The trend of using Kafka together with analytics and data hub projects is growing.

Here is a short overview of the Apache Kafka architecture:

The core of the system is a cluster (computer network) consisting of brokers. These brokers store messages in so-called topics that can be divided into partitions. The partitions store the messages in the order in which they are received.
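As a minimal sketch of how topics and partitions are set up, a topic with a chosen number of partitions can be created with the Java AdminClient. The broker address and the topic name "orders" below are placeholders for illustration, not part of any particular installation.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address; replace with your cluster's bootstrap servers
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic "orders" with 3 partitions and replication factor 1
            NewTopic topic = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```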

Applications that write messages to Apache Kafka are called producers. Their counterparts are consumers: applications that read and process these messages.
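A producer can be as small as the following Java sketch, which writes a single message to the hypothetical "orders" topic; the broker address, key, and value are assumed values.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send one message to the hypothetical "orders" topic
            producer.send(new ProducerRecord<>("orders", "order-4711", "created"));
            producer.flush();
        }
    }
}
```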

The data streams can be processed with Kafka Streams, a Java library, and the results can be written back to Kafka. Other streaming frameworks besides Kafka Streams are also supported.
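As an illustration of this pattern, the following Kafka Streams sketch reads from the hypothetical "orders" topic, transforms each value, and writes the result back to Kafka. The application id, topic names, and broker address are assumptions made for this example.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class SimpleStreamApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-uppercase");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from "orders", transform each value, and write the result to another topic
        KStream<String, String> orders = builder.stream("orders");
        orders.mapValues(value -> value.toUpperCase())
              .to("orders-uppercase");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```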

Kafka offers four main interfaces:

  • Producer API: Writes messages
  • Consumer API: Reads messages (a minimal consumer sketch follows this list)
  • Streams API: Analyses and transforms messages
  • Connect API: Synchronizes two data systems, for example a relational database and Hadoop HDFS
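The Consumer API side might look like the following sketch, which polls the hypothetical "orders" topic in a loop until the process is stopped; the consumer group, topic name, and broker address are again placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "orders-readers");            // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                // Poll for new messages and print where they came from
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```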

With the solutions tcVISION and tcACCESS, BOS Software has substantially contributed to the integration of the mainframe into these new heterogeneous environments.

Apache Kafka support was integrated as early as May 2017. By now, various customers run tcVISION with Apache Kafka and Hadoop.

The timeliness of the data is crucial for the success of all these projects. Low latency when importing data from various sources is an important prerequisite for real-time processing, data warehousing, and BI analysis.

The tcVISION solution captures changed data from various data sources directly in mainframe and RDBMS environments.

More about the integration of tcVISION into Cloud systems and Big Data can be found in other blog entries.

We are happy to assist you in mastering the real-time integration of your business data in your data hub with tcVISION efficiently – with low latency and without programming effort.

The tcVISION Change Data Capture technologies for the integration of your data sources significantly accelerate the implementation of your project.

Peter M. Horbach has more than 40 years of IT experience and has long been active in the area of data synchronization and replication. He manages the international partner business for BOS Software and writes for our blog.
