Data Ingestion With Flume & Kafka

Data ingestion is an art and a science in itself. Ingesting data effectively into a Hadoop cluster or any other data store requires a good understanding of both the source and the sink, along with the ability to configure data pipelines between them. Ingestion becomes a complex task when events are sourced from multiple systems in parallel and must be delivered to various destinations in real time. High-speed data ingestion is especially critical when implementing real-time analytics.

Training Goals

To provide a thorough understanding of Flume configuration and Kafka. Participants will be able to implement practical data flows in their own projects.

Pre-requisite: Some programming background, preferably Java.

Contents

  • Introduction to multiplexed data flows, fan-out flows, and aggregators (a sample fan-out configuration follows this list).

  • Implementing custom de-serialisers and interceptors (see the interceptor sketch after this list).

  • Advanced Flume configuration.

  • Kafka architecture – the publish/subscribe model.

  • Implementing custom publishers (see the producer sketch after this list).

  • Kafka consumers – HDFS consumer, HBase consumer, Cassandra consumer, and many others (see the consumer sketch after this list).
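
To give a flavour of the fan-out topic above, here is a minimal sketch of a Flume agent configuration that replicates each event from one source into two channels, each drained by its own sink. The agent name (a1), component names, port, and HDFS path are illustrative placeholders, not course material.

    # Hypothetical agent "a1": one source fanned out to two channels.
    a1.sources = r1
    a1.channels = c1 c2
    a1.sinks = k1 k2

    # A replicating selector copies every event to both channels (fan-out).
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sources.r1.selector.type = replicating
    a1.sources.r1.channels = c1 c2

    # In-memory channels buffer events between the source and the sinks.
    a1.channels.c1.type = memory
    a1.channels.c2.type = memory

    # One copy of each event lands in HDFS; the other is logged for inspection.
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events
    a1.sinks.k2.type = logger
    a1.sinks.k2.channel = c2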
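
The custom-interceptor topic can be sketched in a few lines of Java against the org.apache.flume.interceptor.Interceptor contract. The class name and the header it adds are invented for illustration; the idea is that a downstream multiplexing channel selector could route events on the header.

    import java.util.List;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    /** Hypothetical interceptor that stamps each event with an "environment" header. */
    public class EnvironmentInterceptor implements Interceptor {

        private final String environment;

        private EnvironmentInterceptor(String environment) {
            this.environment = environment;
        }

        @Override
        public void initialize() {
            // No resources to set up in this simple example.
        }

        @Override
        public Event intercept(Event event) {
            // Headers travel with the event; a multiplexing selector can route on them.
            event.getHeaders().put("environment", environment);
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> events) {
            for (Event event : events) {
                intercept(event);
            }
            return events;
        }

        @Override
        public void close() {
            // Nothing to release.
        }

        /** Flume instantiates interceptors through a Builder configured from the agent file. */
        public static class Builder implements Interceptor.Builder {

            private String environment;

            @Override
            public void configure(Context context) {
                // Reads "environment" from the interceptor's configuration; defaults to "dev".
                environment = context.getString("environment", "dev");
            }

            @Override
            public Interceptor build() {
                return new EnvironmentInterceptor(environment);
            }
        }
    }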
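
Custom publishers are written against Kafka's producer API. Below is a minimal sketch using the Kafka Java client; the broker address, topic name, key, and value are placeholders.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimplePublisher {

        public static void main(String[] args) {
            // Broker address and serializers; both key and value are plain strings here.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // try-with-resources flushes and closes the producer on exit.
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Each record names a topic and carries an optional key plus a value.
                producer.send(new ProducerRecord<>("events", "key-1", "hello, kafka"));
            }
        }
    }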
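
On the consuming side, sink-specific consumers such as the HDFS, HBase, and Cassandra consumers typically share the same poll loop. Here is a stripped-down sketch using the recent Kafka Java client that prints records to the console instead of writing to a store; the group id, topic, and broker address are again placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ConsoleConsumer {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "demo-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                while (true) {
                    // Poll returns whatever records arrived since the last call.
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // A real HDFS/HBase/Cassandra consumer would write the record here.
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }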

Intended Audience

ETL developers, Java developers, analytics professionals, and Hadoop developers.

Methodology

The program is designed to provide an overview of Flume and Kafka. Key concepts in each area are explained and working code is provided. Participants will be able to run the examples and are expected to understand the code on their own with some pointers. A detailed code walk-through is not provided. Code is written in Java.