Hadoop Jump Start
Highlights of Hadoop Jump Start Training
- 32 hours of high-quality instructor-led training (16 hours of theory and 16 hours of lab work)
- Instructor with 25+ years of experience
- 100+ Hadoop workshops completed
- 2000+ professionals trained
- 20+ hands-on exercises, culled from real-world examples
- Set up your own cluster as well as work on a production-grade 7-node cluster
- Ideal groundwork for professionals aiming for Hadoop certifications
- Coverage of 50 top interview questions and one practical exam
Who Should Attend
- Senior Managers, Managers and Team Leaders responsible for Big Data analytics
- Professionals seriously considering Data Analytics as a career option
- Project Managers who would like to bid for Data Analytics projects and lead the delivery teams
- IT administrators likely to be tasked with Hadoop installation and maintenance
- Marketing and Sales professionals who would like to understand what Big Data, Data Analytics and Machine Learning are all about – and package such offerings
Exercises
- Setting up a multi-node Apache Hadoop cluster from scratch
- Performing file I/O using HDFS (see the sketch after this list)
- Implementing an end-to-end data pipeline with Hive
- Creating User Defined Functions in Hive
- Working with HBase Shell and loading data from Hive
- Ingesting sensor data and log files using Flume
- Importing and exporting data from various RDBMSs using Sqoop
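As a flavour of the HDFS file I/O exercise above, here is a minimal sketch using the Hadoop Java API. The NameNode address and file path are placeholders; the lab itself may equally use the `hdfs dfs` shell commands.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS normally comes from core-site.xml;
        // "hdfs://namenode:8020" is a placeholder address.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Write a small file to HDFS (overwrite if it exists).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back and copy the bytes to stdout.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}
```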
Training Benefits
BENEFITS FOR INDIVIDUALS
- Get a good grip on Big Data and Data Analytics
- Gain the ability to identify and classify data analysis problems – and identify solutions
- Get a hands-on introduction to Hadoop software and its applications
- Apply the acquired concepts to practical situations – such as Business Data Analysis – to support decision making
- Get a practical introduction to Hadoop ecosystem components
BENEFITS FOR ORGANIZATIONS
- Build a team that can harness the availability of data for business growth
- Offer data-oriented products and services
- Create experts who can spread their learning further within the organisation
Curriculum
Introduction to Big Data Analytics
- What is Big Data? – The 3V Paradigm
- Limitations of Conventional Technologies
- Essentials of Distributed Computing
- Introduction to Hadoop & Its Ecosystem
HDFS Architecture
- Hadoop Distributed File System (HDFS)
- Anatomy of a File Read / Write
- Fault Tolerance in HDFS
Setting Up a Hadoop Cluster
- Exercise: Installing & Configuring a Hadoop 2.7.1 Cluster
- Exercise: Working with HDFS through the Shell & Web UI
Map / Reduce
- Map/Reduce Concepts
- Map/Reduce Job Execution Lifecycle
- Exercise: Running a Map/Reduce Job – Word Count (see the sketch below)
- Map/Reduce API Overview
- Exercise: Using Eclipse to Build Map/Reduce Applications
- Exercise: Deploying Map/Reduce Jobs on the Cluster
- Advanced Map/Reduce Examples
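The Word Count lab follows the classic example from the Apache Hadoop MapReduce tutorial; a condensed version is sketched below for orientation. Input and output paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```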
Pig
- Pig Introduction & Basic Concepts
- Pig Latin Language Overview
- Exercise: Analyzing Stock Market Data using Pig Latin
- Exercise: Working with Complex Data Types
Hive
- Hive Basics & Architecture
- Hive Query Language
- Exercise: Working with Hive
- Exercise: Analyzing Weather Data using HiveQL
Advanced Hive
- Hive Formats & SerDes
- Exercise: Working with ORC, XML & RegEx SerDes
- Exercise: Optimizing Hive Queries using Partitions & Clusters
- Overview of Hive Functions
- Exercise: Creating User Defined Functions (UDFs) – see the sketch below
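As a preview of the UDF exercise, here is a minimal sketch using the classic org.apache.hadoop.hive.ql.exec.UDF base class (the simple API of Hive releases contemporary with Hadoop 2.7); the function name and behaviour are illustrative only.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial Hive UDF that lower-cases a string column.
public final class Lower extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // NULL in, NULL out, as Hive expects
        }
        return new Text(input.toString().toLowerCase());
    }
}
```

Once packaged into a JAR, it would be registered in a Hive session along the lines of `ADD JAR /path/to/udf.jar;` followed by `CREATE TEMPORARY FUNCTION my_lower AS 'Lower';`.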
HBase Data Model & Architecture
- Introduction to HBase & NoSQL Databases
- Need for Low Latency Queries
- Exercise: Working with the HBase Shell (a Java-client equivalent is sketched below)
- Role of ZooKeeper
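The lab uses the HBase shell, but the same data model (row key, column family, qualifier) is visible through the Java client API. A minimal sketch, assuming a hypothetical sensor_readings table with a column family named d:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("sensor_readings"))) {

            // Write one cell: row key -> column family "d", qualifier "temp".
            Put put = new Put(Bytes.toBytes("sensor-42#2016-01-01"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("temp"),
                          Bytes.toBytes("21.5"));
            table.put(put);

            // Read the cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("sensor-42#2016-01-01")));
            byte[] temp = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("temp"));
            System.out.println("temp = " + Bytes.toString(temp));
        }
    }
}
```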
Sqoop
- Using Sqoop to Extract Data from MySQL
- Exercise: Loading Data into HDFS, Hive & HBase in Various Formats
Flume Architecture & Data Model
- Configuring Flume Agents to Build Custom Data Flows
- Exercise: Ingesting Sensor Data into HDFS
- Exercise: Aggregating Weblogs into HDFS
Building an End-to-End Hadoop Application
- Exercise: Reading and Writing to HBase
- Exercise: Hive-HBase Integration
- Exercise: Displaying the Results on a Dashboard
- Exercise: Running HQL Queries through a JDBC Client (see the sketch below)
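The JDBC exercise connects a standard Java JDBC client to HiveServer2. A minimal sketch, assuming a hypothetical weather table and placeholder connection details:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcClient {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (optional with JDBC 4 auto-loading).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, database and credentials are placeholders.
        String url = "jdbc:hive2://hiveserver:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT station, avg(temperature) FROM weather GROUP BY station")) {
            // Print one row per weather station with its average temperature.
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```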