Highsky IT Solutions Pvt Ltd

Course Duration: 45 Hours

1. Big Data Overview
  1. What is big data
  2. Discussion of Databases
  3. Databases vs. Hadoop
  4. Problems with Large-Scale Data
2. Overview of Apache Hadoop
  1. Why Hadoop
  2. Apache Hadoop Architecture
  3. Apache Hadoop workflow
  4. Apache Hadoop Components
3. Installation of Apache Hadoop
  1. Basic Prerequisites for Hadoop
  2. Apache Hadoop Standalone Installation
  3. Apache Hadoop Management
4. Hadoop Distributed File System (HDFS)
  1. Understanding HDFS Architecture
  2. Understanding HDFS Management and Core Components
  3. HDFS Snapshot and Management
  4. Understanding FSImage and Edit Log Management
5. Hadoop Component Management and Understanding
  1. Understanding the Apache Hadoop NameNode
  2. Understanding the Apache Hadoop DataNode
  3. Understanding the Apache Hadoop Secondary NameNode
  4. Apache Hadoop Backup & Management
6. Hadoop Cluster Installation and Management
  1. Hadoop Multi Node Cluster setup
  2. Hadoop Cluster Management
  3. Including and Excluding DataNodes in a Cluster
7. Hadoop YARN Architecture & Management
  1. YARN Service Introduction & Architecture
  2. YARN ResourceManager & NodeManager
  3. YARN Scheduling and Management
8. Apache Hadoop Administration
  1. Managing Hadoop via the CLI
  2. Managing Hadoop via the Web UI
  3. Managing Failover & Nodes
9. Apache Hive Management
  1. Hive Architecture and Hadoop
  2. Hive Installation and management
  3. Data manipulation with Hive
10. Apache Pig Management
  1. Pig Architecture
  2. Pig Installation and management
  3. Pig Latin script for Data Manipulation
11. Ecosystem Tools on Hadoop
  1. Configuration and Management of Sqoop
  2. Application Management using Flume
  3. Configuration and Management of HBase
12. High Availability over Hadoop
  1. Deploying the NameNode in HA using ZooKeeper
  2. Active and Standby NameNodes
  1. Cloudera Manager
  2. Apache Storm
  3. Ambari