Hadoop Administration Certification Training
Understanding Big Data and Hadoop
Learning Objectives:
Learn what Big Data and Apache Hadoop are
Learn how to solve Big Data problems using Hadoop
Hadoop cluster architecture
Big Data components and ecosystem
Understanding Hadoop data loading
Learn the HDFS data reading mechanism
Importance of the Hadoop cluster administrator
Topics:
• What is big data
• Introduction to and understanding of big data
• Limitations of existing solutions for big data
• What is Hadoop architecture
• What are the Hadoop components and the Hadoop ecosystem
• What is data loading
• Understanding data reading from HDFS
• Understanding replication rules
• What is rack awareness
• Hadoop cluster administrator
• Roles and responsibilities of the Hadoop cluster administrator
Hadoop Cluster Administration and Understanding MapReduce
Learning outcomes:
Learn how the secondary NameNode works
Learn the Hadoop distributed cluster
Learn to enable rack awareness
Maintenance of a Hadoop cluster
Add and remove nodes in your cluster in an ad-hoc way, to try other recommended
Learn the MapReduce programming model in the context of Hadoop administration and schedulers
Topics:
• What is the secondary NameNode
• Working with a Hadoop distributed cluster
• Commissioning and decommissioning of nodes
• Understanding MapReduce
• Understanding schedulers and enabling them
Understanding Backup, Recovery and Maintenance
Learning outcomes:
Learn the cluster administration tasks
Balancing data in a cluster
Protecting data by enabling trash
Attempting a manual failover
Creating backups within or across clusters
Safeguarding metadata
Metadata recovery and manual failover of the NameNode
Learn to restrict HDFS usage by count, volume of data, etc.
Topics:
• What are the key Hadoop admin commands
• What is trash in Hadoop
• What is Import Checkpoint
• Understanding DistCp
• Understanding data backup and data recovery
• Enabling trash
• What are the namespace count quota and the space quota
• Manual failover and metadata recovery
Hadoop 2.0 Cluster: Planning and Management
Learning outcomes:
Learn cluster planning and management
Aspects to consider when setting up a new cluster
Learn capacity sizing
Understanding recommendations for Hadoop
Comparing different distributions of Hadoop
Workload and usage patterns in Hadoop
Big Data examples
Topics:
• What is Hadoop
• What is a Hadoop cluster
• Planning a Hadoop 2.0 cluster
• What is cluster sizing
• Hardware required for the cluster
• What are the network and software considerations
• Popular Hadoop distributions
• What are workload and usage patterns
• Industry recommendations
Hadoop 2.0 and its features
Learning Objectives:
Learn more about the new features of Hadoop 2.0
Learn HDFS High Availability
What is the YARN framework
What is the MRv2 job execution flow
Understand federation
Limitations of Hadoop 1.x
Learn to set up a Hadoop 2.0 cluster
Hadoop cluster in pseudo-distributed mode
Hadoop cluster in distributed mode
Topics:
• What is Hadoop 2.0
• Limitations of the previous version, Hadoop 1.x
• Main features of Hadoop 2.0
• Importance of Hadoop 2.0
• What is the YARN framework
• Understanding MRv2
• What are Hadoop high availability and Hadoop federation
• YARN ecosystem and Hadoop 2.0 cluster setup
Setting up Hadoop 2.X with High Availability and upgrading Hadoop
Learning Objectives: In this module, you will learn to set up Hadoop 2 with high availability, upgrade from v1 to v2, import data from an RDBMS into HDFS, understand why Oozie, Hive, and HBase are used, and work with these components.
Topics:
• Configuring Hadoop 2 with high availability
• Upgrading to Hadoop 2
• Working with Sqoop
• Understanding Oozie
• Working with Hive
• Working with HBase
Project: Cloudera Manager and cluster setup, overview of Kerberos
Learning outcomes:
Learn to set up a cluster using Cloudera Manager
Optimization of Hadoop/HBase/Hive performance parameters
Learn the basics of Kerberos
Set up Pig for use in local/distributed mode
Topics:
• What is Cloudera Manager
• Cluster setup
• What is Hive
• Understanding Hive administration
• What is the HBase architecture
• Understanding HBase setup
• Performance optimization of Hadoop/Hive/HBase
• Pig setup
• Working with the Grunt shell
• What is Kerberos
• Importance of Kerberos