Introduction to Hadoop and Big Data:
• What are the challenges for processing big data?
• What technologies support big data?
• What is Hadoop?
• Why Hadoop?
• History of Hadoop
• Use cases of Hadoop
• RDBMS vs Hadoop
• When to use and when not to use Hadoop
• Ecosystem tour
• Vendor comparison
• Hardware Recommendations & Statistics
HDFS: Hadoop Distributed File System:
• Significance of HDFS in Hadoop
• Features of HDFS
• 5 daemons of Hadoop
- Name Node and its functionality
- Data Node and its functionality
- Secondary Name Node and its functionality
- Job Tracker and its functionality
- Task Tracker and its functionality
• Data Storage in HDFS
- Introduction about Blocks
- Data replication
• Accessing HDFS
- CLI (Command Line Interface) and admin commands
- Java Based Approach
• Fault tolerance
• Download Hadoop
• Installation and set-up of Hadoop
- Start-up & Shut down process
• HDFS Federation
Map Reduce:
• Map Reduce Story
• Map Reduce Architecture
• How Map Reduce works
• Developing Map Reduce
• Map Reduce Programming Model
- Different phases of the Map Reduce algorithm
- Different data types in Map Reduce
- How to write a basic Map Reduce program
- Driver Code
- Mapper
- Reducer
• Creating Input and Output Formats in Map Reduce Jobs
- Text Input Format
- Key Value Input Format
- Sequence File Input Format
- Data localization in Map Reduce
- Combiner (Mini Reducer) and Partitioner
- Hadoop I/O
- Distributed cache
PIG:
• Introduction to Apache Pig
• Map Reduce vs. Apache Pig
• SQL vs. Apache Pig
• Different data types in Pig
• Modes of Execution in Pig
• Grunt shell
• Loading data
• Exploring Pig
• Pig Latin commands
HIVE:
• Hive introduction
• Hive architecture
• Hive vs RDBMS
• HiveQL and the shell
• Managing tables (external vs managed)
• Data types and schemas
• Partitions and buckets
HBASE:
• Architecture and schema design
• HBase vs. RDBMS
• HMaster and Region Servers
• Column Families and Regions
• Write pipeline
• Read pipeline
• HBase commands
FLUME:
SQOOP: