Data Loading Tool - Operating Systems | Ho Chi Minh City University of Natural Resources and Environment

Big Data
Data Loading Tools
Trong-Hop Do
S3Lab
Smart Software System Laboratory
“Without big data, you are blind and deaf
and in the middle of a freeway.”
Geoffrey Moore
Hadoop Ecosystem
Apache Flume Tutorial
Introduction to Apache Flume
Apache Flume is a tool for ingesting data into HDFS. It collects,
aggregates, and transports large amounts of streaming data,
such as log files and events, from various sources like network
traffic, social media, and email messages to HDFS. Flume is
highly reliable and distributed.
The main idea behind Flume's design is to capture
streaming data from various web servers and deliver it to HDFS.
It has a simple and flexible architecture based on streaming data
flows. It is fault-tolerant and provides reliability mechanisms for
failure recovery.
Data transfer components
Flume - How it works
Data ows like
Agent tier -> Collector tier -> Storage tier
Agent nodes are typically installed on the machines that generate
the logs and are data’s initial point of contact with Flume. They
forward data to the next tier of collector nodes, which aggregate
the separate data ows and forward them to the nal storage tier.
Flume - Agent architecture
Sources:
HTTP, Syslog, JMS, Kafka, Avro, Twitter (streaming API for downloading tweets), …
Sinks:
HDFS, Hive, HBase, Kafka, Solr, …
Channels:
File, JDBC, Kafka, …
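Inside one agent, a source writes events into a channel and a sink reads them back out. The sketch below imitates that hand-off with a bounded in-memory queue. It is a toy model, not the Flume API: the `Channel` class, the `source` and `sink` functions, and the event format are hypothetical names chosen for the example.

```python
import queue


class Channel:
    """Bounded buffer between a source and a sink (stand-in for a Flume channel)."""

    def __init__(self, capacity=100):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        self._events.put_nowait(event)  # raises queue.Full if the buffer is full

    def take(self):
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None  # nothing buffered right now


def source(lines, channel):
    """Source: turns each incoming line into an event and hands it to the channel."""
    for line in lines:
        channel.put({"body": line.rstrip("\n")})


def sink(channel, destination):
    """Sink: drains the channel and delivers each event body to the destination."""
    while (event := channel.take()) is not None:
        destination.append(event["body"])


channel = Channel(capacity=10)
delivered = []
source(["first log line\n", "second log line\n"], channel)
sink(channel, delivered)
print(delivered)
```

The channel decouples the two sides: the source can keep accepting data while the sink is slow, up to the channel's capacity.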
Start Flume on Cloudera Quickstart VM
To add Flume to the Cloudera Quickstart VM, you need to launch Cloudera Manager.
Configure the VM:
Allocate a minimum of 10023 MB memory.
Allocate 2 CPUs.
Allocate 20 MB video memory.
Consider setting the clipboard to bidirectional.
Launch Cloudera Express
Check the status of the NameNode service
Command: sudo service hadoop-hdfs-namenode status
If the NameNode is not running, start the NameNode service
Command: sudo service hadoop-hdfs-namenode start
Check the status of the DataNode service
Command: sudo service hadoop-hdfs-datanode status
If the DataNode is not running, start the DataNode service
Command: sudo service hadoop-hdfs-datanode start
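The check-then-start steps above could be scripted roughly as follows. This sketch assumes that `service <name> status` exits with a non-zero code when the service is not running, which is the usual init-script convention but is not confirmed by the slides. The helper names `run_service` and `ensure_running` are made up for the example.

```python
import subprocess

# Service names taken from the commands above.
SERVICES = ("hadoop-hdfs-namenode", "hadoop-hdfs-datanode")


def run_service(name, action):
    """Run `sudo service <name> <action>` and return its exit code."""
    return subprocess.run(["sudo", "service", name, action]).returncode


def ensure_running(services=SERVICES, run=run_service):
    """Start every service whose status check fails; return the names started.

    Assumption: a non-zero exit code from `status` means "not running".
    """
    started = []
    for name in services:
        if run(name, "status") != 0:
            run(name, "start")
            started.append(name)
    return started
```

On the VM, calling `ensure_running()` would run the real commands. The `run` parameter exists so the logic can be tried with a stand-in function on a machine without Hadoop.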
Open Cloudera Manager in web browser
Username: cloudera
Password: cloudera
After logging in to Cloudera Manager, click Add Service
Select Flume
Start Hue
Start Flume
Check the conguration of Flume
Check the port (9999 in this VM)
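The slides do not show the configuration text itself. As a rough guide to what to look for, the sketch below renders a minimal agent definition with a netcat source, a memory channel, and a logger sink, then reads the port back out of it. The agent and component names (`tier1`, `source1`, `channel1`, `sink1`) and the choice of components are assumptions; the configuration shown in Cloudera Manager may differ.

```python
def render_flume_config(agent="tier1", port=9999):
    """Build a minimal Flume agent definition as properties text.

    Assumed layout: netcat source -> memory channel -> logger sink.
    """
    lines = [
        f"{agent}.sources = source1",
        f"{agent}.channels = channel1",
        f"{agent}.sinks = sink1",
        f"{agent}.sources.source1.type = netcat",
        f"{agent}.sources.source1.bind = 127.0.0.1",
        f"{agent}.sources.source1.port = {port}",
        f"{agent}.sources.source1.channels = channel1",
        f"{agent}.channels.channel1.type = memory",
        f"{agent}.sinks.sink1.type = logger",
        f"{agent}.sinks.sink1.channel = channel1",
    ]
    return "\n".join(lines) + "\n"


def source_port(config_text, agent="tier1", source="source1"):
    """Return the port configured for the given source, or None if absent."""
    key = f"{agent}.sources.{source}.port"
    for line in config_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return int(value.strip())
    return None


config = render_flume_config()
print(config)
print("source port:", source_port(config))
```

The line ending in `.port` is the one that determines which port the Telnet test has to connect to.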
Use Telnet to test the default Flume implementation
First, install telnet
Command: sudo yum install telnet
Launch Telnet with the command: telnet localhost 9999
At the prompt, enter Hello world ^.^
Press Ctrl+] to escape
Type quit to close telnet
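The same test can be run without Telnet by opening a plain TCP connection. This is a minimal sketch, assuming the Flume source listening on port 9999 accepts newline-terminated text lines, as a netcat-style source does. The function name `send_event` and its defaults are chosen for the example.

```python
import socket


def send_event(line, host="localhost", port=9999, timeout=5.0):
    """Send one newline-terminated line and return the server's reply, if any."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8") + b"\n")
        try:
            return conn.recv(1024).decode("utf-8").strip()
        except socket.timeout:
            return ""  # the server accepted the line but sent nothing back
```

With Flume running on the VM, `send_event("Hello world ^.^")` plays the role of typing the line at the Telnet prompt. Leaving the `with` block closes the connection, so there is no escape sequence to type.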