Data Loading Tool - Operating Systems | Trường Đại Học Tài Nguyên và Môi Trường TP HCM

Course: Operating Systems (UIT1)

University: Đại học Tài nguyên và Môi trường Thành phố Hồ Chí Minh

Author: Mai Nguyệt

1 year ago

Big Data

Data Loading Tools

Trong-Hop Do

S3Lab

Smart Software System Laboratory

“Without big data, you are blind and deaf and in the middle of a freeway.” – Geoffrey Moore

Hadoop Ecosystem

Apache Flume Tutorial

Introduction to Apache Flume

● Apache Flume is a tool for ingesting data into HDFS. It collects, aggregates, and transports large amounts of streaming data, such as log files and events from sources like network traffic, social media, and email messages, to HDFS. Flume is highly reliable and distributed.

● The main idea behind Flume’s design is to capture streaming data from various web servers and deliver it to HDFS. It has a simple and flexible architecture based on streaming data flows. It is fault-tolerant, with reliability mechanisms for failure recovery.

Flume - How it works

Data transfer components

● Data ows like

● Agent tier -> Collector tier -> Storage tier

● Agent nodes are typically installed on the machines that generate

the logs and are data’s initial point of contact with Flume. They

forward data to the next tier of collector nodes, which aggregate

the separate data ows and forward them to the nal storage tier.

Flume - Agent architecture

● Sources:

○ HTTP, Syslog, JMS, Kafka, Avro, Twitter (streaming API for downloading tweets), …

● Sinks:

○ HDFS, Hive, HBase, Kafka, Solr, …

● Channels:

○ File, JDBC, Kafka, ...
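To make the roles of the three agent components concrete, here is a minimal Python sketch in which a source puts events into a channel and a sink drains them to a destination. It is a toy model under stated assumptions: `MemoryChannel`, `run_source`, and `run_sink` are hypothetical names, and the in-memory queue stands in for whichever channel type (File, JDBC, Kafka, ...) an actual agent would use.

```python
import queue


class MemoryChannel:
    """Buffer that decouples the source from the sink (hypothetical stand-in)."""

    def __init__(self, capacity=100):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event):
        # Raises queue.Full if the channel is at capacity.
        self._events.put_nowait(event)

    def take(self):
        # Returns the next event, or None when the channel is empty.
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def run_source(lines, channel):
    """Source: wraps each incoming line as an event and hands it to the channel."""
    for line in lines:
        channel.put({"body": line})


def run_sink(channel, destination):
    """Sink: drains the channel and writes event bodies to the destination."""
    delivered = 0
    while (event := channel.take()) is not None:
        destination.append(event["body"])
        delivered += 1
    return delivered


channel = MemoryChannel()
run_source(["event-1", "event-2", "event-3"], channel)
destination = []  # stand-in for HDFS, Hive, HBase, ...
delivered = run_sink(channel, destination)
print(delivered, destination)
```

The point of the channel is that the source and sink never talk to each other directly, so either side can be swapped without touching the other.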

Start Flume on Cloudera Quickstart VM

● To add Flume to Cloudera Quickstart VM, you need to launch Cloudera Manager

● Congure the VM.

○ Allocate a minimum of 10023 MB memory.

○ Allocate 2 CPUs.

○ Allocate 20 MB video memory.

○ Consider setting the clipboard to bidirectional.

● Launch Cloudera Express

● Check the status of the NameNode service

○ Command: sudo service hadoop-hdfs-namenode status

○ If the NameNode is not running, start the NameNode service

○ Command: sudo service hadoop-hdfs-namenode start

● Check the status of the DataNode service

○ Command: sudo service hadoop-hdfs-datanode status

○ If the DataNode is not running, start the DataNode service

○ Command: sudo service hadoop-hdfs-datanode start
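The "check status, then start only if needed" steps above can be scripted. The sketch below is one possible way to do it in Python and rests on an assumption not stated in the slides: that `service <name> status` exits with code 0 when the service is running and non-zero otherwise. The helper `ensure_running` and the `dry_run` stand-in are hypothetical; the demo uses `dry_run` so that nothing is actually executed.

```python
import subprocess


def ensure_running(service, run=subprocess.call):
    """Start `service` only if its status check says it is not running.

    Assumes exit code 0 from `service <name> status` means "running".
    Returns True if a start command was issued.
    """
    status = run(["sudo", "service", service, "status"])
    if status == 0:
        return False
    run(["sudo", "service", service, "start"])
    return True


# Dry run: record the commands instead of executing them.
issued = []


def dry_run(cmd):
    issued.append(" ".join(cmd))
    # Pretend the NameNode is stopped (3) and everything else succeeds (0).
    if cmd[-1] == "status" and "namenode" in cmd[2]:
        return 3
    return 0


services = ("hadoop-hdfs-namenode", "hadoop-hdfs-datanode")
results = {name: ensure_running(name, run=dry_run) for name in services}
print(results)
print(issued)
```

On the VM itself, calling `ensure_running(name)` without the `run` argument would execute the same commands shown in the steps above through `subprocess.call`.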

● Open Cloudera Manager in web browser

● Username: cloudera

● Password: cloudera

● After logging in to Cloudera Manager, click Add Service

● Select Flume

● Start Hue

● Start Flume

● Check the configuration of Flume

● Check the port (9999 in this VM)
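Besides reading the port from the Flume configuration, it can be useful to confirm that something is actually listening on it. The following Python sketch is one way to probe a TCP port; `port_is_open` is a hypothetical helper, and the host `localhost` with port 9999 simply mirrors the value given for this VM.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# True only if the Flume agent (or anything else) is listening on 9999.
print(port_is_open("localhost", 9999))
```

A `True` result only shows that some process accepts connections on the port; it does not prove that the process is Flume.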

Use Telnet to test the default Flume implementation

● First, install telnet

● Command: sudo yum install telnet

● Launch Telnet with the command: telnet localhost 9999

● At the prompt, enter Hello world ^.^

● Press Ctrl+] to escape

● Type quit to close telnet
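The same test can be run without telnet by opening a TCP connection and sending one line of text. The Python sketch below assumes, as the telnet steps imply, that the Flume source on port 9999 accepts newline-terminated text; whether the agent sends any reply depends on its configuration, so the function returns an empty string if nothing comes back. `send_line` is a hypothetical helper name.

```python
import socket


def send_line(host, port, text, timeout=2.0):
    """Send one newline-terminated line; return the peer's reply, if any."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((text + "\n").encode("utf-8"))
        try:
            reply = conn.recv(1024)
        except socket.timeout:
            # The peer accepted the line but sent nothing back in time.
            return ""
        return reply.decode("utf-8", errors="replace").strip()


try:
    print(send_line("localhost", 9999, "Hello world ^.^"))
except OSError as err:
    # Raised when nothing is listening on the port, for example.
    print(f"could not reach the Flume agent: {err}")
```

Closing the connection at the end of the `with` block plays the role of the Ctrl+] and quit steps in the telnet session.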
