Azure Data Lake_SQLSAT_Singapore | Reference material for the Data Management and Visualization course | Hanoi University of Science and Technology


DATA STORAGE AND ANALYTICS WITH AZURE DATA LAKE
GLENN MORRIS
TECHNICAL DIRECTOR | DATA SCIENTIST
GLENN@TALISMANTS.COM | @GLENNRM
DISCLAIMER
I am likely to appear to lie to you or tell you something that will change very soon after I say it!
THE CONFUSION
How do you pronounce “Azure”?
Takes its name from lapis lazuli
French: Azur, as in Côte d’Azur
Spanish: Azul
THE DATA ANALYTICS PIPELINE
Data does not arrive nicely formatted and ready to be consumed.
The data pipeline is paramount to understanding any analytics solution.
The general process:
SOURCE | INGEST | PROCESS | STORAGE | DELIVERY
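As a rough illustration of the stages listed above, the sketch below chains them as plain Python functions. It assumes the stages run in the order shown on the slide. The sample records, field layout, and function names are all hypothetical and do not come from the talk or from any Azure API.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical raw input: real sources rarely arrive clean, so one line is malformed.
RAW_LINES = [
    "2017-01-01,sensor-a,21.5",
    "not,a valid line",
    "2017-01-02,sensor-b,19.0",
]

def source() -> Iterator[str]:
    """SOURCE: emit raw, unvalidated lines."""
    yield from RAW_LINES

def ingest(lines: Iterable[str]) -> Iterator[List[str]]:
    """INGEST: split each line and drop those without exactly three fields."""
    for line in lines:
        parts = line.split(",")
        if len(parts) == 3:
            yield parts

def process(rows: Iterable[List[str]]) -> Iterator[Dict]:
    """PROCESS: turn string fields into typed records."""
    for date, sensor, value in rows:
        yield {"date": date, "sensor": sensor, "value": float(value)}

def store(records: Iterable[Dict], storage: List[Dict]) -> None:
    """STORAGE: persist records (an in-memory list stands in for a real store)."""
    storage.extend(records)

def deliver(storage: List[Dict]) -> Dict:
    """DELIVERY: produce a small summary for consumers."""
    count = len(storage)
    mean = sum(r["value"] for r in storage) / count if count else 0.0
    return {"count": count, "mean_value": mean}

def run_pipeline() -> Dict:
    storage: List[Dict] = []
    store(process(ingest(source())), storage)
    return deliver(storage)

report = run_pipeline()
```

The malformed line is discarded at the ingest stage, so only the two well-formed records reach storage and the delivered summary.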
DATA LAKES
Storage and Data
Storage: infinitely scalable, fault-tolerant storage designed to handle massive volumes of data
Data: processing engine that can operate on data at the above scale
“If you think of a datamart as a store of bottled water - cleansed and packaged and structured for easy consumption - the data lake is a large body of water in a more natural state. The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
James Dixon, CTO, Pentaho
ARCHITECTURES
LAMBDA
Pipeline architecture to reduce the complexity of real-time analytics
Constrains incremental computation to a small portion of the architecture
Hot (mutable) or cold (immutable) path for data flow
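A minimal sketch of the hot and cold paths, using a per-key event count as the example workload. The class and method names are hypothetical. The cold path recomputes its view from an append-only log, the hot path keeps a small mutable view of events since the last batch run, and a query merges the two.

```python
from collections import Counter

class LambdaCounter:
    """Toy Lambda-style counter: batch view (cold) plus speed view (hot)."""

    def __init__(self) -> None:
        self.master = []             # cold path: append-only, never updated in place
        self.batch_view = Counter()  # rebuilt from scratch by each batch run
        self.speed_view = Counter()  # hot path: mutable view of recent events only

    def ingest(self, key: str) -> None:
        self.master.append(key)      # every event lands in the immutable log
        self.speed_view[key] += 1    # incremental computation is confined here

    def run_batch(self) -> None:
        # Recompute the batch view over the whole log, then reset the
        # speed view because the batch view now covers those events.
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key: str) -> int:
        # Serving step: merge the two views at read time.
        return self.batch_view[key] + self.speed_view[key]

counter = LambdaCounter()
for event in ["a", "a", "b"]:
    counter.ingest(event)
before_batch = counter.query("a")   # answered from the speed view alone
counter.run_batch()
counter.ingest("a")
after_batch = counter.query("a")    # batch view plus one recent event
```

Only the speed view is updated incrementally. The batch view is always a full recomputation, which is what keeps the incremental logic limited to a small part of the design.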
KAPPA
Developed to simplify the Lambda architecture
Eliminates the cold path
All processing happens in near-real-time streaming mode
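A minimal sketch of the Kappa idea, assuming a single append-only log and one streaming code path. There is no separate batch job: to change the logic, the same log is replayed through a new version of the stream processor. `EventLog` and `build_view` are hypothetical names, not part of any streaming product.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Event = Tuple[str, float]  # (sensor id, reading); a made-up event shape

class EventLog:
    """Append-only log that can be re-read from any offset."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read_from(self, offset: int = 0) -> Iterator[Event]:
        return iter(self._events[offset:])

def build_view(log: EventLog, transform: Callable[[float], float]) -> Dict[str, float]:
    """Stream processor: fold every event into a per-sensor running total."""
    view: Dict[str, float] = {}
    for sensor, value in log.read_from(0):
        view[sensor] = view.get(sensor, 0.0) + transform(value)
    return view

log = EventLog()
for event in [("a", 1.0), ("b", 2.0), ("a", 3.0)]:
    log.append(event)

view_v1 = build_view(log, lambda v: v)        # original processing logic
view_v2 = build_view(log, lambda v: v * 2.0)  # revised logic, same log replayed
```

Reprocessing here is a replay through the same streaming path, which is the simplification over keeping a second, batch-only code path.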
AZURE ANALYTICS
MICROSERVICES
A microservice is a software building block that does one thing and does it well. It can
be provisioned on demand and scaled elastically, it provides fault tolerance and failover,
and it can be de-provisioned when it is no longer needed.
Autonomous and Isolated
Autonomous: Existing or capable of existing independently; responding, reacting, or
developing independently of the whole.
Isolated: Separate from others, happening in different places and at different times.
AZURE DATA FACTORY
Data Factory is a cloud-based data integration service that orchestrates and automates the movement and
transformation of data. Just like a manufacturing factory that runs equipment to take raw materials and
transform them into finished goods, Data Factory orchestrates existing services that collect raw data and
transform it into ready-to-use information.
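To make the idea of orchestration concrete, the sketch below runs a few activities in dependency order using the standard-library `graphlib` module (Python 3.9+). This is not the Data Factory API. The activity names, the sample data, and the shared `ctx` dictionary are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical activities; each reads and writes a shared context dictionary.
def copy_raw(ctx: Dict) -> None:
    ctx["raw"] = [" 3 ", "4", " 5"]  # stand-in for raw data copied from a source

def transform(ctx: Dict) -> None:
    ctx["clean"] = [int(s.strip()) for s in ctx["raw"]]

def publish(ctx: Dict) -> None:
    ctx["total"] = sum(ctx["clean"])  # stand-in for ready-to-use information

ACTIVITIES: Dict[str, Callable[[Dict], None]] = {
    "copy_raw": copy_raw,
    "transform": transform,
    "publish": publish,
}

# Each activity maps to the set of activities that must finish before it.
DEPENDS_ON: Dict[str, Set[str]] = {
    "publish": {"transform"},
    "transform": {"copy_raw"},
    "copy_raw": set(),
}

def run_pipeline() -> Tuple[List[str], Dict]:
    ctx: Dict = {}
    # static_order() yields every activity after all of its prerequisites.
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        ACTIVITIES[name](ctx)
    return order, ctx

order, ctx = run_pipeline()
```

The orchestrator itself does no data work. It only decides what runs and when, while the activities stand in for the existing services that do the collecting and transforming.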
AZURE DATA LAKE STORE
Azure Data Lake Store is an enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data
Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and
exploratory analytics.
AZURE DATA LAKE
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and
analysts to store data of any size, shape and speed, and do all types of processing and analytics across
platforms and languages. It removes the complexities of ingesting and storing all of your data while
making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake
works with existing IT investments for identity, management, and security for simplified data
management and governance.
AZURE DATA LAKE ANALYTICS
Azure Data Lake Analytics is a new service, built to make big data analytics easy. This service lets
you focus on writing, running and managing jobs, rather than operating distributed infrastructure.
Instead of deploying, configuring and tuning hardware, you write queries to transform your data
and extract valuable insights.
REMINDER
- - - - - - - - - - - 57 metres - - - - - - - - -
THANK YOU
A Big Thanks to our Sponsors
