Data Lake | Reference material for the course Data Management and Visualization | Hanoi University of Science and Technology

A 24-page reference document for reviewing the course material.

Microsoft C+E Technology Training
Data Platform and Analytics Foundational Training
Solution area: Data Analytics
Solution: Big Data
Technology: Data Lake
[Speaker Name]

Two approaches to information management for analytics: top-down and bottom-up
[Figure: analytics types plotted by value against difficulty; both rise from descriptive to prescriptive]
- Descriptive analytics: What happened?
- Diagnostic analytics: Why did it happen?
- Predictive analytics: What will happen?
- Prescriptive analytics: How can we make it happen?
- Top-down (deductive): Theory → Hypothesis → Observation → Confirmation
- Bottom-up (inductive): Observation → Pattern → Hypothesis → Theory
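
A toy series can make the lower and higher analytics levels concrete. The minimal Python sketch below uses illustrative data and hypothetical function names that do not come from the slides. It answers "What happened?" with a descriptive summary and "What will happen?" with a least-squares trend line. A linear trend is only one assumed choice of predictive model.

```python
# Hypothetical monthly sales figures, used only for illustration.
sales = [100.0, 110.0, 120.0, 130.0]


def describe(series):
    """Descriptive analytics: summarize what happened."""
    total = sum(series)
    return {"total": total, "average": total / len(series)}


def predict_next(series):
    """Predictive analytics: fit y = a + b*x by least squares, then extrapolate one step."""
    n = len(series)
    xs = list(range(n))
    x_bar = sum(xs) / n
    y_bar = sum(series) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a + b * n  # value predicted for the next period (x = n)


print(describe(sales))      # {'total': 460.0, 'average': 115.0}
print(predict_next(sales))  # 140.0
```

Diagnostic and prescriptive analytics would build on the same data, for example by breaking the total down by cause or by choosing an action that changes the forecast.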

Data warehousing uses a top-down approach
- Understand corporate strategy
- Gather requirements: business requirements and technical requirements
- Set up infrastructure: install and tune
- Implement the data warehouse: dimensional modeling, physical design, ETL design, ETL development, reporting and analytics design, reporting and analytics development
Data flow: data sources (OLTP, ERP, CRM, LOB) → ETL → data warehouse → BI and analytics (dashboards, reporting)
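
In the top-down flow (sources → ETL → warehouse), the schema is fixed before any data is loaded. The sketch below shows extract, transform, and load steps under that constraint. It assumes an in-memory SQLite database and a hypothetical one-dimension star schema (`dim_product`, `fact_sales`). Rows that do not fit the predefined schema are rejected before they reach the warehouse.

```python
import sqlite3

# Schema fixed up front from gathered requirements (schema-on-write).
# Table and column names are hypothetical.
DDL = """
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    sku         TEXT UNIQUE NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    sale_date   TEXT NOT NULL,
    amount      REAL NOT NULL
);
"""


def extract():
    """Extract: rows as they might arrive from an OLTP or ERP source (made-up data)."""
    return [
        {"sku": "A1", "name": "Widget", "date": "2016-01-05", "amount": "19.99"},
        {"sku": "B2", "name": "Gadget", "date": "2016-01-06", "amount": "5.00"},
        {"sku": "A1", "name": "Widget", "date": "2016-01-07", "amount": "n/a"},
    ]


def transform(rows):
    """Transform: cast values to warehouse types; rows that do not fit are dropped before load."""
    clean = []
    for r in rows:
        try:
            clean.append((r["sku"], r["name"], r["date"], float(r["amount"])))
        except ValueError:
            continue  # "n/a" cannot become REAL, so this row never reaches the warehouse
    return clean


def load(conn, rows):
    """Load: add each product to the dimension once, then insert facts keyed to it."""
    for sku, name, date, amount in rows:
        conn.execute("INSERT OR IGNORE INTO dim_product (sku, name) VALUES (?, ?)", (sku, name))
        (key,) = conn.execute(
            "SELECT product_key FROM dim_product WHERE sku = ?", (sku,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales (product_key, sale_date, amount) VALUES (?, ?, ?)",
            (key, date, amount),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
load(conn, transform(extract()))
total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
```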

The data lake uses a bottom-up approach
- Ingest all data, regardless of requirements
- Store all data in its native format, without a schema definition
- Do analysis using analytic engines such as Hadoop: interactive queries, batch queries, machine learning, data warehouse, real-time analytics
Sources: devices, relational, sensors, video, LOB applications, web, social, clickstream
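
The bottom-up approach works the other way round. Data is stored untouched, and a schema is applied only when a question is asked, which is often called schema-on-read. This sketch uses a plain dictionary as a stand-in for the lake. The paths, records, and reader functions are hypothetical.

```python
import csv
import io
import json

# Stand-in for the lake: path -> raw bytes, kept exactly as ingested.
lake = {}


def ingest(path, raw_bytes):
    """Ingest as-is: no parsing, validation, or schema at write time."""
    lake[path] = raw_bytes


ingest(
    "raw/clickstream/2016-01-05.jsonl",
    b'{"user": "u1", "page": "/home"}\n{"user": "u2", "page": "/cart", "ms": 120}\n',
)
ingest("raw/sales/2016-01-05.csv", b"sku,amount\nA1,19.99\nB2,5.00\n")


def read_clicks(path):
    """Schema-on-read: project only the fields this analysis needs; extra fields are ignored."""
    for line in lake[path].decode("utf-8").splitlines():
        rec = json.loads(line)
        yield rec["user"], rec["page"]


def read_sales(path):
    """A different schema, applied at read time, for a different question."""
    for row in csv.DictReader(io.StringIO(lake[path].decode("utf-8"))):
        yield row["sku"], float(row["amount"])
```

The second clickstream record carries an extra `ms` field. Ingestion accepts it without complaint, and the reader simply ignores it.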

Challenges involved in implementing a data lake
- Performance and scale: storage bottlenecks; small writes from IoT sources; price-performance; data grows independently
- Security: compliance challenges; effectively controlling access; corporate policies
- Data silos: data spans sources (purchasing, marketing, sales, operations); inefficiency in colocation
- Analytics: open interfaces to data; a variety of analytics tools
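
Small writes from IoT sources appear above as a storage bottleneck. One general way to soften it is to buffer many small records and flush them as fewer, larger writes. This is a common technique, not something the slides prescribe. The `BatchingWriter` class, its `sink` callable, and the 4 MiB default threshold are all hypothetical.

```python
class BatchingWriter:
    """Hypothetical helper: collect small records and hand them to `sink` in larger batches."""

    def __init__(self, sink, max_bytes=4 * 1024 * 1024):
        self.sink = sink            # any callable that persists one bytes payload
        self.max_bytes = max_bytes  # flush once this many bytes are buffered
        self.buffer = []
        self.size = 0

    def write(self, record: bytes):
        self.buffer.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        """Emit everything buffered so far as a single write."""
        if self.buffer:
            self.sink(b"".join(self.buffer))
            self.buffer.clear()
            self.size = 0


writes = []
w = BatchingWriter(writes.append, max_bytes=100)
for _ in range(1000):
    w.write(b"0123456789")  # 1,000 ten-byte sensor readings
w.flush()
```

Here 1,000 tiny writes reach storage as 100 writes of 100 bytes each. The trade-off is added latency and the risk of losing whatever is still buffered if the writer crashes.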

Introducing Azure Data Lake

Azure Data Lake (ADL)
- Analytics: ADL Analytics (U-SQL) and HDInsight (Hive), running on YARN
- Storage: ADL Store, accessed through WebHDFS
- Built on open source

Azure Data Lake as a part of Cortana Analytics Suite
DATA → INTELLIGENCE → ACTION
- Data: business apps, custom apps, sensors and devices
- Information management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
- Big data stores: Azure SQL Data Warehouse, Azure Data Lake Store
- Machine learning and analytics: Azure Machine Learning, Azure HDInsight (Hadoop), Azure Stream Analytics, Azure Data Lake Analytics
- Intelligence: dashboards and visualizations (Power BI), personal digital assistant (Cortana), perceptual intelligence (face, vision, speech, text)
- Action: people, automated systems
- Business scenarios: recommendations, customer churn, forecasting

Introducing Azure Data Lake: big data made easy
- All users productive on day one
- Ready for your enterprise
- Analytics on any data, any size

How do you start using ADL?
1. Log in to the Azure portal.
2. Create an ADL Analytics account (90 seconds, free).
3. Write a U-SQL script and submit it to the ADL Analytics account.
4. The U-SQL job reads and writes data (ADL Store, Azure Storage blobs, … and so on).
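
A U-SQL script commonly follows an EXTRACT → transform → OUTPUT pattern, using built-in extractors and outputters such as `Extractors.Tsv()` and `Outputters.Csv()`. The sketch below is a rough Python analogue of that read-transform-write pattern, not U-SQL itself. The input data, column names, and filter value are made up. Each function's comment shows the approximate U-SQL statement it mirrors.

```python
import csv
import io

# Made-up tab-separated input standing in for a file in ADL Store or Azure Storage blobs.
INPUT_TSV = "u1\ten-us\t120\nu2\ten-gb\t45\nu3\ten-us\t30\n"


def extract(text):
    # Roughly: @rows = EXTRACT UserId string, Region string, Duration int
    #                  FROM "<input path>" USING Extractors.Tsv();
    for user_id, region, duration in csv.reader(io.StringIO(text), delimiter="\t"):
        yield {"UserId": user_id, "Region": region, "Duration": int(duration)}


def select_region(rows, region):
    # Roughly: @result = SELECT UserId, Region, Duration FROM @rows WHERE Region == "en-us";
    return [r for r in rows if r["Region"] == region]


def output_csv(rows):
    # Roughly: OUTPUT @result TO "<output path>" USING Outputters.Csv();
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["UserId", "Region", "Duration"])
    writer.writerows(rows)  # no header row is written
    return buf.getvalue()


result = output_csv(select_region(extract(INPUT_TSV), "en-us"))
```

In the real service, the script is submitted to the ADL Analytics account and runs as a distributed job. The local functions above only mimic the shape of the dataflow.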

Azure Data Lake Store

What is Azure Data Lake (ADL) Store?
A highly scalable, distributed, parallel file system in the cloud, specifically designed to work with multiple analytic frameworks.
- Sources: relational, LOB applications, devices, video, clickstream, social, web, sensors
- Frameworks: HDInsight, ADL Analytics, Machine Learning, Spark, R

Introducing Azure Data Lake Store
A hyper-scale repository for big data analytics workloads
- Store ANY DATA in its native format
- HADOOP FILE SYSTEM (HDFS) for the cloud
- ENTERPRISE GRADE
- No limits to SCALE
- Optimized for analytic workload PERFORMANCE
Any data: unstructured, semi-structured, and structured, from LOB applications, devices, video, clickstream, web, relational sources, sensors, and social.

HDFS for the cloud
- Built from the ground up as a Hadoop file system
- Support for file/folder objects and operations
- Integration with HDInsight, Hortonworks, and Cloudera
- Accessible to all HDFS-compliant projects: Spark | Storm | Flume | Sqoop | Kafka | R | and more
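
ADL Store exposes file and folder operations through WebHDFS-compatible REST calls of the form `/webhdfs/v1/<path>?op=<OPERATION>`. Examples of such operations are `LISTSTATUS`, `MKDIRS`, and `OPEN`. The helper below only builds these URLs. The `azuredatalakestore.net` host pattern is an assumption based on the ADL Store endpoint format, `contoso` is a hypothetical account name, and authentication is left out entirely.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style REST URL for a file or folder operation.

    Assumes the host pattern <account>.azuredatalakestore.net. No request is
    sent; a real call would also need an OAuth 2.0 bearer token.
    """
    host = f"https://{account}.azuredatalakestore.net"
    query = urlencode({"op": op, **params})
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    return f"{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


list_folder = webhdfs_url("contoso", "/raw/clickstream", "LISTSTATUS")
make_folder = webhdfs_url("contoso", "/curated/daily reports", "MKDIRS")
read_file = webhdfs_url("contoso", "/raw/sales/2016-01-05.csv", "OPEN", offset="0", length="1024")
```

The same URL shape is what lets HDFS-compliant tools address the store as a Hadoop file system.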

Durable and highly available
- Automatically replicates your data
- Three copies within a single region
- Highly available
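
With three copies, data is lost only if all three fail before any of them is repaired. This small calculation assumes that each copy fails independently, with the same probability p over a given window. Under that simplifying assumption, the chance of losing everything falls from p to p³. The 0.001 figure is arbitrary and is not a published failure rate.

```python
def probability_all_copies_lost(p: float, copies: int = 3) -> float:
    """Probability that every replica is lost within the same window.

    Assumes independent failures with equal probability p per copy;
    correlated failures would make the real figure higher.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p ** copies


single = probability_all_copies_lost(0.001, copies=1)  # one copy: 0.001
triple = probability_all_copies_lost(0.001)            # three copies: about 1e-9
```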

Unlimited storage
- Unlimited account sizes
- Individual file sizes from gigabytes to petabytes
- No limits to scale

Optimized for analytics workload performance
- Built for running large analytics systems that require massive throughput
- Optimized for parallel computation over petabytes of data
- Automatically optimizes for any throughput
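
Parallel computation over very large files rests on splitting a file into independent byte ranges, processing each range separately, and combining the partial results. The toy, single-machine sketch below illustrates only that partitioning idea. The slides do not describe how ADL does this internally. The chunk size and the newline-counting task are arbitrary choices, and Python threads here demonstrate the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(file_size: int, chunk_size: int):
    """Split [0, file_size) into contiguous (offset, length) ranges."""
    return [(off, min(chunk_size, file_size - off)) for off in range(0, file_size, chunk_size)]


def count_newlines_parallel(data: bytes, chunk_size: int, workers: int = 4) -> int:
    """Count newlines by processing each range independently, then summing partial counts."""

    def count_range(rng):
        off, length = rng
        return data[off:off + length].count(b"\n")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_range, byte_ranges(len(data), chunk_size)))


sample = b"a\nb\nc\n" * 1000
lines = count_newlines_parallel(sample, chunk_size=1024)
```

Each byte belongs to exactly one range, so the partial counts add up to the total. On a cluster, the same ranges could be handed to separate nodes instead of threads.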

Azure Data Lake Analytics