Azure Data Lake Store & Analytics | Reference material for the Data Management and Visualization course | Hanoi University of Science and Technology

We Hold These Truths…
• A database has a schema
• We transform the data into that schema
• Data conforms to the schema we define
• The schema defines the business

Info:
42 pages, 3 months ago

Azure Data Lake Store
& Analytics
Neck deep in the drink…
Agenda
Introductions
What is this Thing?
Azure Data Lake Store
Quick End to End Demo
Environment Setup
Azure Data Lake Analytics & U-SQL
Discussion
Who Am I?
Audrey Hammonds
Practice Lead, BI & Analytics at Innovative Architects
Recently moved from Atlanta to West Palm Beach
Organizer, Palm Beach Data Meetup
Board of Directors, Palm Beach Tech Association
Twitter: @DataAudrey
LinkedIn: https://www.linkedin.com/in/audreyhammonds/
Email: Audrey.Hammonds@InnovativeArchitects.com
Blog: Datachix.com (with Julie Smith)
Founded in Atlanta
13 years in project-based consulting
~120 employees
#6 best small business to work for in ATL in 2017
People who don’t suck
Focus on:
Data
Integration
Mobile
App Dev
Cloud Infrastructure
Being Awesome
Twitter: @InnovArchitects
Facebook: facebook.com/InnovativeArchitects
Blog: blog.innovativearchitects.com/
The Future is Here!
https://www.youtube.com/watch?v=Bmz67ErIRa4
What is this Data Lake thing?
The Main Parts
We Hold These Truths…
A database has a schema
We transform the data into that schema
Data conforms to the schema we define
The schema defines the business
[Diagram: a Database of six Tables, four Columns each, governed by Referential Integrity, Data Type, Size, and Defaults/Checks]
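The constraints named on this slide (referential integrity, data type, size, defaults/checks) can be seen working in a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for any relational database; the table and column names are hypothetical and are not from the talk. SQLite does not enforce declared column sizes or types by itself, so both are written as CHECK constraints here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every rule below is enforced at write time.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL CHECK (length(name) <= 50)            -- size
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- referential integrity
    quantity    INTEGER NOT NULL DEFAULT 1                          -- default
                CHECK (typeof(quantity) = 'integer' AND quantity > 0),  -- data type + check
    status      TEXT NOT NULL DEFAULT 'NEW'
);
""")


def try_insert(sql, params=()):
    """Return True if the schema accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid customer": try_insert(
        "INSERT INTO customer (customer_id, name) VALUES (?, ?)", (1, "Contoso")),
    "order using defaults": try_insert(
        "INSERT INTO sales_order (order_id, customer_id) VALUES (?, ?)", (10, 1)),
    "order for unknown customer": try_insert(
        "INSERT INTO sales_order (order_id, customer_id) VALUES (?, ?)", (11, 99)),
    "order with text quantity": try_insert(
        "INSERT INTO sales_order (order_id, customer_id, quantity) VALUES (?, ?, ?)",
        (12, 1, "many")),
    "customer name too long": try_insert(
        "INSERT INTO customer (customer_id, name) VALUES (?, ?)", (2, "x" * 51)),
}

# The accepted order picked up its column defaults.
default_row = conn.execute(
    "SELECT quantity, status FROM sales_order WHERE order_id = 10").fetchone()

for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Every rejected row here is data the database refuses to hold, which is the "data conforms to the schema we define" position in practice.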
The Mind-Blowing Proposition…
Schema
Constraints
Data
Transform
Data Formats
Structured
Semi-Structured
Unstructured
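As a rough illustration of the three data formats, the sketch below pulls the same fact (a quantity of 3) out of a structured, a semi-structured, and an unstructured source. The sample records and field names are invented for this example; only the Python standard library is used.

```python
import csv
import io
import json
import re

# Structured: fixed columns, one value per cell.
structured = "order_id,customer,quantity\n1001,Contoso,3\n"

# Semi-structured: self-describing, nested, fields may vary per record.
semi_structured = (
    '{"order_id": 1001, "customer": {"name": "Contoso"}, '
    '"items": [{"sku": "A-1", "qty": 3}]}'
)

# Unstructured: free text, structure has to be inferred.
unstructured = "Contoso called this morning and ordered 3 units; reference 1001."

# Structured data maps straight onto named columns.
row = next(csv.DictReader(io.StringIO(structured)))
structured_qty = int(row["quantity"])

# Semi-structured data is navigated by key and position.
doc = json.loads(semi_structured)
semi_qty = sum(item["qty"] for item in doc["items"])

# Unstructured data needs a pattern (or a model) to extract anything.
match = re.search(r"ordered (\d+) units", unstructured)
unstructured_qty = int(match.group(1)) if match else None

print(structured_qty, semi_qty, unstructured_qty)
```

The further down the list, the more of the "schema" lives in the reading code rather than in the data.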
Contrasting Philosophies
[Diagram: a Database of six Tables, four Columns each, governed by Referential Integrity, Data Type, Size, and Defaults/Checks]
ETL: Extract, Transform, Load
ISA: Ingest, Store, Analyze
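A minimal sketch of the two philosophies, assuming a handful of made-up JSON sensor events (the device names, fields, and function names are hypothetical). The ETL path forces every record into a fixed schema before loading; the ISA path stores the raw lines untouched and applies structure only when a question is asked.

```python
import json

# Hypothetical raw events; note the inconsistent shapes.
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "temp_c": "n/a", "note": "sensor offline"}',
    '{"device": "pump-3", "temp_c": 64.0, "vibration": 0.02}',
]


def etl_load(raw_lines):
    """ETL: extract, transform into a fixed (device, temp_c) schema, then load."""
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            table.append((str(record["device"]), float(record["temp_c"])))
        except (KeyError, TypeError, ValueError):
            rejected.append(line)  # does not fit the schema, so it never lands
    return table, rejected


def isa_ingest(raw_lines):
    """ISA: ingest and store the raw lines exactly as they arrived."""
    return list(raw_lines)


def isa_analyze(store, field):
    """ISA: apply structure at read time, for whichever field is asked about."""
    values = {}
    for line in store:
        record = json.loads(line)
        value = record.get(field)
        if isinstance(value, (int, float)):
            values[record["device"]] = value
    return values


table, rejected = etl_load(RAW_EVENTS)
lake = isa_ingest(RAW_EVENTS)

print("ETL table:", table)
print("ETL rejected:", len(rejected))
print("ISA temp_c:", isa_analyze(lake, "temp_c"))
print("ISA vibration:", isa_analyze(lake, "vibration"))
```

The ETL table has lost both the rejected record and the vibration reading, because neither was part of the schema at load time. The stored raw lines can still answer the vibration question later without a reload.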
Azure Data Lake Store
The What
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview
The Whys
Why Data Lake?
Traditional data warehousing is the antithesis of agile
ETL is costly, time-consuming, and fraught with peril
Schema is hard to change: future-proof it
Scale performance as needed
The Whys
Why on Azure?
Scale compute and store independently
Leverage MPP capabilities (even against a single object)
Secure with Azure Active Directory
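The idea of running parallel work "even against a single object" can be shown with a toy analogy. This is not how Azure Data Lake does it internally; it is a sketch that assumes one in-memory object split into fixed-size pieces (called extents here for illustration), each handed to a separate worker. The function names and sizes are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_extents(blob: bytes, extent_size: int):
    """Cut one object into fixed-size pieces that can be processed independently."""
    return [blob[i:i + extent_size] for i in range(0, len(blob), extent_size)]


def count_rows(extent: bytes) -> int:
    """Per-piece work: count newline-terminated rows in this piece only."""
    return extent.count(b"\n")


def parallel_row_count(blob: bytes, extent_size: int = 64, workers: int = 4) -> int:
    """Fan the pieces out to workers, then combine the partial results."""
    extents = split_into_extents(blob, extent_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_rows, extents))


# One single "file" of 1000 rows.
single_object = b"".join(f"{i},reading-{i}\n".encode() for i in range(1000))
total = parallel_row_count(single_object)
print(total)
```

Counting newlines is safe to split at arbitrary byte offsets, which is why it was chosen here. Work that depends on whole rows would need pieces aligned to row boundaries. Python threads also do not give true CPU parallelism, so the sketch shows the partitioning pattern rather than a speedup.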
The Whys
Why HDFS?
Native support for file/folder operations
Compatible with Spark, Storm, Flume, Sqoop, Kafka, R, etc.
Open standards mean better long-term integration
The Hows
What stores can I load from Data Lake Store?
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector
Azure: Blob Storage, Data Lake Store, Cosmos DB, SQL Database, SQL Data Warehouse, Search Index, Table storage
Databases: SQL Server, Oracle
File: File System
The Hows
What stores can I load to Data Lake Store?
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector
Azure: Blob Storage, Cosmos DB, Data Lake Store, SQL Database, SQL Data Warehouse, Table storage
Databases: SQL Server, Oracle, Amazon Redshift, DB2, MySQL, PostgreSQL, SAP Business Warehouse, SAP HANA, Sybase, Teradata
File: File System, Amazon S3, FTP, HDFS, SFTP
NoSQL: Cassandra, MongoDB
Others: Generic HTTP, Generic OData, Generic ODBC, Salesforce, Web Table (from HTML), GE Historian
The Hows
How do I load data?
U-SQL
SSIS Resource: https://docs.microsoft.com/en-us/sql/integration-services/connection-manager/azure-data-lake-store-connection-manager
The Hows
How do I manage DR?
Automatically replicated (3 copies in a single region)*
*This is where Hadoop/HDFS architecture comes in handy