Azure Data Lake Store & Analytics | Reference material for the Data Management and Visualization course | Hanoi University of Science and Technology
Azure Data Lake Store & Analytics
Neck deep in the drink…

Agenda
• Introductions
• What is this Thing?
• Azure Data Lake Store
• Quick End to End Demo
• Environment Setup
• Azure Data Lake Analytics & U-SQL
• Discussion

Who Am I?
Audrey Hammonds
▪ Practice Lead, BI & Analytics at Innovative Architects
▪ Recently moved from Atlanta to West Palm Beach
▪ Organizer, Palm Beach Data Meetup
▪ Board of Directors, Palm Beach Tech Association
▪ Twitter: @DataAudrey
▪ LinkedIn: https://www.linkedin.com/in/audreyhammonds/
▪ Email: Audrey.Hammonds@InnovativeArchitects.com
▪ Blog: Datachix.com (with Julie Smith)

Twitter: @InnovArchitects
Facebook: facebook.com/InnovativeArchitects
Blog: blog.innovativearchitects.com/
▪ Founded in Atlanta
▪ 13 years in project-based consulting
▪ ~120 employees
▪ #6 best small business to work for in ATL in 2017
▪ People who don’t suck
▪ Focus on:
  ▪ Data
  ▪ Integration
  ▪ Mobile
  ▪ App Dev
  ▪ Cloud Infrastructure
  ▪ Being Awesome

The Future is Here!
https://www.youtube.com/watch?v=Bmz67ErIRa4
What is this Data Lake thing?
The Main Parts

We Hold These Truths…
• A database has a schema
• We transform the data into that schema
• Data conforms to the schema we define
• The schema defines the business
[Diagram: a database made of tables and columns, annotated with data types, sizes, referential integrity, and defaults/checks]
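To make the schema-on-write rules above concrete, here is a minimal Python sketch (not taken from the deck) of loading data the traditional way: every incoming row is cast into a fixed table schema at load time, and rows that do not conform are rejected before they are stored. The `Customer` table, its columns, and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical target table: every stored row must match this schema."""
    customer_id: int
    name: str
    country: str


def load_schema_on_write(raw_rows):
    """Cast raw string rows into the Customer schema at load time (the T in ETL).

    Rows that cannot be made to conform are rejected and never reach the table.
    """
    table, rejected = [], []
    for row in raw_rows:
        try:
            record = Customer(
                customer_id=int(row["customer_id"]),  # data type enforced by casting
                name=row["name"].strip(),
                country=row["country"].strip().upper(),  # normalized on the way in
            )
            if not record.name:
                raise ValueError("name is required")  # a CHECK-style constraint
            table.append(record)
        except (KeyError, ValueError) as exc:  # missing column or bad value
            rejected.append((row, str(exc)))
    return table, rejected


raw = [
    {"customer_id": "1", "name": "Ada", "country": "us"},
    {"customer_id": "two", "name": "Bob", "country": "uk"},  # wrong data type
    {"customer_id": "3", "name": "Cy"},                       # missing column
]
table, rejected = load_schema_on_write(raw)
print(len(table), len(rejected))  # 1 2
```

In this model, any change to what counts as valid data means changing the schema and reloading.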
The Mind-Blowing Proposition…
[Diagram: schema, constraints, and data transforms shown alongside data formats: structured, semi-structured, and unstructured]
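The proposition flips that model: data is ingested and stored in whatever format it arrives in, and a schema is applied only when the data is read for analysis (often called schema-on-read). The Python sketch below is a hypothetical illustration of the idea using JSON lines, not Data Lake Store's own mechanism; the field names and records are made up.

```python
import json

# Ingest + Store: records are kept exactly as they arrived (here, raw JSON lines).
raw_store = [
    '{"customer_id": 1, "name": "Ada", "country": "US"}',
    '{"customer_id": 2, "name": "Bob", "country": "UK", "loyalty_tier": "gold"}',  # new field
    'not valid json',  # malformed input is stored too; nothing is rejected at ingest
]


def read_with_schema(lines, columns):
    """Analyze: apply a schema only when the data is read.

    Each query projects the columns it needs; absent fields come back as None,
    and records this reader cannot parse are skipped (but stay in the store).
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield tuple(record.get(col) for col in columns)


# Two different "schemas" over the same stored data, with no reload in between.
print(list(read_with_schema(raw_store, ["customer_id", "country"])))
# [(1, 'US'), (2, 'UK')]
print(list(read_with_schema(raw_store, ["name", "loyalty_tier"])))
# [('Ada', None), ('Bob', 'gold')]
```

Because the raw records are never rewritten, the later `loyalty_tier` field needed no schema change or reload.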
Contrasting Philosophies
[Diagram: a database of tables and columns with data types, sizes, referential integrity, and defaults/checks]
ETL: Extract → Transform → Load
ISA: Ingest → Store → Analyze

Azure Data Lake Store
The What
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview

The Whys
Why Data Lake?
• Traditional data warehousing is the antithesis of agile
• ETL is costly, time-consuming, and fraught with peril
• Schema is hard to change, so future-proof it
• Scale performance as needed

Why on Azure?
• Scale compute and storage independently
• Leverage MPP (massively parallel processing) capabilities, even against a single object
• Secure with Azure Active Directory

Why HDFS?
• Native support for file/folder operations
• Compatible with Spark, Storm, Flume, Sqoop, Kafka, R, etc.
• Open standards mean better long-term integration

The Hows
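The "native support for file/folder operations" listed under Why HDFS? is easiest to see at the REST level: Data Lake Store (Gen1) exposes a WebHDFS-compatible API, so folders and files are handled with HDFS-style operations such as `MKDIRS`, `LISTSTATUS`, `OPEN`, and `RENAME`. The Python sketch below only builds request URLs; the account name `contosolake` and all paths are hypothetical, authentication (an Azure Active Directory bearer token) is deliberately left out, and exact parameters should be checked against the Data Lake Store REST reference.

```python
from urllib.parse import quote, urlencode

ACCOUNT = "contosolake"  # hypothetical Data Lake Store (Gen1) account name


def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style request URL for a Data Lake Store (Gen1) account.

    Only the URL is built; actually sending it would also need an
    Azure Active Directory bearer token in the Authorization header.
    """
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1"
    query = urlencode({"op": op, **params})
    return f"{base}/{quote(path.lstrip('/'))}?{query}"


# File and folder operations as (HTTP method, URL) pairs; paths are hypothetical.
requests_to_send = [
    ("PUT", webhdfs_url(ACCOUNT, "/raw/sales", "MKDIRS")),  # create a folder
    ("GET", webhdfs_url(ACCOUNT, "/raw/sales", "LISTSTATUS")),  # list its contents
    ("GET", webhdfs_url(ACCOUNT, "/raw/sales/2017.csv", "OPEN")),  # read a file
    ("PUT", webhdfs_url(ACCOUNT, "/raw/sales/2017.csv", "RENAME",
                        destination="/archive/2017.csv")),  # move a file
]
for method, url in requests_to_send:
    print(method, url)
```

The first line printed is `PUT https://contosolake.azuredatalakestore.net/webhdfs/v1/raw/sales?op=MKDIRS`.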
What stores can I load from Data Lake Store?
• Azure: Blob Storage, Data Lake Store, Cosmos DB, SQL Database, SQL Data Warehouse, Search Index, Table storage
• Databases: SQL Server, Oracle
• File: File System
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector

What stores can I load to Data Lake Store?
• Azure: Blob Storage, Cosmos DB, Data Lake Store, SQL Database, SQL Data Warehouse, Table storage
• Databases: SQL Server, Oracle, Amazon Redshift, DB2, MySQL, PostgreSQL, SAP Business Warehouse, SAP HANA, Sybase, Teradata
• File: File System, Amazon S3, FTP, HDFS, SFTP
• NoSQL: Cassandra, MongoDB
• Others: Generic HTTP, Generic OData, Generic ODBC, Salesforce, Web Table (from HTML), GE Historian
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector

How do I load data?
• U-SQL
• SSIS
Resource: https://docs.microsoft.com/en-us/sql/integration-services/connection-manager/azure-data-lake-store-connection-manager

How do I manage DR?
• Automatically replicated (3 copies in a single region)*
*This is where Hadoop/HDFS architecture comes in handy
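The footnote ties this to the Hadoop/HDFS architecture, where block replication (with a default factor of 3) is a core feature. The toy Python sketch below shows why three copies matter: each block is placed on three distinct nodes, so any two node failures still leave every block readable. The round-robin placement and node names are hypothetical simplifications, not how Data Lake Store actually places replicas, and because every copy here lives in one cluster the sketch says nothing about a region-wide outage.

```python
from itertools import combinations

REPLICATION_FACTOR = 3  # the "3 copies" from the slide (also HDFS's default)


def place_replicas(num_blocks, nodes, factor=REPLICATION_FACTOR):
    """Assign each block to `factor` distinct nodes using simple round-robin.

    A toy placement policy for illustration only.
    """
    if factor > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(factor)}
        for block in range(num_blocks)
    }


def survives(placement, failed_nodes):
    """True if every block still has at least one replica on a healthy node."""
    return all(replicas - set(failed_nodes) for replicas in placement.values())


nodes = ["n1", "n2", "n3", "n4", "n5"]  # hypothetical storage nodes
placement = place_replicas(num_blocks=8, nodes=nodes)

# Any two simultaneous node failures leave every block readable...
print(all(survives(placement, pair) for pair in combinations(nodes, 2)))  # True
# ...but losing all three holders of a block loses that block.
print(survives(placement, placement[0]))  # False
```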