Data Lake | Reference material for the Data Management and Visualization course | Hanoi University of Science and Technology (Đại học Bách Khoa Hà Nội)
This 24-page document is intended to help you review for the upcoming exam.
Course: Data Management and Visualization
University: Hanoi University of Science and Technology (Đại học Bách Khoa Hà Nội)
Preview text:
Microsoft C+E Technology Training
Data Platform and Analytics Foundational Training
Solution Area: Data Analytics
Solution: Big Data
Technology: Data Lake
[Speaker Name]

Two approaches to information management for analytics: top-down and bottom-up
[Diagram: analytics types plotted by value against difficulty. Descriptive analytics: What happened? Diagnostic analytics: Why did it happen? Predictive analytics: What will happen? Prescriptive analytics: How can we make it happen?]
Top-down (deductive): Theory, Hypothesis, Observation, Confirmation
Bottom-up (inductive): Observation, Pattern, Hypothesis, Theory

Data warehousing uses a top-down approach
Understand corporate strategy
Gather requirements: business requirements, technical requirements
Implement data warehouse: dimension modeling, ETL design, and reporting and analytics design; physical design, ETL development, and reporting and analytics development; set up infrastructure, install and tune
[Diagram: data sources (OLTP, ERP, CRM, LOB) are loaded through ETL into the data warehouse, which serves BI and analytics (dashboards, reporting).]

The data lake uses a bottom-up approach
Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop
[Diagram: sources (devices, social, LOB applications, video, sensors, web, relational, clickstream) flow into the data lake, which supports batch queries, interactive queries, real-time analytics, machine learning, and a data warehouse.]
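
The bottom-up approach above stores data in its native format and defers the schema until analysis time. A minimal Python sketch of that idea follows. The file names, payloads, and reader functions are hypothetical and exist only for illustration; a real data lake would hold the raw files in a distributed store rather than in a dictionary.

```python
import csv
import io
import json

# Hypothetical raw payloads, kept exactly as they arrived.
# Nothing validates or reshapes them at write time.
RAW_FILES = {
    "sensors/2016-01-01.json": (
        '{"device": "t-1", "temp": "21.5"}\n'
        '{"device": "t-2", "temp": "19.0", "unit": "C"}\n'
    ),
    "clickstream/2016-01-01.csv": "user,page\nalice,/home\nbob,/pricing\n",
}


def read_sensor_temps(raw: str) -> list[tuple[str, float]]:
    """Apply a schema at read time: keep device and temp, cast temp to float."""
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        rows.append((record["device"], float(record["temp"])))
    return rows


def read_page_views(raw: str) -> dict[str, int]:
    """A different reader imposes a different shape on a different raw file."""
    counts: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(raw)):
        counts[row["page"]] = counts.get(row["page"], 0) + 1
    return counts


temps = read_sensor_temps(RAW_FILES["sensors/2016-01-01.json"])
views = read_page_views(RAW_FILES["clickstream/2016-01-01.csv"])
```

Each reader decides which fields matter and how to type them, so the extra `unit` field in the second sensor record is stored but simply ignored by this particular analysis.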

Challenges involved in implementing a data lake
Data silos: data spans sources; inefficiency in colocation
Analytics: open interfaces to data; variety of analytics tools
Performance and scale: storage bottlenecks; IoT sources (small writes); price-performance; data grows independently
Security: compliance challenges; effectively control access; corporate policies
[Diagram: departmental silos labeled Sales, Purchasing, Marketing, and Operations.]

Introducing Azure Data Lake
Azure Data Lake (ADL): Analytics (ADL Analytics, HDInsight) and Storage (ADL Store)
Built on open source
[Diagram: component stack labeled Hive, U-SQL, YARN, and WebHDFS, with ADL Analytics and HDInsight above ADL Store.]
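
WebHDFS, listed among the components above, is the REST interface to the Hadoop file system: each file operation is an HTTP request whose URL carries the path and an `op` parameter. The sketch below only builds such URLs, assuming the standard `/webhdfs/v1/<path>?op=<OPERATION>` layout. The host name is a placeholder, the helper `webhdfs_url` is hypothetical, and authentication is omitted entirely.

```python
from urllib.parse import quote, urlencode

# Placeholder endpoint (assumption); substitute the host of a real
# WebHDFS-compatible service.
BASE_URL = "https://example-store.invalid"


def webhdfs_url(path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL: <base>/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op.upper(), **params})
    return f"{BASE_URL}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# List a folder, then read the first kilobyte of one file.
list_url = webhdfs_url("/raw/clickstream", "liststatus")
open_url = webhdfs_url(
    "/raw/clickstream/2016-01-01.csv", "open", offset="0", length="1024"
)
```

Because the interface is plain HTTP, any tool that can issue a GET or PUT request can work with the store, which is what makes it an open interface to the data.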

As a part of Cortana Analytics Suite
[Diagram: Cortana Analytics Suite, flowing from DATA to INTELLIGENCE to ACTION.]
Data sources: business apps, custom apps, sensors and devices
Information management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
Big data stores: Azure SQL Data Warehouse, Azure Data Lake Store
Machine learning and analytics: Azure Machine Learning, Azure Data Lake Analytics, Azure HDInsight (Hadoop), Azure Stream Analytics
Perceptual intelligence: face, vision, speech, text
Dashboards and visualizations: Power BI
Personal digital assistant: Cortana
Business scenarios: recommendations, customer churn, forecasting
Consumers: people, automated systems

Introducing Azure Data Lake
Big data made easy
Analytics on any data, any size
All users productive on day one
Ready for your enterprise

How do you start using ADL?
Create an ADL Analytics account (90 seconds, free)
Log in to the Azure portal
Write a U-SQL script and submit it to the ADL Analytics account
The U-SQL job reads and writes data in ADL Store, Azure Storage blobs, and so on

Azure Data Lake Store

What is Azure Data Lake (ADL) Store?
A highly scalable, distributed, parallel file system in the cloud, specifically designed to work with multiple analytic frameworks
[Diagram: sources (devices, video, clickstream, web, social, sensors, relational, LOB applications) feed ADL Store, which serves ADL Analytics, HDInsight, R, Spark, and Machine Learning.]

Introducing Azure Data Lake Store
A hyper-scale repository for big data analytics workloads
Store ANY DATA in its native format
HADOOP FILE SYSTEM (HDFS) for the cloud
ENTERPRISE GRADE
No limits to SCALE
Optimized for analytic workload PERFORMANCE

Any data
Unstructured, semi-structured, and structured
[Diagram: devices, video, clickstream, web, social, sensors, relational, LOB applications.]

HDFS for the cloud
Built from the ground up as a Hadoop file system
Support for file/folder objects and operations
Integration with HDInsight, Hortonworks, and Cloudera
Accessible to all HDFS-compliant projects: Spark, Storm, Flume, Sqoop, Kafka, R, and more
[Diagram labels: ADL Store, HDInsight.]

Durable and highly available
Automatically replicates your data
Three copies within a single region
Highly available

Unlimited storage
Unlimited account sizes
[Diagram: scale from GB to TB to PB.]
Individual file sizes from gigabytes to petabytes
No limits to scale
[Diagram: scale from TB to PB.]

Optimized for analytics workload performance
Built for running large analytics systems that require massive throughput
Optimized for parallel computation over petabytes of data
Automatically optimizes for any throughput

Azure Data Lake Analytics