Search Education documents

3478 documents

Data Warehouse and OLAP | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

299 · 150 downloads · 31 pages

What is Data Warehouse? (1)
• A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
Data Lakes Purposes, Practices, Patterns, and Platforms | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

192 · 96 downloads · 42 pages

Introduction to Data Lakes
We’re experiencing a time of great change as data evolves into greater diversity (more data types, sources, schema, and latencies) and as user organizations diversify the ways they use data for business value (via advanced analytics and data integrated across multiple analytics and operational applications). To capture new big data, to scale up burgeoning traditional data, and to leverage both fully, users are modernizing their portfolios of tools, platforms, best practices, and skills.

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
An Introduction to Data Management | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

214 · 107 downloads · 54 pages

What is Data Management?
Data management concerns the handling of data in the scientific context. Often, more importance is given to results, analysis, and derived conclusions than to the data themselves. However, data are a product of the science enterprise and are increasingly understood as a valuable research output in their own right (DataONE 2012b; Ludwig and Enke 2013; Data Service 2012-2015a). Research data are considered to be all information collected, observed, or created for purposes of analysis and validation of original research results. Data can be quantitative or qualitative and also comprise photos, objects, or audio files, resulting from sources as different as field experiments, model outputs, or satellite data. In the following, the focus lies on the management of quantitative digital data.

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
Programming, Data Management and Visualization | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

433 · 217 downloads · 4 pages

1. GOALS
In this class you will learn advanced concepts in programming and data management using the statistical software package Stata. We focus on Stata because almost all subsequent courses in the econometrics curriculum use this software. Note, however, that many topics we cover are highly relevant for any statistical programming suite, even though the commands and concepts may differ slightly. Unlike most other common languages, such as R or Python, Stata is not object-oriented but rather procedural or function-oriented (which also makes it much easier to learn). Upon successful completion, you will be able to handle Stata and understand data management at the level required for the subsequent courses in the JKU econometrics curriculum.

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
Experiences with Managing Data Ingestion into a Corporate Datalake | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

230 · 115 downloads · 10 pages

INTRODUCTION
The Hadoop Distributed File System (HDFS) [1] is an inexpensive means of aggregating storage from commodity machines across a cluster. It has been shown to scale to petabytes of data [2], allowing organizations to store and process data at a scale that had not previously been feasible without very expensive dedicated systems. This has led to the concept of a Datalake, where a company can store its raw data in such a way that it could be governed by one set of policies but processed by multiple different teams [3], [4].

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
Data Lake Management: Challenges and Opportunities | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

197 · 99 downloads · 4 pages

INTRODUCTION
A data lake is a massive collection of datasets that: (1) may be hosted in different storage systems; (2) may vary in their formats; (3) may not be accompanied by any useful metadata or may use different formats to describe their metadata; and (4) may change autonomously over time. Enterprises have embraced data lakes for a variety of reasons. First, data lakes decouple data producers (for example, operational systems) from data consumers (such as reporting and predictive analytics systems). This is important, especially when the operational systems are legacy mainframes which may not even be owned by the enterprise (as is common in many enterprises such as banking and finance). For data science, data lakes provide a convenient storage layer for experimental data, both the input and output of data analysis and learning tasks. The creation and use of data can be done autonomously without coordination with other programs or analysts. But the shared storage of a data lake, coupled with a (typically distributed) computational framework, provides the rudimentary infrastructure required for sharing and re-use of massive datasets.

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

114 · 57 downloads · 14 pages

INTRODUCTION
Cloud object stores such as Amazon S3 [4] and Azure Blob Storage [17] have become some of the largest and most widely used storage systems on the planet, holding exabytes of data for millions of customers [46]. Apart from the traditional advantages of cloud services, such as pay-as-you-go billing, economies of scale, and expert management [15], cloud object stores are especially attractive because they allow users to scale computing and storage resources separately: for example, a user can store a petabyte of data but only run a cluster to execute a query over it for a few hours.

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document

Author: Trịnh Thảo Anh

1 year ago
OLTP VS OLAP | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

211 · 106 downloads · 19 pages

OLTP (ON-LINE TRANSACTION PROCESSING)
- is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE).
- The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second.
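
As a minimal sketch of what such a short transaction can look like, the following Python example uses the standard-library `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical, chosen only for illustration; they do not come from the slides. The sketch shows two UPDATE statements grouped into one transaction, so that a failed integrity check leaves the data unchanged.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
# The CHECK constraint is the integrity rule: no negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)",
    [(1, 100), (2, 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one short transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would violate the CHECK constraint, so the
        # credit applied just before it is undone as well.
        return False


ok = transfer(conn, 1, 2, 30)         # fits within the balance of account 1
rejected = transfer(conn, 1, 2, 500)  # exceeds the balance of account 1
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(ok, rejected, balances)
```

The second transfer is rejected as a whole: the credit to account 2 is rolled back together with the failed debit, which is the kind of integrity guarantee the bullet above refers to.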

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document, Lecture

Author: Trịnh Thảo Anh

1 year ago
Exercise on OLAP | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

99 · 50 downloads · 8 pages

Exercise (contd.)
1. Define a star schema to represent the above multidimensional structure;
2. Define a snowflake schema that reduces (on at least one dimension) the redundancy of the star schema defined in the previous point;

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document, Exercise

Author: Trịnh Thảo Anh

1 year ago
Building Robust Data Pipelines with Delta Lake | Reference material for the Data Management and Visualization course | Trường Đại học Bách Khoa Hà Nội

140 · 70 downloads · 26 pages

Data Pipeline V1
• Took 1 engineer ~1 week to implement
• Was pretty robust for the early days of Databricks

Category: Đại học Bách Khoa Hà Nội

Course: Data Management and Visualization

Type: Document, Lecture

Author: Trịnh Thảo Anh

1 year ago
