Data Warehouse and OLAP| Tài liệu tham khảo môn quản trị dữ liệu và trực quan hóa| Trường Đại học Bách Khoa Hà Nội

What is Data Warehouse What is Data Warehouse? (1)
• A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site

Thông tin:
31 trang 3 tháng trước

Bình luận

Vui lòng đăng nhập hoặc đăng ký để gửi bình luận.

Data Warehouse and OLAP| Tài liệu tham khảo môn quản trị dữ liệu và trực quan hóa| Trường Đại học Bách Khoa Hà Nội

What is Data Warehouse What is Data Warehouse? (1)
• A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site

48 24 lượt tải Tải xuống
Data Warehouse and OLAP
Data
Warehouse
and
OLAP
Week 5
1
Midterm I
Midterm
I
Friday,March4
Scope
Homeworkassignments1
4
Openbook
Team Homework Assignment #7
Team
Homework
Assignment
#7
Rd 121
139 146
150 f h bk
R
ea
d
pp.
121
139
,
146
150
o
f
t
h
etext
b
oo
k
.
DoExamples3.8,3.10andExercise3.4(b)and(c).Preparefor
Duedate
beginningofthelectureonFridayMarch11
th
.
Topics
Topics
Definitionofdatawarehouse
Multidimensionaldatamodel
Datawarehousearchitecture
Fromdatawarehousingtodatamining
What is Data Warehouse
?(1)
What
is
Data
Warehouse
?
(1)
Adatawarehouseisarepositoryofinformation
collectedfrommultiplesources,storedundera
unifiedschema,andthatusuallyresidesatasingle
site
Adatawarehouseisasemanticallyconsistentdata
store thatservesasaphysicalimplementationofa
decisionsupportdatamodel andstoresthe
informationonwhichanenterpriseneedtomake
str ategicdecisions
What is Data Warehouse?
(2)
What
is
Data
Warehouse?
(2)
Data warehouses provide on
line analytical
Data
warehouses
provide
on
line
analytical
processing(OLAP)toolsfortheintera ctive analysisof
multidimensional data of variedgranularities,which
facilitateeff ectivedatageneralizationanddata
mining
Manyotherdataminingfunctions, suchas
association,classification,prediction,andclustering,
b
it td
ith OLAP ti t h
can
b
e
i
n
t
egra
t
e
d
w
ith
OLAP
opera
ti
ons
t
oen
h
ance
interactiveminingofknowledgeatmultiplelevelsof
abstraction
abstraction
What is Data Warehouse?
(3)
What
is
Data
Warehouse?
(3)
Adecisionsupportdatabasethatismaintained
separately fromtheorganization’soperational
database
A
datawarehouseisasubjectoriente
d
,
integrate
d
,
timevariant,andnonvolatile collectionofdatain
supportofmanagementsdecisionmakingprocess
[Inm96].”—W.H.Inmon
Data Warehouse Framework
Data Warehouse Framework
datamining
Figure1.7 TypicalframeworkofadatawarehouseforAllElectronics
8
Data Warehouse is
Sbj
Oi d
S
u
bj
ect-
O
r
i
ente
d
Organizedaroundmajorsubjects , suchascustomer,
product,sales,etc.
Focusingonthe modelingandanalysisofdatafor
decisionmakers,notondailyoperationsor
transactionprocessing
Provideasimpleandconciseviewaroundparticular
subjectissuesbyexcludingdatathatarenotusefulin
thedecisionsupportprocess
Data Warehouse is
It td
I
n
t
egra
t
e
d
Constructedbyintegrating multiple,heterogeneousdata
sources
sources
relationaldatabases,flatfiles,onlinetransactionrecords
Data
cleaning
and data
integration
techniques are applied
Data
cleaning
and
data
integration
techniques
are
applied
.
Ensureconsistencyinnamingconventions,encoding
structures,attributemeasures,etc.amongdifferentdata
sources
E. g.,Hotelprice:currency,tax,breakfastcovered,etc.
Dt W h
i
Ti V i t
D
a
t
a
W
are
h
ouse
i
s
Ti
me
V
ar
i
an
t
Thetimehorizonforthedataware houseissignificantly
lon
g
erthanthatofo
p
erationals
y
stems
g p y
Operationaldatabase:currentvaluedata
Datawarehousedata:provideinformationfroma
historicalperspective(e.g.,past510years)
Everykeystructureinthedatawarehouse
Containsanelementoftime,explicitlyorimplicitly
Dt W h
i
Nltil
D
a
t
a
W
are
h
ouse
i
s
N
onvo
l
a
til
e
Aphysicallyseparate storeofdatatransformedfromthe
operationalenvironment
Operationalupdateofdatadoesnotoccurinthedata
warehouseenvironment
Doesnotrequiretransactionprocessing,recovery,and
concurrencycontrolmechanisms
Ri l t ti i dt i
R
equ
i
reson
l
y
t
woopera
ti
ons
i
n
d
a
t
aaccess
i
ng:
initial loading ofdata andaccess ofdata
OLTP vs OLAP
OLTP vs
.
OLAP
Table 3.1 Comparison between OLTP and OLAP
13
Why Separate is Data Warehouse Needed?
(1)
(1)
Whynotperformonlineanalyticalprocessingdirectlyon
operational databases instead of spending additional time
operational
databases
instead
of
spending
additional
time
andresourcestoconstructaseparatedatawarehouse?
Why Separate is Data Warehouse
Ndd?()
N
ee
d
e
d?
(
2
)
Hi
g
h
p
erformanceforboths
y
stems
g p y
DBMS— tunedforOLTP:searchingforparticularrecords,
indexing,hashing,concurrencycontrol,recovery
Wa rehouse—tunedforOLAP:complexOLAPqueries,
multidimensionalview,consolidation(summarizationand
ti
)
aggrega
ti
on
)
Topics
Topics
Definitionofdatawarehouse
Multidimensionaldatamodel
Datawarehousearchitecture
Fromdatawarehousingtodatamining
From Tables and Spreadsheets to
DCb
D
ata
C
u
b
es
A data warehouse is based on a
multidimensional
A
data
warehouse
is
based
on
a
multidimensional
datamodel
This model views data in the form of a
data cube
This
model
views
data
in
the
form
of
a
data
cube
Adatacube allowsdatatobemodeledandviewedin
multiple
dimensions
multiple
dimensions
From Tables and Spreadsheets to Data
Cb (1)
C
u
b
es
(1)
A data cube is defined by facts and dimensions
A
data
cube
is
defined
by
facts
and
dimensions
Facts are data which data warehouse focus on
Fact tables contain
numeric measures
(such as
Fact
tables
contain
numeric
measures
(such
as
dollars_sold)andkeystoeachoftherelateddimension
tables
Dimensions are perspectives with respect to
fact
Dimensiontablesdescribethedimension with
attributes.Forexample,item(item_name,brand,type),
or
time(day week month quarter year)
or
time(day
,
week
,
month
,
quarter
,
year)
Fig
fro
m
ure 1.6. Fr
a
m
a relati
o
a
gments
o
o
nal data
b
o
f relations
b
ase for Al
l
l
Electroni
c
c
s
19
From Tables and Spreadsheets
t Dt Cb (2)
t
o
D
a
t
a
C
u
b
es
(2)
dimensions
Table 3.2 A 2-D view of sales data for AllElectronics according to the
di i
ti
d
it
h th l f b h l td i
Facts(numericalmeasures)
di
mens
i
ons
ti
me an
d
it
em,
w
h
ere
th
e
sa
l
es
are
f
rom
b
ranc
h
es
l
oca
t
e
d i
n
the city of Vancouver. The measure displayed is dollar_sold (in thousands).
20
| 1/31

Preview text:

Data W a Ware r house house and and OLAP Week 5 1 Midt i erm dt I • Friday, March 4 • Scope
– Homework assignments 1 – 4 – Open book Team Team Homew o Homew rk o A ssig A n ssig ment n #7 • Read pp. 121 – 139, 146 – 150 of h t e text b ook.
• Do Examples 3.8, 3.10 and Exercise 3.4 (b) and (c). Prepare for the re s re u s l u ts t of the homew o homew rk assignmen t assignmen . t • Due date
– beginning of the lecture on Friday March11th. Top To ic i s
• Definition of data warehouse
• Multidimensional data model
• Data warehouse architecture
• From data warehousing to data mining What is Data W a W re r house? (1) (1)
• A data warehouse is a repository of information
collected from multiple sources, stored under a
unified schema, and that usually resides at a single site
• A data warehouse is a semantically consistent data
store that serves as a physical implementation of a
decision support data model and stores the
information on which an enterprise need to make strategic decisions What is Data W a W re r house? (2) • Dat Da a t w a w rehouses r pr ovide pr on‐ on line analytical analytic
processing (OLAP) tools for the interactive analysis of
multidimensional data of varied granularities, which
facilitate effective data generalization and data mining
• Many other data mining functions, such as
association, classification, prediction, and clustering, can be i t n egr t a d e with OLAP oper ti a ons to h en ance
interactive mining of knowledge at multiple levels of abs ab t s r t a r ction What is Data W a W re r house? (3)
• A decision support database that is maintained
separately from the organization’s operational database
• “A data warehouse is a subject‐oriented, integrated,
time‐variant, and nonvolatile collection of data in
support of management’s decision‐making process
[Inm96].”—W. H. Inmon Data Warehouse Framework data mining
Figure 1.7 Typical framework of a data warehouse for AllElectronics 8 Data Warehouse is Subj bject-Oriented
• Organized around major subjects, such as customer, product, sales, etc.
• Focusing on the modeling and analysis of data for
decision makers, not on daily operations or transaction processing
• Provide a simple and concise view around particular
subject issues by excluding data that are not useful in the decision support process Data Warehouse is I t n egr t a ed
• Constructed by integrating multiple, heterogeneous data sources sour
– relational databases, flat files, on‐line transaction records • Dat Da a t cleaning and da t da a t in t in e t gr e a gr t a ion t te t chniques e are ar applied .
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc. D t a W a ar h e ouse is Time V i ar t an
• The time horizon for the data warehouse is significantly longer
g than that of operational syste y ms
– Operational database: current value data
– Data warehouse data: provide information from a
historical perspective (e.g., past 5‐10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly D t a W a ar h e ouse is Nonv l o t a il tile
• A physically separate store of data transformed from the operational environment
• Operational update of data does not occur in the data warehouse environment
– Does not require transaction processing, recovery, and concurrency control mechanisms – Requires only t wo oper ti a ons i n d t a a i access ng:
initial loading of data and access of data OLTP vs OLAP .
Table 3.1 Comparison between OLTP and OLAP 13
Why Separate is Data Warehouse Needed? (1)
Why not perform on‐line analytical processing directly on operational databases dat inst ins ead t of
spending additional time
and resources to construct a separate data warehouse?
Why Separate is Data Warehouse N d ee d e ? d? ( ) 2 • High g performance for both syste y ms
– DBMS— tuned for OLTP: searching for particular records,
indexing, hashing, concurrency control, recovery
– Warehouse—tuned for OLAP: complex OLAP queries,
multidimensional view, consolidation (summarization and aggreg ti a on) Top To ic i s
• Definition of data warehouse
• Multidimensional data model
• Data warehouse architecture
• From data warehousing to data mining
From Tables and Spreadsheets to Dat C a ubes • A dat da a t w a w rehouse r is based on a multidimensional data model • This model vie w vie s w da t da a t in the for fo m of a da t da a t cube
• A data cube allows data to be modeled and viewed in multiple dimensions
From Tables and Spreadsheets to Data Cub ( es 1) (1) • A da t da a t cube is de fined de by fact c s t and dimensions
– Facts are data which data warehouse focus on • Fact c ta bl b es e co ntain numeric measur es measur (such as
dollars_sold) and keys to each of the related dimension tables
– Dimensions are perspectives with respect to fact
• Dimension tables describe the dimension with
attributes. For example, item (item_name, brand, type), or time(day time(da , w eek w , mont mon h, quart er quart , yea ye r) r fro Fig m ure 1.6. a relati Fr o a nal data gments b o ase for f relations Al lElectroni c s 19 From Tables and Spreadsheets to D Data C Cubes (2) (2) dimensions Facts (numerical measures)
Table 3.2 A 2-D view of sales data for AllElectronics according to the di i mens ons time and it d em , h w ere t the h sal les are f from b branches l located in d i
the city of Vancouver. The measure displayed is dollar_sold (in thousands). 20