Introduction to Data Mining | Lecture 1 of the Data Mining course | International University, Vietnam National University Ho Chi Minh City

Available: 600 faults with expert’s diagnosis; ~300 unsatisfactory, rest used for training. Attributes augmented by intermediate concepts that embodied causal domain knowledge. Expert not satisfied with initial rules because they did not relate to his domain knowledge. Further background knowledge resulted in more complex rules that were satisfactory. Learned rules outperformed hand-crafted ones.

18/02/2024
Lecture 1: Introduction to Data Mining
Lecturer: Dr. Nguyen, Thi Thanh Sang
(nttsang@hcmiu.edu.vn)
References:
[1] Chapter 1 in Data Mining: Concepts and Techniques (4th Edition), by Jiawei Han, et al.
[2] Chapter 1 in Data Mining: Practical Machine Learning Tools and Techniques (4th Edition), by Ian H. Witten, et al.
What is data mining?
Example 1: Web usage mining
Given: click streams
Problem: predicon of user behaviour
Data: historical records of embryos and outcome
Example 2: cow culling
Given: cows described by 700 features
Problem: selecon of cows that should be culled Data: historical
records and farmers’ decisions
What Is Data Mining?
Data mining (knowledge discovery from data)
Extracon of interesng (non-trivial, implicit, previously unknown
and potenally useful) paerns or knowledge from huge amount of
data
Data mining: a misnomer?
Alternave names
Knowledge discovery (mining) in databases (KDD), knowledge extracon,
data/paern analysis, data archeology, data dredging, informaon
harvesng, business intelligence, etc.
Watch out: Is everything “data mining”?
Simple search and query processing
(Deducve) expert systems
What is data mining?
Data mining is defined as the process of discovering patterns in data.
The process must be automatic or (more usually) semiautomatic.
The patterns discovered must be meaningful in that they lead to some advantage, usually an economic one.
The data is invariably presented in substantial quantities.
Data Mining in Business Intelligence
Introducon
What is data mining?
Data Mining Goals
Stages of the Data Mining Process
Data Mining Techniques
Knowledge Representaon Methods
Applicaons
Example: weather data
Example: A Web Mining Framework
Web mining usually involves
Data cleaning
Data integraon from mulple sources
Warehousing the data
Data cube construcon
Data selecon for data mining
Data mining
Presentaon of the mining results
Paerns and knowledge to be used or stored into knowledge-base
Which View Do You Prefer?
Which view do you prefer?
KDD vs. ML/Stat. vs. Business Intelligence
Depending on the data, applications, and your focus
Data Mining vs. Data Exploration
Business intelligence view
Warehouse, data cube, reporting but not much mining
Business objects vs. data mining tools
Supply chain example: mining vs. OLAP vs. presentation tools
Data presentation vs. data exploration
Data Mining: Conuence of Mulple Disciplines
Machine learning techniques
Algorithms for acquiring structural descriptions from examples
Structural descriptions represent patterns explicitly
Can be used to predict outcome in new situation
Can be used to understand and explain how prediction is derived (may be even more important)
Methods originate from artificial intelligence, statistics, and research on databases
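To make "structural description" concrete, here is a minimal sketch of a pattern written as an explicit, ordered rule list that both predicts and explains. The attribute names and the rules themselves are illustrative assumptions in the style of the weather example named in the outline, not rules taken from the lecture.

```python
# Each rule: (conditions that must all hold, predicted class).
# Rules are tried in order; the first match decides. Illustrative only.
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True}, "no"),
    ({"outlook": "overcast"}, "yes"),
]
DEFAULT = "yes"

def predict(instance):
    """Return (prediction, explanation) for one instance."""
    for conditions, label in RULES:
        if all(instance.get(k) == v for k, v in conditions.items()):
            reason = " and ".join(f"{k} = {v}" for k, v in conditions.items())
            return label, f"because {reason}"
    return DEFAULT, "because no rule matched (default)"

label, why = predict({"outlook": "sunny", "humidity": "high", "windy": False})
```

Because the rules are stored as data, the same structure that produces the prediction also produces the explanation, which is the point made on the slide.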
Can machines really learn?
Denions of “learning” from diconary:
To get knowledge of by study,
experience, or being taughtDicult to measure
To become aware by informaon or from
observaon
To commit to memoryTrivial for computers
To be informed of, ascertain; to receive instrucon
Operaonal denion:
Things learn when they change their behavior
Does a slipper learn?
in a way that makes them perform beer in
the future.
Does learning imply intenon?
Knowledge Representaon Methods
Tables
Data cube
Linear models
Trees
Rules
Instance-based Representaon
Clusters
Decision table for the weather problem:
Regression tree for the CPU data
A linear regression funcon for the CPU performance data
Instance-based representaon
Introducon
What is data mining?
Data Mining Goals
Stages of the Data Mining Process
Data Mining Techniques
Knowledge Representaon Methods
Applicaons
Example: weather data
Processing loan applicaons (American Express)
Given: quesonnaire with nancial and personal
informaon Queson: should money be lent?
Simple stascal method covers 90% of cases
Borderline cases referred to loan ocers But: 50% of
accepted borderline cases defaulted!
Soluon: reject all borderline cases? No! Borderline
cases are most acve customers
Screening images
Given: radar satellite images of coastal waters
Problem: detect oil slicks in those images
Oil slicks appear as dark regions with changing size and shape
Not easy: lookalike dark regions can be caused by weather conditions (e.g. high wind)
Expensive process requiring highly trained personnel
Load forecasng
Electricity supply companies need forecast of
future demand for power
Forecasts of min/max load for each hour
signicant savings
Given: manually constructed load model that assumes
“normal” climac condions Problem: adjust for
weather condions Stac model consists of:
base load for the year
load periodicity over the year
eect of holidays
