
Lab 5: Integrating Processes and Ethics | Data Mining course - International University, Vietnam National University Ho Chi Minh City


Author: Linh Giang


Introducon to Data Mining

Lab 5: Pung it all together

5.1. The data mining process

In the h class, we are going to look at some more global issues about the data mining process. (See the

lecture of class 5 by Ian H. Wien, [1]

). We are going through four lessons: the data mining process, Pialls

and praalls, and data mining and ethics.

According to [1], the data mining process includes steps: ask a queson, gather data, clean the data, dene

new features, and deploy the result. Write down the brief for these steps:

- Ask a queson

Ask the right kind of queson, such as "What do I want to know?".

This essenal step provides the necessary framework for the subsequent stages of the data mining

process, ensuring a focused and goal-oriented approach. Oming this step can lead to a lack of clarity and

potenal pialls.

- Gather data

Obtain the required data to answer the research queson and/or enrich exisng datasets. While there is

a wealth of data available, challenges such as data quality, relevance, and quanty can limit its usefulness.

To opmize model performance, increasing the amount of data can be a more advantageous approach

than solely ne-tuning the algorithm, as the adage 'more data beats a clever algorithm' suggests.

- Clean the data

Real-world data is oen characterized by noise, missing values, and inconsistencies. To improve data

quality and facilitate accurate analysis, data preprocessing techniques, such as anomaly detecon,

imputaon, integraon, normalizaon, and standardizaon, can be employed to clean and transform

the data.

- Dene new features

hp://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/

lOMoARcPSD| 23136115

Create new aributes or features from the exisng data that can provide addional insights and improve

model performance. This process, oen referred to as feature engineering, involves transforming and

combining exisng features to create more informave ones.

- Deploy the result

Deploy the discovered knowledge or model into real-world applicaons or decision-making processes. This

involves sharing the results with relevant stakeholders and integrang them into business operaons. And

here are the 7 steps of the KDD process according to Han and Kamber (2011):

+ Data Cleaning

Removing noise and inconsistent data to improve data quality.

+ Data Integration

Combining data from multiple sources into a coherent data store.

+ Data Selection

Retrieving the data relevant to the analysis task from the database.

+ Data Transformation

Converting data into forms appropriate for mining, often through normalization and aggregation.

+ Data Mining

Applying intelligent methods to extract data patterns.

+ Pattern Evaluation

Identifying the truly interesting patterns that represent knowledge. Appropriate interestingness measures are used to evaluate their significance.

+ Knowledge Presentation

Presenting the mined knowledge to the end user in a clear, concise, and visually appealing format that is easy to understand and act on.

5.2. Pialls and praalls

Follow the lecture in [1] to learn what are pialls and praalls in data mining.

Do experiments to invesgate how OneR and J48 deal with missing values.

Write down the results in the following table:

Dataset: weather.nominal.arff (original)

OneR's classifier model and performance

Classifier

=== Classifier model (full training set) ===

outlook:
    sunny     -> no
    overcast  -> yes
    rainy     -> yes
(10/14 instances correct)

Performance

=== 10-fold Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances           6               42.8571 %
Incorrectly Classified Instances         8               57.1429 %
Kappa statistic                         -0.1429
Mean absolute error                      0.5714
Root mean squared error                  0.7559
Relative absolute error                120      %
Root relative squared error            153.2194 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
                 0.444    0.600    0.571      0.444   0.500      -0.149  0.422     0.611     yes
                 0.400    0.556    0.286      0.400   0.333      -0.149  0.422     0.329     no
Weighted Avg.    0.429    0.584    0.469      0.429   0.440      -0.149  0.422     0.510

=== Confusion Matrix ===

 a b   <-- classified as
 4 5 | a = yes
 3 2 | b = no

J48's classifier model and performance

Classifier

=== Classifier model (full training set) ===

J48 pruned tree
------------------

outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves  : 5

Size of the tree : 8

Performance

=== 10-fold Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                         -0.0426
Mean absolute error                      0.4167
Root mean squared error                  0.5984
Relative absolute error                 87.5    %
Root relative squared error            121.2987 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
                 0.556    0.600    0.625      0.556   0.588      -0.043  0.633     0.758     yes
                 0.400    0.444    0.333      0.400   0.364      -0.043  0.633     0.457     no
Weighted Avg.    0.500    0.544    0.521      0.500   0.508      -0.043  0.633     0.650

=== Confusion Matrix ===

 a b   <-- classified as
 5 4 | a = yes
 3 2 | b = no

Dataset: weather.nominal.arff (with missing values)

OneR's classifier model and performance

Classifier

=== Classifier model (full training set) ===

outlook:
    sunny     -> yes
    overcast  -> yes
    rainy     -> yes
    ?         -> no
(13/14 instances correct)

Performance

=== 10-fold Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          13               92.8571 %
Incorrectly Classified Instances         1                7.1429 %
Kappa statistic                          0.8372
Mean absolute error                      0.0714
Root mean squared error                  0.2673
Relative absolute error                 15      %
Root relative squared error             54.1712 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
                 1.000    0.200    0.900      1.000   0.947      0.849   0.900     0.900     yes
                 0.800    0.000    1.000      0.800   0.889      0.849   0.900     0.871     no
Weighted Avg.    0.929    0.129    0.936      0.929   0.926      0.849   0.900     0.890

=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 1 4 | b = no

J48's classifier model and performance

Classifier

=== Classifier model (full training set) ===

J48 pruned tree
------------------
: yes (14.0/5.0)

Number of Leaves  : 1

Size of the tree : 1

Performance

=== 10-fold Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances           7               50      %
Incorrectly Classified Instances         7               50      %
Kappa statistic                         -0.1395
Mean absolute error                      0.5403
Root mean squared error                  0.5727
Relative absolute error                113.4615 %
Root relative squared error            116.0707 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
                 0.667    0.800    0.600      0.667   0.632      -0.141  0.211     0.545     yes
                 0.200    0.333    0.250      0.200   0.222      -0.141  0.211     0.306     no
Weighted Avg.    0.500    0.633    0.475      0.500   0.485      -0.141  0.211     0.460

=== Confusion Matrix ===

 a b   <-- classified as
 6 3 | a = yes
 4 1 | b = no
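Most per-class figures in the table, and the Kappa statistic, can be recomputed from the confusion matrices alone. The sketch below uses the standard definitions. Rows are actual classes and columns are predicted classes, following the "classified as" layout above. It is a cross-check written for this lab, not Weka's evaluation code. For example, it reproduces the Kappa values -0.1429 (OneR) and -0.0426 (J48) on the original data.

```python
def per_class_metrics(cm):
    """Return (TP rate, FP rate, precision, F-measure) for each class.

    cm[i][j] = number of instances of actual class i classified as class j.
    """
    n = sum(map(sum, cm))
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])                          # row total
        predicted = sum(cm[r][c] for r in range(k))  # column total
        fp = predicted - tp
        tp_rate = tp / actual if actual else 0.0     # also the recall
        fp_rate = fp / (n - actual) if n - actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f = (2 * precision * tp_rate / (precision + tp_rate)
             if precision + tp_rate else 0.0)
        results.append((tp_rate, fp_rate, precision, f))
    return results


def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = sum(map(sum, cm))
    k = len(cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k))
                   for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    matrices = {  # the four confusion matrices from the table above
        "OneR, original": [[4, 5], [3, 2]],
        "J48, original": [[5, 4], [3, 2]],
        "OneR, with missing values": [[9, 0], [1, 4]],
        "J48, with missing values": [[6, 3], [4, 1]],
    }
    for name, cm in matrices.items():
        print(f"{name}: kappa = {kappa(cm):.4f}")
```

Kappa explains why 50% accuracy is not "half right" here. Because "yes" is the majority class, chance agreement is already about 0.52 for J48 on the original data, so 0.50 observed agreement gives a slightly negative Kappa.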

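Remark: how do OneR and J48 deal with missing values?

- OneR: OneR treats a missing value as just another attribute value, so "?" gets its own branch in the rule (here "? -> no"). The mere fact that a value is missing can be as informative as the value itself, and that changes the result substantially. Training accuracy rose from 10/14 to 13/14, and cross-validated accuracy rose from 42.9% to 92.9%.

- J48: J48 is Weka's implementation of C4.5. It does not treat "missing" as a separate value. Instead, it spreads instances with a missing value across the existing branches, so it cannot exploit the pattern of missingness. Cross-validated accuracy stayed at 50% (7/14) in both cases. However, the pruned tree collapsed to a single leaf that always predicts yes (14.0/5.0), and the Kappa statistic fell from -0.0426 to -0.1395.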
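The compact Python sketch below contrasts the two behaviours. It uses the standard 14-instance weather.nominal data. It also applies a hypothetical edit that blanks outlook for every "no" instance; this is not necessarily the file used in the lab. `one_r` follows Holte's 1R and treats "?" as an ordinary value. `c45_gain` and `distribute` follow the C4.5 treatment of missing values as it is commonly described. Gain is computed on the known values and scaled by the known fraction. An instance with an unknown value is sent down every branch with a fractional weight. Gain ratio and pruning are omitted. None of this is Weka's OneR or J48 code.

```python
import math
from collections import Counter, defaultdict

MISSING = "?"  # ARFF marker for a missing value

# weather.nominal: outlook, temperature, humidity, windy, play (class last)
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ATTRS = ["outlook", "temperature", "humidity", "windy"]


def hide_outlook_for_no(rows):
    """Hypothetical edit: mark outlook as missing on every 'no' instance."""
    return [(MISSING,) + r[1:] if r[-1] == "no" else r for r in rows]


def one_r(rows):
    """1R: choose the attribute whose value -> majority-class rule makes the
    fewest training errors. '?' is counted as an ordinary value."""
    best = None
    for a, name in enumerate(ATTRS):
        counts = defaultdict(Counter)  # attribute value -> class counts
        for r in rows:
            counts[r[a]][r[-1]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:  # earlier attribute wins ties
            best = (name, rule, errors)
    return best


def entropy(counts):
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return sum(-p * math.log2(p) for p in probs)


def c45_gain(rows, a):
    """Information gain of attribute a on the rows where a is known,
    scaled by the fraction of rows that are known (C4.5-style)."""
    known = [r for r in rows if r[a] != MISSING]
    if not known:
        return 0.0
    by_value = defaultdict(Counter)
    for r in known:
        by_value[r[a]][r[-1]] += 1
    before = entropy(Counter(r[-1] for r in known))
    after = sum(sum(c.values()) / len(known) * entropy(c) for c in by_value.values())
    return len(known) / len(rows) * (before - after)


def distribute(rows, a):
    """Class weights per branch of attribute a: a known value sends its row to
    one branch with weight 1; a missing value splits the row across all
    branches in proportion to the branch sizes."""
    sizes = Counter(r[a] for r in rows if r[a] != MISSING)
    n_known = sum(sizes.values())
    branches = defaultdict(Counter)
    for r in rows:
        if r[a] != MISSING:
            branches[r[a]][r[-1]] += 1.0
        else:
            for v, n in sizes.items():
                branches[v][r[-1]] += n / n_known
    return branches


if __name__ == "__main__":
    modified = hide_outlook_for_no(WEATHER)
    print(one_r(WEATHER))   # outlook rule with 4 training errors
    print(one_r(modified))  # outlook rule in which '?' maps to 'no'
    print(round(c45_gain(WEATHER, 0), 3), round(c45_gain(modified, 0), 3))
    for value, weights in distribute(modified, 0).items():
        print(value, {k: round(w, 2) for k, w in weights.items()})
```

On this hypothetical data, the "?" value carries every "no" instance, and 1R picks that up directly. The C4.5-style gain of outlook, by contrast, drops from about 0.247 bits to 0. Every branch also ends up with a "yes" majority once the missing rows are spread out. This is consistent with, though not proof of, J48 collapsing to a single leaf in the lab.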

5.3. Data mining and ethics

Reading

5.4. Associaon-rule learners

Do experiments to invesgate how Apriori and FP-Growth generate associaon rules for datasets vote.ar

Dataset: vote.arff

Apriori-based association rules

Apriori
=======

Minimum support: 0.45 (196 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 11

Generated sets of large itemsets:

Size of set of large itemsets L(1): 20

Size of set of large itemsets L(2): 17

Size of set of large itemsets L(3): 6

Size of set of large itemsets L(4): 1

Best rules found:

 1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219    <conf:(1)> lift:(1.63) lev:(0.19) [84] conv:(84.58)
 2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class=democrat 198    <conf:(1)> lift:(1.63) lev:(0.18) [76] conv:(76.47)
 3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class=democrat 210    <conf:(1)> lift:(1.62) lev:(0.19) [80] conv:(40.74)
 4. physician-fee-freeze=n education-spending=n 202 ==> Class=democrat 201    <conf:(1)> lift:(1.62) lev:(0.18) [77] conv:(39.01)
 5. physician-fee-freeze=n 247 ==> Class=democrat 245    <conf:(0.99)> lift:(1.62) lev:(0.21) [93] conv:(31.8)
 6. el-salvador-aid=n Class=democrat 200 ==> aid-to-nicaraguan-contras=y 197    <conf:(0.98)> lift:(1.77) lev:(0.2) [85] conv:(22.18)
 7. el-salvador-aid=n 208 ==> aid-to-nicaraguan-contras=y 204    <conf:(0.98)> lift:(1.76) lev:(0.2) [88] conv:(18.46)
 8. adoption-of-the-budget-resolution=y aid-to-nicaraguan-contras=y Class=democrat 203 ==> physician-fee-freeze=n 198    <conf:(0.98)> lift:(1.72) lev:(0.19) [82] conv:(14.62)
 9. el-salvador-aid=n aid-to-nicaraguan-contras=y 204 ==> Class=democrat 197    <conf:(0.97)> lift:(1.57) lev:(0.17) [71] conv:(9.85)
10. aid-to-nicaraguan-contras=y Class=democrat 218 ==> physician-fee-freeze=n 210    <conf:(0.96)> lift:(1.7) lev:(0.2) [86] conv:(10.47)

FP-Growth-based association rules

=== Run information ===

Scheme:       weka.associations.FPGrowth -P 2 -I -1 -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1
Relation:     vote
Instances:    435
Attributes:   17
              handicapped-infants
              water-project-cost-sharing
              adoption-of-the-budget-resolution
              physician-fee-freeze
              el-salvador-aid
              religious-groups-in-schools
              anti-satellite-test-ban
              aid-to-nicaraguan-contras
              mx-missile
              immigration
              synfuels-corporation-cutback
              education-spending
              superfund-right-to-sue
              crime
              duty-free-exports
              export-administration-act-south-africa
              Class

=== Associator model (full training set) ===

FPGrowth found 41 rules (displaying top 10)

 1. [el-salvador-aid=y, Class=republican]: 157 ==> [physician-fee-freeze=y]: 156    <conf:(0.99)> lift:(2.44) lev:(0.21) conv:(46.56)
 2. [crime=y, Class=republican]: 158 ==> [physician-fee-freeze=y]: 155    <conf:(0.98)> lift:(2.41) lev:(0.21) conv:(23.43)
 3. [religious-groups-in-schools=y, physician-fee-freeze=y]: 160 ==> [el-salvador-aid=y]: 156    <conf:(0.97)> lift:(2) lev:(0.18) conv:(16.4)
 4. [Class=republican]: 168 ==> [physician-fee-freeze=y]: 163    <conf:(0.97)> lift:(2.38) lev:(0.22) conv:(16.61)
 5. [adoption-of-the-budget-resolution=y, anti-satellite-test-ban=y, mx-missile=y]: 161 ==> [aid-to-nicaraguan-contras=y]: 155    <conf:(0.96)> lift:(1.73) lev:(0.15) conv:(10.2)
 6. [physician-fee-freeze=y, Class=republican]: 163 ==> [el-salvador-aid=y]: 156    <conf:(0.96)> lift:(1.96) lev:(0.18) conv:(10.45)
 7. [religious-groups-in-schools=y, el-salvador-aid=y, superfund-right-to-sue=y]: 160 ==> [crime=y]: 153    <conf:(0.96)> lift:(1.68) lev:(0.14) conv:(8.6)
 8. [el-salvador-aid=y, superfund-right-to-sue=y]: 170 ==> [crime=y]: 162    <conf:(0.95)> lift:(1.67) lev:(0.15) conv:(8.12)
 9. [crime=y, physician-fee-freeze=y]: 168 ==> [el-salvador-aid=y]: 160    <conf:(0.95)> lift:(1.95) lev:(0.18) conv:(9.57)
10. [el-salvador-aid=y, physician-fee-freeze=y]: 168 ==> [crime=y]: 160    <conf:(0.95)> lift:(1.67) lev:(0.15) conv:(8.02)
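Each rule above is annotated with confidence (conf), lift, leverage (lev), and conviction (conv). Apriori also gives the leverage as a number of instances in brackets. The sketch below recomputes these measures from support counts. It needs the count for Class=democrat, which can be derived from the FP-Growth output: 168 republicans means 435 - 168 = 267 democrats. Confidence, lift, and leverage use the standard definitions. Two details are assumptions, chosen because they reproduce the printed values for Apriori rules 1 and 5: the conviction variant with a +1 in its denominator, and the truncation of the bracketed count. The sketch is not taken from Weka's source.

```python
def rule_metrics(n, premise, consequence, both):
    """Interestingness measures for a rule premise ==> consequence.

    n: number of instances; premise, consequence: support counts of each side;
    both: support count of premise and consequence together.
    """
    conf = both / premise
    lift = conf / (consequence / n)
    leverage = both / n - (premise / n) * (consequence / n)
    lev_count = int(leverage * n)  # assumed truncation, e.g. 84.58 -> [84]
    # Assumed conviction variant: the +1 keeps it finite when conf == 1.
    conviction = premise * (n - consequence) / n / (premise - both + 1)
    return conf, lift, leverage, lev_count, conviction


if __name__ == "__main__":
    # Apriori rule 1: ... 219 ==> Class=democrat 219 (267 democrats of 435)
    print(rule_metrics(n=435, premise=219, consequence=267, both=219))
    # Apriori rule 5: physician-fee-freeze=n 247 ==> Class=democrat 245
    print(rule_metrics(n=435, premise=247, consequence=267, both=245))
```

Lift above 1 means the premise makes the consequence more likely than its base rate. Confidence alone can be high simply because the consequence is common, which is why both tools print the additional measures.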
