Information Theory and
Linear Regression
How do we choose between splits when constructing decision trees?
• Measure how much information we can gain from a given split.
• This quantity is called Information Gain!
• It is an information-theoretic concept that quantifies, for a random variable (r.v.), how much uncertainty is removed once we know its value.
Let’s review some information theory basics and definitions.
Information Theory
MLA_Tut3
Uncertainty is the main building block of many information theory concepts.
• We don’t always have all the information about all the variables we care about.
• We use probabilities of events to make informed guesses.
• As we learn more information, we can increase confidence in, or decrease uncertainty about, our guess.
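The idea of quantifying uncertainty can be made concrete with a short sketch (illustrative, not from the slides): Shannon entropy measures the uncertainty of a discrete r.v. as H(X) = −Σₓ p(x) log₂ p(x).

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_x p(x) * log2 p(x), in bits.

    Zero-probability outcomes contribute nothing (convention: 0 * log 0 = 0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain among two outcomes: 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is less uncertain.
print(entropy([0.9, 0.1]))   # ≈ 0.469
# A uniform 8-sided die: log2(8) = 3 bits.
print(entropy([1/8] * 8))    # 3.0
```

Note that entropy depends only on the probabilities of the outcomes, not on their values: the more concentrated the distribution, the lower the uncertainty.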
Uncertainty and Entropy
Uncertainty and Entropy
Joint Entropy
Conditional Entropy
Conditional Entropy
Aside: Logarithm Properties
Information Gain
Finally, we can now quantify a notion of Information Gain, also known as Mutual Information, between r.v.s X and Y.
• This quantifies how much more certain (or less uncertain) we are about Y if we know the value of X.
• In other words: how much uncertainty (or entropy) is reduced in Y once we are given X?
• Definition: take the entropy of Y and subtract the conditional entropy of Y given X:
IG(Y|X) = H(Y) - H(Y|X)
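This definition can be sketched in a few lines of code (illustrative only; the two-by-two joint distributions below are made-up examples):

```python
import math

def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(joint):
    """IG(Y|X) = H(Y) - H(Y|X), with the joint pmf as a 2D list joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    # H(Y|X) = sum_x p(x) * H(Y | X = x): expected remaining uncertainty in Y.
    h_y_given_x = sum(
        px * entropy([pxy / px for pxy in row])
        for px, row in zip(p_x, joint)
        if px > 0
    )
    return entropy(p_y) - h_y_given_x

# Independent X and Y: knowing X removes no uncertainty about Y.
print(information_gain([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Y fully determined by X: knowing X removes all of H(Y) = 1 bit.
print(information_gain([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

The two extremes above bracket the general case: IG(Y|X) ranges from 0 (independence) up to H(Y) (Y a deterministic function of X).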
Exercises: Information Theory
We now practice computing some of these quantities and prove some standard
equalities and inequalities of information theory, which appear in many
contexts in machine learning and elsewhere.
Exercise 1
Let p(x, y) be given by the joint table (rows: x = 0, 1; columns: y = 0, 1):

         y = 0   y = 1
  x = 0   1/3     1/3
  x = 1    0      1/3

Compute
• H(X), H(Y)
• H(X|Y), H(Y|X)
• H(X, Y)
• IG(Y|X)
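To sanity-check the hand computations, one can evaluate these quantities numerically (a quick script, not part of the exercise; it obtains the conditional entropies via the chain rule H(X|Y) = H(X,Y) − H(Y)):

```python
import math

def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf from the exercise: rows are x = 0, 1; columns are y = 0, 1.
joint = [[1/3, 1/3],
         [0.0, 1/3]]

p_x = [sum(row) for row in joint]        # marginal of X: [2/3, 1/3]
p_y = [sum(col) for col in zip(*joint)]  # marginal of Y: [1/3, 2/3]

h_x  = entropy(p_x)
h_y  = entropy(p_y)
h_xy = entropy([p for row in joint for p in row])  # H(X, Y)

# Chain rule: H(X|Y) = H(X,Y) - H(Y), and H(Y|X) = H(X,Y) - H(X).
h_x_given_y = h_xy - h_y
h_y_given_x = h_xy - h_x

ig = h_y - h_y_given_x  # IG(Y|X)

print(f"H(X) = {h_x:.4f}, H(Y) = {h_y:.4f}")          # both ≈ 0.9183
print(f"H(X,Y) = {h_xy:.4f}")                          # log2(3) ≈ 1.5850
print(f"H(X|Y) = {h_x_given_y:.4f}, H(Y|X) = {h_y_given_x:.4f}")  # ≈ 0.6667 each
print(f"IG(Y|X) = {ig:.4f}")                           # ≈ 0.2516
```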
Exercise 2
Exercise 3
Prove the Chain Rule for entropy, i.e.,
H(X,Y) = H(X|Y) + H(Y) = H(Y|X) + H(X)
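One way to start (sketch only): expand the joint entropy using p(x, y) = p(x | y) p(y) and the log-of-product rule from the earlier aside; the symmetric identity follows by swapping the roles of X and Y.

```latex
\begin{aligned}
H(X, Y) &= -\sum_{x, y} p(x, y) \log_2 p(x, y) \\
        &= -\sum_{x, y} p(x, y) \log_2 \big( p(x \mid y)\, p(y) \big) \\
        &= -\sum_{x, y} p(x, y) \log_2 p(x \mid y)
           \;-\; \sum_{y} p(y) \log_2 p(y) \\
        &= H(X \mid Y) + H(Y).
\end{aligned}
```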
Exercise 4
Prove that H(X, Y) ≥ H(X).
Hint: you can use results of the first two exercises.
Linear Regression Review
Data, Parameters and the Model
Objective Function
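As a minimal illustration of the least-squares objective J(w) = ‖y − Xw‖² (a sketch with synthetic data; the weight vector `w_true` and noise scale are made-up choices, not from the slides), the minimizer solves the normal equations XᵀXw = Xᵀy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_true + noise (w_true is an illustrative choice).
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Least-squares objective: J(w) = ||y - X w||^2.
# Closed-form minimizer (normal equations): w_hat = (X^T X)^{-1} X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(w_hat)  # close to w_true, up to noise
```

In practice `np.linalg.lstsq` (or a QR/SVD-based solver) is preferred over forming XᵀX explicitly, since it is numerically more stable for ill-conditioned design matrices.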
Exercise: Linear Regression Bias-Variance
Using the above, derive the bias-variance decomposition for the linear
regression problem.
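For reference, under the standard assumption y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², the decomposition you should arrive at has the form (squared bias + variance + irreducible noise, with the expectation taken over training sets 𝒟 and noise):

```latex
\mathbb{E}_{\mathcal{D}, \varepsilon}\!\left[ \big( y - \hat{f}_{\mathcal{D}}(x) \big)^2 \right]
= \underbrace{\big( f(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] \big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[ \big( \hat{f}_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] \big)^2 \right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
```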
