
Lecture 5 - ENEE1006IU

Course material for Applied Statistics (ENEE1006IU) at the International University, Vietnam National University Ho Chi Minh City (22 pages).


APPLIED STATISTICS

COURSE CODE: ENEE1006IU

Lecture 5:

Chapter 3: Descriptive statistics

(3 credits: 2 for lectures, 1 for lab work)

Instructor: TRAN THANH TU

Email: tttu@hcmiu.edu.vn


3.3. MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTING OUTLIERS

•Distribution Shape

•z-Scores

•Chebyshev’s Theorem

•Empirical Rule

•Detecting Outliers


•Distribution Shape:

•$x_i$ = the ith observation of the variable

•$\bar{x}$ = mean of the distribution

•$n$ = number of observations in the distribution

•$\sigma$ (or $s$) = standard deviation
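The symbols above are the parts of a sample skewness measure. The sketch below is minimal and assumes the commonly used adjusted formula $\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$, with $s$ as the sample standard deviation. The function name `sample_skewness` and the data are hypothetical.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness (assumed form): n / ((n - 1)(n - 2)) * sum(((x_i - x_bar) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = statistics.mean(data)   # sample mean
    s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
    total = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

# Hypothetical data: a symmetric set and a set with a long right tail.
print(sample_skewness([1, 2, 3, 4, 5]))       # close to 0
print(sample_skewness([1, 1, 2, 2, 3, 10]))   # positive
```

The result is near 0 for symmetric data, positive when the right tail is long, and negative when the left tail is long.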


•Distribution Shape:

•For a symmetric distribution, the mean and the median are equal.

•When the data are positively skewed, the mean will usually be greater than the median.

•When the data are negatively skewed, the mean will usually be less than the median.
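A small check of the mean versus median pattern, using hypothetical data sets built to have each shape. One extreme value in a tail pulls the mean toward that tail, while the median barely moves.

```python
import statistics

# Hypothetical data sets chosen only to illustrate the three shapes.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 3, 3, 4, 4, 5, 20]   # one large value stretches the right tail
left_skewed = [-10, 5, 6, 6, 7, 7, 8]   # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("positively skewed", right_skewed),
                   ("negatively skewed", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:18s} mean={mean:.2f} median={median}")
```

The slide says "usually" because this ordering is typical of skewed data, not guaranteed for every data set.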

•z-Scores: By using both the mean and the standard deviation, we can determine the relative location of any observation $x_i$:

$$z_i = \frac{x_i - \bar{x}}{s}$$

The z-score is often called the standardized value.

•The z-score, $z_i$, can be interpreted as the number of standard deviations $x_i$ is from the mean $\bar{x}$.

z-score > 0 means $x_i > \bar{x}$

z-score < 0 means $x_i < \bar{x}$

z-score = 0 means $x_i = \bar{x}$
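A minimal sketch of computing $z_i = (x_i - \bar{x})/s$ for every observation. It assumes the sample standard deviation $s$ (divisor $n - 1$). The helper `z_scores` and the data are hypothetical.

```python
import statistics

def z_scores(data):
    """Return z_i = (x_i - x_bar) / s for each observation, with s the sample standard deviation."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - x_bar) / s for x in data]

data = [46, 54, 42, 46, 32]   # hypothetical sample
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))
```

For this sample, $\bar{x} = 44$ and $s = 8$, so the z-scores are 0.25, 1.25, −0.25, 0.25 and −1.5. Positive values lie above the mean and negative values below it.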


•Chebyshev’s Theorem: enables us to make statements about the proportion (%) of data values that must be within a specified number of standard deviations of the mean. It applies to distributions of any shape.

“At least $(1 - 1/z^2)$ of the data values must be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.”

($z$ need not be an integer.)
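A minimal sketch that evaluates the bound $1 - 1/z^2$ and compares it with the share actually observed in a deliberately skewed, hypothetical data set. Both helper names are hypothetical. The check uses the sample standard deviation $s$. Because $s$ is slightly larger than the divide-by-$n$ value, the interval is slightly wider, so the bound still holds for the data.

```python
import statistics

def chebyshev_min_proportion(z):
    """Chebyshev lower bound 1 - 1/z**2 on the share of values within z standard deviations."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is stated for z > 1")
    return 1 - 1 / z ** 2

def share_within(data, z):
    """Observed share of values inside [x_bar - z*s, x_bar + z*s]."""
    x_bar, s = statistics.mean(data), statistics.stdev(data)
    return sum(abs(x - x_bar) <= z * s for x in data) / len(data)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]   # hypothetical, strongly skewed
for z in (1.5, 2, 3):
    print(f"z = {z}: bound {chebyshev_min_proportion(z):.2%}, observed {share_within(data, z):.0%}")
```

For example, at least 75% of the values must lie within 2 standard deviations, and at least about 88.9% within 3. The observed shares can be higher, but never lower.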


•Detecting Outliers:

•Sometimes a data set will have one or more observations with unusually large or unusually small values. These extreme values are called outliers.

•An outlier may be a data value that was recorded incorrectly; if so, it should be corrected or removed before further analysis.

- Standardized values (z-scores) can be used to identify outliers: treat any data value with a z-score less than −3 or greater than +3 as an outlier.
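A minimal sketch of the ±3 z-score rule on hypothetical data. The helper `z_score_outliers` is hypothetical. With very small samples, no value can reach |z| > 3 (at n = 10 the largest possible |z| is about 2.85), so the example uses 21 observations.

```python
import statistics

def z_score_outliers(data, threshold=3.0):
    """Return the observations whose z-score is below -threshold or above +threshold."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - x_bar) / s) > threshold]

data = [48, 49, 50, 51, 52] * 4 + [100]   # hypothetical: 20 typical values and one extreme value
print(z_score_outliers(data))
```

Here the value 100 has a z-score of about 4.3, so it is the only value flagged.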


•Detecting Outliers:

- Another approach to identifying outliers is based upon the values of the first and third quartiles ($Q_1$ and $Q_3$) and the interquartile range (IQR).

An observation is classified as an outlier if its value is less than the lower limit or greater than the upper limit.
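A minimal sketch of the quartile-based rule. It assumes the widely used limits $Q_1 - 1.5\,\text{IQR}$ (lower) and $Q_3 + 1.5\,\text{IQR}$ (upper), with quartiles from Python's `statistics.quantiles`. The helper `iqr_outliers`, the multiplier parameter `k`, and the data are hypothetical.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return (lower limit, upper limit, outliers) using limits Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [x for x in data if x < lower or x > upper]

data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]   # hypothetical
print(iqr_outliers(data))
```

Here $Q_1 = 14.75$ and $Q_3 = 19.25$, so IQR = 4.5 and the limits are 8.0 and 26.0, which makes 45 the only outlier. Other quartile conventions can shift the limits slightly.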


End of file 1.

Any questions?


3.4. FIVE-NUMBER SUMMARIES AND BOX PLOTS

•Five-Number Summary

•Box Plot

•Comparative Analysis Using Box Plots


•Five-Number Summary: This summary is especially useful in descriptive analyses or during the preliminary investigation of a large data set.

It consists of five values: the most extreme values in the data set (the minimum and maximum values), the lower and upper quartiles ($Q_1$ and $Q_3$), and the median.
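A minimal sketch of computing the five values with the standard library. The helper `five_number_summary` and the data are hypothetical. Quartile conventions differ between textbooks and tools. `statistics.quantiles` uses `method="exclusive"` by default, which interpolates at positions $p(n+1)$. Other methods, such as `method="inclusive"`, can give slightly different quartiles.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # exclusive method by default
    return min(data), q1, statistics.median(data), q3, max(data)

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]   # hypothetical, unsorted on purpose
print(five_number_summary(data))
```

For this data the summary is minimum 3, $Q_1$ = 6.0, median 12, $Q_3$ = 16.0 and maximum 21.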


•Box Plot: A box plot is a graphical display of data based on a five-number summary. A key to the development of a box plot is the computation of the interquartile range, $\text{IQR} = Q_3 - Q_1$.
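A minimal sketch of the numbers a box plot draws. It follows a common convention in which the box spans $Q_1$ to $Q_3$, a line marks the median, and the whiskers reach the most extreme values inside the limits $Q_1 - 1.5\,\text{IQR}$ and $Q_3 + 1.5\,\text{IQR}$. Points beyond the limits are marked individually as outliers. That convention, the helper `box_plot_elements`, and the data are assumptions for illustration.

```python
import statistics

def box_plot_elements(data, k=1.5):
    """Compute the parts of a box plot: box, median line, whiskers and outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr      # assumed 1.5 * IQR limits
    inside = [x for x in data if lower <= x <= upper]
    return {
        "box": (q1, q3),                          # box edges at the quartiles
        "median": statistics.median(data),        # line inside the box
        "whiskers": (min(inside), max(inside)),   # most extreme values within the limits
        "outliers": sorted(x for x in data if x < lower or x > upper),
    }

print(box_plot_elements([12, 14, 15, 15, 16, 17, 18, 19, 20, 45]))   # hypothetical data
```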


•Comparative Analysis Using Box Plots: Box plots can also be used to provide a graphical summary of two or more groups and facilitate visual comparisons among the groups.
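A text-only sketch of what side-by-side box plots compare: the five-number summary and IQR of each group on a common scale. The group names, the data, and the helper `five_number_summary` are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), with quartiles from the exclusive method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, statistics.median(data), q3, max(data)

groups = {   # hypothetical measurements for two groups
    "Group A": [52, 55, 58, 60, 61, 63, 65, 70],
    "Group B": [45, 60, 66, 68, 71, 74, 77, 90],
}

print(f"{'group':8s} {'min':>6s} {'Q1':>6s} {'median':>6s} {'Q3':>6s} {'max':>6s} {'IQR':>6s}")
for name, data in groups.items():
    mn, q1, med, q3, mx = five_number_summary(data)
    print(f"{name:8s} {mn:6.1f} {q1:6.1f} {med:6.1f} {q3:6.1f} {mx:6.1f} {q3 - q1:6.1f}")
```

Reading across the rows shows that Group B has a higher median and a wider spread than Group A. If matplotlib is available, `matplotlib.pyplot.boxplot(list(groups.values()))` draws the corresponding side-by-side boxes.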


End of file 2.

Any questions?


3.5. MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES

•Covariance

•Interpretation of the Covariance

•Correlation Coefficient

•Interpretation of the Correlation Coefficient

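•Covariance: For a sample of size $n$ with the observations $(x_1, y_1)$, $(x_2, y_2)$, and so on, the sample covariance and the population covariance (for a population of size $N$ with means $\mu_x$ and $\mu_y$) are defined as follows:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \qquad \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

The covariance is used to measure the linear relationship between $x$ and $y$.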
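A minimal sketch of both covariance definitions. The sample version divides the sum of cross-products $(x_i - \bar{x})(y_i - \bar{y})$ by $n - 1$, and the population version divides by $N$. The helper names and the data are hypothetical.

```python
import statistics

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)"""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def population_covariance(x, y):
    """sigma_xy = sum((x_i - mu_x)(y_i - mu_y)) / N"""
    N = len(x)
    if N != len(y) or N < 1:
        raise ValueError("x and y must have the same, non-zero length")
    mu_x, mu_y = statistics.mean(x), statistics.mean(y)
    return sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / N

x = [1, 2, 3, 4, 5]   # hypothetical paired data
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y), population_covariance(x, y))
```

Here the cross-products sum to 6, so $s_{xy} = 6/4 = 1.5$ and $\sigma_{xy} = 6/5 = 1.2$. Python 3.10+ also provides `statistics.covariance(x, y)`, which returns the sample (n − 1) version.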


•Interpretation of the Covariance: In a scatter diagram, a vertical line at $\bar{x}$ and a horizontal line at $\bar{y}$ divide the graph into four quadrants:

Points in quadrant I correspond to $x_i$ greater than $\bar{x}$ and $y_i$ greater than $\bar{y}$.

Points in quadrant II correspond to $x_i$ less than $\bar{x}$ and $y_i$ greater than $\bar{y}$.

Points in quadrant III correspond to $x_i$ less than $\bar{x}$ and $y_i$ less than $\bar{y}$.

Points in quadrant IV correspond to $x_i$ greater than $\bar{x}$ and $y_i$ less than $\bar{y}$.

Thus, the value of $(x_i - \bar{x})(y_i - \bar{y})$ must be:

- positive for points in quadrant I
- negative for points in quadrant II
- positive for points in quadrant III
- negative for points in quadrant IV
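A minimal sketch of the quadrant argument on hypothetical data. Each point is classified relative to $(\bar{x}, \bar{y})$ and its product $(x_i - \bar{x})(y_i - \bar{y})$ is computed. Points in quadrants I and III contribute positive products, and points in II and IV contribute negative ones. A positive total, and hence a positive covariance, therefore suggests a positive linear association. The helper `quadrant_products` is hypothetical.

```python
import statistics

def quadrant_products(x, y):
    """Classify each point by quadrant around (x_bar, y_bar) and return its cross-product."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    rows = []
    for xi, yi in zip(x, y):
        dx, dy = xi - x_bar, yi - y_bar
        if dx > 0 and dy > 0:
            quadrant = "I"
        elif dx < 0 and dy > 0:
            quadrant = "II"
        elif dx < 0 and dy < 0:
            quadrant = "III"
        elif dx > 0 and dy < 0:
            quadrant = "IV"
        else:
            quadrant = "on a dividing line"   # the product is 0
        rows.append((xi, yi, quadrant, dx * dy))
    return rows

rows = quadrant_products([1, 2, 4, 5], [1, 5, 2, 6])   # hypothetical data
for row in rows:
    print(row)
print("sum of products:", sum(r[3] for r in rows))
```

In this example the two positive products (5.0 each) outweigh the two negative ones (−1.5 each), so the sum is 7.0 and the covariance is positive.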

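•Correlation Coefficient: Pearson product moment correlation coefficient:

$$r_{xy} = \frac{s_{xy}}{s_x s_y} \qquad \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where $s_{xy}$ is the sample covariance, $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$, and $\sigma_{xy}$, $\sigma_x$ and $\sigma_y$ are the corresponding population quantities.

The sample correlation coefficient $r_{xy}$ is a point estimator of the population correlation coefficient $\rho_{xy}$.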
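A minimal sketch of $r_{xy} = s_{xy}/(s_x s_y)$ that uses sample ($n - 1$) quantities throughout. The $n - 1$ divisors cancel, so the same value results if every quantity uses the divisor $n$. The helper `pearson_r` and the data are hypothetical.

```python
import statistics

def pearson_r(x, y):
    """r_xy = s_xy / (s_x * s_y), with sample (n - 1) statistics."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return s_xy / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]   # hypothetical paired data
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))
```

Here $s_{xy} = 1.5$, $s_x = \sqrt{2.5}$ and $s_y = \sqrt{1.5}$, so $r_{xy} = 6/\sqrt{60} \approx 0.7746$. Unlike the covariance, $r_{xy}$ has no units and always lies between −1 and +1. Python 3.10+ also provides `statistics.correlation(x, y)`, which computes Pearson's coefficient.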

•Interpretation of the Correlation Coefficient:

In general, it can be shown that if all the points in a data set fall on a positively sloped straight line, the value of the sample correlation coefficient is +1; that is, a sample correlation coefficient of +1 corresponds to a perfect positive linear relationship between $x$ and $y$.

Moreover, if the points in the data set fall on a straight line having negative slope, the value of the sample correlation coefficient is −1; that is, a sample correlation coefficient of −1 corresponds to a perfect negative linear relationship between $x$ and $y$.

Note that correlation provides a measure of linear association and not necessarily causation. A high correlation between two variables does not mean that changes in one variable will cause changes in the other variable.
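A small check of these statements on hypothetical points. Points on a positively sloped line give +1, and points on a negatively sloped line give −1. The third case shows that $y$ can be completely determined by $x$ through a curved relationship while $r$ is still 0, because $r$ measures only linear association. The helper `pearson_r` is hypothetical.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations and cross-products."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [3 * v + 1 for v in x]))    # 1.0: positively sloped line
print(pearson_r(x, [-2 * v + 5 for v in x]))   # -1.0: negatively sloped line
print(pearson_r(x, [v ** 2 for v in x]))       # 0.0: exact but non-linear relationship
```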


End of file 3.

Any questions?
