
Descriptive statistics | Lecture 5, Chapter 3 of the Applied statistics course | International University, Vietnam National University Ho Chi Minh City


Sometimes a data set will have one or more observations with unusually large or unusually small values. These extreme values are called outliers. An outlier may be a data value that has been incorrectly recorded and needs to be corrected or removed before further analysis. Standardized values (z-scores) can be used to identify outliers: treat any data value with a z-score less than −3 or greater than +3 as an outlier.

Course: Applied statistics (ENEE1006IU)

University: International University, Vietnam National University Ho Chi Minh City

Author: VietJack

1 year ago



APPLIED STATISTICS
COURSE CODE: ENEE1006IU
Lecture 5:
Chapter 3: Descriptive statistics
(3 credits: 2 for lectures, 1 for lab work)
Instructor: TRAN THANH TU
Email: tttu@hcmiu.edu.vn
3.3. MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTING OUTLIERS
•Distribution Shape
•z-Scores
•Chebyshev’s Theorem
•Empirical Rule
•Detecting Outliers
•Distribution Shape:
•x_i = the ith value of the random variable
•x̄ = mean of the distribution
•n = number of values in the distribution
•σ (or s) = standard deviation of the distribution
•For a symmetric distribution, the mean and the median are equal.
•When the data are positively skewed, the mean will usually be greater than the median.
•When the data are negatively skewed, the mean will usually be less than the median.
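The notation above (x_i, x̄, n, s) is what a skewness calculation needs. The slide's own formula did not survive in this text, so the sketch below assumes one common sample skewness formula, n / ((n − 1)(n − 2)) · Σ((x_i − x̄)/s)³. The function name and both data sets are made up for illustration.

```python
import statistics

def sample_skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x_i - mean) / s) ** 3).

    Assumed formula (a common textbook choice); needs n >= 3 and s > 0.
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Hypothetical data: one large value stretches the right tail.
right_skewed = [1, 2, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_skewed) > 0)         # positive skewness
print(statistics.mean(right_skewed),
      statistics.median(right_skewed))           # mean 4 is above median 3
print(abs(sample_skewness(symmetric)) < 1e-12)   # symmetric data: skewness ~ 0
```

In the right-skewed sample the single large value pulls the mean above the median, which matches the rule of thumb stated in the bullets.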
•z-Scores: By using both the mean and the standard deviation, we can determine the relative location of any observation x_i.
The z-score is often called the standardized value.
•The z-score, z_i, can be interpreted as the number of standard deviations x_i is from the mean x̄.
z-score > 0 means x_i > x̄
z-score < 0 means x_i < x̄
z-score = 0 means x_i = x̄
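The interpretation above corresponds to the standard definition z_i = (x_i − x̄)/s, which is assumed here because the slide's formula is not part of the surviving text. A minimal sketch, with an illustrative function name and made-up class sizes:

```python
import statistics

def z_scores(data):
    """Return z_i = (x_i - mean) / s for each observation (s = sample std dev)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

# Illustrative class sizes; here the mean is 44 and s is 8.
sizes = [46, 54, 42, 46, 32]
for x, z in zip(sizes, z_scores(sizes)):
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"x = {x}: z = {z:+.2f} ({position} the mean)")
```

For example, the value 32 lies 12 below the mean of 44, which is 1.5 standard deviations, so its z-score is −1.5.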
•Chebyshev’s Theorem: enables us to make statements about the proportion (%) of data values that must be within a specified number of standard deviations of the mean (it applies to all distribution shapes).
“At least (1 − 1/z²) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.”
(z need not be an integer)
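To make the bound concrete, the sketch below evaluates 1 − 1/z² for a few values of z. For comparison only, it also prints the proportion for an exactly normal distribution, which is where the Empirical Rule figures of roughly 68%, 95%, and 99.7% come from. The function name is illustrative, and the normal comparison is an addition, not part of the theorem.

```python
from statistics import NormalDist

def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is stated for z > 1")
    return 1 - 1 / z**2

for z in (1.5, 2, 3):
    # Proportion within z standard deviations for an exactly normal distribution.
    normal = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%} (any shape), "
          f"about {normal:.1%} (normal)")
```

With z = 2 the theorem guarantees at least 75% of the values for any distribution shape, while bell-shaped data reach about 95%.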
•Detecting Outliers:
•Sometimes a data set will have one or more observations with unusually large or unusually small values. These extreme values are called outliers.
•An outlier may be a data value that has been incorrectly recorded; such a value needs to be corrected or removed before further analysis.
- Standardized values (z-scores) can be used to identify outliers.
Treat any data value with a z-score less than −3 or greater than +3 as an outlier.
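The ±3 rule can be sketched as a small filter. The function name, the `threshold` parameter, and the readings are hypothetical, and the sample standard deviation (n − 1 divisor) is assumed.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / s) > threshold]

# Hypothetical measurements: 19 values near 50 and one suspicious reading.
readings = [52, 49, 50, 51, 48, 50, 53, 47, 50, 49,
            51, 50, 52, 48, 50, 49, 51, 50, 50, 120]
print(zscore_outliers(readings))  # flags the 120 reading
```

One caveat: with the sample standard deviation, no z-score can exceed (n − 1)/√n in absolute value, so in samples of about ten or fewer observations this rule cannot flag anything.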
- Another approach to identifying outliers is based upon the values of the first and third quartiles (Q1 and Q3) and the interquartile range (IQR).
An observation is classified as an outlier if its value is less than the lower limit or greater than the upper limit.
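The limit formulas are not in the surviving slide text. The sketch below assumes the widely used convention lower limit = Q1 − 1.5 × IQR and upper limit = Q3 + 1.5 × IQR. Quartiles are computed with `statistics.quantiles` and its default method; other quartile conventions can give slightly different limits. The names and data are illustrative.

```python
import statistics

def iqr_limits(data, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR); k = 1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

values = [10, 10, 11, 12, 12, 12, 13, 13, 14, 15, 40]  # hypothetical data
lower, upper = iqr_limits(values)
outliers = [x for x in values if x < lower or x > upper]
print(lower, upper, outliers)
```

Here Q1 = 11 and Q3 = 14, so IQR = 3, the limits are 6.5 and 18.5, and only the value 40 falls outside them.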
3.4. FIVE-NUMBER SUMMARIES AND BOX PLOTS
•Five-Number Summary
•Box Plot
•Comparative Analysis Using Box Plots
•Five-Number Summary: especially useful in descriptive analyses or during the preliminary investigation of a large data set.
A five-number summary consists of five values: the most extreme values in the data set (the minimum and maximum values), the lower and upper quartiles, and the median.
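A minimal sketch of the five values, assuming the quartiles come from `statistics.quantiles` with its default method (other quartile conventions may give slightly different Q1 and Q3). The function name and data are illustrative.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

values = [10, 10, 11, 12, 12, 12, 13, 13, 14, 15, 40]  # hypothetical data
smallest, q1, median, q3, largest = five_number_summary(values)
print(smallest, q1, median, q3, largest)
```

For this sample the summary is 10, 11, 12, 14, 40: the middle half of the data spans only 3 units, while the maximum sits far from the rest.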
•Box Plot: A box plot is a graphical display of data based on a five-number summary. A key to the development of a box plot is the computation of the interquartile range, IQR = Q3 − Q1.
•Comparative Analysis Using Box Plots: Box plots can also be used to provide a graphical summary of two or more groups and facilitate visual comparisons among the groups.
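Without drawing anything, the comparison can be sketched by computing the numbers each box would be drawn from. The group names and salary figures below are invented for illustration, and the quartiles again use the default method of `statistics.quantiles`.

```python
import statistics

def box_stats(data):
    """Five-number summary plus IQR: the numbers a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {"min": min(data), "q1": q1, "median": median,
            "q3": q3, "max": max(data), "iqr": q3 - q1}

# Hypothetical starting salaries (in $1000s) for two majors.
groups = {
    "accounting": [42, 44, 45, 45, 46, 47, 48, 50, 52, 55, 58],
    "marketing": [38, 40, 41, 42, 43, 44, 44, 45, 46, 48, 65],
}
for name, data in groups.items():
    s = box_stats(data)
    print(f"{name:>10}: median {s['median']}, IQR {s['iqr']}, "
          f"range {s['min']} to {s['max']}")
```

Reading the two rows side by side gives the same comparison a pair of box plots would show: the accounting group has the higher median and the wider box, while the marketing group has the more extreme maximum. A plotting library such as matplotlib could draw the boxes themselves, for instance by passing the groups to `pyplot.boxplot`.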
3.5. MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES
•Covariance
•Interpretation of the Covariance
•Correlation Coefficient
•Interpretation of the Correlation Coefficient
•Covariance: a measure of the strength of the linear relationship between x and y. For a sample of size n with the observations (x_1, y_1), (x_2, y_2), and so on, the sample covariance and population covariance are defined as follows:
s_xy = Σ(x_i − x̄)(y_i − ȳ) / (n − 1) (sample covariance)
σ_xy = Σ(x_i − μ_x)(y_i − μ_y) / N (population covariance)
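A minimal sketch of the two definitions, assuming the standard divisors n − 1 for a sample and N for a population. The function names are illustrative, and the data (weekend commercials and sales) are used only as an example.

```python
import statistics

def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def population_covariance(xs, ys):
    """sigma_xy = sum((x_i - mu_x) * (y_i - mu_y)) / N."""
    mu_x, mu_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)

# Illustrative data: number of commercials (x) and sales in $100s (y).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
print(sample_covariance(commercials, sales))
print(population_covariance(commercials, sales))
```

The products of deviations sum to 99 here, so the sample covariance is 99/9 = 11 and the population version is 99/10 = 9.9. The positive sign indicates that more commercials tend to go with higher sales.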
•Interpretation of the Covariance: The vertical line at x̄ and the horizontal line at ȳ divide the graph into four quadrants:
Points in quadrant I correspond to x_i greater than x̄ and y_i greater than ȳ
Points in quadrant II correspond to x_i less than x̄ and y_i greater than ȳ
Points in quadrant III correspond to x_i less than x̄ and y_i less than ȳ
Points in quadrant IV correspond to x_i greater than x̄ and y_i less than ȳ
The value of (x_i − x̄)(y_i − ȳ) must be:
- positive for points in quadrant I
- negative for points in quadrant II
- positive for points in quadrant III
- negative for points in quadrant IV
•Correlation Coefficient: Pearson product moment correlation coefficient:
r_xy = s_xy / (s_x s_y), where s_xy is the sample covariance and s_x and s_y are the sample standard deviations of x and y.
The sample correlation coefficient r_xy is a point estimator of the population correlation coefficient ρ_xy.
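A minimal sketch of the sample coefficient, assuming the usual form r_xy = s_xy / (s_x s_y) with n − 1 divisors throughout. The function name and data are illustrative.

```python
import statistics

def sample_correlation(xs, ys):
    """Pearson r_xy = s_xy / (s_x * s_y), using n - 1 divisors throughout."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))

# Illustrative data: number of commercials (x) and sales in $100s (y).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
r = sample_correlation(commercials, sales)
print(round(r, 2))
```

Unlike the covariance, r_xy does not depend on the units of x and y and always lies between −1 and +1. The value of about 0.93 for this sample indicates a strong positive linear relationship.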
•Interpretation of the Correlation Coefficient:
