Lecture 1 - ENEE1006IU

Study material for the Applied Statistics course (ENEE1006IU) at International University, Vietnam National University Ho Chi Minh City.
APPLIED STATISTICS
COURSE CODE: ENEE1006IU
Lecture 1:
Chapter 1: Data and Statistics
(3 credits: 2 for lectures, 1 for lab work)
Instructor: TRAN THANH TU
Email: tttu@hcmiu.edu.vn
1.1. DATA CLASSIFICATION
•Elements, Variables, and Observations
•Scales of Measurement
•Categorical and Quantitative Data
•Cross-Sectional and Time Series Data
A. ELEMENTS, VARIABLES, AND OBSERVATIONS
•Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
Elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements obtained for a particular element is called an observation.
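As a small illustration of these three terms, the sketch below stores a data set as a mapping from each element to its measurements. The station names, variable names, and values are hypothetical and only serve to show the structure.

```python
# Hypothetical data set: each key is an element, each inner dict is one observation.
data_set = {
    "Station A": {"river": "Saigon", "ph": 7.1, "flow_m3s": 54.0},
    "Station B": {"river": "Dong Nai", "ph": 6.8, "flow_m3s": 120.5},
    "Station C": {"river": "Saigon", "ph": 7.4, "flow_m3s": 61.2},
}

# Elements: the entities on which data are collected.
elements = list(data_set)

# Variables: the characteristics of interest, shared by all elements.
variables = list(next(iter(data_set.values())))

# Observation: the set of measurements for one particular element.
observation_b = data_set["Station B"]

# Total number of data values = number of elements x number of variables.
n_values = len(elements) * len(variables)

print(elements)
print(variables)
print(observation_b)
print(n_values)
```

With 3 elements and 3 variables, the data set holds 9 data values.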
B. SCALES OF MEASUREMENT
•Scales of measurement: nominal, ordinal, interval, or ratio.
The scale of measurement determines the amount of information contained in the data and indicates the most appropriate data summarization and statistical analyses.
-Nominal scale: the data for a variable consist of labels or names used to identify an attribute of the element. A numerical code as well as a nonnumerical label may be used.
Example: genotype, blood type, zip code, gender, race, eye color, political party, etc.
-Ordinal scale: the data exhibit the properties of nominal data and, in addition, the order or rank of the data is meaningful.
Example: socioeconomic status (“low income”, “middle income”, “high income”), education level (“high school”, “BS”, “MS”, “PhD”), income level (“less than 50K”, “50K-100K”, “over 100K”), satisfaction rating (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”), etc.
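To make the idea of a meaningful order concrete, here is a minimal sketch that ranks the satisfaction labels from the example and finds the median response. The list of responses is made up for illustration; the point is that sorting and the median rely only on the order of the labels, not on any distance between them.

```python
# Category order taken from the satisfaction-rating example.
LEVELS = ["extremely dislike", "dislike", "neutral", "like", "extremely like"]
RANK = {label: i for i, label in enumerate(LEVELS)}

# Hypothetical survey responses (made up for illustration).
responses = ["like", "neutral", "extremely like", "dislike", "like"]

# Sorting by rank is meaningful because the order of the labels is meaningful.
ordered = sorted(responses, key=RANK.__getitem__)

# The median category needs only the order, not the distance between labels.
median_response = ordered[len(ordered) // 2]

print(ordered)
print(median_response)
```

An average of the rank numbers is deliberately not computed here: it would assume equal spacing between categories, which an ordinal scale does not guarantee.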
-Interval scale: the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numerical; a value of zero is still a point on the scale and does not mean that nothing exists.
Example: temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850), etc.
-Ratio scale: the data have all the properties of interval data and the ratio of two values is meaningful. This scale requires that a zero value be included to indicate that nothing exists for the variable at the zero point.
Example: enzyme activity, dose amount, reaction rate, flow rate, concentration, pulse, weight, length, temperature in Kelvin (0.0 Kelvin really does mean “no heat”), survival time, etc.
Time of day is interval scale: the zero point is an arbitrary reference (e.g. 12:00 noon).
Duration is ratio scale: 0 means no time has elapsed.
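The difference between interval and ratio scales can be checked with a short calculation. The sketch below uses two hypothetical temperature readings: their difference is the same in Celsius and Kelvin, but only the Kelvin ratio is meaningful, because Kelvin has a true zero.

```python
def celsius_to_kelvin(t_c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return t_c + 273.15

# Hypothetical readings, chosen only for illustration.
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale and agree across both scales.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on the ratio scale.
naive_ratio = t2_c / t1_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(diff_c, round(diff_k, 6))
print(naive_ratio, round(true_ratio, 3))
```

The Kelvin ratio comes out at about 1.035, so the warmer reading is roughly 3.5% higher in absolute terms, not 100% higher.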
Summary of data types and scale measures: [figure not reproduced]
C. CATEGORICAL AND QUANTITATIVE DATA
•A categorical variable is a variable with categorical data, and a quantitative variable is a variable with quantitative data. Categorical data use labels or names to identify an attribute (nominal or ordinal scale); quantitative data are numeric values that indicate how much or how many (interval or ratio scale).
•If the variable is categorical, the statistical analysis is limited: when the categorical data are identified by a numerical code, arithmetic operations such as addition, subtraction, multiplication, and division do not provide meaningful results.
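The point about numerical codes can be seen in a short sketch. The coding scheme and the sample below are hypothetical; the example contrasts a mean of the codes, which runs but has no interpretation, with a frequency count, which is a valid summary for categorical data.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding scheme for a nominal variable (blood type).
CODE_TO_LABEL = {1: "A", 2: "B", 3: "AB", 4: "O"}

# Made-up sample of coded values.
codes = [1, 4, 4, 2, 1, 4, 3, 4]

# Arithmetic runs without error, but the result has no interpretation:
# there is no blood type "2.875".
meaningless_mean = mean(codes)

# Counting how often each category occurs is a meaningful summary.
counts = Counter(CODE_TO_LABEL[c] for c in codes)
most_common_label, most_common_count = counts.most_common(1)[0]

print(meaningless_mean)
print(dict(counts))
print(most_common_label, most_common_count)
```

For categorical data, counts, proportions, and the most frequent category are the appropriate summaries.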
D. CROSS-SECTIONAL AND TIME SERIES DATA
•Cross-sectional data are data collected at the same or approximately the same
point in time.
•Time series data are data collected over several time periods.
-Panel data (longitudinal data): a combination of the two types above, in which the same elements are observed over several time periods.
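One way to see how the three kinds of data relate is to store panel data keyed by element and time period, then slice it. The stations, years, and pH values below are hypothetical.

```python
# Hypothetical panel data: (element, year) -> measured pH value.
panel = {
    ("Station A", 2021): 7.1, ("Station A", 2022): 7.3, ("Station A", 2023): 7.0,
    ("Station B", 2021): 6.8, ("Station B", 2022): 6.9, ("Station B", 2023): 7.2,
}

# Cross-sectional slice: all elements at one point in time.
cross_section_2022 = {elem: v for (elem, year), v in panel.items() if year == 2022}

# Time series slice: one element over several time periods.
series_a = {year: v for (elem, year), v in panel.items() if elem == "Station A"}

print(cross_section_2022)
print(series_a)
```

Fixing the year gives cross-sectional data, fixing the element gives a time series, and keeping both dimensions gives the panel.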
1.2. DATA SOURCES
•Existing Sources
•Observational Study
•Experiment
•Time and Cost Issues
•Data Acquisition Errors
A. EXISTING SOURCES
•A variety of databases
•Internal records
•A variety of industry associations and special interest organizations
•The Internet is an important source of data and statistical information
Pros                 Cons
Save time            Validity
Save money           Reliability
Various data sets    Difficulty obtaining information specific to his or her needs
B. OBSERVATIONAL STUDY
•Observe what is happening in a particular situation, record data on one or more
variables of interest, and conduct a statistical analysis of the resulting data.
•Surveys and public opinion polls are two examples of commonly used observational studies.
Pros                 Cons
Flexible             Time consuming
“Insider” view       Subjective
                     Participants may not act in true nature
C. EXPERIMENT
•The key difference between an observational study and an experiment is that an
experiment is conducted under controlled conditions.
The data obtained from a well-designed experiment can often provide more information than the data obtained from existing sources or by conducting an observational study.
•The types of experiments we deal with in statistics often begin with the
identification of a particular variable of interest.
D. TIME AND COST ISSUES
•The use of existing data sources is desirable when data must be obtained in a
relatively short period of time.
•If important data are not readily available from an existing source, the additional
time and cost involved in obtaining the data must be taken into account.
•In all cases, the decision maker should consider the contribution of the statistical
analysis to the decision-making process.
•The cost of data acquisition and the subsequent statistical analysis should not
exceed the savings generated by using the information to make a better decision.
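The last rule is a simple comparison, sketched below. The function name and the cost figures are hypothetical; in practice the expected savings are themselves an estimate.

```python
def analysis_is_worthwhile(acquisition_cost: float,
                           analysis_cost: float,
                           expected_savings: float) -> bool:
    """Return True when the total cost does not exceed the expected savings."""
    return acquisition_cost + analysis_cost <= expected_savings

# Hypothetical figures in the same currency unit.
print(analysis_is_worthwhile(4000, 1500, 10000))  # total cost 5500
print(analysis_is_worthwhile(8000, 3000, 10000))  # total cost 11000
```

The first case passes the rule because 5500 is below 10000; the second fails because 11000 exceeds it.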
E. DATA ACQUISITION ERRORS
•Using erroneous data can be worse than not using any data at all.
•An error in data acquisition occurs whenever the data value obtained is not equal
to the true or actual value that would be obtained with a correct procedure.
•Experienced data analysts take great care in collecting and recording data to
ensure that errors are not made.
•Special procedures can be used to check for internal consistency of the data.
Taking steps to acquire accurate data can help ensure reliable and valuable decision-making information.
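An internal consistency check can be as simple as comparing two fields that constrain each other. The sketch below is one possible check, not a standard procedure: the record layout, the field names, and the minimum working age of 15 are all assumptions made for illustration.

```python
def find_inconsistent(records, min_working_age=15):
    """Return records whose years of experience are impossible for the stated age."""
    flagged = []
    for rec in records:
        age, exp = rec["age"], rec["years_experience"]
        # Negative values, or experience that implies starting work too young.
        if age < 0 or exp < 0 or age - exp < min_working_age:
            flagged.append(rec)
    return flagged

# Hypothetical records; record 2 claims 20 years of experience at age 22.
records = [
    {"id": 1, "age": 34, "years_experience": 10},
    {"id": 2, "age": 22, "years_experience": 20},
    {"id": 3, "age": 45, "years_experience": 21},
]

flagged_ids = [rec["id"] for rec in find_inconsistent(records)]
print(flagged_ids)
```

A flagged record is not necessarily wrong, but it should be reviewed and corrected before the data are analyzed.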