Chap 4: Further Development and Analysis of the Classical Linear Regression Model
It can be proved that a t-distribution is just a special case of the more general F-distribution: the square of a t-distributed random variable with T-k degrees of freedom is identical to an F-distribution with (1, T-k) degrees of freedom. But remember that if we use a 5% size of test, we look up a 5% critical value for the F-distribution because, although the test is two-sided, squaring removes the sign and so we only ever look in one tail of the F-distribution. For the t-distribution we look up a 2.5% value in each tail, since that test is two-tailed.
Examples at the 5% level from tables:

a. We could use an F- or a t-test for this one since it is a single hypothesis involving only one coefficient. In practice we would probably use a t-test since it is computationally simpler and we only have to estimate one regression. There is one restriction.

b. Since this involves more than one coefficient, we should use an F-test. There is one restriction.

c. Since we are testing more than one hypothesis simultaneously, we would use an F-test. There are 2 restrictions.

d. As for (c), we are testing multiple hypotheses so we cannot use a t-test. We have 4 restrictions.

e. Although there is only one restriction, it is a multiplicative restriction. We therefore cannot use a t-test or an F-test to test it. In fact we cannot test it at all using the methodology that has been examined in this chapter.
The regression F-statistic would be given by the test statistic associated with hypothesis iv) above. We are always interested in testing this hypothesis since it tests whether all of the coefficients in the regression (except the constant) are jointly insignificant. If they are, then we have a completely useless regression, where none of the variables that we have said influence y actually do, and we would need to go back to the drawing board! The alternative hypothesis is

H1: β2 ≠ 0 or β3 ≠ 0 or … or βk ≠ 0

Note the form of the alternative hypothesis: "or" indicates that only one of the components of the null hypothesis would have to be rejected for us to reject the null hypothesis as a whole.
The restricted residual sum of squares will always be at least as big as the unrestricted residual sum of squares, i.e. RRSS ≥ URSS.

To see this, think about what we were doing when we determined what the regression parameters should be: we chose the values that minimise the residual sum of squares. We said that OLS would provide the "best" parameter values given the actual sample data. Now when we impose some restrictions on the model, so that the parameters cannot all be freely determined, the model should not fit as well as it did before. Hence the residual sum of squares must be higher once we have imposed the restrictions; otherwise, the parameter values that OLS chose originally without the restrictions could not have been the best.

In the extreme case (very unlikely in practice), the two residual sums of squares could be identical if the restrictions were already present in the data, so that imposing them on the model would yield no penalty in terms of loss of fit.
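For reference, the usual test statistic built from these two quantities (a sketch of the standard formulation, with m restrictions, T observations and k parameters in the unrestricted model) is

$$ F = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m} \sim F(m,\, T-k) $$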
R-squared (R2) is a number that tells you how well the independent variable(s) in a statistical model explain the variation in the dependent variable. It ranges from 0 to 1 (commonly stated as a percentage from 0% to 100%), where 1 indicates a perfect fit of the model to the data.
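In terms of the sums of squares already discussed, R2 can be written (this is the standard decomposition, not something specific to this text) as

$$ R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} $$

where TSS is the total sum of squares of y about its mean, ESS the explained sum of squares and RSS the residual sum of squares.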
Quantile regression provides an alternative to ordinary least squares (OLS) regression and related methods, which typically assume that the association between the independent and dependent variables is the same at all levels of the dependent variable. Quantile methods allow the analyst to relax this common-slope assumption. In OLS regression, the goal is to minimise the sum of squared distances between the values predicted by the regression line and the observed values. In contrast, quantile regression differentially weights the (absolute) distances between the predicted and observed values, and then minimises the sum of those weighted distances.
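A minimal sketch of how this could be done with statsmodels' quantreg is shown below; the data are simulated and the variable names and quantiles chosen are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: y depends on x with noise whose spread grows with x
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, 500)
data = pd.DataFrame({"x": x, "y": y})

# OLS fit (conditional mean) versus quantile fits (conditional quantiles)
ols_res = smf.ols("y ~ x", data).fit()
for q in (0.1, 0.5, 0.9):
    qres = smf.quantreg("y ~ x", data).fit(q=q)
    print(f"quantile {q}: slope = {qres.params['x']:.3f}")
print(f"OLS: slope = {ols_res.params['x']:.3f}")
```

Because the noise is heteroskedastic here, the estimated slopes differ noticeably across quantiles, which is exactly the feature quantile regression is designed to reveal.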
Chap 5: Classical Linear Regression Model Assumptions and Diagnostic Tests
We would like to see no pattern in the residual plot! If there is a pattern in the residual plot, this is an indication that there is still some "action" or variability left in yt that has not been explained by our model. This indicates that potentially it may be possible to form a better model, perhaps using additional or completely different explanatory variables, or by using lags of either the dependent or of one or more of the explanatory variables. Recall the two plots shown on pages 157 and 159: one where the residuals follow a cyclical pattern and one where they follow an alternating pattern, which are indications that the residuals are positively and negatively autocorrelated respectively.
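As an illustration of how such a residual plot and the corresponding autocorrelation function might be produced, a self-contained sketch using statsmodels on simulated data (with deliberately autocorrelated errors, so that a pattern appears) is:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Simulated example: y generated with AR(1) errors so the residual plot shows a pattern
rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + rng.normal()   # positively autocorrelated errors
y = 1.0 + 2.0 * x + u

results = sm.OLS(y, sm.add_constant(x)).fit()
resid = results.resid

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(resid)                      # cyclical-looking pattern suggests positive autocorrelation
axes[0].set_title("Residuals over time")
plot_acf(resid, lags=20, ax=axes[1])     # a slowly decaying ACF points the same way
plt.show()
```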
Another problem if there is a "pattern" in the residuals is that, if it does indicate the presence of autocorrelation, then this may suggest that our standard error estimates for the coefficients could be wrong and hence any inferences we make about the coefficients could be misleading.
b. The coefficient estimates would still be the “correct” ones (assuming that the other
assumptions required to demonstrate OLS optimality are satisfied), but the problem would
be that the standard errors could be wrong. Hence if we were trying to test hypotheses about
the true parameter values, we could end up drawing the wrong conclusions. In fact, for all of
the variables except the constant, the standard errors would typically be too small, so that we
would end up rejecting the null hypothesis too many times.
c. There are a number of ways to proceed in practice, including:

- Using heteroskedasticity-robust standard errors, which correct for the problem by enlarging the standard errors relative to what they would have been for the situation where the error variance is positively related to one of the explanatory variables (see the sketch after this list).

- Transforming the data into logs, which has the effect of reducing the effect of large errors relative to small ones.
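A minimal sketch of how heteroskedasticity-robust (White-type) standard errors could be obtained with statsmodels follows; the data are simulated purely for illustration, with the error variance made to grow with the regressor.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data where the error standard deviation grows with x (heteroskedasticity)
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 300)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x)
X = sm.add_constant(x)

plain = sm.OLS(y, X).fit()                   # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")    # heteroskedasticity-robust standard errors

print(plain.bse)    # usual OLS standard errors
print(robust.bse)   # robust standard errors, typically larger in this setting
```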
a. This is where there is a relationship between the ith and jth residuals. Recall that one of the assumptions of the CLRM was that no such relationship exists. We want our residuals to be random, and if there is evidence of autocorrelation in the residuals, then it implies that we could predict the sign of the next residual and get the right answer more than half the time on average!

b. The Durbin-Watson test is a test for first order autocorrelation. The test is calculated as follows. You would run whatever regression you were interested in, obtain the residuals, and then calculate the statistic

DW = Σ (û_t − û_{t−1})² / Σ û_t²

where the numerator sums over t = 2, …, T, the denominator sums over t = 1, …, T, and û_t denotes the residual at time t. You would then need to look up the two critical values from the Durbin-Watson tables; these depend on the number of observations and the number of regressors (excluding the constant this time) in the model.
The rejection / non-rejection rule would be given by selecting the appropriate region from the standard Durbin-Watson decision diagram (reject the null in favour of positive autocorrelation if DW is below the lower critical value, reject in favour of negative autocorrelation if DW is above 4 minus the lower critical value, do not reject in the middle region around 2, with inconclusive regions in between).
c. We have 60 observations, and the number of regressors excluding the constant term is 3. The appropriate lower and upper limits are 1.48 and 1.69 respectively, so the Durbin-Watson statistic lies below the lower limit. It is thus clear that we reject the null hypothesis of no autocorrelation: it looks as though the residuals are positively autocorrelated.
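For completeness, a sketch of how the DW statistic could be computed in practice with statsmodels (simulated data with 60 observations and 3 regressors; the 1.48/1.69 critical values above still come from the printed tables):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulated regression with AR(1) errors, so DW should fall well below 2
rng = np.random.default_rng(3)
T = 60
X = sm.add_constant(rng.normal(size=(T, 3)))   # 3 regressors plus a constant
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + u

res = sm.OLS(y, X).fit()
print(durbin_watson(res.resid))   # values well below 2 indicate positive autocorrelation
```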
d. The problem with a model entirely in first differences is that, once we calculate the long-run solution, all the first-difference terms drop out (since in the long run we assume that the values of all variables have converged on their own long-run values, so that yt = yt-1, etc.). Thus when we try to calculate the long-run solution to this model, we cannot do it, because there is no long-run solution to such a model!
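As a worked illustration (using a generic first-difference specification rather than one taken from the text), consider

$$ \Delta y_t = \beta_1 + \beta_2 \Delta x_t + u_t $$

In the long run, y_t = y_{t-1} and x_t = x_{t-1}, so Δy_t = Δx_t = 0 and the equation collapses to 0 = β1: nothing is left linking the levels of y and x, which is exactly why no long-run solution exists.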
e. The answer is yes: there is no reason why we cannot use the Durbin-Watson test in this case. You may have said no here because there are lagged values of the regressors (the x variables) in the regression. In fact this would be wrong, since there are no lags of the DEPENDENT (y) variable, and hence DW can still be used.
The last equation above is the long run solution.
Ramsey’s RESET test is a test of whether the functional form of the regression is appropriate.
In other words, we test whether the relationship between the dependent variable and the
independent variables really should be linear or whether a non-linear form would be more
appropriate. The test works by adding powers of the fitted values from the regression into a
second regression. If the appropriate model was a linear one, then the powers of the fitted values
would not be significant in this second regression.
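One hedged way to run a RESET-type test with statsmodels is sketched below, on simulated data where the true relationship is quadratic, so the test should reject the linear specification.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

# Simulated data with a genuinely non-linear (quadratic) relationship
rng = np.random.default_rng(4)
x = rng.uniform(0, 5, 300)
y = 1.0 + 2.0 * x + 0.8 * x**2 + rng.normal(size=300)

res = sm.OLS(y, sm.add_constant(x)).fit()       # deliberately mis-specified linear fit
reset = linear_reset(res, power=2, use_f=True)  # adds squared fitted values to a second regression
print(reset)   # a small p-value signals that the linear functional form is inadequate
```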
If we fail Ramsey’s RESET test, then the easiest “solution” is probably to transform all of the
variables into logarithms. This has the effect of turning a multiplicative model into an additive one.
If this still fails, then we really have to admit that the relationship between the dependent
variable and the independent variables was probably not linear after all so that we have to either
estimate a non-linear model for the data (which is beyond the scope of this course) or we have
to go back to the drawing board and run a different regression containing different variables.
a. It is important to note that we did not need to assume normality in order to derive the sample estimates of α and β or to calculate their standard errors. We need the normality assumption at the later stage, when we come to test hypotheses about the regression coefficients, either singly or jointly, so that the test statistics we calculate would indeed have the distribution (t or F) that we said they would.
b. One solution would be to use a technique for estimation and inference that does not require normality. But these techniques are often highly complex, and their properties are not so well understood, so we do not know with such certainty how well the methods will perform in different circumstances.
One pragmatic approach to failing the normality test is to plot the estimated residuals of the
model, and look for one or more very extreme outliers. These would be residuals that are much
“bigger” (either very big and positive, or very big and negative) than the rest. It is, fortunately
for us, often the case that one or two very extreme outliers will cause a violation of the
normality assumption. The reason that one or two extreme outliers can cause a violation of the
normality assumption is that they would lead the (absolute value of the) skewness and / or
kurtosis estimates to be very large.
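A brief sketch of how a skewness/kurtosis-based (Bera-Jarque type) normality test could be applied to the residuals, using scipy on simulated data containing one deliberately extreme outlier:

```python
import numpy as np
from scipy import stats

# Residuals that are normal apart from a single extreme outlier
rng = np.random.default_rng(5)
resid = rng.normal(size=200)
resid[50] = 8.0   # one very large positive residual

jb_stat, p_value = stats.jarque_bera(resid)
print(stats.skew(resid), stats.kurtosis(resid))  # the outlier inflates skewness and kurtosis
print(jb_stat, p_value)                          # small p-value: normality rejected
```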
Once we spot a few extreme residuals, we should look at the dates when these outliers occurred. If we have a good theoretical reason for doing so, we can add in separate dummy variables for big outliers caused by, for example, wars, changes of government, stock market crashes or changes in market microstructure (e.g. the "Big Bang" of 1986). The effect of the dummy variable is exactly the same as if we had removed that observation from the sample altogether and estimated the regression on the remainder. Provided we only remove observations in this way when there is a good reason for doing so, we ensure that we do not throw away the useful information contained in the other sample points.
a. Parameter structural stability refers to whether the coefficient estimates for a regression equation are stable over time. If the regression is not structurally stable, it implies that the coefficient estimates would be different for some sub-samples of the data compared to others. This is clearly not what we want to find, since when we estimate a regression we are implicitly assuming that the regression parameters are constant over the entire sample period under consideration.
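The usual way to examine this is a Chow test; as a reference sketch (standard notation, with RSS from the whole sample, RSS1 and RSS2 from the two sub-samples, k parameters and T observations), the test statistic is

$$ F = \frac{RSS - (RSS_1 + RSS_2)}{RSS_1 + RSS_2} \times \frac{T - 2k}{k} \sim F(k,\, T - 2k) $$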
By definition, variables having associated parameters that are not significantly different from
zero are not, from a statistical perspective, helping to explain variations in the dependent
variable about its mean value. One could therefore argue that empirically, they serve no purpose
in the fitted regression model. But leaving such variables in the model will use up valuable
degrees of freedom, implying that the standard errors on all of the other parameters in the
regression model will be unnecessarily higher as a result. If the number of degrees of freedom
is relatively small, then saving a couple by deleting two variables with insignificant parameters
could be useful. On the other hand, if the number of degrees of freedom is already very large,
the impact of these additional irrelevant variables on the others is likely to be inconsequential.
An outlier dummy variable will take the value one for one observation in the sample and zero
for all others. The Chow test involves splitting the sample into two parts. If we then try to run
the regression on both the sub parts but the model contains such an outlier dummy, then the
observations on that dummy will be zero everywhere for one of the regressions. For that sub-
sample, the outlier dummy would show perfect multicollinearity with the intercept and
therefore the model could not be estimated.

a) Measurement error:
Measurement error is the difference between the observed value of a variable and the true, but unobserved, value of that variable. Measurement uncertainty is critical to risk assessment and decision making. Organizations make decisions every day based on reports containing quantitative measurement data. If measurement results are not accurate, then decision risks increase; selecting the wrong suppliers, for example, could result in poor product quality.
b) Measurement error arises from sources of systematic error such as imperfect calibration of measurement instruments, changes in the environment which interfere with the measurement process, and imperfect methods of observation. A systematic error makes the measured value consistently smaller or consistently larger than the true value, but not both.
c) Measurement error is more serious when it is present in the independent variable(s) of a regression. With multiple regression, matters are more complicated still, as measurement error in one variable can bias the estimates of the effects of the other independent variables in unpredictable directions.
d) The linear relationship between the return required on an investment (whether in stock market securities or in business operations) and its systematic risk is represented by the CAPM formula, which is given in the Formulae Sheet:

E(ri) = Rf + βi(E(rm) − Rf)

where:
E(ri) = return required on financial asset i
Rf = risk-free rate of return
βi = beta value for financial asset i
E(rm) = average return on the capital market
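In empirical work, βi is typically estimated by regressing the asset's excess returns on the market's excess returns; a minimal sketch on simulated return data (all numbers purely illustrative) is:

```python
import numpy as np
import statsmodels.api as sm

# Simulated monthly returns: market, risk-free rate and an asset with true beta = 1.2
rng = np.random.default_rng(6)
rm = 0.01 + 0.04 * rng.normal(size=120)      # market return
rf = np.full(120, 0.002)                     # risk-free rate
ri = rf + 1.2 * (rm - rf) + 0.02 * rng.normal(size=120)

# Regress excess asset returns on excess market returns
res = sm.OLS(ri - rf, sm.add_constant(rm - rf)).fit()
alpha, beta = res.params
print(beta)   # estimated beta, close to 1.2; alpha should be close to zero
```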
We examine the effects of errors in measurement of the two independent variables, the return on the market (Rm) and the return on risk-free assets (Rf), in the traditional one-factor capital asset pricing model (CAPM). After discussing the Sharpe-Lintner CAPM and both Jensen's and Fama's specifications thereof, we review briefly the recent results of Friend and Blume (hereafter FB), Black, Jensen and Scholes (hereafter BJS), and Miller and Scholes (hereafter MS). In Section II, we first explore possible sources of measurement error for both Rm and Rf; we then specify these errors mathematically and derive analytically their effects on estimates of the systematic risk of a security or portfolio, β, and Jensen's measure of performance, α. In Section III, we derive an analytical expression for the regression coefficient of the estimated b's when the relevant equation is estimated. The result is then examined to find the conditions under which errors in measurement of Rm and Rf can cause b to have a positive or negative value even if the true b is zero. The conditions are then used to examine FB's results and their interpretation. In Section IV, an alternative hypothesis-testing procedure for the CAPM is examined. We show that the empirical results so derived are also affected by the measurement errors and the sample variation of the systematic risk. The relative advantage of the two different hypothesis-testing procedures is then explored. Finally, we comment on the relevance of the result to the popular zero-beta model and indicate areas for further research.
Chap 6: Univariate Time-Series Modelling and Forecasting

- AR models use past values of the time series to predict future values, while MA models use past error terms.
- AR models capture the autocorrelation in a time series, while MA models capture the moving average of the error terms.
- AR models can exhibit long-term dependencies in the data, while MA models are more focused on capturing short-term fluctuations.
- AR models are suitable for data with trends, while MA models are useful for capturing sudden shocks or irregularities in the data.
In practice, ARMA (autoregressive moving average) models combine both autoregressive and
moving average components to better capture the characteristics of time series data.
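As a reference sketch (generic first-order forms, not equations taken from the text), these processes can be written as

$$ \text{AR(1):}\quad y_t = \mu + \phi y_{t-1} + u_t $$
$$ \text{MA(1):}\quad y_t = \mu + u_t + \theta u_{t-1} $$
$$ \text{ARMA(1,1):}\quad y_t = \mu + \phi y_{t-1} + u_t + \theta u_{t-1} $$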
ARMA models are of particular use for financial series due to their flexibility. They are fairly
simple to estimate, can often produce reasonable forecasts, and most importantly, they require
no knowledge of any structural variables that might be required for more “traditional” econometric analysis.
AR (autoregressive) models are those where the value of a variable in a time series is a function of its own previous (lagged) values. They are useful for capturing momentum and mean-reversion effects, e.g. in stock prices.
MA (moving average) models are time series models where the value of a variable is a function of the error terms (both the contemporaneous and past error terms). They capture the effect of unexpected events, e.g. an earnings surprise or the Covid-19 shock.
ARMA models combine the AR and MA components and capture both mean-reversion and shock effects. Since financial time series are observed to exhibit momentum and to be very sensitive to shocks, ARMA models are often useful.
a. The first equation is a random walk, i.e. a unit root process: a non-stationary process whose autoregressive coefficient is equal to 1.

The second equation describes a stationary autoregressive process, since the coefficient on the lagged value is less than 1 in absolute value.

The third equation is a moving average model, since the value of the series depends only on the current and lagged values of the error (disturbance) term.
b. For the first (non-stationary) process, the ACF will not decline: the autocorrelation coefficients will remain approximately equal to 1 at all lags.

For the second (stationary) process, the ACF will decline very quickly, geometrically, with each coefficient equal to half the value of the one at the previous lag.

For the third process, the ACF will be non-zero at the first lag and approximately zero thereafter, since the ACF of an MA process cuts off after a number of lags equal to the order of the process.
c. While these processes are quite simplistic, the closest to an actual stock market series would arguably be the autoregressive process, since stock prices depend on their own previous values plus new information arriving in the form of random disturbances. To say that a stock price follows a random walk would mean that its changes are completely unpredictable. Of the three, the second (stationary autoregressive) process could in principle be used to predict future stock prices, but a real-world ARIMA(p, d, q) model, which combines the properties of these processes, is typically more suitable for forecasting stock market series.
d. The persistence of a process can be judged from the coefficient on the lagged term. Persistence is highest in the first process: the value at time t depends entirely on its value in the previous period (t-1), so the process is extremely persistent. The second process is less persistent, because the value at time t depends only partly on its previous value. In the third (MA) process, the dependence on the previous period's error term is high, because the coefficient on the lagged error is close to 1, although, being a moving average process, the effect of any shock dies out after a finite number of periods.
a. Box and Jenkins were the first to consider ARMA modelling in this logical and coherent fashion. Their methodology consists of 3 steps:

- Identification: determining the appropriate order of the model using graphical procedures (e.g. plots of autocorrelation functions).
- Estimation: of the parameters of the model of the order given in the first stage. This can be done using least squares or maximum likelihood, depending on the model.
- Diagnostic checking: this step is to ensure that the model actually estimated is "adequate". B & J suggest two methods for achieving this (see the sketch after this list):
+ Overfitting, which involves deliberately fitting a model larger than that suggested in step 1 and testing the hypothesis that all the additional coefficients can jointly be set to zero.

+ Residual diagnostics. If the model estimated is a good description of the data, there should be no further linear dependence in the residuals of the estimated model. Therefore, we could calculate the residuals from the estimated model and use the Ljung-Box test on them, or calculate their acf. If either of these reveals evidence of additional structure, then we assume that the estimated model is not an adequate description of the data.

If the model appears to be adequate, then it can be used for policy analysis and for constructing forecasts. If it is not adequate, then we must go back to stage 1 and start again!
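A compact sketch of these three stages using statsmodels, on a simulated ARMA(1,1) series (all orders and settings here are illustrative choices, not prescriptions from the text):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulate an ARMA(1,1) series (lag polynomials include the leading 1; AR signs are negated)
np.random.seed(0)
y = arma_generate_sample(ar=[1, -0.6], ma=[1, 0.4], nsample=500)

# 1. Identification: inspect the ACF and PACF
plot_acf(y, lags=20)
plot_pacf(y, lags=20)

# 2. Estimation: fit the tentatively identified ARMA(1,1) model
res = ARIMA(y, order=(1, 0, 1)).fit()
print(res.summary())

# 3. Diagnostic checking: Ljung-Box test on the residuals
print(acorr_ljungbox(res.resid, lags=[10]))
```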
b. The main problem with the B & J methodology is the inexactness of the identification stage. Autocorrelation functions and partial autocorrelation functions for actual data are very difficult to interpret accurately, rendering the whole procedure often little more than educated guesswork. A further problem concerns the diagnostic checking stage, which will only indicate when the proposed model is "too small" and will not inform on when the proposed model is "too large".
c. We could use Akaike's or Schwarz's Bayesian information criterion. Our objective would then be to choose the model order that minimises these. We can calculate the values of Akaike's (AIC) and Schwarz's Bayesian (SBIC) information criteria using the respective formulae.
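The formulae themselves are not reproduced above; the standard forms (written in terms of the residual variance estimate, the number of parameters k and the sample size T) are

$$ AIC = \ln(\hat{\sigma}^2) + \frac{2k}{T} $$
$$ SBIC = \ln(\hat{\sigma}^2) + \frac{k}{T}\ln T $$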