45-734 Probability and Statistics II
(4th Mini AY 1997-98 Flex-Mode and Flex-Time)
Assignment #3: Due 7 April 1998
Cigarette smoking is bad for you! It causes a variety of dreadful
diseases and cuts years off your lifespan. One of the wretched byproducts of
cigarettes is carbon monoxide. In this first problem we are going to look at
the amount of carbon monoxide produced by a cigarette as a function of tar,
nicotine, and weight of the cigarette. The data are in
Ciggy.wf1
The following variables are in the file: TAR (measured in
milligrams); NICO (nicotine measured in milligrams);
WEIGHT (measured in grams); and CARMON (carbon monoxide
measured in milligrams). The data are for 25 brands of cigarettes
(source: Federal Trade Commission, courtesy of Dennis Epple).
Regress carbon monoxide on the other variables (you do not have to do
any transformations of the variables). Perform F tests of
$\beta_1 = \beta_2 = 0$, $\beta_1 = \beta_3 = 0$, and $\beta_2 = \beta_3 = 0$
(that is, all combinations of 2 of the 3 independent variables; use
$\alpha = .05$).
Test the null hypothesis that the coefficient of tar is
equal to one (use $\alpha = .05$). In your judgement, how well does this
model account for the amount of carbon monoxide? Show the EVIEWS
output and your calculations.
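As a rough illustration of the calculations involved, the sketch below computes an exclusion F statistic from restricted and unrestricted residual sums of squares, and a t statistic for the hypothesis that a coefficient equals one. It is not EViews output: the data are synthetic stand-ins (not the FTC cigarette data), and the helper names `ols` and `f_stat_exclusion` are hypothetical. With 25 observations and an intercept plus three regressors, each F statistic would be compared with the tabulated critical value for 2 and 21 degrees of freedom.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, residual sum of squares, and standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = float(resid @ resid)
    dof = X.shape[0] - X.shape[1]
    cov = ssr / dof * np.linalg.inv(X.T @ X)
    return beta, ssr, np.sqrt(np.diag(cov))

def f_stat_exclusion(X, y, drop_cols):
    """F statistic for H0: the coefficients on the columns in drop_cols are all zero."""
    keep = [j for j in range(X.shape[1]) if j not in drop_cols]
    _, ssr_u, _ = ols(X, y)            # unrestricted model
    _, ssr_r, _ = ols(X[:, keep], y)   # restricted model (columns dropped)
    q = len(drop_cols)
    dof = X.shape[0] - X.shape[1]
    return ((ssr_r - ssr_u) / q) / (ssr_u / dof)

# Synthetic stand-in data, made up for illustration only.
rng = np.random.default_rng(0)
n = 25
tar = rng.uniform(1, 30, n)
nico = 0.07 * tar + rng.normal(0, 0.1, n)
weight = rng.uniform(0.8, 1.2, n)
carmon = 3.0 + 0.9 * tar + rng.normal(0, 1.0, n)

# Column 0 is the intercept; columns 1, 2, 3 are tar, nicotine, weight.
X = np.column_stack([np.ones(n), tar, nico, weight])
beta, ssr, se = ols(X, carmon)

f_12 = f_stat_exclusion(X, carmon, [1, 2])  # H0: beta1 = beta2 = 0
f_13 = f_stat_exclusion(X, carmon, [1, 3])  # H0: beta1 = beta3 = 0
f_23 = f_stat_exclusion(X, carmon, [2, 3])  # H0: beta2 = beta3 = 0
t_tar_eq_1 = (beta[1] - 1.0) / se[1]        # H0: beta1 = 1

print(f_12, f_13, f_23, t_tar_eq_1)
```

The t statistic subtracts the hypothesized value (one) from the estimate, so it differs from the default t statistic in regression output, which tests against zero.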
In this problem we are going to look at the number of wildcat oil wells
drilled every year in the United States as a function of a variety of
variables. The data are in:
Drill3.wf1
The variables in the file are: WILDCT2 (number of wells drilled in the U.S.
by year); OILCON (U.S. oil consumption measured in number of barrels per day
per capita); PCI (U.S. per capita income in thousands of 1982 dollars);
PRICEOK (price of a barrel of sweet Oklahoma crude oil in 1982 dollars);
VEHICLE (U.S. per capita registration of motor vehicles); and
YEAR which is included for your convenience. The data are for 1936 to
1987 and are courtesy of John Londregan.
Regress the number of wildcat wells on the other variables (show
your output--you do not have to transform any of the variables). Note
that it does not make sense to use the variable YEAR here.
Do the t-statistics for the coefficients follow the pattern you expected
--that is, do you think the results make sense? Explain your answer.
Test whether or not the coefficients for OILCON, PCI, and VEHICLE are all
equal to zero (i.e., an F test on the set--use $\alpha = .05$).
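One common way to carry out such a joint test is to compare the residual sum of squares of the full regression with that of a restricted regression that omits the three variables. The expression below is a sketch under two assumptions: that all 52 annual observations (1936 to 1987) are used, and that the full model has an intercept plus the four regressors. The symbols $SSR_U$ (unrestricted) and $SSR_R$ (restricted) are notation introduced here, not names from the assignment or from EViews.

```latex
F = \frac{(SSR_R - SSR_U)/3}{SSR_U/(52 - 5)} \sim F_{3,\,47}
\quad \text{under } H_0:\ \beta_{\mathrm{OILCON}} = \beta_{\mathrm{PCI}} = \beta_{\mathrm{VEHICLE}} = 0
```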
In (a) the model was linear in the parameters and in the variables.
Test the model
$$\mathrm{WILDCT2} = e^{\beta_0}\,\mathrm{OILCON}^{\beta_1}\,\mathrm{PCI}^{\beta_2}\,\mathrm{PRICEOK}^{\beta_3}\,\mathrm{VEHICLE}^{\beta_4}\,e^{\varepsilon}$$
Compare the pattern of the t-statistics on the coefficients for this model
with those of part (a) (show your output). Do you think the results make
sense? Explain your answer.
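A multiplicative model of this form becomes linear in the parameters once logarithms are taken of both sides, so it can be estimated by ordinary least squares on the logged variables. The sketch below illustrates this with synthetic data (not the Drill3.wf1 series); the exponents in `true_b` are arbitrary values chosen for the illustration.

```python
import numpy as np

# Synthetic stand-in data: four strictly positive regressors, 52 observations.
rng = np.random.default_rng(1)
n = 52
x = rng.uniform(1.0, 5.0, size=(n, 4))
true_b = np.array([2.0, 0.5, -1.0, 0.8, 0.3])  # b0, b1, ..., b4 (made up)
eps = rng.normal(0.0, 0.02, n)

# y = e^{b0} * x1^{b1} * x2^{b2} * x3^{b3} * x4^{b4} * e^{eps}
y = np.exp(true_b[0]) * np.prod(x ** true_b[1:], axis=1) * np.exp(eps)

# Taking logs gives ln y = b0 + b1 ln x1 + ... + b4 ln x4 + eps,
# which is an ordinary linear regression in the logged variables.
X_log = np.column_stack([np.ones(n), np.log(x)])
b_hat, *_ = np.linalg.lstsq(X_log, np.log(y), rcond=None)

print(b_hat)
```

In a log-log model each slope coefficient is read as an elasticity: the approximate percentage change in the dependent variable for a one percent change in that regressor.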
In the United States it is political folklore that the vote for
the presidential candidate of the incumbent president's political party is a
function of the economy. It is also political folklore that the vote for
the candidates of the incumbent president's political party for seats in
the House of Representatives is not a direct function of the economy;
rather "local" issues matter more than "national" issues in House elections.
We are going to examine this folklore using presidential election data
from 1916 to 1988 (courtesy of Howard Rosenthal and John Londregan).
The data are in:
Presvote.wf1
The variables are HOUSVOTE (percentage of the national vote for the
House candidates of the incumbent president's political party);
PRSVOTE (percentage of the national vote for the presidential
candidate of the incumbent president's political party); GNP (real
GNP growth); REPUB (a dummy variable which is 1 if the incumbent
president is a Republican and 0 if the incumbent president is a Democrat);
and MILMOB (the same variable used in homework #1).
Investigate the proposition that the presidential vote for the
incumbent president's political party depends upon the performance of the
economy. At some point try lagging the variable GNP and using this and
MILMOB in your model but do not use REPUB. To lag the variable GNP means to
use last year's GNP to predict this year's PRSVOTE. This is
done by putting GNP(-1) as a right-hand-side variable (to lag two periods use
GNP(-2), three periods, GNP(-3), and so on). Show the output of the model
that you feel is correct (without using the REPUB variable). Explain your
model and discuss the effects of the various variables that you include and
exclude. Specifically, compare the effect of GNP to GNP(-1). What does this
tell you about voters? Now add the dummy variable REPUB to your model and
show the results. What effect does this dummy variable have, and why would
we want to include it in the model? (Hint: note that there are only two
political parties in the U.S.)
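To make the lag notation concrete, the sketch below shows what a one-period lag such as GNP(-1) does to a series: each observation is paired with the value from the previous observation, and the first observation has no lagged value. The numbers are made up, and the helper `lag` is hypothetical; it only mimics the idea and is not how EViews implements lags.

```python
import numpy as np

def lag(series, k=1):
    """Shift a series back k observations; the first k entries become NaN."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    out = np.full(n, np.nan)
    if 0 <= k < n:
        out[k:] = series[:n - k]
    return out

gnp = np.array([2.0, -1.0, 3.5, 0.5, 4.0])  # made-up growth rates
gnp_lag1 = lag(gnp, 1)                      # value from the previous observation
usable = ~np.isnan(gnp_lag1)                # rows that can enter the regression

print(gnp_lag1, usable.sum())
```

Each additional lag costs one observation at the start of the sample, which matters in a data set this small.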
Investigate the proposition that the vote for the candidates of the
incumbent president's political party for seats in the House of
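Representatives is not a direct function of the economy. Specifically,
estimate
$$\mathrm{HOUSVOTE} = \beta_0 + \beta_1\,\mathrm{GNP} + \beta_2\,\mathrm{PRSVOTE} + \varepsilon$$
Now estimate
$$\mathrm{HOUSVOTE} = \beta_0 + \beta_1\,\mathrm{GNP} + \beta_2\,\mathrm{PRSVOTE} + \beta_3\,\mathrm{REPUB} + \varepsilon$$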
Compare the results of the two regressions (show the results). Do they
make sense? (In your answer, assume you know nothing about U.S. history!)
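Because REPUB takes only the values 0 and 1, adding it to the second regression amounts to letting the intercept differ by the party of the incumbent president while the slopes stay the same. Assuming the error term has mean zero given the regressors, the two cases can be written out as follows, using the coefficient numbering of the second regression above:

```latex
\begin{aligned}
\mathrm{REPUB} = 0:&\quad E[\mathrm{HOUSVOTE}] = \beta_0 + \beta_1\,\mathrm{GNP} + \beta_2\,\mathrm{PRSVOTE} \\
\mathrm{REPUB} = 1:&\quad E[\mathrm{HOUSVOTE}] = (\beta_0 + \beta_3) + \beta_1\,\mathrm{GNP} + \beta_2\,\mathrm{PRSVOTE}
\end{aligned}
```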