Tuesday, September 15, 2009

Homework assignment, due Tuesday, Sept. 22

The data for all assignments are at http://www.stat.columbia.edu/~gelman/arm/examples/

1. Follow the instructions to download and configure R and a text editor. (You don't need to download or set up Bugs.)

To make sure everything's working, go into R and type the following:

library ("arm")
x <- c(1,2,3)
y <- c(1,2,5)
fit <- lm (y ~ x)
display (fit)

The following should then appear in your R console:

lm(formula = y ~ x)
            coef.est coef.sd
(Intercept) -1.33     1.25
x            2.00     0.58
n = 3, k = 2
residual sd = 0.82, R-Squared = 0.92

If you cannot get this to work, please speak with Romain right away!
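One way to narrow down a problem is to run the same fit in base R, without loading arm. This is only a minimal sketch: it assumes nothing beyond the stats package that ships with R, and the name est is just a placeholder. If these numbers match the display() output above after rounding, R itself is fine and the issue is most likely the arm installation.

```r
# Same toy data as above, fit without the arm package
x <- c(1, 2, 3)
y <- c(1, 2, 5)
fit <- lm(y ~ x)

# Coefficient table from base R; keep estimates and standard errors only
est <- coef(summary(fit))
print(round(est[, c("Estimate", "Std. Error")], 2))

# Residual standard deviation and R-squared, rounded as display() does
print(round(summary(fit)$sigma, 2))
print(round(summary(fit)$r.squared, 2))
```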

2. Exercise 2.1: A test is graded from 0 to 50, with an average score of 35 and a standard deviation of 10. For comparison to other tests, it would be convenient to rescale to a mean of 100 and standard deviation of 15.

(a) How can the scores be linearly transformed to have this new mean and standard deviation?

(b) There is another linear transformation that also rescales the scores to have mean 100 and standard deviation 15. What is it, and why would you not want to use it for this purpose?
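A rescaling of this kind can be tried out numerically before you write up an answer. The sketch below is one possible starting point, not the required solution: the function name rescale is hypothetical, and it uses the stated mean (35) and standard deviation (10) rather than values estimated from data. It standardizes each score and then stretches and shifts it to the new scale.

```r
# Hypothetical helper: standardize using the stated mean and sd,
# then map to a scale with mean 100 and sd 15
rescale <- function(score, old.mean = 35, old.sd = 10,
                    new.mean = 100, new.sd = 15) {
  z <- (score - old.mean) / old.sd
  new.mean + new.sd * z
}

# The old mean maps to the new mean; one old sd above maps to one new sd above
print(rescale(c(35, 45)))

# The endpoints of the original 0 to 50 range
print(rescale(c(0, 50)))
```

Checking a few landmark scores this way is a quick test that a proposed transformation does what you intend.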

3. Exercise 2.4: Distribution of averages and differences: the heights of men in the United States are approximately normally distributed with mean 69.1 inches and standard deviation 2.9 inches. The heights of women are approximately normally distributed with mean 63.7 inches and standard deviation 2.7 inches. Let x be the average height of 100 randomly sampled men, and y be the average height of 100 randomly sampled women. In R, create 1000 simulations of x − y and plot their histogram. Using the simulations, compute the mean and standard deviation of the distribution of x − y and compare to their exact values.
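The simulation described here could be set up roughly as follows. This is a sketch under stated assumptions: the men's and women's samples are drawn independently, the seed is arbitrary, and the names n.sims and diff.sims are placeholders. The exact values in the last lines use the fact that the variance of a difference of independent averages is the sum of their variances.

```r
set.seed(1)      # arbitrary seed, only so the run is reproducible
n.sims <- 1000

# Each simulation: average of 100 men minus average of 100 women
diff.sims <- replicate(n.sims, {
  x <- mean(rnorm(100, mean = 69.1, sd = 2.9))
  y <- mean(rnorm(100, mean = 63.7, sd = 2.7))
  x - y
})

hist(diff.sims, main = "Simulations of x - y", xlab = "x - y (inches)")

# Simulation-based summaries
print(mean(diff.sims))
print(sd(diff.sims))

# Exact values, assuming the two samples are independent
exact.mean <- 69.1 - 63.7
exact.sd <- sqrt(2.9^2 / 100 + 2.7^2 / 100)
print(c(exact.mean, exact.sd))
```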

4. Exercise 3.4 (a,b): The child.iq folder contains a subset of the children and mother data discussed earlier in the chapter. You have access to children’s test scores at age 3, mother’s education, and the mother’s age at the time she gave birth for a sample of 400 children. The data are in a Stata file, which you can read into R by saving it in your working directory and then typing the following:

library ("foreign")
iq.data <- read.dta ("child.iq.dta")

(a) Fit a regression of child test scores on mother’s age, display the data and fitted model, check assumptions, and interpret the slope coefficient. When do you recommend mothers should give birth? What are you assuming in making these recommendations?
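The mechanics of part (a), fitting the model, plotting the data with the fitted line, and looking at residuals, might look like the sketch below. It runs on simulated stand-in data, not the real file: the data frame fake.data, the column names mom.age and score, and the numbers used to generate them are all hypothetical. After read.dta(), check the real column names with names(iq.data) and substitute them.

```r
# Simulated stand-in data; column names and generating values are hypothetical
set.seed(2)
n <- 400
fake.data <- data.frame(mom.age = round(runif(n, 17, 29)))
fake.data$score <- 70 + 0.8 * fake.data$mom.age + rnorm(n, 0, 20)

# Regression of test score on mother's age
fit.age <- lm(score ~ mom.age, data = fake.data)
print(coef(fit.age))

# Data with the fitted regression line
plot(fake.data$mom.age, fake.data$score,
     xlab = "Mother's age at birth", ylab = "Child test score")
abline(fit.age)

# Residuals against fitted values, as one check of the model assumptions
plot(fitted(fit.age), resid(fit.age),
     xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
```

With arm loaded, display (fit.age) gives the same compact summary shown in problem 1.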

(b) Repeat this for a regression that further includes mother’s education, interpreting both slope coefficients in this model. Have your conclusions about the timing of birth changed?