Spss 4q4z2w

Multiple Regression Practice Problems

Stat 112

1. When, in 1982, average Scholastic Aptitude Test (SAT) scores were first published on a state-by-state basis in the United States, the huge variation in the scores was a source of great pride for some states and of consternation for others. Average scores ranged from a low of 790 (out of a possible 1,600) in South Carolina to a high of 1,088 in Iowa. Two researchers set out to determine how certain variables are associated with state SAT differences.[1] The variable SAT is the average total SAT (verbal + quantitative) score in the state, and the two explanatory variables considered are the following:

TAKERS: percentage of the total eligible students (high school seniors) in the state who took the exam
EXPEND: total state expenditure on secondary schools, expressed in hundreds of dollars per student

Output from a multiple regression analysis is shown below.

Response SAT, Whole Model
[Figure: Actual by Predicted Plot, SAT Actual vs. SAT Predicted (P<.0001, RSq=0.81, RMSE=31.937)]

Summary of Fit
RSquare                        0.808786
RSquare Adj                    0.800472
Root Mean Square Error         31.93721
Mean of Response               948.449
Observations (or Sum Wgts)     49

Analysis of Variance
Source     DF   Sum of Squares   Mean Square    F Ratio   Prob > F
Model       2        198456.79       99228.4    97.2841     <.0001
Error      46         46919.33        1020.0
C. Total   48        245376.12

Parameter Estimates
Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   932.41448     22.16843     42.06     <.0001
EXPEND        4.2985226    1.025343     4.19     0.0001
TAKERS       -3.07411      0.2206     -13.94     <.0001

[1] B. Powell and L.C. Steelman, “Variations in State SAT Performance: Meaningful or Misleading?,” Harvard Educational Review 54(4), 1984: 389-412.

Effect Tests
Source   Nparm   DF   Sum of Squares    F Ratio   Prob > F
EXPEND       1    1         17926.44    17.5752     0.0001
TAKERS       1    1        198071.21   194.1902     <.0001

[Figure: Residual by Predicted Plot, SAT Residual vs. SAT Predicted]
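The Summary of Fit and Analysis of Variance numbers are tied together by standard identities: Mean Square = Sum of Squares / DF, F Ratio = Mean Square (Model) / Mean Square (Error), RSquare = SS Model / SS C. Total, and Root Mean Square Error = sqrt(Mean Square Error). A minimal Python sketch that recomputes them from the printed sums of squares as a consistency check (the variable names are illustrative, not JMP's):

```python
import math

# Values copied from the Analysis of Variance table for Q1.
ss_model, ss_error, ss_total = 198456.79, 46919.33, 245376.12
df_model, df_error, df_total = 2, 46, 48

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error (MSE)
f_ratio = ms_model / ms_error    # overall F statistic
r_squared = ss_model / ss_total  # proportion of variation in SAT explained
adj_r_squared = 1 - ms_error / (ss_total / df_total)
rmse = math.sqrt(ms_error)       # root mean square error

print(f"F = {f_ratio:.4f}, RSquare = {r_squared:.6f}, "
      f"RSquare Adj = {adj_r_squared:.6f}, RMSE = {rmse:.5f}")
```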

For questions (a)-(e), assume the ideal multiple linear regression model holds.

(a) For Pennsylvania, SAT = 885, TAKERS = 50 and EXPEND = 27.98. What would you predict Pennsylvania’s average SAT score to be based on knowing its TAKERS and EXPEND, but not knowing its SAT? What is the residual for Pennsylvania?

(b) Is there strong evidence that the multiple regression model provides better predictions of SAT than just using the sample mean of SAT to predict SAT? Use a test at the .05 level to justify your answer.

(c) Find an approximate 95% confidence interval for the coefficient on TAKERS.

(d) Is there strong evidence that total state expenditure (EXPEND) helps to predict a state’s average SAT score once TAKERS has been taken into account? Use a test at the .05 level to justify your answer.

(e) The two states with the largest Cook’s distances are Alaska and South Carolina, with Cook’s distances of 2.06 and 0.18 respectively and leverages of 0.44 and 0.09 respectively. For each state (Alaska, South Carolina), answer whether it would be justified to delete the state from the analysis and report that we omitted the state and that our conclusions hold only for a reduced range of the explanatory variables, not including the explanatory variable values of that state.
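A minimal Python sketch of the arithmetic behind (a) through (e), using only the printed estimates. Two assumptions are labeled in the comments: the approximate 95% interval uses the "estimate plus or minus 2 standard errors" rule of thumb, and the influence checks use the common rules of thumb Cook's D > 1 (influential) and leverage > 3(K+1)/n (high leverage); check these cutoffs against your course notes before relying on them.

```python
# Coefficients copied from the Parameter Estimates table for Q1.
B0, B_EXPEND, B_TAKERS = 932.41448, 4.2985226, -3.07411
SE_TAKERS = 0.2206

def predict_sat(takers, expend):
    """Predicted average SAT from the fitted model SAT ~ TAKERS + EXPEND."""
    return B0 + B_EXPEND * expend + B_TAKERS * takers

# (a) Pennsylvania: TAKERS = 50, EXPEND = 27.98, observed SAT = 885.
pa_pred = predict_sat(takers=50, expend=27.98)
pa_resid = 885 - pa_pred  # residual = observed - predicted

# (b) and (d): compare printed p-values with the .05 level.
alpha = 0.05
p_overall_f = 0.0001  # JMP prints "<.0001"; 0.0001 is used as an upper bound
p_expend = 0.0001     # Prob>|t| for EXPEND
overall_f_significant = p_overall_f < alpha
expend_significant = p_expend < alpha

# (c) Approximate 95% CI: estimate +/- 2 standard errors (rule of thumb).
ci_takers = (B_TAKERS - 2 * SE_TAKERS, B_TAKERS + 2 * SE_TAKERS)

# (e) Assumed rules of thumb: Cook's D > 1 is influential and
# leverage > 3(K+1)/n is high leverage, with K = 2 predictors and n = 49.
n, k = 49, 2
leverage_cutoff = 3 * (k + 1) / n
flags = {}
for state, (cooks_d, leverage) in {"Alaska": (2.06, 0.44),
                                   "South Carolina": (0.18, 0.09)}.items():
    flags[state] = {"influential": cooks_d > 1,
                    "high_leverage": leverage > leverage_cutoff}

print(f"PA predicted SAT = {pa_pred:.2f}, residual = {pa_resid:.2f}")
print(f"Approximate 95% CI for TAKERS: ({ci_takers[0]:.5f}, {ci_takers[1]:.5f})")
print(f"Leverage cutoff = {leverage_cutoff:.3f}; flags = {flags}")
```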

(f) Suppose we want to use either TAKERS or Log(TAKERS) in the multiple regression. On the basis of the information below, which of these two forms would you choose to use? Explain.

Bivariate Fit of SAT By TAKERS

[Figure: SAT vs. TAKERS scatterplot with the Linear Fit and the Transformed Fit to Log overlaid]

Linear Fit
SAT = 1020.3062 - 2.7599621 TAKERS


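Summary of Fit
RSquare                        0.735838
RSquare Adj                    0.730335
Root Mean Square Error         36.79525
Mean of Response               947.94
Observations (or Sum Wgts)     50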


Analysis of Variance
Source     DF   Sum of Squares   Mean Square    F Ratio   Prob > F
Model       1        181024.09        181024   133.7066     <.0001
Error      48         64986.73          1354
C. Total   49        246010.82

Parameter Estimates
Term        Estimate      Std Error   t Ratio   Prob>|t|
Intercept   1020.3062      8.139082    125.36     <.0001
TAKERS        -2.759962    0.238686    -11.56     <.0001

[Figure: Residual Plot for Linear Fit, Residual vs. TAKERS]

Transformed Fit to Log
SAT = 1112.2477 - 59.018822 Log(TAKERS)


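Summary of Fit
RSquare                        0.810762
RSquare Adj                    0.80682
Root Mean Square Error         31.14298
Mean of Response               947.94
Observations (or Sum Wgts)     50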


Analysis of Variance
Source     DF   Sum of Squares   Mean Square    F Ratio   Prob > F
Model       1        199456.33        199456   205.6494     <.0001
Error      48         46554.49           970
C. Total   49        246010.82

Parameter Estimates
Term          Estimate     Std Error   t Ratio   Prob>|t|
Intercept     1112.2477    12.27496     90.61     <.0001
Log(TAKERS)    -59.01882    4.11554    -14.34     <.0001

[Figure: Residual Plot for Transformed Fit to Log, Residual vs. TAKERS]
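One way to organize the comparison in (f) is to put the two fits side by side: both use the same response and the same 50 states, so RSquare and Root Mean Square Error are directly comparable, and the two equations imply different patterns of change in SAT. A minimal Python sketch, assuming JMP's Log() is the natural logarithm; the TAKERS values 5 and 40 are illustrative choices, not from the output:

```python
import math

# Summary-of-fit values copied from the two bivariate fits of SAT on TAKERS.
fits = {
    "TAKERS": {"rsquare": 0.735838, "rmse": 36.79525},
    "Log(TAKERS)": {"rsquare": 0.810762, "rmse": 31.14298},
}

def linear_fit(takers):
    """Linear Fit: SAT = 1020.3062 - 2.7599621 TAKERS."""
    return 1020.3062 - 2.7599621 * takers

def log_fit(takers):
    """Transformed Fit to Log, assuming Log() is the natural logarithm."""
    return 1112.2477 - 59.018822 * math.log(takers)

# Same response and same states, so a smaller RMSE means tighter predictions.
best_form = min(fits, key=lambda name: fits[name]["rmse"])

# Predicted change in SAT for a 5-point rise in TAKERS from a low and a high
# starting value: constant under the linear fit, shrinking under the log fit.
for start in (5, 40):
    print(f"TAKERS {start}->{start + 5}: "
          f"linear {linear_fit(start + 5) - linear_fit(start):+.1f}, "
          f"log {log_fit(start + 5) - log_fit(start):+.1f}")
print("smaller RMSE:", best_form)
```

The residual plots should be read alongside these numbers: a transformation is preferred when it also removes any systematic pattern in the residuals.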

2. The number of car accidents on a particular stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. A city alderman has decided to ask the county sheriff to provide him with statistics covering the last few years, with the intention of examining these data statistically so that he can (if possible) introduce new speed laws that will reduce traffic accidents. Using the number of accidents as the response variable, he obtains estimates of the number of cars passing along a stretch of road (subtracted from the mean number of cars passing along the stretch of road) and their average speeds (in miles per hour, subtracted from the mean average speed) for 60 randomly selected days.

(a) JMP output from simple linear regressions of (i) Accidents on Speed and (ii) Cars on Speed is shown below. Would you expect the estimated coefficient on Speed to increase, decrease or stay the same in a multiple linear regression of Accidents on Speed and Cars, compared with the estimated coefficient on Speed in the simple linear regression of Accidents on Speed? Justify your answer using the omitted variable bias formula.

(i) Response Accidents
Summary of Fit
RSquare                        0.021001
RSquare Adj                    0.004122
Root Mean Square Error         2.430355
Mean of Response               7.033333
Observations (or Sum Wgts)     60

Parameter Estimates
Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   -8.018052    13.49733     -0.59     0.5548
Speed        0.2508495    0.224888     1.12     0.2693

(ii) Response Cars
Summary of Fit
RSquare                        0.003515
RSquare Adj                   -0.01367
Root Mean Square Error         1.222004
Mean of Response               9.935
Observations (or Sum Wgts)     60

Parameter Estimates
Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   13.003931    6.786575      1.92     0.0603
Speed       -0.051147    0.113076     -0.45     0.6527
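For least squares fits, the omitted variable bias formula holds exactly in the sample: b_simple(Speed) = b_multiple(Speed) + b_multiple(Cars) × d, where d is the slope from regressing Cars on Speed (regression (ii)). A minimal Python sketch that checks this identity on hypothetical synthetic data (made up here, not the alderman's data), then rearranges it with the printed slopes. The Cars coefficient in the multiple regression of Accidents on Speed and Cars is not part of the output above, so the helper takes it as a hypothetical input:

```python
import random

def cross_dev(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

def ols_two(y, x1, x2):
    """Least squares slopes (b1, b2) for y ~ x1 + x2 via the normal equations."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical synthetic data (NOT the alderman's data).
rng = random.Random(112)
speed = [rng.gauss(60, 10) for _ in range(60)]
cars = [10 - 0.05 * s + rng.gauss(0, 1) for s in speed]
accidents = [2 + 0.1 * s + 1.5 * c + rng.gauss(0, 1) for s, c in zip(speed, cars)]

b_simple = cross_dev(speed, accidents) / cross_dev(speed, speed)  # Accidents ~ Speed
d = cross_dev(speed, cars) / cross_dev(speed, speed)              # Cars ~ Speed
b_speed, b_cars = ols_two(accidents, speed, cars)                # Accidents ~ Speed + Cars

# Omitted variable bias identity: simple slope = multiple slope + b_cars * d.
gap = b_simple - (b_speed + b_cars * d)

def speed_coef_multiple(b_cars_guess, b_simple=0.2508495, d=-0.051147):
    """Multiple-regression Speed coefficient implied by the printed slopes
    for a hypothetical Cars coefficient b_cars_guess."""
    return b_simple - b_cars_guess * d

print(f"identity gap on synthetic data: {gap:.2e}")
print(f"implied Speed coefficient if b_cars = 1: {speed_coef_multiple(1.0):.4f}")
```

Because the printed d is negative, the implied multiple-regression coefficient on Speed exceeds 0.2508495 whenever the Cars coefficient is positive; arguing for the sign of that coefficient is part of the exercise.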

(b) JMP output from a multiple linear regression of Accidents on Cars, Speed and Cars*Speed is shown below. Is there strong evidence of an interaction between Cars and Speed? Justify your answer using a test at the .05 level.

Response Accidents
Summary of Fit
RSquare                        0.743622
RSquare Adj                    0.729887
Root Mean Square Error         1.265725
Mean of Response               7.033333
Observations (or Sum Wgts)     60


Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3        260.21801       86.7393   54.1424     <.0001
Error      56         89.71533        1.6021
C. Total   59        349.93333

Parameter Estimates
Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    7.1405117   0.163638     43.64     <.0001
Cars         0.4158119   0.136049      3.06     0.0034
Speed        0.0644162   0.118519      0.54     0.5889
Cars*Speed   1.0763228   0.087791     12.26     <.0001
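The test for an interaction is the t test on the Cars*Speed coefficient, and the interaction also changes how Speed enters the model: the slope of Speed is no longer a constant but shifts by the Cars*Speed coefficient for each one-unit increase in the Cars quantity inside the product. A minimal Python sketch of both ideas using the printed estimates. Whether JMP centered the variables inside the product is not shown in this output, so the hypothetical helper `speed_slope` takes that Cars quantity as given:

```python
# Estimates and standard errors copied from the Parameter Estimates table above.
estimates = {
    "Intercept": (7.1405117, 0.163638),
    "Cars": (0.4158119, 0.136049),
    "Speed": (0.0644162, 0.118519),
    "Cars*Speed": (1.0763228, 0.087791),
}
t_ratios = {term: est / se for term, (est, se) in estimates.items()}

# Rough two-sided test at the .05 level: |t| > 2 (the exact t critical
# value with 56 error df is close to 2).
interaction_significant = abs(t_ratios["Cars*Speed"]) > 2

def speed_slope(cars_term):
    """Change in predicted Accidents per 1 MPH of Speed when the Cars quantity
    that enters the Cars*Speed product equals cars_term (hypothetical helper)."""
    b_speed = estimates["Speed"][0]
    b_interaction = estimates["Cars*Speed"][0]
    return b_speed + b_interaction * cars_term

for term, t in t_ratios.items():
    print(f"{term:>10}: t = {t:6.2f}")
print("interaction significant at .05:", interaction_significant)
```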

(c) The alderman proposes decreasing the speed limit by 5 MPH. The number of cars on the road is higher on average on weekdays than on weekends. Assuming that the average number of cars will not be changed by decreasing the speed limit and that there are no confounding variables, would you expect the decrease in the speed limit to have a larger impact on the number of accidents during the weekends or during the weekdays?

3. Car designers have been experimenting with ways to improve gas mileage for many years. An important element in this research is the way in which a car’s speed affects how quickly fuel is burned. Competitions whose objective is to drive the farthest on the smallest amount of gas have determined that low speeds and high speeds are inefficient. Designers would like to know which speed burns gas most efficiently. As an experiment, 50 identical cars are driven at different speeds and the gas mileage is measured.

(a) JMP output from a simple linear regression model of Mileage on Speed is shown below. Comment on the regression diagnostics: the residual plot, the histogram of the residuals and the boxplot of the Cook’s distances. If you see any problems, suggest what you would do next in the analysis to try to address those problems.

Bivariate Fit of Mileage By Speed
[Figure: Mileage vs. Speed scatterplot with the Linear Fit]

Linear Fit
Mileage = 23.266776 - 0.0012701 Speed


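Summary of Fit
RSquare                        0.000028
RSquare Adj                   -0.02081
Root Mean Square Error         7.102586
Mean of Response               23.202
Observations (or Sum Wgts)     50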


Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1           0.0672        0.0672    0.0013     0.9710
Error      48        2421.4426       50.4467
C. Total   49        2421.5098


Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   23.266776   2.039431     11.41     <.0001
Speed       -0.00127    0.034802     -0.04     0.9710

[Figure: Residual plot for the Linear Fit, Residual vs. Speed]
[Figure: Distributions, histogram of Residual Mileage]
[Figure: Distributions, boxplot of Cook's D Influence Mileage]
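It can help to see what a straight-line fit does to a curved relationship. A minimal Python sketch on hypothetical, noise-free inverted-U data (made up for illustration, not the experiment's measurements): the fitted slope comes out near zero and the residuals are negative at both ends and positive in the middle. The sketch also computes the simple-regression leverages h_i = 1/n + (x_i - xbar)^2 / Sxx and Cook's distances D_i = e_i^2 h_i / (p MSE (1 - h_i)^2), with p = 2 coefficients:

```python
def simple_fit_diagnostics(x, y):
    """Slope, residuals, leverages and Cook's distances for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - 2)
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    p = 2  # estimated coefficients: intercept and slope
    cooks = [e ** 2 * h / (p * mse * (1 - h) ** 2) for e, h in zip(resid, lev)]
    return b1, resid, lev, cooks

# Hypothetical noise-free inverted-U data (NOT the experiment's data).
speeds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mileage = [9 + 0.8 * s - 0.008 * s ** 2 for s in speeds]

slope, resid, lev, cooks = simple_fit_diagnostics(speeds, mileage)
signs = "".join("+" if e > 0 else "-" for e in resid)
print(f"slope = {slope:.3f}; residual signs by speed: {signs}")
print("Cook's distances:", [round(d, 2) for d in cooks])
```

A systematic sign pattern like this is the usual cue to add curvature to the model, for example a squared term.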

(b) JMP output for a quadratic regression of mileage on speed and speed squared is shown below. Is there strong evidence that the quadratic regression provides better predictions of mileage based on speed than the simple linear regression? Justify your answer using a test at the .05 level.

Response Mileage
Summary of Fit
RSquare                        0.710249
RSquare Adj                    0.697919
Root Mean Square Error         3.863732
Mean of Response               23.202
Observations (or Sum Wgts)     50

Parameter Estimates
Term            Estimate    Std Error   t Ratio   Prob>|t|
Intercept       9.3413673   1.70707       5.47     <.0001
Speed           0.8021188   0.077207     10.39     <.0001
Speed Squared  -0.007876    0.000734    -10.73     <.0001

Whole Model
[Figure: Actual by Predicted Plot, Mileage Actual vs. Mileage Predicted (P<.0001, RSq=0.71, RMSE=3.8637)]




Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       2        1719.8740       859.937   57.6040     <.0001
Error      47         701.6358        14.928
C. Total   49        2421.5098


[Figure: Residual by Predicted Plot, Mileage Residual vs. Mileage Predicted]

[Figure: Speed Leverage Plot, Mileage Leverage Residuals vs. Speed Leverage (P<.0001)]
[Figure: Speed Squared Leverage Plot, Mileage Leverage Residuals vs. Speed Squared Leverage (P<.0001)]
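Part (b) compares two nested models, so one natural test is the partial F test: F = [(SSE_linear - SSE_quadratic) / (number of added terms)] / MSE_quadratic, compared with the F(1, 47) critical value at the .05 level (roughly 4.05). With a single added term this F equals the square of that term's t ratio. A minimal Python sketch using the error sums of squares printed in the two ANOVA tables:

```python
# Error sums of squares and error df from the two ANOVA tables for Q3.
sse_linear, df_linear = 2421.4426, 48  # Mileage ~ Speed
sse_quad, df_quad = 701.6358, 47       # Mileage ~ Speed + Speed Squared

extra_terms = df_linear - df_quad       # one added term: Speed Squared
mse_quad = sse_quad / df_quad
partial_f = ((sse_linear - sse_quad) / extra_terms) / mse_quad

# With a single added term, the partial F equals the squared t ratio of that
# term (up to rounding in the printed estimate and standard error).
t_speed_squared = -0.007876 / 0.000734

print(f"partial F = {partial_f:.1f}, t^2 = {t_speed_squared ** 2:.1f}")
```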

(c) Suppose you are low on gas. At which speed does the quadratic regression model suggest it is best to drive: 20 MPH, 50 MPH or 70 MPH? Justify your answer.
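The comparison in (c) comes down to evaluating the fitted quadratic at each candidate speed; because the Speed Squared coefficient is negative, the fitted curve also has a single peak at -b1 / (2 b2). A minimal Python sketch, assuming "Speed Squared" is the raw (uncentered) square of Speed, as its name suggests:

```python
# Coefficients copied from the quadratic fit's Parameter Estimates table.
B0, B_SPEED, B_SPEED_SQ = 9.3413673, 0.8021188, -0.007876

def predicted_mileage(speed):
    """Fitted mean mileage, assuming Speed Squared is the uncentered Speed**2."""
    return B0 + B_SPEED * speed + B_SPEED_SQ * speed ** 2

candidates = [20, 50, 70]
best_speed = max(candidates, key=predicted_mileage)

# The fitted parabola opens downward (B_SPEED_SQ < 0), so it peaks at -b1 / (2 b2).
peak_speed = -B_SPEED / (2 * B_SPEED_SQ)

for s in candidates:
    print(f"{s} MPH: predicted mileage {predicted_mileage(s):.2f}")
print(f"best of the candidates: {best_speed} MPH; fitted peak near {peak_speed:.1f} MPH")
```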
