For this homework, you must use multiple regression to determine what influences an outcome of your choice. Your overarching goals are:
1) to find a predictor of interest, to demonstrate that it is associated with your outcome of choice, and to describe that association.
2) to propose two potential explanations for the association and to generate evidence that will help you determine whether the potential explanations are supported by the data.
For this homework, you can use either the 2016 General Social Survey or a dataset of your choice.
• If you choose to work with the GSS, be aware that some of the narrower survey items are not asked of all respondents. As a result, those variables will have a large number of missing observations. To complete this homework, a sufficient number of respondents (more than about 100) must have valid, non-missing data on all the variables you plan to use. You must also estimate all of your regressions on the same sample. This means excluding from every regression model any respondent who has a missing value on any of the variables included in your four nested models.
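The same-sample rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names and values are invented placeholders rather than actual GSS data, `None` stands in for a missing response, and `complete_cases` is a hypothetical helper. Your own statistical software will have its own way of doing this.

```python
# Invented example records (NOT real GSS data); None marks a missing value.
respondents = [
    {"id": 1, "income": 42000, "female": 1, "age": 34, "ability": 7, "children": 2},
    {"id": 2, "income": 55000, "female": 0, "age": 51, "ability": None, "children": 0},
    {"id": 3, "income": None, "female": 1, "age": 29, "ability": 6, "children": 1},
    {"id": 4, "income": 38000, "female": 0, "age": 45, "ability": 5, "children": 3},
]

# List every variable that appears in ANY of the four models,
# not just the variables in the model you are currently estimating.
model_vars = ["income", "female", "age", "ability", "children"]


def complete_cases(rows, variables):
    """Keep only the rows with a non-missing value on every listed variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


analytic_sample = complete_cases(respondents, model_vars)
kept_ids = [r["id"] for r in analytic_sample]
print("respondents kept:", len(analytic_sample), "ids:", kept_ids)
```

The point of building one analytic sample first is that every model is then estimated on exactly the same respondents, so a change in a coefficient between models cannot be caused by a change in who is included. It also lets you check the sample size against the minimum before you run anything.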
Association between two variables
You must identify two variables (one outcome, one predictor) and estimate their association using regression analysis. Please choose something different from your last assignment, but the idea here is the same. For example, if you are interested in gender inequality in labor markets, you could look at the association between gender (predictor) and income (outcome). If you are interested in what explains individual prejudice, you could examine how scores on one of the GSS measures of racial attitudes (outcome) vary by educational attainment (predictor).
You must propose TWO potential explanations for the association you observed in the previous step. For example, say you are interested in gender inequality in the labor market, and you observe that women report a (statistically significantly) lower income than men in your data. Here are two potential explanations for women having lower incomes on average than men:
1) women are less competent than men and income levels reflect competence
2) women are the primary caregivers for young children and, as a result, miss work more often than men; missing work leads to lower income
The GSS has some measures of cognitive ability. You could potentially test the first explanation by controlling for these variables in your model and determining whether doing so reduces or eliminates the original association between gender and income. You could test the second explanation in a variety of ways, including (for example) an interaction term examining whether the effect of having children on income differs by gender. Again, remember that the point here is not to propose the one correct explanation, but to generate hypotheses that can be tested using regression analysis.
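The logic of these two tests can be sketched with a tiny example in plain Python. Everything here is an assumption made for illustration: the data are invented, noise-free toy numbers (not GSS values), the variable names are placeholders, and `ols` is a small hypothetical helper that only returns coefficients. In your own work you would use your statistical software, which also reports standard errors and significance tests.

```python
from itertools import product


def ols(X, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Invented toy data: every combination of female (0/1),
# ability score (4 or 6), and number of children (0 or 2).
rows = list(product([0, 1], [4, 6], [0, 2]))
# Income in thousands, constructed so that children lower income more for women.
income = [30 + 2 * a - 1 * k - 2 * f * k for f, a, k in rows]

# Baseline: income regressed on gender only.
b_base = ols([[1, f] for f, a, k in rows], income)
# Explanation 1: add the ability measure as a control.
b_ctrl = ols([[1, f, a] for f, a, k in rows], income)
# Explanation 2: add children and a gender-by-children interaction.
b_int = ols([[1, f, a, k, f * k] for f, a, k in rows], income)

print("gender coefficient, bivariate:     ", round(b_base[1], 3))
print("gender coefficient, with ability:  ", round(b_ctrl[1], 3))
print("gender coefficient, with children: ", round(b_int[1], 3))
print("gender x children interaction:     ", round(b_int[4], 3))
```

In this constructed example the gender coefficient stays at about -2 after ability is added, so the first explanation would not be supported. In the interaction model, the interaction coefficient is about -2 and the gender coefficient (the gap among respondents with no children) is about 0, which is the pattern the second explanation predicts. Real data will be noisier, so you would also need to look at the standard errors.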
Overall, you will be estimating at least four regression models: a bivariate model, a model that adds control variables, a model that adds measures of your first explanation, and a model that adds measures of your second explanation. In an appendix, you must also provide additional information on model specification and assumptions. More detailed requirements are provided below.
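As a rough guide, the four models could be written as follows. The notation is not part of the assignment and is introduced here only for illustration: Y is your outcome, X your predictor of interest, C your control variables, and E1 and E2 the measures for your first and second explanations. The sketch assumes each model keeps everything from the previous one, which is one reading of "nested"; an explanation tested with an interaction would add a product term to the relevant model.

```latex
\begin{align*}
\text{Model 1:}\quad Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i \\
\text{Model 2:}\quad Y_i &= \beta_0 + \beta_1 X_i + \boldsymbol{\gamma}^{\top}\mathbf{C}_i + \varepsilon_i \\
\text{Model 3:}\quad Y_i &= \beta_0 + \beta_1 X_i + \boldsymbol{\gamma}^{\top}\mathbf{C}_i
  + \boldsymbol{\delta}_1^{\top}\mathbf{E}_{1i} + \varepsilon_i \\
\text{Model 4:}\quad Y_i &= \beta_0 + \beta_1 X_i + \boldsymbol{\gamma}^{\top}\mathbf{C}_i
  + \boldsymbol{\delta}_1^{\top}\mathbf{E}_{1i} + \boldsymbol{\delta}_2^{\top}\mathbf{E}_{2i} + \varepsilon_i
\end{align*}
```

The quantity to track across the four models is the coefficient on X: if it shrinks when the measures for an explanation enter the model, that is evidence consistent with the explanation.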