Your task is to build a prediction model for the percentage of unhealthy liver cells in patients with hepatitis C infection. Your clinical investigator is seeking your advice on the best way to use the various hepatitis C measures and on identifying other possible predictors of steatohepatitis. To this end, first follow the model-building steps below, in this order:
1. Investigate the individual association between each variable and the percentage of unhealthy liver cells to identify which variables should be included in a multivariable model, whether any transformations are necessary, and whether there are any issues of non-linearity.
2. Create an initial multivariable regression model with the percentage of unhealthy liver cells as the outcome, including all possible predictors identified in step 1.
3. Investigate possible collinearity in this model and deal with it appropriately.
4. Refine the multivariable model as necessary to exclude terms not associated with the outcome.
5. Check the assumptions of your final model and make any adjustments as necessary. As part of the validation process, provide at least three plots that you consider the most useful.
Note that, for the purpose of this exercise, there is no need to investigate interactions.
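One common way to approach step 3 is to compute a variance inflation factor (VIF) for each predictor, that is, 1 / (1 - R²) from regressing that predictor on all the others. The sketch below is illustrative only and is not a prescribed method: the data are randomly generated, the column names (`rna`, `infected`, `tcells`) are placeholders that will not necessarily match the data set, and the helper functions are hypothetical. In practice a statistics package would normally be used.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column).
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

def r_squared(X, y):
    """Proportion of variance in y explained by the least-squares fit on X."""
    b = ols_fit(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF per predictor: regress each one on the remaining predictors."""
    n = len(next(iter(columns.values())))
    out = {}
    for name, target in columns.items():
        others = [v for k, v in columns.items() if k != name]
        X = [[1.0] + [col[i] for col in others] for i in range(n)]
        out[name] = 1.0 / (1.0 - r_squared(X, target))
    return out

# Made-up data (assumption): 'infected' is constructed to track 'rna' closely,
# while 'tcells' is generated independently of both.
random.seed(0)
n = 200
rna = [random.gauss(0, 1) for _ in range(n)]
infected = [r + random.gauss(0, 0.2) for r in rna]
tcells = [random.gauss(0, 1) for _ in range(n)]

vifs = vif({"rna": rna, "infected": infected, "tcells": tcells})
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.1f}")
```

In this artificial example the two correlated predictors receive large VIFs and the independent one stays near 1. Thresholds such as 5 or 10 are rules of thumb rather than strict cut-offs, and how to respond (dropping a variable, combining measures, and so on) remains a judgment call for the analyst.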
Once you have completed this analysis, write a summary of your findings for the clinical collaborator that includes the following:
- A description and explanation of any issues that arose during the model building process
- A summary of the relevant findings including P-values, interpretation of regression coefficients, confidence intervals, and an equation that could be used to predict the percentage of unhealthy liver cells in other patients (only for the final model)
- Some specific advice on which measure of hepatitis C infection (rna, t-cells, or cells infected with hepatitis C) is most useful and why.
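
As a rough illustration of how a coefficient, its confidence interval, a P-value, and a prediction equation fit together, the sketch below fits a one-predictor regression to made-up data. Everything in it is an assumption for illustration: the data are simulated, `predictor` and `unhealthy_pct` are placeholder names, and the interval and P-value use a large-sample normal approximation where a t distribution with n - 2 degrees of freedom would normally be used. The final model in the report will have several predictors, but each coefficient is reported in the same way.

```python
import math
import random
from statistics import NormalDist

def simple_ols(x, y, level=0.95):
    """Fit y = intercept + slope * x by least squares and summarise the slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual variance and the standard error of the slope
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ss_res / (n - 2) / sxx)
    # Normal approximation (assumption); a t quantile is the usual choice
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return {
        "intercept": intercept,
        "slope": slope,
        "ci": (slope - z * se, slope + z * se),
        "p": p_value,
    }

# Simulated data (assumption): outcome rises by about 2 points per unit of predictor
random.seed(1)
x = [random.uniform(0, 10) for _ in range(150)]
y = [20 + 2 * xi + random.gauss(0, 3) for xi in x]

fit = simple_ols(x, y)
equation = f"unhealthy_pct = {fit['intercept']:.2f} + {fit['slope']:.2f} * predictor"
print(equation)
print(f"slope 95% CI: ({fit['ci'][0]:.2f}, {fit['ci'][1]:.2f}), P = {fit['p']:.3g}")
```

The slope is read as the expected change in the percentage of unhealthy liver cells for a one-unit increase in the predictor. In a multivariable model the same reading applies with the other predictors held fixed, and if the outcome or a predictor has been transformed the interpretation must be stated on the transformed scale.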