






Module 4 case study – 25 marks


Data used for this case study have been simulated but the problem considered here is very close to a study conducted by researchers at Sydney University.



Subject matter background:


Steatohepatitis is a liver disease in which harmful fat accumulates in liver cells and can cause long-term damage and scarring of the liver. Steatohepatitis is measured with a liver biopsy that estimates the percentage of unhealthy liver cells. This study is interested in identifying predictors of steatohepatitis outcomes. One such predictor is the severity of hepatitis C infection, which can be measured in three ways: the amount of hepatitis C virus RNA in the blood; T-cell levels in the blood; and the number of cells in the blood infected with hepatitis C virus. All measures are relatively unobtrusive as they only require a blood sample. However, there are significant laboratory costs associated with measuring the number of cells infected with hepatitis C virus.


Data has been collected on 238 patients with an active hepatitis C infection, and is recorded in the dataset HCV.dta. The variables in this dataset are:



  • idnum: Patient ID number

  • unhealthy: Percentage of unhealthy liver cells estimated from a biopsy (outcome)

  • bmi: Body mass index (kg/m²), weight divided by the square of height

  • age: Age of the patient in years

  • alcohol: Alcohol consumption (estimated standard drinks per week)

  • diabetes: Presence of diabetes (0 – absent, 1 – present), binary indicator

  • rna: Level of hepatitis C RNA in the blood (copies per mL)

  • tcell: Level of hepatitis C specific T-cells in the blood (per million T cells)

  • infected: Number of cells infected with hepatitis C in the blood (per million cells)









Exercise:


Your task is to build a prediction model for the outcome percentage of unhealthy liver cells for patients with hepatitis C infection. Your clinical investigator is seeking your advice on the best way to use the various hepatitis C measures along with identifying other possible predictors of Steatohepatitis. To this end, you should first follow the model building steps below (in this order):


1. Investigate the individual associations between each variable and the percentage of unhealthy liver cells to identify which variables should be included in a multivariable model, whether any transformations are necessary, and whether there are any possible issues of non-linearity.


2. Create an initial multivariable regression model with the percentage of unhealthy liver cells as the outcome and including all possible predictors identified in part one.


3. Investigate possible collinearity in this model and deal with it appropriately.


4. Refine the multivariable model as necessary to exclude terms not associated with the outcome.










5. Check the assumptions of your final model and make any adjustments as necessary. As part of the validation process, you will provide at least 3 plots that are the most useful, in your opinion.






Note that, for the purpose of this exercise, there is no need to investigate interactions.
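
The collinearity check in step 3 is commonly done with variance inflation factors (VIFs). The following is only a minimal sketch under stated assumptions, not a prescribed method for this exercise: the three columns are hypothetical toy numbers (not taken from HCV.dta), and the helper names `solve`, `r_squared` and `vif` are made up for illustration. Each predictor is regressed on the others by ordinary least squares, and its VIF is 1 / (1 - R²).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 from a least squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """Variance inflation factor, 1 / (1 - R_j^2), for each named predictor."""
    result = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        unexplained = 1.0 - r_squared(col, others)
        result[name] = float("inf") if unexplained < 1e-12 else 1.0 / unexplained
    return result


# Hypothetical toy values (NOT from HCV.dta): 'infected' is built to be
# close to rna + tcell, so all three columns are strongly collinear.
toy = {
    "rna": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "tcell": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "infected": [3.1, 2.9, 7.2, 6.8, 11.1, 10.9],
}
for name, value in vif(toy).items():
    print(f"{name}: VIF = {value:.1f}")
```

A VIF near 1 means a predictor is nearly unrelated to the others. Values above roughly 5 to 10 are often treated as a warning sign, although that threshold is a rule of thumb rather than a fixed criterion.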


Once you have completed this analysis, write a summary of your findings for the clinical collaborator that includes the following:



  • A description and explanation of any issues that arose during the model building process

  • A summary of the relevant findings including P-values, interpretation of regression coefficients, confidence intervals, and an equation that could be used to predict the percentage of unhealthy liver cells in other patients (only for the final model)

  • Some specific advice on which measure of hepatitis C infection (rna, t-cells, or cells infected with hepatitis C) is most useful and why.


Answered Same Day, Jun 01, 2021


Rajeswari answered on Jun 04 2021
59449 Assignment
The main purpose of this assignment is to analyse steatohepatitis, a liver disease, in patients with hepatitis C, along with its causes and other relevant factors. For this purpose, a data set of 238 patients was collected and recorded. The data set is reasonably large, and we assume it was drawn at random.
Steatohepatitis is a harmful liver disease in which fat accumulates in liver cells and may cause long-term damage to the liver. It is measured with a liver biopsy, in which the unhealthy liver cells are identified and their percentage is estimated. This study aims to identify predictors of steatohepatitis outcomes. One such predictor is the severity of hepatitis C infection, measured by the RNA level, the T-cell level, or the number of infected cells in the blood. All these measures are relatively unobtrusive as they only require a blood sample.
Data have been collected on 238 patients. The enclosed Excel sheet contains the data set, with the variables labelled column by column.
1) We investigated the association between each variable and the percentage of unhealthy liver cells in order to identify which variables should be included in a multivariable model, whether any transformations are necessary, and whether there are any possible issues of non-linearity.
A natural first check for a linear relationship is the correlation. The correlation coefficient r measures both the strength and the sign of a linear association: if r is positive the relationship is positive; if |r| is close to 1 the correlation is strong, and if |r| is close to 0 it is weak.
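
As a minimal sketch of this check, the correlation coefficient can be computed directly from its definition. The function names `pearson_r` and `describe` are made up for illustration, and the cut-offs used for the strength labels (0.1, 0.3, 0.5) are an assumption chosen to be consistent with the labels reported in this answer, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Return (strength, sign) for r using illustrative cut-offs (an assumption)."""
    size = abs(r)
    if size < 0.1:
        strength = "very weak"
    elif size < 0.3:
        strength = "weak"
    elif size < 0.5:
        strength = "moderately weak"
    else:
        strength = "moderate to strong"
    sign = "positive" if r > 0 else "negative" if r < 0 else "zero"
    return strength, sign


# Hypothetical toy values (NOT from the patient data), only to show the call.
alcohol = [0.0, 2.0, 5.0, 8.0, 12.0, 20.0]
unhealthy = [5.0, 9.0, 7.0, 15.0, 14.0, 22.0]
r = pearson_r(alcohol, unhealthy)
print(f"toy example: r = {r:.3f}", describe(r))
```

Note that r only captures linear association, so a scatter plot is still needed to spot curved relationships that r would understate.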
We computed the correlation of each variable with the outcome, unhealthy (the last column of the data set), and obtained the following results:
    Variable    Correlation with unhealthy    Strength           Sign
    bmi         0.220167                      weak               positive
    age         0.008529                      very weak          positive
    alcohol     0.41989                       moderately weak    positive
    diabetes    0.21962                       weak               positive
    infected    0.260333                      weak               positive
    tcell       0.259787                      weak               positive
    rna         0.1734                        weak               positive
Apart from alcohol, every variable shows only a weak correlation with unhealthy. The correlation for age is almost 0, which suggests that there is little or no linear relationship between age and the percentage of unhealthy liver cells.
We can therefore remove age as a predictor variable from the multiple regression model. Next, the scatter plot shows that infected in its raw form does not have a linear relationship with unhealthy. But when we take the log of infected, it is a better predictor, as a linear relationship is shown by scatter...
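
The comparison between raw and log-transformed infected described above can be sketched as follows. This is only an illustration under stated assumptions: the numbers are hypothetical toy values (not the patient data), constructed so that the outcome rises by a fixed amount each time the predictor is multiplied by 10, and `pearson_r` is a helper written for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values (NOT from the patient data).
infected = [1.0, 10.0, 100.0, 1000.0, 10000.0]
unhealthy = [2.0, 4.0, 6.0, 8.0, 10.0]

# The natural log is only defined for strictly positive values, so any
# zero counts would need separate handling before transforming.
log_infected = [math.log(v) for v in infected]

r_raw = pearson_r(infected, unhealthy)
r_log = pearson_r(log_infected, unhealthy)
print(f"raw infected: r = {r_raw:.3f}")
print(f"log infected: r = {r_log:.3f}")
```

In this toy case the log scale gives a much stronger linear correlation than the raw scale, which is the pattern one would look for in the scatter plots before deciding to transform a skewed predictor.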