


Foundations of Data Analytics INFO 561-01



PROJECT 3



Objective: Show that you can fit a logistic regression model and interpret the results you obtain, such as the coefficient estimates, AIC, BIC, odds ratios, Whole Model Test table, confusion matrix, and ROC curve.


Please use the “lung_cancer” dataset to fit a logistic regression model, and follow the template below:


1- Fit a logistic regression model. State what your target class is (survived or dead).


2- State whether the model is useful or not, and justify your answer.


3- Interpret the effect of the age variable on the odds ratio (age only).


4- Calculate the accuracy, sensitivity, specificity, and AUC of the model that you obtain. (Remember: an outcome is positive when the target class is observed. For example, if a patient died and your target is "dead", then that death is a positive outcome. So a true positive (TP) is a patient who died and was classified as dead.)


5- Remove some of the unimportant variables (as shown in the Effect Summary) and compare the resulting models by their AUC values. Also report AIC and BIC for each model.


6- After deciding on the best model (justify your choice), write down the logistic regression equation using its coefficient estimates.
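The metrics in step 4 can be checked by hand or with a short script. The following is a minimal sketch in plain Python, assuming "dead" is the target (positive) class and a 0.5 classification cutoff. The function names and the eight toy observations are made up for illustration; they are not taken from the lung_cancer dataset or from any software output.

```python
def confusion_counts(actual, predicted, positive="dead"):
    """Count TP, FP, TN, FN, treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # target observed and predicted
            else:
                fp += 1  # target predicted but not observed
        else:
            if a == positive:
                fn += 1  # target observed but missed
            else:
                tn += 1  # non-target correctly identified
    return tp, fp, tn, fn


def classification_metrics(actual, predicted, positive="dead"):
    """Accuracy, sensitivity, and specificity from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(actual, prob_positive, positive="dead"):
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher predicted probability; ties count as one half."""
    pos = [s for a, s in zip(actual, prob_positive) if a == positive]
    neg = [s for a, s in zip(actual, prob_positive) if a != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: observed outcome and predicted P(dead).
actual = ["dead", "dead", "dead", "alive", "alive", "alive", "alive", "dead"]
prob_dead = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Assumed cutoff of 0.5 for turning probabilities into classes.
predicted = ["dead" if p >= 0.5 else "alive" for p in prob_dead]

metrics = classification_metrics(actual, predicted)
model_auc = auc(actual, prob_dead)
print(metrics, model_auc)
```

If the target class were "survived" instead, the roles of TP and TN would swap, so sensitivity and specificity would trade places while accuracy and AUC stay the same.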



Data:


The dataset consists of 1000 observations and 7 variables. There is one dependent variable (output1) indicating whether a lung cancer patient survived more than 5 years or not. This is a real dataset obtained under a non-disclosure agreement, so it may not be distributed; please be understanding. Brief definitions of the variables are given below.


Definition of variables


Variable name | Definition

"AGE_DX" (numeric) | Age of patient

"EOD10_SZ" (numeric) | Tumor size

"EOD10_EX" (numeric) | This item codes the farthest documented extension of tumor away from the primary site, either by contiguous extension or distant metastases. Allowable values = 00-99.

"EOD10_ND" (numeric) | This item records the highest specific lymph node chain that is involved by the tumor.

"EOD10_PN" (numeric) | Records the exact number of regional lymph nodes examined by the pathologist that were found to contain metastases.

"HST_STGA" (numeric) | Cancer stage (please act as if it is a numeric variable)

Output (binary) | Categorical: Alive, or dead (5-year survivability)
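

With the six predictors defined above, the equation asked for in step 6 would take the general form sketched below if all variables are kept. Here p is assumed to denote the probability of whichever target class you chose, and the symbols \beta_0 to \beta_6, \hat{L}, k, and n are notation introduced for this illustration; the numeric values come from your own fitted model. The sketch also gives the standard odds ratio for a one-year increase in age (step 3) and the usual definitions of AIC and BIC (step 5). Some software reports a small-sample corrected AICc instead of AIC, so check which one your output shows.

```latex
% Logit form of the full model, with p = P(Output = target class)
\ln\!\left(\frac{p}{1-p}\right)
  = \beta_0
  + \beta_1\,\mathrm{AGE\_DX}
  + \beta_2\,\mathrm{EOD10\_SZ}
  + \beta_3\,\mathrm{EOD10\_EX}
  + \beta_4\,\mathrm{EOD10\_ND}
  + \beta_5\,\mathrm{EOD10\_PN}
  + \beta_6\,\mathrm{HST\_STGA}

% Odds ratio for a one-unit (one-year) increase in AGE_DX,
% holding the other variables fixed
\mathrm{OR}_{\mathrm{AGE\_DX}} = e^{\beta_1}

% Information criteria: \hat{L} is the maximized likelihood,
% k the number of estimated parameters, n the number of observations
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

An odds ratio above 1 means the odds of the target class rise with age, and a value below 1 means they fall. For both AIC and BIC, lower values indicate a better trade-off between fit and complexity. A reduced model from step 5 has the same form with the dropped terms removed.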



As always, please hand in a hard copy of your report (no more than 1 page).

Oct 07, 2021