1. Clean the Excel data set using RStudio.
- Utilize the forcats package for reducing categories.

- Utilize the mice package for imputing missing values.
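
The imputation step could look roughly like the sketch below. It is only an illustration: the toy data frame `toy` and its columns `x`, `y`, `z` are hypothetical stand-ins for the real columns, and predictive mean matching (`method = "pmm"`) with five imputations is an assumed choice, not something the assignment specifies.

```r
library(mice)

# Hypothetical toy data: three numeric columns, two of them with gaps.
set.seed(1)
n <- 50
toy <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))
toy$x[sample(n, 8)] <- NA
toy$y[sample(n, 8)] <- NA

# Build 5 imputed data sets with predictive mean matching (assumed method).
imp <- mice(toy, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Extract the first completed data set for downstream modelling.
completed <- mice::complete(imp, 1)

# Count remaining missing values per column.
colSums(is.na(completed))
```

On the real data, categorical columns would need to be converted to factors first so that mice can pick a suitable method for them.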






The HOR_state and UnitST state columns need to be reduced to four regions: West, Midwest, South, and Northeast. The Branch categories need to be reduced to Unrestricted and Restricted:


Unrestricted - Armor, Air Defense Artillery, Ammunition, Aviation, Field Artillery, Infantry, Logistics, Mechanical Maintenance, Military Police, Special Forces.


Restricted - Adjacent General, Army Medical Specialist Corps, Army Nurse Corps, Behavioral Sciences, CBRN, Chaplain, Civil Affairs, CMF Immaterial, Corps of Engineers, Cyber, Dental Corps, Electronic Maintenance, Financial Management, Force Management, Health Services, Information Operations, Information Systems Engineer, Judge Advocate Generals Corps, Laboratory Sciences, Medical Corps, Military Intelligence, Nuclear & Counterproliferation, Operations Research/Systems Analysis, Personnel Special Reporting Codes, Preventative Medical Sciences, Psychological Operations, Public Affairs, Quartermaster Corps, Recruitment & Reenlistment, Research/Development/Acquisition, Signal Corps, Simulations Operations, Space Operations, Strategist Intelligence, Strategist, Systems Automation Officer, Telecommunications Systems Engineers, Transportation Corps, Veterinary Corps.
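
A minimal sketch of both reductions follows. It assumes the state columns hold two-letter abbreviations and that Branch values are spelled exactly as in the lists above; the toy data frame `toy`, the lookup `region_lookup`, and the helper `to_region` are hypothetical names. For brevity the state step uses a plain lookup built from base R's `state.abb` and `state.region` and only the Branch step uses `fct_collapse`; that split is a choice made for this sketch.

```r
library(forcats)

# Hypothetical toy data; the real data set has 34 columns.
toy <- data.frame(
  UnitST = c("TX", "CA", "NY", "OH", "GA"),
  Branch = c("Infantry", "Cyber", "Aviation", "Chaplain", "Armor"),
  stringsAsFactors = FALSE
)

# Lookup from state abbreviation to region, built from base R data sets.
# Base R calls the Midwest "North Central", so rename it.
region_lookup <- setNames(as.character(state.region), state.abb)
region_lookup[region_lookup == "North Central"] <- "Midwest"

# Values outside the 50 states (e.g. DC, territories) become NA here.
to_region <- function(st) {
  factor(unname(region_lookup[st]),
         levels = c("West", "Midwest", "South", "Northeast"))
}
toy$UnitST <- to_region(toy$UnitST)

unrestricted <- c("Armor", "Air Defense Artillery", "Ammunition", "Aviation",
                  "Field Artillery", "Infantry", "Logistics",
                  "Mechanical Maintenance", "Military Police", "Special Forces")

# Collapse the unrestricted branches into one level; every other level
# falls into "Restricted". intersect() avoids warnings about absent levels.
branch <- factor(toy$Branch)
toy$Branch <- fct_collapse(
  branch,
  Unrestricted = intersect(unrestricted, levels(branch)),
  other_level = "Restricted"
)

table(toy$UnitST)
table(toy$Branch)
```

The same `to_region` helper could be applied to HOR_state in the same way.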










2. After the data has been cleaned, fit a random forest model with the unvac_pop column as the response variable and the Branch and UnitST columns as predictors, using the supporting material Word document as guidance.
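
The supporting Word document is not reproduced here, so the following is only a sketch of one common way to fit such a model, using the randomForest package. It assumes unvac_pop is a binary 0/1 indicator; the simulated data frame `toy` and the `ntree` value are hypothetical.

```r
library(randomForest)

# Simulated stand-in for the cleaned data (hypothetical values).
set.seed(42)
n <- 200
toy <- data.frame(
  Branch = factor(sample(c("Unrestricted", "Restricted"), n, replace = TRUE)),
  UnitST = factor(sample(c("West", "Midwest", "South", "Northeast"),
                         n, replace = TRUE))
)

# Assumed binary response; stored as a factor so randomForest classifies.
p <- ifelse(toy$Branch == "Restricted", 0.6, 0.3)
toy$unvac_pop <- factor(rbinom(n, 1, p))

# Fit the forest with Branch and UnitST as the only predictors.
rf_fit <- randomForest(unvac_pop ~ Branch + UnitST, data = toy, ntree = 200)

print(rf_fit)       # out-of-bag error and confusion matrix
importance(rf_fit)  # mean decrease in Gini per predictor
```

If unvac_pop turns out to be numeric rather than binary, leaving it as a numeric column makes randomForest fit a regression forest instead.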








3. Then estimate the AUC value of the random forest model using the supporting material Word document.
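
One way to estimate the AUC is sketched below with the pROC package, which may differ from the approach in the supporting document. The simulated data, the 70/30 train/test split, and the assumption that unvac_pop is coded 0/1 are all hypothetical.

```r
library(randomForest)
library(pROC)

# Simulated stand-in for the cleaned data (hypothetical values).
set.seed(42)
n <- 400
toy <- data.frame(
  Branch = factor(sample(c("Unrestricted", "Restricted"), n, replace = TRUE)),
  UnitST = factor(sample(c("West", "Midwest", "South", "Northeast"),
                         n, replace = TRUE))
)
p <- ifelse(toy$Branch == "Restricted", 0.6, 0.3)
toy$unvac_pop <- factor(rbinom(n, 1, p))

# Hold out 30% of the rows so the AUC is not computed on training data.
train_idx <- sample(n, round(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

rf_fit <- randomForest(unvac_pop ~ Branch + UnitST, data = train, ntree = 200)

# Predicted probability of class "1" for each held-out row.
prob_1 <- predict(rf_fit, newdata = test, type = "prob")[, "1"]

# ROC curve with "0" as the control level and "1" as the case level.
roc_obj <- roc(response = test$unvac_pop, predictor = prob_1,
               levels = c("0", "1"), direction = "<", quiet = TRUE)
auc_value <- as.numeric(auc(roc_obj))
auc_value
```

An AUC near 0.5 would indicate that Branch and UnitST carry little information about unvac_pop, while values closer to 1 indicate better separation.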
Answered 2 days after Nov 08, 2022


Mohd answered on Nov 11 2022
2022-11-11
Importing the dataset
library(readxl)

# Read the workbook with an explicit type for each of the 34 columns.
data <- read_excel(
  "data.xlsx",
  col_types = c(
    "text", "text", "text", "text", "date", "date",
    "numeric", "numeric", "text", "text",
    "numeric", "text", "text", "text", "text",
    "text", "text", "numeric", "numeric",
    "text", "text", "text", "text", "date",
    "text", "text", "text", "text", "text",
    "numeric", "text", "text", "text", "text"
  )
)
First look at the data
skimr::skim(data)
Data summary
    Name                       data
    Number of rows             176485
    Number of columns          34
    _______________________
    Column type frequency:
      character                25
      numeric                  6
      POSIXct                  3
    ________________________
    Group variables            None
Variable type:...