


Datasets: There are two datasets for this exercise: 1) “FlightInfo_scheduled.csv” and 2) “FlightInfo_actual.csv”. The former provides the schedules of flights that were planned to arrive at three New York based airports (JFK, LGA, and EWR) in January 2022. The latter provides the departure times of these flights, the weather status during departure (1 indicates bad weather), and whether each flight departed on time or not. FlightID is the unique identifier of flights, Carrier is the airline code, and origin is the airport of origin. DayOfWeek is 1 = Monday, 2 = Tuesday, etc. Notice that the times are entered in HHMM format. Our goal is to predict whether a flight will be delayed or not.






STEP 1: Data Pre-processing






1.1. Reading the data: Read both datasets into the software and make sure that the attribute measurement types are set.
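The exercise does not name a tool, so the following is only a minimal sketch assuming Python with pandas. The inline CSV text stands in for “FlightInfo_scheduled.csv”; its column layout and sample rows are hypothetical, and only the column names FlightID, Carrier, origin, DayOfWeek, and ScheduledDeptTime come from the exercise text.

```python
import io

import pandas as pd

# Hypothetical stand-in for FlightInfo_scheduled.csv (layout and rows are assumed).
scheduled_csv = io.StringIO(
    "FlightID,Carrier,origin,DayOfWeek,ScheduledDeptTime\n"
    "1,DL,DCA,1,0640\n"
    "2,US,BWI,6,1730\n"
)

# Read HHMM times as text so leading zeros ("0640") survive, and treat
# airline and airport codes as categorical rather than free text.
scheduled = pd.read_csv(
    scheduled_csv,
    dtype={
        "FlightID": "int64",
        "Carrier": "category",
        "origin": "category",
        "DayOfWeek": "int64",
        "ScheduledDeptTime": str,
    },
)
print(scheduled.dtypes)
```

With the real files, the `io.StringIO` object would be replaced by the file path. Leaving the time column to pandas' type inference would turn "0640" into the integer 640, which is why the type is set explicitly here.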




1.2. Merging datasets: Merge these datasets and make one file as shown in the screenshot (attached).
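One possible way to do the merge, sketched in Python with pandas (an assumed tool choice). It joins the two tables on FlightID, which the exercise describes as the unique flight identifier. The sample rows and the Weather column name are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets.
scheduled = pd.DataFrame({
    "FlightID": [1, 2, 3],
    "Carrier": ["DL", "US", "AA"],
    "ScheduledDeptTime": ["0640", "1730", "1200"],
})
actual = pd.DataFrame({
    "FlightID": [1, 2, 3],
    "ActualDeptTime": ["0641", "@NULL", "1215"],
    "Weather": [0, 0, 1],
})

# Inner join on the shared key: one row per flight present in both files.
merged = scheduled.merge(actual, on="FlightID", how="inner")
print(merged)
```

An inner join drops flights that appear in only one file; `how="left"` would keep every scheduled flight instead, leaving the actual-departure columns empty where no match exists.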




1.3. Data Quality Issues: Two notable problems are:




1.2.1. Duplicates: Remove duplicates, while keeping the 1st record in each group in the data.




1.2.2. Missing values: ActualDeptTime has missing values (in @NULL form). Replace these missing values with the ScheduledDeptTime, if the flight departed on time.
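Both data quality fixes can be sketched as follows, assuming Python with pandas. The column name Status and its labels "ontime" and "delayed" are hypothetical, since the exercise does not give the name of the on-time indicator. The sample rows are invented.

```python
import pandas as pd

flights = pd.DataFrame({
    "FlightID": [1, 1, 2, 3],
    "ScheduledDeptTime": ["0640", "0640", "1730", "1200"],
    "ActualDeptTime": ["0641", "0641", "@NULL", "@NULL"],
    "Status": ["ontime", "ontime", "ontime", "delayed"],  # hypothetical name and labels
})

# Duplicates: keep the first record of each group of identical rows.
flights = flights.drop_duplicates(keep="first").reset_index(drop=True)

# Missing values: copy the scheduled time into ActualDeptTime only where
# the value is "@NULL" and the flight departed on time.
fill = (flights["ActualDeptTime"] == "@NULL") & (flights["Status"] == "ontime")
flights.loc[fill, "ActualDeptTime"] = flights.loc[fill, "ScheduledDeptTime"]
print(flights)
```

Delayed flights with a missing departure time are left as "@NULL" here, because the exercise only specifies the replacement for on-time flights.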






1.3. Selecting sub-sample: Since US Airways does not exist anymore, remove/discard its flights from the dataset. Note that the US Airways carrier code is “US”.
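A minimal pandas sketch of this filter (pandas is an assumed tool choice and the sample rows are invented):

```python
import pandas as pd

flights = pd.DataFrame({
    "FlightID": [1, 2, 3],
    "Carrier": ["DL", "US", "AA"],
})

# Keep every flight whose carrier code is not "US".
flights = flights[flights["Carrier"] != "US"].reset_index(drop=True)
print(flights)
```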



1.4. Re-classify an attribute: Instead of using individual days of the week as a feature, assume that we are only interested in knowing whether a flight was scheduled on a weekday or a weekend. Re-classify the DayOfWeek attribute into a new attribute accordingly (the new attribute should have two values: “weekday” and “weekend”). Notice that original DayOfWeek values 6 and 7 indicate Saturday and Sunday. Also note that you may need to change the measurement type of DayOfWeek first, before doing the re-classification.
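The re-classification could look like this in pandas (an assumed tool choice). The new column name DayType is hypothetical; the two labels come from the exercise.

```python
import pandas as pd

flights = pd.DataFrame({"DayOfWeek": [1, 5, 6, 7]})

# Values 6 and 7 are Saturday and Sunday; everything else is a weekday.
is_weekend = flights["DayOfWeek"].astype(int).isin([6, 7])
flights["DayType"] = is_weekend.map({True: "weekend", False: "weekday"})
print(flights)
```

The `astype(int)` call covers the case where DayOfWeek was read as text or as a category, which is one reading of the note about changing the measurement type first.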




1.5. Derive a new attribute: Similar to the previous step, assume that instead of using individual days of the month as a feature, we are only interested in knowing whether a flight was operated early in the month or late in the month. If a flight was scheduled on Jan 15th or later, then it is a “LateMonth” flight. Otherwise, it is an “EarlyMonth” flight.
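A sketch of the early/late split in pandas (an assumed tool choice). The exercise does not show the date column, so both the name Date and the ISO date format used below are assumptions, as is the new column name MonthPart.

```python
import pandas as pd

# Hypothetical date column; the real file may use another name or format.
flights = pd.DataFrame(
    {"Date": ["2022-01-01", "2022-01-14", "2022-01-15", "2022-01-31"]}
)

# Extract the day of the month, then label the 15th and later as LateMonth.
day_of_month = pd.to_datetime(flights["Date"]).dt.day
flights["MonthPart"] = (day_of_month >= 15).map(
    {True: "LateMonth", False: "EarlyMonth"}
)
print(flights)
```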




Derive another attribute: Finally, let’s now assume that you’d like to group flights into three categories based on their scheduled departure times: Morning, Afternoon, and Evening. Any flight that is before 12 PM is a “Morning” flight; any flight between 12 PM and 5 PM is an “Afternoon” flight; and any flight that is later than 5 PM is an “Evening” flight.
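The time-of-day rule can be written as a small Python function over HHMM strings. The function name `time_of_day` is hypothetical. The exercise leaves one boundary open: this sketch treats exactly 5 PM (1700) as “Afternoon”, on the reading that only flights later than 5 PM count as “Evening”.

```python
def time_of_day(hhmm: str) -> str:
    """Map a scheduled departure time in HHMM format to a time-of-day label."""
    t = int(hhmm)  # "0640" -> 640, so HHMM values compare as plain integers
    if t < 1200:
        return "Morning"
    if t <= 1700:  # assumption: exactly 5 PM still counts as Afternoon
        return "Afternoon"
    return "Evening"


for sample in ["0640", "1200", "1700", "1730"]:
    print(sample, time_of_day(sample))
```

On a pandas DataFrame, the same function could be applied with `flights["ScheduledDeptTime"].map(time_of_day)`.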




QUESTION 1: Submit the merged and updated CSV file.



1.6. Filter out attributes: Remove all unnecessary attributes from the dataset before running any classification models. These include unique identifier attributes, time-related attributes (HHMM), the date attribute, and attributes that were re-coded into new forms.
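A pandas sketch of the attribute filter (pandas is an assumed tool choice). Apart from FlightID, Carrier, origin, DayOfWeek, ScheduledDeptTime, and ActualDeptTime, every column name below is hypothetical, and the rows are invented.

```python
import pandas as pd

flights = pd.DataFrame({
    "FlightID": [1, 3],
    "Carrier": ["DL", "AA"],
    "origin": ["DCA", "BWI"],
    "Date": ["2022-01-03", "2022-01-16"],
    "DayOfWeek": [1, 7],
    "ScheduledDeptTime": ["0640", "1200"],
    "ActualDeptTime": ["0641", "1215"],
    "Weather": [0, 1],
    "DayType": ["weekday", "weekend"],
    "MonthPart": ["EarlyMonth", "LateMonth"],
    "TimeOfDay": ["Morning", "Afternoon"],
    "Status": ["ontime", "delayed"],
})

# Identifier, raw HHMM times, the date, and the re-coded DayOfWeek are dropped.
drop_cols = ["FlightID", "ScheduledDeptTime", "ActualDeptTime", "Date", "DayOfWeek"]
model_data = flights.drop(columns=drop_cols)
print(list(model_data.columns))
```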



STEP 2: Pattern Identification: In this step, you are expected to build a Decision Tree model using the algorithm and manually interpret its decision rules.
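As an illustration of reading decision rules from a fitted tree, here is a sketch that assumes Python with scikit-learn and uses its `DecisionTreeClassifier` as a stand-in, since the step does not say which software or algorithm variant is expected. The toy data, the column names Weather, DayType, and Delayed, and the perfect link between bad weather and delay are all invented for the example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data in which bad weather fully determines the delay label.
toy = pd.DataFrame({
    "Weather": [0, 0, 0, 0, 1, 1, 1, 1],
    "DayType": ["weekday", "weekend"] * 4,
    "Delayed": [0, 0, 0, 0, 1, 1, 1, 1],
})

# One-hot encode the categorical feature; scikit-learn trees need numeric input.
X = pd.get_dummies(toy[["Weather", "DayType"]], dtype=int)
y = toy["Delayed"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the tree as indented if/else rules for manual interpretation.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

The feature used at the top of the printed rules is the one the tree found most useful for separating delayed from on-time flights, which is the kind of evidence the interpretation question asks for.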




QUESTION 2: What are the main contributors to a flight’s being delayed? If you were planning air travel, when would you prefer to book your flight (assuming that you are flexible with your schedule)?




STEP 3: Model Evaluation: In this step, you will analyze the performance of different classification models.




3.1. Partitioning: Split the dataset into training and test sets. Keep 70% of the observations in the training set and 30% of the observations in the test set. Separately, set up 10-fold cross-validation.
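A sketch of the 70/30 split and the 10 folds, assuming Python with scikit-learn. The data come from `make_classification` and are only a synthetic stand-in for the encoded flight features. Running the folds on the training portion is an assumption, since the exercise does not say which rows the cross-validation should use.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in for the prepared features (X) and delay label (y).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 70% training, 30% test, keeping the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Ten stratified folds over the training portion (assumed choice).
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = [len(val_idx) for _, val_idx in folds.split(X_train, y_train)]
print(len(X_train), len(X_test), fold_sizes)
```

Fixing `random_state` makes the partition reproducible, so later accuracy figures for different models are compared on the same rows.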




3.2. C5.0 Performance: Build a decision tree model from the Training dataset and test its performance using the Testing dataset.
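scikit-learn does not provide C5.0, so the sketch below swaps in an entropy-based `DecisionTreeClassifier` as a rough stand-in; results from a real C5.0 implementation may differ. The data are synthetic, and running the 10-fold cross-validation on the training portion is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded flight features and the delay label.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Entropy-based tree used in place of C5.0, which scikit-learn lacks.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)

# Accuracy on each of 10 folds of the training set.
cv_scores = cross_val_score(tree, X_train, y_train, cv=10, scoring="accuracy")

# Accuracy on the held-out 30%.
tree.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, tree.predict(X_test))

print(f"10-fold CV accuracy: {cv_scores.mean():.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```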



QUESTION 3: What is the overall accuracy of the decision tree model under 10-fold cross-validation and on the testing dataset?




3.3. Random Forest & SVM: Now, use the Random Forest and SVM to conduct the same analysis as in the previous step.
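The same evaluation for the two additional models might be sketched as follows, assuming Python with scikit-learn. The data are synthetic, the hyperparameters are library defaults rather than tuned values, and the feature scaling in front of the SVM is a design choice of this sketch, not something the exercise asks for.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded flight features and the delay label.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVMs are sensitive to feature scale, so standardize inside the pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

results = {}
for name, model in models.items():
    cv_acc = cross_val_score(
        model, X_train, y_train, cv=10, scoring="accuracy"
    ).mean()
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = {"cv_accuracy": cv_acc, "test_accuracy": test_acc}

for name, scores in results.items():
    print(name, scores)
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no information from a validation fold leaks into the scaling.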




QUESTION 4: What is the overall accuracy of the Random Forest and SVM on the testing dataset and under 10-fold cross-validation? Which would you pick, based on accuracy performance?

Answered 1 day after Dec 07, 2021


Subhanbasha answered on Dec 08 2021