Assessment item 3
Business Case Analysis 2
Value: 20%
Due date: 29-Jan-2017
Return date: 20-Feb-2017
Submission method options: EASTS (online)

Task

Business Case Analysis 2

1. Classification Tree. This item requires the dataset FlightDelays.xls, which can be found on the subject Interact site. (10%)

The following is a business analytical problem faced by airlines. The objective is to determine the measurements that affect flight delays. The dataset FlightDelays.xls contains information on all commercial flights departing the Washington, DC area and arriving at New York during January 2004. For each flight there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.

Create dummies for day of week, carrier, departure airport, and arrival airport. This will give you 17 dummies. Bin the scheduled departure time into eight bins (in XLMiner use Transform → Bin Continuous Data and select equal width). After binning CRS_DEP_TIME into the 8 bins, break this new variable down into dummies. This avoids treating the departure time as a continuous predictor: its effect will not be linear, because delays are plausibly related to the morning and afternoon rush hours. Partition the data into training and validation sets.

(a). Fit a classification tree to the flight delay variable using all the relevant predictors. Do not include DEP_TIME (actual departure time) in the model because it is unknown at the time of prediction (unless we are generating our predictions of delays after the plane takes off, which is unlikely). In the third step of the Classification Tree menu, choose “Maximum # levels to be displayed = 6”. Use the best-pruned tree, setting the minimum number of observations in the final nodes to 1. Express the resulting tree as a set of rules.

(b). If you needed to fly between DCA and EWR on a Monday at 7 AM, would you be able to use this tree? What other information would you need? Is it available in practice? What information is redundant?

(c). Fit another tree, this time excluding the weather predictor. Select the option of seeing both the full tree and the best-pruned tree. You will find that the best-pruned tree contains a single terminal node.
i. How is this tree used for classification? (What is the rule for classifying?)
ii. To what is this rule equivalent?
iii. Examine the full tree. What are the top three predictors according to this tree?
iv. Why does the pruned tree result in a tree with a single node?
v. What is the disadvantage of using the top levels of the full tree as opposed to the best-pruned tree?

2. Cluster Analysis. This item requires the dataset EastWestAirlinesCluster.xls, which can be found on the subject Interact site. (10%)

The dataset EastWestAirlinesCluster.xls contains information on 3999 passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.

(a). Apply hierarchical clustering with Euclidean distance and Ward's method. Make sure to normalize the data first. How many clusters appear?

(b). What would happen if the data were not normalized?

(c). Compare the cluster centroids to characterize the different clusters, and try to give each cluster a label.

(d). Use k-means clustering with the number of clusters that you found above. Does the same picture emerge?

(e). Which clusters would you target for offers, and what types of offers would you target to customers in that cluster?
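
The preprocessing described in item 1 (equal-width binning of CRS_DEP_TIME into eight bins, followed by dummy coding) can also be sketched outside XLMiner. The following is a minimal pure-Python illustration, not the XLMiner procedure itself. The helper names equal_width_bins and to_dummies, the DEP_BIN prefix, and the sample departure times are all hypothetical, and the sketch assumes that departure times are stored as HHMM integers and binned on their raw numeric values.

```python
def equal_width_bins(values, n_bins=8):
    """Assign each value a 0-based index into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would compute to index n_bins, so clamp it
    # into the last bin (n_bins - 1).
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def to_dummies(labels, prefix):
    """One-hot encode category labels as a list of {column: 0/1} dicts."""
    categories = sorted(set(labels))
    return [
        {f"{prefix}_{c}": int(label == c) for c in categories}
        for label in labels
    ]


# Hypothetical scheduled departure times in HHMM form
# (illustrative values, not taken from FlightDelays.xls).
crs_dep_time = [600, 730, 900, 1245, 1500, 1715, 1930, 2130]

bins = equal_width_bins(crs_dep_time, n_bins=8)
dep_dummies = to_dummies(bins, prefix="DEP_BIN")

for time, row in zip(crs_dep_time, dep_dummies):
    print(time, row)
```

Each row ends up with exactly one dummy set to 1. In this sketch a bin that contains no observations produces no dummy column, so fewer than eight columns may appear on small samples. The same to_dummies helper could be applied to day of week, carrier, and the two airport variables.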





Oct 07, 2019