Answer To: Sheet1 IsCanceledLeadTimeArrival...
Mohd answered on May 11 2021
We analyzed several types of problems: revenue analysis, cancellation analysis and customer segmentation.
We developed a linear regression model and k-nearest neighbors and naive Bayes classifiers.
First, we loaded the data into RStudio and summarized it using the skim function.
We identified one significant outlier in the ADR data for city hotels and removed it manually. We then plotted ADR against reservation status date or arrival date. The graphs show strong seasonality in both the city hotel and the resort datasets.
Both graphs also show that ADR increases significantly over time.
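One simple way to check the seasonality described above without a plot is to average ADR by arrival month. The following is a minimal Python sketch, not the code used for this answer; the sample records and the function name `mean_adr_by_month` are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (arrival date, ADR) pairs; the values are invented for illustration.
records = [
    (date(2016, 1, 10), 60.0),
    (date(2016, 1, 20), 80.0),
    (date(2016, 8, 5), 150.0),
    (date(2016, 8, 15), 170.0),
]

def mean_adr_by_month(rows):
    """Return {month number: mean ADR} for (arrival_date, adr) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for arrival, adr in rows:
        sums[arrival.month] += adr
        counts[arrival.month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}

monthly = mean_adr_by_month(records)
```

A large gap between the monthly averages that repeats from year to year is what would show up as seasonality in the graphs.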
We analyzed the relationship between distribution channel (categorical) and ADR (numeric) across the different market segments.
For the city hotel, cancellations are highest in the corporate distribution channel and the corporate market segment.
Market segments such as aviation, complementary, direct and groups have the fewest cancellations.
Corporate offline TA/TO and offline TA have the most cancellations.
For the resort, cancellations are at a minimum in every distribution channel category and in the corporate market segment.
Market segments such as complementary, direct, groups, offline TA/TO and offline TA have the most cancellations.
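The comparisons above can be reproduced numerically by grouping bookings by a category and computing the share that were canceled. This is a minimal Python sketch under assumptions: the column name `MarketSegment` and the sample rows are hypothetical, and it computes cancellation rates, whereas the answer does not say whether counts or rates were compared.

```python
from collections import defaultdict

# Hypothetical toy bookings; the rows are invented and MarketSegment is an assumed column name.
bookings = [
    {"MarketSegment": "Corporate", "IsCanceled": 1},
    {"MarketSegment": "Corporate", "IsCanceled": 1},
    {"MarketSegment": "Direct", "IsCanceled": 0},
    {"MarketSegment": "Direct", "IsCanceled": 1},
    {"MarketSegment": "Groups", "IsCanceled": 0},
]

def cancellation_rate_by(rows, key):
    """Return {category: share of bookings canceled} for the given column."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        canceled[row[key]] += row["IsCanceled"]
    return {category: canceled[category] / totals[category] for category in totals}

rates = cancellation_rate_by(bookings, "MarketSegment")
```

The same function can be called with a distribution channel or customer type column to compare those groupings.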
Next, we analyzed ADR versus market segment by customer type.
The group and contract customer types have the most cancellations.
In the resort dataset, transient and transient-party customers have the most cancellations.
We partitioned each dataset into two splits: a training set with 70 percent of the observations and a validation set with the remaining 30 percent.
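A 70/30 partition of this kind can be sketched as a shuffled split. The Python below is illustrative only; the function name `train_validation_split` and the fixed seed are assumptions, not details from the answer.

```python
import random

def train_validation_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly, then cut them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the dataset: 100 row indices.
train, validation = train_validation_split(range(100))
```

Shuffling before cutting matters here because the bookings are ordered by date, so an unshuffled split would put only the latest bookings in the validation set.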
We trained a linear regression model to predict total pay, defined as adr * total_stays.
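The response variable can be sketched as follows. The answer does not define total_stays; this Python example assumes it is the sum of weekend nights and week nights, and the parameter names are hypothetical.

```python
def total_pay(adr, weekend_nights, week_nights):
    """Total pay for one booking: average daily rate times total nights stayed."""
    # Assumption: total_stays = weekend nights + week nights (not stated in the answer).
    total_stays = weekend_nights + week_nights
    return adr * total_stays

# Example: a rate of 100.0 per night for 1 weekend night and 2 week nights.
example_pay = total_pay(100.0, 1, 2)
```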
For the city hotel and resort datasets, we considered several predictors, such as distribution channel, market segment, customer type, children, repeated-customer status, parking spaces, special requests and many more.
The city hotel model has an adjusted R-squared of 72.77 percent, which means the model explains 72.77 percent of the variability in the response variable.
The resort model has an adjusted R-squared of 58.42 percent, which means the model explains 58.42 percent of the variability in the response variable.
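For reference, adjusted R-squared corrects ordinary R-squared for the number of predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A minimal Python sketch of both quantities follows; the function names are illustrative, and the values of n and p for the models above are not given in the answer.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Made-up numbers: R-squared of 0.75 from 101 observations and 4 predictors.
adjusted = adjusted_r_squared(0.75, 101, 4)
```

With many observations and few predictors the adjustment is small, so the adjusted value stays close to the plain R-squared.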
We trained three classification models: naive Bayes, k-nearest neighbors and linear SVM.
For city dataset K nearest...