these are the questions



Customer Analytics
Individual Research Project

1. Background Information

You are working for a telecommunication provider. The company wants to improve their customer lifetime value (CLV) calculations for newly acquired customers. The key question that the firm's marketing managers have is how they can account for the fact that it is very difficult to know a customer’s relationship duration in advance. Yet customer relationship duration is one of the key pieces of information to be considered in CLV calculations. When they describe this problem to you, you suggest that you might be able to help. In particular, you believe that you can use a survival model in order to predict customer survival probabilities and then use those probabilities to improve the CLV calculations. You agree with the marketing managers that you will check the data available and perform the necessary analyses for them.

2. Data, Sample and Variables

You are provided a dataset that contains information about 3,333 randomly sampled customer relationships. The dataset is called “telecom_churn” and is a csv file. The dataset contains the following variables:

Churn: Information whether or not a customer has churned.
AccountWeeks: The duration of the customer relationship as it is reflected in the time the customer had an active account at the firm. The variable is captured in weeks.
DataPlan: Whether or not a customer has a data plan
DataUsage: Gigabytes of average monthly data usage
CustServCalls: How often a customer has called the service hotline
DayMins: Average daytime minutes (calling time) per month
DayCalls: Average number of daytime calls
MonthlyCharge: Average monthly bill
OverageFee: Largest overage fee in the last year
RoamMins: Average number of roaming minutes

3. Your Tasks

Please use R to perform the following tasks. You can earn a total of 100 points.

1) (10 points) Estimate a base survival model (i.e., without explanatory variables) for an average customer. Call this mod0. Please provide the output and visualize the survival curve.

2) (20 points) Please estimate a model that includes DataPlan as an explanatory variable. Call it mod1. Please provide the output and visualize the survival curve. Would you prefer mod0 or mod1 for predicting customer survival probabilities? Why?

3) (10 points) You want to use the model mod1 to make predictions of survival probabilities to inform your customer acquisition efforts (e.g., which customers should be preferably acquired). Do you see a chance to improve model performance given the data at hand? Please explain your answer.

4) (10 points) You decide to move on with mod1. Critically evaluate the predicted curve. Do you see any reason for concern?

5) (30 points) You decide to use mod1 to calculate the expected CLV for customers without a data plan and customers with a data plan. The annual interest rate is 5% (note that you have to translate this into weekly discount rates). For an assumed customer lifetime of 500 weeks, please calculate the CLV and the probability corrected CLV for customers without a data plan and customers with a data plan. Please present the correct results in a table (i.e., CLV and probability corrected CLV for both customer prototypes). Should the firm focus on either type of the two customers in their future customer acquisition efforts?

Here is a little helper on how to achieve that:

First, you have to derive monthly average cash flows for the two customer prototypes separately from the variable MonthlyCharge (use DataPlan as a grouping variable).

Second, you have to calculate the average weekly cash flows from this data (average monthly cash flow * 12/51); you can use the weekly average cash flow as a cash flow for each of the 500 weeks.

Third, you have to derive the predicted survival probabilities for each customer. We have not done the coding for this in the tutorial but it can be achieved in a few steps. You just have to make sure that you use your own variable names in the code below.

Suggested code for this step

# First install the rms package, which is required to derive predictions for different points in time.
install.packages("rms")
library(rms)
library(reshape2)  # provides melt(), used in the last step
# Then you have to rerun mod1 using the psm function that is equivalent to the survreg
# function used in the tutorial
mod1_psm <- psm(Surv(AccountWeeks, Churn) ~ DataPlan, data = Telco1, dist = "weibull")
mod1_psm
# This model is the same as the previous model mod1
# We produce a sequence which will define the points in time at which we want predicted
# survival probabilities from our model.
weeks <- seq(1, 500, by = +1)
# We define the levels of the DataPlan variable for which we want probabilities
n.dat <- expand.grid(DataPlan = levels(DataPlan))
# We ask the model for predictions for 500 weeks ahead.
b1 <- survest(mod1_psm, newdata = data.frame(n.dat), time = weeks)
# We rearrange the data such that we can easily use it for the cash flow predictions.
b2 <- cbind(n.dat, b1)
b3 <- melt(b2, id.vars = c("DataPlan"), variable.name = "time", value.name = "surv prob")
b3

Fourth, you now have all necessary information to calculate the CLV and probability corrected CLV for both customer prototypes. You can do this either in R or in Excel.

6) (20 points) Please present a simple visualization that demonstrates your key insight from the probability corrected CLV to managers. (It is easiest to use PowerPoint to provide an appropriate chart.)
Answered Same Day Nov 12, 2021

Abr Writing answered on Nov 15 2021
Customer Analytics
Individual Research Project
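Loading the data into the R workspace
library(survival)   # Surv(), survfit()
library(survminer)  # ggsurvplot()
library(dplyr)      # group_by(), summarise(), mutate()
telecom_churn <- read.csv("telecom_churn.csv")
# Treat DataPlan as a categorical variable
telecom_churn$DataPlan <- as.factor(telecom_churn$DataPlan)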
Task 1
# Survival object: duration in weeks, event indicator = churn
surv <- Surv(time = telecom_churn$AccountWeeks,
             event = telecom_churn$Churn)
# Null model: a single Kaplan-Meier curve for all customers
mod0 <- survfit(surv ~ 1, data = telecom_churn)
# No p-value is requested because there is only one curve and nothing to compare
ggsurvplot(mod0, data = telecom_churn)
This is the null model: one survival curve for the average customer, with no explanatory variables.
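The assignment refers to the survreg function from the tutorial, so a parametric counterpart of the null model may also be useful. The following is a minimal sketch, assuming a Weibull distribution (the one used in the assignment's suggested code); the data frame toy is simulated and only stands in for telecom_churn, and the names mod0_w, lambda and k are illustrative.

```r
library(survival)

# Hypothetical toy data standing in for telecom_churn (all values are simulated)
set.seed(1)
n <- 200
toy <- data.frame(
  AccountWeeks = ceiling(rweibull(n, shape = 1.5, scale = 100)),
  Churn = rbinom(n, 1, 0.3)
)

# Intercept-only Weibull model: a parametric counterpart of mod0
mod0_w <- survreg(Surv(AccountWeeks, Churn) ~ 1, data = toy, dist = "weibull")

# survreg parameterisation: Weibull scale = exp(intercept), Weibull shape = 1 / scale
lambda <- exp(coef(mod0_w)[[1]])
k <- 1 / mod0_w$scale

# Survival function S(t) = exp(-(t / lambda)^k), evaluated week by week
weeks <- 1:250
surv_prob <- exp(-(weeks / lambda)^k)
plot(weeks, surv_prob, type = "l",
     xlab = "Weeks", ylab = "Survival probability")
```

With the real data, toy would be replaced by telecom_churn. Unlike the Kaplan-Meier curve, the fitted curve can be evaluated at any week, including weeks beyond the observed durations.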
Task 2
# Separate Kaplan-Meier curves for customers with and without a data plan
mod1 <- survfit(surv ~ DataPlan, data = telecom_churn)
# pval = TRUE adds the log-rank test p-value to the plot
ggsurvplot(mod1, data = telecom_churn, pval = TRUE)
The log-rank p-value is below 0.0001, which is significant at the conventional p < 0.05 level. Customers with a data plan survive significantly longer than customers without one, and their curve stays higher throughout the follow-up period.
The base survival curve is similar to the curve for customers without a data plan. Because mod0 offers no groups to compare, mod1 is preferred for predicting customer survival probabilities.
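The p-value shown on the plot comes from a log-rank test, which can also be run directly. The sketch below assumes the survdiff function from the survival package and uses simulated data (toy) in place of telecom_churn, so the resulting numbers are illustrative only.

```r
library(survival)

# Hypothetical toy data (simulated): plan holders are given longer durations
set.seed(2)
n <- 300
toy <- data.frame(DataPlan = factor(rbinom(n, 1, 0.3)))
toy$AccountWeeks <- ceiling(
  rweibull(n, shape = 1.5, scale = ifelse(toy$DataPlan == "1", 150, 90))
)
toy$Churn <- rbinom(n, 1, 0.5)

# Log-rank test comparing the two DataPlan groups
lr <- survdiff(Surv(AccountWeeks, Churn) ~ DataPlan, data = toy)

# p-value from the chi-square statistic, with (number of groups - 1) degrees of freedom
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

A small p-value here would indicate that the two survival curves differ, which is the evidence used to prefer a model with DataPlan over the null model.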
Task 3
# Number of distinct values per variable
data.frame(V1 = sapply(telecom_churn, function(x) length(unique(x))))
                V1
Churn            2
AccountWeeks   212
DataPlan         2
DataUsage      174
CustServCalls   10
DayMins       1667
DayCalls       119
MonthlyCharge  627
OverageFee    1024
RoamMins       162
The table above shows the number of unique values of each variable in the telecom_churn dataset. The model used to predict survival curves for customer acquisition could be improved by including more variables from the dataset. However, most of the remaining variables take a large number of distinct values, so they cannot serve as grouping variables for survival curves the way DataPlan does. To avoid any possibility of over-fitting, mod1 is retained for predicting the customer survival probabilities.
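Whether extra variables are worth their added complexity can be checked rather than assumed. The following is a rough sketch, assuming a Weibull survreg model in which a many-valued variable enters as a numeric covariate and the two models are compared by AIC; the data are simulated, and the choice of CustServCalls as the extra covariate is only an example.

```r
library(survival)

# Hypothetical toy data (simulated): more service calls shorten lifetimes
set.seed(3)
n <- 400
toy <- data.frame(
  DataPlan = factor(rbinom(n, 1, 0.3)),
  CustServCalls = rpois(n, 1.5)
)
scale_i <- 120 * exp(0.3 * (toy$DataPlan == "1") - 0.2 * toy$CustServCalls)
toy$AccountWeeks <- ceiling(rweibull(n, shape = 1.5, scale = scale_i))
toy$Churn <- rbinom(n, 1, 0.6)

# Model with DataPlan only versus model with one additional covariate
mod1_w <- survreg(Surv(AccountWeeks, Churn) ~ DataPlan,
                  data = toy, dist = "weibull")
mod2_w <- survreg(Surv(AccountWeeks, Churn) ~ DataPlan + CustServCalls,
                  data = toy, dist = "weibull")

# Lower AIC indicates a better trade-off between fit and complexity
aic <- c(mod1 = AIC(mod1_w), mod2 = AIC(mod2_w))
aic
```

Because AIC penalises each additional parameter, a covariate that lowers it is adding more fit than complexity, which addresses the over-fitting concern directly.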
Task 4
ggsurvplot(mod1, data = telecom_churn, pval = TRUE)
The curves diverge early and the log-rank test is significant. One could argue that a larger sample of newly acquired customers is needed to validate this result, namely that customers with a data plan have a significantly higher probability of survival than customers without a data plan.
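Task 5
# Average monthly charge per customer type (DataPlan as the grouping variable)
clv <- telecom_churn %>%
  group_by(DataPlan) %>%
  summarise(
    MeanMonthlyCharge = mean(MonthlyCharge)
  ) %>%
  mutate(
    # Weekly cash flow = monthly charge * 12/51 (factor given in the assignment),
    # summed over 500 weeks; no discounting or survival correction is applied here
    CLV = 500*12/51*MeanMonthlyCharge
  )
clv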
# A tibble: 2 x 3
DataPlan MeanMonthlyCharge CLV
...
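The CLV column above is undiscounted, while the assignment asks for a 5% annual interest rate and a probability corrected CLV. The following is a minimal sketch of that calculation, assuming 52 weeks per year for the discount rate; the weekly cash flows and survival curves are placeholders, not results from the data, and would be replaced by the group means and the model's predicted survival probabilities.

```r
# Weekly discount rate equivalent to 5% per year (assumption: 52 weeks per year)
r_week <- 1.05^(1 / 52) - 1

weeks <- 1:500

# Placeholder inputs (hypothetical values, not taken from the dataset)
weekly_cf <- c(no_plan = 12, plan = 18)
surv_prob <- cbind(
  no_plan = exp(-(weeks / 300)^1.5),
  plan    = exp(-(weeks / 400)^1.5)
)

# Discount factor for each week
discount <- 1 / (1 + r_week)^weeks

# CLV: constant weekly cash flow, discounted over 500 weeks
clv <- weekly_cf * sum(discount)

# Probability corrected CLV: each week's cash flow is also weighted
# by the probability that the customer is still active in that week
clv_corrected <- weekly_cf * colSums(surv_prob * discount)

data.frame(CLV = clv, ProbCorrectedCLV = clv_corrected)
```

Because survival probabilities never exceed 1, the probability corrected CLV is always below the plain discounted CLV; the size of the gap shows how much value is lost to expected churn for each customer type.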