[15 points] k-Nearest-Neighbors: You sell IT products and are using kNN to build an IT wallet...

Question

[15 points] k-Nearest-Neighbors

You sell IT products and are using kNN to build an IT wallet estimation predictor. You have information on the total IT budgets of a large set of companies, which will be your database of potential neighbors. You have already decided to use Euclidean distance. Now you want to estimate your wallet share for Acme Corp., one of your current customers for whom you do not know the IT budget. Explain precisely how you will estimate your wallet share for Acme with this technique, including (a) stating the target variable, (b) proposing 3 features for predicting the target variable, (c) restriction on the choice of k and (d) evaluation.

HONG KONG BAPTIST UNIVERSITY
SEMESTER 1 EXAMINATION, 2020-2021
Course Code: ECON7880
Section Number: 1
Time Allowed: 3 Hour(s)
Course Title: Foundations in Big Data Analytics: Concepts and Techniques

Q1. [25 points] Confusion Matrix

Table 1 is a confusion matrix generated by Model A. The notations p and n represent the actual positive and negative classes. The rows Y and N represent the predicted decisions “Yes (Offer)” and “No (No offer)” generated by Model A. There are 100,000 observations in total.

Table 1
          p         n
Y      56,000     6,000
N       5,000    33,000

Suppose that a correct positive prediction, i.e. predicting Y for p, yields $5, and an incorrect positive prediction, i.e. Y for n, yields -$1. There is neither loss nor benefit for negative predictions, i.e. the benefit or cost of predicting N for p and n is $0.

(a) Calculate the overall expected value for Model A per person. Show the calculation steps and state your answer.
(b) Write down the confusion matrix for the majority model. The majority model is either an “All-No model” if n is the majority or an “All-Yes model” if p is the majority.
(c) Calculate the overall expected value for the majority model per person. Show the calculation steps and state your answer.
(d) Assume the same number of Y offers as in Table 1. Write down the confusion matrix for the random model.
(e) Calculate the expected overall profit for the random model per person. Show the calculation steps and state your answer.

Q2. [15 points] k-Nearest-Neighbors

You sell IT products and are using kNN to build an IT wallet estimation predictor. You have information on the total IT budgets of a large set of companies, which will be your database of potential neighbors. You have already decided to use Euclidean distance. Now you want to estimate your wallet share for Acme Corp., one of your current customers for whom you do not know the IT budget. Explain precisely how you will estimate your wallet share for Acme with this technique, including (a) stating the target variable, (b) proposing 3 features for predicting the target variable, (c) restriction on the choice of k and (d) evaluation.

Q3. [25 points] Visualization Curves

A population of 100,000 customers with 2 types, p (respond) and n (not respond), has an unbalanced data structure with (p, n) = (25,000, 75,000). The overall response rate is 25%. A predictive model ranks the probability scores of the customers in descending order as shown in Table 2.

Table 2
Percentage of Targeted Customers    Cumulative Responses
 10,000                              8,000
 20,000                             14,000
 30,000                             18,000
 40,000                             19,000
 50,000                             20,000
 60,000                             21,000
 70,000                             22,000
 80,000                             23,000
 90,000                             24,000
100,000                             25,000

(a) Plot the cumulative response curve at these 10 points with their corresponding (x, y) coordinates, the axis labels and the title of the plot.
(b) Plot the lift curve at these 10 points with their corresponding (x, y) coordinates, the axis labels and the title of the plot.
(c) The cost of each incentive offer is $12. The marketing campaign is subject to a budget constraint of $240,000. How many customers can the firm target?
(d) Suppose the revenue of each customer response is $40 and the cost of each incentive offer is $12. Write down the cost-benefit matrix.

          p         n
Y
N

(e) Based on (d), calculate the expected profit per offer. Note: profit = revenue - cost.

Q4. [15 points] Naive Bayes

Table 3(a) shows the feature “Weather” and the decision “????” of 14 instances. The frequency table is summarized in Table 3(b).

Table 3(a)
Instance    Weather     ????
#1          Sunny       No
#2          Overcast    Yes
#3          Rainy       Yes
#4          Sunny       Yes
#5          Sunny       Yes
#6          Overcast    Yes
#7          Rainy       No
#8          Rainy       No
#9          Sunny       Yes
#10         Rainy       Yes
#11         Sunny       No
#12         Overcast    Yes
#13         Overcast    Yes
#14         Rainy       No

Table 3(b) Frequency Table
Weather     No    Yes
Overcast           4
Rainy       3      2
Sunny       2      3
Total       5      9

(a) Use Table 3(b) to calculate the marginal likelihood for each type of weather, i.e. P(Overcast), P(Rainy) and P(Sunny).
(b) Use Table 3(b) to calculate the marginal likelihood for each decision, i.e. P(No) and P(Yes).
(c) Calculate the conditional probability P(Yes | ?????) using Bayes’ rule.

Q5. [20 points] Text Mining

Consider two documents d1 and d2. Each of these contains the word “Hello” or “World”, as shown in Table 4.

Table 4
                           d1     d2
Hello                       5      0
World                       1      2
Total words in document    75    100

Suppose there are 10,000 documents in the entire corpus, and the word “Hello” appears in 4,000 of these documents and “World” in 1,000 of these documents.

(a) Calculate the four normalized term frequencies (TF) for “Hello” and “World” in d1 and d2.
(b) Calculate the inverse document frequency (IDF) for these two words.
(c) Calculate the four TF-IDF values: (i) TF-IDF(“Hello”, d1), (ii) TF-IDF(“Hello”, d2), (iii) TF-IDF(“World”, d1), and (iv) TF-IDF(“World”, d2).
(d) Now we have a new query “Hello World”. Calculate the cosine similarity with d1 and d2 respectively. Which one is more similar to the search query?
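For the k-Nearest-Neighbors question, the procedure can be sketched in Python as below. This is only an illustration under stated assumptions, not the exam's model answer: the target variable is taken to be a company's total IT budget; the three features (employee count, annual revenue, number of office sites), the toy database, Acme's feature values and the sales figure are all hypothetical. Features are z-scored before computing Euclidean distance so that no single feature dominates, k is restricted to lie between 1 and the database size, and leave-one-out mean absolute error is used as one possible evaluation.

```python
import math

def fit_scaler(rows):
    """Return per-feature means and standard deviations for z-scoring."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Z-score one feature vector."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def knn_estimate(features, targets, query, k):
    """Average the target over the k nearest neighbors (Euclidean distance)."""
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the database size")
    means, stds = fit_scaler(features)
    q = scale(query, means, stds)
    ranked = sorted(
        (math.dist(scale(row, means, stds), q), y)
        for row, y in zip(features, targets)
    )
    return sum(y for _, y in ranked[:k]) / k

def leave_one_out_mae(features, targets, k):
    """Predict each known company from the others; return mean absolute error."""
    errors = []
    for i in range(len(features)):
        rest_x = features[:i] + features[i + 1:]
        rest_y = targets[:i] + targets[i + 1:]
        errors.append(abs(knn_estimate(rest_x, rest_y, features[i], k) - targets[i]))
    return sum(errors) / len(errors)

# Hypothetical database: (employees, annual revenue in $M, office sites).
company_features = [
    (120, 15, 2),
    (150, 18, 3),
    (900, 140, 12),
    (1000, 160, 15),
    (5000, 900, 40),
]
# Hypothetical target variable: total IT budget in $ thousands.
it_budgets = [600, 800, 6500, 7500, 42000]

# Hypothetical values for Acme: its features and our own sales to it ($ thousands).
acme_features = (140, 17, 3)
our_sales_to_acme = 210

estimated_budget = knn_estimate(company_features, it_budgets, acme_features, k=2)
wallet_share = our_sales_to_acme / estimated_budget

print("Estimated IT budget ($ thousands):", estimated_budget)
print("Estimated wallet share:", wallet_share)
print("Leave-one-out MAE ($ thousands):",
      leave_one_out_mae(company_features, it_budgets, k=2))
```

With this toy data, Acme's two nearest neighbors are the two small companies, so the estimated budget is 700 ($ thousands) and the wallet share is 0.3. In practice k would be chosen by comparing the held-out error across several candidate values instead of fixing it in advance.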

