I need you to complete an excel homework assignment for my Data Analysis for Business Class. I will share the excel file my professor wants us to complete. It has 6 problems each with accompanying data sets used for the specific problem ie.(Problem #4, Data#4). I will also attach the professor's slides which can be used to help with what the answers to the questions should look like.
PowerPoint Presentation ITEC 210 DATA ANALYSIS FOR BUSINESS Exploring The Relationship Between Two Variables Prof. Itir KARAESMEN AYDIN 1 Outline and Learning Outcomes In this presentation, you will learn To interpret two-way tables that explore the relationship between two categorical variables To explore the relationship between a quantitative (numerical) and categorical variable using side-by-side charts and histograms To interpret scatter diagrams (plots) To interpret correlation between two numerical variables To interpret the trend line on a scatter diagram. NOTE: This presentation does not show you *how* the work is done on Excel. 2 Topic: Relationship Between Two Categorical Variables 3 Learning Objective After completing these examples, you will be able to interpret two-way tables for two categorical variables visualize two-way tables using charts answer business questions or provide business insights with the help of two-way tables. 4 Two-Way Tables Information displayed in each cell of the table Frequency or % frequency by column (Variable-1) Frequency or % frequency by row (Variable-2) Frequency or % frequency by grand total 5 TABLECategories of Variable 1 Cat-1Cat-2Cat-3Cat-4… Categories of Variable 2Cat-A Cat-B … Excel Exercise #4 Use the data from Exercise#2. 6 #4 (cont’d) Considering all the customers who used a discount code, did the majority use the EStore Card to make a payment? How about all the customers who did not use a discount code? To answer these questions: Prepare a two-way table to display the percentage frequencies with respect to method of payment and use of discount code. Visualize the data using charts. 7 #4 (cont’d) Two-way table prepared using Pivot Tables Frequencies are displayed below 8 #4 (cont’d) Two-way table prepared using Pivot Tables Frequencies as percentages of the Grand Total are displayed below 9 #4 (cont’d) Two-way table prepared using Pivot Tables Frequencies as percentages of the column totals (use of discount code) are displayed below 10 #4 (cont’d) Side-by-side bar chart created via Pivot Charts We display the percentage frequency for method of payment considering all the customers who used a discount code (vs. not used). 11 #4 (cont’d) Q: Considering the customers who used a discount code, did the majority use the EStore Card to make a payment? How about the customers who did not use a discount code? A: Of all the customers who used a discount code, 80% used the EStore Card. Of all the customers who did not use a discount code, 50% used the EStore Card. 12 Excel Exercise #5 Use the data in the previous exercise. Of all the customers who made a purchase within the past 3 months, what percentage used an EStore Card? Is this percentage different for the customers who did not make purchase in the past 3 months? To answer these questions: Prepare a two-way table for method of payment and time since last purchase. Visualize the data using bar charts. 13 #5 (cont’d) Two-way table prepared using Pivot Tables Frequencies as percentages of the Grand Total are displayed below 14 #5 (cont’d) Two-way table prepared using Pivot Tables Frequencies as percentages of the column total (time since last payment) are displayed below 15 #5 (cont’d) Side-by-side bar chart created via Pivot Charts Displays percentage frequency of method of payment by customers who made a purchase in the past 3 months (vs. not) 16 #5 (cont’d) Q: Of all the customers who made a purchase within the past 3 months, what percentage used an EStore Card? Is this percentage different for the customers who did not make purchase in the past 3 months? A: Of all the customers who made a purchase within the past 3 months, 74% used the EStore card. Of all the customers who did not make a purchase within the past 3 months, 68% used the EStore card. 17 Summary Two-way tables and side-by-side bar charts make comparison across categories easy. However: Just because we observe that the percentage of customers who used the EStore Card is different for two distinct groups of customers (discount code users vs. not) does not mean this difference is statistically significant. Statistical tests (e.g. hypothesis tests) can be used to test for statistical significance of the differences. 18 Topic: Relationship Between A Categorical Variable and A Quantitative Variable 19 Learning Objective After completing these examples, you will be able to interpret tables that display information about a quantitative variable when data is grouped with respect to another categorical variable compare histograms of a quantitative variable for different categories answer business questions or provide business insights with the help of tables and charts. 20 Excel Exercise #6 Consider the online retailer data used in Exercises #2 to #5. The data has been expanded to include the transaction value ($ sales value) and the age of each customer. 21 #6 (cont’d) On the average, did the customers who use a discount code spend more or less than the customers who did not use a discount code? To answer this question: Prepare a table to display the average $ amount spent per customer in each category (i.e. those that use discount code vs. not). Visualize the data using bar charts. 22 #6 (cont’d) Pivot Table that displays the average $ sales value for customers who used a discount code vs. not 23 Rows: Use of discount code Column: Transaction value Average transaction value #6 (cont’d) A bar chart created via Pivot Charts 24 #6 (cont’d) Q: On the average, did the customers who use a discount code spend more or less than the customers who did not use a discount code? A: The average amount spent by customers who used a discount code ($113) is lower than the average amount spent by those who did not use a discount code ($120). Note: This difference may or may not statistically significant. 25 Excel Exercise #7 Use the data provided for Exercise#6. Is the transaction value distribution similar for customers who use the discount code vs. not? To answer this question: Prepare a table to display the statistical distribution of the $ amount spent for customers how used the discount code (vs. not). Visualize the data using bar charts. 26 #7 (cont’d) Pivot Table that displays the no. of customers who used a discount code (vs. not), grouped by the amount spent by customer. Amount spent is grouped by $25 increments 27 #7 (cont’d) Side-by-side bar chart is created using Pivot Charts 28 #7 (cont’d) Histogram of amount spent for two separate categories, created using Pivot Charts 29 #7 (cont’d) Q: Is the transaction value distribution similar for customers who use the discount code vs. not? A: Both distributions are skewed to the right; there are a few customers who spent >$350 regardless of the discount code. The bin ranges $50-$74 and $75-$99 have the highest frequency for both groups of customers. The number of customers who did not use a discount code declines sharply beyond $99. In contrast, there is a sizable portion of customers who used a discount code that spent more than $99. 30 Summary We can compare quantitative data across categories by Creating tables Creating side-by-side charts Creating separate histograms for each category Pivot tables and pivot charts are quick ways to organize data 31 Topic: Relationship Between Two Quantitative Variables 32 Learning Objective After completing these examples, you will be able to interpret correlation between two quantitative variables interpret scatter plots (diagrams) interpret trend lines on scatter plots. 33 Review: Correlation Correlation: measures the strength of the linear relationship between two quantitative variables Correlation takes values between -1 and +1. 34 Correlation = -1 Correlation = 0 Correlation = +1 Types of Relationships Y X Y X Y Y X X Linear relationships Curvilinear relationships Types of Linear Relationships Y X Y X Strong Y Y X X Weak Y X Y X No relationship Review: Correlation If correlation value is close to 0, does this mean there is no apparent relationship between the variables? Example: Correlation between X and Y variables is 0.000 for the data below. 37 Review: Correlation If correlation value is close to +1 (or -1), does this mean the relationship between the variables is perfectly linear? Example: Correlation is +0.90 for the data below. 38 Excel Exercise #8 Use the data provided for Exercise#6. What is the correlation between the amount spent by a customer and the customer’s age? Use the CORREL function in Excel. Format: CORREL(Range1, Range2) Range1: cell range for the amount spent Range2: cell range for the age of customer 39 #8 (cont’d) Correlation between the amount spent by a customer and the customer’s age is +0.667. There is a positive relationship between these two variables. 40 #8 (cont’d) Wrong interpretation: “The more people spend, the older they get.” This statement says that spending more at the retail shop is causing people to age. Better interpretation: “The older customers in this data set tend to spend more and the younger customers tend to spend less.” 41 #8 (cont’d) Possible reasons for a positive relationship between age and transaction value: Older people have higher income and have higher discretionary budget to spend. The retailer sells products that are demanded more by older people (e.g. medical devices). 42 Summary Zero correlation does not mean one variable is not related to another. Correlation is not causation. When X and Y are correlated, the changes in Y may not be “caused” or “induced” by changes in X but by outside factors that are related to both variables. CORREL function in Excel computes the correlation. 43 Excel Exercise #9 Use the data provided for Exercise#8. Explore the relationship between the amount spent by a customer and the customer’s age by preparing a scatter plot (scatter diagram). Use the chart functions in Excel to prepare a scatter plot. Age should be on the horizontal axis. Transaction value should be on the vertical axis. 44 #9 (cont’d) Scatter plot of Transaction Value and Age 45 #9 (cont’d) Observations Older customers tend to spend more. Customers who are 35 or younger spent between $50 and $100. Only the customers who are older than 35 spent more than $100. The transactions with the highest $ value were made by customers between the ages of 35 and 45. The relationship between age and transaction value is positive and linear for customers 35 or younger. The relationship between age and transaction value does not appear to be linear for customers 35 and older. 46 Summary A scatter plot can be used to explore the relationship between two quantitative variables. Trends and patterns can be detected visually on a scatter plot. The scatter plot in the Excel chart function plots two quantitative variables. 47 Trend Line Trend line is a straight line It is displayed on the scatter plot The trend line equation is Y = b0 + b1 X where X: variable displayed on the horizontal axis of the scatter plot b0: intercept of the line (the value Y takes