
Final project Part 1


Data Submission: Once you have completed your two (2) weeks of data collection, you must append your data to the class data set found here. You will paste your data directly into the web file; there is no need to download the file until all students have completed their entries. Read the guidelines in the file completely before pasting your data, and address any questions about this process to the instructor promptly. This step is very important because it provides your classmates with the aggregate data set for the group analysis.



Part 2


Per Preparation for the Final Signature Assignment, you are tasked with locating at least two (2) sources to integrate into the analysis for the presentation. For this assignment, submit an APA Style annotated bibliography. Each annotation should reflect on the resource, explaining in 2-3 sentences how you will integrate it into your presentation.



Part 3


Per Preparation, Data and Contributions Step 1 and 2, you should have collected your personal data and shared it with the class. For this assignment, you must submit six (6) of the Class Statistics you are thinking of integrating into your presentation. For each statistic, add 1-2 sentences explaining what the statistic represents.



Formatting for this submission may be a bullet point list of the 6 statistics with 1-2 sentences following each bullet. The statistics can be entered into the text editor or attached as a Word document.



Part 4


One of the many takeaways from this module is the idea that you are Big Data. Our every movement on the web leaves traces and patterns of who we are, and our interactions with various platforms can in turn shape our behavior. This assignment requires you to collect, analyze, and visualize your own behavioral data. The data collection and reflection process should produce a personal story told with data. When combined with classmates' work, the project will yield a social analysis drawn from these stories and supporting resources.



All assignment guidelines can be found in the attached PDF: Final Project_Data Becomes You.pdf

Mohd answered on Jul 16, 2021
We fetched a dataset from an online machine learning repository and computed summary statistics for the numerical (continuous) variables.
Continuous Variables:
For continuous variables, we want to see the central tendency and the distribution of each variable. Central tendency is summarized with statistics such as the mean, median and mode. The distribution can be visualized with a histogram, column chart or line chart. To analyze how two variables vary together (both must be continuous), a scatter plot can be used.
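The Python function below is a minimal sketch of how a histogram summarizes the distribution of a continuous variable. It counts values into equal-width bins using only the standard library. The function name `histogram_counts`, the bin count and the sample values are illustrative assumptions and are not part of the original analysis.

```python
from typing import List


def histogram_counts(values: List[float], n_bins: int) -> List[int]:
    """Count values into n_bins equal-width bins spanning [min, max]."""
    if not values or n_bins < 1:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum (and any float rounding at the edge) lands in the
        # last bin; if all values are equal (width == 0), everything goes in bin 0.
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return counts


# Hypothetical page counts per session: most are small, a few are large.
sample = [1, 2, 2, 3, 5, 8, 13, 40]
print(histogram_counts(sample, 4))  # [6, 1, 0, 1]
```

A tall first bin followed by a long, sparse tail is the typical shape of a right-skewed variable.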
Categorical Variables:
For categorical variables, we need to understand how the observations are distributed across categories. This can be measured by aggregation, such as the count of observations in each category, and visualized with a bar chart.
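Counting observations per category is a one-liner with `collections.Counter`. In the sketch below, the `VisitorType` values are hypothetical and `category_counts` is an illustrative helper name. A crude text bar stands in for the bar chart.

```python
from collections import Counter
from typing import Dict, List


def category_counts(labels: List[str]) -> Dict[str, int]:
    """Count observations per category, most frequent first."""
    return dict(Counter(labels).most_common())


# Hypothetical VisitorType values for a few sessions.
visitor_types = ["Returning_Visitor", "New_Visitor", "Returning_Visitor",
                 "Other", "Returning_Visitor", "New_Visitor"]
for category, n in category_counts(visitor_types).items():
    # One '#' per session acts as a simple text bar chart.
    print(f"{category:<18} {'#' * n} {n}")
```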
Summary Statistics:
    Statistic                 ProductRelated  ProductRelated_Duration  BounceRates
    Mean                      31.731          1194.746                 0.022
    Standard Error            0.401           17.234                   0.000
    Median                    18.000          598.937                  0.003
    Mode                      1.000           0.000                    0.000
    Standard Deviation        44.476          1913.669                 0.048
    Sample Variance           1978.070        3662130.143              0.002
    Kurtosis                  31.212          137.174                  7.723
    Skewness                  4.342           7.263                    2.948
    Range                     705.000         63973.522                0.200
    Minimum                   0.000           0.000                    0.000
    Maximum                   705.000         63973.522                0.200
    Sum                       391249.000      14731220.892             273.620
    Count                     12330           12330                    12330
    Confidence Level(95.0%)   0.785           33.781                   0.001
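As the summary table shows, the dataset has 12,330 rows and 18 variables. All three variables above have high (excess) kurtosis (31.2, 137.2 and 7.7), which means their distributions are sharply peaked with heavy tails. They are also strongly right-skewed (skewness 4.3, 7.3 and 2.9), and each mean lies well above its median. The standard errors are small because the standard error is the standard deviation divided by the square root of the sample size, and the sample is large (n = 12,330). A small standard error does not mean the data have little variance. The standard deviation of ProductRelated (44.5) is actually larger than its mean (31.7), so the data are widely dispersed.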
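The link between the standard error and the spread of the data can be checked against the ProductRelated column of the table. The standard error is SD/√n, and Excel's "Confidence Level(95.0%)" is a critical value times that standard error. The sketch below uses a normal (z) critical value as a stand-in for the t value Excel uses. With n = 12,330, the two agree to three decimals.

```python
import math
from statistics import NormalDist

n = 12330        # Count from the summary table
mean = 31.731    # ProductRelated mean
sd = 44.476      # ProductRelated sample standard deviation

se = sd / math.sqrt(n)             # standard error of the mean
z = NormalDist().inv_cdf(0.975)    # two-sided 95% critical value (about 1.96)
half_width = z * se                # approximates Excel's "Confidence Level(95.0%)"
cv = sd / mean                     # coefficient of variation: spread relative to the mean

print(f"SE = {se:.3f}, 95% half-width = {half_width:.3f}, CV = {cv:.2f}")
# SE = 0.401, 95% half-width = 0.785, CV = 1.40
```

A coefficient of variation above 1 means the standard deviation exceeds the mean. The variable is highly variable even though its standard error is small.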
We can measure central tendency with the mean, median and mode, and dispersion with the standard deviation and variance. The standard error describes how precisely the sample mean is estimated, not how spread out the data are. The maximum, minimum and range show how far each variable fluctuates and help detect outliers.
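The sketch below reproduces the statistics in the summary tables. It assumes they came from Excel's Descriptive Statistics tool, since the row labels match that tool's output. Skewness and kurtosis follow the bias-adjusted formulas used by Excel's SKEW and KURT, and kurtosis is reported as excess kurtosis, so a normal distribution scores near 0. The confidence level uses a normal critical value instead of Excel's t value, which is close only for large samples. The function name `describe` and the sample data are illustrative.

```python
import math
import statistics
from statistics import NormalDist
from typing import Dict, List


def describe(values: List[float], confidence: float = 0.95) -> Dict[str, float]:
    """Descriptive statistics laid out like the summary tables above."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)              # sample SD (divides by n - 1)
    if sd == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    se = sd / math.sqrt(n)                     # standard error of the mean
    s3 = sum(((x - mean) / sd) ** 3 for x in values)
    s4 = sum(((x - mean) / sd) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * s3        # same formula as Excel SKEW
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * s4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))  # excess kurtosis, as Excel KURT
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # z approximation of Excel's t value
    return {
        "Mean": mean,
        "Standard Error": se,
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
        f"Confidence Level({confidence:.1%})": crit * se,
    }


# Hypothetical sample of ten observations.
for name, value in describe([3, 4, 5, 2, 3, 4, 5, 6, 4, 7]).items():
    print(f"{name:<26}{value:.3f}")
```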
Categorical variables were analysed by summarizing continuous variables within each category, for example with counts and averages per visitor type. This lets us identify and evaluate differences between categories.
We drew several visualizations (a scatter plot, a box plot, a line chart and a column chart), each serving a specific purpose.
Summary Statistics (continued):
    Statistic                 ExitRates   PageValues   SpecialDay
    Mean                      0.043       5.889        0.061
    Standard Error            0.000       0.167        0.002
    Median                    0.025       0.000        0.000
    Mode                      0.200       0.000        0.000
    Standard Deviation        0.049       18.568       0.199
    Sample Variance           0.002       344.787      0.040
    Kurtosis                  4.017       65.636       9.914
    Skewness                  2.149       6.383        3.303
    Range                     0.200       361.764      1.000
    Minimum                   0.000       0.000        0.000
    Maximum                   0.200       361.764      1.000
    Sum                       531.088     72614.549    757.400
    Count                     12330       12330        12330
    Confidence Level(95.0%)   0.001       0.328        0.004
Session count (Count of Revenue) and average Informational pages by visitor type:
    Visitor type         Count of Revenue   Average of Informational
    New_Visitor          1694               0.333530106
    Other                85                 0.176470588
    Returning_Visitor    10551              0.533503933
    Grand Total          12330              0.503568532
The dataset contains 1,694 sessions from new visitors, 85 from visitors of type "Other" and 10,551 from returning visitors.
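The pivot tables in this answer group sessions by visitor type and report a count and an average per group. The sketch below does the same with the standard library on hypothetical (VisitorType, Informational) pairs. The helper names `pivot_count_mean` and `grand_total` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pivot_count_mean(rows: List[Tuple[str, float]]) -> Dict[str, Tuple[int, float]]:
    """Group (category, value) pairs and return {category: (count, average)}."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {cat: (len(vals), sum(vals) / len(vals)) for cat, vals in groups.items()}


def grand_total(pivot: Dict[str, Tuple[int, float]]) -> Tuple[int, float]:
    """Overall count and average; the average is weighted by each group's count."""
    total = sum(count for count, _ in pivot.values())
    weighted = sum(count * avg for count, avg in pivot.values())
    return total, weighted / total


# Hypothetical (VisitorType, Informational) pairs for six sessions.
rows = [("New_Visitor", 0), ("New_Visitor", 1), ("Returning_Visitor", 2),
        ("Returning_Visitor", 0), ("Returning_Visitor", 1), ("Other", 0)]
pivot = pivot_count_mean(rows)
print(pivot)
print(grand_total(pivot))
```

Because the grand total is count-weighted, the Grand Total of 0.504 in the table above is not the simple average of the three group averages (about 0.348). Returning visitors, with 10,551 sessions, dominate the weighting.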
    Visitor type         Average of ExitRates   Average of PageValues
    New_Visitor          0.02                   10.77
    Other                0.06                   18.19
    Returning_Visitor    0.05                   5.01
    Grand Total          0.04                   5.89
As the table above shows, the "Other" visitor type has the highest average exit rate (0.06) and the highest average page value (18.19). New visitors have the lowest average exit rate (0.02).
    Visitor type         Average of Informational   Average of Informational Duration
    New_Visitor          0.333530106                19.23747245
    Other                0.176470588                11.6854902
    Returning_Visitor    0.533503933                37.10199237
    Grand Total          0.503568532                34.47239793
Returning visitors have the highest average number of informational pages (0.53) and the longest average informational duration (37.10).
As the graph shows, the monthly averages of bounce rate and exit rate follow approximately the same pattern. Both are highest in February and June and lowest in September and October.
Box plot of exit rates by visitor type:
The box plot shows that the "Other" category has the highest median exit rate and new visitors have the lowest. Returning visitors and new visitors have a number of notable outliers.
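In a box plot, the box spans the first to third quartile and the line inside marks the median. Points beyond the whiskers are usually flagged with Tukey's 1.5 × IQR rule. The sketch below computes those pieces for a hypothetical list of exit rates, and the function name `box_stats` is illustrative. Quartile conventions differ slightly between tools, so Excel's box plot may place the fences a little differently.

```python
from statistics import quantiles
from typing import Dict, List


def box_stats(values: List[float], k: float = 1.5) -> Dict[str, object]:
    """Quartiles, Tukey fences and outliers for one box in a box plot."""
    # "inclusive" uses linear interpolation between data points.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower, "upper_fence": upper,
        "outliers": [v for v in values if v < lower or v > upper],
    }


# Hypothetical exit rates for one visitor type.
exit_rates = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.20]
print(box_stats(exit_rates))
```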
We built a scatter plot of bounce rate against exit rate. It shows a strong positive association: sessions with higher bounce rates tend to have higher exit rates. Because this is observational data, the association alone does not show that an increase in one causes an increase in the other.
Dataset Source:
UCI Machine Learning Repository: Online Shoppers Purchasing Intention Dataset Data Set
Data Dictionary:
a. There are 10 continuous (numerical) and 8 discrete (categorical) variables.
b. The 'Revenue' variable has two classes (True or False).
c. "Administrative", "Administrative Duration", "Informational", "Informational Duration", "Product Related" and "Product Related Duration" represent the number of pages of each type that the visitor viewed in the session and the total time spent on each page type.
d. "Bounce Rate", "Exit Rate" and "Page Value" are metrics measured by Google Analytics.
e. Bounce rate: "The value of "Bounce Rate" feature for a web page refers to the percentage of visitors who enter the site from that page and then leave ("bounce") without triggering any other requests to the analytics server during that session".
f. Exit rate: "The value of "Exit Rate" feature for a specific web page is calculated as for all pageviews to the page, the percentage that were the last in the session".
g. Page value: As retrieved from Wikipedia, "The Page Value feature represents the average value for a web page that a user visited before completing an e-commerce transaction".
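The sketch below makes the bounce rate and exit rate definitions concrete. It computes both metrics for one page from a hypothetical list of sessions, each an ordered list of page paths. It assumes a simplification: a bounce is treated as a session with a single pageview, standing in for "without triggering any other requests to the analytics server". Real Google Analytics processing is more involved.

```python
from typing import List

Session = List[str]  # hypothetical: page paths in the order they were viewed


def bounce_rate(sessions: List[Session], page: str) -> float:
    """Share of sessions that entered on `page` and viewed no other page."""
    entries = [s for s in sessions if s and s[0] == page]
    if not entries:
        return 0.0
    return sum(1 for s in entries if len(s) == 1) / len(entries)


def exit_rate(sessions: List[Session], page: str) -> float:
    """Share of all pageviews of `page` that were the last pageview of their session."""
    views = sum(s.count(page) for s in sessions)
    if views == 0:
        return 0.0
    exits = sum(1 for s in sessions if s and s[-1] == page)
    return exits / views


sessions = [
    ["/home"],                        # entered on /home and left: a bounce
    ["/home", "/product"],            # exited from /product
    ["/product"],                     # a bounce on /product
    ["/home", "/product", "/home"],   # /home viewed twice, exited from /home
]
print(bounce_rate(sessions, "/home"), exit_rate(sessions, "/home"))
```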
We also have categorical variables such as operating system, browser, region, traffic type and visitor type. We aggregated this information to get better insight into the data.
Bounce Rate Count (number of sessions) by visitor type and month; "-" means no sessions:
    Month    New_Visitor   Other   Returning_Visitor
    Feb      1             -       183
    Mar      232           -       1675
    May      319           -       3045
    June     30            1       257
    Jul      54            -       378
    Aug      72            -       361
    Sep      108           -       340
    Oct      124           -       425
    Nov      419           22      2557
    Dec      335           62      1330

Average by visitor type and month (the name of the averaged variable did not survive extraction):
    Month    New_Visitor   Other   Returning_Visitor
    Feb      0.0000        -       0.5464
    Mar      3.0991        -       1.7200
    May      2.5674        -       1.9018
    June     3.3000        0.0000  2.1634
    Jul      3.0741        -       2.3307
    Aug      3.4306        -       3.0776
    Sep      3.1019        -       3.4088
    Oct      2.9435        -       3.9459
    Nov      2.1981        0.8182  2.7016
    Dec      1.9463        1.7258  2.2812

Average of BounceRates and ExitRates by month:
    Month    Average of BounceRates   Average of ExitRates
    Feb      0.0470                   0.0741
    Mar      0.0217                   0.0446
    May      0.0269                   0.0488
    June     0.0351                   0.0582
    Jul      0.0247                   0.0453
    Aug      0.0182                   0.0377
    Sep      0.0122                   0.0303
    Oct      0.0118                   0.0290
    Nov      0.0193                   0.0382
    Dec      0.0201                   0.0413
[Figure: scatter plot of ExitRates against BounceRates with a linear trend line, y = 0.915x + 0.0228, R² =...]