Q XXXXXXXXXXmiddle school students in Michigan were asked whether grades, athletic ability, or popularity was most important to them. The results are shown below, broken down by gender. Do the data...

Please check the attached files.


Q1). 478 middle school students in Michigan were asked whether grades, athletic ability, or popularity was most important to them. The results are shown below, broken down by gender. Do the data provide evidence of an association between the two variables? Show all details and be sure to state your conclusion clearly.

There are 2 categorical variables: Gender and What students find important
For 2 categorical variables, we use Chi Square Test

Step 1
H0: The two variables are not associated
H1: The two variables are associated

Step 2
Condition: is satisfied, because all values in Expected Count are at least 5
Test-Stat = 21.5

Step 3
p-value = ?? (we can use Statkey or Excel, but we can’t find it using table)
However, we can approximate whether it’s less than or greater than 0.05
df = (#rows – 1)(#cols – 1) = (2 – 1)(3 – 1) = 2
Focusing on row 2 on Chi Square table, we see that our test-stat (21.5) is located on the far right. So the p-value is also located on the far right. Hence, p-value < 0.05

Step 4
Since p-value < 0.05, reject H0. We have strong evidence that the two variables are associated

Q2). The dataset Cereal shows the number of grams of fibre per serving for 30 different breakfast cereals from three different companies. The summary statistics are shown below. Conduct an analysis of variance test to determine whether there is a difference in mean number of grams of fibre per cereal between the three companies. Use the values of the ANOVA table given in your calculations. Show all details of the test.

There are 2 variables involved:
1. The company of the cereal: categorical
2. Grams of fibre: quantitative
For 1 categorical and 1 quantitative variables, we use ANOVA

Step 1
H0: All means are equal
H1: At least two means are different

Step 2
Condition: is not satisfied
- n in each category ≥ 30: no
- standard deviation in all categories are similar (no sd twice the other sd): yes

Step 3
Test-stat = 0.69 (from the calculation in ANOVA table)

Step 4
p-value = 0.51 (using Statkey, in final exam it will be given)

Step 5
Since p-value > 0.05, do not reject H0. There is not enough evidence for the difference in the means

1. Section 1: Introduction
a. Give a brief introduction about the assignment and search related article and write a paragraph of summary which supports your assignment. You need to give the full citation of the article.
b. Dataset 1: Give a short description about this dataset. Is this primary or secondary data? What are types of variables involved? Explain briefly what are the possible cases used in this study.
c. Dataset 2: Explain how you collect the data and discuss its limitation (e.g. whether your sample is biased). Is this primary or secondary data? What is/are the type(s) of variable(s) involved? Give a description of cases you consider for this data set.

2. Section 2: Analysis of single variable in Dataset 1
a. To answer research question “Which type of public transport was most used by the NSW people during 8th to 14th of August 2016?”, provide a suitable numerical summary and graphical display for the variables mode of Dataset 1. Give a detailed comment to answer the research question.
Hints:
1. use pivot table to create a frequency table
2. use the frequency table to create pie chart or bar chart
3. See tutorial 2, Question 4a
b. Now to answer research question “Are there more than 50% of public transport users in NSW use the particular mode of transport found in Part a?” setup an appropriate hypotheses, perform hypotheses test and answer the research question by writing the conclusion of the test. Do step-by-step hypothesis test

3. Section 3: Analysis of two variables in Dataset 1
NSW Government need to decide on whether they have to build an underground Railway line from either Parramatta, Bankstown or Gosford to central. To prepare a recommendation for this;
a. Give a numerical summary and an appropriate graphical display for the variables location, by only considering those three stations; and the variable count by considering the data with trains only.
Hints:
1. Use Excel to filter location (select only 3 stations)
2. Use Excel to filter mode (select trains only)
3. Use Statkey with 2 variables (columns): location & count
b. Perform a suitable hypothesis test at a 5% level of significance to test whether there is difference between mean counts of taps on and off.
Hints:
1. Use Statkey: “ANOVA for Difference in Means” (edit data and enter the values from columns tap and count. You can click ANOVA Table, record important values.
2. Use Statkey to find p-values (Theoretical Distributions: F)
3. Do step-by-step ANOVA test
c. Use the conclusion of the test in part b and the outputs in part a to write a recommendation to NSW government.

4. Section 4: Collect and analysis Dataset2
You are interested in finding whether there is a difference in preference between different gender in terms of their transport mode (Bus, Train, Ferry and Light Rail). by considering appropriate number of cases and variable, give a proper graphical display and use it to write a comments.
Hints:
1. Do survey

Section 5: Discussion & Conclusion
Write an executive summary by combining all your findings in the previous sections which must be a valuable recommendation for NSW Transport. Give a suggestion for further research

BUS708 Statistics and Data Analysis
Statistical Modelling Assignment
Trimester 2, 2018

1 OVERVIEW OF THE ASSIGNMENT
This assignment will test your skills of collecting and analysing data to answer a specific business problem. It also gives you the opportunity to apply the theories you have learned in this course such as finding numerical summaries, displaying with appropriate graphs and using statistical inferences to solve business problems, including constructing hypotheses, test them and interpret the findings. You may have to use two Data sets. One Data set will be sent to you via KOI student email individually and you need to find or collect another dataset.
Suppose you are working for an agency who analyse NSW transport system data to make a recommendation to improve public transport system. You will be given series of research questions. Use your knowledge that you gain from this course to answer these questions by displaying appropriate outputs of Excel, StatKey or Wolfram alpha. Use these answers to write an executive summary which might be a valuable recommendation to Transport NSW.

2 TASK DESCRIPTION: WRITTEN REPORT
There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.

Dataset 1: You will receive an email that contains a dataset that is specifically allocated to you. This dataset is a subset of a data Opal Tap on and Tap Off Location - 8th to 14th August 2016 individual sample file, provided by the Transport for NSW Open Data and has been edited to only include a subset of the cases and variables. The original dataset can be obtained from https://opendata.transport.nsw.gov.au/dataset/opal-tap-on-and-tap-off and it is under the license of Creative Commons Attribution 3.0 Australia. Data dictionary of the edited dataset is given in the following table.

Variable | Description | Values
--- | --- | ---
mode | Type of the public transport | Bus, Train, Ferry and Light Rail
date | Date of the tap on/off held | Date/month/year
tap | It is a tap on or off | On and Off
loc | Locations of stops. For bus postcodes and others name of the stations | Postcodes and names of the stations
count | Total number tap on or off on the certain location and the certain date | Number

Dataset 2: Collect data (e.g. via a survey) that will answer research question given in section 3. There is no requirement about the number of variables, sampling methods and sample size, but you need to justify your approaches in Section 1 (see below).

https://opendata.transport.nsw.gov.au/dataset/opal-tap-and-tap/resource/c8d1d429-c283-4350-95f8-d8d21b845ac0
https://creativecommons.org/licenses/by/3.0/au/

Both datasets should be saved in an Excel file (one file, separate worksheets). All data processing should be performed in Excel or Statkey (http://www.lock5stat.com/StatKey).
Prepare a report in a document file (.doc or .docx) which includes all relevant tables and figures, using the following structure:

1. Section 1: Introduction
a. Give a brief introduction about the assignment and search related article and write a paragraph of summary which supports your assignment. You need to give the full citation of the article.
b. Dataset 1: Give a short description about this dataset. Is this primary or secondary data? What are types of variables involved? Explain briefly what are the possible cases used in this study.
c. Dataset 2: Explain how you collect the data and discuss its limitation (e.g. whether your sample is biased). Is this primary or secondary data? What is/are the type(s) of variable(s) involved? Give a description of cases
Answered Same Day | Sep 05, 2020 | BUS708 | University of the Sunshine Coast

Answer To: Q XXXXXXXXXXmiddle school students in Michigan were asked whether grades, athletic ability, or...

Viswanathan answered on Sep 07 2020
133 Votes
Introduction
The main objective of this study is to analyse the total number of taps on or off at a given location on a given date in Dataset 1. In Dataset 2, we are interested in the descriptive findings for the same quantity.
Dataset 1
Dataset 1 is taken from https://opendata.transport.nsw.gov.au/dataset/opal-tap-on-and-tap-off and is under the Creative Commons Attribution 3.0 Australia license. Because the data were obtained from an existing Australian government source rather than collected for this study, this dataset is an example of secondary data. The variables included in this dataset are given below.
Variable | Description | Values
--- | --- | ---
Mode | Transport Type | Bus, Train, Ferry and Light Rail
Date | Date of the tap on/off held | Entered in the format of dd/mm/yyyy
Tap | It is a tap on or off | On and Off
Loc | Locations of stops. For bus postcodes and others name of the stations | Postcodes and names of the stations
Count | Total number tap on or off on the certain location and the certain date | Values
Dataset 2
Dataset 2 also represents information about Opal taps on and off, and it resembles the data published under the Creative Commons Attribution 3.0 Australia license. Here, we conducted a small survey that asked respondents about their taps on and off and the count over the days; therefore, this dataset is an example of primary data. The variables used in Dataset 1 are also used in this dataset. Simple random sampling was used to select the respondents. The variables included in this dataset are given below.
Variable | Description | Values
--- | --- | ---
Mode | Transport Type | Bus, Train, Ferry and Light Rail
Date | Date of the tap on/off held | Entered in the format of dd/mm/yyyy
Tap | It is a tap on or off | On and Off
Loc | Locations of stops. For bus postcodes and others name of the stations | Postcodes and names of the stations
Count | Total number tap on or off on the certain location and the certain date | Values
For this dataset, the opinions of a random sample of 100 respondents were collected for the study.
Analysis of Single Variable
The frequency distribution of mode of transport is given below
Mode of Transport | Frequency | Proportion
--- | --- | ---
Bus | 456 | 0.456
Train | 498 | 0.498
Ferry | 24 | 0.024
Light Rail | 22 | 0.022
Total | 1000 |
[Figure: pie chart of the mode of transport]
From the pie chart, the mode of public transport most frequently used by NSW people is Train: 49.8% of the overall sample used Train as their mode of transport.
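The numerical summary can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the frequency table in this answer, and the variable names are chosen here for illustration.

```python
# Counts copied from the frequency table in this answer.
counts = {"Bus": 456, "Train": 498, "Ferry": 24, "Light Rail": 22}

total = sum(counts.values())

# Proportion of the sample that falls in each mode of transport.
proportions = {mode: n / total for mode, n in counts.items()}

# The most used mode is the one with the largest count.
most_used = max(counts, key=counts.get)

for mode, n in counts.items():
    print(f"{mode:<10} {n:>5} {proportions[mode]:.3f}")
print(f"{'Total':<10} {total:>5}")
print("Most used mode:", most_used)
```

Train has the largest count, but its share is just under one half, which is why the hypothesis test on the 50% threshold is needed.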
To determine whether more than 50% of public transport users in NSW use the train, we perform a one-proportion z test.
Null Hypothesis: H0: P ≤ 0.5
That is, not more than 50% of public transport users in NSW use the train mode of transport
Alternative Hypothesis: Ha: P > 0.50 (Right tailed test)
That is, more than 50% of public transport users in NSW use the train mode of transport
Assumptions
The conditions are np ≥ 10 and n(1 – p) ≥ 10
Here, np = 1000 * 0.498 = 498 > 10 and n(1 – p) = 1000 * 0.502 = 502 > 10
Thus, the assumption is satisfied to perform the one-proportion z test
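The large-sample check can be sketched as below. The helper name `large_sample_ok` is hypothetical. The answer checks the condition with the sample proportion; many textbooks check it with the hypothesized proportion instead, so the sketch shows both as an assumption about which convention applies.

```python
n = 1000        # total records in the frequency table
p_hat = 0.498   # observed proportion of Train records
p0 = 0.5        # hypothesized proportion under H0 (alternative convention, an assumption)

def large_sample_ok(n, p, threshold=10):
    """Return True when both expected counts n*p and n*(1-p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print("using sample proportion:      ", large_sample_ok(n, p_hat))
print("using hypothesized proportion:", large_sample_ok(n, p0))
```

Both conventions give the same answer here because the sample is large and the proportion is close to one half.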
Level of Significance:
Let the level of significance be α = 0.05
Test Statistic
The z test statistic is
z = (p̂ – P0) / √(P0(1 – P0) / n) = (0.498 – 0.5) / √(0.5 × 0.5 / 1000) ≈ –0.1265
Here, the right-tailed p-value of the z test statistic is 0.5503
Since the p-value of the z test statistic is greater than 0.05, there is insufficient evidence to reject the null hypothesis at the 5% level of significance. Therefore, there is insufficient statistical evidence to support the claim that more than 50% of...
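The test can be reproduced with the Python standard library. This is a minimal sketch that assumes the standard error is computed from the hypothesized proportion 0.5, which is the usual form of the one-proportion z test; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 1000        # sample size from the frequency table
p_hat = 0.498   # sample proportion of Train records
p0 = 0.5        # proportion stated in the null hypothesis
alpha = 0.05    # level of significance

# Standard error under H0, then the z statistic.
standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Right-tailed test: p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)

reject_h0 = p_value < alpha
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value agrees with the 0.5503 reported in this answer to four decimal places, so the null hypothesis is not rejected.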