Statistics and Probability



Assignment: Hypothesis Testing
[Worksheet extracts omitted: the locked Excel file contains worksheets Dataset 1 (grouped lifetime frequencies), Dataset 2 (plate thicknesses, Parts (a) and (b)), Dataset 3 (GPS distances, lists (a) and (b)), Dataset 4 (circuit resistances, motor running and not running), Dataset 5 (trace-element amounts and potato yields), and Dataset 6 (gift choices G1-G3 by country A-E), plus a Reference sheet of student numbers and seed values. The Dataset 1 frequency table is reproduced in the answer below.]
1. All submissions must be in the form of PDF documents. Spreadsheets exported to PDF will be accepted, but calculations must be annotated or explained.
2. It is up to you how you do the calculations in each question, but you must explain how you arrived at your answer for any given calculation. This can be done with a written explanation and by using the relevant equations, along with showing the results of intermediate stages of the calculations. In other words, you need to show that you know how to do a calculation for a statistic other than using spreadsheet functions.
3. Each one of the questions involves a statistical test.
Marks within each question will generally be awarded for:
• Deciding which statistical test to use,
• Framing your hypotheses and proper conclusions,
• Identifying the parameters for the test, and
• Showing a reasonable level of clarity, detail and explanation in the calculations needed to carry out the test.
(All calculations should use the datasets in the Excel spreadsheet, and graphs should be drawn in Excel too.)
4. The data you have been given is in the worksheets of an Excel spreadsheet. This spreadsheet is locked against editing. Please do not try to circumvent this; if you wish to use a spreadsheet to do your calculations, you should copy and paste your data into your own spreadsheet and work with that.

Question 1
The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the data.
2. Calculate the frequency mean, the frequency standard deviation, the median and the first and third quartiles for this grouped data.
3. Compare the median and the mean and state what this indicates about the distribution. Comment on how the answer to this question relates to your frequency polygon and histogram.
4. Explain the logic behind the equations for the mean and standard deviation for grouped data, starting from the original equations for a simple list of data values. (This does not just mean ‘explain how the equations are used’.)
5. Carry out an appropriate statistical test to determine whether the data is normally distributed.

Question 2
A manufacturer of metal plates makes two claims concerning the thickness of the plates they produce. They are stated here:
• Statement A: The mean is 200 mm.
• Statement B: The variance is 1.5 mm².
To investigate Statement A, the thickness of a sample of metal plates produced in a given shift was measured.
The values found are listed in Part (a) of worksheet ‘Dataset2’, with millimetres (mm) as unit.
1. Calculate the sample mean and sample standard deviation for the data in Part (a) of ‘Dataset2’. Explain why we are using the phrase ‘sample’ mean or ‘sample’ standard deviation.
2. Set up the framework of an appropriate statistical test on Statement A. Explain how knowing the sample mean before carrying out the test will influence the structure of your test.
3. Carry out the statistical test and state your conclusions.
To investigate the second claim, the thickness of a second sample of metal sheets was measured. The values found are listed in Part (b) of worksheet ‘Dataset2’, with millimetres (mm) as unit.
1. Calculate the sample mean and then the sample variance and standard deviation for the data in Part (b).
2. Set up the framework of an appropriate statistical test on Statement B. Explain how knowing the sample variance before carrying out the test would influence the structure of your test.
3. Carry out the statistical test and state your conclusions.

Question 3
A manager of an inter-county hurling team is concerned that his team lose matches because they ‘fade away’ in the last ten minutes. He has measured GPS data showing how much ground particular players cover within a given time period; this is the data in list (a) in worksheet ‘Dataset3’. He has acquired the corresponding data from an opposing, more successful team, which is given in list (b).
1. Calculate the sample mean and sample standard deviation for the two sets of data.
2. Set up the framework of an appropriate statistical test to determine whether there is a difference in the distances covered by the two groups of players.
3. Explain how having the results of the calculations above in advance of doing your statistical test will influence the structure of that test.
4. Carry out the statistical test and state your conclusions.
Question 4
A study was carried out to determine whether the resistance of the control circuits in a machine is lower when the machine motor is running. To investigate this question, a set of the control circuits was tested as follows. Their resistance was measured while the machine motor was not running for a certain period of time and then again while the motor was running. The values found are listed in worksheet ‘Dataset4’, with kilo-ohms as the unit of measurement.
1. Set up the structure of an appropriate statistical test to determine whether the resistance of the control circuits in a machine is lower when the machine motor is running.
2. Explain how the order of subtraction chosen to calculate the differences will influence the structure of the test.
3. Give a reason why the data is measured with the engine not running first and then with the engine running.
4. Explain how knowing the mean of the differences in advance will influence the structure of your statistical test.
5. Carry out the statistical test and state your conclusions.

Question 5
A study was carried out to determine the influence of a trace element found in soil on the yield of potato plants grown in that soil, defined as the weight of potatoes produced at the end of the season. A large field was divided up into 14 smaller sections for this experiment. For each section, the experimenter recorded the amount of the trace element found (in milligrams per metre squared) and the corresponding weight of the potatoes produced (in kilograms). This information is presented in the worksheet ‘Dataset5’ in the Excel document. Define X as the trace element amount and Y as the yield.
1. Draw a scatterplot of your data set.
2. Calculate the coefficients of a linear equation to predict the yield Y as a function of X.
3. Calculate the correlation coefficient for the paired data values.
4. Set up the framework for an appropriate statistical test to establish if there is a correlation between the amount of the trace element and the yield. Explain how having the scatterplot referred to above and having the value of r in advance will influence the structure of your statistical test.
5. Carry out and state the conclusion of your test on the correlation.
6. Comment on how well the regression equation will perform based on the results above.

Question 6
A multinational corporation is conducting a study to see how its employees in five different countries respond to three gifts in an incentive scheme. The numbers of employees who choose each of the three gifts (G1 to G3) in each of the five countries (A to E) are given in the table in ‘Dataset6’ in the Excel document.
1. Set up the structure of an appropriate statistical test to determine whether the data supports a link between choice of gift and country, including the statistic to be used.
2. Carry out this test, showing clearly in your work how the expected values are calculated for your test statistic.
Answered 1 day after May 04, 2023

Answer To: Statistics and Probability

Atul answered on May 06 2023
Question 1
Groups Frequencies
300 to 305 6
305 to 310 10
310 to 315 35
315 to 320 81
320 to 325 82
325 to 330 38
330 to 335 14
335 to 340 10
The lifetimes (in units of 106 seconds) of certain satellite components are shown in the
frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the
data.
To draw the frequency polygon, we first need to calculate the midpoints of each group:
Intervals Frequencies Midpoint Cumulative Frequency
300-305 6 302.5 6
305-310 10 307.5 16
310-315 35 312.5 51
315-320 81 317.5 132
320-325 82 322.5 214
325-330 38 327.5 252
330-335 14 332.5 266
335-340 10 337.5 276
Histogram: the histogram is drawn with the same intervals on the horizontal axis and the frequencies as bar heights.
Finally, to draw the cumulative frequency polygon, we plot the cumulative frequencies from the table above against the upper boundary of each interval.
To calculate the frequency mean, we need to first calculate the midpoint of each interval, then
multiply each midpoint by its corresponding frequency, sum up the results, and finally divide
by the total frequency.
Intervals Frequencies Midpoint
300-305 6 302.5
305-310 10 307.5
310-315 35 312.5
315-320 81 317.5
320-325 82 322.5
325-330 38 327.5
330-335 14 332.5
335-340 10 337.5
Frequency Mean = (6*302.5 + 10*307.5 + 35*312.5 + 81*317.5 + 82*322.5 + 38*327.5 +
14*332.5 + 10*337.5) / (6+10+35+81+82+38+14+10) = 88465 / 276 ≈ 320.53
The frequency standard deviation can be calculated using the following formula:
σ = sqrt[ Σ f*(x - mean)^2 / n ]
where f is the frequency of each interval, x is its midpoint, mean is the frequency mean we just calculated, and n is the total frequency (n = 276).
f midpoint deviation (deviation)^2 f*(deviation)^2
6 302.5 -18.03 325.08 1950.5
10 307.5 -13.03 169.78 1697.8
35 312.5 -8.03 64.48 2256.8
81 317.5 -3.03 9.18 743.6
82 322.5 1.97 3.88 318.2
38 327.5 6.97 48.58 1846.0
14 332.5 11.97 143.28 2005.9
10 337.5 16.97 287.98 2879.8
σ = sqrt[ (1950.5 + 1697.8 + 2256.8 + 743.6 + 318.2 + 1846.0 + 2005.9 + 2879.8) / 276 ]
= sqrt[ 13698.6 / 276 ] ≈ 7.05
To find the median, we need to find the interval that contains the 138th value (the halfway
point of the 276 frequencies). The cumulative frequency column tells us that the 138th
value falls within the 320-325 interval; the cumulative frequency up to the end of the
previous interval is 132. The interval width is 325 - 320 = 5, and we need to find how far
into this interval the 138th value lies. To do so, we calculate:
p = (138 - 132) / 82 = 0.073
Median = lower limit of the interval + (p * interval width) = 320 + (0.073 * 5) ≈ 320.37
The same interpolation formula gives the quartiles:
quartile = lower limit of the interval + (p * interval width)
where p is the fraction of the containing interval's frequency needed to reach the
(n * quartile number / 4)-th value, and n is the total frequency.
For the first quartile (Q1), we need to find the interval that contains the 69th value (which is
(276 * 1) / 4). The cumulative frequency column tells us that the 69th value falls within the
315-320 interval; the cumulative frequency up to the end of the previous interval is
6 + 10 + 35 = 51. The interval width is 320 - 315 = 5, so we calculate:
p = (69 - 51) / 81 = 0.222
Q1 = lower limit of the interval + (p * interval width) = 315 + (0.222 * 5) ≈ 316.11
For the third quartile (Q3), we need to find the interval that contains the 207th value (which
is (276 * 3) / 4). The cumulative frequency column tells us that the 207th value also falls
within the 320-325 interval, since the cumulative frequency reaches 132 at 320 and 214 at
325. So we calculate:
p = (207 - 132) / 82 = 0.915
Q3 = lower limit of the interval + (p * interval width) = 320 + (0.915 * 5) ≈ 324.57
Therefore, the first quartile (Q1) is approximately 316.11 and the third quartile (Q3) is
approximately 324.57.
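The median and quartile interpolations above all follow the same formula, so they can be checked with a small Python sketch (a numerical check only, not a substitute for the hand calculation the assignment requires; the interval edges and frequencies are taken from the Dataset 1 table):

```python
# Interval edges and frequencies from the Dataset 1 table.
edges = [300, 305, 310, 315, 320, 325, 330, 335, 340]
freqs = [6, 10, 35, 81, 82, 38, 14, 10]

def grouped_quantile(q):
    """q-th quantile (0 < q < 1) by linear interpolation inside the
    interval that contains the (q * n)-th value."""
    n = sum(freqs)
    target = q * n
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= target:
            lower = edges[i]
            width = edges[i + 1] - edges[i]
            p = (target - cum) / f  # fraction of this interval needed
            return lower + p * width
        cum += f
    return edges[-1]

print(round(grouped_quantile(0.25), 2))  # ≈ 316.11 (Q1)
print(round(grouped_quantile(0.50), 2))  # ≈ 320.37 (median)
print(round(grouped_quantile(0.75), 2))  # ≈ 324.57 (Q3)
```

The same function reproduces any percentile of the grouped distribution, which is useful for cross-checking the interpolation arithmetic.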
3. Compare the median and the mean and state what this indicates about the
distribution. Comment on how the answer to this question relates to your frequency
polygon and histogram.
The median for this grouped data is approximately 320.37, and the mean is approximately
320.53.
Since the mean and the median are relatively close in value, this suggests that the data is
fairly symmetrically distributed. This is also evident from the frequency polygon and
histogram, where we see that the distribution is somewhat bell-shaped, with the highest
frequencies occurring in the middle of the data range and decreasing as we move towards the
extremes.
However, there is a slight right skew in the distribution, as we can see from the frequency
polygon and histogram where the right tail extends further than the left tail. This skewness is
also reflected in the fact that the mean is slightly larger than the median, indicating that the
right tail of the distribution is pulling the mean towards it.
Overall, we can conclude that the distribution is roughly symmetric but slightly skewed to the
right.
4. Explain the logic behind the equations for the mean and standard deviation for
grouped data, starting from the original equations for a simple list of data values. (This
does not just mean ’explain how the equations are used’.)
The equations for the mean and standard deviation for grouped data are modifications of the
equations for the mean and standard deviation for a simple list of data values. The main
difference is that the grouped data is divided into intervals, and the frequency of each interval
is used to determine the weight of each interval in the calculation of the mean and standard
deviation.
For the mean, the equation for grouped data is:
mean = Σ (midpoint * frequency) / Σ frequency
where midpoint is the midpoint of each interval, and frequency is the frequency of each
interval. The numerator represents the sum of the products of the midpoint and frequency of
each interval, while the denominator represents the total frequency of all intervals. This
equation is used to calculate the weighted average of the midpoints of the intervals, where the
weight of each interval is its frequency.
For the standard deviation, the equation for grouped data is (written here in its sample form, dividing by Σ frequency - 1; the population form, used in the calculation above, divides by Σ frequency instead):
standard deviation = sqrt( Σ [(x - mean)^2 * frequency] / (Σ frequency - 1) )
where x is the midpoint of each interval, mean is the mean of the data set, and frequency is
the frequency of each interval. The numerator represents the sum of the products of the
squared differences between the midpoint and the mean and the frequency of each interval,
while the denominator represents the total frequency of all intervals minus one. This equation
is used to calculate the weighted average of the squared deviations of the midpoints from the
mean, where the weight of each interval is its frequency.
The modification of the equations is necessary because grouped data provides less
information about the individual data points than a simple list of values. The midpoint of each
interval is used to represent all the data points within the interval, and the frequency of each
interval is used to determine the weight of each interval in the calculation of the mean and
standard deviation.
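These grouped-data equations can be checked numerically. The following is a minimal Python sketch using the Dataset 1 midpoints and frequencies, in the population form (divide by n) to match the worked calculation above:

```python
from math import sqrt

# Midpoints and frequencies from the Dataset 1 table.
mids  = [302.5, 307.5, 312.5, 317.5, 322.5, 327.5, 332.5, 337.5]
freqs = [6, 10, 35, 81, 82, 38, 14, 10]

n = sum(freqs)                                      # total frequency: 276
mean = sum(f * x for f, x in zip(freqs, mids)) / n  # weighted average of midpoints

# Population form (divide by n); use (n - 1) instead for the sample form.
var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
sd = sqrt(var)

print(round(mean, 2), round(sd, 2))
```

Each midpoint stands in for every value in its interval, so multiplying by the frequency is exactly the "repeat each value f times" step that turns the simple-list equations into the grouped ones.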
5. Carry out an appropriate statistical test to determine whether the data is normally
distributed.
To test for normality, we can use the Shapiro-Wilk test, which is a commonly used statistical
test for normality.
However, since our data is grouped and we only have the frequencies for each interval, we
cannot...
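The answer breaks off here, but with only grouped frequencies the usual alternative to Shapiro-Wilk is a chi-squared goodness-of-fit test against a fitted normal distribution. A minimal stdlib-only sketch (not the submitted solution; mu and sigma are the grouped estimates from the frequency table, taken as assumptions here):

```python
from math import erf, sqrt

# Chi-squared goodness-of-fit sketch for the grouped Dataset 1 frequencies.
edges = [300, 305, 310, 315, 320, 325, 330, 335, 340]
freqs = [6, 10, 35, 81, 82, 38, 14, 10]
n = sum(freqs)
mu, sigma = 320.53, 7.05  # assumed grouped mean and sd from the table above

def norm_cdf(x):
    # Normal CDF via the error function (no external libraries needed).
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Probability the fitted normal assigns to each interval, with the outer
# tails folded into the end intervals so the expected counts sum to n.
probs = [norm_cdf(b) - norm_cdf(a) for a, b in zip(edges[:-1], edges[1:])]
probs[0] += norm_cdf(edges[0])
probs[-1] += 1 - norm_cdf(edges[-1])
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(freqs, expected))
df = len(freqs) - 1 - 2  # two parameters (mu, sigma) estimated from the data
print(round(chi2, 2), df)
```

The statistic is then compared with the critical value of the chi-squared distribution with df degrees of freedom (11.07 at the 5% level for 5 df). In practice, intervals whose expected count falls below 5 (the two end intervals here) would be merged with their neighbours first, reducing the degrees of freedom accordingly.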