Statistics and Probability
Assignment on Hypothesis Testing
January 30, 2023

Instructions
This document contains the questions for your assignment project on Statistical Testing. The
questions refer to the data given in the individual worksheets in the Excel document
‘Assignment Datasets.xlsx’. Please read the following points.
1. All submissions must be in the form of PDF documents. Spreadsheets exported to PDF will
be accepted, but calculations must be annotated or explained.
2. It is up to you how you do the calculations in each question, but you must explain how you
arrived at your answer for any given calculation. This can be done with a written explanation
and by using the relevant equations, along with showing the results of intermediate stages of
the calculations. In other words, you need to show that you know how to do a calculation for
a statistic other than using spreadsheet functions.
3. Each one of the questions involves a statistical test. Marks within each question will
generally be awarded for:
• Deciding which statistical test to use,
• Framing your hypotheses and proper conclusions,
• Identifying the parameters for the test and
• Showing a reasonable level of clarity, detail and explanation in the calculations needed to
carry out the test.
4. The data you have been given is in the worksheets of an Excel spreadsheet. This
spreadsheet is locked against editing. Please do not try to circumvent this; if you wish to use
a spreadsheet to do your calculations, you should copy and paste your data into your own
spreadsheet and work with that.

Question 1
The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the
frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the data.
2. Calculate the frequency mean, the frequency standard deviation, the median and the first
and third quartiles for this grouped data.
3. Compare the median and the mean and state what this indicates about the distribution.
Comment on how the answer to this question relates to your frequency polygon and histogram.
4. Explain the logic behind the equations for the mean and standard deviation for grouped
data, starting from the original equations for a simple list of data values. (This does not
just mean ‘explain how the equations are used’.)
5. Carry out an appropriate statistical test to determine whether the data is normally
distributed.

Question 2
A manufacturer of metal plates makes two claims concerning the thickness of the plates they
produce. They are stated here:
• Statement A: The mean is 200 mm.
• Statement B: The variance is 1.5 mm^2.
To investigate Statement A, the thickness of a sample of metal plates produced in a given
shift was measured. The values found are listed in Part (a) of worksheet ‘Dataset2’, with
millimetres (mm) as unit.
1. Calculate the sample mean and sample standard deviation for the data in Part (a) of
‘Dataset2’. Explain why we are using the phrase ‘sample’ mean or ‘sample’ standard deviation.
2. Set up the framework of an appropriate statistical test on Statement A. Explain how
knowing the sample mean before carrying out the test will influence the structure of your
test.
3. Carry out the statistical test and state your conclusions.
To investigate the second claim, the thickness of a second sample of metal sheets was
measured. The values found are listed in Part (b) of worksheet ‘Dataset2’, with millimetres
(mm) as unit.
1. Calculate the sample mean and then the sample variance and standard deviation for the data
in Part (b).
2. Set up the framework of an appropriate statistical test on Statement B. Explain how
knowing the sample variance before carrying out the test would influence the structure of
your test.
3. Carry out the statistical test and state your conclusions.

Question 3
A manager of an inter-county hurling team is concerned that his team lose matches because
they ‘fade away’ in the last ten minutes. He has measured GPS data showing how much ground
particular players cover within a given time period; this is the data in list (a) in
worksheet ‘Dataset3’. He has acquired the corresponding data from an opposing, more
successful team, which is given in list (b).
1. Calculate the sample mean and sample standard deviation for the two sets of data.
2. Set up the framework of an appropriate statistical test to determine whether there is a
difference in the distances covered by the two groups of players.
3. Explain how having the results of the calculations above in advance of doing your
statistical test will influence the structure of that test.
4. Carry out the statistical test and state your conclusions.

Question 4
A study was carried out to determine whether the resistance of the control circuits in a
machine is lower when the machine motor is running. To investigate this question, a set of
the control circuits was tested as follows. Their resistance was measured while the machine
motor was not running for a certain period of time and then again while the motor was
running. The values found are listed in worksheet ‘Dataset4’, with kilo-Ohms as the unit of
measurement.
1. Set up the structure of an appropriate statistical test to determine whether the
resistance of the control circuits in a machine is lower when the machine motor is running.
2. Explain how the order of subtraction chosen to calculate the differences will influence
the structure of the test.
3. Give a reason why the data is measured with the engine not running first and then with the
engine running.
4. Explain how knowing the mean of the differences in advance will influence the structure of
your statistical test.
5. Carry out the statistical test and state your conclusions.

Question 5
A study was carried out to determine the influence of a trace element found in soil on the
yield of potato plants grown in that soil, defined as the weight of potatoes produced at the
end of the season. A large field was divided up into 14 smaller sections for this experiment.
For each section, the experimenter recorded the amount of the trace element found (in
milligrams per metre squared) and the corresponding weight of the potatoes produced (in
kilograms). This information is presented in the worksheet ‘Dataset5’ in the Excel document.
Define X as the trace element amount and Y as the yield.
1. Draw a scatterplot of your data set.
2. Calculate the coefficients of a linear equation to predict the yield Y as a function of X.
3. Calculate the correlation coefficient for the paired data values.
4. Set up the framework for an appropriate statistical test to establish if there is a
correlation between the amount of the trace element and the yield. Explain how having the
scatterplot referred to above and having the value of r in advance will influence the
structure of your statistical test.
5. Carry out and state the conclusion of your test on the correlation.
6. Comment on how well the regression equation will perform based on the results above.

Question 6
A multinational corporation is conducting a study to see how its employees in five different
countries respond to three gifts in an incentive scheme. The numbers of employees who choose
each of the three gifts (G1 to G3) in each of the five countries (A to E) are given in the
table in ‘Dataset6’ in the Excel document.
1. Set up the structure of an appropriate statistical test to determine whether the data
supports a link between choice of gift and country, including the statistic to be used.
2. Carry out this test, showing clearly in your work how the expected values are calculated
for your test statistic.
Atul answered on May 04 2023
Question 1
Groups Frequencies
300 to 307 12
307 to 314 18
314 to 321 44
321 to 328 88
328 to 335 86
335 to 342 41
342 to 349 15
349 to 356 9
The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the
frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the
data.
To draw the frequency polygon, we first need to calculate the midpoints of each group:

Groups Frequencies Midpoints
300 to 307 12 303.5
307 to 314 18 310.5
314 to 321 44 317.5
321 to 328 88 324.5
328 to 335 86 331.5
335 to 342 41 338.5
342 to 349 15 345.5
349 to 356 9 352.5
Histogram
Finally, to draw the cumulative frequency polygon, we need to calculate the cumulative
frequencies:


Groups Midpoints Frequencies Cumulative frequencies
300-307 303.5 12 12
307-314 310.5 18 30
314-321 317.5 44 74
321-328 324.5 88 162
328-335 331.5 86 248
335-342 338.5 41 289
342-349 345.5 15 304
349-356 352.5 9 313
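The midpoint and cumulative-frequency columns can be reproduced with a short script. This is a minimal sketch, assuming the class boundaries and frequencies are typed in by hand from the Dataset 1 table; the variable names are illustrative only.

```python
from itertools import accumulate

# Class boundaries and frequencies copied from the Dataset 1 table.
bounds = [(300, 307), (307, 314), (314, 321), (321, 328),
          (328, 335), (335, 342), (342, 349), (349, 356)]
freqs = [12, 18, 44, 88, 86, 41, 15, 9]

# Midpoint of each class: the average of its lower and upper boundary.
midpoints = [(lo + hi) / 2 for lo, hi in bounds]

# A running total of the frequencies gives the cumulative frequencies.
cum_freqs = list(accumulate(freqs))

for (lo, hi), m, f, cf in zip(bounds, midpoints, freqs, cum_freqs):
    print(f"{lo}-{hi}  {m:6.1f}  {f:3d}  {cf:4d}")
```

For the plots, the frequency polygon would normally use the midpoints on the horizontal axis, while the cumulative frequency polygon is usually drawn against the upper class boundaries.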
2. Calculate the frequency mean, the frequency standard deviation, the median and the
first and third quartiles for this grouped data.
To calculate the mean, we need to use the midpoint of each group and the frequency of each
group:
Midpoint Frequency Midpoint * Frequency
----------------------------------------------
303.5 12 3,642
310.5 18 5,589
317.5 44 13,970
324.5 88 28,556
331.5 86 28,509
338.5 41 13,878.5
345.5 15 5,182.5
352.5 9 3,172.5
----------------------------------------------
102,499.5
The total frequency is 313, so the frequency mean is:
Mean = 102,499.5 / 313 ≈ 327.47
To calculate the frequency standard deviation, we need to find the variance first. We can use
the formula:
Variance = (Σ (f * (x - mean)²)) / (N - 1)
where f is the frequency, x is the midpoint, mean is the frequency mean, and N is the total
frequency.
Midpoint Frequency x - mean (x - mean)² f * (x - mean)²
--------------------------------------------------------------------
303.5 12 -23.97 574.77 6897.29
310.5 18 -16.97 288.13 5186.37
317.5 44 -9.97 99.49 4377.54
324.5 88 -2.97 8.85 778.56
331.5 86 4.03 16.21 1393.64
338.5 41 11.03 121.56 4984.08
345.5 15 18.03 324.92 4873.81
352.5 9 25.03 626.28 5636.51
--------------------------------------------------------------------
Σ 34127.80
(The deviations use the unrounded mean 327.4744 and are rounded to two decimals for display.)
Variance = 34127.80 / (313 - 1) ≈ 109.38
Standard deviation = √109.38 ≈ 10.46
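The arithmetic in the two tables above can be checked numerically. The following is a sketch that applies the same formulas (weighted mean of the midpoints, then the sum of f * (x - mean)² divided by N - 1) to the Dataset 1 midpoints and frequencies; the variable names are illustrative only.

```python
import math

# Midpoints and frequencies from the Dataset 1 table.
midpoints = [303.5, 310.5, 317.5, 324.5, 331.5, 338.5, 345.5, 352.5]
freqs = [12, 18, 44, 88, 86, 41, 15, 9]

# Total frequency N.
n = sum(freqs)

# Frequency mean: sum of midpoint * frequency divided by N.
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Sum of frequency-weighted squared deviations from the mean.
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))

# Sample variance (divisor N - 1) and standard deviation.
variance = sum_sq / (n - 1)
sd = math.sqrt(variance)

print(f"N = {n}, mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```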
To calculate the median, we need the cumulative frequencies:
Midpoint Frequency Cumulative frequency
----------------------------------------------
303.5 12 12
310.5 18 30
317.5 44 74
324.5 88 162
331.5 86 248
338.5 41 289
345.5 15 304
352.5 9 313
The total frequency is 313, so the median is the value at a cumulative frequency of
N / 2 = 156.5. The cumulative frequency is 74 at the end of the third group and 162 at the end
of the fourth, so the median lies in the fourth group (321 to 328). To estimate the median,
we can use the formula:
Median = L + ((N / 2 - CF(L-1)) / f) * w
where L is the lower limit of the group that contains the median, N is the total frequency,
CF(L-1) is the cumulative frequency up to the previous group, f is the frequency of the group
that contains the median, and w is the width of the group.
In this case, we have:
L = 321
N = 313
CF(L-1) = 74
f = 88
w = 7
Median = 321 + ((313 / 2 - 74) / 88) * 7 ≈ 327.56
To find the quartiles, we can use the formula:
Q(n) = L + ((n / 4 * N) - CF(L-1)) / f * w
where n is the quartile number (1 for the first quartile, 3 for the third quartile), L, CF(L-1)
and f now refer to the group that contains the quartile, and the other variables have the same
meaning as before.
For the first quartile, 1 / 4 * 313 = 78.25, which falls in the fourth group (321 to 328,
cumulative frequency 74 before it):
n = 1
Q(1) = L + ((1 / 4 * 313) - CF(L-1)) / f * w
= 321 + ((0.25 * 313) - 74) / 88 * 7
≈ 321.34
For the third quartile, 3 / 4 * 313 = 234.75, which falls in the fifth group (328 to 335,
cumulative frequency 162 before it):
n = 3
Q(3) = L + ((3 / 4 * 313) - CF(L-1)) / f * w
= 328 + ((0.75 * 313) - 162) / 86 * 7
≈ 333.92
So, the estimated first quartile is 321.34, and the estimated third quartile is 333.92.
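The interpolation formula used for the median and the quartiles can be written as one function. This is a sketch only: the name grouped_quantile and its parameters are hypothetical, and it assumes equal class widths and that the class containing the quantile is the first one whose cumulative frequency reaches p * N.

```python
def grouped_quantile(p, lowers, freqs, width):
    """Estimate the p-quantile (0 < p < 1) of grouped data by linear interpolation."""
    n = sum(freqs)
    target = p * n          # position of the quantile on the cumulative scale
    cum_before = 0          # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if cum_before + f >= target:
            # L + (target - CF before the class) / f * w
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("p must lie strictly between 0 and 1")


# Lower class limits and frequencies from the Dataset 1 table; every class is 7 wide.
lowers = [300, 307, 314, 321, 328, 335, 342, 349]
freqs = [12, 18, 44, 88, 86, 41, 15, 9]

q1 = grouped_quantile(0.25, lowers, freqs, 7)
median = grouped_quantile(0.50, lowers, freqs, 7)
q3 = grouped_quantile(0.75, lowers, freqs, 7)
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
```

Writing the rule once makes it harder to pick the lower limit or the preceding cumulative frequency from the wrong class, which is the usual source of error when the formula is applied by hand.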
3. Compare the median and the mean and state what this indicates about the
distribution. Comment on how the answer to this question relates to your frequency
polygon and histogram.
Comparison of Median and Mean
The median of the data set is about 327.56, and the mean is about 327.47. The two values are
almost equal, which indicates that the distribution is approximately symmetric; the median
being marginally larger than the mean points, if anything, to a very slight skew to the left.
This is consistent with the frequency polygon and histogram, where the frequencies peak in the
two central groups (321 to 328 and 328 to 335) and fall away on both sides.
4. Explain the logic behind the equations for the mean and standard deviation for
grouped data, starting from the original equations for a simple list of data values. (This
does not just mean ’explain how the equations are used’.)
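The equations for grouped data follow directly from the equations for a simple list of n data
values x_1, ..., x_n:
mean = Σ x_i / n
standard deviation = sqrt(Σ (x_i - mean)^2 / (n - 1))
If a value x appears f times in the list, its f identical terms can be collected into a single
term f * x in the first sum and f * (x - mean)^2 in the second, and n becomes the sum of the
frequencies. With grouped data the individual values inside an interval are unknown, so each
of them is approximated by the midpoint of its interval, and the frequency of the interval is
the weight of that midpoint in the calculation of the mean and standard deviation.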
For the mean, the equation for grouped data is:
mean = Σ (midpoint * frequency) / Σ frequency
where midpoint is the midpoint of each interval, and frequency is the frequency of each
interval. The numerator represents the sum of the products of the midpoint and frequency of
each interval, while the denominator represents the total frequency of all intervals. This
equation is used to calculate the weighted average of the midpoints of the intervals, where the
weight of each interval is its frequency.
For the standard deviation, the equation for grouped data is:
standard deviation = sqrt(Σ [(x - mean)^2 * frequency] / (Σ frequency - 1))
where x is the midpoint of each interval, mean is the mean of the data set, and frequency is
the frequency of each interval. The numerator represents the sum of the products of the
squared differences between the midpoint and the mean and the frequency of each interval,
while the denominator represents the total frequency of all intervals minus one. This equation
is used to calculate the weighted average of the squared deviations of the midpoints from the
mean, where the weight of each interval is its frequency.
The modification of the equations is necessary because grouped data provides less
information about the individual data points than a simple list of values. The midpoint of each
interval is used to represent all the data points within the interval, and the frequency of each
interval is used to determine the weight of each interval in the calculation of the mean and
standard deviation.
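The link between the two sets of equations can be illustrated numerically. The sketch below expands the Dataset 1 table into a simple list in which each midpoint is repeated as many times as its frequency, applies the ordinary list formulas through Python's statistics module, and compares the results with the grouped formulas. It assumes, as the grouped formulas do, that every value in an interval equals the midpoint.

```python
import math
import statistics

midpoints = [303.5, 310.5, 317.5, 324.5, 331.5, 338.5, 345.5, 352.5]
freqs = [12, 18, 44, 88, 86, 41, 15, 9]

# Simple list: each midpoint repeated f times (313 values in total).
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]

# Ordinary formulas for a simple list of values.
list_mean = statistics.mean(expanded)
list_sd = statistics.stdev(expanded)      # sample SD, divisor n - 1

# Grouped-data formulas, using the frequencies as weights.
n = sum(freqs)
grouped_mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
grouped_sd = math.sqrt(
    sum(f * (x - grouped_mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)
)

print(math.isclose(list_mean, grouped_mean), math.isclose(list_sd, grouped_sd))
```

Both routes give the same numbers because the frequency simply counts how many identical terms appear in each sum.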
5. Carry out an appropriate statistical test to determine whether the data is normally
distributed.
Only grouped frequencies are available, so the procedure below compares observed and expected
class frequencies, which is a chi-square goodness-of-fit test; the Anderson-Darling test needs
the individual observations. We first calculate the expected frequencies for a normal
distribution with the same mean and standard deviation as the data. The expected frequency for
an interval is:
Expected frequency = (Φ((upper bound - mean) / sd) - Φ((lower bound - mean) / sd)) * N
where Φ() is the cumulative distribution function of the standard normal distribution, upper
bound and lower bound are the upper and lower bounds of the interval, mean and sd are the
sample mean and sample standard deviation, and N is the total sample size.
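The expected-frequency calculation can be sketched as follows. The code assumes that each class bound is first converted to a z-score with the grouped mean and standard deviation before Φ is applied, and it builds Φ from math.erf in the standard library; the helper name phi is illustrative only.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Class boundaries and observed frequencies from the Dataset 1 table.
bounds = [(300, 307), (307, 314), (314, 321), (321, 328),
          (328, 335), (335, 342), (342, 349), (349, 356)]
freqs = [12, 18, 44, 88, 86, 41, 15, 9]

# Grouped mean and sample standard deviation, computed from the midpoints.
n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in bounds]
mean = sum(f * x for x, f in zip(mids, freqs)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / (n - 1))

# Expected frequency of each class under a normal distribution with this mean and sd.
expected = [n * (phi((hi - mean) / sd) - phi((lo - mean) / sd)) for lo, hi in bounds]

for (lo, hi), obs, exp in zip(bounds, freqs, expected):
    print(f"{lo}-{hi}  observed {obs:3d}  expected {exp:7.2f}")
```

The expected frequencies add up to slightly less than N because the normal curve also assigns some probability below 300 and above 356; how those tails are handled is a choice the test has to state.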
Using the given data, we can calculate the sample mean and sample standard deviation as
follows:
mean = (300+307)*12/2 +...