Section A
There were over 3.5 million hospital discharges in the year 2000 in the U.S. state of California. Summary statistics for patient length of stay across all reported year 2000 hospital discharges in California include a median of 3.0 days, a mean of 4.6 days, and a standard deviation of 4.5 days. Below is a histogram that shows the distribution of length of stay, measured in days, for all hospital discharges in the year 2000 in California (the California all-discharge data set). You may consider this the population distribution of hospital discharges for the year 2000 in California.
1.0 If a random sample of 1,000 discharges were taken from the California all-discharge
database, and a histogram were made of patient length of stay for the sample, which of the
following is most likely true:
a) The histogram will look approximately like a normal distribution
because the sample size is large, and the Central Limit Theorem applies.
b) The histogram will look approximately like a normal distribution
because the number of samples is large, and the Central Limit Theorem
applies.
c) The histogram will appear to be right skewed.
d) The histogram will appear to be left skewed.
e) The histogram will look like a uniform distribution
2.0 Suppose we compared 2 random samples taken from the California all-discharge database
described above. Sample A is a random sample with 100 discharges. Sample B is a random
sample with 2,000 discharges. What can be said about the relationship between the standard
error of the mean length of stay in Sample A (SE_A) relative to the standard error of the
mean length of stay in Sample B (SE_B)?
a) SE_A < SE_B
b) SE_A > SE_B
c) SE_A is exactly equal to SE_B
d) Not enough information given to determine the relationship between the two
standard errors.
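The comparison in question 2.0 rests on the usual formula for the standard error of a sample mean, SD / sqrt(n). The sketch below is only an illustration: it assumes that both samples have a standard deviation close to the population value of 4.5 days (real samples would each have their own SD), and the helper name standard_error is hypothetical.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Assumption: each sample's SD is close to the population SD of 4.5 days.
sd_days = 4.5
se_a = standard_error(sd_days, 100)    # Sample A, n = 100
se_b = standard_error(sd_days, 2000)   # Sample B, n = 2,000

print(f"SE_A = {se_a:.3f} days, SE_B = {se_b:.3f} days")
print(f"ratio SE_A / SE_B = {se_a / se_b:.2f}")
```

Under this assumption the ratio of the two standard errors depends only on the sample sizes, sqrt(2000 / 100).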
3.0 Suppose we took 5,000 random samples from the California all-discharge data set, each
sample containing 100 discharges. For each of the 5,000 samples, the sample mean was
computed. A histogram was then created with the 5,000 sample mean values. Which of
the following statements most likely describes this histogram?
a) The histogram will look approximately like a normal distribution
because the size of each sample is large, and the Central Limit Theorem applies.
b) The histogram will look approximately like a normal distribution
because the number of samples is large, and the Central Limit Theorem
applies.
c) The histogram will appear to be right skewed.
d) The histogram will appear to be left skewed.
e) The histogram will look like a uniform distribution
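The contrast between questions 1.0 and 3.0 (a histogram of individual values versus a histogram of sample means) can be explored with a small simulation. This is a sketch under a loud assumption: the real discharge data are not available here, so a gamma distribution matched to the reported mean (4.6 days) and standard deviation (4.5 days) stands in for the right-skewed population. The helper names draw_sample and skewness are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: a gamma distribution matched to the reported mean and SD
# stands in for the right-skewed population of lengths of stay.
MEAN, SD = 4.6, 4.5
SHAPE = (MEAN / SD) ** 2      # gamma shape parameter
SCALE = SD ** 2 / MEAN        # gamma scale parameter

def draw_sample(n):
    """Draw n simulated lengths of stay from the stand-in population."""
    return [random.gammavariate(SHAPE, SCALE) for _ in range(n)]

def skewness(values):
    """Mean cubed z-score; positive values indicate right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Question 1.0: one sample of 1,000 individual discharges.
one_sample = draw_sample(1000)

# Question 3.0: 5,000 samples of 100 discharges, keeping each sample mean.
sample_means = [statistics.fmean(draw_sample(100)) for _ in range(5000)]

print(f"skewness of raw sample:   {skewness(one_sample):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
print(f"SD of sample means:       {statistics.pstdev(sample_means):.2f}")
```

In this simulation the single sample inherits the skew of the population, while the sample means are much closer to symmetric, with a spread near 4.5 / sqrt(100) = 0.45 days. What drives this is the size of each sample (100), not the number of samples (5,000).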
4.0 In a health care utilization journal, results are reported from a study performed on a
random sample of 100 deliveries at a large teaching hospital. The sample mean birth weight is
reported as 120 grams, and the sample standard deviation is 25 grams. The researchers
neglected to report a 95% confidence interval for the population birth weight (i.e., the mean
birth weight for all deliveries in the hospital). You decide to do so, and find the 95% confidence
interval for the population mean birth weight to be:
a) 119.5 grams to 120.5 grams
b) 115 grams to 125 grams
c) 70 grams to 170 grams
d) 117.5 grams to 122.5 grams
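A minimal sketch of the calculation behind question 4.0, assuming the usual large-sample interval, mean ± 1.96 × SD / sqrt(n). The helper name mean_ci is hypothetical, and the multiplier 1.96 is the normal approximation rather than an exact t value.

```python
import math

def mean_ci(mean: float, sd: float, n: int, z: float = 1.96):
    """Approximate 95% CI for a population mean from sample summaries."""
    se = sd / math.sqrt(n)          # standard error of the sample mean
    return mean - z * se, mean + z * se

# Values reported in the question: n = 100, mean = 120 g, SD = 25 g.
low, high = mean_ci(mean=120, sd=25, n=100)
print(f"95% CI: {low:.1f} g to {high:.1f} g")
```

Note that the interval is built from the standard error (25 / sqrt(100) = 2.5 grams), not from the standard deviation itself.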
5.0 A survey was conducted on a random sample of 1,000 Baltimore residents. Residents were
asked whether they have health insurance. 650 individuals surveyed said they do have health
insurance, and 350 said they do not have health insurance. A 95% CI for the proportion of
Baltimore residents with health insurance is:
a) 60% to 75%
b) 32% to 38%
c) 62% to 68%
d) 36% to 46%
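For question 5.0, a sketch of the large-sample interval for a proportion, p ± 1.96 × sqrt(p(1 - p) / n), may help. The helper name proportion_ci is hypothetical, and the normal approximation is assumed to be adequate for a sample of this size.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Approximate 95% CI for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Values reported in the question: 650 of 1,000 residents are insured.
low, high = proportion_ci(successes=650, n=1000)
print(f"95% CI: {low:.1%} to {high:.1%}")
```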
Section B (LINEAR REGRESSION)
Question 1.0: The objective of a study is to understand the factors that are associated with systolic blood pressure in infants. Systolic blood pressure, weight (grams) and age (days) are measured in 100 infants. A multiple linear regression is performed to predict blood pressure (mm Hg) from age and weight. The following results are presented in a journal article. (Questions a-b refer to these results.)
| Variable | Coefficient | SE |
|---|---|---|
| Intercept | 50 | 4.0 |
| Birth Weight | 0.1 | 0.3 |
| Age (days) | 4.0 | 0.60 |
a. How much higher would you expect the blood pressure of an infant who
weighed 120 grams to be compared to that of an infant who weighed 90 grams, if both
infants were of exactly the same age?
a. 0.1 mm Hg
b. 1.0 mm Hg
c. 2.0 mm Hg
d. 3.0 mm Hg
e. 4.0 mm Hg
b. Which of the following is a 95% confidence interval for the difference in SBP between two
infants of the same weight who differ by 2 days in age (older compared to younger)?
a. 2.8 mmHg to 5.2 mmHg
b. 5.6 mmHg to 10.4 mmHg
c. 6.8 mmHg to 9.6 mmHg
d. -0.5 mmHg to 0.7 mmHg
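Both parts of Question 1.0 scale a regression coefficient by a difference in the predictor while holding the other predictor fixed. The sketch below assumes the usual normal-approximation interval, where the coefficient and its standard error are both multiplied by the same difference. The helper name scaled_effect_ci is hypothetical.

```python
def scaled_effect_ci(coef: float, se: float, delta: float, z: float = 1.96):
    """Estimate and approximate 95% CI for delta units of a predictor,
    holding the other predictors in the model constant."""
    estimate = coef * delta
    margin = z * se * abs(delta)    # the SE scales with the same multiplier
    return estimate, estimate - margin, estimate + margin

# Part a: 30 g difference in weight (120 g vs. 90 g), coefficient 0.1.
weight_diff, _, _ = scaled_effect_ci(coef=0.1, se=0.3, delta=120 - 90)

# Part b: 2-day difference in age, coefficient 4.0 with SE 0.60.
age_diff, age_low, age_high = scaled_effect_ci(coef=4.0, se=0.60, delta=2)

print(f"weight effect: {weight_diff:.1f} mm Hg")
print(f"age effect: {age_diff:.1f} mm Hg "
      f"(95% CI {age_low:.1f} to {age_high:.1f})")
```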
Question 2.0:
A recent study of the relationship between Scholastic Aptitude Test (SAT) scores and provincial level characteristics found a statistically significant (p < .05) relationship between average SAT scores and the percent of high school seniors who actually took the SAT within a state. Linear regression was used to estimate this relationship, and the resulting regression equation was:
y = 1024 - 2.3x_1
where y represents average SAT score, and x_1 represents the percentage of high school seniors taking the SAT. The coefficient of determination, R^2, is 0.76. What can be said about the correlation coefficient, r?
a. The correlation coefficient is -0.76
b. The correlation coefficient is 0.76
c. The correlation coefficient is 0.87
d. The correlation coefficient is -0.87
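In simple linear regression with a single predictor, the correlation coefficient has magnitude sqrt(R^2) and takes the sign of the slope. A minimal sketch of that relationship follows. The helper name correlation_from_r2 is hypothetical, and the relationship holds only for the one-predictor case.

```python
import math

def correlation_from_r2(r_squared: float, slope: float) -> float:
    """Correlation coefficient for a one-predictor linear regression:
    magnitude sqrt(R^2), sign taken from the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Values reported in the question: R^2 = 0.76, slope = -2.3.
r = correlation_from_r2(r_squared=0.76, slope=-2.3)
print(f"r = {r:.2f}")
```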
2.2 The relationship between forced expiratory volume (FEV), which is measured in liters, and age, which is measured in years, is evaluated in a random sample of 200 men between the ages of 20 and 60. A simple linear regression analysis is performed to predict FEV from age. The following results are published in a paper.
Results of Simple Linear Regression Analysis
| Variable | Regression coefficient | Standard error |
|---|---|---|
| Intercept | -4 | 0.3 |
| Age (years) | 0.02 | 0.005 |
a) Interpret the coefficient of age in words.
b) Give a 95% confidence interval for the coefficient of age. Write a sentence
explaining the interpretation of this confidence interval.
c) Given the above results, can you ascertain whether the linear relationship between
FEV and age is strong? Why or why not?
d) Suppose the above results are used to compare 60-year-old men to 50-year-old men.
What would be the estimated average difference in FEV between the two groups of
men?
e) Would it make sense to use the above results to estimate the average FEV levels for
men 80 years of age? Why or why not?
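The arithmetic behind parts b, d, and e can be sketched as follows, assuming the normal-approximation interval (coefficient ± 1.96 × SE) and treating age as measured in years, as stated in the question text. The constant and function names are hypothetical. The range check only flags extrapolation beyond the sampled ages of 20 to 60 and does not say whether the model would hold there.

```python
# Reported simple linear regression results for FEV (liters) on age (years).
AGE_COEF = 0.02        # estimated change in mean FEV per year of age
AGE_SE = 0.005         # standard error of the age coefficient
AGE_RANGE = (20, 60)   # ages covered by the sample of 200 men
Z = 1.96               # normal-approximation multiplier for 95%

# Part b: approximate 95% CI for the age coefficient.
ci_low = AGE_COEF - Z * AGE_SE
ci_high = AGE_COEF + Z * AGE_SE

def fev_difference(age_older: float, age_younger: float) -> float:
    """Part d: estimated difference in mean FEV between two ages."""
    return AGE_COEF * (age_older - age_younger)

def within_sampled_ages(age: float) -> bool:
    """Part e: True only if the age lies inside the sampled range."""
    return AGE_RANGE[0] <= age <= AGE_RANGE[1]

print(f"95% CI for age coefficient: {ci_low:.4f} to {ci_high:.4f} L/year")
print(f"60 vs. 50 years: {fev_difference(60, 50):.2f} L")
print(f"age 80 inside sampled range: {within_sampled_ages(80)}")
```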
Question 3.0:
A data set contains information about the hourly wage (in U.S. dollars) and the gender of each of the 534 workers surveyed in 1985, as well as information about each worker’s age, union membership, and education level (measured in years of education). A linear regression analysis was performed to model hourly wage as a function of worker sex, number of years of education, and union membership. Below find the results from this regression:
a. Using the above regression results, estimate the mean hourly wage in 1985 for male workers
with a high school education (12 years of education), who were not union members.
a. $10.72 per hour
b. $6.93 per hour
c. $9.12 per hour
d. $8.82 per hour
b. Give a 95% confidence interval for the mean difference in hourly wages for male workers
with a high school education who were union members when compared to male workers with a
high school education who were not union members.
a. $0.90 per hour to $2.90 per hour
b. -$2.70 per hour to -$1.20 per hour
c. Cannot be estimated with the reported regression results.