Multiple Linear Regression
Application: You will evaluate a data set containing gas mileage, horsepower, and other information for cars. Your response variable is the gas mileage (mpg). You will use multiple linear regression to identify the potential factors associated with gas mileage in these data. We expect you to analyze the data, write a brief report, and submit the final report to us for evaluation and grading. All analyses are to be completed using STATA. We have provided the data as a comma-separated values (.csv) file.
Your report will have four sections: Introduction, Methods, Results, and Discussions. Keep the report to a maximum of 8 pages (double-spaced). It is okay to turn in a report shorter than the maximum number of pages, as long as everything requested is included and adequately covered.
Direct any questions about the project to your instructor or the TAs assigned to your class.
Introduction: This section should answer the question, “What is the rationale for the scientific question asked?” The rationale needs to be based on a significant public health issue. Describe the relationships of interest and the purpose of the analysis. Please conduct a small literature search (1-3 papers) to understand the scientific question asked in the project and provide a brief summary in this section. (This section should be brief and amount to a few paragraphs. Limit it to about 1 page, double-spaced.)
Methods: This section should describe the steps you took to analyze the data, the statistical methods you used, the rationale or purpose of each method, and how you applied them to answer the questions asked. Please describe any statistical methods used for testing model assumptions, if needed. (Optional) If you created a new variable for your analyses, you need to provide the rationale and method for creating this variable. Add a sentence referencing the software you used for all your analyses (in this case STATA), just as you would be expected to do for any peer-reviewed publication.
Results: The results section needs to mimic a peer-reviewed publication, so it needs to include the following elements:
• Identify the variables used in the analysis and create a summary table that describes your sample based on these variables. These descriptive statistics are based on the original/raw data.
• Use the Table 1 Template in Appendix A to present your descriptive statistics.
• Summarize your findings based on the initial descriptive statistics in a brief paragraph.
• You will be using regression models to address the primary question(s), so you need to provide analyses that evaluate the model assumptions.
• Present the initial exploratory analyses on the original data that you used to make a preliminary assessment of potential issues and of the distributional characteristics relevant to the statistical model needed to address the primary hypotheses you are asked to evaluate.
• If you find any violations of the assumptions, describe how you dealt with them (e.g., selecting an appropriate transformation, if applicable).
• Fit the initial model with all the explanatory variables.
• Provide a detailed analysis of model fit based on residuals. At a minimum, you need to include a quantile-normal plot to check the distribution of the residuals, a residual-versus-fitted plot, and a residual-versus-predictor plot for each predictor.
• If you find any issues in the analysis of the residuals, describe the remedial steps you took to address these issues.
• Fit the final model after the remedial steps you took to resolve issues identified in your analysis of model fit.
• Copy and paste the regression results for your final model into your report and label it as Table 2.
• Specify how the explanatory variables that appear in your final model were selected.
• Summarize the key results from your final multiple linear regression model in the text, and include a table with all the regression results in the body of the paper.
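As a rough illustration of the workflow in the list above, the following STATA do-file sketch imports a .csv file, produces descriptive statistics, fits an initial model, and draws the three required residual plots. The file name cars.csv and the variable names horsepower and weight are assumptions made only for illustration; substitute the actual file and variable names from the data provided. It is a starting point, not a complete or required analysis.

```stata
* Sketch only. cars.csv, horsepower, and weight are hypothetical names;
* mpg is the response variable named in the assignment.
import delimited "cars.csv", clear

* Descriptive statistics on the original/raw data (basis for Table 1)
summarize mpg horsepower weight, detail

* Initial model with all candidate explanatory variables
regress mpg horsepower weight

* Residual diagnostics for the fitted model
predict resid, residuals
qnorm resid                     // quantile-normal plot of the residuals
rvfplot, yline(0)               // residuals versus fitted values
rvpplot horsepower, yline(0)    // residuals versus one predictor
rvpplot weight, yline(0)        // repeat for each remaining predictor
```

The `//` comments work in a do-file but not when commands are typed interactively. If the plots suggest a violation (for example, curvature or non-constant variance), any transformation or other remedy should be justified in the Methods section and the diagnostics repeated on the revised model.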
Discussions: In this section, you need to describe what the results mean in the context of the scientific question, integrating all the questions asked for the project. The discussion, like the introduction, should be kept brief.