Hello, can you please ask the person who is going to work on this project whether he can do this in the CSV file? All he has to do is fill in the blanks with the averages of the nearest columns.
I am attaching a project. Mine needs to be like this one, but can you please add the following: you can use BIC() and AIC() to find the best model.
Also, my project needs to be similar to this one but written in a different style and form.
Please let me know what you think.
Please remember that this needs to be done in RStudio, and the attached project is the guideline.
Thank you. I need it to be perfect. Please let me know so I can pay.
It is going to use 2018 data.
She specifies what she did for the CSV file on page 7.
DATA:
I deleted the air pollution variable because I did not have any information for 2018,
so it goes like this:

2018 Data
Y = Life expectancy at birth
x1 = Health expenditure (% of SDP)
x2 = Average amount of SDP
x3 = Food import
x4 = Basic drinking water (% of population)
x5 = Mortality rate, infant
x6 = Alcohol consumption (liters of alcohol)
x7 = Urban population (% of total population)
x8 = Undernourishment (% of population)
x9 = Population (total)
x10 = Employment to population



FLORIDA INTERNATIONAL UNIVERSITY
FITTING LIFE EXPECTANCY MODEL FOR THE WORLD POPULATION 2017: REGRESSION MODEL APPROACH
CHELSEA CORAL (5469791)
ORAL STAT PRESENTATIONS - STA3951
Dr. B.M. GOLAM KIBRIA
DATE: 02/20/2020

TABLE OF CONTENTS
Cover Page (p. 1)
Table of Contents (p. 2)
Abstract (p. 3)
Section I - Introduction
- Introduction (pp. 4-6)
Section II - Data
- 2.1 Source of Data (p. 7)
- 2.2 Descriptive Statistics (p. 8)
- 2.2.1 Summary Statistics (p. 9)
- 2.3 Objective of the Study (p. 10)
Section III - Data Analysis and Fitting Linear Regression Model
- 3.1 Full Model Summary Statistics (pp. 10-11)
- 3.2 Backwards Elimination (p. 12)
- 3.3 Reduced Model Summary Statistics (pp. 13-14)
Section IV - Cross Validation
- 4.1 Cross Validation (p. 14)
- 4.2 Prediction (p. 15)
- 4.3 Prediction Intervals (p. 15)
Section V - Multicollinearity
- 5.1 Multicollinearity Testing (p. 16)
Section VI - Summary & Concluding Remarks
- 6.1 Interpretations and Results (p. 16)
Bibliography
Appendix I - Full Data Set
Appendix II - List of Predicted Values 181 to 213 and Intervals

ABSTRACT
There are many influencing factors that can be used as good indicators of overall public health; one of these indicators is life expectancy. In this study, we want to determine the variables that contribute to the study of life expectancy and how they are used to make accurate predictions. Public health is and has always been a major worldwide concern and difficult to improve, because so many variables contribute to the average person’s health that it is difficult to distinguish what should be changed, whether it can be changed, and what is already beneficial. Using this study we will determine which impactful variables play a big role in overall health and how we can use this information to improve life expectancy. Variable selection is one of the most important processes within regression and will be used to create said impactful model. In order to make future predictions, we must establish a meaningful model that will ensure accuracy by using only the significant variables related to the dependent variable; we will call this the reduced model. From those results, we can conclude that the following variables are considered highly significant in an analysis of life expectancies: Health expenditure per GDP, Infant Mortality rate per 1000 births, Amount of people utilizing basic water drinking services, Urban population, Alcohol consumption, and Proportion of food imports.

1. Introduction
Life expectancy is one of the most important indicators of how a government is responding to its healthcare development. The analyses that follow these studies can raise questions concerning healthcare management. They can even lead to questions regarding control of fossil fuel combustion, restrictions on clean and edible food, and the management of healthcare. Many world leaders can use this information since they are constantly looking for new ways to improve the lives of their people. A long life expectancy can be a strong indication that active measures are being executed appropriately; conversely, short longevity can indicate a prolonged issue that may eventually become unsafe for the general public.

Life expectancy is the average amount of time in years that an individual is expected to live based on a multitude of factors. The data in this study will consist of life expectancies for 213 countries. We want to understand the data that belongs in the process of building prediction models for future interpretations. We can use the results given in this analysis to determine what factors impact health. For example, I strongly hypothesize that factors relating to environmental issues, such as pollution in drinking water, will have a strong relationship with life expectancies, because it has been proven in many studies that water pollution involves toxic chemicals from plastic that can potentially be killing humans and other living organisms. This study will be used to determine the variables that particularly stand out in an analysis of life expectancy, with the exception of countries having their own unique factors, apart from other countries, that make a bigger contribution in their region. However, this analysis will focus on general factors that can be considered “ordinary” for the typical country, such as health, employment and environment.

Here I will provide general information regarding life expectancy on a worldwide scale. This study will explore the data of a few hypothesized factors and will go into further detail on each variable and its hypothesized contribution to life expectancy. The variables considered in this project are as follows.

I. Life Expectancy at Birth (Dependent Variable = y3)
On average, how long a newborn is expected to live assuming current mortality rates do not change (excellent health condition). As the dependent variable, it can be used as a reliable indicator of a country’s state of health since shifts can be used to describe positive/negative patterns due to any changes made to its independent variables.

II. Healthcare Expenditure per GDP (x1)
Proportion of GDP that is allocated to the country’s healthcare system. This amount is distributed among many medical related necessities such as rehabilitation centers, community health services, and administration/regulations.

III. GDP in US $ (x2)
Gross Domestic Product in US dollars is defined as the monetary value that provides an estimate of a country’s economic growth and size. In short, it represents the worth of a country’s output. For consistency, all currencies were converted into US dollars.

IV. Infant Mortality Rate per 1000 Births (x3)
IMR is defined as the death rate of infants within the first 12 months for every 1000 births. A mother's health plays an important part in a baby’s health (low rates of infant death = healthy mother; likewise, high rates of infant death = unhealthy mother). The mother’s health will be a considerable indication of an average woman's health.

V. Commonness of Malnutrition (x4)
Quality of and accessibility to proper nutrition. Malnutrition can lead to diseases which can shorten a person’s lifespan. Malnutrition can take 2 forms, overweight and underweight, and varies in definition by several factors. The question is how often a nutritional issue is encountered and how it affects the population as a whole, including children, who suffer most from malnutrition.

VI. Population Total (x5)
Global contribution to releasing toxic chemicals into the environment, such as air pollution, plastic pollution, etc. People lack proper education on how to treat earth’s nonrenewable resources. The greater the population, the greater the contribution becomes.

VII. People Using At Least Basic Drinking Water Services (x6)
Water is evidently vital for survival but many countries have little to no access to clean water. Contaminated water can cause life threatening diseases for all living organisms in the environment. When toxic chemicals and oils are released, they expose our bodies to toxicity and can even lead to accelerated death.

VIII. PM 2.5 Air Pollution in Micrograms per Cubic Meter, Mean Annual Exposure (x7)
PM2.5 is an air pollutant in fine particles that reduces visibility and which, if inhaled in high amounts, can become extremely poisonous, affecting the heart, lungs and other organs. Similar to drinking water, I hypothesize PM2.5 levels will be significantly important to life expectancy.

IX. Urban Population Total (% of total population) (x8)
People already contribute to global air pollution through public transportation and smoking habits in small cities. Population increases in urban cities call for compact development and tolerance; therefore people of all ages expose themselves to an environment that other locals neglect.

X. Total Alcohol Consumption (liters of pure alcohol) (x9)
High amounts of alcohol consumption can lead to a higher risk of stroke, liver disease, high blood pressure and other life threatening issues. Ingredients such as syrups and sugar, together with the alcohol content, contain high amounts of calories that can also lead to obesity.

XI. Employment to Population Ratio (x10)
People are starting to work past retirement age in order to maintain their lifestyle habits, and it could be detrimental to their physical health, especially for employees who work in manual labor. Osteoporosis and arthritis are some examples of health issues that start at later stages in life and can worsen over time.

XII. Food Imports to Country (% of merchandise imports) (x11)
This is the proportion of merchandise imported into the country that consists of food. Proper nutrition is important but countries do not always have the appropriate climate year round to provide all of their food supply. Imported foods have been treated to be continuously transported and not go bad after long periods of time. Preservatives are definitely something to consider when transporting goods.

The organization of the project is as follows: The data sources and their descriptions are given in Section 2. Two regression models are developed in Section 3. The cross validation and evaluation of the fitted models are outlined in Section 4. The major independence assumptions are tested in Section 5. This paper will end with some concluding remarks in Section 6.

2. Data
2.1 Source of Data
In this section I will describe how my data was chosen, cleaned and prepared to begin my analysis. I will also explain the methods used for replacing missing data values. For consistency, most of the data used in this analysis was extracted from the World Data Bank and collected from the year 2017. Healthcare expenditure and total alcohol consumption were the only 2 factors that had to be extracted from their most recent 2016 data since no data from 2017 had been recorded. First, we chose our variables using our knowledge of basic health related concerns such as access to nourishment, basic human resources (air and water quality) and other global issues. After the collection of data was complete, I identified the countries with too many missing data values for multiple variables and deleted the indicated rows from the overall collection. Additionally, scattered rows with about 1 or 2 missing variable values had to
Answered 6 days after Nov 19, 2021

Franciosalgeo answered on Nov 25 2021
Assignment
2
Contents
Introduction
2. Data
2.1. Source of data
2.2. Descriptive statistics
3. Data Analysis and Fitting Linear Regression Model
3.1. Full Model
4. Cross validation
5. Multicollinearity
Summary and Concluding Remarks
Reference
Appendix 1
Introduction
Life expectancy is one of the most important indicators of how a government is responding to the country's healthcare development. The health-care system's input is measured in medical costs per capita, while the system's output is measured in life expectancy at birth. The analysis that follows may raise questions about healthcare administration. Global leaders can benefit from this knowledge because they are always looking for innovative ways to improve their people's lives. A long life expectancy can be a strong indicator that active measures are being implemented properly; on the other hand, a short life expectancy can signal a long-term problem that could become dangerous to the broader public. Life expectancy is the average amount of time in years that an individual is predicted to live based on a variety of circumstances.
2. Data
2.1. Source of data
In this section, I explain how I chose, cleaned, and prepared my data for analysis, and how I replaced missing values. For consistency, the majority of the data in this study came from the World Data Bank and was collected for 2018. The variables were selected based on our understanding of core health concerns such as food and nutrition security, basic human resources (air and water quality), and other global issues. Because I did not have any air pollution data for 2018, I removed the air pollution variable. Most of the variables had missing values; food import had the most (34 missing values), and these appeared to be missing at random. All missing values were replaced using linear interpolation, that is, each missing value was imputed from the average of its nearest observed values. The variables included in this study are life expectancy at birth (Y), Health expenditure (% of SDP) (X1), Average amount of SDP (X2), Food import (X3), Basic drinking water (% of population) (X4), Infant mortality rate (X5), Alcohol consumption (liters of alcohol) (X6), Urban population (% of total population) (X7), Undernourishment (% of population) (X8), Population (total) (X9), and Employment to population (X10). For i = 1 to 10, I refer to each variable by its assigned Xi label throughout the rest of the study.
The full data may be found in Appendix 1 at the end of the study.
Before removing any insignificant regressors, I start by fitting the full model (Hoffmann). To identify which regressors are insignificant, we use hypothesis tests and their p-values. The best-fitting model is then selected using the AIC and BIC criteria.
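The imputation step described in this section can be sketched in base R as follows. This is only an illustration under assumptions: the vector `food_import` and its values are made up, the helper name `fill_linear` is hypothetical, and interpolation is assumed to run along the row order of a single column. For a single gap, linear interpolation gives exactly the average of the two neighbouring values.

```r
# Hypothetical column with gaps (NA); the values are made up for illustration.
food_import <- c(10, NA, 14, 12, NA, NA, 18)

# Fill NAs by linear interpolation between the nearest observed values.
fill_linear <- function(x) {
  idx <- seq_along(x)
  ok <- !is.na(x)
  # approx() interpolates at every index; rule = 2 carries the nearest
  # observed value outward if a gap sits at the very start or end.
  approx(idx[ok], x[ok], xout = idx, rule = 2)$y
}

filled <- fill_linear(food_import)
print(filled)
```

The single gap at position 2 becomes (10 + 14) / 2 = 12, and the two-value gap is filled on a straight line between 12 and 18.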
2.2. Descriptive statistics
Table 1. Summary statistics of the variables
Characteristic          N = 227
Life expectancy at birth
    Range               53, 85
    Median (IQR)        74 (67, 77)
    Mean (SD)           72 (7)
Health expenditure
    Range               2.14, 16.89
    Median (IQR)        6.06 (4.50, 7.96)
    Mean (SD)           6.51 (2.66)
Average amount of SDP
    Range               196,737,896, 86,100,000,000,000
    Median (IQR)        79,788,768,969 (15,005,866,762, 937,000,000,000)
    Mean (SD)           3,134,987,396,047 (9,941,006,969,831)
Food Import
    Range               4, 48
    Median (IQR)        11 (8, 16)
    Mean (SD)           13 (7)
People using at least basic drinking water services (% of population)
    Range               39, 100
    Median (IQR)        94 (80, 99)
    Mean (SD)           87 (15)
Infant Mortality rate
    Range               2, 83
    Median (IQR)        15 (6, 35)
    Mean (SD)           22 (19)
Total Alcohol Consumption (liters of pure alcohol)
    Range               0.0, 20.5
    Median (IQR)        5.8 (2.7, 9.1)
    Mean (SD)           6.0 (3.9)
Urban population (% of total population)
    Range               13, 100
    Median (IQR)        59 (41, 77)
    Mean (SD)           58 (22)
Undernourishment (% of population)
    Range               2, 57
    Median (IQR)        7 (3, 13)
    Mean (SD)           10 (10)
Population (total)
    Range               17,911, 7,592,475,615
    Median (IQR)        15,477,727 (4,157,091, 96,984,780)
    Mean (SD)           357,649,591 (1,036,083,609)
Employment to population
    Range               32, 87
    Median (IQR)        58 (52, 64)
    Mean (SD)           58 (11)
Table 1 lists each variable in the data set along with its summary statistics: the range (minimum and maximum), the median with interquartile range (IQR), and the mean with standard deviation (SD), based on N = 227 observations.
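A table of this kind can be reproduced with a few base R calls. The sketch below is an assumption-laden illustration rather than the code used for Table 1: the data frame `dat` is simulated, and the helper `summarise_var` is a hypothetical name.

```r
set.seed(1)
# Simulated stand-in for two of the study variables (values are not real data).
dat <- data.frame(Y = rnorm(50, mean = 72, sd = 7), X1 = runif(50, 2, 17))

# Range, median with quartiles, and mean with SD for one numeric vector.
summarise_var <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  c(min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE), q1 = q[[1]], q3 = q[[2]],
    mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# One row per variable, one column per statistic.
tab <- t(sapply(dat, summarise_var))
print(round(tab, 2))
```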
3. Data Analysis and Fitting Linear Regression Model
3.1. Full Model
Table 2. Full Model: Linear Regression
Dependent variable: Y
Characteristic   Beta [1]    SE [2]    95% CI [2]      p-value
(Intercept)      70***       2.95      64, 76          <0.001
X1               0.28***     0.078     0.13, 0.43      <0.001
X2               0.00        0.000     0.00, 0.00      0.8
X3               -0.05       0.028     -0.11, 0.00     0.055
X4               0.05*       0.023     0.01, 0.10      0.019
X5               -0.28***    0.017     -0.32, -0.25    <0.001
X6               -0.08       0.051     -0.19, 0.02     0.10
X7               0.03**      0.011     0.01, 0.05      0.007
X8               0.01        0.024     -0.04, 0.05     0.8
X9               0.00        0.000     0.00, 0.00      0.3
X10              0.02        0.018     -0.01, 0.06     0.2
[1] *p<0.05; **p<0.01; ***p<0.001
[2] SE = Standard Error, CI = Confidence Interval
To begin, we test for each coefficient the null hypothesis that its beta equals 0, in order to identify insignificant predictors. The results of these tests are given in Table 2. If the p-value in Table 2 is greater than 0.05, the corresponding variable is not statistically significant at the 5% level. From Table 2 we can see that X2, X3, X6, X8, X9, and X10 are insignificant.
We will select the best-fitting model using the AIC and BIC criteria. The AIC and BIC of the full model are given below.
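The p-value screening described here can be sketched in R as follows. The data are simulated, so the variable names and coefficients are assumptions made for illustration and do not reproduce Table 2; only `lm()` and `summary()` from base R are used.

```r
set.seed(42)
n <- 200
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
# By construction Y depends on X1 and X3 only; X2 is pure noise.
sim$Y <- 70 + 2 * sim$X1 - 1.5 * sim$X3 + rnorm(n, sd = 2)

full_fit <- lm(Y ~ X1 + X2 + X3, data = sim)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(full_fit)$coefficients
p_values <- coefs[-1, "Pr(>|t|)"]            # drop the intercept row
insignificant <- names(p_values)[p_values > 0.05]

print(round(coefs, 4))
print(insignificant)
```

Each p-value tests the hypothesis that the corresponding beta is 0 given the other predictors in the model, which is why a variable can change status once other variables are removed.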
##   r.squared adj.r.squared  sigma statistic p.value df    logLik      AIC
## 1    0.8826        0.8771 2.5477  162.3498       0 10 -528.7522 1081.504
##        BIC deviance df.residual nobs
## 1 1122.604 1402.038         216  227
Thus, the full model has an AIC of 1081.504 and a BIC of 1122.604, and its adjusted R-squared is 0.8771. With ten predictor variables, a sensible approach would be to assess all of the models that can be built from them (2^10 = 1,024 subsets) and choose the best one based on BIC/AIC; this procedure is called best subset selection. In practice I used the MASS::stepAIC function in R, which searches stepwise, adding or dropping one variable at a time, rather than fitting every subset. The stepAIC function computes the criterion through extractAIC, which for a linear model differs from the values returned by the AIC() and BIC() functions by an additive constant. That constant is the same for every model fitted to the same data, so it has no bearing on which model is chosen.
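A minimal sketch of this selection step is shown below, assuming backward elimination from the full model; the search direction and the simulated data are assumptions, since neither is stated in the answer. In `stepAIC`, `k = 2` applies the AIC penalty and `k = log(n)` applies the BIC penalty.

```r
library(MASS)

set.seed(7)
n <- 150
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
# Only X1 and X4 carry signal in this simulated example.
sim$Y <- 70 + 3 * sim$X1 - 2 * sim$X4 + rnorm(n, sd = 2)

full_fit <- lm(Y ~ ., data = sim)

# Stepwise search starting from the full model.
aic_fit <- stepAIC(full_fit, direction = "backward", k = 2, trace = FALSE)
bic_fit <- stepAIC(full_fit, direction = "backward", k = log(n), trace = FALSE)

# Compare the criteria on the usual AIC()/BIC() scale.
print(c(full = AIC(full_fit), selected = AIC(aic_fit)))
print(c(full = BIC(full_fit), selected = BIC(bic_fit)))
print(formula(bic_fit))
```

Because BIC penalises each extra parameter by log(n) instead of 2, it tends to keep fewer variables than AIC once n exceeds about 7.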
Table 3. Final Model: Linear Regression
Dependent variable: Y
Characteristic   Beta [1]    SE [2]    95% CI [2]      p-value
(Intercept)      73***       2.21      68, 77          <0.001
X1               0.26***     0.071     0.12, 0.40      <0.001
X3               -0.07**     0.026     -0.12, -0.02    0.006
X4               0.05*       0.021     0.01, 0.09      0.025
X5               -0.28***    0.017     -0.32, -0.25    <0.001
X6               -0.08       0.051     -0.18, 0.02     0.11
X7               0.03**      0.011     0.01, 0.05      0.009
[1] *p<0.05; **p<0.01; ***p<0.001
[2] SE = Standard Error, CI = Confidence Interval
The AIC and BIC of the final model in original scale are given below.
##   r.squared adj.r.squared  sigma statistic p.value df   logLik      AIC
## 1    0.8804        0.8772 2.5473  270.0158       0  6 -530.798 1077.596
##        BIC deviance df.residual nobs
## 1 1104.996 1427.539         220  227
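The reported AIC and BIC values can be checked by hand from the log-likelihoods in the two model summaries, using AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n). The sketch below assumes that k counts the slopes, the intercept, and the error variance, which is how R counts parameters for a linear model; the helper name `ic` is hypothetical.

```r
# Values copied from the full-model and final-model summaries.
n <- 227
loglik_full  <- -528.7522; p_full  <- 10   # predictors in the full model
loglik_final <- -530.798;  p_final <- 6    # predictors in the final model

# Parameter count k = slopes + intercept + error variance.
ic <- function(loglik, p, n) {
  k <- p + 2
  c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + log(n) * k)
}

res <- rbind(full = ic(loglik_full, p_full, n),
             final = ic(loglik_final, p_final, n))
print(round(res, 3))
```

The final model has a slightly lower log-likelihood, but it saves four parameters, and that saving outweighs the loss of fit under both criteria.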
Thus, the AIC of the final model is 1077.596, which is lower (better) than that of the full model (1081.504). The BIC of the final model, 1104.996, is also lower than that of the full model (1122.604). However, the adjusted R-squared is essentially unchanged (0.8772 versus 0.8771). The plots of the final model are given below.
4. Cross validation
Cross validation is a technique for assessing how well the final fitted model will predict future observations. We perform a 10-fold cross validation using the cv.lm function from the DAAG package in R. The plot of the observed y values against the fitted y values for all observations is shown below. Reading the points from left to right, we see a strong upward linear pattern, which indicates a positive relationship between the observed and fitted values.
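To make the procedure concrete, here is a hand-rolled 10-fold cross validation in base R. It is a stand-in for the DAAG::cv.lm call mentioned above, not that function itself, and it runs on simulated data, so every variable name and value is an assumption.

```r
set.seed(123)
n <- 200
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
sim$Y <- 70 + 2 * sim$X1 - sim$X2 + rnorm(n, sd = 2)

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))  # random fold label per row
cv_pred <- numeric(n)

for (i in seq_len(k)) {
  test_rows <- fold == i
  # Fit on the other nine folds, then predict the held-out fold.
  fit <- lm(Y ~ X1 + X2, data = sim[!test_rows, ])
  cv_pred[test_rows] <- predict(fit, newdata = sim[test_rows, ])
}

cv_rmse <- sqrt(mean((sim$Y - cv_pred)^2))
print(cv_rmse)
print(cor(sim$Y, cv_pred))
```

Every prediction in `cv_pred` comes from a model that never saw that row, so the root mean squared error estimates out-of-sample accuracy rather than in-sample fit.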
5. Multicollinearity
One of the assumptions checked for a multiple linear regression model is the absence of multicollinearity, that is, the predictors should not be strongly correlated with one another. Multicollinearity is measured using the variance inflation factor (VIF). A VIF below 5 is commonly taken to indicate that multicollinearity is not a problem. The VIF values of the variables in the final model are given below.
##   X1   X3   X4   X5   X6   X7
## 1.24 1.15 3.61 3.65 1.34 1.79
The maximum VIF is 3.65, for X5, followed closely by 3.61 for X4. All the variables in the final model have VIF values below 5. Hence, we conclude that multicollinearity is not a concern in the final model.
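The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The base R sketch below computes it directly; the helper `vif_manual` is a hypothetical name and the data are simulated, with X5 deliberately built to correlate with X4, so the numbers will not match those reported above.

```r
set.seed(99)
n <- 200
X4 <- rnorm(n)
X5 <- -0.8 * X4 + rnorm(n, sd = 0.6)   # deliberately correlated with X4
X7 <- rnorm(n)                          # unrelated to the other two
preds <- data.frame(X4, X5, X7)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    f <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(preds)
print(round(vifs, 2))
print(names(vifs)[which.max(vifs)])
```

The correlated pair shows inflated values while the independent predictor stays near 1, which is the same pattern as the X4 and X5 values in the reported output.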
Summary and Concluding Remarks
The final model selected using the AIC and BIC criteria is
Y = 73 + 0.26 X1 - 0.07 X3 + 0.05 X4 - 0.28 X5 - 0.08 X6 + 0.03 X7.
The variables in the final model are listed below:
1. Y = Life expectancy at birth
2. X1 = Health expenditure
3. X3 = Food Import
4. X4 = People using at least basic drinking water services (% of population)
5. X5 = Infant Mortality rate
6. X6 = Total Alcohol Consumption (liters of pure alcohol)
7. X7 = Urban population (% of total population)
Here we can see that health expenditure, access to basic drinking water services, and urban population percentage have positive coefficients, so higher values are associated with higher life expectancy. Food imports, infant mortality rate, and alcohol consumption have negative coefficients, although the alcohol consumption coefficient is not statistically significant at the 5% level (p = 0.11). The R-squared value of the final model is 0.88, which means that 88% of the variance in life expectancy is explained by the final model.
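As a worked example, the fitted equation can be applied to a single set of predictor values. The country below is hypothetical, with values chosen close to the medians in Table 1, and the coefficients are the rounded ones from the final-model table, so the result is only approximate. The function name `predict_life_expectancy` is an assumption.

```r
# Coefficients as rounded in the final-model table.
final_coef <- c(intercept = 73, X1 = 0.26, X3 = -0.07, X4 = 0.05,
                X5 = -0.28, X6 = -0.08, X7 = 0.03)

# Intercept plus the sum of coefficient * value over the supplied predictors.
predict_life_expectancy <- function(x) {
  unname(final_coef["intercept"] + sum(final_coef[names(x)] * x))
}

# Hypothetical country with values near the reported medians.
country <- c(X1 = 6, X3 = 11, X4 = 94, X5 = 15, X6 = 6, X7 = 59)
pred <- predict_life_expectancy(country)
print(pred)   # 75.58
```

The terms contribute 1.56, -0.77, 4.70, -4.20, -0.48, and 1.77 years respectively, so infant mortality and drinking water access dominate the prediction for this example.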
Reference
Hoffmann, John P. Linear Regression Models: Application in R. CRC press, Taylor and...