ST 307 Project 1 (50 pts) In this project you will create a SAS program, save it as a .sas file, and upload that file to Moodle on the assignment link. Remember: ● Everyone must submit their own code....

I need someone to guide me on how to run this program in SAS.




ST 307 Project 1 (50 pts)

In this project you will create a SAS program, save it as a .sas file, and upload that file to Moodle on the assignment link.

Remember:
● Everyone must submit their own code.
● You may not work with others on this project! You may obtain help from the TAs only. You should not post to the discussion board about this project either. Failure to adhere to these guidelines will result in an academic integrity violation.
● Be sure that your SAS file adheres to the SAS file submission guidelines (available on Moodle in the “Resources and Information” section).

Datasets:
One of the datasets for this homework is available on the assignment link and the other is available via this URL: https://www4.stat.ncsu.edu/~online/datasets/StudentData.txt

The WorldDevelopmentIndicators.xlsx dataset comes from the World Bank (https://datacatalog.worldbank.org/dataset/world-development-indicators). There is information about the variables and data in the different Excel sheets. The sheet with the data is called Data; that is the sheet you’ll want to read in. Download the dataset from the assignment link and place it in your shared folder.

The StudentData.txt data comes from the UCI machine learning repository. Information about the variables in the dataset is available at this link: https://archive.ics.uci.edu/ml/datasets/Student+Performance

Programming questions
You will now write code corresponding to each question/output/etc. below (we don’t need the output – with your code we can recreate it!). That is, do not simply modify the code used for question 1 to do question 2. You can copy and paste the previous code if needed, but we need to see the code used to answer each question below. Don’t forget to add comments prior to your SAS steps describing what you are doing (and include the header)!

1. (1pt) Create a permanent library called project using a LIBNAME statement.

World Development Indicators Data Section

2. (3pts) Read the WorldDevelopmentIndicators.xlsx file into your project library. Remember that you want to read in the “Data” sheet (see notes).
3. (3pts) We can use PROC FREQ to help understand the dataset. Create one-way contingency tables for the Country_name and Indicator_name variables.
   a) In a comment below your code, answer the following: How many observations are there for the Country_name “North America”?
4. (4pts) Next, we’ll make a copy of the dataset with some changes.
   a) Create another permanent dataset (don’t overwrite the original data) that only includes observations where country_name is either Arab World, European Union, Latin America & Caribbean, or North America.
   b) Rename the country_name variable to Region.

For the rest of this section, use the dataset you’ve created in question 4.

5. (1pt) The data should already be sorted by country_name; however, use a PROC SORT step to sort your dataset by country_name just to be safe.
6. (5pts) Use a PROC step to find (only) the mean, first quartile, median, third quartile, and standard deviation of the value variable for every country_name in your dataset.
   a) In the PROC step where you do this, subset the data to only include observations where Indicator_name is “Life expectancy at birth, total (years)”.
   b) In a comment below your code, answer the following: Which region had the highest mean?
7. (3pts) Use a PROC step to find the correlation between year and value.
   a) In the PROC step where you do this, subset the data to only include observations where Indicator_name is “Life expectancy at birth, total (years)”.
8. (1pt) Repeat question 7 but find the correlations between year and value for each region. (This data should be subsetted as well.)
9. (4pts) Use a PROC step and a CLASS statement to find the mean, first quartile, median, third quartile, and standard deviation of the value variable for each Region and Year combination.
   a) In the PROC step where you do this, subset the data to only include observations where Indicator_name is “Life expectancy at birth, total (years)”.

Student Data Section

10. (4pts) Read the StudentData.txt file in from the URL or download it and read it in locally. Note: This is a ‘;’ delimited file.
11. (8pts) Next, we’ll make a copy of the dataset with some changes.
   a) Create a temporary dataset.
   b) Your final dataset should have only the following variables: address, fjob, mjob, reason, internet, romantic, absences, avgGrade
   c) You should provide labels for fjob, mjob, reason, internet, romantic, and absences (see the link for information about the variables): https://archive.ics.uci.edu/ml/datasets/Student+Performance
   d) You should create the avgGrade variable as the average of the g1, g2, and g3 variables (but note that g1, g2, and g3 are not in the final dataset).
   e) You should remove any observations where the school variable is “MS” (but note that school is not in the final dataset).

For the rest of this section, use the dataset you’ve created in question 11.

12. (4pts) Use a single PROC step to create one-way contingency tables for the romantic and Internet variables.
   a) Suppress the displaying of cumulative frequencies/percentages.
13. (6pts) Use a single PROC step to create two-way contingency tables between the father’s job and mother’s job, and the internet and reason variables.
   a) Suppress the display of row and column sums.
   b) Add in expected counts for each cell (hint: see the options at https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=procstat&docsetTarget=procstat_freq_syntax08.htm&locale=pl#procstat.freq.tabstmtopts and find the option that ‘displays expected cell frequencies’).
   c) In a comment below your code, answer the following: Which internet/reason combination had the highest frequency of occurrence?
14. (3pts) Print out the data set but include only values where the average grade is less than 10.
   a) Be sure to indicate that the labels should be printed out as well.

Save this program and upload it to wolfware! Great work!
Answered 1 day after Mar 11, 2021


Aarti answered on Mar 13 2021
137 Votes
/* 1. Creating a permanent library - PROJECT */
libname project "/home/aartigoyal20130";
/* 2. Importing the xlsx file and saving in permanent library */
%web_drop_table(WORK.COUNTRY);
FILENAME REFFILE '/home/aartigoyal20130/Data.xlsx';
PROC IMPORT DATAFILE=REFFILE
    DBMS=XLSX
    OUT=WORK.COUNTRY;
    GETNAMES=YES;
RUN;
%web_open_table(WORK.COUNTRY);
data project.country;
set country;
run;
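/* Alternative sketch for step 2. This is an assumption, not part of the
   original answer. The assignment asks for the "Data" sheet of
   WorldDevelopmentIndicators.xlsx, so the sheet can be named explicitly
   with a SHEET= statement and the result written straight to the library.
   The file path and the dataset name COUNTRY_ALT are hypothetical; point
   the path at wherever the workbook was uploaded. */
proc import datafile="/home/aartigoyal20130/WorldDevelopmentIndicators.xlsx"
    dbms=xlsx
    out=project.country_alt
    replace;
    sheet="Data";
    getnames=yes;
run;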
/*3. One way frequency */
proc freq data=project.country;
tables 'country name'n 'indicator name'n;
run;
/* North America has 29058 observations*/
/*4. Filtering the data and renaming variable and saving in PROJECT lib. */
data project.country_1;
set project.country;
where 'country name'n in ('Arab World', 'European Union', 'Latin America & Caribbean', 'North America');
rename 'country name'n = Region;
run;
/*5. Sorting by Region */
PROC SORT DATA=PROJECT.COUNTRY_1;
BY REGION;
RUN;
/* 6. Proc means for statistical values */
PROC MEANS DATA=PROJECT.COUNTRY_1 (WHERE =('INDICATOR NAME'N = 'Life expectancy at...