University of California, Los Angeles Department of Statistics Statistics 12 Instructor: Nicolas Christou Data analysis with R - Some simple commands When you are in R, the command line begins with >...

RStudio programming


University of California, Los Angeles
Department of Statistics
Statistics 12
Instructor: Nicolas Christou

Data analysis with R - Some simple commands

When you are in R, the command line begins with >

To read data from a website use the following command:
> a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics12/body_fat.txt", header=TRUE)

The result of the command read.table is a "data frame" (it looks like a table). In our example we give the name a to our data frame. The columns of a data frame are variables. This file contains data on percentage of body fat determined by underwater weighing and various body circumference measurements for 251 men. Here is the variable description:

Variable  Description
x1        Density determined from underwater weighing
y         Percent body fat from Siri's (1956) equation
x3        Age (years)
x4        Weight (lbs)
x5        Height (inches)
x6        Neck circumference (cm)
x7        Chest circumference (cm)
x8        Abdomen 2 circumference (cm)
x9        Hip circumference (cm)
x10       Thigh circumference (cm)
x11       Knee circumference (cm)
x12       Ankle circumference (cm)
x13       Biceps (extended) circumference (cm)
x14       Forearm circumference (cm)
x15       Wrist circumference (cm)

If the data file is on your computer (e.g. on your desktop), first you need to change the working directory by clicking on Misc at the top of your screen and then read the data as follows:
> a <- read.table("filename.txt", header=TRUE)

Note: The expression <- is an assignment operator. Once we read the data we can display them by simply typing a at the command line:
> a
Or, if we want, we can display only the first 6 rows of the data by typing head(a). Here is the output:

> head(a)
      x1    y x3     x4    x5   x6   x7   x8    x9  x10  x11  x12  x13  x14  x15
1 1.0853  6.1 22 173.25 72.25 38.5 93.6 83.0  98.7 58.7 37.3 23.4 30.5 28.9 18.2
2 1.0414 25.3 22 154.00 66.25 34.0 95.8 87.9  99.2 59.6 38.9 24.0 28.8 25.2 16.6
3 1.0754 10.3 23 188.15 77.50 38.0 96.6 85.3 102.5 59.1 37.6 23.2 31.8 29.7 18.3
4 1.0722 11.7 23 198.25 73.50 42.1 99.6 88.6 104.1 63.1 41.7 25.0 35.6 30.0 19.2
5 1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2  94.5 59.0 37.3 21.9 32.0 27.4 17.1
6 1.0775  9.4 23 159.75 72.25 35.5 92.1 77.1  93.9 56.1 36.1 22.7 30.5 27.2 18.2

Useful commands:
• Extracting one variable from the data frame (e.g. the second variable):
> a[,2]
• Another way to extract a variable:
> a$y
• Similarly, if we want to access a particular row in our data (e.g. first row):
> a[1,]
• To list all the data simply type:
> a
• To compute the mean of all the variables in the data set (in current versions of R, mean(a) on a data frame returns NA with a warning, so use colMeans):
> colMeans(a)
• To compute the mean of just one variable:
> mean(a$y)
• To compute the mean of variables 2 and 3:
> colMeans(a[,c(2,3)])
• To compute the variance of one variable:
> var(a$y)
• To compute the variance-covariance matrix of all the variables:
> cov(a)
• To compute the variance-covariance matrix of all the variables except the first variable:
> cov(a[,-1])
• To compute the variance-covariance matrix of variables 1, 2, and 3:
> cov(a[,c(1,2,3)]) or cov(a[,1:3])
• To compute the variance-covariance matrix of variables 1, 2, and 5:
> cov(a[,c(1,2,5)])
• To compute the correlation matrix: As above, replace cov with cor, for example:
> cor(a[,c(1,2,3)])
• To compute summary statistics for all the variables:
> summary(a)
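The summary commands up to this point can be tried without a network connection. The sketch below uses a small made-up data frame called toy (a hypothetical stand-in for a; its values are illustrative and are not taken from body_fat.txt) and assumes a current version of R, where per-column means come from colMeans rather than from mean applied to the whole data frame.

```r
# Small made-up data frame standing in for `a` (values are illustrative only).
toy <- data.frame(
  x1 = c(1.08, 1.04, 1.07, 1.06),
  y  = c(6, 25, 10, 12),
  x3 = c(22, 22, 23, 23)
)

col_means <- colMeans(toy)             # mean of every column
y_mean    <- mean(toy$y)               # mean of one column
sub_means <- colMeans(toy[, c(2, 3)])  # means of columns 2 and 3 only
y_var     <- var(toy$y)                # sample variance (divides by n - 1)
cov_all   <- cov(toy)                  # variance-covariance matrix
cor_all   <- cor(toy)                  # correlation matrix

print(col_means)
print(cov_all)
```

The diagonal of cov_all holds the variances of the individual columns, so cov_all["y", "y"] agrees with var(toy$y).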
• To construct stem-and-leaf plot, histogram, boxplot:
> stem(a$y)
> boxplot(a$y)
> hist(a$y)
• To plot variable y against variable x10:
> plot(a$x10, a$y)
• And you can give names to the axes and to your plot:
> plot(a$x10, a$y, main="Scatterplot of percent body fat against thigh circumference", ylab="Percent body fat", xlab="Thigh circumference")
And here is the plot:
[Figure: Scatterplot of percent body fat against thigh circumference. x-axis: Thigh circumference; y-axis: Percent body fat.]
• To save a plot as a pdf file under the working directory (e.g. your desktop):
> pdf("box.pdf")
> boxplot(a$y)
> dev.off()
A box plot of the variable y can be found in your current working directory with the name box.pdf.
If you want to read more about a specific command (for example about histograms and boxplots), type the following at the command line:
> ?hist
> ?boxplot
• Exercise: Construct the same plots with different variables and save them on your desktop.

Create multiple graphs on one page.
Suppose 9 graphs, 3 × 3:
pdf("plot9.pdf")
par(mfrow=c(3,3))
hist(a$y)
boxplot(a$x10)
plot(a$x10, a$y)
boxplot(a$y)
boxplot(a$x9)
hist(a$x1)
plot(a$x13, a$y)
hist(a$x9)
boxplot(a$x13)
dev.off()
And here is the plot:
[Figure: nine plots arranged in a 3 × 3 grid, including the panels "Histogram of a$y", "Histogram of a$x1", and "Histogram of a$x9".]

Create subsets:
The following simple commands will create subsets of the original data frame a:
a1 <- a[, 1:3] #A new data frame with only the first three columns.
a2 <- a[, c(1:3,8,10)] #A new data frame with columns 1,2,3,8,10.

Another data set:
The following data were collected in the area west of the town Stein in the Netherlands near the river Meuse (Dutch: Maas) (see map below). The actual data set contains many variables but here we will use the x, y coordinates and the concentration of lead and zinc in ppm at each data point. The motivation for this study was to predict the concentration of heavy metals around the banks of the Maas river in this area. These heavy metals were accumulated over the years because of the river pollution. Here is the area of study:
[Figure: map of the study area.]

Exercise:
a. You can access these data using:
b <- read.table("http://www.stat.ucla.edu/~nchristo/statistics12/soil.txt", header=TRUE)
b. Construct the stem-and-leaf plot, histogram, and boxplot for each one of the two variables (lead and zinc), and compute the summary statistics. What do you observe?
c. Transform the data in order to produce a symmetrical histogram. Here is what you can do:
> log_lead <- log(b$lead)
> log_zinc <- log(b$zinc)
Construct the stem-and-leaf plot, histogram, and boxplot for each one of the new variables (log lead and log zinc), and ...
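To see why a log transform can make a right-skewed histogram more symmetrical, here is a minimal sketch on simulated data. The vector conc is made up (drawn from a lognormal distribution) and is not the lead or zinc data from soil.txt, and skewness is a hypothetical helper defined here for illustration, not a base R function.

```r
# Made-up, right-skewed "concentrations" (NOT the real soil data).
set.seed(1)
conc     <- rlnorm(200, meanlog = 5, sdlog = 0.7)
log_conc <- log(conc)

# Hypothetical helper: moment-based sample skewness
# (close to 0 for a roughly symmetric sample, positive for a long right tail).
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

print(summary(conc))      # mean well above the median: long right tail
print(summary(log_conc))  # mean and median close together

skew_raw <- skewness(conc)
skew_log <- skewness(log_conc)

# Side-by-side histograms before and after the transformation.
par(mfrow = c(1, 2))
hist(conc, main = "Original scale")
hist(log_conc, main = "Log scale")
```

The same pattern should apply to log_lead and log_zinc, assuming the lead and zinc concentrations are strictly positive and right-skewed.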
Answered Same Day, Jan 17, 2021


Sourav answered on Jan 20 2021
Stats 10 Lab 1, Submission
Name: Xiangrong Pu
UID: 705129099

Section 1
1) a)
> heights <- c(6.0, 5.8, 5.2)
> print(heights)
[1] 6.0 5.8 5.2

b)
> names <- c("Eric", "Christina", "Xiangrong")
> print(names)
[1] "Eric"      "Christina" "Xiangrong"
c)
> cbind(names, heights)
     names       heights
[1,] "Eric"      "6"
[2,] "Christina" "5.8"
[3,] "Xiangrong" "5.2"

> class(cbind(names, heights))
[1] "matrix"
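The quoted numbers in the cbind output come from type coercion: a matrix holds a single type, so combining a character vector with a numeric one turns the heights into strings. A short sketch of the difference, assuming you want to keep the heights numeric, is to build a data frame instead (the object names m and df are illustrative):

```r
names   <- c("Eric", "Christina", "Xiangrong")
heights <- c(6.0, 5.8, 5.2)

m  <- cbind(names, heights)       # character matrix: heights become strings
df <- data.frame(names, heights)  # data frame: each column keeps its own type

print(typeof(m))          # "character"
print(class(df$heights))  # "numeric"

avg_height <- mean(df$heights)  # works, because the column is still numeric
print(avg_height)
```

Calling mean on m[, "heights"] would fail to give a number, because those entries are text.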

2) a)
setwd("~/Desktop/Stats 10 Lab1")

NCbirths <- read.csv("births.csv")
b)
head(NCbirths)
  Gender Premie weight Apgar1 Fage Mage Feduc Meduc TotPreg Visits
1   Male     No    124      8   31   25    13    14       1     13
2 Female     No    177      8   36   26     9    12       2     11
3   Male     No    107      3   30   16    12     8       2     10
4 Female     No    144      6   33   37    12    14       2     12
5   Male     No    117      9   36   33    10    16       2     19
6 Female     No     98      4   31   29    14    16       3     20
    Marital Racemom Racedad Hispmom Hispdad Gained     Habit MomPriorCond
1   Married   White   White NotHisp NotHisp     40 NonSmoker         None
2 Unmarried   White   White Mexican Mexican     20 NonSmoker         None
3 Unmarried   White Unknown Mexican Unknown     70 NonSmoker At Least One
4 Unmarried   White   White NotHisp NotHisp     50 NonSmoker         None
5   Married   White   Black NotHisp NotHisp     40 NonSmoker At Least One
6   Married   White   White NotHisp NotHisp     21 NonSmoker         None
  BirthDef    DelivComp BirthComp
1     None At Least One      None
2     None At Least One      None
3     None At Least One      None
4     None At Least One      None
5     None         None      None
6     None         None      None
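R object names are case-sensitive, so the name assigned by read.csv has to match the name later passed to head exactly. The sketch below illustrates this with a tiny made-up CSV written to a temporary file; the file and the object name births are hypothetical stand-ins, not the lab's births.csv.

```r
# Write a tiny made-up CSV to a temporary file (stand-in for births.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Gender,weight", "Male,124", "Female,177", "Male,107"), tmp)

births     <- read.csv(tmp)   # header = TRUE is the default for read.csv
first_rows <- head(births, 2) # first two rows only
print(first_rows)

print(exists("births"))  # TRUE
print(exists("Births"))  # FALSE: different capitalisation, different name
```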

3) a)
> #install.packages("maps")
> find.package("maps")
[1] "C:/Users/Xiangrong/Documents/R/win-library/3.6/maps"
...