Chapter 2. Data Display and Descriptive Statistics
2.4 Exercises

1. (a) Compute the mean and the standard deviation for 1, 1.5, 2, 2.5, 3.
   (b) Compute the mean and the standard deviation for 1, 1.5, 2, 2.5, 30.
   (c) Comment on the differences.

2. Testing normality of gene expression. Consider the gene expression values in rows 790 and 66 of the Golub et al. (1999) data.
   (a) Produce a box plot for the expression values of the ALL patients and comment on the differences. Are there outliers?
   (b) Produce a QQ-plot and formulate a hypothesis about the normality of the genes.
   (c) Compute the mean and the median for the expression values of the ALL patients and compare these. Do this for both genes.

3. Effect size. An important statistic to measure is the effect size, which is defined for a sample as x̄/s, where x̄ is the sample mean and s the sample standard deviation. It measures the mean relative to the standard deviation, so that its value is large when the mean is large and the standard deviation is small.
   (a) Determine the five genes with the largest effect size of the ALL patients from the Golub et al. (1999) data. Comment on their size.
   (b) Invent a robust variant of the effect size and use it to answer the previous question.

4. Plotting gene expressions for CCND3. Use the gene expressions from CCND3 (Cyclin D3) of Golub et al. (1999) collected in row 1042 of the matrix golub from the multtest library. Use grep() to get the correct row for the CCND3 (Cyclin D3) gene expression values. After using the function plot(), you will produce an object on which you can program.
   (a) Produce a so-called stripchart for the gene expressions separately for the ALL as well as for the AML patients. Hint: Use factor() to separate the data between the two categories.
   (b) Rotate the plot to a vertical position and keep it that way for the questions to come.
   (c) Color the ALL expressions red and AML blue. Hint: Use the col parameter.
   (d) Add a title to the plot. Hint: Use the title() function.
   (e) Change the boxes into stars.
   Hint 1: Use the pch parameter.
   Hint 2: Using your favorite text editor, save the final script for later use.

5. Box-and-whiskers plot of CCND3 expression. Use the gene expressions for CCND3 (Cyclin D3) of Golub et al. (1999) from row 1042 of the matrix golub for the ALL patients. Use grep() to get the correct row for the CCND3 (Cyclin D3) gene expression values.
   (a) Construct the box plot in Figure 2.15.
   (b) Add text to the plot to explain the meaning of the upper and lower part of the box.
   (c) Do the same for the whiskers.
   (d) Export your plot to eps format.
   Hint 1: Use boxplot.stats() to find coordinates of the positions in the plot.
   Hint 2: Use parameter xlim when calling boxplot() to make the plot somewhat wider.
   Hint 3: Use arrows() to add an arrow.
   Hint 4: Use the lwd parameter to make lines wider.
   Hint 5: Use text() to add information at a certain position.

6. Box-and-whiskers plot of patients.
   (a) Use boxplot(data.frame(golub)) to produce a box-and-whiskers plot for each column (patient). Take a screenshot to save it in a word processor. Describe what you see. Are the medians of similar size? Is the interquartile range more or less equal? Are there outliers?
   (b) Compute the means and medians of the patients. What do you observe?
   (c) Compute the range (minimum and maximum values) of the standard deviations, the IQR and MAD of the patients. Comment on what you observe.

[Figure 2.15: Box plot with arrows and explaining text. Axis: CCND3 (Cyclin D3) Expression. Labels: lower whisker, first quartile, median, third quartile, upper whisker, and three outliers.]

7. Oncogenes in the Golub et al. (1999) data.
   (a) Select the oncogenes with the grep("oncogene") function and produce a box-and-whiskers plot of the gene expressions of the ALL patients. Be sure to perform a case-insensitive grep() search.
   (b) Do the same for the AML patients and use par(mfrow=c(2,1)) to combine the two plots such that the second is beneath the first. The par(mfrow=c(2,1)) command splits the plotting canvas into 2 rows and 1 column. After the two boxplot() calls, you can restore the default of one plot per window with the par(mfrow=c(1,1)) command. Are there genes with clear differences between the groups?

8. Descriptive statistics for the ALL gene expression values.
   (a) Compute the mean and median for gene expression values of the ALL patients, report their range and comment on it.
   (b) Compute the SD, IQR, and MAD for gene expression values of the ALL patients, report their range and comment on it.
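As a rough sketch of two ideas these exercises rely on, the effect size from Exercise 3 (mean divided by standard deviation) and the par(mfrow=c(2,1)) layout from Exercise 7, the R snippet below works on a small simulated matrix rather than the golub data. The matrix expr, the factor group, the gene names and the group sizes are all made up for illustration; nothing here reproduces the actual Golub et al. (1999) values.

```r
# Simulated stand-in for an expression matrix: 6 genes (rows) x 10 patients (columns).
# ASSUMPTION: this is NOT the golub data; names and group sizes are hypothetical.
set.seed(1)
expr <- matrix(rnorm(60, mean = 1, sd = 0.5), nrow = 6,
               dimnames = list(paste0("gene", 1:6), NULL))
group <- factor(rep(c("ALL", "AML"), times = c(6, 4)))

# Effect size per gene within one group: row mean divided by row standard deviation.
all_cols <- group == "ALL"
effect_size <- apply(expr[, all_cols], 1, function(x) mean(x) / sd(x))

# Names of the two genes with the largest effect size in the simulated ALL group.
top2 <- names(sort(effect_size, decreasing = TRUE))[1:2]

# Two stacked box plots: the canvas is split into 2 rows and 1 column.
# par() returns the previous settings, so they can be restored afterwards.
old_par <- par(mfrow = c(2, 1))
boxplot(t(expr[, all_cols]), main = "ALL (simulated)")   # one box per gene
boxplot(t(expr[, !all_cols]), main = "AML (simulated)")  # one box per gene
par(old_par)  # back to one plot per window
```

Saving the value returned by par() and passing it back is one way to restore the layout; calling par(mfrow=c(1,1)) directly, as the exercise text suggests, has the same effect when the previous layout was the default.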
Feb 10, 2021