BME 530 Assignment 2 Complete the assignment using R. Q1. (2pts) Search a DNA database with a sample. Each time you attempt to match this sample to an entry in the database, there is a probability of...

I need help with my R assignment


BME 530 Assignment 2

Complete the assignment using R.

Q1. (2 pts) Search a DNA database with a sample. Each time you attempt to match this sample to an entry in the database, there is a probability of an accidental chance match of 1e-4. Chance matches are independent. There are 20,000 people in the database. What is the probability I get at least one match, purely by chance?

Q2. (2 pts) Disease A occurs with a probability 0.1, and disease B occurs with a probability 0.2. It is not possible to have both diseases. Suppose that there is a single test which reports positive with a probability 0.8 for a patient with disease A, with a probability 0.5 for a patient with disease B, and with a probability 0.01 for a patient with no disease. What is the probability that you have no disease even if the test comes back positive? Please use the following notations. Let 𝒜: the event that you have disease A; ℬ: the event that you have disease B; 𝒲: the event that you have no disease; and 𝒯: the event that the test result is positive.

Q3. (2 pts) Stacked Auditory Brainstem Response. The failure of standard auditory brainstem response (ABR) measures to detect small (<1 cm) acoustic tumors has led to the use of enhanced magnetic resonance imaging (MRI) as the standard to screen for small tumors. A study investigated the suitability of the stacked ABR as a sensitive screening alternative to MRI for small acoustic tumors (SATs). The objective of the study was to determine the sensitivity and specificity of the stacked ABR technique for detecting SATs. A total of 54 patients were studied who had MRI-identified acoustic tumors that were either <1 cm in size or undetected by standard ABR methods, irrespective of size. There were 78 nontumor normal-hearing subjects who tested as controls. The stacked ABR demonstrated 95% sensitivity and 88% specificity. Please recover the testing result table.

Q4. (4 pts) Duchenne muscular dystrophy, sometimes shortened to DMD or just Duchenne, is a rare genetic disease. It primarily affects males, but, in rare cases, can also affect females. Duchenne causes the muscles in the body to become weak and damaged over time, and is eventually fatal. The genetic change that causes Duchenne, a mutation in the DMD gene, happens before birth and can be inherited, or new mutations in the gene can occur spontaneously. Researchers used measures of pyruvate kinase and lactate dehydrogenase to assess an individual’s carrier status. The following table summarizes the test results.

| | Woman carrier | Woman not carrier | Total |
|---|---|---|---|
| Test positive | 56 | 6 | 62 |
| Test negative | 11 | 121 | 132 |
| Total | 67 | 127 | 194 |

(a) Compute the sensitivity and specificity of the test.
(b) The sample used in the test study is not representative of the general population for which the prevalence of carriers is 0.03%, or 3 in 10,000. With this information, find the PPV of the test, that is, the probability that a woman is a DMD carrier if she tested positive.
(c) What is the PPV if the table was constructed from a random sample of 194 subjects from a general population?
(d) Approximate the probability that among 15,000 women randomly selected from a general population, at least 2 are DMD carriers.

Q5. (10 pts) Data have been collected for evaluating two biomarkers for pancreatic cancer. See the attached data file. More specifically, m = 51 'control' patients with pancreatitis and n = 90 'cases' with pancreatic cancer were studied at the Mayo Clinic with a cancer antigen (CA125) and with a carbohydrate antigen (CA19-9). They are measured for each patient and collected in the first and the second columns respectively in the file. The value “0” in the third column represents control status for each patient.
1) Compute all distinct pairs of sensitivity and specificity. How many are there?
2) Plot the receiver operating characteristic (ROC) curve using the computed values from (1) for each biomarker.
3) Calculate the area under the ROC curve (AUC) for both biomarkers. Based on your calculation, which biomarker is better?
4) For each biomarker, choose the best threshold (justify your choice) and compute the sensitivity, specificity, positive predictive value, and negative predictive value for each biomarker based on that threshold.

Note: Confirm your result with the packages described here for drawing ROC curves, but no credit will be given if you use a package to complete the assignment. https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/

Submission. Submit a zip file (hw2_name.zip) on Blackboard that includes:
1) Both the R notebook (.Rmd) and rendered results (.html). Your notebook should include some annotations for the scripts so that a grader knows what the script is for.
2) Report document (.pdf)

CA 19-9,CA 125,cancer status
28,13.3,0
15.5,11.1,0
8.2,16.7,0
3.4,12.6,0
17.3,7.4,0
15.2,5.5,0
32.9,32.1,0
11.1,27.2,0
87.5,6.6,0
16.2,9.8,0
107.9,10.5,0
5.7,7.8,0
25.6,9.1,0
31.2,12.3,0
21.6,12,0
55.6,42.1,0
8.8,5.9,0
6.5,9.2,0
22.1,7.3,0
14.4,6.8,0
44.2,10.7,0
3.7,15.7,0
7.8,8,0
8.9,6.8,0
18,47.35,0
6.5,17.9,0
4.9,96.2,0
10.4,108.9,0
5,16.6,0
5.3,9.5,0
6.5,179,0
6.9,12.1,0
8.2,35.6,0
21.8,15,0
6.6,12.6,0
7.6,5.9,0
15.4,10.1,0
59.2,8.5,0
5.1,11.4,0
10,54.65,0
5.3,9.7,0
32.6,11.2,0
4.6,35.7,0
6.9,22.5,0
4,21.2,0
3.65,5.6,0
7.8,9.4,0
32.5,12,0
11.5,9.8,0
4,17.2,0
10.2,10.6,0
2.4,79.1,1
719,31.4,1
2106.667,15,1
24000,77.8,1
1715,25.7,1
3.6,11.7,1
521.5,8.25,1
1600,14.95,1
454,8.7,1
109.7,14.1,1
23.7,123.9,1
464,12.1,1
9810,99.1,1
255,18.6,1
58.7,10.5,1
225,6.6,1
90.1,74,1
50,43.9,1
5.6,45.7,1
4070,13,1
592,7.3,1
28.6,8.6,1
6160,17.2,1
1090,15.4,1
10.4,14.3,1
27.3,93.1,1
162,66.3,1
3560,26.7,1
14.7,32.4,1
83.3,9.9,1
336,30.3,1
55.7,11.2,1
1520,202,1
3.9,35.7,1
5.8,9.2,1
8.45,103.6,1
361,21.4,1
369,8.1,1
8230,29.9,1
39.3,17.5,1
43.5,30.8,1
361,57.3,1
12.8,6.5,1
18,33.8,1
9590,53.6,1
555,17.2,1
60.2,94.2,1
21.8,33.5,1
900,3.7,1
6.6,11.7,1
239,19.9,1
3100,38.7,1
3275,27.3,1
682,20.1,1
85.4,86.1,1
10290,844,1
770,36.9,1
247.6,6.9,1
12320,27.7,1
113.1,9.9,1
1079,38.6,1
45.6,142.6,1
1630,12.5,1
79.4,11.6,1
508,21.2,1
3190,13.2,1
542,19.2,1
1021,1024,1
235,14.1,1
251,34.8,1
3160,35.3,1
479,35,1
222,15.5,1
15.7,12.1,1
2540,31.6,1
11630,184.8,1
1810,24.8,1
6.9,10.4,1
4.1,34.5,1
15.6,19.4,1
9820,22.2,1
1490,53.9,1
15.7,15.4,1
45.8,17.3,1
7.8,36.8,1
12.8,49.8,1
100.5333,26.5667,1
227,9.7,1
70.9,19.2,1
2500,14.2,1
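The hand computation that Q5 asks for (sensitivity/specificity pairs, then a trapezoidal AUC) might be sketched as follows. This is a minimal sketch, not the graded solution: the helper names `roc_points` and `auc_trapezoid` are hypothetical, it assumes that a test is "positive" when the marker is at or above the threshold, and it runs on only the first five controls and first five cases from the first data column (headed CA 19-9) to keep the example short.

```r
# Sensitivity/specificity at every observed threshold (plus Inf so the
# curve starts at sensitivity 0, specificity 1).
# Assumption: "positive" means marker >= threshold.
roc_points <- function(marker, status) {
  thresholds <- c(Inf, sort(unique(marker), decreasing = TRUE))
  sens <- sapply(thresholds, function(t) mean(marker[status == 1] >= t))
  spec <- sapply(thresholds, function(t) mean(marker[status == 0] < t))
  unique(data.frame(sensitivity = sens, specificity = spec))
}

# Area under the ROC curve by the trapezoidal rule,
# integrating sensitivity over the false positive rate (1 - specificity).
auc_trapezoid <- function(roc) {
  fpr <- 1 - roc$specificity
  tpr <- roc$sensitivity
  ord <- order(fpr, tpr)
  fpr <- fpr[ord]
  tpr <- tpr[ord]
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Illustrative subset only: first five controls, then first five cases
ca199  <- c(28, 15.5, 8.2, 3.4, 17.3, 2.4, 719, 2106.667, 24000, 1715)
status <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

roc <- roc_points(ca199, status)
auc <- auc_trapezoid(roc)
print(roc)
print(auc)
```

Running the same two functions on each full column of the data file would give the pairs for part 1) and the AUC values for part 3); `nrow(roc)` counts the distinct pairs.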
Answered 8 days after Oct 29, 2021

Answer To: BME 530 Assignment 2 Complete the assignment using R. Q1. (2pts) Search a DNA database with a...

Jobin answered on Nov 07 2021
---
title: "BME 530 Assignment 2"
output: html_document
author: "Thao"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
**Question 1:**
Search a DNA database with a sample. Each time you attempt to match this sample to an entry in the database, there is a probability of an accidental chance match of 1e-4. Chance matches are independent. There are 20,000 people in the database. What is the probability I get at least one match, purely by chance?
**Solution 1:**
Probability of at least 1 match purely by chance = 1 - Probability of zero matches
Since chance matches are independent events, the probability of zero matches is the product of the 20,000 individual no-match probabilities:
Probability of zero matches = (1 - Probability of an accidental chance match) ^ 20000
Probability of zero matches = (1 - 0.0001) ^ 20000
Probability of zero matches = `r (1 - 0.0001) ^ 20000`
Probability of at least 1 match purely by chance = 1 - 0.1353217
Probability of at least 1 match purely by chance = `r 1 - 0.1353217`
Therefore, the probability of at least 1 match purely by chance is `r round((1 - 0.1353217)*100, 1)`%
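
The same calculation can be done without hard-coding the intermediate value. The sketch below uses illustrative variable names (`p`, `n`, `p_exact`, and so on, none of which come from the assignment) and cross-checks the direct formula against base R's `dbinom`; the Poisson line is only an approximation, added for comparison.

```r
p <- 1e-4    # probability of a chance match per comparison
n <- 20000   # number of entries in the database

# Direct formula: 1 - P(no match in n independent comparisons)
p_exact <- 1 - (1 - p)^n

# Same quantity from the binomial distribution: 1 - P(X = 0)
p_binom <- 1 - dbinom(0, size = n, prob = p)

# Poisson approximation with rate n * p (approximation, not the exact answer)
p_pois <- 1 - exp(-n * p)

print(c(exact = p_exact, binomial = p_binom, poisson = p_pois))
```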
**Question 2:**
Disease A occurs with a probability 0.1, and disease B occurs with a probability 0.2. It is not possible to have both diseases. Suppose that there is a single test which reports positive with a probability 0.8 for a patient with disease A, with a probability 0.5 for a patient with disease B, and with a probability 0.01 for a patient with no disease. What is the probability that you have no disease even if the test comes back positive? Please use the following notations. Let 𝒜: the event that you have disease A; ℬ: the event that you have disease B; 𝒲: the event that you have no disease; and 𝒯: the event that the test result is positive.
**Solution 2:**
Using the notation from the problem statement (written here as plain A, B, W, and T) and applying Bayes' theorem, we have:
$$
P(W \mid T) = \frac{P(T \mid W) \times P(W)}{(P(T \mid W) \times P(W)) + (P(T \mid X) \times P(X))}
$$
Here, X is the event of having a disease (either A or B), so P (X) is the probability of having a disease.
Now, P (X) = P (A or B)
Since P (A ∩ B) = 0, P (A or B) = P (A) + P (B)
Therefore, P (X) = 0.1 + 0.2 = 0.3
Also, P (W) = 1 - P (X)
Therefore, P (W) = 0.7
By the law of total probability over the two diseases:
P (T | X) * P (X) = [P (T | A) * P (A)] + [P (T | B) * P (B)]
P (T | X) * P (X) = (0.8 * 0.1) + (0.5 * 0.2)
P (T | X) * P (X) = `r (0.8*0.1) + (0.5*0.2)`
Substituting these values into the formula for P (W | T) shown above, we have:
$$
P(W \mid T) = \frac{0.01 \times 0.7}{(0.01 \times 0.7) + 0.18}
$$
P (W | T) = `r (0.01* 0.7)/((0.01* 0.7)+(0.18))`
Therefore, the probability that one does not have any disease even if the test comes back positive is `r round(((0.01* 0.7)/((0.01* 0.7)+(0.18)))*100, 1)`%.
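
The Bayes computation can also be written as a small vectorized sketch, which gives the posterior for all three states at once. The vector names `prior`, `likelihood`, `joint`, and `posterior` are illustrative choices; the labels A, B, and W stand for disease A, disease B, and no disease.

```r
# Prior probabilities of the three mutually exclusive states
prior <- c(A = 0.1, B = 0.2, W = 0.7)

# P(test positive | state)
likelihood <- c(A = 0.8, B = 0.5, W = 0.01)

# Joint probabilities P(positive and state); their sum is P(positive)
joint <- prior * likelihood

# Bayes' theorem: P(state | positive)
posterior <- joint / sum(joint)

print(round(posterior, 4))
```

The entry `posterior["W"]` is the probability of having no disease given a positive test, and the three posteriors sum to 1.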
**Question 3:**
Stacked Auditory Brainstem Response. The failure of standard auditory brainstem response (ABR) measures to detect small (<1 cm) acoustic tumors has led to the use of enhanced magnetic resonance imaging (MRI) as the standard to screen for small tumors. A study investigated the suitability of the stacked ABR as a sensitive screening alternative to MRI for small acoustic tumors (SATs). The objective of the study was to determine the sensitivity and specificity of the stacked ABR technique for detecting SATs. A total of 54 patients were studied who had MRI-identified acoustic tumors that were either <1cm in size or undetected by standard ABR methods, irrespective of size. There were 78 nontumor normal-hearing subjects who tested as controls. The stacked ABR demonstrated 95% sensitivity and 88% specificity. Please recover the testing result table.
**Solution 3:**
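```{r, echo=FALSE, warning=FALSE}
# install.packages("pacman")  # run once if pacman is not installed
library(pacman)
p_load(data.table, knitr, kableExtra)  # installs (if needed) and loads these packages
n_tumor <- 54     # patients with MRI-identified tumors
n_control <- 78   # nontumor normal-hearing controls
tp <- round(0.95 * n_tumor)    # true positives implied by 95% sensitivity
fn <- n_tumor - tp             # false negatives
tn <- round(0.88 * n_control)  # true negatives implied by 88% specificity
fp <- n_control - tn           # false positives
results <- data.table(
  TestResult = c('Test positive', 'Test negative', 'Total'),
  PatientsWithTumor = c(tp, fn, n_tumor),
  PatientsWithoutTumor = c(fp, tn, n_control),
  Total = c(tp + fp, fn + tn, n_tumor + n_control)
)
kable_styling(kbl(results, caption = 'Table 1: Testing result table'))
```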
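
As a sanity check, the sensitivity and specificity can be recomputed from the recovered counts (51, 3, 9, and 69, which is what the rounding above produces). The sketch below uses illustrative names. With only 54 tumor patients no whole-number count reproduces 95% exactly (51/54 is about 94.4% and 52/54 is about 96.3%), so 51 is the closest choice, while 69/78 rounds to the reported 88%.

```r
# Recovered cell counts from the testing result table
tp <- 51  # tumor, test positive
fn <- 3   # tumor, test negative
fp <- 9   # no tumor, test positive
tn <- 69  # no tumor, test negative

sens <- tp / (tp + fn)  # sensitivity implied by the rounded counts
spec <- tn / (tn + fp)  # specificity implied by the rounded counts

print(round(c(sensitivity = sens, specificity = spec), 3))
```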
**Question 4:**
Duchenne muscular dystrophy, sometimes shortened to DMD or just Duchenne, is a rare genetic disease. It primarily affects males, but, in rare cases, can also affect females. Duchenne causes the muscles in the body to become weak and damaged over time, and is eventually fatal. The genetic change that causes Duchenne, a mutation in the DMD gene, happens before birth and can be inherited, or new mutations in the gene can occur spontaneously. Researchers used measures of pyruvate kinase and lactate dehydrogenase to assess an individual’s carrier status. The following table summarizes the test results.
```{r, echo=FALSE, warning=FALSE}
# install.packages("pacman")  # run once if pacman is not installed
library(pacman)
p_load(data.table, knitr, kableExtra)
# Test results from the DMD carrier study, including row and column totals
dmd_table <- data.table(TestResult=c('Test positive','Test negative','Total'), WomanCarrier=c(56, 11, 67), WomanNotCarrier=c(6, 121, 127), Total=c(62,132,194))
kable_styling(kbl(dmd_table, caption='Table 2: Test results'))
```
a) Compute the sensitivity and specificity of the test.
b) The sample used in the test study is not representative of the general population for which the prevalence of carriers is 0.03%, or 3 in 10,000. With this information,...
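
A minimal sketch of how parts (a) to (d) of Question 4 might be computed from Table 2 follows. The variable names are illustrative, part (b) applies Bayes' theorem with the stated prevalence of 0.0003, part (c) assumes the table's own proportions can be used directly, and part (d) uses a Poisson approximation with rate 15000 × 0.0003, with the exact binomial value shown only for comparison.

```r
# Cell counts from Table 2
tp <- 56; fp <- 6     # test positive: carriers, non-carriers
fn <- 11; tn <- 121   # test negative: carriers, non-carriers

# (a) Sensitivity and specificity
sens <- tp / (tp + fn)   # 56/67
spec <- tn / (tn + fp)   # 121/127

# (b) PPV in the general population via Bayes' theorem
prev <- 0.0003
ppv_population <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

# (c) PPV if the table itself were a random sample of the population
ppv_table <- tp / (tp + fp)   # 56/62

# (d) P(at least 2 carriers among 15,000 women)
lambda <- 15000 * prev                           # expected number of carriers
p_ge2_pois  <- 1 - ppois(1, lambda)              # Poisson approximation
p_ge2_binom <- 1 - pbinom(1, size = 15000, prob = prev)  # exact binomial

print(c(sens = sens, spec = spec, ppv_population = ppv_population,
        ppv_table = ppv_table, p_ge2_pois = p_ge2_pois,
        p_ge2_binom = p_ge2_binom))
```

The gap between `ppv_population` and `ppv_table` shows how strongly the PPV depends on prevalence: the study sample contains far more carriers than the general population does.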