
Question

It's a bioinformatics assignment. I have two articles: a dataset is taken from one article, the methods of the other article are applied to that dataset, and I have to get the results and write a report on the whole project.

· The main idea of the project is to take two articles/papers, take the dataset from one article/paper, apply the analysis methods from the other article/paper to that dataset, and get the results. You must submit a report on what you have done.
· You must get an idea from those two articles/papers and build a project from them. (You can take any two bioinformatics articles/papers published after 2017, do the analysis, and submit the results and a report.)
· Use Python as the programming language for the analysis in your project.

RESEARCH ARTICLE
Colorectal cancer stages transcriptome analysis

Tianyao Huo [1], Ronald Canepa [2], Andrei Sura [1], François Modave [1], Yan Gong [3,4]*

1 Department of Health Outcomes & Policy, College of Medicine, University of Florida, Gainesville, Florida, United States of America
2 Information Technology and Services, University of Florida, Gainesville, Florida, United States of America
3 Department of Pharmacotherapy and Translational Research and Center for Pharmacogenomics, College of Pharmacy, University of Florida, Gainesville, Florida, United States of America
4 University of Florida Health Cancer Center, Gainesville, Florida, United States of America
* Corresponding author

Abstract
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths in the United States. The purpose of this study was to evaluate gene expression differences across the stages of CRC. Gene expression data on 433 CRC patient samples were obtained from The Cancer Genome Atlas (TCGA). Gene expression differences were evaluated across CRC stages using linear regression. Genes with p ≤ 0.001 in expression differences were evaluated further in principal component analysis, and genes with p ≤ 0.0001 were evaluated further in gene set enrichment analysis.
A total of 377 patients with gene expression data in 20,532 genes were included in the final analysis. The numbers of patients in stages I through IV were 59, 147, 116 and 55, respectively. The NEK4 gene, which encodes NIMA-related kinase 4, was differentially expressed across the four stages of CRC: stage I patients had the highest NEK4 expression, while stage IV patients had the lowest (p = 9 × 10⁻⁶). Ten other genes (RNF34, HIST3H2BB, NUDT6, LRCH4, GLB1L, HIST2H4A, TMEM79, AMIGO2, C20orf135 and SPSB3) had p values of 0.0001 in the differential expression analysis. Principal component analysis indicated that patients from the four clinical stages do not appear to have distinct gene expression patterns. Network-based and pathway-based gene set enrichment analyses showed that these 11 genes map to multiple pathways, such as meiotic synapsis and packaging of telomere ends. Ten of these 11 genes were linked to Gene Ontology terms such as nucleosome, DNA packaging complex and protein-DNA interactions. The protein complex-based gene set analysis showed that four genes were involved in H2AX complex II. This study identified a small number of genes that might be associated with the clinical stages of CRC. Our analysis was not able to find a molecular basis for the current clinical staging of CRC based on gene expression patterns.

PLOS ONE | https://doi.org/10.1371/journal.pone.0188697 November 28, 2017
Citation: Huo T, Canepa R, Sura A, Modave F, Gong Y (2017) Colorectal cancer stages transcriptome analysis. PLoS ONE 12(11): e0188697. https://doi.org/10.1371/journal.pone.0188697
Editor: Hiromu Suzuki, Sapporo Ika Daigaku, JAPAN
Received: June 2, 2017; Accepted: November 10, 2017; Published: November 28, 2017
Copyright: © 2017 Huo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: TCGA clinical data and expression data were manually downloaded from the Broad Institute (TCGA data version 2016_01_28) via the firebrowse.org website (http://firebrowse.org/?cohort=COADREAD&download_dialog=true). The code used to download the data can be accessed here: https://github.com/indera/crc_transcriptome_analysis.
Funding: The authors received no specific funding for this work.

Introduction
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths in the United States [1]. Among the five subtypes of CRC (adenocarcinomas, carcinoid tumors, gastrointestinal stromal tumors, lymphomas and sarcomas), adenocarcinomas are the most common (95% of all CRCs).
Currently, the staging of CRC, referred to as clinical staging, is based on the results of physical exams, biopsies, and imaging tests (CT or MRI scan, X-rays, PET scan, etc.). The staging criteria are: 1) how far the cancer has grown into the wall of the intestine; 2) whether it has reached nearby structures; and 3) whether it has spread to nearby lymph nodes or to distant organs. The results of surgery can be combined with clinical staging to determine the pathologic stage. The most commonly used CRC staging system is the AJCC Cancer Staging Manual developed by the American Joint Committee on Cancer (AJCC), which is based on the primary tumor (T), regional lymph nodes (N) and distant metastasis (M) [2]. The earliest-stage cancers are called stage 0; stages then range from I through IV, with additional sub-stages identified by the letters A, B and C [3]. Several genes, such as WNT, MAPK/PI3K, TGF-β and TP, have been associated with CRC. For instance, mutations in the adenomatous polyposis coli (APC) gene, a tumor suppressor gene, were found to be responsible for familial adenomatous polyposis, which can further develop into CRC [4]. Mismatch repair genes such as MLH1 and MSH2 were found to be associated with Lynch syndrome, the most frequent form of hereditary CRC [5, 6]. Further, a 12-gene recurrence score assay has been developed as a prognostic factor in stage II-III colon or rectal carcinoma [7–9]. Even though many genes have been associated with an increased risk of CRC, the genetic differences across the stages of CRC have not been clearly identified. So far, only one study has assessed the expression levels of three candidate genes (MMP9, MMP28 and TIMP1) across CRC stages, and it found no statistically significant differences by stage [10]. No study in the literature has compared gene expression levels across the entire transcriptome by CRC stage.
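Before stage can be used as a trend variable in a regression, the labels described above (stage 0, then I to IV with A/B/C sub-stages) have to be collapsed to an ordered number. The following is a minimal sketch, assuming stage is stored as free text such as "Stage IIIB" or "stage iia". The exact label format in the TCGA/Broad clinical files is an assumption (check your own download), and `stage_to_ordinal` is a hypothetical helper name.

```python
import re

# Main AJCC stage numerals; stage 0 (carcinoma in situ) is kept as 0.
_STAGE_NUMBER = {"0": 0, "i": 1, "ii": 2, "iii": 3, "iv": 4}
# "stage" + numeral + optional sub-stage letter, e.g. "stage iiib".
# "iv" is tried before "i{1,3}" so that stage IV is not read as stage I.
_STAGE_PATTERN = re.compile(r"\s*stage\s+(0|iv|i{1,3})\s*[abc]?\s*")


def stage_to_ordinal(label):
    """Collapse an AJCC stage label such as 'Stage IIIB' to its main stage (0-4).

    Sub-stage letters (A, B, C) are dropped. Returns None for missing or
    unrecognised labels such as 'not reported' (hypothetical label).
    """
    if label is None:
        return None
    match = _STAGE_PATTERN.fullmatch(str(label).lower())
    if match is None:
        return None
    return _STAGE_NUMBER[match.group(1)]


if __name__ == "__main__":
    for raw in ["Stage IIA", "stage iiic", "Stage IV", "not reported"]:
        print(raw, "->", stage_to_ordinal(raw))
```

With pandas, the helper could be applied column-wise, e.g. `clinical["stage_num"] = clinical["pathologic_stage"].map(stage_to_ordinal)`, where both column names are hypothetical. Missing values such as NaN fall through to `None` because `"nan"` does not match the pattern.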
The purpose of this study is to explore transcriptome-wide gene expression differences across the stages of CRC, followed by gene ontology and gene set network analyses, using the publicly available RNA-seq dataset in The Cancer Genome Atlas (TCGA) [11].

Materials and methods

Data acquisition
The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) is a joint effort between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) to facilitate the sharing of data and speed up cancer research [11, 12]. The Eli and Edythe L. Broad (Broad) Institute of MIT and Harvard is a joint venture between both institutions and several area hospitals (https://www.broadinstitute.org/about-us). Its “FireHose” project ingests, aggregates, standardizes, and processes TCGA data via automated pipelines in an attempt to accelerate analysis and discoveries (https://confluence.broadinstitute.org/display/GDAC/Rationale). The Broad Institute has established pipelines for processing each TCGA dataset, and the outputs from each stage of the pipeline are made available as a versioned set. Illumina HiSeq expression data were processed by the Broad Institute to output both reads per kilobase per million mapped reads (RPKM) expression values [13] and RNA-seq by Expectation-Maximization (RSEM) values [14] normalized to “upper quartile count at 1000”. TCGA clinical data and expression data were manually downloaded from the Broad Institute (TCGA data version 2016_01_28) via the firebrowse.org website (http://firebrowse.org/?cohort=COADREAD&download_dialog=true). The code used to download the data can be accessed here: https://github.com/indera/crc_transcriptome_analysis.

Competing interests: The authors have declared that no competing interests exist.
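The two normalizations named in the Data acquisition paragraph can be illustrated with a short sketch. RPKM divides a gene's read count by its length in kilobases and by the library size in millions of mapped reads, i.e. RPKM = 10^9 × reads / (gene length in bp × total mapped reads). Upper-quartile normalization rescales each sample so that its 75th-percentile value equals a fixed target (here 1000). This is not the Broad/TCGA pipeline. Using the column sum as "total mapped reads" and computing the upper quartile over non-zero genes only are both assumptions made for illustration.

```python
import numpy as np


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = 1e9 * reads / (gene length in bp * total mapped reads).
    counts: genes x samples array of read counts.
    gene_lengths_bp: one length (in base pairs) per gene.
    Assumption: the column sum stands in for "total mapped reads".
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    millions_mapped = counts.sum(axis=0) / 1e6  # per-sample library size
    return counts / lengths_kb[:, None] / millions_mapped[None, :]


def upper_quartile_normalize(counts, target=1000.0):
    """Scale each sample (column) so the 75th percentile of its non-zero
    values equals `target`. Excluding zeros from the quartile is an
    assumption, not a documented detail of the TCGA pipeline."""
    counts = np.asarray(counts, dtype=float)
    scaled = np.empty_like(counts)
    for j in range(counts.shape[1]):
        column = counts[:, j]
        nonzero = column[column > 0]
        if nonzero.size == 0:
            raise ValueError(f"sample {j} has no non-zero counts")
        scaled[:, j] = column / np.percentile(nonzero, 75) * target
    return scaled
```

For example, `rpkm([[500], [1500]], [1000, 3000])` gives both genes the same RPKM (250,000). The second gene has three times the reads but is also three times longer, which is exactly the length bias RPKM is meant to remove.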
Data merging
Using Python 2.7.10 and version 0.19.0 of the Pandas module, the expression data from the Broad Institute were read into a Pandas dataframe, transposed, and re-saved. The clinical data were transposed in the same manner. Additionally, to reduce the size of the data and the number of components of interest, only a subset of the columns from the clinical data was kept for the analysis. These included common demographic data such as patient gender, race, ethnicity, and age; clinical data such as cancer stage, associated International Classification of Diseases (ICD) 10 codes, presence of polyps, and whether analysis had been done for common mutations such as KRAS and BRAF; and finally, approximately 85 different aliquot identifiers from the TCGA dataset itself. Clinical data were matched with expression data by searching TCGA’s "hybridization REF" identifier from the expression data against the aliquot identifiers present in the clinical data. Eventually, 377 patients with gene expression data from 20,532 genes were included in the final analysis.

Differential expression analysis
Gene expression differences were evaluated across the disease stages using linear regression. The standard deviation of the expression level of each gene was computed, and genes with a standard deviation of zero, which indicates no variation in expression across samples, were removed from further analysis.
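The merging and filtering steps above, plus the per-gene stage regression that the differential expression analysis is built on, can be sketched in Python. The authors used pandas for merging but R and SAS for the models, so this is a minimal sketch under stated assumptions, not their code. The column names and barcodes are hypothetical. Clinical barcodes are assumed to possibly be lower-case, so both sides are upper-cased before joining. Stage is assumed to be already converted to a number (1 to 4), and categorical covariates such as gender and race/ethnicity must be dummy-coded beforehand, e.g. with `pd.get_dummies(..., drop_first=True)`. Because every gene shares the same design matrix, all genes are fitted in one matrix operation.

```python
import numpy as np
import pandas as pd
from scipy import stats


def merge_expression_with_clinical(expr, clinical, aliquot_cols):
    """Join a genes x samples expression table to per-patient clinical rows.

    expr: genes x samples DataFrame; columns are aliquot barcodes (the
          'hybridization REF' header in the Broad files).
    clinical: one row per patient with one or more aliquot-barcode columns.
    Returns one row per matched sample: clinical fields followed by genes.
    """
    # Transpose so each sample becomes a row, keyed by its barcode.
    samples = expr.T.rename_axis("aliquot").reset_index()
    samples["aliquot"] = samples["aliquot"].str.upper()
    # Reshape clinical data to one (patient, aliquot barcode) pair per row.
    id_cols = [c for c in clinical.columns if c not in aliquot_cols]
    pairs = clinical.melt(id_vars=id_cols, value_vars=aliquot_cols,
                          var_name="aliquot_field", value_name="aliquot")
    pairs = pairs.dropna(subset=["aliquot"]).drop(columns="aliquot_field")
    pairs["aliquot"] = pairs["aliquot"].str.upper()
    # Inner join keeps only samples whose barcode appears in the clinical data.
    return pairs.merge(samples, on="aliquot", how="inner")


def drop_constant_genes(expr):
    """Remove genes (rows) whose expression has zero standard deviation."""
    return expr.loc[expr.std(axis=1) > 0]


def stage_trend_test(expr, stage, covariates):
    """Per-gene OLS: expression ~ intercept + stage + covariates.

    expr: samples x genes DataFrame; stage: numeric stage per sample;
    covariates: samples x k numeric DataFrame (already dummy-coded).
    Returns slope, t statistic and two-sided p value of the stage term.
    """
    X = np.column_stack([np.ones(len(expr)),
                         np.asarray(stage, dtype=float),
                         np.asarray(covariates, dtype=float)])
    Y = expr.to_numpy(dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ Y                  # (p x genes) coefficients
    resid = Y - X @ beta
    dof = n - p
    sigma2 = (resid ** 2).sum(axis=0) / dof   # residual variance per gene
    se = np.sqrt(sigma2 * xtx_inv[1, 1])      # std. error of stage coefficient
    t = beta[1] / se
    pval = 2 * stats.t.sf(np.abs(t), dof)
    return pd.DataFrame({"slope": beta[1], "t": t, "p": pval},
                        index=expr.columns).sort_values("p")
```

The result table can then be thresholded with `res[res["p"] <= 0.001]` or `res[res["p"] <= 0.0001]` to mirror the cut-offs used in the methods. A negative `slope` means expression falls with increasing stage, which is the direction reported for NEK4.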
To select the top genes that are differentially expressed across cancer stages, a linear regression model was fitted for each gene to test the trend in gene expression with increasing cancer stage. The analyses adjusted for the age, gender and race/ethnicity of the patients. Genes with p ≤ 0.0001 were considered suggestive, and their expression levels by cancer stage were presented. Analyses were performed using R version 3.3.1 and SAS 9.4 (Cary, NC).

Principal component analysis
To identify gene expression patterns of the selected CRC samples across different stages, all genes with p ≤ 0.001 in the linear model analysis were included in a principal component analysis using SAS. Ten principal components (PCs) were identified, and the first two PCs were plotted according to the staging status of the CRC patients.

Gene annotation and gene set enrichment analysis
Genes with expression differences of p ≤ 0.0001 were evaluated further in gene annotation using DAVID [15]. The gene IDs and official gene names were then used for further analysis. The ConsensusPathDB tool [16, 17] was then used to perform network-based and pathway-based analyses on


Pushpendra · Accepted Answer

Age of Diagnosis for TCGA Patients by Cancer Type
Abstract 
Cancer represents a significant challenge for humankind, as early diagnosis and treatment 
are difficult to achieve. BMI was used to categorize each person as underweight, normal 
weight, overweight or obese. Two- and five-year survival rates were applied to estimate the 
prognosis for each cancer type. All data were statistically analyzed. We identified that males 
were more susceptible to lung, liver and skin cancer when compared with females, whereas 
females were more susceptible to thyroid, breast and adrenal cortex cancer. High BMI (>25) 
was positively associated with the occurrence of cancer, although patients with high BMI at 
the time of initial diagnosis had higher two/five-year survival rates. The survival rates for 
cancer were positively correlated with the age at initial pathologic diagnosis. Some types of 
cancer were associated with particularly young ages of onset, including adrenocortical 
carcinoma, cervical and endocervical cancers, brain lower grade glioma, pheochromocytoma 
and paraganglioma, testicular germ cell tumors and thyroid carcinoma. Hence, the early 
diagnosis and prognosis for these cancers need to be improved. In conclusion, sex, BMI and 
age are associated with the incidence and survival rates for cancers. These results could be 
used to supplement precision and personalized medicine. 
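To apply this second article's methods to the TCGA CRC cohort, BMI has to be binned into the four categories and two- and five-year survival rates computed per group. The following is a minimal sketch under loudly stated assumptions. The BMI cut-offs are the standard WHO adult categories (the abstract itself only mentions "high BMI (>25)"). The survival rate is a crude proportion that excludes patients censored before the horizon, not a Kaplan-Meier estimate. The `days`/`died` inputs are assumed to be built from TCGA's days-to-death and days-to-last-follow-up fields, and all column names are hypothetical.

```python
import numpy as np
import pandas as pd


def bmi_category(bmi):
    """WHO adult BMI classes (assumed cut-offs): <18.5, 18.5-<25, 25-<30, >=30."""
    if pd.isna(bmi):
        return None
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"


def survival_rate(days, died, years):
    """Crude N-year survival: share of evaluable patients alive at the horizon.

    days: days from diagnosis to death (if died) or to last follow-up.
    died: True if the patient died, False if censored (alive at last contact).
    Patients censored before the horizon are excluded because their status
    at the horizon is unknown; this is a simplification, not Kaplan-Meier.
    """
    days = np.asarray(days, dtype=float)
    died = np.asarray(died, dtype=bool)
    horizon = 365.25 * years
    dead_by_horizon = died & (days <= horizon)
    alive_at_horizon = days > horizon  # includes deaths after the horizon
    evaluable = dead_by_horizon | alive_at_horizon
    if evaluable.sum() == 0:
        return float("nan")
    return float(alive_at_horizon.sum() / evaluable.sum())


def survival_by_group(df, group_col, years):
    """Apply survival_rate within each group (e.g. BMI class or sex).

    Expects hypothetical columns 'days' and 'died' in df.
    """
    return {group: survival_rate(sub["days"], sub["died"], years)
            for group, sub in df.groupby(group_col)}
```

For the project, the same helpers could be run per CRC stage or per sex by changing `group_col`. For a publishable analysis, a Kaplan-Meier estimator would handle early censoring properly instead of discarding those patients.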
Introduction 
Cancers are diseases involving the uncontrollable growth of abnormal cells that overcome 
the usual limitations to cell division. Cancer is now a relatively common disease. For 
example, there were about 90.5 million individuals diagnosed with cancer in 2015, and
more than 14.1 million new cases of cancer occur each year. Cancer is a leading cause of 
death worldwide, accounting for 8.8 million deaths in 2015; 15.7% of all deaths. The most 
common causes for cancer-related death are lung, liver, colorectal, stomach and breast 
cancers.
