


Independent Study on a selected area of digital computer architecture and design





2169-3536 (c) 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2017.2756624, IEEE Access

GPU-accelerated Features Extraction from Magnetic Resonance Images

Hsin-Yi Tsai, Student, Providence University
Hanyu Zhang, Assistant Professor, Providence University & Researcher, École CentraleSupélec, Université Paris-Saclay
Che-Lun Hung∗, Professor, Providence University
Geyong Min, Professor, Exeter University
∗ Corresponding Author

Abstract—The use of a graphics processing unit (GPU) together with a CPU, referred to as GPU-accelerated computing, to accelerate tasks that require extensive computation has been a trend in high-performance computing over the last few years. In this paper, we propose a new paradigm of GPU-accelerated method to parallelize the extraction of a set of features based on the Gray-Level Co-occurrence Matrix (GLCM), which is arguably the most widely used method. The method is evaluated on various GPU devices and compared to its serial counterpart implemented and optimized in both Matlab and C on a single machine. A series of experimental tests focused on Magnetic Resonance (MR) brain images demonstrates that the proposed method is very efficient and superior to its serial counterpart, achieving speedups of 25-105 times for single precision and 15-85 times for double precision on a GeForce GTX 1080 across different ROI sizes.
Index Terms—Magnetic Resonance Imaging (MRI), Gray-Level Co-occurrence Matrix (GLCM), Texture Features Extraction, GPGPU, Image Analysis, Computer Science.

I. INTRODUCTION

Textural analysis plays an important role in image processing. The analysis procedure provides meaningful information to various tasks of image recognition, particularly for medical images nowadays, such as the segmentation of specific anatomical structures, the detection of lesions, and the differentiation between pathological and healthy tissues in different organs [1]. Furthermore, with the increase in accuracy and prompt response in recognition, computer-aided diagnosis systems exhibit great potential as a complementary means of diagnosis [2], especially for areas with a limited number of doctors. Therefore, texture analysis used in different image processing techniques has gained increasing attention. A typical pipeline of texture analysis consists of region segmentation, feature extraction, and classification. The performance of the analysis relies on the accuracy of the classification, which in turn greatly depends on the quality of feature extraction, which is quite specific to the image under consideration and is computationally extensive and time consuming. In this paper, we mainly focus on accelerating the extraction of features from magnetic resonance (MR) brain images by using a graphics processing unit (GPU). MRI has been an essential diagnostic technique for years in routine clinical practice and scientific research [3] due to its powerful, flexible, and non-invasive imaging characteristics [4]. In general, feature extraction methods [5] are categorized as follows: (1) structural methods, (2) statistical methods, (3) model-based methods, and (4) transform-based methods.
Arguably, statistical methods may be the most suitable for characterizing tissues that have random, non-homogeneous structures [6], such as brain tissues, whose MR images show no apparent regularities. In addition, statistical textural features achieve better discrimination with the same classifiers using a far smaller number of relevant but distinguishable features in comparison to other methods based on structural approaches or wavelet transformation [1], [7]. The most widely used statistical textural features are based on the Gray-Level Co-occurrence Matrix (GLCM) introduced by Haralick et al. [8]. The problem with statistical textural features is the extensive computational cost of calculating a set of features for each region of interest (ROI), which slides over a high-resolution image. In addition, hundreds of MR images commonly record a series of slices of merely one patient's brain, which makes the work laborious. Traditional implementations with a central processing unit (CPU) alone can no longer bridge the gap between the explosive increase in data (in both quantity and quality) and the need for prompt response. Fortunately, the world is heading parallel [9]. Accelerators, such as GPUs and FPGAs, are widely used in many domains, such as computational finance, climate, data science, bioinformatics, media and entertainment, etc. [10]. Medical image processing is no exception. Early in 2004, Tahir et al.
[11] proposed the use of FPGAs to accelerate the computation of the GLCM and Haralick texture features. They first divided the image into subregions, and for each region, they simultaneously computed the GLCMs for combinations of four distances and four angles. Based on the GLCM, each processing unit calculates the Haralick texture features in parallel. The approximate speedup of GLCM generation is 4.75, and that of feature computation is 7.3. Markus et al. [12] proposed the use of a GPU to simultaneously compute GLCMs and extract features for multiple microscopic images of biological cells in 2008. For the GLCM, they reformed the co-occurrence matrix into a smaller packed one to save storage and avoid useless computation; in feature extraction, they designed an optimized order of computation based on the dependencies among features. Equipped with these techniques, a speedup of 19 times was achieved in the best scenario. In 2012, Kotseruba et al. [13] developed an automated image analysis and classification system for protein crystallization trial images to address the main computational challenges of fast generation of the GLCM and extraction of associated features. They parallelized the process with OpenCL in stages to compute a set of features and simultaneously reduce the GLCM size in each stage. A speedup of around 6-62 times was obtained. The need for accelerating the generation of the GLCM and computing textural features has not diminished. In 2016, Dixon et al. [14] investigated four strategies for calculating the GLCM and associated features for diffraction images of biological cells. The strategy called block kernel is similar to our proposed method but is only applicable to images with a low resolution of 64 gray levels and exhibits average speedup values of only 7 times for the GLCM calculation and less than 10 times for the feature calculation. In the same year, Doycheva et al.
[15] used OpenCL to implement Haralick features on a GPU for pavement distress detection. In their work, the speedup for the GLCM computation for images with a resolution of 256 × 256 pixels is approximately 14 times, and the speedup for the calculation of features is approximately 126 times. In 2017, Hong et al. [16] proposed a parallel algorithm that is mainly based on the atomicAdd operation to generate the GLCM and obtained a speedup of 14 times compared to its sequential counterpart. In this paper, instead of computing a single GLCM for an entire image as in the above-mentioned studies, we propose a new paradigm of parallel method to generate GLCMs and extract features based on them for many small overlapping ROIs that cover an image. A closer look into the problem reveals that the computations of the matrices and features of the underlying ROIs are not only completely independent from each other but also similar among themselves, which means that there is a great deal of parallelism to exploit. The acceleration paradigm makes a one-to-one mapping between threads and ROIs, so that one thread takes care of one ROI. The rest of the paper is organized as follows. In section II, we concisely present the essential foundations of GPU-accelerated computing, including the hardware, namely, the GPU, and the Compute Unified Device Architecture (CUDA). In section III, we recall the definition of the GLCM with a set of 11 extracted features.
The main idea of our acceleration methods is presented in section IV. Details of the paradigm are discussed in section V. In section VI, we present and discuss the results obtained from a series of experiments on three GPU devices. Finally, section VII presents our conclusions and possible ways of improving the acceleration performance in future work.

II. GPU AND CUDA

As the performance increase in CPUs has slowed down since 2003, mainly due to energy consumption and heat dissipation issues, trends in processor development have shifted toward increasing the number of cores, resulting in a concurrency revolution [17]. GPUs have become highly parallel programmable processors along with the release of the Open Computing Language (OpenCL) and CUDA, which were established by Apple Inc. and NVIDIA Corporation, respectively. The performance and potential of GPUs have made them a powerful engine for computationally demanding applications [18]. In this paper, we are particularly interested in accelerating GLCM generation and feature extraction by a one-to-one mapping paradigm with CUDA C. Thus, a general introduction to NVIDIA's GPU and CUDA is presented here.

a) GPU: A typical GPU has many cores integrated on a single chip; for example, the latest generation of GPU with the Pascal architecture contains 2560 cores. The CUDA cores are streaming processors (SPs), which are grouped into streaming multiprocessors (SMs), the basic units that can run different instructions in parallel. SPs within each SM may launch threads concurrently to execute the same instruction but work on different parts of the data. This paradigm is referred to as the single instruction multiple thread (SIMT) model, which eventually leads the GPU to run in a single instruction multiple data (SIMD) manner.
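Under the SIMT model, the one-to-one thread-to-ROI mapping described above reduces to index arithmetic. The sketch below models it in plain host-side C: the parameters of global_thread_id stand in for the CUDA built-ins blockIdx.x, blockDim.x, and threadIdx.x, and all names are illustrative rather than the paper's actual kernel code.

```c
/* Host-side model of CUDA index arithmetic. Inside a kernel, the three
 * inputs would be the built-ins blockIdx.x, blockDim.x, threadIdx.x. */
static int global_thread_id(int block_idx, int block_dim, int thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Ceiling division: number of blocks needed so that every one of the
 * n work items (here, ROIs) is covered by a thread. */
static int grid_dim(int n, int block_dim)
{
    return (n + block_dim - 1) / block_dim;
}

/* One-to-one thread-to-ROI mapping: decode a linear thread id into the
 * top-left corner of the ROI that this thread is responsible for. */
static void roi_origin(int tid, int rois_per_row, int stride,
                       int *row0, int *col0)
{
    *row0 = (tid / rois_per_row) * stride;
    *col0 = (tid % rois_per_row) * stride;
}
```

For example, a 256 × 256 slice covered by 32 × 32 ROIs sliding by one pixel has 225 × 225 = 50625 ROIs, so grid_dim(50625, 256) launches 198 blocks of 256 threads, and the thread with linear id 226 handles the ROI at origin (1, 1).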
A CUDA-enabled GPU device usually has a hierarchical organization of six different memory levels, namely, register memory, shared memory/L1 cache, local memory, constant memory, texture memory, and global memory. To fully exploit the computational power of a GPU device, appropriate levels of memory should be combined in practice. Detailed considerations are discussed in subsection V-B0a.

b) CUDA: "CUDA is a data parallel programming model that supports some key abstractions – thread blocks, hierarchical memory
