
Essential Mathematics for Data Scientists
Assignment 4: Principal Component Analysis


This assignment asks you to implement a principal component analysis (PCA) using a singular value decomposition (SVD) on a real data set.


A real dataset


We now ask that you investigate the provided dataset, which comes from the State of the Tropics report (State of the Tropics (2017) Sustainable Infrastructure for the Tropics. James Cook University, Townsville, Australia, available at stateofthetropics.org). The data that we have used from this report is available at the Tropical Data Hub (https://research.jcu.edu.au/researchdata/default/search?query=State+of+the+Tropics&sort-field=score&sort-order=desc).


We have consolidated much of the data for the year 2010 into a file, SotTCombined2010.xlsx. We ask that you analyse this data using PCA and produce the results listed below. You will need to use MATLAB to do the computation. Write your analysis in a report, using Word, LaTeX, or any program you would like to use. You must submit your report as a PDF file.


The task


You are to implement the PCA using SVD, obtaining:




  1. The principal component vectors
  2. The proportion of variation explained by the principal components
  3. The matrix of scores
  4. A dimensionally reduced representation of the dataset
  5. The residuals of the reduced representation
  6. Any outliers
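
The six quantities listed above can be sketched in MATLAB as follows. This is a minimal illustration under stated assumptions, not a model solution: the matrix X is synthetic data standing in for the cleaned spreadsheet values, every variable name is our own, the cutoff k = 2 is arbitrary, and ranking rows by residual size is only one possible way of flagging outliers.

```matlab
% Sketch only: synthetic data stands in for the cleaned spreadsheet values.
rng(0);                                % reproducible random numbers
X = randn(30, 4);                      % 30 hypothetical countries, 4 variables
X(:, 2) = X(:, 1) + 0.1*randn(30, 1);  % make two variables strongly related

% Centre and scale each column (z-scores).
Z = (X - mean(X, 1)) ./ std(X, 0, 1);

% Economy-size SVD of the data matrix: Z = U*S*V'.
% The columns of V are the principal component vectors.
[U, S, V] = svd(Z, 'econ');

s = diag(S);                           % singular values, largest first
propVar = s.^2 / sum(s.^2);            % proportion of variation per component
cumVar = cumsum(propVar);              % cumulative proportion

scores = Z * V;                        % matrix of scores (equals U*S)

k = 2;                                 % assumed cutoff, for illustration only
Zk = scores(:, 1:k) * V(:, 1:k)';      % rank-k reduced representation
R = Z - Zk;                            % residuals of the reduced representation

resNorm = sqrt(sum(R.^2, 2));          % residual size for each row
[~, order] = sort(resNorm, 'descend'); % rows least well described come first
```

Rows near the top of order are the ones the first k components describe least well, which makes them candidates for closer inspection as outliers. In practice the cutoff k would be chosen from cumVar rather than fixed in advance.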




Your report must determine the relationships between the variables reported in the spreadsheet and the strength of those relationships. You should also identify the outlier countries, that is, those for which the relationships identified by the PCA do not hold.


In analysing the data there are several things to keep in mind:


• The data contains many missing values. You should exclude countries that have missing values.


Assignment: Principal Component Analysis – Marking Scheme


This is to be written in report format, as a series of paragraphs (not dot points). The report must contain the information listed below. For each element, 50% of the marks are for your description of it within the report; the remaining marks are for the MATLAB code that correctly produces the quantities that go into the report. Please submit both your report and the MATLAB code.


16 marks total:




  • Excluding countries with missing values (1 mark)
  • Creating, centring and scaling the matrix of raw data for PCA (3 marks: 1 for creating, 1 for centring, 1 for scaling)
  • Performing the SVD (1 mark)
  • Identifying relationships between the variables described by the first principal component vector (2 marks: 1 for identifying positive relationships, 1 for negative)
  • Identifying relationships between the variables described by the second principal component vector (1 mark)
  • Obtaining singular values and proportions of variation (2 marks: 1 for the singular values, 1 for the proportions)
  • Identifying proportions of variation (2 marks: 1 mark for identifying the proportions, 1 mark for identifying a cumulative total over the highest proportions, at a cutoff determined by the author of the report)
  • Calculating the matrix of scores (1 mark)
  • Obtaining a dimensionally reduced representation of the data set (1 mark)
  • Computing residuals (2 marks: 1 mark for residuals, 1 mark for interpretation)
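
The link between the singular values, the proportions of variation, the scores and the residuals named in the list above can be written out as below. This is a sketch in our own notation, not the assignment's: Z is assumed to be the centred and scaled n × p data matrix with n ≥ p, and k is the cutoff chosen by the report's author.

```latex
\begin{align*}
Z &= U \Sigma V^{\mathsf{T}}, \qquad
  \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_p), \quad
  \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0 \\
\frac{1}{n-1} Z^{\mathsf{T}} Z &= V \, \frac{\Sigma^{2}}{n-1} \, V^{\mathsf{T}} \\
\pi_k &= \frac{\sigma_k^{2}}{\sum_{j=1}^{p} \sigma_j^{2}}, \qquad
  \Pi_k = \sum_{i=1}^{k} \pi_i \\
T &= Z V = U \Sigma \\
Z_k &= T_{:,1:k} \, V_{:,1:k}^{\mathsf{T}}, \qquad R = Z - Z_k
\end{align*}
```

Here the columns of V are the principal component vectors, π_k is the proportion of variation explained by component k, Π_k is the cumulative total up to the cutoff, T is the matrix of scores, Z_k is the dimensionally reduced representation and R holds the residuals.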



Answered 1 day after Jul 17, 2022

Bhaskar answered on Jul 19 2022
72 Votes
Principal component analysis using SVD (singular value decomposition)
To begin, the spreadsheet is read into a variable with the following command at the MATLAB terminal.
t = readtable("SotTCombined2010.xlsx");
Since the data has missing values, and a country's missing values cannot be estimated from the values of other countries, the affected rows need to be dropped.
t1 = t(~any(ismissing(t),2),:);
This command drops every row that contains a missing value. To carry out the computations we need to convert the data to a matrix. Since a matrix contains only numeric data, we need to drop the Country column and then convert the table to a matrix.
t_new = removevars(t1,'Country');
t_matrix = table2array(t_new);
After cleaning the data we need to standardise it. So let’s first centre the data.
mu = mean(t_matrix); % average of each column
Xmean = bsxfun(@minus, t_matrix, mu); % centred data
Normalisation
N = normalize(Xmean); % scale each column (z-score by default)
SVD
[coeff,latent] = svd(cov(N)); % singular value decomposition of the covariance matrix
[latent, ind] =...
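
The answer takes the SVD of the covariance matrix, cov(N), rather than of the data matrix N itself. The sketch below shows how the two routes relate. It uses synthetic data in place of the spreadsheet and hypothetical variable names (varFromData, maxDiff, signDiff), and it is not a reconstruction of the truncated part of the answer.

```matlab
% Sketch only: synthetic data stands in for the cleaned spreadsheet values.
rng(1);                              % reproducible random numbers
X = randn(25, 3);                    % 25 hypothetical rows, 3 variables
N = normalize(X);                    % z-score each column
n = size(N, 1);

[coeff, latent] = svd(cov(N));       % as in the answer: SVD of the covariance matrix
[~, S, V] = svd(N, 'econ');          % SVD of the data matrix itself

% The diagonal of latent holds variances, not the singular values of N:
% each variance equals a squared singular value divided by (n - 1).
varFromData = diag(S).^2 / (n - 1);
maxDiff = max(abs(diag(latent) - varFromData));   % expected to be near zero

% The principal component vectors agree up to the sign of each column.
signDiff = max(max(abs(abs(coeff) - abs(V))));    % expected to be near zero
```

Both routes give the same principal component vectors and the same proportions of variation. The difference matters only when the singular values themselves are reported, since the diagonal of latent must first be converted back with sqrt((n - 1) * diag(latent)).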