CSI 5810 (Assignment #1)
1. In this exercise, you will work with the Census Income Data Set, which you can download
from the following link:
https://archive.ics.uci.edu/ml/datasets/Census+Income
Once you have downloaded the data, you will prepare a data visualization
report. Feel free to provide any additional visualization that might help in better
understanding the data. Write a paragraph about the characteristics of the
data that you observe through visualization.
2. This exercise is designed to make you familiar with generating data from a
multivariate normal distribution and using the generated data.
a. Generate 300 3-dimensional vectors that come from a normal
distribution with mean vector [1 2 1]^T and 3x3 covariance matrix
[5 0.8 -0.3; 0.8 3 0.6; -0.3 0.6 4].
b. Make scatter plots of x1 vs x2, x1 vs x3, and x2 vs x3. Explain whatever
relationships you can gather from these plots.
c. Calculate the mean vector and the covariance matrix using the 300
generated points.
d. Pick any 5 pairs of generated vectors and calculate the Euclidean and
the Mahalanobis distances between those pairs.
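As a rough illustration of parts (a), (c), and (d), the following NumPy sketch may be a useful starting point. The random seed, the variable and function names, and the choice to use the specified covariance matrix (rather than the sample estimate from part c) in the Mahalanobis distance are all assumptions made for illustration, not requirements of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

mean = np.array([1.0, 2.0, 1.0])
cov = np.array([[5.0, 0.8, -0.3],
                [0.8, 3.0, 0.6],
                [-0.3, 0.6, 4.0]])

# (a) Draw 300 three-dimensional samples; X has shape (300, 3).
X = rng.multivariate_normal(mean, cov, size=300)

# (c) Sample mean vector and sample covariance matrix (rows are observations).
sample_mean = X.mean(axis=0)
sample_cov = np.cov(X, rowvar=False)

def euclidean(u, v):
    """Euclidean distance between vectors u and v."""
    return float(np.linalg.norm(u - v))

def mahalanobis(u, v, cov_inv):
    """Mahalanobis distance between u and v for a given inverse covariance."""
    d = u - v
    return float(np.sqrt(d @ cov_inv @ d))

# (d) Five pairs built from ten distinct, randomly chosen indices.
cov_inv = np.linalg.inv(cov)  # assumption: use the specified covariance
pairs = rng.choice(300, size=(5, 2), replace=False)
for i, j in pairs:
    print(i, j, euclidean(X[i], X[j]), mahalanobis(X[i], X[j], cov_inv))
```

Swapping `cov` for `sample_cov` when computing `cov_inv` gives the Mahalanobis distance under the estimated covariance instead. The columns `X[:, 0]`, `X[:, 1]`, and `X[:, 2]` correspond to x1, x2, and x3 for the scatter plots in part (b).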
3. You will perform this exercise using the PCA-Exercise data posted on the course
page.
Suppose we are interested in reducing the six-dimensional records to two
dimensions by means of principal component analysis. List the eigenvalues and
eigenvectors obtained via PCA. Determine the reduced representation for all of the
records and plot the reduced representation in the form of a scatter plot.
Reconstruct the original data and compute the reconstruction error.
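The PCA steps named above (eigendecomposition, projection, reconstruction, error) can be sketched with NumPy as follows. The PCA-Exercise data is not reproduced here, so the array `X` below is a random six-dimensional placeholder, and the use of mean squared error per record as the reconstruction error is an assumption; other definitions (for example, total squared error) are equally reasonable.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))  # placeholder for the six-dimensional records

# Center the data and form the sample covariance matrix.
mu = X.mean(axis=0)
Xc = X - mu
C = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order; sort them descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the two leading eigenvectors (columns of W).
W = eigvecs[:, :2]
Z = Xc @ W                      # reduced representation, shape (50, 2)

# Map back to six dimensions and measure the error.
X_hat = Z @ W.T + mu
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))

print("eigenvalues:", eigvals)
print("mean squared reconstruction error per record:", mse)
```

With this definition, the error equals the sum of the four discarded eigenvalues scaled by (n - 1)/n, which offers a quick consistency check. The two columns of `Z` are the coordinates for the requested scatter plot.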
4. In this exercise, you will apply PCA to the Spoken Arabic Digit Dataset at the following
link:
https://archive.ics.uci.edu/ml/datasets/Spoken+Arabic+Digit
You will use stratified sampling to select only 100 vectors/class, and reduce the training
data to two dimensions [The class labels are not used in PCA]. List all eigenvalues and
make a scatter plot of the transformed data. Show transformed data points for any digit
pair of your choice in different colors or shapes.
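One possible way to draw a fixed number of vectors per class and then apply PCA without the labels is sketched below. The helper `stratified_sample` is a hypothetical name, the feature matrix is random placeholder data (13 columns and 300 rows per class are arbitrary choices), and the digit pair 3 and 7 is picked only as an example.

```python
import numpy as np

def stratified_sample(X, y, per_class, rng):
    """Pick per_class rows at random, without replacement, from each class."""
    idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        idx.append(rng.choice(members, size=per_class, replace=False))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
y_all = np.repeat(np.arange(10), 300)        # placeholder labels, digits 0-9
X_all = rng.normal(size=(y_all.size, 13))    # placeholder feature vectors

X_s, y_s = stratified_sample(X_all, y_all, per_class=100, rng=rng)

# PCA on the sampled vectors; the labels y_s are not used here.
Xc = X_s - X_s.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Z = Xc @ eigvecs[:, :2]                      # shape (1000, 2)

# Labels are used only afterwards, to pick out a digit pair for plotting.
mask_a, mask_b = (y_s == 3), (y_s == 7)
points_a, points_b = Z[mask_a], Z[mask_b]
```

`points_a` and `points_b` can then be drawn with different colors or markers in a single scatter plot.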
5. Repeat Exercise #4 using the t-SNE visualization method to visualize the entire training
dataset. Comment on the results obtained.
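A minimal t-SNE sketch is given below, assuming scikit-learn is available; the assignment does not prescribe a library. The data is a synthetic placeholder (three clusters in 13 dimensions), and the perplexity, initialization, and seed are arbitrary choices that would likely need tuning on the real training set.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder data: three well-separated clusters of 40 points each.
centers = rng.normal(scale=10.0, size=(3, 13))
y = np.repeat(np.arange(3), 40)
X = centers[y] + rng.normal(size=(y.size, 13))

# Perplexity must be smaller than the number of samples (120 here).
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
Z = tsne.fit_transform(X)       # embedded coordinates, shape (120, 2)

# Labels are used only to color the points when plotting.
for label in np.unique(y):
    print(label, Z[y == label].mean(axis=0))
```

Unlike PCA, t-SNE is a nonlinear embedding with no eigenvalues or reusable projection matrix, and its output depends on the perplexity and the random seed, so it is worth trying a few settings before commenting on the results.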
Note: The submission should be in the form of a single PDF document.
Submissions in any other format will not be graded.