6BUIS001W Business Intelligence – Coursework 1 (2020/21), University of Westminster, School of Computer Science & Engineering (module leader: Dr. V.S. Kontogiannis)

I've got the R solutions but need a report that implements the code and discusses it.


Dr. V.S. Kontogiannis 6BUIS001W 2020/21

University of Westminster
School of Computer Science & Engineering

6BUIS001W Business Intelligence – Coursework 1 (2020/21)

Module leader: Dr. V.S. Kontogiannis
Unit: Coursework 1
Weighting: 50%
Qualifying mark: 30%

Description
Show evidence of understanding of various Business Intelligence concepts through the implementation of clustering & forecasting algorithms using real datasets. Implementation is performed in the R environment, and students need to perform some critical evaluation of their results.

Learning Outcomes Covered in this Assignment
This assignment contributes towards the following Learning Outcomes (LOs):
• LO3 review the recent business intelligence tools to carry out critical evaluation on methodologies and technologies available for information retrieval, pattern recognition and knowledge discovery;
• LO4 apply contemporary business intelligence technologies in order to enable users to view data patterns by deploying various tools;

Handed out: 05/10/2020
Due date: 05/11/2020, submission by 13:00

Expected deliverables
Submit on Blackboard only a PDF file containing the required details. All implemented code should be included in your documentation together with the results/analysis/discussion.

Method of submission: Electronic submission on BB via a provided link close to the submission time.

Type of feedback and due date: Feedback will be provided on BB on 26th November 2020 (15 working days).

BCS criteria met in this assignment
• Problem solving strategies
• ‘Knowledge and understanding of mathematical and/or statistical principles’

Assessment regulations
Refer to section 4 of the “How you study” guide for undergraduate students for a clarification of how you are assessed, penalties and late submissions, what constitutes plagiarism etc.

Penalty for Late Submission
If you submit your coursework late but within 24 hours or one working day of the specified deadline, 10 marks will be deducted from the final mark as a penalty for late submission, except for work which obtains a mark in the range 40 – 49%, in which case the mark will be capped at the pass mark (40%). If you submit your coursework more than 24 hours or more than one working day after the specified deadline, you will be given a mark of zero for the work in question unless a claim of Mitigating Circumstances has been submitted and accepted as valid.

It is recognised that on occasion, illness or a personal crisis can mean that you fail to submit a piece of work on time. In such cases you must inform the Campus Office in writing on a mitigating circumstances form, giving the reason for your late or non-submission. You must provide relevant documentary evidence with the form. This information will be reported to the relevant Assessment Board, which will decide whether the mark of zero shall stand. For more detailed information regarding University Assessment Regulations, please refer to the following website: http://www.westminster.ac.uk/study/current-students/resources/academic-regulations

Instructions for this coursework
During the marking period, all coursework assessments will be compared in order to detect possible cases of plagiarism/collusion. For each question, show all the steps of your work (code/results). In addition, please note that although clarifications for CW questions can be provided during tutorials, the coursework itself has to be performed outside tutorial sessions.

Coursework Description

Clustering Part
In this assignment, we consider a set of observations on a number of silhouettes of different types of vehicles, using a set of features extracted from each silhouette. Each vehicle may be viewed from one of many different angles.
The features were extracted from the silhouettes by the HIPS (Hierarchical Image Processing System) extension BINATTS, which extracts a combination of scale-independent features utilising both classical moment-based measures, such as scaled variance, skewness and kurtosis about the major/minor axes, and heuristic measures, such as hollows, circularity, rectangularity and compactness. Four model vehicles were used for the experiment: a double decker bus, a Chevrolet van, a Saab and an Opel Manta. This particular combination of vehicles was chosen with the expectation that the bus, the van and either one of the cars would be readily distinguishable, but that it would be more difficult to distinguish between the cars.

One dataset (vehicles.xls) is available and has 846 observations/samples. There are 19 variables/features: 18 numerical and one nominal, which defines the class of the objects.

Description of attributes:
1. Comp: Compactness
2. Circ: Circularity
3. D.Circ: Distance Circularity
4. Rad.Ra: Radius ratio
5. Pr.Axis.Ra: pr.axis aspect ratio
6. Max.L.Ra: max.length aspect ratio
7. Scat.Ra: scatter ratio
8. Elong: elongatedness
9. Pr.Axis.Rect: pr.axis rectangularity
10. Max.L.Rect: max.length rectangularity
11. Sc.Var.Maxis: scaled variance along major axis
12. Sc.Var.maxis: scaled variance along minor axis
13. Ra.Gyr: scaled radius of gyration
14. Skew.Maxis: skewness about major axis
15. Skew.maxis: skewness about minor axis
16. Kurt.maxis: kurtosis about minor axis
17. Kurt.Maxis: kurtosis about major axis
18. Holl.Ra: hollows ratio
19. Class: type of cars

In this clustering part you need to use the first 18 attributes in your calculations.

1st Objective (partitioning clustering)
You need to conduct the k-means clustering analysis of the vehicle dataset problem. Find the ideal number of clusters (please justify your answer).
Choose the best two possible numbers of clusters and perform the k-means algorithm for both candidates. Validate which clustering test is more accurate. For the winning test, get the mean of each attribute (i.e. centres) of each group. Before conducting the k-means, please investigate whether you need to add any pre-processing task to your code (scaling and/or outlier detection) and justify your answer. Write code in R Studio to address all the above issues (code/results need to be included in your report). In your report you need to check the consistency of your produced cluster outcome against the information obtained from the 19th column and provide the related results/discussion (evidence of a “confusion” matrix and extracted information from it). At the end of your report, provide also, as an Appendix, the full code developed by you. The usage of the kmeans R function is compulsory. (Marks 50)

Forecasting Part
Time series analysis can be used in a multitude of business applications for forecasting a quantity into the future and explaining its historical patterns. An exchange rate is the currency rate of one country expressed in terms of the currency of another country. In the modern world, the exchange rates of the most successful countries tend to be floating. Under this system, the rate is set by the foreign exchange market through supply and demand for that particular currency in relation to the other currencies. Exchange rate prediction is one of the challenging applications of modern time series forecasting and is very important for the success of many businesses and financial institutions. The rates are inherently noisy, non-stationary and deterministically chaotic. One general assumption made in such cases is that the historical data incorporate all of this behaviour. As a result, the historical data is the major input to the prediction process.
Forecasting exchange rates poses many challenges. Exchange rates are influenced by many economic factors. Like other economic time series, an exchange rate has trend, cycle and irregular components. Classical time series analysis does not perform well on finance-related time series. Hence, the idea of applying Neural Networks (NN) to forecast exchange rates has been considered as an alternative solution. An NN tries to emulate human learning capabilities, creating models that represent the neurons in the human brain.

In this forecasting part you need to use an MLP-NN model to predict the next step-ahead exchange rate of GBP/EUR. Daily data (exchangeGBP.xls) have been collected from January 2010 until December 2011 (500 data points). The first 400 of them have to be used as training data, and the remaining ones as the testing set. Use only the 2nd column from the .xls file, which corresponds to the exchange rates.

2nd Objective (MLP)
You need to construct an MLP neural network for this problem. You need to consider the appropriate input vector (time series), as well as the internal network structure (such as hidden layers, nodes, learning rate). You may consider any de-trending scheme if you feel it is necessary. Write code in R Studio to address all these requirements. You need to show the performance of your network both graphically and in terms of the following statistical indices: RMSE, MAE and MAPE.

Suggestion: Experiment with various network structures as well as various input vectors and show a comparison table of their performances (using these specific statistical indices). This will be a good justification for your final network choice. Show all your working steps (code & results, including comparison results from models with different input vectors and internal structure).
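
The input-vector and error-index requirements above can be illustrated with a short base-R sketch. This is not part of the brief or of the posted answer: the series is a simulated random walk standing in for the 2nd column of exchangeGBP.xls, the helper names (make_lagged, rmse, mae, mape) and the choice of three lags are hypothetical, and the naive "previous value" forecast is only a baseline, not an MLP.

```r
# Illustrative only: a simulated random walk stands in for column 2 of exchangeGBP.xls
set.seed(42)
rate <- 1.15 + cumsum(rnorm(500, sd = 0.003))

# Build a lagged input matrix: each row holds y(t) and its n_lags previous values
make_lagged <- function(series, n_lags) {
  m <- embed(series, n_lags + 1)   # column 1 = y(t), column k + 1 = y(t - k)
  colnames(m) <- c("target", paste0("lag", seq_len(n_lags)))
  as.data.frame(m)
}

# Error indices requested in the brief (hypothetical helper names)
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
mape <- function(actual, pred) mean(abs((actual - pred) / actual)) * 100

n_lags <- 3                        # assumed here; the brief asks you to try several
lagged <- make_lagged(rate, n_lags)

# Rows whose target is one of the first 400 observations form the training set
n_train <- 400 - n_lags
train <- lagged[seq_len(n_train), ]
test  <- lagged[-seq_len(n_train), ]

# Naive baseline: predict y(t) with y(t - 1); a trained MLP should improve on this
baseline <- test$lag1
print(c(RMSE = rmse(test$target, baseline),
        MAE  = mae(test$target, baseline),
        MAPE = mape(test$target, baseline)))
```

Under these assumptions, train and test have the shape a model formula such as target ~ lag1 + lag2 + lag3 expects, so rerunning the sketch with different n_lags values gives one row of the comparison table per input vector. Inputs would normally be normalised before fitting a network; that step is omitted here.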

As everyone will have a different forecasting result, emphasis in the marking scheme will be given to the adopted methodology and to the explanation/justification of the various decisions you have taken in order to provide a solution that is acceptable in terms of performance. The input selection problem is very important. Experiment with various options (i.e. how many past values you need to consider as potential network inputs). Full details of your results/code/discussion are needed in your report. At the end of your report, provide also, as an Appendix, the full code developed by you. The usage of the neuralnet R function for MLP modelling is compulsory. (Marks 50)

Coursework Marking scheme
The Coursework will be marked based on the following marking criteria:

1st Objective (partitioning clustering)
• Find the ideal number of clusters – justify it by showing all necessary steps/methods (via manual & automated tools) 10
• K-means with the best two clusters, 10
• Find the mean of
Answered same day: Nov 02, 2021

Naveen answered on Nov 03 2021
127 Votes
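# Installing required packages (readxl is needed for read_xlsx below)
install.packages(c('nnfor', 'dplyr', 'factoextra', 'Metrics', 'devtools', 'readxl'))
# Installing the SA package from GitHub (assumed to be the source of Missing_C, used below)
devtools::install_github('Saraswathi-Analytics/R/SA')
# Loading required packages
library(dplyr)
library(factoextra)
library(nnfor)
library(Metrics)
library(SA)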
# Before reading the data, set the working directory to the folder where your files are saved.
# Reading vehicles data
vehicles <- as.data.frame(readxl::read_xlsx('vehicles.xlsx', sheet = 1))
# Dropping the first column (this is not the class label: Class is the last column and is still used below)
vehicles <- vehicles[,-1]
vehicles1 = vehicles
# printing first six records of the data
head(vehicles)
# checking for missing values
Missing_C(vehicles)
# Function for replacing outliers with the column mean (1.5 * IQR rule)
f <- function(y)
{
  out = vector(mode = 'list')                  # outlier values found, per column
  actual = data.frame(row.names = 1:nrow(y))   # data with outliers replaced
  for(i in 1:ncol(y))
  {
    x = y[,i]
    name = names(y)[i]
    if(is.numeric(x))   # is.numeric() is also TRUE for integer columns
    {
      iq = IQR(x)
      q1 = quantile(x)[2]
      q3 = quantile(x)[4]
      l = q1 - 1.5*iq   # lower fence
      u = q3 + 1.5*iq   # upper fence
      out[[name]] = x[x<=l | x>=u]
      # note: the mean is computed with the outliers still included
      actual[[name]] = replace(x, x<=l | x>=u, mean(x,na.rm = T))
    }else{
      # non-numeric columns (e.g. Class) are copied unchanged
      actual[[name]] = y[,i]
    }
  }
  outs = list(out,actual)
  return(outs)
}
a = f(vehicles)
# outlier values detected in each column
a[[1]]
# Storing the outlier-adjusted data
vehicles <- a[[2]]
# printing the first six records of the data after outliers were replaced with the mean
head(vehicles)
# Standardising the numeric features with scale() (zero mean, unit variance); Class is re-attached afterwards
vehicles <- scale(vehicles[,-ncol(vehicles)]) %>%
  as.data.frame() %>%
  round(digits = 4) %>%
  cbind(Class = vehicles$Class) %>%
  as.data.frame()
# printing the top six records of data
head(vehicles)
# Manually finding the best number of clusters: total within-cluster sum of squares for k = 1..nc
wss <- function(data, nc=15, seed=1234){
  ws <- (nrow(data)-1)*sum(apply(data,2,FUN='var'))   # value for k = 1
  for (i in 2:nc){
    set.seed(seed)
    ws[i] <- sum(kmeans(data, centers=i)$withinss)
  }
  return(ws)
}
wss1 = wss(vehicles[,1:18],25)
# visualizing the best number of clusters (elbow plot)
plot(wss1, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares")
# Automatically finding the best number of clusters for...
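
The brief also asks for the k-means result to be checked against the Class column through a confusion matrix, and for the centres of the winning clustering. The sketch below is one hedged way to do that in base R. It is not the answerer's code, which is cut off above: the built-in iris data stands in for the vehicles data, the choice of three clusters is only for illustration, and purity is one possible summary of the table rather than anything the brief prescribes.

```r
# Illustrative only: iris stands in for the vehicles data (numeric features + a class column)
features <- scale(iris[, 1:4])   # standardise, as done for the vehicle features
classes  <- iris$Species

set.seed(1234)
km <- kmeans(features, centers = 3, nstart = 25)

# Cross-tabulate known classes (rows) against k-means cluster ids (columns)
confusion <- table(Class = classes, Cluster = km$cluster)
print(confusion)

# Purity: label each cluster by its majority class and take the share of matching points
purity <- sum(apply(confusion, 2, max)) / sum(confusion)
print(round(purity, 3))

# Cluster centres: mean of each (scaled) attribute per cluster
print(round(km$centers, 3))
```

Cluster ids are arbitrary labels, so the table should be read column by column (which class dominates each cluster) rather than along the diagonal. For the vehicles data the same calls would take the 18 scaled features and the Class column in place of the iris columns.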