CSE3CI – Computational Intelligence for Data Analytics
Due Date: 9:00am, Monday 17th May 2021
Assessment Weight: 30% of the final mark for the subject
• This is a GROUP assignment. You are permitted to work in groups of up to three. All group members
will receive the same mark. You may complete the assignment as an individual, but if you do so, you
will be marked in the same way as for a group.
Plagiarism is the submission of somebody else’s work in a manner that gives the impression that the work
is your own. When submitting your assignment via the LMS, the following announcement will appear:
Software will be used to assist in the detection of plagiarism. Students are referred to the section on
‘Academic Misconduct’ in the subject’s guideline available on LMS.
Penalties are applied to late assignments: 5% of the total possible marks for the task is deducted per day,
and late submissions are accepted for up to five days after the due date only. An assignment submitted more
than five working days after the due date will not be accepted.
You are required to submit the following:
• A pdf format document containing your report.
• A zip file containing all of the Python code that you used for the assignment.
These documents are to be submitted electronically via the Learning Management System.
In the case of group submissions, only one member of the group should submit, and the cover page of the
report must contain the full name and Student ID of all group members.
In the case of solo submissions, ensure your name and Student ID are on the cover page.
You will also be required to give a short oral presentation of your report (10 minutes maximum) during the
scheduled lab class in Week 11. Depending on how many submissions are received, it may be necessary
to schedule a second lab class during that week or Week 12.
Problem Description – Forecasting Electricity Prices
The problem is to forecast electricity prices from historical data. Let the temperature and total
demand for electricity at time instant t be T(t) and D(t) respectively. The goal is to predict the
Recommended Retail Price (RRP) using historical data as system inputs. The historical data
set consists of the following variables: T(t-2), T(t-1), T(t), D(t-2), D(t-1), D(t). The output should be a
prediction of the RRP of electricity at the next time instant t+1, denoted by P(t+1).
You have been provided with real-world electricity pricing data from Queensland, Australia. There are
two datasets: a training set, to be used for model development; and a test set, to be used to evaluate the
performance of your models. Each dataset has the same structure. Rows correspond to successive time
instants, and contain seven values: the predictor variables T(t-2), T(t-1), T(t), D(t-2), D(t-1), D(t), and the
target variable P(t+1). The objective is to predict the value of P(t+1) on the basis of one or more of the six
predictor variables.
There are five parts to the assignment, described below, with the approximate assessment weighting.
Parts 1, 2 and 3 are based on content that has been covered up to the end of Week 5. Content for Part 4
will be covered in Weeks 6 and 7.
Part 1 – Data Preparation (approx. 5%)
The performance of many systems can be improved through careful preparation of the data. Visualising
the electricity prices will reveal that there are potential outliers (see footnote 1) in the dataset; i.e., observations that lie
an abnormal distance from other values in a random sample from a population.
• Use an appropriate technique to identify and remove outliers of the output variable from the
datasets (for both training and test sets).
• Provide a plot showing the price data before and after the removal of outliers.
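As a starting point only, the sketch below shows one common way to flag outliers in the target column: the interquartile-range (Tukey fence) rule. The function name remove_price_outliers, the multiplier k=1.5 and the synthetic data are assumptions made for illustration; the assignment does not prescribe a particular technique, and the real datasets must be loaded in place of the random array.

```python
import numpy as np


def remove_price_outliers(data, k=1.5):
    """Drop rows whose last column (the price) lies outside the Tukey fences.

    `data` is assumed to be a 2-D array whose last column is P(t+1).
    The IQR rule and k=1.5 are illustrative choices, not requirements.
    Returns the filtered rows and the boolean mask of rows that were kept.
    """
    price = data[:, -1]
    q1, q3 = np.percentile(price, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    keep = (price >= lower) & (price <= upper)
    return data[keep], keep


# Synthetic stand-in for the real dataset: six predictors plus a price column.
rng = np.random.default_rng(0)
demo = rng.normal(loc=50.0, scale=5.0, size=(200, 7))
demo[10, -1] = 2000.0   # inject an obvious upward price spike
demo[57, -1] = -500.0   # and an obvious downward spike

cleaned, keep_mask = remove_price_outliers(demo)
print(f"rows before: {len(demo)}, rows after: {len(cleaned)}")
```

Because each row already carries its own lagged values, dropping a row does not disturb the alignment of the remaining rows. The returned mask can be reused to plot the price series before and after filtering.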
Part 2 – Linear Regression Models (approx. 8%)
Linear regression is often a good baseline against which to compare the performance of other models.
• Apply linear regression to the prediction of electricity prices.
• For both the training and test sets, provide the Average Relative Error.
• For both training and test sets, produce a plot showing, for each data point, how the predicted
price compares with the actual price.
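The following sketch illustrates the workflow for this part using plain NumPy least squares (scikit-learn's LinearRegression would serve equally well). Note that the definition of Average Relative Error used here, the mean of |predicted − actual| / |actual|, is an assumption, as are the helper names and the synthetic data; check the definition given in the subject materials before reporting results.

```python
import numpy as np


def average_relative_error(actual, predicted):
    """Assumed definition: mean of |predicted - actual| / |actual|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the weight vector."""
    A = np.column_stack([X, np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(A, y, rcond=None)
    return weights


def predict_linear(weights, X):
    """Apply the fitted weights (last entry is the intercept) to new inputs."""
    return np.column_stack([X, np.ones(len(X))]) @ weights


# Synthetic stand-in data: six predictors and a noisy linear target.
rng = np.random.default_rng(1)
X_train = rng.uniform(10.0, 30.0, size=(300, 6))
X_test = rng.uniform(10.0, 30.0, size=(100, 6))
true_w = np.array([0.5, 0.2, 1.0, 0.1, 0.3, 0.8])
y_train = X_train @ true_w + 20.0 + rng.normal(0.0, 1.0, size=300)
y_test = X_test @ true_w + 20.0 + rng.normal(0.0, 1.0, size=100)

w = fit_linear(X_train, y_train)
train_are = average_relative_error(y_train, predict_linear(w, X_train))
test_are = average_relative_error(y_test, predict_linear(w, X_test))
print(f"training ARE: {train_are:.4f}, test ARE: {test_are:.4f}")
```

A relative error of this form is undefined when an actual price is zero, so it is worth checking the cleaned data for zero or near-zero prices before computing it.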
Part 3 – Multilayer Perceptron Models (approx. 27%)
Multilayer perceptrons can sometimes yield better performance than linear models.
• Experiment with the application of MLPs to predicting electricity prices. You should try varying
MLPRegressor parameters such as the regularization coefficient, the number of training epochs,
and the number of hidden units. Make sure that you record the training error and test error in
each case. It is suggested that you use logistic units in the hidden layer, but you can use others if
you wish.
Footnote 1: You can read more about outliers here: http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm
• Provide results for three different MLPRegressor parameter settings.
− one of these should be the result for the best performing MLP that you were able to train;
− one should clearly demonstrate underfitting;
− one should clearly demonstrate overfitting.
For each of these cases, provide the learning parameters that you have used, as well as the
training error and the test error.
• For the best-performing MLP, for both training data and test data, produce a plot showing, for
each data point, how the predicted price compares with the actual price.
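One way to organise these experiments is sketched below, using scikit-learn's MLPRegressor with logistic hidden units. The three parameter settings, the synthetic target function and the error definition (mean of |predicted − actual| / |actual|) are all assumptions chosen so that the sketch runs on its own; they are not claimed to produce underfitting, overfitting or a best model on the real data, which you will need to establish by experiment.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def average_relative_error(actual, predicted):
    """Assumed definition: mean of |predicted - actual| / |actual|."""
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))


def synthetic_price(X):
    """Made-up nonlinear target so that the sketch runs without the real data."""
    return 50.0 + 10.0 * np.sin(3.0 * X[:, 2]) + 5.0 * X[:, 5] ** 2


rng = np.random.default_rng(2)
X_train = rng.uniform(0.0, 1.0, size=(300, 6))
X_test = rng.uniform(0.0, 1.0, size=(100, 6))
y_train = synthetic_price(X_train) + rng.normal(0.0, 0.5, size=300)
y_test = synthetic_price(X_test) + rng.normal(0.0, 0.5, size=100)

# Standardise inputs and target using training-set statistics only.
scaler = StandardScaler().fit(X_train)
Xs_train, Xs_test = scaler.transform(X_train), scaler.transform(X_test)
y_mean, y_std = y_train.mean(), y_train.std()

# Hypothetical settings: hidden units, regularization (alpha), epochs (max_iter).
settings = [
    {"hidden_layer_sizes": (2,), "alpha": 1.0, "max_iter": 50},
    {"hidden_layer_sizes": (10,), "alpha": 1e-3, "max_iter": 1000},
    {"hidden_layer_sizes": (200,), "alpha": 0.0, "max_iter": 2000},
]

results = []
for params in settings:
    model = MLPRegressor(activation="logistic", random_state=0, **params)
    model.fit(Xs_train, (y_train - y_mean) / y_std)
    # Undo the target scaling so errors are measured in original price units.
    train_pred = model.predict(Xs_train) * y_std + y_mean
    test_pred = model.predict(Xs_test) * y_std + y_mean
    results.append({
        **params,
        "train_are": average_relative_error(y_train, train_pred),
        "test_are": average_relative_error(y_test, test_pred),
    })

for row in results:
    print(row)
```

With the default adam solver, max_iter counts training epochs; a short run may emit a ConvergenceWarning, which is expected when deliberately limiting training. Recording every setting in a list like results makes it straightforward to tabulate training and test errors in the report.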
Part 4 – Fuzzy Forecasting System (approx. 40%)
For this part, you will develop a fuzzy forecasting system for predicting the electricity price.
(You will learn about fuzzy inference systems in Weeks 6 and 7.)
• Select appropriate values or fuzzy subsets for the linguistic variables that you will use in your
system;
• Apply statistical analysis (correlation coefficients) and heuristics to develop a set of fuzzy rules;
• Implement your fuzzy system in Python, and produce clear plots of all membership functions
involved in your system;
• Evaluate the system performance in terms of the average relative error on both training and test
sets.
You may use either Mamdani-type or Sugeno-type inference, but you should include some justification
for your decision.
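To make the structure of such a system concrete, the sketch below implements a very small zero-order Sugeno system in plain NumPy, with triangular membership functions, min as the AND operator and a weighted average of constant rule outputs. Every number in it is hypothetical: the universes for demand and temperature, the membership-function breakpoints, the rule table and the output prices are placeholders, not values derived from the Queensland data. Your own sets and rules should come from your correlation analysis and heuristics, and a Mamdani system would instead use fuzzy output sets and defuzzification.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


# Hypothetical fuzzy subsets for D(t) and T(t); breakpoints are placeholders.
DEMAND_SETS = {
    "low": (2000.0, 4000.0, 6000.0),
    "medium": (4000.0, 6000.0, 8000.0),
    "high": (6000.0, 8000.0, 10000.0),
}
TEMP_SETS = {
    "mild": (0.0, 15.0, 30.0),
    "hot": (15.0, 30.0, 45.0),
}

# Hypothetical rule table: IF demand is X AND temperature is Y THEN price = constant.
RULES = {
    ("low", "mild"): 20.0,
    ("low", "hot"): 25.0,
    ("medium", "mild"): 40.0,
    ("medium", "hot"): 50.0,
    ("high", "mild"): 80.0,
    ("high", "hot"): 100.0,
}


def predict_price(demand, temperature):
    """Zero-order Sugeno inference: weighted average of rule outputs."""
    demand = np.atleast_1d(np.asarray(demand, dtype=float))
    temperature = np.atleast_1d(np.asarray(temperature, dtype=float))
    numerator = np.zeros_like(demand)
    denominator = np.zeros_like(demand)
    for (d_label, t_label), price in RULES.items():
        # AND is taken as the minimum of the two membership degrees.
        strength = np.minimum(
            trimf(demand, *DEMAND_SETS[d_label]),
            trimf(temperature, *TEMP_SETS[t_label]),
        )
        numerator += strength * price
        denominator += strength
    # Inputs outside every fuzzy set fire no rule; return NaN for those.
    return numerator / np.where(denominator > 0.0, denominator, np.nan)


print(predict_price([4000.0, 5000.0, 8000.0], [15.0, 15.0, 30.0]))
```

Membership plots can be produced by evaluating trimf over a grid of input values for each set. Predictions from predict_price can then be scored with the same average relative error measure used in the earlier parts.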
Part 5 – Report and Presentation (approx. 20%)
This is the assignment ‘deliverable’; i.e., what you are required to submit. It should contain your results
from Parts 1 to 4, put together in a clear and coherent manner. It should also clearly describe how you
conducted your investigation and any design choices you made (e.g., which parameters you
experimented with when applying the MLP, which membership functions you experimented
with in creating your fuzzy system, why you opted for Mamdani-type rather than Sugeno-type
inference, and so on). The more thorough and systematic your analysis, the better. A
summary of your overall findings should also be provided in the report.
Approximate marks for each of Parts 1 to 5 have been indicated above. The marks for Parts 1 to 4 are
based on correctness and completeness of the tasks specified. The 20% allocated for Part 5 will be based
on how clearly and coherently the report and presentation are presented; the description and
justification they provide for the design choices made; the evidence they provide of
systematic experimentation with different system parameters; and the conclusions they draw regarding the
use of the various approaches in predicting electricity prices.