Project Assignment as attached: project_proposal.docx PROJECT PROPOSAL The following dataset...

Question

Project assignment as attached: project_proposal.docx

PROJECT PROPOSAL

The following dataset, https://www.kaggle.com/taranmarley/perth-temperatures-and-rainfall, is a time series dataset measuring daily minimum and maximum temperature and rainfall in Perth, Australia, from 1944 to 2020.

Proposed solution: Model and forecast the temperature to better understand the rate of climate change and encourage regulatory bodies to take action to reduce the effects of global warming.

ECON 3343/6645 BUSINESS FORECASTING, Fall 2020

PROJECT GUIDELINES

Please make sure you follow the steps below in your project. You can add more steps if you feel it is necessary. Each project should be original and different from others. Your submission will consist of a written report and the R script of the code used in your report. Remember to keep the R code and written analysis separate: the R script should have all of your coding, and the written report should contain only analysis and graphs (no code).

The deadline for submission is Tuesday, December 15, 2020, at 11:59pm. All files will be submitted through Canvas under the Final Project entry listed in the assignment tab.

Report Format: The format of the report will adhere to APA standards. For information on formatting a paper to APA, please see the guide below:
https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/general_format.html

1. Introduction: Introduce the data you chose to examine. What problem are you trying to solve by using this data?

2. Data:
· What is your data?
· What is the source, and where did you obtain it from (give the link, source, etc.)?
· What is the time span of your data (ex: 2001.1 to 2016.10; daily, quarterly, annual; how many observations)?
· What are the variables, and what do they measure?

3. Plot your data. Put the graph for the whole period.
· What can you identify from the time series plot (trend, cyclicity, seasonality)?
· If your data has multiple variables, plot a relational scatter plot between the X and Y variables. Is there evidence of a relationship?

4. Dividing the data: Partition the data into two sets: training and testing. The distribution of the split is up to you, but some common splits are 70/30, 75/25 and 80/20. The training set is the data that you will use to train your models later. The test set is what will be used to measure the accuracy of your model.

5. Test if there is seasonality and/or trend in the data (hint: decomposition, ACF function). If there is evidence of seasonality or trend, what information does it show? If there is no evidence, comment on that as well: how would a lack of seasonality affect our model selection?

6. Establish a baseline accuracy measure.
· Use a benchmark forecast model (meanf, drift, snaive, etc.) to establish a baseline accuracy.

7. Use Exponential Smoothing (ETS) techniques to train your model with the training data and then check the accuracy of your model against the test data.
· If there is no seasonality or trend, smooth the data with MA (moving averages).
· Use all the commands in the chapter to see which model fits the best (Holt, Holt-Winters, damped model, etc.).
· Which model do you prefer to use? Why? Does the composition of the data make one model more applicable than others?
· Check the residuals of your preferred model. Do they look like white noise? What do the residuals indicate about your model?
· Forecast for the test period.
· Compare your forecast to the test data. What is the forecast performance?
· Decide if this is an acceptable model to use. Why is it acceptable?

8. Use ARIMA(p,d,q) modeling to forecast.
· Explain when you can use ARIMA modeling. Is the series stationary for the training period?
· Test if the series is stationary. What did you decide? Is the series stationary or not?
· If the data is not stationary, make the data stationary (differencing).
· Use the necessary command to use ARIMA(p,q) to model the series. What do you find? What is the model? What are p and q? Is it integrated (differenced)?
· Check whether the residuals are normally distributed.
· Plot the ACF of the residuals. What do you observe?
· Decide if the model is acceptable to use for forecasting.
· Forecast the test period.
· What is the forecast error?

9. Compare the accuracy of both models (ETS and ARMA). Which model should you use to forecast with?

10. Run the ARIMA(p,q) model for the natural logarithms of the series. Did your results change? (Use lambda=0 in your R command and show your results.)

CS229_FinalProject_Jpao_Dsulliv2.docx

Time Series Sales Forecasting

James J. Pao*, Danielle S. Sullivan**
*jpao@stanford.edu, **danielle.s.sullivan@gmail.com

Abstract: The ability to accurately forecast data is highly desirable in a wide variety of fields such as sales, stocks, sports performance, and natural phenomena. Presented here is a study of several time series forecasting methods applied to retail sales data, comprising weekly sales figures from various Walmart department stores across the United States over a period of approximately two and a half years. Significant surges in sales are noticeable in the data during pre-holiday and holiday weeks, which present a challenge for any developed forecasting models. The prediction models implemented herein are regression decision trees, Seasonal-Trend Decomposition using Loess and Autoregressive Integrated Moving-Average (STL + ARIMA) models, and time-lagged feed-forward neural networks (FFNNs). In particular, the STL + ARIMA and the time-lagged FFNNs performed reasonably well in forecasting the weekly sales data. The best FFNN implementation, using a time-lag value d = 4 and mean weekly sales as inputs, achieved a mean absolute error of 1252. Weekly sales for the store departments are in the tens of thousands.
It is also notable that the results achieved by the time-lagged FFNNs did not require any deseasonalizing of the sales data, indicating that neural networks may be able to effectively detect and consider any seasonality during training and prediction.

1 INTRODUCTION

In a world today where competitive margins are becoming increasingly narrow and actions must be decisive yet informed, the ability to accurately make forecasts is of premier importance. This is certainly true in the forecasting of numerical data such as the health of a country’s economy or the movements of a stock market from day to day. Forecasting is even beneficial in domains such as environmental monitoring or sports performance, and, accordingly, much forecasting work has been done across a broad swath of exciting fields and disciplines.

A more traditional yet still thoroughly compelling application of forecasting is sales prediction, which is the focus of this work. As markets become more and more global and competition is ruthless, optimizing an organization’s operational efficiency is of premium importance. When companies must spread their resources broadly and consumers have a surfeit of choices, every advantage a company can squeeze out will make a difference. If a company can match the demand of a product with just the right amount of supply, then there will be no lost sales due to a lack of inventory as well as no costs from overstocking. Sales forecasting uses patterns gleaned from historical data to predict future sales, allowing for informed courses of action such as allocating or diverting existing inventory, or increasing or decreasing future production.

This work investigates the performance of a variety of predictive models for the application of departmental sales forecasting. As a baseline method, a regression decision tree is implemented.
Then, the more sophisticated models of Seasonal-Trend Decomposition using Loess and Autoregressive Integrated Moving-Average (STL + ARIMA) and feed-forward neural networks using time-lagged inputs were used.

2 RELATED WORK

Two currently popular approaches to nonlinear time series prediction problems are statistical approaches using ARIMA and machine learning approaches using Artificial Neural Networks (ANNs). ANNs have been shown to perform well in time series forecasting because of their ability to accurately represent non-linear data [1]. Both of these approaches have had success when applied to sales forecasting and stock predictions [2].

When applied to financial data, the ARIMA model is able to leverage the fact that financial time series data is generally related to past values [3]. Provided there are no sudden changes in value or behavior, an ARIMA model will also be very effective for financial time series forecasting [4]. In his 2010 paper, Adebiyi [4] applies the ARIMA model to accurately forecast the Nokia stock prices.

It is important to note that the linear assumptions of the ARIMA model have resulted in poor forecasting models in cases of stock price prediction when the dataset includes values coming into and coming out of an economic recession (changing properties).

In a separate paper, Adebiyi [2] implements an ANN model and an ARIMA model to predict Dell stock prices. In his model comparison, the ANN slightly outperforms the ARIMA model. Adebiyi attributes this partially to the fact that the ARIMA model assumes that the time series is generated from a linear process.

3 DATASET

The dataset used was provided by Walmart Inc., an American multinational retail corporation, for a 2014 data science competition (Kaggle).

The dataset contains historical weekly sales data from 45 Walmart department stores in different regions across the United States.
The training set has 421,570 samples. Each sample has the following features: departmental weekly sales, the associated department (81 departments, each listed as a number), the associated store (listed as a number), the store type, the date of the week’s start day, and a flag indicating whether the week contains a major holiday (Super Bowl, Labor Day, Thanksgiving, Christmas).

Also supplied is a corresponding set of features for each week-store combination, which includes temperature, fuel price, CPI, unemployment rate, and promotional markdown data.

There is no publicly available test set. Specifically, the ground-truth values for the test set are not available, so assessing each model against the official test set must be done by making test predictions and submitting to Kaggle’s online platform. Hold-out sets are generated from the provided training samples for local validation, but for some models (namely the neural
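
The hold-out validation mentioned above, like steps 4 and 6 of the project guidelines (partition the data, then establish a baseline accuracy), can be sketched in a few lines. The guidelines call for R, where meanf and snaive are among the benchmark commands they name. The sketch below is instead written in plain Python, purely as an illustration, and runs on a synthetic series rather than the Perth or Walmart data. The function names (chronological_split, mean_forecast, seasonal_naive_forecast, mae, rmse), the 80/20 split, and the seasonal period of 12 are all assumptions made for the example.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered list into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mean_forecast(train, horizon):
    """Benchmark: every future value is forecast as the training mean."""
    average = sum(train) / len(train)
    return [average] * horizon


def seasonal_naive_forecast(train, horizon, period):
    """Benchmark: repeat the last observed season across the horizon."""
    if len(train) < period:
        raise ValueError("need at least one full season of training data")
    last_season = train[-period:]
    return [last_season[h % period] for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Synthetic monthly-style series (an assumption for illustration only):
# a seasonal cycle of period 12 plus a slow upward trend.
period = 12
series = [
    20.0 + 5.0 * math.sin(2 * math.pi * t / period) + 0.01 * t
    for t in range(120)
]

train, test = chronological_split(series, train_fraction=0.8)

mean_pred = mean_forecast(train, len(test))
snaive_pred = seasonal_naive_forecast(train, len(test), period)

mae_mean = mae(test, mean_pred)
mae_snaive = mae(test, snaive_pred)
rmse_mean = rmse(test, mean_pred)
rmse_snaive = rmse(test, snaive_pred)

print(f"mean benchmark:           MAE={mae_mean:.3f}  RMSE={rmse_mean:.3f}")
print(f"seasonal naive benchmark: MAE={mae_snaive:.3f}  RMSE={rmse_snaive:.3f}")
```

The split is chronological rather than random because a forecast must be scored on observations that come after the training window. On this strongly seasonal toy series the seasonal naive benchmark beats the mean benchmark, and any ETS or ARIMA model would need to improve on that error to be worth using.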
