NFL Project Analysis


The goal is to select appropriate models and model specifications, and apply the respective methods to enhance data-driven decision making related to the business problem.




Format: RMarkdown (Word), using RStudio




Datasets:


The project starts with an initial data collection of four datasets:


NFL Arrests 2000-2017:


https://www.kaggle.com/patrickmurphy/nfl-arrests


NFL Trends Over Time:


https://www.kaggle.com/dasbootstrapping/nfl-trends-over-time


NFL Passing Statistics:


2009-2018 https://www.kaggle.com/omzqwonxei/nfl-passing-statistics-20092018


Detailed NFL Play-by-Play Data:


2009-2018 https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016




Requirements:


- The paper should be 8-10 pages in length, not including figures and tables.


- Start with a one-paragraph abstract, followed by an intro/background of the problem, methods, results, discussion/conclusion, acknowledgments, and references, in that order.


- List the resources and reference all sources you used to complete the project.


- The analysis requires predictive modeling and graphing (display historical facts and predict the next couple of years):


A. High School vs. College players: who performs better in scoring and yards


B. High School vs. College players: who is more likely to be arrested, and in what year


C. Percentage of completions per attempt


D. Average yards gained per attempt


E. Percentage of touchdown passes per attempt


F. Percentage of interceptions per attempt


G. Average Production vs. Experience for NFL Players (an example is on Kaggle)


H. Does throwing more passes improve a quarterback's game-winning stats
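
Items C through F are all simple ratios with pass attempts in the denominator. The following is a minimal sketch of how they could be computed in R. The data frame `passing`, its column names, and every number in it are hypothetical and are not taken from the Kaggle datasets; real column names would need to be checked against the passing statistics file.

```r
# Hypothetical season totals for three quarterbacks (illustrative numbers only,
# not taken from the Kaggle datasets; column names are assumptions).
passing <- data.frame(
  player        = c("QB_A", "QB_B", "QB_C"),
  attempts      = c(500, 400, 250),
  completions   = c(325, 240, 150),
  yards         = c(4000, 2800, 1500),
  touchdowns    = c(30, 16, 5),
  interceptions = c(10, 12, 10)
)

# Each metric divides a season total by the number of pass attempts.
passing$completion_pct    <- 100 * passing$completions   / passing$attempts
passing$yards_per_attempt <-       passing$yards         / passing$attempts
passing$touchdown_pct     <- 100 * passing$touchdowns    / passing$attempts
passing$interception_pct  <- 100 * passing$interceptions / passing$attempts

print(passing[, c("player", "completion_pct", "yards_per_attempt",
                  "touchdown_pct", "interception_pct")])
```

Keeping the four rates as columns of one data frame makes it straightforward to plot them by season or to use them later as predictors.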





Answered Same Day, Sep 22, 2021

Pritam answered on Sep 26 2021
Untitled
26 September 2019
Required libraries:
library(rpart.plot)
library(caTools)
library(ggpubr)
library(scatterplot3d)
library(reshape2)
library(lubridate)
library(scales)
library(plotly)
library(shiny)
library(tidyverse)
library(feather)
library(readr)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(gganimate)
library(gifski)
library(png)
library(transformr)
library(treemapify)
Abstract:
This study analyzes several datasets on NFL players covering 2000 to 2017. Its aim is to give useful information to team management or franchises that want to evaluate a player on the basis of fitness or criminal record. The cleaner a player's record, the more likely management is to consider him a potential signing and, specifically, a valuable and dedicated member of the team. Team management is not the only audience: betting and football pools accompany every season, so clear and informative research on players can be useful in many areas. The data cover passing statistics, criminal records, and playing strategies, and considerable effort was required to arrive at models that could explain the scoring potential of a team or a particular player. Visualization and predictive modeling are introduced in sequence to give a better grip on the data and to make the results easier to interpret.
Introduction:
It's football season: people are gearing up for weekly games, and some are participating in football pools. Data can help with football pools; in fact, it can show fans statistically how well or poorly their teams perform because of injuries or environmental elements. Do fans base their decisions purely on the previous season's stats, including core efficiency rates, turnover rates, penalty rates, and who was selected in the current NFL draft? NFL and MLB teams employ data scientists to track statistical information daily, convert it into insights, and send it upstream to stakeholders. The goal of our project is to understand what they analyze, and not just to replicate the storytelling but to improve on it by covering elements they may not have included or thought of analyzing.
Methods:
The methods applied here are mainly visualization and regression for predictive modeling. Regression analysis is a statistical method that estimates the effect of one or more predictor variables on a particular variable, called the response variable. The predictor variables that explain the variance of the response variable are chosen through a selection procedure. Models are evaluated with measures such as R-squared and other goodness-of-fit statistics. Before starting the analysis, one should check the assumptions of the regression model: violating them can have serious implications for inference, and the results may then be erroneous and cannot be relied upon.
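As a rough illustration of the regression workflow described above (fit a model, read off R-squared, check an assumption), here is a minimal sketch on simulated data. The variable names `attempts` and `yards`, the coefficients, and the noise level are all assumptions made for the example; none of the numbers come from the NFL datasets.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated (not real) data: a response that depends linearly on one predictor.
n <- 100
sim <- data.frame(attempts = runif(n, 200, 600))
sim$yards <- 100 + 7 * sim$attempts + rnorm(n, sd = 150)

# Fit an ordinary least squares regression of the response on the predictor.
fit <- lm(yards ~ attempts, data = sim)

# R-squared: the share of the response variance explained by the model.
r2 <- summary(fit)$r.squared
print(coef(fit))
print(r2)

# One basic assumption check: are the residuals plausibly normal?
# A small p-value from the Shapiro-Wilk test would cast doubt on normality.
res <- residuals(fit)
normality <- shapiro.test(res)
print(normality$p.value)
```

Calling `plot(fit)` on the fitted model additionally draws the standard diagnostic plots (residuals vs. fitted values, normal Q-Q, and so on), which help judge linearity and constant variance by eye.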
NFL Arrest data analysis:
First, we analyze the arrest data, and the best way to start is to inspect the data visually. The 25 teams with the most arrests are arranged in descending order of arrest count.
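# Read the arrest incidents and parse the date column (month/day/year).
d1 = read.csv("ArrestIncidents.csv", header = TRUE)
d1$DATE = as.Date(d1$DATE, "%m/%d/%Y")
# Count arrests per team and keep the 25 teams with the most arrests.
f1 = as.data.frame(table(d1$TEAM))
f2 = head(arrange(f1, desc(Freq)), n = 25)
head(f2)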
## Var1 Freq
## 1 MIN 49
## 2 DEN 47
## 3 CIN 44
## 4 TB 36
## 5 TEN 36
## 6 IND 35
# Give the columns descriptive names, then plot teams in descending order of arrests.
f3 = rename(f2, Team = Var1, Arrest_count = Freq)
p1 = ggplot(f3, aes(x = reorder(factor(Team), -Arrest_count), y = Arrest_count)) +
  geom_bar(stat = "identity", fill = "#FF6678") +
  ggtitle("Frequency of Arrest by Team") +
  xlab("Teams")
p1
The graph shows that Minnesota ranks first among NFL teams in arrest incidents, with 49 arrests. The next two teams, Denver (47) and Cincinnati (44), are close behind. New York, on the other hand, appears near the end of the top 25.
Arrests based on position:
# Count arrests for each playing position.
f4 = group_by(d1, POSITION)
f5 = summarise(f4, count = n())
pos_plot = ggplot(f5, aes(x = POSITION, y = count)) +
  geom_bar(stat = "identity", fill = "#CC99CC") +
  ylab("Count of Arrest") +
  ggtitle("Frequency of Arrest by Position")
pos_plot
The graph shows that wide receivers are arrested the most. This result may be biased, however, because teams carry many wide receivers, so the position accounts for a large share of all players. One should therefore check the percentage frequency in the position-based analysis.
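One way to run that percentage check is sketched below on toy data. The `arrests` records and the `roster` table of roster spots per position are hypothetical and exist only to make the example self-contained; with the real data, the counts would come from `d1$POSITION`, and roster sizes would have to be sourced separately.

```r
# Toy arrest records (hypothetical; real counts would come from d1$POSITION).
arrests <- data.frame(
  POSITION = c("WR", "WR", "WR", "WR", "LB", "LB", "LB", "CB", "CB", "QB")
)

# Share of all arrests attributed to each position, in percent.
pos <- as.data.frame(table(arrests$POSITION), stringsAsFactors = FALSE)
names(pos) <- c("POSITION", "count")
pos$pct_of_arrests <- 100 * pos$count / sum(pos$count)

# Hypothetical number of roster spots per position (an assumption for
# illustration, not taken from any dataset in this project).
roster <- data.frame(
  POSITION     = c("WR", "LB", "CB", "QB"),
  roster_spots = c(6, 7, 6, 3)
)

# Arrests per roster spot corrects for positions that simply have more players.
pos <- merge(pos, roster, by = "POSITION")
pos$arrests_per_spot <- pos$count / pos$roster_spots

print(pos[order(-pos$arrests_per_spot), ])
```

The percentage column only rescales the raw counts, so the ranking of positions is unchanged; the per-spot rate is what actually addresses the headcount bias.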
The year when the most arrests took place:
Since the data contain a date attribute, a natural question is how arrests vary over time. The most important feature to look for in such a time series is the trend. Because the arrest data store the date as a single field, we need to extract the year and month for the analysis.
# DATE is already a Date object, so the year can be extracted directly.
d1$YEAR = format(d1$DATE, "%Y")
# Count arrests per year.
f8 = group_by(d1, YEAR)
f9 = summarise(f8, count = n())
# group = 1 joins the yearly points into a single line.
year_plot = ggplot(f9, mapping = aes(x = YEAR, y = count, group = 1)) +
  geom_line(color = "#993399", size = 2) +
  geom_point(color = "#FFCC00", size = 3) +
  ggtitle("Arrest by Year")
year_plot
There were almost 70 arrests in 2006, the highest count in the range of the analysis...
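
Since the assignment asks for predictions for the next couple of years, a yearly series like this could be extended with a simple linear trend. The sketch below is only an illustration: the counts in `yearly` are made-up numbers, not the values from the arrest data, and a straight-line trend is an assumption that may not suit the real series. With the real data, the `YEAR` column would first need converting from character to numeric.

```r
# Illustrative yearly counts (made-up numbers, not the values from the arrest data).
yearly <- data.frame(
  YEAR  = 2008:2017,
  count = c(62, 55, 58, 50, 47, 52, 44, 40, 38, 35)
)

# Linear trend: arrest count as a function of calendar year.
trend <- lm(count ~ YEAR, data = yearly)

# Predict the next two years, with 95% prediction intervals.
future   <- data.frame(YEAR = 2018:2019)
forecast <- cbind(future,
                  predict(trend, newdata = future, interval = "prediction"))
print(forecast)
```

The `lwr` and `upr` columns give a sense of how uncertain such a short-series forecast is, which is worth reporting alongside the point prediction.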