About Us

Data Collection

The dataset was collected from stats.nba.com, the official NBA statistics website. Our program retrieved and logged every game from the 2016-2020 seasons through web requests. For each game, both teams were represented by the statistics of their eight players with the most minutes played.

We collected the following fourteen statistics from each player:

  • Points scored
  • Free throws made
  • Free throws attempted
  • Field goals made
  • Field goals attempted
  • Three-pointers made
  • Three-pointers attempted
  • Assists
  • Turnovers
  • Steals
  • Blocks
  • Offensive rebounds
  • Defensive rebounds
  • Personal fouls

We scaled each player's statistics to their predicted minutes played, so that the numbers represent the production expected of that player given the quality of their teammates. This is not necessary when analyzing real rosters, but it is necessary when predicting games between fictional rosters, because the playing time available to the players is limited (e.g., a team of eight LeBron Jameses would not be expected to average more than 240 points, simply because of playing-time restrictions).
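The scaling step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a team's 240 regulation minutes (five players on the floor for 48 minutes) are shared in proportion to each player's usual minutes, and that a player's production scales linearly with playing time. The function names and the sample numbers are hypothetical and are not taken from our actual program.

```python
TEAM_MINUTES = 5 * 48  # five players on the floor for a 48-minute regulation game


def predict_minutes(avg_minutes):
    """Share the team's 240 minutes in proportion to usual minutes (assumed rule)."""
    total = sum(avg_minutes)
    return [TEAM_MINUTES * m / total for m in avg_minutes]


def scale_stats(per_game_stats, avg_minutes, predicted_minutes):
    """Rescale one player's per-game stats from usual minutes to predicted minutes."""
    factor = predicted_minutes / avg_minutes
    return {name: value * factor for name, value in per_game_stats.items()}


# Hypothetical roster: eight copies of one star who usually plays 36 minutes.
star_minutes = 36.0
star_stats = {"points": 27.0, "assists": 7.5}

minutes = predict_minutes([star_minutes] * 8)
scaled = [scale_stats(star_stats, star_minutes, m) for m in minutes]
team_points = sum(player["points"] for player in scaled)
```

Because the eight players must share 240 minutes, each one is predicted to play less than usual, and the team total stays well below eight times the star's normal output. This sketch does not cap any single player at 48 minutes, which a fuller version would need to handle.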

The individual statistics of each team's eight players were used to generate thirteen team statistics.

The team statistics are as follows:

  • Total points scored
  • True shooting percentage
  • Total free throws attempted
  • Free throw percentage
  • Total three-pointers attempted
  • Three-point percentage
  • Total assists
  • Total turnovers
  • Total offensive rebounds
  • Total defensive rebounds
  • Total steals
  • Total blocks
  • Total personal fouls

Thus, a total of 26 statistics are collected for each game: 13 for each team.
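As a rough illustration of how the player statistics could be combined into the thirteen team statistics, here is a short sketch. The dictionary keys and the sample player are hypothetical. True shooting percentage uses the commonly cited formula PTS / (2 * (FGA + 0.44 * FTA)); we are assuming that definition here rather than quoting our program.

```python
def team_stats(players):
    """Aggregate per-player stat dicts into thirteen team-level features."""

    def total(key):
        return sum(p[key] for p in players)

    def ratio(numerator, denominator):
        # Guard against division by zero when a team has no attempts.
        return numerator / denominator if denominator else 0.0

    pts, fga, fta = total("pts"), total("fga"), total("fta")
    return {
        "points": pts,
        "true_shooting_pct": ratio(pts, 2 * (fga + 0.44 * fta)),
        "free_throws_attempted": fta,
        "free_throw_pct": ratio(total("ftm"), fta),
        "threes_attempted": total("tpa"),
        "three_point_pct": ratio(total("tpm"), total("tpa")),
        "assists": total("ast"),
        "turnovers": total("tov"),
        "offensive_rebounds": total("oreb"),
        "defensive_rebounds": total("dreb"),
        "steals": total("stl"),
        "blocks": total("blk"),
        "personal_fouls": total("pf"),
    }


# Hypothetical player line, repeated eight times to form a roster.
player = {"pts": 20, "ftm": 4, "fta": 5, "fgm": 7, "fga": 15, "tpm": 2, "tpa": 5,
          "ast": 4, "tov": 2, "stl": 1, "blk": 1, "oreb": 1, "dreb": 5, "pf": 2}
features = team_stats([player] * 8)
```

Running the same function on both rosters and joining the two results gives the 26 values for one game.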

Creating the Model

Using the scikit-learn (sklearn) Python package, we created a logistic regression model that predicts the outcome of an NBA game. Of the five seasons of data we had collected, a random 20% was held out for validation and the remaining 80% was used to train the model. After tuning the model's hyperparameters, we reached an accuracy of about 65%, which we consider reasonable given the randomness of NBA games and the limited scope of information available to the model. We hope to continue improving the quality and accuracy of our model in the future.
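The workflow described above might look roughly like the following sketch. The data here is randomly generated stand-in data, not our NBA dataset, and the feature scaling step, the grid of C values, and the 5-fold cross-validation are assumptions made for the example rather than the settings we actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 2,000 "games" x 26 features (13 per team).
X = rng.normal(size=(2000, 26))
# Made-up label: the first team wins when its first feature beats the
# second team's first feature, with added noise to mimic randomness.
y = (X[:, 0] - X[:, 13] + rng.normal(scale=1.5, size=2000) > 0).astype(int)

# Hold out a random 20% of games for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize the features, then fit a logistic regression.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune the regularization strength C with cross-validation (assumed grid).
search = GridSearchCV(
    pipeline, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5
)
search.fit(X_train, y_train)

val_accuracy = search.score(X_val, y_val)
first_team_win_prob = search.predict_proba(X_val[:1])[0, 1]
```

The accuracy on this synthetic data says nothing about real games; the sketch only shows the split, tune, and validate steps.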

Creating the Web Application

We used the Flask Python package to develop our web application and the WTForms Python package to create the form for entering the players on each team. HTML, CSS, and JavaScript were used to build the rest of the website and its functionality.

Jiebin Liang

A first-year Engineering student at Rutgers University

Anshul Mittal

A first-year Computer Science major at Georgia Institute of Technology

This began as a project for our Engineering Research class at Manalapan High School, under the Science and Engineering Magnet Program. After we graduated, we decided to continue expanding upon the idea and create a fully functional website.

The purpose of our application is to allow NBA fans to view the predicted result of NBA games, using both real and fictional rosters.

We envision most of our users using our application for one of two reasons:

  1. To unite players from around the NBA to see if their hypothetical lineup would win, either against a real NBA team or another made-up squad
  2. To predict the winner of the NBA Finals using each team's playoff roster