My name is Rishi Chandran. I’m currently finishing up my 4th year at IUPUI, where I’ll graduate with a Master’s in Applied Data Science & a Bachelor’s in Sports Management. Over the past 4 years, I’ve been lucky enough to take advantage of numerous opportunities in Indianapolis put forth by Dr. Pierce & Dr. Sherman, with the goal of gaining experience in both the sports industry and the field of data science so that I could put those skillsets together & apply them to basketball analytics. This past fall, Dr. Pierce approached our group about competing in Baylor University’s National Collegiate Sports Analytics Competition (NCSAC), where students analyze either a business or basketball dataset and present their insights. I chose the basketball analytics track, where students were given a 19,000-row dataset representing 90 Big 12 games from the 2018-19 season.
For me, this was the perfect opportunity to compile all of my experiences over the past 4 years and apply what I had learned in a basketball context. Now that I’m reflecting on my experience, I was not only able to do that, but I also learned so much more about different machine learning methods, and how to apply them to basketball in an actionable manner. Check out my presentation, and feel free to reach out with any questions or thoughts!
Figure 1. Identifying a Problem Space
In the fall of 2021, I interned with the Indiana Pacers Basketball Operations department, and one thing I noticed was how many hours of film scouts watch to create scouting reports, whether to prepare for an upcoming game or to evaluate a player. My goal with this project was to use machine learning to learn from the 90 games’ worth of data, identify an opponent’s tendencies, strengths, and weaknesses, and use that information to game plan against that opponent.
Figure 2. Using Machine Learning to Find Points of Emphasis
Here I used Decision Trees to identify team trends in terms of wins & losses. The inspiration for this approach came from watching the 2017-18 Michigan Wolverines, who were famously 28-0 in games where Duncan Robinson scored at least 6 points. In this example, we’re looking at the 2018-19 TCU Horned Frogs, who went 7-11 in Big 12 play. On the left, we can see that in games where Desmond Bane was held to fewer than 21 points, TCU went 0-8, and they were 4-0 in games where he scored 21+ points. Additionally, when Alex Robinson shot below 55% from 3, TCU went 0-6. These data-driven points of emphasis are intended to identify the ways that other teams have beaten the opponent, so that coaching staffs have a starting point for defensive game planning.
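To make the idea concrete, here is a minimal sketch of the single-split search that a decision tree repeats at every node. It is not the code from the project: it is a one-level "decision stump" written with the standard library only, and the point totals and results are made-up illustration data, not values from the competition dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, outcomes):
    """Find the threshold on one stat that best separates wins from losses.

    Returns (threshold, weighted_gini); games with value < threshold go left.
    """
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [o for v, o in zip(values, outcomes) if v < threshold]
        right = [o for v, o in zip(values, outcomes) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical per-game data: one player's points and the team's result.
points = [12, 25, 9, 22, 15, 30, 18, 21]
results = ["L", "W", "L", "W", "L", "W", "L", "W"]

threshold, impurity = best_split(points, results)
print(f"Split at {threshold} points, weighted Gini = {impurity:.3f}")
```

A full tree would run this search over every candidate stat (each player's points, shooting percentages, and so on), keep the best split, and then recurse on each side. A split with a weighted Gini of 0 corresponds to a rule like "the team never won when this player scored below the threshold."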
Figure 3. Using K-Means Cluster Analysis to Establish Play Styles
Because the dataset didn’t establish positions for each player, I used K-means clustering to establish offensive and defensive roles for each player. Average Shot Distance & the Standard Deviation of Shot Distance were used to determine where on the court a player plays. The lower the Average Shot Distance, the closer to the basket a player plays, and the higher the standard deviation, the more versatile a player is. By this, I mean that if a player has a high Standard Deviation of Shot Distance, they take/defend shots both from the perimeter & in the paint. For defensive clusters, I also included Defensive Impact, a metric I created that rewards defenders more for forcing good shooters to miss, and penalizes them more for allowing poor shooters to score. Conversely, defenders are penalized less if they allow a good shooter to score, and rewarded less for stopping poor shooters.
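The clustering step can be sketched in a few lines of standard-library Python. Everything here is an assumption made for illustration: the player labels and shot distances are invented, the farthest-point seeding is just a simple deterministic choice, and `defensive_impact` is only one possible formula that behaves the way the metric is described above (the shooter's make rate minus the outcome), not necessarily the one used in the project.

```python
import math
import statistics

def shot_profile(distances):
    """Return (average shot distance, standard deviation of shot distance)."""
    return (statistics.mean(distances), statistics.pstdev(distances))

def defensive_impact(shooter_pct, made):
    """Hypothetical per-shot score: the shooter's make rate minus the outcome."""
    return shooter_pct - (1.0 if made else 0.0)

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from the existing ones.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical shot distances in feet for five made-up players.
shots = {
    "Big A": [2, 3, 4, 3],
    "Big B": [1, 5, 3, 3],
    "Wing A": [22, 24, 23, 23],
    "Wing B": [21, 25, 23, 23],
    "Combo": [2, 24, 3, 23],
}
features = [shot_profile(d) for d in shots.values()]
labels, centroids = kmeans(features, k=3)
for name, label in zip(shots, labels):
    print(name, "-> cluster", label)
```

In this toy data the two bigs, the two wings, and the combo player (low-to-mid average distance but a very high standard deviation) land in three separate clusters. With real data it is usually worth standardizing the features first so that no single column dominates the distance calculation.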
Figure 4. Putting it all Together
I then compiled this information, along with charts summarizing TCU’s offensive tendencies and defensive weaknesses, to create a scouting report. The leftmost chart shows the different methods that TCU uses to score, in terms of frequency. From this, we can see that they most often go to the 3-point shot, the pick & roll, and scoring in transition. In the middle, the radial chart shows where TCU’s opponents were most successful in terms of Points Per Possession (PPP). Based on the scoring methods that TCU’s defense was most vulnerable against (passing to the roll man on a pick, cuts to the basket, and transition), I identified the Baylor lineups that had the highest PPP for those play types on the bottom of the report. The GIF below is an example of a TCU opponent scoring off of a pass to the roll man on a screen:
This instance shows Kamau Stokes anticipating TCU’s hard hedge, and by getting around it quickly enough, he’s able to find Dean Wade on the roll. This provides further context as to how teams are taking advantage of the roll man against TCU, and provides a point of emphasis to focus on in practice: anticipating the hard hedge. While purely analytical scouting reports lack the context of film breakdown, they can be built & sent out to coaching staffs further in advance, which gives them more time to prepare for defensive strategies like TCU’s hard hedge.
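The PPP ranking behind the radial chart boils down to a simple aggregation. The sketch below assumes each possession has already been tagged with a play type and a point total; the records are invented for illustration and the play-type labels are just plausible names, not the dataset's actual column values.

```python
from collections import defaultdict

def points_per_possession(possessions):
    """Aggregate (play_type, points) records into PPP per play type."""
    totals = defaultdict(lambda: [0, 0])  # play_type -> [points, possessions]
    for play_type, points in possessions:
        totals[play_type][0] += points
        totals[play_type][1] += 1
    return {play: pts / count for play, (pts, count) in totals.items()}

# Hypothetical possessions allowed by a defense: (play type, points scored).
allowed = [
    ("P&R Roll Man", 2), ("P&R Roll Man", 2), ("P&R Roll Man", 0),
    ("Cut", 2), ("Cut", 0),
    ("Spot Up", 3), ("Spot Up", 0), ("Spot Up", 0), ("Spot Up", 0),
    ("Transition", 2), ("Transition", 3), ("Transition", 0), ("Transition", 0),
]

ppp = points_per_possession(allowed)
# The three play types this defense gives up the most points per possession on.
weakest = sorted(ppp, key=ppp.get, reverse=True)[:3]
print(weakest)
```

The same function can be pointed at a team's own offensive possessions, grouped by lineup, to find which lineups score best on the play types the opponent struggles to defend.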
Figure 5. Desmond Bane
Based on the Decision Trees, I identified stopping Desmond Bane from scoring as a key to victory. So how are we going to do that? The radial chart on the left shows Desmond Bane’s shot selection based on frequency, showing that the large majority of his shots are jumpers. However, when we look at the chart on the right (Bane’s PPP based on shot type), we can see that his most efficient shots are dunks, runners/floaters, and tip-ins, which means that we shouldn’t ignore him as a threat in the lane. The middle bar chart summarizes his performance against each of the identified defensive clusters, showing that he struggled the most against Elite Perimeter Defenders. The Baylor coaching staff can then make a note that the two Elite Perimeter Defenders on their roster are Devonte Bandoo & Tristan Clark, and have them guard Bane.
Figure 6. Projecting the Impact of the Gameplan
Next, in order to quantify the impact of the gameplan suggestions, I used a linear model to project Bane’s PPP based on the player guarding him. Then, by breaking down how he gets to his 16.1 PPG, and adjusting for Bane’s PPP against Devonte Bandoo & Tristan Clark, I estimated an 8.8% decrease in his projected points. Additionally, I took the average PPP for the suggested 8-man Baylor rotation for each play type that TCU has struggled to defend, and used those values along with Bane’s projected points to predict TCU’s win probability in a game against Baylor using logistic regression.
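The arithmetic of the adjustment step can be sketched as follows. This does not reproduce the linear model itself; it only assumes that the model has already produced a matchup-specific PPP for each shot type. The shot diet and PPP values are invented for illustration and are not Bane's actual numbers.

```python
def projected_points(attempts_per_game, ppp_by_shot_type):
    """Expected points per game: sum of attempts x PPP over shot types."""
    return sum(
        attempts_per_game[shot] * ppp_by_shot_type[shot]
        for shot in attempts_per_game
    )

# Hypothetical shot diet and efficiencies (NOT values from the dataset).
attempts = {"Jumper": 8.0, "Layup": 3.0, "Runner": 1.0}
season_ppp = {"Jumper": 1.0, "Layup": 1.2, "Runner": 1.4}
# Hypothetical PPP when guarded by the suggested defenders.
matchup_ppp = {"Jumper": 0.9, "Layup": 1.1, "Runner": 1.4}

baseline = projected_points(attempts, season_ppp)
adjusted = projected_points(attempts, matchup_ppp)
pct_change = (adjusted - baseline) / baseline * 100
print(f"{baseline:.1f} -> {adjusted:.1f} points ({pct_change:+.1f}%)")
```

Holding the shot diet fixed and changing only the efficiency is a simplifying assumption; a defender could also change which shots a player takes.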
Figure 7. Probability of Beating TCU
Based on season averages for Desmond Bane Points, Opponent PPP in P&R Roll Man plays, Opponent PPP in Cut to Basket plays, and Opponent PPP in transition, the model gives TCU a 46.8% chance to beat Baylor. When using the projected values based on our gameplan, however, the model gives TCU a win probability of only 15.4%.
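For readers unfamiliar with logistic regression, the prediction step looks roughly like this. The feature names mirror the four inputs listed above, but the coefficients, intercept, and input values are all made up for illustration; they are not the fitted model from the project and will not reproduce its probabilities.

```python
import math

def win_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the inputs."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, NOT the fitted values from the project.
weights = {
    "bane_points": 0.10,          # more Bane points -> better for TCU
    "opp_ppp_roll_man": -1.5,     # opponent efficiency hurts TCU
    "opp_ppp_cut": -1.0,
    "opp_ppp_transition": -1.2,
}
bias = 2.5

# Hypothetical inputs: season averages vs. values projected under a gameplan.
season = {
    "bane_points": 16.0,
    "opp_ppp_roll_man": 1.1,
    "opp_ppp_cut": 1.2,
    "opp_ppp_transition": 1.1,
}
gameplan = {
    "bane_points": 14.5,
    "opp_ppp_roll_man": 1.3,
    "opp_ppp_cut": 1.4,
    "opp_ppp_transition": 1.2,
}

p_season = win_probability(season, weights, bias)
p_gameplan = win_probability(gameplan, weights, bias)
print(f"Season averages: {p_season:.1%}, with gameplan: {p_gameplan:.1%}")
```

In practice the weights and intercept would be fit on the per-game rows of the dataset, with each game's win or loss as the target.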
In Summary
Using the 90-game dataset provided by Second Spectrum, I was able to use Machine Learning & Descriptive Statistics to create actionable scouting reports focused on finding strategies that will give a team a better chance of winning. Through this experience, I not only got to learn about machine learning and game preparation, but I also got to lay the groundwork for future research & ways to improve on this project. By creating a shot quality model based on an opponent, or digging deeper into the tendencies of each lineup, basketball analysts can uncover even more actionable insights. Thank you for taking the time to read about my project, and feel free to reach out!
**Rishi finished in the Top 4 out of 73 participants at the event