Visualizations

Quartile Boxes for Charting in Pitch Movement Visualizations

As I was looking at my PitchLogic movement visualizations that I’m using in coaching youth pitchers, I realized that I could use the quartile values to display how consistent their movement has been. The code is not that complex, so I wanted to make sure to share it. I imagine it will have considerably more uses than just my hobbyist one.

Defining the function

As my Python code for pitching assessment got more complex, I decided to start breaking out pieces as functions. I had been using cols to create multiple charts of the various pitch types, but a separate chart for each pitch type is more meaningful.

For clarity, I made sure to add a good docstring…

def movementCharting(player_df, type):
    """Create a chart of horizontal and vertical movement with a box around the
    'quantile area' showing where pitches go. (A separate function creates a
    table of all pitch types and movement.)

    Args:
        player_df (DataFrame): PitchLogic DataFrame with 'Horizontal Movement (in)' and 'Vertical Movement (in)'
        type (str): pitch type to chart

    Returns:
        filename (str): filename of the JPG image in which the chart is stored
    """

Building the Chart

It only takes a few lines to build the chart using Seaborn. We’ve gotten the DataFrame that contains only pitches by one player and we trim it to only pitches of the type we’re charting.

    movement_df = player_df[player_df['Type'] == type]
    movefig = sns.relplot(x='Horizontal Movement (in)', y='Vertical Movement (in)', data=movement_df, kind='scatter')

Then we compute the quartile values (the 25th and 75th percentiles) for each axis and draw the box on the plot. I’ve commented out the lines for each mean value, since they added visual complexity without making the chart more meaningful. YMMV.

    # Calculate the means and the 25th/75th percentiles (the interquartile range) for each axis
    mean_horiz = movement_df['Horizontal Movement (in)'].mean()
    mean_vert = movement_df['Vertical Movement (in)'].mean()
    ci_horiz = movement_df['Horizontal Movement (in)'].quantile([0.25, 0.75])
    ci_vert = movement_df['Vertical Movement (in)'].quantile([0.25, 0.75])

    for ax in movefig.axes.flat:
        # Mean lines, left commented out to keep the chart uncluttered
        # ax.axhline(mean_vert, color='red', linestyle='--')
        # ax.axvline(mean_horiz, color='blue', linestyle='--')
        # Bottom and top edges of the box
        ax.hlines(ci_vert[0.25], ci_horiz[0.25], ci_horiz[0.75], color='blue')
        ax.hlines(ci_vert[0.75], ci_horiz[0.25], ci_horiz[0.75], color='blue')
        # Left and right edges of the box
        ax.vlines(ci_horiz[0.25], ci_vert[0.25], ci_vert[0.75], color='red')
        ax.vlines(ci_horiz[0.75], ci_vert[0.25], ci_vert[0.75], color='red')
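
To make concrete what the box edges represent, here is a minimal pure-Python sketch of the same calculation. It assumes linear interpolation between sorted data points, which is the pandas default for quantile(). The function names and the sample readings are made up for illustration and are not part of my PitchLogic code.

```python
def quantile_linear(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q              # fractional index into the sorted data
    lower = int(pos)                          # index at or just below pos
    upper = min(lower + 1, len(ordered) - 1)  # neighbor above, clamped to the last index
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def quartile_box(horiz, vert):
    """Return the (left, right, bottom, top) edges of the interquartile box."""
    return (
        quantile_linear(horiz, 0.25),
        quantile_linear(horiz, 0.75),
        quantile_linear(vert, 0.25),
        quantile_linear(vert, 0.75),
    )


# Made-up movement readings in inches, for illustration only
horiz = [4.0, 6.0, 8.0, 10.0]
vert = [10.0, 12.0, 14.0, 16.0, 18.0]
left, right, bottom, top = quartile_box(horiz, vert)
print(left, right, bottom, top)  # 5.5 8.5 12.0 16.0
```

Half of the horizontal readings fall between the left and right edges, and half of the vertical readings fall between the bottom and top edges. A tighter box means more consistent movement.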

A couple of lines to put the title in an appropriate spot…

    movefig.fig.subplots_adjust(top=0.85)  # Adjust the top to make space for the title
    movefig.fig.suptitle(type + ' Movement Profile for ' + playerDisplay, y=0.90)  # Move the title upward

Returning from our function

As noted in the docstring, we’re returning the filename as the value from the function. That gets dropped into an array so that it can be processed as described in “Automate PDF Creation from Data Visualizations with Python”.

    filename = playerfigStorage + '/' + type + ' Movement Profile.jpg'
    movefig.savefig(filename)
    return filename

Conclusion

Visualizations aren’t hard to create using Python and there are many ways to make them more meaningful without excessive coding. As I work with my pitchers and they learn more about using this inter-quartile range box, they might pick up some technical knowledge and understanding of how to use visualization in their own lives.


Visualizing Baseball Pitching: Loading and Wrangling PitchLogic Data using Python

In my hobbies, I’ve always used my technical skills to support the endeavor. Web pages for the Boy Scout Troop, an online database for the school painting & landscaping projects, spreadsheets to manage the baseball rosters, radar to measure throwing speed, and email to keep everyone in the loop. So, it will come as no surprise to anyone I’ve encountered in those that I’m taking data visualization into my baseball coaching. This also gives me a chance to break down the process here and provide an opportunity for others to learn from and extend the code. This is part of a series of posts, with snippets delivered in manageable chunks.

Learning on DataCamp

As I was looking around for ways to learn AI and to learn Python as a way to access it, I found DataCamp. It’s a great site for learning Data and AI. There are a variety of career tracks (Data Analyst, Data Engineer, and Data Science, among others) and you can sample without paying. Each course in a track is also broken down into individual lessons, which are just a few minutes long. This allows you to learn in digestible chunks. It also will re-cap your progress when you reopen a course that you’ve already started.

You also get 3 workbooks on DataLab, where you can practice with data from files, Oracle, and SQL, among others. You can use Python or R to manipulate it or ask the AI to do something (or start something for you). I was initially only using it for the exercises, then started playing with my baseball pitching files.

It’s usually $330 a year, but sometimes can be secured at a discount for one person or for a team (mine was half off). Well worth your money!

Basics

My PitchLogic baseball records a lot of information about every pitch, examining the pitcher’s delivery, the release, and the movement of the ball as it travels to the plate. The ball is identical to a Major League baseball in weight and covering, but instead of cork in the center, there are electronics. This means that we never hit the ball, just throw it.

You can export the data from the app, getting a CSV file via email with all the pitch data for the selected date range. I was always opening this in Excel and manipulating it to try to see patterns and to learn what “good” numbers were for age groups and individual pitchers. I had all kinds of formulas for highlighting cells that looked like “good” numbers. With the data being in Excel, I could have created some charts to better understand the data, but it was time-consuming and I wouldn’t know where I’d find value until I put in the work.

I started using ChatGPT to create some charts based on the data in the same way it was done in the app and in session reports on the web. I was including data from all sessions, so it was a little different. I also looked at data from a few seasons ago, which had long left my app but was stored in spreadsheets.

ChatGPT lets you look at the Python it creates to do what you want it to do. So, I started modifying that and reusing it for different players. Somehow, I found DataCamp and then started using DataLab to examine my data in more detail and practice the Python that I was learning.

When I imported the data, I realized that I could use my newly acquired skills and wrangle the data a little. When I was doing all my analysis in Excel, I had some formulas to create new column values, but I did a lot of copying and pasting. By using Python, I can easily apply those to any spreadsheet I drop in.

First, I just wanted to create a Player Name field so I didn’t have to spend time checking both first and last name. Then I wanted to parse my Memo column to separate out location and intended pitch type.

import pandas as pd

# Load the CSV file into a DataFrame
pitchLogic = pd.read_csv('PitchLogic.csv',parse_dates=["Date"])

# Combine names to Player Name
pitchLogic['Player Name'] = pitchLogic['First Name'] + ' ' + pitchLogic['Last Name']
# Split the memo field between location and intended pitch type
pitchLogic['Location'] = pitchLogic['Memo'].str[:1]
pitchLogic['Intended Type'] = pitchLogic['Memo'].str[2:4]
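
To show what those two slices pull out, here is a small plain-Python sketch. It assumes a memo shaped like 'K CU': one location character, a space, then a two-letter pitch type. That layout is inferred from the slicing above, and parse_memo is a hypothetical helper, not something in the PitchLogic export.

```python
def parse_memo(memo):
    """Split a memo such as 'K CU' into (location, intended_type).

    Assumed layout: character 0 is the location, characters 2-3 are the
    intended pitch type. Missing or non-string memos give empty strings.
    """
    if not isinstance(memo, str):
        return ("", "")
    location = memo[:1]        # same slice as .str[:1]
    intended_type = memo[2:4]  # same slice as .str[2:4]
    return (location, intended_type)


print(parse_memo("K CU"))  # ('K', 'CU')
print(parse_memo("B"))     # ('B', '')
```

A memo that holds only a location simply yields an empty intended type, which matches how Python slicing behaves on short strings.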

The Arm Slot value is the angle on a clock face for the pitcher’s arm as he’s throwing. The Spin Direction is the same for the direction in which the ball is spinning. An arm slot of 1:30 would be throwing with your arm at a 45 degree angle to your torso. A spin direction of 12:00 would be a ball spinning forward perpendicular to the ground. I want to compute the difference between these two as a way of evaluating the pitch. For a fastball, you usually want them as closely aligned as possible. For accuracy, you want them to be the same every time you throw a fastball, so that it’s predictable.

Computing the difference between the Arm Slot and Spin Direction is only easy when you’re doing it manually. If you do it in Excel or HCL Notes, it can be a rather complex formula. In Python, it’s not bad at all once you convert it to a time.

from datetime import datetime

# Skip rows that are missing Arm Slot or Spin Dir
pitchLogic = pitchLogic.dropna(subset=['Arm Slot', 'Spin Dir'])

# Convert 'Arm Slot' and 'Spin Dir' to datetime objects
pitchLogic['Arm Slot'] = pd.to_datetime(pitchLogic['Arm Slot'], format='%H:%M')
pitchLogic['Spin Dir'] = pd.to_datetime(pitchLogic['Spin Dir'], format='%H:%M')

# Compute the slot difference in minutes
pitchLogic['Slot Diff'] = (pitchLogic['Arm Slot'] - pitchLogic['Spin Dir']).dt.total_seconds() / 60

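# Wrap the slot difference back into the -360 to 360 range, since the clock face repeats every 720 minutes
pitchLogic.loc[pitchLogic['Slot Diff'] < -360, 'Slot Diff'] += 720
pitchLogic.loc[pitchLogic['Slot Diff'] > 360, 'Slot Diff'] -= 720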

# Convert the slot difference to integer
pitchLogic['Slot Diff'] = pitchLogic['Slot Diff'].astype(int)

# Format 'Arm Slot' and 'Spin Dir' to display only time as HH:MM
pitchLogic['Arm Slot'] = pitchLogic['Arm Slot'].dt.strftime('%H:%M')
pitchLogic['Spin Dir'] = pitchLogic['Spin Dir'].dt.strftime('%H:%M')
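
If it helps to see the clock arithmetic on its own, here is a minimal sketch in plain Python, without pandas. It treats 12:00 as zero minutes and wraps the result so the difference is always the short way around the clock face. The function names are mine for illustration, and the normalization of 12 to 0 is an assumption that differs slightly from the datetime approach above.

```python
def clock_to_minutes(clock):
    """Convert a clock-face reading such as '1:30' to minutes past 12:00."""
    hours, minutes = clock.strip().split(":")
    return (int(hours) % 12) * 60 + int(minutes)


def slot_diff(arm_slot, spin_dir):
    """Signed difference in clock minutes, wrapped into [-360, 360)."""
    raw = clock_to_minutes(arm_slot) - clock_to_minutes(spin_dir)
    return (raw + 360) % 720 - 360


print(slot_diff("1:30", "12:00"))   # 90
print(slot_diff("12:30", "1:30"))   # -60
print(slot_diff("11:30", "12:30"))  # -60
```

The last two calls are the cases that cross 12:00, which is where a simple subtraction would report a difference of several hundred minutes instead of one hour.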

There is some old data in my PitchLogic files, from before they had auto-tagging of pitch type. For simplicity, and since much of my old data was from very young pitchers who only have one pitch (fastball), I defaulted everything to FF (four-seam fastball). Many of the columns in the CSV export from PitchLogic have column names that have changed over time and may continue to change, so I have an example of how to rename one.

# Check if 'Type' column exists before trying to fill NaN values
if 'Type' in pitchLogic.columns:
    pitchLogic['Type'] = pitchLogic['Type'].fillna('FF')
else:
    pitchLogic['Type'] = 'FF'

# Rename the 'Speed (mph)' column to 'Speed'
pitchLogic.rename(columns={'Speed (mph)': 'Speed'}, inplace=True)

I enter pitch location data into the Memo field, and also include the pitch type that the player intended if that differs from the auto-tagging. It is very common for players under 12 to have pitch type differ on some pitches, but not others. For high school pitchers, they tend to stay in the same pitch type, but may think it’s something different. At one assessment, a very good HS pitcher was throwing sliders and cutters, thinking he was throwing curveballs and fastballs.

Summary

This should give a good idea of how to import and wrangle the PitchLogic data, or any data of your own, in order to prepare some visualizations. Keep an eye on this blog for other pieces in this series to help understand visualizations using Python.


Automate PDF Creation from Data Visualizations with Python

When creating some really great visualizations, I wondered how I could create a bunch of them and not be overwhelmed by the process of exporting them to PDF files to share with others. So I explored a few options and found that img2pdf was most suitable: always lossless, small, and fast. It allows me to loop through my data, creating charts, move them to individual PDFs, and then combine them into multi-page PDFs.

I’m using DataLab by DataCamp, where I’m learning Python, Data Science, and AI. So, some of my code may rely on that environment and YMMV. Installing and importing img2pdf was very straightforward for me.

!pip install img2pdf
import img2pdf

It turned out to be pretty simple to loop through my CSV data using one of the columns to get data by player and then create each graph, saving it as a PDF, then combining them as multi-page PDFs by player and by category.

I created a directory structure for the files, with a Fig Storage directory for all the individual PDFs and a directory for each team and year. This allows me to scale it to handle data at volume, letting me focus on analyzing that data instead of being bogged down in copying and pasting.
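
The chart code expects those directories to exist already. As a rough sketch of one way to set them up, the helper below creates a Fig Storage directory and a season-plus-team directory under a base folder using pathlib. The function names, the base folder, and the sample player name are hypothetical. I created my own directories by hand.

```python
import tempfile
from pathlib import Path


def prepare_output_dirs(base_dir, season, team):
    """Create 'Fig Storage' and '<season> <team>' under base_dir if missing."""
    base = Path(base_dir)
    fig_dir = base / "Fig Storage"
    team_dir = base / f"{season} {team}"
    for directory in (fig_dir, team_dir):
        directory.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return fig_dir, team_dir


def player_pdf_path(team_dir, player):
    """Build the path of the multi-page PDF for one player."""
    return Path(team_dir) / f"{player}.pdf"


# Demonstrate in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    fig_dir, team_dir = prepare_output_dirs(tmp, "2025", "ICI")
    created = fig_dir.is_dir() and team_dir.is_dir()
    pdf_name = player_pdf_path(team_dir, "Sam Example").name
    print(team_dir.name, created, pdf_name)
```

Calling the helper a second time is harmless, so it can sit at the top of the notebook and run every time.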

Within each loop, it creates an empty array, imagefiles, in which all filenames are placed, so that those files can be copied into the summary PDFs once the charts have all been generated. Outside the loop, there is another array, byDateArrayFiles, for storing all of the filenames to be bundled together for a ‘Velocity by Date’ file.

Here’s a sample of the loop with only two charts created. I have 8 different ones created for each player, but showing them all here would be excessive. This gives you the idea.

import matplotlib.pyplot as plt
import seaborn as sns

season = '2025'
team = 'ICI'
byDateArrayFiles = []  # every player's velocity chart, for the combined PDF
playerNames = pitchLogic['Player Name'].unique()
for player in playerNames:
    player_df = pitchLogic[pitchLogic['Player Name'] == player]
    imagefiles = []  # this player's charts, for the per-player PDF

    # Chart 1: velocity by date
    bydatefig, bydateax = plt.subplots()
    sns.lineplot(x='Date', y='Speed', data=player_df, ax=bydateax).set(title='Velocity for ' + player)
    bydatefig.autofmt_xdate(rotation=75)
    filename = "Fig Storage/" + player + ' VelocityByDate.jpg'
    bydatefig.savefig(filename)
    byDateArrayFiles.append(filename)
    imagefiles.append(filename)

    # Chart 2: movement profile, limited to pitches with a slot difference within 100
    # .copy() so that relabeling Location does not write into a slice of player_df
    only100_player_df = player_df[abs(player_df['Slot Diff']) <= 100].copy()
    only100_player_df.loc[only100_player_df['Location'] != 'K', 'Location'] = 'BB'
    slotfig = sns.relplot(x='Horiz Mvmt', y='Vertical Mvmt', data=only100_player_df, kind='scatter', hue='Slot Diff', size='Location', style='Type').set(title='Slot Difference and Movement Profile for ' + player)
    filename = "2025 Samples/" + player + ' SlotMovement.jpg'
    slotfig.savefig(filename)
    imagefiles.append(filename)

    # Combine this player's charts into one multi-page PDF
    with open(season + " " + team + "/" + player + ".pdf", "wb") as pdf_file:
        pdf_file.write(img2pdf.convert(imagefiles))

# Combine every player's velocity chart into one PDF
with open(season + " " + team + "/Velocity By Date.pdf", "wb") as pdf_file:
    pdf_file.write(img2pdf.convert(byDateArrayFiles))

This all saved me loads of time and headaches generating the charts. It lets me quickly explore whether my visualizations are meaningful. It also makes modifying them or updating them very easy. I can put a season’s worth of pitches into the system and have a suite of charts for each player a minute later.
