Author Archives: David Navarre

About David Navarre

I'm an amateur historian and wine lover. I study both with hopes of turning my passion for them into a career.

Scraping Tabular Data from Web Pages using Python

In my continuing quest to make sense of the baseball pitching data that I’m studying, I realized that I needed the game performance of the pitchers for whom I have training data. That performance helps validate any theories about what the training data tells us about ability.

Since we’re using the 2024-2025 Brevard College Rapsodo data, we want to add the 2025 season statistics to see whether our analysis of their fall and early spring data matches their in-season performance.

BeautifulSoup

The package I’m using to parse the HTML is called BeautifulSoup. It was created by Leonard Richardson in 2004, and he’s maintained and upgraded it (with help) for over 20 years! We’re not going to do anything complex, but it is a powerful library.

We’re importing two objects from the library – BeautifulSoup to parse the HTML and SoupStrainer to allow us to choose what to parse. We also have to import pandas to put our table in memory and requests to get the document from the web.

from bs4 import BeautifulSoup
from bs4 import SoupStrainer
import pandas as pd
import requests

url = "https://bctornados.com/sports/baseball/stats"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")  # name the parser explicitly to avoid bs4's guessing warning
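
One caution: if the request fails or the site blocks scripted clients, you end up parsing an error page instead of the stats. A small defensive sketch (the User-Agent string and timeout here are just illustrative choices):

r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
r.raise_for_status()  # raise an HTTPError rather than silently parsing an error page
html_doc = r.text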

The Web Page

We’re very fortunate that the Brevard team’s web page is well designed. In particular, the section that contains the pitching data is tagged with the id “individual-overall-pitching”, which made finding and scraping the data far easier. It’s not unusual for web developers to leave ids off, or for CMS systems to omit them. Here’s the top of the appropriate section on Brevard’s page:

<!-- Individual - Overall - Pitching -->
<section id="individual-overall-pitching">

This makes it very easy for us to acquire the tabular data and put it into a DataFrame.

only_pitching = SoupStrainer(id='individual-overall-pitching')

pitching_section = BeautifulSoup(html_doc, "html.parser", parse_only=only_pitching)

pitching_table = pitching_section.find('table')

from io import StringIO  # newer pandas versions deprecate passing literal HTML to read_html

pitching_data_df = pd.read_html(StringIO(str(pitching_table)))[0]
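
A quick sanity check confirms we grabbed the right table – the shape and first few rows should match the stat table on the page:

# Peek at what we scraped before going any further
print(pitching_data_df.shape)
print(pitching_data_df.head())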

Quick transform on the name column

I realized this left some gobbledygook in the Player Name column, so I slipped in code to clean that up. Even if you’re scraping pitching data from somewhere else, you might not need these steps, but here’s the code I used.

# Function to convert "Last, First Jersey#etc" to "First Last"
def transform_name(name):
    if pd.isna(name):
        return name
    parts = name.split(',')
    if len(parts) < 2:
        return name
    last = parts[0].strip()
    first_parts = parts[1].strip().split()
    first = first_parts[0]
    return f"{first} {last}"

# Apply transformation
pitching_data_df['Player'] = pitching_data_df['Player'].apply(transform_name)
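
For example (the names here are invented), the transformation behaves like this:

print(transform_name("Navarre, David 21"))  # David Navarre
print(transform_name("No comma here"))      # returned unchanged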

To Excel

Now, I wanted to know immediately how well this worked, so I dumped it into an Excel spreadsheet.

pitching_data_df.to_excel('brevard.xlsx', index=False)

Next Steps

Now that I have my practice data and my game data in DataFrames, I can start merging in the performance results when displaying practice data and drawing conclusions.
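
As a sketch of that merge – assuming the aggregated Rapsodo practice data lives in a DataFrame called practice_df with a 'Player Name' column, as in the aggregation post in this series – it might look like this:

# Hypothetical merge of game stats into practice data, keyed on player name
game_stats_df = pitching_data_df.rename(columns={'Player': 'Player Name'})
combined_df = practice_df.merge(game_stats_df, on='Player Name', how='left')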

Categories: Python | 4 Comments

Aggregating CSV Data into DataFrames: Rapsodo Pitch Data

This week I met with Vinny Carone, the Head Baseball Coach at Brevard College, to talk about the data I’ve been acquiring on youth pitchers. I’ve been doing a lot of visualizations and wanted to know if they were meaningless visualizations. I do have some improvements to make based on that and I also got some CSV exports of Rapsodo data to work with. I’m very excited about the potential for insights and the opportunity to see them applied.

Quick Point on the Data Sources

The data I’ve been using comes from my PitchLogic baseball, which has some electronics in it to sense and transmit its movement. I’ve been exporting the data into CSVs and doing visualizations. When I export, I specify the date range I want and get all pitches thrown.

The college team is using more expensive hardware – the Rapsodo 3.0 – which provides much of the same data, plus location data that PitchLogic does not. It can also export data into CSVs, though so far we have only seen how to do that for individual players. So, aggregating the data and tagging it by player, so all of it can be used, was my first task.

Walking the directory

Since I want to get data from many different files, we need to walk the directory. First, we import os for the directory functions and pandas so we can create our DataFrames.

import os
import pandas as pd

# Directory containing the Rapsodo Data files
rapsodo_data_dir = 'Rapsodo Data'

# Initialize an empty list to store player data
data = []

Looping through all the files is actually really simple. You don’t need to specify anything except which directory to use.

# Loop through each file in the directory
for filename in os.listdir(rapsodo_data_dir):
    if filename.endswith('.csv'):
        # Construct the full file path
        file_path = os.path.join(rapsodo_data_dir, filename)

Scraping the Player ID and Player Name

The nice thing when viewing the Rapsodo CSV files in Excel is that each file carries a label identifying the player (anonymized here) by ID and name. This is great when I’m viewing one player, but not very useful when I want to aggregate all the data and keep it tagged by player. So, we first treat the CSV file as text to pull out those labels, then go back and read the data into the DataFrame.

        # Read the file as text in order to get Player ID and Player Name
        player_id = None    # reset per file so a miss doesn't reuse the previous player's values
        player_name = None
        try:
            with open(file_path, 'r') as file:
                for _ in range(3):
                    line = file.readline()
                    if not line:
                        break
                    if "Player ID:" in line:
                        player_id = line.split('"Player ID:",')[1].strip()
                    if "Player Name:" in line:
                        player_name = line.split('"Player Name:",')[1].strip()
        except Exception as e:
            print(f"Error reading {file_path}: {e}")

The AI in my DataCamp dev environment urged me to use try-except for error handling. It’s always nice to know why and where an error occurred, so catch those exceptions!

Building that DataFrame

Once again, Python is pretty slick at how it handles data. Simple and elegant.

We read the CSV, with all 106 columns, into our temporary DataFrame, df, and add columns at the front tagging each row with the player ID and player name. Then we append each temporary df to a list of DataFrames so we can concatenate them smoothly.

        # Read the file as CSV, skipping the first 4 lines in order to just get pitch data
        try:
            df = pd.read_csv(file_path, skiprows=4)

            # Add Player ID and Player Name columns
            df.insert(0, 'Player ID', player_id)
            df.insert(1, 'Player Name', player_name)

            # Append the DataFrame to the list
            data.append(df)

        except pd.errors.ParserError as e:
            print(f"Error reading {file_path}: {e}")

# Concatenate all DataFrames in the list
final_df = pd.concat(data, ignore_index=True)
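
Writing the combined DataFrame out to a single CSV is then one line (the filename is just my choice):

# Save the aggregated, player-tagged pitch data for later use
final_df.to_csv('Rapsodo Combined.csv', index=False)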

Conclusion

Now that I’ve got the 2024-2025 Rapsodo data all in a single file, I can start working on team-wide and individualized visualizations. One of the keys in our discussion was to create our own Stuff+ metric, since we don’t have access to anyone else’s. That’s going to be the next blog post!

Categories: Data Wrangling, Python | 4 Comments

Quartile Boxes for Charting in Pitch Movement Visualizations

As I was looking at my PitchLogic movement visualizations that I’m using in coaching youth pitchers, I realized that I could use the quartile values to display how consistent their movement has been. The code is not that complex, so I wanted to make sure to share it. I imagine it will have considerably more uses than just my hobbyist one.

Defining the function

As my Python code for pitching assessment gets more complex, I decided to start breaking out pieces as functions. I had been using cols to create multiple charts of the various pitch types on one figure, but a separate chart for each pitch type is more meaningful.

For clarity, I made sure to add a good docstring…

def movementCharting(player_df, type):
    """Create a chart of horizontal and vertical movement, with a box around the
    'quantile area' showing where pitches go. (A separate function creates a table
    of all pitch types and movement.)

    Args:
        player_df (DataFrame): PitchLogic DataFrame with 'Horizontal Movement (in)'
            and 'Vertical Movement (in)' columns
        type (string): pitch type to chart

    Returns:
        filename (string): filename of the saved chart image, later bundled into a PDF
    """

Building the Chart

It only takes a few lines to build the chart using Seaborn. We’ve gotten the DataFrame that contains only pitches by one player and we trim it to only pitches of the type we’re charting.

    movement_df = player_df[player_df['Type'] == type]
    movefig = sns.relplot(x='Horizontal Movement (in)', y='Vertical Movement (in)',
                          data=movement_df, kind='scatter')

Then we compute our quartile values and draw the box on the plot. I’ve commented out the lines for each mean value, since they added visual complexity without making the chart more meaningful. YMMV

    # Calculate mean and confidence intervals
    mean_horiz = movement_df['Horizontal Movement (in)'].mean()
    mean_vert = movement_df['Vertical Movement (in)'].mean()
    ci_horiz = movement_df['Horizontal Movement (in)'].quantile([0.25, 0.75])
    ci_vert = movement_df['Vertical Movement (in)'].quantile([0.25, 0.75])
        
    for ax in movefig.axes.flat:
#        ax.axhline(mean_vert, color='red', linestyle='--')
#        ax.axvline(mean_horiz, color='blue', linestyle='--')
        ax.hlines(ci_vert[0.25], ci_horiz[0.25], ci_horiz[0.75], color='blue')
        ax.hlines(ci_vert[0.75], ci_horiz[0.25], ci_horiz[0.75], color='blue')
        ax.vlines(ci_horiz[0.25], ci_vert[0.25], ci_vert[0.75], color='red')
        ax.vlines(ci_horiz[0.75], ci_vert[0.25], ci_vert[0.75], color='red') 

A couple of lines to put in the title in an appropriate spot….

    movefig.fig.subplots_adjust(top=0.85)  # Adjust the top to make space for the title
    movefig.fig.suptitle(type + ' Movement Profile for ' + playerDisplay, y=0.90)  # Move the title upward

Returning from our function

As noted in the docstring, we return the filename as the value from the function. That gets dropped into an array so that it can be processed as described in Automate PDF Creation from Data Visualizations with Python.

    filename = playerfigStorage + '/' + type + ' Movement Profile.jpg'
    movefig.savefig(filename)
    return filename
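
Calling the function for each pitch type a player throws might look like this – a sketch that assumes player_df and the outer variables (playerDisplay, playerfigStorage) are already set up:

imagefiles = []
for pitch in player_df['Type'].dropna().unique():
    imagefiles.append(movementCharting(player_df, pitch))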

Conclusion

Visualizations aren’t hard to create using Python and there are many ways to make them more meaningful without excessive coding. As I work with my pitchers and they learn more about using this inter-quartile range box, they might pick up some technical knowledge and understanding of how to use visualization in their own lives.

Categories: Python, Visualizations | 4 Comments

Visualizing Baseball Pitching: Loading and Wrangling PitchLogic Data using Python

In my hobbies, I’ve always used my technical skills to support the endeavor. Web pages for the Boy Scout Troop, an online database for the school painting & landscaping projects, spreadsheets to manage the baseball rosters, radar to measure throwing speed, and email to keep everyone in the loop. So, it will come as no surprise to anyone I’ve encountered in those that I’m taking data visualization into my baseball coaching. This also gives me a chance to break down the process here and provide an opportunity for others to learn from and extend the code. This is part of a series of posts, with snippets delivered in manageable chunks.

Learning on DataCamp

As I was looking around for ways to learn AI, and Python as a way to access it, I found DataCamp. It’s a great site for learning data and AI. There are a variety of career tracks (Data Analyst, Data Engineer, and Data Science, among others) and you can sample without paying. Each course in a track is broken down into individual lessons, which are just a few minutes long, so you can learn in digestible chunks. It will also recap your progress when you reopen a course that you’ve already started.

You also get three workbooks on DataLab for practicing with data from files, Oracle, and SQL sources, among others. You can use Python or R to manipulate the data, or ask the AI to do something (or to start something for you). I was initially only using it for the exercises, then started playing with my baseball pitching files.

It’s usually $330 a year, but sometimes can be secured at a discount for one person or for a team (mine was half off). Well worth your money!

Basics

My PitchLogic baseball records a lot of information about every pitch, examining the pitcher’s delivery, the release, and the movement of the ball as it travels to the plate. The ball is identical to a Major League baseball in weight and covering, but instead of cork in the center, there are electronics. This means that we never hit the ball, just throw it.

You can export the data from the app, getting a CSV file via email with all the pitch data for the selected date range. I was always opening this in Excel and manipulating it to try to see patterns and to learn what “good” numbers were for age groups and individual pitchers. I had all kinds of formulas for highlighting cells that looked like “good” numbers. With the data being in Excel, I could have created some charts to better understand the data, but it was time-consuming and I wouldn’t know where I’d find value until I put in the work.

I started using ChatGPT to create some charts based on the data in the same way it was done in the app and in session reports on the web. I was including data from all sessions, so it was a little different. I also looked at data from a few seasons ago, which had long left my app but was stored in spreadsheets.

ChatGPT lets you look at the Python it creates to do what you want it to do. So, I started modifying that and reusing it for different players. Somehow, I found DataCamp and then started using DataLab to examine my data in more detail and practice the Python that I was learning.

When I imported the data, I realized that I could use my newly acquired skills and wrangle the data a little. When I was doing all my analysis in Excel, I had some formulas to create new column values, but I did a lot of copying and pasting. By using Python, I can easily apply those to any spreadsheet I drop in.

First, I just wanted to create a Player Name field so I didn’t have to spend time checking both first and last name. Then I wanted to parse my Memo column to separate out location and intended pitch type.

import pandas as pd

# Load the CSV file into a DataFrame
pitchLogic = pd.read_csv('PitchLogic.csv', parse_dates=["Date"])

# Combine names to Player Name
pitchLogic['Player Name'] = pitchLogic['First Name'] + ' ' + pitchLogic['Last Name']
# Split the memo field between location and intended pitch type
pitchLogic['Location'] = pitchLogic['Memo'].str[:1]
pitchLogic['Intended Type'] = pitchLogic['Memo'].str[2:4]
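
As an illustration (the memo strings here are invented, but follow my convention of a one-character location, a space, then the pitch type):

# 'K FF' -> Location 'K', Intended Type 'FF'
demo = pd.Series(['K FF', 'B CB'])
print(demo.str[:1].tolist())   # ['K', 'B']
print(demo.str[2:4].tolist())  # ['FF', 'CB']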

The Arm Slot value is the angle on a clock face for the pitcher’s arm as he’s throwing; the Spin Direction is the clock-face direction in which the ball is spinning. An arm slot of 1:30 would be throwing with your arm at a 45-degree angle to your torso. A spin direction of 12:00 would be a ball spinning forward, perpendicular to the ground. I want to compute the difference between these two as a way of evaluating the pitch. For a fastball, you usually want them as closely aligned as possible. For accuracy, you want them to be the same every time you throw a fastball, so that it’s predictable.

Computing the difference between the Arm Slot and Spin Direction is only easy when you’re doing it manually. If you do it in Excel or HCL Notes, it can be a rather complex formula. In Python, it’s not bad at all once you convert it to a time.

from datetime import datetime

# Skip rows that are missing Arm Slot or Spin Dir
pitchLogic = pitchLogic.dropna(subset=['Arm Slot', 'Spin Dir'])

# Convert 'Arm Slot' and 'Spin Dir' to datetime objects
pitchLogic['Arm Slot'] = pd.to_datetime(pitchLogic['Arm Slot'], format='%H:%M')
pitchLogic['Spin Dir'] = pd.to_datetime(pitchLogic['Spin Dir'], format='%H:%M')

# Compute the slot difference in minutes
pitchLogic['Slot Diff'] = (pitchLogic['Arm Slot'] - pitchLogic['Spin Dir']).dt.total_seconds() / 60

# Adjust the slot difference if it is less than -360
pitchLogic.loc[pitchLogic['Slot Diff'] < -360, 'Slot Diff'] += 720

# Convert the slot difference to integer
pitchLogic['Slot Diff'] = pitchLogic['Slot Diff'].astype(int)

# Format 'Arm Slot' and 'Spin Dir' to display only time as HH:MM
pitchLogic['Arm Slot'] = pitchLogic['Arm Slot'].dt.strftime('%H:%M')
pitchLogic['Spin Dir'] = pitchLogic['Spin Dir'].dt.strftime('%H:%M')
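
To make the clock arithmetic concrete: an arm slot of 1:30 is 90 minutes past 12:00 and a spin direction of 12:45 is 765 minutes, so the raw difference is -675, which wraps to a Slot Diff of 45:

# Worked example of the slot-difference arithmetic on a tiny frame
demo = pd.DataFrame({'Arm Slot': ['1:30'], 'Spin Dir': ['12:45']})
mins = (pd.to_datetime(demo['Arm Slot'], format='%H:%M')
        - pd.to_datetime(demo['Spin Dir'], format='%H:%M')).dt.total_seconds() / 60
mins[mins < -360] += 720  # wrap onto the 12-hour clock
print(int(mins.iloc[0]))  # 45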

There is some old data in my PitchLogic files, from before they had auto-tagging of pitch type. For simplicity, and since much of my old data was from very young pitchers who only have one pitch (fastball), I defaulted everything to FF (four-seam fastball). Many of the columns in the CSV export from PitchLogic have column names that have changed over time and may continue to change, so I have an example of how to rename one.

# Check if 'Type' column exists before trying to fill NaN values
if 'Type' in pitchLogic.columns:
    pitchLogic['Type'] = pitchLogic['Type'].fillna('FF')
else:
    pitchLogic['Type'] = 'FF'

# Rename the 'Speed (mph)' column to 'Speed'
pitchLogic.rename(columns={'Speed (mph)': 'Speed'}, inplace=True)

I enter pitch location data into the Memo field, and also include the pitch type that the player intended if that differs from the auto-tagging. It is very common for players under 12 to have the pitch type differ on some pitches, but not others. High school pitchers tend to stay within the same pitch type, but may think it’s something different. At one assessment, a very good HS pitcher was throwing sliders and cutters, thinking he was throwing curveballs and fastballs.

Summary

This should give a good idea of how to import and wrangle the PitchLogic data, or any data of your own, in order to prepare some visualizations. Keep an eye on this blog for other pieces in this series to help understand visualizations using Python.

Categories: Python, Visualizations | 4 Comments

Automate PDF Creation from Data Visualizations with Python

When creating some really great visualizations, I wondered how I could create a bunch of them without being overwhelmed by the process of exporting them to PDF files to share with others. So I explored a few options and found that img2pdf was most suitable: always lossless, small, and fast. It lets me loop through my data creating charts, turn them into individual PDFs, and then combine them into multi-page PDFs.

I’m using DataLab by DataCamp, where I’m learning Python, Data Science, and AI. So, some of my code may rely on that environment and YMMV. Installing and importing img2pdf was very straightforward for me.

!pip install img2pdf
import img2pdf

It turned out to be pretty simple to loop through my CSV data, using one of the columns to get data by player, create each graph, save it, and then combine them into multi-page PDFs by player and by category.

I created a directory structure for the files, with a Fig Storage directory for all the individual PDFs and a directory for each team and year. This allows me to scale it to handle data at volume, letting me focus on analyzing that data instead of being bogged down in copying and pasting.
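
To avoid file-not-found errors when saving, those folders can be created up front – a sketch using the same directory names that appear in the loop below:

import os

# Create the output folders if they don't already exist
for folder in ['Fig Storage', '2025 Samples', '2025 ICI']:
    os.makedirs(folder, exist_ok=True)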

Within each loop, it creates an empty array, imagefiles, in which all filenames are placed, so that those files can be copied into the summary PDFs once the charts have all been generated. Outside the loop, there is another array, byDateArrayFiles, for storing all of the filenames to be bundled together for a ‘Velocity by Date’ file.

Here’s a sample of the loop with only two charts created. I have 8 different ones created for each player, but that would be excessive. This gives you the idea.

import matplotlib.pyplot as plt
import seaborn as sns

season = '2025'
team = 'ICI'
byDateArrayFiles = []
playerNames = pitchLogic['Player Name'].unique()
for player in playerNames:
    player_df = pitchLogic[pitchLogic['Player Name'] == player]
    imagefiles = []

    bydatefig, bydateax = plt.subplots()
    bydateax = sns.lineplot(x='Date', y='Speed', data=player_df).set(title='Velocity for ' + player)
    bydatefig.autofmt_xdate(rotation=75)
    filename = "Fig Storage/" + player + ' VelocityByDate.jpg'
    bydatefig.savefig(filename)
    byDateArrayFiles.append(filename)
    imagefiles.append(filename)

    only100_player_df = player_df[abs(player_df['Slot Diff']) <= 100]
    only100_player_df.loc[only100_player_df['Location'] != 'K', 'Location'] = 'BB'
    slotfig = sns.relplot(x='Horiz Mvmt', y='Vertical Mvmt', data=only100_player_df, kind='scatter',
                          hue='Slot Diff', size='Location', style='Type').set(title='Slot Difference and Movement Profile for ' + player)
    filename = "2025 Samples/" + player + ' SlotMovement.jpg'
    slotfig.savefig(filename)
    imagefiles.append(filename)

    with open(season + " " + team + "/" + player + ".pdf", "wb") as pdf_file:
        pdf_file.write(img2pdf.convert(imagefiles))

with open(season + " " + team + "/Velocity By Date.pdf", "wb") as pdf_file:
    pdf_file.write(img2pdf.convert(byDateArrayFiles))
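
Once the multi-page PDFs are written, the intermediate JPGs aren’t needed anymore. If you’d rather not keep them, a cleanup sketch could follow the loop (this only covers the velocity charts; the other image lists would need the same treatment):

import os

# Optional cleanup: remove the intermediate chart images once the PDFs exist
for image in byDateArrayFiles:
    if os.path.exists(image):
        os.remove(image)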

This all saved me loads of time and headaches generating the charts. It lets me quickly explore whether my visualizations are meaningful. It also makes modifying them or updating them very easy. I can put a season’s worth of pitches into the system and have a suite of charts for each player a minute later.

Categories: Python, Visualizations | 5 Comments

Using ChatGPT to Write Code in LotusScript, TypeScript and C#

So, I was reading about an open position that I might be interested in, followed some links that led to an article on AI in programming, and wondered just how good it would be. So, I went ahead and asked ChatGPT to write an agent for me.

write an agent for use in HCL Notes that sends messages to users at intervals determined by a configuration document that identifies which documents to notify about using a querystring and whom to notify by a list on the configuration document. The configuration document should specify whether to send individual messages for each document or a list of documents.

I thought it couldn’t do half the job that I did. I figured I’d need to go through a bunch of refining iterations, explaining that I wanted a function to do this and another function to do that. Nope.

Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Set db = session.CurrentDatabase

    ' Retrieve configuration document
    Dim configDoc As NotesDocument
    Set configDoc = GetConfigDocument(db)

    If configDoc Is Nothing Then
        MsgBox "Configuration document not found!", 48, "Error"
        Exit Sub
    End If

    ' Read configuration values
    Dim queryString As String
    queryString = configDoc.QueryString(0)

    Dim usersToNotify As Variant
    usersToNotify = configDoc.UsersToNotify

    Dim notificationMethod As String
    notificationMethod = configDoc.NotificationMethod(0)

    Dim interval As Integer
    interval = configDoc.IntervalInHours(0)

    ' Find the documents to notify about based on query string
    Dim view As NotesView
    Set view = db.GetView("YourDocumentsView") ' Adjust view as needed
    Dim doc As NotesDocument
    Set doc = view.GetFirstDocument

    Dim docsToNotify As NotesDocumentCollection
    Set docsToNotify = New NotesDocumentCollection

    Do While Not (doc Is Nothing)
        If Evaluate(queryString, doc) Then
            docsToNotify.Add doc
        End If
        Set doc = view.GetNextDocument(doc)
    Loop

    ' Send notifications based on configuration
    Dim user As String
    ForAll user In usersToNotify
        If notificationMethod = "Individual" Then
            Call SendIndividualMessages(user, docsToNotify)
        ElseIf notificationMethod = "List" Then
            Call SendListMessage(user, docsToNotify)
        End If
    End ForAll

    ' Wait for the interval before running again (example of a timed agent)
    Call ScheduleNextRun(interval)

End Sub

' Function to retrieve configuration document
Function GetConfigDocument(db As NotesDatabase) As NotesDocument
    Dim configView As NotesView
    Set configView = db.GetView("ConfigView") ' Adjust view as needed
    Set GetConfigDocument = configView.GetFirstDocument
End Function

' Function to send individual messages
Sub SendIndividualMessages(user As String, docs As NotesDocumentCollection)
    Dim doc As NotesDocument
    Set doc = docs.GetFirstDocument

    While Not (doc Is Nothing)
        Dim mailDoc As NotesDocument
        Set mailDoc = db.CreateDocument
        mailDoc.Form = "Memo"
        mailDoc.Subject = "Document Notification"
        mailDoc.SendTo = user
        mailDoc.Body = "You have a document to review: " & doc.GetItemValue("Subject")(0)

        ' Send the mail
        Call mailDoc.Send(False)

        ' Move to next document
        Set doc = docs.GetNextDocument(doc)
    Wend
End Sub

' Function to send list of documents in one message
Sub SendListMessage(user As String, docs As NotesDocumentCollection)
    Dim doc As NotesDocument
    Set doc = docs.GetFirstDocument

    Dim bodyText As String
    bodyText = "You have the following documents to review:" & Chr(10)

    While Not (doc Is Nothing)
        bodyText = bodyText & doc.GetItemValue("Subject")(0) & Chr(10)
        Set doc = docs.GetNextDocument(doc)
    Wend

    Dim mailDoc As NotesDocument
    Set mailDoc = db.CreateDocument
    mailDoc.Form = "Memo"
    mailDoc.Subject = "Documents Notification"
    mailDoc.SendTo = user
    mailDoc.Body = bodyText

    ' Send the mail
    Call mailDoc.Send(False)
End Sub

' Function to schedule the next run of the agent
Sub ScheduleNextRun(interval As Integer)
    ' Schedule the agent to run again in 'interval' hours
    Dim agent As NotesAgent
    Set agent = session.CurrentDatabase.GetAgent("YourAgentName") ' Adjust agent name
    Call agent.Schedule(Interval:=interval)
End Sub

I’ve written one myself and I’m pretty proud of it. I talk about it often. Now, I talk about how my team created the same kind of functionality in MS Dynamics 365 and was working on a full-stack version before we all got furloughed (USAID contract cuts). Nonetheless, I think this version for Notes, which ChatGPT completed in the amount of time it took to scroll through the message, would have been a great start.

Some of the positions I’m looking at use TypeScript in the tech stack, and I know TypeScript is still a JavaScript-based language. So, I decided to ask it to do the same in TypeScript. Notes is integrated with email, so some of the extra coding needed in TypeScript wasn’t necessary in LotusScript.

import nodemailer from 'nodemailer';
import axios from 'axios';
import { CronJob } from 'cron';

// Interface for Configuration Document
interface ConfigDocument {
    queryString: string;
    usersToNotify: string[];
    notificationMethod: 'Individual' | 'List';
    intervalInHours: number;
}

// Function to fetch configuration document from an API or database
async function getConfigDocument(): Promise<ConfigDocument> {
    try {
        const response = await axios.get('http://your-api-url/config');
        return response.data;
    } catch (error) {
        console.error('Error fetching configuration document:', error);
        throw new Error('Failed to fetch config document');
    }
}

// Function to send email
async function sendEmail(to: string, subject: string, body: string): Promise<void> {
    // Create a transporter using your email service
    let transporter = nodemailer.createTransport({
        service: 'gmail', // Example service
        auth: {
            user: 'your-email@gmail.com', // Replace with your email
            pass: 'your-email-password', // Replace with your email password
        },
    });

    // Set up email data
    let mailOptions = {
        from: 'your-email@gmail.com',
        to: to,
        subject: subject,
        text: body,
    };

    try {
        await transporter.sendMail(mailOptions);
        console.log('Email sent successfully to', to);
    } catch (error) {
        console.error('Error sending email:', error);
    }
}

// Function to get documents from the Notes database based on query string
async function fetchDocuments(query: string): Promise<any[]> {
    try {
        // Assuming there is an API endpoint to fetch documents based on a query
        const response = await axios.get(`http://your-api-url/documents?query=${encodeURIComponent(query)}`);
        return response.data;
    } catch (error) {
        console.error('Error fetching documents:', error);
        throw new Error('Failed to fetch documents');
    }
}

// Function to send individual messages for each document
async function sendIndividualMessages(users: string[], documents: any[]): Promise<void> {
    for (const user of users) {
        for (const doc of documents) {
            const subject = `Document Notification: ${doc.title}`;
            const body = `Hello, you have a document to review: ${doc.title}\n\nDetails: ${doc.details}`;
            await sendEmail(user, subject, body);
        }
    }
}

// Function to send a list of documents in one email
async function sendListMessage(users: string[], documents: any[]): Promise<void> {
    const documentList = documents.map((doc) => `- ${doc.title}`).join('\n');
    const body = `Hello, you have the following documents to review:\n\n${documentList}`;

    for (const user of users) {
        const subject = 'Documents Notification';
        await sendEmail(user, subject, body);
    }
}

// Function to run the agent logic
async function runAgent() {
    try {
        const config = await getConfigDocument();

        // Fetch documents based on the query string from the configuration
        const documents = await fetchDocuments(config.queryString);

        // Send notifications based on the method defined in the config
        if (config.notificationMethod === 'Individual') {
            await sendIndividualMessages(config.usersToNotify, documents);
        } else {
            await sendListMessage(config.usersToNotify, documents);
        }
    } catch (error) {
        console.error('Error during agent run:', error);
    }
}

// Scheduling the agent to run at a defined interval (in hours)
async function scheduleAgent() {
    const config = await getConfigDocument();

    // Using Cron to schedule the agent run based on the interval (in hours)
    const cronExpression = `0 0 */${config.intervalInHours} * * *`; // Run at interval in hours
    new CronJob(cronExpression, runAgent, null, true, 'America/New_York'); // Adjust timezone as needed

    console.log(`Agent scheduled to run every ${config.intervalInHours} hours`);
}

// Start the scheduling
scheduleAgent().catch((error) => console.error('Error scheduling agent:', error));

Why not go further afield and have it do the full-stack concept version for me? This one is in C#.

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Mail;
using System.Net.Http;
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

class Program
{
    // Configuration Document class
    public class ConfigDocument
    {
        public string QueryString { get; set; }
        public List<string> UsersToNotify { get; set; }
        public string NotificationMethod { get; set; } // "Individual" or "List"
        public int IntervalInHours { get; set; }
    }

    // Document class that represents a document in the Notes database
    public class Document
    {
        public string Title { get; set; }
        public string Details { get; set; }
    }

    // Function to fetch configuration from an API or database
    static async Task<ConfigDocument> GetConfigDocumentAsync()
    {
        // Replace with actual logic to fetch the configuration
        // For demonstration, we're using hardcoded values
        return new ConfigDocument
        {
            QueryString = "SELECT * FROM Documents WHERE status = 'pending'",
            UsersToNotify = new List<string> { "user1@example.com", "user2@example.com" },
            NotificationMethod = "Individual",
            IntervalInHours = 1
        };
    }

    // Function to fetch documents based on the query string
    static async Task<List<Document>> FetchDocumentsAsync(string query)
    {
        // Replace with actual logic to query the HCL Notes database or API
        // Simulating fetching documents
        return new List<Document>
        {
            new Document { Title = "Document 1", Details = "Details of Document 1" },
            new Document { Title = "Document 2", Details = "Details of Document 2" }
        };
    }

    // Function to send an email
    static async Task SendEmailAsync(string to, string subject, string body)
    {
        var smtpClient = new SmtpClient("smtp.gmail.com")
        {
            Port = 587,
            Credentials = new NetworkCredential("your-email@gmail.com", "your-email-password"),
            EnableSsl = true,
        };

        var message = new MailMessage
        {
            From = new MailAddress("your-email@gmail.com"),
            Subject = subject,
            Body = body,
            IsBodyHtml = false,
        };

        message.To.Add(to);

        await smtpClient.SendMailAsync(message);
    }

    // Function to send individual messages
    static async Task SendIndividualMessagesAsync(List<string> users, List<Document> documents)
    {
        foreach (var user in users)
        {
            foreach (var doc in documents)
            {
                var subject = $"Document Notification: {doc.Title}";
                var body = $"Hello, you have a document to review:\n\n{doc.Title}\n\nDetails: {doc.Details}";

                await SendEmailAsync(user, subject, body);
            }
        }
    }

    // Function to send a list of documents in a single email
    static async Task SendListMessageAsync(List<string> users, List<Document> documents)
    {
        var documentList = string.Join("\n", documents.ConvertAll(doc => $"- {doc.Title}"));
        var body = $"Hello, you have the following documents to review:\n\n{documentList}";

        foreach (var user in users)
        {
            var subject = "Documents Notification";
            await SendEmailAsync(user, subject, body);
        }
    }

    // Main Agent Logic
    static async Task RunAgentAsync()
    {
        try
        {
            var config = await GetConfigDocumentAsync();
            var documents = await FetchDocumentsAsync(config.QueryString);

            // Send notifications based on the method specified in the config
            if (config.NotificationMethod == "Individual")
            {
                await SendIndividualMessagesAsync(config.UsersToNotify, documents);
            }
            else if (config.NotificationMethod == "List")
            {
                await SendListMessageAsync(config.UsersToNotify, documents);
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error during agent run: {ex.Message}");
        }
    }

    // Scheduling the agent to run at the specified interval (in hours)
    static async Task ScheduleAgentAsync()
    {
        var config = await GetConfigDocumentAsync();

        var scheduler = await StdSchedulerFactory.GetDefaultScheduler();
        await scheduler.Start();

        // Schedule the job with Quartz.NET to run at the specified interval
        var job = JobBuilder.Create<Job>()
            .WithIdentity("DocumentNotificationJob")
            .Build();

        var trigger = TriggerBuilder.Create()
            .WithIdentity("DocumentNotificationTrigger")
            .StartNow()
            .WithSimpleSchedule(x => x.WithIntervalInHours(config.IntervalInHours).RepeatForever())
            .Build();

        await scheduler.ScheduleJob(job, trigger);

        Console.WriteLine($"Agent scheduled to run every {config.IntervalInHours} hours");
    }

    // Quartz job that triggers the agent run
    public class Job : IJob
    {
        public Task Execute(IJobExecutionContext context)
        {
            return RunAgentAsync();
        }
    }

    // Main Entry Point
    static async Task Main(string[] args)
    {
        await ScheduleAgentAsync();

        // Keep the application running
        Console.WriteLine("Press any key to exit...");
        Console.ReadKey();
    }
}

Now, in each case, the code needs to be refined and enhanced. I imagine that could just be done with more specific descriptions of what’s needed, but dang, this AI stuff is powerful. I suspect that if I give ChatGPT some of my code and ask it to port it to another tech stack, it’s going to save me loads of time. I can focus on design patterns and on getting interfaces right, leaving the grunt work to the AI. These are indeed exciting times.

Categories: AI Coding, C#, Code Challenge, TypeScript | Leave a comment

Copying a SharePoint file using Flow

For one of the applications that we’re moving from Notes to the Microsoft platform, I need to copy documents from one SharePoint site to another. There are two instances in which this task will be called: when a document gets approved for publication and when a project needs a new set of documents.

In Notes, I had a handle on how to do this. For the approved documents, my agent would send them via email to the destination database as a mail-in database. For creating a new set of documents, I would simply copy the main database and give it a new name. Needless to say, I liked my Notes solution a lot, but technology and user requirements are constantly changing.

This is a relatively simple thing, but, as in the past, I was again lost in the code. I want to share my experience so you can learn from it as well. Here’s what my copy file task looks like in Flow:

[Image: the Copy File action in Flow]

So, we start with the assumption that there is a document, thisItem, being processed here, then have to tell Flow where to find it and provide the destination for the document. The trick was making sure to get Folder Path before File name with extension. I tried it without using the path, thinking that SharePoint docs would always be in the same directory (which they aren’t, of course!).

The other good news, beyond realizing I can do something in a different environment, is that there actually is some “code” to look at. It’s not all just drag-and-drop, pretty pictures.

{
    "inputs": {
        "host": {
            "connection": {
                "name": "@parameters('$connections')['shared_sharepointonline']['connectionId']"
            }
        },
        "method": "post",
        "body": {
            "sourceFileId": "@{triggerBody()?['{Path}']}@{triggerBody()?['{FilenameWithExtension}']}",
            "destinationDataset": "https://mysite.sharepoint.com/sites/teams/oimt/TAMIS",
            "destinationFolderPath": "/Documents",
            "nameConflictBehavior": 1
        },
        "path": "/datasets/@{encodeURIComponent(encodeURIComponent('https://mysite.sharepoint.com/sites/Policy/PolicyPortal'))}/copyFileAsync",
        "authentication": "@parameters('$authentication')"
    },
    "metadata": {
        "flowSystemMetadata": {
            "swaggerOperationId": "CopyFileAsync"
        }
    }
}

It’s a small step, but one I’m happy to have accomplished and privileged to share.

Categories: Flow, Xpages | Leave a comment

Lt Gorman, you need this software!

How can I be grumpy at a tech conference? Well, it’s not one of my familiar conferences. That one’s next week, but I’m not going. We’re moving off of Notes, though a part of me continues to bleed yellow. So, I’ve been stomping around, feeling a little grumpy about having to give up my familiar technology and attending conferences with my friends.

The best thing about a technology conference is… technology! So, I saw a session with the relatively boring title of “Real Time Data Analytics to the Edge”. OK, I admit, tossing in the word “Analytics” is what caught my eye, because it’s a buzzword. When I look at horse racing, I dive deeply into the data and use analytics to help me understand the possible outcomes and their likelihood. The description sold it a bit better:

Speed of command and safety is a top priority for governments. Delivering on the intelligent edge is made possible by Microsoft Azure. The “edge” is where the operational action lives and is ever present at the very skin of an organization. It’s the convergence of compute, connectivity, and cloud. The intelligent edge is about enabling operations, collaboration, and IoT sensor fusion amongst responders and far forward commanders yet leverages centralized services to augment decision making where the decision space is measured in minutes and in worst case scenarios, seconds. Blueforce Development and Dejero will show how sensor fusion, cognitive services, and mobile connectivity put the power of actionable intelligence in the hands of the frontline users, powered by Microsoft Azure, in real-time.

This still doesn’t convey the coolness of the technology involved.

Go back with me, if you will, to Aliens and sit down with Lieutenant Gorman, commanding his squad of Colonial Marines. Each of your Marines is wearing cameras and sensors to feed back information to their commander. Unfortunately, there’s too much data for him to handle and his lack of experience contributes to his failure. Nonetheless, what cool tech! He can see what any of his Marines can see. He can monitor all of their vitals. He should even be able to pick up movement around them from the sensors.

What BlueForce does is take all that data and make it manageable. I remember taking a course in management way back in the day and among the best ideas was “management by exception”. I use this all the time in software development – I want the system to only notify me when something’s broken. I also like to get occasional messages that just remind me that the system is running (since it might not be able to send a notice if a whole lot is broken!) BlueForce lets the user, in this case our neophyte Lieutenant, take the feeds he wants and arrange them the way he wants to know what’s going on.

In the “It’s a small world” category, the CEO of BlueForce Development is Michael Helfrich, who was Director of Product Development for Knowledge Management at Lotus. So, we were able to share stories about Raven, which I was never able to get any of my customers to buy, since no one could afford $100,000 in 1999!

Among the feeds was the Virtual Surveillance Partner (VSP) by 6-Watch. It uses a combination of Lidar and video to perform threat detection (helping sort out what’s important and what’s not) and provide audio cues about those threats. There are far more applications to this set of tools than just providing our Colonial Marine Lieutenant with real-time threat assessment and asset management, but I suspect anyone reading this is going to hop onto their website to see more about it. (Think “law enforcement” and “forensic evidence”.)

So, the good Lieutenant would have been able to put his VSP up on the screen, select which video feeds matter, perhaps enabling alerts for various vital stats feeds from his Marines and figure out how best to command the situation. After all, he had 38 drops (simulated) to practice it.

[Photo: The command center screens that Lt Gorman would be able to drag-and-drop to configure his feeds to best command those Colonial Marines]

It’s some pretty cool stuff and the President of 6-Watch, Eric Gahagan, took some time to chat with me as I was looking at their system in action on the conference floor. Eric spent a long time with the Massachusetts State Police, so the implementation of their products is near and dear to his heart.

[Photo: The Lidar for 6-Watch atop a tripod; the crate below is just for packing it up for shipping. It’s somewhat portable – a Special Forces team could easily mount it on a vehicle.]

As much as I struggle with being an “outgoing introvert”, meeting the people who are behind the technologies and making connections has always been the great benefit of going to tech conferences. I can’t say that I’ll ever personally have a use for the technology, but it’s pretty cool.

Categories: Bleed Yellow, Conferences, Security | 1 Comment

Get excited about learning all over again! #IBMThink #IBMChampion

Do you remember when learning was exciting? If it isn’t anymore, you need to start listening to John Jardin. He’s got a session coming up at IBM Think and the excitement explodes from his blog. If you haven’t registered yet, DO IT NOW!

See you in Las Vegas, March 19-22.

Read about John’s session — http://bleedingcode.com/will-presenting-ibm-think-las-vegas/

p.s. Ask me for a discount code!

Categories: Conferences, IBM Champion | Leave a comment

Proud to be an #IBMChampion

In one of the many jobs that I held, I remember someone telling me that they weren’t sharing knowledge of something because being the only person who knew that gave them job security. I’ve always felt that was foolish. If you’re the only person who knows something, you get stuck doing that same stupid, repetitive task whenever it needs doing. I can think of nothing more boring for a developer than hoarding knowledge. I don’t ever want to have to do the same thing twice – unless doing it the second time is showing someone else how to do it or demonstrating the way things “used to get done” before I automated the task.

So, it’s no surprise that I’ve always wanted to share whatever knowledge I’ve acquired. Sometimes, the knowledge I’ve got to share is so esoteric that people’s eyes glaze over and they ask me to stop talking. (Like the time I was about to talk for 30 minutes about Miljenko Grgich’s life story and how that related to the wine I’d poured, when my best man, Stu Shelton intervened…) Fortunately, in the IBM ICS community, among committed “Lotus” Notes professionals, any short or long blog post, knowledge sharing session or roundtables veritably demands the sharing of esoterica.

Now, here’s the thing. It’s not only encouraged – it is REWARDED.

I really wasn’t good at sharing outside my own work groups until about 2013. I was at Lotusphere (whatever it was called that year) and Marky Roden saw me singing karaoke. As horrid a singer as I am, I believe his thinking had to be ‘anyone who can risk that kind of humiliation can be a conference speaker’. He’d seen me answering questions on Stack Overflow (I’d gotten over a few hundred reputation points and he was shocked that he didn’t know who I was – no one did!) So, he told me I needed to become a speaker.

I’ve worked on that, but not made the breakthrough to the big time. I’ve blogged over the years, helped out with the DC Lotus Users Group (presented once!), participated in more IBM calls/sessions and gotten very involved in MWLUG. I even was on a panel at Connect 2017, talking to executives about technology trends. Then, when Richard Moy asked, I jumped at the chance to be the local host for MWLUG 2017, here in Alexandria, VA.

Nonetheless, I still have felt like I wasn’t doing all that much. I was living with The Imposter Syndrome. So, when time for IBM Champion nominations came round, I would nominate a few people and then cringe at the idea of nominating myself. I’d been nominated by others, but couldn’t get myself to fill out the paperwork. Howard Greenberg was the first to push me toward this. In 2016, Kathy Brown nominated me and pushed me to nominate myself. I wilted when looking at the paperwork, even though it was something I really wanted.

This year, I finally filled out the paperwork, to go with nominations from a few community members and, to my shock, I didn’t need to write a “it was an honor to be nominated” speech while talking about the great stable of Champions out there. I’m swollen with pride. I was named an IBM Champion.

I’ll get some swag. I’ll get a chance to participate more fully in our community. I’ll have a chance for some recognition at the various conferences. I’m excited about it and vowing to put even more effort into our community.

I want to thank everyone who has encouraged me along the way and those who played a particular role in this process. I hope that I can be your humble and obedient servant, sharing knowledge and helping build our community.

Categories: Conferences, IBM Champion | 2 Comments
