Posts

Scalable Reinforcement Learning with LLMs - Atropos Guide

This weekend, I will be attending the Nous Research – RL Environments Hackathon, so to prepare I’ve been playing around with Atropos, their new RL framework that we will be using for the event. After failing to find any guides online, I decided to write my own. Update: I got 2nd place with VR-CLImax, my implementation of Verified Rewards via Completion Likelihood Improvement for improving LLM humor! You can find the code merged into the Atropos repository....

Evaluating Reasoning in LLMs Through MTG Deck Building

Update (2025-08-08): GPT-5, GPT-5 Mini, GPT-5 Nano, GPT-4 Turbo 11-06, and GPT-3.5 Turbo added. Update (2025-08-05): Kimi K2, GPT OSS 120B (low), and GPT OSS 120B (high) added. Update (2025-07-13): Grok 4, Grok 3, Gemini 2.5 Flash, Claude Sonnet 4 (thinking), and Command A added. Update (2025-06-11): o3 (high) added after API cost reduction. Update (2025-06-06): Deepseek R1 05-28 and Gemini 2.5 Pro 06-05 added. Update (2025-05-22): Claude Sonnet 4 and Opus 4 added....

Introducing Manamorphosis: A Diffusion Model for MTG Deck Generation

This post details Manamorphosis, a first-of-its-kind diffusion model developed to complete Magic: The Gathering decklists. It takes a set of known cards and fills in the rest to form a 60-card main deck. Subsequently, using the completed main deck as context, it can complete a 15-card sideboard. The core generative mechanism is based on Denoising Diffusion Probabilistic Models (DDPMs), the same family of models powering many image generation systems like Stable Diffusion and Midjourney, but adapted here for the unique domain of card sets....

Gait Analysis for Physical Therapy with YOLOv11

Analyzing how people walk using video is common in research and clinical settings, but getting accurate joint angles usually means either expensive equipment or manually annotating frames, which is slow and tedious. Over Easter weekend, I built a Python tool to help my mother with her research study analyzing patient videos. It uses YOLOv11-pose for automatic detection and adds an interactive interface for manual adjustments. What it Does The script (track....

AccountaBuddy: Your AI Accountability Partner - HackNC 2024

Check out the project on Devpost and view the source code on GitHub. I spent a weekend at HackNC building something to help manage all of my other side projects. What started as a simple “Wouldn’t it be cool if an AI actually rang you to ask if you’d done your work?” turned into AccountaBuddy, a lightweight app that does more than fire off push notifications - it actually calls you, celebrates your wins, and helps you problem-solve when tasks stall....

Daily Yap: A Synthetically Generated Conversational Audio Dataset

Training multimodal models often requires large, high-quality conversational audio datasets, which are scarce at the time of writing. This post details the creation of Daily Yap, a dataset I developed to address this gap. Existing conversational audio datasets present several limitations:

- Content Scope: Many datasets focus on assistant-user interactions, lacking the breadth of topics found in general human dialogues.
- Audio-Text Alignment: Datasets with precise alignment between high-quality audio and accurate transcriptions are uncommon....

Large Language Models for Magic: The Gathering

Magic: The Gathering (MTG) has always fascinated me with its complexity and strategic depth, thanks to its extensive rulebook and vast array of unique cards. Despite the game’s popularity, AI systems specifically designed for MTG have been few and far between, often falling short due to their inability to accurately interpret the intricate rules and interactions between cards. This blog post chronicles my recent endeavor to bridge this gap with large language models (LLMs) by creating a specialized dataset and evaluation metric to improve AI performance in MTG-related tasks....