Reverse Engineering GPT-4o
How does GPT-4o work? This is based purely on speculation; I have no inside information, and all numbers were pulled out of my ass. It's just my highly detailed, educated guess at how the model works, along with a PyTorch implementation. If that sounds interesting, keep reading. Tokenization: the cornerstone of GPT-4o's capabilities lies in its unified representation of diverse input types. Here's how each modality is tokenized. Text tokenization: if you're familiar with LLMs, you can skip this section....
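As a taste of the text-tokenization step the post walks through, here is a minimal sketch. It assumes GPT-4o uses the o200k_base BPE vocabulary that OpenAI exposes through its tiktoken library; the example text is mine, not from the post.

```python
# Minimal text-tokenization sketch, assuming GPT-4o's tokenizer is the
# o200k_base BPE encoding shipped with OpenAI's tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

text = "How does GPT-4o work?"
tokens = enc.encode(text)      # string -> list of integer token ids
decoded = enc.decode(tokens)   # token ids -> original string

print(tokens)                  # a short list of integer ids
assert decoded == text         # encode/decode round-trips losslessly
```

The same id sequence is what a decoder-only transformer would consume as input; the other modalities would need their own encoders to map audio or images into the same token space.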
Open-sourcing 100 Hours of Conversational Audio (Daily Yap)
In the rapidly evolving field of artificial intelligence, one critical gap became apparent to me while I was helping my friend's startup locate data for training their multimodal model: the scarcity of high-quality conversational audio datasets. Existing datasets often fall short in several key areas. Content of conversations: most available datasets consist of assistant-user exchanges that don't capture the complexity and subject matter of real-world dialogues. Audio-text alignment: few datasets provide both high-quality audio and accurately transcribed text, perfectly aligned....
Large Language Models for Magic: The Gathering
Magic: The Gathering (MTG) has always fascinated me with its complexity and strategic depth, thanks to its extensive rulebook and vast array of unique cards. Despite the game's popularity, AI systems specifically designed for MTG have been few and far between, often falling short due to their inability to accurately interpret the intricate rules and interactions between cards. This blog post chronicles my recent endeavor to bridge this gap with large language models (LLMs) by creating a specialized dataset and evaluation metric to improve AI performance in MTG-related tasks....