This post details Manamorphosis, a first-of-its-kind diffusion model developed to complete Magic: The Gathering decklists. It takes a set of known cards and fills in the rest to form a 60-card main deck. Then, using the completed main deck as context, it can complete a 15-card sideboard. The core generative mechanism is based on Denoising Diffusion Probabilistic Models (DDPMs), the same family of models powering many image generation systems like Stable Diffusion and Midjourney, adapted here for the unique domain of card sets. Applying AI models to MTG has long been a pet project of mine, and I'm excited to share this model, as I believe it is the state-of-the-art (and only) AI model dedicated to understanding deck construction.
Demo Video
Card Representation: Doc2Vec Embeddings
Card identity is represented by 128-dimensional vectors (`EMB_DIM=128`). These embeddings are generated by training a Doc2Vec model (`train_embedding_model.py`) on preprocessed text data for each card obtained from MTGJSON's `AtomicCards.json`. This captures semantic relationships based on card text. Much like modern RAG engines use embeddings to understand the meaning behind search queries and documents, these Doc2Vec embeddings allow the system to grasp the functional similarities between cards based on their textual descriptions (cost, type, rules).
Rationale for Using Embeddings
Instead of training the model to directly predict discrete cards from a vocabulary of ~28,000+, Manamorphosis uses pre-trained Doc2Vec embeddings as an intermediate representation. This approach offers several advantages:
- Handling Data Sparsity: The training dataset (~47,000 decks) contains only a fraction (~5,000) of all legal MTG cards. A model predicting cards directly would struggle to learn meaningful representations or generation logic for the vast majority of cards rarely or never seen during training. The embedding model, trained on all card text, provides a representation for every card.
- Generalization to New/Unseen Cards: Because the embedding is derived from card text, the system can generate an embedding for any card, including newly released ones, without retraining the embedding model (though retraining the diffusion model might improve performance with new metagames). The diffusion model learns to operate on the semantic meaning captured in the 128-dimensional embedding space, rather than being limited to the fixed vocabulary seen during its own training. This contrasts with models that predict discrete tokens, which typically require a fixed, predefined vocabulary. (A short sketch of embedding an unseen card appears after this list.)
- Capturing Semantic Relationships: Doc2Vec learns vectors where cards with similar functions, costs, types, or textual patterns (e.g., different variations of counterspells, cheap red burn spells, evasive creatures) are closer together in the embedding space. This allows the diffusion model to learn higher-level concepts (“needs more removal,” “add card draw”) rather than just memorizing specific card co-occurrences, leading to potentially more robust and contextually relevant deck completions. This focus on semantic similarity is analogous to how embedding-based search engines return results that are conceptually related, not just keyword matches.
- Dimensionality Reduction & Decoupling: Working with dense 128-dimensional vectors is more computationally manageable for the transformer architecture than using extremely high-dimensional one-hot vectors (one per card). It also decouples the task of understanding card text semantics (Doc2Vec) from the task of generative deck construction (Diffusion Model).
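To make the generalization point above concrete, here is a hedged sketch of embedding a card that may never appear in the deck dataset, using gensim's Doc2Vec. The model file name is hypothetical, and `get_text` is assumed to be importable from the preprocessing script described in the next subsection:

```python
# Sketch: embedding an arbitrary (possibly unseen) card with the trained Doc2Vec model.
# Assumes gensim is installed and get_text() is importable from train_embedding_model.py;
# the model file name "card_doc2vec.model" is hypothetical.
from gensim.models.doc2vec import Doc2Vec
from train_embedding_model import get_text

model = Doc2Vec.load("card_doc2vec.model")

new_card = {
    "name": "Lightning Strike",
    "manaCost": "{1}{R}",
    "type": "Instant",
    "text": "Lightning Strike deals 3 damage to any target.",
}

tokens = get_text(new_card).split()             # same preprocessing as training
vec = model.infer_vector(tokens)                # 128-dim embedding, no retraining needed
similar = model.dv.most_similar([vec], topn=5)  # nearest cards in embedding space
print(similar)
```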
Card Text Preprocessing (`train_embedding_model.py:get_text`)
The `get_text` function preprocesses card data into a consistent string format suitable for Doc2Vec training. Specific choices include:
- Mana Cost: Replaced curly braces `{}` with pipe symbols `|` (`|W|`, `|U|`, etc.) and ensured spacing around symbols (`{W}{U}` -> `|W| |U|`). This treats each mana symbol as a distinct token and distinguishes them from mana symbols in the card text.
- Power/Toughness: Represented as `$Power$ #Toughness#` (e.g., `$2$ #2#`). This creates unique tokens for P/T values.
- Card Type: The supertype (e.g., "Creature", "Instant") is split into individual tokens surrounded by pipes (`|Creature|`, `|Instant|`). Subtypes (e.g., "Goblin", "Wizard") are kept as single words.
- Rules Text:
  - Card name references replaced with `@`. This prevents the model from overfitting to specific card names and focuses on the actions/effects.
  - Common self-references ("this creature", "this enchantment", etc.) also replaced with `@`.
  - Line breaks and semicolons replaced with spaces. Colons have a space added before them (`:` -> ` :`).
  - Reminder text (within parentheses) is removed using regex (`re.sub(reminder_remover, '', ...)`).
  - Special characters like `&`, `−`, `—`, `'`, `,`, `.`, and `"` are handled (replaced or removed).
  - Text is converted to lowercase.
- Stop Words: Common English stop words (like "the", "a", "is") are removed using `nltk.corpus.stopwords` to reduce noise and focus on meaningful terms.
This preprocessing aims to convert structured card information and natural language text into a sequence of meaningful tokens that the Doc2Vec model can learn relationships from.
```python
# From train_embedding_model.py (Illustrative snippet)
def get_text(card):
    text = ''
    if 'manaCost' in card:
        text += card['manaCost'].replace('}{', '} {').replace('{', '|').replace('}', '|') + ' '
    if 'power' in card:
        text += '$' + card['power'] + '$ #' + card['toughness'] + '# '
    text += ' '.join(['|' + word + '|' for word in card['type'].split(' — ')[0].split()]) + ' '
    if '—' in card['type']:
        text += card['type'].split(' — ')[1] + ' '
    if 'text' in card:
        # ... (Handling basic lands) ...
        processed_text = card['text'].replace('&', 'and').replace(card['name'], '@').replace(card['name'].split(',')[0], '@')  # simplified
        processed_text = processed_text.replace('this creature', '@').replace('this enchantment', '@').replace('this artifact', '@').replace('this land', '@')
        processed_text = processed_text.replace('\n', ' ').replace(';', ' ').replace(':', ' :').replace('|', '•')
        text += processed_text
    text = re.sub(reminder_remover, '', text.lower())  # ... (further lowercasing, punctuation handling, etc.)
    words = [word for word in text.split(' ') if word != '']
    filtered_words = [word for word in words if word not in stop_words]
    return ' '.join(filtered_words)

# Doc2Vec Training (Conceptual)
model = Doc2Vec(vector_size=128, dm=0, dbow_words=0, min_count=2, epochs=200, workers=cores, seed=42)
model.build_vocab(corpus)  # corpus yields TaggedDocument(processed_text.split(), [card_id])
model.train(corpus, ...)
```
A simple linear classifier (`train_embedding_classifier.py`) is trained separately to map these 128-dim embeddings back to unique card indices. This classifier is essential during the reverse diffusion process to identify the most likely card corresponding to a denoised embedding vector. This is more efficient than performing a cosine similarity search for each generated embedding during inference.
```python
# From train_embedding_classifier.py
class CardClassifier(nn.Module):
    def __init__(self, embedding_dim, num_classes):
        super(CardClassifier, self).__init__()
        self.network = nn.Linear(embedding_dim, num_classes)

    def forward(self, x):
        return self.network(x)
```
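For intuition, here is a minimal sketch (not the repository's exact inference code) of mapping denoised embeddings back to card names with this classifier; `index_to_name` and the tensor shapes are illustrative assumptions:

```python
import torch

# denoised_emb: [batch, 60, 128] embeddings produced by the reverse diffusion process
# classifier:   trained CardClassifier(embedding_dim=128, num_classes=num_cards)
# index_to_name: assumed index -> card name mapping saved alongside the classifier
with torch.no_grad():
    logits = classifier(denoised_emb)      # [batch, 60, num_cards]
    card_indices = logits.argmax(dim=-1)   # most likely card per slot
deck_names = [[index_to_name[i.item()] for i in row] for row in card_indices]
```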
Denoising Diffusion Probabilistic Model (DDPM) Framework
The core generation mechanism is a DDPM, mirroring the approach used in image generation.
Forward Process (Noise Addition): Starting with the true deck embeddings `x0`, Gaussian noise is progressively added over `T` timesteps (here, `T=1000`). This is analogous to how image diffusion models start with a clear image and gradually add noise until only static remains. The noise level at each step is determined by a predefined variance schedule, specifically a cosine schedule (`cosine_beta_schedule`).

```python
# diffusion_model.py
def cosine_beta_schedule(T, s=0.008):
    steps = torch.linspace(0, T, T + 1, dtype=torch.float64)
    alpha_bar = torch.cos(((steps / T) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alpha_bar = alpha_bar / alpha_bar[0]
    betas = 1 - (alpha_bar[1:] / alpha_bar[:-1])
    return torch.clip(betas, 0, 0.999).float()

# Within DiffusionTrainer class:
beta = cosine_beta_schedule(T).to(device)  # T = TIMESTEPS (e.g., 1000)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)
# Precompute terms used in diffusion and sampling:
self.register("sqrt_alpha_bar", torch.sqrt(alpha_bar))
self.register("sqrt_one_minus_alpha_bar", torch.sqrt(1.0 - alpha_bar))
self.register("beta", beta)
self.register("alpha", alpha)
# ... (other registered buffers)
```
The state `x_t` at timestep `t` can be sampled directly using the cumulative product `alpha_bar`: `x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise`

```python
# diffusion_model.py: DiffusionTrainer.q_sample
# Note: The actual implementation modifies x_t for known cards
# based on the mask *after* the initial noise addition,
# returning a mix of original x0 and noised x_t.
def q_sample(self, x0, t, mask):
    mask_expanded = mask.expand_as(x0)
    sqrt_ab = self._extract(self.sqrt_alpha_bar, t, x0.shape, x0.device)
    sqrt_1mab = self._extract(self.sqrt_one_minus_alpha_bar, t, x0.shape, x0.device)
    noise = torch.randn_like(x0)
    # Calculate the fully noised version first
    x_t_noised = sqrt_ab * x0 + sqrt_1mab * noise
    # Return original embeddings for known positions, noised for unknown
    x_t_masked = mask_expanded * x0 + (1 - mask_expanded) * x_t_noised
    # Returns the masked noisy sample and the *original* noise (for loss)
    return x_t_masked, noise
```
Reverse Process (Denoising): The model learns to predict the noise `epsilon` added at timestep `t`. Starting from pure noise `x_T`, the model iteratively refines the embeddings by predicting the noise `epsilon_pred = model(x_t, x0, sb_x_t, t, mask, sb_mask)` and estimating `x_{t-1}`, until `x0`, the original noise-free main deck, is reached. During inference, each denoising step samples `x_{t-1}` by subtracting the predicted noise `epsilon_pred` from `x_t`, then adding a smaller amount of noise for the next timestep.
Model Architecture (`diffusion_model.py:DiffusionModel`)
The model uses a transformer-based architecture, the same core building block behind Large Language Models like GPT and BERT, but adapted for set-based data rather than sequences. It lacks positional embeddings, treating decks as unordered sets, which differs from typical NLP or vision transformer usage where sequence order is crucial. It has distinct paths for main deck and sideboard processing. The internal model dimension is `model_dim=384`, and the embedding dimension is `EMB_DIM=128`.
Time Embeddings: Timestep `t` is encoded using standard sinusoidal embeddings, processed by separate MLPs for the main deck and sideboard paths.

```python
# diffusion_model.py
def sinusoidal_embedding(t: torch.Tensor, dim: int = EMB_DIM):
    half = dim // 2
    freqs = torch.exp(-math.log(10000) * torch.arange(half, device=t.device) / (half - 1))
    args = t[:, None] * freqs[None]
    emb = torch.cat((args.sin(), args.cos()), dim=-1)
    if dim % 2:
        emb = nn.functional.pad(emb, (0, 1))
    return emb

# Within DiffusionModel.__init__
ff_dim = cfg["dim_feedforward"]  # 3072
self.main_time_mlp = nn.Sequential(
    nn.Linear(EMB_DIM, ff_dim),
    nn.SiLU(),
    nn.Linear(ff_dim, EMB_DIM),
)
self.sb_time_mlp = nn.Sequential(  # For Sideboard Decoder path
    nn.Linear(EMB_DIM, ff_dim),
    nn.SiLU(),
    nn.Linear(ff_dim, EMB_DIM),
)
```
Mask Embeddings: Binary masks (1.0 for known, 0.0 for unknown) are processed by separate MLPs.

```python
# Within DiffusionModel.__init__
self.main_mask_mlp = nn.Sequential(
    nn.Linear(1, EMB_DIM),
    nn.SiLU(),
    nn.Linear(EMB_DIM, EMB_DIM),
)
self.sb_mask_mlp = nn.Sequential(  # For Sideboard Decoder path
    nn.Linear(1, ff_dim),
    nn.SiLU(),
    nn.Linear(ff_dim, EMB_DIM),  # Note: intermediate dim is ff_dim here
)
```
Input Processing: Input embeddings `x_t` (main) or `sb_x_t` (sideboard) are combined via addition with their respective time and mask embeddings, then projected to `model_dim`.

```python
# Within DiffusionModel.forward
sin_emb = sinusoidal_embedding(t, EMB_DIM)  # Shape: [Batch, EMB_DIM]

# Main Deck Input
main_t_emb_flat = self.main_time_mlp(sin_emb)  # Shape: [Batch, EMB_DIM]
main_t_emb = main_t_emb_flat[:, None, :].expand(-1, DECK_SIZE, -1)  # Shape: [Batch, DECK_SIZE, EMB_DIM]
main_mask_emb = self.main_mask_mlp(mask)  # mask shape: [Batch, DECK_SIZE, 1] -> Output: [Batch, DECK_SIZE, EMB_DIM]
h_main = x_t + main_t_emb + main_mask_emb  # x_t shape: [Batch, DECK_SIZE, EMB_DIM]
h_main_proj = self.main_input_proj(h_main)  # Linear(EMB_DIM, model_dim) -> Shape: [Batch, DECK_SIZE, model_dim]

# Sideboard Input
sb_decoder_t_emb_flat = self.sb_time_mlp(sin_emb)  # Shape: [Batch, EMB_DIM]
sb_decoder_t_emb = sb_decoder_t_emb_flat[:, None, :].expand(-1, SIDEBOARD_SIZE, -1)  # Shape: [Batch, SIDEBOARD_SIZE, EMB_DIM]
sb_decoder_mask_emb = self.sb_mask_mlp(sb_mask)  # sb_mask shape: [Batch, SIDEBOARD_SIZE, 1] -> Output: [Batch, SIDEBOARD_SIZE, EMB_DIM]
h_sb = sb_x_t + sb_decoder_t_emb + sb_decoder_mask_emb  # sb_x_t shape: [Batch, SIDEBOARD_SIZE, EMB_DIM]
h_sb_proj = self.sb_input_proj(h_sb)  # Linear(EMB_DIM, model_dim) -> Shape: [Batch, SIDEBOARD_SIZE, model_dim]
```
Main Deck Path (Encoder): Processes `h_main_proj` through a standard `nn.TransformerEncoder` (`layers=8`, `nhead=8`). The output is projected back to `EMB_DIM` to predict main deck noise (`main_noise_pred`).

```python
# Within DiffusionModel.__init__
main_encoder_layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=nhead, ...)
self.main_transformer_encoder = nn.TransformerEncoder(main_encoder_layer, num_layers=num_layers)
self.main_output_proj = nn.Linear(model_dim, EMB_DIM)

# Within DiffusionModel.forward
main_encoded = self.main_transformer_encoder(h_main_proj)
main_noise_pred = self.main_output_proj(main_encoded)
```
Sideboard Context Path (Encoder): Processes the original main deck embeddings `x0` (noise-free) through a separate, shallow `nn.TransformerEncoder` (`num_layers=1`) to create context (`sb_context_encoded`). This context is used by the sideboard decoder.

```python
# Within DiffusionModel.__init__
self.sb_context_input_proj = nn.Linear(EMB_DIM, model_dim)
sb_context_encoder_layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=nhead, ...)
self.sideboard_context_encoder = nn.TransformerEncoder(sb_context_encoder_layer, num_layers=1)

# Within DiffusionModel.forward
h_sb_context = x0  # Original main deck embeddings
h_sb_context_proj = self.sb_context_input_proj(h_sb_context)
sb_context_encoded = self.sideboard_context_encoder(h_sb_context_proj)
```
Sideboard Path (Decoder): Processes projected sideboard embeddings `h_sb_proj` using a `nn.TransformerDecoder` (`num_layers=1`), conditioned on `sb_context_encoded` via cross-attention (`memory`). The decoder output is passed through another `nn.TransformerEncoder` (`sb_layers=8`) before the final projection back to `EMB_DIM` to predict sideboard noise (`sb_noise_pred`). The use of cross-attention to condition the sideboard generation on the main deck context is conceptually similar to how text-to-image models use cross-attention to condition image generation on a text prompt embedding.

```python
# Within DiffusionModel.__init__
sb_decoder_layer = nn.TransformerDecoderLayer(d_model=model_dim, nhead=nhead, ...)
self.sb_transformer_decoder = nn.TransformerDecoder(sb_decoder_layer, num_layers=1)
# Uses the same sb_context_encoder_layer definition for the subsequent encoder
self.sb_transformer_output = nn.TransformerEncoder(sb_context_encoder_layer, num_layers=sb_num_layers)
self.sb_output_proj = nn.Linear(model_dim, EMB_DIM)

# Within DiffusionModel.forward
sb_decoded = self.sb_transformer_decoder(tgt=h_sb_proj, memory=sb_context_encoded)
sb_decoded = self.sb_transformer_output(sb_decoded)
sb_noise_pred = self.sb_output_proj(sb_decoded)
```
Conditioning: Masking Strategy
Conditional generation (deck completion) is handled via masking. Known card embeddings are provided by the user (or determined during training). This mechanism is analogous to providing a starting image and a mask for inpainting in image generation models, or providing a text prompt to guide generation.
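For concreteness, here is a hedged sketch of how a user's partial main deck could be turned into the known-embedding and mask tensors described here. The helper names are illustrative; `card_embeddings.pkl` is the embedding file mentioned earlier and is assumed to be a dict from card name to 128-dim vector:

```python
import pickle
import torch

DECK_SIZE, EMB_DIM = 60, 128

with open("card_embeddings.pkl", "rb") as f:
    card_embeddings = pickle.load(f)  # assumed: dict of card name -> 128-dim vector

def encode_partial_deck(known_cards):
    """known_cards: list of card names (with repeats for multiple copies), length <= 60."""
    x0 = torch.zeros(1, DECK_SIZE, EMB_DIM)
    mask = torch.zeros(1, DECK_SIZE, 1)
    for slot, name in enumerate(known_cards):
        x0[0, slot] = torch.tensor(card_embeddings[name])
        mask[0, slot] = 1.0  # 1.0 = known, 0.0 = to be generated
    return x0, mask

x0_known, mask = encode_partial_deck(["Lightning Bolt"] * 4 + ["Mountain"] * 18)
```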
Training: For each training sample (`x0_embeddings`, `x0_indices`, etc.), multiple masks (`masks_per_deck`) are generated dynamically per deck.
- The number of known main deck cards `k_main` is sampled from partitioned ranges within [1, 59] across the generated masks to ensure diverse `k` values are seen.
- The sideboard count `k_sb` is sampled randomly from [1, 14], with a 50% chance of being forced to 0. This is done so that the model performs well at generating sideboards from scratch, which I expect to be a common use case. (A sketch of this k-sampling appears right after this list.)
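As referenced above, a hedged sketch of how these k values might be sampled; the exact partitioning logic in the repository may differ:

```python
import random

def sample_k_values(masks_per_deck):
    """Sample how many cards are 'known' for each of the masks generated per deck (illustrative)."""
    ks = []
    # Partition [1, 59] into masks_per_deck contiguous ranges so diverse k values are covered
    boundaries = [1 + i * 59 // masks_per_deck for i in range(masks_per_deck + 1)]
    for i in range(masks_per_deck):
        lo, hi = boundaries[i], max(boundaries[i], boundaries[i + 1] - 1)
        k_main = random.randint(lo, hi)
        # Sideboard: 50% chance of 0 known cards, otherwise 1-14
        k_sb = 0 if random.random() < 0.5 else random.randint(1, 14)
        ks.append((k_main, k_sb))
    return ks
```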
Masking Logic (`diffusion_model.py:DiffusionTrainer._create_mask_row`): This function generates a single mask row (shape `[deck_size, 1]`) for a target `k`. It identifies unique available card indices in the current deck (`current_deck_indices`). It samples these unique cards without replacement using weights derived from pre-calculated popularity scores (`self.card_popularity`), slightly favoring less popular cards (`0.5 + score`, where score is `1.0 - normalized_count`). It then iterates through these weighted, shuffled unique cards. For each unique card, with 85% probability it masks (marks as known) all available copies (up to `k` remaining); with 15% probability it masks a random number of available copies (from 1 up to the number available, limited by `k` remaining). This process repeats until `k` positions are marked as known (1.0 in the mask). We typically want to mask all copies of a card at once to make the task harder, so that the model has to learn which cards go together rather than simply adding additional copies of a partially revealed card.

```python
# diffusion_model.py: DiffusionTrainer._create_mask_row (Simplified Pseudocode)
def _create_mask_row(k_target, deck_size, current_deck_indices, popularity_scores):
    mask_row = torch.zeros(deck_size, 1)
    available_indices = torch.ones(deck_size, dtype=torch.bool)
    masked_count = 0
    while masked_count < k_target and available_indices.any():
        # 1. Get unique card indices from currently *available* positions
        unique_cards = torch.unique(current_deck_indices[available_indices])
        if unique_cards.numel() == 0:
            break
        # 2. Calculate sampling weights (favor less popular cards)
        weights = torch.tensor([0.5 + popularity_scores.get(idx.item(), 1.0) for idx in unique_cards])
        # 3. Sample unique cards without replacement based on weights
        perm_indices = torch.multinomial(weights, num_samples=len(unique_cards), replacement=False)
        shuffled_unique_cards = unique_cards[perm_indices]
        for card_idx in shuffled_unique_cards:
            if masked_count >= k_target:
                break
            # 4. Find available positions for this specific card_idx
            potential_pos = (current_deck_indices == card_idx).nonzero(as_tuple=True)[0]
            available_pos = potential_pos[available_indices[potential_pos]]  # Filter by availability
            available_count = len(available_pos)
            if available_count == 0:
                continue
            needed = k_target - masked_count
            # 5. Decide how many copies to mark as known (85% all available, 15% random count)
            if random.random() < 0.85:
                num_to_mask = min(available_count, needed)
            else:
                max_can_mask = min(available_count, needed)
                if max_can_mask <= 0:
                    continue
                num_to_mask = random.randint(1, max(1, max_can_mask))
            # 6. Select specific positions to mask and update mask_row / available_indices
            indices_to_mask = available_pos[torch.randperm(available_count)[:num_to_mask]]
            mask_row[indices_to_mask] = 1.0
            available_indices[indices_to_mask] = False
            masked_count += num_to_mask
    return mask_row
```
Loss Calculation: The MSE loss is computed only between the predicted noise (`main_noise_pred`, `sb_noise_pred`) and the true noise (`noise`, `sb_noise` from `q_sample`) for the unknown (mask value 0.0) card slots. This focuses the model on learning to generate the missing parts.

```python
# diffusion_model.py: DiffusionTrainer.p_losses
main_loss = ((noise - main_noise_pred) * (1 - mask.expand_as(noise))).pow(2).mean()
sb_loss = ((sb_noise - sb_noise_pred) * (1 - sb_mask.expand_as(sb_noise))).pow(2).mean()
total_loss = main_loss + sb_loss
```
Inference: During the reverse diffusion process (sampling `x_{t-1}` from `x_t`), the known card embeddings `x0_known` (provided by the user) are reapplied at each step to guide the generation towards the desired completion. A common approach (simplified; a code sketch follows this list):
- Predict noise: `epsilon_pred = model(x_t, x0_context, sb_x_t, t, mask, sb_mask)`.
- Calculate the parameters (mean, variance) of the distribution `p(x_{t-1} | x_t)` using `x_t`, `t`, and `epsilon_pred` according to the diffusion schedule.
- Sample the potential next state `x_{t-1}_sample` from this distribution (adding noise if `t > 0`, otherwise using the mean).
- Re-apply knowns to the sample: `x_{t-1}_conditioned = mask * x0_known + (1 - mask) * x_{t-1}_sample`.
- Use `x_{t-1}_conditioned` as the input `x_t` for the next step (which produces `x_{t-2}`). The main deck context `sb_context_encoded` for the sideboard decoder is generated from the final denoised main deck embeddings (`x0_main_final`).
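Below is a simplified, hedged sketch of such a conditioned sampling loop for the main deck, using the standard DDPM update. The model call signature and the trainer's buffer names follow the descriptions above, but the repository's exact code may differ:

```python
import torch

@torch.no_grad()
def conditioned_sample(model, trainer, x0_known, mask, sb_x_t, sb_mask, T):
    """Reverse diffusion for the main deck, re-applying known embeddings at every step (sketch)."""
    x_t = torch.randn_like(x0_known)           # start from pure noise
    x_t = mask * x0_known + (1 - mask) * x_t   # knowns fixed from the start
    for t in reversed(range(T)):
        t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)
        # Assumed to return (main_noise_pred, sb_noise_pred)
        eps_pred, _ = model(x_t, x0_known, sb_x_t, t_batch, mask, sb_mask)
        beta_t = trainer.beta[t]
        alpha_t = trainer.alpha[t]
        sqrt_1m_ab_t = trainer.sqrt_one_minus_alpha_bar[t]
        # Standard DDPM posterior mean: remove the predicted noise contribution
        mean = (x_t - beta_t / sqrt_1m_ab_t * eps_pred) / torch.sqrt(alpha_t)
        if t > 0:
            x_t = mean + torch.sqrt(beta_t) * torch.randn_like(x_t)  # add noise for next step
        else:
            x_t = mean
        # Re-apply the user-provided (known) embeddings
        x_t = mask * x0_known + (1 - mask) * x_t
    return x_t  # denoised main-deck embeddings (x0_main_final)
```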
Training (`diffusion_model.py:DiffusionTrainer`)
The current model was trained on ~47,000 decks scraped from MTGTop8 and is format-agnostic, with the training data covering Standard, Modern, Pioneer, Pauper, Legacy, and Vintage. The full model contains ~56 million parameters. Due to its small size, training was feasible on consumer hardware, specifically a single Nvidia 3050 Laptop GPU with 4GB of VRAM, taking roughly 4 days to complete 100 epochs.
- Dataset: `DeckDataset` loads decks and filters for exact 60-card main deck / 15-card sideboard counts. It converts card names to the pre-trained Doc2Vec embeddings (`card_embeddings.pkl`) and retrieves the corresponding integer indices using the mapping from the trained classifier (`card_classifier.pt`). Decks with cards missing from the embeddings or the classifier map are skipped. It also calculates card popularity scores based on deck frequency for the masking strategy.
- Optimizer: AdamW (`torch.optim.AdamW`) with learning rate `lr=5e-6` (default) and weight decay `diff_weight_decay=1e-3` (default).
- Objective: Minimize the combined MSE loss `total_loss` described above, calculated over `masks_per_deck` different masks for each deck in the batch.
- Process: A standard PyTorch training loop: iterate over epochs, load batches via a DataLoader, compute the loss with `p_losses`, backpropagate (`total_loss.backward()`), clip gradients (`nn.utils.clip_grad_norm_`), and step the optimizer (`opt.step()`). Checkpoints containing the model state dict, epoch, and config are saved periodically (`torch.save`). A minimal sketch of this loop appears below.
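As referenced in the Process bullet, a stripped-down, hedged version of the loop might look like this; the batch size, the `p_losses` signature, and the dataset item format are assumptions rather than the repository's exact code:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

# Assumed setup: dataset = DeckDataset(...); model = DiffusionModel(cfg); trainer = DiffusionTrainer(model, ...)
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # batch size illustrative
opt = torch.optim.AdamW(model.parameters(), lr=5e-6, weight_decay=1e-3)

for epoch in range(100):
    for x0, x0_indices, sb_x0, sb_indices in loader:          # item format assumed
        t = torch.randint(0, trainer.T, (x0.shape[0],), device=x0.device)
        loss = trainer.p_losses(x0, x0_indices, sb_x0, sb_indices, t)  # signature assumed
        opt.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
    # Periodic checkpoint with model state, epoch, and config
    torch.save({"model": model.state_dict(), "epoch": epoch, "config": cfg}, f"ckpt_{epoch}.pt")
```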
Inference: Enforcing Deck Rules with Iterative Refinement
While the diffusion model learns the underlying patterns of deck construction from the training data, it doesn’t inherently guarantee adherence to strict game rules like the 4-copy limit for non-basic cards or format legality during the raw generation process. To address this, the inference functions employ an iterative refinement strategy after the initial denoising pass:
1. Initial Generation: The standard reverse diffusion process is performed once to generate initial embeddings for all unknown card slots, conditioned on any user-provided cards.
2. Classification & Rule Check: The resulting embeddings (both originally known and newly generated) are converted back to card names using the trained linear classifier (`CardClassifier`). The system then checks for violations:
   - 4-Copy Limit: It counts occurrences of each non-basic card name. For sideboard generation, this count considers cards in both the main deck and the current sideboard iteration.
   - Format Legality: Using preloaded card data (derived from MTGJSON's `AtomicCards.json`), it verifies that each generated card is legal in the specified format (e.g., 'Modern', 'Standard'). Basic lands are exempt from this check.
3. Identify Violations: The system identifies the specific generated card slots that violate either the 4-copy limit or format legality. User-provided cards are never marked for regeneration.
4. Mask Update & Regeneration: A new mask is created. User-provided cards and valid generated cards from the current iteration are marked as "known". Slots corresponding to rule violations are marked as "unknown".
5. Re-run Diffusion: The reverse diffusion sampling process is run again, using the updated mask and the embeddings of the known cards (including the valid generated ones) as fixed context. The model only needs to generate new embeddings for the slots marked as unknown due to rule violations.
6. Repeat: Steps 2-5 are repeated up to a fixed maximum number of refinement iterations (`MAX_REFINEMENT_ITERATIONS`). This loop continues until no rule violations are found among the generated cards or the iteration limit is reached. (A sketch of this loop follows.)
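A hedged sketch of the refinement loop described above; the helpers `sample_fn`, `classify_fn`, `is_basic_land`, and `is_legal`, as well as the iteration cap value, are illustrative stand-ins for the logic described in the steps:

```python
from collections import Counter

MAX_REFINEMENT_ITERATIONS = 5  # illustrative value

def refine_main_deck(sample_fn, classify_fn, x0_known, mask, format_name):
    """mask: length-60 sequence, 1.0 = fixed (user-provided), 0.0 = generated (sketch only)."""
    x_gen = sample_fn(x0_known, mask)                  # step 1: initial reverse diffusion
    for _ in range(MAX_REFINEMENT_ITERATIONS):
        names = classify_fn(x_gen)                     # step 2: embeddings -> card names
        copies_seen = Counter()
        bad_slots = []
        for slot, name in enumerate(names):
            copies_seen[name] += 1
            if mask[slot] == 1.0:
                continue                               # user-provided cards are never regenerated
            over_limit = not is_basic_land(name) and copies_seen[name] > 4
            illegal = not is_basic_land(name) and not is_legal(name, format_name)
            if over_limit or illegal:
                bad_slots.append(slot)                 # step 3: identify violating slots
        if not bad_slots:
            break                                      # step 6: stop once everything is valid
        new_mask = [1.0] * len(names)                  # step 4: valid cards become "known"
        for slot in bad_slots:
            new_mask[slot] = 0.0                       # violating slots are regenerated
        x_gen = sample_fn(x_gen, new_mask)             # step 5: re-run diffusion with updated mask
    return x_gen
```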
This refinement loop significantly improves the likelihood of producing valid and legal deck completions by correcting rule violations after the initial generation, leveraging the classifier and external card data to guide the process without needing to bake these complex constraints directly into the diffusion model’s training objective. The final deck combines the original user input with the cards generated and refined through this process.
Conclusion
Manamorphosis applies diffusion models to Magic: The Gathering deck generation and was a fun but time-consuming project. Developing this system, including embedding tuning, model architecture experiments, and refinement loops, was a substantial effort (two weeks of me procrastinating before finals).
While there’s always more to explore, like testing different embeddings or format specializations, the current model provides a solid foundation for AI-driven deck completion.
The complete code is available on GitHub if you’d like to run it yourself or see the nitty-gritty details: Manamorphosis GitHub repository.
Twitter: @JakeABoggs