Keyframe Generator Parameters
The keyframe generator is the foundation of the Metamorph pipeline, built on DiffMorpher with additional enhancements. It generates a sequence of interpolated frames between your two source images, establishing the core structure of the morphing effect.
Model Selection
Metamorph supports multiple diffusion model architectures, each with its own strengths and characteristics:
Base Stable Diffusion V1-5
The original Stable Diffusion model that offers a balanced approach to image generation.
Best for: General-purpose morphing with good quality and reasonable inference time.
Technical details: 512×512 native resolution, strong understanding of general concepts and objects.
Dreamshaper-7 (fine-tuned SD V1-5)
A fine-tuned version of Stable Diffusion V1-5 optimized for artistic and creative outputs.
Best for: Artistic morphs with enhanced details and creative interpretations.
Technical details: Based on SD V1-5 but with improved aesthetic quality and creative elements.
Base Stable Diffusion V2-1
An advanced version of Stable Diffusion with improved image quality and understanding.
Best for: High-quality morphing requiring precise details and realistic transitions.
Technical details: 768×768 native resolution, better understanding of complex scenes and compositions.
Keyframe Generation Parameters
Number of Keyframes
Determines how many intermediate frames will be generated between your two source images. More keyframes result in smoother transitions but increase generation time.
Recommended settings:
- Minimal: 4-8 frames (very fast but less smooth)
- Balanced: 12-18 frames (good balance between quality and speed)
- Maximum quality: 24-30 frames (very smooth but slower generation)
Note: When combined with FILM interpolation, even a small number of keyframes can result in a smooth final video.
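A minimal sketch of how intermediate keyframes can be spaced and interpolated in latent space. The `slerp` helper (spherical interpolation, commonly preferred over linear interpolation for diffusion latents) and the `keyframe_ts` spacing are illustrative, not Metamorph's exact implementation:

```python
import numpy as np

def slerp(z0, z1, t, eps=1e-7):
    # Spherical interpolation between two latent noise tensors.
    z0f, z1f = z0.ravel(), z1.ravel()
    cos_omega = np.dot(z0f, z1f) / (np.linalg.norm(z0f) * np.linalg.norm(z1f) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    so = np.sin(omega)
    if so < eps:
        # Nearly parallel latents: fall back to linear interpolation.
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) / so) * z0 + (np.sin(t * omega) / so) * z1

def keyframe_ts(num_keyframes):
    # Interpolation positions for the intermediate frames (endpoints excluded).
    return [(i + 1) / (num_keyframes + 1) for i in range(num_keyframes)]
```

Each `t` from `keyframe_ts` yields one interpolated latent to denoise, so generation time grows roughly linearly with the keyframe count.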
LCM-LoRA Acceleration
Latent Consistency Models with Low-Rank Adaptation (LCM-LoRA) is a universal acceleration module that dramatically reduces inference time while maintaining most of the quality of the original diffusion process.
How it works: LCM-LoRA uses a modified sampling approach that requires significantly fewer steps (often 4-8 instead of 50+) by leveraging the self-consistency property of diffusion models.
Performance impact: Can accelerate generation by 5-10x with minimal quality loss.
When to use: Enable when faster generation is required. Particularly effective when combined with FILM interpolation, as any minor quality reduction is often masked by the interpolation process.
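As a sketch of what enabling this option does under the hood: the `diffusers` calls below (swapping in `LCMScheduler` and loading the public LCM-LoRA weights) are the standard LCM-LoRA recipe, but the wiring here is illustrative rather than Metamorph's exact code. The speedup helper just makes the step-count arithmetic explicit:

```python
def enable_lcm_lora(pipe, lora_id="latent-consistency/lcm-lora-sdv1-5"):
    """Swap in the LCM scheduler and attach LCM-LoRA weights (diffusers API)."""
    from diffusers import LCMScheduler  # imported lazily; requires `diffusers`
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(lora_id)
    return pipe

def estimated_speedup(baseline_steps=50, lcm_steps=6):
    # Inference cost is dominated by UNet evaluations: one per denoising step,
    # so cutting 50+ steps down to 4-8 gives roughly a 5-10x speedup.
    return baseline_steps / lcm_steps
```

With the LCM scheduler active, the pipeline is typically called with `num_inference_steps=4` to `8` and a low guidance scale.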
Adaptive Instance Normalization (AdaIN)
Adaptive Instance Normalization improves the smoothness of transitions by normalizing the statistical properties of the interpolated latents.
How it works: AdaIN adjusts the mean and standard deviation of the interpolated latent noises per channel, enabling smoother transitions when denoised.
Technical formula:
AdaIN(x, y) = σ(y) · ((x − μ(x)) / σ(x)) + μ(y)
where x is the content input, y is the style input, μ(·) is the channel-wise mean, and σ(·) is the channel-wise standard deviation.
Recommendation: Keep enabled for most morphing scenarios for smoother results.
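A NumPy sketch of channel-wise AdaIN on a latent of shape (channels, height, width). Metamorph applies this to interpolated latent noises; the normalization itself is the standard AdaIN formula:

```python
import numpy as np

def adain(x, y, eps=1e-5):
    # AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y),
    # with mu/sigma computed per channel over the spatial dimensions.
    mu_x = x.mean(axis=(-2, -1), keepdims=True)
    sd_x = x.std(axis=(-2, -1), keepdims=True) + eps
    mu_y = y.mean(axis=(-2, -1), keepdims=True)
    sd_y = y.std(axis=(-2, -1), keepdims=True) + eps
    return sd_y * (x - mu_x) / sd_x + mu_y
```

The output keeps the spatial content of x while matching the per-channel mean and standard deviation of y, which keeps the statistics of interpolated latents consistent across the sequence.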
Reschedule Sampling
This advanced technique creates a non-linear sampling schedule based on perceptual distances between frames, ensuring more uniform visual changes across the morphing sequence.
How it works: The system first computes perceptual distances (LPIPS) between consecutive frames in an initial linear sampling pass, then uses these measurements to create a second, non-linear sampling schedule.
Benefits: Ensures more uniform perceptual changes across the entire morphing sequence, avoiding "jumps" or inconsistent transitions.
Recommendation: Keep enabled for most morphing scenarios. Particularly valuable for complex morphs between significantly different images.
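The rescheduling idea can be sketched as inverting the cumulative perceptual-distance curve so the new sample positions are equally spaced in perceptual distance. The function name and the piecewise-linear inversion are illustrative; the actual pipeline measures distances with LPIPS:

```python
import numpy as np

def reschedule(perceptual_distances, num_keyframes):
    # perceptual_distances: LPIPS distances between consecutive frames
    # from the initial linear sampling pass.
    d = np.asarray(perceptual_distances, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(d)])    # cumulative distance at each linear t
    cum /= cum[-1]                                 # normalize to [0, 1]
    t_linear = np.linspace(0.0, 1.0, len(cum))     # the original linear positions
    targets = np.linspace(0.0, 1.0, num_keyframes) # equal perceptual spacing
    # Invert the distance curve: perceptual position -> interpolation parameter t.
    return np.interp(targets, cum, t_linear)
```

Intuitively, regions where consecutive frames differ a lot (large LPIPS) receive more closely spaced sample positions, which removes visual "jumps" from the sequence.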
Text Description (Prompts)
Optional text descriptions for both source images enable text embedding interpolation, providing additional semantic guidance to the morphing process.
How it works: The system interpolates between the text embeddings of both prompts in parallel with the visual content, helping to maintain semantic coherence.
Best practices:
- Use concise, descriptive prompts that highlight key elements of each image
- Include important visual elements, styles, and moods
- Keep prompts somewhat similar in structure for smoother semantic transitions
Example: Image A: "Portrait of a young woman with blonde hair" → Image B: "Portrait of an elderly woman with gray hair"
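Text-embedding interpolation can be sketched as a per-keyframe linear blend of the two prompt embeddings. In practice the embeddings come from the diffusion model's text encoder (e.g. CLIP), which this sketch abstracts away as plain arrays:

```python
import numpy as np

def lerp_embeddings(emb_a, emb_b, num_keyframes):
    # One blended text embedding per keyframe, mirroring the visual
    # interpolation schedule (endpoints included here for clarity).
    return [(1 - t) * emb_a + t * emb_b
            for t in np.linspace(0.0, 1.0, num_keyframes)]
```

Each blended embedding conditions the denoising of the corresponding keyframe, so the semantic guidance shifts gradually from prompt A to prompt B.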