🎬 ComfyUI Digital Clone Pipeline

Complete Setup Guide — Low Tier → Mid Tier → High Tier

🟡 LOW TIER — Learning & Testing
Basic Digital Clone Pipeline
Best for: Beginners, testing, learning ComfyUI. Not for client work.
Per Month: $5–15
GPU (RunPod): RTX 3080
Paid APIs: $0
Per Video: ~20 min
1. 📸 Input: Photo + Script (Manual Upload)
Upload a single portrait photo and type your script text into ComfyUI. The voice sample used in step 2 is uploaded as-is, because there is no audio cleaning at this tier.
2. 🎙️ Voice Cloning (XTTS v2)
Coqui XTTS v2 clones a voice from a roughly 6-second audio sample. It is free and open source, and it runs on your own RunPod pod rather than through a paid API. Quality is good but not perfect.
FREE · Open Source
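
For readers who want to see what the voice cloning step does underneath the ComfyUI node, here is a minimal sketch that calls the Coqui TTS Python package directly. It assumes the `TTS` package is installed on the pod; the function name and file names are hypothetical, and the ComfyUI node in this pipeline may wrap the model differently.

```python
from pathlib import Path

# Model identifier the Coqui TTS package uses for XTTS v2.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(script_text: str, sample_wav: str, out_wav: str, language: str = "en") -> Path:
    """Speak script_text in the voice of sample_wav and return the output path."""
    # Imported lazily so this file still loads where the TTS package is absent.
    from TTS.api import TTS

    tts = TTS(XTTS_MODEL)  # downloads the weights on first use
    tts.tts_to_file(
        text=script_text,
        speaker_wav=sample_wav,  # the short reference recording
        language=language,
        file_path=out_wav,
    )
    return Path(out_wav)


# Hypothetical usage (needs the TTS package and a GPU or patient CPU):
# clone_voice("Hello, this is my digital clone.", "my_voice_6s.wav", "cloned.wav")
```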
3. 😶 Face Animation (SadTalker)
Animates a still photo to match the audio. It is an older model: it works, but head movements look slightly robotic.
FREE
4. 👄 Lip Sync (Wav2Lip)
Syncs lip movements to the cloned voice audio. Accuracy is basic, and artifacts around the mouth become visible at higher output resolutions.
FREE
5. ✨ Face Restoration (GFPGAN)
Restores face quality after lip sync distortion. It is an older model that leaves a noticeable smoothing effect on skin.
FREE
6. 🎬 Auto Captions (Whisper)
OpenAI Whisper transcribes the audio and generates an SRT subtitle file automatically. Free and highly accurate.
FREE
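
To make the caption step concrete, the sketch below turns Whisper-style segments (dictionaries with `start`, `end`, and `text`, as in the `segments` list Whisper's transcribe result contains) into SRT text. The example segments are invented for illustration, and the helper names are not part of Whisper.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from a list of {'start', 'end', 'text'} dictionaries."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments, shaped like Whisper's output.
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.04, "text": " This is my digital clone."},
]
print(segments_to_srt(example))
```

Saving the returned string as `captions.srt` gives a file that FFmpeg and most video players can read.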
7. 🎞️ Final Assembly (FFmpeg)
Merges video, audio, and captions into the final MP4. There is no 4K upscaling at this tier, so output is 720p at most.
FREE
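
As a rough illustration of the assembly step, this sketch builds an FFmpeg command that muxes the lip-synced video, the cloned audio, and the SRT file into one 720p MP4 with a selectable subtitle track. The file names are hypothetical, and the exact flags the pipeline's FFmpeg node uses are an assumption.

```python
import subprocess


def build_assembly_command(video: str, audio: str, captions: str, output: str, height: int = 720):
    """Return an ffmpeg argument list that merges video, audio, and soft subtitles."""
    return [
        "ffmpeg", "-y",
        "-i", video,       # input 0: lip-synced, restored video
        "-i", audio,       # input 1: cloned voice track
        "-i", captions,    # input 2: SRT file from Whisper
        "-map", "0:v:0",
        "-map", "1:a:0",
        "-map", "2:s:0",
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, force an even width
        "-c:v", "libx264",
        "-c:a", "aac",
        "-c:s", "mov_text",           # subtitle codec that MP4 containers accept
        "-shortest",
        output,
    ]


cmd = build_assembly_command("clone.mp4", "voice.wav", "captions.srt", "final.mp4")
# To actually run it (requires ffmpeg on the PATH and the input files):
# subprocess.run(cmd, check=True)
```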

⚙️ RunPod Setup — Low Tier

1. Create RunPod Account
Go to runpod.io → Sign up → Add $10 credit to start
2. Deploy ComfyUI Template
Pods → + New Pod → Search for the "ComfyUI" template → Select RTX 3080 (cheapest)
3. Install Custom Nodes via ComfyUI Manager
Open ComfyUI → Manager → Install Missing Nodes
Nodes to install:
  • ComfyUI-SadTalker
  • ComfyUI-Wav2Lip
  • ComfyUI-GFPGAN
  • ComfyUI-XTTS
  • ComfyUI-Whisper
4. Stop the Pod When Not in Use
⚠️ Always stop your pod after use: you are billed per hour even when it is idle!
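
The arithmetic behind that warning can be sketched as follows. It assumes the roughly $0.44/hr RTX 3080 rate quoted in the Mid Tier setup; actual RunPod prices vary, so treat the numbers as an illustration only.

```python
def monthly_pod_cost(hourly_rate: float, hours_per_month: float) -> float:
    """Cost of keeping a pod running for the given number of hours."""
    return round(hourly_rate * hours_per_month, 2)


def hours_within_budget(hourly_rate: float, monthly_budget: float) -> float:
    """How many pod hours a monthly budget buys at the given rate."""
    return monthly_budget / hourly_rate


RATE = 0.44  # assumed hourly rate in USD for an RTX 3080

# A pod left running all month (24 hours x 30 days):
print(monthly_pod_cost(RATE, 24 * 30))

# Hours of actual use that fit the $5-15 Low Tier budget:
print(round(hours_within_budget(RATE, 5), 1), round(hours_within_budget(RATE, 15), 1))
```

At that rate an always-on pod costs over $300 a month, while the Low Tier budget corresponds to only a few dozen hours of use.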
🔵 MID TIER ⭐ — Professional & Sellable
Professional Digital Clone Pipeline
Best for: Freelancers, agencies, selling video services. Fully professional output.
Per Month: $40–70
GPU (RunPod): RTX 3080/3090
Paid APIs: $0
Per Video: ~15 min
1. 📸 Input: Photo + Script + Raw Audio (Manual Upload)
Upload a portrait photo, the script text, and an optional raw audio recording. Even a noisy mic recording is fine, because Demucs cleans it in the next step.
2. 🎵 Audio Cleaning (Demucs)
Meta's Demucs is a source-separation model: it isolates the voice from everything else in the recording, which strips out background noise such as keyboard clicks and fan noise and reduces room echo. The result is close to studio-clean audio.
NEW vs Low Tier · FREE (Meta Open Source)
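
A minimal sketch of the audio cleaning step, using the Demucs command line in two-stem mode so that only the vocals are kept. It assumes the `demucs` package is installed and that the default `htdemucs` model is used; the file names are hypothetical.

```python
from pathlib import Path


def build_demucs_command(input_audio: str, out_dir: str = "separated", model: str = "htdemucs"):
    """Return the argument list for a vocals-only Demucs run."""
    return [
        "python", "-m", "demucs",
        "--two-stems=vocals",  # split into vocals and everything else
        "-n", model,
        "-o", out_dir,
        input_audio,
    ]


def expected_vocals_path(input_audio: str, out_dir: str = "separated", model: str = "htdemucs") -> Path:
    """Demucs writes stems to <out_dir>/<model>/<track name>/vocals.wav."""
    return Path(out_dir) / model / Path(input_audio).stem / "vocals.wav"


cmd = build_demucs_command("raw_mic.wav")
print(" ".join(cmd))
print(expected_vocals_path("raw_mic.wav").as_posix())
```

The resulting `vocals.wav` is the file to pass to XTTS v2 as the reference sample.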
3. 🎙️ Voice Cloning (XTTS v2)
Same XTTS v2 as Low Tier, but now fed with Demucs-cleaned audio, which produces a significantly better voice clone.
IMPROVED via Demucs input · FREE
4. 😶 Face Animation (LivePortrait)
Major upgrade from SadTalker. LivePortrait produces natural head movements, eye blinks, and micro-expressions, so the result looks like a real person talking.
UPGRADE from SadTalker · FREE
5. 👄 Lip Sync (MuseTalk)
Major upgrade from Wav2Lip. MuseTalk produces near-perfect lip sync with far fewer visible artifacts, and it handles fast speech and multiple languages.
UPGRADE from Wav2Lip · FREE
6. ✨ Face Restoration (CodeFormer)
Upgrade from GFPGAN. CodeFormer preserves natural skin texture, pores, and hair detail, which avoids the plastic-skin effect.
UPGRADE from GFPGAN · FREE
7. 🔍 4K Upscaling (Real-ESRGAN)
Upscales video from 720p to 4K resolution. Enhances sharpness and removes compression artifacts, so output is crisp on any screen.
NEW vs Low Tier · FREE
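
The 720p to 4K jump works out to a 3x scale, while the Real-ESRGAN x4plus model natively upscales 4x. The sketch below shows the arithmetic, assuming 16:9 frames (1280×720 in, 3840×2160 out); Real-ESRGAN's inference scripts expose an `--outscale` option for a final scale other than 4, though how the ComfyUI node handles this is an assumption.

```python
def outscale_factor(src_height: int, target_height: int = 2160) -> float:
    """Scale factor needed to reach the target height (2160 for 4K UHD)."""
    return target_height / src_height


def upscaled_size(width: int, height: int, factor: float):
    """Frame size after upscaling by the given factor."""
    return round(width * factor), round(height * factor)


factor = outscale_factor(720)
print(factor)                              # scale needed for 720p -> 4K
print(upscaled_size(1280, 720, factor))    # final 4K frame size
print(upscaled_size(1280, 720, 4))         # native x4 output before resizing down
```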
8. 🖼️ AI Background Generation (SDXL / FLUX.1)
Generates a professional static AI background (office, studio, outdoor). The avatar is composited onto the background using chroma key or segmentation.
NEW vs Low Tier · FREE (Static Background)
9. 🎬 Auto Captions (Whisper)
Same Whisper as Low Tier, generating accurate SRT captions. At Mid Tier, captions are styled and burned into the video automatically.
STYLED captions · FREE
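
Burning styled captions into the frames can be done with FFmpeg's `subtitles` filter and its `force_style` option. The sketch below is one way to build that command; the font, size, and file names are hypothetical choices, not the pipeline's actual settings.

```python
def build_burn_in_command(video: str, captions: str, output: str, font_size: int = 24):
    """Return an ffmpeg argument list that renders styled SRT captions onto the video."""
    # ASS-style overrides applied to every caption line.
    style = f"FontName=Arial,FontSize={font_size},Outline=2"
    video_filter = f"subtitles={captions}:force_style='{style}'"
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-vf", video_filter,
        "-c:a", "copy",  # audio is untouched; only the video is re-encoded
        output,
    ]


cmd = build_burn_in_command("avatar_4k.mp4", "captions.srt", "captioned.mp4")
print(cmd[5])
```

Caption paths that contain colons, quotes, or backslashes need escaping inside the filter string, which this sketch does not handle.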
10. 🎞️ Final Assembly (FFmpeg)
Merges all layers (avatar video, AI background, cleaned audio, styled captions) into the final 4K MP4. You still click Queue Prompt in ComfyUI manually.
FREE · Manual Trigger

⚙️ RunPod Setup — Mid Tier

1. Select RTX 3080 or RTX 3090
RunPod → New Pod → RTX 3080 (~$0.44/hr) or RTX 3090 (~$0.74/hr, 24GB VRAM). Use a Network Volume for persistent storage.
2. Deploy ComfyUI + Install Manager
  • Template: ComfyUI Official
  • Volume: 60–80GB Network Volume
  • Port 8188: ComfyUI UI
  • Port 22: SSH access
3. Install All Custom Nodes
Via ComfyUI Manager → Install:
  • ComfyUI-LivePortrait
  • ComfyUI-MuseTalk
  • ComfyUI-CodeFormer
  • ComfyUI-Real-ESRGAN
  • ComfyUI-XTTS
  • ComfyUI-Demucs
  • ComfyUI-Whisper
  • ComfyUI-FLUX (for backgrounds)
  • ComfyUI-FFmpeg-Node
4. Download Models to Network Volume
Models needed:
  • XTTS v2 weights (~1.8GB)
  • LivePortrait weights (~1.2GB)
  • MuseTalk weights (~900MB)
  • CodeFormer weights (~330MB)
  • Real-ESRGAN x4plus (~67MB)
  • FLUX.1-dev (~23GB) or SDXL (~7GB)
  • Whisper large-v3 (~3GB)
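
A quick way to check that these models fit on the Network Volume is to add up the approximate sizes listed above. The figures are the guide's own estimates, and the sketch ignores the space ComfyUI, custom nodes, and rendered videos also need.

```python
# Approximate model sizes in GB, taken from the list above.
MODEL_SIZES_GB = {
    "XTTS v2": 1.8,
    "LivePortrait": 1.2,
    "MuseTalk": 0.9,
    "CodeFormer": 0.33,
    "Real-ESRGAN x4plus": 0.067,
    "Whisper large-v3": 3.0,
}

# Only one background model is required.
BACKGROUND_MODELS_GB = {"FLUX.1-dev": 23.0, "SDXL": 7.0}


def total_size_gb(background_model: str) -> float:
    """Total download size with the chosen background model."""
    return round(sum(MODEL_SIZES_GB.values()) + BACKGROUND_MODELS_GB[background_model], 2)


for name in BACKGROUND_MODELS_GB:
    print(name, total_size_gb(name), "GB")
```

Both totals sit well under a 60GB volume, which leaves room for the ComfyUI install and output files.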
5. Load Workflow JSON → Click Queue Prompt
Import the pipeline workflow JSON → connect all nodes → upload your photo + script → click Queue Prompt → wait ~15 min → download the output.
🟢 HIGH TIER — Agency & Scale
Full Agency Pipeline + B-Roll + Automation
Best for: Agencies, high-volume production, premium client work, zero manual work.
Per Month: $150–420
GPU (RunPod): A100 / H100
Optional APIs: $22–120+
Per Video: ~3–5 min
1. 🤖 Auto Trigger via n8n (n8n Workflow)
A client fills in a form, sends an email, or places an order → n8n detects it automatically → n8n sends all inputs to ComfyUI via its API. Zero manual clicking.
NEW vs Mid Tier · n8n self-hosted FREE / Cloud $20/mo
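
What n8n sends to ComfyUI is an HTTP request. As a sketch of that call, the code below posts a workflow to ComfyUI's `/prompt` endpoint, assuming the workflow was exported from ComfyUI in API format and that the server listens on port 8188. The host and the one-node example workflow are hypothetical; in n8n the same request would typically be made by an HTTP Request node.

```python
import json
import urllib.request
import uuid


def build_prompt_request(workflow: dict, host: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build the POST request that queues a workflow on a ComfyUI server."""
    payload = {"prompt": workflow, "client_id": str(uuid.uuid4())}
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> str:
    """Send the workflow and return the prompt_id ComfyUI assigns to it."""
    with urllib.request.urlopen(build_prompt_request(workflow, host)) as response:
        return json.loads(response.read())["prompt_id"]


# Hypothetical one-node workflow, only to show the request shape.
example_workflow = {"1": {"class_type": "LoadImage", "inputs": {"image": "portrait.png"}}}
request = build_prompt_request(example_workflow)
print(request.get_method(), request.full_url)
```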
2. 🎵 Audio Cleaning (Demucs)
Same as Mid Tier. Demucs cleans the audio automatically as part of the n8n-triggered pipeline.
FREE · Same as Mid Tier
3. 🎙️ Voice Cloning (ElevenLabs API)
Optional upgrade from XTTS v2. ElevenLabs produces highly realistic, emotionally expressive speech that is hard to tell apart from a human recording. Paid plans for this pipeline start at $22/mo.
UPGRADE from XTTS v2 · OPTIONAL ($22–299/mo) · OR keep XTTS v2 (FREE)
4. 😶 Face Animation (LivePortrait)
Same LivePortrait as Mid Tier, but running on an A100 GPU, so rendering is 3–5x faster.
FASTER on A100 · FREE
5. 👄 Lip Sync (MuseTalk)
Same MuseTalk as Mid Tier, faster on A100, with the same near-perfect lip sync.
FREE · Same as Mid Tier
6. ✨ Face Restoration (CodeFormer)
Same CodeFormer as Mid Tier, faster on A100.
FREE · Same as Mid Tier
7. 🔍 4K Upscaling (Real-ESRGAN)
Same Real-ESRGAN as Mid Tier, faster on A100.
FREE · Same as Mid Tier
8. 🖼️ Cinematic AI Background (FLUX.1 + ControlNet)
Upgrade from basic SDXL. FLUX.1 with ControlNet and IP-Adapter produces photorealistic, cinematic backgrounds that look like a real film set rather than an AI render.
UPGRADE from SDXL · FREE (Better Quality)
9. 🎥 B-Roll Video Generation (AnimateDiff + ControlNet)
New at High Tier. When the avatar talks about a "beach vacation", for example, AnimateDiff generates moving beach footage as B-roll, and the pipeline cuts between avatar and B-roll scenes automatically.
NEW (not in Mid Tier) · FREE (AnimateDiff Open Source)
10. 🎬 Auto Captions (Whisper)
Same Whisper. Styled captions are burned in automatically as part of the pipeline.
FREE · Same as Mid Tier
11. 🎞️ Final Assembly (FFmpeg)
Same FFmpeg as Mid Tier. It merges all layers, including B-roll scenes, and is triggered automatically by n8n instead of manually.
FREE · Auto Triggered
12. 📤 Auto Delivery via n8n (n8n Workflow)
n8n detects that rendering is complete → uploads to Google Drive → posts to YouTube/TikTok → emails the client → logs to Notion → sends you a Telegram notification. All automatic.
NEW vs Mid Tier · n8n self-hosted FREE
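
One way the "rendering complete" detection could work is by polling ComfyUI's `/history/<prompt_id>` endpoint, which returns an entry for the prompt once it has finished. The sketch below assumes that behaviour and a server on port 8188; the polling interval, timeout, and function names are hypothetical, and an n8n workflow might use a webhook or websocket instead.

```python
import json
import time
import urllib.request


def outputs_if_finished(history: dict, prompt_id: str):
    """Return the outputs for prompt_id, or None while the job is still running."""
    entry = history.get(prompt_id)
    if entry is None:
        return None
    return entry.get("outputs", {})


def wait_for_outputs(prompt_id: str, host: str = "127.0.0.1:8188",
                     poll_seconds: float = 5.0, timeout_seconds: float = 1800.0) -> dict:
    """Poll the history endpoint until the job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"http://{host}/history/{prompt_id}") as response:
            outputs = outputs_if_finished(json.loads(response.read()), prompt_id)
        if outputs is not None:
            return outputs
        time.sleep(poll_seconds)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout_seconds} seconds")
```

Once `wait_for_outputs` returns, the delivery steps (upload, post, notify) can run against the files named in the outputs.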

🤖 n8n Automation Flow

📋 Client Form → n8n Detects → Send to ComfyUI → Pipeline Runs → Video Rendered → Upload Drive → Post Social → Notify Client
📊 FULL COMPARISON
All Three Tiers — Side by Side
Every tool, every step, every cost — compared honestly.
Step / Tool | 🟡 Low Tier | 🔵 Mid Tier ⭐ | 🟢 High Tier
Auto Trigger | ❌ Manual | ❌ Manual | ✅ n8n
Audio Cleaning | ❌ None | ✅ Demucs | ✅ Demucs
Voice Cloning | ⚠️ XTTS v2 | ✅ XTTS v2 | ✅✅ ElevenLabs*
Face Animation | ⚠️ SadTalker | ✅ LivePortrait | ✅ LivePortrait
Lip Sync | ⚠️ Wav2Lip | ✅ MuseTalk | ✅ MuseTalk
Face Restoration | ⚠️ GFPGAN | ✅ CodeFormer | ✅ CodeFormer
4K Upscaling | ❌ None | ✅ Real-ESRGAN | ✅ Real-ESRGAN
AI Background | ❌ None | ✅ SDXL/FLUX | ✅✅ FLUX+ControlNet
B-Roll Video | ❌ None | ❌ None | ✅ AnimateDiff
Auto Captions | ✅ Whisper | ✅ Whisper | ✅ Whisper
Final Assembly | ✅ FFmpeg | ✅ FFmpeg | ✅ FFmpeg
Auto Delivery | ❌ Manual | ❌ Manual | ✅ n8n
GPU | RTX 3080 | RTX 3080/3090 | A100 / H100
Render Speed | ~20 min/video | ~15 min/video | ~3–5 min/video
Output Quality | ⚠️ Basic | ✅ Professional | ✅✅ Agency
Paid APIs | $0 | $0 | $0–120+/mo*
RunPod Cost | $5–15/mo | $40–70/mo | $150–300/mo
Total Cost | $5–15/mo | $40–70/mo | $150–420/mo
Best For | Learning | Freelancers ⭐ | Agencies
* ElevenLabs is optional. Keeping the free XTTS v2 leaves paid API cost at $0.
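
The Total Cost row is the RunPod range plus the paid API range for each tier. The short sketch below reproduces that sum from the figures in the comparison; it treats the open-ended "$120+" as 120, so the High Tier upper bound is a floor rather than a cap.

```python
# (min, max) monthly costs in USD, from the comparison above.
RUNPOD_COST = {"Low": (5, 15), "Mid": (40, 70), "High": (150, 300)}
API_COST = {"Low": (0, 0), "Mid": (0, 0), "High": (0, 120)}  # "120+" treated as 120


def total_cost_range(tier: str):
    """Add the RunPod and paid API ranges for one tier."""
    pod_min, pod_max = RUNPOD_COST[tier]
    api_min, api_max = API_COST[tier]
    return pod_min + api_min, pod_max + api_max


for tier in RUNPOD_COST:
    low, high = total_cost_range(tier)
    print(f"{tier} Tier: ${low}-{high}/mo")
```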

🟡 Low Tier — What You Get

  • Basic working digital clone
  • 720p output maximum
  • Visible lip sync artifacts
  • No background replacement
  • Good for learning ComfyUI
  • NOT suitable for selling

🔵 Mid Tier — What You Get ⭐

  • Professional 4K output
  • Near-perfect lip sync
  • Natural face animation
  • AI background replacement
  • Studio-clean audio
  • Fully sellable to clients

🟢 High Tier — What You Get

  • Everything in Mid Tier PLUS
  • Moving B-roll video scenes
  • Cinematic backgrounds
  • Ultra-realistic voice (ElevenLabs*)
  • Fully automated pipeline
  • Agency-level volume capacity

💡 Recommended Path

  • Week 1–2: Start at Low Tier and learn ComfyUI node basics
  • Week 3–4: Move to Mid Tier and start selling
  • Month 3+: Add High Tier extras, but only upgrade when revenue justifies it