@happyvertical/smrt-video

AI video production pipeline with characters, performers, scenes, shots, sequences, compositions, and ComfyUI workflow integration.

v0.20.44 · Video Pipeline · ComfyUI · Frame-Based

Overview

smrt-video models the full production pipeline for AI-powered video generation. Characters define virtual personas with voice and branding, Performers provide physical likeness via IP-Adapter FaceID, Scenes supply virtual backgrounds, and the Composition -> Sequence -> Shot hierarchy organizes generated content. ComfyUI workflows enable dynamic parameter injection at render time.

Installation

bash
npm install @happyvertical/smrt-video

Quick Start

typescript
import {
  Character, Performer, Scene,
  VideoShot, VideoSequence, VideoComposition,
  VideoShotCharacter, VideoWorkflow,
} from '@happyvertical/smrt-video';

// Character = virtual persona (outfit, voice, branding)
const anchor = new Character({
  name: 'Bentley News Anchor',
  imageAssetId: 'seed-img-001',
  voiceProfileId: 'voice-123',
  brandingKit: {
    logoAssetId: 'logo-asset',
    primaryColor: '#1a73e8',
    lowerThirdTemplate: 'news-standard',
    tickerEnabled: true,
  },
});
await anchor.save();

// Performer = physical likeness for IP-Adapter face consistency
const performer = new Performer({
  name: 'Alex',
  ipAdapterWeight: 0.85,
});
await performer.save();

// Scene = virtual background
const studio = new Scene({
  name: 'News Studio',
  sourceType: 'image',
  projection: 'flat',
});
await studio.save();

// Hierarchy: Composition -> Sequence -> Shot
const composition = new VideoComposition({
  title: 'Evening News - March 2, 2026',
  fps: 30,
  width: 1920,
  height: 1080,
});
await composition.save();

const shot = new VideoShot({
  scriptText: 'Welcome to the evening news broadcast.',
  targetDuration: 30,
});
await shot.save();
// Estimated speech: scriptWordCount / 2.7 words per second

// ComfyUI workflow with parameter injection
const workflow = new VideoWorkflow({
  name: 'Wan 2.6 + EchoMimic',
  workflowType: 'broadcast',
  workflowJson: comfyuiApiJson,
  nodeMapping: { seedImage: '1', audioFile: '5', outputVideo: '12' },
  requiredModels: ['wan_2.6_t2v_14b_fp8', 'echomimic_v2'],
});
await workflow.save();

// Inject runtime parameters into a deep-cloned workflow
const injected = workflow.injectParameters({
  seedImage: '/path/to/anchor.png',
  audioFile: '/path/to/tts.wav',
});
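
The 2.7 words/second estimate above maps directly to frame counts. A minimal sketch of that arithmetic (the helper names here are illustrative, not part of the package API):

```typescript
// Speech-rate constant from the estimatedDuration formula above
const WORDS_PER_SECOND = 2.7;

// Estimate spoken duration of a script in seconds
function estimateSeconds(scriptText: string): number {
  const words = scriptText.trim().split(/\s+/).filter(Boolean).length;
  return words / WORDS_PER_SECOND;
}

// Durations are stored as frames, so convert at the composition's fps
function secondsToFrames(seconds: number, fps: number): number {
  return Math.round(seconds * fps);
}

// "Welcome to the evening news broadcast." is 6 words:
// 6 / 2.7 ≈ 2.22 s, or about 67 frames at 30 fps
const seconds = estimateSeconds('Welcome to the evening news broadcast.');
const frames = secondsToFrames(seconds, 30);
```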

Core Models

Character

typescript
class Character extends SmrtObject {
  name: string
  imageAssetId?: string       // Seed image FK
  voiceProfileId?: string     // FK to smrt-voice
  brandingKit?: BrandingConfig // Logo, colors, fonts, lower-thirds
  status: 'pending' | 'ready'
}

VideoShot (extends Content)

typescript
class VideoShot extends Content {
  scriptText?: string
  scriptWordCount: number
  durationInFrames: number
  videoMetadata?: VideoMetadata  // Includes wordTimings for lip-sync
  status: 'draft' | 'queued' | 'processing' | 'ready' | 'failed' | 'published'

  get estimatedDuration(): number  // scriptWordCount / 2.7 (words/sec)
}

VideoComposition (extends Content)

typescript
class VideoComposition extends Content {
  fps: number
  width: number
  height: number
  durationInFrames: number
  renderStatus: 'draft' | 'rendering' | 'ready' | 'failed'
  renderProgress: number
}

VideoWorkflow (ComfyUI)

typescript
class VideoWorkflow extends SmrtObject {
  name: string
  workflowType: 'prebake' | 'broadcast' | 'lipsync' | 'postprod' | 'custom'
  workflowJson: string        // Full ComfyUI API JSON
  nodeMapping: NodeMapping     // Maps semantic names -> node IDs
  requiredModels?: string[]

  // Deep-clones workflow and overwrites node.inputs
  injectParameters(params: Record<string, any>): object
}
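
To illustrate the deep-clone-and-overwrite behavior, here is a standalone sketch of what injectParameters could look like. It assumes ComfyUI API-format JSON of shape `{ [nodeId]: { inputs: {...} } }` and uses the parameter name as the input key; the package's real implementation and the actual input keys per node type may differ.

```typescript
type NodeMapping = Record<string, string>; // semantic name -> ComfyUI node ID

// Illustrative sketch, not the package source: deep-clones the stored
// workflow JSON and writes runtime values into the mapped nodes' inputs.
function injectParameters(
  workflowJson: string,
  nodeMapping: NodeMapping,
  params: Record<string, unknown>,
): object {
  // JSON.parse of the stored string acts as the deep clone,
  // so the persisted workflow is never mutated
  const workflow = JSON.parse(workflowJson);
  for (const [name, value] of Object.entries(params)) {
    const nodeId = nodeMapping[name];
    if (!nodeId || !workflow[nodeId]) {
      throw new Error(`No node mapped for parameter "${name}"`);
    }
    // Assumed input key; real ComfyUI input names vary per node type
    workflow[nodeId].inputs[name] = value;
  }
  return workflow;
}

const injected = injectParameters(
  JSON.stringify({ '1': { inputs: {} }, '5': { inputs: {} } }),
  { seedImage: '1', audioFile: '5' },
  { seedImage: '/path/to/anchor.png', audioFile: '/path/to/tts.wav' },
);
```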

Best Practices

DOs

  • Store durations as frames, compute seconds with frames / fps
  • Use nodeMapping to map semantic names to ComfyUI node IDs
  • Use injectParameters() for safe workflow parameter injection (deep-clones)
  • Estimate speech duration at 2.7 words/second (with 15% tolerance)
  • Link Characters to VoiceProfiles from smrt-voice for TTS integration
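
The 2.7 words/second estimate with 15% tolerance can be used to sanity-check a shot's target duration before queueing it. A hedged sketch (the function is hypothetical, not package API):

```typescript
// Returns true if the script's estimated speech duration falls within
// the given tolerance of the target duration (default 15%, per above)
function withinTolerance(
  targetSeconds: number,
  wordCount: number,
  tolerance = 0.15,
): boolean {
  const estimated = wordCount / 2.7;
  return Math.abs(estimated - targetSeconds) <= targetSeconds * tolerance;
}

// An 81-word script estimates to exactly 30 s (81 / 2.7),
// so a 30-second target passes the check
const ok = withinTolerance(30, 81);
```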

DON'Ts

  • Don't store durations as seconds (use durationInFrames everywhere)
  • Don't assume wordTimings is auto-generated (requires external TTS provider)
  • Don't mutate workflow JSON directly (use injectParameters() for safe cloning)
  • Don't forget trimBeforeFrames/trimAfterFrames in effective frame calculations
  • Don't upload face embeddings through the framework (weight is metadata-only)
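
The trim fields affect every downstream frame calculation. A minimal sketch of an effective-frame helper, assuming trimBeforeFrames/trimAfterFrames default to 0 when unset (the helper name is illustrative):

```typescript
interface ShotTiming {
  durationInFrames: number;
  trimBeforeFrames?: number;
  trimAfterFrames?: number;
}

// Frames that actually reach the timeline after trimming both ends
function effectiveFrames(shot: ShotTiming): number {
  const before = shot.trimBeforeFrames ?? 0;
  const after = shot.trimAfterFrames ?? 0;
  return Math.max(0, shot.durationInFrames - before - after);
}

// 900 frames (30 s at 30 fps) with 15 trimmed off each end leaves 870
const trimmed = effectiveFrames({
  durationInFrames: 900,
  trimBeforeFrames: 15,
  trimAfterFrames: 15,
});
```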

Related Modules