LtxVAE 学习笔记

张开发
2026/4/14 11:52:24 · 15 分钟阅读

分享文章

LtxVAE 学习笔记
# LtxVAE inference code (LtxVAE 推理代码)
#
# Loads an LtxVAE, face-crops / resizes a set of conditioning images to
# 448x448, and encodes them into reference latents, timing the encode pass.
import math
import os
import time

import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn
from einops import rearrange
from transformers import Wav2Vec2FeatureExtractor
import torchvision.transforms as transforms
from PIL import Image
from loguru import logger

from flash_head.ltx_video.ltx_vae import LtxVAE
from flash_head.utils.facecrop import process_image


def get_cond_image_dict(cond_image_path_or_dir, use_face_crop):
    """Load conditioning image(s) into a dict keyed by file stem.

    Args:
        cond_image_path_or_dir: path to a single image file, or a directory
            whose ``*.jpg`` files are all loaded (sorted by filename).
        use_face_crop: when True, run ``process_image`` (face cropping) on each
            image; on failure the error is logged and the raw image is used.

    Returns:
        dict mapping file stem (filename without directory or extension) to a
        PIL image (or whatever ``process_image`` returns — presumably a PIL
        image as well; TODO confirm against ``facecrop``).
    """
    def get_image(cond_image_path, use_face_crop):
        if use_face_crop:
            try:
                image = process_image(cond_image_path)
                return image
            except Exception as e:
                # Best-effort: fall through to the plain load below.
                logger.error(f"Error processing {cond_image_path}: {e}")
        return Image.open(cond_image_path).convert("RGB")

    if os.path.isdir(cond_image_path_or_dir):
        import glob
        cond_image_list = glob.glob(os.path.join(cond_image_path_or_dir, "*.jpg"))
        cond_image_list.sort()
        cond_image_dict = {
            cond_image.split("/")[-1].split(".")[0]: get_image(cond_image, use_face_crop)
            for cond_image in cond_image_list
        }
    else:
        cond_image_dict = {
            cond_image_path_or_dir.split("/")[-1].split(".")[0]:
                get_image(cond_image_path_or_dir, use_face_crop)
        }
    return cond_image_dict


def resize_and_centercrop(cond_image, target_size):
    """Resize image or tensor to the target size without padding.

    Scales so the image covers ``target_size`` (aspect ratio preserved),
    then center-crops to exactly ``target_size``.

    Args:
        cond_image: a ``torch.Tensor`` of shape (C, H, W) or (N, C, H, W),
            or a PIL image.
        target_size: ``(target_h, target_w)``.

    Returns:
        For a tensor input: a (C, H, W) tensor (leading batch dim squeezed).
        For a PIL input: a (1, C, 1, H, W) uint8 tensor (extra frame axis
        inserted for downstream video-shaped consumers).
    """
    # Get the original size
    if isinstance(cond_image, torch.Tensor):
        _, orig_h, orig_w = cond_image.shape
    else:
        orig_h, orig_w = cond_image.height, cond_image.width
    target_h, target_w = target_size
    # Calculate the scaling factor for resizing
    scale_h = target_h / orig_h
    scale_w = target_w / orig_w
    # Compute the final size: use the larger scale so both dims cover the target.
    scale = max(scale_h, scale_w)
    final_h = math.ceil(scale * orig_h)
    final_w = math.ceil(scale * orig_w)
    # Resize
    if isinstance(cond_image, torch.Tensor):
        if len(cond_image.shape) == 3:
            cond_image = cond_image[None]
        resized_tensor = nn.functional.interpolate(
            cond_image, size=(final_h, final_w), mode="nearest"
        ).contiguous()
        # crop
        cropped_tensor = transforms.functional.center_crop(resized_tensor, target_size)
        cropped_tensor = cropped_tensor.squeeze(0)
    else:
        resized_image = cond_image.resize((final_w, final_h), resample=Image.BILINEAR)
        resized_image = np.array(resized_image)
        # tensor and crop: HWC -> 1,C,H,W
        resized_tensor = torch.from_numpy(resized_image)[None, ...].permute(0, 3, 1, 2).contiguous()
        cropped_tensor = transforms.functional.center_crop(resized_tensor, target_size)
        cropped_tensor = cropped_tensor[:, :, None, :, :]
    return cropped_tensor


class FlashHeadPipeline:
    """Encodes conditioning face images into LtxVAE reference latents.

    On construction this loads the VAE onto ``cuda:0``, compiles its
    encode/decode, loads the conditioning image(s) from a hard-coded path,
    normalizes them to [-1, 1], tiles each image to ``frame_num`` frames,
    and caches the resulting VAE latents per person.
    """

    def __init__(self, vae_dir):
        self.vae = LtxVAE(
            pretrained_model_type_or_path=vae_dir,
            dtype=torch.bfloat16,
            device="cuda:0",
        )
        self.target_h = 448
        self.target_w = 448
        self.device = "cuda:0"
        self.frame_num = 33
        self.param_dtype = torch.bfloat16
        # Compile once up front so the timed encode loop below runs the
        # optimized graph.
        self.vae.model.encode = torch.compile(self.vae.model.encode)
        self.vae.model.decode = torch.compile(self.vae.model.decode)
        cond_image_path_or_dir = "/data/lbg/project/SoulX-FlashHead-api2/imgs/d11_960.jpg"
        self.cond_image_dict = get_cond_image_dict(cond_image_path_or_dir, True)
        self.cond_image_tensor_dict = {}
        self.ref_img_latent_dict = {}
        start = time.time()
        for i, (person_name, cond_image_pil) in enumerate(self.cond_image_dict.items()):
            # 1 C 1 H W
            cond_image_tensor = resize_and_centercrop(
                cond_image_pil, (self.target_h, self.target_w)
            ).to(self.device, dtype=self.param_dtype)
            # Map uint8 [0, 255] to [-1, 1].
            cond_image_tensor = (cond_image_tensor / 255 - 0.5) * 2
            self.cond_image_tensor_dict[person_name] = cond_image_tensor
            # Repeat the single frame along the time axis to build a clip.
            video_frames = cond_image_tensor.repeat(1, 1, self.frame_num, 1, 1)
            # (16, 9, 64, 64) / (128, 5, 16, 16)
            self.ref_img_latent_dict[person_name] = self.vae.encode(video_frames)
            # if i == 0:
            #     self.reset_person_name(person_name)
        print("vae.encode time", time.time() - start, len(self.cond_image_dict.items()), len(video_frames))


if __name__ == "__main__":
    vae_dir = r"/data/lbg/models/flash_head_models/SoulX-FlashHead-1.3B/VAE_LTX/"
    flash_pipe = FlashHeadPipeline(vae_dir)

更多文章