Deploying PP-OCR v4/v5 with DeploySharp and ONNX Runtime: A Tutorial

张开发

• 2026/4/15 9:37:20 • 15 min read

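This article explains in detail how to deploy PP-OCR v4/v5 models with the DeploySharp framework and the ONNX Runtime inference engine. It is a complete guide covering CPU, CUDA, DML, and TensorRT deployment.

Contents

• 1. ONNX Runtime Overview
• 2. Supported Backends Compared
• 3. Environment Setup
• 4. Model Preparation
• 5. CPU Inference
• 6. CUDA Inference
• 7. DML Inference
• 8. TensorRT Inference
• 9. Performance Comparison and Tuning
• 10. FAQ
• 11. Getting the Software

1. ONNX Runtime Overview

1.1 What Is ONNX Runtime

ONNX Runtime is a high-performance, cross-platform inference engine from Microsoft that runs models in the ONNX format. It is one of the most widely used inference engines today and has the following characteristics:

• Cross-platform: supports Windows, Linux, macOS, Android, iOS, and more
• Multiple backends: execution providers for CPU, CUDA, TensorRT, OpenVINO, DirectML, and others
• High performance: heavily optimized for fast inference
• Ease of use: a simple API that integrates quickly

1.2 Advantages of ONNX Runtime

| Advantage | Description |
| --- | --- |
| Cross-platform | One code base runs on multiple platforms |
| Broad hardware support | CPU, NVIDIA GPU, AMD GPU, Intel GPU, and more |
| Rich set of execution providers | CPU, CUDA, TensorRT, DML, OpenVINO, and more |
| Easy integration | Bindings for C#, C/C++, Python, and other languages |
| Active community | Maintained by Microsoft and updated continuously |

2. Supported Backends Compared

ONNX Runtime supports multiple execution providers (Execution Providers). The table below compares them:

| Execution provider | Supported devices | Performance characteristics | Typical use |
| --- | --- | --- | --- |
| CPU | Any CPU | Moderate performance, highly portable | No GPU available, cross-platform deployment |
| CUDA | NVIDIA GPU | GPU acceleration, good performance | NVIDIA GPUs, requires a CUDA environment |
| TensorRT | NVIDIA GPU | GPU acceleration plus TensorRT optimization, best performance | NVIDIA GPUs when maximum performance matters |
| DML | GPUs from multiple vendors (AMD/NVIDIA/Intel) | Unified interface on Windows | Windows with GPUs from different vendors |
| OpenVINO | Intel CPU/iGPU/GPU | Optimized for Intel hardware | Intel hardware on Windows/Linux |

3. Environment Setup

3.1 System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| Operating system | Windows 10/11, Linux | Windows 11 |
| .NET version | .NET 6.0 | .NET 8.0 |
| CPU | 4 cores | 8 cores |
| Memory | 8 GB | 16 GB |
| GPU (optional) | NVIDIA RTX 3060 | NVIDIA RTX 4070 |

3.2 Installing the ONNX Runtime NuGet Packages

CPU version:

    dotnet add package Microsoft.ML.OnnxRuntime.Managed

Microsoft.ML.OnnxRuntime.Managed contains only the managed API. If the native library cannot be found at run time, install Microsoft.ML.OnnxRuntime instead, which bundles the native CPU binaries together with the managed API.

CUDA version:

    dotnet add package Microsoft.ML.OnnxRuntime.Gpu.Windows

Note: the package must match the CUDA version installed on the system:

• CUDA 11.x → OnnxRuntime.Gpu (older versions)
• CUDA 12.x → OnnxRuntime.Gpu.Windows (newer versions)

DML version:

    dotnet add package Microsoft.ML.OnnxRuntime.DirectML

TensorRT version:

    dotnet add package Microsoft.ML.OnnxRuntime.Gpu.Windows

The TensorRT execution provider additionally requires a separate TensorRT installation.

3.3 CUDA Environment Setup (If Needed)

1. Visit the NVIDIA CUDA download page: https://developer.nvidia.com/cuda-downloads
2. Download and install CUDA 12.x
3. Verify the installation:

    nvcc --version

3.4 Dependency Files

Copy the following DLL files to the directory the program runs from.

CPU mode: no additional DLL files are required.

CUDA mode:

    cuda_runtime.dll
    cudnn64_8.dll
    cudnn_ops_infer64_8.dll
    cudnn_cnn_infer64_8.dll

Exact file names depend on the CUDA and cuDNN versions installed; the cuDNN names above follow the cuDNN 8.x naming scheme.

DML mode:

    directml.dll
    onnxruntime_providers_shared.dll

4. Model Preparation

PP-OCR model layout:

    ppocrv5/
    ├── PP-OCRv5_mobile_det_onnx.onnx    # text detection model
    ├── PP-OCRv5_mobile_cls_onnx.onnx    # text orientation classification model
    ├── PP-OCRv5_mobile_rec_onnx.onnx    # text recognition model
    └── ppocrv5_dict.txt                 # recognition dictionary

Note that the code samples below load the recognition model from PP-OCRv5_mobile_rec_onnx_combined.onnx, while the layout above lists PP-OCRv5_mobile_rec_onnx.onnx; set recModelPath to whichever file your download actually contains.

5. CPU Inference

5.1 Creating the Configuration

    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Model;

    // Create the PP-OCR v5 configuration
    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // Select the inference engine and device
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.CPU;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cpu;

5.2 Complete Example

    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.CPU.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);
                if (img.Empty())
                {
                    Console.WriteLine("Failed to read the image");
                    return;
                }

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // CPU inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.CPU;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cpu;
                config.MaxConcurrency = 4;
                config.GlobalMaxBatchSize = 1;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Model loaded");

                    // Warm-up run
                    predictor.Predict(img);

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the result
                    Console.WriteLine("\n=== Recognition result ===");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

5.3 Performance Data

| Device | Latency | Notes |
| --- | --- | --- |
| AMD Ryzen 7 5800H | ~656 ms | 8 cores, no GPU |
| Intel Core i7-12700H | ~550 ms | 12 cores, no GPU |

6. CUDA Inference

6.1 Prerequisites

1. Confirm that the NVIDIA GPU driver is installed
2. Install CUDA 12.x
3. Copy the CUDA-related DLL files to the program directory

6.2 Creating the Configuration

    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // CUDA inference settings
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.GPU0;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cuda;
    config.MaxConcurrency = 4;
    config.GlobalMaxBatchSize = 4;

6.3 Complete Example

    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.CUDA.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // CUDA inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.GPU0;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cuda;
                config.MaxConcurrency = 4;
                config.GlobalMaxBatchSize = 4;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Model loaded");

                    // Warm-up run
                    predictor.Predict(img);

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the result
                    Console.WriteLine("\n=== Recognition result ===");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

6.4 Performance Data

| Device | Latency | Notes |
| --- | --- | --- |
| NVIDIA RTX 3060 | ~93 ms | CUDA 12 |
| NVIDIA RTX 4070 | ~65 ms | CUDA 12 |
| NVIDIA RTX 4090 | ~45 ms | CUDA 12 |

7. DML Inference

7.1 About DML

DirectML (DML) is a high-performance hardware acceleration API on Windows that supports GPUs from AMD, NVIDIA, and Intel.

7.2 Creating the Configuration

    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // DML inference settings
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.GPU0;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Dml;
    config.MaxConcurrency = 2;
    config.GlobalMaxBatchSize = 2;

7.3 Complete Example

    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.DML.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // DML inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.GPU0;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Dml;
                config.MaxConcurrency = 2;
                config.GlobalMaxBatchSize = 2;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Model loaded");

                    // Warm-up run
                    predictor.Predict(img);

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the result
                    Console.WriteLine("\n=== Recognition result ===");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

7.4 Performance Data

| Device | Latency | Notes |
| --- | --- | --- |
| NVIDIA RTX 3060 | ~114 ms | DML |
| NVIDIA RTX 4070 | ~75 ms | DML |
| AMD RX 6800 | ~95 ms | DML |
| Intel Arc A750 | ~130 ms | DML |

8. TensorRT Inference

8.1 Prerequisites

1. Install CUDA 12.x
2. Install TensorRT 8.x
3. Configure the environment variables (for example, add the TensorRT lib directory to PATH)

8.2 Creating the Configuration

    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // TensorRT inference settings
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.GPU0;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.TensorRt;
    config.MaxConcurrency = 4;
    config.GlobalMaxBatchSize = 4;

Note: on the first inference, ONNX Runtime automatically compiles the ONNX model into a TensorRT engine. This can take several minutes.

8.3 Complete Example

    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.TensorRT.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // TensorRT inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.GPU0;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.TensorRt;
                config.MaxConcurrency = 4;
                config.GlobalMaxBatchSize = 4;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Model loaded");

                    // Warm-up (the first run builds the TensorRT engine and takes a long time)
                    Console.WriteLine("Warming up; the first run builds the TensorRT engine, please wait...");
                    predictor.Predict(img);
                    Console.WriteLine("Warm-up finished");

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the result
                    Console.WriteLine("\n=== Recognition result ===");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

8.4 Performance Data

| Device | Latency | Notes |
| --- | --- | --- |
| NVIDIA RTX 3060 | ~52 ms | TensorRT |
| NVIDIA RTX 4070 | ~35 ms | TensorRT |
| NVIDIA RTX 4090 | ~25 ms | TensorRT |

9. Performance Comparison and Tuning

9.1 Performance Comparison

The table below compares the backends on the same test image. Relative performance is the CPU latency divided by the backend's latency.

| Execution provider | Device | Latency | Relative performance |
| --- | --- | --- | --- |
| CPU | AMD Ryzen 7 5800H | 656 ms | 1.0x |
| DML | NVIDIA RTX 3060 | 114 ms | 5.75x |
| DML | Intel Arc 140V | 331 ms | 1.98x |
| CUDA | NVIDIA RTX 3060 | 93 ms | 7.05x |
| TensorRT | NVIDIA RTX 3060 | 52 ms | 12.6x |

9.2 Tuning Suggestions

Concurrency:

    // Adjust the concurrency to the hardware.
    // For GPU inference, 2-4 is a reasonable range:
    config.MaxConcurrency = 4;
    // For CPU inference, use the number of CPU cores:
    config.MaxConcurrency = 8;

Batching:

    // For GPU inference, increase the batch size:
    config.GlobalMaxBatchSize = 4;
    // For CPU inference, keep the batch size at 1:
    config.GlobalMaxBatchSize = 1;

Model input size:

    // Adjust the input size of the recognition model
    config.RecConfig.InferImageHeight = 48;  // a lower height speeds up inference
    config.RecConfig.MaxImageWidth = 320;    // cap the width

Warm-up:

    // Run 1-2 warm-up inferences before measuring
    for (int i = 0; i < 2; i++)
    {
        predictor.Predict(img);
    }

10. FAQ

Q1: What should I do when CUDA inference throws an error?

A: Check the following:

1. Confirm that the correct CUDA version is installed
2. Check that the CUDA-related DLL files are in the program directory
3. Confirm that the GPU driver is up to date
4. Check that the GPU supports CUDA

Q2: What should I do when DML inference is slow?

A: Suggestions:

1. Confirm that the GPU driver is up to date
2. Reduce the concurrency and the batch size
3. Try CUDA or TensorRT (if you are using an NVIDIA GPU)

Q3: Why is the first TensorRT inference so slow?

A: This is expected. On the first inference, ONNX Runtime automatically compiles the ONNX model into a TensorRT engine, which can take several minutes. Once compilation finishes, subsequent inferences are significantly faster.

Q4: How do I switch between execution providers?

A: Change a single configuration value:

    // CPU
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cpu;
    // CUDA
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cuda;
    // DML
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Dml;
    // TensorRT
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.TensorRt;

As the configurations in sections 5 to 8 show, GlobalDeviceType changes along with it: DeviceType.CPU for the CPU provider and DeviceType.GPU0 for CUDA, DML, and TensorRT.

Q5: How do I choose the best execution provider?

A: Choose according to your hardware and requirements:

| Scenario | Recommended backend |
| --- | --- |
| No GPU, cross-platform | CPU |
| NVIDIA GPU, quick deployment | CUDA |
| NVIDIA GPU, maximum performance | TensorRT |
| Windows, AMD GPU | DML |
| Windows, GPUs from multiple vendors | DML |

11. Getting the Software

11.1 Source Code

The DeploySharp project is fully open source:

https://github.com/guojin-yan/DeploySharp.git

11.2 Demo Programs

• Console demo: demos/DeploySharp.OpenCvSharp.PaddleOcr.Demo
• Desktop application demo: applications/.NET 8.0/JYPPX.DeploySharp.OpenCvSharp.PaddleOcr

Conclusion

This article has walked through the complete process of deploying PP-OCR v4/v5 models with DeploySharp and ONNX Runtime. As a high-performance inference engine from Microsoft with support for many execution providers and hardware platforms, ONNX Runtime is a strong choice for .NET developers deploying OCR.

If you run into problems, contact us through GitHub Issues or the QQ group (945057948).

Author: Guojin Yan
Published: April 2026

[Article statement] The main content of this article is based on the author's research and practice; some wording was refined with the help of AI tools. Owing to technical limitations, the article may contain errors or omissions, and corrections from readers are welcome. If any content inadvertently infringes on your rights, please contact us through the official account backend and we will verify and handle it promptly. Thank you for your understanding and support.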
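The complete examples in this article time a single Predict call after one warm-up run, and section 9.2 suggests one or two warm-up inferences. A slightly steadier measurement averages several timed runs. The following is a minimal sketch under stated assumptions: InferenceBenchmark, Timing, and Measure are hypothetical names introduced here for illustration and are not part of DeploySharp or ONNX Runtime; the helper uses only System.Diagnostics.Stopwatch and requires C# 10 or later (.NET 6+).

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, NOT part of DeploySharp: runs an action a few times
// untimed (warm-up), then times it repeatedly and summarizes the latencies.
public static class InferenceBenchmark
{
    // Mean, fastest, and slowest latency over the timed runs, in milliseconds.
    public readonly record struct Timing(double MeanMs, double MinMs, double MaxMs);

    public static Timing Measure(Action action, int warmupRuns = 2, int timedRuns = 10)
    {
        if (action is null) throw new ArgumentNullException(nameof(action));
        if (warmupRuns < 0) throw new ArgumentOutOfRangeException(nameof(warmupRuns));
        if (timedRuns < 1) throw new ArgumentOutOfRangeException(nameof(timedRuns));

        // Warm-up: one-time costs (such as the TensorRT engine build described
        // in section 8) are paid here and excluded from the measurement.
        for (int i = 0; i < warmupRuns; i++)
        {
            action();
        }

        double totalMs = 0, minMs = double.MaxValue, maxMs = 0;
        var stopwatch = new Stopwatch();
        for (int i = 0; i < timedRuns; i++)
        {
            stopwatch.Restart();
            action();
            stopwatch.Stop();

            double elapsedMs = stopwatch.Elapsed.TotalMilliseconds;
            totalMs += elapsedMs;
            if (elapsedMs < minMs) minMs = elapsedMs;
            if (elapsedMs > maxMs) maxMs = elapsedMs;
        }

        return new Timing(totalMs / timedRuns, minMs, maxMs);
    }
}
```

With a predictor and image created as in the complete examples, a call could look like `var t = InferenceBenchmark.Measure(() => predictor.Predict(img));` followed by printing `t.MeanMs`. Passing the inference as an Action keeps the helper independent of any particular backend, so the same call works for the CPU, CUDA, DML, and TensorRT configurations.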

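Section 3.4 and FAQ Q1 both come down to checking that the required DLL and model files are actually present before the predictor is created. That check can be automated with a small pre-flight step. This is a sketch only: DeploymentPreflight and FindMissing are hypothetical names introduced for illustration, not a DeploySharp or ONNX Runtime API, and the helper relies solely on System.IO and System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, NOT part of DeploySharp or ONNX Runtime.
public static class DeploymentPreflight
{
    // Returns the entries of fileNames that do not exist inside directory,
    // in their original order. An empty result means every file was found.
    public static IReadOnlyList<string> FindMissing(string directory, IEnumerable<string> fileNames)
    {
        if (directory is null) throw new ArgumentNullException(nameof(directory));
        if (fileNames is null) throw new ArgumentNullException(nameof(fileNames));

        return fileNames
            .Where(name => !File.Exists(Path.Combine(directory, name)))
            .ToList();
    }
}
```

For DML mode, for instance, the two file names listed in section 3.4 could be checked with `DeploymentPreflight.FindMissing(AppContext.BaseDirectory, new[] { "directml.dll", "onnxruntime_providers_shared.dll" })`, printing any returned names before constructing the predictor. The same call works for the model directory by passing the model and dictionary file names instead.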