YOLO26 Improvements | Module Fusion | More efficient global context modeling at high resolution, with better accuracy

张开发

• 2026/4/21 15:30:51 • 15 min read


All programs in this column have been tested and run successfully. This tutorial replaces the C2PSA module in YOLO26 with C2PSA_Agent for feature extraction. After a short look at the underlying paper, it walks step by step through adding and modifying the module code. The complete modified code is referenced at the end of the article so it can be run directly, and beginners should be able to follow along. The goal is to help you get to grips with the YOLO family of object detectors.

Column: YOLO26改进-论文涨点

Contents
1. Paper
2. C2PSA_Agent implementation
2.1 Adding C2PSA_Agent to YOLO26
2.2 Editing the __init__.py file
2.3 Adding the YAML file
2.4 Registering the module in tasks.py
2.5 Running the program
3. Full code
4. GFLOPs
5. Going further
6. Summary

1. Paper

Paper: Agent Attention: On the Integration of Softmax and Linear Attention
Official code: official code repository

2. C2PSA_Agent implementation

2.1 Adding C2PSA_Agent to YOLO26

Key step 1: under ultralytics\ultralytics\nn\modules create a new folder named models, create C2PSA_Agent.py inside it, and paste the following code.

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    # Newer timm releases expose the same function as timm.layers.trunc_normal_
    from timm.models.layers import trunc_normal_

    from ultralytics.nn.modules.conv import Conv


    class AgentAttention(nn.Module):
        """AgentAttention module that dynamically determines H, W in forward pass."""

        def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.,
                     sr_ratio=1, agent_num=49, **kwargs):
            super().__init__()
            assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}."
            assert int(agent_num ** 0.5 + 0.5) ** 2 == agent_num, \
                f"agent_num ({agent_num}) must be a perfect square"

            self.dim = dim
            self.num_heads = num_heads
            head_dim = dim // num_heads
            self.scale = qk_scale or head_dim ** -0.5

            self.q = nn.Linear(dim, dim, bias=qkv_bias)
            self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)
            self.attn_drop = nn.Dropout(attn_drop)
            self.proj = nn.Linear(dim, dim)
            self.proj_drop = nn.Dropout(proj_drop)

            # Optional spatial reduction of keys/values
            self.sr_ratio = sr_ratio
            if sr_ratio > 1:
                self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
                self.norm = nn.LayerNorm(dim)

            self.agent_num = agent_num
            pool_size = int(agent_num ** 0.5)
            agent_patch_H, agent_patch_W = pool_size, pool_size
            self.dwc = nn.Conv2d(in_channels=dim, out_channels=dim, kernel_size=(3, 3), padding=1, groups=dim)

            # 2D biases, interpolated to the target (H, W) at run time
            self.an_bias = nn.Parameter(torch.zeros(num_heads, agent_num, agent_patch_H, agent_patch_W))
            self.na_bias = nn.Parameter(torch.zeros(num_heads, agent_num, agent_patch_H, agent_patch_W))
            # 1D biases, stored with the spatial dimension last so they can be
            # interpolated along H or W; the placeholder length is resized in forward()
            placeholder_size = 1
            self.ah_bias = nn.Parameter(torch.zeros(1, num_heads, agent_num, placeholder_size))  # along H
            self.aw_bias = nn.Parameter(torch.zeros(1, num_heads, agent_num, placeholder_size))  # along W
            self.ha_bias = nn.Parameter(torch.zeros(1, num_heads, agent_num, placeholder_size))  # along H
            self.wa_bias = nn.Parameter(torch.zeros(1, num_heads, agent_num, placeholder_size))  # along W
            trunc_normal_(self.an_bias, std=.02)
            trunc_normal_(self.na_bias, std=.02)
            trunc_normal_(self.ah_bias, std=.02)
            trunc_normal_(self.aw_bias, std=.02)
            trunc_normal_(self.ha_bias, std=.02)
            trunc_normal_(self.wa_bias, std=.02)

            self.pool = nn.AdaptiveAvgPool2d(output_size=(pool_size, pool_size))
            self.softmax = nn.Softmax(dim=-1)

        def forward(self, x):
            b, n, c = x.shape
            # H and W are recovered from the token count, so the feature map must be square
            h_w = int(math.sqrt(n))
            assert h_w * h_w == n, f"Input sequence length {n} is not a perfect square."
            H, W = h_w, h_w
            num_heads = self.num_heads
            head_dim = c // num_heads

            q = self.q(x)

            if self.sr_ratio > 1:
                assert H % self.sr_ratio == 0 and W % self.sr_ratio == 0, \
                    f"Runtime H={H}, W={W} not divisible by sr_ratio={self.sr_ratio}"
                x_ = x.permute(0, 2, 1).reshape(b, c, H, W)
                x_ = self.sr(x_).reshape(b, c, -1).permute(0, 2, 1)
                x_ = self.norm(x_)
                kv = self.kv(x_).reshape(b, -1, 2, c).permute(2, 0, 1, 3)
                kv_n = n // (self.sr_ratio ** 2)
                kv_H, kv_W = H // self.sr_ratio, W // self.sr_ratio
            else:
                kv = self.kv(x).reshape(b, n, 2, c).permute(2, 0, 1, 3)
                kv_n = n
                kv_H, kv_W = H, W
            k, v = kv[0], kv[1]

            # Agent tokens are obtained by pooling the queries
            q_for_pool = q.permute(0, 2, 1).reshape(b, c, H, W)
            agent_tokens = self.pool(q_for_pool).reshape(b, c, -1).permute(0, 2, 1)

            q = q.reshape(b, n, num_heads, head_dim).permute(0, 2, 1, 3)
            k = k.reshape(b, kv_n, num_heads, head_dim).permute(0, 2, 1, 3)
            v = v.reshape(b, kv_n, num_heads, head_dim).permute(0, 2, 1, 3)
            agent_tokens = agent_tokens.reshape(b, self.agent_num, num_heads, head_dim).permute(0, 2, 1, 3)

            # --- Agent - K/V attention biases ---
            kv_size = (kv_H, kv_W)
            # 2D bias (an_bias): (num_heads, agent_num, kv_H, kv_W)
            position_bias1 = F.interpolate(self.an_bias, size=kv_size, mode='bilinear', align_corners=False)
            # -> (b, num_heads, agent_num, kv_n)
            position_bias1 = position_bias1.reshape(1, num_heads, self.agent_num, kv_n).repeat(b, 1, 1, 1)

            # 1D biases (ah_bias, aw_bias): fold (1 * num_heads * agent_num) into the
            # batch dimension so the last dimension can be interpolated linearly
            orig_ah_shape = self.ah_bias.shape  # (1, num_heads, agent_num, H_placeholder)
            ah_bias_reshaped = self.ah_bias.reshape(
                orig_ah_shape[0] * orig_ah_shape[1] * orig_ah_shape[2], 1, orig_ah_shape[3])  # (N, 1, H)
            ah_bias_resized = F.interpolate(ah_bias_reshaped, size=kv_H, mode='linear', align_corners=False)
            ah_bias_final = ah_bias_resized.reshape(
                orig_ah_shape[0], orig_ah_shape[1], orig_ah_shape[2], kv_H).unsqueeze(-1)  # (1, heads, agents, kv_H, 1)

            orig_aw_shape = self.aw_bias.shape  # (1, num_heads, agent_num, W_placeholder)
            aw_bias_reshaped = self.aw_bias.reshape(
                orig_aw_shape[0] * orig_aw_shape[1] * orig_aw_shape[2], 1, orig_aw_shape[3])  # (N, 1, W)
            aw_bias_resized = F.interpolate(aw_bias_reshaped, size=kv_W, mode='linear', align_corners=False)
            aw_bias_final = aw_bias_resized.reshape(
                orig_aw_shape[0], orig_aw_shape[1], orig_aw_shape[2], kv_W).unsqueeze(-2)  # (1, heads, agents, 1, kv_W)

            # Broadcasts to (1, num_heads, agent_num, kv_H, kv_W), then -> (b, num_heads, agent_num, kv_n)
            position_bias2 = (ah_bias_final + aw_bias_final)
            position_bias2 = position_bias2.reshape(1, num_heads, self.agent_num, kv_n).repeat(b, 1, 1, 1)
            position_bias = position_bias1 + position_bias2

            # Agent-KV attention
            attn_agent_kv = (agent_tokens * self.scale) @ k.transpose(-2, -1)
            agent_attn = self.softmax(attn_agent_kv + position_bias)
            agent_attn = self.attn_drop(agent_attn)
            agent_v = agent_attn @ v

            # --- Q - Agent attention biases ---
            q_size = (H, W)
            # 2D bias (na_bias): (num_heads, agent_num, H, W) -> (b, num_heads, n, agent_num)
            agent_bias1 = F.interpolate(self.na_bias, size=q_size, mode='bilinear', align_corners=False)
            agent_bias1 = agent_bias1.reshape(1, num_heads, self.agent_num, n).permute(0, 1, 3, 2).repeat(b, 1, 1, 1)

            # 1D biases (ha_bias, wa_bias), same reshape logic as above
            orig_ha_shape = self.ha_bias.shape  # (1, num_heads, agent_num, H_placeholder)
            ha_bias_reshaped = self.ha_bias.reshape(
                orig_ha_shape[0] * orig_ha_shape[1] * orig_ha_shape[2], 1, orig_ha_shape[3])  # (N, 1, H)
            ha_bias_resized = F.interpolate(ha_bias_reshaped, size=H, mode='linear', align_corners=False)
            ha_bias_final = ha_bias_resized.reshape(
                orig_ha_shape[0], orig_ha_shape[1], orig_ha_shape[2], H).unsqueeze(-1)  # (1, heads, agents, H, 1)
            ha_bias_final = ha_bias_final.permute(0, 1, 3, 4, 2)  # (1, num_heads, H, 1, agent_num)

            orig_wa_shape = self.wa_bias.shape  # (1, num_heads, agent_num, W_placeholder)
            wa_bias_reshaped = self.wa_bias.reshape(
                orig_wa_shape[0] * orig_wa_shape[1] * orig_wa_shape[2], 1, orig_wa_shape[3])  # (N, 1, W)
            wa_bias_resized = F.interpolate(wa_bias_reshaped, size=W, mode='linear', align_corners=False)
            wa_bias_final = wa_bias_resized.reshape(
                orig_wa_shape[0], orig_wa_shape[1], orig_wa_shape[2], W).unsqueeze(-2)  # (1, heads, agents, 1, W)
            wa_bias_final = wa_bias_final.permute(0, 1, 3, 4, 2)  # (1, num_heads, 1, W, agent_num)

            # Broadcasts to (1, num_heads, H, W, agent_num), then -> (b, num_heads, n, agent_num)
            agent_bias2 = (ha_bias_final + wa_bias_final)
            agent_bias2 = agent_bias2.reshape(1, num_heads, n, self.agent_num).repeat(b, 1, 1, 1)
            agent_bias = agent_bias1 + agent_bias2

            # Q-Agent attention
            attn_q_agent = (q * self.scale) @ agent_tokens.transpose(-2, -1)
            q_attn = self.softmax(attn_q_agent + agent_bias)
            q_attn = self.attn_drop(q_attn)
            x = q_attn @ agent_v

            # --- Combine heads and add the depthwise-convolution path ---
            x = x.transpose(1, 2).reshape(b, n, c)
            v_for_dwc = v.transpose(1, 2).reshape(b, kv_n, c)
            v_for_dwc = v_for_dwc.permute(0, 2, 1).reshape(b, c, kv_H, kv_W)
            if self.sr_ratio > 1:
                v_for_dwc = F.interpolate(v_for_dwc, size=(H, W), mode='bilinear', align_corners=False)
            dwc_out = self.dwc(v_for_dwc)
            dwc_out = dwc_out.permute(0, 2, 3, 1).reshape(b, n, c)
            x = x + dwc_out

            # --- Final projection ---
            x = self.proj(x)
            x = self.proj_drop(x)
            return x


    class PSABlock(nn.Module):
        """PSABlock using AgentAttention, determining H, W dynamically."""

        def __init__(self, c, num_heads=8, qkv_bias=False, sr_ratio=1, agent_num=49,
                     attn_drop=0., proj_drop=0., ffn_exp_ratio=2.0, shortcut=True):
            super().__init__()
            self.attn = AgentAttention(
                dim=c,
                num_heads=num_heads,
                qkv_bias=qkv_bias,
                attn_drop=attn_drop,
                proj_drop=proj_drop,
                sr_ratio=sr_ratio,
                agent_num=agent_num
            )
            ffn_hidden_dim = int(c * ffn_exp_ratio)
            self.ffn = nn.Sequential(
                nn.Linear(c, ffn_hidden_dim),
                nn.GELU(),
                nn.Dropout(proj_drop),
                nn.Linear(ffn_hidden_dim, c),
                nn.Dropout(proj_drop)
            )
            # Stored for API compatibility; forward() below always applies both residuals
            self.add = shortcut
            self.norm1 = nn.LayerNorm(c)
            self.norm2 = nn.LayerNorm(c)

        def forward(self, x):
            B, C, H, W = x.shape
            x_attn_input = x.flatten(2).transpose(1, 2).contiguous()  # (B, N, C)

            # Attention block with residual connection
            normed_x = self.norm1(x_attn_input)
            attn_out = self.attn(normed_x)
            x_attn_output = x_attn_input + attn_out

            # FFN block with residual connection
            normed_ffn_input = self.norm2(x_attn_output)
            ffn_out = self.ffn(normed_ffn_input)
            x_ffn_output = x_attn_output + ffn_out

            # Back to (B, C, H, W); C2PSA_Agent handles the parallel branch itself
            x_final = x_ffn_output.transpose(1, 2).reshape(B, C, H, W)
            return x_final


    class C2PSA_Agent(nn.Module):
        """C2PSA using PSABlock with dynamic H/W determination."""

        def __init__(self, c1, c2, n=1, e=0.5, num_heads=8, sr_ratio=1, agent_num=49,
                     qkv_bias=False, attn_drop=0., proj_drop=0., shortcut=True):
            super().__init__()
            assert c1 == c2, "C2PSA requires c1 == c2 typically. Check YAML definition."
            self.c = int(c1 * e)
            self.cv1 = Conv(c1, 2 * self.c, 1, 1)
            self.cv2 = Conv(2 * self.c, c2, 1)

            ffn_exp_ratio_psa = 2.0  # keep consistent with the PSABlock default
            self.m = nn.Sequential(*(
                PSABlock(
                    c=self.c,
                    num_heads=num_heads,
                    qkv_bias=qkv_bias,
                    sr_ratio=sr_ratio,
                    agent_num=agent_num,
                    attn_drop=attn_drop,
                    proj_drop=proj_drop,
                    ffn_exp_ratio=ffn_exp_ratio_psa,
                    shortcut=shortcut
                ) for _ in range(n)
            ))

        def forward(self, x):
            split_features = self.cv1(x)
            a, b = split_features.split((self.c, self.c), dim=1)
            b = self.m(b)
            return self.cv2(torch.cat((a, b), dim=1))


    # --- Example usage (standalone) ---
    if __name__ == "__main__":
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        B = 4
        C = 256  # input/output channels; C * e must be divisible by num_heads
        H, W = 20, 20  # must be square, and divisible by sr_ratio if sr_ratio > 1
        num_heads_test = 4
        sr_ratio_test = 2
        agent_num_test = 49  # 7x7

        # The class defined above is C2PSA_Agent (the original snippet called C2PSA here)
        c2psa_module = C2PSA_Agent(
            c1=C,
            c2=C,
            n=2,
            num_heads=num_heads_test,
            sr_ratio=sr_ratio_test,
            agent_num=agent_num_test,
        ).to(device)

        input_tensor = torch.randn(B, C, H, W).to(device)

        print(f"Input shape: {input_tensor.shape}")
        try:
            output_tensor = c2psa_module(input_tensor)
            print(f"Output shape: {output_tensor.shape}")
            assert input_tensor.shape == output_tensor.shape
            print("\nC2PSA with dynamic H/W determination and bias interpolation created and tested successfully.")
        except Exception as e:
            print(f"\nError during forward pass: {e}")
            import traceback
            traceback.print_exc()

2.2 Editing the __init__.py file

Key step 2: in the folder ultralytics\ultralytics\nn\modules\models create an __init__.py file. Import the class first, then declare it in __all__ below the import.

2.3 Adding the YAML file

Key step 3: under /ultralytics/ultralytics/cfg/models/26 create a file named yolo26_C2PSA_Agent.yaml and paste one of the following configurations.

Object detection

    # Ultralytics AGPL-3.0 License - https://ultralytics.com/license

    # Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
    # Model docs: https://docs.ultralytics.com/models/yolo26
    # Task docs: https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    end2end: True # whether to use end-to-end mode
    reg_max: 1 # DFL bins
    scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
      m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
      l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
      x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

    # YOLO26n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2, [1024, True]]
      - [-1, 1, SPPF, [1024, 5]] # 9
      - [-1, 2, C2PSA_Agent, [1024]] # 10

    # YOLO26n head
    head:
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 6], 1, Concat, [1]] # cat backbone P4
      - [-1, 2, C3k2, [512, False]] # 13

      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 4], 1, Concat, [1]] # cat backbone P3
      - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]]
      - [[-1, 13], 1, Concat, [1]] # cat head P4
      - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]]
      - [[-1, 10], 1, Concat, [1]] # cat head P5
      - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

      - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

Segmentation

The file is identical to the object detection configuration above, including the header comments, except for the last line of the head:

      - [[16, 19, 22], 1, Segment, [nc, 32, 256]]

Oriented object detection

Again identical to the object detection configuration except for the last line of the head:

      - [[16, 19, 22], 1, OBB, [nc, 1]]

Tip: this article adds the module to the base yolo26 configuration. To apply it to a specific YOLO26 size (n/s/m/l/x), only the corresponding depth and width multipliers need to be selected; they are the entries of the scales block in the YAML above.

2.4 Registering the module in tasks.py

Key step 4: register C2PSA_Agent in the parse_model function. First import the class in tasks.py, then locate parse_model in the same file and add C2PSA_Agent in two places:

1. in base_modules
2. in repeat_modules

2.5 Running the program

Key step 5: create train.py in the ultralytics folder and set the model argument to the path of yolo26_C2PSA_Agent.yaml. Note that train.py goes in the outer Ultralytics directory, not inside the inner package.

    import warnings
    from pathlib import Path

    from ultralytics import YOLO

    warnings.filterwarnings("ignore")

    if __name__ == "__main__":
        # Load the model from the YAML file created in step 3
        # (the original snippet pointed at ultralytics/cfg/26/yolo26.yaml)
        model = YOLO("ultralytics/cfg/models/26/yolo26_C2PSA_Agent.yaml")
        # Train the model
        results = model.train(
            data=r"<path to your dataset YAML file>",
            epochs=100,
            batch=16,
            imgsz=640,
            workers=4,
            name=Path(model.cfg).stem,
        )

Run the script. Output like the following indicates that the module was added successfully:

                       from  n    params  module                                       arguments
      0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
      1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
      2                  -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]
      3                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
      4                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]
      5                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
      6                  -1  1     87040  ultralytics.nn.modules.block.C3k2            [128, 128, 1, True]
      7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
      8                  -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]
      9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
     10                  -1  1    305456  ultralytics.nn.models.C2PSA_Agent.C2PSA_Agent[256, 256, 1]
     11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
     12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     13                  -1  1    111296  ultralytics.nn.modules.block.C3k2            [384, 128, 1, False]
     14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
     15             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     16                  -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]
     17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
     18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     19                  -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]
     20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
     21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
     22                  -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]
     23        [16, 19, 22]  1    309656  ultralytics.nn.modules.head.Detect           [80, 1, True, [64, 128, 256]]
    YOLO26_C2PSA_Agent summary: 227 layers, 2,524,552 parameters, 2,524,552 gradients, 6.0 GFLOPs

Row 10 shows the module path ultralytics.nn.models.C2PSA_Agent, whereas steps 1 and 2 place the file under nn\modules\models. The import added to tasks.py has to match the folder you actually created.

3. Full code

See the sidebar of the author's home page.

4. GFLOPs

For how GFLOPs are computed, see the article 百面算法工程师 | 卷积基础知识: Convolution.

GFLOPs of the unmodified YOLO26n: the scales comment in the YAML above lists 6.1 GFLOPs for scale n.
GFLOPs after the modification: the training log above reports 6.0 GFLOPs.

5. Going further

The module can be combined with other attention mechanisms, loss functions and similar changes to push detection performance further.

6. Summary

The changes above replace C2PSA with C2PSA_Agent with the aim of improving the model's performance. This is only a starting point, and there is plenty of room left for further optimization and deeper technical work.

The author recommends the column YOLO26改进-论文涨点. It focuses on recent deep learning techniques, object detection in particular. Besides analysis of YOLO26 and strategies for improving it, it is regularly updated with paper reproductions and hands-on write-ups from major conferences such as CVPR and NeurIPS.

Why subscribe to the column:
- Coverage of current techniques: the column is not limited to YOLO improvements and also covers recent results on mainstream and emerging networks.
- Detailed, practical material: every update comes with code and concrete modification steps so readers can get started quickly.
- Q&A: subscribers can ask the author questions and get timely answers.
- Regular updates: new research directions and reproduction reports from top conferences are published from time to time.

Who the column is for: readers with a strong interest in object detection and the YOLO family, readers who want to write papers based on YOLO, and anyone curious about YOLO algorithms.
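Returning to the efficiency claim in the title and the paper cited in section 1: a rough way to see where the saving comes from is to count only the matrix products in the two attention stages of AgentAttention.forward. The sketch below does that under loud assumptions: projections, bias interpolation, softmax and the depthwise convolution are ignored, and the function names are illustrative, not part of Ultralytics or the paper's code.

```python
def softmax_attention_macs(n_tokens: int, dim: int) -> int:
    """Multiply-accumulates for Q @ K^T plus attn @ V in full softmax attention."""
    return 2 * n_tokens * n_tokens * dim


def agent_attention_macs(n_tokens: int, dim: int, agent_num: int = 49, sr_ratio: int = 1) -> int:
    """Multiply-accumulates for the two agent-attention stages.

    Stage 1: agent tokens attend to keys/values (agent_num x kv_tokens).
    Stage 2: queries attend to agent tokens (n_tokens x agent_num).
    """
    kv_tokens = n_tokens // (sr_ratio ** 2)
    stage1 = 2 * agent_num * kv_tokens * dim
    stage2 = 2 * n_tokens * agent_num * dim
    return stage1 + stage2


def cost_ratio(h: int, w: int, dim: int = 128, agent_num: int = 49, sr_ratio: int = 1) -> float:
    """Agent-attention cost as a fraction of full softmax attention on an h x w map."""
    n_tokens = h * w
    return agent_attention_macs(n_tokens, dim, agent_num, sr_ratio) / softmax_attention_macs(n_tokens, dim)


if __name__ == "__main__":
    # 20x20 is the P5 map for a 640x640 input; larger maps stand in for higher resolutions.
    for side in (20, 40, 80):
        print(f"{side}x{side}: agent/softmax MAC ratio = {cost_ratio(side, side):.4f}")
```

With sr_ratio=1 the ratio simplifies to 2 * agent_num / N, so the fixed 49 agent tokens cost relatively less as the number of tokens N grows. That is the sense in which the module is cheaper at high resolution.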
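The tip in section 2.3 and the log in section 2.5 are linked by the scales entry: the YAML asks for C2PSA_Agent with 1024 channels and 2 repeats, yet the log shows [256, 256, 1]. The sketch below reproduces that arithmetic. It is an approximation of how Ultralytics' parse_model is understood to scale widths and depths (an assumption, not copied from the library), and scaled_channels and scaled_repeats are hypothetical helper names.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return int(math.ceil(x / divisor) * divisor)


def scaled_channels(yaml_channels: int, width: float, max_channels: int) -> int:
    """Output channels after applying the width multiplier and channel cap (assumed rule)."""
    return make_divisible(min(yaml_channels, max_channels) * width, 8)


def scaled_repeats(yaml_repeats: int, depth: float) -> int:
    """Repeat count after applying the depth multiplier (assumed rule)."""
    return max(round(yaml_repeats * depth), 1) if yaml_repeats > 1 else yaml_repeats


if __name__ == "__main__":
    # Values taken from the `scales` block in section 2.3: [depth, width, max_channels]
    scales = {
        "n": (0.50, 0.25, 1024),
        "s": (0.50, 0.50, 1024),
        "m": (0.50, 1.00, 512),
        "l": (1.00, 1.00, 512),
        "x": (1.00, 1.50, 512),
    }
    for name, (depth, width, max_channels) in scales.items():
        channels = scaled_channels(1024, width, max_channels)
        repeats = scaled_repeats(2, depth)
        print(f"scale {name}: C2PSA_Agent channels={channels}, repeats={repeats}")
```

For scale n this gives 256 channels and 1 repeat, which matches row 10 of the log. Because C2PSA_Agent runs attention on half of its channels (e=0.5) with num_heads=8 by default, the scaled channel count also determines whether the divisibility assert in AgentAttention holds.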
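The code in section 2.1 recovers H and W from the token count (h_w = int(math.sqrt(n))), so it implicitly assumes a square feature map. A non-square map whose token count is not a perfect square trips the assert, and one whose count happens to be a perfect square (for example 10x40 = 400) would pass the assert but be reshaped as 20x20. The pre-flight check below mirrors those constraints in plain Python. check_agent_config is a hypothetical helper, and the stride of 32 assumes the module sits at P5 as in the YAML of section 2.3.

```python
import math


def check_agent_config(height, width, channels, stride=32, e=0.5,
                       num_heads=8, sr_ratio=1, agent_num=49):
    """Return a list of problems; an empty list means the module's asserts should pass."""
    problems = []

    if height % stride or width % stride:
        problems.append(f"input {height}x{width} is not a multiple of stride {stride}")
    feat_h, feat_w = height // stride, width // stride

    # forward() sets H = W = sqrt(n), so only square maps are reshaped correctly
    if feat_h != feat_w:
        n_tokens = feat_h * feat_w
        if math.isqrt(n_tokens) ** 2 == n_tokens:
            problems.append(f"{feat_h}x{feat_w} map passes the assert but would be treated as square")
        else:
            problems.append(f"{feat_h}x{feat_w} map gives {n_tokens} tokens, not a perfect square")

    if sr_ratio > 1 and (feat_h % sr_ratio or feat_w % sr_ratio):
        problems.append(f"feature map {feat_h}x{feat_w} not divisible by sr_ratio {sr_ratio}")

    # C2PSA_Agent runs attention on int(c1 * e) channels
    attn_dim = int(channels * e)
    if attn_dim % num_heads:
        problems.append(f"attention dim {attn_dim} not divisible by num_heads {num_heads}")

    if math.isqrt(agent_num) ** 2 != agent_num:
        problems.append(f"agent_num {agent_num} is not a perfect square")

    return problems


if __name__ == "__main__":
    print(check_agent_config(640, 640, channels=256))   # square input at scale n
    print(check_agent_config(640, 480, channels=256))   # non-square input
    print(check_agent_config(320, 1280, channels=256))  # 10x40 map, 400 tokens
```

If training or validation feeds non-square images to the model, for example through rectangular batching, this is the first thing to check before looking elsewhere.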
