<p><a href="http://crossmark.crossref.org/dialog/?doi=10.1049%2Fipr2.13024&domain=pdf&date_stamp=2024-01-11"><img src="/media/202408//1724838587.168684.jpeg" /></a></p><p>Received: 14 October 2023 <img src="/media/202408//1724838587.1888719.png" /> Revised: 6 December 2023 <img src="/media/202408//1724838587.1926851.png" /> Accepted: 21 December 2023 <img src="/media/202408//1724838587.196497.png" /><strong> IET Image Processing</strong></p><p>DOI: 10.1049/ipr2.13024</p><p><strong>ORIGINAL RESEARCH</strong></p><p><img src="/media/202408//1724838587.218859.png" /></p><p>The Institution of</p><p>Engineering and Technology</p><p>WILEY</p><p><strong>YOLOv5s maritime distress target detection method based on </strong><a id="bookmark1"></a><strong>swin transformer</strong></p><p><strong>Kun Liu</strong><a href="#bookmark1"><strong>1</strong></a><strong> </strong><img src="/media/202408//1724838587.243517.png" /><strong> Yueshuang Qi</strong><a href="#bookmark2"><strong>2</strong></a><a href="https://orcid.org/0000-0003-0559-5675"><img src="/media/202408//1724838587.247562.png" /><strong> </strong></a><img src="/media/202408//1724838587.254153.png" /><strong> Guofeng Xu</strong><a href="#bookmark1"><strong>1</strong></a><strong> </strong><img src="/media/202408//1724838587.268887.png" /><strong> Jianglong Li</strong><a href="#bookmark1"><strong>1</strong></a></p><p>1PLA Naval Aviation University, Qingdao Campus, Qingdao, China</p><p>2 School of Information Science and Engineering, Qilu Normal University, Jinan, China</p><p><strong>Correspondence</strong></p><p>Yueshuang Qi, School of Information Science and Engineering, Qilu Normal University, Jinan 250200, China.</p><p>Email:<a href="mailto:free_qys@163.com">free_qys@163.com</a></p><p><a id="bookmark2"></a><strong>Abstract</strong></p><p>In recent years, the task of maritime emergency rescue has increased, while the cost of time for traditional methods of search and rescue is pretty long with poor effect subject to the constraints of the complex circumstances around thesea, the effective conditions, and the support capability. This paper applies deep learning and proposes a YOLOv5s-SwinDS algorithm for target detection in distress at sea. Firstly, the backbone network of the YOLOv5s algorithm is replaced by swin transformer, and a multi-level feature fusion mod- ule is introduced to enhance the feature expression ability for maritime targets. Secondly, deformable convolutional networks v2 (DCNv2) is used instead of traditional convolution to improve the recognition capability for irregular targets when the neck network features are output. Finally, the CIoU loss function is replaced with SIoU to reduce the redun- dant box effectively while accelerating the convergence and regression of the predicted box. Experimenting on the publicly dataset SeaDronesSee, the <em>Precision</em>, <em>Recall</em>, <em>mAP</em>0.5 and <em>mAP</em>0.5−0.95 of YOLOv5s-SwinDS model are 87.9%, 75.8%, 79.1% and 42.9%, respec- tively, which get higher results than the original YOLOv5s model, the YOLOv7 series of models, and the YOLOv8 series of models. The experiments verifies that the algorithm has good performance in detecting maritime distress targets.</p><p><strong>1 </strong><img src="/media/202408//1724838587.286735.png" /><strong> INTRODUCTION</strong></p><p>As human activities at sea become more and more frequent, people swim, dive, surf, ride motor boats, or take boats for sea fishing, etc. 
Maritime activities have become an important part of daily life, and the number of boats operating at sea keeps increasing. However, many people die every year because they fall into the sea for various reasons and are not rescued in time. Traditional maritime search and rescue relies mainly on expert experience and the spot where the target fell into the sea to set the rescue area, which can take a long time and give relatively poor results, restricted by the complex circumstances at sea, the available equipment, and the support capability.

In recent years, with the rapid development of Big Data and the Internet of Things [1-6], the use of unmanned intelligent equipment for target detection in distress at sea has become a new research hotspot. J. Wang et al. proposed advanced methods such as the Multi-Stage Self-Guided Separation Network and the Representation-Enhanced Status Replay Network [7, 8]. M. Zhang et al. proposed SOT-NET [9]. Owing to their low cost, sensitivity, intelligence, and autonomy, unmanned aerial vehicles (UAVs) have gradually been adopted in the maritime search and rescue field [10-12].

Object detection by UAVs is an important research direction in computer vision. Compared with images taken from a horizontal perspective, images taken by UAVs not only have variable angles and heights but also feature a large field of view, a small proportion of target pixels, complex backgrounds, and sensitivity to sunlight [13], all of which make object detection on UAV images difficult. Therefore, we propose an improved YOLOv5s algorithm to increase the accuracy of UAV-based detection of maritime distress targets. Deep-learning-based target detection methods fall into two kinds: two-stage models based on R-CNN [14], Fast R-CNN [15] and Faster R-CNN [16], and one-stage models based on YOLO [17-21] and SSD [22]. Common two-stage models
have high accuracy and detect well, but require a large amount of computation.

To overcome the low accuracy of the YOLO series of algorithms, [20] used cross-stage-partial (CSP) modules to extract features, introduced spatial pyramid pooling (SPP) and a path aggregation network (PAN) [23], fused high-level and low-level feature information, and combined this with the mosaic data enhancement method to improve the accuracy of object detection.

To further improve detection accuracy, YOLOv5 [24] introduced adaptive candidate boxes (anchors) [25] and adaptive image scaling on the basis of the YOLOv4 networks to enhance the data and improve the robustness of the network. YOLOv5 adopted the focus structure and designed two CSP structures, which strengthened the fusion capability for network features and retained richer feature information. In addition, YOLOv5 applied the CIoU loss function [26] to improve the accuracy and convergence stability of the network. YOLOv5 also introduced network depth and width as scaling factors, yielding four models of sequentially increasing size and accuracy according to the number of layers and channels, namely YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, which makes the network family more flexible. YOLOv5s has higher detection efficiency and lower hardware requirements, so the algorithm proposed in this paper builds on YOLOv5s.

We propose the YOLOv5s-SwinDS algorithm for target detection in distress at sea, based on YOLOv5s with a swin transformer. The main contributions of this paper are as follows:

1. The YOLOv5s algorithm uses a traditional CNN to extract features. Because of the locality of CNNs, it can only establish connections within local areas and cannot model long-range dependencies with distant locations. The swin transformer can connect any two locations thanks to its self-attention mechanism. We therefore combine the swin transformer with the YOLOv5s algorithm, using the swin transformer as the backbone to extract features so that the two complement each other and long-range dependencies are captured better: the backbone network of the YOLOv5s algorithm is replaced by the swin transformer, and a multi-level feature fusion module is introduced to enhance the feature expression ability for maritime objects.

2. Deformable convolutional networks v2 (DCNv2) is used instead of traditional convolution in the feature output of the neck network to improve the recognition capability for irregular targets.

3. The CIoU loss function is replaced with SIoU to reduce redundant boxes effectively while accelerating the convergence and regression of the predicted box.

Experimental results on the open SeaDronesSee dataset show that the proposed YOLOv5s-SwinDS model is superior to the traditional YOLOv5s model, the YOLOv7 series of models, and the YOLOv8 series of models in terms of Precision, Recall, mAP0.5 and mAP0.5-0.95. Our algorithm improves accuracy while also taking detection speed into account, which makes it better suited to searching for distress targets.

2 YOLOv5s NETWORK ARCHITECTURE

The framework of YOLOv5s can be divided into an input end, a backbone network, a neck network, and an output end; its structure is shown in Figure 1.

FIGURE 1 Structure of YOLOv5s.

2.1 Input

The main techniques at the input end are mosaic enhancement of the dataset and adaptive anchor box calculation. Mosaic enhancement effectively enriches the dataset by scaling, cutting, arranging, and splicing images at will. During training, adaptive anchor box calculation outputs predicted boxes from the pre-set initial anchor boxes and compares them with the real boxes; the optimal anchor box values are obtained by reverse updating on the difference between the predicted and real boxes.
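As a rough illustration of the mosaic step, the sketch below stitches four images around a random centre point into a single 640 x 640 training sample. It is a minimal standalone version written for this description (label handling and the exact Ultralytics implementation are omitted), so the fill value and nearest-neighbour resizing are assumptions.

```python
import random
import numpy as np

def mosaic4(images, out_size=640):
    """Stitch four HxWx3 uint8 images around a random centre point,
    as in mosaic data enhancement (label handling omitted)."""
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)   # grey fill
    cx = random.randint(out_size // 4, 3 * out_size // 4)            # random centre
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),                # four quadrants
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)         # nearest-neighbour
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)         # resize indices
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas

imgs = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic4(imgs).shape)   # (640, 640, 3)
```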
2.2 Backbone

The feature extraction network of YOLOv5s draws on the design of CSPDarkNet and is composed of spatial pyramid pooling fast (SPPF) and C3 modules, the latter containing three standard convolutional layers and several bottlenecks. The C3 module contains a residual structure, which reduces the number of network parameters and improves the training speed. SPPF improves on spatial pyramid pooling (SPP) by deleting redundant operations, so feature fusion is performed faster.

2.3 Neck

The neck network combines image features and transmits them to the head module for prediction. It uses an FPN+PAN structure: FPN is a pyramid-reinforced structure in charge of the semantic features of the high-level network, transmitted from top to bottom, while PAN is responsible for the positioning features of the underlying network, transmitted from bottom to top. FPN and PAN are shown in Figure 2.

FIGURE 2 Feature pyramid network (FPN) and path aggregation network (PAN).

2.4 Head

The output end, also known as the head, is the classifier and regressor of YOLOv5s. It judges the previously obtained feature points and whether there are objects corresponding to them. The output end uses non-maximum suppression (NMS) and adopts CIoU loss as the bounding-box loss function, which solves the misalignment problem of the bounding box and effectively improves the speed and accuracy of predicted-box regression.
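To make the NMS post-processing step concrete, here is a minimal sketch using torchvision.ops.nms on a few hand-made detections; the box coordinates, scores and threshold are illustrative values, not settings from the paper.

```python
import torch
from torchvision.ops import nms

# Dummy raw detections in (x1, y1, x2, y2) format with confidence scores.
boxes = torch.tensor([[100., 100., 200., 200.],
                      [105., 102., 198., 205.],   # heavy overlap with the first box
                      [400., 300., 480., 380.]])
scores = torch.tensor([0.90, 0.75, 0.60])

keep = nms(boxes, scores, iou_threshold=0.45)     # indices of boxes kept after NMS
print(keep)                                       # tensor([0, 2]): the duplicate is suppressed
```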
3 YOLOv5s-SwinDS NETWORK ARCHITECTURE

In this paper, we propose the YOLOv5s-SwinDS algorithm for target detection in distress at sea, based on YOLOv5s with a swin transformer. Firstly, the backbone network of the YOLOv5s algorithm is replaced by the swin transformer, and a multi-level feature fusion module is introduced to enhance the feature expression ability for maritime objects. Secondly, DCNv2 is used instead of traditional convolution in the feature output of the neck network to improve the recognition capability for irregular targets and enable adaptive feature sampling. Finally, at the head, the CIoU loss function is replaced with SIoU to reduce redundant boxes effectively while accelerating the convergence and regression of the predicted box. The network structure of YOLOv5s-SwinDS is shown in Figure 3.

FIGURE 3 Structure of YOLOv5s-SwinDS.

3.1 Replace the backbone network with swin transformer

The transformer was first used in natural language processing; it arose because RNNs cannot exploit parallel computation and GPU acceleration during training, which had led to CNNs being used for parallel acceleration instead of RNNs. However, applying a transformer to computer vision poses certain challenges because of the differences between transformers and CNNs. Firstly, the detection speed of the transformer in computer vision is slow. Secondly, directly applying the transformer model used in natural language processing leads to very high computational cost, because the amount of information in computer vision tasks is much greater than that of text.
The swin transformer overcomes the transformer's slow detection speed in the computer vision field while keeping characteristics of convolutional neural networks such as displacement invariance and stage-by-stage resolution reduction, and it has achieved state-of-the-art results in many fields. The swin transformer [27] is a network model that introduces a sliding window and a hierarchical structure. It consists of four stages, each of which reduces the resolution of its input features, similar to the way a convolutional neural network expands the receptive field layer by layer. The swin transformer model mainly includes a patch embedding (segmentation coding) module, the swin transformer block (sliding window) module, and a patch merging (splicing) module.

The detection process of the swin transformer is as follows. Firstly, patch embedding splits the input image into multiple image blocks, which makes subsequent per-block operations convenient and reduces the amount of computation. Secondly, each stage contains a patch merging module and multiple blocks, in which LayerNorm, MLP, window attention, and shifted window attention together constitute the swin transformer block. The patch merging module acts at the beginning of each stage to reduce the resolution of the image. The structure of the swin transformer is shown in Figure 4a,b.

FIGURE 4 (a) Structure of swin transformer. (b) Two successive swin transformer blocks.
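As a minimal sketch of the window mechanism described above (not the authors' implementation), the snippet below shows how a feature map is split into non-overlapping windows for window attention and how the cyclic shift used by shifted window attention is applied; the feature-map size and channel count are assumed stage-1 values.

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows of shape
    (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def cyclic_shift(x, shift):
    """Roll the feature map so the next block attends across the previous
    window boundaries (the 'shifted window' trick)."""
    return torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

feat = torch.randn(1, 56, 56, 96)            # assumed stage-1 feature map
wins = window_partition(feat, window_size=7)
print(wins.shape)                            # torch.Size([64, 7, 7, 96])
shifted = window_partition(cyclic_shift(feat, shift=3), window_size=7)
print(shifted.shape)                         # torch.Size([64, 7, 7, 96])
```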
3.2 Using deformable convolution instead of traditional convolution

The objects in the SeaDronesSee dataset used in this paper for maritime distress target detection have diverse shapes and attitudes; they are irregular targets. Traditional convolution is insufficient to extract the feature information of irregular targets when the neck network features are output.

Convolution kernels are used to extract the features of input images, and the kernel size is usually fixed. The biggest problem with such a kernel is that it adapts poorly to unknown changes and generalizes poorly. The disadvantages of the traditional standard convolution kernel are as follows:

1. The convolution unit samples the input feature map at fixed positions.
2. The pooling layer continuously reduces the size of the feature map.
3. The RoI pooling layer generates RoIs with limited spatial locations.

The localized sampling of traditional convolutional neural networks is hard to adapt to the deformation of objects. The model formula of this process is

y(p_0) = \sum_{p_n \in R} w(p_n) \, x(p_0 + p_n), (1)

where x is the input feature map sampled by the convolution kernel on square grid points and w is the weight. For position p_0 of the output y, the output feature value equals the sum of the sampled values weighted by w. R is the sampling grid, given by

R = \{(-1, -1), (-1, 0), \ldots, (0, 1), (1, 1)\}. (2)

Deformable convolution adds a directional parameter to each element of the convolution kernel so that the kernel can extend over a larger range during training. Using deformable convolution in the output stage of the neck network, the features of both regular and irregular targets can be fully extracted. Zhu et al. [28] proposed a deformable convolution that adjusts the direction vector of the convolution kernel on the basis of traditional convolution and can adaptively sample according to the shape of objects. Therefore, to adapt to the various forms of marine distress targets, this paper introduces deformable convolution to sample freely at positions that are not limited to square grid units.

FIGURE 5 Convolution: (a) Traditional standard convolution and (b) Deformable convolution.

The main advantage of deformable convolution is that it samples features adaptively and can learn geometric deformation, which suits objects of different sizes and shapes, while only increasing the computation time to a certain extent. The deformable convolution formula, with an additional learned offset for each sampling point, is

y(p_0) = \sum_{p_n \in R} w(p_n) \, x(p_0 + p_n + \Delta p_n). (3)

After an offset \Delta p_n is assigned to each sampling position, the sampling becomes irregular, which gives the method better transformation-modelling capability than a traditional convolutional neural network. Deformable convolution is illustrated in Figure 5.
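The modulated deformable convolution used here is available in torchvision. Below is a minimal sketch of the usual DCNv2 pattern, in which a plain convolution predicts the offsets and a modulation mask that are then passed to DeformConv2d; the channel sizes are assumptions, and this is a generic block rather than the exact layer used in the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2Block(nn.Module):
    """Modulated deformable convolution (DCNv2): a plain conv predicts
    per-location offsets and a sigmoid mask for DeformConv2d."""
    def __init__(self, c_in, c_out, k=3, stride=1, padding=1):
        super().__init__()
        # 2*k*k offset channels (x and y per sampling point) + k*k mask channels
        self.offset_mask = nn.Conv2d(c_in, 3 * k * k, k, stride, padding)
        self.deform = DeformConv2d(c_in, c_out, k, stride, padding)

    def forward(self, x):
        o1, o2, mask = torch.chunk(self.offset_mask(x), 3, dim=1)
        offset = torch.cat((o1, o2), dim=1)      # the learned offsets of Equation (3)
        mask = torch.sigmoid(mask)               # modulation scalar in [0, 1]
        return self.deform(x, offset, mask)

x = torch.randn(1, 256, 40, 40)                  # assumed neck feature map
print(DCNv2Block(256, 256)(x).shape)             # torch.Size([1, 256, 40, 40])
```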
3.3 Improvement of loss function

In YOLOv5s, CIoU [29] is used to calculate the regression loss of the predicted box. As shown in Equations (4) and (5), the penalty term of CIoU adds an influence factor to the penalty term of DIoU:

L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}, (4)

L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, (5)

where \rho(b, b^{gt}) is the distance between the centres of the predicted box b and the real box b^{gt}, c is the diagonal length of the smallest enclosing box, and \alpha is the trade-off parameter defined in Equation (6):

\alpha = \frac{v}{(1 - IoU) + v}, (6)

while v measures the consistency of the aspect ratio, as defined in Equation (7):

v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2. (7)

CIoU does not take into account the mismatch between the directions of the predicted box and the real box, which leads to slow and inefficient convergence: the predicted box may "wander around" during training and ultimately produce a worse model.

SIoU [30] considers the direction and angle between the regression vectors. It introduces the vector angle between the real box and the predicted box to constrain the predicted box along the X or Y axis and thereby improve the convergence speed. SIoU is composed of an angle cost, a distance cost, a shape cost and an IoU cost.

3.3.1 Angle cost

B(b_{cx}, b_{cy}) is the centre point of the predicted box. The anchor box is first driven onto the horizontal or vertical axis through the centre point B^{GT}(b_{cx}^{gt}, b_{cy}^{gt}) of the real box, which reduces the degrees of freedom of the anchor box and lets it approach the real box rapidly along the relevant axis, as shown in Figure 6. The formula is

\Lambda = 1 - 2\sin^2\left(\arcsin\frac{c_h}{\sigma} - \frac{\pi}{4}\right) = 1 - 2\sin^2\left(\alpha - \frac{\pi}{4}\right), (8)

where \sigma and c_h are the distance between the centre points of the real box and the predicted box and their height difference, respectively.

FIGURE 6 Calculation process of angle cost.

3.3.2 Distance cost

The diagonal of the smallest enclosing rectangle of the predicted box and the real box is used for the distance cost, as shown in Figure 7. The distance cost is given by Equations (9) and (10).
FIGURE 7 Calculation process of distance cost.

\Delta = 2 - e^{-(2-\Lambda)\rho_x} - e^{-(2-\Lambda)\rho_y}, (9)

\rho_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\right)^2, (10)

where c_w and c_h are the width and height of the smallest enclosing rectangle.

3.3.3 Shape cost

\Omega = \left(1 - e^{-\omega_w}\right)^\theta + \left(1 - e^{-\omega_h}\right)^\theta, (11)

\omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}, (12)

where w and w^{gt} are the widths of the predicted box and the real box, h and h^{gt} are their heights, and \theta is close to 4.

The SIoU regression loss is shown in Equation (13):

Loss_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}, (13)

where IoU is the intersection over union of the predicted box and the real box.
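The sketch below writes Equations (8)-(13) directly in PyTorch for boxes in (x1, y1, x2, y2) format; it is meant to make the three cost terms concrete under those assumptions, not to reproduce the authors' exact implementation.

```python
import math
import torch

def siou_loss(pred, target, theta=4, eps=1e-7):
    """SIoU regression loss for (x1, y1, x2, y2) boxes, following Eqs. (8)-(13)."""
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # IoU term
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0]) + eps
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1]) + eps

    # angle cost, Eq. (8)
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = (cy2 - cy1).abs() / sigma
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance cost, Eqs. (9)-(10)
    rho_x, rho_y = ((cx2 - cx1) / cw) ** 2, ((cy2 - cy1) / ch) ** 2
    gamma = 2 - angle
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost, Eqs. (11)-(12)
    omega_w = (w1 - w2).abs() / torch.max(w1, w2)
    omega_h = (h1 - h2).abs() / torch.max(h1, h2)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2          # Eq. (13)

print(siou_loss(torch.tensor([[10., 10., 50., 60.]]),
                torch.tensor([[12., 15., 55., 58.]])))
```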
4 EXPERIMENT

4.1 Experimental dataset

The SeaDronesSee dataset was presented at the WACV 2022 conference by Varga and other researchers at the University of Tuebingen in Germany. SeaDronesSee is a large-scale object detection and tracking dataset containing more than 54,000 images and 400,000 instances taken by drones at altitudes of 5 to 260 m and viewing angles of 0° to 90°, with the corresponding height, angle and other metadata provided. Swimmer, boat, jet ski, life_saving_appliances, and buoy are selected as the detection objects in this dataset. Acquiring images of actual distress targets is difficult and the amount of such data is relatively small; the SeaDronesSee dataset contains several types of targets that can simulate the state of actual targets in maritime distress, so we chose SeaDronesSee as the experimental dataset.

To prove the effectiveness of our algorithm and apply it to searching for and rescuing targets at sea, we used stratified sampling to build a subset of the SeaDronesSee dataset, with 893 images in the training set and 155 images in the verification set.
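A minimal sketch of such a stratified split with scikit-learn is shown below; the per-image labels and class counts are invented placeholders (in practice they come from the SeaDronesSee annotations), and only the 893/155 split sizes match the paper.

```python
from sklearn.model_selection import train_test_split

# Hypothetical per-image dominant-class labels; in practice these would be
# derived from the SeaDronesSee annotation files.
image_ids = [f"img_{i:04d}.jpg" for i in range(1048)]
labels = (["swimmer"] * 500 + ["boat"] * 300 + ["jet_ski"] * 120
          + ["life_saving_appliances"] * 80 + ["buoy"] * 48)

train_ids, val_ids = train_test_split(
    image_ids,
    test_size=155,        # 155 verification images, 893 remain for training
    stratify=labels,      # keep the class mix the same in both splits
    random_state=0,
)
print(len(train_ids), len(val_ids))   # 893 155
```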
4.2 Experimental environment and parameter settings

The CPU used for the experiments is an AMD Ryzen 9 3900XT 12-core processor at 3.80 GHz with 32 GB of memory, paired with an NVIDIA GeForce RTX 3080 Ti GPU. The PyTorch framework and CUDA are used to implement the detection model. Table 1 shows the detailed environment configuration.

TABLE 1 Environment configuration.

| Parameter | Configuration |
| --- | --- |
| Operating system | Windows 10 |
| PyTorch version | 1.12.1 |
| CUDA version | 11.7 |
| cuDNN version | 8.9.0 |
| Python version | 3.9.12 |
| OpenCV version | 4.6.0.66 |

The input image size is adjusted to 640 × 640, the initial learning rate is 0.01, and the final learning rate is 0.0001. The optimizer is SGD, and cosine learning rate decay is used to control the learning-rate schedule, with a momentum of 0.937 and a weight decay of 0.0005. Training runs for up to 1000 epochs with an early stopping mechanism. We use the same dataset and parameters in all training runs. The loss curve on the verification set is shown in Figure 8: the model fits quickly in the first 100 epochs and converges gently at about 200 epochs.

FIGURE 8 Loss curve on the verification set.
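These settings map directly onto standard PyTorch components; a minimal sketch follows, where the one-layer model is only a stand-in for the detector and warm-up, data loading and early stopping are omitted.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)            # placeholder for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.937, weight_decay=0.0005)
# cosine decay from the initial LR (0.01) down to the final LR (0.0001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=1000, eta_min=0.0001)

for epoch in range(1000):
    # ... forward pass on 640x640 batches and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())         # [0.0001] after the full schedule
```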
4.3 Evaluation metrics

In this paper, we choose precision, recall and mean average precision (mAP) as the metrics to evaluate the performance of our model [31].

Suppose that TP and FP are the numbers of positive and negative samples, respectively, that are predicted as positive, and FN is the number of positive samples that are predicted as negative. The formulas for Precision, Recall and mAP are as follows:

Precision = \frac{TP}{TP + FP}, (14)

Recall = \frac{TP}{TP + FN}, (15)

AP = \int_0^1 P(R)\, dR, (16)

mAP = \frac{1}{N_C} \sum_{i=1}^{N_C} AP_i \times 100, (17)

where N_C is the number of classes.

mAP0.5 and mAP0.5-0.95 are used to evaluate the detection accuracy of the model at different IoU thresholds. mAP0.5 is the value calculated at an IoU threshold of 0.5. mAP0.5-0.95 is the mean of the mAP values at the ten IoU thresholds from 0.5 to 0.95 in steps of 0.05, which evaluates the detection capability of the model more comprehensively, as shown in Equation (18):

mAP_{0.5\text{-}0.95} = \frac{1}{10}\left(mAP_{0.5} + mAP_{0.55} + \dots + mAP_{0.95}\right). (18)
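To make Equations (14)-(17) concrete, the sketch below computes AP for one class at a single IoU threshold from per-detection true-positive flags; the flags, scores and ground-truth count are toy values, and matching detections to ground truth by IoU is assumed to have been done already.

```python
import numpy as np

def average_precision(tp_flags, scores, num_gt):
    """AP as the area under the precision-recall curve (Eq. (16)),
    using a monotone (interpolated) precision envelope."""
    order = np.argsort(-scores)                   # rank detections by confidence
    tp = np.asarray(tp_flags, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / num_gt                      # Eq. (15)
    precision = cum_tp / (cum_tp + cum_fp)        # Eq. (14)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# toy example: four detections of one class, three ground-truth boxes
print(average_precision([1, 1, 0, 1], np.array([0.9, 0.8, 0.7, 0.6]), num_gt=3))
# mAP averages this over classes (Eq. (17)); mAP0.5-0.95 additionally averages
# over the IoU thresholds 0.5, 0.55, ..., 0.95 (Eq. (18)).
```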
4.4 Ablation experiments

To verify the effect of each improved module, we carried out eight ablation experiments on the subset of the SeaDronesSee dataset. With YOLOv5s as the benchmark, the backbone network is replaced with the swin transformer, the deformable convolution DCNv2 is adopted in the neck network instead of traditional convolution, and SIoU takes the place of the CIoU loss function. The experimental results are shown in Table 2, where + denotes adding the corresponding module.

TABLE 2 Results of ablation experiment.

| Serial number | Model | Precision | Recall | mAP0.5 | mAP0.5-0.95 |
| --- | --- | --- | --- | --- | --- |
| 1 | YOLOv5s | 82.3 | 71.1 | 74.9 | 39.6 |
| 2 | YOLOv5s+Swin transformer | 81.8 | 73.4 | 77.6 | 41.2 |
| 3 | YOLOv5s+DCNv2 | 89.3 | 71.8 | 75.8 | 41.3 |
| 4 | YOLOv5s+SIoU | 84.7 | 73.3 | 75.5 | 41.8 |
| 5 | YOLOv5s+Swin transformer+DCNv2 | 85.3 | 74.4 | 77.4 | 42.1 |
| 6 | YOLOv5s+Swin transformer+SIoU | 90.6 | 74.4 | 77.7 | 41.8 |
| 7 | YOLOv5s+DCNv2+SIoU | 82.7 | 73.9 | 75.9 | 41.4 |
| 8 | YOLOv5s-SwinDS | 87.9 | 75.8 | 79.1 | 42.9 |

Replacing the backbone network of the YOLOv5s algorithm with the swin transformer slightly decreases the Precision of YOLOv5s+Swin transformer by 0.5%, while the Recall, mAP0.5 and mAP0.5-0.95 increase significantly by 2.3%, 2.7%, and 1.6%, respectively.

Irregular targets account for a relatively high proportion of the dataset, so we adopt the deformable convolution DCNv2 instead of traditional convolution to improve the recognition capability for irregular objects at the neck network output. The Precision, Recall, mAP0.5 and mAP0.5-0.95 of YOLOv5s+DCNv2 increase by 7%, 0.7%, 0.9%, and 1.7%, respectively, compared with YOLOv5s.

Considering the direction and angle between regression vectors, SIoU introduces the vector angle between the real box and the predicted box, which improves the convergence speed by constraining the predicted box along the X or Y axis. The Precision, Recall, mAP0.5 and mAP0.5-0.95 of YOLOv5s+SIoU increase by 2.4%, 2.2%, 0.6%, and 2.2%, respectively, over YOLOv5s.

Combining the three improvements, the performance of YOLOv5s+Swin transformer+DCNv2+SIoU improves greatly: the Precision, Recall, mAP0.5, and mAP0.5-0.95 increase by 5.6%, 4.7%, 4.2% and 3.3% compared with YOLOv5s.

The ablation results in Table 2 show that each improvement module enhances the network. After experimental verification, the Precision, Recall, mAP0.5 and mAP0.5-0.95 of the YOLOv5s-SwinDS model on held-out data are similar to those on the validation set in this experiment.

4.5 Experimental comparison

The YOLOv7 and YOLOv8 series models are relatively advanced target detection models at present. To estimate the effectiveness and performance of our YOLOv5s-SwinDS model, we conducted comparison experiments between the YOLOv5s-SwinDS model and the YOLOv7 and YOLOv8 series models.

No pretrained weights are used in the comparison experiments, to ensure fairness. The comparison results are shown in Table 3. YOLOv5s-SwinDS is superior to the other algorithms in terms of Recall, mAP0.5 and mAP0.5-0.95, while its Precision is lower than that of some algorithms, which demonstrates the superiority of our model.

TABLE 3 Evaluation metrics results.

| Serial number | Model | Precision | Recall | mAP0.5 | mAP0.5-0.95 |
| --- | --- | --- | --- | --- | --- |
| 1 | YOLOv5s | 82.3 | 71.1 | 74.9 | 39.6 |
| 2 | YOLOv7 | 90.7 | 68.4 | 76.7 | 39.4 |
| 3 | YOLOv7tiny | 94.5 | 65.7 | 71.4 | 37.1 |
| 4 | YOLOv7X | 81.4 | 73.8 | 77.6 | 41.0 |
| 5 | YOLOv8n | 74.8 | 58.5 | 60.1 | 33.5 |
| 6 | YOLOv8s | 86.2 | 56.4 | 67.6 | 38.6 |
| 7 | YOLOv8m | 74.1 | 60.4 | 67.7 | 41.4 |
| 8 | YOLOv8l | 81.1 | 58.6 | 66.8 | 40.0 |
| 9 | YOLOv8x | 90.9 | 58.1 | 71.7 | 42.4 |
| 10 | YOLOv5s-SwinDS | 87.9 | 75.8 | 79.1 | 42.9 |
4.6 Visual comparisons

To compare and evaluate the improvement of our model more intuitively, we selected seven images from the SeaDronesSee dataset for testing, as shown in Figures 9 to 15.

FIGURE 9 Comparison results of the first scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.
FIGURE 10 Comparison results of the second scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.
FIGURE 11 Comparison results of the third scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.
FIGURE 12 Comparison results of the fourth scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.
FIGURE 13 Comparison results of the fifth scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.
FIGURE 14 Comparison results of the sixth scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.
FIGURE 15 Comparison results of the seventh scene: (a) YOLOv5s and (b) YOLOv5s-SwinDS.

Figure 9a,b shows that YOLOv5s failed to detect the boat target because of light reflection from the sea surface, while YOLOv5s-SwinDS detected it. Figure 10a,b shows that YOLOv5s does not work well for swimmer targets with few pixels, and Figure 11a,b likewise shows that YOLOv5s was unable to detect some swimmer targets because of sunlight reflection and the small size of the objects.

In addition, Figures 12a,b and 13a,b show that YOLOv5s repeatedly detected the same swimmer targets multiple times. As shown in Figures 14a,b and 15a,b, YOLOv5s produced erroneous results in which a life_saving_appliances target and a buoy target were detected as swimmers, while YOLOv5s-SwinDS obtains correct detections.

These detection results show that the YOLOv5s-SwinDS algorithm is better at detecting small targets and accurately detects small targets missed by YOLOv5s, which reduces the miss rate of small-target detection. The YOLOv5s algorithm failed to detect some expected targets because of undesirable factors such as light reflection.

The improved YOLOv5s-SwinDS algorithm can effectively deal with the interference of complex backgrounds and successfully detect the expected targets, because the backbone network is replaced with the swin transformer, which lets the model prioritize the region of interest under a complex background, increases its weight, and suppresses the influence of noise.

Compared with the original YOLOv5s model, although the YOLOv5s-SwinDS model improves Precision by 5.6%, Recall by 4.7%, mAP0.5 by 4.2% and mAP0.5-0.95 by 3.3%, it also has some limitations.
1. The weight file obtained after training the YOLOv5s-SwinDS model is 64.1 MB, significantly larger than the weight file obtained after training the original YOLOv5s model (14.5 MB).

2. The floating point operations (FLOPs) of the YOLOv5s-SwinDS model are 79.0 GFLOPs, significantly higher than those of the original YOLOv5s model (15.8 GFLOPs).

The computing power carried by drones is limited, so overcoming these two limitations and deploying the YOLOv5s-SwinDS model on the onboard computer is the next focus of our work.
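A quick way to quantify the first limitation for any candidate model is to count its parameters and measure the saved state_dict, as in the hedged sketch below; the two-layer network is only a stand-in for the detector, and FLOPs would additionally need a profiler such as fvcore or thop (not shown).

```python
import os
import torch
import torch.nn as nn

model = nn.Sequential(                          # stand-in for the detection network
    nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.SiLU(),
)

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.2f} M")

torch.save(model.state_dict(), "demo_weights.pt")
size_mb = os.path.getsize("demo_weights.pt") / 1e6
print(f"weight file: {size_mb:.1f} MB")         # compare 14.5 MB (YOLOv5s) vs 64.1 MB (SwinDS)
```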
5 CONCLUSION

With the development of unmanned aerial vehicles (UAVs) and remote sensing techniques, target detection in aerial images taken by UAVs is widely applied and has significant value in transportation planning, military reconnaissance, and environmental monitoring. In this paper, we apply deep learning and propose YOLOv5s-SwinDS, a target search algorithm for maritime distress based on YOLOv5s with a swin transformer. Firstly, the backbone network of the YOLOv5s algorithm is replaced by the swin transformer, and a multi-level feature fusion module is introduced to enhance the feature expression ability of the model for maritime distress targets. Secondly, DCNv2 is used instead of traditional convolution to improve the recognition ability for irregular targets when the neck network features are output. Finally, the CIoU loss function is replaced with SIoU to effectively reduce redundant boxes while accelerating the convergence and regression of the predicted box. Experimenting on a subset of the publicly available SeaDronesSee dataset, our proposed YOLOv5s-SwinDS model is superior to the original YOLOv5s model, the YOLOv7 series of models, and the YOLOv8 series of models, with better recognition efficiency and speed, and can be widely used for maritime distress target detection.

AUTHOR CONTRIBUTIONS
Kun Liu: Formal analysis; methodology; project administration; validation; visualization; writing - original draft. Yueshuang Qi: Data curation; visualization; writing - original draft; writing - review and editing. Guofeng Xu: Conceptualization; validation. Jianglong Li: Supervision.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Experimental dataset: https://pan.baidu.com/s/1JnuHeCChTsnPwtjWt8HaUQ?pwd=yggf
YOLOv5s-SwinDS model source code: https://github.com/liukun6606/YOLOv5s-SwinDS

ORCID
Yueshuang Qi: https://orcid.org/0000-0003-0559-5675

REFERENCES
1. Cao, D., Ren, X., Zhu, M., Song, W.: Visual question answering research on multi-layer attention mechanism based on image target features. Hum.-centric Comput. Inf. Sci. 11, 11 (2021). https://doi.org/10.2296/HCIS.2021.11.011
2. Yuan, H., Zhou, H., Cai, Z., Zhang, S., Wu, R.: Dynamic pyramid attention networks for multi-orientation object detection. J. Internet Technol. 23(1), 79-90 (2022)
3. Wang, J., Zou, Y., Lei, P., Sherratt, R.S., Wang, L.: Research on recurrent neural network based crack opening prediction of concrete dam. J. Internet Technol. 21(4), 1161-1169 (2020)
4. Wang, J., Yang, Y., Wang, T., Sherratt, R.S., Zhang, J.: Big data service architecture: A survey. J. Internet Technol. 21(2), 393-405 (2020)
5. Zhang, J., Zhong, S., Wang, T., Chao, H.-C., Wang, J.: Blockchain-based systems and applications: A survey. J. Internet Technol. 21(1), 1-14 (2020)
6. Wang, J., Zhao, C., He, S., Gu, Y., Alfarraj, O., Abugabah, A.: LogUAD: Log unsupervised anomaly detection based on Word2Vec. Comput. Syst. Sci. Eng. 41(3), 1207-1222 (2022)
7. Wang, J., Li, W., Zhang, M., Tao, R., Chanussot, J.: Remote sensing scene classification via multi-stage self-guided separation network. IEEE Trans. Geosci. Remote Sens. 61, 5615312 (2023)
8. Wang, J., Li, W., Wang, Y., Tao, R., Du, Q.: Representation-enhanced status replay network for multisource remote-sensing image classification. IEEE Trans. Neural Netw. Learn. Syst. (2023). https://doi.org/10.1109/TNNLS.2023.3286422
9. Zhang, M., Li, W., Zhang, Y., Tao, R., Du, Q.: Hyperspectral and LiDAR data classification based on structural optimization transmission. IEEE Trans. Cybern. 53(5), 3153-3164 (2022)
10. Otote, D.A., Li, B., Ai, B., Gao, S., Xu, J., Chen, X., Lv, G.: A decision-making algorithm for maritime search and rescue plan. Sustainability 11(7), 2084 (2019)
11. Rahmes, M.D., Chester, D., Hunt, J., Chiasson, B.: Optimizing cooperative cognitive search and rescue UAVs. In: Autonomous Systems: Sensors, Vehicles, Security and the Internet of Everything. SPIE, Bellingham, WA (2018)
12. Dai, J., Xu, F., Chen, Q.: Multi-UAV cooperative search on region division and path planning. Acta Aeronaut. Astronaut. Sin. 41(S1), 149-156 (2020)
13. Mao, G., Deng, T., Yu, N.: Object detection in UAV images based on multi-scale split attention. Acta Aeronaut. Astronaut. Sin. 43(12), 326738 (2022)
14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587. IEEE, Piscataway, NJ (2014)
15. Girshick, R.: Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision, pp. 1440-1448. IEEE, Piscataway, NJ (2015)
16. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137-1149 (2017)
17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788. IEEE, Piscataway, NJ (2016)
18. Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271. IEEE, Piscataway, NJ (2017)
19. Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv:1804.02767 (2018)
20. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv:2004.10934 (2020)
21. Khalfaoui, A., Badri, A., Mourabit, I.E.: Comparative study of YOLOv3 and YOLOv5's performances for real-time person detection. In: Proceedings of the 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), pp. 1-5. IEEE, Piscataway, NJ (2022)
22. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: Single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision - ECCV 2016, pp. 21-37. Springer, Cham (2016)
23. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759-8768. IEEE, Piscataway, NJ (2018)
24. Jocher, G., Nishimura, K., Mineeva, T., Vilariño, R.: YOLOv5. Code repository (2020)
25. Zhihong, X., Xiafei, T., et al.: Anchor-free scale adaptive pedestrian detection algorithm. J. Control Decis. 36(2), 295-302 (2021)
26. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658-666. IEEE, Piscataway, NJ (2019)
27. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022. IEEE, Piscataway, NJ (2021)
28. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9308-9316. IEEE, Piscataway, NJ (2019)
29. Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: Faster and better learning for bounding box regression. Proc. AAAI Conf. Artif. Intell. 34, 12993-13000 (2020)
30. Gevorgyan, Z.: SIoU loss: More powerful learning for bounding box regression. arXiv:2205.12740 (2022)
31. Lin, S., Liu, M., Tao, Z.: Detection of underwater treasures using attention mechanism and improved YOLOv5. Trans. Chin. Soc. Agric. Eng. 37(18), 307-314 (2021)

How to cite this article: Liu, K., Qi, Y., Xu, G., Li, J.: YOLOv5s maritime distress target detection method based on swin transformer. IET Image Process. 1-10 (2024). https://doi.org/10.1049/ipr2.13024