
CN119919836A - A small target detection method from the perspective of drone based on SCGD-YOLO network - Google Patents

A small target detection method from the perspective of drone based on SCGD-YOLO network

Info

Publication number
CN119919836A
CN119919836A (application CN202510101054.5A)
Authority
CN
China
Prior art keywords
network
scgd
convolution
small target
yolo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510101054.5A
Other languages
Chinese (zh)
Inventor
项铁铭
苏旭麟
杨梦雅
成思霖
林铭煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202510101054.5A priority Critical patent/CN119919836A/en
Publication of CN119919836A publication Critical patent/CN119919836A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a small target detection method from the perspective of an unmanned aerial vehicle (UAV) based on an SCGD-YOLO network. The method comprises the following steps: obtaining an open-source dataset of small targets from the UAV perspective and dividing it into a training set, a validation set, and a test set, wherein the dataset comprises ten categories, most of which consist mainly of small targets; configuring the network environment required by the network model; constructing an SCGD-YOLO network model comprising a backbone network, a neck network, and a detection head; sending the images and labels of the training dataset into the constructed SCGD-YOLO network model for training, and adjusting the corresponding hyperparameters according to the results on the validation set to obtain the best training result; and finally, sending the images to be detected in the test set into the trained SCGD-YOLO network model for small target detection and outputting the detection results. The invention addresses the low resolution of small targets and the resulting drop in detection accuracy when a UAV shoots from high altitude; it not only improves detection accuracy for small targets but also reduces the number of parameters of the model.

Description

Small target detection method based on SCGD-YOLO network under unmanned aerial vehicle visual angle
Technical Field
The invention relates to the technical field of small target detection under an unmanned aerial vehicle visual angle, in particular to a small target detection method under an unmanned aerial vehicle visual angle based on an SCGD-YOLO network.
Background
Unmanned aerial vehicles are widely applied in fields such as remote sensing imagery, agricultural pest detection, disaster monitoring, and industrial target detection. They offer low cost, high flexibility, and a wide viewing angle, and can perform tasks such as monitoring, patrol, and target tracking. Many of these application scenarios, however, involve the detection of small targets. On the one hand, the low resolution of small targets in images, susceptibility to interference from complex environments, mutual occlusion among small targets, and unstable shooting in high-altitude environments make feature extraction for small targets very difficult; on the other hand, conventional target detection algorithms are mostly developed for targets of conventional size, and detection algorithms aimed specifically at small-target scenarios are relatively few.
Target detection is an important research direction in the field of computer vision, aiming to identify and localize targets in images or videos. Current target detection algorithms can be roughly divided into candidate-region-based algorithms and regression-based algorithms. Candidate-region-based algorithms include R-CNN, Faster R-CNN, Mask R-CNN, and the like; Faster R-CNN is a detection algorithm commonly used in deep-learning-based target detection, but its large number of network parameters, high computational cost, and relatively slow inference speed make it unsuitable for real-time detection tasks. To address the problem of slow inference, the YOLO series of algorithms emerged. These are regression-based target detection algorithms that eliminate candidate-box selection and generate target boxes and class predictions directly from the original image, simplifying the detection process and emphasizing speed and efficiency. Although YOLO algorithms detect quickly, when handling small-target detection from the UAV viewing angle, targets photographed from high altitude occupy few pixels and thus have low resolution. For this reason, much recent research improves model detection accuracy by introducing attention mechanisms and the like, but this also greatly increases the model's parameters, makes subsequent deployment difficult, and affects the model's practicability in real applications.
Disclosure of Invention
The invention aims to provide a small target detection method based on an SCGD-YOLO network under the view angle of an unmanned aerial vehicle, which solves the problem of detection accuracy reduction caused by the conditions of lower resolution of the small target, complex environment interference and the like when the unmanned aerial vehicle shoots at high altitude, and specifically provides the following technical scheme for solving the technical problems:
a small target detection method based on SCGD-YOLO network under the unmanned aerial vehicle visual angle comprises the following steps:
Step 1, acquiring an open-source dataset of small targets from the UAV viewing angle, and dividing the dataset into a training set, a validation set, and a test set, wherein the dataset comprises ten categories, most of which consist mainly of small targets;
Preferably, the VisDrone 2019 dataset is adopted as the experimental dataset for training, validation, and testing. It collects objects photographed by UAVs at different positions, angles, and environments, contains 10 categories, most of which take small targets as the main subject, and is a dataset specifically intended for small target detection.
Step 2, configuring a network environment required by a network model;
Preferably, the configured network environment is the Ubuntu 16.04 LTS operating system; experiments were run on an NVIDIA GTX 3090 GPU with 16 GB of video memory, using Python 3.8.16, PyTorch 1.13.1, and torchvision 0.14.1.
Step 3, constructing an SCGD-YOLO network model, wherein the SCGD-YOLO network model comprises a backbone network, a neck network, and a detection head;
In the SCGD-YOLO network model, the input image first passes through the backbone network for feature extraction: through convolutional layers, batch normalization, and activation functions, the network progressively extracts low-level and high-level features of the image, capturing spatial and semantic information via multi-layer feature extraction. The extracted feature information is then input into the neck network for feature fusion, where the feature pyramid fuses features from different layers. Finally, the fused features are sent to the detection head, which outputs the target information predicted at each scale and integrates the class labels, box coordinates, and confidence for the output image.
Preferably, the SCGD-YOLO network model is improved on the YOLOv8 baseline model: first, the improved C2f-CAG and C2f-CFG modules are introduced into the backbone network and the neck network to replace the original C2f module; second, a brand-new feature pyramid structure SCOK replaces the feature pyramid of the original neck network; finally, the decoupled head structure of the head network is replaced with a lightweight detection head LSDC containing shared convolutions.
Preferably, in the improved C2f-CAG and C2f-CFG modules, the C2f-CAG structure introduces the CAFormer module from the Transformer, whose token mixer is a self-attention layer; the Convolutional GLU gating mechanism from TransNext replaces the MLP layer in the CAFormer module, the GLU adaptively controlling the transmission of the information flow, and a convolution operation replaces the traditional fully connected layer. The C2f-CFG structure introduces the ConvFormer module from the Transformer, whose token mixer is a separable convolution composed of a depth-wise convolution and a point-wise convolution, the depth-wise convolution extracting spatial features and the point-wise convolution extracting channel features; the Convolutional GLU gating mechanism likewise replaces the MLP layer in the ConvFormer module. The C2f-CAG and C2f-CFG modules are used in combination.
Preferably, the Convolutional GLU gating mechanism is a nonlinear activation function based on a gating mechanism, expressed as GLU(x) = (W₁x) ⊙ σ(W₂x), where W₁x is a linear transformation of the input serving as the feature-extraction part, W₂x is another linear transformation serving as the gating part, σ(·) is the sigmoid activation function whose output controls the gate between 0 and 1, and ⊙ denotes element-wise multiplication.
Preferably, in the brand new feature pyramid SCOK, an SPD-Conv is introduced to extract small target information, the SPD-Conv is composed of a space-to-depth layer and a non-step convolution layer, the SPD-Conv downsamples the feature map and retains all information in the channel dimension, and after the small target information in the SPD-Conv and the small target information in the feature layer are spliced, the small target information is transmitted to a SPlit-Omni-Kernel module to perform feature fusion and is output to a detection head to perform small target detection and positioning.
Preferably, the SPlit-Omni-Kernel module divides the input features into two branches according to the CSP residual concept: one branch is processed by the Omni-Kernel module while the other remains unchanged, and the reconstruction of multi-scale information is finally achieved through feature cascading. The Omni-Kernel module comprises a large branch, a global branch, and a local branch.
Preferably, in the shared-convolution lightweight detection head LSDC, a 1×1 convolution is first used to adjust the number of channels; then 2 3×3 convolutions with shared weights replace the original 12 3×3 convolutions for feature extraction; detail enhancement convolution is introduced to capture small target information in the feature extraction stage; and group normalization is introduced on the 1×1 convolution and on the feature-extractor convolutions. The flow of the normalization is derived as follows:
Let N×C×H×W be the size of the input feature map x. The normalization first divides the channels into groups; assuming the channels are divided into G groups, each group contains C' = C/G channels. For each group g, the mean μ_g and variance σ_g² are computed:

μ_g = (1/m) Σ_{i∈S_g} x_i,    σ_g² = (1/m) Σ_{i∈S_g} (x_i − μ_g)²

where S_g is the set of elements belonging to group g and m = C'·H·W is its size. Each channel in each group is then normalized element by element,

x̂_i = (x_i − μ_g) / √(σ_g² + ε)

and a trainable scaling factor γ and offset β are introduced: y_i = γx̂_i + β.
Preferably, the detail enhancement convolution comprises five convolution layers deployed in parallel, namely a common convolution, an angle difference convolution, a center difference convolution, a horizontal difference convolution, and a vertical difference convolution, used to restore the spatial resolution of the image and enhance the detail parts.
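Though the patent gives no code, the center difference branch can be illustrated with the standard central-difference convolution formulation. The following PyTorch sketch shows one of the five parallel branches only; the blending factor theta and kernel size are assumptions, and the exact patented variant may differ:

    import torch.nn as nn
    import torch.nn.functional as F

    class CenterDifferenceConv2d(nn.Module):
        """Sketch of ONE of the five parallel branches (center difference);
        a standard formulation, not necessarily the patented variant."""
        def __init__(self, in_ch, out_ch, k=3, theta=1.0):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
            self.theta = theta  # assumed blend between vanilla and difference terms

        def forward(self, x):
            out = self.conv(x)
            # Subtracting the kernel-sum response at the center pixel is
            # equivalent to convolving with (W - sum(W) placed at the center),
            # which emphasizes local intensity differences, i.e. detail/edges.
            w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # 1x1 kernel
            return out - self.theta * F.conv2d(x, w_sum)

The other four branches (common, angle, horizontal, and vertical difference) follow the same pattern with different weight re-arrangements.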
Step 4, sending the pictures and labels of the training dataset into the constructed SCGD-YOLO network model for training, and adjusting the corresponding hyperparameters according to the results on the validation set to obtain the best training result;
Preferably, the training method is as follows: no pre-trained model is used in the training process, the input image size is set to 640×640 pixels, 200 epochs are trained, the batch size is set to 16, the initial learning rate is 0.01, stochastic gradient descent (SGD) is used for parameter optimization, and the final weight file is saved after training is completed.
Step 5, sending the pictures to be detected in the test set into the trained SCGD-YOLO network model for small target detection, and outputting the detection results.
Compared with the prior art, the invention has the following beneficial effects:
(1) Bottleneck blocks in the C2f modules of the backbone and neck networks have weak detection capability for small targets in complex environments, and their MLP (multi-layer perceptron) contains a large number of parameters. The invention therefore designs a novel lightweight Bottleneck module to improve the network, reducing the model's parameter count and computation and minimizing latency overhead.
(2) The invention designs a brand-new feature pyramid module that extracts small target information from the feature layers and fuses it with the P2 feature layer, improving the fusion of features from different parts and the ability to capture small targets in UAV images. Good detection accuracy is maintained in environments such as densely stacked vehicles and strong sunlight, improving the UAV's adaptability to different environments.
(3) Since decoupled heads occupy a large share of the parameters in the network model, a brand-new detection head is designed: while retaining the decoupled structure, a shared convolution module is introduced to reduce the parameter count, and detail enhancement convolution is introduced to improve the capture of small target information, making the detection head lightweight while preserving accuracy.
(4) The SCGD-YOLO algorithm provided by the invention has the characteristics of high precision, few parameters and easiness in deployment, and has strong practicability and great application prospect.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
In the drawings:
FIG. 1 is a general flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of an improved model of an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structures of the C2f-CAG and the C2f-CFG according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a novel feature pyramid SCOK according to an example embodiment of the present invention;
FIG. 5 is a schematic diagram of the SPlit-Omni-Kernel module of FIG. 4;
FIG. 6 is a schematic diagram of the Omni-Kernel module of FIG. 5;
FIG. 7 is a schematic diagram of a structure of a LSDC of a shared weight detection head according to an embodiment of the present invention;
FIG. 8 is a graph of the detection effect of the original model in the case of dense vehicle stacking;
FIG. 9 is a graph showing the detection effect of SCGD-YOLO in the case of dense vehicle stacking according to the embodiment of the present invention;
FIG. 10 is a graph of the detection effect of an original model in a sunlight environment;
FIG. 11 is a graph showing the effect of SCGD-YOLO detection in a sunlight environment according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides a small target detection method based on an SCGD-YOLO network under the view angle of an unmanned aerial vehicle, and specifically provides a technical scheme with reference to FIGS. 1 to 11, wherein the method specifically comprises the following steps of:
Step 1, acquiring an open-source dataset of small targets from the UAV viewing angle on the network, wherein the dataset comprises ten categories with small targets as the main subject;
In this embodiment, the VisDrone 2019 dataset, collected by the AISKYEYE team of the Machine Learning and Data Mining Laboratory at Tianjin University, is adopted as the experimental dataset for training, validation, and testing. It gathers objects photographed by UAVs at different positions, angles, and environments and comprises 10 categories; the training set contains 6471 images, the validation set 548, and the test set 1610. The ten object categories in the images are pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, and motor; most take small targets as the main subject, making this a dataset specifically intended for small target detection.
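For reference, the split and category list above can be written down as a small Python configuration; the directory paths below are placeholders for illustration, not part of the invention:

    # Hypothetical data configuration mirroring the VisDrone 2019 split above;
    # the directory layout is an assumption for illustration only.
    VISDRONE_CLASSES = [
        "pedestrian", "people", "bicycle", "car", "van",
        "truck", "tricycle", "awning-tricycle", "bus", "motor",
    ]

    DATA_CFG = {
        "train": "VisDrone2019/images/train",  # 6471 images
        "val":   "VisDrone2019/images/val",    # 548 images
        "test":  "VisDrone2019/images/test",   # 1610 images
        "nc": len(VISDRONE_CLASSES),           # ten categories, mostly small targets
        "names": VISDRONE_CLASSES,
    }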
Step 2, configuring a network environment required by a network model;
In this embodiment, the configured network environment is the Ubuntu 16.04 LTS operating system; experiments were run on an NVIDIA GTX 3090 GPU with 16 GB of video memory, using Python 3.8.16, PyTorch 1.13.1, and torchvision 0.14.1.
Step 3, constructing an SCGD-YOLO network model. The SCGD-YOLO network model is improved on the YOLOv8 baseline model and comprises a Backbone network, a Neck network, and a detection Head. First, the improved C2f-CAG and C2f-CFG modules are introduced into the backbone and neck networks to replace the original C2f module; second, a brand-new feature pyramid structure SCOK is designed to replace the feature pyramid of the original neck network; finally, the decoupled head structure of the head network is replaced with a lightweight detection head LSDC containing shared convolutions.
In this embodiment, with reference to FIG. 2, the SCGD-YOLO network model works as follows: the input image first passes through the backbone network for feature extraction; through convolutional layers, batch normalization, and activation functions, the network progressively extracts low-level and high-level features, capturing spatial and semantic information via multi-layer feature extraction. The extracted feature information is then input into the neck network, where the feature pyramid fuses features from different layers. Finally, the fused features are sent to the detection head; the model outputs the target information predicted at each scale and integrates the class labels, box coordinates, and confidence of the output image.
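The three-stage flow just described can be summarized in a minimal PyTorch skeleton; the placeholder modules stand in for the actual C2f-CAG/C2f-CFG backbone, SCOK neck, and LSDC head, which the skeleton does not implement:

    import torch.nn as nn

    class SCGDYOLOSkeleton(nn.Module):
        """Illustrative backbone -> neck -> head data flow only; the real
        sub-networks of the invention are passed in as modules."""
        def __init__(self, backbone: nn.Module, neck: nn.Module, head: nn.Module):
            super().__init__()
            self.backbone = backbone
            self.neck = neck
            self.head = head

        def forward(self, x):
            feats = self.backbone(x)   # multi-scale spatial + semantic features
            fused = self.neck(feats)   # feature-pyramid fusion across layers
            return self.head(fused)    # class labels, box coordinates, confidence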
Further, the invention replaces the C2f modules in the backbone network and the neck network with the C2f-CAG and C2f-CFG modules; the structures of the two modules are shown in FIG. 3. The C2f-CAG structure introduces the CAFormer module from the Transformer, whose token mixer is a self-attention layer. It can more accurately distinguish which channels matter more during feature extraction, dynamically adjusting the weight of each channel and improving the model's feature-selection capability. At the same time, the Convolutional GLU gating mechanism from TransNext replaces the MLP layer in the CAFormer module; the GLU adaptively controls the transmission of the information flow, enhancing the model's nonlinear expressive power, and a convolution operation replaces the traditional fully connected layer, significantly reducing the model's parameter count. Likewise, the C2f-CFG structure introduces the ConvFormer module, whose token mixer is a separable convolution composed of a depth-wise convolution and a point-wise convolution: the depth-wise convolution extracts spatial features, the point-wise convolution extracts channel features, and the Convolutional GLU gating mechanism again replaces the MLP layer. The two modules are used in combination: the self-attention-based C2f-CAG serves the backbone network, where feature selection matters most, while the separable-convolution-based C2f-CFG serves the neck network for light weight.
Illustratively, the GLU acts as a non-linear activation function based on gating mechanisms, expressed as follows:
GLU(x) = (W₁x) ⊙ σ(W₂x)    (1)
Where W₁x is a linear transformation of the input serving as the feature-extraction part, W₂x is another linear transformation serving as the gating part, σ(·) is the sigmoid activation function whose output controls the gate between 0 and 1, and ⊙ denotes element-wise multiplication.
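A minimal PyTorch sketch of a convolutional GLU consistent with equation (1) follows; realizing both transforms as 1×1 convolutions is an assumption, since the channel widths and kernel sizes of the actual module are not specified here:

    import torch
    import torch.nn as nn

    class ConvGLU(nn.Module):
        """GLU(x) = (W1 x) * sigmoid(W2 x), with both linear transforms
        realized as convolutions instead of fully connected layers."""
        def __init__(self, channels: int):
            super().__init__()
            self.value = nn.Conv2d(channels, channels, kernel_size=1)  # W1: feature extraction
            self.gate = nn.Conv2d(channels, channels, kernel_size=1)   # W2: gating

        def forward(self, x):
            # the sigmoid output in (0, 1) adaptively gates the information flow
            return self.value(x) * torch.sigmoid(self.gate(x))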
Further, to address the poor extraction and fusion of small target information, a brand-new feature pyramid structure SCOK is designed in the neck network. In this embodiment, with reference to FIG. 4 (a structural diagram of the new feature pyramid SCOK), SPD-Conv is introduced; it consists of a space-to-depth layer and a non-strided convolution layer. After the small target information in SPD-Conv is integrated with the small target information of the P3 feature layer, and considering that feature fusion would otherwise be insufficient while processing all of the small target information would greatly increase the parameter count, a SPlit-Omni-Kernel module is designed. Specifically, with reference to FIG. 5 (a schematic diagram of the SPlit-Omni-Kernel module in FIG. 4), the input features are split into two branches according to the CSP residual concept: one branch is processed by the Omni-Kernel module, the other remains unchanged, and the reconstruction of multi-scale information is finally achieved through feature cascading.
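The SPD-Conv building block can be sketched directly from its description, a space-to-depth rearrangement at scale 2 followed by a stride-1 convolution; the channel counts here are assumptions:

    import torch
    import torch.nn as nn

    class SPDConv(nn.Module):
        """Space-to-depth (scale 2) then a non-strided convolution: the feature
        map is downsampled spatially with no loss of information, since every
        2x2 spatial block is moved into the channel dimension."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            # after space-to-depth the channel count quadruples
            self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, stride=1, padding=1)

        def forward(self, x):
            tl = x[..., ::2, ::2]    # top-left pixels of each 2x2 block
            bl = x[..., 1::2, ::2]   # bottom-left
            tr = x[..., ::2, 1::2]   # top-right
            br = x[..., 1::2, 1::2]  # bottom-right
            return self.conv(torch.cat([tl, bl, tr, br], dim=1))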
In the Omni-Kernel module, as shown in FIG. 6, the input feature map first undergoes a 1×1 convolution and is then processed by three branches. The large branch applies 15×1, 15×15, and 1×15 depth-wise convolutions to capture small target information in different directions. The global branch consists of a dual-domain channel attention module (DCAM) and a frequency-based spatial attention module (FSAM); this dual-domain processing compensates for the global context that the large branch cannot cover. The local branch adopts a simple 1×1 depth-wise convolution layer, improving the utilization of feature information without increasing model complexity. Finally, the outputs of the three branches are concatenated with the feature map from the input-side 1×1 convolution and passed through another 1×1 convolution. The kernel size K of the depth-wise convolutions in the large branch affects both the parameter count and the accuracy of the model; testing showed that K = 15 balances the two.
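Putting the split and the large branch together, a structural sketch of the SPlit-Omni-Kernel idea might look as follows; the global (DCAM/FSAM) and local branches are omitted for brevity, and the even channel split and 1×1 projections are assumptions:

    import torch
    import torch.nn as nn

    class SplitOmniKernelSketch(nn.Module):
        """CSP-style split: half the channels pass through an Omni-Kernel-like
        branch (only the 15x1 / 15x15 / 1x15 depth-wise large branch is shown),
        the other half is left unchanged; the two are concatenated at the end."""
        def __init__(self, channels: int, k: int = 15):
            super().__init__()
            half = channels // 2
            self.pre = nn.Conv2d(half, half, kernel_size=1)
            self.dw_v = nn.Conv2d(half, half, (k, 1), padding=(k // 2, 0), groups=half)
            self.dw_sq = nn.Conv2d(half, half, (k, k), padding=k // 2, groups=half)
            self.dw_h = nn.Conv2d(half, half, (1, k), padding=(0, k // 2), groups=half)
            self.post = nn.Conv2d(half, half, kernel_size=1)

        def forward(self, x):
            a, b = x.chunk(2, dim=1)  # processed branch / identity branch
            y = self.pre(a)
            # directional large-kernel depth-wise convolutions, plus the skip
            y = self.post(self.dw_v(y) + self.dw_sq(y) + self.dw_h(y) + y)
            return torch.cat([y, b], dim=1)  # feature cascade reconstructs scales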
In this embodiment, with reference to FIG. 7, the decoupled head structure of the head network is replaced with a lightweight detection head LSDC containing shared convolutions. The detection head of the original model is a decoupled head, which splits target classification and bounding-box regression into two independent processes. This improves the network's feature extraction to a certain extent, but each regression task needs two 3×3 convolutions for feature extraction and processing plus one 1×1 convolution to adjust and output bounding-box predictions; since the network must perform detection on the three feature-layer scales P3, P4, and P5, 12 3×3 convolutions and 6 1×1 convolutions are required, greatly increasing the model's parameter count and computation. To address this, two shared-weight convolutions replace the 12 3×3 convolutions for image feature extraction. Considering that weight sharing reduces parameters and computation but loses some small target information and thus accuracy, detail enhancement convolution is introduced to capture small target information in the feature extraction stage and keep accuracy from dropping. Extensive research has demonstrated that Group Normalization (GN) improves the classification and localization accuracy of detection heads, so GN is introduced on the 1×1 convolution and on the feature-extractor convolutions to compensate for the accuracy loss. The flow of GN is derived as follows:
Assuming the input feature map x has size N×C×H×W, GN first divides the channels into groups; assuming the channels are divided into G groups, each group contains C' = C/G channels. In formulas (2) and (3), the mean μ_g and variance σ_g² are computed for each group g:

μ_g = (1/m) Σ_{i∈S_g} x_i    (2)

σ_g² = (1/m) Σ_{i∈S_g} (x_i − μ_g)²    (3)

where S_g is the set of elements belonging to group g and m = C'·H·W is its size. In formula (4), each channel in each group is normalized:

x̂_i = (x_i − μ_g) / √(σ_g² + ε)    (4)

In formula (5), each element is scaled and shifted, introducing a trainable scaling factor γ and offset β to restore the expressive power of the model:

y_i = γx̂_i + β    (5)
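As a sketch of how shared weights and GN combine in such a head (channel widths, group count, and activation are assumptions, and the detail enhancement branch is omitted):

    import torch.nn as nn

    class SharedConvHeadSketch(nn.Module):
        """One 1x1 conv per scale aligns channels; the SAME two 3x3 conv blocks
        (with GroupNorm) are then applied to P3, P4 and P5, replacing the
        twelve per-scale 3x3 convolutions of the decoupled head."""
        def __init__(self, in_chs=(256, 512, 1024), mid=256, groups=16):
            super().__init__()
            self.align = nn.ModuleList(
                nn.Sequential(nn.Conv2d(c, mid, 1), nn.GroupNorm(groups, mid), nn.SiLU())
                for c in in_chs
            )
            self.shared = nn.Sequential(  # weights shared across all three scales
                nn.Conv2d(mid, mid, 3, padding=1), nn.GroupNorm(groups, mid), nn.SiLU(),
                nn.Conv2d(mid, mid, 3, padding=1), nn.GroupNorm(groups, mid), nn.SiLU(),
            )

        def forward(self, feats):  # feats: [P3, P4, P5]
            return [self.shared(align(f)) for align, f in zip(self.align, feats)]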
Step 4, sending the pictures and labels of the training dataset into the network model for training, and adjusting the corresponding hyperparameters according to the results on the validation set to obtain the best training result.
In this embodiment, the training method is as follows: no pre-trained model is used during training, the input image size is set to 640×640 pixels, 200 epochs are trained, the batch size is set to 16, the initial learning rate is 0.01, stochastic gradient descent (SGD) is used for parameter optimization, and the final weight file is saved after training is completed.
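Assuming an Ultralytics-style training interface (the patent does not name a training framework, and "scgd-yolo.yaml" below is a hypothetical model definition), the configuration above maps roughly to:

    from ultralytics import YOLO  # assumed framework, not named by the patent

    model = YOLO("scgd-yolo.yaml")      # hypothetical config for the modified network
    model.train(
        data="visdrone.yaml",           # dataset splits and the ten classes
        imgsz=640,                      # 640 x 640 input size
        epochs=200,
        batch=16,
        lr0=0.01,                       # initial learning rate
        optimizer="SGD",                # stochastic gradient descent
        pretrained=False,               # no pre-trained model is used
    )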
Step 5, sending the pictures to be detected in the test set into the trained SCGD-YOLO model for small target detection.
In this embodiment, the results of ablation experiments are analyzed below to verify the role of each module in the model; they are shown in Table 1.
TABLE 1 Ablation experiments (√ indicates that the module is added to the model)
As shown by the ablation results in Table 1, after the new feature pyramid network is introduced, mAP50 and mAP50-95 for small target detection improve by 2.9% and 2% respectively over the baseline model YOLOv8s, while the parameter count and model size increase to a certain extent: the feature pyramid designed for small targets captures small-target features better and raises accuracy, but the introduction of more small-target feature information increases the parameter count and model volume.
Further, regarding the improvement of the C2f module in the YOLOv8 model: separable convolution is introduced into the C2f modules of the neck network for light weight, and a self-attention mechanism is used in the C2f modules of the backbone network. Experimental results show that after the C2f-CAG and C2f-CFG modules are introduced, the parameter count is reduced by 10.6% with only a small loss in average detection precision, and the model volume is reduced by 2.2 MB. The LSDC detection head, an improvement on the baseline model's decoupled head, reduces the model's parameter count by 15.3% and its volume by 1.7 MB while mAP50 drops by only 0.1%, demonstrating its lightweight effect. When the proposed modules are combined in different ways, the model's mAP50 and mAP50-95 improve and the parameter count decreases relative to the baseline, although the model size grows because small-target feature information is introduced. Finally, with all modules added, the network model's mAP50 and mAP50-95 rise by 2.4% and 1.7% respectively over the original baseline, the parameter count falls by 27.5%, and the model volume shrinks by 4.3 MB, which facilitates deployment on a UAV and meets the small target detection task from the UAV viewing angle.
Preferably, to demonstrate the applicability of the algorithm in different scenarios, the TinyPerson dataset is additionally selected for generalization experiments. The TinyPerson dataset is designed around tiny-target detection at long range against large backgrounds, with images collected from the Internet. Its key characteristic is that persons are divided into two categories, sea persons and earth persons: sea persons include people on boats, people lying in the water, and the like, while earth persons include everyone else. Most of its targets are small, making it well suited to small target detection. The results of the generalization experiment are shown in Table 2 below.
TABLE 2 generalization experiment results
Model     P(%)   R(%)   mAP50(%)   mAP50-95(%)   Parameters(M)   Model size/MB
YOLOv8s   44.5   29.6   28.3       9.14          11.13           21.5
Ours      47.2   33.7   31.4       9.83          8.06            17.1
As shown by the generalization results in Table 2, compared with the baseline model, the precision (P), recall (R), and average precision (mAP50) of the improved model on the TinyPerson dataset improve by 2.7%, 4.1%, and 3.1% respectively, the parameter count falls by 27.6%, and the model volume shrinks by 4.4 MB, indicating that the improved algorithm generalizes well across scenarios.
Preferably, two sets of comparison experiments were performed: the first compares the SCGD-YOLO algorithm herein with other algorithms of the YOLO series, and the second compares it with other mainstream algorithms in the target detection field in recent years. The comparison results are shown in Tables 3 and 4 below.
TABLE 3 comparative test with the YOLO series Algorithm
Model         P(%)   R(%)   mAP50(%)   Parameters(M)
YOLOv5s       42.5   31.9   31.0       7.04
YOLOv7-tiny   46.4   36.4   34.1       6.03
YOLOv7        51.3   42.0   39.6       36.53
YOLOv8s       50.3   38.3   39.1       11.13
YOLOv10s      50.2   38.7   39.4       7.22
YOLOv11s      49.7   37.9   38.6       9.42
Ours          53.7   39.5   41.5       8.06
As can be seen from the comparison in Table 3, the SCGD-YOLO algorithm proposed by the invention is clearly ahead of the other YOLO-series algorithms in precision and average precision, and its parameter count is also among the lighter ones in the YOLO series.
TABLE 4 comparative test with other mainstream algorithms
As can be seen from the comparison in Table 4, the SCGD-YOLO algorithm proposed by the invention is far ahead of the other mainstream algorithms in precision and average precision, and its parameter count is far lower, giving it a clear advantage for model deployment.
In this embodiment, to demonstrate the effects achieved by the invention, FIGS. 8, 9, 10, and 11 compare detection results of the improved model with those of the original model. As shown in FIG. 8, with densely packed vehicles the original model detects most vehicles and pedestrians but fails to recognize the motorcycle and the half-visible pedestrian at the lower right corner of the picture; as shown in FIG. 9, the improved model recognizes the missed pedestrian and motorcycle at the lower right even under dense vehicle conditions. As shown in FIG. 10, under direct sunlight some information is covered and hard to extract: the original model detects most vehicles and pedestrians but cannot recognize the pedestrians at the upper left and on the right side of the picture because of sunlight and vehicle occlusion; as shown in FIG. 11, the improved model recognizes those pedestrians despite the sunlight and occlusion. These comparisons show that the improved model maintains detection accuracy in different environments better than the original model.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the above-mentioned embodiments are merely preferred embodiments of the present invention, and the present invention is not limited thereto, but may be modified or substituted for some of the technical features thereof by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A small target detection method from the perspective of a drone based on the SCGD-YOLO network, characterized by comprising the following steps:
Step 1: obtain an open-source dataset of small targets from the drone perspective and divide it into a training set, a validation set, and a test set, the dataset containing ten categories with small targets as the main category;
Step 2: configure the network environment required by the network model;
Step 3: build the SCGD-YOLO network model, which comprises a backbone network, a neck network, and a detection head;
Step 4: send the images and labels of the training dataset into the constructed SCGD-YOLO network model for training, and adjust the corresponding hyperparameters according to the results on the validation set to obtain the best training result;
Step 5: send the images to be detected in the test set into the trained SCGD-YOLO network model for small target detection, and output the detection results.
2. The method according to claim 1, characterized in that in the constructed SCGD-YOLO network model, the input image first passes through the backbone network for feature extraction: through convolutional layers, batch normalization, and activation functions, the network progressively extracts the low-level and high-level features of the image, capturing spatial and semantic information through multi-layer feature extraction; the extracted feature information is input into the neck network for feature fusion, where the feature pyramid fuses features from different layers; finally, the fused features are sent to the detection head, which outputs the predicted target information and integrates the class labels, box coordinates, and confidence of the output image.
3. The method according to claim 1, characterized in that the SCGD-YOLO network model is improved on the YOLOv8 baseline model: first, the improved C2f-CAG and C2f-CFG modules are introduced into the backbone and neck networks to replace the original C2f module; second, a brand-new feature pyramid structure SCOK replaces the feature pyramid of the original neck network; finally, the decoupled head structure of the head network is replaced with a lightweight detection head LSDC containing shared convolutions.
4. The method according to claim 3, characterized in that in the improved C2f-CAG and C2f-CFG modules, the C2f-CAG structure introduces the CAFormer module from the Transformer, whose token mixer is a self-attention layer; the Convolutional GLU gating mechanism from TransNext replaces the MLP layer in the CAFormer module, the GLU adaptively controlling the transmission of the information flow, and convolution operations replace the traditional fully connected layers; the C2f-CFG structure introduces the ConvFormer module from the Transformer, whose token mixer is a separable convolution composed of a depth-wise convolution and a point-wise convolution, the depth-wise convolution extracting spatial features and the point-wise convolution extracting channel features, and the Convolutional GLU gating mechanism likewise replaces the MLP layer in the ConvFormer module; the C2f-CAG and C2f-CFG modules are used in combination.
5. The method according to claim 4, characterized in that the Convolutional GLU gating mechanism is a nonlinear activation function based on a gating mechanism, expressed as GLU(x) = (W₁x) ⊙ σ(W₂x), where W₁x is a linear transformation of the input serving as the feature-extraction part, W₂x is another linear transformation serving as the gating part, σ(·) is the sigmoid activation function whose output controls the gate between 0 and 1, and ⊙ denotes element-wise multiplication.
6. The method according to claim 3, characterized in that in the brand-new feature pyramid SCOK, SPD-Conv is introduced to extract small target information; the SPD-Conv consists of a space-to-depth layer and a non-strided convolution layer, downsampling the feature map while retaining all information in the channel dimension; after the small target information in the SPD-Conv is concatenated with the small target information of the feature layer, it is fed into the SPlit-Omni-Kernel module for feature fusion and output to the detection head for small target detection and localization.
7. The method according to claim 6, characterized in that in the SPlit-Omni-Kernel module, following the CSP residual concept, the input features are split into two branches, one processed by the Omni-Kernel module and the other kept unchanged, the reconstruction of multi-scale information being finally achieved through feature cascading; the Omni-Kernel module comprises a large branch, a global branch, and a local branch.
8. The method according to claim 3, characterized in that in the shared-convolution lightweight detection head LSDC, a 1×1 convolution is first used to adjust the number of channels, then 2 shared-weight 3×3 convolutions replace the original 12 3×3 convolutions for feature extraction; detail enhancement convolution is introduced to capture small target information in the feature extraction stage, and normalization is introduced on the 1×1 convolution and on the feature-extractor convolutions, the flow of the normalization being derived as follows: let N×C×H×W be the size of the input feature map x; the channels are first divided into groups, and assuming they are divided into G groups, each group contains C' = C/G channels; for each group g, the mean μ_g = (1/m) Σ x_i and variance σ_g² = (1/m) Σ (x_i − μ_g)² are computed over the m elements of the group; each channel in each group is normalized as x̂_i = (x_i − μ_g)/√(σ_g² + ε), each element is normalized, and a trainable scaling factor γ and offset β are introduced.
9. The method according to claim 8, characterized in that the detail enhancement convolution comprises five convolution layers deployed in parallel, including a common convolution, an angle difference convolution, a center difference convolution, a horizontal difference convolution, and a vertical difference convolution, used to restore the spatial resolution of the image and enhance the detail parts.
10. The method according to claim 1, characterized in that the training method is as follows: no pre-trained model is used during training, parameters are optimized using stochastic gradient descent, and the final weight file is saved after training is completed.
CN202510101054.5A 2025-01-22 2025-01-22 A small target detection method from the perspective of drone based on SCGD-YOLO network Pending CN119919836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510101054.5A CN119919836A (en) 2025-01-22 2025-01-22 A small target detection method from the perspective of drone based on SCGD-YOLO network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510101054.5A CN119919836A (en) 2025-01-22 2025-01-22 A small target detection method from the perspective of drone based on SCGD-YOLO network

Publications (1)

Publication Number Publication Date
CN119919836A 2025-05-02

Family

ID=95511909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510101054.5A Pending CN119919836A (en) 2025-01-22 2025-01-22 A small target detection method from the perspective of drone based on SCGD-YOLO network

Country Status (1)

Country Link
CN (1) CN119919836A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120411109A (en) * 2025-07-03 2025-08-01 湖南福瑞康电子有限公司 Solder defect detection method, detection device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination