CN115641518A - View sensing network model for unmanned aerial vehicle and target detection method - Google Patents

View sensing network model for unmanned aerial vehicle and target detection method

Info

Publication number: CN115641518A
Application number: CN202211226543.6A
Authority: CN (China)
Legal status: Granted (Active)
Other languages: Chinese (zh)
Other versions: CN115641518B (en)
Prior art keywords: unmanned aerial vehicle, branch, loss, target
Inventors: 魏玲, 杨晓刚, 李兴隆
Current assignee: Shandong Weiran Intelligent Technology Co., Ltd.
Original assignee: Shandong Weiran Intelligent Technology Co., Ltd.
Filing and priority date: 2022-10-09 (priority to CN202211226543.6A)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention provides a view perception network model for an unmanned aerial vehicle and a target detection method. The model is an improvement on the YOLOv4 network framework and comprises a backbone network, a neck layer and a detection head. The backbone network uses CSPDarknet53 to extract features from the image; the neck layer adopts a feature fusion module DPFPN, which aggregates feature maps from different stages of the network and further extracts the complex features in the unmanned aerial vehicle image; the detection head predicts the category, position and confidence of targets in the unmanned aerial vehicle image, and the parameters of its classification, regression and confidence branches are optimized during training with a loss computed by the Varifocal loss algorithm. By constructing a dual-path (top-down and bottom-up) feature fusion module DPFPN that fuses target features more finely, the invention alleviates the multi-scale problem of targets in unmanned aerial vehicle scenes; by integrating the Varifocal loss function, it mitigates the detector localization errors caused by densely packed targets in unmanned aerial vehicle images.

Description

View sensing network model for unmanned aerial vehicle and target detection method
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle image recognition, and particularly relates to a view perception network model for an unmanned aerial vehicle and a target detection method.
Background
Compared with target detection in natural scenes, target detection in unmanned aerial vehicle scenes has developed more slowly. With the progress of deep learning, it has gained strong momentum in applications such as rescue, surveillance, traffic monitoring and pedestrian tracking, all of which require robust and efficient recognition of targets in the unmanned aerial vehicle view. However, detecting targets in drone-view images is not easy and presents several challenges, such as frequent environmental changes, drastic changes in target scale and the limited computing power of the unmanned aerial vehicle. With the popularity of large-scale data sets, many state-of-the-art detectors based on convolutional neural networks, such as the R-CNN series and the YOLO series, have shown fairly high performance. However, these detectors are suited to relatively low-resolution images, and processing higher-resolution images becomes more challenging because of computational cost constraints.
The invention provides an improved scheme for two problems: the drastic scale variation of objects in images captured by the unmanned aerial vehicle and the occlusion among densely packed objects, for which the recognition accuracy and efficiency of existing detectors still leave room for improvement.
Disclosure of Invention
To address the multi-scale problem of targets in unmanned aerial vehicle scenes, a dual-path (top-down and bottom-up) feature fusion module DPFPN is constructed to fuse target features more finely; to address the density of targets in unmanned aerial vehicle images, the invention introduces the Varifocal loss calculation to mitigate the detector localization errors caused by dense targets.
The first aspect of the invention provides a view perception network model for an unmanned aerial vehicle, which is an improvement on the YOLOv4 network framework and comprises a backbone network, a neck layer and a detection head;
the backbone network uses CSPDarknet53, which performs feature extraction on the preprocessed unmanned aerial vehicle image; the output of the CSPDarknet53 backbone comprises four layers C2, C3, C4 and C5, of which C3, C4 and C5 serve as the input of the neck layer;
the neck layer adopts the feature fusion module DPFPN, which aggregates the feature maps produced at different stages of the network and further extracts the complex features in the unmanned aerial vehicle image; the neck-layer outputs P5, P4 and P3 serve as the input of the detection head;
the detection head predicts the category, position and confidence of targets in the unmanned aerial vehicle image and comprises three branches, whose inputs are feature maps of three scales corresponding, from top to bottom, to P3, P4 and P5 of the neck layer; each branch comprises three sub-branches, namely a classification branch, a regression branch and a confidence branch; during the training stage, the parameters of the classification, regression and confidence branches are optimized according to a training loss computed with the Varifocal loss algorithm.
In one possible design, the CSPDarknet53 backbone network obtains its output by passing the input sequentially through 2 CBM modules, 1 CSP module, 2 CSP modules, 2 CBM modules, 1 CSP module and 1 CBM module.
In one possible design, the feature fusion module DPFPN is refined based on the FPN. The output C5 of the backbone network is passed through a 1×1 convolution to obtain P5 of the same size; the result of down-sampling P5 is spliced with the result of a 1×1 convolution on C4 to obtain P4; the result of down-sampling P4 is spliced with the result of a 1×1 convolution on C3 to obtain P3. The resulting P3 has three branches: one branch passes through a 3×3 convolution to give the final P3; the other two P3 branches undergo the same operation and are up-sampled to obtain P4, and then further up-sampled to obtain P5. The two P4 branches are fused and passed through a 3×3 convolution to give the final P4, and the two P5 branches are fused and passed through a 3×3 convolution to give the final P5. During fusion, the feature layers of the two branches combine rich semantic information with spatial information, yielding the neck-layer outputs P5, P4 and P3.
In one possible design, each of the three branches of the detection head begins with 2 CBL modules that perform channel dimensionality reduction, followed by three sub-branches: a classification branch, a regression branch and a confidence branch; after each sub-branch passes through a CBL module, a convolution operation and a sigmoid activation function, the three sub-branches are concatenated and the output is produced after a reshape (size adjustment) operation.
In one possible design, the specific process of computing the training loss for the parameters of the classification, regression and confidence branches with the Varifocal loss algorithm during the training stage is as follows:
in the training phase, the confidence loss, classification loss and localization loss are computed; the total loss is
L_total = L_conf + L_cls + L_bbox
where L_total is the total loss of the model, L_conf the confidence loss, L_cls the classification loss and L_bbox the localization loss;
the Vari-focalloss was introduced to train the target detector to predict IACS:
Figure BDA0003879982370000031
wherein p is the predicted IACS score; q is the target IoU score, q is set to the IoU value between the generated bbox and gtbox for positive samples in training, and all classes of training targets q are set to 0 for negative samples in training; α is the loss weight of the foreground background; gamma times p is the weight of different samples;
L_bbox, the localization loss of the target, is expressed with the IoU loss:
L_bbox = 1 - IoU(A, B) = 1 - |A ∩ B| / |A ∪ B|
where A denotes the area of the predicted box and B denotes the area of the real (ground-truth) box;
L_cls represents the classification loss of the target:
L_cls = -(y·log(y′) + (1 - y)·log(1 - y′))
where y is the true category and y′ is the predicted category.
The second aspect of the present invention provides a target detection method for an unmanned aerial vehicle, including the following processes:
shooting by an unmanned aerial vehicle to obtain an image;
preprocessing the image, which comprises resizing the image and applying Mosaic and Cut-Mix (cut-and-mix) data augmentation (a sketch of this augmentation step follows this method);
inputting the preprocessed image into the view perception network model of the first aspect for processing and detection.
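The patent names Mosaic and Cut-Mix but does not spell out their implementation; the following is a minimal NumPy sketch of the two augmentations under common definitions. The function names, the assumption that the inputs are already resized to 640×640, and the omission of bounding-box remapping are all illustrative choices, not part of the patented method.

```python
import random

import numpy as np


def mosaic(images, size=640):
    """Tile four equally sized images around a random centre into one size x size canvas.

    Bounding-box remapping, which a full pipeline also needs, is omitted here.
    """
    assert len(images) == 4
    canvas = np.zeros((size, size, 3), dtype=images[0].dtype)
    cx = random.randint(size // 4, 3 * size // 4)
    cy = random.randint(size // 4, 3 * size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, size, cy), (0, cy, cx, size), (cx, cy, size, size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        canvas[y1:y2, x1:x2] = img[: y2 - y1, : x2 - x1]  # crop each source to its quadrant
    return canvas


def cut_mix(img_a, img_b, lam=0.5):
    """Paste a rectangular patch of img_b onto img_a; the patch area is roughly (1 - lam)."""
    h, w = img_a.shape[:2]
    ph, pw = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    y = random.randint(0, h - ph)
    x = random.randint(0, w - pw)
    out = img_a.copy()
    out[y:y + ph, x:x + pw] = img_b[y:y + ph, x:x + pw]
    return out
```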
A third aspect of the invention provides a target detection apparatus for an unmanned aerial vehicle, the apparatus comprising at least one processor and at least one memory; the memory stores a program of the view perception network model of the first aspect; when the processor executes the program stored in the memory, target detection and recognition on unmanned aerial vehicle images is realized.
A fourth aspect of the invention provides a computer-readable storage medium storing a computer-executable program of the view perception network model of the first aspect, which, when executed by a processor, realizes target detection and recognition on unmanned aerial vehicle images.
Beneficial effects: the invention provides an unmanned aerial vehicle view perception network for target detection in which a dual-path (top-down and bottom-up) feature fusion module DPFPN is constructed to fuse target features more finely, alleviating the multi-scale problem of targets in unmanned aerial vehicle scenes, and the Varifocal loss function is integrated to mitigate the detector localization errors caused by densely packed targets in unmanned aerial vehicle images. On the VisDrone data set the invention achieves SOTA performance, with the average precision mAP improved by 1.2 and 0.4 points over YOLOX and YOLOv5 respectively; compared with some two-stage models, the method is both more accurate and more efficient. Overall, higher accuracy and efficiency of target detection in unmanned aerial vehicle scenes are achieved.
Drawings
Fig. 1 is a schematic structural diagram of a view sensing network model for an unmanned aerial vehicle according to the present invention.
Fig. 2 is a schematic diagram of the CSPDarkNet53 backbone network structure of the present invention.
FIG. 3 is a schematic view of a head detection head according to the present invention.
Fig. 4 is a schematic structural diagram of the feature fusion module DPFPN according to the present invention.
Fig. 5 is a schematic diagram of a simple structure of the target detection device for the unmanned aerial vehicle according to the present invention.
Detailed Description
The invention is further illustrated by the following examples.
Example 1:
Compared with target detection in natural scenes, the main reason for the lower target detection accuracy on unmanned aerial vehicle imagery is the wide flight view: distant targets appear very small while nearby targets appear large, so the target scale changes drastically; in addition, dense targets such as people and vehicles weaken the detector's localization ability during the regression process.
The unmanned aerial vehicle view perception network architecture for target detection provided by the invention is shown in fig. 1. In current object detection practice, the detector's mAP metric is often improved by choosing a larger backbone with powerful image feature extraction capability; since this affects detection speed, the invention selects CSPDarkNet53, following YOLOv4, as the backbone network. First, the input image is resized to 640, and Mosaic and Cut-Mix (cut-and-mix) are selected as the data augmentation methods. The image is then processed by the backbone network CSPDarkNet53, which obtains its output by passing the input sequentially through 2 CBM modules, 1 CSP module, 2 CBM modules and 1 CBM module. The CBM module consists of a convolution operation, batch normalization (BatchNorm) and the Mish activation function; the CSP module consists of CBL modules and res-unit residual units; the CBL module consists of a convolution operation, batch normalization (BatchNorm) and the LeakyReLU activation function; the res-unit residual unit is composed of CBL modules with a residual connection. The specific structure is shown in fig. 2. The backbone output has four layers, referred to as C2, C3, C4 and C5. C3, C4 and C5 serve as input to the network's neck layer, where the fine feature extraction module DPFPN, designed with the SiLU activation function, produces the outputs P5, P4 and P3; these are fed to a shared detection head, and the final detection result is obtained through NMS (non-maximum suppression).
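As a reading aid, the following is a minimal PyTorch sketch of the CBM, CBL, res-unit and CSP building blocks exactly as their composition is described above (Conv + BatchNorm + Mish, Conv + BatchNorm + LeakyReLU, residual CBLs, cross-stage partial block). Channel widths, kernel sizes and the number of residual units are illustrative assumptions, not the patented configuration.

```python
import torch
import torch.nn as nn


class CBM(nn.Sequential):
    """Convolution + BatchNorm + Mish."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__(nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.Mish(inplace=True))


class CBL(nn.Sequential):
    """Convolution + BatchNorm + LeakyReLU."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__(nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.1, inplace=True))


class ResUnit(nn.Module):
    """Residual unit built from CBL modules."""
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(CBL(c, c, 1), CBL(c, c, 3))

    def forward(self, x):
        return x + self.block(x)


class CSP(nn.Module):
    """Cross-stage partial block: one path stacks residual units, the other bypasses them."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_mid = c_out // 2
        self.main = nn.Sequential(CBL(c_in, c_mid, 1),
                                  *[ResUnit(c_mid) for _ in range(n)],
                                  CBL(c_mid, c_mid, 1))
        self.shortcut = CBL(c_in, c_mid, 1)
        self.fuse = CBL(2 * c_mid, c_out, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.main(x), self.shortcut(x)], dim=1))
```

A backbone stage would then stack these blocks in the order listed above, for example nn.Sequential(CBM(3, 32), CBM(32, 64, s=2), CSP(64, 64), ...); the channel numbers here are again assumptions.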
As shown in fig. 3, the detection head has three main branches whose inputs are feature maps of three scales, corresponding, from top to bottom, to P3, P4 and P5 of the neck layer. Each branch begins with 2 CBL modules for channel dimensionality reduction, followed by three sub-branches: a classification branch, a regression branch and a confidence branch. After each sub-branch passes through a CBL module, a convolution operation and a sigmoid activation function, the three sub-branches are concatenated and the output is produced after a reshape (size adjustment) operation. The confidence branch (Obj) determines whether a target is present, the classification branch (Cls) predicts the output category of the network, i.e. classifies the target, and the regression branch (Reg) regresses and locates the position of the target.
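A schematic sketch of one such head branch follows: two CBL modules for channel reduction, then the Cls, Reg and Obj sub-branches, each passing through a CBL module, a convolution and a sigmoid before the three outputs are concatenated and reshaped. The channel width, the output layout and the one-prediction-per-cell assumption are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn


def cbl(c_in, c_out, k=1):
    """Conv + BatchNorm + LeakyReLU block (the CBL module described above)."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.1, inplace=True))


class HeadBranch(nn.Module):
    def __init__(self, c_in, num_classes, width=256):
        super().__init__()
        self.stem = nn.Sequential(cbl(c_in, width, 1), cbl(width, width, 3))  # 2 CBL modules
        self.cls_branch = nn.Sequential(cbl(width, width, 3), nn.Conv2d(width, num_classes, 1))
        self.reg_branch = nn.Sequential(cbl(width, width, 3), nn.Conv2d(width, 4, 1))
        self.obj_branch = nn.Sequential(cbl(width, width, 3), nn.Conv2d(width, 1, 1))

    def forward(self, x):
        x = self.stem(x)
        cls = torch.sigmoid(self.cls_branch(x))   # per-class scores
        reg = torch.sigmoid(self.reg_branch(x))   # box outputs (sigmoid on each sub-branch per the text)
        obj = torch.sigmoid(self.obj_branch(x))   # confidence / objectness
        out = torch.cat([reg, obj, cls], dim=1)   # (B, 5 + C, H, W)
        b, c, h, w = out.shape
        return out.permute(0, 2, 3, 1).reshape(b, h * w, c)  # the reshape step
```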
Regarding the feature fusion module DPFPN:
the feature fusion module is an important component of target detection, fuses features extracted from a backbone network, and then obtains a richer feature layer, which is convenient for downstream detection tasks. Compared with a natural scene, the scene captured by the unmanned aerial vehicle has a plurality of small targets in the picture, and because the small targets are subjected to down-sampling operation for many times in the backbone network, the information of the small targets is easily lost.
As shown in fig. 4, the feature fusion DPFPN module of the network makes full use of the bottom layers of the FPN. The output C5 of the backbone network is passed through a 1×1 convolution to obtain P5 of the same size; the result of down-sampling P5 is spliced with the result of a 1×1 convolution on C4 to obtain P4; the result of down-sampling P4 is spliced with the result of a 1×1 convolution on C3 to obtain P3. The resulting P3 has three branches: one branch passes through a 3×3 convolution to give the final P3; the other two P3 branches undergo the same operation and are up-sampled to obtain P4, and then further up-sampled to obtain P5. The two P4 branches are fused and passed through a 3×3 convolution to give the final P4, and the two P5 branches are fused and passed through a 3×3 convolution to give the final P5. During fusion, the feature layers of the two branches combine rich semantic information with spatial information, yielding the neck-layer outputs P5, P4 and P3.
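The fusion flow just described can be sketched in PyTorch as follows. The channel widths, the 1×1 channel reduction after each splice, and the use of interpolate-to-match for the sampling steps (so the sketch runs whichever sampling direction is intended at each point) are assumptions made for illustration, not the patented design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_silu(c_in, c_out, k):
    """Convolution + BatchNorm + SiLU, the activation used in the neck as described above."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU(inplace=True))


def resample_to(x, ref):
    """Resize x to the spatial size of ref (up- or down-sampling as required)."""
    return F.interpolate(x, size=ref.shape[-2:], mode="nearest")


class DPFPN(nn.Module):
    def __init__(self, c3=256, c4=512, c5=1024, width=256):
        super().__init__()
        self.l5 = conv_bn_silu(c5, width, 1)            # 1x1 convolutions on C5, C4, C3
        self.l4 = conv_bn_silu(c4, width, 1)
        self.l3 = conv_bn_silu(c3, width, 1)
        self.fuse4 = conv_bn_silu(2 * width, width, 1)  # reduce channels after each splice
        self.fuse3 = conv_bn_silu(2 * width, width, 1)
        self.out3 = conv_bn_silu(width, width, 3)       # final 3x3 convolutions
        self.out4 = conv_bn_silu(2 * width, width, 3)
        self.out5 = conv_bn_silu(2 * width, width, 3)

    def forward(self, c3, c4, c5):
        # first path: walk through the pyramid, splicing each level with the previous one
        p5 = self.l5(c5)
        p4 = self.fuse4(torch.cat([resample_to(p5, c4), self.l4(c4)], dim=1))
        p3 = self.fuse3(torch.cat([resample_to(p4, c3), self.l3(c3)], dim=1))
        # second path: carry P3 back through the pyramid and fuse with the first path
        q4 = resample_to(p3, p4)
        q5 = resample_to(q4, p5)
        return (self.out3(p3),                          # final P3
                self.out4(torch.cat([q4, p4], dim=1)),  # fuse the two P4 branches
                self.out5(torch.cat([q5, p5], dim=1)))  # fuse the two P5 branches
```

Concatenation is used here for the "splicing" and "fusion" steps because that is the operation the text names elsewhere; other fusion operators would fit the same structure.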
FPN uses a strategy that combines shallow low-level localization features with deep high-level semantic features to achieve good output results. The DPFPN proposed by the invention improves on the FPN: it not only fuses the feature information propagated horizontally within the FPN, but also further fuses, in the vertical direction, the feature information extracted by the backbone network; that is, dual-path feature fusion is performed in both width and depth, producing finer feature information.
Training loss calculation with the Varifocal loss:
In the training phase, the confidence loss, classification loss and localization loss are computed; the total loss is
L_total = L_conf + L_cls + L_bbox
where L_total is the total loss of the model, L_conf the confidence loss, L_cls the classification loss and L_bbox the localization loss.
The Varifocal loss is introduced to train the target detector to predict the IACS (the IoU-aware classification score, which jointly represents the confidence of target presence and the localization accuracy):
VFL(p, q) = -q(q·log(p) + (1 - q)·log(1 - p))   if q > 0
VFL(p, q) = -α·p^γ·log(1 - p)                   if q = 0
where p is the predicted IACS score; q is the target IoU score: for positive training samples q is set to the IoU between the generated bbox and the ground-truth box, while for negative training samples the training target q is 0 for all classes; α is the loss weight of the background; and the factor p^γ weights the different samples.
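A direct PyTorch sketch of this definition is given below (p is the predicted IACS, q the target score defined above). The defaults α = 0.75 and γ = 2.0 follow the cited VarifocalNet paper; they are not stated in this patent and are assumptions here, as is the sum reduction.

```python
import torch


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-8):
    """p, q: tensors of identical shape with values in [0, 1]."""
    p = p.clamp(eps, 1 - eps)
    positive = q > 0
    loss = torch.where(
        positive,
        -q * (q * torch.log(p) + (1 - q) * torch.log(1 - p)),  # positives: BCE scaled by the target IoU q
        -alpha * p.pow(gamma) * torch.log(1 - p),               # negatives: down-weighted by p**gamma
    )
    return loss.sum()
```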
L_bbox, the localization loss of the target, is expressed with the IoU loss:
L_bbox = 1 - IoU(A, B) = 1 - |A ∩ B| / |A ∪ B|
where A denotes the area of the predicted box and B denotes the area of the real (ground-truth) box.
L_cls represents the classification loss of the target:
L_cls = -(y·log(y′) + (1 - y)·log(1 - y′))
where y is the true category and y′ is the predicted category.
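Combining the three terms into L_total, with the Varifocal loss sketched above used for the confidence term, can be written as follows; the (x1, y1, x2, y2) box format, the assumption that boxes are already matched pairwise, and the sum reduction are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def iou_loss(pred_boxes, gt_boxes, eps=1e-8):
    """1 - IoU for matched box pairs given as (x1, y1, x2, y2)."""
    x1 = torch.max(pred_boxes[:, 0], gt_boxes[:, 0])
    y1 = torch.max(pred_boxes[:, 1], gt_boxes[:, 1])
    x2 = torch.min(pred_boxes[:, 2], gt_boxes[:, 2])
    y2 = torch.min(pred_boxes[:, 3], gt_boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)                       # |A ∩ B|
    area_a = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_b = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    iou = inter / (area_a + area_b - inter + eps)                                 # |A ∩ B| / |A ∪ B|
    return (1.0 - iou).sum()


def total_loss(pred_conf, target_iou, pred_cls, target_cls, pred_boxes, gt_boxes):
    l_conf = varifocal_loss(pred_conf, target_iou)                                # confidence term
    l_cls = F.binary_cross_entropy(pred_cls, target_cls, reduction="sum")         # -(y log y' + (1-y) log(1-y'))
    l_bbox = iou_loss(pred_boxes, gt_boxes)                                       # localization term
    return l_conf + l_cls + l_bbox
```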
Based on the model structure, the invention provides a target detection method for an unmanned aerial vehicle, which comprises the following steps:
First, an image is captured by the unmanned aerial vehicle; the captured image is preprocessed, which comprises resizing the image and applying Mosaic and Cut-Mix (cut-and-mix) data augmentation; the preprocessed image is then fed into the view perception network model for processing and detection, and the prediction result is finally output.
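An end-to-end inference sketch under stated assumptions follows: the trained model is assumed to return, for a 640×640 input, a tensor of rows [x1, y1, x2, y2, confidence, class scores...]. The plain bilinear resize (rather than letterboxing), the thresholds, and the output layout are illustrative; the Mosaic/Cut-Mix augmentation of the method would apply at training time rather than in this deployment path.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import nms


def detect(model, image, conf_thres=0.25, iou_thres=0.45, size=640):
    """image: HxWx3 uint8 array from the drone camera; returns boxes, scores, class ids."""
    img = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0          # to CHW, [0, 1]
    img = F.interpolate(img[None], size=(size, size), mode="bilinear",
                        align_corners=False)                                # resize to 640x640
    with torch.no_grad():
        preds = model(img)[0]                                               # assumed shape (N, 5 + C)
    boxes, conf = preds[:, :4], preds[:, 4]
    cls_scores, cls_ids = preds[:, 5:].max(dim=1)
    scores = conf * cls_scores                                              # combine confidence and class score
    keep = scores > conf_thres
    boxes, scores, cls_ids = boxes[keep], scores[keep], cls_ids[keep]
    keep = nms(boxes, scores, iou_thres)                                    # non-maximum suppression
    return boxes[keep], scores[keep], cls_ids[keep]
```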
Example 2:
As shown in fig. 5, the invention also provides a target detection device for an unmanned aerial vehicle, which comprises at least one processor and at least one memory, the processor and the memory being coupled; the memory stores a computer program of the view perception network model constructed as described in Embodiment 1; when the processor executes the computer program stored in the memory, the device realizes target detection on unmanned aerial vehicle images. The internal bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Enhanced ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus. The memory may include high-speed RAM, and may further include non-volatile memory (NVM), such as at least one magnetic disk memory, and may also be a USB disk, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk. The device may be provided as a terminal, server, or other form of device.
Fig. 5 is a block diagram of an apparatus shown for exemplary purposes. The device may include one or more of the following components: processing components, memory, power components, multimedia components, audio components, interfaces for input/output (I/O), sensor components, and communication components. The processing components typically control overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components may include one or more processors to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component can include one or more modules that facilitate interaction between the processing component and other components. For example, the processing component may include a multimedia module to facilitate interaction between the multimedia component and the processing component.
The memory is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component provides power to various components of the electronic device. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for an electronic device. The multimedia component includes a screen providing an output interface between the electronic device and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component is configured to output and/or input an audio signal. For example, the audio assembly includes a Microphone (MIC) configured to receive an external audio signal when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals. The I/O interface provides an interface between the processing component and a peripheral interface module, which may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly includes one or more sensors for providing various aspects of status assessment for the electronic device. For example, the sensor assembly may detect an open/closed state of the electronic device, the relative positioning of the components, such as a display and keypad of the electronic device, the sensor assembly may also detect a change in the position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, orientation or acceleration/deceleration of the electronic device, and a change in the temperature of the electronic device. The sensor assembly may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi,2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
Example 3:
The invention also provides a computer-readable storage medium in which a program or instructions of the view perception network model constructed as described in Embodiment 1 are stored; when the program or instructions are executed by a processor, the computer realizes target detection on unmanned aerial vehicle images.
In particular, a system, apparatus or device may be provided which is provided with a readable storage medium on which software program code implementing the functionality of any of the embodiments described above is stored and which causes a computer or processor of the system, apparatus or device to read and execute instructions stored in the readable storage medium. In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, or the like. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules.
It should be understood that a storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuits (ASIC). Of course, the processor and the storage medium may reside as discrete components in a terminal or server.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA) can execute the computer-readable program instructions, implementing aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the present invention has been described with reference to the specific embodiments, it should be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (8)

1. A view perception network model for an unmanned aerial vehicle, improved based on the YOLOv4 network framework, characterized in that: the model comprises a backbone network, a neck layer and a detection head;
the backbone network uses CSPDarknet53, which performs feature extraction on the preprocessed unmanned aerial vehicle image; the output of the CSPDarknet53 backbone comprises four layers C2, C3, C4 and C5, of which C3, C4 and C5 serve as the input of the neck layer;
the neck layer adopts the feature fusion module DPFPN, which aggregates the feature maps produced at different stages of the network and further extracts the complex features in the unmanned aerial vehicle image; the neck-layer outputs P5, P4 and P3 serve as the input of the detection head;
the detection head predicts the category, position and confidence of targets in the unmanned aerial vehicle image and comprises three branches, whose inputs are feature maps of three scales corresponding, from top to bottom, to P3, P4 and P5 of the neck layer; each branch comprises three sub-branches, namely a classification branch, a regression branch and a confidence branch; and during the training stage, the parameters of the classification, regression and confidence branches are optimized according to a training loss computed with the Varifocal loss algorithm.
2. The view perception network model for an unmanned aerial vehicle according to claim 1, wherein: the CSPDarknet53 backbone network obtains its output by passing the input sequentially through 2 CBM modules, 1 CSP module, 2 CBM modules, 1 CSP module and 1 CBM module.
3. The view perception network model for an unmanned aerial vehicle according to claim 1, wherein: the feature fusion module DPFPN is improved based on the FPN; the output C5 of the backbone network is passed through a 1×1 convolution to obtain P5 of the same size; the result of down-sampling P5 is spliced with the result of a 1×1 convolution on C4 to obtain P4; the result of down-sampling P4 is spliced with the result of a 1×1 convolution on C3 to obtain P3; the resulting P3 has three branches, one of which passes through a 3×3 convolution to give the final P3; the other two P3 branches undergo the same operation and are up-sampled to obtain P4, and then further up-sampled to obtain P5; the two P4 branches are fused and passed through a 3×3 convolution to give the final P4; the two P5 branches are fused and passed through a 3×3 convolution to give the final P5; and during fusion, the feature layers of the two branches combine rich semantic information with spatial information, yielding the neck-layer outputs P5, P4 and P3.
4. The view perception network model for an unmanned aerial vehicle according to claim 1, wherein: each of the three branches of the detection head begins with 2 CBL modules that perform channel dimensionality reduction, followed by three sub-branches, namely a classification branch, a regression branch and a confidence branch; after each sub-branch passes through a CBL module, a convolution operation and a sigmoid activation function, the three sub-branches are concatenated and the output is produced after a reshape operation.
5. The view perception network model for an unmanned aerial vehicle according to claim 1, wherein: the specific process of computing the training loss for the parameters of the classification, regression and confidence branches with the Varifocal loss algorithm during the training stage is as follows:
in the training phase, the confidence loss, classification loss and localization loss are computed; the total loss is
L_total = L_conf + L_cls + L_bbox
where L_total is the total loss of the model, L_conf the confidence loss, L_cls the classification loss and L_bbox the localization loss;
the Varifocal loss is introduced to train the target detector to predict the IACS:
VFL(p, q) = -q(q·log(p) + (1 - q)·log(1 - p))   if q > 0
VFL(p, q) = -α·p^γ·log(1 - p)                   if q = 0
where p is the predicted IACS score; q is the target IoU score: for positive training samples q is set to the IoU between the generated bbox and the ground-truth box, while for negative training samples the training target q is 0 for all classes; α is the loss weight of the background; and the factor p^γ weights the different samples;
L_bbox, the localization loss of the target, is expressed with the IoU loss:
L_bbox = 1 - IoU(A, B) = 1 - |A ∩ B| / |A ∪ B|
where A denotes the area of the predicted box and B denotes the area of the real (ground-truth) box;
L_cls represents the classification loss of the target:
L_cls = -(y·log(y′) + (1 - y)·log(1 - y′))
where y is the true category and y′ is the predicted category.
6. A target detection method for an unmanned aerial vehicle is characterized by comprising the following processes:
shooting by an unmanned aerial vehicle to obtain an image;
preprocessing the image; the preprocessing comprises resizing the image and applying Mosaic and Cut-Mix (cut-and-mix) data augmentation;
inputting the preprocessed image into the view perception network model of any one of claims 1 to 5 for processing and detection.
7. A target detection device for an unmanned aerial vehicle, characterized in that: the device comprises at least one processor and at least one memory; the memory stores a program of the view perception network model according to any one of claims 1 to 5; and when the processor executes the program stored in the memory, target detection and recognition on unmanned aerial vehicle images is realized.
8. A computer-readable storage medium storing a computer-executable program of the view perception network model according to any one of claims 1 to 5, the computer-executable program, when executed by a processor, realizing target detection and recognition on unmanned aerial vehicle images.
CN202211226543.6A 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method Active CN115641518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211226543.6A CN115641518B (en) 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211226543.6A CN115641518B (en) 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method

Publications (2)

Publication Number Publication Date
CN115641518A (en) 2023-01-24
CN115641518B (en) 2023-09-26

Family

ID=84941591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211226543.6A Active CN115641518B (en) 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method

Country Status (1)

Country Link
CN (1) CN115641518B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935366A (en) * 2023-09-15 2023-10-24 南方电网数字电网研究院有限公司 Target detection method and device, electronic equipment and storage medium
CN117392527A (en) * 2023-12-11 2024-01-12 中国海洋大学 High-precision underwater target classification detection method and model building method thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
CN112926685A (en) * 2021-03-30 2021-06-08 济南大学 Industrial steel oxidation zone target detection method, system and equipment
CN113033604A (en) * 2021-02-03 2021-06-25 淮阴工学院 Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114627052A (en) * 2022-02-08 2022-06-14 南京邮电大学 Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN114842365A (en) * 2022-07-04 2022-08-02 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system
CN114937201A (en) * 2022-07-04 2022-08-23 中国海洋大学三亚海洋研究院 Construction method and identification method of marine organism target detection algorithm model
CN115035386A (en) * 2022-06-29 2022-09-09 合肥学院 YOLOX target detection model compression method based on positioning distillation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
CN113033604A (en) * 2021-02-03 2021-06-25 淮阴工学院 Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN112926685A (en) * 2021-03-30 2021-06-08 济南大学 Industrial steel oxidation zone target detection method, system and equipment
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114627052A (en) * 2022-02-08 2022-06-14 南京邮电大学 Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN115035386A (en) * 2022-06-29 2022-09-09 合肥学院 YOLOX target detection model compression method based on positioning distillation
CN114842365A (en) * 2022-07-04 2022-08-02 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system
CN114937201A (en) * 2022-07-04 2022-08-23 中国海洋大学三亚海洋研究院 Construction method and identification method of marine organism target detection algorithm model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BIN ZHAO ET AL.: "E-Commerce Picture Text Recognition Information System Based on Deep Learning", 《COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE》 *
HAOYANG ZHANG ET AL.: "VarifocalNet: An IoU-aware Dense Object Detector", 《ARXIV》 *
JIA XIAOYA ET AL.: "Anchor-free ship target detection in … images based on the … framework", SYSTEMS ENGINEERING AND ELECTRONICS *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935366A (en) * 2023-09-15 2023-10-24 南方电网数字电网研究院有限公司 Target detection method and device, electronic equipment and storage medium
CN116935366B (en) * 2023-09-15 2024-02-20 南方电网数字电网研究院股份有限公司 Target detection method and device, electronic equipment and storage medium
CN117392527A (en) * 2023-12-11 2024-01-12 中国海洋大学 High-precision underwater target classification detection method and model building method thereof
CN117392527B (en) * 2023-12-11 2024-02-06 中国海洋大学 High-precision underwater target classification detection method and model building method thereof

Also Published As

Publication number Publication date
CN115641518B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN109829501B (en) Image processing method and device, electronic equipment and storage medium
CN111476306B (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN108629354B (en) Target detection method and device
US11393256B2 (en) Method and device for liveness detection, and storage medium
CN110544217B (en) Image processing method and device, electronic equipment and storage medium
US20210133468A1 (en) Action Recognition Method, Electronic Device, and Storage Medium
CN115641518A (en) View sensing network model for unmanned aerial vehicle and target detection method
KR20210102180A (en) Image processing method and apparatus, electronic device and storage medium
CN110751659B (en) Image segmentation method and device, terminal and storage medium
CN110443366B (en) Neural network optimization method and device, and target detection method and device
US20200294249A1 (en) Network module and distribution method and apparatus, electronic device, and storage medium
JP2022522551A (en) Image processing methods and devices, electronic devices and storage media
CN114937201A (en) Construction method and identification method of marine organism target detection algorithm model
CN106056379A (en) Payment terminal and payment data processing method
CN110569835A (en) Image identification method and device and electronic equipment
CN114677517B (en) Semantic segmentation network model for unmanned aerial vehicle and image segmentation and identification method
CN115908442B (en) Image panorama segmentation method and model building method for unmanned aerial vehicle ocean monitoring
CN113326768A (en) Training method, image feature extraction method, image recognition method and device
CN116863286B (en) Double-flow target detection method and model building method thereof
CN114267041B (en) Method and device for identifying object in scene
CN111242034A (en) Document image processing method and device, processing equipment and client
CN113313115B (en) License plate attribute identification method and device, electronic equipment and storage medium
CN112269939B (en) Automatic driving scene searching method, device, terminal, server and medium
CN116611482B (en) Model training method, device, electronic equipment and medium
CN111523599B (en) Target detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant