CN115641518B - View perception network model for unmanned aerial vehicle and target detection method - Google Patents


Info

Publication number
CN115641518B
CN115641518B (application CN202211226543.6A)
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
branches
branch
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211226543.6A
Other languages
Chinese (zh)
Other versions
CN115641518A (en)
Inventor
魏玲
杨晓刚
李兴隆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Weiran Intelligent Technology Co ltd
Original Assignee
Shandong Weiran Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Weiran Intelligent Technology Co ltd filed Critical Shandong Weiran Intelligent Technology Co ltd
Priority to CN202211226543.6A
Publication of CN115641518A
Application granted
Publication of CN115641518B
Legal status: Active
Anticipated expiration

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a view-aware network model for an unmanned aerial vehicle and a target detection method. The model is improved on the network framework of YOLOv4 and comprises a backbone network, a neck layer and a head detection head. The backbone network selects CSPDarknet53 to extract features from the image; the neck layer adopts a feature fusion module DPFPN, which collects feature maps from different stages of the network and further extracts complex features in the unmanned aerial vehicle image; the head detection head predicts the category, position and confidence of targets in the unmanned aerial vehicle image, and the parameters of its classification branch, regression branch and confidence branch are optimized and adjusted in the training stage through loss calculation with the Varifocal Loss algorithm. By constructing an up-down dual-path feature fusion module DPFPN, target features are fused more finely, alleviating the multi-scale target problem in unmanned aerial vehicle scenes; the Varifocal Loss function is incorporated to mitigate detector positioning errors caused by dense targets in unmanned aerial vehicle images.

Description

View perception network model for unmanned aerial vehicle and target detection method
Technical Field
The application belongs to the technical field of unmanned aerial vehicle image recognition, and particularly relates to a view perception network model for an unmanned aerial vehicle and a target detection method.
Background
Compared with target detection in natural scenes, target detection in unmanned aerial vehicle scenes has developed slowly. With the development of deep learning, target detection in unmanned aerial vehicle scenes has shown huge potential in rescue, surveillance, traffic monitoring and pedestrian tracking, and all of these applications require robust and effective recognition of targets in the unmanned aerial vehicle view. However, detecting targets in images from the drone's viewpoint is not easy: the environment changes frequently, target scales vary drastically, and the drone's computing power is limited. With the popularity of large-scale data sets, many state-of-the-art detectors based on convolutional neural networks have shown quite high performance, such as the R-CNN series and the YOLO series. However, these detectors are designed for relatively low-resolution images, and processing higher-resolution images is a greater challenge due to computational cost limitations.
The application proposes an improvement aimed at two problems: because the scales of objects photographed by the unmanned aerial vehicle change drastically and dense objects occlude one another, the recognition precision and efficiency of existing detectors can still be improved.
Disclosure of Invention
Aiming at the multi-scale target problem in the unmanned aerial vehicle scene, the application constructs an up-down dual-path feature fusion module DPFPN to fuse target features more finely; aiming at the dense-target problem in the unmanned aerial vehicle image, Varifocal Loss calculation is introduced to relieve the positioning error of the detector caused by dense targets.
The first aspect of the application provides a view-aware network model for an unmanned aerial vehicle, which is improved on the network framework of YOLOv4 and comprises a backbone network, a neck layer and a head detection head;
the backbone network selects CSPDarknet53 and performs feature extraction on the preprocessed unmanned aerial vehicle image; the CSPDarknet53 backbone network output comprises four layers C2, C3, C4 and C5, wherein C3, C4 and C5 are used as inputs of the neck layer;
the neck layer adopts a feature fusion module DPFPN, which collects feature maps from different stages of the network and further extracts complex features in the unmanned aerial vehicle image; the outputs P5, P4 and P3 of the neck layer are used as the inputs of the head detection head;
the head detection head is used for predicting the category, position and confidence of targets in the unmanned aerial vehicle image and comprises three branches, whose inputs are feature maps at three scales corresponding, from top to bottom, to P3, P4 and P5 of the neck layer; each branch comprises three sub-branches, namely a classification branch, a regression branch and a confidence branch; and the parameters of the classification branch, the regression branch and the confidence branch are subjected to training-loss calculation and optimization adjustment in the training stage through the Varifocal Loss algorithm.
In one possible design, the CSPDarknet53 backbone network obtains its output sequentially through 2 CBM modules, 1 CSP module, 2 CBM modules, 1 CSP module and 1 CBM module.
In one possible design, the feature fusion module DPFPN is improved on the FPN; the output C5 obtained from the backbone network passes through a 1×1 convolution to obtain P5 of the same size; the result of upsampling P5 is spliced with the result of a 1×1 convolution of C4 to obtain P4; the result of upsampling P4 is spliced with the result of a 1×1 convolution of C3 to obtain P3; the obtained P3 has three branches: one branch passes through a 3×3 convolution to obtain the final P3; the remaining two P3 branches carry out the same operation, being downsampled once to obtain P4 and downsampled further to obtain P5; the two P4 branches are fused and passed through a 3×3 convolution to obtain the final P4; the two P5 branches are fused and passed through a 3×3 convolution to obtain the final P5; during feature fusion the feature layers of the two paths fuse rich semantic information and spatial information, giving the outputs P5, P4 and P3 of the neck layer.
In one possible design, each of the three branches in the head detection head begins with 2 CBL modules to realize channel dimension reduction, followed by three sub-branches, namely a classification branch, a regression branch and a confidence branch; each sub-branch passes through a CBL module, a convolution operation and a sigmoid activation function, and is output after a reshape size-adjustment operation.
In one possible design, the specific process of training-loss calculation for the parameters of the classification branch, the regression branch and the confidence branch through the Varifocal Loss algorithm in the training stage is as follows:
in the training phase, confidence, classification and positioning losses are calculated, the losses being calculated as follows:
L_total = L_conf + L_cls + L_bbox
where L_total denotes the total loss of the model, L_conf the confidence loss, L_cls the classification loss, and L_bbox the positioning loss;
Varifocal Loss is introduced to train the target detector to predict the IACS:
VFL(p, q) = −q·(q·log(p) + (1 − q)·log(1 − p)) if q > 0, and VFL(p, q) = −α·p^γ·log(1 − p) if q = 0
where p is the predicted IACS score; q is the target IoU score: for positive samples in training, q is set to the IoU between the generated bbox and the ground-truth box, and for negative samples of all training target classes q is set to 0; α is the loss weight balancing foreground and background; p^γ is the weight applied to the different samples;
L bbox the IoU loss is used to represent the loss of positioning of the target:
wherein A represents the area of the predicted frame and B represents the area of the real frame;
L_cls represents the classification loss of the target:
L_cls = −(y·log(y′) + (1 − y)·log(1 − y′))
where y is the true category and y' is the predicted category.
The second aspect of the application provides a target detection method for an unmanned aerial vehicle, which comprises the following steps:
shooting by an unmanned aerial vehicle to obtain an image;
preprocessing the image; the preprocessing comprises adjusting the image size and applying Mosaic and Cut-Mix data enhancement;
inputting the preprocessed image into the view-aware network model according to the first aspect for processing and detection.
A third aspect of the present application provides an object detection device for a drone, the device comprising at least one processor and at least one memory; the memory stores therein a program of the view aware network model according to the first aspect; when the processor executes the program stored in the memory, the target detection and identification of the unmanned aerial vehicle image can be realized.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein a computer-implemented program of the view-aware network model according to the first aspect, which when executed by a processor, enables target detection and recognition of unmanned aerial vehicle images.
The beneficial effects are that: the application provides an unmanned aerial vehicle view-aware network for target detection, which constructs an up-down dual-path feature fusion module DPFPN to fuse target features more finely and relieve the multi-scale target problem in unmanned aerial vehicle scenes; the Varifocal Loss function is incorporated to mitigate detector positioning errors caused by dense targets in unmanned aerial vehicle images. On the VisDrone data set, the application achieves SOTA performance, and the average precision mAP is improved by 1.2 and 0.4 points compared with YOLOX and YOLOv5, respectively; compared with some two-stage models, the method is both more accurate and more efficient. The method realizes higher precision and efficiency of target detection in the unmanned aerial vehicle scene.
Drawings
Fig. 1 is a schematic structural diagram of a view-aware network model for an unmanned aerial vehicle according to the present application.
Fig. 2 is a schematic diagram of a backbone network structure of a CSPDarkNet53 according to the present application.
Fig. 3 is a schematic diagram of a head detection head according to the present application.
Fig. 4 is a schematic structural diagram of a feature fusion module DPFPN in the present application.
Fig. 5 is a schematic diagram of a simple structure of an object detection device for an unmanned aerial vehicle according to the present application.
Detailed Description
The application will be further described with reference to specific examples.
Example 1:
compared with target detection in a natural scene, the main reason that the unmanned aerial vehicle has lower target detection precision is that the unmanned aerial vehicle has wider flight field of view, and the phenomena of long-distance small target size and short-distance large target size easily occur, so that the target proportion is changed drastically; moreover, dense targets such as people and vehicles make the detectors less able to locate during regression.
The unmanned aerial vehicle view-aware network architecture for target detection provided by the application is shown in fig. 1. In current object detection, the detector's mAP metric is improved by selecting a backbone with strong image feature-extraction capability that is not too large, since backbone size affects detection speed; following YOLOv4, the application selects CSPDarkNet53 as the backbone network. First, the input image is resized to 640, and Mosaic and Cut-Mix are selected as the data enhancement methods. The image is then processed by the CSPDarkNet53 backbone, which obtains its output sequentially through 2 CBM modules, 1 CSP module, 2 CBM modules, 1 CSP module and 1 CBM module. The CBM module consists of a convolution operation, a BatchNorm batch standardization operation and a Mish activation function; the CSP module consists of CBL modules and res unit residual units; the CBL module consists of a convolution operation, BatchNorm batch standardization and a Leaky ReLU activation function; the res unit consists of CBL modules with a residual connection, and the specific structure is shown in fig. 2. The output of the backbone network has four layers, called C2, C3, C4 and C5. C3, C4 and C5 are used as the input of the network's neck layer, which uses the SiLU activation function; the fine feature extraction module DPFPN designed here produces the outputs P5, P4 and P3, which are fed to a shared head detection head and subjected to NMS non-maximum suppression to obtain the final detection result.
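For concreteness, the following is a minimal PyTorch sketch of the CBM, CBL and res unit building blocks as they are described above. It is an illustrative reconstruction, not the patented implementation; kernel sizes, strides and channel handling are assumptions, while the Conv + BatchNorm + Mish / Leaky ReLU composition follows the text.

```python
import torch
import torch.nn as nn

class CBM(nn.Module):
    """Conv + BatchNorm + Mish, as described for the CSPDarknet53 backbone."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class CBL(nn.Module):
    """Conv + BatchNorm + Leaky ReLU, used inside CSP modules and the head."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResUnit(nn.Module):
    """Residual unit built from CBL blocks, per the description of fig. 2."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(CBL(ch, ch, k=1), CBL(ch, ch, k=3))

    def forward(self, x):
        return x + self.block(x)
```

A CSP module would then stack such res units on one path and concatenate with a shortcut path, in line with the usual CSPDarknet design.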
As shown in fig. 3, the head detection head has three branches, whose inputs are feature maps at three scales corresponding, from top to bottom, to P3, P4 and P5 of the neck layer. Each branch begins with 2 CBL modules, which realize channel dimension reduction, followed by three sub-branches, namely a classification branch, a regression branch and a confidence branch. Each sub-branch passes through a CBL module, a convolution operation and a sigmoid activation function, and is output after a reshape size adjustment. The confidence branch Obj determines whether a target exists, the class branch Cls predicts the category output by the network, i.e. classifies the target, and the Reg branch regresses the position of the target.
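As a hedged illustration of the structure just described, one head branch might look like the sketch below. The hidden width, 1×1 output convolutions and the exact reshape layout are assumptions for the example; only the "2 CBL blocks, then cls/reg/obj sub-branches ending in sigmoid and reshape" structure is taken from the text.

```python
import torch
import torch.nn as nn

def cbl(in_ch, out_ch, k=3):
    # Conv + BatchNorm + Leaky ReLU block, as described for the head
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, 1, k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )

class HeadBranch(nn.Module):
    """One of the three head branches: 2 CBL blocks, then cls/reg/obj sub-branches."""
    def __init__(self, in_ch, num_classes, hidden=256):
        super().__init__()
        self.stem = nn.Sequential(cbl(in_ch, hidden, k=1), cbl(hidden, hidden, k=3))
        self.cls_branch = nn.Sequential(cbl(hidden, hidden), nn.Conv2d(hidden, num_classes, 1))
        self.reg_branch = nn.Sequential(cbl(hidden, hidden), nn.Conv2d(hidden, 4, 1))
        self.obj_branch = nn.Sequential(cbl(hidden, hidden), nn.Conv2d(hidden, 1, 1))

    def forward(self, x):
        x = self.stem(x)
        cls = torch.sigmoid(self.cls_branch(x))   # per-class scores
        reg = self.reg_branch(x)                  # box regression outputs
        obj = torch.sigmoid(self.obj_branch(x))   # objectness / confidence
        # reshape to (batch, H*W, channels), mirroring the reshape step in the text
        flat = lambda t: t.flatten(2).transpose(1, 2)
        return flat(cls), flat(reg), flat(obj)
```

The same branch would be instantiated three times, once per scale P3, P4 and P5.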
Regarding the feature fusion module DPFPN:
the feature fusion module is an important component of target detection, fuses features extracted from a backbone network, and then obtains a richer feature layer, thereby facilitating downstream detection tasks. Compared with a natural scene, the scene captured by the unmanned aerial vehicle has a plurality of small targets in the picture, and the small target information is easy to lose because the small targets are subjected to a plurality of downsampling operations in the backbone network.
As shown in fig. 4, the feature fusion DPFPN module of the network fully utilizes the bottom layer of the FPN. The output C5 obtained from the backbone network passes through a 1×1 convolution to obtain P5 of the same size; the result of upsampling P5 is spliced with the result of a 1×1 convolution of C4 to obtain P4; the result of upsampling P4 is spliced with the result of a 1×1 convolution of C3 to obtain P3; the obtained P3 has three branches: one branch passes through a 3×3 convolution to obtain the final P3; the remaining two P3 branches carry out the same operation, being downsampled once to obtain P4 and downsampled further to obtain P5; the two P4 branches are fused and passed through a 3×3 convolution to obtain the final P4; the two P5 branches are fused and passed through a 3×3 convolution to obtain the final P5. During feature fusion the feature layers of the two paths fuse rich semantic information and spatial information, giving the outputs P5, P4 and P3 of the neck layer.
The FPN combines shallow low-level positional features with deep high-level semantic features, which already yields good output. The DPFPN provided by the application improves on the FPN: it not only fuses, in the horizontal direction, the feature information passed along the FPN, but also further fuses, in the vertical direction, the feature information extracted by the backbone network, i.e. dual-path feature fusion in both breadth and depth, so that finer feature information is obtained.
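A schematic PyTorch sketch of this dual-path fusion follows. It is a sketch under stated assumptions rather than the patented implementation: concatenation is used for "splicing", extra 1×1 convolutions squeeze the concatenated channels, and max-pooling is used for the downsampling in the second path, none of which are fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DPFPN(nn.Module):
    def __init__(self, c3_ch, c4_ch, c5_ch, out_ch=256):
        super().__init__()
        self.lat5 = nn.Conv2d(c5_ch, out_ch, 1)   # 1x1 lateral conv on C5 -> P5
        self.lat4 = nn.Conv2d(c4_ch, out_ch, 1)   # 1x1 lateral conv on C4
        self.lat3 = nn.Conv2d(c3_ch, out_ch, 1)   # 1x1 lateral conv on C3
        # 1x1 convs squeezing spliced features back to out_ch (an assumption)
        self.red4 = nn.Conv2d(2 * out_ch, out_ch, 1)
        self.red3 = nn.Conv2d(2 * out_ch, out_ch, 1)
        # final 3x3 convolutions after fusion
        self.out3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.out4 = nn.Conv2d(2 * out_ch, out_ch, 3, padding=1)
        self.out5 = nn.Conv2d(2 * out_ch, out_ch, 3, padding=1)
        self.down = nn.MaxPool2d(2, 2)             # stride-2 downsampling, vertical path

    def forward(self, c3, c4, c5):
        # horizontal (top-down) path: upsample and splice with lateral features
        p5 = self.lat5(c5)
        p4 = self.red4(torch.cat([F.interpolate(p5, scale_factor=2, mode="nearest"),
                                  self.lat4(c4)], dim=1))
        p3 = self.red3(torch.cat([F.interpolate(p4, scale_factor=2, mode="nearest"),
                                  self.lat3(c3)], dim=1))
        # vertical (bottom-up) path: re-derive P4/P5 directly from P3 by downsampling
        p4_v = self.down(p3)
        p5_v = self.down(p4_v)
        # fuse the two paths and apply the final 3x3 convolutions
        out3 = self.out3(p3)
        out4 = self.out4(torch.cat([p4, p4_v], dim=1))
        out5 = self.out5(torch.cat([p5, p5_v], dim=1))
        return out3, out4, out5
```

For a 640×640 input with strides 8/16/32, C3/C4/C5 have spatial sizes 80×80, 40×40 and 20×20, so the factor-2 interpolation and pooling in the two paths line up.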
Calculation of the Varifocal Loss training loss:
in the training phase, confidence, classification and positioning losses are calculated, the losses being calculated as follows:
L_total = L_conf + L_cls + L_bbox
where L_total denotes the total loss of the model, L_conf the confidence loss, L_cls the classification loss, and L_bbox the positioning loss.
Varifocal Loss is introduced to train the target detector to predict the IACS (the IoU-aware classification score, which represents the confidence that the target is present together with its positioning accuracy):
VFL(p, q) = −q·(q·log(p) + (1 − q)·log(1 − p)) if q > 0, and VFL(p, q) = −α·p^γ·log(1 − p) if q = 0
where p is the predicted IACS score; q is the target IoU score: for positive samples in training, q is set to the IoU between the generated bbox and the ground-truth box, and for negative samples of all training target classes q is set to 0; α is the loss weight balancing foreground and background; p^γ is the weight applied to the different samples.
L_bbox uses the IoU loss to represent the positioning loss of the target:
L_bbox = 1 − (A ∩ B) / (A ∪ B)
where A represents the area of the predicted box and B represents the area of the ground-truth box.
L_cls represents the classification loss of the target:
L_cls = −(y·log(y′) + (1 − y)·log(1 − y′))
Where y is the true category and y' is the predicted category.
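For reference, a minimal sketch of these loss terms is given below. The confidence term follows the published VarifocalNet formulation of Varifocal Loss and the positioning term is a plain 1 − IoU loss; the α and γ defaults, box format and reductions are assumptions for illustration, not values fixed by the application.

```python
import torch
import torch.nn.functional as F

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """p: predicted IACS probabilities in (0,1); q: target IoU scores (0 for negatives)."""
    bce = F.binary_cross_entropy(p, q, reduction="none")
    # positives (q > 0) are weighted by the target score q;
    # negatives are down-weighted by alpha * p**gamma
    weight = torch.where(q > 0, q, alpha * p.pow(gamma))
    return (weight * bce).sum()

def iou_loss(pred_boxes, gt_boxes, eps=1e-7):
    """Boxes as (x1, y1, x2, y2); returns the mean of 1 - IoU."""
    x1 = torch.max(pred_boxes[:, 0], gt_boxes[:, 0])
    y1 = torch.max(pred_boxes[:, 1], gt_boxes[:, 1])
    x2 = torch.min(pred_boxes[:, 2], gt_boxes[:, 2])
    y2 = torch.min(pred_boxes[:, 3], gt_boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_b = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    iou = inter / (area_a + area_b - inter + eps)
    return (1.0 - iou).mean()

def classification_loss(y_pred, y_true):
    """Binary cross-entropy, matching L_cls = -(y*log(y') + (1-y)*log(1-y'))."""
    return F.binary_cross_entropy(y_pred, y_true)

# L_total = L_conf + L_cls + L_bbox, with the confidence term computed by the
# Varifocal Loss, the classification term by BCE, and the positioning term by IoU loss.
```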
Based on the model structure, the application provides a target detection method for an unmanned aerial vehicle, which comprises the following steps:
Firstly, an image is obtained by unmanned aerial vehicle shooting; the captured image is preprocessed, the preprocessing comprising adjusting the image size and applying Mosaic and Cut-Mix data enhancement; the preprocessed image is then input into the above view-aware network model for processing and detection, and the prediction result is finally output.
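As an illustration only, these steps could be wired together as in the sketch below at inference time. The model interface (already-decoded boxes, scores and labels), the thresholds, and the torchvision NMS call are assumptions, since the application does not prescribe a particular implementation; Mosaic and Cut-Mix apply to training-time preprocessing and are omitted here.

```python
import cv2
import torch
from torchvision.ops import nms

def detect(model, image_path, img_size=640, conf_thr=0.25, iou_thr=0.45, device="cpu"):
    img = cv2.imread(image_path)                  # UAV-captured frame
    img = cv2.resize(img, (img_size, img_size))   # preprocessing: resize to 640x640
    x = torch.from_numpy(img[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
    x = x.unsqueeze(0).to(device)

    model.eval()
    with torch.no_grad():
        boxes, scores, labels = model(x)          # assumed decoded outputs (xyxy, conf, class)

    keep = scores > conf_thr                      # confidence filtering
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = nms(boxes, scores, iou_thr)            # non-maximum suppression
    return boxes[keep], scores[keep], labels[keep]
```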
Example 2:
As shown in fig. 5, the present application also provides an object detection device for an unmanned aerial vehicle, the device comprising at least one processor and at least one memory, the processor and the memory being coupled; the memory stores a computer program of the view-aware network model constructed as described in Embodiment 1; when the processor executes the computer program stored in the memory, the device realizes target detection on unmanned aerial vehicle images. The internal bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus. The memory may include high-speed RAM and may further include nonvolatile memory (NVM), such as at least one magnetic disk memory, and may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk. The device may be provided as a terminal, a server, or another form of device.
Fig. 5 is a block diagram of an apparatus shown for illustration. The device may include one or more of the following components: a processing component, a memory, a power component, a multimedia component, an audio component, an input/output (I/O) interface, a sensor component, and a communication component. The processing component generally controls overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component may include one or more processors to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component may include one or more modules that facilitate interactions between the processing component and other components. For example, the processing component may include a multimedia module to facilitate interaction between the multimedia component and the processing component.
The memory is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like. The memory may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply assembly provides power to the various components of the electronic device. Power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic devices. The multimedia assembly includes a screen between the electronic device and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia assembly includes a front camera and/or a rear camera. When the electronic device is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component is configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals. The I/O interface provides an interface between the processing assembly and a peripheral interface module, which may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly includes one or more sensors for providing status assessment of various aspects of the electronic device. For example, the sensor assembly may detect an on/off state of the electronic device, a relative positioning of the assemblies, such as a display and keypad of the electronic device, a change in position of the electronic device or one of the assemblies of the electronic device, the presence or absence of user contact with the electronic device, an orientation or acceleration/deceleration of the electronic device, and a change in temperature of the electronic device. The sensor assembly may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly may further include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component is configured to facilitate communication between the electronic device and other devices in a wired or wireless manner. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further comprises a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
Example 3:
The present application also provides a computer-readable storage medium in which a program or instructions of the view-aware network model constructed as described in Embodiment 1 are stored; when the program or instructions are executed by a processor, the computer realizes target detection on unmanned aerial vehicle images.
In particular, a system, apparatus or device may be provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and its computer or processor may read and execute the instructions stored in the readable storage medium. In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of the present application.
The storage medium may be implemented by any type or combination of volatile or nonvolatile memory devices such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, and the like. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
It should be understood that the above processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.
It should be understood that the storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The processor and the storage medium may also reside as discrete components in a terminal or server.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
While the foregoing describes the embodiments of the present application, it should be understood that the present application is not limited to the embodiments, and that various modifications and changes can be made by those skilled in the art without any inventive effort.

Claims (7)

1. A view-aware network model for an unmanned aerial vehicle, improved on the network framework of YOLOv4, characterized in that: the model comprises a backbone network, a neck layer and a head detection head;
the backbone network selects CSPDarknet53 and performs feature extraction on the preprocessed unmanned aerial vehicle image; the CSPDarknet53 backbone network output comprises four layers C2, C3, C4 and C5, wherein C3, C4 and C5 are used as inputs of the neck layer;
the neck layer adopts a feature fusion module DPFPN, which collects feature maps from different stages of the network and further extracts complex features in the unmanned aerial vehicle image; the outputs P5, P4 and P3 of the neck layer are used as the inputs of the head detection head;
wherein the feature fusion module DPFPN is improved on the FPN; the output C5 obtained from the backbone network passes through a 1×1 convolution to obtain P5 of the same size; the result of upsampling P5 is spliced with the result of a 1×1 convolution of C4 to obtain P4; the result of upsampling P4 is spliced with the result of a 1×1 convolution of C3 to obtain P3; the obtained P3 has three branches: one branch passes through a 3×3 convolution to obtain the final P3; the remaining two P3 branches carry out the same operation, being downsampled once to obtain P4 and downsampled further to obtain P5; the two P4 branches are fused and passed through a 3×3 convolution to obtain the final P4; the two P5 branches are fused and passed through a 3×3 convolution to obtain the final P5; during feature fusion the feature layers of the two paths fuse rich semantic information and spatial information, giving the outputs P5, P4 and P3 of the neck layer;
the head detection head is used for predicting the category, position and confidence of targets in the unmanned aerial vehicle image and comprises three branches, whose inputs are feature maps at three scales corresponding, from top to bottom, to P3, P4 and P5 of the neck layer; each branch comprises three sub-branches, namely a classification branch, a regression branch and a confidence branch; and the parameters of the classification branch, the regression branch and the confidence branch are subjected to training-loss calculation and optimization adjustment in the training stage through the Varifocal Loss algorithm.
2. A view aware network model for an unmanned aerial vehicle as claimed in claim 1, wherein: the CSPDarknet53 backbone network sequentially outputs through 2 CBM modules, 1 CSP module, 2 CBM modules, 1 CSP module and one CBM module.
3. A view-aware network model for an unmanned aerial vehicle as claimed in claim 1, wherein: each of the three branches in the head detection head begins with 2 CBL modules to achieve channel dimension reduction, followed by three sub-branches, namely a classification branch, a regression branch and a confidence branch; each sub-branch passes through a CBL module, a convolution operation and a sigmoid activation function, and is output after a reshape size-adjustment operation.
4. A view-aware network model for an unmanned aerial vehicle as claimed in claim 1, wherein: the specific process of training-loss calculation for the parameters of the classification branch, the regression branch and the confidence branch through the Varifocal Loss algorithm in the training stage is as follows:
in the training phase, confidence, classification and positioning losses are calculated, the losses being calculated as follows:
L_total = L_conf + L_cls + L_bbox
where L_total denotes the total loss of the model, L_conf the confidence loss, L_cls the classification loss, and L_bbox the positioning loss;
Varifocal Loss is introduced to train the target detector to predict the IACS, an IoU-aware classification score:
VFL(p, q) = −q·(q·log(p) + (1 − q)·log(1 − p)) if q > 0, and VFL(p, q) = −α·p^γ·log(1 − p) if q = 0
where p is the predicted IACS score; q is the target IoU score: for positive samples in training, q is set to the IoU between the generated bbox and the ground-truth box, and for negative samples of all training target classes q is set to 0; α is the loss weight balancing foreground and background; p^γ is the weight applied to the different samples;
L_bbox uses the IoU loss to represent the positioning loss of the target:
L_bbox = 1 − (A ∩ B) / (A ∪ B)
where A represents the area of the predicted box and B represents the area of the ground-truth box;
L_cls represents the classification loss of the target:
L_cls = −(y·log(y′) + (1 − y)·log(1 − y′))
where y is the true category and y′ is the predicted category.
5. The target detection method for the unmanned aerial vehicle is characterized by comprising the following steps of:
shooting by an unmanned aerial vehicle to obtain an image;
preprocessing the image; the preprocessing comprises adjusting the image size and applying Mosaic and Cut-Mix data enhancement;
the preprocessed image is input into the view aware network model according to any of claims 1 to 4 for processing and detection.
6. Target detection equipment for unmanned aerial vehicle, its characterized in that: the apparatus includes at least one processor and at least one memory; the memory stores therein a program of the view aware network model according to any one of claims 1 to 4; and when the processor executes the program stored in the memory, the target detection and identification of the unmanned aerial vehicle image are realized.
7. A computer readable storage medium, wherein a computer executable program of the view aware network model according to any one of claims 1 to 4 is stored in the computer readable storage medium, and when the computer executable program is executed by a processor, the object detection and recognition of the unmanned aerial vehicle image is realized.
CN202211226543.6A 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method Active CN115641518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211226543.6A CN115641518B (en) 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211226543.6A CN115641518B (en) 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method

Publications (2)

Publication Number Publication Date
CN115641518A CN115641518A (en) 2023-01-24
CN115641518B 2023-09-26

Family

ID=84941591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211226543.6A Active CN115641518B (en) 2022-10-09 2022-10-09 View perception network model for unmanned aerial vehicle and target detection method

Country Status (1)

Country Link
CN (1) CN115641518B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935366B (en) * 2023-09-15 2024-02-20 南方电网数字电网研究院股份有限公司 Target detection method and device, electronic equipment and storage medium
CN117392527B (en) * 2023-12-11 2024-02-06 中国海洋大学 High-precision underwater target classification detection method and model building method thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
CN112926685A (en) * 2021-03-30 2021-06-08 济南大学 Industrial steel oxidation zone target detection method, system and equipment
CN113033604A (en) * 2021-02-03 2021-06-25 淮阴工学院 Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114627052A (en) * 2022-02-08 2022-06-14 南京邮电大学 Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN114842365A (en) * 2022-07-04 2022-08-02 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system
CN114937201A (en) * 2022-07-04 2022-08-23 中国海洋大学三亚海洋研究院 Construction method and identification method of marine organism target detection algorithm model
CN115035386A (en) * 2022-06-29 2022-09-09 合肥学院 YOLOX target detection model compression method based on positioning distillation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
CN113033604A (en) * 2021-02-03 2021-06-25 淮阴工学院 Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN112926685A (en) * 2021-03-30 2021-06-08 济南大学 Industrial steel oxidation zone target detection method, system and equipment
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114627052A (en) * 2022-02-08 2022-06-14 南京邮电大学 Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN115035386A (en) * 2022-06-29 2022-09-09 合肥学院 YOLOX target detection model compression method based on positioning distillation
CN114842365A (en) * 2022-07-04 2022-08-02 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system
CN114937201A (en) * 2022-07-04 2022-08-23 中国海洋大学三亚海洋研究院 Construction method and identification method of marine organism target detection algorithm model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
E-Commerce Picture Text Recognition Information System Based on Deep Learning; Bin Zhao et al.; Computational Intelligence and Neuroscience; full text *
VarifocalNet: An IoU-aware Dense Object Detector; Haoyang Zhang et al.; arXiv; full text *
Anchor-free ship target detection in … images based on the … framework; Jia Xiaoya et al.; Systems Engineering and Electronics; full text *

Also Published As

Publication number Publication date
CN115641518A (en) 2023-01-24

Similar Documents

Publication Publication Date Title
CN109829501B (en) Image processing method and device, electronic equipment and storage medium
CN115641518B (en) View perception network model for unmanned aerial vehicle and target detection method
US20210118112A1 (en) Image processing method and device, and storage medium
US20210019562A1 (en) Image processing method and apparatus and storage medium
CN110544217B (en) Image processing method and device, electronic equipment and storage medium
US20210027081A1 (en) Method and device for liveness detection, and storage medium
WO2023087741A1 (en) Defect detection method and apparatus, and electronic device, storage medium and computer program product
KR20220009965A (en) Network training method and apparatus, target detection method and apparatus, and electronic device
CN110619350B (en) Image detection method, device and storage medium
CN104298429A (en) Information presentation method based on input and input method system
CN109615006B (en) Character recognition method and device, electronic equipment and storage medium
CN110889469A (en) Image processing method and device, electronic equipment and storage medium
CN110543850B (en) Target detection method and device and neural network training method and device
US20220392202A1 (en) Imaging processing method and apparatus, electronic device, and storage medium
CN110569835B (en) Image recognition method and device and electronic equipment
EP3905122B1 (en) Video type detection method, apparatus, electronic device and storage medium
KR20220011207A (en) Image processing method and apparatus, electronic device and storage medium
CN111753895A (en) Data processing method, device and storage medium
KR20210036955A (en) Motion recognition method and device, driver condition analysis method and device
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN114677517B (en) Semantic segmentation network model for unmanned aerial vehicle and image segmentation and identification method
CN111259967A (en) Image classification and neural network training method, device, equipment and storage medium
CN113326768A (en) Training method, image feature extraction method, image recognition method and device
CN112926510A (en) Abnormal driving behavior recognition method and device, electronic equipment and storage medium
CN111435422B (en) Action recognition method, control method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant