CN111340850A - Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss - Google Patents

Info

Publication number
CN111340850A
CN111340850A
Authority
CN
China
Prior art keywords
network
frame image
target
feature extraction
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010198544.9A
Other languages
Chinese (zh)
Inventor
林白
耿洋洋
李冬冬
蒯杨柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
System General Research Institute Academy Of Systems Engineering Academy Of Military Sciences
National University of Defense Technology
Original Assignee
System General Research Institute Academy Of Systems Engineering Academy Of Military Sciences
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by System General Research Institute Academy Of Systems Engineering Academy Of Military Sciences, National University of Defense Technology filed Critical System General Research Institute Academy Of Systems Engineering Academy Of Military Sciences
Priority to CN202010198544.9A
Publication of CN111340850A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a ground target tracking method for an unmanned aerial vehicle based on a twin (Siamese) network and a central logic loss. The method aims to solve problems of the existing target tracking technology such as the large number of network parameters, the heavy computation, and the imbalance between positive and negative training samples, and belongs to the technical field of computer image processing. The method comprises the following steps: extracting the target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and extracting the search feature map of a second frame image using a second feature extraction network; calculating the cross-correlation between the search region of the second frame image and the target region of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position in the second frame image from that score response map. The first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network, and each consists of a lightweight convolutional neural network.

Description

Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss
Technical Field
The invention belongs to the technical field of computer image processing, and particularly relates to a ground target tracking method of an unmanned aerial vehicle based on a twin network and central logic loss.
Background
Deep learning based visual tracking algorithms, such as the fully convolutional twin network tracker shown in fig. 1, have been widely adopted by developers and users.
In such a method, a template image extracted from the first frame and a search image extracted from each subsequent frame are fed into two sub-networks that extract high-level semantic features, which are then cross-correlated to obtain the similarity to the template image at each position in the search image. Typically, the parameters of the two sub-networks are shared and can be learned offline from training data. Because high-level semantic features carry rich category-level semantics, the method is robust to target appearance changes caused by occlusion, deformation, and the like, and the network does not need to be updated during tracking, which greatly reduces the computational cost of the algorithm and keeps it real-time. The network thus has two inputs, the target template image and the search region image, which are passed through the parameter-sharing twin sub-networks of the twin neural network for feature extraction.
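The cross-correlation at the heart of this architecture can be expressed as an ordinary convolution in which the template's feature map serves as the kernel. The following minimal sketch assumes PyTorch; the function name, channel count, and feature-map sizes are illustrative choices, not values from the patent:

```python
# A minimal sketch, assuming PyTorch: the template feature map is slid as a
# convolution kernel over the search feature map, scoring every position.
import torch
import torch.nn.functional as F

def cross_correlation(template_feat: torch.Tensor,
                      search_feat: torch.Tensor) -> torch.Tensor:
    # template_feat: (1, C, h, w); search_feat: (1, C, H, W)
    # returns a (1, 1, H - h + 1, W - w + 1) score response map
    return F.conv2d(search_feat, template_feat)

z = torch.randn(1, 256, 6, 6)        # features of the target template image
x = torch.randn(1, 256, 22, 22)      # features of the search region image
score_map = cross_correlation(z, x)  # shape (1, 1, 17, 17)
```

The peak of the resulting score map indicates where the search region most resembles the template.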
However, such methods are mainly designed for general visual tracking tasks and do not fit the hardware platform of an unmanned aerial vehicle, whose computing and storage resources are limited. First, a convolutional neural network contains a very large number of weight parameters, and storing them places high demands on device memory. Second, the limited computing resources of embedded hardware platforms such as unmanned aerial vehicles make it difficult to carry out the convolution operations of a convolutional neural network efficiently and in real time.
Meanwhile, during actual tracking, such implementations must densely detect the target at every position of the search area. In aerial images taken by an unmanned aerial vehicle, the search area generally contains many easy negative samples from simple background (such as area 3 in fig. 2), few hard negative samples (such as area 1 in fig. 2), and a positive sample containing the foreground target (such as area 2 in fig. 2). The large number of easy background negatives unbalances the training samples and dominates the training of the network, causing the model to degrade.
Disclosure of Invention
The invention aims to provide a ground target tracking method of an unmanned aerial vehicle based on a twin network and central logic loss, designed around the characteristics of the unmanned aerial vehicle hardware platform, so as to solve problems of the existing target tracking technology such as the large number of network parameters, the heavy computation, and the imbalance between positive and negative training samples.
According to a first aspect of the present invention, a training method for a ground target visual tracking model of an unmanned aerial vehicle is provided, where the tracking model is a twin convolutional network whose two branches are a first feature extraction network and a second feature extraction network respectively. The training method includes the following steps (a minimal sketch of one training step follows the list):
acquiring a video sequence data set, wherein the data set comprises paired template images and search images;
extracting a target feature map of the template image using the first feature extraction network, and extracting a search feature map of the search image using the second feature extraction network;
calculating the cross-correlation between the search region of the search image and the target region of the template image from the target feature map and the search feature map to obtain a score response map of the search image;
calculating the difference between the score response map and the ground truth according to a central logic loss function to obtain a difference result; and
back-propagating the difference result to adjust the weights of each layer of the twin neural network;
wherein the first feature extraction network and the second feature extraction network each consist of a lightweight convolutional neural network.
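Taken together, the listed steps amount to one forward/backward pass per (template, search) pair. The sketch below, assuming PyTorch, shows how such a training step could be wired up; embed_net, loss_fn, and the batch-size-1 tensors are illustrative placeholders rather than the patent's implementation:

```python
# A minimal sketch, assuming PyTorch, of one training step of the twin network.
import torch
import torch.nn.functional as F

def train_step(embed_net, optimizer, loss_fn,
               template_img, search_img, label_map):
    z = embed_net(template_img)  # target feature map (first branch)
    x = embed_net(search_img)    # search feature map (second branch, shared weights)
    # batch size 1 assumed so that z can serve directly as the conv kernel
    score_map = F.conv2d(x, z)   # cross-correlation -> score response map
    loss = loss_fn(score_map, label_map)  # central logic loss vs. ground truth
    optimizer.zero_grad()
    loss.backward()              # back-propagate the difference result
    optimizer.step()             # adjust the weights of each layer
    return loss.item()
```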
Optionally, the lightweight convolutional neural network is a MobileNetV2 model.
Optionally, the central logic loss function is:
L(y, v) = (1/(m·n)) · Σ_u a/(1 + exp(b · y_u · v_u)) · log(1 + exp(−y_u · v_u))
where the sum runs over all positions u of the score map, v ∈ R^(m×n) is the score map output by the network, y ∈ { +1, −1 } is the manually annotated ground truth, and a/(1 + exp(b · yv)) is a modulation factor for the logic loss that adaptively adjusts the contribution of each training sample to the training loss according to the input yv.
Further, when yv > 0, the modulation factor assigns a first weight to the logic loss; when yv < 0, the modulation factor assigns a second weight to the logic loss, the first weight being less than the second weight.
According to a second aspect of the invention, an unmanned aerial vehicle ground target visual tracking method comprises the following steps:
extracting the target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and extracting the search feature map of a second frame image using a second feature extraction network;
calculating the cross-correlation between the search region of the second frame image and the target region of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position in the second frame image from that score response map;
wherein the first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network, the lightweight convolutional neural network being a MobileNetV2 model.
According to a third aspect of the present invention, an unmanned aerial vehicle visual tracking device for a ground target comprises:
an identification unit, configured to extract the target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and to extract the search feature map of a second frame image using a second feature extraction network;
a calculating unit, configured to calculate the cross-correlation between the search region of the second frame image and the target region of the first frame image from the target feature map and the search feature map, obtaining a score response map of the second frame image;
a determining unit, configured to obtain the target position of the second frame image from the score response map of the second frame image;
wherein the first feature extraction network and the second feature extraction network in the identification unit are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network.
According to a fourth aspect of the invention, an electronic device comprises:
at least one processor; and
a memory communicatively coupled to the processor and storing instructions executable by the processor; when the instructions are executed by the processor, the processor performs the above method for visual tracking of a ground target by an unmanned aerial vehicle, or the above training method.
According to a fifth aspect of the present invention, a readable storage medium stores a computer program which, when executed by a processor, implements the method for visual tracking of a ground target by an unmanned aerial vehicle, or the training method.
The invention adopts the MobileNetV2 model, a lightweight network, as the feature extraction sub-network at the front end of the deep framework, reducing the computational complexity and the number of parameters of the convolutional neural network. At the same time, it strikes a good balance between processing speed and accuracy, so that the method fits the limited storage and computing resources of the unmanned aerial vehicle hardware platform.
In addition, by adopting a central logic loss function, the invention applies different weights to different training samples in the search area, which resolves the imbalance between positive and negative training samples, avoids network degradation during offline training, and makes the learned convolutional features more discriminative.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive exercise.
FIG. 1 is a schematic diagram of a conventional twin network structure;
FIG. 2 is a practical target tracking scenario including a simple negative example (area 3), a difficult negative example (area 1), and a positive example (area 2) containing a foreground target;
FIG. 3 is a schematic diagram of a visual tracking network model structure according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a MobileNet V2 network structure;
FIG. 5 is a schematic flow chart of a visual tracking network training and tracking method according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of a method for visually tracking a ground target by an unmanned aerial vehicle according to an embodiment of the invention;
fig. 7 is a schematic structural diagram of a visual tracking device for a ground target of an unmanned aerial vehicle according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the following description, UAV (Unmanned Aerial Vehicle) refers primarily to an aircraft without a human pilot aboard that is operated by a radio remote-control device together with its own program-control device, or that operates autonomously, either completely or intermittently, under an onboard computer.
FIG. 3 illustrates a twin network based visual tracking network constructed in accordance with an embodiment of the invention.
As shown in fig. 3, the visual tracking network includes a first branch and a second branch of identical structure, and each branch includes a feature extraction network for extracting a feature map from an image. In the invention, the feature extraction network of each branch consists of a lightweight convolutional neural network. According to one embodiment of the invention, the lightweight convolutional neural network is the MobileNetV2 model.
Fig. 4 shows the network structure of the MobileNetV2 model. MobileNetV2 uses depthwise separable convolution as an efficient building block. In addition, MobileNetV2 introduces two new architectural features: 1) linear bottleneck layers between the layers; and 2) shortcut connections between the bottleneck layers. This efficient architecture allows the model to run on the device itself, with additional benefits for security, privacy, and power consumption.
Accordingly, the MobileNetV2 network applies a first 1 × 1 convolutional layer for expansion before extracting features with the depthwise (DW) convolutional layer, and compresses the result with a third convolutional layer after feature extraction. The network thus realizes an expansion → convolutional feature extraction → compression pipeline and avoids the loss of channel information.
Meanwhile, ReLU activation functions are placed at the outputs of the first and second convolutional layers. Because of the expansion → convolution → compression process just described, a problem arises after compression: the ReLU function destroys features, since its output is zero for every negative input. To avoid further loss of features, a linear activation function is therefore used for the third convolutional layer (the sketch below illustrates this block).
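The block just described can be sketched as follows, assuming PyTorch. The original MobileNetV2 uses ReLU6 as its ReLU variant and adds a shortcut connection when the stride is 1 and the input and output channel counts match; the shortcut is omitted here for brevity, and the expansion factor and layer names are illustrative:

```python
# A minimal sketch, assuming PyTorch, of the expansion -> depthwise
# convolution -> linear compression block described above.
import torch.nn as nn

def bottleneck_block(in_ch: int, out_ch: int,
                     expand: int = 6, stride: int = 1) -> nn.Sequential:
    mid = in_ch * expand
    return nn.Sequential(
        nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),   # first 1x1 conv: expansion
        nn.BatchNorm2d(mid),
        nn.ReLU6(inplace=True),                             # ReLU after the first layer
        nn.Conv2d(mid, mid, kernel_size=3, stride=stride,
                  padding=1, groups=mid, bias=False),       # depthwise (DW) conv
        nn.BatchNorm2d(mid),
        nn.ReLU6(inplace=True),                             # ReLU after the second layer
        nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),  # third 1x1 conv: compression
        nn.BatchNorm2d(out_ch),                             # linear bottleneck: no ReLU here
    )
```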
The twin network based visual tracking network of the invention adopts a central logic loss function:
L(y, v) = (1/(m·n)) · Σ_u a/(1 + exp(b · y_u · v_u)) · log(1 + exp(−y_u · v_u))
where:
v ∈ R^(m×n) is the score map output by the network, m × n is the size of the score map, and the sum runs over all positions u of the score map;
y ∈ { +1, −1 } is the manually annotated ground truth;
a/(1 + exp(b · yv)) is a modulation factor for the logic loss, and a and b are parameters of the modulation factor; for example, according to an embodiment of the present invention, a = 2 and b = 1.
The modulation factor adaptively adjusts the contribution of each training sample to the training loss according to the input yv. When yv > 0, the sample is an easy sample, and the modulation factor assigns a smaller weight to its logic loss; conversely, when yv < 0, the sample is a hard sample, and the modulation factor assigns a larger weight to its logic loss.
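A minimal sketch of this central logic (logistic) loss, assuming PyTorch, follows; a = 2 and b = 1 are taken from the embodiment above, while averaging over all score-map positions is an assumption consistent with the m × n score-map definition:

```python
# A minimal sketch, assuming PyTorch, of the central logic loss.
import torch
import torch.nn.functional as F

def center_logic_loss(v: torch.Tensor, y: torch.Tensor,
                      a: float = 2.0, b: float = 1.0) -> torch.Tensor:
    # v: score response map (m x n); y: labels in {+1, -1} of the same shape
    yv = y * v
    # modulation factor: small for easy samples (yv > 0), large for hard ones (yv < 0)
    modulation = a / (1.0 + torch.exp(b * yv))
    logistic = F.softplus(-yv)             # numerically stable log(1 + exp(-yv))
    return (modulation * logistic).mean()  # average over all score-map positions
```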
Most convolutional neural networks use the logic (logistic) loss or the cross-entropy loss as the supervisory signal when training a deep model, and features trained with these loss functions are separable but poorly discriminative.
Unlike closed-set problems such as object classification and recognition, target tracking is an open-set problem: the features output by the deep model must be not only separable but also highly discriminative. When processing long-tailed datasets, in which most samples belong to a few classes and many other classes have very few samples, weighting the losses of the different classes is tricky. In the target tracking problem, foreground targets are easy to collect as positive samples, whereas the hard negative samples in the background that are most useful for training are few.
Therefore, the invention uses the central logic loss function, in an end-to-end form, to adaptively adjust the weighting of positive and negative samples and prevent the learned network from being affected by training-sample imbalance. Specifically, by applying different weights to different samples, the central logic loss function handles the imbalance between foreground and background training samples.
Fig. 5 shows an exemplary process of visual tracking network training and unmanned aerial vehicle-to-ground target visual tracking using the trained network model according to an embodiment of the present invention.
As shown in fig. 5, the training process of the visual tracking network in the present invention includes a pre-training phase and a fine-tuning phase.
In the pre-training stage, for example, the video database of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is adopted as the sample video sequences, and the annotated sample video sequences are used to train the visual tracking network. During training, the maximum number of iterations and the learning rate of the visual tracking network are set, a network parameter initialization method and a back-propagation method are selected, and the network parameters are optimized.
According to an embodiment of the invention, the maximum number of iterations and the learning rate are set, and the back-propagation method is selected, as follows (a minimal setup sketch follows the list):
maximum number of iterations: 50 epochs;
initial learning rate: 0.001;
network initialization method: Xavier method;
back-propagation method: stochastic gradient descent.
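The following minimal setup sketch, assuming PyTorch, wires up these settings; the one-layer placeholder network stands in for the actual twin network:

```python
# A minimal sketch, assuming PyTorch, of the pre-training settings listed above.
import torch
import torch.nn as nn

def init_weights(m: nn.Module) -> None:
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)  # Xavier initialization method
        if m.bias is not None:
            nn.init.zeros_(m.bias)

MAX_EPOCHS = 50    # maximum number of iterations: 50 epochs
INITIAL_LR = 0.001 # initial learning rate

net = nn.Sequential(nn.Conv2d(3, 16, 3))  # placeholder for the actual twin network
net.apply(init_weights)
optimizer = torch.optim.SGD(net.parameters(), lr=INITIAL_LR)  # stochastic gradient descent
```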
After the pre-training phase is completed, the network parameters are further optimized through a fine-tuning phase.
In the fine-tuning stage, a video dataset is built from ground targets collected by an unmanned aerial vehicle; each frame image of the video dataset is labeled by category, the labeled video dataset is divided into a training set, a validation set, and a test set, and finally the labeled video dataset is converted into a data format the visual tracking network model can read.
The pre-trained visual tracking network is then trained with the training set and the validation set so as to fine-tune it, and the structure and parameters of the fine-tuned visual tracking network model are saved.
The fine-tuned visual tracking network is tested with the test set to obtain its tracking accuracy. If the tracking accuracy meets the actual engineering requirements, the visual tracking network model can be applied to the actual task of identifying specific ground targets from the unmanned aerial vehicle. Otherwise, the training set does not meet the actual engineering requirements and must be expanded, and the pre-training and fine-tuning steps are restarted until the requirements are met.
During training, the invention adopts the central logic loss function, which is:
L(y, v) = (1/(m·n)) · Σ_u a/(1 + exp(b · y_u · v_u)) · log(1 + exp(−y_u · v_u))
where:
v ∈ R^(m×n) is the score map output by the network, and the sum runs over all positions u of the score map;
y ∈ { +1, −1 } is the manually annotated ground truth;
a/(1 + exp(b · yv)) is the modulation factor for the logic loss, which adaptively adjusts the contribution of each training sample to the training loss according to the input yv.
When yv > 0, the sample is an easy sample and the modulation factor assigns a smaller weight to its logic loss; conversely, when yv < 0, the sample is a hard sample and the modulation factor assigns a larger weight to its logic loss.
In this way, by applying different weights to different samples, the central logic loss function handles the imbalance between foreground and background training samples.
After the network model training is completed, the visual tracking network is applied to an actual scene of the unmanned aerial vehicle for tracking the ground target, and the target in the video acquired by the unmanned aerial vehicle is tracked.
Fig. 5 and 6 show schematic flows of the twin network-based unmanned aerial vehicle ground target tracking method according to the embodiment of the invention.
As shown in fig. 6, the method includes the steps of:
extracting the target feature map of the first frame image (i.e., the template image), in which the target position is known, using the first feature extraction network, and extracting the search feature map of the second frame image (i.e., the search image) using the second feature extraction network;
calculating the cross-correlation between the search region of the second frame image and the target region of the first frame image from the target feature map and the search feature map to obtain the score response map of the second frame image, and then obtaining the target position in the second frame image from that score response map;
wherein the first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network.
For example, the target in the 1st frame of the video sequence is calibrated, and the target position in the 2nd frame can be obtained with the visual tracking network; then, from the calibration result of the 1st frame or the tracking result of the 2nd frame, the target position in the 3rd frame can be obtained with the visual tracking network. Continuing in this way, the target position in every frame of the video sequence can be obtained, realizing target tracking over the whole video sequence.
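The frame-by-frame procedure just described could be organized as the following loop, assuming PyTorch. crop_template, crop_search_region, and locate_peak are hypothetical helpers (cropping an image around a box and reading off the peak of the score map); they are named here for illustration only:

```python
# A minimal sketch, assuming PyTorch, of the frame-by-frame tracking loop.
# crop_template, crop_search_region, and locate_peak are hypothetical helpers,
# not defined by the patent.
import torch
import torch.nn.functional as F

def track_sequence(embed_net, frames, first_box):
    # frames: list of image tensors; first_box: calibrated target in frame 1
    z = embed_net(crop_template(frames[0], first_box))  # template features, computed once
    boxes = [first_box]
    for frame in frames[1:]:
        x = embed_net(crop_search_region(frame, boxes[-1]))  # search around previous position
        score_map = F.conv2d(x, z)                           # score response map
        boxes.append(locate_peak(score_map, boxes[-1]))      # peak -> target position this frame
    return boxes
```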
As shown in fig. 5, the method further includes updating the parameters of the feature extraction networks according to the score response map obtained by the cross-correlation calculation, further improving the accuracy and reliability of the feature extraction networks.
Fig. 7 shows a ground target visual tracking apparatus for an unmanned aerial vehicle according to an embodiment of the present invention, including:
an identification unit 701, configured to extract the target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and to extract the search feature map of a second frame image using a second feature extraction network;
a calculating unit 702, configured to calculate, from the target feature map and the search feature map, the cross-correlation between the search region of the second frame image and the target region of the first frame image, obtaining the score response map of the second frame image;
a determining unit 703, configured to obtain the target position of the second frame image from the score response map of the second frame image;
wherein the first feature extraction network and the second feature extraction network in the identification unit are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network.
An embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the processor and storing instructions executable by the processor; when the instructions are executed by the processor, the processor performs the method for visual tracking of a ground target by an unmanned aerial vehicle, or the training method.
Alternatively, the memory may be separate or integrated with the processor.
When the memory is independently provided, the electronic device further comprises a bus for connecting the memory and the processor.
Further, an embodiment of the present invention also provides a readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for visual tracking of a ground target by an unmanned aerial vehicle, or the training method.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods described in the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), but may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash disk, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those skilled in the art will appreciate that all or part of the steps for implementing the above-described method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; the storage medium includes various media that can store program codes, such as ROM, RAM, magnetic disk, or optical disk. The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A training method for a ground target visual tracking model of an unmanned aerial vehicle, the tracking model being a twin convolutional network whose two branches are a first feature extraction network and a second feature extraction network respectively, characterized in that the training method comprises:
acquiring a video sequence data set, wherein the data set comprises paired template images and search images;
extracting a target feature map of the template image using the first feature extraction network, and extracting a search feature map of the search image using the second feature extraction network;
calculating the cross-correlation between the search region of the search image and the target region of the template image from the target feature map and the search feature map to obtain a score response map of the search image;
calculating the difference between the score response map and the ground truth according to a central logic loss function to obtain a difference result; and
back-propagating the difference result to adjust the weights of each layer of the twin neural network;
wherein the first feature extraction network and the second feature extraction network each consist of a lightweight convolutional neural network.
2. The training method of claim 1, wherein the lightweight convolutional neural network is a MobileNetV2 model.
3. The training method according to claim 1 or 2, characterized in that the central logic loss function is:
L(y, v) = (1/(m·n)) · Σ_u a/(1 + exp(b · y_u · v_u)) · log(1 + exp(−y_u · v_u))
wherein v ∈ R^(m×n) is the score map output by the network, the sum runs over all positions u of the score map, y ∈ { +1, −1 } is the manually annotated ground truth, a/(1 + exp(b · yv)) is a modulation factor for the logic loss that adaptively adjusts the contribution of each training sample to the training loss according to the input yv, and a and b are parameters of the modulation factor.
4. The training method according to claim 3, wherein when yv > 0 the modulation factor assigns a first weight to the logic loss, and when yv < 0 the modulation factor assigns a second weight to the logic loss, the first weight being less than the second weight.
5. The training method according to claim 3, wherein the parameters of the modulation factor of the central logic loss function are a = 2 and b = 1.
6. A method for visual tracking of a ground target by an unmanned aerial vehicle, characterized by comprising the following steps:
extracting the target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and extracting the search feature map of a second frame image using a second feature extraction network;
calculating the cross-correlation between the search region of the second frame image and the target region of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position in the second frame image from that score response map;
wherein the first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network, the lightweight convolutional neural network being a MobileNetV2 model.
7. A ground target visual tracking apparatus for an unmanned aerial vehicle, characterized by comprising:
an identification unit, configured to extract the target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and to extract the search feature map of a second frame image using a second feature extraction network;
a calculating unit, configured to calculate the cross-correlation between the search region of the second frame image and the target region of the first frame image from the target feature map and the search feature map, obtaining a score response map of the second frame image;
a determining unit, configured to obtain the target position of the second frame image from the score response map of the second frame image;
wherein the first feature extraction network and the second feature extraction network in the identification unit are the two branches of a twin convolutional network and each comprise a lightweight convolutional neural network, the lightweight convolutional neural network being a MobileNetV2 model.
8. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the processor and storing instructions executable by the processor; when the instructions are executed by the processor, the processor performs the method for visual tracking of a ground target by an unmanned aerial vehicle of claim 6, or the training method of any one of claims 1-5.
9. A readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for visual tracking of a ground target by an unmanned aerial vehicle of claim 6, or the training method of any one of claims 1-5.
CN202010198544.9A 2020-03-20 2020-03-20 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss Pending CN111340850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010198544.9A CN111340850A (en) 2020-03-20 2020-03-20 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010198544.9A CN111340850A (en) 2020-03-20 2020-03-20 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss

Publications (1)

Publication Number Publication Date
CN111340850A (en) 2020-06-26

Family

ID=71184196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010198544.9A Pending CN111340850A (en) 2020-03-20 2020-03-20 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss

Country Status (1)

Country Link
CN (1) CN111340850A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627050A (en) * 2020-07-27 2020-09-04 杭州雄迈集成电路技术股份有限公司 Training method and device for target tracking model
CN111860540A (en) * 2020-07-20 2020-10-30 深圳大学 Neural network image feature extraction system based on FPGA
CN112037254A (en) * 2020-08-11 2020-12-04 浙江大华技术股份有限公司 Target tracking method and related device
CN112070175A (en) * 2020-09-04 2020-12-11 湖南国科微电子股份有限公司 Visual odometer method, device, electronic equipment and storage medium
CN112116635A (en) * 2020-09-17 2020-12-22 赵龙 Visual tracking method and device based on rapid human body movement
CN112633517A (en) * 2020-12-29 2021-04-09 重庆星环人工智能科技研究院有限公司 Training method of machine learning model, computer equipment and storage medium
CN113033397A (en) * 2021-03-25 2021-06-25 开放智能机器(上海)有限公司 Target tracking method, device, equipment, medium and program product
CN113379797A (en) * 2021-06-01 2021-09-10 大连海事大学 Real-time tracking method and system for observation target of unmanned aerial vehicle
CN113569696A (en) * 2021-07-22 2021-10-29 福建师范大学 Method for extracting human body micro tremor signal based on video
CN113888595A (en) * 2021-09-29 2022-01-04 中国海洋大学 Twin network single-target visual tracking method based on difficult sample mining
WO2022236824A1 (en) * 2021-05-14 2022-11-17 北京大学深圳研究生院 Target detection network construction optimization method, apparatus and device, and medium and product
WO2024065389A1 (en) * 2022-09-29 2024-04-04 京东方科技集团股份有限公司 Method and system for detecting camera interference, and electronic device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562805A (en) * 2017-08-08 2018-01-09 浙江大华技术股份有限公司 It is a kind of to scheme to search the method and device of figure
CN107992826A (en) * 2017-12-01 2018-05-04 广州优亿信息科技有限公司 A kind of people stream detecting method based on the twin network of depth
US20180129906A1 (en) * 2016-11-07 2018-05-10 Qualcomm Incorporated Deep cross-correlation learning for object tracking
CN108665485A (en) * 2018-04-16 2018-10-16 华中科技大学 A kind of method for tracking target merged with twin convolutional network based on correlation filtering
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion
CN109815332A (en) * 2019-01-07 2019-05-28 平安科技(深圳)有限公司 Loss function optimization method, device, computer equipment and storage medium
CN109816695A (en) * 2019-01-31 2019-05-28 中国人民解放军国防科技大学 Target detection and tracking method for infrared small unmanned aerial vehicle under complex background
CN110084777A (en) * 2018-11-05 2019-08-02 哈尔滨理工大学 A kind of micro parts positioning and tracing method based on deep learning
CN112580416A (en) * 2019-09-27 2021-03-30 英特尔公司 Video tracking based on deep Siam network and Bayesian optimization

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129906A1 (en) * 2016-11-07 2018-05-10 Qualcomm Incorporated Deep cross-correlation learning for object tracking
CN107562805A (en) * 2017-08-08 2018-01-09 浙江大华技术股份有限公司 It is a kind of to scheme to search the method and device of figure
CN107992826A (en) * 2017-12-01 2018-05-04 广州优亿信息科技有限公司 A kind of people stream detecting method based on the twin network of depth
CN108665485A (en) * 2018-04-16 2018-10-16 华中科技大学 A kind of method for tracking target merged with twin convolutional network based on correlation filtering
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion
CN110084777A (en) * 2018-11-05 2019-08-02 哈尔滨理工大学 A kind of micro parts positioning and tracing method based on deep learning
CN109815332A (en) * 2019-01-07 2019-05-28 平安科技(深圳)有限公司 Loss function optimization method, device, computer equipment and storage medium
CN109816695A (en) * 2019-01-31 2019-05-28 中国人民解放军国防科技大学 Target detection and tracking method for infrared small unmanned aerial vehicle under complex background
CN112580416A (en) * 2019-09-27 2021-03-30 英特尔公司 Video tracking based on deep Siam network and Bayesian optimization
US20220130130A1 (en) * 2019-09-27 2022-04-28 Intel Corporation Video tracking with deep siamese networks and bayesian optimization

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860540B (en) * 2020-07-20 2024-01-12 深圳大学 Neural network image feature extraction system based on FPGA
CN111860540A (en) * 2020-07-20 2020-10-30 深圳大学 Neural network image feature extraction system based on FPGA
CN111627050A (en) * 2020-07-27 2020-09-04 杭州雄迈集成电路技术股份有限公司 Training method and device for target tracking model
CN112037254A (en) * 2020-08-11 2020-12-04 浙江大华技术股份有限公司 Target tracking method and related device
CN112070175A (en) * 2020-09-04 2020-12-11 湖南国科微电子股份有限公司 Visual odometer method, device, electronic equipment and storage medium
CN112070175B (en) * 2020-09-04 2024-06-07 湖南国科微电子股份有限公司 Visual odometer method, visual odometer device, electronic equipment and storage medium
CN112116635A (en) * 2020-09-17 2020-12-22 赵龙 Visual tracking method and device based on rapid human body movement
CN112633517B (en) * 2020-12-29 2024-02-02 重庆星环人工智能科技研究院有限公司 Training method of machine learning model, computer equipment and storage medium
CN112633517A (en) * 2020-12-29 2021-04-09 重庆星环人工智能科技研究院有限公司 Training method of machine learning model, computer equipment and storage medium
CN113033397A (en) * 2021-03-25 2021-06-25 开放智能机器(上海)有限公司 Target tracking method, device, equipment, medium and program product
WO2022236824A1 (en) * 2021-05-14 2022-11-17 北京大学深圳研究生院 Target detection network construction optimization method, apparatus and device, and medium and product
CN113379797A (en) * 2021-06-01 2021-09-10 大连海事大学 Real-time tracking method and system for observation target of unmanned aerial vehicle
CN113569696A (en) * 2021-07-22 2021-10-29 福建师范大学 Method for extracting human body micro tremor signal based on video
CN113569696B (en) * 2021-07-22 2023-06-06 福建师范大学 Method for extracting human body micro tremor signals based on video
CN113888595A (en) * 2021-09-29 2022-01-04 中国海洋大学 Twin network single-target visual tracking method based on difficult sample mining
CN113888595B (en) * 2021-09-29 2024-05-14 中国海洋大学 Twin network single-target visual tracking method based on difficult sample mining
WO2024065389A1 (en) * 2022-09-29 2024-04-04 京东方科技集团股份有限公司 Method and system for detecting camera interference, and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200626