CN111340850A - Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss - Google Patents
Info
- Publication number: CN111340850A
- Application number: CN202010198544.9A
- Authority: CN (China)
- Prior art keywords: network, frame image, target, feature extraction, search
- Prior art date: 2020-03-20
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G: Physics; G06: Computing, calculating or counting; G06T: Image data processing or generation, in general; G06T7/00: Image analysis; G06T7/20: Analysis of motion)
- G06T2207/10016: Video; image sequence (G06T2207/00: Indexing scheme for image analysis or image enhancement; G06T2207/10: Image acquisition modality)
- G06T2207/20081: Training; learning (G06T2207/20: Special algorithmic details)
- G06T2207/20084: Artificial neural networks [ANN] (G06T2207/20: Special algorithmic details)
Abstract
The invention discloses an unmanned aerial vehicle ground target tracking method based on a twin network and a central logic loss, which aims to solve problems of the existing target tracking technology such as a large number of network parameters, heavy computation, and imbalance between positive and negative training samples; it belongs to the technical field of computer image processing. The method comprises the following steps: extracting a target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and extracting a search feature map of a second frame image using a second feature extraction network; calculating the cross-correlation between the search area of the second frame image and the target area of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position in the second frame image from the score response map. The first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network, each consisting of a lightweight convolutional neural network.
Description
Technical Field
The invention belongs to the technical field of computer image processing, and particularly relates to a ground target tracking method of an unmanned aerial vehicle based on a twin network and central logic loss.
Background
Visual tracking algorithms based on deep learning, such as the fully convolutional twin-network tracker shown in fig. 1, have been widely adopted by developers and users.
In this method, a template image extracted from the first frame and a search image extracted from each subsequent frame are fed into two sub-networks that extract high-level semantic features, and the features are then cross-correlated to obtain the similarity of the template image at each position in the search image. Typically the parameters of the two sub-networks are shared and can be learned offline from training data. Because the high-level semantic features carry rich semantics related to the target category, the method is robust to target appearance changes caused by occlusion, distortion and the like, and the network does not need to be updated during tracking, which greatly reduces the computation of the algorithm and ensures real-time performance. The network has two inputs, the target template image and the search area image, which are passed through the parameter-sharing twin sub-networks of a twin neural network for feature extraction.
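As an illustration only, this cross-correlation step can be sketched in PyTorch (the framework choice and the feature-map shapes below are assumptions for the example, not specified by the method):

```python
import torch
import torch.nn.functional as F

def cross_correlation(target_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    # conv2d with the template feature map as the kernel computes the
    # similarity of the template at every position of the search features
    return F.conv2d(search_feat, target_feat)

# assumed SiamFC-style shapes: 256-channel features, 6x6 template, 22x22 search
target_feat = torch.randn(1, 256, 6, 6)    # target feature map (template branch)
search_feat = torch.randn(1, 256, 22, 22)  # search feature map (search branch)
score_map = cross_correlation(target_feat, search_feat)  # shape (1, 1, 17, 17)
```

The peak of the resulting score map indicates the most likely position of the target within the search region.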
However, such methods are mainly aimed at general visual tracking tasks and are ill-suited to the hardware platform of an unmanned aerial vehicle, whose computing and storage resources are limited. First, a convolutional neural network contains a very large number of weight parameters, and storing them places heavy demands on device memory. Second, the limited computing resources of embedded hardware platforms such as UAVs make it difficult to perform the convolution operations in a convolutional neural network efficiently and in real time.
Meanwhile, in the actual tracking process, such implementations require dense detection of the target at every position of the search area. In a UAV aerial image, the search area generally contains many negative samples with a simple background (such as area 3 in fig. 2), few difficult negative samples (such as area 1 in fig. 2), and a positive sample containing the foreground target (such as area 2 in fig. 2). The large number of simple-background negative samples unbalances the training samples and dominates the training of the network, causing model degradation.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle ground target tracking method based on a twin network and a central logic loss, designed for the characteristics of UAV hardware platforms, so as to solve problems of the existing target tracking technology such as a large number of network parameters, heavy computation, and imbalance between positive and negative training samples.
According to a first aspect of the present invention, a training method for a model for visual tracking of ground targets by an unmanned aerial vehicle is provided, where the tracking model is a twin convolutional network whose two branches are a first feature extraction network and a second feature extraction network respectively. The training method includes:
acquiring a video sequence dataset, wherein the dataset comprises paired template images and search images;
extracting a target feature map of the template image using the first feature extraction network, and extracting a search feature map of the search image using the second feature extraction network;
calculating the cross-correlation between the search area of the search image and the target area of the template image from the target feature map and the search feature map to obtain a score response map of the search image;
calculating the difference between the score response map and the true value according to a central logic loss function to obtain a difference result; and
back-propagating the difference result to adjust the weights of the layers in the twin convolutional network;
wherein the first feature extraction network and the second feature extraction network each consist of a lightweight convolutional neural network.
Optionally, the lightweight convolutional neural network is a MobileNetV2 model.
Optionally, the central logic loss function is:
ℓ(y, v) = [a / (1 + exp(b · yv))] · log(1 + exp(−yv))
wherein v ∈ R^(m×n) is the score map output by the network, y ∈ {+1, −1} is the manually annotated true value, and a / (1 + exp(b · yv)) is a modulation factor for the logic loss that adaptively adjusts the contribution of each training sample to the training loss according to the input yv.
Further, when yv > 0, the modulation factor assigns a first weight to the logic loss; when yv < 0, the modulation factor assigns a second weight to the logic loss, the first weight being less than the second weight.
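For illustration, the true-value label map y over the score map might be constructed as follows; the radius rule is a SiamFC-style assumption, since the text only specifies y ∈ {+1, −1}:

```python
import torch

def make_label_map(size: int, radius: float) -> torch.Tensor:
    """Label map y in {+1, -1} for a size x size score map: positions near
    the map centre (where the target lies) are positive, the rest negative."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    dist = torch.sqrt(xx ** 2 + yy ** 2)
    return torch.where(dist <= radius, torch.ones_like(dist), -torch.ones_like(dist))

label = make_label_map(17, radius=2.0)  # e.g. for a 17 x 17 score response map
```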
According to a second aspect of the invention, an unmanned aerial vehicle ground target visual tracking method comprises the following steps:
extracting a target feature map of a first frame image, in which the target position is known, using a first feature extraction network, and extracting a search feature map of a second frame image using a second feature extraction network;
calculating the cross-correlation between the search area of the second frame image and the target area of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position in the second frame image from the score response map;
wherein the first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network, the lightweight convolutional neural network being the MobileNetV2 model.
According to a third aspect of the present invention, an unmanned aerial vehicle visual tracking device for a ground target comprises:
the identification unit is used for extracting a target feature map of a first frame image with a known target position by using a first feature extraction network and extracting a search feature map of a second frame image by using a second feature extraction network;
the calculating unit is used for calculating the cross correlation between the searching area of the second frame image and the target area of the first frame image according to the target characteristic graph and the searching characteristic graph to obtain a score response graph of the second frame image;
the determining unit is used for obtaining the target position of the second frame image according to the score response image of the second frame image;
the first feature extraction network and the second feature extraction network in the identification unit are two branches of a twin convolution network and respectively consist of a lightweight convolution neural network.
According to a fourth aspect of the invention, an electronic device comprises:
at least one processor; and
a memory communicatively coupled to the processor and storing instructions executable by the processor; when the instructions are executed by the processor, the processor performs the unmanned aerial vehicle ground-target visual tracking method, or the training method.
According to a fifth aspect of the present invention, a readable storage medium stores a computer program which, when executed by a processor, implements the unmanned aerial vehicle ground-target visual tracking method or the training method.
The invention adopts the MobileNetV2 model, a lightweight network, as the feature extraction sub-network at the front end of the deep framework, which reduces the computational complexity and the number of parameters of the convolutional neural network. At the same time it strikes a good balance between processing speed and accuracy, so the method can adapt to the limited storage and computing resources of a UAV hardware platform.
In addition, by adopting the central logic loss function the invention applies different weights to different training samples in the search area, which solves the imbalance between positive and negative training samples, avoids network degradation during offline training, and makes the learned convolutional features more discriminative.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a conventional twin network structure;
FIG. 2 is a practical target tracking scenario including a simple negative example (area 3), a difficult negative example (area 1), and a positive example (area 2) containing a foreground target;
FIG. 3 is a schematic diagram of a visual tracking network model structure according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a MobileNet V2 network structure;
FIG. 5 is a schematic flow chart of a visual tracking network training and tracking method according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of a method for visually tracking a ground target by an unmanned aerial vehicle according to an embodiment of the invention;
fig. 7 is a schematic structural diagram of a visual tracking device for a ground target of an unmanned aerial vehicle according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the following description, UAV (Unmanned Aerial Vehicle) refers primarily to an aircraft without an onboard pilot that is operated using a radio remote-control device and its own program-control device, or that operates autonomously, either completely or intermittently, under an onboard computer.
FIG. 3 illustrates a twin network based visual tracking network constructed in accordance with an embodiment of the invention.
As shown in fig. 3, the visual tracking network includes a first branch and a second branch with the same structure, and each branch includes a feature extraction network for extracting a feature map of an image. In the invention, the feature extraction network of each branch consists of a lightweight convolutional neural network. According to one embodiment of the invention, the lightweight convolutional neural network is the MobileNetV2 model.
Fig. 4 shows the network structure of the MobileNetV2 model. MobileNetV2 uses depthwise separable convolution as an efficient building block. In addition, MobileNetV2 introduces two new architectural features: 1) linear bottleneck layers between the layers; and 2) shortcut connections between the bottleneck layers. This architecture also brings MobileNetV2 additional benefits in security, privacy, and power consumption.
In each building block, the MobileNetV2 network therefore places a first 1 × 1 convolutional layer before the depthwise (DW) convolutional layer to expand the number of channels, extracts features with the DW convolutional layer, and then compresses the result with a third convolutional layer. The network thus realizes an expansion → convolutional feature extraction → compression process and avoids the problem of channel information loss.
Meanwhile, ReLU functions are placed at the outputs of the first and second convolutional layers. Because the expansion → convolution → compression process above is used, a problem arises after compression: the ReLU function destroys features, since it outputs zero for all negative inputs. To avoid further loss of features, a linear function is therefore used as the activation of the third convolutional layer.
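A minimal sketch of such a building block, assuming PyTorch and typical MobileNetV2 hyperparameters (expansion ratio 6, 3 × 3 depthwise kernel); note that the reference MobileNetV2 uses ReLU6 where the description above says ReLU:

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expansion -> depthwise convolution -> linear compression, as described above."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand_ratio: int = 6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),   # first 1x1 conv: expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),                    # ReLU after the first conv
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),      # DW conv: feature extraction
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),                    # ReLU after the second (DW) conv
            nn.Conv2d(hidden, out_ch, 1, bias=False),  # third 1x1 conv: compression
            nn.BatchNorm2d(out_ch),                    # linear output: no ReLU here
        )

    def forward(self, x):
        out = self.block(x)
        # shortcut connection between bottleneck layers when shapes match
        return x + out if self.use_shortcut else out
```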
The twin-network-based visual tracking network provided by the invention adopts a central logic loss function:
ℓ(y, v) = [a / (1 + exp(b · yv))] · log(1 + exp(−yv))
wherein:
v ∈ R^(m×n) is the score map output by the network, and m × n is the size of the score map;
y ∈ {+1, −1} is the manually annotated true value;
a / (1 + exp(b · yv)) is a modulation factor for the logic loss, and a and b are parameters of the modulation factor; for example, according to an embodiment of the present invention, a = 2 and b = 1.
The modulation factor adaptively adjusts the contribution of each training sample to the training loss according to the input yv. When yv > 0, the sample is a simple sample and the modulation factor assigns a smaller weight to the logic loss; conversely, when yv < 0, the sample is a difficult sample and the modulation factor assigns a greater weight to the logic loss.
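Under the assumption, reconstructed above, that the modulation factor multiplies the standard logistic loss log(1 + exp(−yv)) and the result is averaged over the score map, the central logic loss can be sketched as:

```python
import torch
import torch.nn.functional as F

def central_logic_loss(v: torch.Tensor, y: torch.Tensor,
                       a: float = 2.0, b: float = 1.0) -> torch.Tensor:
    """v: score map output by the network; y: labels in {+1, -1} of the same shape."""
    yv = y * v
    modulation = a / (1.0 + torch.exp(b * yv))  # small when yv > 0 (simple sample),
                                                # close to a when yv < 0 (difficult sample)
    logistic = F.softplus(-yv)                  # numerically stable log(1 + exp(-yv))
    return (modulation * logistic).mean()       # average over the m x n score map
```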
Most convolutional neural networks use the logic loss or the cross-entropy loss as the supervisory signal when training a deep model; features learned with these loss functions have good separability but poor discriminability.
Unlike closed-set problems such as object classification and recognition, target tracking is an open-set problem: the features output by the deep model must be not only separable but also strongly discriminative. When processing a long-tailed dataset, in which most samples belong to a few classes and many other classes have very few samples, weighting the losses of the different classes is tricky. For the target tracking problem, foreground targets are easy to collect as positive samples, while the difficult negative samples in the background that are useful for training are few.
Therefore, the invention uses the central logic loss function in an end-to-end form to adaptively adjust the proportion of positive and negative samples, preventing the learned network from being affected by training-sample imbalance. Specifically, by applying different weights to different samples with the central logic loss function, the invention resolves the imbalance between foreground and background training samples.
Fig. 5 shows an exemplary process of visual tracking network training and unmanned aerial vehicle-to-ground target visual tracking using the trained network model according to an embodiment of the present invention.
As shown in fig. 5, the training process of the visual tracking network in the present invention includes a pre-training phase and a fine-tuning phase.
In the pre-training stage, for example, the video database of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is adopted as the sample video sequences, and the annotated sample video sequences are used to train the visual tracking network. During training, the maximum number of iterations and the learning rate of the visual tracking network are set, a network parameter initialization method and a back-propagation method are selected, and the network parameters are optimized.
According to an embodiment of the invention, the maximum number of iterations, the learning rate, the initialization method and the back-propagation method are set as follows (a training sketch follows the list):
maximum number of iterations: 50 epochs;
initial learning rate: 0.001;
network initialization method: Xavier initialization;
back-propagation method: stochastic gradient descent.
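A sketch of this pre-training setup, reusing the central_logic_loss sketch above; the model name tracking_net, its two-input forward signature, and the loader interface are assumptions:

```python
import torch

def pretrain(tracking_net: torch.nn.Module, train_loader) -> None:
    # Xavier initialization of the convolutional weights
    for m in tracking_net.modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.xavier_uniform_(m.weight)
    # stochastic gradient descent with initial learning rate 0.001
    optimizer = torch.optim.SGD(tracking_net.parameters(), lr=0.001)
    tracking_net.train()
    for epoch in range(50):  # maximum number of iterations: 50 epochs
        for template, search, label in train_loader:
            score_map = tracking_net(template, search)   # forward both branches
            loss = central_logic_loss(score_map, label)  # difference vs. true value
            optimizer.zero_grad()
            loss.backward()                              # back-propagate the difference
            optimizer.step()                             # adjust the layer weights
```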
After the pre-training phase is completed, the network parameters are further optimized through a fine-tuning phase.
In the fine-tuning stage, a video dataset is formed from footage of ground targets collected by an unmanned aerial vehicle; each frame of the video dataset is labeled by category; the labeled video dataset is divided into a training set, a validation set and a test set; and finally the labeled data are processed into data types that the visual tracking network model can read.
The pre-trained visual tracking network is then trained with the training set and the validation set so as to fine-tune it, and the structure and parameters of the fine-tuned visual tracking network model are retained.
The fine-tuned visual tracking network is tested with the test set to obtain the tracking accuracy. If the tracking accuracy meets the actual engineering requirements, the visual tracking network model can be applied to the actual task of the unmanned aerial vehicle identifying and tracking a specific ground target. Otherwise, the training set does not meet the requirements and needs to be expanded, and the pre-training and fine-tuning steps are restarted until the requirements are met.
During training, the invention adopts the central logic loss function:
ℓ(y, v) = [a / (1 + exp(b · yv))] · log(1 + exp(−yv))
wherein:
v ∈ R^(m×n) is the score map output by the network;
y ∈ {+1, −1} is the manually annotated true value;
a / (1 + exp(b · yv)) is a modulation factor for the logic loss that adaptively adjusts the contribution of each training sample to the training loss according to the input yv.
When yv > 0, the sample is a simple sample and the modulation factor assigns a smaller weight to the logic loss; conversely, when yv < 0, the sample is a difficult sample and the modulation factor assigns a greater weight to the logic loss.
Therefore, by applying different weights to different samples with the central logic loss function, the invention can handle the imbalance between foreground and background training samples.
After the network model training is completed, the visual tracking network is applied to an actual scene of the unmanned aerial vehicle for tracking the ground target, and the target in the video acquired by the unmanned aerial vehicle is tracked.
Fig. 5 and 6 show schematic flows of the twin network-based unmanned aerial vehicle ground target tracking method according to the embodiment of the invention.
As shown in fig. 6, the method includes the steps of:
extracting a target feature map of a first frame image (i.e., the template image), in which the target position is known, using a first feature extraction network, and extracting a search feature map of a second frame image (i.e., the search image) using a second feature extraction network;
calculating the cross-correlation between the search area of the second frame image and the target area of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position in the second frame image from the score response map;
the first feature extraction network and the second feature extraction network are two branches of a twin convolutional network and respectively consist of a lightweight convolutional neural network.
For example, the target in the 1st frame of the video sequence is calibrated, and the target position in the 2nd frame can then be obtained with the visual tracking network; the target position in the 3rd frame can in turn be obtained with the visual tracking network from the calibration result of the 1st frame or the tracking result of the 2nd frame. By analogy, the target position in every frame of the video sequence can be obtained, realizing target tracking over the video sequence, as sketched below.
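A sketch of this frame-by-frame procedure follows; the attribute names branch1/branch2 and the helpers crop_template, crop_search and box_from_peak are hypothetical, standing in for details the description leaves open:

```python
import torch

def track_sequence(tracking_net, frames, init_box):
    """frames: list of images; init_box: calibrated target box in the 1st frame."""
    template = crop_template(frames[0], init_box)       # hypothetical helper
    with torch.no_grad():
        target_feat = tracking_net.branch1(template)    # extract template features once
        boxes = [init_box]
        for frame in frames[1:]:
            search = crop_search(frame, boxes[-1])      # hypothetical helper: region
                                                        # around the previous position
            search_feat = tracking_net.branch2(search)
            score_map = cross_correlation(target_feat, search_feat)
            peak = score_map.flatten().argmax()         # highest score = target location
            boxes.append(box_from_peak(peak, score_map.shape, boxes[-1]))  # hypothetical
    return boxes
```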
As shown in fig. 5, the method further includes updating the parameters of the feature extraction networks according to the score response map obtained by the cross-correlation calculation, so as to further improve the accuracy and reliability of the feature extraction networks.
Fig. 7 shows an unmanned aerial vehicle ground-target visual tracking apparatus according to an embodiment of the present invention, including:
the identification unit 701 is used for extracting a target feature map of a first frame image with a known target position by using a first feature extraction network and extracting a search feature map of a second frame image by using a second feature extraction network;
a calculating unit 702, configured to calculate, according to the target feature map and the search feature map, a cross-correlation between a search region of the second frame image and a target region of the first frame image, to obtain a score response map of the second frame image;
a determining unit 703, configured to obtain a target position of the second frame image according to the score response map of the second frame image;
the first feature extraction network and the second feature extraction network in the identification unit are two branches of a twin convolution network and respectively consist of a lightweight convolution neural network.
An embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the processor and storing instructions executable by the processor; when the instructions are executed by the processor, the processor performs the unmanned aerial vehicle ground-target visual tracking method, or the training method.
Alternatively, the memory may be separate or integrated with the processor.
When the memory is independently provided, the electronic device further comprises a bus for connecting the memory and the processor.
Further, an embodiment of the present invention also provides a readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the unmanned aerial vehicle ground-target visual tracking method or the training method.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods described in the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), but may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory and may further comprise non-volatile memory (NVM), such as at least one disk memory, and may also be a USB disk, a removable hard disk, a read-only memory, a magnetic disk or an optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC); alternatively, the processor and the storage medium may reside as discrete components in an electronic device or host device.
Those skilled in the art will appreciate that all or part of the steps for implementing the above-described method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. While the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A training method for a model for visual tracking of ground targets by an unmanned aerial vehicle, the tracking model being a twin convolutional network whose two branches are a first feature extraction network and a second feature extraction network respectively, characterized in that the training method comprises:
acquiring a video sequence data set, wherein the data set comprises a template image and a search image which are paired;
extracting a target feature map of the template image by using a first feature extraction network, and extracting a search feature map of the search image by using a second feature extraction network;
calculating the cross-correlation between the search area of the search image and the target area of the template image from the target feature map and the search feature map to obtain a score response map of the search image;
calculating the difference between the score response map and the true value according to a central logic loss function to obtain a difference result; and
back-propagating the difference result to adjust the weights of the layers in the twin convolutional network;
the first feature extraction network and the second feature extraction network are respectively composed of a lightweight convolutional neural network.
2. The training method of claim 1, wherein the lightweight convolutional neural network is a MobileNetV2 model.
3. The training method according to claim 1 or 2, characterized in that the central logic loss function is:
ℓ(y, v) = [a / (1 + exp(b · yv))] · log(1 + exp(−yv))
wherein v ∈ R^(m×n) is the score map output by the network, y ∈ {+1, −1} is the manually annotated true value, a / (1 + exp(b · yv)) is a modulation factor for the logic loss that adaptively adjusts the contribution of each training sample to the training loss according to the input yv, and a and b are parameters of the modulation factor.
4. The training method according to claim 3, wherein when yv > 0 the modulation factor assigns a first weight to the logic loss, and when yv < 0 the modulation factor assigns a second weight to the logic loss, the first weight being less than the second weight.
5. The training method according to claim 3, wherein the parameters of the modulation factor of the central logic loss function are a = 2 and b = 1.
6. An unmanned aerial vehicle ground target visual tracking method is characterized by comprising the following steps:
extracting a target feature map of a first frame image at a known target position by using a first feature extraction network, and extracting a search feature map of a second frame image by using a second feature extraction network;
calculating the cross-correlation between the search area of the second frame image and the target area of the first frame image from the target feature map and the search feature map to obtain a score response map of the second frame image, and then obtaining the target position of the second frame image from the score response map;
wherein the first feature extraction network and the second feature extraction network are the two branches of a twin convolutional network and each consist of a lightweight convolutional neural network, the lightweight convolutional neural network being the MobileNetV2 model.
7. An unmanned aerial vehicle ground-target visual tracking device, characterized by comprising:
the identification unit is used for extracting a target feature map of a first frame image with a known target position by using a first feature extraction network and extracting a search feature map of a second frame image by using a second feature extraction network;
the calculating unit is used for calculating the cross correlation between the searching area of the second frame image and the target area of the first frame image according to the target characteristic graph and the searching characteristic graph to obtain a score response graph of the second frame image;
the determining unit is used for obtaining the target position of the second frame image according to the score response image of the second frame image;
the first feature extraction network and the second feature extraction network in the identification unit are two branches of a twin convolutional network and respectively comprise a lightweight convolutional neural network, and the lightweight convolutional neural network is a MobileNet V2 model.
8. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the processor and storing instructions executable by the processor; when the instructions are executed by the processor, the processor performs the unmanned aerial vehicle ground-target visual tracking method of claim 6, or the training method of any one of claims 1-5.
9. A readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the unmanned aerial vehicle ground-target visual tracking method of claim 6, or the training method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010198544.9A CN111340850A (en) | 2020-03-20 | 2020-03-20 | Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010198544.9A CN111340850A (en) | 2020-03-20 | 2020-03-20 | Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111340850A (en) | 2020-06-26
Family
ID=71184196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010198544.9A Pending CN111340850A (en) | 2020-03-20 | 2020-03-20 | Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340850A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180129906A1 (en) * | 2016-11-07 | 2018-05-10 | Qualcomm Incorporated | Deep cross-correlation learning for object tracking |
CN107562805A (en) * | 2017-08-08 | 2018-01-09 | Method and device for searching images by image |
CN107992826A (en) * | 2017-12-01 | 2018-05-04 | People-flow detection method based on a deep twin network |
CN108665485A (en) * | 2018-04-16 | 2018-10-16 | Target tracking method based on correlation filtering fused with a twin convolutional network |
CN109191491A (en) * | 2018-08-03 | 2019-01-11 | Target tracking method and system using a fully convolutional twin network based on multi-layer feature fusion |
CN110084777A (en) * | 2018-11-05 | 2019-08-02 | Micro-parts positioning and tracking method based on deep learning |
CN109815332A (en) * | 2019-01-07 | 2019-05-28 | Loss function optimization method and device, computer equipment and storage medium |
CN109816695A (en) * | 2019-01-31 | 2019-05-28 | Target detection and tracking method for infrared small unmanned aerial vehicle under complex background |
CN112580416A (en) * | 2019-09-27 | 2021-03-30 | 英特尔公司 | Video tracking based on deep Siam network and Bayesian optimization |
US20220130130A1 (en) * | 2019-09-27 | 2022-04-28 | Intel Corporation | Video tracking with deep siamese networks and bayesian optimization |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860540B (en) * | 2020-07-20 | 2024-01-12 | 深圳大学 | Neural network image feature extraction system based on FPGA |
CN111860540A (en) * | 2020-07-20 | 2020-10-30 | 深圳大学 | Neural network image feature extraction system based on FPGA |
CN111627050A (en) * | 2020-07-27 | 2020-09-04 | 杭州雄迈集成电路技术股份有限公司 | Training method and device for target tracking model |
CN112037254A (en) * | 2020-08-11 | 2020-12-04 | 浙江大华技术股份有限公司 | Target tracking method and related device |
CN112070175A (en) * | 2020-09-04 | 2020-12-11 | 湖南国科微电子股份有限公司 | Visual odometer method, device, electronic equipment and storage medium |
CN112070175B (en) * | 2020-09-04 | 2024-06-07 | 湖南国科微电子股份有限公司 | Visual odometer method, visual odometer device, electronic equipment and storage medium |
CN112116635A (en) * | 2020-09-17 | 2020-12-22 | 赵龙 | Visual tracking method and device based on rapid human body movement |
CN112633517B (en) * | 2020-12-29 | 2024-02-02 | 重庆星环人工智能科技研究院有限公司 | Training method of machine learning model, computer equipment and storage medium |
CN112633517A (en) * | 2020-12-29 | 2021-04-09 | 重庆星环人工智能科技研究院有限公司 | Training method of machine learning model, computer equipment and storage medium |
CN113033397A (en) * | 2021-03-25 | 2021-06-25 | 开放智能机器(上海)有限公司 | Target tracking method, device, equipment, medium and program product |
WO2022236824A1 (en) * | 2021-05-14 | 2022-11-17 | 北京大学深圳研究生院 | Target detection network construction optimization method, apparatus and device, and medium and product |
CN113379797A (en) * | 2021-06-01 | 2021-09-10 | 大连海事大学 | Real-time tracking method and system for observation target of unmanned aerial vehicle |
CN113569696A (en) * | 2021-07-22 | 2021-10-29 | 福建师范大学 | Method for extracting human body micro tremor signal based on video |
CN113569696B (en) * | 2021-07-22 | 2023-06-06 | 福建师范大学 | Method for extracting human body micro tremor signals based on video |
CN113888595A (en) * | 2021-09-29 | 2022-01-04 | 中国海洋大学 | Twin network single-target visual tracking method based on difficult sample mining |
CN113888595B (en) * | 2021-09-29 | 2024-05-14 | 中国海洋大学 | Twin network single-target visual tracking method based on difficult sample mining |
WO2024065389A1 (en) * | 2022-09-29 | 2024-04-04 | 京东方科技集团股份有限公司 | Method and system for detecting camera interference, and electronic device |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200626