CN110992378A - Dynamic update visual tracking aerial photography method and system based on rotor flying robot - Google Patents

Dynamic update visual tracking aerial photography method and system based on rotor flying robot Download PDF

Info

Publication number
CN110992378A
CN110992378A (application CN201911220924.1A)
Authority
CN
China
Prior art keywords
target
image
frame
network
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911220924.1A
Other languages
Chinese (zh)
Other versions
CN110992378B (en)
Inventor
谭建豪
谭姗姗
殷旺
刘力铭
王耀南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN201911220924.1A priority Critical patent/CN110992378B/en
Publication of CN110992378A publication Critical patent/CN110992378A/en
Application granted granted Critical
Publication of CN110992378B publication Critical patent/CN110992378B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of unmanned aerial vehicles and discloses a dynamic update visual tracking aerial photography method and system based on a rotor flying robot. HOG features combined with an SVM are used to detect the target in the picture; the AlexNet network structure is then improved by designing around three important influence factors of the twin (Siamese) network, namely the receptive field size, the total network stride and the feature padding, and a smoothing matrix and a background suppression matrix are added so that the features of the first frames are used effectively; multi-layer features are fused element-wise to learn the appearance change and background suppression of the target online, and training uses continuous video sequences. The invention uses the dynamic twin network to balance precision and real-time tracking, uses the dynamically updated network to learn the appearance change of the target quickly, makes full use of the spatio-temporal information of the target, and effectively alleviates problems such as drift and target occlusion. A deeper network is selected to obtain target features, and appearance learning and background suppression are used for dynamic tracking, which effectively increases robustness.

Description

Dynamic update visual tracking aerial photography method and system based on rotor flying robot
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a dynamic update visual tracking aerial photography method and system based on a rotor flying robot.
Background
Currently, the closest prior art is as follows. Unmanned Aerial Vehicles (UAVs) are aircraft operated by radio remote control or by on-board program control that can carry out flight missions autonomously without human intervention. In the military field, thanks to characteristics such as small size, strong maneuverability and easy control, rotor flying robots can operate in extreme environments and are therefore widely used in counter-terrorism and explosive disposal, traffic monitoring, and earthquake and disaster relief. In the civil field, unmanned aerial vehicles can be used in fields such as aerial photography and pedestrian detection. When a rotor flying robot performs a specific task, it is generally required to track a specific target in flight and transmit information about the target to a ground station in real time. Vision-based tracking flight of rotor flying robots has therefore attracted great attention and is a current research hotspot.
Tracking flight of a rotor flying robot means that a camera is carried on a robot flying at low altitude, a sequence of image frames of a moving ground target is acquired in real time, and the image coordinates of the target are calculated and used as the input of visual servo control; from this the required aircraft velocity is obtained, the position and attitude of the rotor flying robot are controlled automatically, and the tracked ground target is kept near the centre of the camera's field of view. The traditional twin network tracking method has good real-time performance, but when the target is occluded or affected by a complex background or illumination, using only the first frame as the standard reference may cause the target to be tracked incorrectly or lost. The invention addresses the loss of the target caused by occlusion, changes of target appearance, tracker drift, interference from background factors and similar influences during aerial photography with a rotor flying robot.
In summary, the problems of the prior art are as follows: (1) during aerial photography, existing rotor flying robots are prone to drift, target loss and similar situations caused by occlusion, illumination and interference from background factors.
(2) In the prior art, trackers mostly use the AlexNet network to extract features; adopting the deeper CIResNet network allows deeper features related to the target to be extracted, so that the tracker can lock onto the target in the search area and the influence of a complex background is reduced.
(3) Although existing twin network trackers run at a high frame rate, the absence of an update part in their framework means that the tracker cannot quickly cope with drastic changes of the target or background, which may lead to tracking drift in some cases.
The difficulty of solving these technical problems is as follows: when the appearance of the target changes drastically during tracking, methods that locate the target in the search area using colour and contour features may fail.
If every frame is re-detected during tracking, or a threshold is used to judge whether tracking has been lost, the running time increases.
Using the CIResNet network for feature extraction yields more feature information, but because CIResNet is deeper than AlexNet the tracker frame rate drops slightly.
The significance of solving these technical problems is as follows: extracting features with a deeper network can improve the tracking precision and the overall performance of the tracker.
The dynamic update part improves the robustness of the tracker: instead of learning only the feature information of the first frame, the tracker continuously learns from the tracking result of the previous frame and thus adapts to changes of the target.
The CIResNet network can effectively extract more sample features, so the tracker learns more feature information about the target and its ability to cope with complex backgrounds increases.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a dynamic update visual tracking aerial photography method and system based on a rotor flying robot.
The invention is realized as follows: a dynamic update visual tracking aerial photography method based on a rotor flying robot comprises the following steps:
step one, performing target detection on the input image with the HOG feature extraction algorithm and the support vector machine algorithm SVM;
step two, transmitting the target frame information obtained by target detection to the visual tracking part, and tracking the target in real time with a dynamically updated twin network based on the CIResNet network.
Further, in step one, the target detection method comprises:
(1) dividing the image into connected regions of 8 × 8 pixels, called cell units;
(2) collecting the gradient magnitude and gradient direction of each pixel in a cell unit, dividing the gradient directions in [-90°, 90°] evenly into 9 intervals (bins), and using the gradient magnitude as the weight;
(3) performing histogram statistics of the gradient magnitudes of the pixels in the cell over the direction bins to obtain a one-dimensional gradient direction histogram;
(4) performing contrast normalization of the histograms over spatial blocks;
(5) extracting HOG descriptors through a detection window, and combining the HOG descriptors of all blocks in the detection window into the final feature vector;
(6) inputting the feature vector into a linear SVM and performing target detection with the SVM classifier;
(7) dividing the detection window into overlapping blocks, computing HOG descriptors for these blocks, and feeding the resulting feature vectors into the linear SVM for target/non-target binary classification;
(8) scanning the detection window over all positions and scales of the whole image, and applying non-maximum suppression to the output pyramid to detect the target;
the contrast normalization of the histograms in step (4) is performed as follows:
the density of each histogram within the block is first calculated, and each cell unit in the block is then normalized according to this density. A minimal sketch of this detection pipeline is given below.
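A minimal sketch of the HOG + linear-SVM detection stage described above, using OpenCV's built-in HOGDescriptor (whose defaults already use 8 × 8 cells, 16 × 16 blocks and 9 orientation bins). The pretrained people detector stands in for the application-specific SVM classifier, which is an assumption made purely for illustration.

```python
import cv2

def detect_targets(image_bgr):
    hog = cv2.HOGDescriptor()  # defaults: 8x8 cells, 16x16 blocks, 9 bins
    # Load a linear SVM; the stock people detector is used here as a stand-in
    # for the SVM trained on the actual target class.
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    # Sliding-window scan over positions and scales; overlapping detections are
    # grouped, which plays the role of the non-maximum suppression step.
    rects, weights = hog.detectMultiScale(image_bgr,
                                          winStride=(8, 8),
                                          padding=(8, 8),
                                          scale=1.05)
    return rects, weights
```

The highest-scoring rectangle can then be handed to the tracking part as the initial target frame.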
Further, in step one, the HOG feature extraction method specifically comprises:
① normalizing the whole image: the colour space of the input image is normalized with the Gamma correction method, where the Gamma correction formula is:
f(I) = I^γ
where I is the image pixel value and γ is the Gamma correction coefficient;
② computing the gradients along the horizontal and vertical coordinates of the image, and computing the gradient direction value of each pixel position;
Gx(x, y) = H(x+1, y) - H(x-1, y);
Gy(x, y) = H(x, y+1) - H(x, y-1);
where Gx(x, y) and Gy(x, y) denote the horizontal and vertical gradients at pixel (x, y) of the input image;
G(x, y) = √(Gx(x, y)² + Gy(x, y)²);
α(x, y) = arctan(Gy(x, y) / Gx(x, y));
where G(x, y), H(x, y) and α(x, y) denote the gradient magnitude, the pixel value and the gradient direction at pixel (x, y);
③ histogram computation: the image is divided into small cell units, providing an encoding of the local image region;
④ the cell units are grouped into large blocks, and the gradient histograms are normalized within each block;
⑤ the HOG features of all overlapping blocks in the detection window are collected and combined into the final feature vector for classification.
Further, the step of tracking the target in real time comprises:
(1) obtaining the first frame of the video sequence as the template frame O1, obtaining the search area Zt from the current frame, and separately obtaining f^l(O1) and f^l(Zt) through the CIResNet-16 network;
(2) the network adds a transformation matrix V and a transformation matrix W, both of which can be computed quickly in the frequency domain by FFT; the transformation matrix V is obtained from the tracking result of frame t-1 and the first-frame target, acts on the convolution feature of the target template, and learns the change of the target so that the template feature at time t is approximately equal to the template feature at time t-1, smoothing the change of the current frame relative to the previous frames;
the transformation matrix W is obtained from the tracking result of frame t-1, acts on the convolution feature of the candidate region at time t, and learns background suppression so as to eliminate the influence of irrelevant background features in the target region;
for the transformation matrix V and the transformation matrix W, training is performed using regularized linear regression; after the transformation matrices are applied, f^l(O1) and f^l(Zt) become V_t^l ⊛ f^l(O1) and W_t^l ⊛ f^l(Zt) respectively, where ⊛ denotes the circular convolution operation, V_t^l ⊛ f^l(O1) represents the change of the target's appearance and gives the currently updated target template, and W_t^l ⊛ f^l(Zt) represents the background suppression transform and gives a more suitable current search template; the final model is:
S_t^l = corr(V_t^l ⊛ f^l(O1), W_t^l ⊛ f^l(Zt));
the final model thus adds two transformation matrices, a smoothing matrix V and a background suppression matrix W, to the twin network; the smoothing matrix V learns the appearance changes of the previous frames, and the background suppression matrix W eliminates clutter in the background.
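A minimal numpy sketch of the dynamic response described above: the smoothing matrix V is applied to the template feature f^l(O1) and the background suppression matrix W to the search feature f^l(Zt) by circular convolution (evaluated in the frequency domain), and the two results are then cross-correlated. The array shapes and function names are illustrative assumptions, not the exact patented implementation.

```python
import numpy as np
from scipy.signal import correlate2d

def circ_conv(kernel, feat):
    # circular convolution per channel via the FFT; kernel and feat are C x H x W
    return np.real(np.fft.ifft2(np.fft.fft2(kernel) * np.fft.fft2(feat)))

def dynamic_response(f_o1, f_zt, V, W):
    template = circ_conv(V, f_o1)   # V ⊛ f^l(O1): updated target template
    search = circ_conv(W, f_zt)     # W ⊛ f^l(Zt): background-suppressed search feature
    # sum the per-channel valid cross-correlations into a single response map
    return sum(correlate2d(search[c], template[c], mode='valid')
               for c in range(template.shape[0]))
```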
Further, in step two, the dynamically updated twin network based on CIResNet comprises:
(I) performing a 7 × 7 convolution together with a crop operation to delete the features affected by padding;
(II) after a max-pooling layer with stride 2, entering the improved CIResNet unit; the CIR-unit stage has 3 layers in total: the first layer is a 1 × 1 convolution with 64 channels, the second layer is a 3 × 3 convolution with 64 channels, and the third layer is a 1 × 1 convolution with 256 channels; the feature maps are added after the convolution layers and then pass through a crop operation, which follows the 3 × 3 convolution and offsets the influence of its padding of 1;
(III) entering the CIR-D unit; the CIR-D stage has 12 layers in total, with the first, second and third layers cycled 4 times as a unit block; the first layer is a 1 × 1 convolution with 128 channels, the second layer is a 3 × 3 convolution with 128 channels, and the third layer is a 1 × 1 convolution with 512 channels;
(IV) cross-correlation operation: the improved twin network structure takes an image pair as input, consisting of an exemplar image Z and a candidate search image X; Z represents the object of interest, while X represents the search area, typically larger, in a subsequent video frame; both inputs are processed by a ConvNet with parameters θ; two feature maps are produced, and their cross-correlation is:
f(Z, X) = φθ(Z) ⋆ φθ(X) + b;
where b is a bias term; the formula searches the image X exhaustively for the pattern Z so that the maximum of the response map f coincides with the target position; the network is trained offline on random image pairs (Z, X) with corresponding ground-truth labels y obtained from training videos, and the parameter θ of the ConvNet is obtained by minimizing the following loss over the training set:
θ* = arg minθ E(Z,X,y) L(y, f(Z, X; θ));
the basic formula of the loss function is:
l(y,v)=log(1+exp(-yv));
where y ∈ {+1, -1} is the true label and v is the actual score of the sample in the search image; according to the sigmoid function, the above formula gives the probability of a positive sample as
p = 1 / (1 + e^(-v))
and the probability of a negative sample as
1 - p = 1 / (1 + e^(v));
then from the cross-entropy formula it is easy to obtain -log p = log(1 + e^(-v)) for y = +1 and -log(1 - p) = log(1 + e^(v)) for y = -1, i.e. the loss l(y, v) = log(1 + exp(-yv)).
further, in step (iii), the first block of the CIR-D unit stage is down-sampled by the proposed CIR-D unit, and the number of filters is doubled after down-sampling the feature size; changing the step length of the volume in the bottleneck layer and the quick connection layer from 2 to 1 by CIR-D, and inserting and cutting again after adding operation to delete the characteristics influenced by filling; finally, performing spatial down-sampling of the feature map using maximum pooling; the spatial size of the output feature map is 7 × 7, each feature receiving information from an area of 77 × 77 pixels in size on the input image plane; performing addition operation on the feature graph passing through the convolution layer, and then entering a crop operation and a maximum pooling layer; the key idea of these modifications is to ensure that only the functions affected by the padding are deleted while keeping the inherent block structure unchanged.
Further, in step two, when the dynamically updated twin network based on CIResNet is used to track the target in real time, the dynamic update algorithm comprises:
(1) inputting a picture to obtain the template image O1;
(2) determining the candidate search area Zt in the frame to be tracked;
(3) mapping the original images into a specific feature space by feature mapping to obtain the two depth features f^l(O1) and f^l(Zt);
(4) learning the change between the tracking result of the previous frame and the first-frame template according to regularized linear regression (RLR):
V_t^l = arg min_V || V ⊛ f^l(O1) - f^l(O_{t-1}) ||² + λ_v || V ||²;
fast calculation in the frequency domain gives the variation V_t^l as:
V_t^l = IFFT( (conj(F_1^l) ⊙ F_{t-1}^l) / (conj(F_1^l) ⊙ F_1^l + λ_v) ),
where F_1^l = FFT(f_1^l), F_{t-1}^l = FFT(f_{t-1}^l), f_1^l = f^l(O1), f_{t-1}^l = f^l(O_{t-1}), ⊙ denotes element-wise multiplication and conj(·) the complex conjugate; O denotes the target, f denotes the feature matrix, the superscript denotes the l-th channel and the subscript denotes the frame index, i.e. the tracking result of the previous frame and the target of the first frame are used;
(5) obtaining the background suppression transform of the current frame from the RLR calculation formula in the frequency domain:
W_t^l = IFFT( (conj(F_G^l) ⊙ F_Ḡ^l) / (conj(F_G^l) ⊙ F_G^l + λ_w) ),
where F_G^l = FFT(f^l(G_{t-1})), F_Ḡ^l = FFT(f^l(Ḡ_{t-1})), G_{t-1} is an image region of the same size as the search area of the previous frame, and Ḡ_{t-1} is G_{t-1} multiplied by a Gaussian centred at the picture centre point; the target variation V_t^l and the background suppression transform W_t^l are thus learned online;
(6) element-wise multi-layer feature fusion:
S_t = Σ_l γ^l ⊙ S_t^l, where γ^l is an element-wise weight map for the response map S_t^l of layer l;
(7) joint training is performed; first, by forward propagation, for a given N-frame video sequence {I_t | t = 1, ..., N}, tracking is carried out to obtain the N response maps {S_t | t = 1, ..., N}, while {J_t | t = 1, ..., N} denotes the N ground-truth target maps;
L_t = || S_t - J_t ||²;
(8) gradient propagation and parameter updating are carried out with BPTT and SGD; to obtain all parameters of L_t, the gradients of the per-layer responses are computed from ∂L_t/∂S_t, and propagation through the left-hand CirConv and RLR layers ensures that the loss gradient propagates efficiently to f^l; the corresponding chain-rule expressions are evaluated in the frequency domain, where the Fourier-transformed features and the discrete Fourier transform matrix E appear, and for the multi-feature fusion formula the gradient is converted accordingly.
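A minimal numpy sketch of the regularized linear regression (RLR) update solved in the frequency domain, as used above for the smoothing matrix V (mapping f^l(O1) onto f^l(O_{t-1})) and, analogously, for the background suppression matrix W. The closed form is the standard ridge-regression solution under a circular-convolution model; the regularization weight and the exact pairing of inputs and targets are assumptions for illustration.

```python
import numpy as np

def rlr_filter(src_feat, dst_feat, lam=1e-2):
    """Return a per-channel filter V such that V circularly convolved with src_feat ≈ dst_feat."""
    F_src = np.fft.fft2(src_feat)   # features are C x H x W; FFT over the last two axes
    F_dst = np.fft.fft2(dst_feat)
    V_hat = (np.conj(F_src) * F_dst) / (np.conj(F_src) * F_src + lam)
    return np.real(np.fft.ifft2(V_hat))

# V_t^l would be rlr_filter(f_l_O1, f_l_Ot_minus_1); W_t^l would be
# rlr_filter(f_l_G_prev, f_l_G_prev_gauss) with the Gaussian-weighted region as target.
```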
The invention also aims to provide a dynamic update visual tracking aerial photography system based on a rotor flying robot, which implements the above dynamic update visual tracking aerial photography method.
The invention further aims to provide an information data processing terminal for realizing the dynamic update visual tracking aerial photography method based on the rotor flying robot.
It is another object of the present invention to provide a computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the dynamic update visual tracking aerial photography method based on the rotor flying robot.
In summary, the advantages and positive effects of the invention are: (1) by adopting the deeper CIResNet network and a sample-learning method, the classification criterion is established automatically, the adaptability to complex backgrounds is enhanced, and more sample features can be extracted effectively.
(2) The invention adds the smoothing transformation matrix V to the traditional twin network so that the target appearance changes of the previous frames can be learned online and the spatio-temporal information is used effectively, and it also adds the background suppression matrix W so that the influence of background clutter factors can be controlled effectively.
(3) The first frame is no longer used alone as the standard reference; dynamic tracking with appearance learning and background suppression effectively alleviates problems such as occlusion.
(4) Both the precision and the overlap rate increase, and the speed reaches 16 fps, which basically meets the real-time requirement.
Table 1: comparison of tracking indicators
Tracker    Precision    Overlap rate    Speed (fps)
Ours       0.5512       0.2905          16
SiamFC     0.5355       0.2889          65
DSiam      0.5414       0.2804          25
DSST       0.5078       0.1678          134
The algorithm was implemented and debugged under the Ubuntu 16.04 operating system; the computer hardware is an Intel Core i7-8700K (3.7 GHz main frequency) with a GeForce RTX 2080 Ti graphics card.
In the dynamic update visual tracking aerial photography method based on a rotor flying robot provided by the invention, the CIResNet network replaces the original AlexNet network; compared with AlexNet its network hierarchy is deeper, which benefits the acquisition of target features. Compared with the traditional twin network, the smoothing transformation matrix V is added to learn the target appearance changes of the previous frames online and to use the spatio-temporal information effectively, and the background suppression matrix W is added to control the influence of background clutter factors effectively. The method selects a deeper network to obtain the target features instead of relying solely on the first frame as the standard reference, and uses appearance learning and background suppression for dynamic tracking, which effectively increases robustness.
Drawings
Fig. 1 is a flowchart of the dynamic update visual tracking aerial photography method based on a rotor flying robot provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of the dynamic update visual tracking aerial photography method based on a rotor flying robot provided by an embodiment of the present invention.
Fig. 3 is a block diagram of the detection part provided by an embodiment of the present invention.
Fig. 4 is a block diagram of the tracking part provided by an embodiment of the present invention.
Fig. 5 is a schematic diagram of the CIResNet network provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of the single-layer network structure provided by an embodiment of the present invention.
Fig. 7 is a diagram of the results on the UAV data set provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Tracking flight of a rotor flying robot means that a camera is carried on a robot flying at low altitude, a sequence of image frames of a moving ground target is acquired in real time, and the image coordinates of the target are calculated and used as the input of visual servo control; from this the required aircraft velocity is obtained, the position and attitude of the rotor flying robot are controlled automatically, and the tracked ground target is kept near the centre of the camera's field of view. The traditional twin network tracking method has good real-time performance, but when the target is occluded or affected by a complex background or illumination, using only the first frame as the standard reference may cause the target to be tracked incorrectly or lost.
Aiming at the problems in the prior art, the invention provides a dynamic update visual tracking aerial photography method based on a rotor flying robot in which the CIResNet network replaces the original AlexNet network; compared with AlexNet, the CIResNet network hierarchy is deeper, which benefits the acquisition of target features. Compared with the traditional twin network, the smoothing transformation matrix V is added to learn the target appearance changes of the previous frames online and to use the spatio-temporal information effectively, and the background suppression matrix W is added to control the influence of background clutter factors effectively. The method selects a deeper network to obtain the target features instead of relying solely on the first frame as the standard reference, and uses appearance learning and background suppression for dynamic tracking, which effectively increases robustness. The present invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a dynamic update visual tracking aerial photography method based on a rotor flying robot provided by an embodiment of the present invention includes the following steps:
s101: and performing target detection on the input image by using an HOG (histogram of ordered gradient) feature + Support Vector Machine (SVM) algorithm.
Even if the gradient and edge position information of the target in the image is unknown, the appearance and shape of the target can still be described by the distribution of local gradients or edge directions. The HOG feature builds its feature description by computing and counting gradient direction histograms of the target region, a principle that maintains good invariance to geometric changes and optical deformations of the image.
First, the image is divided into connected regions, usually cells of 8 × 8 pixels, called cell units; then the gradient magnitude and direction of every pixel in a cell unit are collected, the gradient directions in [-90°, 90°] are divided evenly into 9 intervals (bins), and histogram statistics of the gradient magnitudes of the pixels in the cell are computed over the bins to obtain a one-dimensional gradient direction histogram. To improve the invariance of the features to illumination and shadow, the histograms need to be contrast-normalized, usually over a larger range: the density of each histogram within the block is computed first, and each cell unit in the block is then normalized according to this density; the normalized block descriptor is called the HOG descriptor.
The HOG descriptors of all blocks in the detection window are combined to form the final feature vector, and target detection is then carried out with the SVM classifier. Fig. 3 depicts the feature extraction and target detection process: the detection window is divided into overlapping blocks, HOG descriptors are computed for these blocks, and the resulting feature vectors are fed into a linear SVM for target/non-target binary classification. The detection window scans all positions and scales of the whole image, and non-maximum suppression is applied to the output pyramid to detect the target; a sketch of this suppression step is given below.
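A minimal numpy sketch of the non-maximum suppression step mentioned above: overlapping detection windows from the multi-scale scan are pruned so that only the highest-scoring window of each overlapping group survives. Boxes are assumed to be given as (x1, y1, x2, y2) with an associated SVM score, and the 0.5 IoU threshold is an illustrative choice.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    boxes = boxes.astype(float)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter)
        order = order[1:][iou <= iou_thresh]  # drop windows overlapping the kept one
    return keep
```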
S102: the target frame information obtained by target detection is transmitted to the visual tracking part, and a dynamically updated twin network based on CIResNet is used to track the target in real time; the tracking framework is shown in Fig. 4.
The first frame of the video sequence is taken as the template frame O1, the search area Zt is obtained from the current frame, and f^l(O1) and f^l(Zt) are obtained separately through the CIResNet-16 network.
The response of the conventional twin network is expressed as follows:
S_t^l = corr(f^l(O1), f^l(Zt))   (1)
the result of this formula is a similarity, where corr denotes the correlation filtering (which can be replaced by other metric functions), t denotes time and l denotes the l-th layer.
Unlike the traditional Siamese network, the network proposed by the invention adds two transformation matrices. The first transformation matrix V acts on the convolution feature of the target template so that the template feature at time t is approximately equal to the template feature at time t-1; this matrix is learned from frame t-1 and is regarded as a smooth deformation of the target. The second transformation matrix W acts on the convolution feature of the candidate region at time t so as to emphasize the target region and eliminate irrelevant background features.
For the transformation matrices V and W, the invention uses regularized linear regression for training; after the transformation matrices are applied, f^l(O1) and f^l(Zt) become V_t^l ⊛ f^l(O1) and W_t^l ⊛ f^l(Zt) respectively, where ⊛ denotes the circular convolution operation, V_t^l ⊛ f^l(O1) represents the change of the target's appearance and W_t^l ⊛ f^l(Zt) represents the background suppression transform. The final model is:
S_t^l = corr(V_t^l ⊛ f^l(O1), W_t^l ⊛ f^l(Zt))   (2)
the model adds two transformation matrixes of smoothing and background suppression on the basis of the twin network, and the smoothing matrix learns the appearance change of the previous frame, so that the spatiotemporal information can be effectively utilized; the background suppression matrix eliminates clutter influence factors in the background and enhances robustness. Meanwhile, the CIRESNet-16 network is used for replacing an AlexNet network in the traditional twin network, and the precision is higher.
Fig. 2 is a schematic diagram of a dynamic update visual tracking aerial photography method based on a rotor flying robot according to an embodiment of the present invention.
The HOG feature extraction in step S101 is described in detail as follows:
1) To reduce the influence of illumination, the whole image is first normalized. Local surface exposure contributes a large proportion of the texture intensity of the image, so compression processing effectively reduces local shadow and illumination changes. The image is usually converted to a gray-scale map, and the colour space of the input image is normalized (standardized) with Gamma correction. Gamma correction can be understood as improving the contrast of the dark or bright parts of the image, effectively reducing local shadow and illumination changes; the Gamma correction formula is:
f(I) = I^γ   (3)
where I is the image pixel value and γ is the Gamma correction coefficient.
2) The gradients along the horizontal and vertical coordinates of the image are computed, and the gradient direction value of each pixel position is computed from them; the derivative operation captures contours and some texture information and further weakens the influence of illumination;
Gx(x,y)=H(x+1,y)-H(x-1,y) (4)
Gy(x,y)=H(x,y+1)-H(x,y-1) (5)
in the above formula, Gx (x, y), Gy (x, y) respectively represent the horizontal gradient and the vertical gradient at the pixel point (x, y) in the input image.
G(x, y) = √(Gx(x, y)² + Gy(x, y)²)   (6)
α(x, y) = arctan(Gy(x, y) / Gx(x, y))   (7)
G (x, y), H (x, y), α (x, y) respectively represent the gradient magnitude, pixel value and gradient direction of the pixel point at (x, y).
3) Histogram computation: the image is divided into small cell units (which may be rectangular or circular) in order to provide an encoding of the local image region.
4) The cell units are grouped into large blocks, and the gradient histograms are normalized within each block.
5) The HOG features of all overlapping blocks in the detection window are collected and combined into the final feature vector for classification; a sketch of the gradient and histogram computation is given below.
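A minimal numpy sketch of steps 2)–4) above: per-pixel gradients following Gx = H(x+1, y) − H(x−1, y) and Gy = H(x, y+1) − H(x, y−1), 9-bin orientation histograms over 8 × 8 cells, and a simple per-cell L2 normalization standing in for the block normalization. Interpolation between neighbouring bins and the block grouping itself are omitted simplifications.

```python
import numpy as np

def hog_cell_histograms(gray, cell=8, bins=9):
    g = np.asarray(gray, dtype=float)
    gx = np.zeros_like(g); gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]              # horizontal gradient Gx
    gy[1:-1, :] = g[2:, :] - g[:-2, :]              # vertical gradient Gy
    mag = np.hypot(gx, gy)                          # gradient magnitude G(x, y)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned gradient direction
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h, w = g.shape
    hist = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            b = bin_idx[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()     # magnitude-weighted votes per bin
    return hist / (np.linalg.norm(hist, axis=-1, keepdims=True) + 1e-6)
```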
The improved network CIResNet-16 in step S102 is described in detail as follows:
CIResNet-16 is divided into three stages (total stride 8) and consists of 18 weighted convolutional layers.
(1) The features affected by padding are removed by a crop operation (size 2) combined with the 7 × 7 convolution.
(2) After a max-pooling layer with stride 2, the improved CIResNet unit is entered. As shown in Fig. 5(a), the CIR-unit stage has 3 layers in total: the first layer is a 1 × 1 convolution with 64 channels, the second layer is a 3 × 3 convolution with 64 channels, and the third layer is a 1 × 1 convolution with 256 channels. As described in Fig. 5, the feature maps passing through the convolution layers are added and then enter a crop operation that follows the 3 × 3 convolution and cancels the features affected by its padding of 1.
(3) The CIR-D (down-sampling CIR) unit is entered, as shown in Fig. 5(b); this stage of the network has 12 layers in total, and the first, second and third layers are cycled 4 times as a unit block. The first layer is a 1 × 1 convolution with 128 channels, the second layer is a 3 × 3 convolution with 128 channels, and the third layer is a 1 × 1 convolution with 512 channels.
The first block of this stage (4 blocks in total) is down-sampled by the proposed CIR-D unit, and after the feature map size is down-sampled the number of filters is doubled to improve feature discriminability. CIR-D changes the stride of the convolutions in the bottleneck and shortcut layers from 2 to 1, and the crop is inserted again after the addition operation to remove the features affected by padding. Finally, spatial down-sampling of the feature map is performed with max pooling. The spatial size of the output feature map is 7 × 7, and each feature receives information from a region of 77 × 77 pixels on the input image plane. As shown in Fig. 5, the feature maps passing through the convolution layers are added and then enter a crop operation and a max-pooling layer. The key idea of these modifications is to ensure that only the features affected by padding are deleted while the inherent block structure is kept unchanged.
(4) Cross-correlation operations:
the improved twin network structure takes as input an image pair, comprising an example image Z and a candidate search image X. Image Z represents an object of interest (e.g., an image block centered on the target object in the first video frame), while X represents a search area, typically larger, in subsequent video frames. Both inputs are processed by ConvNet with parameter theta. This will produce two signatures that are cross-correlated as:
f(Z, X) = φθ(Z) ⋆ φθ(X) + b   (8)
where b is a bias term; the whole formula corresponds to an exhaustive search of the image X for the pattern Z, with the goal of making the maximum of the response map f coincide with the target position. To achieve this, the network is trained offline with random image pairs (Z, X) and corresponding ground-truth labels y obtained from training videos, and the parameter θ of the ConvNet is obtained by minimizing the following loss over the training set:
θ* = arg minθ E(Z,X,y) L(y, f(Z, X; θ))   (9)
the basic formula of the loss function is:
l(y,v)=log(1+exp(-yv)) (10)
where y ∈ {+1, -1} is the true label and v is the actual score of the sample in the search image. According to the sigmoid function, the above formula gives the probability of a positive sample as
p = 1 / (1 + e^(-v))   (11)
and the probability of a negative sample as
1 - p = 1 / (1 + e^(v))   (12)
then from the cross-entropy formula it is easy to obtain -log p = log(1 + e^(-v)) for y = +1 and -log(1 - p) = log(1 + e^(v)) for y = -1, i.e. the loss l(y, v) = log(1 + exp(-yv)).
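A minimal PyTorch sketch of formulas (8) and (10) above: the template feature map is used as a correlation kernel over the search feature map (PyTorch's conv2d is in fact a cross-correlation), a bias b is added, and the response is scored with the logistic loss l(y, v) = log(1 + exp(−yv)). The feature shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def siamese_response(feat_z, feat_x, b=0.0):
    """feat_z: (C, Hz, Wz) template feature; feat_x: (C, Hx, Wx) search feature."""
    resp = F.conv2d(feat_x.unsqueeze(0), feat_z.unsqueeze(0))  # (1, 1, H', W') response map
    return resp.squeeze(0).squeeze(0) + b

def logistic_loss(response, labels):
    """labels is a map of +1 / -1 ground-truth values with the same shape as the response."""
    return torch.log1p(torch.exp(-labels * response)).mean()
```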
the step of dynamically updating the algorithm in step S102 is:
(1) inputting a picture to obtain a template image O1;
(2) determining a candidate frame search area Zt in a frame to be tracked;
(3) mapping the original images into a specific feature space by feature mapping to obtain the two depth features f^l(O1) and f^l(Zt);
(4) learning the change between the tracking result of the previous frame and the first-frame template according to regularized linear regression (RLR):
V_t^l = arg min_V || V ⊛ f^l(O1) - f^l(O_{t-1}) ||² + λ_v || V ||²;
fast calculation in the frequency domain gives the variation V_t^l as:
V_t^l = IFFT( (conj(F_1^l) ⊙ F_{t-1}^l) / (conj(F_1^l) ⊙ F_1^l + λ_v) ),
where F_1^l = FFT(f_1^l), F_{t-1}^l = FFT(f_{t-1}^l), f_1^l = f^l(O1), f_{t-1}^l = f^l(O_{t-1}), ⊙ denotes element-wise multiplication and conj(·) the complex conjugate; O denotes the target, f denotes the feature matrix, the superscript denotes the l-th channel and the subscript denotes the frame index, i.e. the tracking result of the previous frame and the target of the first frame.
(5) The background suppression transform of the current frame is obtained from the RLR calculation formula in the frequency domain:
W_t^l = IFFT( (conj(F_G^l) ⊙ F_Ḡ^l) / (conj(F_G^l) ⊙ F_G^l + λ_w) ),
where F_G^l = FFT(f^l(G_{t-1})), F_Ḡ^l = FFT(f^l(Ḡ_{t-1})), G_{t-1} is an image region of the same size as the search area of the previous frame, and Ḡ_{t-1} is G_{t-1} multiplied by a Gaussian centred at the picture centre point, the purpose of which is to highlight the centre and suppress the edges. The target variation V_t^l and the background suppression transform W_t^l are thus learned online.
By endowing the static twin network with this online adaptive capability, the improved model improves tracking precision while maintaining real-time speed.
(6) Element-wise multi-layer feature fusion:
S_t = Σ_l γ^l ⊙ S_t^l, where γ^l is an element-wise weight map for the response map S_t^l of layer l.
the center weight of the shallow feature is high, the peripheral weight of the deep feature is high, the center of the deep feature is low, if the target is in the center of the search area, the shallow feature can better locate the target, and if the target is in the periphery of the search area, the deep feature can also effectively determine the position of the target.
That is, when the target is close to the center of the search area, the deeper layer features are helpful for eliminating background interference, and the shallower layer features are helpful for obtaining accurate positioning of the target; and if the target is positioned at the periphery of the search area, only the deeper layer characteristics can effectively determine the position of the target.
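A minimal sketch of the element-wise multi-layer fusion described above: the per-layer response maps are combined with element-wise weight maps (trained as the matrix γ mentioned in step (8) below), so that shallow layers dominate near the centre of the search area and deeper layers near its periphery. The weight maps here are illustrative placeholders.

```python
import numpy as np

def fuse_responses(responses, gammas):
    """responses, gammas: lists of H x W arrays, one pair per feature layer l."""
    return sum(g * s for g, s in zip(gammas, responses))
```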
(7) Joint training is performed; first, by forward propagation, for a given N-frame video sequence {I_t | t = 1, ..., N}, tracking is carried out to obtain the N response maps {S_t | t = 1, ..., N}, while {J_t | t = 1, ..., N} denotes the N ground-truth target maps;
L_t = || S_t - J_t ||².
(8) A schematic diagram of the single-layer network structure is shown in Fig. 6, where "Eltwise" (element-wise multi-layer fusion) is trained as a matrix γ whose values represent the weights of different positions of the different feature maps. BPTT (back-propagation through time) and SGD (stochastic gradient descent) are used for gradient propagation and parameter updating. To train the network effectively with BPTT and SGD, all parameters of L_t must be obtained; as shown in Fig. 6, from ∂L_t/∂S_t the gradients with respect to the per-layer responses S_t^l are computed, and the loss gradient is then propagated through the left-hand "CirConv" and "RLR" layers so that it reaches f^l efficiently.
The corresponding chain-rule expressions for the gradients with respect to V_t^l, W_t^l and f^l are evaluated in the frequency domain, where the Fourier-transformed features and the discrete Fourier transform matrix E appear; the gradient of the element-wise multi-layer fusion can be computed by the same process, and for the multi-feature fusion formula the gradient is converted accordingly.
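A minimal PyTorch sketch of one joint-training step consistent with the description above: the response maps {S_t} of an N-frame sequence are compared with their ground-truth maps {J_t}, and automatic differentiation propagates the loss gradient back through the whole pipeline, playing the role of the BPTT/SGD derivation. The squared-error loss and the model interface are assumptions for illustration.

```python
import torch

def train_step(model, optimizer, frames, gt_maps):
    optimizer.zero_grad()
    loss = 0.0
    for frame, j_t in zip(frames, gt_maps):         # video sequence {I_t}
        s_t = model(frame)                          # response map S_t for frame t
        loss = loss + torch.mean((s_t - j_t) ** 2)  # distance to the ground-truth map J_t
    loss.backward()                                 # gradients via autograd (BPTT)
    optimizer.step()                                # SGD parameter update
    return float(loss)
```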
The model has reliable online adaptability: it effectively learns foreground and background changes and suppresses background interference without harming the real-time response capability, and it shows an excellent balance of tracking performance in experiments. In addition, the model is trained directly on annotated video sequences as a whole rather than on image pairs, so the rich spatio-temporal information of moving objects is captured better. Meanwhile, the model uses joint training, and all parameters can be learned offline through back-propagation, which facilitates training on data. The specific effect is shown in Fig. 7.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware or any combination thereof. When software is used wholly or partially, the implementation can take the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data centre to another by wire (e.g. coaxial cable, optical fibre, Digital Subscriber Line (DSL)) or wirelessly (e.g. infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data centre that integrates one or more available media. The available medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD), or a semiconductor medium (e.g. a Solid State Disk (SSD)), among others.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A dynamic update visual tracking aerial photography method based on a rotor wing flying robot is characterized by comprising the following steps:
firstly, carrying out target detection on an input image by using an HOG feature extraction algorithm and a support vector machine algorithm SVM;
and step two, transmitting target frame information obtained by target detection to a visual tracking part, and tracking the target in real time by adopting a dynamic updating twin network based on a CIRESNet network.
2. The dynamic update visual tracking aerial photography method based on the rotor flying robot as claimed in claim 1, wherein in step one the target detection method comprises:
(1) dividing the image into connected regions of 8 × 8 pixels, called cell units;
(2) collecting the gradient magnitude and gradient direction of each pixel in a cell unit, dividing the gradient directions in [-90°, 90°] evenly into 9 intervals (bins), and using the gradient magnitude as the weight;
(3) performing histogram statistics of the gradient magnitudes of the pixels in the cell over the direction bins to obtain a one-dimensional gradient direction histogram;
(4) performing contrast normalization of the histograms over spatial blocks;
(5) extracting HOG descriptors through a detection window, and combining the HOG descriptors of all blocks in the detection window into the final feature vector;
(6) inputting the feature vector into a linear SVM and performing target detection with the SVM classifier;
(7) dividing the detection window into overlapping blocks, computing HOG descriptors for these blocks, and feeding the resulting feature vectors into the linear SVM for target/non-target binary classification;
(8) scanning the detection window over all positions and scales of the whole image, and applying non-maximum suppression to the output pyramid to detect the target;
the contrast normalization of the histograms in step (4) is performed as follows:
the density of each histogram within the block is first calculated, and each cell unit in the block is then normalized according to this density.
3. The dynamic update visual tracking aerial photography method based on the rotor flying robot as claimed in claim 1, wherein in step one the HOG feature extraction method specifically comprises:
① normalizing the whole image: the colour space of the input image is normalized with the Gamma correction method, where the Gamma correction formula is:
f(I) = I^γ
where I is the image pixel value and γ is the Gamma correction coefficient;
② computing the gradients along the horizontal and vertical coordinates of the image, and computing the gradient direction value of each pixel position;
Gx(x, y) = H(x+1, y) - H(x-1, y);
Gy(x, y) = H(x, y+1) - H(x, y-1);
where Gx(x, y) and Gy(x, y) denote the horizontal and vertical gradients at pixel (x, y) of the input image;
G(x, y) = √(Gx(x, y)² + Gy(x, y)²);
α(x, y) = arctan(Gy(x, y) / Gx(x, y));
where G(x, y), H(x, y) and α(x, y) denote the gradient magnitude, the pixel value and the gradient direction at pixel (x, y);
③ histogram computation: the image is divided into small cell units, providing an encoding of the local image region;
④ the cell units are grouped into large blocks, and the gradient histograms are normalized within each block;
⑤ the HOG features of all overlapping blocks in the detection window are collected and combined into the final feature vector for classification.
4. The dynamic update visual tracking aerial photography method based on the rotor flying robot as claimed in claim 1, wherein the step of tracking the target in real time comprises:
(1) obtaining the first frame of the video sequence as the template frame O1, obtaining the search area Zt from the current frame, and separately obtaining f^l(O1) and f^l(Zt) through the CIResNet-16 network;
(2) adding a transformation matrix V and a transformation matrix W, both of which are computed rapidly in the frequency domain by FFT; the transformation matrix V is obtained from the tracking result of frame t-1 and the first-frame target, acts on the convolution feature of the target template, and learns the change of the target so that the template feature at time t is approximately equal to the template feature at time t-1, smoothing the change of the current frame relative to the previous frames; the transformation matrix W is obtained from the tracking result of frame t-1, acts on the convolution feature of the candidate region at time t, and learns background suppression so as to eliminate the influence of irrelevant background features in the target region;
for the transformation matrix V and the transformation matrix W, training is performed using regularized linear regression; after the transformation matrices are applied, f^l(O1) and f^l(Zt) become V_t^l ⊛ f^l(O1) and W_t^l ⊛ f^l(Zt) respectively, where ⊛ denotes the circular convolution operation, V_t^l ⊛ f^l(O1) represents the change of the target's appearance and gives the currently updated target template, and W_t^l ⊛ f^l(Zt) represents the background suppression transform and gives a more suitable current search template; the final model is:
S_t^l = corr(V_t^l ⊛ f^l(O1), W_t^l ⊛ f^l(Zt));
the final model thus adds two transformation matrices, a smoothing matrix V and a background suppression matrix W, to the twin network; the smoothing matrix V learns the appearance changes of the previous frames, and the background suppression matrix W eliminates clutter in the background.
5. The method of claim 1, wherein in step two the dynamically updated twin network based on CIResNet comprises:
(I) performing a 7 × 7 convolution together with a crop operation to delete the features affected by padding;
(II) after a max-pooling layer with stride 2, entering the improved CIResNet unit; the CIR-unit stage has 3 layers in total: the first layer is a 1 × 1 convolution with 64 channels, the second layer is a 3 × 3 convolution with 64 channels, and the third layer is a 1 × 1 convolution with 256 channels; the feature maps are added after the convolution layers and then pass through a crop operation, which follows the 3 × 3 convolution and offsets the influence of its padding of 1;
(III) entering the CIR-D unit; the CIR-D stage has 12 layers in total, with the first, second and third layers cycled 4 times as a unit block; the first layer is a 1 × 1 convolution with 128 channels, the second layer is a 3 × 3 convolution with 128 channels, and the third layer is a 1 × 1 convolution with 512 channels;
(IV) cross-correlation operation: the improved twin network structure takes an image pair as input, consisting of an exemplar image Z and a candidate search image X; Z represents the object of interest, while X represents the search area, typically larger, in a subsequent video frame; both inputs are processed by a ConvNet with parameters θ; two feature maps are produced, and their cross-correlation is:
f(Z, X) = φθ(Z) ⋆ φθ(X) + b;
where b is a bias term; the formula searches the image X exhaustively for the pattern Z so that the maximum of the response map f coincides with the target position; the network is trained offline on random image pairs (Z, X) with corresponding ground-truth labels y obtained from training videos, and the parameter θ of the ConvNet is obtained by minimizing the following loss over the training set:
θ* = arg minθ E(Z,X,y) L(y, f(Z, X; θ));
the basic formula of the loss function is:
l(y, v) = log(1 + exp(-yv));
where y ∈ {+1, -1} is the true label and v is the actual score of the sample in the search image; according to the sigmoid function, the above formula gives the probability of a positive sample as
p = 1 / (1 + e^(-v))
and the probability of a negative sample as
1 - p = 1 / (1 + e^(v));
then from the cross-entropy formula it is easy to obtain -log p = log(1 + e^(-v)) for y = +1 and -log(1 - p) = log(1 + e^(v)) for y = -1, i.e. the loss l(y, v) = log(1 + exp(-yv)).
6. a method for dynamically updating vision tracking aerial photography based on rotor-flying robots as claimed in claim 5 wherein in step (iii) the first block of the CIR-D unit stage is downsampled by the proposed CIR-D unit and the number of filters is doubled after downsampling the signature size; changing the step length of the volume in the bottleneck layer and the quick connection layer from 2 to 1 by CIR-D, and inserting and cutting again after adding operation to delete the characteristics influenced by filling; finally, performing spatial down-sampling of the feature map using maximum pooling; the spatial size of the output feature map is 7 × 7, each feature receiving information from an area of 77 × 77 pixels in size on the input image plane; performing addition operation on the feature graph passing through the convolution layer, and then entering a crop operation and a maximum pooling layer; the key idea of these modifications is to ensure that only the functions affected by the padding are deleted while keeping the inherent block structure unchanged.
7. The method according to claim 1, wherein in step two, the dynamic update twin network based on CIResNet is used for real-time tracking of the target, and the dynamic update algorithm comprises:
(1) inputting a picture to obtain a template image O1;
(2) determining a candidate frame search area Zt in a frame to be tracked;
(3) mapping the original image to a specific feature space through feature mapping to respectively obtain fl(O1) and fl(Zt) These two depth features;
(4) learning the change of the tracking result of the previous frame and the template frame of the first frame according to the RLR;
Figure FDA0002300823480000051
fast calculation in the frequency domain yields:
Figure FDA0002300823480000052
thereby obtaining the variation
Figure FDA0002300823480000053
As follows:
Figure FDA0002300823480000054
wherein ,f1 l=fl(O1),
Figure FDA0002300823480000055
Wherein O represents the target, f represents the matrix, the upper right index represents the l channel, and the lower right index represents the several frames, namely, the tracking result of the previous frame and the target of the first frame are obtained;
(5) obtaining the background suppression transform of the current frame according to the same RLR calculation formula in the frequency domain:
Ŵ_t^l = ( conj(F_{G,t−1}^l) ⊙ F_{Ḡ,t−1}^l ) / ( conj(F_{G,t−1}^l) ⊙ F_{G,t−1}^l + λ ),   W_t^l = F^{−1}( Ŵ_t^l ),
wherein G_{t−1} is a region of the same size as the search area of the previous frame, Ḡ_{t−1} is G_{t−1} multiplied by a Gaussian weight centred on the target position, and F_{G,t−1}^l, F_{Ḡ,t−1}^l are the Fourier transforms of f^l(G_{t−1}) and f^l(Ḡ_{t−1}); the target variation V_t^l and the background suppression transform W_t^l are thus learned online (a numerical sketch of this closed-form RLR update is given after this claim);
(6) element-wise multi-layer feature fusion of the response maps:
S_t = Σ_l w_l ⊙ S_t^l,  with  S_t^l = corr( V_t^l ⊛ f^l(O_1), W_t^l ⊛ f^l(Z_t) ),
wherein corr(·,·) is the cross-correlation of claim 5 and w_l is the fusion weight of the l-th layer;
(7) joint training, first by forward propagation: for a given N-frame video sequence {I_t | t = 1, ..., N}, tracking is performed to obtain N response maps {S_t | t = 1, ..., N}, while {J_t | t = 1, ..., N} denotes the N ground-truth target maps; the per-frame loss is
L_t = || S_t − J_t ||^2;
(8) gradient propagation and parameter updating: BPTT and SGD are used to update all parameters with respect to L_t; from ∂L_t/∂S_t, the gradients ∂L_t/∂V_t^l and ∂L_t/∂W_t^l are calculated, and through the left-branch CirConv and RLR layers the efficient propagation of the loss gradient back to the features f^l is ensured, according to the following relations:
[the five frequency-domain gradient equations are given as images in the original claim]
wherein the hat notation denotes f after the Fourier transform and E is the discrete Fourier transform matrix; for the multi-layer feature fusion formula, the corresponding frequency-domain form is likewise given as an image in the original claim.
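As a numerical illustration of steps (4)-(5) of claim 7 (referenced above), the sketch below solves the regularized linear regression in the frequency domain and applies the resulting transform by circular convolution; the regularization constant, the feature shapes and the exact per-channel formulation are assumptions, since the original equations are only available as images.

import numpy as np

def rlr_transform(f_ref: np.ndarray, f_target: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form transform V such that V (circularly convolved with) f_ref ~= f_target.

    Solves min_V ||V ⊛ f_ref - f_target||^2 + lam * ||V||^2 per channel via the FFT.
    f_ref, f_target: (C, H, W) feature maps. Returns V with the same shape.
    """
    F_ref = np.fft.fft2(f_ref, axes=(-2, -1))
    F_tgt = np.fft.fft2(f_target, axes=(-2, -1))
    V_hat = (np.conj(F_ref) * F_tgt) / (np.conj(F_ref) * F_ref + lam)
    return np.real(np.fft.ifft2(V_hat, axes=(-2, -1)))

def apply_transform(V: np.ndarray, feat: np.ndarray) -> np.ndarray:
    """Circular convolution of V with feat, computed as an element-wise product in frequency."""
    return np.real(np.fft.ifft2(np.fft.fft2(V, axes=(-2, -1)) *
                                np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f1 = rng.standard_normal((8, 22, 22))       # first-frame template features f^l(O_1), assumed shape
    f_prev = rng.standard_normal((8, 22, 22))   # previous-frame result features f^l(O_{t-1})
    V = rlr_transform(f1, f_prev)               # target-variation transform V_t^l
    print(np.linalg.norm(apply_transform(V, f1) - f_prev) / np.linalg.norm(f_prev))

With a small λ the recovered transform maps f^l(O_1) almost exactly onto f^l(O_{t−1}); increasing λ trades this fidelity for stability, which is the usual ridge-regression behaviour.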
8. A dynamic update visual tracking aerial photography system based on a rotor flying robot, implementing the dynamic update visual tracking aerial photography method based on a rotor flying robot of claim 1.
9. An information data processing terminal for realizing the dynamic update visual tracking aerial photography method based on a rotor flying robot according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the dynamic update visual tracking aerial photography method based on a rotor flying robot according to any one of claims 1 to 7.
CN201911220924.1A 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot Active CN110992378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911220924.1A CN110992378B (en) 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911220924.1A CN110992378B (en) 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot

Publications (2)

Publication Number Publication Date
CN110992378A true CN110992378A (en) 2020-04-10
CN110992378B CN110992378B (en) 2023-05-16

Family

ID=70089566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911220924.1A Active CN110992378B (en) 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot

Country Status (1)

Country Link
CN (1) CN110992378B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610888A (en) * 2021-06-29 2021-11-05 南京信息工程大学 Twin network target tracking method based on Gaussian smoothness
CN114863267A (en) * 2022-03-30 2022-08-05 南京邮电大学 Aerial tree number accurate statistical method based on multi-track intelligent prediction
CN115984333A (en) * 2023-02-14 2023-04-18 北京拙河科技有限公司 Smooth tracking method and device for airplane target
CN116088580A (en) * 2023-02-15 2023-05-09 北京拙河科技有限公司 Flying object tracking method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898620A (en) * 2018-06-14 2018-11-27 厦门大学 Method for tracking target based on multiple twin neural network and regional nerve network
CN109272530A (en) * 2018-08-08 2019-01-25 北京航空航天大学 Method for tracking target and device towards space base monitoring scene
EP3506166A1 (en) * 2017-12-29 2019-07-03 Bull SAS Prediction of movement and topology for a network of cameras
CN109993774A (en) * 2019-03-29 2019-07-09 大连理工大学 Online Video method for tracking target based on depth intersection Similarity matching
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110443827A (en) * 2019-07-22 2019-11-12 浙江大学 A kind of UAV Video single goal long-term follow method based on the twin network of improvement
CN110490906A (en) * 2019-08-20 2019-11-22 南京邮电大学 A kind of real-time vision method for tracking target based on twin convolutional network and shot and long term memory network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3506166A1 (en) * 2017-12-29 2019-07-03 Bull SAS Prediction of movement and topology for a network of cameras
CN108898620A (en) * 2018-06-14 2018-11-27 厦门大学 Method for tracking target based on multiple twin neural network and regional nerve network
CN109272530A (en) * 2018-08-08 2019-01-25 北京航空航天大学 Method for tracking target and device towards space base monitoring scene
CN109993774A (en) * 2019-03-29 2019-07-09 大连理工大学 Online Video method for tracking target based on depth intersection Similarity matching
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110443827A (en) * 2019-07-22 2019-11-12 浙江大学 A kind of UAV Video single goal long-term follow method based on the twin network of improvement
CN110490906A (en) * 2019-08-20 2019-11-22 南京邮电大学 A kind of real-time vision method for tracking target based on twin convolutional network and shot and long term memory network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN XIAOLIN et al.: "Research on small target detection and tracking algorithms based on machine learning" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610888A (en) * 2021-06-29 2021-11-05 南京信息工程大学 Twin network target tracking method based on Gaussian smoothness
CN113610888B (en) * 2021-06-29 2023-11-24 南京信息工程大学 Twin network target tracking method based on Gaussian smoothing
CN114863267A (en) * 2022-03-30 2022-08-05 南京邮电大学 Aerial tree number accurate statistical method based on multi-track intelligent prediction
CN115984333A (en) * 2023-02-14 2023-04-18 北京拙河科技有限公司 Smooth tracking method and device for airplane target
CN115984333B (en) * 2023-02-14 2024-01-19 北京拙河科技有限公司 Smooth tracking method and device for airplane target
CN116088580A (en) * 2023-02-15 2023-05-09 北京拙河科技有限公司 Flying object tracking method and device
CN116088580B (en) * 2023-02-15 2023-11-07 北京拙河科技有限公司 Flying object tracking method and device

Also Published As

Publication number Publication date
CN110992378B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN110222787B (en) Multi-scale target detection method and device, computer equipment and storage medium
CN110992378B (en) Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot
CN106960446B (en) Unmanned ship application-oriented water surface target detection and tracking integrated method
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN113034548B (en) Multi-target tracking method and system suitable for embedded terminal
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN110675423A (en) Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN108805906A (en) A kind of moving obstacle detection and localization method based on depth map
CN108446634B (en) Aircraft continuous tracking method based on combination of video analysis and positioning information
Wang et al. Window zooming–based localization algorithm of fruit and vegetable for harvesting robot
CN110766723B (en) Unmanned aerial vehicle target tracking method and system based on color histogram similarity
CN112818905B (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN110006444B (en) Anti-interference visual odometer construction method based on optimized Gaussian mixture model
CN112364865B (en) Method for detecting small moving target in complex scene
CN111160365A (en) Unmanned aerial vehicle target tracking method based on combination of detector and tracker
CN104715251A (en) Salient object detection method based on histogram linear fitting
CN109949229A (en) A kind of target cooperative detection method under multi-platform multi-angle of view
CN109389609B (en) Interactive self-feedback infrared target detection method based on FART neural network
Zou et al. Microarray camera image segmentation with Faster-RCNN
CN108345835B (en) Target identification method based on compound eye imitation perception
CN112651381A (en) Method and device for identifying livestock in video image based on convolutional neural network
CN117409339A (en) Unmanned aerial vehicle crop state visual identification method for air-ground coordination
CN104715476A (en) Salient object detection method based on histogram power function fitting
CN109635649B (en) High-speed detection method and system for unmanned aerial vehicle reconnaissance target
CN110287957B (en) Low-slow small target positioning method and positioning device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant