CN112927264B - Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof - Google Patents

Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof

Info

Publication number
CN112927264B
CN112927264B (application CN202110207735.1A)
Authority
CN
China
Prior art keywords
target
tracking
image
frame
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110207735.1A
Other languages
Chinese (zh)
Other versions
CN112927264A (en)
Inventor
吴秋霞
肖丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110207735.1A priority Critical patent/CN112927264B/en
Publication of CN112927264A publication Critical patent/CN112927264A/en
Application granted granted Critical
Publication of CN112927264B publication Critical patent/CN112927264B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds

Abstract

The invention discloses an unmanned aerial vehicle tracking shooting system and an RGBD tracking method thereof. The system comprises a storage and service module, a tracking control module and a data transmission module. The storage and service module provides a unified model training interface for the tracking task and is used for storing and optimizing the model; the tracking control module tracks the shot object in real time with a target tracking algorithm, realizing unmanned aerial vehicle tracking shooting in complex environments; the data transmission module transmits the image data and the tracking control instructions. The invention realizes automatic tracking shooting by the unmanned aerial vehicle without manual control, alleviates the problem that the unmanned aerial vehicle easily loses the target because of occlusion in a complex shooting environment, and improves the usability of unmanned aerial vehicle shooting technology in more scenes.

Description

Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof
Technical Field
The invention relates to the technical field of visual target tracking, in particular to an unmanned aerial vehicle tracking shooting system and an RGBD tracking method thereof.
Background
Unmanned aerial vehicle technology has been widely applied in fields such as advertising photography, express delivery and city planning. Aerial photography by unmanned aerial vehicle offers high definition, a large shooting area and few scene restrictions, and has replaced traditional shooting means in many scenes. When an unmanned aerial vehicle is used to shoot a moving object, the photographer usually has to control the flight state of the unmanned aerial vehicle manually with ground equipment to complete the shooting task and guarantee shooting quality. In recent years artificial intelligence has developed rapidly; visual image recognition in particular has produced a large number of intelligent products in fields such as video surveillance and robot interaction, bringing great convenience to production and daily life. Through intelligent development, unmanned aerial vehicle shooting is no longer limited to manually controlled flight: the aircraft can complete long automatic follow-shooting tasks without manual operation and possesses automatic obstacle avoidance based on binocular stereo vision, ultrasonic ranging or infrared ranging.
The key characteristic of unmanned aerial vehicle tracking shooting is good tracking performance. With the rise of machine learning and deep learning, unmanned aerial vehicles at the present stage realize the follow-shooting function mainly through visual target tracking algorithms with real-time performance. Target tracking has long been one of the hot topics of computer vision research; it has developed greatly over the past decades and is applied in intelligent visual navigation, modern military systems, human-computer interaction, unmanned driving and other fields. From the perspective of the appearance model, target tracking methods can be divided into generative models and discriminative models. Generative models are the earliest tracking methods in the field: the appearance of the target is modeled first, and the region with the minimum reconstruction error in the candidate search region is then taken as the predicted target; classical generative tracking algorithms include particle filtering and mean shift. Discriminative tracking algorithms introduce machine learning into target tracking and extract features of both the target and the background, treating tracking as a binary classification task. Because of limitations of generative models such as high computational complexity, discriminative tracking algorithms represented by correlation filtering and twin (Siamese) networks have become the current mainstream.
Optimization of target tracking algorithms in terms of networks, models and feature extraction has achieved remarkable results, but unmanned aerial vehicle shooting may take place in various complex environments, and high-quality follow shooting still cannot be guaranteed in some specific scenes; in particular, when the target is completely occluded or the background resembles the target in appearance, the target is easily lost, so that it drifts away from the shooting center or leaves the shooting field of view. Since an RGB video image only provides information such as the color and texture of an object surface from a single viewing angle, 2D visual tracking has great limitations. With the popularization of depth cameras, target tracking research has been extended to the 3D field: the depth image acquired by a depth camera describes the distance between the object surface and the viewpoint, so that more tracking problems can be solved.
In conclusion, RGBD target tracking can be realized by mounting a depth camera on the unmanned aerial vehicle: depth image and RGB image features are used together to solve shooting problems in complex scenes, an efficient target tracker based on a deep learning network completes fast target positioning, and a target repositioning algorithm based on feature point detection solves the target-loss problem under long-term occlusion or background interference.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle tracking shooting system and an RGBD tracking method thereof.
To achieve the above purpose, the technical solution provided by the invention is as follows: an unmanned aerial vehicle tracking shooting system, comprising:
the storage and service module is used for storing the image data and the model parameters, providing a uniform model training interface for the tracking task and finishing the updating and optimization of the model parameters in an off-line manner;
the tracking control module tracks a shot object in real time by using a target tracking algorithm, automatically generates a flight control instruction and realizes unmanned aerial vehicle tracking shooting in a complex environment;
and the data transmission module is used for completing a remote transmission task between the unmanned aerial vehicle and the ground control station, transmitting the RGB image and the depth image acquired by the unmanned aerial vehicle back to the ground control station in real time, and transmitting a flight control command generated after the image is analyzed to the unmanned aerial vehicle.
Further, the storage and service module comprises a model pre-training module, a model optimization module and a data storage module, wherein:
the model pre-training module is used for obtaining, from a large-scale data set, a pre-training model that can be used for tasks; a tracking and positioning model M based on a twin network is constructed according to task requirements and its model parameters are randomly initialized; the two twin-network branches in the tracking and positioning model M are convolution blocks with a fixed number of layers, the inputs of the two branches are a target image and a search image containing the target area, and the outputs of the two branches are two feature maps F1 and F2 with the same number of channels and different scales, in which the smaller-scale F1 is used as a convolution kernel and convolved with F2 to obtain a single-channel response matrix S; S is mapped into a matrix S' with the same scale as the input image, and the element values of S' represent the probability of the target center appearing at each position; sample-pair images are acquired from any large-scale data set, the sample-pair images do not contain depth information, and training uses normalized RGB images;
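For illustration only, the correlation of the two branch feature maps F1 and F2 into the single-channel response matrix S can be sketched as follows; this is a NumPy sketch, and the feature-map sizes and channel count are assumptions rather than values from the disclosure:

```python
import numpy as np

def cross_correlation_response(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """Correlate template features f1 (C, h, w) against search features
    f2 (C, H, W), summing over channels, to obtain a single-channel
    response matrix S of shape (H - h + 1, W - w + 1)."""
    c, h, w = f1.shape
    _, big_h, big_w = f2.shape
    s = np.empty((big_h - h + 1, big_w - w + 1), dtype=np.float32)
    for y in range(s.shape[0]):
        for x in range(s.shape[1]):
            s[y, x] = np.sum(f1 * f2[:, y:y + h, x:x + w])
    return s

# Illustrative sizes: a 256-channel 6x6 template map against a 22x22 search map
# gives a 17x17 response matrix, which is then mapped back to the input image
# scale (S') to read off the most probable target center.
f1 = np.random.rand(256, 6, 6).astype(np.float32)
f2 = np.random.rand(256, 22, 22).astype(np.float32)
response = cross_correlation_response(f1, f2)
cy, cx = np.unravel_index(np.argmax(response), response.shape)
```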
the model optimization module is used for fine-tuning the parameters of a pre-training model for a specific task to obtain a tracking and positioning model meeting the task requirements; the model optimization interface provided by the model optimization module automatically loads the task data used for fine-tuning and the pre-training model specified for the corresponding task, the task data comprise depth images and RGB images, and the model optimization process trains the loaded pre-training model with RGBD fused images; the depth image and the RGB image are fused as follows:

I_{R',G',B'} = I_{R,G,B} * k + I_D * (1 - k)    (1)

where I_D denotes the single channel of the depth image, I_{R,G,B} denotes the three color channels of the RGB image, k denotes the channel-fusion weight, and I_{R',G',B'} denotes the three fused color channels, i.e. the RGBD fused image;
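A minimal sketch of the fusion in equation (1), assuming 8-bit images loaded with OpenCV; the function name, file names and the weight k = 0.7 are illustrative assumptions:

```python
import cv2
import numpy as np

def rgbd_fuse(rgb: np.ndarray, depth: np.ndarray, k: float = 0.7) -> np.ndarray:
    """Equation (1): I_{R'G'B'} = I_{RGB} * k + I_D * (1 - k).
    rgb is an (H, W, 3) color image, depth a single-channel (H, W) map;
    the depth channel is replicated to three channels before blending."""
    depth3 = cv2.merge([depth, depth, depth]).astype(np.float32)
    fused = rgb.astype(np.float32) * k + depth3 * (1.0 - k)
    return np.clip(fused, 0, 255).astype(np.uint8)

# Illustrative usage (paths are placeholders):
# rgb = cv2.imread("frame_rgb.png")                             # (H, W, 3), uint8
# depth = cv2.imread("frame_depth.png", cv2.IMREAD_GRAYSCALE)   # (H, W), uint8
# fused = rgbd_fuse(rgb, depth, k=0.7)
```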
the data storage module is used for storing a pre-training model, a specific task model and training data; the data storage module provides a model list, a data list and a calling interface of any model and data, and automatically calls the designated data or model when a training task or a tracking task is required.
Further, the tracking control module comprises a target positioning module, a shielding detection module and a target repositioning module, wherein:
the target positioning module is used for locating the target center in the current shooting frame and is realized by a tracking and positioning model constructed with a deep learning framework; the tracking and positioning model is a model trained offline in the storage and service module and, for any tracking task, is specified as a tracking and positioning model with a twin network structure; the inputs of the model are the RGBD target image tracked in the previous frame and the RGBD search image to be detected in the current frame, the response matrix output by the model represents the probability of the target center appearing at each position, the point with the maximum probability is predicted as the target center, and the tracking and positioning of the target in the current shooting frame is completed;
the shielding detection module is used for identifying the complete occlusion state of the target; when the shot object is completely occluded, no target feature can be detected in the corresponding region of the picture, and occlusion is judged according to two similarity relations, one based on depth information and one based on color information: the target depth value range is estimated with an auxiliary frame, whose width and height are defined as follows:

(Equations (2) and (3): the auxiliary-frame width w_a and height h_a are defined in terms of the predicted target-frame width w_p and height h_p; these equations appear as images in the original publication.)

where w_a and h_a are the width and height of the auxiliary frame, and w_p and h_p are the width and height of the predicted target frame; the set of depth values that occupy a large proportion of the pixels inside the target frame but a small proportion inside the auxiliary frame is computed, and the value closest to that of the previous frame is selected as the overall depth representation of the target; if the difference between the depth values of the current frame and the previous frame is above a threshold, the current frame is regarded as occluded; a depth-image mask is then established from the depth value range to remove background pixels at other depths, and the color histograms of the previous-frame target and the current-frame predicted target are extracted as feature vectors, whose similarity is computed as follows:
Sim = Σ_{i=1..n} sqrt( (Hist_ti / Num_t) × (Hist_pi / Num_p) )    (4)
where Sim denotes the similarity between image I_t and image I_p, Hist_ti denotes the value of the i-th color histogram bin of I_t, Hist_pi denotes the value of the i-th color histogram bin of I_p, Num_t and Num_p denote the total pixel counts of I_t and I_p used to normalize the histograms, and n denotes the number of color histogram bins; if the similarity Sim is below the threshold, the current frame is regarded as completely occluded, i.e. the target is not tracked in the current frame;
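An illustrative sketch of the two occlusion tests described above; the bin counts, thresholds, the use of a single-channel (e.g. hue) histogram, and the histogram-mode shortcut in place of the full target-frame/auxiliary-frame ratio test of equations (2) and (3) are all simplifying assumptions:

```python
import numpy as np

def target_depth(depth_roi: np.ndarray, prev_depth: float, n_bins: int = 32) -> float:
    """Pick the depth value inside the predicted target frame closest to the
    previous frame's target depth (simplified: a histogram-mode search instead
    of the target-frame / auxiliary-frame pixel-ratio test)."""
    hist, edges = np.histogram(depth_roi, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    candidates = centers[hist >= hist.mean()]          # frequent depth values
    return float(candidates[np.argmin(np.abs(candidates - prev_depth))])

def color_similarity(roi_t: np.ndarray, roi_p: np.ndarray, n_bins: int = 16) -> float:
    """Equation (4): Bhattacharyya-style similarity between single-channel
    color histograms of the previous-frame target roi_t and the predicted roi_p."""
    hist_t, _ = np.histogram(roi_t, bins=n_bins, range=(0, 256))
    hist_p, _ = np.histogram(roi_p, bins=n_bins, range=(0, 256))
    return float(np.sum(np.sqrt((hist_t / roi_t.size) * (hist_p / roi_p.size))))

def is_fully_occluded(depth_roi, roi_t, roi_p, prev_depth,
                      depth_thresh=300.0, sim_thresh=0.5) -> bool:
    """Declare complete occlusion if either the depth jump or the color
    dissimilarity exceeds its threshold (threshold values are assumptions)."""
    depth_jump = abs(target_depth(depth_roi, prev_depth) - prev_depth)
    return depth_jump > depth_thresh or color_similarity(roi_t, roi_p) < sim_thresh
```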
the target repositioning module is responsible for finding the target position again and recovering tracking after the target is lost by the tracker due to shielding, and the shielding is divided into two conditions of short-term shielding and long-term shielding: for short-time shielding, the target motion range is still in the tracker search area, and the target position is directly found by the target positioning module; for long-time shielding, feature points and descriptors in the template RGB image and the current complete RGB image are extracted by using an SIFT detection algorithm, points in a target depth range are screened out and feature point pairs are matched, the calculation time of invalid feature points is saved by using depth information, and the real-time performance required by tracking is guaranteed; then using a Gaussian model to find the most concentrated region of the matching points in the current frame, calculating the position relation between the points in the region and the original target center, and finally calculating the target center of the current frame; after the target is repositioned, the flight control system guides the unmanned aerial vehicle to solve the visual angle deviation under the target loss state, so that the shooting viewpoint of the unmanned aerial vehicle is continuously consistent with the target center.
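A sketch of the long-term-occlusion relocation described above, using OpenCV's SIFT detector; the brute-force matcher, the ratio-test threshold, and the replacement of the Gaussian-model clustering by a simple mean of the depth-screened matches are simplifying assumptions:

```python
import cv2
import numpy as np

def relocate_target(template_rgb, frame_rgb, frame_depth, depth_lo, depth_hi):
    """Match SIFT features between the template image and the current frame,
    keep only keypoints whose depth lies in [depth_lo, depth_hi], and return
    the center of the retained matches as the re-detected target center."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(cv2.cvtColor(template_rgb, cv2.COLOR_BGR2GRAY), None)
    kp_f, des_f = sift.detectAndCompute(cv2.cvtColor(frame_rgb, cv2.COLOR_BGR2GRAY), None)
    if des_t is None or des_f is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_t, des_f, k=2)

    pts = []
    for pair in matches:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance:            # Lowe's ratio test
            x, y = kp_f[m.trainIdx].pt
            d = frame_depth[int(y), int(x)]
            if depth_lo <= d <= depth_hi:             # depth-range screening
                pts.append((x, y))
    if not pts:
        return None
    return tuple(np.mean(np.array(pts), axis=0))      # simplified "densest region"
```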
The invention also provides an RGBD tracking method of the unmanned aerial vehicle tracking shooting system, which comprises the following steps:
s1, operating a system on unmanned aerial vehicle equipment, and entering a storage and service module; if the model is trained in the task scene for the first time, uploading the training data of the task scene through a data storage module, calling a model optimization interface in a model optimization module, and selecting a pre-training model to complete model optimization to obtain a tracking and positioning model for the task; if the model is not trained in the task scene for the first time, directly finding a tracking and positioning model corresponding to the task in a model list provided by a data storage module;
s2, when the unmanned aerial vehicle starts shooting, the data transmission module returns the shot images at a preset rate, an operator marks the initial target position in the first frame image through a human-computer interaction interface and transmits the initial information to the tracking control module, and the tracking control module starts the target tracking algorithm and begins to calculate the target position in subsequently received images; when analyzing the target position of the t-th frame image, two situations need to be distinguished: the target was tracked in the t-1 frame image, or the target in the t-1 frame image is in a completely occluded state;
s3, if the target is tracked in the t-1 th frame image, the tracking control module directly calculates the target center through the target positioning module after receiving the information of the t-1 th frame image; performing RGBD fusion on a target image in the t-1 frame image and a search image in the t frame respectively, inputting a tracking and positioning model under a current task, outputting a response matrix by the tracking and positioning model, and predicting a point with the maximum response value as a target center;
s4, if the target is not tracked in the t-1 th frame image, the tracking control module enters a target repositioning module after receiving the information of the t-1 th frame image; the target repositioning module extracts image information of a t-k frame, wherein k is the number of frames which are not tracked to a target currently, and then calculates the target position of the t frame;
s5, the occlusion detection module carries out complete occlusion judgment on the t-th frame target position predicted by the target positioning module or the target repositioning module; if both the depth information and the color information indicate that the target is not completely occluded, the currently predicted target position is saved as the t-th frame tracking result; if the target is judged to be completely occluded according to either the depth information or the color information, the tracking result of the t-th frame is saved as a no-target state;
s6, the tracking control module generates a flight control instruction according to the target tracking result of the t frame to maintain the tracking flight of the unmanned aerial vehicle; the flight control strategy at the t-th frame time is divided into two cases: if the target is not tracked in the t-th frame, controlling the unmanned aerial vehicle to keep the current speed and state; and if the specific position of the target in the t-th frame is obtained, controlling the unmanned aerial vehicle to change the flying speed or direction according to the principle of keeping the target positioned in the center of the shot picture.
In step S4, the target relocation module calculates the target location into two cases: for the short-time shielding condition, the target repositioning module extracts a target image of a t-k frame, and then the target positioning module calculates the target position of the t frame; and for the long-time shielding condition, the target repositioning module performs characteristic point matching on the RGB image of the t-k frame as a template image and the RGB image of the t frame, and calculates the target position of the t frame according to the matching relation.
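The centering principle of step S6 can be illustrated with a simple proportional rule; the patent does not specify the control law, so the gain and speed limit below are assumptions for illustration only:

```python
def velocity_command(target_xy, frame_size, max_speed=2.0, gain=0.005):
    """Map the pixel offset between the tracked target and the image center
    to lateral/vertical speed set-points so the target stays centered.
    A target that is not tracked is handled by the caller: keep the current
    speed and state, as described in step S6."""
    cx, cy = frame_size[0] / 2.0, frame_size[1] / 2.0
    dx, dy = target_xy[0] - cx, target_xy[1] - cy
    vx = max(-max_speed, min(max_speed, gain * dx))   # clamp to the speed limit
    vy = max(-max_speed, min(max_speed, gain * dy))
    return vx, vy

# Example: a target detected at (900, 400) in a 1280x720 frame yields a
# rightward and slightly upward correction command.
print(velocity_command((900, 400), (1280, 720)))
```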
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the shot object is continuously positioned by utilizing the RGBD target tracking algorithm, so that the unmanned aerial vehicle shooting is separated from manual control, any moving object can be automatically tracked and shot, the labor cost is saved, and convenience is provided for long-term shooting tasks in any scene.
2. Compared with a general two-dimensional model, the image data analysis method provided by the invention adds the distance information from the viewpoint to the observed object by inputting the depth image; the fusion method increases neither the amount of calculation nor the width and depth of the original model, so the tracking performance is improved while the real-time performance of the model is guaranteed.
3. The tracking and positioning model is based on the twin neural network, adopts a mode of pre-training on a large-scale data set and fine-tuning on specific task data, can complete a tracking task based on any shot object without retraining model parameters, and ensures the high efficiency and accuracy of tracking.
4. The invention provides a fast and effective occlusion detection algorithm; the algorithm uses the spatial information of the depth image and the color information of the RGB image together for similarity measurement to complete reliable occlusion detection, and in particular, when the depth image is used for the judgment, no depth segmentation is needed: an auxiliary frame is constructed and the target depth is obtained directly from the distribution of the pixel depth values.
5. The target relocation algorithm is designed aiming at the situation that the target is completely shielded in the complex environment, and the situation that the unmanned aerial vehicle is mistakenly tracked or loses the target due to the fact that the tracked object is shielded in the shooting process can be avoided. Aiming at the difficult long-term shielding condition, the algorithm rapidly relocates the target through an image matching method of SIFT feature points, so that the unmanned aerial vehicle can still perform stable tracking shooting in a complex shielding scene.
Drawings
Fig. 1 is a relationship diagram of each module of the unmanned aerial vehicle shooting system.
Fig. 2 is a schematic structural diagram of a tracking and positioning model.
FIG. 3 is a flow chart of the operation of the tracking control module.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the unmanned aerial vehicle tracking shooting system provided by this embodiment includes a storage and service module, a tracking control module, and a data transmission module.
The storage and service module is used for storing image data and model parameters, providing a uniform model training interface for a tracking task, and completing the updating and optimization of the model parameters in an off-line manner; the method comprises a model pre-training module, a model optimization module and a data storage module, wherein:
the model pre-training module is used for obtaining, from a large-scale data set, a pre-training model that can be used for any task; a tracking and positioning model M based on a twin network is constructed according to task requirements and its model parameters are randomly initialized; the two twin-network branches in the tracking and positioning model M are convolution blocks with a fixed number of layers, the inputs of the two branches are a target image and a search image containing the target area, and the outputs of the two branches are two feature maps F1 and F2 with the same number of channels and different scales, in which the smaller-scale F1 is used as a convolution kernel and convolved with F2 to obtain a single-channel response matrix S; S is mapped into a matrix S' with the same scale as the input image, and the element values of S' represent the probability of the target center appearing at each position; sample-pair images are acquired from any large-scale data set, the sample-pair images do not contain depth information, and training uses normalized RGB images;
the model optimization module is used for fine-tuning the parameters of a pre-training model for a specific task to obtain a tracking and positioning model meeting the task requirements; the model optimization interface provided by the model optimization module automatically loads the task data used for fine-tuning and the pre-training model specified for the corresponding task, the task data comprise depth images and RGB images, and the model optimization process trains the loaded pre-training model with RGBD fused images; the depth image and the RGB image are fused as follows:

I_{R',G',B'} = I_{R,G,B} * k + I_D * (1 - k)    (1)

where I_D denotes the single channel of the depth image, I_{R,G,B} denotes the three color channels of the RGB image, k denotes the channel-fusion weight, and I_{R',G',B'} denotes the three fused color channels, i.e. the RGBD fused image;
the data storage module is used for storing a pre-training model, a specific task model and training data; the data storage module provides a model list, a data list and a calling interface of any model and data, and automatically calls the designated data or model when a training task or a tracking task is required.
The tracking control module tracks a shot object in real time by using a target tracking algorithm, automatically generates a flight control instruction, and realizes unmanned aerial vehicle tracking shooting under a complex environment; the device comprises a target positioning module, a shielding detection module and a target repositioning module, wherein:
the target positioning module is used for positioning a target center in a current shooting frame and is realized by a tracking positioning model constructed by a deep learning framework, the tracking positioning model is a model trained offline in a storage and service module and is specified as a tracking positioning model with a twin network structure for any tracking task, the model inputs RGBD target images tracked by a previous frame and RGBD search images to be detected in the current frame, a response matrix output by the model represents the probability of the target center appearing at each position, a point with the maximum probability is predicted as the target center, and the tracking positioning of the target in the current shooting frame is completed;
the shielding detection module is used for identifying the complete occlusion state of the target; when the shot object is completely occluded, no target feature can be detected in the corresponding region of the picture, and occlusion is judged according to two similarity relations, one based on depth information and one based on color information: the target depth value range is estimated with an auxiliary frame, whose width and height are defined as follows:

(Equations (2) and (3): the auxiliary-frame width w_a and height h_a are defined in terms of the predicted target-frame width w_p and height h_p; these equations appear as images in the original publication.)

where w_a and h_a are the width and height of the auxiliary frame, and w_p and h_p are the width and height of the predicted target frame; the set of depth values that occupy a large proportion of the pixels inside the target frame but a small proportion inside the auxiliary frame is computed, and the value closest to that of the previous frame is selected as the overall depth representation of the target; if the difference between the depth values of the current frame and the previous frame is above a threshold, the current frame is regarded as occluded; a depth-image mask is then established from the depth value range to remove background pixels at other depths, and the color histograms of the previous-frame target and the current-frame predicted target are extracted as feature vectors, whose similarity is computed as follows:
Sim = Σ_{i=1..n} sqrt( (Hist_ti / Num_t) × (Hist_pi / Num_p) )    (4)
where Sim denotes the similarity between image I_t and image I_p, Hist_ti denotes the value of the i-th color histogram bin of I_t, Hist_pi denotes the value of the i-th color histogram bin of I_p, Num_t and Num_p denote the total pixel counts of I_t and I_p used to normalize the histograms, and n denotes the number of color histogram bins; if the similarity Sim is below the threshold, the current frame is regarded as completely occluded, i.e. the target is not tracked in the current frame;
the target repositioning module is responsible for finding the target position again and recovering tracking after the target is lost by the tracker due to shielding, and the shielding is divided into two conditions of short-term shielding and long-term shielding: for short-time shielding, the target motion range is still in the tracker search area, and the target position is directly found by the target positioning module; for long-time shielding, feature points and descriptors in the template RGB image and the current complete RGB image are extracted by using an SIFT detection algorithm, points in a target depth range are screened out and feature point pairs are matched, the calculation time of invalid feature points is saved by using depth information, and the real-time performance required by tracking is guaranteed; then using a Gaussian model to find the most concentrated region of the matching points in the current frame, calculating the position relation between the points in the region and the original target center, and finally calculating the target center of the current frame; after the target is repositioned, the flight control system guides the unmanned aerial vehicle to solve the visual angle deviation in the target loss state, so that the shooting viewpoint of the unmanned aerial vehicle is continuously consistent with the target center.
The data transmission module is used for completing a remote transmission task between the unmanned aerial vehicle and the ground control station, transmitting the RGB image and the depth image collected by the unmanned aerial vehicle back to the ground control station in real time, and transmitting a flight control command generated after the image is analyzed to the unmanned aerial vehicle.
The embodiment also provides an RGBD tracking method of the unmanned aerial vehicle tracking shooting system, which includes the following steps:
s1, operating a system on unmanned aerial vehicle equipment, and entering a storage and service module; if the model is trained in the task scene for the first time, uploading the training data of the task scene through a data storage module, calling a model optimization interface in a model optimization module, and selecting a pre-training model to complete model optimization to obtain a tracking and positioning model for the task; if the model is not trained in the task scene for the first time, the tracking and positioning model corresponding to the task is directly found in a model list provided by a data storage module.
S2, when the unmanned aerial vehicle starts to shoot, the data transmission module returns shot images at a preset speed, an operator marks a target initial position in a first frame of image through a human-computer interaction interface and transmits initial information to the tracking control module, and the tracking control module starts a target tracking algorithm and starts to calculate a target position in a subsequently received image; when analyzing the target position of the t-th frame image, two situations need to be distinguished: the target was tracked in the t-1 frame image, or the target in the t-1 frame image is in a complete occlusion state.
S3, if the target is tracked in the t-1 th frame image, the tracking control module directly calculates the target center through the target positioning module after receiving the information of the t-1 th frame image; and respectively carrying out RGBD fusion on a target image in the t-1 frame image and a search image in the t frame, then inputting a tracking and positioning model under the current task, outputting a response matrix by the tracking and positioning model, and predicting a point with the maximum response value as a target center.
S4, if the target is not tracked in the t-1 th frame image, the tracking control module enters a target repositioning module after receiving the information of the t-1 th frame image; the target repositioning module extracts image information of a t-k frame, wherein k is the number of frames which are not tracked to a target at present, and then calculates the target position of the t frame; the target relocation module calculates the target location in two cases: for the short-time shielding condition, the target repositioning module extracts a target image of the t-k frame, and then the target positioning module calculates the target position of the t frame; and for the long-term shielding condition, the target repositioning module performs characteristic point matching on the RGB image of the t-k frame as a template image and the RGB image of the t frame, and calculates the target position of the t frame according to the matching relation.
S5, the occlusion detection module carries out complete occlusion judgment on the t frame target position predicted by the target positioning module or the target repositioning module; if the depth information and the color information are both expressed as the incomplete target blocking state, the current predicted target position is stored as the t-th frame tracking result; and if the target is completely shielded according to the depth information or the color information, storing the tracking result of the t-th frame as a no-target state.
S6, the tracking control module generates a flight control instruction according to the target tracking result of the t frame and maintains the tracking flight of the unmanned aerial vehicle; the flight control strategy at the t-th frame time is divided into two cases: if the target is not tracked in the t-th frame, controlling the unmanned aerial vehicle to keep the current speed and state; and if the specific position of the target in the t-th frame is obtained, controlling the unmanned aerial vehicle to change the flying speed or direction according to the principle of keeping the target positioned in the center of the shot picture.
The unmanned aerial vehicle tracking and shooting system of the embodiment completes the automatic tracking and shooting task, and comprises the following steps:
1) A tracking and positioning model is trained for the specific task; this is completed by the storage and service module. First, a data set processed into tracking sample pairs is transmitted and, after RGBD image fusion is completed, the tracking and positioning model is trained; alternatively, model training is performed with the built-in model training interface using a specified pre-training model and data set. The trained tracking and positioning model is transmitted to the tracking control module and waits in the storage space to be called.
The tracking and positioning model is shown in fig. 2. The model input sample pair consists of a template image and a search image: the target area is cropped from the most recent image in which the target was tracked and used as the template image, and an area twice the size of the template image is cropped at the same position of the current frame and used as the search image. RGBD fusion is then performed on the template image and the search image respectively: the single-channel depth image is expanded to three channels and added, with weights, to the three channels of the RGB image to obtain the RGBD fused image. The fused sample pair is input to the target tracking and positioning model of the twin structure, the convolution blocks compute two feature maps with the same number of channels, the correlation of the two feature maps is computed to obtain a response map, the response map is mapped to the corresponding position of the original image of the current frame, and the point with the maximum response value is predicted as the target center. During model training, a logistic loss function is used to compute the sample-pair loss, stochastic gradient descent with back-propagation is used to update and optimize the model parameters, training stops when the model converges, and the model parameters are stored on the server and wait to be called.
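The training step just described (logistic loss on the response map, stochastic gradient descent with back-propagation) can be sketched as follows in PyTorch; the backbone layers, patch sizes and hyper-parameters are illustrative assumptions and not the architecture of fig. 2:

```python
import torch
import torch.nn as nn

class SiameseTracker(nn.Module):
    """Toy stand-in for the twin-branch tracking model: a shared convolutional
    backbone applied to the template and search patches, followed by
    cross-correlation of the two feature maps (layer sizes are illustrative)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
        )

    def forward(self, template, search):
        f1 = self.backbone(template)               # (1, C, h, w) template features
        f2 = self.backbone(search)                 # (1, C, H, W) search features
        return nn.functional.conv2d(f2, f1)        # (1, 1, H-h+1, W-w+1) response

model = SiameseTracker()
criterion = nn.BCEWithLogitsLoss()                 # logistic loss on the response map
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# One illustrative optimization step on a random RGBD-fused sample pair.
template = torch.rand(1, 3, 64, 64)
search = torch.rand(1, 3, 128, 128)
response = model(template, search)
labels = torch.zeros_like(response)                # 1 at the true center, 0 elsewhere
labels[..., labels.shape[-2] // 2, labels.shape[-1] // 2] = 1.0
loss = criterion(response, labels)
optimizer.zero_grad()
loss.backward()                                    # back-propagation
optimizer.step()                                   # stochastic gradient descent update
```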
2) When the unmanned aerial vehicle starts shooting, the tracking object is specified through human-computer interaction, and the target position of the initial frame is stored. During flight, the unmanned aerial vehicle acquires the image data of the current moment, which include the depth image captured by the onboard depth camera and the RGB image captured by the optical camera, and the data transmission module transmits the image data to the tracking control module for processing.
3) The tracking control module completes target position prediction on the returned depth image and RGB image data, and generates a control instruction according to the tracking result so that the unmanned aerial vehicle continues the tracking shooting.
The processing flow of the tracking control module is shown in fig. 3 and includes the following steps:
s1, obtaining template image information and current frame image information in a storage space, completing RGBD image fusion by using a depth image and an RGB image respectively, then inputting a tracking and positioning model to obtain a response image, mapping the response image back to an original search image, and taking a point with the maximum response value as a target center position.
S2, occlusion detection is performed on the predicted target, and occlusion is judged according to two similarity relations: several depth values of the target mapped on the depth image are extracted, and the value closest to that of the previous frame is selected as the overall depth representation of the target; if the difference between the depth values of the current frame and the previous frame is above a threshold, the current frame is regarded as occluded. A depth-image mask is established from the depth value range to remove a large number of background pixels, the color histograms of the RGB foreground pixels in the target frame and in the current-frame prediction frame are extracted as feature vectors, the similarity of the feature vectors is computed with the Bhattacharyya coefficient, and if the similarity is below a threshold the frame is regarded as occluded. The occlusion determination result falls into three cases: the target is not occluded, in which case the template is updated and the tracking result is fed back; the target is occluded but not for a long time, in which case it is temporarily recorded as a target-lost state; the target is occluded for a long time, which is handled by the target repositioning module.
S3, if the target is judged to have been occluded for a long time, the target is considered to have left the search range of the tracking and positioning model, and the long-term occlusion handling algorithm in the target repositioning module is started. Feature points and descriptors in the template RGB image and the current complete RGB image are extracted with the SIFT detection algorithm, points within the target depth range are screened out, and feature point pairs are matched. A Gaussian model is used to find the most concentrated region of the matched points in the current frame, the positional relation between the points in this region and the original target center is calculated, and the target center of the current frame is then found to recover tracking.
S4, if the target is completely occluded, state backtracking and template updating are performed and a tracking result of temporary target loss is output; if the target is not occluded, or the target has been relocated after the occlusion ends, the current image information and target position are recorded, the template image is updated to the current target area, and the current target position is output.
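The Gaussian-model step of S3, finding the most concentrated region of the matched feature points, can be sketched as follows; a single 2D Gaussian fit with a Mahalanobis-distance cut is an assumed concrete realization of the "Gaussian model" mentioned above, not a procedure taken from the disclosure:

```python
import numpy as np

def densest_match_region(points: np.ndarray, n_sigma: float = 1.0) -> np.ndarray:
    """Fit a single 2D Gaussian to the matched feature-point coordinates
    (N, 2), keep only points within n_sigma Mahalanobis distance, and
    return the center of that densest region."""
    mean = points.mean(axis=0)
    cov = np.cov(points.T) + 1e-6 * np.eye(2)      # regularize for tiny point sets
    inv_cov = np.linalg.inv(cov)
    d = points - mean
    maha = np.sqrt(np.einsum('ij,jk,ik->i', d, inv_cov, d))
    kept = points[maha <= n_sigma]
    return kept.mean(axis=0) if len(kept) else mean

# Illustrative usage: the offset between this region center and the original
# template-frame target center gives the current-frame target center.
matches_xy = np.array([[210., 118.], [214., 121.], [305., 40.], [212., 119.]])
region_center = densest_match_region(matches_xy)
```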
4) The tracking control module obtains the calculation result and generates a control instruction, and sends the control instruction to the unmanned aerial vehicle, so that the automatic flight and tracking shooting of the unmanned aerial vehicle are realized.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (4)

1. An unmanned aerial vehicle tracking shooting system, comprising:
the storage and service module is used for storing the image data and the model parameters, providing a uniform model training interface for the tracking task and finishing the updating and optimization of the model parameters in an off-line manner;
the tracking control module tracks a shot object in real time by using a target tracking algorithm, automatically generates a flight control instruction and realizes unmanned aerial vehicle tracking shooting in a complex environment;
the data transmission module is used for completing a long-distance transmission task between the unmanned aerial vehicle and the ground control station, transmitting the RGB image and the depth image acquired by the unmanned aerial vehicle back to the ground control station in real time, and transmitting a flight control command generated after the image is analyzed to the unmanned aerial vehicle;
the tracking control module comprises a target positioning module, a shielding detection module and a target repositioning module, wherein:
the target positioning module is used for positioning a target center in a current shooting frame and is realized by a tracking and positioning model constructed by a deep learning framework, the tracking and positioning model is a model trained offline in a storage and service module and is designated as a tracking and positioning model of a twin network structure for any tracking task, the model inputs RGBD target images tracked by a previous frame and RGBD search images to be detected in the current frame, a response matrix output by the model represents the probability of the target center appearing at each position, a point with the maximum probability is predicted as the target center, and the tracking and positioning of the target in the current shooting frame are completed;
the shielding detection module is used for identifying the complete occlusion state of the target; when the shot object is completely occluded, no target feature can be detected in the corresponding region of the picture, and occlusion is judged according to two similarity relations, one based on depth information and one based on color information: the target depth value range is estimated with an auxiliary frame, whose width and height are defined as follows:

(Equations (2) and (3): the auxiliary-frame width w_a and height h_a are defined in terms of the predicted target-frame width w_p and height h_p; these equations appear as images in the original publication.)

where w_a and h_a are the width and height of the auxiliary frame, and w_p and h_p are the width and height of the predicted target frame; the set of depth values that occupy a large proportion of the pixels inside the target frame but a small proportion inside the auxiliary frame is computed, and the value closest to that of the previous frame is selected as the overall depth representation of the target; if the difference between the depth values of the current frame and the previous frame is above a threshold, the current frame is regarded as occluded; a depth-image mask is then established from the depth value range to remove background pixels at other depths, and the color histograms of the previous-frame target and the current-frame predicted target are extracted as feature vectors, whose similarity is computed as follows:
Sim = Σ_{i=1..n} sqrt( (Hist_ti / Num_t) × (Hist_pi / Num_p) )    (4)
where Sim denotes the similarity between image I_t and image I_p, Hist_ti denotes the value of the i-th color histogram bin of I_t, Hist_pi denotes the value of the i-th color histogram bin of I_p, Num_t and Num_p denote the total pixel counts of I_t and I_p used to normalize the histograms, and n denotes the number of color histogram bins; if the similarity Sim is below the threshold, the current frame is regarded as completely occluded, i.e. the target is not tracked in the current frame;
the target repositioning module is responsible for finding the target position again and recovering tracking after the target is lost by the tracker due to shielding, and the shielding is divided into two conditions of short-term shielding and long-term shielding: for short-time shielding, the target motion range is still in the tracker search area, and the target position is directly found by the target positioning module; for long-time shielding, feature points and descriptors in the template RGB image and the current complete RGB image are extracted by using an SIFT detection algorithm, points in a target depth range are screened out and feature point pairs are matched, the calculation time of invalid feature points is saved by using depth information, and the real-time performance required by tracking is guaranteed; then using a Gaussian model to find the most concentrated region of the matching points in the current frame, calculating the position relation between the points in the region and the original target center, and finally calculating the target center of the current frame; after the target is repositioned, the flight control system guides the unmanned aerial vehicle to solve the visual angle deviation in the target loss state, so that the shooting viewpoint of the unmanned aerial vehicle is continuously consistent with the target center.
2. The unmanned aerial vehicle tracking and shooting system of claim 1, wherein: the storage and service module comprises a model pre-training module, a model optimization module and a data storage module, wherein:
the moldThe type pre-training module is used for obtaining a pre-training model which can be used for tasks from a large-scale data set, constructing a tracking and positioning model M based on a twin network according to task requirements, randomly initializing model parameters of the tracking and positioning model M, enabling twin network double branches in the tracking and positioning model M to be rolling blocks with fixed layer numbers, inputting target images and search images containing target areas into the double branches, and outputting two feature maps F with the same channel number and different scales through the double branches 1 And F 2 Wherein the smaller scale of F 1 As convolution kernel with F 2 Completing convolution operation to obtain a single-channel response matrix S, wherein S is mapped into a matrix S 'with the same scale as the input image, and the element value of S' represents the probability of the occurrence of the target center; acquiring a sample pair image in any large-scale data set, wherein the sample pair image does not contain depth information and is trained by using a normalized RGB image;
the model optimization module is used for fine-tuning the parameters of a pre-training model for a specific task to obtain a tracking and positioning model meeting the task requirements; the model optimization interface provided by the model optimization module automatically loads the task data used for fine-tuning and the pre-training model specified for the corresponding task, the task data comprise depth images and RGB images, and the model optimization process trains the loaded pre-training model with RGBD fused images; the depth image and the RGB image are fused as follows:

I_{R',G',B'} = I_{R,G,B} * k + I_D * (1 - k)    (1)

where I_D denotes the single channel of the depth image, I_{R,G,B} denotes the three color channels of the RGB image, k denotes the channel-fusion weight, and I_{R',G',B'} denotes the three fused color channels, i.e. the RGBD fused image;
the data storage module is used for storing a pre-training model, a specific task model and training data; the data storage module provides a model list, a data list and a calling interface of any model and data, and automatically calls the designated data or model when a training task or a tracking task is required.
3. An RGBD tracking method of the unmanned aerial vehicle tracking and shooting system according to any one of claims 1 to 2, characterized by comprising the following steps:
s1, operating a system on unmanned aerial vehicle equipment, and entering a storage and service module; if the model is trained in the task scene for the first time, uploading the training data of the task scene through a data storage module, calling a model optimization interface in a model optimization module, and selecting a pre-training model to complete model optimization to obtain a tracking and positioning model for the task; if the model is not trained in the task scene for the first time, directly finding a tracking and positioning model corresponding to the task in a model list provided by a data storage module;
s2, when the unmanned aerial vehicle starts to shoot, the data transmission module returns shot images at a preset speed, an operator marks a target initial position in a first frame of image through a human-computer interaction interface and transmits initial information to the tracking control module, and the tracking control module starts a target tracking algorithm and starts to calculate a target position in a subsequently received image; when analyzing the target position of the t-th frame image, two situations need to be distinguished: the target was tracked in the t-1 frame image, or the target in the t-1 frame image is in a complete occlusion state;
s3, if the target is tracked in the t-1 th frame of image, the tracking control module directly calculates the target center through the target positioning module after receiving the information of the t-1 th frame of image; performing RGBD fusion on a target image in the t-1 frame image and a search image in the t frame respectively, inputting a tracking and positioning model under a current task, outputting a response matrix by the tracking and positioning model, and predicting a point with the maximum response value as a target center;
s4, if the target is not tracked in the t-1 th frame of image, the tracking control module enters a target repositioning module after receiving the information of the t-1 th frame of image; the target repositioning module extracts image information of a t-k frame, wherein k is the number of frames which are not tracked to a target currently, and then calculates the target position of the t frame;
s5, the occlusion detection module carries out complete occlusion judgment on the t frame target position predicted by the target positioning module or the target repositioning module; if the depth information and the color information are both expressed as the incomplete target blocking state, the current predicted target position is saved as the t-th frame tracking result; if the target is completely shielded according to the depth information or the color information, the tracking result of the t-th frame is saved to be in a no-target state;
s6, the tracking control module generates a flight control instruction according to the target tracking result of the t frame to maintain the tracking flight of the unmanned aerial vehicle; the flight control strategy at the t-th frame time is divided into two cases: if the target is not tracked in the t-th frame, controlling the unmanned aerial vehicle to keep the current speed and state; and if the specific position of the target in the t-th frame is obtained, controlling the unmanned aerial vehicle to change the flying speed or direction according to the principle of keeping the target positioned in the center of the shot picture.
4. The RGBD tracking method of the unmanned aerial vehicle tracking and shooting system according to claim 3, wherein: in step S4, the target relocation module calculates the target location into two cases: for the short-time shielding condition, the target repositioning module extracts a target image of the t-k frame, and then the target positioning module calculates the target position of the t frame; and for the long-time shielding condition, the target repositioning module performs characteristic point matching on the RGB image of the t-k frame as a template image and the RGB image of the t frame, and calculates the target position of the t frame according to the matching relation.
CN202110207735.1A 2021-02-25 2021-02-25 Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof Active CN112927264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110207735.1A CN112927264B (en) 2021-02-25 2021-02-25 Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110207735.1A CN112927264B (en) 2021-02-25 2021-02-25 Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof

Publications (2)

Publication Number Publication Date
CN112927264A CN112927264A (en) 2021-06-08
CN112927264B true CN112927264B (en) 2022-12-16

Family

ID=76171618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110207735.1A Active CN112927264B (en) 2021-02-25 2021-02-25 Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof

Country Status (1)

Country Link
CN (1) CN112927264B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838098B (en) * 2021-09-10 2024-02-09 北京理工大学 Intelligent tracking shooting system for long-distance high-speed moving target
CN116704572A (en) * 2022-12-30 2023-09-05 荣耀终端有限公司 Eye movement tracking method and device based on depth camera
CN116030095B (en) * 2023-02-01 2023-06-20 西南石油大学 Visual target tracking method based on double-branch twin network structure
CN116866719B (en) * 2023-07-12 2024-02-02 山东恒辉软件有限公司 Intelligent analysis processing method for high-definition video content based on image recognition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732559A (en) * 2015-02-02 2015-06-24 大连民族学院 Multi-target detecting and tracking method based on RGB-D data
CN106981073A (en) * 2017-03-31 2017-07-25 中南大学 A kind of ground moving object method for real time tracking and system based on unmanned plane
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111862155A (en) * 2020-07-14 2020-10-30 中国电子科技集团公司第五十四研究所 Unmanned aerial vehicle single vision target tracking method aiming at target shielding

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10185877B2 (en) * 2016-07-08 2019-01-22 Huawei Technologies Co., Ltd. Systems, processes and devices for occlusion detection for video-based object tracking
CN110516705A (en) * 2019-07-19 2019-11-29 平安科技(深圳)有限公司 Method for tracking target, device and computer readable storage medium based on deep learning
CN111241931B (en) * 2019-12-30 2023-04-18 沈阳理工大学 Aerial unmanned aerial vehicle target identification and tracking method based on YOLOv3

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732559A (en) * 2015-02-02 2015-06-24 大连民族学院 Multi-target detecting and tracking method based on RGB-D data
CN106981073A (en) * 2017-03-31 2017-07-25 中南大学 A kind of ground moving object method for real time tracking and system based on unmanned plane
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111862155A (en) * 2020-07-14 2020-10-30 中国电子科技集团公司第五十四研究所 Unmanned aerial vehicle single vision target tracking method aiming at target shielding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Fully-Convolutional Siamese Networks for Object Tracking";Luca Bertinetto et al;《arXiv:1606.09549v2》;20160914;第1-16页 *
"基于彩色-深度图像信息的室内场景语义分割研究";谭冲;《中国优秀硕士学位论文全文数据库信息科技辑》;20210115(第1期);第I138-1653页 *

Also Published As

Publication number Publication date
CN112927264A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN112927264B (en) Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN110675418B (en) Target track optimization method based on DS evidence theory
CN111144364B (en) Twin network target tracking method based on channel attention updating mechanism
US20220366576A1 (en) Method for target tracking, electronic device, and storage medium
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN113807187A (en) Unmanned aerial vehicle video multi-target tracking method based on attention feature fusion
CN109934846A (en) Deep integrating method for tracking target based on time and spatial network
CN110992378B (en) Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot
CN111414931B (en) Multi-branch multi-scale small target detection method based on image depth
CN106650965A (en) Remote video processing method and apparatus
CN110969648A (en) 3D target tracking method and system based on point cloud sequence data
Shreyas et al. 3D object detection and tracking methods using deep learning for computer vision applications
CN109544584B (en) Method and system for realizing inspection image stabilization precision measurement
CN113628246B (en) Twin network target tracking method based on 3D convolution template updating
Kao et al. Moving object segmentation using depth and optical flow in car driving sequences
CN113792593A (en) Underwater close-range target identification and tracking method and system based on depth fusion
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
CN116051601A (en) Depth space-time associated video target tracking method and system
CN116012609A (en) Multi-target tracking method, device, electronic equipment and medium for looking around fish eyes
Delibaşoğlu PESMOD: small moving object detection benchmark dataset for moving cameras
Braun et al. N-QGN: Navigation Map from a Monocular Camera using Quadtree Generating Networks
CN115729250A (en) Flight control method, device and equipment of unmanned aerial vehicle and storage medium
CN112069997A (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
Zhou et al. An anti-occlusion tracking system for UAV imagery based on Discriminative Scale Space Tracker and Optical Flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant