CN110659566A - Target tracking method and system in an occluded state


Info

Publication number
CN110659566A
Authority
CN
China
Prior art keywords
target
picture
occlusion
tracking
candidate
Prior art date
Legal status
Granted
Application number
CN201910754803.9A
Other languages
Chinese (zh)
Other versions
CN110659566B (en)
Inventor
王海华
马福齐
Current Assignee
Chongqing Terminus Technology Co Ltd
Original Assignee
Chongqing Terminus Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Terminus Technology Co Ltd
Priority to CN201910754803.9A
Publication of CN110659566A
Application granted
Publication of CN110659566B
Legal status: Active
Anticipated expiration

Classifications

    • G06V20/42: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items, of sport video content
    • G06F18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06T7/11: Image analysis; region-based segmentation
    • G06T7/136: Image analysis; segmentation or edge detection involving thresholding
    • G06T7/215: Image analysis; motion-based segmentation
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G06T2207/20081: Indexing scheme for image analysis; training or learning
    • G06T2207/20084: Indexing scheme for image analysis; artificial neural networks [ANN]

Abstract

The invention provides a target tracking method for targets in an occluded state, comprising the following steps: S1, selecting a tracking target in the 1st video picture frame, and extracting the picture features of the tracking target in that frame; S2, selecting one or more candidate targets from the continuous video picture frames (frame 2, frame 3, …, frame n); S3, judging whether a candidate target selected in S2 is in an occluded state; S4, when the candidate target is occluded, performing de-occlusion reconstruction on it with a GAN neural network; and S5, when the candidate target is unoccluded, or after its de-occlusion reconstruction, performing general target tracking processing on it and confirming the candidate target in the occluded picture frame as the tracking target. A matching system is designed on the basis of the method. By using a GAN neural network to de-occlude and reconstruct occluded candidate targets, the system keeps target tracking usable in pedestrian and traffic flows.

Description

Target tracking method and system in an occluded state
Technical Field
The invention relates to the technical field of video surveillance, and in particular to a target tracking method and system for targets in an occluded state.
Background
With the continuous expansion of video surveillance systems, target tracking, a basic technology of the video surveillance field, has kept developing and advancing, and is applied in emergency scenarios such as tracking suspicious persons and offending vehicles.
Target tracking means analyzing continuous video picture frames and extracting the same target from them, such as the same person, vehicle or animal; it is a basic technology of the video surveillance field. The general method of target tracking is as follows: first, extract the features of the tracking target, i.e. select the tracking target in the first video picture frame and extract its picture features, such as one or more of color distribution features, texture features and edge features; then extract the features of candidate targets, i.e. select one or more candidate targets in the subsequent continuous picture frames and extract their picture features; finally, compare the picture feature similarity of the tracking target and each candidate target, and if the similarity is greater than a threshold, confirm the candidate target in the subsequent video picture frame as the tracking target.
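As an illustration only, the following Python sketch shows this general pipeline with OpenCV color-histogram features; the function names, the HSV-histogram feature choice and the 0.8 threshold are assumptions made for the example, not values fixed by the patent.

```python
import cv2

def extract_color_histogram(patch, bins=16):
    """Color-distribution feature: a normalized 2-D HSV histogram of the patch."""
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
    cv2.normalize(hist, hist)
    return hist

def is_same_target(target_patch, candidate_patch, threshold=0.8):
    """General tracking decision: the candidate is confirmed as the tracking
    target when the feature similarity (histogram correlation) exceeds the
    threshold."""
    similarity = cv2.compareHist(extract_color_histogram(target_patch),
                                 extract_color_histogram(candidate_patch),
                                 cv2.HISTCMP_CORREL)
    return similarity > threshold, similarity
```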
For the above pipeline, a troublesome problem is that part, or even most, of the tracking target's area may be blocked by other objects; for example, a person or vehicle serving as the tracking target is blocked by other persons or vehicles in pedestrian and traffic flows. Even if a selected candidate target actually is the tracking target, the occluder distorts the extracted picture features, so their similarity to the tracking target's picture features cannot exceed the threshold, which easily causes recognition failure.
At present, target tracking technology can only track people and vehicles whose tracking target appears clearly in the video picture; it cannot track a tracking target that is partially or mostly occluded, yet occlusion of the tracking target is an unavoidable and common phenomenon.
A GAN is a generative adversarial neural network. Compared with other generative models, a GAN uses only back-propagation yet can generate clearer and more realistic samples. It is trained in an unsupervised manner and can be widely used in the fields of unsupervised and semi-supervised learning. For example, when a GAN is applied to scenarios such as image completion, only one reference is needed: a discriminator performs the discrimination and the rest is left to adversarial training, which effectively avoids the difficulty of designing a loss function.
Therefore, how to combine a GAN with target tracking technology to achieve de-occlusion of an occluded tracked target, and thereby make target tracking workable in real pedestrian and traffic flow scenes, is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a target tracking method and system for the occluded state, which use a GAN neural network to perform de-occlusion reconstruction of an occluded candidate target, then extract the picture features of the reconstructed candidate target and compare their similarity with the picture features of the tracked target, thereby completing target tracking of the occluded candidate target.
In order to achieve the above purpose, the invention adopts the following technical solution:
a target tracking method in an occlusion state comprises the following steps:
s1, selecting a tracking target from the 1 st frame of video picture frame, and extracting the picture characteristics of the tracking target in the picture frame;
s2, selecting one or more candidate targets from the continuous video picture frames of the 2 nd frame, the 3 rd frame, the … th frame and the nth frame;
s3, judging whether the candidate target selected in the S2 is in an occlusion state;
s4, when the candidate target is in an occlusion state, carrying out occlusion removing reconstruction on the candidate target by using a GAN neural network;
and S5, when the candidate target is in an unoccluded state or after the candidate target is subjected to the deblocking reconstruction, performing general target tracking processing on the candidate target, and confirming the candidate target in the blocked picture frame as the tracking target.
Preferably, the picture features extracted in S1 include one or more of color distribution features, texture features and edge features; easily distinguishable features are chosen so as to facilitate the similarity comparison between candidate targets and the tracking target.
Preferably, the specific steps of S3 are as follows:
s31, dividing the candidate target picture into a plurality of sub-regions respectively;
s32, extracting the picture characteristics of each sub-area;
s33, calculating the variation of the picture characteristics of each subregion relative to the adjacent subregions;
and S34, judging whether the variation exceeds a mutation threshold value.
When a candidate target is occluded, the picture features of its picture frame, such as color distribution features, texture features and edge features, show abrupt regional changes in their distribution, whereas the picture features of an unoccluded candidate target's picture frame are distributed fairly uniformly, without such mutations. Therefore the candidate target picture is divided into a plurality of sub-regions, for example 10 × 10 sub-regions arranged as a matrix; picture features such as color distribution, texture and edge features are extracted from each sub-region; the variation of each of the 100 sub-regions' picture features relative to its adjacent sub-regions is calculated; and if a variation is larger than the mutation threshold, the candidate target is considered to show a regional mutation and to be in an occluded state.
Preferably, the specific steps of S4 are as follows:
s41, establishing a training sample library, wherein the training sample library comprises a sample target picture in an occlusion state and a sample target picture in a non-occlusion state;
s42, overlapping the sample target picture in the shielding state with the randomly distributed variable, and generating a sample target picture after de-shielding reconstruction by using a generator;
s43, calculating a loss function value of the sample target picture after the occlusion removal reconstruction relative to the sample target picture in an occlusion-free state by a discriminator;
s44, when the loss function value is not in the allowable range, the feedback generator optimizes the loss function value until the sample target picture output by the generator after the occlusion removal reconstruction is judged to be in the allowable range by the discriminator to obtain a trained GAN;
and S45, substituting the candidate target picture in the shielded state into the trained GAN to obtain a target picture after de-shielding reconstruction.
The GAN neural network comprises a convolutional neural network serving as the generator and a convolutional neural network serving as the discriminator. Convolutional neural networks have self-learning capability, associative storage capability and the ability to search for optimal solutions at high speed; the network serving as the discriminator is trained in advance on a large number of discrimination samples and can be used directly. To train the network serving as the generator, a sample library containing sample target pictures in an occluded state and in an unoccluded state is first established. An occluded sample target picture is superposed with a random distribution variable and passed to the generator, which generates a de-occlusion-reconstructed sample target picture. This reconstruction is passed to the discriminator, which compares it against the unoccluded sample target picture and calculates a loss function value. If the loss function value is within the allowable range, the GAN neural network is trained; if not, the result is fed back to the generator for learning optimization, and the process is repeated until the loss function value falls within the allowable range, at which point the training of the generator, and thus of the GAN neural network, is complete.
Preferably, the specific steps of S5 are as follows:
s51, extracting picture features of the picture frame where the candidate target is located;
s52, comparing the similarity of the picture characteristics of the tracking target and the candidate target;
and S53, judging whether the similarity is greater than a threshold value, and if the similarity is greater than the threshold value, determining that the candidate target in the subsequent video picture frame is the tracking target.
A target tracking system in an occluded state comprises: a tracking target picture feature extraction module, a candidate target selection module, an occlusion state judgment module, a de-occlusion reconstruction module and a general target tracking processing module; wherein:
the tracking target picture feature extraction module is used for selecting a tracking target in the first video picture frame and extracting the picture features of the tracking target in that frame;
the candidate target selection module is used for selecting one or more candidate targets from the continuous video picture frames (frame 2, frame 3, …, frame n);
the occlusion state judgment module is used for judging whether a candidate target selected by the candidate target selection module is in an occluded state;
the de-occlusion reconstruction module is used for performing de-occlusion reconstruction on the candidate target with a GAN neural network when the candidate target is in an occluded state;
and the general target tracking processing module is used for performing general target tracking processing on the candidate target when it is in an unoccluded state or after its de-occlusion reconstruction, and confirming the candidate target in the occluded picture frame as the tracking target.
Preferably, the picture features extracted by the tracking target picture feature extraction module include one or more of color distribution features, texture features and edge features.
Preferably, the occlusion state judgment module includes: a region division unit, a sub-region picture feature extraction unit, a variation calculation unit and a judgment unit; wherein:
the region division unit is used for dividing the candidate target picture into a plurality of sub-regions;
the sub-region picture feature extraction unit is used for extracting the picture features of each sub-region;
the variation calculation unit is used for calculating the variation of each sub-region's picture features relative to its adjacent sub-regions;
and the judgment unit is used for judging whether the variation exceeds the mutation threshold.
Preferably, the de-occlusion reconstruction module includes: a training sample library establishment unit, a de-occlusion-reconstructed sample picture generation unit, a loss function value calculation unit, a generator optimization unit and a de-occlusion-reconstructed target picture generation unit; wherein:
the training sample library establishment unit is used for establishing a training sample library comprising sample target pictures in an occluded state and sample target pictures in an unoccluded state;
the de-occlusion-reconstructed sample picture generation unit is used for superposing an occluded sample target picture with a random distribution variable and generating a de-occlusion-reconstructed sample target picture with the generator;
the loss function value calculation unit is used for calculating, with the discriminator, the loss function value of the de-occlusion-reconstructed sample target picture relative to the unoccluded sample target picture;
the generator optimization unit is used for feeding the loss function value back to the generator for optimization when it is not within the allowable range, until the de-occlusion-reconstructed sample target picture output by the generator is judged by the discriminator to be within the allowable range, thereby obtaining a trained GAN;
and the de-occlusion-reconstructed target picture generation unit is used for substituting the occluded candidate target picture into the trained GAN to obtain a de-occlusion-reconstructed target picture.
Preferably, the general target tracking processing module includes: a candidate target picture feature extraction unit, a similarity comparison unit and a result judgment unit; wherein:
the candidate target picture feature extraction unit is used for extracting picture features of a picture frame where the candidate target is located;
the similarity comparison unit is used for comparing the similarity of the picture characteristics of the tracking target and the candidate target;
the result judging unit is used for judging whether the similarity is greater than a threshold value or not, and if the similarity is greater than the threshold value, the candidate target in the subsequent video picture frame is determined to be the tracking target.
The invention has the following beneficial effects:
based on the above technical scheme, based on the prior art, the invention provides a target tracking method in an occlusion state, and a target tracking system in the occlusion state is designed according to the method, and the GAN neural network is used for realizing the de-occlusion reconstruction of the candidate target, so that the occluded candidate target is efficiently restored, the situation that the candidate target cannot be identified due to occlusion is avoided in the target tracking process, the tracking efficiency of the target tracking technology on suspects and hit vehicles in actual people flow and vehicle flow is enhanced, and the target tracking technology has the feasibility of implementation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in their description are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a flow chart of a method of the present invention;
fig. 2 is a block diagram of the system architecture of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the present invention proposes the following method:
a target tracking method in an occlusion state comprises the following steps:
s1, selecting a tracking target from the 1 st frame of video picture frame, and extracting the picture characteristics of the tracking target in the picture frame;
s2, selecting one or more candidate targets from the continuous video picture frames of the 2 nd frame, the 3 rd frame, the … th frame and the nth frame;
s3, judging whether the candidate target selected in the S2 is in an occlusion state;
s4, when the candidate target is in an occlusion state, carrying out occlusion removing reconstruction on the candidate target by using a GAN neural network;
and S5, when the candidate target is in an unoccluded state or after the candidate target is subjected to the deblocking reconstruction, performing general target tracking processing on the candidate target, and confirming the candidate target in the blocked picture frame as the tracking target.
To facilitate the similarity comparison with candidate targets, the easily distinguishable picture features extracted in S1 include one or more of color distribution features, texture features and edge features.
In order to further optimize the above technical features, the specific steps of S3 are as follows:
s31, dividing the candidate target picture into a plurality of sub-regions respectively;
s32, extracting the picture characteristics of each sub-area;
s33, calculating the variation of the picture characteristics of each subregion relative to the adjacent subregions;
and S34, judging whether the variation exceeds a mutation threshold value.
Specifically, when judging whether the candidate targets in the continuous video picture frames (frame 2, frame 3, …, frame n) are occluded, the principle is that if a candidate target is blocked, its picture feature distribution will show regional mutations, while the picture feature distribution of an unoccluded candidate target is generally uniform and without abrupt changes. Based on this principle, the candidate target picture is divided into 10 × 10 sub-regions arranged as a matrix, and the picture features of the 100 sub-regions, one or more of color distribution, texture and edge features, are extracted separately. The variation of each sub-region's picture features relative to its neighbours is then calculated, and if any variation exceeds the mutation threshold, the candidate target is determined to show a regional mutation; it is therefore in an occluded state and de-occlusion reconstruction should be performed.
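A minimal sketch of this judgment follows, assuming a plain NumPy implementation with mean gray level standing in for the color/texture/edge features; apart from the 10 × 10 grid named in the text, the feature choice and the threshold value are illustrative assumptions.

```python
import numpy as np

def is_occluded(candidate_patch, grid=10, mutation_threshold=0.5):
    """S31-S34: split the candidate picture into a grid x grid matrix of
    sub-regions, compute a per-region feature, and report occlusion when any
    region's feature jumps relative to a neighbour by more than the mutation
    threshold."""
    gray = candidate_patch.mean(axis=2) if candidate_patch.ndim == 3 else candidate_patch
    h, w = gray.shape
    feats = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            # Mean intensity of the grid cell, normalized to [0, 1]
            cell = gray[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            feats[i, j] = cell.mean() / 255.0
    # Variation of each sub-region relative to its right and lower neighbours
    dx = np.abs(np.diff(feats, axis=1))
    dy = np.abs(np.diff(feats, axis=0))
    return max(dx.max(), dy.max()) > mutation_threshold
```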
In order to further optimize the above technical features, the specific steps of S4 are as follows:
s41, establishing a training sample library, wherein the training sample library comprises a sample target picture in an occlusion state and a sample target picture in a non-occlusion state;
s42, overlapping the sample target picture in the shielding state with the randomly distributed variable, and generating a sample target picture after de-shielding reconstruction by using a generator;
s43, calculating a loss function value of the sample target picture after the occlusion removal reconstruction relative to the sample target picture in an occlusion-free state by a discriminator;
s44, when the loss function value is not in the allowable range, the feedback generator optimizes the loss function value until the sample target picture output by the generator after the occlusion removal reconstruction is judged to be in the allowable range by the discriminator to obtain a trained GAN;
and S45, substituting the candidate target picture in the shielded state into the trained GAN to obtain a target picture after de-shielding reconstruction.
The GAN neural network consists of a generator and a discriminator, both convolutional neural networks with the self-learning capability of neural networks. The convolutional network used as the discriminator in the invention is a pre-trained network with accurate discrimination capability, so the invention establishes a training sample library to train only the convolutional network used as the generator. The library contains sample target pictures in an occluded state and sample target pictures in an unoccluded state. An occluded sample target picture is superposed with a random distribution variable; the generator generates a de-occlusion-reconstructed sample target picture; the discriminator calculates the loss function value of the reconstruction against the unoccluded sample target picture as a reference and feeds the result back to the generator; and the generator keeps self-learning and training, repeating this process until the calculated loss function value lies within the allowed range. At that point the training of the convolutional network used as the generator is complete, the GAN neural network is trained, and occluded target pictures can be de-occlusion-reconstructed.
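The following PyTorch-style sketch mirrors steps S41-S44 under the assumption stated above that the discriminator is already trained and only the generator is optimized. The generator interface generator(occluded, z), the loss mix (adversarial BCE plus an L1 term against the unoccluded reference) and all hyperparameters are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_generator(generator, discriminator, loader,
                    epochs=50, z_dim=64, tolerance=0.05):
    """Schematic S41-S44 loop. `loader` yields (occluded, clean) picture
    pairs from the sample library (S41); the discriminator is assumed to
    output probabilities in [0, 1]."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    bce = nn.BCELoss()
    loss_g = torch.tensor(float("inf"))
    for _ in range(epochs):
        for occluded, clean in loader:
            # S42: superpose the occluded sample with a random distribution variable
            z = torch.randn(occluded.size(0), z_dim)
            fake = generator(occluded, z)   # de-occlusion-reconstructed picture
            # S43: the discriminator scores the reconstruction; an L1 term
            # anchors it to the unoccluded reference picture
            score = discriminator(fake)
            loss_g = bce(score, torch.ones_like(score)) + F.l1_loss(fake, clean)
            # S44: feed the loss back to the generator and optimize
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
        if loss_g.item() < tolerance:       # loss within the allowed range
            break
    return generator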
In order to further optimize the above technical features, the specific steps of S5 are as follows:
s51, extracting picture features of the picture frame where the candidate target is located;
s52, comparing the similarity of the picture characteristics of the tracking target and the candidate target;
and S53, judging whether the similarity is greater than a threshold value, and if the similarity is greater than the threshold value, determining that the candidate target in the subsequent video picture frame is the tracking target.
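Putting the pieces together, a hypothetical driver for the S1-S5 flow might look like the skeleton below; the five callables are placeholders for the modules described in this specification and are not named in the patent.

```python
def track_in_video(frames, pick_target, pick_candidates,
                   occlusion_check, deocclude, same_target):
    """Skeleton of S1-S5. The five callables stand in for the system's
    modules: S1 target selection, S2 candidate selection, S3 occlusion
    judgment, S4 GAN de-occlusion, and S5 similarity-based confirmation."""
    target_patch = pick_target(frames[0])                  # S1
    confirmed = []
    for index, frame in enumerate(frames[1:], start=2):    # S2: frames 2..n
        for candidate_patch in pick_candidates(frame):
            patch = candidate_patch
            if occlusion_check(patch):                     # S3
                patch = deocclude(patch)                   # S4
            matched, score = same_target(target_patch, patch)  # S5
            if matched:
                confirmed.append((index, candidate_patch, score))
    return confirmed
```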
As shown in FIG. 2, a target tracking system in an occluded state comprises: a tracking target picture feature extraction module 1, a candidate target selection module 2, an occlusion state judgment module 3, a de-occlusion reconstruction module 4 and a general target tracking processing module 5; wherein:
the tracking target picture feature extraction module 1 is used for selecting a tracking target in the first video picture frame and extracting the picture features of the tracking target in that frame;
the candidate target selection module 2 is used for selecting one or more candidate targets from the continuous video picture frames (frame 2, frame 3, …, frame n);
the occlusion state judgment module 3 is used for judging whether a candidate target selected by the candidate target selection module 2 is in an occluded state;
the de-occlusion reconstruction module 4 is configured to perform de-occlusion reconstruction on the candidate target with a GAN neural network when the candidate target is in an occluded state;
and the general target tracking processing module 5 is configured to perform general target tracking processing on the candidate target when it is in an unoccluded state or after its de-occlusion reconstruction, and to confirm the candidate target in the occluded picture frame as the tracking target.
In order to further optimize the above technical features, the picture features extracted by the tracking target picture feature extraction module 1 include one or more of color distribution features, texture features and edge features.
In order to further optimize the above technical features, the occlusion state judgment module 3 includes: a region division unit, a sub-region picture feature extraction unit, a variation calculation unit and a judgment unit; wherein:
the region division unit is used for dividing the candidate target picture into a plurality of sub-regions;
the sub-region picture feature extraction unit is used for extracting the picture features of each sub-region;
the variation calculation unit is used for calculating the variation of each sub-region's picture features relative to its adjacent sub-regions;
and the judgment unit is used for judging whether the variation exceeds the mutation threshold.
In order to further optimize the above technical features, the de-occlusion reconstruction module 4 includes: a training sample library establishment unit, a de-occlusion-reconstructed sample picture generation unit, a loss function value calculation unit, a generator optimization unit and a de-occlusion-reconstructed target picture generation unit; wherein:
the training sample library establishment unit is used for establishing a training sample library comprising sample target pictures in an occluded state and sample target pictures in an unoccluded state;
the de-occlusion-reconstructed sample picture generation unit is used for superposing an occluded sample target picture with a random distribution variable and generating a de-occlusion-reconstructed sample target picture with the generator;
the loss function value calculation unit is used for calculating, with the discriminator, the loss function value of the de-occlusion-reconstructed sample target picture relative to the unoccluded sample target picture;
the generator optimization unit is used for feeding the loss function value back to the generator for optimization when it is not within the allowable range, until the de-occlusion-reconstructed sample target picture output by the generator is judged by the discriminator to be within the allowable range, thereby obtaining a trained GAN;
and the de-occlusion-reconstructed target picture generation unit is used for substituting the occluded candidate target picture into the trained GAN to obtain a de-occlusion-reconstructed target picture.
In order to further optimize the above technical features, the general target tracking processing module 5 includes: a candidate target picture feature extraction unit, a similarity comparison unit and a result judgment unit; wherein:
the candidate target picture feature extraction unit is used for extracting picture features of a picture frame where the candidate target is located;
the similarity comparison unit is used for comparing the similarity of the picture characteristics of the tracking target and the candidate target;
and the result judgment unit is used for judging whether the similarity is greater than the threshold, and if so, confirming the candidate target in the subsequent video picture frame as the tracking target.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A target tracking method in an occluded state, characterized by comprising the following steps:
S1, selecting a tracking target in the 1st video picture frame, and extracting the picture features of the tracking target in that frame;
S2, selecting one or more candidate targets from the continuous video picture frames (frame 2, frame 3, …, frame n);
S3, judging whether a candidate target selected in S2 is in an occluded state;
S4, when the candidate target is in an occluded state, performing de-occlusion reconstruction on the candidate target with a GAN neural network;
and S5, when the candidate target is in an unoccluded state, or after its de-occlusion reconstruction, performing general target tracking processing on the candidate target, and confirming the candidate target in the occluded picture frame as the tracking target.
2. The method according to claim 1, wherein the picture features extracted in S1 include one or more of color distribution features, texture features, and edge features.
3. The method for tracking the target under the occlusion state according to claim 1, wherein the specific steps of S3 are as follows:
s31, dividing the candidate target picture into a plurality of sub-regions respectively;
s32, extracting the picture characteristics of each sub-area;
s33, calculating the variation of the picture characteristics of each subregion relative to the adjacent subregions;
and S34, judging whether the variation exceeds a mutation threshold value.
4. The target tracking method in an occluded state according to claim 1, wherein the specific steps of S4 are as follows:
S41, establishing a training sample library comprising sample target pictures in an occluded state and sample target pictures in an unoccluded state;
S42, superposing an occluded sample target picture with a random distribution variable, and generating a de-occlusion-reconstructed sample target picture with the generator;
S43, calculating, with the discriminator, the loss function value of the de-occlusion-reconstructed sample target picture relative to the unoccluded sample target picture;
S44, when the loss function value is not within the allowable range, feeding it back to the generator for optimization, until the de-occlusion-reconstructed sample target picture output by the generator is judged by the discriminator to be within the allowable range, thereby obtaining a trained GAN;
and S45, substituting the occluded candidate target picture into the trained GAN to obtain a de-occlusion-reconstructed target picture.
5. The method for tracking the target under the occlusion state according to claim 1, wherein the specific steps of S5 are as follows:
s51, extracting picture features of the picture frame where the candidate target is located;
s52, comparing the similarity of the picture characteristics of the tracking target and the candidate target;
and S53, judging whether the similarity is greater than a threshold value, and if the similarity is greater than the threshold value, determining that the candidate target in the subsequent video picture frame is the tracking target.
6. A target tracking system in an occluded state, comprising: a tracking target picture feature extraction module (1), a candidate target selection module (2), an occlusion state judgment module (3), a de-occlusion reconstruction module (4) and a general target tracking processing module (5); wherein:
the tracking target picture feature extraction module (1) is used for selecting a tracking target in the first video picture frame and extracting the picture features of the tracking target in that frame;
the candidate target selection module (2) is used for selecting one or more candidate targets from the continuous video picture frames (frame 2, frame 3, …, frame n);
the occlusion state judgment module (3) is used for judging whether a candidate target selected by the candidate target selection module (2) is in an occluded state;
the de-occlusion reconstruction module (4) is used for performing de-occlusion reconstruction on the candidate target with a GAN neural network when the candidate target is in an occluded state;
and the general target tracking processing module (5) is used for performing general target tracking processing on the candidate target when it is in an unoccluded state or after its de-occlusion reconstruction, and confirming the candidate target in the occluded picture frame as the tracking target.
7. The target tracking system in an occluded state according to claim 6, wherein the picture features extracted by the tracking target picture feature extraction module (1) include one or more of color distribution features, texture features and edge features.
8. The target tracking system in an occluded state according to claim 6, wherein the occlusion state judgment module (3) comprises: a region division unit, a sub-region picture feature extraction unit, a variation calculation unit and a judgment unit; wherein:
the region division unit is used for dividing the candidate target picture into a plurality of sub-regions;
the sub-region picture feature extraction unit is used for extracting the picture features of each sub-region;
the variation calculation unit is used for calculating the variation of each sub-region's picture features relative to its adjacent sub-regions;
and the judgment unit is used for judging whether the variation exceeds the mutation threshold.
9. The target tracking system in an occluded state according to claim 6, wherein the de-occlusion reconstruction module (4) comprises: a training sample library establishment unit, a de-occlusion-reconstructed sample picture generation unit, a loss function value calculation unit, a generator optimization unit and a de-occlusion-reconstructed target picture generation unit; wherein:
the training sample library establishment unit is used for establishing a training sample library comprising sample target pictures in an occluded state and sample target pictures in an unoccluded state;
the de-occlusion-reconstructed sample picture generation unit is used for superposing an occluded sample target picture with a random distribution variable and generating a de-occlusion-reconstructed sample target picture with the generator;
the loss function value calculation unit is used for calculating, with the discriminator, the loss function value of the de-occlusion-reconstructed sample target picture relative to the unoccluded sample target picture;
the generator optimization unit is used for feeding the loss function value back to the generator for optimization when it is not within the allowable range, until the de-occlusion-reconstructed sample target picture output by the generator is judged by the discriminator to be within the allowable range, thereby obtaining a trained GAN;
and the de-occlusion-reconstructed target picture generation unit is used for substituting the occluded candidate target picture into the trained GAN to obtain a de-occlusion-reconstructed target picture.
10. The target tracking system in an occluded state according to claim 6, wherein the general target tracking processing module (5) comprises: a candidate target picture feature extraction unit, a similarity comparison unit and a result judgment unit; wherein:
the candidate target picture feature extraction unit is used for extracting the picture features of the picture frame where the candidate target is located;
the similarity comparison unit is used for comparing the similarity of the picture features of the tracking target and the candidate target;
and the result judgment unit is used for judging whether the similarity is greater than the threshold, and if so, confirming the candidate target in the subsequent video picture frame as the tracking target.
CN201910754803.9A, filed 2019-08-15: Target tracking method and system in an occluded state. Status: Active; granted as CN110659566B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910754803.9A 2019-08-15 2019-08-15 Target tracking method and system in an occluded state (granted as CN110659566B)

Publications (2)

Publication Number Publication Date
CN110659566A 2020-01-07
CN110659566B 2020-12-18

Family

ID=69037498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910754803.9A (filed 2019-08-15) Target tracking method and system in an occluded state; Active; granted as CN110659566B

Country Status (1)

Country Link
CN (1) CN110659566B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887588A (en) * 2010-08-04 2010-11-17 中国科学院自动化研究所 Appearance block-based occlusion handling method
KR101492059B1 (en) * 2013-05-31 2015-02-11 전자부품연구원 Real Time Object Tracking Method and System using the Mean-shift Algorithm
CN103456030A (en) * 2013-09-08 2013-12-18 西安电子科技大学 Target tracking method based on scattering descriptor
CN105139418A (en) * 2015-08-04 2015-12-09 山东大学 Novel video tracking method based on partitioning policy
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
CN106204638A (en) * 2016-06-29 2016-12-07 西安电子科技大学 A kind of based on dimension self-adaption with the method for tracking target of taking photo by plane blocking process
CN107730458A (en) * 2017-09-05 2018-02-23 北京飞搜科技有限公司 A kind of fuzzy facial reconstruction method and system based on production confrontation network
CN108205659A (en) * 2017-11-30 2018-06-26 深圳市深网视界科技有限公司 Face occluder removes and its method, equipment and the medium of model construction
CN107909061A (en) * 2017-12-07 2018-04-13 电子科技大学 A kind of head pose tracks of device and method based on incomplete feature
CN108549905A (en) * 2018-04-09 2018-09-18 上海方立数码科技有限公司 A kind of accurate method for tracking target under serious circumstance of occlusion
CN109145745A (en) * 2018-07-20 2019-01-04 上海工程技术大学 A kind of face identification method under circumstance of occlusion
CN109711283A (en) * 2018-12-10 2019-05-03 广东工业大学 A kind of joint doubledictionary and error matrix block Expression Recognition algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AGRIM GUPTA et al.: "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", arXiv *
姚乃明 et al.: "Robust facial expression recognition based on generative adversarial networks", Acta Automatica Sinica (自动化学报) *
常发亮 et al.: "Research on visual object tracking methods under occlusion", Control and Decision (控制与决策) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489086A (en) * 2020-12-11 2021-03-12 北京澎思科技有限公司 Target tracking method, target tracking device, electronic device, and storage medium
CN113111823A (en) * 2021-04-22 2021-07-13 广东工业大学 Abnormal behavior detection method and related device for building construction site


Similar Documents

Publication Publication Date Title
Cho et al. A neural-based crowd estimation by hybrid global learning algorithm
KR102155182B1 (en) Video recording method, server, system and storage medium
US9189867B2 (en) Adaptive image processing apparatus and method based in image pyramid
CN110659566B (en) Target tracking method and system in shielding state
CN103824066A (en) Video stream-based license plate recognition method
CN112767711B (en) Multi-class multi-scale multi-target snapshot method and system
Casasent et al. Real, imaginary, and clutter Gabor filter fusion for detection with reduced false alarms
CN111914665B (en) Face shielding detection method, device, equipment and storage medium
CN110765841A (en) Group pedestrian re-identification system and terminal based on mixed attention mechanism
CN111753651A (en) Subway group abnormal behavior detection method based on station two-dimensional crowd density analysis
GB2409031A (en) Face detection
Dwivedi et al. Weapon classification using deep convolutional neural network
CN111753732A (en) Vehicle multi-target tracking method based on target center point
US20070104373A1 (en) Method for automatic key posture information abstraction
CN112417955A (en) Patrol video stream processing method and device
CN115331141A (en) High-altitude smoke and fire detection method based on improved YOLO v5
CN108921147B (en) Black smoke vehicle identification method based on dynamic texture and transform domain space-time characteristics
CN109685062A (en) A kind of object detection method, device, equipment and storage medium
CN112307895A (en) Crowd gathering abnormal behavior detection method under community monitoring scene
CN112001387B (en) Method and device for determining focusing area, terminal and storage medium
Suba et al. Violence detection for surveillance systems using lightweight CNN models
CN113657169A (en) Gait recognition method, device, system and computer readable storage medium
CN113938671A (en) Image content analysis method and device, electronic equipment and storage medium
CN104751489A (en) Grid-based relay tracking method and device in online class
CN112560825B (en) Face detection method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant