CN112614150A - Off-line pedestrian tracking method, system and storage medium based on dual-model interactive semi-supervised learning

Info

Publication number: CN112614150A
Application number: CN202011511434.XA
Authority: CN
Inventors: 郑伟诗; 陈柏高
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2021-04-06

Abstract

The invention discloses an offline pedestrian tracking method, system and storage medium based on dual-model interactive semi-supervised learning. The method comprises the following steps: selecting two neural-network-based pedestrian tracking models and performing supervised training on them with labeled training data; predicting pseudo labels and refining them by offline interpolation optimization; learning with an interactive semi-supervised method; and making the final prediction and outputting the result. The offline interpolation optimization uses the complete video to interpolate and correct the broken parts of pedestrian trajectories, so that trajectories break less often and are less affected by pedestrian occlusion. The invention also uses the two models to perform semi-supervised self-learning on the test data, so that the models gradually adapt to the test data; performance improves greatly after several rounds of iteration and remains good on scenes the models have not seen.

Description

Off-line pedestrian tracking method, system and storage medium based on dual-model interactive semi-supervised learning

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to an offline pedestrian tracking method, system and storage medium based on dual-model interactive semi-supervised learning.

Background

Pedestrian tracking is a core part of human-centered video analysis and a prerequisite for many important downstream applications, such as pedestrian search, behavior recognition and event analysis. Pedestrian tracking generally comprises two parts. The first detects pedestrians in a single-frame image, that is, locates every pedestrian in the picture and outputs a bounding box for each. The second performs data association across all bounding boxes of the same pedestrian in consecutive adjacent frames, that is, uses pedestrian re-identification (Person Re-ID) to form the trajectory belonging to that pedestrian.

Depending on how these two parts are implemented, existing pedestrian tracking techniques fall into two groups. The first is the "two-step" approach: pedestrian detection is performed on the picture first, and local image features of each pedestrian are then extracted for Re-ID. The second is the "one-shot" approach: a multi-task learning structure lets a single network perform pedestrian detection and Re-ID feature extraction at the same time, which gives a faster inference speed.

Trajectories produced by existing pedestrian tracking techniques are often broken, with undetected frames in the middle. Real scenes are complex and pedestrian occlusion is severe: often only the upper body, head or legs of a pedestrian is visible, and pedestrian detection is also sensitive to changes in illumination and posture. Most existing techniques are not optimized for occlusion in real, complex scenes and depend on the bounding boxes supplied by the pedestrian detection part. Moreover, almost all of them are online tracking techniques: they output the result for the current frame using only information from the current and past frames, and do not use subsequent video information to correct that result.

Meanwhile, almost all pedestrian tracking techniques use basic transfer learning: a model is pre-trained on a large data set, simple transfer learning is then performed on partially labeled test data or real industrial data, and the model is finally used for business prediction on unlabeled real data. However, the data used to train a pedestrian tracking model often differs greatly (in illumination, background, viewing angle, pedestrian characteristics and so on) from the business data of real scenes the model has not seen, and performance is often poor when only simple transfer learning is used. Almost no existing pedestrian tracking technique addresses this "data set adaptation" problem, in which a model performs well on the training set but poorly on the test set.

Disclosure of Invention

The invention mainly aims to overcome the defects in the prior art and provide an offline pedestrian tracking method, an offline pedestrian tracking system and a storage medium based on dual-model interactive semi-supervised learning.

In order to achieve the purpose, the invention adopts the following technical scheme:

The invention provides an offline pedestrian tracking method based on dual-model interactive semi-supervised learning, which comprises the following steps:

S1. Selecting two neural-network-based pedestrian tracking models, and performing supervised training on them with labeled training data until the models basically fit the training data, to obtain initial model weights F_T1 and F_T2 that predict well on the training data set;

S2. Letting the iterative models be F_S1 = F_T1 and F_S2 = F_T2;

S3. Using the iterative models F_S1 and F_S2 to predict the unlabeled test data directly, and refining the results with the offline interpolation optimization method to obtain the outputs of F_S1 and F_S2, namely pseudo label 1 and pseudo label 2; the offline interpolation optimization method splices broken pedestrian trajectories and recovers the bounding boxes of trajectories whose intermediate frames were lost;

S4. Learning with the interactive semi-supervised method: using pseudo label 1 output by F_S1 as the training label of the unlabeled test data to obtain pseudo-label data 1, and mixing pseudo-label data 1 with the labeled training data as the training data for retraining F_S2; conversely, combining pseudo label 2 output by F_S2 with the unlabeled test data to obtain pseudo-label data 2, and mixing pseudo-label data 2 with the labeled training data as the training data for training F_S1; and training F_S1 and F_S2 in this way through repeated loop iterations;

S5. Using the iterated models F_S1 and F_S2 to predict the unlabeled test data to obtain the final output result.
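
Steps S1 to S5 can be sketched as a short training loop. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `Model` interface with `fit` and `predict`, the `refine` callable standing in for the offline interpolation optimization, and the `rounds` parameter are hypothetical names introduced here.

```python
from typing import Callable, List, Protocol, Sequence, Tuple

Sample = Tuple[object, object]  # (input, label); concrete types are placeholders


class Model(Protocol):
    """Hypothetical tracker interface assumed for this sketch."""

    def fit(self, data: Sequence[Sample]) -> None: ...

    def predict(self, inputs: Sequence[object]) -> List[object]: ...


def cross_teach(
    model_1: Model,
    model_2: Model,
    labeled: Sequence[Sample],
    unlabeled: Sequence[object],
    refine: Callable[[List[object]], List[object]],
    rounds: int = 3,
) -> Tuple[List[object], List[object]]:
    # S1/S2: supervised training yields the initial weights, which then
    # serve as the two iterative models.
    model_1.fit(labeled)
    model_2.fit(labeled)
    for _ in range(rounds):
        # S3: predict the unlabeled data and refine the predictions.
        pseudo_1 = refine(model_1.predict(unlabeled))
        pseudo_2 = refine(model_2.predict(unlabeled))
        # S4: each model trains on the labeled data mixed with the
        # pseudo-labeled data produced by the OTHER model.
        model_2.fit(list(labeled) + list(zip(unlabeled, pseudo_1)))
        model_1.fit(list(labeled) + list(zip(unlabeled, pseudo_2)))
    # S5: final prediction with the iterated models.
    return model_1.predict(unlabeled), model_2.predict(unlabeled)
```

Both pseudo-label sets are computed before either model is retrained in a round, so each model learns from the other's output of the same round; whether the method updates the models in this order is an assumption of the sketch.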

As a preferred technical solution, in step S3, the offline interpolation optimization method includes the steps of:

(1) judging which pedestrian trajectories are valid, and retaining the valid trajectories to take part in the interpolation optimization process;

(2) judging, for each break in each valid pedestrian trajectory, whether the interval between the frame numbers before and after the break satisfies the interpolation optimization condition, and performing the interpolation optimization process where it does.

As a preferred technical solution, in step S3, in the offline interpolation optimization method, the specific judgment of the effective trajectory is as follows:

for a pedestrian trajectory, when the total number of frames N of the trajectory is greater than a set minimum trajectory frame count threshold N_min, and the number of frames in the trajectory whose confidence is above the minimum confidence threshold thr_conf,

N_t = |{conf_t | conf_t > thr_conf, t = 1, 2, 3, ..., N}|,

is greater than the minimum qualified frame count threshold N_val, the trajectory is considered valid; otherwise it is ignored.
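
The validity test above can be written as a small function. This is a sketch that assumes one confidence value per frame of the trajectory; the function and parameter names are illustrative only.

```python
from typing import Sequence


def is_valid_track(
    confidences: Sequence[float],
    n_min: int,
    thr_conf: float,
    n_val: int,
) -> bool:
    """Return True when the trajectory passes both thresholds."""
    n_total = len(confidences)  # N: total number of frames in the trajectory
    n_t = sum(1 for c in confidences if c > thr_conf)  # N_t: confident frames
    return n_total > n_min and n_t > n_val
```

Both comparisons are strict, matching the "greater than" wording of the condition.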

As a preferred technical solution, in step S3, in the offline interpolation optimization method, the interpolation optimization conditions specifically include:

for each break in each valid pedestrian trajectory, let f_t be the frame number of the last frame of the continuous segment before the break and f_{t+1} be the frame number of the first frame of the continuous segment after the break, and judge whether the frame-number gap (f_{t+1} - f_t) satisfies:

1 < (f_{t+1} - f_t) < δ_max,

where δ_max is a preset maximum interval length. If the inequality is satisfied, the break meets the interpolation optimization condition and is interpolated; otherwise the break does not meet the condition and is not interpolated.
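
As an illustration of this condition, the sketch below lists the breaks of a trajectory that qualify for interpolation. It assumes the trajectory is given as the list of frame numbers in which the pedestrian was detected; the function name is hypothetical.

```python
from typing import List, Tuple


def interpolable_gaps(frames: List[int], delta_max: int) -> List[Tuple[int, int]]:
    """Return (f_t, f_t1) pairs whose gap satisfies 1 < f_t1 - f_t < delta_max."""
    ordered = sorted(frames)
    return [
        (f_t, f_t1)
        for f_t, f_t1 in zip(ordered, ordered[1:])
        if 1 < f_t1 - f_t < delta_max
    ]
```

A gap of exactly 1 means the frames are adjacent and nothing is missing, while a gap of δ_max or more is left alone.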

As a preferred technical solution, in step S3, the interpolation optimization process of the offline interpolation optimization method supplements the pedestrian bounding-box coordinates lost at the break; for each frame f with f_t < f < f_{t+1}, the corresponding pedestrian bounding box b is calculated as follows:

wherein b = [x^min, y^min, x^max, y^max], and x^min, y^min, x^max, y^max respectively denote the minimum and maximum values of the pedestrian bounding box on the x and y coordinate axes; b_{t+1} is the pedestrian bounding box of the first frame of the continuous segment after the break; b_t is the pedestrian bounding box of the last frame of the continuous segment before the break;

the offline interpolation optimization process is mainly used to improve the output quality of F_S1 and F_S2 and the robustness of the semi-supervised learning process.
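
The calculation formula for b does not appear in this text. One reading consistent with the definitions of b, b_t, b_{t+1}, f, f_t and f_{t+1} is plain linear interpolation between the two known bounding boxes, applied to each coordinate. This is an assumption, not a transcription of the original formula:

```latex
b = b_t + \frac{f - f_t}{f_{t+1} - f_t}\,\left(b_{t+1} - b_t\right),
\qquad f_t < f < f_{t+1}
```

Under this reading, with f_t = 10, f_{t+1} = 14, b_t = [0, 0, 10, 20] and b_{t+1} = [8, 4, 18, 24], frame f = 11 would receive b = [2, 1, 12, 21].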

Preferably, in step S4, performing loop-iteration training on F_S1 and F_S2 means repeatedly using the outputs of F_S1 and F_S2 to guide each other's learning and training process, so that each model absorbs the other's strengths and the two improve together.

As a preferred technical solution, in step S5, the final output result may adopt one of the results in the dual model as the final prediction.

The invention also provides an off-line pedestrian tracking system based on dual-model interactive semi-supervised learning, which comprises a model pre-training module, a pseudo label prediction module, an interactive semi-supervised learning module and a final prediction and output module;

the model pre-training module performs supervised training on the two selected neural-network-based pedestrian tracking models with labeled training data until the models basically fit the training data, obtaining initial model weights F_T1 and F_T2 that predict well on the training data set, and the iterative models F_S1 and F_S2;

the pseudo label prediction module uses the iterative models F_S1 and F_S2 to predict the unlabeled test data and refines the results with the offline interpolation optimization method to obtain the outputs of F_S1 and F_S2, namely pseudo label 1 and pseudo label 2;

the interactive semi-supervised learning module executes the interactive semi-supervised learning method, specifically: using pseudo label 1 output by F_S1 as the training label of the unlabeled test data to obtain pseudo-label data 1, and mixing pseudo-label data 1 with the labeled training data as the training data for retraining F_S2; conversely, combining pseudo label 2 output by F_S2 with the unlabeled test data to obtain pseudo-label data 2, and mixing pseudo-label data 2 with the labeled training data as the training data for training F_S1; and training F_S1 and F_S2 in this way through repeated loop iterations;

the final prediction and output module uses the iterated models F_S1 and F_S2 to predict the unlabeled test data to obtain the final output result.

The invention also provides a storage medium storing a program which, when executed by a processor, implements the offline pedestrian tracking method based on dual-model interactive semi-supervised learning.

Compared with the prior art, the invention has the following advantages and beneficial effects:

(1) The invention uses an offline interpolation optimization method to interpolate and fill in frames whose bounding boxes were lost. This addresses the limitation that prior techniques are generally online, rely only on information from the current and past frames, and cannot use subsequent information to correct a trajectory; as a result, pedestrian trajectories break less often and are less affected by pedestrian occlusion.

(2) The invention uses a semi-supervised learning method that performs multiple rounds of semi-supervised learning on data from unseen scenes, so that the models partially learn the patterns of those scenes. This addresses the limitation that prior techniques use simple transfer learning and therefore always make their final predictions on unfamiliar scenes, with generally unstable results and a large drop in performance at test time; as a result, the models keep good performance and output better results on scenes they have not seen.

Drawings

FIG. 1 is a flowchart of an offline pedestrian tracking method based on dual-model interactive semi-supervised learning according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an offline pedestrian tracking system based on dual-model interactive semi-supervised learning according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Examples

As shown in fig. 1, the offline pedestrian tracking method based on dual-model interactive semi-supervised learning provided by the present invention includes the following training processes:

S1. Selecting two well-performing neural-network-based pedestrian tracking models, and performing supervised training on them with labeled training data until the models basically fit the training data, to obtain initial model weights F_T1 and F_T2 that predict well on the training data set;

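S2. Letting the iterative models be F_S1 = F_T1 and F_S2 = F_T2;

S3. Using the iterative models F_S1 and F_S2 to predict the unlabeled test data directly, and refining the results with the offline interpolation optimization method to obtain the outputs of F_S1 and F_S2, namely pseudo label 1 and pseudo label 2;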

S4. Interactive semi-supervised learning: using pseudo label 1 output by F_S1 as the training label of the unlabeled test data to obtain pseudo-label data 1, and mixing pseudo-label data 1 with the labeled training data as the training data for retraining F_S2; conversely, combining pseudo label 2 output by F_S2 with the unlabeled test data to obtain pseudo-label data 2, and mixing pseudo-label data 2 with the labeled training data as the training data for training F_S1; and repeating 3 to 4 times this loop-iteration training, in which the outputs of F_S1 and F_S2 guide each other's learning and training process, to improve the output quality of F_S1 and F_S2 and the robustness of the semi-supervised learning process;

S5. Using the iterated models F_S1 and F_S2 to predict the unlabeled test data, wherein the final output may be the result of one of the two models, or a result obtained by fusing the results of the two models.

The offline interpolation optimization method in step S3 splices broken pedestrian trajectories and recovers bounding boxes lost in intermediate frames. It reduces the trajectory breaks caused by pedestrian occlusion or unstable detection, and it also plays an important role in the interactive semi-supervised training method, effectively improving the robustness of the semi-supervised learning. The specific steps of the offline interpolation optimization method are as follows:

(1) Judging valid trajectories. For a pedestrian trajectory, when the total number of frames N of the trajectory is greater than a set minimum trajectory frame count threshold N_min, and the number of frames in the trajectory whose confidence is above the minimum confidence threshold thr_conf,

N_t = |{conf_t | conf_t > thr_conf, t = 1, 2, 3, ..., N}|,

is greater than the minimum qualified frame count threshold N_val, the trajectory is considered valid; otherwise it is ignored. The minimum trajectory frame count threshold N_min is generally set to the frame rate of the video, for example N_min = 30 for a video at 30 frames per second; the minimum qualified frame count threshold N_val is generally set to 5;

(2) Retaining the valid pedestrian trajectories so that they take part in the interpolation optimization process;

(3) Judging, for each break in each valid pedestrian trajectory, whether the interval between the frame numbers before and after the break satisfies the interpolation optimization condition. For each break, let f_t be the frame number of the last frame of the continuous segment before the break and f_{t+1} be the frame number of the first frame of the continuous segment after the break, and judge whether the frame-number gap (f_{t+1} - f_t) satisfies:

1 < (f_{t+1} - f_t) < δ_max,

where δ_max is a set maximum interval length, generally set to N_min/3, that is, δ_max = 10 when the video frame rate is 30;

If the inequality is satisfied, the break meets the interpolation optimization condition and is interpolated; otherwise the break does not meet the condition and is not interpolated;

(4) Interpolation optimization process. For each break that meets the interpolation optimization condition, the interpolation optimization process supplements the missing pedestrian bounding-box coordinates; for each frame f with f_t < f < f_{t+1}, the corresponding pedestrian bounding box b is calculated as follows:

wherein b = [x^min, y^min, x^max, y^max], and x^min, y^min, x^max, y^max respectively denote the minimum and maximum values of the pedestrian bounding box on the x and y coordinate axes; b_{t+1} is the pedestrian bounding box of the first frame of the continuous segment after the break; b_t is the pedestrian bounding box of the last frame of the continuous segment before the break.
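
Steps (3) and (4) can be sketched together as a gap-filling routine. Several things here are assumptions made for illustration: the trajectory is a mapping from frame number to bounding box, it has already passed the validity check of step (1), δ_max is derived from the frame rate with integer division, and the missing boxes are filled by linear interpolation between b_t and b_{t+1}, since the calculation formula itself is not reproduced in this text.

```python
from typing import Dict, List

Box = List[float]  # [x_min, y_min, x_max, y_max]


def fill_gaps(track: Dict[int, Box], fps: int = 30) -> Dict[int, Box]:
    """Return a copy of `track` (frame number -> box) with short gaps filled."""
    delta_max = fps // 3  # N_min / 3, with N_min set to the frame rate
    filled = dict(track)
    frames = sorted(track)
    for f_t, f_t1 in zip(frames, frames[1:]):
        gap = f_t1 - f_t
        if not 1 < gap < delta_max:
            continue  # adjacent frames, or a gap too long to interpolate
        b_t, b_t1 = track[f_t], track[f_t1]
        for f in range(f_t + 1, f_t1):
            w = (f - f_t) / gap  # assumed linear weight between the two boxes
            filled[f] = [a + w * (b - a) for a, b in zip(b_t, b_t1)]
    return filled
```

For example, a trajectory detected in frames 10, 14 and 40 of a 30 frames-per-second video would have frames 11 to 13 filled in, while the gap between frames 14 and 40 exceeds δ_max = 10 and is left broken.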

Step S1 yields two reasonably reliable initial models, so the pseudo labels first output for the unlabeled data already have some reference value. The two models will also inevitably show different strengths and weaknesses on different videos, and guiding each other lets each model learn the other's strengths, so that a model is not easily degraded by the added training data. Finally, because the outputs of both iterative models are refined by interpolation optimization, the models' performance is kept improving from round to round and the robustness of the semi-supervised learning is increased. After several iterations, both models improve considerably over the initial models.

Furthermore, the dual-model interactive semi-supervised learning method can be extended to a multi-model semi-supervised learning method: when the initial models are trained in the first step, three or more pedestrian tracking models of varying quality may be used for initialization, and in the subsequent interactive semi-supervised guided training the pseudo labels output by the models may be assigned among them at random.
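
One possible way to assign pseudo labels at random among several models is sketched below. The text does not specify the assignment scheme; this sketch assumes that in each round every model is trained on the pseudo labels of one randomly chosen other model and never on its own, carrying over the dual-model arrangement. The function name is hypothetical.

```python
import random
from typing import Dict, Optional


def assign_teachers(
    n_models: int, rng: Optional[random.Random] = None
) -> Dict[int, int]:
    """Map each student model index to a randomly chosen teacher index."""
    if n_models < 2:
        raise ValueError("need at least two models")
    rng = rng or random.Random()
    return {
        # A model is never assigned its own pseudo labels (assumption).
        student: rng.choice([m for m in range(n_models) if m != student])
        for student in range(n_models)
    }
```

With two models this reduces to the dual-model case, where each model always learns from the other.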

As shown in fig. 2, the present embodiment further provides an offline pedestrian tracking system based on dual-model interactive semi-supervised learning, which includes a model pre-training module, a pseudo label prediction module, an interactive semi-supervised learning module, and a final prediction and output module;

It should be noted that the system provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above function allocation may be completed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to complete all or part of the above described functions.

As shown in fig. 3, this embodiment further provides a storage medium storing a program, where the program, when executed by a processor, implements an offline pedestrian tracking method based on dual-model interactive semi-supervised learning, specifically:

selecting two well-performing neural-network-based pedestrian tracking models, and performing supervised training on them with labeled training data until the models basically fit the training data, to obtain initial model weights F_T1 and F_T2 that predict well on the training data set;

letting the iterative models be F_S1 = F_T1 and F_S2 = F_T2;

using the iterative models F_S1 and F_S2 to predict the unlabeled test data directly, and refining the results with the offline interpolation optimization method to obtain the outputs of F_S1 and F_S2, namely pseudo label 1 and pseudo label 2; the offline interpolation optimization method splices broken pedestrian trajectories and recovers the bounding boxes of trajectories whose intermediate frames were lost;

learning with the interactive semi-supervised method: using pseudo label 1 output by F_S1 as the training label of the unlabeled test data to obtain pseudo-label data 1, and mixing pseudo-label data 1 with the labeled training data as the training data for retraining F_S2; conversely, combining pseudo label 2 output by F_S2 with the unlabeled test data to obtain pseudo-label data 2, and mixing pseudo-label data 2 with the labeled training data as the training data for training F_S1; and training F_S1 and F_S2 in this way through repeated loop iterations;

using the iterated models F_S1 and F_S2 to predict the unlabeled test data to obtain the final output result.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. An off-line pedestrian tracking method based on dual-model interactive semi-supervised learning is characterized by comprising the following steps:

selecting two neural-network-based pedestrian tracking models, and performing supervised training on them with labeled training data until the models basically fit the training data, to obtain initial model weights F_T1 and F_T2 that predict well on the training data set;

letting the iterative models be F_S1 = F_T1 and F_S2 = F_T2;

using the iterative models F_S1 and F_S2 to predict the unlabeled test data directly, and refining the results with an offline interpolation optimization method to obtain the outputs of F_S1 and F_S2, namely pseudo label 1 and pseudo label 2; the offline interpolation optimization method being used to splice broken pedestrian trajectories and recover the bounding boxes of pedestrian trajectories whose intermediate frames were lost;

using the iterated models F_S1 and F_S2 to predict the unlabeled test data to obtain a final output result.

2. The offline pedestrian tracking method based on the dual-model interactive semi-supervised learning as claimed in claim 1, wherein the offline interpolation optimization method comprises the following steps:

judging an effective track, reserving the effective pedestrian track and participating in an interpolation optimization process;

and judging whether the interval length of the front and rear frame numbers of the disconnected part of each effective pedestrian track meets the interpolation optimization condition or not, and performing the interpolation optimization process.

3. The offline pedestrian tracking method based on the dual-model interactive semi-supervised learning as recited in claim 1 or 2, wherein the offline interpolation optimization method specifically determines the effective trajectory as follows:

N_t = |{conf_t | conf_t > thr_conf, t = 1, 2, 3, ..., N}|

4. The offline pedestrian tracking method based on dual-model interactive semi-supervised learning according to claim 1 or 2, wherein in the offline interpolation optimization method, the interpolation optimization conditions are specifically as follows:

1＜(f_t+1-f_t)＜δ_max，

5. The offline pedestrian tracking method based on dual-model interactive semi-supervised learning as claimed in claim 1 or 2, wherein in the offline interpolation optimization method, an interpolation optimization process is used to supplement the pedestrian bounding-box coordinates lost at the break, and for each frame f with f_t < f < f_{t+1}, the corresponding pedestrian bounding box b is calculated as follows:

wherein b = [x^min, y^min, x^max, y^max], and x^min, y^min, x^max, y^max respectively denote the minimum and maximum values of the pedestrian bounding box on the x and y coordinate axes; b_{t+1} is the pedestrian bounding box of the first frame of the continuous segment after the break; b_t is the pedestrian bounding box of the last frame of the continuous segment before the break;

the offline interpolation optimization procedure is used to increase F_S1And F_S2The output quality of the semi-supervised learning process and the robustness of the semi-supervised learning process.

6. The offline pedestrian tracking method based on dual-model interactive semi-supervised learning according to claim 1, wherein performing loop-iteration training on F_S1 and F_S2 means repeatedly using the outputs of F_S1 and F_S2 to guide each other's learning and training process, so that each model absorbs the other's strengths and the two improve together.

7. The method of claim 1, wherein the final output result adopts one of the results in the dual model as the final prediction.

8. An off-line pedestrian tracking system based on double-model interactive semi-supervised learning, which is characterized in that the off-line pedestrian tracking system is applied to the off-line pedestrian tracking method based on double-model interactive semi-supervised learning of any one of claims 1 to 7, and comprises a model pre-training module, a pseudo label prediction module, an interactive semi-supervised learning module and a final prediction and output module;

9. A storage medium storing a program, wherein the program, when executed by a processor, implements the offline pedestrian tracking method based on dual-model interactive semi-supervised learning of any one of claims 1 to 7.