CN110232330B - Pedestrian re-identification method based on video detection - Google Patents


Info

Publication number
CN110232330B
CN110232330B (application number CN201910434555.XA / CN201910434555A; application publication CN110232330A)
Authority
CN
China
Prior art keywords
pedestrian
key
video
features
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910434555.XA
Other languages
Chinese (zh)
Other versions
CN110232330A (en)
Inventor
薛丽敏
冯瑞
蒋龙泉
Current Assignee
Fujun Intelligent Technology Suzhou Co ltd
Original Assignee
Fujun Intelligent Technology Suzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Fujun Intelligent Technology Suzhou Co ltd filed Critical Fujun Intelligent Technology Suzhou Co ltd
Priority to CN201910434555.XA priority Critical patent/CN110232330B/en
Publication of CN110232330A publication Critical patent/CN110232330A/en
Application granted granted Critical
Publication of CN110232330B publication Critical patent/CN110232330B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/254 Image analysis; analysis of motion involving subtraction of images
    • G06T7/269 Image analysis; analysis of motion using gradient-based methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/40 Scenes; scene-specific elements in video content
    • G06V20/52 Context or environment of the image; surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V40/10 Recognition of biometric, human-related or animal-related patterns; human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Abstract

The invention provides a pedestrian re-identification method based on video detection, characterized by comprising the following steps: step S1, acquiring the key frames of the video to be detected by using the inter-frame difference method; step S2, extracting key depth features from the key frames based on a detection network; step S3, extracting non-key depth features and corresponding manual features from the non-key frames based on an optical flow network; step S4, performing similarity calculation on the key depth features, the non-key depth features and the manual features to construct a pedestrian re-identification model; step S5, analyzing each video to be detected through the pedestrian re-identification model, acquiring the position information and time information of the target pedestrian in each video, and ranking the results; step S6, analyzing all videos to be detected through the model, acquiring the probability that the target pedestrian appears in each video, and ranking the videos accordingly; and step S7, drawing the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results.

Description

Pedestrian re-identification method based on video detection
Technical Field
The invention relates to the technical field of video monitoring, in particular to a pedestrian re-identification method based on video detection.
Background
Surveillance video is widely used in subways, airports and traffic intersections and has become an important tool for security. Its working principle is to detect key pedestrian targets in the video and to obtain each pedestrian target's trajectory across the whole scene from the GPS position of the camera and the time points at which the target appears. In practical application scenarios, however, both preventive monitoring and after-the-fact inspection are usually carried out by manual review, which is inefficient and time-consuming. It is therefore highly necessary to realize automatic cross-camera identification of pedestrians, and thereby obtain and track the trajectory of each pedestrian target in the whole monitoring scene.
Pedestrian re-identification means recognizing a specific pedestrian across the videos of several different cameras; it involves pedestrian detection in the scene, feature extraction, and measurement of the feature similarity between two pedestrians. However, in actual pedestrian re-identification research, feature extraction and similarity measurement usually take pedestrian images obtained by manual annotation or by a detection algorithm as the dataset and are performed independently of pedestrian detection, which is often difficult to apply to real video scenes (see the article: Mengyue Geng, Yaowei Wang, Tao Xiang, Yonghong Tian. Deep transfer learning for person re-identification [J]. arXiv preprint arXiv:1611.05244, 2016).
Compared with still-image detection, video suffers from motion blur, camera defocus, unusual poses and severe occlusion of the target object. Detection and re-identification performed directly on such frames not only burdens the network but also greatly reduces model accuracy. To address these problems, video frames are generally sampled at a fixed step as key frames, the remaining frames serve as non-key frames, and the content of the non-key frames is inferred and their pixels supplemented using temporal information extracted by an optical flow network (see the article: Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei. Deep feature flow for video recognition).
Disclosure of Invention
In order to solve the problems, the invention adopts the following technical scheme:
the invention provides a pedestrian re-identification method based on video detection, which is used for identifying a target pedestrian in a preset monitoring scene according to a plurality of videos to be detected consisting of image frames shot in the preset monitoring scene, and is characterized by comprising the following steps of:
step S1, reading image frames in the video to be detected, calculating the image frames by utilizing an interframe difference method, and taking the image frames corresponding to the local maximum value of the difference intensity in the image frames as key frames of the video to be detected;
step S2, extracting the characteristics of the target pedestrian in the key frame based on the detection network as key depth characteristics;
step S3, taking the rest image frames in the image frames as non-key frames, extracting the relevant features of the target pedestrian in the non-key frames based on the optical flow network, and taking the relevant features as non-key depth features and corresponding manual features;
step S4, performing similarity calculation on the key depth features, the non-key depth features and the manual features, and constructing a pedestrian re-identification model according to the result of the similarity calculation;
step S5, analyzing each video to be detected through the pedestrian re-identification model, acquiring the position information and time information of the target pedestrian in each video to be detected, and ranking them;
step S6, analyzing all the videos to be detected through the pedestrian re-identification model, acquiring the probability that the target pedestrian appears in each video to be detected, and ranking the videos according to the probability;
and step S7, drawing the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results of step S5 and step S6.
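Steps S5 and S6 amount to two ranking passes over the model's outputs: one over the sightings of the target inside each video, one over the videos themselves. A minimal sketch under assumed data structures (the per-detection score, time and position fields are illustrative, not specified by the patent):

```python
def rank_detections(detections):
    """Rank the detections of the target within one video by similarity score (cf. step S5)."""
    return sorted(detections, key=lambda d: d["score"], reverse=True)

def rank_videos(videos):
    """Rank videos by the probability that the target appears (cf. step S6),
    here approximated as the maximum detection score in each video."""
    probs = {vid: max((d["score"] for d in dets), default=0.0)
             for vid, dets in videos.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

videos = {
    "cam1": [{"score": 0.91, "t": 12.0, "pos": (100, 40)},
             {"score": 0.55, "t": 30.5, "pos": (80, 60)}],
    "cam2": [{"score": 0.32, "t": 5.0, "pos": (10, 10)}],
}
order = rank_videos(videos)                 # cam1 ranked above cam2
top = rank_detections(videos["cam1"])[0]    # highest-scoring sighting in cam1
```

Step S7 would then read the time and position fields of the ranked sightings in order to plot the trajectory.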
The invention provides a pedestrian re-identification method based on video detection, which can also have the characteristics that the step S1 comprises the following sub-steps:
step S1-1, reading image frames in the video to be detected;
step S1-2, calculating the gray difference of corresponding pixel points between two adjacent image frames;
step S1-3, carrying out binarization calculation on the gray-level difference value, and judging whether the coordinates of the pixel points are foreground coordinates or background coordinates according to the result of the binarization calculation;
a step S1-4 of acquiring a motion region in the image frame according to the result of the determination in the step S1-3;
and step S1-5, performing connectivity analysis on the image frame, and when the area of the motion region in the image frame is larger than a preset threshold value, determining that the current image frame is a key frame.
The invention provides a pedestrian re-identification method based on video detection, which can also have the characteristics that the step S3 comprises the following sub-steps:
step S3-1, judging whether the image frame is a key frame, if not, taking the image frame as a non-key frame;
step S3-2, calculating the non-key frame and the previous key frame adjacent to the non-key frame based on the optical flow estimation algorithm to obtain an optical flow graph;
step S3-3, adjusting the key depth features of the key frames to the same spatial resolution as the corresponding optical flow graph for propagation;
and step S3-4, extracting non-key depth features and corresponding manual features in the non-key frames according to the propagation result.
The invention provides a pedestrian re-identification method based on video detection, which can also have the characteristic that in step S3-3, a bilinear interpolation algorithm is adopted to propagate the key depth features.
The invention provides a pedestrian re-identification method based on video detection, which can also have the characteristic that in step S3-3, a temporal attention mechanism is adopted to limit the vector offset of the pixel points in the non-key frames.
The invention provides a pedestrian re-identification method based on video detection, which can also have the characteristics that the step S4 comprises the following sub-steps:
step S4-1, similarity calculation is carried out on the key depth features, the non-key depth features and the manual features, and a similarity matrix is obtained;
and step S4-2, performing parameter learning on the similarity matrix fusion loss function, and thus constructing a pedestrian re-identification model.
The invention provides a pedestrian re-identification method based on video detection, which can also have the characteristics that the step S4-2 comprises the following sub-steps:
step S4-2-1, classifying and learning the similarity matrix by using a Softmax loss function, thereby removing a detection frame without pedestrians in the similarity matrix;
step S4-2-2, sequentially calculating the distances between the manual features and the key depth features and non-key depth features by the cosine distance metric, and ranking according to the distances;
and step S4-2-3, based on the ranking result, continuing parameter learning on the similarity matrix in a multitask manner using an OIM loss function, thereby building a pedestrian re-identification model.
Action and Effect of the invention
According to the pedestrian re-identification method based on video detection of the invention, because the key frames are extracted from the image frames of the video by the inter-frame difference method, i.e. in an inter-frame fusion manner, the relationship between image frames can be better utilized, and the negative effects of blurred frames (i.e. image frames blurred by motion blur, camera defocus, unusual poses or severe occlusion of the target object) on network load and accuracy are effectively reduced. Furthermore, the optical flow network is adopted to extract the non-key depth features and manual features of the target pedestrian in the non-key frames, and the key depth features, non-key depth features and manual features are fused point-to-point in the similarity matrix, so that the salient context information between adjacent image frames is supplemented, making the pedestrian re-identification model both more accurate and faster in detection.
Drawings
FIG. 1 is a flow chart of an implementation of a pedestrian re-identification method based on video detection in an embodiment of the invention;
FIG. 2 is a flowchart illustrating a pedestrian re-identification method based on video detection according to an embodiment of the present invention;
FIG. 3 is a flowchart of an implementation of non-key-frame feature extraction based on the optical flow graph in an example of the present invention;
FIG. 4 is a workflow diagram of non-key-frame feature extraction based on the optical flow graph in an example of the invention;
FIG. 5 is a schematic diagram of the pedestrian motion trajectory finally obtained in the embodiment of the invention.
Detailed Description
In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the pedestrian re-identification method based on the video detection of the invention is specifically described below with reference to the accompanying drawings.
< example >
In this embodiment, the PyTorch deep learning framework is used to build the network model. The MARS dataset is used for model training; it contains 6 cameras, 1,261 pedestrians and 1,191,003 annotated bounding boxes. The CUHK03 dataset, which contains 2 cameras and 1,360 pedestrians, is used for testing. The test procedure is as follows: a pedestrian target to be searched for is cropped from the video shot by one camera and re-identified in the videos shot by one or more other cameras; camera position information and time information are returned according to the re-identification results; all search results within a single video are ranked; and for all videos to be detected, the probability that the corresponding search target appears is calculated and the videos are ranked accordingly.
It should be noted that the parts of the present invention not described in detail belong to the prior art.
Fig. 1 is a flowchart of an implementation of a pedestrian re-identification method based on video detection in an embodiment of the present invention, and fig. 2 is a flowchart of an operation of the pedestrian re-identification method based on video detection in an embodiment of the present invention.
As shown in fig. 1 and fig. 2, the pedestrian re-identification method based on video detection in this embodiment is used for identifying a target pedestrian in a predetermined monitoring scene according to a plurality of videos to be detected, which are composed of image frames and captured in the predetermined monitoring scene, and includes the following steps:
step S1, reading an image frame in the video to be detected, calculating the image frame by using an inter-frame difference method, and using the image frame corresponding to the local maximum of the difference intensity in the image frame as a key frame of the video to be detected, which specifically includes the following sub-steps:
step S1-1, reading the image frame in each video to be detected;
step S1-2, calculating the gray-level difference of the corresponding pixel points between two adjacent image frames: assuming f_t(i, j) and f_{t-1}(i, j) are the t-th and (t-1)-th frames of the image sequence respectively, their difference image is expressed as:
D_t(i, j) = | f_t(i, j) - f_{t-1}(i, j) |
where (i, j) denotes discrete image coordinates.
Step S1-3, performing binarization calculation on the gray-level difference and judging, from the result, whether each pixel coordinate is a foreground or a background coordinate: when D_t(i, j) is greater than a preset threshold T, the coordinate is regarded as a foreground coordinate; otherwise it is regarded as a background coordinate.
Step S1-4, acquiring the motion region R_t(i, j) in the image frame according to the judgment result of step S1-3, the motion region being expressed as:
R_t(i, j) = 1 if D_t(i, j) > T (foreground), and R_t(i, j) = 0 otherwise (background).
Step S1-5, performing connectivity analysis on the binarized image frame; when the area of the motion region in the image frame is larger than a preset threshold, the current image frame is judged to be a key frame and is denoted I_k.
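Steps S1-2 through S1-5 can be sketched in NumPy as follows. This is a simplified illustration: the connectivity analysis of step S1-5 is omitted (the total foreground area stands in for the largest connected region), and the thresholds T and area_thresh are illustrative values, not taken from the patent.

```python
import numpy as np

def is_key_frame(prev, cur, T=30, area_thresh=50):
    """Inter-frame difference key-frame test (simplified steps S1-2..S1-5)."""
    diff = np.abs(cur.astype(np.int32) - prev.astype(np.int32))  # D_t(i, j)
    motion = diff > T              # R_t: foreground (1) vs background (0)
    return bool(motion.sum() > area_thresh)

prev = np.zeros((64, 64), dtype=np.uint8)
cur = prev.copy()
cur[10:30, 10:30] = 200            # a 400-pixel bright region appears
key = is_key_frame(prev, cur)      # True: motion area exceeds the threshold
```

A full implementation would additionally label connected components and compare the largest component's area against the threshold, as the connectivity analysis in step S1-5 requires.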
Step S2, extracting the features of the target pedestrian in the key frame based on a detection network as the key depth features f_k. The detection network used in this embodiment is a Faster R-CNN network, but the method is not limited to Faster R-CNN.
Step S3, taking the remaining image frames as non-key frames, and extracting the features of the target pedestrian from the non-key frames based on the optical flow network, so as to obtain feature-aligned related features, which are taken as the non-key depth features and the corresponding manual features.
Fig. 3 is a flowchart of a specific implementation of the non-key-frame feature extraction based on the optical flow graph in the example of the present invention, and fig. 4 is the corresponding workflow diagram.
As shown in fig. 3 and 4, the non-key frame feature extraction in step S3 includes the following sub-steps:
Step S3-1, judging whether the image frame is a key frame I_k; if not, the image frame is taken as a non-key frame I_i.
Step S3-2, based on the above detection network, extracting the key depth feature f_k corresponding to the last key frame I_k adjacent to the non-key frame I_i.
Step S3-3, calculating an optical flow graph from the non-key frame I_i and its adjacent last key frame I_k based on optical flow estimation, the specific process being as follows:
let M_{i→k} be a two-dimensional flow field; the optical flow graph is obtained from the non-key frame and its adjacent previous key frame by an optical flow estimation algorithm F (such as FlowNet, but not limited to the FlowNet network), where M_{i→k} = F(I_k, I_i).
Step S3-4, adjusting the key depth features of the key frame to the same spatial resolution as the corresponding optical flow graph for propagation; during propagation, a position p in the current non-key frame i is projected to the position p + δp in the key frame k, where δp = M_{i→k}(p). This comprises the following two steps:
1) In general δp is fractional while pixel coordinates are integers, so a bilinear interpolation algorithm is adopted to implement the feature propagation. The propagation formula is:
f_i^c(p) = Σ_q G(q, p + δp) · f_k^c(q)
where c is a channel of the feature map f, p and q enumerate all spatial position coordinates in the feature map, and G is the two-dimensional kernel function of bilinear interpolation.
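The propagation formula above is a dense bilinear warp of the key-frame feature map along the flow field. A minimal NumPy sketch, assuming the flow is stored as two displacement planes (dx, dy); the patent's network-based implementation will differ in detail:

```python
import numpy as np

def propagate(feat_k, flow):
    """Warp key-frame features feat_k (C, H, W) with flow (2, H, W):
    f_i(p) = sum_q G(q, p + dp) * f_k(q), G the bilinear kernel."""
    C, H, W = feat_k.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = xs + flow[0]                              # sampling positions p + dp
    sy = ys + flow[1]
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wx = np.clip(sx, 0, W - 1) - x0                # fractional parts
    wy = np.clip(sy, 0, H - 1) - y0
    out = np.empty_like(feat_k, dtype=np.float64)
    for c in range(C):                             # blend the four neighbours
        f = feat_k[c]
        out[c] = (f[y0, x0] * (1 - wx) * (1 - wy) + f[y0, x1] * wx * (1 - wy)
                  + f[y1, x0] * (1 - wx) * wy + f[y1, x1] * wx * wy)
    return out

feat = np.arange(12, dtype=np.float64).reshape(1, 3, 4)
identity = propagate(feat, np.zeros((2, 3, 4)))    # zero flow leaves features unchanged
shifted = propagate(feat, np.concatenate(
    [np.ones((1, 3, 4)), np.zeros((1, 3, 4))]))    # sample one pixel to the right
```

With zero flow the warp is the identity, and with an integer flow it reduces to an index shift, which is a quick sanity check on the interpolation weights.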
2) To eliminate the inaccuracy of the optical flow network, a temporal attention mechanism is adopted to further limit the vector offset of the pixel coordinates in the feature map corresponding to the non-key frame. The formula of the temporal attention mechanism is:
[formula image omitted in source]
where f_t is the feature map obtained by passing the t-th frame image through a network (such as the above detection network or the optical flow network), e represents an element of f_t, and p represents the coordinate position of each pixel on the feature map.
Step S3-5, extracting the non-key depth features and the corresponding manual features in the non-key frames according to the propagation result.
And step S4, similarity calculation is carried out on the key depth features, the non-key depth features and the manual features, and a pedestrian re-identification model is constructed according to the result of the similarity calculation. The method comprises the following substeps:
step S4-1, similarity calculation is carried out on the key depth features, the non-key depth features and the manual features, and a similarity matrix is obtained;
step S4-2, parameter learning is carried out on the similarity matrix fusion loss function, so that a pedestrian re-identification model is built, and the method comprises the following substeps:
step S4-2-1, classifying and learning the similarity matrix by using a Softmax loss function, thereby removing a detection frame without pedestrians in the similarity matrix;
the concrete calculation process of the softmax loss function in the step S4-2-1 is as follows:
1) screening out the detection frames without pedestrians, assuming that the number of pedestrian categories is N, the output layer is [ Z ]1,Z2,...ZN]Normalizing each pedestrian probability as:
Figure BDA0002070112870000101
2) using cross entropy as a loss function:
Figure BDA0002070112870000102
wherein P isiIndicates the found softmax value, tiAre true values.
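The two formulas above are the standard softmax normalization and cross-entropy loss; a direct NumPy transcription (the example scores are illustrative):

```python
import numpy as np

def softmax(z):
    """Normalize output-layer scores [Z_1..Z_N] into class probabilities P_i."""
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, t):
    """Cross-entropy loss L = -sum_i t_i * log(P_i) for a one-hot target t."""
    return -np.sum(t * np.log(p + 1e-12))

z = np.array([2.0, 1.0, 0.1])        # output-layer scores for N = 3 classes
p = softmax(z)                       # probabilities summing to 1
t = np.array([1.0, 0.0, 0.0])        # ground truth: class 0
loss = cross_entropy(p, t)
```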
Step S4-2-2, sequentially calculating the distances between the manual features and the key depth features and non-key depth features by the cosine distance metric, and ranking according to the distances;
and step S4-2-3, based on the ranking result, continuing parameter learning on the similarity matrix in a multitask manner using the OIM loss function, thereby building the pedestrian re-identification model.
In step S4-2-3, only the labeled-identity instances and unlabeled-identity instances in the training data are considered; the distance between pedestrian targets with the same ID is minimized and the difference between pedestrian targets with different IDs is maximized, and the re-identification network continues parameter learning in a multi-task manner. The specific calculation process of the OIM loss function is as follows:
1) The probability that the feature vector f is regarded as the i-th class of pedestrian is expressed as:
p_i = exp(v_i^T f / T) / ( Σ_{j=1}^{L} exp(v_j^T f / T) + Σ_{k=1}^{Q} exp(u_k^T f / T) )
where L is the list of labeled pedestrian features and Q stores the list of detected but unlabeled pedestrian features, v is a labeled feature vector, u represents an unlabeled feature vector from pedestrian detection, and T is a temperature ("flattening") factor used to control the smoothness of the probability distribution.
2) The loss function is calculated as:
L = E_x[ log p_t ]
where t is the class label of the target pedestrian.
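The OIM probability above can be sketched as follows. This is a forward-pass sketch only, under assumed shapes: `lut` stands in for the labeled feature list (L rows) and `queue` for the unlabeled feature list (Q rows); the actual OIM training also updates these buffers, which is omitted here.

```python
import numpy as np

def oim_probability(f, lut, queue, T=0.1):
    """p_i = exp(v_i.f/T) / (sum_j exp(v_j.f/T) + sum_k exp(u_k.f/T)).
    Returns the probabilities of the labeled classes only; the unlabeled
    queue terms appear in the denominator, so these sum to less than 1."""
    logits = np.concatenate([lut @ f, queue @ f]) / T
    e = np.exp(logits - logits.max())          # stabilized softmax
    probs = e / e.sum()
    return probs[: lut.shape[0]]

rng = np.random.default_rng(0)
lut = rng.normal(size=(5, 8))                  # 5 labeled identities, dim 8
queue = rng.normal(size=(3, 8))                # 3 unlabeled features
f = lut[2] / np.linalg.norm(lut[2])            # a query feature
p = oim_probability(f, lut, queue)
```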
Step S5, analyzing each video to be detected through the pedestrian re-identification model, acquiring the position information and time information of the target pedestrian in each video to be detected, and ranking them;
step S6, analyzing all the videos to be detected through the pedestrian re-identification model, acquiring the probability that the target pedestrian appears in each video to be detected, and ranking the videos according to the probability.
Fig. 5 is a schematic diagram of the pedestrian motion trajectory finally obtained in the embodiment of the invention.
Step S7, drawing the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results of steps S5 and S6; as shown in fig. 5, this is the motion trajectory of the target pedestrian under the designated cameras in this embodiment.
Effects of the embodiment
According to the pedestrian re-identification method based on video detection in this embodiment, because the key frames are extracted from the image frames of the video by the inter-frame difference method, i.e. in an inter-frame fusion manner, the relationship between image frames can be better utilized, and the negative effects of blurred frames (i.e. image frames blurred by motion blur, camera defocus, unusual poses or severe occlusion of the target object) on network load and accuracy are effectively reduced. Furthermore, the optical flow network is adopted to extract the non-key depth features and manual features of the target pedestrian in the non-key frames, and the key depth features, non-key depth features and manual features are fused point-to-point in the similarity matrix, so that the salient context information between adjacent image frames is supplemented, making the pedestrian re-identification model both more accurate and faster in detection.
For non-key frames blurred by multiple light sources, occlusion, noise, transparency and the like, the temporal attention mechanism is adopted to limit the vector offsets of the pixel points in the non-key frames, making the temporal information of the non-key-frame features and manual features obtained by the optical flow network more accurate. The pixels of the non-key frames corresponding to this temporal information can thus better supplement the adjacent previous key frame, which benefits the parameter training of the similarity matrix and greatly reduces the network load.
Because the bilinear interpolation algorithm is adopted to propagate the key depth features, the non-key depth features and manual features obtained through the optical flow graph are in a feature-aligned state, which benefits the parameter training of the similarity matrix and makes the analysis result of the pedestrian re-identification model more accurate.
For the situation, common in actual use, where the number of pedestrians in the video to be detected is small and each image frame contains only a few pedestrians: since the OIM loss function is used in this embodiment, only the labeled-identity and unlabeled-identity instances in the training data are considered, the distance between pedestrian targets with the same ID is minimized and the difference between pedestrian targets with different IDs is maximized; the distances between the manual features and the key depth features and non-key depth features are calculated one by one with the cosine distance metric and ranked by distance, which effectively reduces the computation of the pedestrian re-identification model and avoids the problem of the model failing to converge during parameter learning.
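The cosine-distance ranking of step S4-2-2 referred to above is a generic metric computation; a short sketch (feature dimensions and values are illustrative):

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def rank_by_distance(query, gallery):
    """Distances from a query (manual) feature to each gallery (depth)
    feature, ranked ascending, as in step S4-2-2."""
    d = [(i, cosine_distance(query, g)) for i, g in enumerate(gallery)]
    return sorted(d, key=lambda x: x[1])

query = np.array([1.0, 0.0])
gallery = [np.array([0.0, 1.0]),      # orthogonal: distance 1.0
           np.array([1.0, 0.1])]      # nearly parallel: distance close to 0
ranking = rank_by_distance(query, gallery)
```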
The pedestrian re-identification method based on video detection in this embodiment is an end-to-end, one-stage pedestrian re-identification method combining pedestrian detection with the traditional pedestrian re-identification task, and therefore has practical value for pedestrian re-identification in both image and video scenes captured in everyday settings.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (7)

1. A pedestrian re-identification method based on video detection is used for identifying a target pedestrian in a preset monitoring scene according to a plurality of videos to be detected consisting of image frames shot in the preset monitoring scene, and is characterized by comprising the following steps:
step S1, reading the image frames in the video to be detected, calculating the image frames by utilizing an interframe difference method, and taking the image frames corresponding to the local maximum value of the difference intensity in the image frames as key frames of the video to be detected;
step S2, extracting the characteristics of the target pedestrian in the key frame based on a detection network as key depth characteristics;
step S3, taking the rest image frames in the image frames as non-key frames, extracting the features of the target pedestrian in the non-key frames based on an optical flow network, and taking the features as non-key depth features and corresponding manual features;
step S4, similarity calculation is carried out on the key depth features, the non-key depth features and the manual features, and a pedestrian re-identification model is built according to the result of the similarity calculation;
step S5, analyzing each video to be detected through the pedestrian re-identification model, acquiring the position information and time information of the target pedestrian in each video to be detected, and ranking them;
step S6, analyzing all the videos to be detected through the pedestrian re-identification model, acquiring the probability that the target pedestrian appears in each video to be detected, and ranking the videos according to the probability;
and step S7, drawing the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results of step S5 and step S6.
2. The pedestrian re-identification method based on video detection according to claim 1, wherein:
wherein the step S1 includes the following sub-steps:
step S1-1, reading the image frame in the video to be detected;
step S1-2, calculating the gray difference value of the corresponding pixel point between two adjacent image frames;
step S1-3, carrying out binarization calculation on the gray level difference value, and judging the coordinates of the pixel points to be foreground coordinates or background coordinates according to the result of the binarization calculation;
a step S1-4 of acquiring a motion region in the image frame according to the result of the determination in the step S1-3;
step S1-5, performing connectivity analysis on the image frames, and when the area of the motion region in the image frame is greater than a predetermined threshold, determining that the current image frame is the key frame.
3. The pedestrian re-identification method based on video detection according to claim 1, wherein:
wherein the step S3 includes the following sub-steps:
step S3-1, judging whether the image frame is a key frame, and if not, taking the image frame as a non-key frame;
step S3-2, computing an optical flow graph between the non-key frame and its adjacent preceding key frame based on an optical flow estimation algorithm;
step S3-3, adjusting the key depth features of the key frame to the same spatial resolution as the corresponding optical flow graph for propagation;
and step S3-4, extracting the non-key depth features and the corresponding manual features of the non-key frame according to the propagation result.
4. The pedestrian re-identification method based on video detection according to claim 3, wherein:
in step S3-3, a bilinear interpolation algorithm is used to propagate the key depth features.
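The propagation of claims 3 and 4 — warping key-frame depth features to a non-key frame along the optical flow field with bilinear interpolation — can be sketched as follows; the (H, W, C) feature layout and the per-pixel (dy, dx) flow convention are assumptions for illustration, not claim language:

```python
import numpy as np

def propagate_features(key_feat, flow):
    """Warp key-frame features along an optical flow field with
    bilinear interpolation. key_feat is (H, W, C); flow is (H, W, 2)
    holding per-pixel (dy, dx) offsets (assumed convention)."""
    H, W, _ = key_feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    # Source position in the key frame for each non-key-frame pixel
    sy = np.clip(ys + flow[..., 0], 0, H - 1)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = (sy - y0)[..., None], (sx - x0)[..., None]
    # Bilinear blend of the four neighbouring feature vectors
    return ((1 - wy) * (1 - wx) * key_feat[y0, x0]
            + (1 - wy) * wx * key_feat[y0, x1]
            + wy * (1 - wx) * key_feat[y1, x0]
            + wy * wx * key_feat[y1, x1])
```

With an all-zero flow field the warp is the identity, which is a quick sanity check on the sampling convention.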
5. The pedestrian re-identification method based on video detection according to claim 3, wherein:
in step S3-3, a temporal attention mechanism is used to constrain the offset vectors of the pixels in the non-key frame.
6. The pedestrian re-identification method based on video detection according to claim 1, wherein:
wherein the step S4 includes the following sub-steps:
step S4-1, similarity calculation is carried out on the key depth features, the non-key depth features and the manual features, and a similarity matrix is obtained;
and step S4-2, performing parameter learning on the similarity matrix with a fused loss function, thereby building the pedestrian re-identification model.
7. The video detection-based pedestrian re-identification method according to claim 6, wherein:
wherein the step S4-2 includes the following sub-steps:
step S4-2-1, performing classification learning on the similarity matrix by using a Softmax loss function, thereby removing detection boxes containing no pedestrian from the similarity matrix;
step S4-2-2, sequentially calculating the distances between the manual features and the key depth features and the non-key depth features by using a cosine distance metric, and ranking according to the distances;
and step S4-2-3, based on the ranking result, performing parameter learning on the similarity matrix in a multi-task manner by using an OIM loss function, thereby building the pedestrian re-identification model.
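The cosine distance ranking of step S4-2-2 can be illustrated with a small sketch; the (D,)-shaped query and (N, D)-shaped gallery layouts are assumptions for illustration, not taken from the claims:

```python
import numpy as np

def cosine_rank(query_feat, gallery_feats):
    """Rank gallery features by cosine distance to a query feature.
    query_feat is (D,), gallery_feats is (N, D) (assumed shapes)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - g @ q          # cosine distance, 0 = same direction
    order = np.argsort(dist)    # closest match first
    return order, dist[order]

q = np.array([1.0, 0.0])
g = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
order, dists = cosine_rank(q, g)
print(order.tolist())  # [2, 1, 0]: same-direction vector ranks first
```

Because the features are L2-normalized first, the magnitude of a gallery vector does not affect its rank, only its direction does.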
CN201910434555.XA 2019-05-23 2019-05-23 Pedestrian re-identification method based on video detection Active CN110232330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910434555.XA CN110232330B (en) 2019-05-23 2019-05-23 Pedestrian re-identification method based on video detection

Publications (2)

Publication Number Publication Date
CN110232330A CN110232330A (en) 2019-09-13
CN110232330B (en) 2020-11-06

Family

ID=67861545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910434555.XA Active CN110232330B (en) 2019-05-23 2019-05-23 Pedestrian re-identification method based on video detection

Country Status (1)

Country Link
CN (1) CN110232330B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008558B (en) * 2019-10-30 2023-05-30 中山大学 Picture/video important person detection method combining deep learning and relational modeling
CN111291633B (en) * 2020-01-17 2022-10-14 复旦大学 Real-time pedestrian re-identification method and device
CN111310728B (en) * 2020-03-16 2022-07-15 中国科学技术大学 Pedestrian re-identification system based on monitoring camera and wireless positioning
CN111832457B (en) * 2020-07-01 2022-07-22 山东浪潮科学研究院有限公司 Stranger intrusion detection method based on cloud edge cooperation
CN111738362B (en) * 2020-08-03 2020-12-01 成都睿沿科技有限公司 Object recognition method and device, storage medium and electronic equipment
CN112200067B (en) * 2020-10-09 2024-02-02 宁波职业技术学院 Intelligent video event detection method, system, electronic equipment and storage medium
CN112801020B (en) * 2021-02-09 2022-10-14 福州大学 Pedestrian re-identification method and system based on background graying
CN113343810B (en) * 2021-05-28 2023-03-21 国家计算机网络与信息安全管理中心 Pedestrian re-recognition model training and recognition method and device based on time sequence diversity and correlation
CN113743387B (en) * 2021-11-05 2022-03-22 中电科新型智慧城市研究院有限公司 Video pedestrian re-identification method and device, electronic equipment and readable storage medium
CN114897762B (en) * 2022-02-18 2023-04-07 众信方智(苏州)智能技术有限公司 Automatic positioning method and device for coal mining machine on coal mine working face

Citations (1)

Publication number Priority date Publication date Assignee Title
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
CN104077605B (en) * 2014-07-18 2017-08-29 北京航空航天大学 A kind of pedestrian's search recognition methods based on color topological structure




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant