CN110232330A - Pedestrian re-identification method based on video detection - Google Patents
Pedestrian re-identification method based on video detection
- Publication number
- CN110232330A (application CN201910434555.XA / CN201910434555A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- video
- frame
- measured
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/254—Analysis of motion involving subtraction of images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a pedestrian re-identification method based on video detection, comprising the following steps: Step S1, obtain the key frames of the video under test using a frame-differencing method; Step S2, extract the key depth features in the key frames with a detection network; Step S3, extract the non-key depth features and corresponding hand-crafted features in the non-key frames with an optical-flow network; Step S4, perform similarity calculation on the key depth features, non-key depth features and hand-crafted features to construct a pedestrian re-identification model; Step S5, analyze each video under test with the pedestrian re-identification model, obtain the location and time information of the target pedestrian in each video under test, and rank them; Step S6, analyze all videos under test with the pedestrian re-identification model, obtain the probability that the target pedestrian appears in each video under test, and rank the videos; Step S7, draw the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results.
Description
Technical field
The present invention relates to the technical field of video surveillance, and in particular to a pedestrian re-identification method based on video detection.
Background technique
Surveillance video is widely used in subways, airports and at traffic intersections, and has become an important tool for security. Its working principle is to detect key pedestrian targets in the video and, from the precise position of each camera and the time points at which the target appears, recover the target's trajectory through the entire scene. In practical applications, however, both advance prevention and after-the-fact verification are usually carried out by manual inspection, which is inefficient and time-consuming. It is therefore reasonable and necessary to identify pedestrians automatically across cameras, obtain each pedestrian's trajectory through the whole monitoring scene, and thereby realize tracking.
Pedestrian re-identification refers to searching the videos of multiple different cameras to determine in which cameras a specific pedestrian has appeared. It involves pedestrian detection, feature extraction, and measuring the feature similarity of pedestrian pairs. In practical re-identification research, however, feature extraction and similarity measurement are usually carried out independently of pedestrian detection, using as datasets pedestrian images obtained by manual annotation or by a detection algorithm, which makes the methods difficult to apply to real video scenes (see Mengyue Geng, Yaowei Wang, Tao Xiang, Yonghong Tian. Deep transfer learning for person re-identification [J]. arXiv preprint arXiv:1611.05244, 2016).
Compared with still-image detection, video suffers from phenomena such as motion blur, camera defocus, unusual target poses and severe occlusion. Detection under these phenomena not only burdens the re-identification network but also substantially lowers the model's accuracy. To address this, video frames are commonly sampled at a fixed stride as key frames, while the remaining frames serve as non-key frames whose content and pixels are inferred and supplemented from timing information extracted by an optical-flow network (see Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan. Deep Feature Flow for Video Recognition. arXiv:1611.07715v2, 2017). However, the inaccuracy of the optical-flow network and differences in the key-frame stride affect, to some extent, how completely the contextual information is supplemented.
Summary of the invention
To solve the above problems, the present invention adopts the following technical solutions:
The present invention provides a pedestrian re-identification method based on video detection, for identifying a target pedestrian in a predetermined monitoring scene from multiple videos under test composed of image frames shot in the scene, characterized by comprising the following steps:

Step S1, read the image frames of the video under test, apply a frame-differencing method to the image frames, and take the image frames at local maxima of the difference intensity as the key frames of the video under test;

Step S2, extract the features of the target pedestrian in the key frames with a detection network, as key depth features;

Step S3, take the remaining image frames as non-key frames and extract, with an optical-flow network, the relevant features of the target pedestrian in the non-key frames, as non-key depth features and corresponding hand-crafted features;

Step S4, perform similarity calculation on the key depth features, the non-key depth features and the hand-crafted features, and construct a pedestrian re-identification model from the result of the similarity calculation;

Step S5, analyze each video under test with the pedestrian re-identification model, obtain the location and time information of the target pedestrian in each video under test, and rank the location and time information of the target pedestrian in each video under test;

Step S6, analyze all videos under test with the pedestrian re-identification model, obtain the probability that the target pedestrian appears in each video under test, and rank the videos under test by the magnitude of the probability;

Step S7, draw the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results of steps S5 and S6.
The pedestrian re-identification method based on video detection provided by the present invention may also have the feature that step S1 comprises the following sub-steps:

Step S1-1, read the image frames of the video under test;

Step S1-2, calculate the grayscale difference of corresponding pixels between two adjacent image frames;

Step S1-3, binarize the grayscale difference and, from the binarization result, determine whether each pixel coordinate is a foreground or a background coordinate;

Step S1-4, obtain the motion region of the image frame from the determination in step S1-3;

Step S1-5, perform a connectivity analysis on the image frame; when the area of the motion region in the image frame exceeds a predetermined threshold, the current image frame is judged to be a key frame.
The pedestrian re-identification method based on video detection provided by the present invention may also have the feature that step S3 comprises the following sub-steps:

Step S3-1, judge whether the image frame is a key frame; if not, take the image frame as a non-key frame;

Step S3-2, apply an optical-flow estimation algorithm to the non-key frame and the preceding key frame adjacent to the non-key frame to obtain a flow map;

Step S3-3, adjust the key depth features of the key frame to the same spatial resolution as the corresponding flow map and propagate them;

Step S3-4, extract the non-key depth features and the corresponding hand-crafted features of the non-key frame from the result of the propagation.
The pedestrian re-identification method based on video detection provided by the present invention may also have the feature that in step S3 the key depth features are propagated using a bilinear interpolation algorithm.
The pedestrian re-identification method based on video detection provided by the present invention may also have the feature that in step S3 a temporal attention mechanism is used to limit the vector offsets of the pixels in the non-key frames.
The pedestrian re-identification method based on video detection provided by the present invention may also have the feature that step S4 comprises the following sub-steps:

Step S4-1, perform similarity calculation on the key depth features, the non-key depth features and the hand-crafted features to obtain a similarity matrix;

Step S4-2, fuse loss functions on the similarity matrix and perform parameter learning, so as to build the pedestrian re-identification model.
The pedestrian re-identification method based on video detection provided by the present invention may also have the feature that step S4-2 comprises the following sub-steps:

Step S4-2-1, perform classification learning on the similarity matrix with a Softmax loss function, so as to remove from the similarity matrix the detection boxes that contain no pedestrian;

Step S4-2-2, successively calculate the distances between the hand-crafted features and the key and non-key depth features with a cosine-distance metric, and rank them by distance;

Step S4-2-3, based on the ranking result, continue the parameter learning of the similarity matrix with an OIM loss function in a multi-task fashion, so as to build the pedestrian re-identification model.
Invention action and effect

According to the pedestrian re-identification method based on video detection of the present invention, because the frame-differencing method extracts key frames from the video's image frames in an inter-frame fusion manner, the relationships between image frames can be better exploited, effectively alleviating the network load and the loss of accuracy caused by blurred frames (i.e. frames blurred by motion blur, camera defocus, unusual target poses or severe occlusion). Further, because the optical-flow network extracts the target pedestrian's non-key depth features and hand-crafted features from the non-key frames, and the key-frame features, non-key depth features and hand-crafted features are fused point-to-point on the similarity matrix, the contextual information that evidently exists between adjacent image frames is supplemented, so that the pedestrian re-identification model is more accurate and detection is faster.
Detailed description of the invention
Fig. 1 is the implementation flowchart of the video-detection-based pedestrian re-identification method in this embodiment;
Fig. 2 is the workflow diagram of the video-detection-based pedestrian re-identification method in this embodiment;
Fig. 3 is the implementation flowchart of the flow-map-based non-key-frame feature extraction in this embodiment;
Fig. 4 is the workflow diagram of the flow-map-based non-key-frame feature extraction in this embodiment;
Fig. 5 is a schematic diagram of the finally obtained pedestrian motion trajectory in this embodiment.
Specific embodiment
To make the technical means, creative features, objectives and effects of the present invention easy to understand, the video-detection-based pedestrian re-identification method of the present invention is described in detail below with reference to the accompanying drawings.
<embodiment>
In this embodiment the network model is built with the PyTorch deep learning framework. The MARS dataset, which contains 6 cameras, 1,261 pedestrians and 1,191,003 bounding boxes, is used for model training; the CUHK03 dataset, which contains 2 cameras and 1,360 pedestrians, is used for testing. The test procedure is to crop the pedestrian target to be retrieved from the video shot by one camera, re-identify that target in the videos shot by one or more other cameras, return the camera position information and time information according to the re-identification results, rank all retrieval results within a single video, compute the likelihood that the searched target appears in each video to be detected, and rank the videos accordingly.
It should be noted that the parts not described in detail in the present invention belong to the prior art.
Fig. 1 is the specific implementation flowchart and Fig. 2 the workflow diagram of the video-detection-based pedestrian re-identification method of this embodiment.

As shown in Fig. 1 and Fig. 2, the video-detection-based pedestrian re-identification method of this embodiment, which identifies a target pedestrian in a predetermined monitoring scene from multiple videos under test composed of image frames shot in the scene, comprises the following steps:
Step S1, read the image frames of the video under test, apply a frame-differencing method to the image frames, and take the image frames at local maxima of the difference intensity as the key frames of the video under test. This specifically comprises the following sub-steps:

Step S1-1, read the image frames of each video under test;

Step S1-2, calculate the grayscale difference of corresponding pixels between two adjacent image frames. Let f_t(i, j) and f_{t-1}(i, j) be the t-th and (t-1)-th frames of an image sequence; their difference image is

D_t(i, j) = |f_t(i, j) - f_{t-1}(i, j)|

where (i, j) are discrete image coordinates.

Step S1-3, binarize the grayscale difference and, from the binarization result, determine whether each pixel coordinate is a foreground or a background coordinate: when D_t exceeds a predetermined threshold T, the pixel is considered a foreground coordinate, otherwise a background coordinate.

Step S1-4, obtain the motion region R_t(i, j) of the image frame from the determination in step S1-3:

R_t(i, j) = 1 if D_t(i, j) > T, and R_t(i, j) = 0 otherwise.

Step S1-5, perform a connectivity analysis on the binarized image frame; when the area of the motion region in the image frame exceeds a predetermined threshold, the current image frame is judged to be a key frame, denoted I_k.
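The frame-differencing key-frame selection of steps S1-1 to S1-5 can be sketched in Python with NumPy as follows. The threshold values are illustrative assumptions, and the connectivity analysis of step S1-5 is simplified to a total foreground-area test; a full implementation would label connected components (e.g. with OpenCV or SciPy) and test the largest region's area.

```python
import numpy as np

def is_key_frame(prev_gray, cur_gray, diff_thresh=25, area_thresh=500):
    """Frame-differencing key-frame test (steps S1-2 to S1-5, simplified).

    prev_gray, cur_gray: 2-D uint8 grayscale frames f_{t-1} and f_t.
    diff_thresh: binarization threshold T on the difference image D_t.
    area_thresh: minimum motion-region area for a key frame (assumed value).
    """
    # S1-2: difference image D_t = |f_t - f_{t-1}|
    d = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    # S1-3 / S1-4: binarize into foreground (1) / background (0) -> region R_t
    r = (d > diff_thresh).astype(np.uint8)
    # S1-5 (simplified): total foreground area in place of a full
    # connected-component analysis
    return int(r.sum()) > area_thresh

# toy usage: a static frame vs. one containing a moving block
f0 = np.zeros((100, 100), dtype=np.uint8)
f1 = f0.copy()
f1[10:40, 10:40] = 200           # 900 changed pixels -> motion region
print(is_key_frame(f0, f1))      # large motion region -> key frame
print(is_key_frame(f0, f0))      # no change -> not a key frame
```

This keeps only frames with significant motion, which is how the method finds the local maxima of difference intensity described in step S1.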
Step S2, extract the features of the target pedestrian in the key frames with a detection network, as key depth features f_k. The detection network used in this embodiment is Faster R-CNN, but the method is not limited to the Faster R-CNN network.
Step S3, take the remaining image frames as non-key frames and, with an optical-flow network, extract the features of the target pedestrian in the non-key frames, obtaining spatially aligned relevant features as the non-key depth features and corresponding hand-crafted features.
Fig. 3 is the specific implementation flowchart and Fig. 4 the workflow diagram of the flow-map-based non-key-frame feature extraction in this embodiment.
As shown in Fig. 3 and Fig. 4, the non-key-frame feature extraction in step S3 comprises the following sub-steps:

Step S3-1, judge whether the image frame is a key frame I_k; if not, take the image frame as a non-key frame I_i;

Step S3-2, based on the above detection network, extract the corresponding key depth features f_k of the preceding key frame I_k adjacent to the non-key frame I_i;

Step S3-3, apply an optical-flow estimation algorithm to the non-key frame I_i and its preceding adjacent key frame I_k to obtain a flow map. Specifically, let M_{i→k} be a two-dimensional flow field; an optical-flow estimation algorithm F (such as FlowNet, though not limited to the FlowNet network) is applied to the non-key frame and its preceding adjacent key frame to obtain the flow map, where M_{i→k} = F(I_k, I_i).
Step S3-4, adjust the key depth features of the key frame to the same spatial resolution as the corresponding flow map and propagate them. During propagation, position p in the current non-key frame i is projected to position p + δp in key frame k, where δp = M_{i→k}(p). This specifically comprises two steps:

1) δp is in general fractional while pixel coordinates are integers, so the propagation is realized with a bilinear interpolation algorithm:

f_i^c(p) = Σ_q G(q, p + δp) · f_k^c(q)

where c is a channel of the feature map f, q enumerates all spatial position coordinates in the feature map, and G is the kernel function of two-dimensional bilinear interpolation.

2) To eliminate the inaccuracy of the optical-flow network, a temporal attention mechanism further limits the vector offsets of the pixel coordinates in the non-key frame's feature map, where f_t denotes the feature map obtained for frame t from the network (the detection network or optical-flow network described above), e indexes the channels of f_t, and p is the coordinate of each pixel on the feature map.
Step S3-5, extract the non-key depth features and the corresponding hand-crafted features of the non-key frame from the result of the propagation.
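The flow-guided propagation described above can be sketched as a per-channel bilinear warp in NumPy. This is an illustrative re-implementation of the interpolation formula f_i^c(p) = Σ_q G(q, p + δp) · f_k^c(q), not the patent's network code; the (dy, dx) layout of the flow field is an assumption.

```python
import numpy as np

def propagate_features(feat_k, flow_i_to_k):
    """Warp key-frame features feat_k (C, H, W) to a non-key frame using a
    flow field flow_i_to_k (H, W, 2) of (dy, dx) offsets: each position p of
    the non-key frame reads feat_k at p + delta_p with bilinear interpolation."""
    C, H, W = feat_k.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # source (generally fractional) coordinates p + delta_p in the key frame
    sy = ys + flow_i_to_k[..., 0]
    sx = xs + flow_i_to_k[..., 1]
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 1)
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    wy = np.clip(sy, 0, H - 1) - y0      # fractional parts = bilinear weights
    wx = np.clip(sx, 0, W - 1) - x0
    out = np.empty_like(feat_k, dtype=float)
    for c in range(C):                   # per channel, as in the formula
        f = feat_k[c]
        out[c] = (f[y0, x0] * (1 - wy) * (1 - wx) + f[y0, x1] * (1 - wy) * wx
                  + f[y1, x0] * wy * (1 - wx) + f[y1, x1] * wy * wx)
    return out

feat = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
zero = np.zeros((4, 4, 2))
same = propagate_features(feat, zero)    # zero flow reproduces the key features
```

With zero flow the warp is the identity, and an integer flow shifts the feature map; fractional flows blend the four neighbouring feature values, which is exactly why the bilinear kernel is needed when δp is not an integer.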
Step S4, perform similarity calculation on the key depth features, the non-key depth features and the hand-crafted features, and construct a pedestrian re-identification model from the result of the similarity calculation. This comprises the following sub-steps:

Step S4-1, perform similarity calculation on the key depth features, the non-key depth features and the hand-crafted features to obtain a similarity matrix;

Step S4-2, fuse loss functions on the similarity matrix and perform parameter learning, so as to build the pedestrian re-identification model. This comprises the following sub-steps:

Step S4-2-1, perform classification learning on the similarity matrix with a Softmax loss function, so as to remove from the similarity matrix the detection boxes that contain no pedestrian.
The Softmax loss in step S4-2-1 is calculated as follows:

1) Screen out the detection boxes containing no pedestrian. Let the number of pedestrian classes be N and the output layer be [Z_1, Z_2, ..., Z_N]; the normalized probability of each pedestrian class is

P_i = exp(Z_i) / Σ_{j=1}^{N} exp(Z_j)

2) Use the cross entropy as the loss function:

L = -Σ_i t_i · log(P_i)

where P_i is the softmax value computed above and t_i is the true value.
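The Softmax normalization and cross-entropy loss above can be sketched directly in NumPy; the logit values below are illustrative, and subtracting the maximum logit is a standard numerical-stability trick not stated in the patent.

```python
import numpy as np

def softmax(z):
    """Normalized class probabilities P_i = exp(Z_i) / sum_j exp(Z_j)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(p, t):
    """Cross-entropy loss L = -sum_i t_i * log(P_i) against one-hot target t."""
    return float(-np.sum(np.asarray(t) * np.log(np.asarray(p) + 1e-12)))

logits = [2.0, 1.0, 0.1]             # illustrative output layer [Z_1, Z_2, Z_3]
p = softmax(logits)
print(p.sum())                       # probabilities sum to 1
print(cross_entropy(p, [1, 0, 0]))   # loss when class 0 is the true class
```

The loss is smallest when the probability mass sits on the true class, which is what drives the removal of pedestrian-free detection boxes during classification learning.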
Step S4-2-2, successively calculate the distances between the hand-crafted features and the key and non-key depth features with a cosine-distance metric, and rank them by distance;
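The cosine-distance ranking of step S4-2-2 can be sketched as follows; the query and gallery vectors are toy illustrations, not features from the patent's networks.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance 1 - (a.b)/(|a||b|); 0 for identical directions."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_distance(query, gallery):
    """Return gallery indices sorted by increasing cosine distance to the
    query (step S4-2-2: the closest feature comes first)."""
    d = [cosine_distance(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=d.__getitem__)

query = [1.0, 0.0, 0.0]
gallery = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_by_distance(query, gallery))   # -> [1, 2, 0]
```

Cosine distance compares feature directions rather than magnitudes, which is why it is a common choice for matching depth features across frames and cameras.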
Step S4-2-3, based on the ranking result, continue the parameter learning of the similarity matrix with an OIM loss function in a multi-task fashion, so as to build the pedestrian re-identification model.
Step S4-2-3 considers only the examples of labeled identities and the examples of unlabeled identities in the training data, minimizes the difference between pedestrian targets with the same ID and maximizes the difference between pedestrian targets with different IDs, and continues the parameter learning of the re-identification network in a multi-task fashion. The OIM loss function is calculated as follows:

1) The probability that feature vector f belongs to pedestrian class i is

p_i = exp(v_i^T f / T) / ( Σ_j exp(v_j^T f / T) + Σ_k exp(u_k^T f / T) )

where L is the list of labeled pedestrian features, Q stores the list of detected-but-unlabeled pedestrian features, v_j is a labeled feature vector, u_k is an unlabeled feature vector from pedestrian detection, and T is a temperature factor that controls the smoothness of the probability distribution.

2) The loss function is

L = E_x[log p_t]

where t is the class label of the target pedestrian.
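A sketch of the OIM class probability as commonly formulated, matching the lookup-table/queue description above. The matrices V (labeled identity features) and U (unlabeled features) and the temperature value are illustrative assumptions, not values from the patent.

```python
import numpy as np

def oim_probability(f, V, U, temp=0.1):
    """OIM-style class probability for feature f:
    p_i = exp(v_i.f / T) / (sum_j exp(v_j.f / T) + sum_k exp(u_k.f / T)),
    where rows of V are labeled identity features (the lookup table L) and
    rows of U are detected-but-unlabeled features (the queue Q); temp is the
    temperature factor controlling the smoothness of the distribution."""
    V, U, f = np.asarray(V, float), np.asarray(U, float), np.asarray(f, float)
    sv = np.exp(V @ f / temp)                 # similarities to labeled IDs
    su = np.exp(U @ f / temp)                 # similarities to unlabeled IDs
    return sv / (sv.sum() + su.sum())

V = np.eye(3)                    # three toy labeled identities (unit features)
U = np.array([[0.5, 0.5, 0.0]])  # one unlabeled detection
p = oim_probability([1.0, 0.0, 0.0], V, U)
print(p.argmax())                # identity 0 is the most probable
```

Note that the probabilities over labeled identities sum to less than 1: the unlabeled queue absorbs part of the probability mass, which is what pushes features of different unlabeled pedestrians apart during training.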
Step S5, analyze each video under test with the pedestrian re-identification model, obtain the location information and time information of the target pedestrian in each video under test, and rank the location and time information of the target pedestrian in each video under test;

Step S6, analyze all videos under test with the pedestrian re-identification model, obtain the probability that the target pedestrian appears in each video under test, and rank the videos under test by the magnitude of the probability.
Fig. 5 is a schematic diagram of the finally obtained pedestrian motion trajectory in this embodiment.

Step S7, draw the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results of steps S5 and S6, as shown in Fig. 5, i.e. the motion trajectory of the target pedestrian under the specified cameras in this embodiment.
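Step S7 can be illustrated with a minimal sketch that filters per-camera sightings by the appearance probability from step S6 and orders them by the time information from step S5; the `Sighting` record and the probability threshold are illustrative assumptions, not structures from the patent.

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    camera: str      # camera (position) identifier
    time: float      # time of appearance in the video under test
    prob: float      # probability that the target pedestrian appears here

def draw_trajectory(sightings, prob_thresh=0.5):
    """Keep sightings where the target is likely present and order them by
    time, yielding the camera sequence of the trajectory (step S7, sketch).
    prob_thresh is an assumed cut-off, not a value from the patent."""
    kept = [s for s in sightings if s.prob >= prob_thresh]
    return [s.camera for s in sorted(kept, key=lambda s: s.time)]

track = draw_trajectory([
    Sighting("cam2", 12.0, 0.9),
    Sighting("cam1", 3.5, 0.8),
    Sighting("cam3", 20.0, 0.2),   # below threshold, excluded
])
print(track)   # -> ['cam1', 'cam2']
```

The resulting camera sequence, combined with the known camera positions, is what a plotting routine would draw as the trajectory of Fig. 5.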
Embodiment action and effect
According to the video-detection-based pedestrian re-identification method of this embodiment, because the frame-differencing method extracts key frames from the video's image frames in an inter-frame fusion manner, the relationships between image frames are better exploited, effectively alleviating the network load and the loss of accuracy caused by blurred frames (i.e. frames blurred by motion blur, camera defocus, unusual target poses or severe occlusion). Further, because the optical-flow network extracts the target pedestrian's non-key depth features and hand-crafted features from the non-key frames, and the key-frame features, non-key depth features and hand-crafted features are fused point-to-point on the similarity matrix, the contextual information that evidently exists between adjacent image frames is supplemented, so that the pedestrian re-identification model is more accurate and detection is faster.
For non-key frames blurred by causes such as multiple light sources, occlusion, noise or transparency, the temporal attention mechanism limits the vector offsets of the pixels in the non-key frames, so that the timing information of the non-key-frame features and hand-crafted features obtained through the optical-flow network is more accurate. The pixels of the non-key frames corresponding to this timing information therefore better supplement the preceding adjacent key frame, which not only benefits the parameter training of the similarity matrix but also greatly reduces the network load.
Because the key depth features are propagated with a bilinear interpolation algorithm, the non-key depth features and hand-crafted features obtained through the flow map are spatially aligned, which further benefits the parameter training of the similarity matrix and makes the analysis results of the pedestrian re-identification model more accurate.
For the practical case where the videos under test contain very few pedestrian IDs and each image frame contains only a handful of them, this embodiment uses the OIM loss function: only the examples of labeled identities and of unlabeled identities in the training data are considered, the difference between pedestrian targets with the same ID is minimized and the difference between targets with different IDs is maximized, and the distances between the hand-crafted features and the key and non-key depth features are calculated one by one with a cosine-distance metric and ranked by distance. This effectively reduces the computation of the pedestrian re-identification model and avoids the non-convergence problem that can occur during parameter learning.
By combining pedestrian detection and the traditional pedestrian re-identification task into an end-to-end, one-stage pedestrian re-identification method, the method of this embodiment has more practical value for re-identifying pedestrians in the image and video scenes encountered in everyday settings.
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art could make many modifications and variations according to the concept of the present invention without creative labor. Therefore, any technical solution that a technician in the art can obtain on the basis of the existing technology, under the concept of the present invention, through logical analysis, reasoning or limited experiment shall fall within the scope of protection determined by the claims.
Claims (7)
1. A pedestrian re-identification method based on video detection, for identifying a target pedestrian in a predetermined monitoring scene from multiple videos under test composed of image frames shot in the predetermined monitoring scene, characterized by comprising the following steps:

Step S1, read the image frames of the video under test, apply a frame-differencing method to the image frames, and take the image frames at local maxima of the difference intensity as the key frames of the video under test;

Step S2, extract the features of the target pedestrian in the key frames with a detection network, as key depth features;

Step S3, take the remaining image frames as non-key frames and extract, with an optical-flow network, the features of the target pedestrian in the non-key frames, as non-key depth features and corresponding hand-crafted features;

Step S4, perform similarity calculation on the key depth features, the non-key depth features and the hand-crafted features, and construct a pedestrian re-identification model from the result of the similarity calculation;

Step S5, analyze each video under test with the pedestrian re-identification model, obtain the location information and time information of the target pedestrian in each video under test, and rank the location and time information of the target pedestrian in each video under test;

Step S6, analyze all videos under test with the pedestrian re-identification model, obtain the probability that the target pedestrian appears in each video under test, and rank the videos under test by the magnitude of the probability;

Step S7, draw the trajectory of the target pedestrian in the predetermined monitoring scene according to the ranking results of steps S5 and S6.
2. The pedestrian re-identification method based on video detection according to claim 1, characterized in that step S1 comprises the following sub-steps:

Step S1-1, read the image frames of the video under test;

Step S1-2, calculate the grayscale difference of corresponding pixels between two adjacent image frames;

Step S1-3, binarize the grayscale difference and, from the binarization result, determine whether each pixel coordinate is a foreground or a background coordinate;

Step S1-4, obtain the motion region of the image frame from the determination in step S1-3;

Step S1-5, perform a connectivity analysis on the image frame; when the area of the motion region in the image frame exceeds a predetermined threshold, the current image frame is judged to be a key frame.
3. The pedestrian re-identification method based on video detection according to claim 1, characterized in that step S3 comprises the following sub-steps:

Step S3-1, judge whether the image frame is a key frame; if not, take the image frame as a non-key frame;

Step S3-2, apply an optical-flow estimation algorithm to the non-key frame and the preceding key frame adjacent to the non-key frame to obtain a flow map;

Step S3-3, adjust the key depth features of the key frame to the same spatial resolution as the corresponding flow map and propagate them;

Step S3-4, extract the non-key depth features and the corresponding hand-crafted features of the non-key frame from the result of the propagation.
4. The pedestrian re-identification method based on video detection according to claim 3, characterized in that in step S3 the key depth features are propagated using a bilinear interpolation algorithm.

5. The pedestrian re-identification method based on video detection according to claim 3, characterized in that in step S3 a temporal attention mechanism is used to limit the vector offsets of the pixels in the non-key frame.
6. The pedestrian re-identification method based on video detection according to claim 1, characterized in that step S4 comprises the following sub-steps:
Step S4-1, performing similarity calculation on the key depth features, the non-key depth features and the hand-crafted features to obtain a similarity matrix;
Step S4-2, performing parameter learning on the similarity matrix in combination with fusion loss functions, so as to build the pedestrian re-identification model.
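The similarity calculation of step S4-1 can be sketched as a pairwise cosine-similarity matrix between two sets of feature vectors; the split into query and gallery sets, and cosine as the similarity measure, follow the distance metric named in claim 7 but are otherwise assumptions.

```python
def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def similarity_matrix(query_feats, gallery_feats):
    """Step S4-1 sketch: entry (i, j) is the similarity between query
    descriptor i and gallery descriptor j."""
    return [[cosine_similarity(q, g) for g in gallery_feats]
            for q in query_feats]
```

The resulting matrix is what the loss functions of step S4-2 operate on during parameter learning.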
7. The pedestrian re-identification method based on video detection according to claim 6, characterized in that step S4-2 comprises the following sub-steps:
Step S4-2-1, performing classification learning on the similarity matrix using a Softmax loss function to remove the detection boxes containing no pedestrian from the similarity matrix;
Step S4-2-2, successively calculating the distances between the hand-crafted features and the key depth features and the non-key depth features using cosine distance measurement, and sorting according to the magnitude of the distances;
Step S4-2-3, based on the sorting result, continuing parameter learning on the similarity matrix with an OIM loss function in a multi-task manner, so as to build the pedestrian re-identification model.
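The distance sorting of step S4-2-2 can be sketched as ranking gallery descriptors by cosine distance to a query descriptor, nearest first. The Softmax and OIM training of steps S4-2-1 and S4-2-3 is omitted; only the ranking is shown, and the query/gallery framing is an assumption for illustration.

```python
def rank_gallery(query, gallery):
    """Step S4-2-2 sketch: return gallery indices sorted by cosine distance
    to the query descriptor, nearest first."""
    def cos_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        # cosine distance = 1 - cosine similarity
        return 1.0 - (dot / (na * nb) if na and nb else 0.0)

    dists = [cos_dist(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=dists.__getitem__)
```

The sorted order would then drive the multi-task OIM parameter learning of step S4-2-3.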
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910434555.XA CN110232330B (en) | 2019-05-23 | 2019-05-23 | Pedestrian re-identification method based on video detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110232330A true CN110232330A (en) | 2019-09-13 |
CN110232330B CN110232330B (en) | 2020-11-06 |
Family
ID=67861545
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910434555.XA Active CN110232330B (en) | 2019-05-23 | 2019-05-23 | Pedestrian re-identification method based on video detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110232330B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103475935A (en) * | 2013-09-06 | 2013-12-25 | 北京锐安科技有限公司 | Method and device for retrieving video segments |
CN104077605A (en) * | 2014-07-18 | 2014-10-01 | 北京航空航天大学 | Pedestrian search and recognition method based on color topological structure |
CN107832672A (en) * | 2017-10-12 | 2018-03-23 | 北京航空航天大学 | A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information |
Non-Patent Citations (1)
Title |
---|
Xizhou Zhu: "Deep Feature Flow for Video Recognition", 2017 IEEE Conference on Computer Vision and Pattern Recognition * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111008558B (en) * | 2019-10-30 | 2023-05-30 | 中山大学 | Picture/video important person detection method combining deep learning and relational modeling |
CN111008558A (en) * | 2019-10-30 | 2020-04-14 | 中山大学 | Picture/video important person detection method combining deep learning and relational modeling |
CN111291633A (en) * | 2020-01-17 | 2020-06-16 | 复旦大学 | Real-time pedestrian re-identification method and device |
CN111310728B (en) * | 2020-03-16 | 2022-07-15 | 中国科学技术大学 | Pedestrian re-identification system based on monitoring camera and wireless positioning |
CN111310728A (en) * | 2020-03-16 | 2020-06-19 | 中国科学技术大学 | Pedestrian re-identification system based on monitoring camera and wireless positioning |
CN111832457A (en) * | 2020-07-01 | 2020-10-27 | 济南浪潮高新科技投资发展有限公司 | Stranger intrusion detection method based on cloud edge cooperation |
CN111832457B (en) * | 2020-07-01 | 2022-07-22 | 山东浪潮科学研究院有限公司 | Stranger intrusion detection method based on cloud edge cooperation |
CN111738362A (en) * | 2020-08-03 | 2020-10-02 | 成都睿沿科技有限公司 | Object recognition method and device, storage medium and electronic equipment |
CN111738362B (en) * | 2020-08-03 | 2020-12-01 | 成都睿沿科技有限公司 | Object recognition method and device, storage medium and electronic equipment |
CN112200067A (en) * | 2020-10-09 | 2021-01-08 | 宁波职业技术学院 | Intelligent video event detection method, system, electronic equipment and storage medium |
CN112200067B (en) * | 2020-10-09 | 2024-02-02 | 宁波职业技术学院 | Intelligent video event detection method, system, electronic equipment and storage medium |
CN112801020A (en) * | 2021-02-09 | 2021-05-14 | 福州大学 | Pedestrian re-identification method and system based on background graying |
CN112801020B (en) * | 2021-02-09 | 2022-10-14 | 福州大学 | Pedestrian re-identification method and system based on background graying |
CN113343810A (en) * | 2021-05-28 | 2021-09-03 | 国家计算机网络与信息安全管理中心 | Pedestrian re-recognition model training and recognition method and device based on time sequence diversity and correlation |
CN113343810B (en) * | 2021-05-28 | 2023-03-21 | 国家计算机网络与信息安全管理中心 | Pedestrian re-recognition model training and recognition method and device based on time sequence diversity and correlation |
CN113743387A (en) * | 2021-11-05 | 2021-12-03 | 中电科新型智慧城市研究院有限公司 | Video pedestrian re-identification method and device, electronic equipment and readable storage medium |
CN114897762A (en) * | 2022-02-18 | 2022-08-12 | 众信方智(苏州)智能技术有限公司 | Automatic positioning method and device for coal mining machine on coal mine working face |
Also Published As
Publication number | Publication date |
---|---|
CN110232330B (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110232330A (en) | Pedestrian re-identification method based on video detection | |
CN110837778B (en) | Traffic police command gesture recognition method based on skeleton joint point sequence | |
CN108805083B (en) | Single-stage video behavior detection method | |
CN103208008B (en) | Fast adaptive method for traffic video surveillance target detection based on machine vision | |
CN109102547A (en) | Robot grasping pose estimation method based on an object-recognition deep learning model | |
CN110378259A (en) | Multi-target behavior recognition method and system for surveillance video | |
CN110147743A (en) | Real-time online pedestrian analysis and counting system and method in complex scenes | |
CN112836640B (en) | Single-camera multi-target pedestrian tracking method | |
CN104680559B (en) | Multi-view indoor pedestrian tracking method based on movement behavior patterns | |
CN107545582A (en) | Video multi-target tracking method and device based on fuzzy logic | |
CN104517095B (en) | Human head segmentation method based on depth images | |
CN104794737B (en) | Depth-information-assisted particle filter tracking method | |
CN105869178A (en) | Method for unsupervised segmentation of complex targets from dynamic scene based on multi-scale combination feature convex optimization | |
CN105760831A (en) | Pedestrian tracking method based on low-altitude aerial photographing infrared video | |
CN110288627A (en) | Online multi-object tracking method based on deep learning and data association | |
CN109063549B (en) | High-resolution aerial video moving object detection method based on deep neural network | |
CN112861808B (en) | Dynamic gesture recognition method, device, computer equipment and readable storage medium | |
CN112446882A (en) | Robust visual SLAM method based on deep learning in dynamic scene | |
CN110390294A (en) | Target tracking method based on bidirectional long-short term memory neural network | |
CN115115672B (en) | Dynamic vision SLAM method based on target detection and feature point speed constraint | |
CN111680560A (en) | Pedestrian re-identification method based on space-time characteristics | |
CN103150552A (en) | Driving training management method based on people counting | |
CN113033468A (en) | Specific person re-identification method based on multi-source image information | |
Zhang et al. | Body localization in still images using hierarchical models and hybrid search | |
Gopal et al. | Tiny object detection: Comparative study using single stage CNN object detectors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||