CN110852219B - Multi-pedestrian cross-camera online tracking system - Google Patents


Info

Publication number
CN110852219B
CN110852219B (application CN201911048719.1A)
Authority
CN
China
Prior art keywords
target
pedestrian
human body
face
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911048719.1A
Other languages
Chinese (zh)
Other versions
CN110852219A
Inventor
杨军峰
林钢鑫
朱贵冬
李欢
Current Assignee
Guangzhou Haige Xinghang Information Technology Co ltd
Original Assignee
Guangzhou Haige Xinghang Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Haige Xinghang Information Technology Co ltd filed Critical Guangzhou Haige Xinghang Information Technology Co ltd
Priority to CN201911048719.1A
Publication of CN110852219A
Application granted
Publication of CN110852219B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Abstract

The invention discloses a multi-pedestrian cross-camera online tracking system, which comprises a pedestrian detection module, a human body recognition module, a face recognition module, a target screening module, a multi-camera target fusion module and a cross-camera target feature collection module. The invention improves recognition accuracy by fusing face recognition with human body recognition, and further improves pedestrian recognition accuracy across cameras by fusing the spatial position points of each target's historical track. By layering face recognition and spatial position data on top of human body recognition, the system corrects the human body recognition result in real time. This solves the problems of the prior art, in which tracking with only a single recognition technology makes targets easy to lose and recognition errors common, and thereby effectively improves the accuracy of multi-pedestrian online tracking.

Description

Multi-pedestrian cross-camera online tracking system
Technical Field
The invention relates to the technical field of camera tracking, in particular to a multi-pedestrian cross-camera online tracking system.
Background
Machine vision is now widely applied in the video surveillance industry, and real-time target tracking has become one of its applications. By analyzing the real-time pictures from the cameras (pedestrian re-identification and face recognition) and improving pedestrian recognition accuracy through multiple data fusion strategies (pedestrian moving speed, face recognition results, similarity thresholds and the like), a pedestrian target can be tracked online across cameras.
The existing online tracking technology has the following problems:
1) Missed detections and false detections of pedestrian targets caused by complex backgrounds, high pedestrian density, mutual occlusion and the like in actual monitoring scenes.
2) Changes in illumination, viewing angle and pedestrian posture across cameras cause the appearance features of the same pedestrian to vary between cameras, making it difficult for other cameras to accurately re-acquire a target that has disappeared.
3) Target tracking cannot be achieved by face recognition alone. Although face recognition is accurate, in actual scenes it is difficult for a camera to capture a clear face in real time; for example, when a pedestrian walks away from the camera, only the pedestrian's back can be captured.
Disclosure of Invention
The embodiment of the invention provides a multi-pedestrian cross-camera online tracking system, which aims to solve the problem that cross-camera target tracking is easy to lose due to the problems of external illumination, background, visual angle and the like of the existing monitoring software, so that the accuracy of a multi-pedestrian online tracking technology can be effectively improved.
In order to solve the above technical problem, an embodiment of the present invention provides an online multi-person cross-camera tracking system, including:
the pedestrian detection module is used for detecting pedestrians in the video image acquired by the camera in real time and returning pedestrian detection information in real time; the pedestrian detection information comprises a detection frame, a pedestrian Id and a pedestrian characteristic value;
the human body recognition module is used for extracting human body features of the pedestrian detection information, comparing the human body feature extraction result with information in a target library, and returning a target human body recognition result in the comparison;
the face recognition module is used for carrying out face detection on the pedestrian detection information, searching and comparing the face detection result with information in a face target library, and returning a target face recognition result in comparison;
the target screening module is used for carrying out data fusion on the target human body recognition result and the target face recognition result according to the pedestrian detection information, meanwhile, carrying out speed calculation and screening on the compared targets, and returning a target screening result;
the multi-camera target fusion module is used for eliminating incorrect target identification results according to target position information acquired by the multiple cameras and outputting final target tracking information obtained through screening; and the final target tracking information comprises pedestrian information and positioning information of the target.
Furthermore, the system also comprises a cross-camera target feature collection module which is used for acquiring feature values of the target in different directions of different cameras, storing various feature values of the target and updating the various feature values in real time.
Further, the pedestrian detection module comprises an image acquisition unit, a detection frame output unit and a detection information sending unit; wherein
The image acquisition unit is used for acquiring a video screenshot and image information transmitted by the camera in real time from the network message queue;
the detection frame output unit is used for detecting the pedestrian in the image and returning the framed human body frame and head frame of the pedestrian;
the detection information sending unit is used for sending the pedestrian detection information to a network message queue.
Further, the human body recognition module comprises a neural network model construction unit, a human body feature extraction unit and a human body recognition result output unit; wherein:
the neural network model building unit is used for building a pedestrian human body recognition network model according to the obtained pedestrian data set; the pedestrian data set comprises human body characteristic data under different postures, different angles and different illumination conditions;
the human body feature extraction unit is used for extracting the pedestrian features by using the human body recognition network model according to the pedestrian detection information to obtain a pedestrian feature vector;
and the human body recognition result output unit is used for comparing the pedestrian characteristic vector with the information in the target library, screening the pedestrian characteristic vector distance according to the Mahalanobis distance, and returning the target human body recognition result obtained by screening.
Furthermore, the face recognition module comprises a face image acquisition unit, a face comparison unit and a face recognition result output unit; wherein:
the face image acquisition unit is used for acquiring pedestrian detection information of each camera frame by frame and extracting face images according to a head frame of the pedestrian detection information;
the face comparison unit is used for searching and comparing the extracted face image with face information in a face target library, and screening out a target face recognition result in the comparison according to a preset similarity threshold;
and the face recognition result output unit is used for returning the pedestrian information and the image information of the target face recognition result.
Further, the target screening module comprises an alternative recognition result acquisition unit, a target fusion unit and a target screening unit; wherein:
the alternative recognition result acquisition unit is used for receiving a target human body recognition result and a target human face recognition result of the same timestamp;
the target fusion unit is used for carrying out target fusion on the target human body recognition result and the target human face recognition result according to a preset data fusion strategy;
and the target screening unit is used for removing the abnormal pedestrian recognition result according to the moving speed of the target and returning the target screening result after target fusion and target removal operation.
Further, the multi-camera target fusion module comprises a human body recognition result collection unit, an outlier rejection unit and a fusion result output unit; wherein:
the human body identification result collecting unit is used for collecting human body identification results under different cameras at the same timestamp and screening and identifying different cameras with the same target according to the pedestrian Id of the human body identification result;
the outlier rejection unit is used for respectively calculating the average distance between the target position points under different cameras and the target position points under other cameras, and rejecting the identification result of the camera with the average distance larger than a preset threshold value;
and the fusion result output unit is used for outputting the target screening result after the identification result is removed.
Compared with the prior art, the invention has the following beneficial effects:
according to the multi-pedestrian cross-camera online tracking system provided by the embodiment of the invention, the identification accuracy is improved by fusing the face identification technology and the human body identification technology, and meanwhile, the pedestrian identification accuracy is improved by fusing the spatial position points of the historical track under the condition of cross-camera. The invention integrates the face recognition and the spatial position data on the human body recognition technology, corrects the human body recognition result in real time, and solves the problems that the target is easy to lose and the error is easy to recognize because only a single recognition technology is adopted for tracking in the prior art, thereby effectively improving the accuracy of the multi-line human online tracking technology.
Drawings
Fig. 1 is a schematic diagram of a logical relationship between multi-pedestrian cross-camera online tracking and other core modules according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a logical relationship between functions of a multi-pedestrian cross-camera online tracking system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a multi-pedestrian cross-camera online tracking system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-3, an embodiment of the present invention provides an online multi-person cross-camera tracking system, including:
the pedestrian detection module 1 is used for detecting pedestrians in the video image acquired by the camera in real time and returning pedestrian detection information in real time; the pedestrian detection information comprises a detection frame, a pedestrian Id and a pedestrian characteristic value;
the human body recognition module 2 is used for extracting human body features of the pedestrian detection information, comparing the human body feature extraction result with information in a target library, and returning a target human body recognition result in the comparison;
the face recognition module 3 is used for carrying out face detection on the pedestrian detection information, searching and comparing the face detection result with information in a face target library, and returning a target face recognition result in the comparison;
the target screening module 4 is used for performing data fusion on the target human body recognition result and the target face recognition result according to the pedestrian detection information, meanwhile, performing speed calculation and screening on the compared targets, and returning a target screening result;
the multi-camera target fusion module 5 is used for eliminating incorrect target identification results according to target position information acquired by a plurality of cameras and outputting final target tracking information obtained through screening; and the final target tracking information comprises pedestrian information and positioning information of the target.
In the embodiment of the present invention, the system further includes a cross-camera target feature collection module 6, configured to acquire feature values of a target in different orientations of different cameras, store various feature values of the target, and update the various feature values in real time.
It can be understood that the main business functions of the multi-person cross-camera online tracking system of the present invention include the following items:
1) Pedestrian detection: detect pedestrians in the video picture and return pedestrian information (detection frame, pedestrian Id, pedestrian characteristic value and the like) in real time.
2) Human body recognition: compare the detected pedestrians by feature against the targets enrolled in the target library; a comparison is considered a match when its score exceeds a certain threshold (preferably self-defined, e.g. 0.85).
3) Face recognition: perform face detection and face comparison on the detected pedestrians using the existing face comparison service; if the comparison against the target face library succeeds with a score exceeding a threshold (preferably self-defined, e.g. 0.9), the pedestrian is regarded as a tracked target.
4) Target screening: since face recognition is more accurate than human body recognition, the human body recognition results are screened with the help of the face recognition results. Meanwhile, the target position is calculated and the speed from the previous position point to the current position is computed; if this speed exceeds a pedestrian's normal walking speed, the result is rejected, improving comparison accuracy.
5) Multi-camera target fusion: when a target pedestrian is in the overlapping field of view of several cameras, its spatial positions reported by those cameras are close together, so outliers can be removed by position clustering; the removed outliers correspond to incorrect target recognitions.
6) Cross-camera target feature collection: store and update the characteristic values of a target in different orientations under different cameras, so that feature comparison can be performed against the corresponding orientation.
In the embodiment of the present invention, further, the pedestrian detection module 1 includes an image acquisition unit, a detection frame output unit, and a detection information sending unit; wherein
The image acquisition unit is used for acquiring a video screenshot and image information transmitted by the camera in real time from the network message queue;
the detection frame output unit is used for detecting the pedestrian in the image and returning the framed human body frame and head frame of the pedestrian;
the detection information sending unit is used for sending the pedestrian detection information to a network message queue.
In the embodiment of the invention, pedestrian detection adopts deep neural network technology to detect each pedestrian's human body/head detection frame in the image.
The pedestrian detection method is based on the YOLO network, which integrates candidate frame extraction, feature extraction, target classification and target localization into a single neural network. The network extracts candidate regions directly from the image and predicts pedestrian positions and probabilities from whole-image features. This converts the pedestrian detection problem into a regression problem and truly achieves end-to-end detection.
The detection steps are as follows:
The first step: acquire video screenshots and image information (timestamps, camera IDs and the like) of the streaming media in real time from a network message queue (kafka);
The second step: detect the pedestrians in the image and output each pedestrian's human body frame and head frame;
The third step: send the detection results and image information to the network message queue (kafka).
In the embodiment of the present invention, further, the human body recognition module 2 includes a neural network model construction unit, a human body feature extraction unit, and a human body recognition result output unit; wherein:
the neural network model building unit is used for building a pedestrian human body recognition network model according to the acquired pedestrian data set; the pedestrian data set comprises human body characteristic data under different postures, different angles and different illumination conditions;
the human body feature extraction unit is used for extracting the pedestrian features by using the human body recognition network model according to the pedestrian detection information to obtain a pedestrian feature vector;
and the human body recognition result output unit is used for comparing the pedestrian characteristic vector with the information in the target library, screening the pedestrian characteristic vector distance according to the Mahalanobis distance, and returning the target human body recognition result obtained by screening.
Existing deep-learning-based human body recognition methods can be divided, according to how the deep neural network is used, into feature learning and distance metric learning. A feature learning network aims to learn a robust and discriminative feature representation of pedestrian images. Distance metric learning aims to reduce the distance between descriptors of images of the same person.
The invention uses distance metric learning to judge whether the pedestrian pictures under multiple cameras belong to the tracked target.
The method comprises the following implementation steps:
The first step: train the neural network model offline.
1. Construct a training data set. The data set is generated by manual labeling and verification. It covers complex conditions such as various pedestrian postures and angles and different illumination conditions, so that the neural network model can learn pedestrian features under the complex conditions of real scenes.
2. Construct the network model. The pedestrian recognition network model adopts a resnet convolutional neural network as the backbone network and a triplet loss function as the objective function.
The second step: apply the neural network model online.
1. Receive the pedestrian detection results of each camera frame by frame.
2. Use the neural network model obtained in the first step to perform feature extraction on each detected pedestrian picture, obtaining the pedestrian's feature vector.
3. Compare each obtained pedestrian feature vector against the tracked-target feature vectors, using the Mahalanobis distance as the feature-vector distance. Optionally, when the distance is less than 0.85 the pedestrian is judged to be a target person, and when the distance is greater than 0.85 a non-target person.
The recognition result information is fed to the next module (which may be the face recognition module 3).
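The online comparison step can be sketched with NumPy as follows. The 4-dimensional features, the sample covariance, and the way the 0.85 cutoff is applied are illustrative assumptions; a real system would use the resnet descriptors and a covariance estimated from the training data.

```python
import numpy as np

def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance between two feature vectors, given the
    inverse covariance of the feature distribution."""
    d = u - v
    return float(np.sqrt(d @ inv_cov @ d))

# Hypothetical 4-dimensional features; real descriptors from the
# resnet backbone would be much longer (e.g. 512-D).
rng = np.random.default_rng(0)
target_feature = np.array([0.2, 0.4, 0.1, 0.3])
inv_cov = np.linalg.inv(np.cov(rng.normal(size=(100, 4)), rowvar=False))

def is_target(candidate, threshold=0.85):
    """Distance below the (self-defined) threshold -> target person."""
    return mahalanobis(candidate, target_feature, inv_cov) < threshold

near = target_feature + 0.01                          # almost identical descriptor
far = target_feature + np.array([2.0, -2.0, 2.0, -2.0])
print(is_target(near), is_target(far))
```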
In the embodiment of the present invention, further, the face recognition module 3 includes a face image acquisition unit, a face comparison unit, and a face recognition result output unit; wherein:
the face image acquisition unit is used for acquiring pedestrian detection information of each camera frame by frame and extracting face images according to a head frame of the pedestrian detection information;
the face comparison unit is used for searching and comparing the extracted face image with face information in a face target library, and screening out a target face recognition result in the comparison according to a preset similarity threshold;
and the face recognition result output unit is used for returning the pedestrian information and the image information of the target face recognition result.
In the embodiment of the invention, the face recognition module 3 acquires the pedestrian detection result from the network message queue, and performs face detection and face comparison on the human body or the head of the pedestrian in the image.
The first step is as follows: and receiving the pedestrian detection result of each camera frame by frame, and entering the second step.
The second step: intercepting a small figure of the human head according to a human body or a human head detection frame in a pedestrian detection result, performing face detection (through a soup face service) on the small figure of the human head, discarding no human face or having low human face quality (the human face picture is fuzzy, the recognition degree is not high, and the self-defined threshold value is below [40.0 ]), and performing third-step recognition on the remaining human face picture;
the third step: searching and comparing the face picture in a face target library through a face comparison service, determining that the similarity score in the comparison result exceeds a certain threshold (preferably 70% by self-definition), and then entering the fourth step.
And fourthly, sending the pedestrian information and the image information which are successfully compared with the human face to a next module (which can be a target screening module 4).
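A minimal sketch of the quality and similarity screening performed in the second and third steps; the result schema and field names are hypothetical, and the 40.0 quality cutoff and 70% similarity cutoff follow the self-defined thresholds above.

```python
def screen_faces(face_results, quality_threshold=40.0, similarity_threshold=0.70):
    """Keep only detections that have a usable face whose library match
    clears the similarity threshold."""
    matched = []
    for r in face_results:
        if r.get("face_quality") is None or r["face_quality"] < quality_threshold:
            continue                      # no face found, or face too blurry
        if r["similarity"] >= similarity_threshold:
            matched.append(r["person_id"])
    return matched

results = [
    {"person_id": "p1", "face_quality": 85.0, "similarity": 0.92},  # clear match
    {"person_id": "p2", "face_quality": 20.0, "similarity": 0.95},  # too blurry
    {"person_id": "p3", "face_quality": 60.0, "similarity": 0.40},  # weak match
    {"person_id": "p4", "face_quality": None, "similarity": 0.0},   # no face found
]
print(screen_faces(results))
```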
In the embodiment of the present invention, further, the target screening module 4 includes an alternative recognition result acquisition unit, a target fusion unit, and a target screening unit; wherein:
the alternative recognition result acquisition unit is used for receiving a target human body recognition result and a target human face recognition result of the same timestamp;
the target fusion unit is used for carrying out target fusion on the target human body recognition result and the target human face recognition result according to a preset data fusion strategy;
and the target screening unit is used for removing the abnormal pedestrian recognition result according to the moving speed of the target and returning the target screening result after target fusion and target removal operation.
In the embodiment of the invention, when the human body recognition result of the pedestrian and the face recognition result are received, the target screening function is started.
The first step is as follows: and receiving the human body recognition result and the human face recognition result with the same timestamp, and entering the second step.
The second step is that: and performing data fusion according to the serial number of the pedestrian (the serial number Id in the identification data), performing human body identification and face identification with the same serial number, and entering the third step.
The third step: when the personnel numbers (PersonId) are different, the following fusion strategy is adopted:
1. if the face similarity is greater than a maximum threshold (e.g., 0.9), the same person is considered.
2 if the face similarity is less than the maximum threshold (e.g. 0.9) and greater than the minimum threshold (e.g. 0.5), taking the integrated similarity =0.8 + 0.2; if the integrated similarity is greater than the integrated threshold (e.g., 0.7), then the same person is considered;
3. if the human face similarity is less than a minimum threshold (e.g., 0.5), the human face similarity is considered unreliable, but if the human body similarity of the pedestrian is greater than a threshold (e.g., 0.8), the target same person is considered.
And entering the fourth step.
The fourth step: and calculating the position of the current target through visual positioning. And entering the fifth step.
The fifth step: and calculating the moving speed of the pedestrian to the current position according to the previous position, if the speed is higher than the moving speed of the normal pedestrian, considering that the result is identified as wrong, removing the error and screening, and entering the sixth step.
And a sixth step: and finishing the target screening and returning.
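The three-tier fusion strategy above can be expressed as a single decision function. The similarity scores are hypothetical, and the weighted combination assumes an integrated similarity of 0.8 × face similarity + 0.2 × human body similarity as described in the third step.

```python
def same_person(face_sim, body_sim,
                face_max=0.9, face_min=0.5,
                combined_thresh=0.7, body_thresh=0.8):
    """Decide whether a human body match and a face match refer to the
    same person, following the three-tier fusion strategy."""
    if face_sim > face_max:                       # tier 1: face alone is decisive
        return True
    if face_min < face_sim <= face_max:           # tier 2: weighted blend
        combined = 0.8 * face_sim + 0.2 * body_sim
        return combined > combined_thresh
    return body_sim > body_thresh                 # tier 3: face unreliable, body only

print(same_person(0.95, 0.10))   # strong face match
print(same_person(0.80, 0.90))   # 0.8*0.80 + 0.2*0.90 = 0.82 > 0.7
print(same_person(0.30, 0.50))   # weak face and weak body
```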
In the embodiment of the present invention, further, the multi-camera target fusion module 5 includes a human body recognition result collecting unit, an outlier rejection unit, and a fusion result output unit; wherein:
the human body recognition result collecting unit is used for collecting human body recognition results under different cameras of the same timestamp, and screening and recognizing different cameras of the same target according to the pedestrian Id of the human body recognition result;
the outlier rejection unit is used for respectively calculating the average distance between the target position points under different cameras and the target position points under other cameras, and rejecting the identification result of the camera with the average distance larger than a preset threshold value;
and the fusion result output unit is used for outputting the target screening result after the identification result is removed.
In the embodiment of the invention, when the same target is in the overlapping field of view of several cameras, multiple position points appear for it. The correct position points are clustered within a certain area, differing from one another only by the visual positioning error, so incorrect recognition results can be eliminated by removing outliers.
For example, suppose the average distance threshold is set to 3 and three cameras 1, 2 and 3 recognize the same target, where:
the average distance from camera 1's position point for the target to those of cameras 2 and 3 is 5;
the average distance from camera 2's position point for the target to those of cameras 1 and 3 is 2;
the average distance from camera 3's position point for the target to those of cameras 1 and 2 is 1.
The recognition result of camera 1, whose average distance exceeds the threshold, is eliminated.
The first step is as follows: and collecting human body recognition results under different cameras with the same timestamp, and entering a second step.
The second step is that: and (4) sorting according to the pedestrian number (PersonId), if a target pedestrian has a plurality of cameras, entering the third step, and if not, directly entering the fourth step.
The third step: calculating the average value (error value) of the distances from each position point to other position points, if the calculated error value is greater than a threshold value (for example, 3m, which is specifically related to the error value of the visual positioning calculation), regarding the position point as an outlier, removing the recognition result of the position point, and entering the fourth step.
The fourth step: and outputting and returning relevant tracking information such as effective identification results, pedestrian information, positioning information and the like.
It should be noted that, for simplicity, the above method or flow embodiments are described as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, as some steps may be performed in other orders or simultaneously according to the present invention. Furthermore, those skilled in the art should understand that the embodiments described in the specification are exemplary embodiments, and the actions and modules involved are not necessarily required by the invention.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (7)

1. A multi-person cross-camera online tracking system is characterized by comprising:
the pedestrian detection module is used for detecting pedestrians in the video images acquired by the cameras in real time and returning pedestrian detection information in real time; the pedestrian detection information comprises a detection frame, a pedestrian Id and a pedestrian characteristic value;
the pedestrian detection module comprises a detection frame output unit;
the detection frame output unit is used for detecting pedestrians in an image and returning the human body frame and head frame framing each detected pedestrian;
the human body recognition module is used for extracting human body features from the pedestrian detection information, comparing the human body feature extraction result with the information in a target library, and returning the target human body recognition result obtained by the comparison;
the face recognition module is used for carrying out face detection on the pedestrian detection information, searching and comparing the face detection result against the information in a face target library, and returning the target face recognition result obtained by the comparison;
the target screening module is used for carrying out data fusion of the target human body recognition result and the target face recognition result according to the pedestrian detection information, meanwhile carrying out speed calculation and screening on the compared targets, and returning a target screening result;
the multi-camera target fusion module is used for eliminating incorrect target identification results according to target position information acquired by a plurality of cameras and outputting final target tracking information obtained through screening; the final target tracking information comprises pedestrian information and positioning information of the target;
the multi-camera target fusion module comprises a human body recognition result collecting unit and an outlier removing unit; wherein,
the human body identification result collecting unit is used for collecting the human body identification results under different cameras at the same timestamp and screening out, according to the pedestrian Id of the human body identification results, the different cameras that identify the same target;
and the outlier rejection unit is used for respectively calculating the average distance between the target position points under different cameras and the target position points under other cameras and rejecting the identification result of the camera with the average distance larger than a preset threshold value.
2. The multi-pedestrian cross-camera online tracking system according to claim 1, further comprising a cross-camera target feature collection module for acquiring feature values of a target in different orientations of different cameras, storing and updating various feature values of the target in real time.
3. The multi-pedestrian cross-camera online tracking system of claim 1, wherein the pedestrian detection module further comprises an image acquisition unit and a detection information sending unit; wherein
The image acquisition unit is used for acquiring a video screenshot and image information transmitted by the camera in real time from the network message queue;
the detection information sending unit is used for sending the pedestrian detection information to a network message queue.
4. The multi-person cross-camera online tracking system of claim 1, wherein the human body recognition module comprises a neural network model construction unit, a human body feature extraction unit, and a human body recognition result output unit; wherein,
the neural network model building unit is used for building a pedestrian human body recognition network model according to the acquired pedestrian data set; the pedestrian data set comprises human body characteristic data under different postures, different angles and different illumination conditions;
the human body feature extraction unit is used for extracting the features of the pedestrians by using the human body recognition network model according to the pedestrian detection information to obtain a pedestrian feature vector;
and the human body recognition result output unit is used for comparing the pedestrian characteristic vector with the information in the target library, screening the pedestrian characteristic vector distance according to the Mahalanobis distance, and returning the target human body recognition result obtained by screening.
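A minimal sketch of the Mahalanobis-distance screening described in claim 4 (the covariance estimate, function name and gallery layout are assumptions for illustration; the claim does not specify them):

```python
import numpy as np

def mahalanobis_match(query, gallery, threshold):
    """Screen gallery feature vectors by Mahalanobis distance to the query
    pedestrian feature vector and return the closest PersonId within the
    threshold, else None.
    gallery: dict mapping person_id -> feature vector. The covariance is
    estimated from the gallery features themselves -- a simplification."""
    feats = np.stack(list(gallery.values()))
    # Pseudo-inverse for numerical stability when the covariance is singular.
    cov_inv = np.linalg.pinv(np.cov(feats, rowvar=False))
    matches = {}
    for pid, feat in gallery.items():
        diff = query - feat
        dist = float(np.sqrt(diff @ cov_inv @ diff))
        if dist <= threshold:
            matches[pid] = dist
    return min(matches, key=matches.get) if matches else None
```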
5. The multi-pedestrian cross-camera online tracking system according to claim 1, wherein the face recognition module comprises a face image acquisition unit, a face comparison unit and a face recognition result output unit; wherein,
the face image acquisition unit is used for acquiring pedestrian detection information of each camera frame by frame and extracting face images according to a head frame of the pedestrian detection information;
the face comparison unit is used for searching and comparing the extracted face image with face information in a face target library, and screening out a target face recognition result in the comparison according to a preset similarity threshold;
and the face recognition result output unit is used for returning the pedestrian information and the image information of the target face recognition result.
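One possible reading of the similarity-threshold search in claim 5, sketched under the assumption of cosine similarity over face embeddings (the claim fixes neither the similarity measure nor the threshold value):

```python
import numpy as np

def face_search(face_embedding, face_gallery, sim_threshold=0.6):
    """Return (person_id, similarity) of the best gallery match whose cosine
    similarity clears the preset threshold, otherwise None.
    face_gallery: dict mapping person_id -> face embedding vector."""
    best_pid, best_sim = None, sim_threshold
    q = face_embedding / np.linalg.norm(face_embedding)
    for pid, emb in face_gallery.items():
        sim = float(q @ (emb / np.linalg.norm(emb)))
        if sim >= best_sim:                 # keep the highest similarity above threshold
            best_pid, best_sim = pid, sim
    return (best_pid, best_sim) if best_pid is not None else None
```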
6. The multi-pedestrian cross-camera online tracking system according to claim 1, wherein the target screening module comprises an alternative recognition result obtaining unit, a target fusion unit, and a target screening unit; wherein,
the alternative recognition result acquisition unit is used for receiving a target human body recognition result and a target human face recognition result with the same timestamp;
the target fusion unit is used for carrying out target fusion on the target human body recognition result and the target human face recognition result according to a preset data fusion strategy;
and the target screening unit is used for removing the abnormal pedestrian recognition result according to the moving speed of the target and returning the target screening result after target fusion and target removal operation.
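The speed-based removal in claim 6 can be sketched under the assumption of a maximum plausible walking speed (the bound and field names are illustrative; the claim only states that abnormal results are removed according to the target's moving speed):

```python
import math

MAX_WALK_SPEED = 3.0  # m/s; illustrative upper bound for pedestrian motion

def speed_filter(track_history, candidate, now):
    """Accept a fused recognition result only if reaching its position from
    the target's last known position does not imply an implausible speed.
    track_history: dict mapping person_id -> ((x, y), last_time)."""
    last = track_history.get(candidate["person_id"])
    if last is None:
        return True                         # first observation: nothing to check
    last_pos, last_time = last
    dt = now - last_time
    if dt <= 0:
        return False                        # out-of-order timestamp: reject
    speed = math.dist(last_pos, candidate["pos"]) / dt
    return speed <= MAX_WALK_SPEED
```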
7. The multi-pedestrian cross-camera online tracking system of claim 1, wherein the multi-camera target fusion module further comprises a fusion result output unit; wherein,
and the fusion result output unit is used for outputting the target screening result after the identification result is removed.
CN201911048719.1A 2019-10-30 2019-10-30 Multi-pedestrian cross-camera online tracking system Active CN110852219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911048719.1A CN110852219B (en) 2019-10-30 2019-10-30 Multi-pedestrian cross-camera online tracking system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911048719.1A CN110852219B (en) 2019-10-30 2019-10-30 Multi-pedestrian cross-camera online tracking system

Publications (2)

Publication Number Publication Date
CN110852219A CN110852219A (en) 2020-02-28
CN110852219B true CN110852219B (en) 2022-07-08

Family

ID=69599280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911048719.1A Active CN110852219B (en) 2019-10-30 2019-10-30 Multi-pedestrian cross-camera online tracking system

Country Status (1)

Country Link
CN (1) CN110852219B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553234B (en) * 2020-04-22 2023-06-06 上海锘科智能科技有限公司 Pedestrian tracking method and device integrating facial features and Re-ID feature ordering
CN111710424A (en) * 2020-06-19 2020-09-25 浙江新芮信息科技有限公司 Catering personnel health monitoring method and equipment and computer readable storage medium
CN111914664A (en) * 2020-07-06 2020-11-10 同济大学 Vehicle multi-target detection and track tracking method based on re-identification
CN112257628A (en) * 2020-10-29 2021-01-22 厦门理工学院 Method, device and equipment for identifying identities of outdoor competition athletes
CN112287877B (en) * 2020-11-18 2022-12-02 苏州爱可尔智能科技有限公司 Multi-role close-up shot tracking method
CN112989911A (en) * 2020-12-10 2021-06-18 奥比中光科技集团股份有限公司 Pedestrian re-identification method and system
CN112686178B (en) * 2020-12-30 2024-04-16 中国电子科技集团公司信息科学研究院 Multi-view target track generation method and device and electronic equipment
CN112733719B (en) * 2021-01-11 2022-08-02 西南交通大学 Cross-border pedestrian track detection method integrating human face and human body features
CN112950674B (en) * 2021-03-09 2024-03-05 厦门市公安局 Cross-camera track tracking method and device based on cooperation of multiple recognition technologies and storage medium
CN113095199B (en) * 2021-04-06 2022-06-14 复旦大学 High-speed pedestrian identification method and device
CN113449596A (en) * 2021-05-26 2021-09-28 科大讯飞股份有限公司 Object re-recognition method, electronic device and storage device
CN113255608B (en) * 2021-07-01 2021-11-19 杭州智爱时刻科技有限公司 Multi-camera face recognition positioning method based on CNN classification
CN113793363A (en) * 2021-09-27 2021-12-14 重庆紫光华山智安科技有限公司 Target tracking method and related device
CN114093004B (en) * 2021-11-25 2023-05-02 成都智元汇信息技术股份有限公司 Face fusion comparison method and device based on multiple cameras
CN117058434A (en) * 2023-07-03 2023-11-14 北京瑞莱智慧科技有限公司 Personnel gear gathering method, device, equipment and storage medium based on multiple algorithm engines

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292240A (en) * 2017-05-24 2017-10-24 深圳市深网视界科技有限公司 It is a kind of that people's method and system are looked for based on face and human bioequivalence
CN107545256A (en) * 2017-09-29 2018-01-05 上海交通大学 A kind of camera network pedestrian recognition methods again of combination space-time and network consistency
CN107644204A (en) * 2017-09-12 2018-01-30 南京凌深信息科技有限公司 A kind of human bioequivalence and tracking for safety-protection system
CN108062511A (en) * 2017-11-17 2018-05-22 维库(厦门)信息技术有限公司 A kind of trans-regional multi-cam target identification association tracking and computer equipment
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN109344787A (en) * 2018-10-15 2019-02-15 浙江工业大学 A kind of specific objective tracking identified again based on recognition of face and pedestrian
CN109598743A (en) * 2018-11-20 2019-04-09 北京京东尚科信息技术有限公司 Pedestrian target tracking, device and equipment
CN110135384A (en) * 2019-04-03 2019-08-16 南通大学 A kind of system and method for face tracking and identification based on video flowing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian target tracking based on cross-camera and online feature learning; Cai Liyi et al.; China Master's Theses Full-Text Database, Information Science and Technology; 2019-09-15; Chapter 5 *

Also Published As

Publication number Publication date
CN110852219A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110852219B (en) Multi-pedestrian cross-camera online tracking system
CN110609920B (en) Pedestrian hybrid search method and system in video monitoring scene
US11216645B2 (en) Multi-camera multi-face video splicing acquisition device and method thereof
KR100886557B1 (en) System and method for face recognition based on adaptive learning
CN112215155B (en) Face tracking method and system based on multi-feature fusion
CN110188724A (en) The method and system of safety cap positioning and color identification based on deep learning
CN104361327A (en) Pedestrian detection method and system
CN111767798B (en) Intelligent broadcasting guide method and system for indoor networking video monitoring
CN107657232B (en) Pedestrian intelligent identification method and system
CN110796074A (en) Pedestrian re-identification method based on space-time data fusion
CN110827432B (en) Class attendance checking method and system based on face recognition
CN112150514A (en) Pedestrian trajectory tracking method, device and equipment of video and storage medium
CN110674680A (en) Living body identification method, living body identification device and storage medium
CN110599129A (en) Campus attendance checking method, device, identification terminal and system based on image tracking
CN113963399A (en) Personnel trajectory retrieval method and device based on multi-algorithm fusion application
CN112215156A (en) Face snapshot method and system in video monitoring
CN114943923B (en) Method and system for recognizing explosion flare smoke of cannonball based on video of deep learning
CN111985387A (en) Helmet wearing early warning method and system based on deep learning
CN114155557B (en) Positioning method, positioning device, robot and computer-readable storage medium
JP2007219603A (en) Person tracking device, person tracking method and person tracking program
CN104392201A (en) Human fall identification method based on omnidirectional visual sense
CN115546825A (en) Automatic monitoring method for safety inspection normalization
CN113554682B (en) Target tracking-based safety helmet detection method
CN115565157A (en) Multi-camera multi-target vehicle tracking method and system
CN114494355A (en) Trajectory analysis method and device based on artificial intelligence, terminal equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510000 Room 601, building a, No. 23, Nanxiang Second Road, Huangpu District, Guangzhou, Guangdong

Applicant after: GUANGZHOU HAIGE XINGHANG INFORMATION TECHNOLOGY CO.,LTD.

Address before: 510000 room 3009, No.5, Wangjiang 2nd Street, Nansha District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU HAIGE XINGHANG INFORMATION TECHNOLOGY CO.,LTD.

GR01 Patent grant