Disclosure of Invention
To address one or more of the above-mentioned technical drawbacks of the prior art, the present invention proposes the following technical solutions.
A cross-camera trajectory tracking method based on the cooperation of multiple recognition technologies comprises the following steps:
an acquisition step of acquiring information on a target obj to be tracked and its spatial position Pos_0;
a setting step of setting a surveillance video range region0 to be analyzed, wherein region0 is the circle centered on the spatial position Pos_0 with radius R, R being a real number greater than zero;
a multi-recognition cooperative positioning step of cooperatively locating the target to be tracked using multiple recognition technologies, based on the information on the target obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos_1;
wherein the setting step and the multi-recognition cooperative positioning step are repeated to obtain further positioning results ..., Pos_{n-1}, Pos_n, and Pos_0, Pos_1, ..., Pos_n are taken as the real-time trajectory of the target obj to be tracked.
Further, the information on the target obj to be tracked comprises a face image and a whole-body image.
Still further, the multiple recognition technologies include face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range, and the video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range.
Still further, the video structured positioning includes pedestrian attribute positioning, which locates the target based on local appearance features of pedestrians, and pedestrian re-identification positioning, which locates the target based on global appearance features of pedestrians.
Further, the operation of cooperatively locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent; if the positioning results are consistent, or there is only one positioning result, determining the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling policy to determine the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target;
the predetermined conflict handling policy is: setting two watchlist thresholds ε_1 and ε_2 for face recognition positioning, where ε_1 > ε_2, ε_1 indicates that the accuracy of the face watchlist alert result is extremely high, and ε_2 indicates that the accuracy of the face watchlist alert result is generally high; if the conflicting positioning results contain a face result and the face watchlist alert similarity is greater than or equal to ε_1, the face positioning result is taken as the latest position Pos_n of obj; if the conflicting positioning results contain no face result, or contain a face result whose watchlist alert similarity ε satisfies ε_2 ≤ ε < ε_1, the latest position Pos_n of obj is confirmed manually.
The invention also provides a cross-camera trajectory tracking device based on the cooperation of multiple recognition technologies, which comprises:
an acquisition unit for acquiring information on a target obj to be tracked and its spatial position Pos_0;
a setting unit for setting a surveillance video range region0 to be analyzed, wherein region0 is the circle centered on the spatial position Pos_0 with radius R, R being a real number greater than zero;
a multi-recognition cooperative positioning unit for cooperatively locating the target to be tracked using multiple recognition technologies, based on the information on the target obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos_1;
wherein the operations of the setting unit and the multi-recognition cooperative positioning unit are executed repeatedly to obtain further positioning results ..., Pos_{n-1}, Pos_n, and Pos_0, Pos_1, ..., Pos_n are taken as the real-time trajectory of the target obj to be tracked.
Further, the information on the target obj to be tracked comprises a face image and a whole-body image.
Still further, the multiple recognition technologies include face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range, and the video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range.
Still further, the video structured positioning includes pedestrian attribute positioning, which locates the target based on local appearance features of pedestrians, and pedestrian re-identification positioning, which locates the target based on global appearance features of pedestrians.
Further, the operation of cooperatively locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent; if the positioning results are consistent, or there is only one positioning result, determining the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling policy to determine the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target;
the predetermined conflict handling policy is: setting two watchlist thresholds ε_1 and ε_2 for face recognition positioning, where ε_1 > ε_2, ε_1 indicates that the accuracy of the face watchlist alert result is extremely high, and ε_2 indicates that the accuracy of the face watchlist alert result is generally high; if the conflicting positioning results contain a face result and the face watchlist alert similarity is greater than or equal to ε_1, the face positioning result is taken as the latest position Pos_n of obj; if the conflicting positioning results contain no face result, or contain a face result whose watchlist alert similarity ε satisfies ε_2 ≤ ε < ε_1, the latest position Pos_n of obj is confirmed manually.
The invention also proposes a computer readable storage medium having stored thereon computer program code which, when executed by a computer, performs any of the methods described above.
The technical effects of the invention are as follows. The invention provides a method, a device and a storage medium for cross-camera trajectory tracking based on the cooperation of multiple recognition technologies, wherein the method comprises: an acquisition step of acquiring information on a target obj to be tracked and its spatial position Pos_0; a setting step of setting a surveillance video range region0 to be analyzed, wherein region0 is the circle centered on the spatial position Pos_0 with radius R, R being a real number greater than zero; a multi-recognition cooperative positioning step of cooperatively locating the target to be tracked using multiple recognition technologies, based on the information on the target obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos_1; and repeating the setting step and the multi-recognition cooperative positioning step to obtain further positioning results ..., Pos_{n-1}, Pos_n, with Pos_0, Pos_1, ..., Pos_n taken as the real-time trajectory of the target obj to be tracked. By having multiple recognition technologies work cooperatively, the invention makes full use of the positioning accuracy of face recognition technology and of the high deployment density of ordinary surveillance cameras (video structuring technology), and it introduces a conflict handling policy for the positioning results of the multiple recognition technologies, thereby achieving accurate positioning and a complete trajectory.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 shows a cross-camera trajectory tracking method based on the cooperation of multiple recognition technologies, which comprises the following steps:
An acquisition step S101 of acquiring information on a target obj to be tracked and its spatial position Pos_0. The target obj to be tracked may be a pedestrian, or another object such as an animal. A camera, a snapshot camera or the like can be used to acquire the information on the target obj to be tracked, and the spatial position Pos_0 serves as the initial position. The information on the target obj to be tracked can be obtained by image recognition technology or entered manually. Preferably, the information on the target obj to be tracked comprises a face image and a whole-body image.
A setting step S102 of setting a surveillance video range region0 to be analyzed, wherein region0 is the circle centered on the spatial position Pos_0 with radius R, R being a real number greater than zero. Current video structuring technology needs a large number of hardware servers to cover an area, which is a bottleneck. To address this, the invention performs structured analysis only on the surveillance video within radius R of the target's spatial position, dynamically selects and changes the surveillance videos that need structured analysis as the located target moves, and tracks the target in a video relay manner. This reduces the number of videos that must be structured, so that real-time tracking of the target can be achieved with a small amount of hardware, lowering the hardware cost of existing real-time tracking systems.
A multi-recognition cooperative positioning step S103 of cooperatively locating the target to be tracked using multiple recognition technologies, based on the information on the target obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos_1. The setting step and the multi-recognition cooperative positioning step are repeated to obtain further positioning results ..., Pos_{n-1}, Pos_n, and Pos_0, Pos_1, ..., Pos_n are taken as the real-time trajectory of the target obj to be tracked.
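The relay between steps S102 and S103 can be sketched as follows. This is only an illustrative sketch under stated assumptions: positions are treated as planar coordinates, the camera registry is a plain dictionary, and the names `cameras_in_region`, `track`, `locate` and `max_steps` are hypothetical and do not appear in the original text. The cooperative positioning itself is passed in as a callable, because its internals are described separately.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a spatial position is a pair of planar coordinates.
Position = Tuple[float, float]


def cameras_in_region(cameras: Dict[str, Position],
                      center: Position,
                      radius: float) -> List[str]:
    """Setting step: ids of the cameras inside the circle (center, radius)."""
    if radius <= 0:
        raise ValueError("R must be a real number greater than zero")
    return sorted(cam_id for cam_id, pos in cameras.items()
                  if math.dist(pos, center) <= radius)


def track(cameras: Dict[str, Position],
          pos0: Position,
          radius: float,
          locate: Callable[[List[str]], Optional[Position]],
          max_steps: int) -> List[Position]:
    """Repeat the setting and positioning steps; return Pos_0 ... Pos_n."""
    trajectory = [pos0]
    for _ in range(max_steps):
        # Re-centre the analysed region on the latest known position.
        region = cameras_in_region(cameras, trajectory[-1], radius)
        # Hypothetical stand-in for the cooperative positioning step.
        new_pos = locate(region)
        if new_pos is None:
            break  # target not found in the current region
        trajectory.append(new_pos)
    return trajectory
```

Only the cameras returned by `cameras_in_region` would need structured analysis at each iteration, which is the source of the hardware saving described in step S102. Stopping when the target is not found is a simplification chosen for this sketch; the original text does not say what happens in that case.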
In one embodiment, the multiple recognition technologies include face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range, and the video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range.
In one embodiment, the video structured positioning includes pedestrian attribute positioning and pedestrian re-identification positioning. The pedestrian attribute positioning locates the target based on local appearance features of the pedestrian; in general, local appearance features include, but are not limited to, the person's clothing (such as coat, pants, hat, etc.), backpack, hairstyle, glasses, and the like. The pedestrian re-identification positioning locates the target based on the global appearance feature of the pedestrian; in general, the global appearance feature is formed from all local features of the pedestrian and can be expressed as a multi-dimensional vector.
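Because the global appearance feature can be expressed as a multi-dimensional vector, matching a query pedestrian against detections can be illustrated with a vector comparison. The sketch below assumes cosine similarity and a fixed acceptance threshold; neither is stated in the original text, and the names `cosine_similarity`, `best_match` and `threshold` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no appearance information
    return dot / (norm_a * norm_b)


def best_match(query: Sequence[float],
               detections: Dict[str, Sequence[float]],
               threshold: float) -> Optional[str]:
    """Camera id of the detection most similar to the query, if any
    detection reaches the (assumed) similarity threshold."""
    best_id, best_score = None, threshold
    for cam_id, feature in detections.items():
        score = cosine_similarity(query, feature)
        if score >= best_score:
            best_id, best_score = cam_id, score
    return best_id
```

The camera id returned here would then be mapped to that camera's spatial position to give the re-identification positioning result.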
The invention has multiple recognition and positioning technologies work cooperatively and fuses their positioning results to determine under which camera the pedestrian is located, which improves the positioning accuracy for key objects and yields a more complete and accurate cross-camera pedestrian trajectory. The cooperative working process of the multiple recognition means is specifically as follows. From the provided picture of the key object, the face picture is extracted and used for dynamic watchlist monitoring of the key person; the body picture is extracted and used for pedestrian recognition watchlist monitoring with the pedestrian re-identification technology; and the body attribute features are extracted, from which the clearly distinctive appearance attributes are selected for attribute-comparison watchlist monitoring. From the watchlist recognition result, combined with the spatial position of the camera that produced the watchlist alert picture, the real-time position of the key object (pedestrian) can be known. The multiple recognition technologies produce multiple positioning results, which are fused. The fusion method can be as follows: the intersection of the positioning results is computed; if the intersection is not empty, the intersection position is the real-time position of the key object (pedestrian); if the intersection is empty, that is, the watchlist results of the recognition means conflict, a positioning result conflict handling policy is introduced. The fusion method is not limited to this one; positioning can also be performed by voting on the results or by weighted calculation. This is an important inventive point of the invention.
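The intersection-based fusion and the voting alternative mentioned above can be sketched as follows. This is a minimal illustration, assuming each recognition technology reports a set of candidate camera locations identified by string ids; the function names and the tie handling in the voting variant are assumptions made for this example, not details from the original text.

```python
from collections import Counter
from typing import Dict, Optional, Set


def fuse_by_intersection(results: Dict[str, Set[str]]) -> Optional[Set[str]]:
    """Intersect the candidate locations reported by each technology.

    Returns None when no technology reported anything, a non-empty set
    when the results agree, and an empty set when they conflict.
    """
    # Technologies that produced no result do not take part in the fusion.
    reported = [locations for locations in results.values() if locations]
    if not reported:
        return None
    return set.intersection(*reported)


def fuse_by_vote(results: Dict[str, Set[str]]) -> Optional[str]:
    """Alternative fusion: the location named by the most technologies.

    Assumption: a tie for first place is left unresolved (None).
    """
    votes = Counter(loc for locations in results.values() for loc in locations)
    if not votes:
        return None
    (top, count), *rest = votes.most_common(2)
    if rest and rest[0][1] == count:
        return None
    return top
```

An empty set returned by `fuse_by_intersection` corresponds to the conflict case, which is where the conflict handling policy takes over.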
In one embodiment, the operation of cooperatively locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent, that is, computing the intersection of the multiple positioning results; if the positioning results are consistent, or there is only one positioning result, determining the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling policy to determine the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target;
the predetermined conflict handling policy is: setting two watchlist thresholds ε_1 and ε_2 for face recognition positioning, where ε_1 > ε_2, ε_1 indicates that the accuracy of the face watchlist alert result is extremely high, and ε_2 indicates that the accuracy of the face watchlist alert result is generally high; if the conflicting positioning results contain a face result and the face watchlist alert similarity is greater than or equal to ε_1, the face positioning result is taken as the latest position Pos_n of obj; if the conflicting positioning results contain no face result, or contain a face result whose watchlist alert similarity ε satisfies ε_2 ≤ ε < ε_1, the latest position Pos_n of obj is confirmed manually. At the same time, the monitoring range is set with the spatial position Pos_n as the latest position of the target, and the real-time video data to be analyzed is updated.
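The two-threshold conflict handling policy can be illustrated with a small decision function. This is a sketch under stated assumptions: the function name, the return convention and the threshold values used in the test are hypothetical. The original text does not say what happens when a face result has similarity below ε_2; here that case is assumed to fall back to manual confirmation as well.

```python
from typing import Optional, Tuple

AUTO = "auto"      # face positioning result accepted automatically
MANUAL = "manual"  # latest position must be confirmed by a person


def resolve_conflict(face_location: Optional[str],
                     face_similarity: Optional[float],
                     eps1: float,
                     eps2: float) -> Tuple[str, Optional[str]]:
    """Apply the two-threshold policy to conflicting positioning results.

    face_location and face_similarity are None when the conflicting
    results contain no face result.
    """
    if not eps1 > eps2:
        raise ValueError("the policy requires eps1 > eps2")
    has_face = face_location is not None and face_similarity is not None
    if has_face and face_similarity >= eps1:
        # Extremely reliable face alert: it overrides the other results.
        return (AUTO, face_location)
    # No face result, or eps2 <= similarity < eps1: manual confirmation.
    # Assumption: similarity below eps2 is also sent to manual confirmation.
    return (MANUAL, None)
```

The policy favours the face result only at the higher threshold, which reflects the text's premise that face recognition is the most accurate of the recognition technologies when its similarity score is very high.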
By having multiple recognition technologies work cooperatively, the method of the invention makes full use of the positioning accuracy of face recognition technology and of the high deployment density of ordinary surveillance cameras (video structuring technology), and it introduces a conflict handling policy for the positioning results of the multiple recognition technologies, thereby achieving accurate positioning and a complete trajectory. By dynamically selecting and changing the video monitoring range to be analyzed, the method can track the target in real time in a video relay manner with a small amount of hardware resources. At the same time, face recognition, pedestrian re-identification and pedestrian appearance feature comparison work cooperatively and their positioning results are fused, which improves the accuracy and completeness of real-time cross-camera trajectory tracking of key objects (pedestrians).
FIG. 2 shows a cross-camera trajectory tracking device based on the cooperation of multiple recognition technologies, the device comprising:
An acquisition unit 201, which acquires information on a target obj to be tracked and its spatial position Pos_0. The target obj to be tracked may be a pedestrian, or another object such as an animal. A camera, a snapshot camera or the like can be used to acquire the information on the target obj to be tracked, and the spatial position Pos_0 serves as the initial position. The information on the target obj to be tracked can be obtained by image recognition technology or entered manually. Preferably, the information on the target obj to be tracked comprises a face image and a whole-body image.
A setting unit 202, which sets a surveillance video range region0 to be analyzed, wherein region0 is the circle centered on the spatial position Pos_0 with radius R, R being a real number greater than zero. Current video structuring technology needs a large number of hardware servers to cover an area, which is a bottleneck. To address this, the invention performs structured analysis only on the surveillance video within radius R of the target's spatial position, dynamically selects and changes the surveillance videos that need structured analysis as the located target moves, and tracks the target in a video relay manner. This reduces the number of videos that must be structured, so that real-time tracking of the target can be achieved with a small amount of hardware, lowering the hardware cost of existing real-time tracking systems.
A multi-recognition cooperative positioning unit 203, which cooperatively locates the target to be tracked using multiple recognition technologies, based on the information on the target obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos_1. The operations of the setting unit and the multi-recognition cooperative positioning unit are executed repeatedly to obtain further positioning results ..., Pos_{n-1}, Pos_n, and Pos_0, Pos_1, ..., Pos_n are taken as the real-time trajectory of the target obj to be tracked.
In one embodiment, the multiple recognition technologies include face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range, and the video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range.
In one embodiment, the video structured positioning includes pedestrian attribute positioning and pedestrian re-identification positioning. The pedestrian attribute positioning locates the target based on local appearance features of the pedestrian; in general, local appearance features include, but are not limited to, the person's clothing (such as coat, pants, hat, etc.), backpack, hairstyle, glasses, and the like. The pedestrian re-identification positioning locates the target based on the global appearance feature of the pedestrian; in general, the global appearance feature is formed from all local features of the pedestrian and can be expressed as a multi-dimensional vector.
The invention has multiple recognition and positioning technologies work cooperatively and fuses their positioning results to determine under which camera the pedestrian is located, which improves the positioning accuracy for key objects and yields a more complete and accurate cross-camera pedestrian trajectory. The cooperative working process of the multiple recognition means is specifically as follows. From the provided picture of the key object, the face picture is extracted and used for dynamic watchlist monitoring of the key person; the body picture is extracted and used for pedestrian recognition watchlist monitoring with the pedestrian re-identification technology; and the body attribute features are extracted, from which the clearly distinctive appearance attributes are selected for attribute-comparison watchlist monitoring. From the watchlist recognition result, combined with the spatial position of the camera that produced the watchlist alert picture, the real-time position of the key object (pedestrian) can be known. The multiple recognition technologies produce multiple positioning results, which are fused. The fusion method can be as follows: the intersection of the positioning results is computed; if the intersection is not empty, the intersection position is the real-time position of the key object (pedestrian); if the intersection is empty, that is, the watchlist results of the recognition means conflict, a positioning result conflict handling policy is introduced. The fusion method is not limited to this one; positioning can also be performed by voting on the results or by weighted calculation. This is an important inventive point of the invention.
In one embodiment, the operation of cooperatively locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent, that is, computing the intersection of the multiple positioning results; if the positioning results are consistent, or there is only one positioning result, determining the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling policy to determine the latest position Pos_n of obj, saving the previous position Pos_{n-1}, and taking the spatial position Pos_n as the latest position of the target;
the predetermined conflict handling policy is: setting two watchlist thresholds ε_1 and ε_2 for face recognition positioning, where ε_1 > ε_2, ε_1 indicates that the accuracy of the face watchlist alert result is extremely high, and ε_2 indicates that the accuracy of the face watchlist alert result is generally high; if the conflicting positioning results contain a face result and the face watchlist alert similarity is greater than or equal to ε_1, the face positioning result is taken as the latest position Pos_n of obj; if the conflicting positioning results contain no face result, or contain a face result whose watchlist alert similarity ε satisfies ε_2 ≤ ε < ε_1, the latest position Pos_n of obj is confirmed manually. At the same time, the monitoring range is set with the spatial position Pos_n as the latest position of the target, and the real-time video data to be analyzed is updated.
By having multiple recognition technologies work cooperatively, the device of the invention makes full use of the positioning accuracy of face recognition technology and of the high deployment density of ordinary surveillance cameras (video structuring technology), and it introduces a conflict handling policy for the positioning results of the multiple recognition technologies, thereby achieving accurate positioning and a complete trajectory. By dynamically selecting and changing the video monitoring range to be analyzed, the device can track the target in real time in a video relay manner with a small amount of hardware resources. At the same time, face recognition, pedestrian re-identification and pedestrian appearance feature comparison work cooperatively and their positioning results are fused, which improves the accuracy and completeness of real-time cross-camera trajectory tracking of key objects (pedestrians).
For convenience of description, the above device is described as being functionally divided into various units. Of course, when implementing the present application, the functions of each unit may be implemented in one or more pieces of software and/or hardware.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the invention, and all such modifications are intended to be encompassed by the claims.