CN112950674B

CN112950674B - Cross-camera track tracking method and device based on cooperation of multiple recognition technologies and storage medium

Info

Publication number: CN112950674B
Application number: CN202110257986.0A
Authority: CN
Inventors: 覃智泉; 林淑强; 庄毅滨; 李立扬; 常鹏; 李山
Original assignee: Xiamen Public Security Bureau; Xiamen Meiya Pico Information Co Ltd
Current assignee: Xiamen Public Security Bureau; Xiamen Meiya Pico Information Co Ltd
Priority date: 2021-03-09
Filing date: 2021-03-09
Publication date: 2024-03-05
Anticipated expiration: 2041-03-09
Also published as: CN112950674A

Abstract

The invention provides a method, a device and a storage medium for cross-camera trajectory tracking based on the cooperation of multiple recognition technologies. The method comprises: acquiring information of an object obj to be tracked and its spatial position Pos₀; setting a surveillance video range region0 to be analyzed, where region0 is the range with the spatial position Pos₀ as its center and a radius of R; co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj and the video data acquired within region0, to obtain a positioning result Pos₁; and repeating the setting step and the co-locating step to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, with Pos₀, Pos₁, …, Posₙ taken as the real-time trajectory of the object obj. Through the cooperation of multiple recognition technologies, the invention achieves accurate positioning and a complete trajectory; by tracking in a video relay manner it uses only a small amount of hardware resources, which reduces hardware cost; and it provides a specific conflict handling strategy.

Description

Cross-camera track tracking method and device based on cooperation of multiple recognition technologies and storage medium

Technical Field

The invention relates to the technical field of image recognition, in particular to a method and a device for tracking a track across cameras based on cooperation of multiple recognition technologies and a storage medium.

Background

In the field of computer vision, video-based object tracking problems have been an important topic and research hotspot. With the increase of the number and coverage of urban public safety video monitoring cameras, a target track tracking technology based on video images is more valued and applied in the aspects of social management and public safety maintenance.

At present, the tracking and control of key objects relies on the following video-image target (pedestrian) trajectory tracking methods: 1. an operator watches the video surveillance picture and locates the target object manually to form a trajectory; 2. face snapshot cameras capture face pictures, and face recognition comparison determines whether the target object appears at the position of each face camera, forming a trajectory; 3. based on ordinary video surveillance cameras, a video structuring technology extracts and analyzes the pedestrians (human bodies) in the video images and their appearance attribute features, and pedestrian re-identification and pedestrian appearance attribute comparison are then used to determine whether key objects appear at the positions of the surveillance cameras.

These three methods have a number of defects and bottlenecks, mainly the following. Method 1 consumes a great deal of manpower and material resources for watching the monitors; human attention is limited and operators tire, so the target is easily lost and the method is difficult to popularize. Method 2 benefits from mature face recognition technology and locates the target object accurately, but it relies on face snapshot cameras; in practice the deployment density of face cameras is currently low in most places, leaving gaps at many spatial positions, so the target object is easily lost and its trajectory ends up incomplete. Method 3 extracts pedestrian (human body) objects and their attribute features from video images through a video structuring technology, and then uses pedestrian re-identification and pedestrian attribute comparison to determine whether key objects appear at each surveillance camera position; its bottleneck, as the description of the setting step notes, is that structured analysis over an area requires a large amount of hardware server equipment.

Disclosure of Invention

The present invention addresses one or more of the above technical defects of the prior art and proposes the following technical solutions.

A cross-camera trajectory tracking method based on the cooperation of multiple recognition technologies comprises the following steps:

an acquisition step of acquiring information of an object obj to be tracked and its spatial position Pos₀;

a setting step of setting a surveillance video range region0 to be analyzed, wherein region0 is the range with the spatial position Pos₀ as its center and a radius of R, R being a real number greater than zero;

a multi-recognition-technology co-locating step of co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos₁;

repeating the setting step and the multi-recognition-technology co-locating step to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, and taking Pos₀, Pos₁, …, Posₙ as the real-time trajectory of the object obj to be tracked.

Further, the information of the object obj to be tracked comprises a face image and a whole-body image.

Still further, the plurality of recognition techniques include face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes a face captured by a camera within the monitoring video range based on the face image, and the video structured positioning recognizes video data acquired by a monitoring camera within the monitoring video range based on the whole-body image.

Still further, the video structured positioning includes pedestrian attribute positioning, which locates the target based on local appearance features of the pedestrian, and pedestrian re-identification positioning, which locates the target based on global appearance features of the pedestrian.

Further, the operation of co-locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent; if the positioning results are consistent, or there is only one positioning result, determining the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling strategy to determine the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target;

the predetermined conflict handling strategy is: setting two control thresholds for face recognition positioning, ε₁ and ε₂, wherein ε₁ > ε₂, ε₁ indicates that the accuracy of the face control alert result is extremely high, and ε₂ indicates that the accuracy of the face control alert result is generally high; if the conflicting positioning results contain a face result and the face control alert similarity is greater than or equal to ε₁, the face positioning result is taken as the latest position Posₙ of obj; if the conflicting positioning results contain no face result, or contain a face result whose control alert similarity ε satisfies ε₂ ≤ ε < ε₁, the latest position Posₙ of obj is confirmed manually.

The invention also provides a cross-camera trajectory tracking device based on the cooperation of multiple recognition technologies, which comprises:

an acquisition unit for acquiring information of the object obj to be tracked and its spatial position Pos₀;

a setting unit for setting a surveillance video range region0 to be analyzed, wherein region0 is the range with the spatial position Pos₀ as its center and a radius of R, R being a real number greater than zero;

a multi-recognition-technology co-locating unit for co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos₁;

the operations of the setting unit and the multi-recognition-technology co-locating unit are executed repeatedly to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, and Pos₀, Pos₁, …, Posₙ are taken as the real-time trajectory of the object obj to be tracked.

Further, the operation of co-locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent; if the positioning results are consistent, or there is only one positioning result, determining the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling strategy to determine the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target.

The invention also proposes a computer readable storage medium having stored thereon computer program code which, when executed by a computer, performs any of the methods described above.

The technical effects of the invention are as follows. The invention provides a method, a device and a storage medium for cross-camera trajectory tracking based on the cooperation of multiple recognition technologies, wherein the method comprises: an acquisition step of acquiring information of an object obj to be tracked and its spatial position Pos₀; a setting step of setting a surveillance video range region0 to be analyzed, wherein region0 is the range with the spatial position Pos₀ as its center and a radius of R, R being a real number greater than zero; a multi-recognition-technology co-locating step of co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj and the video data acquired within region0, to obtain a positioning result Pos₁; and repeating the setting step and the co-locating step to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, with Pos₀, Pos₁, …, Posₙ taken as the real-time trajectory of the object obj. The multiple recognition technologies work cooperatively: the invention makes full use of the positioning accuracy of face recognition technology and of the high deployment density of ordinary surveillance cameras (video structuring technology), and introduces a conflict handling strategy for the positioning results of the multiple recognition means, thereby achieving accurate positioning and a complete trajectory.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings.

FIG. 1 is a flow chart of a method of tracking a track across cameras based on collaboration of multiple recognition techniques, in accordance with an embodiment of the present invention.

FIG. 2 is a block diagram of a cross-camera trajectory tracking device based on collaboration of multiple recognition techniques in accordance with an embodiment of the invention.

Detailed Description

The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows a method for tracking a track across cameras based on cooperation of multiple recognition technologies, which comprises the following steps:

An acquisition step S101 of acquiring information of the object obj to be tracked and its spatial position Pos₀. The object obj to be tracked may be a pedestrian or another object such as an animal. A camera, a snapshot camera or the like may be used to acquire the information of the object obj, and the spatial position Pos₀ serves as the initial position. The information of the object obj may be obtained by an image recognition technology or entered manually; preferably, it comprises a face image and a whole-body image.

A setting step S102 of setting a surveillance video range region0 to be analyzed, wherein region0 is the range with the spatial position Pos₀ as its center and a radius of R, R being a real number greater than zero. The current video structuring technology faces the bottleneck that covering an area requires a large amount of hardware server equipment. To address this, the invention performs structured analysis only on the surveillance video within the radius R around the target's spatial position, and dynamically selects and changes the surveillance videos to be analyzed as the located target moves. Tracking the target in this video relay manner reduces the number of videos that must be structured, so that real-time tracking is achieved with a small amount of hardware equipment and the hardware cost of existing real-time tracking systems is reduced.

A multi-recognition-technology co-locating step S103 of co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos₁. The setting step and the multi-recognition-technology co-locating step are repeated to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, and Pos₀, Pos₁, …, Posₙ are taken as the real-time trajectory of the object obj to be tracked.
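The relay loop formed by steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Camera`, `cameras_in_region` and `track`, the use of planar coordinates, the `max_steps` bound and the `co_locate` callback (which stands in for the whole multi-technology positioning step) are all hypothetical and do not appear in the patent text.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Position = Tuple[float, float]  # assumed: planar coordinates, e.g. metres


@dataclass(frozen=True)
class Camera:
    cam_id: str
    pos: Position


def cameras_in_region(cameras: Sequence[Camera], center: Position,
                      radius: float) -> List[Camera]:
    """Setting step: keep only cameras inside the circle of radius R around center."""
    if radius <= 0:
        raise ValueError("R must be a real number greater than zero")
    return [c for c in cameras if math.dist(c.pos, center) <= radius]


def track(pos0: Position, cameras: Sequence[Camera], radius: float,
          co_locate: Callable[[List[Camera]], Optional[Position]],
          max_steps: int) -> List[Position]:
    """Repeat the setting and co-locating steps, collecting Pos0, Pos1, ..., Posn."""
    trajectory = [pos0]
    for _ in range(max_steps):
        # Re-centre the analysed region on the latest known position.
        region = cameras_in_region(cameras, trajectory[-1], radius)
        new_pos = co_locate(region)  # hypothetical stand-in for step S103
        if new_pos is None:          # assumed stop condition: target not found
            break
        trajectory.append(new_pos)
    return trajectory
```

Only the cameras returned by `cameras_in_region` would need structured analysis at each iteration, which is the point of the video relay approach described above.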

In one embodiment, the multiple recognition technologies include face recognition positioning and video structured positioning. Face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range; video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range.

In one embodiment, the video structured positioning includes pedestrian attribute positioning and pedestrian re-identification positioning. Pedestrian attribute positioning locates the target based on local appearance features of the pedestrian, which generally include, but are not limited to, clothing (such as coat, trousers and hat), backpack, hairstyle and glasses. Pedestrian re-identification positioning locates the target based on the global appearance feature of the pedestrian, which is generally formed from all of the pedestrian's local features and can be expressed as a multi-dimensional vector.
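To make the distinction between the two comparisons concrete, the sketch below scores a candidate against the target in both ways. The patent does not name a similarity measure or an attribute encoding, so cosine similarity, the dictionary representation of attributes, and the function names `cosine_similarity` and `attribute_match_ratio` are assumptions made purely for illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Compare two global appearance vectors (assumed metric: cosine similarity)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_match_ratio(target: Dict[str, str],
                          candidate: Dict[str, str]) -> float:
    """Fraction of the target's selected local attributes that the candidate shares."""
    if not target:
        return 0.0
    hits = sum(1 for key, value in target.items() if candidate.get(key) == value)
    return hits / len(target)
```

For example, a target described by `{"coat": "red", "hat": "none"}` and a candidate wearing a red coat and a cap would share half of the selected attributes.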

In the invention, multiple recognition and positioning technologies work cooperatively and their positioning results are fused to determine at which camera the pedestrian is located, which improves the positioning accuracy for key objects and yields a more complete and accurate cross-camera pedestrian trajectory. The cooperative process is as follows. From the provided picture of the key object, a face picture is extracted and used for dynamic face control of the key person; a human-body picture is extracted and used for pedestrian recognition control by the pedestrian re-identification technology; and human-body attribute features are extracted, from which the clearly distinctive appearance attributes are selected for attribute comparison control. From the control recognition results, combined with the spatial position of the camera that produced each control alert picture, the real-time position of the key object (pedestrian) can be known. Because the multiple recognition technologies yield multiple positioning results, the results are fused. One fusion method is to take the intersection of the results: if the intersection is not empty, the intersection position is the real-time position of the key object (pedestrian); if the intersection is empty, that is, the control results of the recognition means conflict, a positioning-result conflict handling strategy is applied. The fusion method is not limited to this; a voting method or a weighted calculation over the results may also be used. This fusion is an important inventive point of the invention.
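The intersection-based fusion described above can be sketched as follows. The representation is an assumption: each technique is taken to report a set of camera identifiers at which it raised an alert, and techniques that report nothing are ignored. The function name `fuse_by_intersection` and the status strings are hypothetical.

```python
from typing import Dict, Set, Tuple


def fuse_by_intersection(results: Dict[str, Set[str]]) -> Tuple[str, Set[str]]:
    """Fuse per-technique alert locations by set intersection.

    results maps a technique name (e.g. "face", "reid", "attr") to the set of
    camera ids where that technique located the target.
    Returns ("agreed", positions), ("conflict", all_candidates) or ("none", set()).
    """
    # Assumption: a technique with no alert simply does not take part in fusion.
    reported = [set(s) for s in results.values() if s]
    if not reported:
        return ("none", set())
    common = set.intersection(*reported)
    if common:
        # Consistent results, or only one technique reported.
        return ("agreed", common)
    # Empty intersection: hand all candidates to the conflict handling strategy.
    return ("conflict", set.union(*reported))
```

A voting or weighted scheme, which the text mentions as an alternative, would replace the intersection with a count or weighted sum per camera.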

In one embodiment, the operation of co-locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent, that is, computing the intersection of the positioning results; if the positioning results are consistent, or there is only one positioning result, determining the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling strategy to determine the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target;

the predetermined conflict handling strategy is: setting two control thresholds for face recognition positioning, ε₁ and ε₂, wherein ε₁ > ε₂, ε₁ indicates that the accuracy of the face control alert result is extremely high, and ε₂ indicates that the accuracy of the face control alert result is generally high; if the conflicting positioning results contain a face result and the face control alert similarity is greater than or equal to ε₁, the face positioning result is taken as the latest position Posₙ of obj; if the conflicting positioning results contain no face result, or contain a face result whose control alert similarity ε satisfies ε₂ ≤ ε < ε₁, the latest position Posₙ of obj is confirmed manually. At the same time, a monitoring range is set with the spatial position Posₙ as the latest position of the target, and the real-time video data to be analyzed is updated.
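The two-threshold conflict strategy can be sketched as a small decision function. The names `Decision` and `resolve_conflict` are hypothetical, and the concrete threshold values in the usage note are illustrative only. The text does not say what happens when a face result has similarity below ε₂; the sketch assumes such a result would not have raised an alert and therefore also falls through to manual confirmation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    position: Optional[str]      # camera id accepted automatically, if any
    needs_manual_review: bool    # True when an operator must confirm Pos_n


def resolve_conflict(face_position: Optional[str],
                     face_similarity: Optional[float],
                     eps1: float, eps2: float) -> Decision:
    """Apply the face-priority conflict strategy to conflicting positioning results."""
    if not eps1 > eps2:
        raise ValueError("eps1 must be greater than eps2")
    has_face = face_position is not None and face_similarity is not None
    if has_face and face_similarity >= eps1:
        # Very reliable face alert: the face positioning result wins.
        return Decision(face_position, False)
    # No face result, or eps2 <= similarity < eps1 (and, by assumption,
    # anything lower): the latest position is confirmed manually.
    return Decision(None, True)
```

With illustrative thresholds ε₁ = 0.9 and ε₂ = 0.8, a face alert at similarity 0.95 is accepted directly, while one at 0.85 is routed to an operator.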

The method of the invention works through the cooperation of multiple recognition technologies: it makes full use of the positioning accuracy of face recognition technology and of the high deployment density of ordinary surveillance cameras (video structuring technology), and introduces a conflict handling strategy for the positioning results of the multiple recognition means, thereby achieving accurate positioning and a complete trajectory. By dynamically selecting and changing the video surveillance range to be analyzed, the method tracks the target in real time in a video relay manner with only a small amount of hardware resources. At the same time, face recognition, pedestrian re-identification and pedestrian appearance feature comparison work cooperatively and their positioning results are fused, which improves the accuracy and completeness of real-time cross-camera trajectory tracking of key objects (pedestrians).

FIG. 2 illustrates a cross-camera trajectory tracking device based on the cooperation of multiple recognition techniques, the device comprising:

An acquisition unit 201 that acquires information of the object obj to be tracked and its spatial position Pos₀. The object obj to be tracked may be a pedestrian or another object such as an animal. A camera, a snapshot camera or the like may be used to acquire the information of the object obj, and the spatial position Pos₀ serves as the initial position. The information of the object obj may be obtained by an image recognition technology or entered manually; preferably, it comprises a face image and a whole-body image.

A setting unit 202 that sets a surveillance video range region0 to be analyzed, wherein region0 is the range with the spatial position Pos₀ as its center and a radius of R, R being a real number greater than zero. The current video structuring technology faces the bottleneck that covering an area requires a large amount of hardware server equipment. To address this, the invention performs structured analysis only on the surveillance video within the radius R around the target's spatial position, and dynamically selects and changes the surveillance videos to be analyzed as the located target moves. Tracking the target in this video relay manner reduces the number of videos that must be structured, so that real-time tracking is achieved with a small amount of hardware equipment and the hardware cost of existing real-time tracking systems is reduced.

A multi-recognition-technology co-locating unit 203 that co-locates the target to be tracked with multiple recognition technologies, based on the information of the object obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos₁. The operations of the setting unit and the multi-recognition-technology co-locating unit are executed repeatedly to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, and Pos₀, Pos₁, …, Posₙ are taken as the real-time trajectory of the object obj to be tracked.

In one embodiment, the operation of co-locating the target to be tracked to obtain a positioning result includes: judging whether the positioning results of the multiple recognition technologies are consistent, that is, computing the intersection of the positioning results; if the positioning results are consistent, or there is only one positioning result, determining the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling strategy to determine the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target.

The device of the invention works through the cooperation of multiple recognition technologies: it makes full use of the positioning accuracy of face recognition technology and of the high deployment density of ordinary surveillance cameras (video structuring technology), and introduces a conflict handling strategy for the positioning results of the multiple recognition means, thereby achieving accurate positioning and a complete trajectory. By dynamically selecting and changing the video surveillance range to be analyzed, the device tracks the target in real time in a video relay manner with only a small amount of hardware resources. At the same time, face recognition, pedestrian re-identification and pedestrian appearance feature comparison work cooperatively and their positioning results are fused, which improves the accuracy and completeness of real-time cross-camera trajectory tracking of key objects (pedestrians).

For convenience of description, the above device is described as being functionally divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention. Although the invention has been described in detail with reference to these embodiments, those skilled in the art should understand that modifications and equivalents may be made without departing from the spirit and scope of the invention, all of which are intended to be encompassed by the claims.

Claims

1. A method for tracking a track across cameras based on cooperation of multiple recognition technologies is characterized by comprising the following steps:

an acquisition step of acquiring information of an object obj to be tracked and its spatial position Pos₀, wherein the information of the object obj to be tracked comprises a face image and a whole-body image;

a setting step of setting a surveillance video range region0 to be analyzed, wherein region0 is the range with the spatial position Pos₀ as its center and a radius of R, R being a real number greater than zero;

a multi-recognition-technology co-locating step of co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos₁; the multiple recognition technologies comprise face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range, and the video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range;

repeating the setting step and the multi-recognition-technology co-locating step to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, and taking Pos₀, Pos₁, …, Posₙ as the real-time trajectory of the object obj to be tracked; the operation of co-locating the target to be tracked to obtain a positioning result comprises: judging whether the positioning results of the multiple recognition technologies are consistent; if the positioning results are consistent, or there is only one positioning result, determining the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling strategy to determine the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target;

the predetermined conflict handling strategy is: setting two control thresholds for face recognition positioning, ε₁ and ε₂, wherein ε₁ > ε₂, ε₁ indicates that the accuracy of the face control alert result is extremely high, and ε₂ indicates that the accuracy of the face control alert result is generally high; if the conflicting positioning results contain a face result and the face control alert similarity is greater than or equal to ε₁, the face positioning result is taken as the latest position Posₙ of obj; if the conflicting positioning results contain no face result, or contain a face result whose control alert similarity ε satisfies ε₂ ≤ ε < ε₁, the latest position Posₙ of obj is confirmed manually.

2. The method of claim 1, wherein the video structured positioning includes pedestrian attribute positioning, which locates the target based on local appearance features of a pedestrian, and pedestrian re-identification positioning, which locates the target based on global appearance features of a pedestrian.

3. A cross-camera track tracking device based on cooperation of multiple recognition technologies is characterized in that the device comprises:

an acquisition unit for acquiring information of an object obj to be tracked and its spatial position Pos₀, wherein the information of the object obj to be tracked comprises a face image and a whole-body image;

a multi-recognition-technology co-locating unit for co-locating the target to be tracked with multiple recognition technologies, based on the information of the object obj to be tracked and the video data acquired within the surveillance video range region0, to obtain a positioning result Pos₁; the multiple recognition technologies comprise face recognition positioning and video structured positioning, wherein the face recognition positioning recognizes, based on the face image, faces captured by cameras within the surveillance video range, and the video structured positioning recognizes, based on the whole-body image, video data acquired by surveillance cameras within the surveillance video range;

the operations of the setting unit and the multi-recognition-technology co-locating unit are executed repeatedly to obtain a plurality of further positioning results …, Posₙ₋₁, Posₙ, and Pos₀, Pos₁, …, Posₙ are taken as the real-time trajectory of the object obj to be tracked; the operation of co-locating the target to be tracked to obtain a positioning result comprises: judging whether the positioning results of the multiple recognition technologies are consistent; if the positioning results are consistent, or there is only one positioning result, determining the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target; if the positioning results are inconsistent, combining the positioning results through a predetermined conflict handling strategy to determine the latest position Posₙ of obj, saving the previous position Posₙ₋₁, and taking the spatial position Posₙ as the latest position of the target.

4. The apparatus of claim 3, wherein the video structured positioning includes pedestrian attribute positioning, which locates the target based on local appearance features of a pedestrian, and pedestrian re-identification positioning, which locates the target based on global appearance features of a pedestrian.

5. A computer readable storage medium, characterized in that the storage medium has stored thereon a computer program code which, when executed by a computer, performs the method of any of claims 1-2.