CN113592910A - Cross-camera tracking method and device - Google Patents
- Publication number: CN113592910A
- Application number: CN202110862748.2A
- Authority: CN (China)
- Prior art keywords: target object, camera, picture, feature, determining
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
An embodiment of the invention provides a cross-camera tracking method and apparatus. The method comprises the following steps: performing gait feature extraction on each first picture of a target object acquired by a first camera to obtain a first feature of the target object; determining motion information of the target object based on position information of the target object acquired by the first camera; determining, according to the motion information, that a camera located in a search area is a second camera; performing gait feature extraction on each second picture of each tracked object acquired by the second camera to obtain a second feature of each tracked object; and determining whether the tracked objects include the target object based on the similarity between the second feature of each tracked object and the first feature of the target object. This improves the accuracy of determining the search area, makes it more likely that tracking with a second camera determined from that search area succeeds, and thus improves both the efficiency and the accuracy of cross-camera tracking.
Description
Technical Field
Embodiments of the invention relate to the technical field of computer vision, and in particular to a cross-camera tracking method, a cross-camera tracking apparatus, a computing device, and a computer-readable storage medium.
Background
With the rapid digitalization of society, surveillance cameras are being deployed in large numbers on streets and even in homes to monitor illegal behavior and dangerous events.
However, the field of view of a single surveillance camera is limited. If a target object must continue to be located after it leaves the shooting range of one camera, multiple cameras need to track the target object cooperatively, in relay, that is, across cameras.
Accordingly, embodiments of the present invention provide a cross-camera tracking method to improve the efficiency and accuracy of cross-camera tracking.
Disclosure of Invention
The embodiment of the invention provides a cross-camera tracking method, which is used for improving the efficiency and accuracy of cross-camera tracking.
In a first aspect, an embodiment of the present invention provides a cross-camera tracking method, including:
gait feature extraction is carried out on each first picture of a target object acquired by a first camera to obtain first features of the target object;
determining motion information of the target object based on the position information of the target object acquired by the first camera;
determining that the camera located in the search area is a second camera according to the motion information;
performing gait feature extraction on each second picture of each tracked object acquired by the second camera to obtain second features of each tracked object;
and determining whether the tracked objects include the target object based on the similarity between the second feature of each tracked object and the first feature of the target object.
In the method, a first feature of the target object is determined from the pictures of the target object captured by the first camera, a second feature is determined for each tracked object from the pictures captured by the second camera, and whether the tracked object corresponding to a second feature is the target object is determined by comparing the first feature with the second feature. Because the target object may have changed clothes or altered its face, or may be wearing a uniform, identification through clothing or facial features has certain limitations and its tracking accuracy is not high. Gait features are generally hard to disguise, can be recognized at a distance, and are not easily noticed by the target object, so gait features are introduced as the comparison features, which reduces the interference with tracking accuracy caused by the target object changing clothes, wearing make-up, wearing a uniform, and the like. Secondly, the search area is determined by analyzing the motion information of the target object rather than by searching aimlessly over a large range, which improves the accuracy of determining the search area.
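As a rough illustration only, the overall flow above can be sketched in Python as follows. The function names, the feature extractor, and the similarity routine are hypothetical placeholders supplied by the caller rather than the embodiment's actual models, and the sketch assumes the second cameras have already been selected from the search area.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def track_across_cameras(
    first_pictures: Sequence,                                 # first pictures of the target object
    second_pictures: Dict[str, Dict[str, Sequence]],          # camera id -> tracked object id -> second pictures
    second_cameras: List[str],                                 # cameras located in the search area
    extract_feature: Callable[[Sequence], Sequence[float]],   # gait (and image) feature extractor
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    threshold: float = 0.8,                                    # second preset threshold (example value)
) -> List[Tuple[str, str]]:
    """Return (camera id, tracked object id) pairs judged to be the target object."""
    first_feature = extract_feature(first_pictures)
    matches = []
    for cam in second_cameras:
        for obj_id, pics in second_pictures.get(cam, {}).items():
            second_feature = extract_feature(pics)
            if similarity(first_feature, second_feature) > threshold:
                matches.append((cam, obj_id))
    return matches
```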
Optionally, performing gait feature extraction on each first picture of the target object acquired by the first camera to obtain a first feature of the target object, including:
determining an identification picture sequence from the first pictures of the target object acquired by the first camera, and performing gait feature extraction on the identification picture sequence to obtain a first gait feature of the target object;
determining an identification picture from the first pictures of the target object acquired by the first camera, and performing image feature extraction on the identification picture to obtain a first image feature of the target object;
and performing feature fusion on the first gait feature and the first image feature to obtain the first feature.
The first gait feature reflects the gait of the target object, while the first image feature reflects its face and/or clothing; fusing the first gait feature with the first image feature therefore yields a first feature that reflects the characteristics of the target object more comprehensively, which improves tracking accuracy. Meanwhile, the first image feature is determined from an identification picture selected among the first pictures of the target object captured by the first camera, and the first gait feature is determined from an identification picture sequence selected among those first pictures, so the first image feature and the first gait feature can each be determined in a more targeted way.
Optionally, determining an identification picture from the first pictures of the target object captured by the first camera includes:
performing picture analysis on each first picture to obtain a quality assessment value of each first picture, and determining the first picture with the quality assessment value meeting the set requirement as the identification picture;
performing image feature extraction on the identification picture to obtain a first image feature of the target object, including:
and inputting the identification picture into an image feature extraction network, and extracting a first image feature of the target object.
Picture analysis is first performed on each first picture, and the first picture whose quality assessment value meets the set requirement is determined as the identification picture; an identification picture selected in this way clearly reflects image information such as the face and clothing of the target object. The identification picture is then input into an image feature extraction network so that a first image feature of the target object can be extracted. This approach improves the accuracy of extracting the first image feature of the target object.
Optionally, determining a sequence of recognition pictures from the first pictures of the target object captured by the first camera includes:
performing walking recognition on each first picture to obtain the walking probability of the target object in each first picture;
screening out an identification picture sequence from each first picture; the identification picture sequence is continuous K first pictures with the walking probability larger than a first preset threshold value;
performing gait feature extraction on the identification picture sequence to obtain a first gait feature of the target object, including:
inputting the identification picture sequence into a gait feature extraction network and extracting the first gait feature of the target object.
Each first picture of the target object captured by the first camera may show behavior such as bending, squatting, sitting down, or turning, and it is not reasonable to determine the gait feature of the target object from such behavior. Therefore, walking recognition is first performed on each first picture to determine the walking probability of the target object in each first picture. Requiring the walking probability to be greater than a first preset threshold, K consecutive first pictures are screened out as the identification picture sequence. Because the walking probability of the target object in these pictures meets the requirement and the pictures are consecutive, the sequence accurately reflects the gait of the target object. Inputting the identification picture sequence into a gait feature extraction network improves the accuracy of the extracted first gait feature.
Optionally, determining motion information of the target object based on the position information of the target object acquired by the first camera includes:
selecting the last N pictures shot according to the time sequence from the first pictures;
and determining the motion direction of the target object according to the position information of the target object in the last N pictures respectively.
The motion information of the target object is determined from the last N of the first pictures in chronological order, and the motion information may be the motion direction. Specifically, the change in the position of the target object across the last N pictures captured by the first camera is used to determine its motion direction. A search area is then determined from the likely subsequent motion direction of the target object, which narrows the search range, improves the efficiency of a successful search, and makes the final determination of the tracked object faster and more accurate.
Optionally, determining that the camera located in the search area is the second camera according to the motion information includes:
forming a sector area with a set angle by taking the position of the first camera as the center of a circle and the walking distance in a set time length as a radius, wherein the sector area is a search area; wherein the axis of symmetry of the sector-shaped area is parallel to the direction of motion;
determining that a camera located in the search area is a second camera.
The search area may specifically be a sector. To simplify the computation, the position of the first camera and its shooting range can be abstracted to a single point; this point is taken as the center of the circle and the walking distance within a set time length as the radius to determine a sector, whose central angle can be arbitrary. Compared with searching for the second camera without direction, determining a smaller candidate search area from the motion direction improves search efficiency.
Optionally, determining whether each tracked object includes the target object based on the similarity between the second feature of each tracked object and the first feature of the target object includes:
performing feature fusion on the first feature and the second feature to obtain a fused feature;
inputting the fusion features into a verification classifier to obtain feature values of the fusion features; the feature value is used for characterizing the similarity of the first feature and the second feature;
and if the characteristic value is larger than a second preset threshold value, determining that the tracking object is the target object.
After the first feature of the target object and the second feature of a tracked object are obtained, the two features need to be compared. In this solution, the first feature and the second feature are fused into a fusion feature, and the feature value of the fusion feature is computed to obtain the similarity between the first feature and the second feature. Whether the tracked object is the target object is then determined by comparing that similarity with a second preset threshold. This improves the accuracy with which the similarity between the first and second features is judged, so the tracked object can be determined accurately and the accuracy of cross-camera tracking improves.
In a second aspect, an embodiment of the present invention further provides a cross-camera tracking apparatus, including:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for carrying out gait feature extraction on each first picture of a target object acquired by a first camera to obtain first features of the target object; determining motion information of the target object based on the position information of the target object acquired by the first camera;
the processing unit is used for determining that the camera located in the search area is a second camera according to the motion information; performing gait feature extraction on each second picture of each tracked object acquired by the second camera to obtain second features of each tracked object; and determining whether each tracking object comprises the target object or not based on the similarity of the second characteristic of each tracking object and the first characteristic of the target object.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing a computer program;
and the processor is used for calling the computer program stored in the memory and executing the cross-camera tracking method listed in any mode according to the obtained program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where a computer-executable program is stored, where the computer-executable program is configured to enable a computer to execute a cross-camera tracking method listed in any one of the above manners.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a cross-camera tracking method according to an embodiment of the present invention;
fig. 3 is a process of performing target detection on a picture taken by a first camera according to an embodiment of the present invention;
FIG. 4A is a diagram illustrating the results of one possible target tracking method provided by an embodiment of the present invention;
FIG. 4B is a diagram illustrating the results of one possible target tracking method provided by an embodiment of the present invention;
FIG. 5A is a schematic diagram illustrating a search area determined according to a motion direction according to an embodiment of the present invention;
FIG. 5B is a schematic diagram illustrating a search area determined according to a motion direction according to an embodiment of the present invention;
fig. 5C is a schematic diagram of determining a second camera located in the search area according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a cross-camera tracking device according to an embodiment of the present invention.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, the following description of exemplary embodiments of the present application will clearly and completely describe the exemplary embodiments of the present application with reference to the accompanying drawings in the exemplary embodiments of the present application, and it is to be understood that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
All other embodiments, which can be derived by a person skilled in the art from the exemplary embodiments described herein without inventive step, are intended to be within the scope of the claims appended hereto. In addition, while the disclosure herein has been presented in terms of one or more exemplary examples, it should be appreciated that aspects of the disclosure may be implemented solely as a complete embodiment.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and are not necessarily intended to limit the order or sequence of any particular one, Unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
Fig. 1 exemplarily shows a system architecture to which an embodiment of the present invention is applicable, and the system architecture may include a server 100 and an image pickup apparatus 200, where the image pickup apparatus may include a plurality of apparatuses, such as an image pickup apparatus a, an image pickup apparatus B, and an image pickup apparatus C.
The server 100 is a type of computer that runs faster and carries a higher load than an ordinary computer. The server provides computing or application services to other clients in the network (for example, PCs (Personal Computers), smart phones, ATMs (Automated Teller Machines), and other devices). The server has high-speed Central Processing Unit (CPU) computing capability, can run reliably for long periods, has powerful Input/Output (I/O) data throughput, and offers good expandability. In general, a server is able to respond to service requests, support services, and guarantee those services. As an electronic device, the server has a very complex internal structure that nevertheless does not differ greatly from that of an ordinary computer: CPU, hard disk, memory, system bus, and so on.
The image pickup apparatus 200 is used to capture images and/or videos. It can be installed at intersections, where the video it captures can be used to monitor events occurring on the road, track criminal suspects, search for missing persons, and so on; it can also be installed in homes to monitor potential safety hazards there.
It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.
Fig. 2 illustrates an example of a cross-camera tracking method provided in an embodiment of the present invention, including:
First, in step 201, target detection and target tracking are performed on a series of pictures captured by the first camera within a set time period to obtain the first pictures of the target object. The first pictures are the series of pictures captured by the first camera from the moment the target object enters its shooting range until the moment the target object leaves it. The target object in this embodiment may be any of various target objects, such as a pedestrian or an animal, and the embodiment does not limit this; the processes of target detection and target tracking are described below using pedestrian tracking as an example.
Fig. 3 shows the process of target detection on a picture captured by the first camera. Human targets in the picture are detected with a common target detection method, and each detected human target is assigned an ID (identity). For example, target detection is performed on the first frame captured by the first camera and two pedestrians are detected, numbered pedestrian 1 and pedestrian 2. Target detection is then performed on the second frame; for any pedestrian in that picture, the pedestrian's features are compared with those of pedestrian 1 and pedestrian 2 in the first frame. If the feature similarity with pedestrian 1 is high, the pedestrian is determined to be pedestrian 1; if the feature similarity with pedestrian 2 is high, the pedestrian is determined to be pedestrian 2; if the feature similarity with both pedestrian 1 and pedestrian 2 is low, the pedestrian is numbered pedestrian 3. In this way, the number of each pedestrian can be determined. Various features, such as facial features or clothing features, can be compared, and the embodiment does not limit this. For example, after feature comparison it is determined that pedestrian 1 and pedestrian 2 are present in the second frame. By analogy, a corresponding number can be determined for the pedestrians in every picture captured by the first camera. For example, if the third frame contains pedestrian 1 and pedestrian 4, this indicates that pedestrian 1 is still present in that frame, pedestrian 2 has left the shooting range of the first camera, and a new pedestrian, pedestrian 4, has appeared in the shooting range of the first camera.
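For illustration, a minimal sketch of this numbering scheme is given below; the detector, the per-detection feature extractor, and the matching threshold are assumed placeholders rather than the concrete models used in the embodiment.

```python
from typing import Callable, Dict, List, Sequence

def assign_pedestrian_ids(
    frames: Sequence,                                       # pictures taken by the first camera, in order
    detect: Callable[[object], List[dict]],                 # returns detections, each carrying a "feature" vector
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    match_threshold: float = 0.5,                           # assumed value, not specified in the text
) -> Dict[int, List[int]]:
    """Return, for each pedestrian id, the indices of the frames in which it appears."""
    known_features: Dict[int, Sequence[float]] = {}          # pedestrian id -> most recent feature
    tracks: Dict[int, List[int]] = {}
    next_id = 1
    for frame_idx, frame in enumerate(frames):
        for det in detect(frame):
            feat = det["feature"]
            # compare with already numbered pedestrians (e.g. pedestrian 1, pedestrian 2)
            best_id, best_sim = None, 0.0
            for pid, ref in known_features.items():
                s = similarity(feat, ref)
                if s > best_sim:
                    best_id, best_sim = pid, s
            if best_id is None or best_sim < match_threshold:
                best_id = next_id                            # a new pedestrian appears
                next_id += 1
            known_features[best_id] = feat
            tracks.setdefault(best_id, []).append(frame_idx)
    return tracks
```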
With the target detection method above, the target objects contained in the series of pictures captured by the first camera are determined, and the corresponding first pictures are then determined for each target object by target tracking. Fig. 4A shows one possible target tracking result, that is, the respective first pictures of each target object. For example, the first pictures of pedestrian 1 are determined to be the first, second, and third frames, and the first pictures of pedestrian 2 are determined to be the first and second frames. This is only an example; the embodiment does not limit the number of first pictures corresponding to a target object, and the first pictures need not be consecutive frames. For instance, if pedestrian 2 is not detected in the third frame but is detected in the sixth frame, the first pictures corresponding to pedestrian 2 are the first, second, and sixth frames.
Optionally, each first picture may take the form of a whole picture that contains the target object together with other objects. For example, as illustrated in Fig. 4A, each first picture is a whole picture captured by the first camera, so it contains many other pedestrians and other information; when the gait features of a particular target object are later extracted, unrelated target objects or unrelated information must be masked out or removed. Alternatively, each first picture may take the form of a detection box containing only the target object. For example, as illustrated in Fig. 4B, each first picture corresponding to pedestrian 1 is a detection box containing pedestrian 1, which allows the first feature of each target object to be extracted more accurately in the subsequent steps.
For any target object, the first feature of the target object is obtained from its first pictures; the first feature includes at least a first gait feature. It may also include a first image feature, which is obtained by extracting image features such as the face and clothing of the target object. The embodiment does not limit the first feature or the specific content of the first image feature; those skilled in the art may choose freely as needed. For example, only the first gait feature may be extracted from the first pictures of the target object captured by the first camera, or both the first gait feature and the first image feature may be extracted and fused to obtain the first feature. The first image feature may be extracted from the face of the target object only, from its clothing only, or obtained by fusing features extracted from both the face and the clothing.
The method for extracting the first gait feature is described in detail below.
First, an identification picture sequence is determined from the first pictures of the target object acquired by the first camera, and the first gait feature of the target object is then obtained from the identification picture sequence.
Determining the identification picture sequence is in effect a screening of the first pictures. Each first picture contains the target object, but in various postures such as walking, running, squatting, bending, or turning. Since running, squatting, bending, and turning do not reflect the gait of the target object, the first pictures showing those postures need to be excluded and the first pictures in which the target object walks normally retained, from which the identification picture sequence is determined. The identification picture sequence may be determined as follows.
Mode one
A pre-trained neural network model is used to perform walking recognition on each first picture and obtain the walking probability of the target object in each first picture, and K consecutive first pictures whose walking probability is greater than a preset threshold are screened out as the identification picture sequence.
The neural network model may be trained as follows: a large number of pedestrian images are input, covering pedestrians in various postures such as occluded, walking, running, squatting, bending, and turning, together with the walking probability labeled for each pedestrian image. For example, an image of a walking pedestrian is labeled with a high walking probability of 1; an image of a turning pedestrian is labeled with a low walking probability of 0; a partially occluded pedestrian image is labeled with a walking probability of 0.3. These values are only examples and do not limit the embodiment.
The large number of pedestrian images and the corresponding walking probabilities are input into the neural network model for training; the trained model can then give the walking probability of the corresponding target object for any first picture. For example, the first pictures corresponding to pedestrian 1 are input as P = {s1, s2, s3, ..., sn}, and the neural network model outputs the walking probability of the target object in each first picture as O = {o1, o2, o3, ..., on}, where 0 ≤ oi ≤ 1.
Then K consecutive first pictures whose walking probability is greater than the preset threshold are screened out as the identification picture sequence.
For example, if the walking probabilities of the target object in the first pictures are O = {1, 0.9, 0.7, 0.2, 0.5, 1, 1, 0.9, 0.8, 0} and the preset threshold is 0.6, the screened-out runs of walking probabilities are {1, 0.9, 0.7} and {1, 1, 0.9, 0.8}, corresponding to identification picture sequences P1 and P2. To improve the accuracy of the finally extracted first gait feature, the number of first pictures contained in the identification picture sequence may be constrained, for example to not less than 4, that is, K = 4. The identification picture sequence P1 contains 3 first pictures and P2 contains 4, so the identification picture sequence is determined to be P2. In general, the more first pictures an identification picture sequence contains, the better it reflects the pedestrian's gait and the more accurate the extracted first gait feature. The value of K can be set as needed by those skilled in the art.
If K = 3, both identification picture sequences P1 and P2 meet the requirement. Both may be used as identification picture sequences for the subsequent extraction of the first gait feature, or one of them may be selected for that purpose. For example, the sums of the walking probabilities of the two sequences may be compared: the sum for P2 is 1 + 1 + 0.9 + 0.8 = 3.7, and the sum for P1 is 1 + 0.9 + 0.7 = 2.6. Since the sum of the walking probabilities of P2 is higher, P2 is used as the final identification picture sequence for the subsequent gait feature extraction.
In practice, the number of first pictures contained in the obtained identification picture sequences may not reach the value of K. In that case the identification picture sequence with the highest walking probabilities and/or the largest number of first pictures may be selected and then padded with frames. For example, if K = 5, neither P1 nor P2 contains enough first pictures; the sequence P2, which has the higher walking probabilities and the larger number of first pictures, is selected and then padded with blank frames.
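A sketch of this screening is given below, assuming the walking probabilities have already been produced by the neural network model; the use of None as a blank padding frame is an assumption rather than the embodiment's concrete format.

```python
from typing import List, Optional, Sequence

def select_identification_sequence(
    probs: Sequence[float],          # walking probability oi of each first picture, 0 <= oi <= 1
    threshold: float = 0.6,          # preset threshold (example value from the text)
    k: int = 4,                      # required number of consecutive first pictures
) -> List[Optional[int]]:
    """Return the picture indices of the chosen run; pad with None (blank frames) if it is shorter than k."""
    runs: List[List[int]] = []
    current: List[int] = []
    for i, p in enumerate(probs):
        if p > threshold:
            current.append(i)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    if not runs:
        return []
    valid = [r for r in runs if len(r) >= k]
    if valid:
        # several qualifying runs: keep the one with the highest sum of walking probabilities
        best = max(valid, key=lambda r: sum(probs[i] for i in r))
    else:
        # no run reaches k: take the longest / most probable run and pad it with blank frames
        best = max(runs, key=lambda r: (len(r), sum(probs[i] for i in r)))
    return list(best) + [None] * max(0, k - len(best))

# Example from the text: probabilities {1, 0.9, 0.7, 0.2, 0.5, 1, 1, 0.9, 0.8, 0} with k = 4
# select the run {1, 1, 0.9, 0.8}, i.e. picture indices [5, 6, 7, 8].
```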
The neural networks usable in the embodiment of the present invention include, but are not limited to, CNNs (Convolutional Neural Networks), RNNs (Recurrent Neural Networks), and MLPs (Multi-Layer Perceptrons).
Mode two
In this mode, the first pictures in which the target object walks normally are screened out from the first pictures by manual labeling and used as the identification picture sequence.
The method is suitable for tracking scenes with few pictures and low real-time requirement.
The above is merely an example, and the method for determining the recognition picture sequence is not limited in the embodiment of the present invention.
The identification picture sequence is then input into a gait feature extraction network to extract the first gait feature of the target object. The gait feature extraction network is a pre-trained network model used to extract a pedestrian's gait features from a number of consecutive walking images of the pedestrian.
Next, a method of extracting the first image feature will be described in detail.
Picture analysis is performed on each first picture to obtain a quality evaluation value for each first picture, and a first picture whose value meets a set requirement is determined as the identification picture. The identification picture is then input into an image feature extraction network to extract a first image feature of the target object.
There may be a plurality of methods for determining an identification picture, and the embodiment of the present invention provides the following two ways:
Mode one
Since the target object in a first picture may have unclear features (for example, its face may be partially occluded or covered by something it is wearing), a trained neural network model can be used to analyze the target object in each first picture and obtain a quality evaluation value for each first picture. Specifically, if the face of the target object is occluded, the quality evaluation value is low; if the facial features are clear, the quality evaluation value is high. The training process of this neural network model is the same as that of the model used in the first gait feature extraction and is not repeated here.
The quality evaluation values of the first pictures are compared, and the first pictures whose quality evaluation values meet the set requirement are determined as identification pictures; the set requirement may be, for example, that the quality evaluation value is greater than a preset threshold. The embodiment of the present invention is not limited in this regard.
Mode two
The walking probabilities already obtained for the determined identification picture sequence in the first gait feature extraction step can be used directly to screen the identification picture.
In the first gait feature extraction step, the walking probability of the target object in each first picture has been obtained, for example O = {1, 0.9, 0.7, 0.2, 0.5, 1, 1, 0.9, 0.8, 0}; the determined identification picture sequence is P2, with corresponding walking probabilities {1, 1, 0.9, 0.8}. The picture with the highest walking probability in the identification picture sequence P2 is taken as the identification picture. In this example, the 2 pictures whose walking probability is 1 are taken as identification pictures; alternatively, these 2 pictures may be screened further and one of them selected as the identification picture, where the screening criterion may be sharpness, the size of the target object, and the like. The embodiment of the present invention is not limited in this regard.
The above is merely an example, and the method for determining the identification picture is not limited in the embodiment of the present invention.
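A minimal sketch of the two selection modes follows, assuming the quality evaluation values and walking probabilities are already available; the quality threshold and the further tie-breaking on sharpness or target size mentioned above are assumptions and are omitted or simplified here.

```python
from typing import List, Sequence

def pick_identification_picture_by_quality(
    quality: Sequence[float],        # quality evaluation value of each first picture (mode one)
    threshold: float = 0.8,          # assumed preset threshold, not specified in the text
) -> List[int]:
    """Mode one: indices of first pictures whose quality evaluation value meets the set requirement."""
    return [i for i, q in enumerate(quality) if q > threshold]

def pick_identification_picture_by_walking(
    probs: Sequence[float],          # walking probabilities from the gait screening step
    sequence_indices: Sequence[int], # indices forming the chosen identification picture sequence
) -> List[int]:
    """Mode two: pictures in the identification picture sequence with the highest walking probability."""
    best = max(probs[i] for i in sequence_indices)
    return [i for i in sequence_indices if probs[i] == best]
```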
The identification picture is then input into an image feature extraction network to extract the first image feature of the target object. The image feature extraction network is a pre-trained network model used to extract a pedestrian's image features from a picture of the pedestrian.
After the first image feature and the first gait feature of the target object are obtained, they are fused to obtain the first feature.
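One simple way to realize this fusion is sketched below, under the assumption that both features are numeric vectors and that plain L2-normalized concatenation is acceptable; the embodiment does not prescribe a specific fusion operation, so this is only an illustrative choice.

```python
import math
from typing import List, Sequence

def fuse_features(gait_feature: Sequence[float], image_feature: Sequence[float]) -> List[float]:
    """Fuse the first gait feature and the first image feature into a single first feature."""
    def l2_normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]
    # normalize each modality so neither dominates, then concatenate
    return l2_normalize(gait_feature) + l2_normalize(image_feature)
```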
In step 202, motion information of the target object is determined based on the position information of the target object acquired by the first camera.
In this step, from each first picture of the target object taken by the first camera, motion information of the target object is determined, and the motion information may be a motion direction of the target object or a disappearance position of the target object. The following two examples are provided below.
Example one
If the motion information is the motion direction, the last N pictures in chronological order are selected from the first pictures. From these N pictures, the likely movement direction of the target object after it leaves the shooting range of the first camera can be determined.
Specifically, a rectangular coordinate system is established in the pictures, the pixel positions of the target object in the last N pictures are determined, and a vector segment is fitted from these N pixel positions; this vector represents the motion direction of the target object within the pictures. The motion direction of the target object in the geodetic coordinate system is then determined from the proportional relationship between the picture size and the actual shooting range of the first camera. For example, if the pixel positions of the target object in the last 3 pictures are (300, 500), (400, 600), and (500, 700), the in-picture motion direction of the target object is fitted from these 3 pixel positions and then converted into the motion direction in the geodetic coordinate system.
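A sketch of fitting the motion direction from the last N pixel positions and scaling it into the geodetic coordinate system follows; the per-axis scale factors (meters per pixel) and the simple endpoint fit are assumptions standing in for the proportional relationship mentioned above.

```python
import math
from typing import Sequence, Tuple

def motion_direction(
    pixel_positions: Sequence[Tuple[float, float]],   # last N pixel positions, oldest first
    meters_per_pixel_x: float,                         # assumed scale factors of the first camera
    meters_per_pixel_y: float,
) -> Tuple[float, float]:
    """Return a unit vector for the motion direction in the geodetic coordinate system."""
    (x0, y0), (xn, yn) = pixel_positions[0], pixel_positions[-1]
    dx = (xn - x0) * meters_per_pixel_x                # endpoint fit of the vector segment
    dy = (yn - y0) * meters_per_pixel_y
    norm = math.hypot(dx, dy) or 1.0
    return (dx / norm, dy / norm)

# Example from the text: positions (300, 500), (400, 600), (500, 700) give an in-picture
# direction along (+1, +1) before scaling into the geodetic frame.
```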
Example two
If the motion information is the disappearance position of the target object, the last picture in chronological order is selected from the first pictures, the pixel position of the target object in that picture is determined, and the real position of the target object in the geodetic coordinate system is determined from the proportional relationship between the picture size and the actual shooting range of the first camera.
In step 203, the camera located in the search area is determined to be the second camera according to the motion information.
This step proceeds as follows. Step one: a search area is determined based on the motion information. Two cases are possible:
First, if the motion information is the motion direction, the search area can be determined as follows: a sector with a set angle is formed with the position of the first camera as the center of the circle and the walking distance within a set time length as the radius; this sector is the search area, and its axis of symmetry is parallel to the motion direction.
Fig. 5A shows a schematic diagram of determining the search area from the motion direction. In this example, the position of the first camera A and its shooting range are abstracted to the same point; the exact shooting range of the first camera A is not of concern here. Once the motion direction of the target object in the geodetic coordinate system has been determined through step 202 as shown in Fig. 5A, the search area can be determined. The search area is a sector whose radius is the walking distance within the set time length and whose central angle can be any number of degrees; a larger angle increases the search accuracy but also the required search time. Typically, the central angle is 30°.
Alternatively, the search area is not limited to a sector shape, and may be any figure, such as a rectangle, a triangle, a pentagon, and the like, which is not limited in this embodiment of the present invention.
Alternatively, the walking distance may be an empirical value, that is, the distance an ordinary person walks within the set time length; it may also be obtained by multiplying the movement speed of the target object by the set time length. For example, when the motion direction is determined in step 202, the likely movement speed of the target object can be estimated at the same time from the distance between two points divided by the time between the two shots.
This method takes into account the likely movement direction of the target object after it leaves the shooting range of the first camera, so the search is targeted and the tracking efficiency is high. However, the search area defined this way is likely to overlap with the shooting range of the first camera, which wastes computing resources.
Second, if the motion information is the disappearance position of the target object, the search area can be determined as follows: a sector with a set central angle is formed with the disappearance position as the center of the circle and the likely walking distance of the target object within a set time length as the radius; this sector is the search area, and it does not overlap the shooting range of the first camera.
In this embodiment, the position of the first camera and its shooting range are not abstracted to a point; as shown in Fig. 5B, the shooting range of the first camera A is drawn as a rectangle. If the disappearance position of the target object determined in step 202 is point a, the resulting sector is as shown in the figure. The sector does not overlap the shooting range of the first camera A.
Alternatively, the search area is not limited to a sector shape, and may be any figure, such as a rectangle, a triangle, a pentagon, and the like, which is not limited in this embodiment of the present invention.
Alternatively, the walking distance may be an empirical value, that is, the distance an ordinary person walks within the set time length; it may also be obtained by multiplying the movement speed of the target object by the set time length. For example, when the motion direction is determined in step 202, the likely movement speed of the target object can be estimated at the same time from the distance between two points divided by the time between the two shots.
The search area determined by this method does not overlap the shooting range of the first camera, which further improves the efficiency of successful cross-camera tracking, and because all possible motion directions of the target object after it leaves the shooting range of the first camera are covered, the probability of successful tracking increases. However, the method does not take the motion direction of the target object into account and therefore has certain limitations.
Step two: determining a camera located in the search area as a second camera.
After the search area is determined in step one, the cameras located in the search area are determined as second cameras; as shown in Fig. 5C, a second camera B and a second camera C are determined.
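A sketch of case one (a sector centered on the first camera, its axis parallel to the motion direction) and of step two (keeping the cameras inside that sector) is given below. The 30° central angle and the speed-times-duration radius follow the description, while the coordinate handling and camera registry are assumptions.

```python
import math
from typing import Dict, List, Tuple

def second_cameras_in_sector(
    first_camera_pos: Tuple[float, float],       # first camera abstracted to a point (geodetic)
    direction: Tuple[float, float],              # unit motion direction of the target object
    speed: float,                                # estimated movement speed (m/s)
    duration: float,                             # set time length (s)
    cameras: Dict[str, Tuple[float, float]],     # candidate camera positions by id
    central_angle_deg: float = 30.0,             # typical central angle from the description
) -> List[str]:
    """Return ids of cameras lying inside the sector-shaped search area."""
    radius = speed * duration                    # walking distance within the set time length
    half_angle = math.radians(central_angle_deg) / 2.0
    heading = math.atan2(direction[1], direction[0])
    hits = []
    for cam_id, (cx, cy) in cameras.items():
        dx, dy = cx - first_camera_pos[0], cy - first_camera_pos[1]
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > radius:
            continue
        # angular offset from the sector's axis of symmetry
        offset = abs((math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi)
        if offset <= half_angle:
            hits.append(cam_id)
    return hits
```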
In step 204, gait feature extraction is performed on each second picture of each tracked object acquired by the second camera to obtain a second feature of each tracked object.
The second feature may include only the second gait feature, or it may include both the second gait feature and a second image feature. The method for determining the second feature is the same as the method for determining the first feature in step 201 and is not repeated here.
For example, it is determined that within the shooting range of the second camera B there are two tracked objects, b1 and b2, whose second features are b11 and b22, respectively; within the shooting range of the second camera C there are two tracked objects, c1 and c2, whose second features are c11 and c22, respectively, as shown in Table 1.
Table 1
| Second camera | Tracked object | Second feature |
| --- | --- | --- |
| B | b1 | b11 |
| B | b2 | b22 |
| C | c1 | c11 |
| C | c2 | c22 |
Two methods of determining whether a tracked object is a target object are provided below.
Mode one
Performing feature fusion on the first feature and the second feature to obtain a fusion feature;
inputting the fusion features into a verification classifier to obtain feature values of the fusion features; the characteristic value is used for representing the similarity of the first characteristic and the second characteristic;
and if the characteristic value is larger than a second preset threshold value, determining that the tracking object is the target object.
The verification classifier is a pre-trained model, and its training process is as follows: two features with high similarity are fused, the fused feature is input into the model, and a high feature value is labeled; two features with low similarity are fused, the fused feature is input into the model, and a low feature value is labeled. A large amount of such sample training data is input to train the model.
The trained verification classifier outputs a corresponding feature value for an input fusion feature, characterizing the similarity of the two features within the fusion feature.
Mode two
Inputting the first characteristic and the second characteristic into a verification classifier to obtain the similarity of the first characteristic and the second characteristic;
and if the similarity is greater than a third preset threshold, determining that the tracking object is the target object.
In this mode, the verification classifier is also a pre-trained model, and its training process is as follows: two features with high similarity are input into the model and labeled with a high feature value; two features with low similarity are input into the model and labeled with a low feature value. A large amount of such sample training data is input to train the model.
The trained verification classifier outputs a corresponding feature value for the two input features, characterizing the similarity of the two features.
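A sketch of the comparison step follows. A real verification classifier would be a trained model; here cosine similarity stands in for the learned feature value, which is an assumption rather than the embodiment's classifier.

```python
import math
from typing import Sequence

def feature_similarity(first: Sequence[float], second: Sequence[float]) -> float:
    """Stand-in for the verification classifier: cosine similarity of the two feature vectors."""
    dot = sum(a * b for a, b in zip(first, second))
    na = math.sqrt(sum(a * a for a in first)) or 1.0
    nb = math.sqrt(sum(b * b for b in second)) or 1.0
    return dot / (na * nb)

def is_target(first: Sequence[float], second: Sequence[float], threshold: float = 0.8) -> bool:
    """The tracked object is taken as the target object when the value exceeds the second preset threshold."""
    return feature_similarity(first, second) > threshold
```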
For example, according to the above method, the similarity of each second feature in Table 1 to the first feature is shown in the following table.
Table 2
If the second preset threshold is set to 0.8, then in Table 2 the similarity between the first feature of the target object and the second feature of tracked object b1 captured by the second camera B, and likewise the similarity for tracked object c1 captured by the second camera C, meet the second preset threshold, so b1 and c1 are determined to be the target object and deserve particular attention.
Alternatively, among the tracked objects shown in Table 2, the tracked object with the highest similarity may be determined as the target object, namely tracked object b1; or the top i tracked objects with the highest similarity may be taken as target objects. The embodiment of the present invention is not limited in this regard.
In the method, a first feature of the target object is determined from the pictures of the target object captured by the first camera, a second feature is determined for each tracked object from the pictures captured by the second camera, and whether the tracked object corresponding to a second feature is the target object is determined by comparing the first feature with the second feature. Because the target object may have changed clothes or altered its face, or may be wearing a uniform, identification through clothing or facial features has certain limitations and its tracking accuracy is not high. Gait features are generally hard to disguise, can be recognized at a distance, and are not easily noticed by the target object, so gait features are introduced as the comparison features, which reduces the interference with tracking accuracy caused by the target object changing clothes, wearing make-up, wearing a uniform, and the like. Secondly, the search area is determined by analyzing the motion information of the target object rather than by searching aimlessly over a large range, which improves the accuracy of determining the search area.
Based on the same technical concept, fig. 6 exemplarily shows a structure of a cross-camera tracking device provided by an embodiment of the present invention, which can perform a flow of pedestrian identification and tracking.
As shown in fig. 6, the apparatus specifically includes:
the determining unit 601 is configured to perform gait feature extraction on each first picture of the target object acquired by the first camera to obtain a first feature of the target object; determining motion information of the target object based on the position information of the target object acquired by the first camera;
a processing unit 602, configured to determine, according to the motion information, that a camera located in a search area is a second camera; performing gait feature extraction on each second picture of each tracked object acquired by the second camera to obtain second features of each tracked object; and determining whether each tracking object comprises the target object or not based on the similarity of the second characteristic of each tracking object and the first characteristic of the target object.
Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing a computer program;
and a processor, configured to call the computer program stored in the memory and execute, according to the obtained program, the cross-camera tracking method described in any of the above manners.
Based on the same technical concept, the embodiment of the present invention further provides a computer-readable storage medium, in which a computer-executable program is stored, where the computer-executable program is used to enable a computer to execute the method for cross-camera tracking listed in any of the above manners.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A method for cross-camera tracking, comprising:
performing gait feature extraction on each first picture of a target object acquired by a first camera to obtain a first feature of the target object;
determining motion information of the target object based on the position information of the target object acquired by the first camera;
determining that the camera located in the search area is a second camera according to the motion information;
performing gait feature extraction on each second picture of each tracked object acquired by the second camera to obtain second features of each tracked object;
and determining, based on the similarity between the second feature of each tracked object and the first feature of the target object, whether the tracked objects include the target object.
2. The method of claim 1, wherein performing gait feature extraction on each first picture of the target object acquired by the first camera to obtain the first feature of the target object comprises:
determining an identification picture sequence from the first pictures of the target object acquired by the first camera, and performing gait feature extraction on the identification picture sequence to obtain a first gait feature of the target object;
determining an identification picture from the first pictures of the target object acquired by the first camera, and performing image feature extraction on the identification picture to obtain a first image feature of the target object;
and performing feature fusion on the first gait feature and the first image feature to obtain the first feature.
3. The method of claim 2, wherein determining an identification picture from among the first pictures of the target object captured by the first camera comprises:
performing picture analysis on each first picture to obtain a quality assessment value of each first picture, and determining the first picture with the quality assessment value meeting the set requirement as the identification picture;
performing image feature extraction on the identification picture to obtain a first image feature of the target object, including:
and inputting the identification picture into an image feature extraction network, and extracting a first image feature of the target object.
4. The method of claim 2, wherein determining an identification picture sequence from among the first pictures of the target object captured by the first camera comprises:
performing walking recognition on each first picture to obtain the walking probability of the target object in each first picture;
screening out an identification picture sequence from the first pictures, wherein the identification picture sequence is K consecutive first pictures whose walking probability is greater than a first preset threshold;
performing gait feature extraction on the identification picture sequence to obtain a first gait feature of the target object, including:
and inputting the identification picture sequence into a gait feature extraction network to extract the first gait feature of the target object.
5. The method of claim 1, wherein determining motion information of the target object based on the position information of the target object acquired by the first camera comprises:
selecting, from the first pictures, the last N pictures in chronological order of shooting;
and determining the motion direction of the target object according to the position information of the target object in the last N pictures respectively.
6. The method of claim 5, wherein determining from the motion information that the camera located in the search area is the second camera comprises:
forming a sector area with a set angle by taking the position of the first camera as the center of the circle and the distance walkable within a set duration as the radius, the sector area being the search area, wherein the axis of symmetry of the sector area is parallel to the motion direction;
determining that a camera located in the search area is a second camera.
7. The method of claim 1, wherein determining whether the target object is included in the tracking objects based on the similarity between the second feature of the tracking objects and the first feature of the target object comprises:
performing feature fusion on the first feature and the second feature to obtain a fused feature;
inputting the fused feature into a verification classifier to obtain a feature value of the fused feature, wherein the feature value is used for characterizing the similarity between the first feature and the second feature;
and if the feature value is greater than a second preset threshold, determining that the tracked object is the target object.
8. A cross-camera tracking device, comprising:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for carrying out gait feature extraction on each first picture of a target object acquired by a first camera to obtain first features of the target object; determining motion information of the target object based on the position information of the target object acquired by the first camera;
the processing unit is used for determining, according to the motion information, that a camera located in the search area is a second camera; performing gait feature extraction on each second picture of each tracked object acquired by the second camera to obtain a second feature of each tracked object; and determining, based on the similarity between the second feature of each tracked object and the first feature of the target object, whether the tracked objects include the target object.
9. A computing device, comprising:
a memory for storing a computer program;
a processor, configured to call the computer program stored in the memory and execute the method of any one of claims 1 to 7 according to the obtained program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer-executable program for causing a computer to execute the method of any one of claims 1 to 7.
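The following sketch illustrates the picture screening of claims 3 and 4 above. It is a hedged illustration only: quality_score and walking_probability stand in for the picture-analysis and walking-recognition models, and the 0.6 quality requirement, K=8 and 0.5 walking-probability threshold are arbitrary placeholder settings.

```python
# Hedged illustration of claims 3 and 4 with placeholder scoring callables.
import numpy as np


def pick_identification_picture(first_pictures, quality_score, min_quality=0.6):
    # Claim 3 (illustrative): choose a first picture whose quality assessment
    # value meets the set requirement; here, the best-scoring picture above
    # min_quality.
    scores = [quality_score(p) for p in first_pictures]
    best = int(np.argmax(scores))
    return first_pictures[best] if scores[best] >= min_quality else None


def pick_identification_sequence(first_pictures, walking_probability,
                                 k=8, first_threshold=0.5):
    # Claim 4 (illustrative): K consecutive first pictures whose walking
    # probability exceeds the first preset threshold.
    probs = [walking_probability(p) for p in first_pictures]
    run = 0
    for i, p in enumerate(probs):
        run = run + 1 if p > first_threshold else 0
        if run == k:
            return first_pictures[i - k + 1:i + 1]
    return None
```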
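The next sketch is a minimal illustration of claims 5 and 6, assuming planar camera coordinates; the walking speed, duration and half-angle of the sector are placeholder values, not values fixed by the claims.

```python
# Hedged illustration of claims 5 and 6 (motion direction and sector search area).
import math


def motion_direction(last_n_positions):
    # Claim 5 (illustrative): motion direction from the target object's
    # positions in the last N pictures, taken from the first to the last.
    (x0, y0), (x1, y1) = last_n_positions[0], last_n_positions[-1]
    return math.atan2(y1 - y0, x1 - x0)


def second_cameras_in_sector(first_camera_pos, direction, camera_positions,
                             walking_speed=1.5, duration_s=60.0,
                             half_angle=math.radians(45)):
    # Claim 6 (illustrative): sector centered on the first camera with radius
    # equal to the distance walkable within the set duration and symmetry
    # axis parallel to the motion direction; cameras inside it are second cameras.
    radius = walking_speed * duration_s
    cx, cy = first_camera_pos
    inside = []
    for cam_id, (x, y) in camera_positions.items():
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > radius:
            continue
        bearing = math.atan2(dy, dx)
        # Smallest absolute angular difference to the motion direction.
        diff = abs((bearing - direction + math.pi) % (2 * math.pi) - math.pi)
        if diff <= half_angle:
            inside.append(cam_id)
    return inside
```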
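Finally, the feature fusion of claims 2 and 7 and the verification classifier of claim 7 can be pictured as below. This is a simplification under stated assumptions: fusion is reduced to concatenation and the classifier to a logistic layer with random placeholder weights, rather than the trained fusion and classifier of the embodiment.

```python
# Hedged illustration of feature fusion (claims 2 and 7) and the verification
# classifier (claim 7); weights and thresholds are placeholders.
import numpy as np


def fuse_features(feature_a, feature_b):
    # One simple fusion choice; a learned fusion could be used instead.
    return np.concatenate([feature_a, feature_b])


class VerificationClassifier:
    """Toy stand-in for the trained verification classifier of claim 7."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=dim)  # placeholder weights, not trained
        self.b = 0.0

    def feature_value(self, fused_feature):
        # The output characterizes the similarity of the two fused features.
        return float(1.0 / (1.0 + np.exp(-(self.w @ fused_feature + self.b))))


def is_target(first_feature, second_feature, classifier, second_threshold=0.5):
    # Claim 7 (illustrative): fuse the first and second features, score the
    # fused feature, and compare against the second preset threshold.
    fused = fuse_features(first_feature, second_feature)
    return classifier.feature_value(fused) > second_threshold
```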
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110862748.2A CN113592910A (en) | 2021-07-29 | 2021-07-29 | Cross-camera tracking method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110862748.2A CN113592910A (en) | 2021-07-29 | 2021-07-29 | Cross-camera tracking method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113592910A true CN113592910A (en) | 2021-11-02 |
Family
ID=78251708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110862748.2A Pending CN113592910A (en) | 2021-07-29 | 2021-07-29 | Cross-camera tracking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113592910A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101960667B1 (en) * | 2018-08-31 | 2019-07-15 | 주식회사 텍트원 | Suspect Tracking Apparatus and Method In Stored Images |
CN110929770A (en) * | 2019-11-15 | 2020-03-27 | 云从科技集团股份有限公司 | Intelligent tracking method, system and equipment based on image processing and readable medium |
CN111047621A (en) * | 2019-11-15 | 2020-04-21 | 云从科技集团股份有限公司 | Target object tracking method, system, equipment and readable medium |
CN111008993A (en) * | 2019-12-06 | 2020-04-14 | 江西洪都航空工业集团有限责任公司 | Method and device for tracking pedestrian across mirrors |
CN111881322A (en) * | 2020-09-28 | 2020-11-03 | 成都睿沿科技有限公司 | Target searching method and device, electronic equipment and storage medium |
CN112365522A (en) * | 2020-10-19 | 2021-02-12 | 中标慧安信息技术股份有限公司 | Method for tracking personnel in park across borders |
CN112507953A (en) * | 2020-12-21 | 2021-03-16 | 重庆紫光华山智安科技有限公司 | Target searching and tracking method, device and equipment |
CN112784740A (en) * | 2021-01-21 | 2021-05-11 | 上海市公安局刑事侦查总队 | Gait data acquisition and labeling method and application |
CN112906599A (en) * | 2021-03-04 | 2021-06-04 | 杭州海康威视数字技术股份有限公司 | Gait-based personnel identity identification method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7317919B2 (en) | Appearance search system and method | |
JP7375101B2 (en) | Information processing device, information processing method and program | |
US11023707B2 (en) | System and method for selecting a part of a video image for a face detection operation | |
WO2019218824A1 (en) | Method for acquiring motion track and device thereof, storage medium, and terminal | |
CN107808111B (en) | Method and apparatus for pedestrian detection and attitude estimation | |
Owens et al. | Application of the self-organising map to trajectory classification | |
CN109727275B (en) | Object detection method, device, system and computer readable storage medium | |
Nam et al. | Intelligent video surveillance system: 3-tier context-aware surveillance system with metadata | |
JP6789876B2 (en) | Devices, programs and methods for tracking objects using pixel change processed images | |
JP2016163328A (en) | Information processing device, information processing method and program | |
EP3918519A1 (en) | Method of processing information from an event-based sensor | |
Roy et al. | Suspicious and violent activity detection of humans using HOG features and SVM classifier in surveillance videos | |
Vignesh et al. | Abnormal event detection on BMTT-PETS 2017 surveillance challenge | |
CN112668410B (en) | Sorting behavior detection method, system, electronic device and storage medium | |
KR20210062256A (en) | Method, program and system to judge abnormal behavior based on behavior sequence | |
KR102187831B1 (en) | Control method, device and program of congestion judgment system using cctv | |
Kwan-Loo et al. | Detection of violent behavior using neural networks and pose estimation | |
JPWO2008035411A1 (en) | Mobile object information detection apparatus, mobile object information detection method, and mobile object information detection program | |
CN116824641B (en) | Gesture classification method, device, equipment and computer storage medium | |
Xu et al. | Smart video surveillance system | |
US20230394686A1 (en) | Object Identification | |
CN114764895A (en) | Abnormal behavior detection device and method | |
KR20230166840A (en) | Method for tracking object movement path based on artificial intelligence | |
CN114913470A (en) | Event detection method and device | |
CN113592910A (en) | Cross-camera tracking method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||