CN111461222B - Method and device for obtaining track similarity of target object and electronic equipment - Google Patents


Info

Publication number
CN111461222B
CN111461222B (application CN202010250352.8A)
Authority
CN
China
Prior art keywords
data
sample
detection
frame
target object
Prior art date
Legal status
Active
Application number
CN202010250352.8A
Other languages
Chinese (zh)
Other versions
CN111461222A (en)
Inventor
戴鹏
翁仁亮
林元庆
Current Assignee
Beijing Aibee Technology Co Ltd
Original Assignee
Beijing Aibee Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Aibee Technology Co Ltd
Priority to CN202010250352.8A
Publication of CN111461222A
Application granted
Publication of CN111461222B

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F18/00: Pattern recognition
                    • G06F18/20: Analysing
                        • G06F18/22: Matching criteria, e.g. proximity measures
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T7/00: Image analysis
                    • G06T7/20: Analysis of motion
                        • G06T7/292: Multi-camera tracking
                • G06T2207/00: Indexing scheme for image analysis or image enhancement
                    • G06T2207/10: Image acquisition modality
                        • G06T2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, an apparatus, and an electronic device for obtaining the trajectory similarity of a target object. The method comprises: obtaining detection data corresponding to a target duration, the detection data including at least one group of data pairs, where a group of data pairs includes first data and second data; the first data includes coordinate data of a first position in a first detection frame and size data of the first detection frame, the first detection frame being the detection frame in which a first target object is located in an image frame acquired by a first acquisition device; the second data includes coordinate data of a second position in a second detection frame and size data of the second detection frame, the second detection frame being the detection frame in which a second target object is located in an image frame acquired by a second acquisition device, the second position corresponding to the first position; and taking the data pairs in the detection data as detection input data of a pre-trained similarity measurement model, so as to obtain a detection result output by the similarity measurement model, the detection result characterizing the similarity between the first target object and the second target object.

Description

Method and device for obtaining track similarity of target object and electronic equipment
Technical Field
The present disclosure relates to the technical field of target-object trajectory similarity measurement, and in particular to a method, an apparatus, and an electronic device for obtaining the trajectory similarity of a target object.
Background
Multi-target Multi-camera Tracking (MTMC Tracking) is a key enabling technology for intelligent video surveillance systems and is widely applicable to traffic, park, shopping-mall, hotel, bank, warehouse, and other scenarios. As video surveillance systems grow in scale and the number of cameras increases, cooperation among multiple cameras becomes particularly important: it expands the target tracking range and exploits cross-camera information to improve tracking accuracy.
The basis for cooperative tracking among multiple cameras is measuring the similarity between the pedestrian trajectories observed by different cameras. The most widely used pedestrian-trajectory similarity measurement scheme today is spatio-temporal: first, the coordinates of the contact point between the pedestrian target's feet and the ground plane (i.e., the lower edge of the detection frame in each camera's acquisition area) are obtained from two cameras at the same moment; next, the world coordinates of the pedestrian trajectory are computed from these two-dimensional coordinates; finally, a trajectory similarity value is derived from the distance between the world coordinates and used to judge whether the pedestrians seen by the two cameras are the same person.
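The coordinate conversion underlying this prior scheme can be sketched as a planar homography: under a single ground-plane assumption, the image point of the foot/ground contact maps to world coordinates on that plane. The following is a minimal illustration, not the patent's method; in practice the homography matrix would come from camera calibration:

```python
import numpy as np

def foot_point_to_world(u, v, H):
    """Project the foot/ground contact point (u, v), i.e. the midpoint of the
    detection frame's lower edge, onto the ground plane via homography H.
    Valid only under the single ground-plane constraint described above."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]  # normalize homogeneous coordinates

def trajectory_similarity(world_pts_1, world_pts_2):
    """Prior-art spatio-temporal similarity: the smaller the mean distance
    between time-aligned world coordinates, the higher the similarity."""
    d = np.linalg.norm(np.asarray(world_pts_1) - np.asarray(world_pts_2), axis=1)
    return float(d.mean())
```

With the identity homography, image coordinates pass through unchanged; a real H would encode the camera's pose relative to the ground plane, and the depth lost in this projection is exactly the one-dimensional loss the application criticizes below.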
However, this scheme must convert the two-dimensional image coordinates of the pedestrian trajectory into three-dimensional world coordinates. Because depth information is missing, the conversion has to be solved under a single ground-plane constraint, which discards one dimension of data and makes the resulting similarity value inaccurate. A technical solution that measures pedestrian-trajectory similarity more accurately is therefore needed.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus and an electronic device for obtaining a track similarity of a target object, as follows:
a method for obtaining the trajectory similarity of a target object, the method comprising:
obtaining detection data corresponding to the target duration; wherein the detection data comprises at least one set of data pairs, one set of the data pairs comprising: first data and second data, the first data comprising: coordinate data of a first position in a first detection frame and size data of the first detection frame; the first detection frame is a detection frame in which a first target object is located in an image frame acquired by the first acquisition device; the second data includes: coordinate data of a second position in a second detection frame and size data of the second detection frame; the second detection frame is a detection frame of a second target object in the image frame acquired by the second acquisition device; the second position corresponds to the first position;
taking the data pairs in the detection data as detection input data of a pre-trained similarity measurement model, so as to obtain a detection result output by the similarity measurement model, wherein the detection result characterizes the similarity between the first target object and the second target object;
the similarity measurement model is a model that is constructed in advance and trained using at least two groups of sample pairs carrying sample identifiers, where a group of sample pairs comprises: a first sample and a second sample, the first sample comprising: coordinate data of the first position in a first sample frame and size data of the first sample frame; the second sample comprising: coordinate data of the second position in a second sample frame and size data of the second sample frame; the first sample frame is the detection frame in which a first sample object is located in a sample image frame acquired by the first acquisition device; the second sample frame is the detection frame in which a second sample object is located in a sample image frame acquired by the second acquisition device; and the sample identifier of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
In the above method, preferably, in the data pair, the first data and the second data both correspond to the same time within the target duration.
In the above method, preferably, in the data pair, the first target object may be identified in a first detection frame corresponding to the first data, and the second target object may be identified in a second detection frame corresponding to the second data.
In the above method, preferably, taking the data pairs in the detection data as detection input data of the pre-trained similarity measurement model to obtain the detection result output by the similarity measurement model includes:
taking the first data and the second data of the data pair in the detection data as detection input data of the similarity measurement model to obtain an output result corresponding to the data pair output by the similarity measurement model;
and obtaining detection results of the first target object and the second target object according to the output results corresponding to the data pairs in the detection data.
In the above method, preferably, obtaining the detection results of the first target object and the second target object according to the output results of the data pair in the detection data includes:
and, in the case that the detection data contains multiple groups of data pairs, reducing the output results corresponding to the multiple groups of data pairs to a single value according to a preset result processing strategy, so as to obtain the detection result.
An apparatus for obtaining the trajectory similarity of a target object, the apparatus comprising:
the acquisition unit is used for acquiring detection data corresponding to the target duration; wherein the detection data comprises at least one set of data pairs, one set of the data pairs comprising: first data and second data, the first data comprising: coordinate data of a first position in a first detection frame and size data of the first detection frame; the first detection frame is a detection frame in which a first target object is located in an image frame acquired by the first acquisition device; the second data includes: coordinate data of a second position in a second detection frame and size data of the second detection frame; the second detection frame is a detection frame of a second target object in the image frame acquired by the second acquisition device; the second position corresponds to the first position;
the detection unit is used for taking the data pair in the detection data as detection input data of a similarity measurement model which is trained in advance so as to obtain a detection result output by the similarity measurement model, wherein the detection result represents the similarity between the first target object and the second target object;
a training unit, configured to train the pre-constructed similarity measurement model using at least two groups of sample pairs carrying sample identifiers, where a group of sample pairs comprises: a first sample and a second sample, the first sample comprising: coordinate data of the first position in a first sample frame and size data of the first sample frame; the second sample comprising: coordinate data of the second position in a second sample frame and size data of the second sample frame; the first sample frame is the detection frame in which a first sample object is located in a sample image frame acquired by the first acquisition device; the second sample frame is the detection frame in which a second sample object is located in a sample image frame acquired by the second acquisition device; and the sample identifier of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
In the above device, preferably, in the data pair, the first data and the second data both correspond to the same time within the target duration;
in the data pair, the first target object can be identified in a first detection frame corresponding to the first data, and the second target object can be identified in a second detection frame corresponding to the second data.
In the above apparatus, preferably, the detection unit includes:
the single frame detection module is used for taking the first data and the second data of the data pair in the detection data as detection input data of the similarity measurement model to obtain an output result corresponding to the data pair output by the similarity measurement model;
and the result processing module is used for obtaining the detection results of the first target object and the second target object according to the output results corresponding to the data pairs in the detection data.
In the above apparatus, preferably, the result processing module is specifically configured to: in the case that the detection data contains multiple groups of data pairs, reduce the output results corresponding to the multiple groups of data pairs to a single value according to a preset result processing strategy, so as to obtain the detection result.
An electronic device, comprising:
a memory for storing an application program and data generated by the operation of the application program;
a processor for executing the application program to realize: obtaining detection data corresponding to the target duration; wherein the detection data comprises at least one set of data pairs, one set of the data pairs comprising: first data and second data, the first data comprising: coordinate data of a first position in a first detection frame and size data of the first detection frame; the first detection frame is a detection frame in which a first target object is located in an image frame acquired by the first acquisition device; the second data includes: coordinate data of a second position in a second detection frame and size data of the second detection frame; the second detection frame is a detection frame of a second target object in the image frame acquired by the second acquisition device; the second position corresponds to the first position; taking the data pairs in the detection data as detection input data of a similarity measurement model which is trained in advance, so as to obtain a detection result output by the similarity measurement model, wherein the detection result represents the similarity between the first target object and the second target object;
the similarity measurement model is a model that is constructed in advance and trained using at least two groups of sample pairs carrying sample identifiers, where a group of sample pairs comprises: a first sample and a second sample, the first sample comprising: coordinate data of the first position in a first sample frame and size data of the first sample frame; the second sample comprising: coordinate data of the second position in a second sample frame and size data of the second sample frame; the first sample frame is the detection frame in which a first sample object is located in a sample image frame acquired by the first acquisition device; the second sample frame is the detection frame in which a second sample object is located in a sample image frame acquired by the second acquisition device; and the sample identifier of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
According to the method, the apparatus, and the electronic device for obtaining the trajectory similarity of a target object provided by the application, when trajectory similarity is measured between two acquisition devices whose monitored areas overlap, a similarity measurement model is constructed in advance and trained with multiple groups of labeled sample pairs that contain the coordinate data and size data of sample frames. After detection data from the two acquisition devices are obtained, the coordinate data and size data of the detection frames in each data pair are used as input to the similarity measurement model, which outputs a detection result characterizing the similarity between the first target object and the second target object monitored by the two devices. In this way, the widths and heights of the detection frames from multiple acquisition devices are exploited in combination with their coordinate data, and the similarity computation is converted into a classification of target objects by the constructed similarity measurement model. This avoids the loss of one dimension of information, and the resulting drop in measurement accuracy, caused by converting two-dimensional coordinates into three-dimensional coordinates, and thereby improves the accuracy of the similarity measurement between target objects monitored by multiple acquisition devices.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a scene of the present application in a target trajectory similarity measure;
fig. 2 is a flowchart of a method for obtaining a track similarity of a target object according to a first embodiment of the present application;
FIGS. 3-4 are respectively exemplary diagrams of embodiments of the present application;
fig. 5 is a schematic structural diagram of an apparatus for obtaining a track similarity of a target object according to a second embodiment of the present application;
fig. 6 is a schematic diagram of a part of a device for obtaining a track similarity of a target object according to a second embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to a third embodiment of the present application;
fig. 8 is a schematic diagram of a system according to an embodiment of the present application.
Detailed Description
As shown in fig. 1, camera 1 and camera 2 in a video surveillance system each monitor their own area; the monitored areas of the two cameras partially overlap and partially do not. By using cameras with overlapping monitored areas, the video surveillance system achieves multi-target cross-camera tracking, which is widely applied in traffic, park, shopping-mall, hotel, bank, warehouse, and other scenarios. To achieve cooperative tracking among multiple cameras, the similarity between pedestrian trajectories observed by different cameras must be measured.
The inventors of the present application found during development that the trajectory similarity measurement models widely used today mainly comprise two parts: an appearance-feature similarity measure and a spatio-temporal similarity measure. The former is mainly obtained by computing the similarity of pedestrian re-identification (Re-ID) features; the latter is mainly obtained from the distance between the three-dimensional coordinates of the trajectories during the overlapping time. Taking the spatio-temporal scheme as an example: when the same object, such as a pedestrian, appears in the acquisition areas of several cameras, the pedestrian's world coordinates are identical across those cameras. Based on this, the spatio-temporal scheme obtains the pedestrian's two-dimensional coordinates in each camera, converts them into three-dimensional world coordinates, and then judges whether the world coordinates are the same or approximately the same to realize the similarity measurement.
To realize the above similarity measurement scheme, the positions and acquisition areas of all cameras must be calibrated in advance so that they can be unified into the same world coordinate system. Calibrating a large number of cameras is itself a challenging and resource-intensive task. Moreover, some real scenes are difficult to calibrate with calibration objects (such as the two-dimensional checkerboard used in Zhang Zhengyou's calibration method), so the strategy is hard to implement there. Furthermore, computing three-dimensional world coordinates from two-dimensional image coordinates lacks depth information and must be solved under a single ground-plane constraint. This places high demands on the detection precision of the contact point between the pedestrian target's feet and the ground plane (i.e., the lower edge of the detection frame), and occlusion can cause large projection errors.
Meanwhile, the current spatio-temporal similarity scheme uses only the coordinates of the contact point between the pedestrian target's feet and the ground plane (i.e., the lower edge of the detection frame), and ignores the width and height of the detection frame. In fact, the width and height of the detection frame convey the height, body shape, and other characteristics of the pedestrian target, and can be used to improve the accuracy of the spatio-temporal similarity computation.
Based on the above analysis, the inventors of the application, through further research, propose a deep-learning-based scheme for obtaining the trajectory similarity of a target object: the cameras do not need to be calibrated in advance, and all the information of the detection frame is fully utilized to improve the accuracy of the trajectory similarity computation. The specific scheme is as follows:
First, a similarity measurement model is constructed and trained with at least two groups of sample pairs carrying sample identifiers to obtain an optimized similarity measurement model. Each group of sample pairs comprises a first sample and a second sample. The first sample comprises coordinate data of a first position in a first sample frame and size data of the first sample frame; the second sample comprises coordinate data of a second position in a second sample frame and size data of the second sample frame. The first sample frame is the detection frame in which a first sample object is located in a sample image frame acquired by the first acquisition device; the second sample frame is the detection frame in which a second sample object is located in a sample image frame acquired by the second acquisition device. The sample identifier of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
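The training step above can be sketched with a minimal stand-in model. The patent text does not fix an architecture, so the sketch below uses a simple logistic-regression classifier in NumPy; the pair features additionally include the element-wise absolute differences of the two boxes (an illustrative convenience for a linear model, not something the patent prescribes), and the labeled sample pairs are synthetic. All names and data here are hypothetical:

```python
import numpy as np

def pair_features(box1, box2):
    """box = (x, y, w, h): position coordinates and size of one sample frame.
    Concatenates both boxes plus their absolute differences so a linear
    model can express 'similar boxes' (an illustrative simplification)."""
    b1, b2 = np.asarray(box1, dtype=float), np.asarray(box2, dtype=float)
    return np.concatenate([b1, b2, np.abs(b1 - b2)])

class SimilarityModel:
    """Minimal logistic-regression stand-in for the similarity metric model."""
    def __init__(self, dim=12, lr=0.5, epochs=400):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr, self.epochs = lr, epochs

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        """X: (n, 12) pair features; y: 1 = same sample object, 0 = different."""
        for _ in range(self.epochs):
            p = self._sigmoid(X @ self.w + self.b)
            self.w -= self.lr * (X.T @ (p - y)) / len(y)
            self.b -= self.lr * float(np.mean(p - y))

    def predict(self, box1, box2):
        """Similarity score in (0, 1) for one data pair."""
        return float(self._sigmoid(pair_features(box1, box2) @ self.w + self.b))

# Synthetic labeled sample pairs (hypothetical data, for illustration only):
rng = np.random.default_rng(1)
X, y = [], []
for _ in range(400):
    b1 = rng.uniform(0.0, 1.0, 4)
    if rng.random() < 0.5:                       # same object: correlated boxes
        b2, label = b1 + rng.normal(0.0, 0.02, 4), 1.0
    else:                                        # different objects: unrelated boxes
        b2, label = rng.uniform(0.0, 1.0, 4), 0.0
    X.append(pair_features(b1, b2))
    y.append(label)

model = SimilarityModel()
model.fit(np.array(X), np.array(y))
```

After training on such labeled sample pairs, a pair of near-identical boxes should score closer to 1 (same sample object) than a pair of unrelated boxes.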
Then, when the target object track similarity measurement is needed, firstly obtaining detection data corresponding to the target time length, wherein the detection data comprises at least one group of data pairs, and the group of data pairs comprises: first data and second data, the first data comprising: coordinate data of a first position in the first detection frame and size data of the first detection frame; the first detection frame is a detection frame in which a first target object is located in an image frame acquired by the first acquisition device; the second data includes: coordinate data of a second position in the second detection frame and size data of the second detection frame; the second detection frame is a detection frame of a second target object in the image frame acquired by the second acquisition device; the second position corresponds to the first position;
and finally, taking the data pair in the detection data as detection input data of the similarity measurement model which is trained in advance to obtain a detection result output by the similarity measurement model, wherein the detection result represents the similarity between the first target object and the second target object.
It should be noted that, in each data pair participating in the target trajectory similarity measurement, the first data and the second data correspond to the same time within the target duration. That is, the data pairs are screened from image frames in the detection data that overlap in time; first data and second data that do not overlap in time, i.e., do not correspond to the same moment, do not form a data pair and do not participate in the measurement of the target trajectory similarity.
Moreover, in each data pair, the first target object can be identified in the first detection frame corresponding to the first data, and the second target object can be identified in the second detection frame corresponding to the second data. That is, for the image frames acquired by the first and second acquisition devices at any moment within the target duration, if the target object cannot be identified in the detection frame of either device, that image frame is deleted and the corresponding data pair is discarded; only data pairs in which a target object can be identified in both detection frames are retained.
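The two screening rules above (same moment, and an identifiable target in both detection frames) can be sketched as a small join, assuming for illustration that each device's detections are keyed by timestamp:

```python
def build_data_pairs(frames_cam1, frames_cam2):
    """Form data pairs from two acquisition devices.

    Each input dict maps a timestamp within the target duration to the
    detection-frame data (x, y, w, h), or to None when no target object
    could be identified in that image frame. (The dict layout is a
    hypothetical convenience; the patent only requires that the first and
    second data of a pair correspond to the same time and that the target
    object is identifiable in both detection frames.)
    """
    pairs = []
    # Only timestamps present in BOTH cameras (time-overlapping frames) survive.
    for t in sorted(set(frames_cam1) & set(frames_cam2)):
        d1, d2 = frames_cam1[t], frames_cam2[t]
        if d1 is None or d2 is None:  # target not identifiable: drop the pair
            continue
        pairs.append((t, d1, d2))
    return pairs
```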
Correspondingly, when the data pairs in the detection data are used to detect the similarity of the target objects, the first data and the second data of each data pair can be used in turn as detection input data of the similarity measurement model, that is, detection is performed frame by frame, to obtain the output result corresponding to each data pair; finally, the detection results for the first target object and the second target object are obtained from the output results corresponding to the data pairs in the at least one group of data pairs.
When the detection data contains only one group of data pairs, the output result corresponding to that data pair can be used directly as the detection result for the first and second target objects. When the detection data contains multiple groups of data pairs, the output results corresponding to the multiple groups are reduced to a single value according to a preset result processing strategy to obtain the detection result; for example, the maximum, minimum, average, or median of the output results corresponding to the multiple groups of data pairs may be taken.
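The result-processing step can be sketched as follows, using the max/min/mean/median strategies the text lists as examples:

```python
from statistics import mean, median

def aggregate_outputs(outputs, strategy="mean"):
    """Reduce per-frame similarity-model outputs to one detection result.

    With a single data pair, its output is used directly; with several
    pairs, a preset result-processing strategy (max, min, mean, or median,
    as the text suggests) selects the value."""
    if len(outputs) == 1:
        return outputs[0]
    reducers = {"max": max, "min": min, "mean": mean, "median": median}
    return reducers[strategy](outputs)
```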
In summary, the present application exploits the width and height of the detection frames from multiple acquisition devices, combines them with the coordinate data of the detection frames, and converts the similarity computation into a classification of target objects through the constructed similarity measurement model. This avoids the loss of one dimension of information, and the resulting drop in measurement accuracy, caused by converting two-dimensional coordinates into three-dimensional coordinates, and thereby improves the accuracy of the similarity measurement between target objects monitored by multiple acquisition devices.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 2, a flowchart of a method for obtaining the trajectory similarity of a target object according to the first embodiment of the present application is shown. The method is applicable to an electronic device capable of data processing, such as a computer or a server, and is mainly used to improve the accuracy of the target trajectory similarity measurement when tracking a target object across multiple acquisition devices.
Specifically, the method in this embodiment may include the following steps:
step 201: and obtaining detection data corresponding to the target duration.
In this embodiment, a plurality of acquisition devices, such as cameras, may be connected to an electronic device in a data communication manner, so that in this embodiment, image frames acquired by a plurality of acquisition devices may be acquired through the data communication connection, and detection data corresponding to a target duration may be acquired based on the image frames;
or, in this embodiment, a data communication connection may be established between the plurality of acquisition devices and the storage device, and the image frames acquired by the plurality of acquisition devices are transmitted to the storage device for storage through the data communication connection, and in this embodiment, the image frames stored in the storage device may be read through the data communication connection between the electronic device and the storage device, and detection data corresponding to the target duration may be obtained based on the image frames.
In this embodiment, the detection data includes at least one data pair, and each data pair includes: first data and second data, wherein the first data comprises: coordinate data of a first position in the first detection frame and size data of the first detection frame; the second data includes: coordinate data of a second position in the second detection frame and size data of the second detection frame.
The second position corresponds to the first position, for example, the first position is a center position of the first detection frame, and the second position corresponds to the center position of the second detection frame, as shown in fig. 3.
The coordinate data of the first position includes the abscissa and ordinate of the first position, which may be denoted det-x1 and det-y1; the coordinate data of the second position includes the abscissa and ordinate of the second position, which may be denoted det-x2 and det-y2.
It should be noted that the first detection frame is the detection frame in which the first target object is located in the image frame acquired by the first acquisition device; the size data of the first detection frame may include its width and height, denoted det-w1 and det-h1. The second detection frame is the detection frame in which the second target object is located in the image frame acquired by the second acquisition device; the size data of the second detection frame may include its width and height, denoted det-w2 and det-h2.
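Putting the det-* fields together, one group of data pairs can be flattened into a single input vector for the similarity measurement model; the helper below is a hypothetical convenience for illustration, not part of the patent:

```python
def make_model_input(first_data, second_data):
    """Flatten one data pair into the detection input vector, using the
    field names from the text: (det-x1, det-y1) with (det-w1, det-h1) for
    the first detection frame, and the det-*2 fields for the second."""
    det_x1, det_y1, det_w1, det_h1 = first_data
    det_x2, det_y2, det_w2, det_h2 = second_data
    return [det_x1, det_y1, det_w1, det_h1,
            det_x2, det_y2, det_w2, det_h2]
```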
The detection frame in this embodiment may be understood as a frame generated after performing contour recognition on an object in an image frame; the width and height of the detection frame adapt to the shape of the area occupied by the target object within it, so the size of the detection frame is related to the size of that target object.
In a specific implementation, this embodiment may first acquire the image frames acquired by the first acquisition device and the second acquisition device within the target duration, that is, only the image frames overlapping in time are taken from the image frames acquired by the two devices; correspondingly, each moment in the target duration corresponds to two of the acquired image frames, namely the image frames acquired at that moment by the first acquisition device and the second acquisition device respectively. Next, the target object is identified in the two image frames corresponding to each moment, each of which contains one detection frame, namely a first detection frame and a second detection frame, as shown in fig. 3; a target object can be identified in the first detection frame, and a target object can also be identified in the second detection frame. Image frames in which no target object can be identified are deleted, leaving only image frames whose detection frames contain an identifiable target object, that is, the image frames overlapping with respect to the target object. Finally, the coordinate data of a specific position in each detection frame and the size data of that detection frame are identified in the two image frames corresponding to each moment, yielding the first data and the second data for that moment, which form one data pair per moment; the first data and the second data in each data pair correspond to the same moment in the target duration, the first target object can be identified in the first detection frame corresponding to the first data, and the second target object can be identified in the second detection frame corresponding to the second data.
Thus, this implementation finally obtains one data pair for each moment in the target duration, and these data pairs together form the detection data.
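The pairing procedure above can be sketched as follows. This is a minimal illustration under the assumption that each camera's per-frame detections are keyed by timestamp; all names (`build_data_pairs`, `cam1`, `cam2`) and values are hypothetical, not from this application:

```python
# Sketch of assembling detection data pairs from two acquisition devices.
# Assumption: each camera supplies a dict mapping timestamp -> detection
# tuple (x, y, w, h), or None when no target could be identified.

def build_data_pairs(detections1, detections2):
    pairs = []
    # Keep only timestamps present in both cameras (the time overlap).
    for t in sorted(set(detections1) & set(detections2)):
        first, second = detections1[t], detections2[t]
        # Delete frames in which either camera failed to identify a target,
        # keeping only frames overlapping with respect to the target object.
        if first is None or second is None:
            continue
        pairs.append((first, second, t))
    return pairs

# Toy example: camera 2 misses the target at t=2, so that moment is dropped.
cam1 = {1: (10, 20, 30, 60), 2: (12, 21, 30, 61), 3: (14, 22, 31, 60)}
cam2 = {1: (50, 25, 28, 58), 2: None, 3: (55, 27, 28, 59)}
pairs = build_data_pairs(cam1, cam2)
```

Each resulting element is one data pair (first data, second data) plus its shared moment, ready to be fed to the similarity measurement model.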
It should be noted that the above target object may be an object capable of moving autonomously or by other means, such as a person or an ornament.
Step 202: and taking the data pairs in the detection data as detection input data of the similarity measurement model which is trained in advance, so as to obtain a detection result output by the similarity measurement model.
In a specific implementation, the detection result may be represented by a value greater than 0 and less than 1 that characterizes the similarity between the first target object in the first detection frame and the second target object in the second detection frame; viewed another way, the detection result may represent the confidence that the first target object and the second target object belong to the same target object. Correspondingly, the higher the value in the detection result, the more similar the two target objects are, and the higher the confidence that they belong to the same target object.
Based on the above implementation, a similarity threshold may be further preset in this embodiment. When the value in the detection result is greater than or equal to the similarity threshold, it may be determined that the first target object in the first detection frame and the second target object in the second detection frame are the same target object; when the value in the detection result is smaller than the similarity threshold, it may be determined that they are different target objects. For example, the pedestrian in the detection frame of an image frame acquired by camera 1 and the pedestrian in the detection frame of an image frame acquired by camera 2 are determined to be the same person, on which basis cross-lens pedestrian tracking is realized.
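The threshold decision described above amounts to a one-line comparison. In this sketch the function name and the 0.5 default are illustrative assumptions; the application does not specify a threshold value:

```python
# Hypothetical sketch of the similarity-threshold decision: the model's
# output, a value in (0, 1), is compared against a preset threshold.

def same_target(confidence, threshold=0.5):
    # A result >= the threshold means the two detection frames are judged
    # to hold the same target object; a smaller result means different
    # targets. The default of 0.5 is an assumed example.
    return confidence >= threshold
```

Raising the threshold trades recall for precision: fewer cross-lens matches are declared, but those declared are more reliable.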
The similarity measurement model is a model that is built in advance and trained with at least two groups of sample pairs having sample identifiers. In a specific implementation, the similarity measurement model may be a machine learning model constructed in advance based on a preset deep learning algorithm, such as a model based on a convolutional neural network, a model based on a multi-layer self-encoding neural network, and so on.
Accordingly, after the similarity measurement model is constructed, it is trained with multiple groups of sample pairs, each having a sample identifier. Specifically, in this embodiment, each sample pair may in turn be used as sample input data of the similarity measurement model and the model is run; after a sample output result is obtained, the sample output result is compared with the sample identifier, and the comparison result is used to minimize a loss function over the model parameters of the similarity measurement model until the loss function converges. This yields the trained and optimized model parameters and completes the training of the similarity measurement model.
In the multiple sets of sample pairs used in the specific training, each set of sample pairs includes: a first sample and a second sample, the first sample comprising: coordinate data of a first position in a first sample frame and size data of the first sample frame; the second sample includes: coordinate data of a second position in the second sample frame and size data of the second sample frame; the first sample frame is a detection frame in which a first sample in the sample frames acquired by the first acquisition device is located; the second sample frame is a detection frame in which a second sample object is located in the sample frame acquired by the second acquisition device; the sample identity of the sample pair characterizes whether the first sample and the second sample are the same sample.
The second position in the second sample frame may correspond to the first position in the first sample frame, e.g., the first position in the first sample frame is the center position of the first sample frame, and correspondingly, the second position in the second sample frame is the center position of the second sample frame, as shown in fig. 4. It should be noted that the first position in the first detection frame is consistent with the first position in the first sample frame, that is, the first position in the first sample frame determines the first position in the first detection frame. For example, if the first position in the first sample frame is the center position of the first sample frame, the first position in the first detection frame is the center position in the first detection frame, and correspondingly, the second position in the second sample frame and the second position in the second detection frame are also the center positions in the belonging frames; and if the first position in the first sample frame is the lower left corner position of the first sample frame, then correspondingly, the first position in the first detection frame is also the lower left corner position in the first detection frame, and so on, the second position in the second sample frame and the second position in the second detection frame are also the lower left corner position in the belonging frame.
Similarly, the coordinate data of the first position in the first sample includes the abscissa and ordinate data of the first position in the first sample frame, which may be represented by bb-x1 and bb-y1; the coordinate data of the second position in the second sample includes the abscissa and ordinate data of the second position in the second sample frame, which may be represented by bb-x2 and bb-y2.
It should be noted that the first sample frame is the detection frame in which the first sample object is located in the sample frame acquired by the first acquisition device, and the size data of the first sample frame may include the width and height of the first sample frame, represented by bb-w1 and bb-h1; the second sample frame is the detection frame in which the second sample object is located in the sample frame acquired by the second acquisition device, and the size data of the second sample frame may include the width and height of the second sample frame, represented by bb-w2 and bb-h2.
The sample frame in this embodiment may be understood as a frame generated after the contour recognition of the object in the sample frame, where the width and the height of the sample frame are adaptive to the shape of the area occupied by the sample object in the sample frame, so that the size of the sample frame is related to the size of the sample object therein.
In a specific implementation, in this embodiment, multiple sets of sample pairs with sample identifiers may be obtained by:
First, the sample frames acquired by the first acquisition device and the second acquisition device are obtained; specifically, only the sample frames overlapping in time among those acquired by the two devices are taken, so that each moment in the target duration corresponds to two of the acquired sample frames, namely the sample frames acquired at that moment by the first acquisition device and the second acquisition device respectively. Next, sample objects are identified in the two sample frames corresponding to each moment, each of which contains one detection frame, namely a first sample frame and a second sample frame, as shown in fig. 4; a sample object can be identified in the first sample frame, and a sample object can also be identified in the second sample frame. Sample frames in which no sample object can be identified are deleted, leaving only sample frames in which a sample object can be identified, that is, the sample frames overlapping with respect to the sample object. Finally, the coordinate data of a specific position in each sample frame and the size data of that sample frame are identified in the two sample frames corresponding to each moment, yielding the first sample and the second sample for that moment, which form one sample pair per moment; the first sample and the second sample in each sample pair correspond to the same moment in the target duration, the first sample object can be identified in the first sample frame corresponding to the first sample, and the second sample object can be identified in the second sample frame corresponding to the second sample.
Then, after a sample pair is finally obtained for each of a plurality of moments based on the above implementation, a sample identifier is labeled for each sample pair; the sample identifier characterizes whether the first sample object in the first sample frame and the second sample object in the second sample frame of that sample pair are the same sample object. The multiple groups of sample pairs and their sample identifiers form the sample data for training the similarity measurement model. In a specific implementation, a labeling interface may be displayed to the user; after the user performs a labeling input operation through the labeling interface, this embodiment receives the labeling input operation and sets the labeling data in it as the sample identifier of the corresponding labeled sample pair.
It should be noted that the sample object may be an object capable of moving autonomously or by other means, such as a person or an ornament.
In one implementation, step 202 may be implemented by:
firstly, taking first data and second data of a data pair in detection data as detection input data of a similarity measurement model, and operating the similarity measurement model to further obtain an output result corresponding to the data pair output by the similarity measurement model;
And then, according to the corresponding output result of the data pair in the detection data, obtaining the detection results of the first target object and the second target object.
In a specific implementation, when the detection data contains multiple groups of data pairs, the output results of the multiple groups of data pairs may be reduced according to a preset result processing policy, such as a maximum-value, minimum-value, intermediate-value or average-value policy, to obtain the detection result. For example, the maximum value, minimum value, intermediate value or average value of the output results corresponding to the multiple groups of data pairs is taken as the final detection result to characterize the similarity between the first target object and the second target object;
or, in the case that the data pairs in the detection data are a group, the output result corresponding to the group of data pairs may be taken as the final detection result, so as to represent the similarity between the first target object and the second target object.
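Both cases above — reducing multiple per-pair outputs under a policy, or passing a single pair's output through unchanged — can be sketched with standard-library reducers. The function and policy names here are illustrative, not from this application:

```python
import statistics

# Reduce the per-pair model outputs to one detection result according to
# a preset result-processing policy. Policy names are illustrative.
POLICIES = {
    "max": max,                   # maximum-value policy
    "min": min,                   # minimum-value policy
    "median": statistics.median,  # intermediate-value policy
    "mean": statistics.mean,      # average-value policy
}

def aggregate(outputs, policy="mean"):
    if len(outputs) == 1:
        # A single data pair: its output result is the final detection result.
        return outputs[0]
    return POLICIES[policy](outputs)

scores = [0.9, 0.7, 0.8, 0.6]  # toy per-pair outputs of the model
```

A call such as `aggregate(scores, "max")` then yields the comprehensive similarity used to compare against the threshold.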
As can be seen from the foregoing, in the method for obtaining the track similarity of the target object provided in the first embodiment of the present application, when the track similarity measurement of the target object is performed between two acquisition devices having overlapping monitoring areas, a similarity measurement model is pre-built, and a plurality of sets of sample pairs including coordinate data and size data in sample frames and having sample identifiers are used for model training, so that after obtaining the detection data of the two acquisition devices, the coordinate data and the size data of the detection frames in the data pairs are used as input data of the similarity measurement model, and then the detection result output by the similarity measurement model can be obtained, and the similarity between the first target object and the second target object monitored by the two acquisition devices is represented by the detection result. Therefore, in this embodiment, the width and the height of each detection frame of the plurality of acquisition devices are utilized, and the coordinate data of the detection frames are combined, so that similarity calculation is converted into object classification through the constructed similarity measurement model, thereby avoiding the situation that the accuracy of the similarity measurement is reduced due to one-dimensional loss caused by converting two-dimensional coordinates into three-dimensional coordinates, and obtaining the similarity between the objects monitored by the plurality of acquisition devices, and further improving the accuracy of the similarity measurement.
Referring to fig. 5, a schematic structural diagram of an apparatus for obtaining a track similarity of a target object according to a second embodiment of the present application may be configured in an electronic device capable of performing data processing, such as a computer or a server. The device in this embodiment is mainly used to improve accuracy of target track similarity measurement when tracking targets among multiple acquisition devices.
Specifically, the apparatus in this embodiment may include the following functional units:
an obtaining unit 501, configured to obtain detection data corresponding to a target duration; wherein the detection data comprises at least one set of data pairs, a set of data pairs comprising: first data and second data, the first data comprising: coordinate data of a first position in the first detection frame and size data of the first detection frame; the first detection frame is a detection frame in which a first target object is located in an acquisition area of the first acquisition device; the second data includes: coordinate data of a second position in the second detection frame and size data of the second detection frame; the second detection frame is a detection frame of a second target object in an acquisition area of the second acquisition device; the second position corresponds to the first position.
Wherein, each group of data pair in the detection data, the first data and the second data correspond to the same moment in the target duration; in each data pair, a first target object can be identified in a first detection frame corresponding to the first data, and a second target object can be identified in a second detection frame corresponding to the second data.
The detection unit 502 is configured to take the data pairs in the detection data as detection input data of the pre-trained similarity measurement model, so as to obtain the detection result output by the similarity measurement model, where the detection result characterizes the similarity between the first target object and the second target object.
Preferably, as shown in fig. 6, the detection unit 502 may include the following modules:
the single frame detection module 601 is configured to take first data and second data of a data pair in at least one group of data pairs as detection input data of a similarity metric model, so as to obtain an output result corresponding to the data pair output by the similarity metric model;
the result processing module 602 is configured to obtain the detection result of the first target object and the second target object according to the output results corresponding to the data pairs in the at least one group of data pairs.
In a specific implementation, the result processing module 602 may take values of output results corresponding to the multiple sets of data pairs according to a preset result processing policy when the data pairs in at least one set of data pairs are multiple sets, so as to obtain a detection result; alternatively, when the data pairs in the detection data are one set, the result processing module 602 may use the output result corresponding to the set of data pairs as the final detection result.
A training unit 503, configured to train a pre-constructed similarity metric model by using at least two groups of samples with sample identifiers, where a group of sample pairs includes: a first sample and a second sample, the first sample comprising: coordinate data of a first position in a first sample frame and size data of the first sample frame; the second sample includes: coordinate data of a second position in the second sample frame and size data of the second sample frame; the first sample frame is a detection frame where a first sample object is located in an acquisition area of the first acquisition device; the second sample frame is a detection frame in which a second sample object is located in an acquisition area of the second acquisition device; the sample identity of the sample pair characterizes whether the first sample and the second sample are the same sample.
As can be seen from the above-mentioned scheme, in the device for obtaining the track similarity of the target object provided in the second embodiment of the present application, when the track similarity measurement of the target object is performed between two acquisition devices having overlapping monitoring areas, a similarity measurement model is constructed in advance, and a plurality of sets of sample pairs including coordinate data and size data in sample frames and having sample identifications are used for model training, so that after obtaining the detection data of the two acquisition devices, the coordinate data and the size data of the detection frames in the data pairs are used as input data of the similarity measurement model, and then the detection result output by the similarity measurement model can be obtained, and the similarity between the first target object and the second target object monitored by the two acquisition devices is represented by the detection result. Therefore, in this embodiment, the width and the height of each detection frame of the plurality of acquisition devices are utilized, and the coordinate data of the detection frames are combined, so that similarity calculation is converted into object classification through the constructed similarity measurement model, thereby avoiding the situation that the accuracy of the similarity measurement is reduced due to one-dimensional loss caused by converting two-dimensional coordinates into three-dimensional coordinates, and obtaining the similarity between the objects monitored by the plurality of acquisition devices, and further improving the accuracy of the similarity measurement.
It should be noted that, the specific implementation of each unit in this embodiment may refer to the corresponding content in the foregoing, which is not described in detail herein.
Referring to fig. 7, a schematic structural diagram of an electronic device according to a third embodiment of the present application may be an electronic device capable of performing data processing, such as a computer or a server. The electronic device in this embodiment is mainly used to improve accuracy of target track similarity measurement when tracking a target among a plurality of acquisition devices.
Specifically, the electronic device in this embodiment may include the following structure:
a memory 701 for storing an application program and data generated by the application program running;
a processor 702 for executing an application to implement: obtaining detection data corresponding to the target duration; wherein the detection data comprises at least one set of data pairs, a set of data pairs comprising: first data and second data, the first data comprising: coordinate data of a first position in the first detection frame and size data of the first detection frame; the first detection frame is a detection frame in which a first target object is located in an acquisition area of the first acquisition device; the second data includes: coordinate data of a second position in the second detection frame and size data of the second detection frame; the second detection frame is a detection frame of a second target object in an acquisition area of the second acquisition device; the second position corresponds to the first position; taking the data pair in the detection data as detection input data of a similarity measurement model which is trained in advance to obtain a detection result output by the similarity measurement model, wherein the detection result represents the similarity between the first target object and the second target object;
The similarity measurement model is a model which is built in advance and is trained by at least two groups of sample pairs with sample identifications, and one group of sample pairs comprises: a first sample and a second sample, the first sample comprising: coordinate data of a first position in a first sample frame and size data of the first sample frame; the second sample includes: coordinate data of a second position in the second sample frame and size data of the second sample frame; the first sample frame is a detection frame where a first sample object is located in an acquisition area of the first acquisition device; the second sample frame is a detection frame in which a second sample object is located in an acquisition area of the second acquisition device; the sample identity of the sample pair characterizes whether the first sample and the second sample are the same sample.
As can be seen from the above-mentioned scheme, in the electronic device provided in the third embodiment of the present application, when performing the track similarity measurement of the target object between two acquisition devices having overlapping monitoring areas, a similarity measurement model is constructed in advance, and a plurality of groups of sample pairs including coordinate data and size data in sample frames and having sample identifiers are used for performing model training, so that after obtaining the detection data of the two acquisition devices, the coordinate data and the size data of the detection frames in the data pairs are used as the input data of the similarity measurement model, and the detection result output by the similarity measurement model can be obtained, and the similarity between the first target object and the second target object monitored by the two acquisition devices is represented by the detection result. Therefore, in this embodiment, the width and the height of each detection frame of the plurality of acquisition devices are utilized, and the coordinate data of the detection frames are combined, so that similarity calculation is converted into object classification through the constructed similarity measurement model, thereby avoiding the situation that the accuracy of the similarity measurement is reduced due to one-dimensional loss caused by converting two-dimensional coordinates into three-dimensional coordinates, and obtaining the similarity between the objects monitored by the plurality of acquisition devices, and further improving the accuracy of the similarity measurement.
It should be noted that, the specific implementation of the processor in this embodiment may refer to the corresponding content in the foregoing, which is not described in detail herein.
In addition, as shown in fig. 8, the present application provides a system including a plurality of acquisition devices 801, such as cameras, among which are a first acquisition device and a second acquisition device (camera 1 and camera 2 in fig. 1), as well as a storage device 802 and an electronic device 803 (refer to the electronic device shown in fig. 7); data communication can be achieved among the acquisition devices 801, the storage device 802 and the electronic device 803 through wireless or wired data communication connections.
The following uses target track tracking between camera 1 and camera 2 to illustrate the target track similarity measurement scheme of the present method and device; the scheme is applicable to target track similarity measurement between any two cameras:
first, the pedestrian trajectory space-time similarity calculation scheme based on deep learning in this example mainly includes two parts: the training part and the testing part are specifically as follows:
1. training part:
Training data is a collection whose basic unit can be represented as {(bbx1, bbx2, label)} (sample pair and sample identifier). Here, bbx1 = [x1, y1, w1, h1] is the detection frame data (sample frame) of a pedestrian in the t-th frame of camera 1, including the coordinate data and size data of the detection frame, where (x1, y1) represents the coordinate value of the center point of the detection frame in the image coordinate system, and (w1, h1) represents the width and height of the detection frame in the image coordinate system.
Similarly, bbx2 = [x2, y2, w2, h2] is the pedestrian detection frame in the t-th frame of camera 2 (note that the two detection frames must be time-synchronized, i.e. the frame numbers are identical), where (x2, y2) represents the coordinate value of the center point of the detection frame in the image coordinate system, and (w2, h2) represents the width and height of the detection frame in the image coordinate system.
And label is a label (sample identifier) indicating whether the two detection frames contain the same pedestrian: label=1 indicates that the two detection frames contain the same person, whereas label=0 indicates that they belong to different pedestrian targets.
As the model structure of the similarity measurement model, a fully connected neural network with 3-5 layers may be adopted; its input is [x1, x2, y1, y2, w1, w2, h1, h2] and its output is the confidence c that the two detection frames belong to the same pedestrian target.
In the model training after the similarity measurement model is constructed, the model may be trained with cross-entropy loss: for example, the confidence c output by the similarity measurement model for the training data is compared with the sample identifier, and training proceeds with the corresponding loss function according to the comparison result until the loss function converges. This completes the training and yields the similarity measurement model between camera 1 and camera 2.
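As a hedged illustration of this training setup, the sketch below substitutes a single logistic layer for the 3-5 layer fully connected network, so it shows only the 8-dimensional input format, the 0/1 labels, and the cross-entropy gradient — not the real architecture. All names, hyperparameters, and toy values are assumptions:

```python
import math

# Single logistic layer over [x1, x2, y1, y2, w1, w2, h1, h2], trained
# with cross-entropy loss by plain gradient descent. A simplified
# stand-in for the 3-5 layer fully connected network described above.

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # confidence c in (0, 1)

def train(samples, lr=0.5, epochs=500):
    """samples: list of (features, label), 8 features each; label is 1
    (same pedestrian) or 0 (different pedestrian targets)."""
    w, b = [0.0] * 8, 0.0
    for _ in range(epochs):
        for x, label in samples:
            c = predict(w, b, x)
            grad = c - label  # derivative of cross-entropy loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

# Toy hand-made pairs (normalized coordinates): label 1 for the same
# person seen by both cameras, label 0 for different pedestrians.
same_pair = ([0.40, 0.42, 0.50, 0.51, 0.10, 0.11, 0.30, 0.31], 1)
diff_pair = ([0.40, 0.90, 0.50, 0.10, 0.10, 0.20, 0.30, 0.50], 0)
w, b = train([same_pair, diff_pair])
```

Replacing the single layer with a small multi-layer network and mini-batch optimization recovers the structure the example actually describes; the loss and input/label format stay the same.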
2. Test part:
First, the detection data in the present application is the time-overlapping portion of track 1 in camera 1 and track 2 in camera 2: {(det11, det21, t1), (det12, det22, t2), …, (det1n, det2n, tn)}, where det1i = [x1i, y1i, w1i, h1i] denotes the coordinates, width and height of the detection frame of the i-th frame in track 1, and det2i = [x2i, y2i, w2i, h2i] denotes the coordinates, width and height of the detection frame of the i-th frame in track 2.
Meanwhile, the detection data in the present application is the overlapping portion between camera 1 and camera 2 with respect to the pedestrian, that is, at the same moment the pedestrian is recognized in the respective detection frames of camera 1 and camera 2. Each such frame is fed to the similarity measurement model (whose input is [x1i, y1i, w1i, h1i, x2i, y2i, w2i, h2i]) to obtain the similarity value ci calculated for each frame in which the pedestrian overlaps.
Finally, for the frame-by-frame similarities [c1, c2, …, cn] calculated over the overlapping portion, the comprehensive similarity cf of track 1 and track 2 may be calculated by a strategy such as averaging, taking the maximum value, or taking the median.
In this way, the implementation scheme of multi-camera network calibration and coordinate distance calculation is converted into a binary classification problem, and constructing a similarity measurement model improves adaptability to different scenes. Moreover, the method adopts deep learning and makes full use of the width and height of the detection frames, improving the accuracy of space-time similarity calculation.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for obtaining the track similarity of a target object, characterized by comprising the following steps:
obtaining detection data corresponding to a target duration; wherein the detection data comprises at least one group of data pairs, and one group of the data pairs comprises first data and second data; the first data comprises coordinate data of a first position in a first detection frame and size data of the first detection frame, the first detection frame being a detection frame in which a first target object is located in an image frame acquired by a first acquisition device; the second data comprises coordinate data of a second position in a second detection frame and size data of the second detection frame, the second detection frame being a detection frame in which a second target object is located in an image frame acquired by a second acquisition device, and the second position corresponding to the first position; the first data and the second data both correspond to a same time within the target duration, the first target object can be identified in the first detection frame corresponding to the first data, and the second target object can be identified in the second detection frame corresponding to the second data;
taking the data pairs in the detection data as detection input data of a pre-trained similarity measurement model, so as to obtain a detection result output by the similarity measurement model, wherein the detection result characterizes the similarity between the first target object and the second target object;
wherein the similarity measurement model is a pre-constructed model trained by using at least two groups of sample pairs having sample identifications, and one group of the sample pairs comprises a first sample and a second sample; the first sample comprises coordinate data of the first position in a first sample frame and size data of the first sample frame; the second sample comprises coordinate data of the second position in a second sample frame and size data of the second sample frame; the first sample frame is a detection frame in which a first sample object is located in a sample frame acquired by the first acquisition device; the second sample frame is a detection frame in which a second sample object is located in a sample frame acquired by the second acquisition device; and the sample identification of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
2. The method of claim 1, wherein the first data and the second data in the data pair each correspond to a same time within the target duration.
3. The method according to claim 1 or 2, wherein in the data pair, the first target object can be identified in a first detection frame corresponding to the first data, and the second target object can be identified in a second detection frame corresponding to the second data.
4. The method according to claim 1 or 2, wherein taking the data pairs in the detection data as detection input data of the pre-trained similarity measurement model to obtain the detection result output by the similarity measurement model comprises:
taking the first data and the second data of the data pair in the detection data as detection input data of the similarity measurement model to obtain an output result corresponding to the data pair output by the similarity measurement model;
and obtaining detection results of the first target object and the second target object according to the output results corresponding to the data pairs in the detection data.
5. The method of claim 4, wherein obtaining the detection results of the first target object and the second target object according to the output results corresponding to the data pairs in the detection data comprises:
in a case where the detection data comprises multiple groups of data pairs, processing the output results corresponding to the multiple groups of data pairs according to a preset result processing strategy, so as to obtain the detection result.
6. An apparatus for obtaining a track similarity of a target object, the apparatus comprising:
an acquisition unit, configured to obtain detection data corresponding to a target duration; wherein the detection data comprises at least one group of data pairs, and one group of the data pairs comprises first data and second data; the first data comprises coordinate data of a first position in a first detection frame and size data of the first detection frame, the first detection frame being a detection frame in which a first target object is located in an image frame acquired by a first acquisition device; the second data comprises coordinate data of a second position in a second detection frame and size data of the second detection frame, the second detection frame being a detection frame in which a second target object is located in an image frame acquired by a second acquisition device, and the second position corresponding to the first position; the first data and the second data both correspond to a same time within the target duration, the first target object can be identified in the first detection frame corresponding to the first data, and the second target object can be identified in the second detection frame corresponding to the second data;
a detection unit, configured to take the data pairs in the detection data as detection input data of a pre-trained similarity measurement model, so as to obtain a detection result output by the similarity measurement model, wherein the detection result characterizes the similarity between the first target object and the second target object;
a training unit, configured to train the pre-constructed similarity measurement model by using at least two groups of sample pairs having sample identifications, wherein one group of the sample pairs comprises a first sample and a second sample; the first sample comprises coordinate data of the first position in a first sample frame and size data of the first sample frame; the second sample comprises coordinate data of the second position in a second sample frame and size data of the second sample frame; the first sample frame is a detection frame in which a first sample object is located in a sample frame acquired by the first acquisition device; the second sample frame is a detection frame in which a second sample object is located in a sample frame acquired by the second acquisition device; and the sample identification of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
7. The apparatus of claim 6, wherein the first data and the second data in the data pair each correspond to a same time within the target duration; and
in the data pair, the first target object can be identified in the first detection frame corresponding to the first data, and the second target object can be identified in the second detection frame corresponding to the second data.
8. The apparatus according to claim 6 or 7, wherein the detection unit comprises:
a single-frame detection module, configured to take the first data and the second data of a data pair in the detection data as detection input data of the similarity measurement model, to obtain an output result, output by the similarity measurement model, corresponding to the data pair; and
a result processing module, configured to obtain the detection results of the first target object and the second target object according to the output results corresponding to the data pairs in the detection data.
9. The apparatus of claim 8, wherein the result processing module is specifically configured to: in a case where the detection data comprises multiple groups of data pairs, process the output results corresponding to the multiple groups of data pairs according to a preset result processing strategy, so as to obtain the detection result.
10. An electronic device, comprising:
a memory, configured to store an application program and data generated by running the application program; and
a processor, configured to execute the application program so as to: obtain detection data corresponding to a target duration, wherein the detection data comprises at least one group of data pairs, and one group of the data pairs comprises first data and second data; the first data comprises coordinate data of a first position in a first detection frame and size data of the first detection frame, the first detection frame being a detection frame in which a first target object is located in an image frame acquired by a first acquisition device; the second data comprises coordinate data of a second position in a second detection frame and size data of the second detection frame, the second detection frame being a detection frame in which a second target object is located in an image frame acquired by a second acquisition device, and the second position corresponding to the first position; the first data and the second data both correspond to a same time within the target duration, the first target object can be identified in the first detection frame corresponding to the first data, and the second target object can be identified in the second detection frame corresponding to the second data; and take the data pairs in the detection data as detection input data of a pre-trained similarity measurement model, so as to obtain a detection result output by the similarity measurement model, wherein the detection result characterizes the similarity between the first target object and the second target object;
wherein the similarity measurement model is a pre-constructed model trained by using at least two groups of sample pairs having sample identifications, and one group of the sample pairs comprises a first sample and a second sample; the first sample comprises coordinate data of the first position in a first sample frame and size data of the first sample frame; the second sample comprises coordinate data of the second position in a second sample frame and size data of the second sample frame; the first sample frame is a detection frame in which a first sample object is located in a sample frame acquired by the first acquisition device; the second sample frame is a detection frame in which a second sample object is located in a sample frame acquired by the second acquisition device; and the sample identification of a sample pair characterizes whether the first sample object and the second sample object are the same sample object.
CN202010250352.8A 2020-04-01 2020-04-01 Method and device for obtaining track similarity of target object and electronic equipment Active CN111461222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010250352.8A CN111461222B (en) 2020-04-01 2020-04-01 Method and device for obtaining track similarity of target object and electronic equipment


Publications (2)

Publication Number Publication Date
CN111461222A CN111461222A (en) 2020-07-28
CN111461222B (en) 2023-05-02

Family

ID=71684291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010250352.8A Active CN111461222B (en) 2020-04-01 2020-04-01 Method and device for obtaining track similarity of target object and electronic equipment

Country Status (1)

Country Link
CN (1) CN111461222B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932590B (en) * 2020-09-28 2021-03-16 创新奇智(广州)科技有限公司 Object tracking method and device, electronic equipment and readable storage medium
CN112465078B (en) * 2021-02-03 2021-04-16 成都点泽智能科技有限公司 Cross-camera pedestrian track processing method, computer equipment and readable storage medium
CN117151140B (en) * 2023-10-27 2024-02-06 安徽容知日新科技股份有限公司 Target identification code identification method, device and computer readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281477A (en) * 2013-05-17 2013-09-04 天津大学 Multi-level characteristic data association-based multi-target visual tracking method
JP2017228303A (en) * 2017-08-08 2017-12-28 株式会社東芝 Moving object tracking device, method and program
JP2018163700A (en) * 2018-07-09 2018-10-18 富士フイルム株式会社 Image processing apparatus, image processing method, program, and recording medium
CN108710885A (en) * 2018-03-29 2018-10-26 百度在线网络技术(北京)有限公司 The detection method and device of target object
CN109447121A (en) * 2018-09-27 2019-03-08 清华大学 A kind of Visual Sensor Networks multi-object tracking method, apparatus and system
CN110135314A (en) * 2019-05-07 2019-08-16 电子科技大学 A kind of multi-object tracking method based on depth Trajectory prediction
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration
CN110728702A (en) * 2019-08-30 2020-01-24 深圳大学 High-speed cross-camera single-target tracking method and system based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9607652B2 (en) * 2010-08-26 2017-03-28 Blast Motion Inc. Multi-sensor event detection and tagging system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Gesture Trajectory Recognition Based on Multi-Feature Fusion Network; Yin Kang; China Master's Theses Full-text Database, Information Science and Technology Series; 2019-12-15; I138-648 *


Similar Documents

Publication Publication Date Title
US11789545B2 (en) Information processing device and method, program and recording medium for identifying a gesture of a person from captured image data
CN111461222B (en) Method and device for obtaining track similarity of target object and electronic equipment
JP7266106B2 (en) Image coordinate system transformation method and its device, equipment and computer program
CN112163537B (en) Pedestrian abnormal behavior detection method, system, terminal and storage medium
EP2798611B1 (en) Camera calibration using feature identification
CN109815770B (en) Two-dimensional code detection method, device and system
CN105745687B (en) Context aware Moving target detection
CN111462200A (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN109190508A (en) A kind of multi-cam data fusion method based on space coordinates
CN109313805A (en) Image processing apparatus, image processing system, image processing method and program
US8103090B2 (en) Behavior and pattern analysis using multiple category learning
JP7292492B2 (en) Object tracking method and device, storage medium and computer program
CN107784663A (en) Correlation filtering tracking and device based on depth information
CN111383244B (en) Target detection tracking method
CN104463240B (en) A kind of instrument localization method and device
CN109313806A (en) Image processing apparatus, image processing system, image processing method and program
CN109492647A (en) A kind of power grid robot barrier object recognition methods
CN113256731A (en) Target detection method and device based on monocular vision
CN114495266A (en) Non-standing posture detection method and device, computer equipment and storage medium
CN115564030A (en) Compression method, detection method and device of target detection model and related equipment
CN111899279A (en) Method and device for detecting motion speed of target object
CN108388854A (en) A kind of localization method based on improvement FAST-SURF algorithms
CN113021355B (en) Agricultural robot operation method for predicting sheltered crop picking point
CN114038011A (en) Method for detecting abnormal behaviors of human body in indoor scene
CN117237990A (en) Method and device for estimating weight of pig farm, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant