CN111047622B - Method and device for matching objects in video, storage medium and electronic device - Google Patents


Info

Publication number: CN111047622B
Application number: CN201911143170.4A
Authority: CN (China)
Prior art keywords: objects, video, track, coordinate system, camera
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN111047622A (en)
Inventor: 黄湘琦
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911143170.4A
Publication of CN111047622A
Application granted
Publication of CN111047622B


Classifications

    • G (Physics) › G06 (Computing; calculating or counting) › G06T (Image data processing or generation, in general) › G06T7/00 (Image analysis) › G06T7/20 (Analysis of motion)
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T7/292 Multi-camera tracking
    • G06T2207/00 (Indexing scheme for image analysis or image enhancement) › G06T2207/10 (Image acquisition modality) › G06T2207/10016 Video; Image sequence


Abstract

The invention discloses a method and device for matching objects in video, a storage medium, and an electronic device. The method includes the following steps: acquiring a first video shot by a first camera and a second video shot by a second camera; determining the track similarity between each of M objects in the first video and each of N objects in the second video; and determining, according to these track similarities, that a first object among the M objects and a second object among the N objects are the same object. The invention solves the technical problem of low object-tracking accuracy in the related art.

Description

Method and device for matching objects in video, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a method and apparatus for matching objects in video, a storage medium, and an electronic apparatus.
Background
In the related art, when tracking an object using video content captured by cameras, the video content captured by a plurality of cameras is generally obtained, and whether the objects in the different videos are the same object is then determined by comparing their similarity, so as to realize tracking of the object.
However, with this method the timestamps of the cameras may differ, so the captured video content is not synchronized and the accuracy of tracking the object is low.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a method and a device for matching objects in video, a storage medium and an electronic device, which are used for at least solving the technical problem of low accuracy of tracking objects in related technologies.
According to an aspect of an embodiment of the present invention, there is provided a method for matching objects in video, including: acquiring a first video shot by a first camera and a second video shot by a second camera, wherein the shooting area of the first camera and the shooting area of the second camera comprise overlapping areas, the first video comprises M objects, the second video comprises N objects, and the M and the N are positive integers; determining the track similarity of each object in the M objects and each object in the N objects; and determining that a first object of the M objects and a second object of the N objects are the same object according to the track similarity of each object of the M objects and each object of the N objects.
According to another aspect of the embodiment of the present invention, there is also provided a matching apparatus for an object in a video, including: an obtaining unit, configured to obtain a first video captured by a first camera and a second video captured by a second camera, where a capturing area of the first camera and a capturing area of the second camera include overlapping areas, the first video includes M objects, the second video includes N objects, and the M and the N are positive integers; a first determining unit configured to determine a trajectory similarity between each of the M objects and each of the N objects; and the second determining unit is used for determining that the first object of the M objects and the second object of the N objects are the same object according to the track similarity of each object of the M objects and each object of the N objects.
As an alternative example, the above apparatus further includes: a third determining unit configured to determine, after determining the trajectory similarity of each of the M objects and each of the N objects, an appearance feature similarity of each of the M objects and each of the N objects; the second determination unit includes: and the second determining module is used for determining that a first object of the M objects and a second object of the N objects are the same object according to the sum of the track similarity and the appearance feature similarity of each object of the M objects and each object of the N objects.
As an optional example, the above second determining unit includes: the matching module is used for matching the M objects with the N objects according to the track similarity by using a Hungary algorithm to obtain a matching result, wherein the matching result comprises a plurality of groups of object groups, each group of object groups comprises one object of the M objects and one object of the N objects, each group of object groups comprises a target object group, and each target object group comprises the first object and the second object; and a third determining module, configured to determine two objects in each of the object groups as the same object.
As an alternative example, the above-described acquisition unit includes: a first obtaining module, configured to obtain the first system time and the second system time; a calculating module, configured to calculate the absolute value of the difference between the first system time and the second system time to obtain the first threshold; and a second obtaining module, configured to obtain, as the first video, video shot by the first camera starting from the first system time with a duration greater than the first threshold, and to obtain, as the second video, video shot by the second camera starting from the second system time with a duration greater than the first threshold.
According to a further aspect of embodiments of the present invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above method of matching objects in video at run-time.
According to still another aspect of the embodiments of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the method for matching objects in video described above through the computer program.
In the embodiment of the invention, a first video shot by a first camera and a second video shot by a second camera are acquired, wherein the shooting area of the first camera and the shooting area of the second camera include an overlapping area, the first video includes M objects, the second video includes N objects, and M and N are positive integers; the track similarity between each of the M objects and each of the N objects is determined; and according to these track similarities, a first object among the M objects and a second object among the N objects are determined to be the same object. In this method, when judging whether the first object and the second object in the videos acquired by the two cameras are the same object, the comparison is based on their track similarity. By comparing track similarity, whether the first object and the second object are the same object can be accurately judged even if the timestamps of the cameras differ, tracking of objects is realized, the accuracy of tracking is improved, and the technical problem of low object-tracking accuracy in the related art is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic illustration of an application environment of an alternative method of matching objects in video according to an embodiment of the invention;
FIG. 2 is a flow chart of an alternative method of matching objects in video according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an alternative method of matching objects in video according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative method of matching objects in video according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an alternative method of matching objects in video according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative method of matching objects in video according to an embodiment of the invention;
FIG. 7 is a schematic diagram of an alternative apparatus for matching objects in video according to an embodiment of the present invention;
fig. 8 is a schematic structural view of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiment of the present invention, there is provided a method for matching objects in a video, optionally, as an optional implementation manner, the method for matching objects in a video may be, but is not limited to, applied to an environment as shown in fig. 1.
Man-machine interaction between the user 102 and the user device 104 may be performed in fig. 1. The user device 104 includes a memory 106 for storing interaction data and a processor 108 for processing the interaction data. The user device 104 is connected to two cameras, camera 1 and camera 2. The shooting areas of camera 1 and camera 2 include an overlapping area 110; the first video shot by camera 1 includes at least an object 112, and the second video shot by camera 2 includes at least an object 114. The user device 104 may receive the videos shot by cameras 1 and 2 and determine whether the object 112 and the object 114 in the videos are the same object. The user 102 may view the results.
It should be noted that, the actions performed by the user equipment may also be performed by the server. For example, the two cameras send the collected video to the server, and the server judges whether the objects in the video collected by the two cameras are the same object.
The number of cameras is not limited by the scheme. The connection referred to in this scheme may be a physical connection or a connection through a network and receive or transmit data.
In the related art, when tracking an object in a video captured by a camera, if the timestamps of the cameras are different, the tracking result is inaccurate. By adopting the method provided by the scheme, even if the timestamps of the cameras are different, the objects in the video shot by the cameras can be accurately matched, the tracking of the objects is realized, and the accuracy of object tracking is improved.
Alternatively, the user device 104 may be a mobile phone, a tablet computer, a notebook computer, a PC, etc., and the network may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, WIFI, and other networks that enable wireless communication. The wired network may include, but is not limited to: wide area network, metropolitan area network, local area network. The server may include, but is not limited to, any hardware device that can perform the calculations.
Optionally, as an optional implementation manner, as shown in fig. 2, the method for matching objects in the video includes:
S202, acquiring a first video shot by a first camera and a second video shot by a second camera, wherein a shooting area of the first camera and a shooting area of the second camera comprise overlapping areas, the first video comprises M objects, the second video comprises N objects, and M and N are positive integers;
S204, determining the track similarity of each object in the M objects and each object in the N objects;
s206, determining that a first object of the M objects and a second object of the N objects are the same object according to the track similarity of each object of the M objects and each object of the N objects.
Alternatively, the above method for matching objects in video may be applied to, but is not limited to, the field of object tracking, for example tracking the movement of objects in a video; or the field of object recognition, for example recognizing whether an object in a video is a particular target object.
Taking object tracking as an example, a first video and a second video are respectively shot by a first camera and a second camera, and after a first object moves from a shooting area of the first camera to a shooting area of the second camera, the first object is included in the first video and the second video. By comparing the moving track of the first object, the object in the first video and the object in the second video can be determined to be the same object, so that the tracking of the first object is realized. As shown in fig. 3, the camera 1 and the camera 2 shoot the object 302, and tracking of the object 302 is realized.
Optionally, the first video shot by the first camera or the second video shot by the second camera in the scheme includes multiple frames of images, and the object in each frame can be regarded as a point representing the object's position, for example the object's center point. As shown in fig. 4, point 402 is the position of the object 404 in one frame, point 406 is its position in another frame, and points such as 402 and 406 together form the track of the object 404.
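The per-frame reduction of an object to a point can be sketched as follows; this is an illustrative fragment (the function and variable names are not from the patent), assuming each detection yields a bounding box and the point is taken as its center:

```python
# A track is the per-frame position of one object, here approximated by the
# center point of the object's bounding box in each frame.

def bbox_center(bbox):
    """Center point (x, y) of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def build_track(per_frame_bboxes):
    """Turn one object's per-frame bounding boxes into a track of points."""
    return [bbox_center(b) for b in per_frame_bboxes]

# Three frames of a box moving right by 2 pixels per frame:
track = build_track([(0, 0, 10, 10), (2, 0, 12, 10), (4, 0, 14, 10)])
# track is [(5.0, 5.0), (7.0, 5.0), (9.0, 5.0)]
```

The patent also allows the point to be any other representative position of the object; the center point is just one common choice.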
Optionally, by the method, the track of the object in the first video shot by the first camera and the track of the object in the second video shot by the second camera can be determined.
Optionally, in this embodiment, the first camera and the second camera may have different coordinate systems. For example, as shown in fig. 5, the angles of the first camera and the second camera are different, so that even if the objects in the captured video are the same object, the trajectories may be different. Therefore, for this case, the coordinate systems of the first camera and the second camera need to be unified.
Alternatively, one or more target objects may be selected in the overlapping area of the shooting areas of the first camera and the second camera. As shown in fig. 6, the target object 602 and the target object 604 are selected; then, first coordinates of the target object 602 and the target object 604 in a first coordinate system of the first camera are acquired, and second coordinates of the target object 602 and the target object 604 in a second coordinate system of the second camera are acquired. A conversion relation between the first coordinates and the second coordinates is then obtained. For example, if the coordinates of the target object 602 in the first coordinate system are (2, 4, 6) and its coordinates in the second coordinate system are (2, 6, 4), the correspondence of the coordinates is recorded, and the coordinates of objects seen by the two cameras are unified.
Alternatively, the coordinates of the object photographed by the first camera may be converted into a standard coordinate system. Before conversion, it is necessary to determine a first coordinate correspondence of the first coordinate system of the first camera converted to the standard coordinate system, and determine a second coordinate correspondence of the second coordinate system of the second camera converted to the standard coordinate system.
Alternatively, after the coordinates of the object photographed by the first camera and the coordinates of the object photographed by the second camera are acquired and the coordinates are converted into the same coordinate system, the similarity of the coordinates of the two objects may be compared. For example, taking a first object in a first video and a second object in a second video as examples, a first movement track of the first object in the first video is acquired, and a second movement track of the second object in the second video is acquired, where the two movement tracks are tracks under the same coordinate system. And when the track similarity is compared, acquiring the shortest distance from each track point in the first group of track points on the first moving track to the second moving track, determining the average value of the shortest distances in the first group of track points, and determining the track similarity of the first object and the second object according to the average value.
Alternatively, the reciprocal of the mean may be normalized, and the normalized result may be determined as the track similarity between the first object and the second object. In this case, the larger the mean value, the smaller the track similarity; and the smaller the mean value, the larger the track similarity.
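The mean-of-shortest-distances similarity described above can be sketched as follows. The patent leaves the normalization of the reciprocal unspecified, so the mapping `1/(1+mean)` used here is one illustrative monotone choice, not the patent's exact formula, and all names are hypothetical:

```python
import math

def point_to_track_distance(p, track):
    """Shortest Euclidean distance from point p to any point of a track."""
    return min(math.dist(p, q) for q in track)

def track_similarity(track_a, track_b):
    """Mean shortest distance from track_a's points to track_b, mapped
    into (0, 1]: the smaller the mean, the larger the similarity.
    Identical tracks give similarity 1.0."""
    mean = sum(point_to_track_distance(p, track_b) for p in track_a) / len(track_a)
    return 1.0 / (1.0 + mean)

print(track_similarity([(0, 0), (1, 0)], [(0, 0), (1, 0)]))  # 1.0
```

Note the measure is asymmetric (distances are taken from track A's points to track B); a symmetric variant would average both directions.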
Alternatively, other curve similarity calculation methods, such as cosine similarity algorithm, may be used when comparing the similarity of the coordinates.
After the track similarity of each object in the first video and each object in the second video is calculated, a matrix of track similarities is obtained. For example, M objects in the first video and N objects in the second video together yield M*N track similarities. Pairing of the objects can then be achieved using the Hungarian algorithm: pairing of the M objects in the first video with the N objects in the second video is completed according to the M*N track similarities, and the two objects marked as a group in the Hungarian algorithm's pairing result are determined to be the same object.
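The assignment step can be illustrated as below. For brevity this sketch uses exhaustive search over pairings rather than the Hungarian algorithm itself; a real implementation would use the O(n^3) Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`), which finds the same optimal pairing. The function name and threshold parameter are illustrative:

```python
from itertools import permutations

def match_objects(sim, threshold=0.0):
    """Pick the pairing of rows (objects in video 1) to columns (objects
    in video 2) that maximizes total track similarity. sim is an M x N
    similarity matrix with M <= N. Brute force, O(N!): stands in for the
    Hungarian algorithm, which solves the same problem in O(n^3)."""
    m, n = len(sim), len(sim[0])
    best, best_pairs = -1.0, []
    for cols in permutations(range(n), m):
        total = sum(sim[i][c] for i, c in enumerate(cols))
        if total > best:
            best, best_pairs = total, list(enumerate(cols))
    # Each surviving pair (i, j) is declared "the same object" in both videos.
    return [(i, c) for i, c in best_pairs if sim[i][c] > threshold]

sim = [[0.9, 0.1, 0.2],
       [0.2, 0.8, 0.3]]
print(match_objects(sim))  # [(0, 0), (1, 1)]
```

Here object 0 of the first video matches object 0 of the second, and object 1 matches object 1; the third object of the second video is left unmatched.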
Optionally, the first camera and the second camera in the present solution may have different timestamps, where a timestamp is the system time of the camera. For example, the system time of the first camera is 12:00:00 while the system time of the second camera is 12:00:02, so the first system time differs from the second system time. Alternatively, due to delay in the data transmission process, the first camera and the second camera receive the shooting command at different times, which also makes the first system time and the second system time differ. When the first video and the second video are acquired, the absolute value of the difference between the first system time and the second system time is obtained, so as to bound the offset between the two recordings. For example, when the two cameras differ by two seconds, videos longer than two seconds are acquired, say a first video of three seconds and a second video of four seconds. This ensures that if the same object appears in both videos and passes through the overlapping area, the overlapping part of its track is necessarily contained in both videos. In addition, since the first system time and the second system time record the system time at which each video starts to be shot, the videos can be calibrated by the difference between the two system times even if the signal or instruction to start shooting was received at different times.
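The duration rule above can be sketched as follows; the function name and the `margin` parameter (an extra second so each video is strictly longer than the threshold) are illustrative additions, not from the patent:

```python
from datetime import datetime

def required_video_length(first_system_time, second_system_time, margin=1.0):
    """The first threshold is |t1 - t2| in seconds; each camera must record
    for longer than this from its own system time, so that a shared object's
    overlapping track appears in both videos. Returns a minimum duration."""
    diff = abs((first_system_time - second_system_time).total_seconds())
    return diff + margin

t1 = datetime(2019, 11, 20, 12, 0, 0)   # first camera's system time
t2 = datetime(2019, 11, 20, 12, 0, 2)   # second camera's, 2 s ahead
print(required_video_length(t1, t2))    # 3.0
```

This matches the worked example in the text: with a two-second offset, a three-second first video and a four-second second video both satisfy the threshold.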
Optionally, in the scheme, after the track similarity of two objects is obtained by comparison, the feature similarity of the two objects is also obtained, such as the similarity of appearance features: color, shape, etc. The track similarity and the feature similarity are weighted and summed to obtain the final similarity; object matching is then performed according to the final similarity using the Hungarian algorithm, and the two objects indicated as a group in the Hungarian algorithm's matching result are determined to be the same object.
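The weighted combination is trivially expressed in code; the patent does not fix the weights, so the equal weights below are an assumption for illustration:

```python
def final_similarity(track_sim, feature_sim, w_track=0.5, w_feature=0.5):
    """Weighted sum of trajectory similarity and appearance-feature
    similarity (color, shape, ...). Weights are illustrative; the patent
    leaves them unspecified."""
    return w_track * track_sim + w_feature * feature_sim

print(final_similarity(0.8, 0.6))
```

The resulting M x N matrix of final similarities is what feeds the Hungarian matching step described above.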
A specific example is described below. When an object (such as a vehicle) moves under two cameras whose shooting areas overlap, the generated movement tracks, once converted into the same coordinate system, should have the same shape, although they may differ in the time dimension. Matching on the track shape over a period of time, instead of the position at a single time point, therefore gives higher accuracy and better robustness. The specific implementation steps are as follows:
A coordinate conversion model is first given between the image coordinate systems of the cameras whose shooting areas overlap on the ground plane (assuming camera imaging conforms to the pinhole imaging model and the intrinsic parameters are calibrated, i.e., the captured pictures are undistorted, the imaging of the ground plane in the two camera pictures conforms to an affine transformation model; the model parameters can be estimated in advance using a group of corresponding points on the ground plane to obtain the coordinate conversion model), and the range of the shooting overlapping area in each picture is calibrated in advance in the form of polygon vertex coordinates.
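The affine model estimation from corresponding ground-plane points can be sketched as follows. This minimal version solves the six affine parameters exactly from three point pairs (a real calibration would use more pairs and least squares, e.g. OpenCV's `estimateAffine2D`); all function names are illustrative:

```python
def _solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    D = det(A)
    def replaced(j):
        return [[b[i] if k == j else A[i][k] for k in range(3)] for i in range(3)]
    return [det(replaced(j)) / D for j in range(3)]

def fit_affine(src, dst):
    """Estimate the affine map (x, y) -> (a x + b y + c, d x + e y + f)
    from exactly three corresponding points on the ground plane."""
    A = [[x, y, 1.0] for x, y in src]
    abc = _solve3(A, [x for x, _ in dst])   # parameters a, b, c
    def_ = _solve3(A, [y for _, y in dst])  # parameters d, e, f
    return abc, def_

def apply_affine(params, p):
    (a, b, c), (d, e, f) = params
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)

# Three pairs related by a pure translation of (+1, +1):
params = fit_affine([(0, 0), (1, 0), (0, 1)], [(1, 1), (2, 1), (1, 2)])
print(apply_affine(params, (2, 3)))  # a new point maps to (3.0, 4.0)
```

Once fitted, the same `apply_affine` converts every track point of one camera into the other camera's (or a common) coordinate system before comparing track shapes.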
Each single-camera process obtains a camera video stream and decodes it into images (frame rate greater than or equal to 25 fps).
Target detection and tracking are then performed on the continuous frame images of each single camera. The detection algorithm may be SSD, YOLO, Mask R-CNN, or similar; it produces target box positions, which are fed as the initial result to a tracking algorithm, which may be a correlation-filter method such as KCF or a Siamese-network-style deep neural network. To meet the real-time requirement of the system, the scheme adopts a detection-based tracking architecture: after tracking for several frames, the detection algorithm is invoked once to correct and update the tracking result.
The tracks of two objects in different videos are compared to obtain a first similarity, and the appearance features of the two objects are compared to obtain a feature similarity; the first similarity and the feature similarity are weighted and summed to obtain the final similarity. Finally, the objects are matched according to the final similarity using the Hungarian algorithm. For example, if the first video includes 3 objects and the second video includes 4 objects, the Hungarian algorithm pairs the three objects in the first video with three of the four objects in the second video, and the paired objects are regarded as the same object.
According to the embodiment, by the method, whether the first object and the second object are the same object or not is judged. By adopting the method for comparing the track similarity, whether the first object and the second object are the same object can be accurately judged even if the timestamps of the cameras are different, the tracking of the objects is realized, and the accuracy of tracking the objects is improved.
As an alternative embodiment, said determining the trajectory similarity of each of the M objects to each of the N objects includes:
s1, sequentially determining each object in the M objects as a first object, determining each object in the N objects as a second object, and executing the following steps to obtain M x N track similarities:
acquiring a first moving track of the first object from the first video and acquiring a second moving track of the second object from the second video, wherein the first moving track comprises a first group of track points, each track point in the first group of track points is used for representing the position of the first object, and the second moving track comprises a second group of track points, and each track point in the second group of track points is used for representing the position of the second object;
acquiring the shortest distance from each track point in the first group of track points to the second moving track;
and determining the track similarity of the first object and the second object according to the mean value of the shortest distance, wherein the smaller the mean value is, the larger the track similarity of the first object and the second object is.
Alternatively, one object in the first video is determined as object A and one object in the second video as object B. If the first video includes 3 objects and the second video includes 4 objects, there are twelve combinations, and twelve track similarities are calculated. After determining the first movement track of object A and the second movement track of object B, the similarity of the two movement tracks is calculated: the shortest distance from each track point of the first movement track to the second movement track may be computed, and the mean of these shortest distances taken; the smaller the mean, the closer the tracks of object A and object B, and the more likely they are the same object. The reciprocal of the mean is then taken and normalized, and the normalized result is regarded as the track similarity of object A and object B.
By the method, the track similarity of each object in the first video and each object in the second video is calculated, and the object which is the same object in the two videos is further determined according to the track similarity, so that the effect of improving the accuracy of tracking the object is achieved.
As an alternative embodiment, the object in the first camera is located under a first coordinate system, and the object in the second camera is located under a second coordinate system; the obtaining the first moving track of the first object from the first video and the second moving track of the second object from the second video includes:
s1, transferring the coordinates of the first object under the first coordinate system to a target coordinate system;
s2, transferring the coordinates of the second object in the second coordinate system to the target coordinate system;
s3, acquiring the first moving track of the first object in the target coordinate system, and acquiring the second moving track of the second object in the target coordinate system.
Optionally, the first coordinate system and the second coordinate system are three-dimensional rectangular coordinate systems, the first coordinate of the first object under the first coordinate system is a three-dimensional coordinate, and the second coordinate of the second object under the second coordinate system is a three-dimensional coordinate. The first coordinate and the second coordinate can be transferred into the target coordinate system, and then the moving tracks of the first object and the second object under the target coordinate system are acquired.
Alternatively, the coordinates of the first object in the first coordinate system may be converted into the second coordinate system, thereby unifying the coordinate systems of the two objects.
According to the method, the coordinate systems of the first object and the second object are unified, so that the tracks can be accurately compared when the track similarity is computed, and the accuracy of tracking the objects is improved.
As an alternative embodiment, before transferring the coordinates of the first object under the first coordinate system into the target coordinate system, the method further comprises:
S1, acquiring a target coordinate value of a target object in the target coordinate system, a first coordinate value of the target object in the first coordinate system and a second coordinate value of the target object in the second coordinate system, wherein the target object is located in the overlapping area;
s2, calculating a first coordinate corresponding relation between the first coordinate system and the target coordinate system by using the target coordinate value and the first coordinate value;
and S3, calculating a second coordinate corresponding relation between the second coordinate system and the target coordinate system by using the target coordinate value and the second coordinate value.
Optionally, in this scheme, when the coordinates of an object are transformed into another coordinate system, a pre-established coordinate correspondence needs to be followed.
Alternatively, for the first coordinate system, the second coordinate system and the target coordinate system, the first coordinate value of the target object in the first coordinate system may be acquired and recorded, and the second coordinate value of the target object in the second coordinate system may be acquired and recorded. The target object also has a corresponding target coordinate value in the target coordinate system. The correspondence between the first coordinate value and the target coordinate value is recorded, and the correspondence between the second coordinate value and the target coordinate value is recorded.
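One concrete way to establish such a correspondence, sketched here for 2D ground-plane coordinates under an affine model (the model also described in the specific example later in this document): given three non-collinear corresponding point pairs from the overlapping area, the six affine parameters x' = a·x + b·y + c, y' = d·x + e·y + f can be solved directly with Cramer's rule, stdlib only. The function names are illustrative; the same idea extends to 3D with more point pairs.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_affine(src, dst):
    """Fit x' = a*x + b*y + c, y' = d*x + e*y + f from 3 point pairs."""
    A = [[x, y, 1.0] for (x, y) in src]
    d = det3(A)  # nonzero for non-collinear points

    def solve(rhs):
        # Cramer's rule: replace each column of A by rhs in turn.
        cols = []
        for j in range(3):
            Aj = [row[:] for row in A]
            for i in range(3):
                Aj[i][j] = rhs[i]
            cols.append(det3(Aj) / d)
        return cols

    a, b, c = solve([p[0] for p in dst])
    dd, e, f = solve([p[1] for p in dst])
    return (a, b, c, dd, e, f)

def apply_affine(params, pt):
    a, b, c, d, e, f = params
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)

# Known mapping (x, y) -> (y + 5, x + 7), recovered from 3 correspondences.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(5.0, 7.0), (5.0, 8.0), (6.0, 7.0)]
params = fit_affine(src, dst)
x2, y2 = apply_affine(params, (2.0, 3.0))
assert abs(x2 - 8.0) < 1e-9 and abs(y2 - 9.0) < 1e-9
```

Once the parameters are recorded, any coordinate observed in one camera's system can be transferred into the target system with `apply_affine`.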
Through this embodiment, the correspondence of the coordinates is recorded, so that coordinates in one coordinate system can be accurately transferred into another coordinate system, the track similarity can be accurately compared, and the accuracy of tracking the objects is improved.
As an alternative embodiment, the acquiring the first movement track of the first object from the first video and the second movement track of the second object from the second video includes:
s1, acquiring a moving track of the first object and the second object in the overlapping area.
Optionally, the moving tracks of the first object and the second object in the overlapping area are obtained. When the track similarities are then compared, two tracks belonging to the same object yield a high track similarity, while the tracks of different objects yield a low track similarity. An obvious distinction between the same object and different objects is thus achieved, and the accuracy of tracking the objects is improved.
As an alternative embodiment,
after determining the trajectory similarity of each of the M objects to each of the N objects, further comprising: s1, determining appearance feature similarity of each object in the M objects and each object in the N objects;
the determining that the first object of the M objects and the second object of the N objects are the same object according to the track similarity between each object of the M objects and each object of the N objects includes: s1, determining that a first object in the M objects and a second object in the N objects are the same object according to the sum of the track similarity and the appearance feature similarity of each object in the M objects and each object in the N objects.
The similarity of the tracks of two objects in different videos is compared to obtain a first similarity. The appearance features of the two objects in the videos are compared to obtain a feature similarity, and the first similarity and the feature similarity are weighted and summed to obtain a final similarity. Finally, the objects are matched according to the final similarity using the Hungarian algorithm. For example, if the first video includes 3 objects and the second video includes 4 objects, the Hungarian algorithm pairs the three objects in the first video with three of the four objects in the second video, and the paired objects are regarded as the same object.
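A sketch of this fusion-and-matching step. `scipy.optimize.linear_sum_assignment` would implement the Hungarian algorithm directly; to stay dependency-free, a brute-force search over assignments stands in for it here (it finds the same optimal pairing for small M and N). The weights and similarity values are illustrative.

```python
from itertools import permutations

def fuse(traj_sim, feat_sim, w_traj=0.5, w_feat=0.5):
    """Weighted sum of track similarity and appearance-feature similarity."""
    return [[w_traj * t + w_feat * f for t, f in zip(tr, fr)]
            for tr, fr in zip(traj_sim, feat_sim)]

def best_matching(sim):
    """One-to-one pairing maximizing total similarity (assumes M <= N).
    Brute force; the Hungarian algorithm computes this in O(n^3)."""
    m, n = len(sim), len(sim[0])
    best, best_pairs = -1.0, []
    for cols in permutations(range(n), m):
        total = sum(sim[i][j] for i, j in enumerate(cols))
        if total > best:
            best, best_pairs = total, list(enumerate(cols))
    return best_pairs

# 3 objects in the first video, 4 in the second (values illustrative).
traj_sim = [[0.9, 0.1, 0.2, 0.1],
            [0.2, 0.8, 0.1, 0.3],
            [0.1, 0.2, 0.3, 0.7]]
feat_sim = [[0.8, 0.2, 0.1, 0.2],
            [0.1, 0.9, 0.2, 0.2],
            [0.2, 0.1, 0.2, 0.8]]

pairs = best_matching(fuse(traj_sim, feat_sim))
assert pairs == [(0, 0), (1, 1), (2, 3)]  # each pair is "the same object"
```

Each returned pair (i, j) marks object i of the first video and object j of the second video as one group, i.e. the same object; the unmatched fourth object of the second video has no counterpart.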
According to the embodiment, the objects in the first video and the second video are compared by the method, so that the same objects can be matched according to the similarity of the track similarity and the similarity of the appearance characteristics, the matching accuracy is improved, and the object tracking accuracy is further improved.
As an optional implementation manner, the determining that the first object of the M objects and the second object of the N objects are the same object according to the track similarity of each object of the M objects and each object of the N objects includes:
S1, matching the M objects with the N objects according to the track similarity by using the Hungarian algorithm to obtain a matching result, wherein the matching result comprises a plurality of object groups, each object group comprises one object of the M objects and one object of the N objects, and the plurality of object groups comprise a target object group, the target object group comprising the first object and the second object;
s2, determining two objects in each object group as the same object.
After the track similarity of each object in the first video to each object in the second video is calculated, a matrix of track similarities can be obtained. For example, M objects in the first video and N objects in the second video together yield M×N track similarities. At this point, pairwise matching of the objects can be achieved using the Hungarian algorithm: the M objects in the first video are paired with the N objects in the second video according to the M×N track similarities, and the two objects marked as a group in the pairing result of the Hungarian algorithm are determined as the same object.
By determining the two objects as the same object in this way, the accuracy of identifying the same object can be improved; in particular, when multiple objects exist in the first video and the second video and some of them may have similar tracks, this method achieves higher identification accuracy.
As an optional implementation manner, the acquiring the first video shot by the first camera and the second video shot by the second camera includes:
s1, acquiring the first system time and the second system time;
s2, calculating the absolute value of the difference value between the first system time and the second system time to obtain the first threshold value;
s3, starting from the first system time, acquiring the video with the time length larger than the first threshold value shot by the first camera as the first video, and starting from the second system time, acquiring the video with the time length larger than the first threshold value shot by the second camera as the second video.
For example, the system time of the first camera is 12:50:00 and the system time of the second camera is 12:50:30, a difference of 30 seconds. The first camera receives the shooting start instruction at 12:50:00, and the second camera receives the shooting start instruction two seconds later than the first camera, so in its own system time the second camera starts shooting at 12:50:32, making the offset between the two start times 32 seconds. When the first video is determined, a video shot by the first camera with a duration longer than 32 seconds is taken starting from 12:50:00; when the second video is determined, a video shot by the second camera with a duration longer than 32 seconds is taken starting from 12:50:32. This ensures that the first video and the second video both contain content shot in the same time period, so that the tracks can be compared.
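The offset arithmetic of this example can be sketched with the standard-library datetime module; the timestamps are the ones from the example above.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
start_1 = datetime.strptime("12:50:00", FMT)  # first camera's system start time
start_2 = datetime.strptime("12:50:32", FMT)  # second camera's system start time

# First threshold: absolute difference of the two system start times.
threshold = abs((start_2 - start_1).total_seconds())
assert threshold == 32.0

# Any clip strictly longer than the threshold is guaranteed to overlap
# in real time with the other camera's clip.
clip_length = timedelta(seconds=threshold + 1)
assert clip_length.total_seconds() > threshold
```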
In the specific comparison, a buffer queue can be maintained in the central node, and the first video and the second video can be buffered in the buffer queue for comparison. The length of the buffer queue is greater than or equal to the first threshold.
According to the method, when the first video and the second video are determined, it can be ensured that the first video and the second video contain at least content recorded in the same time period, which improves the efficiency of track comparison.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the present invention, there is also provided a matching apparatus for an object in a video, which is used for implementing the matching method for an object in a video. As shown in fig. 7, the apparatus includes:
(1) An obtaining unit 702, configured to obtain a first video captured by a first camera and a second video captured by a second camera, where a capturing area of the first camera and a capturing area of the second camera include overlapping areas, the first video includes M objects, the second video includes N objects, and the M and the N are positive integers;
(2) A first determining unit 704, configured to determine a trajectory similarity of each of the M objects to each of the N objects;
(3) The second determining unit 706 is configured to determine that a first object of the M objects and a second object of the N objects are the same object according to a trajectory similarity between each object of the M objects and each object of the N objects.
Alternatively, the matching apparatus for objects in video can be applied to, but is not limited to, the field of object tracking, for example tracking the movement of objects in a video, or to the field of object recognition, for example recognizing whether an object in a video is a specified object.
Taking object tracking as an example, a first video and a second video are respectively shot by a first camera and a second camera, and after a first object moves from a shooting area of the first camera to a shooting area of the second camera, the first object is included in the first video and the second video. By comparing the moving track of the first object, the object in the first video and the object in the second video can be determined to be the same object, so that the tracking of the first object is realized. As shown in fig. 3, the camera 1 and the camera 2 shoot the object 302, and tracking of the object 302 is realized.
Optionally, the first video shot by the first camera or the second video shot by the second camera in this scheme includes multiple frames of images, and the object in each frame of the multiple frames can be regarded as a point. The point may be regarded as the position of the object, for example the center point of the object. For example, as shown in fig. 4, a point 402 in fig. 4 is the position of the object 404 in one frame image, a point 406 is the position of the object 404 in another frame image, and the point 402, the point 406 and the like form the track of the object 404.
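The track construction described here, one point per frame taken as the center of the object's bounding box, can be sketched as follows; the (x, y, w, h) box format is an assumption for illustration.

```python
def box_center(box):
    """Center point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def build_track(per_frame_boxes):
    """One track point per frame: the object's center in that frame."""
    return [box_center(b) for b in per_frame_boxes]

# Boxes of one object over four consecutive frames (illustrative values).
boxes = [(0, 0, 10, 10), (5, 0, 10, 10), (10, 0, 10, 10), (15, 0, 10, 10)]
track = build_track(boxes)
assert track == [(5.0, 5.0), (10.0, 5.0), (15.0, 5.0), (20.0, 5.0)]
```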
Optionally, by the method, the track of the object in the first video shot by the first camera and the track of the object in the second video shot by the second camera can be determined.
Optionally, in this embodiment, the first camera and the second camera may have different coordinate systems. For example, as shown in fig. 5, the angles of the first camera and the second camera are different, so that even if the objects in the captured video are the same object, the trajectories may be different. Therefore, for this case, the coordinate systems of the first camera and the second camera need to be unified.
Alternatively, one or more target objects may be selected in the overlapping area of the shooting areas of the first camera and the second camera. As shown in fig. 6, the target object 602 and the target object 604 are selected; then, the first coordinates of the target object 602 and the target object 604 in the first coordinate system of the first camera are acquired, and the second coordinates of the target object 602 and the target object 604 in the second coordinate system of the second camera are acquired. Then, a conversion relation between the first coordinates and the second coordinates is obtained. For example, the coordinates of the target object 602 in the first coordinate system are (2, 4, 6) and its coordinates in the second coordinate system are (2, 6, 4); the correspondence of the coordinates is recorded, and the coordinates of objects in the two cameras are unified.
Alternatively, the coordinates of the object photographed by the first camera may be converted into a standard coordinate system. Before conversion, it is necessary to determine a first coordinate correspondence of the first coordinate system of the first camera converted to the standard coordinate system, and determine a second coordinate correspondence of the second coordinate system of the second camera converted to the standard coordinate system.
Alternatively, after the coordinates of the object photographed by the first camera and the coordinates of the object photographed by the second camera are acquired and the coordinates are converted into the same coordinate system, the similarity of the coordinates of the two objects may be compared. For example, taking a first object in a first video and a second object in a second video as examples, a first movement track of the first object in the first video is acquired, and a second movement track of the second object in the second video is acquired, where the two movement tracks are tracks under the same coordinate system. And when the track similarity is compared, acquiring the shortest distance from each track point in the first group of track points on the first moving track to the second moving track, determining the average value of the shortest distances in the first group of track points, and determining the track similarity of the first object and the second object according to the average value.
Alternatively, the reciprocal of the mean may be normalized, and the normalized result may be determined as the track similarity between the first object and the second object. In this case, the larger the mean value, the smaller the track similarity; the smaller the mean value, the larger the track similarity.
After the track similarity of each object in the first video to each object in the second video is calculated, a matrix of track similarities can be obtained. For example, M objects in the first video and N objects in the second video together yield M×N track similarities. At this point, pairwise matching of the objects can be achieved using the Hungarian algorithm: the M objects in the first video are paired with the N objects in the second video according to the M×N track similarities, and the two objects marked as a group in the pairing result of the Hungarian algorithm are determined as the same object.
Optionally, the first camera and the second camera in the present solution may have different timestamps. When the first video and the second video are acquired, the time difference between the first camera and the second camera is acquired. For example, the first camera's current time is 12:00:00 and the second camera's is 12:00:02, two seconds apart; then, when the first video and the second video are acquired, videos longer than two seconds are acquired, for example ten minutes of the first video and the second video. This ensures that if the same object is included in the first video and the second video and the object appears in the overlapping area, the overlapping portion of the object's track is necessarily included in both the first video and the second video.
Optionally, in the scheme, after the track similarity of the two objects is obtained by comparison, the feature similarity of the two objects is also obtained, such as the similarity of appearance features of the two objects: color, shape, etc. The track similarity and the feature similarity are weighted and summed to obtain a final similarity; object matching is then performed according to the final similarity using the Hungarian algorithm, and the two objects indicated as a group in the matching result of the Hungarian algorithm are determined as the same object.
A specific example is described below. When an object (such as a vehicle) moves under two cameras with an overlapping shooting area, the generated movement tracks should have the same shape when converted into the same coordinate system, although they may differ in the time dimension. Matching by the track shape over a period of time, instead of by the position at a single time point, therefore obtains higher accuracy and better robustness. The specific implementation steps are as follows:
A coordinate conversion model between the image coordinate systems of cameras whose shooting areas overlap on the ground plane is given (assuming that camera imaging conforms to the pinhole imaging model and the intrinsic parameters are calibrated, i.e., the captured pictures are undistorted, the imaging of the ground plane in the two camera pictures conforms to an affine transformation model; model parameters can be estimated in advance using a group of corresponding points on the ground plane to obtain the coordinate conversion model), and the range of the shooting overlapping area in the pictures is calibrated in advance in the form of polygon vertex coordinates.
Each single-camera processing process obtains a camera video stream and decodes the video stream into images (at a frame rate greater than or equal to 25 fps).
Target detection and tracking are performed on consecutive frame images of the single camera. The target detection algorithm may use an algorithm such as SSD, YOLO or Mask R-CNN; the detection algorithm obtains the target frame position, which is then fed as an initial result to a tracking algorithm. The tracking algorithm may use a correlation-filtering algorithm such as KCF, or a deep neural network of the SiameseNet type. In order to meet the real-time requirement of the system, this scheme adopts a track-by-detection architecture, that is, after tracking for several frames, the detection algorithm is called once to correct and update the tracking result.
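The track-by-detection cadence described above — run the slow detector once every K frames and the fast tracker in between — can be sketched as follows. The Detector/Tracker stubs are placeholders for, e.g., a YOLO detector and a KCF tracker; their interface is an assumption for illustration.

```python
DETECT_EVERY = 5  # call the detector once every 5 frames (illustrative)

class StubDetector:
    """Placeholder for a detector such as SSD / YOLO / Mask R-CNN."""
    def detect(self, frame):
        return [(0, 0, 10, 10)]  # would return detected target boxes

class StubTracker:
    """Placeholder for a tracker such as KCF or a Siamese network."""
    def __init__(self):
        self.boxes = []
    def reset(self, boxes):
        self.boxes = boxes      # detector output corrects/updates the tracker
    def track(self, frame):
        return self.boxes       # would propagate boxes to the new frame

def process_stream(frames, detector, tracker):
    detector_calls = 0
    results = []
    for i, frame in enumerate(frames):
        if i % DETECT_EVERY == 0:
            # Periodic detection corrects accumulated tracking drift.
            tracker.reset(detector.detect(frame))
            detector_calls += 1
        results.append(tracker.track(frame))
    return results, detector_calls

frames = list(range(25))  # stand-ins for decoded images of a >= 25 fps stream
_, calls = process_stream(frames, StubDetector(), StubTracker())
assert calls == 5  # detector ran on frames 0, 5, 10, 15, 20
```

Lowering DETECT_EVERY trades throughput for faster drift correction; the value is a tuning knob, not part of this scheme.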
The similarity of the tracks of two objects in different videos is compared to obtain a first similarity. The appearance features of the two objects in the videos are compared to obtain a feature similarity, and the first similarity and the feature similarity are weighted and summed to obtain a final similarity. Finally, the objects are matched according to the final similarity using the Hungarian algorithm. For example, if the first video includes 3 objects and the second video includes 4 objects, the Hungarian algorithm pairs the three objects in the first video with three of the four objects in the second video, and the paired objects are regarded as the same object.
According to this embodiment, whether the first object and the second object are the same object is judged by the above method. By comparing track similarity, whether the first object and the second object are the same object can be accurately judged even if the timestamps of the cameras differ, thereby realizing object tracking and improving its accuracy.
As an alternative embodiment, the first determining unit includes:
(1) The first determining module is configured to determine each of the M objects as a first object in turn, determine each of the N objects as a second object, and perform the following steps to obtain M×N track similarities:
acquiring a first moving track of the first object from the first video and acquiring a second moving track of the second object from the second video, wherein the first moving track comprises a first group of track points, each track point in the first group of track points is used for representing the position of the first object, and the second moving track comprises a second group of track points, and each track point in the second group of track points is used for representing the position of the second object;
Acquiring the shortest distance from each track point in the first group of track points to the second moving track;
and determining the track similarity of the first object and the second object according to the mean value of the shortest distances, wherein the smaller the mean value is, the larger the track similarity of the first object and the second object is.
By this method, the track similarity of each object in the first video to each object in the second video is calculated, and the objects that are the same object in the two videos are then determined according to the track similarities, thereby improving the accuracy of tracking the objects.
As an alternative embodiment, the object in the first camera is located under a first coordinate system, and the object in the second camera is located under a second coordinate system; the first determining module is further configured to perform the steps of:
transferring the coordinates of the first object in the first coordinate system to a target coordinate system;
transferring coordinates of the second object in the second coordinate system into the target coordinate system;
and acquiring the first moving track of the first object in the target coordinate system, and acquiring the second moving track of the second object in the target coordinate system.
According to the method, the coordinate systems of the first object and the second object are unified, so that the tracks can be accurately compared when the track similarity is computed, and the accuracy of tracking the objects is improved.
As an alternative embodiment, the first determining module is further configured to perform the following steps:
before transferring the coordinates of the first object in the first coordinate system to a target coordinate system, acquiring a target coordinate value of the target object in the target coordinate system, a first coordinate value of the target object in the first coordinate system and a second coordinate value of the target object in the second coordinate system, wherein the target object is located in the overlapping area;
calculating a first coordinate corresponding relation between the first coordinate system and the target coordinate system by using the target coordinate value and the first coordinate value;
and calculating a second coordinate corresponding relation between the second coordinate system and the target coordinate system by using the target coordinate value and the second coordinate value.
Through this embodiment, the correspondence of the coordinates is recorded, so that coordinates in one coordinate system can be accurately transferred into another coordinate system, the track similarity can be accurately compared, and the accuracy of tracking the objects is improved.
As an alternative embodiment, the first determining module is further configured to perform the following steps:
and acquiring the moving track of the first object and the second object in the overlapping area.
Optionally, the moving tracks of the first object and the second object in the overlapping area are obtained. When the track similarities are then compared, two tracks belonging to the same object yield a high track similarity, while the tracks of different objects yield a low track similarity. An obvious distinction between the same object and different objects is thus achieved, and the accuracy of tracking the objects is improved.
As an alternative embodiment,
the device further comprises: (1) A third determining unit, configured to determine, after determining the trajectory similarity of each of the M objects to each of the N objects, an appearance feature similarity of each of the M objects to each of the N objects;
the second determination unit includes: (1) And the second determining module is used for determining that a first object in the M objects and a second object in the N objects are the same object according to the sum of the track similarity and the appearance feature similarity of each object in the M objects and each object in the N objects.
According to the embodiment, the objects in the first video and the second video are compared by the method, so that the same objects can be matched according to the similarity of the track similarity and the similarity of the appearance characteristics, the matching accuracy is improved, and the object tracking accuracy is further improved.
As an alternative embodiment, the second determining unit includes:
(1) The matching module is used for matching the M objects with the N objects according to the track similarity by using the Hungarian algorithm to obtain a matching result, wherein the matching result comprises a plurality of object groups, each object group comprises one object of the M objects and one object of the N objects, and the plurality of object groups comprise a target object group, the target object group comprising the first object and the second object;
(2) And the third determining module is used for determining two objects in each object group as the same object.
By determining the two objects as the same object in this way, the accuracy of identifying the same object can be improved; in particular, when multiple objects exist in the first video and the second video and some of them may have similar tracks, this method achieves higher identification accuracy.
As an alternative embodiment, the acquisition unit comprises:
(1) The first acquisition module is used for acquiring the first system time and the second system time;
(2) The calculating module is used for calculating the absolute value of the difference value between the first system time and the second system time to obtain the first threshold value;
(3) The second obtaining module is configured to obtain, from the first system time, a video with a time length greater than the first threshold value captured by the first camera as the first video, and obtain, from the second system time, a video with a time length greater than the first threshold value captured by the second camera as the second video.
According to the method, when the first video and the second video are determined, it can be ensured that the first video and the second video contain at least content recorded in the same time period, which improves the efficiency of track comparison.
According to a further aspect of the embodiments of the present invention there is also provided an electronic device for implementing the above-described method of matching objects in video, as shown in fig. 8, the electronic device comprising a memory 802 and a processor 804, the memory 802 having stored therein a computer program, the processor 804 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, acquiring a first video shot by a first camera and a second video shot by a second camera, wherein a shooting area of the first camera and a shooting area of the second camera comprise overlapping areas, the first video comprises M objects, the second video comprises N objects, and M and N are positive integers;
s2, determining the track similarity of each object in the M objects and each object in the N objects;
s3, determining that a first object in the M objects and a second object in the N objects are the same object according to the track similarity of each object in the M objects and each object in the N objects.
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 8 is only schematic, and the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, etc. The structure shown in fig. 8 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 8, or have a different configuration from that shown in fig. 8.
The memory 802 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for matching objects in video in the embodiment of the present invention, and the processor 804 executes the software programs and modules stored in the memory 802, thereby executing various functional applications and data processing, that is, implementing the method for matching objects in video described above. Memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 802 may further include memory remotely located relative to processor 804, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 802 may be used to store, but is not limited to, information such as a first video, a second video, a track of a first object, and a track of a second object. As an example, as shown in fig. 8, the memory 802 may include, but is not limited to, an acquisition unit 702, a first determination unit 704, and a second determination unit 706 in a matching apparatus including the object in the video. In addition, other module units in the matching device of the object in the video may be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmission device 806 is used to receive or transmit data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device 806 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 806 is a radio frequency (Radio Frequency, RF) module for communicating wirelessly with the internet.
In addition, the electronic device further includes: a display 808 for displaying the matching result; and a connection bus 810 for connecting the respective module parts in the above-described electronic device.
According to a further aspect of embodiments of the present invention there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, acquiring a first video shot by a first camera and a second video shot by a second camera, wherein the shooting area of the first camera and the shooting area of the second camera include an overlapping area, the first video includes M objects, the second video includes N objects, and M and N are positive integers;
S2, determining the track similarity of each of the M objects to each of the N objects;
S3, determining that a first object of the M objects and a second object of the N objects are the same object according to the track similarity of each of the M objects to each of the N objects.
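As a rough illustration, steps S1 to S3 can be sketched as follows. This is a minimal, hypothetical sketch: all names (such as `match_videos`) are illustrative rather than taken from the specification, a placeholder similarity function stands in for the track-shape similarity, and the final pairing is a simple greedy per-row match.

```python
import math

# Hypothetical sketch of steps S1-S3; names are illustrative, not from the patent.

def match_videos(tracks_m, tracks_n, similarity):
    # S2: compute the track similarity for every (i, j) pair -> an M x N matrix.
    sim = [[similarity(a, b) for b in tracks_n] for a in tracks_m]
    # S3: pair each object of the first video with its highest-scoring object
    # in the second video (greedy per row; claim 7 refines this into an
    # optimal one-to-one assignment).
    return {i: row.index(max(row)) for i, row in enumerate(sim)}

# One possible placeholder similarity: negative Euclidean distance between
# the first points of the two tracks.
def first_point_similarity(a, b):
    return -math.dist(a[0], b[0])
```

For example, with two objects per video whose tracks start near each other in swapped order, `match_videos` pairs object 0 of the first video with object 1 of the second, and vice versa.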
Alternatively, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as separate products, they may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division into units is merely a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of the present invention.

Claims (15)

1. A method for matching objects in video, comprising:
acquiring, from video shot by a first camera, a first video whose duration is greater than a first threshold, and acquiring, from video shot by a second camera, a second video whose duration is greater than the first threshold, wherein the shooting area of the first camera and the shooting area of the second camera include an overlapping area, the first video includes M objects, the second video includes N objects, M and N are positive integers, the first threshold is the difference between a first system time and a second system time, the first system time is the system time at which the first camera starts shooting the first video, and the second system time is the system time at which the second camera starts shooting the second video;
determining a movement track corresponding to each of the M objects and a movement track corresponding to each of the N objects based on the coordinates of each of the M objects in a target coordinate system and the coordinates of each of the N objects in the target coordinate system;
determining the track similarity of each of the M objects to each of the N objects based on the shape similarity between the track shape of the movement track corresponding to each of the M objects and the track shape of the movement track corresponding to each of the N objects;
and determining that a first object in the M objects and a second object in the N objects are the same object according to the track similarity of each object in the M objects and each object in the N objects.
2. The method of claim 1, wherein determining the track similarity of each of the M objects to each of the N objects based on the shape similarity between the track shapes of the movement tracks corresponding to the M objects and the track shapes of the movement tracks corresponding to the N objects comprises:
determining each of the M objects as the first object in turn, determining each of the N objects as the second object, and performing the following steps to obtain M×N track similarities:
acquiring a first moving track of the first object from the first video and acquiring a second moving track of the second object from the second video, wherein the first moving track comprises a first group of track points, each track point in the first group of track points is used for representing the position of the first object, and the second moving track comprises a second group of track points, and each track point in the second group of track points is used for representing the position of the second object;
acquiring the shortest distance from each track point in the first group of track points to the second moving track;
and determining the track similarity of the first object and the second object according to the mean value of the shortest distances, wherein the smaller the mean value, the larger the track similarity of the first object and the second object.
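The claim-2 metric above can be sketched as follows, assuming Euclidean distance between track points and an illustrative monotone mapping from mean distance to similarity (the claim fixes only that a smaller mean yields a larger similarity, not the exact mapping):

```python
import math

# Illustrative implementation of the claim-2 metric; the inversion used in
# trajectory_similarity is an assumption, not fixed by the claim.

def mean_shortest_distance(track1, track2):
    # For each point of the first track, take the shortest distance to any
    # point of the second track, then average over the first track.
    def shortest(p):
        return min(math.dist(p, q) for q in track2)
    return sum(shortest(p) for p in track1) / len(track1)

def trajectory_similarity(track1, track2):
    # Smaller mean distance -> larger similarity, as the claim requires.
    return 1.0 / (1.0 + mean_shortest_distance(track1, track2))
```

Note that the mean is taken over the first track's points only, so the measure is asymmetric; a symmetric variant would average both directions.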
3. The method of claim 1, wherein before determining the movement track corresponding to each of the M objects and the movement track corresponding to each of the N objects, further comprises:
transferring the coordinates of each of the M objects in a first coordinate system to a target coordinate system, wherein each of the M objects is located in the first coordinate system;
transferring the coordinates of each of the N objects in a second coordinate system to the target coordinate system, wherein each of the N objects is located in the second coordinate system.
4. The method according to claim 3, wherein before transferring the coordinates of each of the M objects in the first coordinate system into the target coordinate system, the method further comprises:
acquiring a target coordinate value of a target object in the target coordinate system, a first coordinate value of the target object in the first coordinate system and a second coordinate value of the target object in the second coordinate system, wherein the target object is located in the target coordinate system and located in the overlapping area;
calculating a first coordinate corresponding relation between the first coordinate system and the target coordinate system by using the target coordinate value and the first coordinate value;
and calculating a second coordinate corresponding relation between the second coordinate system and the target coordinate system by using the target coordinate value and the second coordinate value.
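One way to realize the coordinate correspondence of claim 4 is to fit a planar affine transform from known point correspondences between a camera's coordinate system and the target coordinate system. The sketch below is hypothetical: it assumes exactly three non-collinear correspondences and solves the two resulting 3×3 linear systems directly; a production system would more likely estimate a homography from many points (e.g., with OpenCV's `findHomography`).

```python
def solve3(m):
    # Gaussian elimination with partial pivoting on a 3x4 augmented matrix.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        piv = m[i][i]
        m[i] = [v / piv for v in m[i]]
        for j in range(3):
            if j != i:
                f = m[j][i]
                m[j] = [a - f * b for a, b in zip(m[j], m[i])]
    return [row[3] for row in m]

def affine_from_points(src, dst):
    # src, dst: three (x, y) correspondences observed in both coordinate
    # systems. Returns rows (a, b, c) and (d, e, f) of the affine map
    #   X = a*x + b*y + c,   Y = d*x + e*y + f.
    row_x = solve3([[x, y, 1.0, X] for (x, y), (X, _) in zip(src, dst)])
    row_y = solve3([[x, y, 1.0, Y] for (x, y), (_, Y) in zip(src, dst)])
    return row_x, row_y

def apply_affine(rows, p):
    # Map a point from the camera coordinate system into the target system.
    (a, b, c), (d, e, f) = rows
    x, y = p
    return a * x + b * y + c, d * x + e * y + f
```

For example, correspondences that scale by 2 and translate by (2, 3) yield an affine map sending (1, 1) to (4, 5).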
5. The method of claim 2, wherein acquiring the first moving track of the first object from the first video and the second moving track of the second object from the second video comprises:
acquiring the moving track of the first object and the moving track of the second object within the overlapping area.
6. The method of claim 1, further comprising, after determining the track similarity of each of the M objects to each of the N objects:
determining appearance feature similarity of each of the M objects and each of the N objects;
wherein determining that the first object of the M objects and the second object of the N objects are the same object according to the track similarity of each of the M objects to each of the N objects comprises: determining that the first object of the M objects and the second object of the N objects are the same object according to the sum of the track similarity and the appearance feature similarity of each of the M objects to each of the N objects.
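Combining the two similarities as in claim 6 can be sketched as an element-wise sum followed by a per-row best match; equal weighting is an assumption, since the claim specifies only a sum:

```python
def combined_match(traj_sim, app_sim):
    # traj_sim, app_sim: M x N matrices of trajectory similarity and
    # appearance-feature similarity. Sum them element-wise, then match each
    # object of the first video to its best combined score.
    matches = {}
    for i, (t_row, a_row) in enumerate(zip(traj_sim, app_sim)):
        combined = [t + a for t, a in zip(t_row, a_row)]
        matches[i] = combined.index(max(combined))
    return matches
```

In practice a weighted sum could be used instead, tuning the relative influence of track shape versus appearance, but the claim only requires the plain sum.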
7. The method of claim 1, wherein determining that the first object of the M objects and the second object of the N objects are the same object according to the track similarity of each of the M objects to each of the N objects comprises:
matching the M objects with the N objects according to the track similarities by using the Hungarian algorithm to obtain a matching result, wherein the matching result includes a plurality of object groups, each object group includes one of the M objects and one of the N objects, the plurality of object groups include a target object group, and the target object group includes the first object and the second object;
and determining two objects in each object group as the same object.
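Claim 7 names the Hungarian algorithm for the optimal one-to-one assignment. The hypothetical sketch below finds the same optimum by brute-force enumeration for clarity (feasible only for small M = N); in practice one would use an O(n³) Hungarian implementation such as SciPy's `linear_sum_assignment`:

```python
from itertools import permutations

def best_assignment(sim):
    # sim: an n x n similarity matrix (M = N is assumed for this sketch).
    # Exhaustively search all one-to-one assignments for the maximum total
    # similarity; the Hungarian algorithm reaches the same optimum in O(n^3).
    n = len(sim)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(sim[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm)  # best_perm[i] = matched column for row i
```

Unlike the greedy per-row match, this global assignment cannot pair two objects of the first video with the same object of the second video.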
8. The method according to any one of claims 1 to 7, wherein acquiring, from the video shot by the first camera, the first video whose duration is greater than the first threshold, and acquiring, from the video shot by the second camera, the second video whose duration is greater than the first threshold comprises:
acquiring the first system time and the second system time;
calculating the absolute value of the difference between the first system time and the second system time to obtain the first threshold;
and acquiring, starting from the first system time, video shot by the first camera whose duration is greater than the first threshold as the first video, and acquiring, starting from the second system time, video shot by the second camera whose duration is greater than the first threshold as the second video.
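Claim 8's time alignment can be sketched as follows; `margin` (by how much each clip exceeds the threshold) and the use of seconds are assumptions not fixed by the claim:

```python
def capture_windows(t1_start, t2_start, margin=1.0):
    # t1_start, t2_start: system times (seconds) at which the first and second
    # cameras begin recording. The first threshold is the absolute difference
    # of the start times; each camera keeps a clip strictly longer than that
    # threshold, so the two clips necessarily share an interval of real time.
    threshold = abs(t1_start - t2_start)
    duration = threshold + margin  # "greater than the first threshold"
    return (t1_start, t1_start + duration), (t2_start, t2_start + duration)
```

With `margin = 1.0`, cameras starting at t = 10 s and t = 12 s yield windows (10, 13) and (12, 15), whose intersection (12, 13) is non-empty, which is what makes cross-camera track comparison meaningful.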
9. A device for matching objects in video, comprising:
an acquisition unit, configured to acquire, from video shot by a first camera, a first video whose duration is greater than a first threshold, and to acquire, from video shot by a second camera, a second video whose duration is greater than the first threshold, wherein the shooting area of the first camera and the shooting area of the second camera include an overlapping area, the first video includes M objects, the second video includes N objects, M and N are positive integers, the first threshold is the difference between a first system time and a second system time, the first system time is the system time at which the first camera starts shooting the first video, and the second system time is the system time at which the second camera starts shooting the second video;
a first determining unit, configured to determine a movement track corresponding to each of the M objects and a movement track corresponding to each of the N objects based on the coordinates of each of the M objects in a target coordinate system and the coordinates of each of the N objects in the target coordinate system, and to determine the track similarity of each of the M objects to each of the N objects based on the shape similarity between the track shape of the movement track corresponding to each of the M objects and the track shape of the movement track corresponding to each of the N objects;
and a second determining unit, configured to determine that a first object of the M objects and a second object of the N objects are the same object according to the track similarity of each of the M objects to each of the N objects.
10. The apparatus according to claim 9, wherein the first determining unit comprises:
a first determining module, configured to determine each of the M objects as the first object in turn, determine each of the N objects as the second object, and perform the following steps to obtain M×N track similarities:
acquiring a first moving track of the first object from the first video and acquiring a second moving track of the second object from the second video, wherein the first moving track includes a first group of track points, each track point in the first group of track points is used for representing the position of the first object, the second moving track includes a second group of track points, and each track point in the second group of track points is used for representing the position of the second object;
acquiring the shortest distance from each track point in the first group of track points to the second moving track;
and determining the track similarity of the first object and the second object according to the mean value of the shortest distances, wherein the smaller the mean value, the larger the track similarity of the first object and the second object.
11. The apparatus of claim 10, wherein the first determination module is further configured to perform the steps of:
transferring coordinates of each of the M objects in a first coordinate system to a target coordinate system, wherein each of the M objects is located in the first coordinate system;
transferring the coordinates of each of the N objects in a second coordinate system to the target coordinate system, wherein each of the N objects is located in the second coordinate system.
12. The apparatus of claim 11, wherein the first determination module is further configured to perform the steps of:
before transferring the coordinates of each of the M objects in the first coordinate system into the target coordinate system, acquiring a target coordinate value of a target object in the target coordinate system, a first coordinate value of the target object in the first coordinate system, and a second coordinate value of the target object in the second coordinate system, wherein the target object is an object located in the target coordinate system and located in the overlapping area;
calculating a first coordinate corresponding relation between the first coordinate system and the target coordinate system by using the target coordinate value and the first coordinate value;
and calculating a second coordinate corresponding relation between the second coordinate system and the target coordinate system by using the target coordinate value and the second coordinate value.
13. The apparatus of claim 10, wherein the first determination module is further configured to perform the steps of:
acquiring the moving track of the first object and the moving track of the second object within the overlapping area.
14. A storage medium storing a computer program, characterized in that the computer program when run performs the method of any one of claims 1 to 8.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 8 by means of the computer program.
CN201911143170.4A 2019-11-20 2019-11-20 Method and device for matching objects in video, storage medium and electronic device Active CN111047622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911143170.4A CN111047622B (en) 2019-11-20 2019-11-20 Method and device for matching objects in video, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN111047622A CN111047622A (en) 2020-04-21
CN111047622B true CN111047622B (en) 2023-05-30

Family

ID=70232755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911143170.4A Active CN111047622B (en) 2019-11-20 2019-11-20 Method and device for matching objects in video, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN111047622B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767839B (en) * 2020-06-28 2024-05-31 平安科技(深圳)有限公司 Vehicle driving track determining method, device, equipment and medium
CN111986227B (en) * 2020-08-26 2023-09-01 杭州海康威视数字技术股份有限公司 Track generation method, track generation device, computer equipment and storage medium
CN112632208B (en) * 2020-12-25 2022-12-16 际络科技(上海)有限公司 Traffic flow trajectory deformation method and device
CN114520920B (en) * 2022-04-15 2022-09-13 北京凯利时科技有限公司 Multi-machine-position video synchronization method and system and computer program product

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017166954A1 (en) * 2016-03-31 2017-10-05 努比亚技术有限公司 Apparatus and method for caching video frame and computer storage medium
CN108254727A (en) * 2017-12-08 2018-07-06 西安电子科技大学 A kind of radar plot condensing method based on Contour extraction
CN108307105A (en) * 2017-12-27 2018-07-20 努比亚技术有限公司 A kind of image pickup method, terminal and computer readable storage medium
CN109120877A (en) * 2018-10-23 2019-01-01 努比亚技术有限公司 Video recording method, device, equipment and readable storage medium storing program for executing
WO2019033956A1 (en) * 2017-08-14 2019-02-21 杭州海康威视数字技术股份有限公司 Monitoring video synchronization method and video camera
CN109511019A (en) * 2017-09-14 2019-03-22 中兴通讯股份有限公司 A kind of video summarization method, terminal and computer readable storage medium
CN109522814A (en) * 2018-10-25 2019-03-26 清华大学 A kind of target tracking method and device based on video data
CN109783514A (en) * 2018-12-26 2019-05-21 航天恒星科技有限公司 The observation time window quick calculation method of Optical remote satellite on a surface target
CN110290287A (en) * 2019-06-27 2019-09-27 上海玄彩美科网络科技有限公司 Multi-cam frame synchornization method
CN110446072A (en) * 2019-08-14 2019-11-12 咪咕视讯科技有限公司 Video stream switching method, electronic equipment and storage medium
CN110472487A (en) * 2019-07-03 2019-11-19 平安科技(深圳)有限公司 Living body user detection method, device, computer equipment and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719220A (en) * 2009-12-02 2010-06-02 北京航空航天大学 Method of trajectory clustering based on directional trimmed mean distance
JP2013055424A (en) * 2011-09-01 2013-03-21 Sony Corp Photographing device, pattern detection device, and electronic apparatus
US10225524B2 (en) * 2013-04-16 2019-03-05 Nec Corporation Information processing system, information processing method, and program
CN103605362B (en) * 2013-09-11 2016-03-02 天津工业大学 Based on motor pattern study and the method for detecting abnormality of track of vehicle multiple features
CN104112282B (en) * 2014-07-14 2017-01-11 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
JP6494455B2 (en) * 2015-07-13 2019-04-03 キヤノン株式会社 Video processing apparatus, video processing method, and program
GB201602877D0 (en) * 2016-02-18 2016-04-06 Landa Corp Ltd System and method for generating videos
CN107507418B (en) * 2017-08-09 2020-05-15 清华大学 Method and device for analyzing stay process of vehicles entering service area on highway
CN108875666B (en) * 2018-06-27 2023-04-18 腾讯科技(深圳)有限公司 Method and device for acquiring motion trail, computer equipment and storage medium
CN109376952B (en) * 2018-11-21 2022-10-18 深圳大学 Crowdsourcing logistics distribution path planning method and system based on track big data
CN109635059A (en) * 2018-11-23 2019-04-16 武汉烽火众智数字技术有限责任公司 People's vehicle association analysis method and system based on track similarity mode
CN110443285A (en) * 2019-07-16 2019-11-12 浙江大华技术股份有限公司 The determination method, apparatus and computer storage medium of similar track
CN110443828A (en) * 2019-07-31 2019-11-12 腾讯科技(深圳)有限公司 Method for tracing object and device, storage medium and electronic device


Similar Documents

Publication Publication Date Title
CN111047622B (en) Method and device for matching objects in video, storage medium and electronic device
US11145083B2 (en) Image-based localization
CN110866480B (en) Object tracking method and device, storage medium and electronic device
CN110310326B (en) Visual positioning data processing method and device, terminal and computer readable storage medium
CN111046752B (en) Indoor positioning method, computer equipment and storage medium
CN110969644B (en) Personnel track tracking method, device and system
CN109357679B (en) Indoor positioning method based on significance characteristic recognition
US11922658B2 (en) Pose tracking method, pose tracking device and electronic device
CN111583118B (en) Image stitching method and device, storage medium and electronic equipment
US11055927B2 (en) Method for building scene, electronic device and storage medium
CN112184768A (en) SFM reconstruction method and device based on laser radar and computer equipment
JP6662382B2 (en) Information processing apparatus and method, and program
CN111065044B (en) Big data based data association analysis method and device and computer storage medium
CN110111364B (en) Motion detection method and device, electronic equipment and storage medium
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
CN112258647B (en) Map reconstruction method and device, computer readable medium and electronic equipment
CN111665490B (en) Target tracking method and device, storage medium and electronic device
CN113489897A (en) Image processing method and related device
CN110340901B (en) Control method, control device and terminal equipment
CN116778550A (en) Personnel tracking method, device and equipment for construction area and storage medium
CN111385481A (en) Image processing method and device, electronic device and storage medium
CN110992426A (en) Gesture recognition method and apparatus, electronic device, and storage medium
CN113936042B (en) Target tracking method and device and computer readable storage medium
CN111310595A (en) Method and apparatus for generating information
CN111866366A (en) Method and apparatus for transmitting information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40022031

Country of ref document: HK

GR01 Patent grant