CN115908545A - Target track generation method and device, electronic equipment and medium

Info

Publication number
CN115908545A
Authority
CN
China
Prior art keywords
target
fusion
image
track
image frame
Prior art date
Legal status
Pending
Application number
CN202211088454.XA
Other languages
Chinese (zh)
Inventor
Song Rong (宋荣)
Liu Xiaodong (刘晓东)
Current Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Original Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Goldway Intelligent Transportation System Co Ltd
Publication of CN115908545A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a target track generation method and apparatus, an electronic device, and a medium. The method includes: for the image group acquired by each image acquisition device, obtaining the image position of each target in each image frame of the image group; converting the image position into a position in a world coordinate system according to a preset conversion relation to obtain the world position corresponding to the image position; for each image frame set, fusing the world positions of each identical target across the image frames in the set to obtain the fusion position of that target; and, for each target, associating the fusion positions of the target in the acquisition time order of the corresponding image frame sets to generate a fusion track of the target. With this method, the tracks of multiple targets can be generated, meeting the track generation requirements of many complex targets in scenes such as urban traffic.

Description

Target track generation method and device, electronic equipment and medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a target trajectory, an electronic device, and a medium.
Background
In some surveillance scenarios, multiple cameras may be deployed to capture moving targets in the scene. For example, multiple cameras are typically installed in areas such as intersections to capture surveillance video of targets such as vehicles passing through the intersection; similarly, multiple cameras may be set up in animal enclosures to capture the animals in them. The action tracks of targets such as vehicles and animals can then be generated from the captured surveillance video, and from those tracks it can be determined whether a vehicle has committed a violation such as running a red light, or whether an animal is behaving abnormally.
Existing methods for generating a target's action track mainly proceed as follows: detect the target in the surveillance video, construct a target re-identification model to match the initial position of a specific target of interest across the other cameras, and finally obtain the action track of that specific target through forward and backward analysis of the target track.
However, the above method can only generate the track of one specific target at a time, and cannot meet the track generation requirements of the many complex targets in scenes such as urban traffic.
Disclosure of Invention
The embodiments of the invention aim to provide a target track generation method and apparatus, an electronic device, and a medium, so as to generate action tracks of multiple targets.
In a first aspect, an embodiment of the present invention provides a method for generating a target trajectory, including:
aiming at an image group acquired by each image acquisition device, acquiring the image position of each target in each image frame in the image group;
converting the image position into a position under a world coordinate system according to a preset conversion relation to obtain a world position corresponding to the image position;
for each image frame set, fusing the world positions of the same targets in each image frame in the image frame set to obtain the fusion positions of the targets; the image frame set is an image set formed by all image frames with synchronous acquisition time;
and for each target, associating the fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target to generate a fusion track of the target.
Optionally, the number of image acquisition devices is at least 3;
the fusing the world positions of each identical target in each image frame in the image frame set to obtain the fused position of the target includes:
for any two image frames in the image frame set, according to the world position of each target in the image frame set, constructing a similarity matrix of each target in the two image frames;
solving the similarity matrix according to the Hungarian algorithm; and if the calculated similarity between two targets that are located at corresponding positions in the similarity matrix and belong to different image frames is greater than a preset similarity threshold, determining that the two targets are the same target;
and fusing the world positions of the same target in the image frame set to obtain a fused position, and taking the fused position as the fused position of the fused target corresponding to the same target.
Optionally, the constructing, for any two image frames in the image frame set, a similarity matrix of each object in the two image frames according to the world position of each object in the image frame set includes:
determining a speed direction of each target in the set of image frames based on the world location of the target;
and for any two image frames in the image frame set, constructing a similarity matrix of each target in the two image frames based on the world positions and the speed directions of the targets in the two image frames.
Optionally, after the fusing the world positions of the same target in the image frame set to obtain a fused position, and taking the fused position as the fused position of the fused target corresponding to the same target, the method further includes:
for each fusion target, if at least two targets from the same image group exist among the multiple identical targets corresponding to the fusion target, removing, from the at least two targets, each target whose distance to the fusion target is not the minimum distance, and updating the fusion position of the fusion target.
Optionally, before the associating, for each target, the fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target, and generating the fusion trajectory of the target, the method further includes:
for each image group, generating a single-camera trajectory for each target in the image group based on the plurality of world locations of the target;
for each target, associating the fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target to generate a fusion track of the target, including:
for each fusion target, if the current fusion position of the fusion target and the fusion positions in an existing track are derived from positions in the same single-camera track, determining that the current fusion position of the fusion target is associated with that existing track; the existing track is formed by associating fusion positions whose acquisition times precede that of the current fusion position of the fusion target;
if the current fusion position of the fusion target and the fusion positions in any existing track are not derived from positions in the same single-camera track, and the fusion target is in a preset motion state, determining that the current fusion position of the fusion target is not associated with any existing track;
if the current fusion position of the fusion target and the fusion positions in any existing track are not derived from positions in the same single-camera track, and the fusion target is not in the preset motion state, constructing an association matrix between the existing tracks and each fusion target in the image frame corresponding to the fusion target;
solving the association matrix according to the Hungarian algorithm, and if the degree of association between a fusion target and an existing track located at corresponding positions in the association matrix is greater than a preset association degree threshold, determining that the current fusion position of that fusion target is associated with that existing track;
and if all the fusion positions corresponding to the fusion target have been associated, generating a fusion track of the fusion target based on all the associated fusion positions of the fusion target.
Optionally, after determining that the fusion target is associated with the existing track, the method further includes:
the existing trajectory is updated based on the current fusion location of the fusion target.
Optionally, before the fusing, for each image frame set, the world position of each identical object in each image frame in the image frame set to obtain the fused position of the object, the method further includes:
for each image group, generating a single-camera trajectory for each target in the image group based on the plurality of world locations of the target;
matching the single camera tracks in any two image groups;
selecting any group of matched single-camera tracks, respectively calculating the distances from the start position and the end position of the first single-camera track in the group to each position of the second single-camera track, and respectively calculating the distances from the start position and the end position of the second single-camera track in the group to each position of the first single-camera track;
if there is a position in the second single-camera track whose distance to the start position or end position of the first single-camera track is smaller than a preset distance threshold, and there is a position in the first single-camera track whose distance to the start position or end position of the second single-camera track is smaller than the preset distance threshold, retaining the group of matched single-camera tracks; otherwise, deleting the group of single-camera tracks;
for each retained group of matched single-camera tracks, selecting the shortest distance among the distances from the start and end positions of the first single-camera track in the group to each position of the second single-camera track and the distances from the start and end positions of the second single-camera track in the group to each position of the first single-camera track, and determining the image frames in which the two positions corresponding to the shortest distance are respectively located as primary synchronization image frames;
and calculating the average of the acquisition times of each pair of primary synchronization image frames in any two image groups, and taking the image frames in the two image groups whose acquisition time is the average time as the final synchronization image frames of the two image groups.
In a second aspect, an embodiment of the present invention further provides a device for generating a target trajectory, including:
the position acquisition module is used for acquiring the image position of each target in each image frame in the image group aiming at the image group acquired by each image acquisition device;
the position conversion module is used for converting the image position into a position under a world coordinate system according to a preset conversion relation to obtain a world position corresponding to the image position;
the position fusion module is used for fusing the world positions of the same targets in the image frames in each image frame set aiming at each image frame set to obtain the fusion positions of the targets; the image frame set is an image set formed by all image frames with synchronous acquisition time;
and the track generation module is used for associating the fusion positions of the targets according to the acquisition time sequence of the image frame set corresponding to the targets so as to generate the fusion track of the targets.
Optionally, the number of image acquisition devices is at least 3;
the location fusion module includes:
the similarity matrix determination submodule is configured to, for any two image frames in the image frame set, construct a similarity matrix of the targets in the two image frames according to the world position of each target in the image frame set;
the target determination submodule is configured to solve the similarity matrix according to the Hungarian algorithm, and if the calculated similarity between two targets that are located at corresponding positions in the similarity matrix and belong to different image frames is greater than a preset similarity threshold, determine that the two targets are the same target;
and the position fusion sub-module is used for fusing the world positions of the same target in the image frame set to obtain a fused position, and taking the fused position as the fused position of the fused target corresponding to the same target.
Optionally, the similarity matrix determining submodule is specifically configured to determine a speed direction of each target in the image frame set based on the world position of the target; and for any two image frames in the image frame set, constructing a similarity matrix of each target in the two image frames based on the world positions and the speed directions of the targets in the two image frames.
Optionally, the apparatus further comprises:
a fusion position updating module configured to, for each fusion target, if at least two targets from the same image group exist among the multiple identical targets corresponding to the fusion target, remove, from the at least two targets, each target whose distance to the fusion target is not the minimum distance, and update the fusion position of the fusion target.
Optionally, the apparatus further comprises:
a single camera trajectory generation module to generate, for each image group, a single camera trajectory for each target in the image group based on the plurality of world locations of the target;
the track generation module is specifically configured to: for each fusion target, if the current fusion position of the fusion target and the fusion positions in an existing track are derived from positions in the same single-camera track, determine that the current fusion position of the fusion target is associated with that existing track, where the existing track is formed by associating fusion positions whose acquisition times precede that of the current fusion position of the fusion target; if the current fusion position of the fusion target and the fusion positions in any existing track are not derived from positions in the same single-camera track, and the fusion target is in a preset motion state, construct an association matrix between the existing tracks and each fusion target in the image frame corresponding to the fusion target; solve the association matrix according to the Hungarian algorithm, and if the degree of association between a fusion target and an existing track located at corresponding positions in the association matrix is greater than a preset association degree threshold, determine that the current fusion position of that fusion target is associated with that existing track; and if all the fusion positions corresponding to the fusion target have been associated, generate a fusion track of the fusion target based on all the associated fusion positions of the fusion target.
Optionally, the apparatus further comprises:
and the track updating module is used for updating the existing track based on the current fusion position of the fusion target.
Optionally, the apparatus further comprises:
a synchronization time determination module to generate, for each image group, a single camera trajectory for each target in the image group based on the plurality of world locations of the target; matching the single camera tracks in any two image groups; optionally selecting a group of matched single-camera tracks, and respectively calculating the distance from the starting position and the ending position of the first single-camera track in the group to each position of the second single-camera track; and calculating the distance between the start position and the end position of the second single-camera trajectory in the group to each position of the first single-camera trajectory, respectively; if a position exists in the second single-camera track, wherein the distance between the second single-camera track and the starting position or the ending position of the first single-camera track is smaller than a preset distance threshold, and a position exists in the first single-camera track, wherein the distance between the first single-camera track and the starting position or the ending position of the second single-camera track is smaller than a preset distance threshold, the group of matched single-camera tracks are reserved; otherwise, deleting the single-camera track of the group; aiming at each group of reserved matched single-camera tracks, selecting a shortest distance from the starting position and the ending position of a first single-camera track in the group to each position of a second single-camera track and the distance from the starting position and the ending position of the second single-camera track in the group to each position of the first single-camera track, and determining image frames where two positions corresponding to the shortest distance are respectively located as primary synchronous image frames; and calculating the average time of the acquisition time of each pair of primary synchronous image frames in any two image groups, and taking the image frame with the acquisition time of the average time in the two image groups as the final synchronous image frame of the two image groups.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor adapted to perform the method steps of any of the above first aspects when executing a program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps described in any one of the above first aspects.
The embodiment of the invention has the following beneficial effects:
by adopting the method provided by the embodiment of the invention, for the image group acquired by each image acquisition device, the image position of each target in each image frame of the image group is obtained; the image position is converted into a position in a world coordinate system according to a preset conversion relation to obtain the world position corresponding to the image position; for each image frame set, the world positions of each identical target across the image frames in the set are fused to obtain the fusion position of that target; and, for each target, the fusion positions of the target are associated in the acquisition time order of the corresponding image frame sets to generate a fusion track of the target. The method provided by the embodiment of the invention can generate the tracks of multiple targets and can meet the track generation requirements of many complex targets in scenes such as urban traffic.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can also be obtained from these drawings.
Fig. 1 is a flowchart of a target trajectory generation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an automatic correction method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a location fusion method according to an embodiment of the present invention;
FIG. 4 is a flowchart of generating a fused track of a target according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a trajectory rectification method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a target track generation apparatus according to an embodiment of the present invention;
fig. 7 is another schematic structural diagram of a target trajectory generation apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a target trajectory generation apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
In order to generate action tracks of multiple targets and meet track generation requirements of multiple complex targets in scenes such as urban traffic, the embodiment of the invention provides a target track generation method and device, electronic equipment, a storage medium and a computer program product.
First, a method for generating a target trajectory according to an embodiment of the present invention is described below. The method for generating the target trajectory provided by the embodiment of the present invention may be applied to any electronic device with an image processing function, and is not specifically limited herein.
Fig. 1 is a flowchart of a method for generating a target trajectory according to an embodiment of the present invention, as shown in fig. 1, where the method includes:
s101, aiming at an image group collected by each image collecting device, acquiring the image position of each target in each image frame in the image group.
And S102, converting the image position into a position under a world coordinate system according to a preset conversion relation, and obtaining a world position corresponding to the image position.
S103, for each image frame set, fusing the world positions of the same targets in the image frames in the image frame set to obtain the fused positions of the targets.
The image frame set is an image set formed by all image frames with synchronous acquisition time.
And S104, for each target, associating the fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target, and generating a fusion track of the target.
By adopting the method provided by the embodiment of the invention, for the image group acquired by each image acquisition device, the image position of each target in each image frame of the image group is obtained; the image position is converted into a position in a world coordinate system according to a preset conversion relation to obtain the world position corresponding to the image position; for each image frame set, the world positions of each identical target across the image frames in the set are fused to obtain the fusion position of that target; and, for each target, the fusion positions of the target are associated in the acquisition time order of the corresponding image frame sets to generate a fusion track of the target. The method provided by the embodiment of the invention can generate the tracks of multiple targets and can meet the track generation requirements of many complex targets in scenes such as urban traffic.
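For illustration only, the following Python sketch shows one way the S101-S104 flow could be wired together; all five helper callables are assumed placeholders for the steps described above, not names taken from the patent.

```python
# A high-level sketch of the S101-S104 pipeline. Every helper passed in is an
# assumed placeholder for the corresponding step, not part of the patent.
def generate_fusion_tracks(image_groups, transforms, detect_targets,
                           image_to_world, synchronized_sets,
                           fuse_frame_set, associate_over_time):
    # S101 + S102: detect targets per frame and convert image positions
    # to world positions with each device's preset conversion relation.
    world_groups = {
        cam: [[image_to_world(transforms[cam], u, v)
               for (u, v) in detect_targets(frame)]
              for frame in frames]
        for cam, frames in image_groups.items()}
    # S103: fuse the world positions of identical targets within each
    # acquisition-time-synchronized image frame set.
    fusion_positions = [fuse_frame_set(frame_set)
                        for frame_set in synchronized_sets(world_groups)]
    # S104: associate fusion positions over time into per-target fusion tracks.
    return associate_over_time(fusion_positions)
```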
The embodiment of the invention can be applied to scenes such as crossroads and the like. In the embodiment of the invention, a plurality of image acquisition devices are arranged in a specific application scene, the position and the angle of each image acquisition device are different, and each image acquisition device can acquire images at different angles in the application scene. Each image acquisition device can acquire a plurality of frames of images as an image group. Each frame of image may include one or more targets in the application scene, and the targets may specifically be movable targets such as animals and vehicles in the application scene. The image capturing device may be a camera, a video recorder, or the like.
For each image acquisition device, because the installation positions and angles of the devices differ in the actual application scene, the image position of a target in an image acquired by the device does not coincide with the target's corresponding position in the real world. Moreover, in application scenarios such as urban traffic, the target's track needs to be displayed on a high-precision map so that the target can be accurately positioned on that map. Therefore, the image position of each target in each image frame needs to be converted into a position in the world coordinate system, obtaining the target's world position.
In the embodiment of the present invention, for each image acquisition device, a conversion relationship between the image position of a target captured by the device and the corresponding world position in the world coordinate system may be determined. Specifically, the world position coordinates of a group of specified targets appearing in the image frames captured by the device may be obtained, for example, through actual distance measurement or from a high-precision map. Targets with obvious characteristics, such as lane line stop points, guide arrows, and fixed road facilities, can be selected as the specified targets. Then, for each image acquisition device, the coordinate transformation matrix corresponding to the device may be calculated from the world position coordinates of the selected specified targets in the world coordinate system and their image position coordinates in the image frame, using the following formula:
$$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}$$
the above formula can also be expressed as:
$$x' = a_{11}u + a_{12}v + a_{13}, \qquad y' = a_{21}u + a_{22}v + a_{23}, \qquad w' = a_{31}u + a_{32}v + a_{33}$$
or,
$$x' = \frac{a_{11}u + a_{12}v + a_{13}}{a_{31}u + a_{32}v + a_{33}}, \qquad y' = \frac{a_{21}u + a_{22}v + a_{23}}{a_{31}u + a_{32}v + a_{33}}$$
wherein
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
is the coordinate transformation matrix corresponding to the image acquisition device, $a_{11}$ to $a_{33}$ are its elements, x', y' and w' are respectively the abscissa, the ordinate and the homogeneous coordinate in the world position coordinates of the specified target, and u and v are respectively the abscissa and the ordinate in the image position coordinates of the specified target. Since the plane height directions before and after the position coordinate conversion are normalized, $a_{33} = 1$ and $w' = 1$.
Furthermore, the image coordinates and world position coordinates form a system of equations in the eight parameters $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}$; at least 4 pairs of image coordinates and world position coordinates are therefore required, and the coordinate transformation matrix corresponding to the image acquisition device can be determined by solving the following linear system (two rows per point pair) by least squares:
$$\begin{pmatrix} u_1 & v_1 & 1 & 0 & 0 & 0 & -u_1 x'_1 & -v_1 x'_1 \\ 0 & 0 & 0 & u_1 & v_1 & 1 & -u_1 y'_1 & -v_1 y'_1 \\ & & & & \vdots & & & \\ u_n & v_n & 1 & 0 & 0 & 0 & -u_n x'_n & -v_n x'_n \\ 0 & 0 & 0 & u_n & v_n & 1 & -u_n y'_n & -v_n y'_n \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{12} \\ a_{13} \\ a_{21} \\ a_{22} \\ a_{23} \\ a_{31} \\ a_{32} \end{pmatrix} = \begin{pmatrix} x'_1 \\ y'_1 \\ \vdots \\ x'_n \\ y'_n \end{pmatrix}, \qquad n \ge 4$$
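As an illustration, the least-squares solve above can be sketched in Python with NumPy; the function name and array layout here are assumptions, not part of the patent.

```python
import numpy as np

def estimate_transform(image_pts, world_pts):
    """image_pts, world_pts: arrays of shape (N, 2), N >= 4 point pairs."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(image_pts, world_pts):
        # Two rows per point pair, matching the linear system above.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.extend([x, y])
    params, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                                 rcond=None)
    a11, a12, a13, a21, a22, a23, a31, a32 = params
    return np.array([[a11, a12, a13],
                     [a21, a22, a23],
                     [a31, a32, 1.0]])   # a33 is normalized to 1
```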
In the process of actually calculating the coordinate transformation matrix corresponding to the image acquisition device, since the image acquisition device may have a certain distortion, the coordinate transformation matrix needs to be corrected by combining parameters of the image acquisition device, and specifically, the parameters of the image acquisition device may be multiplied on the basis of the coordinate transformation matrix calculated based on a group of specified targets to obtain a new coordinate transformation matrix. In addition, for an application scene with a more complex road condition, a coordinate transformation matrix corresponding to each group of designated targets can be respectively calculated according to a plurality of groups of designated targets in an image frame acquired by image acquisition equipment, and then a matrix corresponding to an average value of the coordinate transformation matrices corresponding to the plurality of groups of designated targets is calculated and used as a coordinate transformation matrix corresponding to the image acquisition equipment. The information of the coordinate transformation matrix corresponding to the multiple groups of specified targets is combined to be more complete, and the mapping relation between the image position coordinates and the world position coordinates can be represented more accurately.
In the embodiment of the invention, after the coordinate conversion matrix corresponding to each image acquisition device is obtained in advance, the coordinate conversion matrix can be used as a preset conversion relation, and the image position coordinates of each target in the image frame acquired by the image acquisition device are converted into world position coordinates in a world coordinate system. For example, the following formula can be used to convert the image position into a position in a world coordinate system, and a world position corresponding to the image position is obtained:
$$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = A \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}$$
or,
$$x' = \frac{a_{11}u + a_{12}v + a_{13}}{a_{31}u + a_{32}v + a_{33}}, \qquad y' = \frac{a_{21}u + a_{22}v + a_{23}}{a_{31}u + a_{32}v + a_{33}}$$
wherein $A$ is the coordinate transformation matrix corresponding to the image acquisition device with elements $a_{11}$ to $a_{33}$, x', y' and w' are respectively the abscissa, the ordinate and the homogeneous coordinate in the world position, and u and v are respectively the abscissa and the ordinate in the image position.
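A minimal usage sketch of the conversion, assuming a matrix A such as the one returned by the estimation sketch above:

```python
import numpy as np

def image_to_world(A, u, v):
    # Homogeneous mapping followed by normalization by w'.
    x_h, y_h, w_h = A @ np.array([u, v, 1.0])
    return x_h / w_h, y_h / w_h   # world abscissa and ordinate
```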
In the embodiment of the invention, image frame synchronization processing needs to be carried out on different image acquisition devices so as to ensure that the positions of all the same targets for subsequent position fusion are at the same time. Specifically, time synchronization correction can be performed on each image acquisition device in a manual correction manner, so that it is ensured that each image acquisition device starts to acquire images at the same time, and the frequency of acquiring the images is the same. In the embodiment of the present invention, a set formed by image frames acquired by different image acquisition devices at the same time may also be determined as an image frame set according to timestamps of the image frames acquired by the image acquisition devices, so as to ensure that acquisition times of the image frames included in the image frame set are synchronous.
In a possible implementation manner, an automatic correction manner may also be adopted to perform synchronous correction on each image capturing device, specifically, fig. 2 is a flowchart of the automatic correction manner provided by the embodiment of the present invention, and as shown in fig. 2, the correction method includes:
s201, aiming at each image group, generating a single-camera track of each target based on a plurality of world positions of the target in the image group.
Each image group is formed by a plurality of image frames acquired by the same image acquisition equipment.
Specifically, the following steps A1-A2 may be employed to generate a single-camera trajectory for the target:
step A1: for each image group, the respective targets in the first image frame in the image group may be regarded as single-camera trajectories, respectively, and single-camera trajectory identifications of the respective single-camera trajectories may be generated. That is, for each image group, a single-camera track corresponding to each target in the first image frame in the image group may be generated, and a single-camera track identifier of each single-camera track may be generated.
For example, if object 1, object 2, and object 3 are included in the first image frame in the image group, then corresponding single-camera trajectories may be generated for object 1, object 2, and object 3, respectively, and an identification of the single-camera trajectories, illustratively, trajectory 1, trajectory 2, and trajectory 3, may be generated, respectively.
Step A2: for each target in the next image frame in the image group, the minimum distance between the target and each single-camera track in the previous image frame in the image group can be calculated according to the world position of the target; and if the minimum distance is smaller than a preset threshold value, determining that the target in the image frame is the same as the target in the single-camera track, taking the world position of the target as a track point of the single-camera track, and updating the single-camera track.
That is, for each target in the next image frame in the image set, the minimum distance between the target and each single-camera trajectory in the previous image frame in the image set may be calculated based on the world location of the target at the time the next image frame was acquired. If the minimum distance is smaller than a preset threshold value, it is indicated that the world position of the target when the next image frame is acquired conforms to the movement rule of the target represented by the single-camera track corresponding to the minimum distance, it may be determined that the target in the image frame is the same as the target corresponding to the single-camera track, and the world position of the target is used as a track point of the single-camera track to update the single-camera track.
The preset threshold may be set according to practical applications, for example, if the target is a vehicle, the preset threshold may be determined according to an average speed of the vehicle at the intersection and a collection time interval of two adjacent video frames in the image group, which is not specifically limited herein.
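The following Python sketch illustrates steps A1-A2 under assumed data structures; measuring the distance to the latest point of each track is one plausible reading of "the minimum distance between the target and each single-camera track", not necessarily the patent's exact definition.

```python
import numpy as np

def update_single_camera_tracks(tracks, detections, dist_thresh):
    """tracks: dict track_id -> list of (x, y) world positions.
    detections: world positions of the targets in the next image frame."""
    next_id = max(tracks, default=-1) + 1
    for pos in detections:
        pos = np.asarray(pos, dtype=float)
        best_id, best_dist = None, float("inf")
        for tid, pts in tracks.items():
            d = float(np.linalg.norm(pos - np.asarray(pts[-1])))
            if d < best_dist:
                best_id, best_dist = tid, d
        if best_id is not None and best_dist < dist_thresh:
            tracks[best_id].append(tuple(pos))   # same target: extend the track
        else:
            tracks[next_id] = [tuple(pos)]       # new target: start a new track
            next_id += 1
    return tracks
```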
In the embodiment of the invention, a single-camera track of the target can also be generated using multi-object tracking algorithms such as DeepSORT and FairMOT.
And S202, matching the tracks of the single camera in any two image groups.
In this embodiment, if the target has a vehicle identifier, the single-camera tracks in any two image groups may be matched according to the vehicle identifier of the target, and the single-camera tracks of the target having the same vehicle identifier may be used as a group of matched single-camera tracks. Wherein the vehicle identification is used to uniquely represent the vehicle object.
In the present embodiment, the matching degree between the single-camera tracks in any two image groups may be calculated from the distance between the single-camera tracks, and the pair of single-camera tracks with the highest matching degree may be taken as a group of matched single-camera tracks. For example, consider any two image groups: an image group A composed of image frames acquired by image acquisition device 1 and an image group B composed of image frames acquired by image acquisition device 2. For each single-camera track in image group A, a similarity, such as the cosine similarity, between that track and each single-camera track in image group B may be calculated as the matching degree; the single-camera track in image group B with the highest matching degree is then selected as the preliminary matching single-camera track of that track. If the matching degree between the track and its preliminary matching single-camera track reaches a preset matching degree threshold, the two may be determined as a group of matched single-camera tracks. The preset matching degree threshold may be set according to the actual application scenario and is not specifically limited here. In this embodiment, after the matching degree reaches the preset matching degree threshold, it may further be checked whether the lane number of the lane where the single-camera track is located is consistent with that of its preliminary matching single-camera track; if so, the two tracks may be determined as a group of matched single-camera tracks.
A set of matching single camera trajectories includes a first single camera trajectory and a second single camera trajectory from different sets of images.
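As a sketch of the distance-based matching in S202, assuming each single-camera track has been summarized into a fixed-length descriptor vector (the patent does not fix this representation):

```python
import numpy as np

def match_single_camera_tracks(tracks_a, tracks_b, match_thresh):
    """tracks_a, tracks_b: dicts track_id -> 1-D descriptor vector."""
    groups = []
    for ida, da in tracks_a.items():
        best_id, best_sim = None, -1.0
        for idb, db in tracks_b.items():
            # Cosine similarity as the matching degree.
            sim = float(da @ db / (np.linalg.norm(da) * np.linalg.norm(db)))
            if sim > best_sim:
                best_id, best_sim = idb, sim
        if best_id is not None and best_sim >= match_thresh:
            groups.append((ida, best_id))   # a group of matched tracks
    return groups
```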
S203, selecting a group of matched single-camera tracks, and respectively calculating the distance from the starting position and the ending position of the first single-camera track in the group to each position of the second single-camera track; and calculating the distance between the start position and the end position of the second single-camera track in the group to each position of the first single-camera track respectively.
That is, a set of matched single-camera trajectories is optionally selected, the distance between the start position of the first single-camera trajectory in the set to each position of the second single-camera trajectory is calculated separately, and the distance between the end position of the first single-camera trajectory in the set to each position of the second single-camera trajectory is calculated separately. And respectively calculating the distance from the start position of the second single-camera trajectory in the group to each position of the first single-camera trajectory, and respectively calculating the distance from the end position of the second single-camera trajectory in the group to each position of the first single-camera trajectory.
Specifically, the distance of world position coordinates between the positions is calculated in this step.
For example, a single camera trajectory a in image group 1 matches a single camera trajectory B in image group 2, where the single camera trajectory a is a first single camera trajectory and the single camera trajectory B is a second single camera trajectory. In this step, the distance between the start position of the single-camera trajectory a and each position of the single-camera trajectory B, and the distance between the end position of the single-camera trajectory a and each position of the single-camera trajectory B, and the distance between the start position and the end position of the single-camera trajectory B and each position of the single-camera trajectory a, and the distance between the end position of the single-camera trajectory B and each position of the single-camera trajectory a may be calculated.
S204, if a position exists in the second single-camera track, wherein the distance between the second single-camera track and the starting position or the ending position of the first single-camera track is smaller than a preset distance threshold, and a position exists in the first single-camera track, wherein the distance between the first single-camera track and the starting position or the ending position of the second single-camera track is smaller than a preset distance threshold, the matched single-camera track of the group is reserved; otherwise, the single camera track of the group is deleted.
In the embodiment of the present invention, the preset distance threshold may be set to 1 meter or 2 meters, etc.
For example, if the preset distance threshold is set to 1 meter, for the matched single-camera track a and single-camera track B, if the distance between the third position in the single-camera track B and the start position of the single-camera track a is less than 1 meter, and the distance between the fourth position in the single-camera track a and the start position of the single-camera track B is less than 1 meter, the matched single-camera track a and the matched single-camera track B are retained.
And if the distance between any position in the single-camera track B and the initial position of the single-camera track A is not less than 1 meter, the distance between any position in the single-camera track B and the end position of the single-camera track A is not less than 1 meter, the distance between any position in the single-camera track A and the initial position of the single-camera track B is not less than 1 meter, and the distance between any position in the single-camera track A and the end position of the single-camera track B is not less than 1 meter, deleting the matched single-camera track A and the matched single-camera track B.
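The S204 retention test can be sketched as follows, assuming each track is a list of (x, y) world positions:

```python
import numpy as np

def keep_matched_pair(track_a, track_b, dist_thresh):
    def endpoint_close(endpoints, track):
        return any(np.linalg.norm(np.asarray(e) - np.asarray(p)) < dist_thresh
                   for e in endpoints for p in track)
    # Retain the pair only if each track has a position close to an endpoint
    # (start or end) of the other track.
    return (endpoint_close((track_a[0], track_a[-1]), track_b)
            and endpoint_close((track_b[0], track_b[-1]), track_a))
```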
S205, aiming at each group of the reserved matched single-camera tracks, selecting the shortest distance from the starting position and the ending position of the first single-camera track in the group to each position of the second single-camera track and the distance from the starting position and the ending position of the second single-camera track in the group to each position of the first single-camera track, and determining the image frames of the two positions corresponding to the shortest distance as primary synchronous image frames.
For example, suppose the matched single-camera track A and single-camera track B are retained. If, among all the distances from the start and end positions of track A to each position of track B and from the start and end positions of track B to each position of track A, the distance between the start position of track A and the third position of track B is the shortest, then the image frame in which the start position of track A is located and the image frame in which the third position of track B is located may be taken as primary synchronization image frames.
S206, calculating the average time of the acquisition time of each pair of primary synchronous image frames in any two image groups, and taking the image frame with the acquisition time being the average time in the two image groups as the final synchronous image frame of the two image groups.
For example, for image group 1 and image group 2, suppose image frame A1 in image group 1 and image frame B1 in image group 2 are primary synchronization image frames, and image frame A2 in image group 1 and image frame B2 in image group 2 are primary synchronization image frames, with acquisition times $t_{A1}$, $t_{A2}$, $t_{B1}$ and $t_{B2}$ respectively. The average time can then be calculated as $t_{avg} = (t_{A1} + t_{A2} + t_{B1} + t_{B2})/4$. The image frame A3 in image group 1 whose acquisition time is $t_{avg}$ and the image frame B3 in image group 2 whose acquisition time is $t_{avg}$ are taken as the final synchronization image frames of the two image groups.
If an image group contains no image frame whose acquisition time is exactly $t_{avg}$, the image frame in that group whose acquisition time differs least from $t_{avg}$ can be taken as that group's final synchronization image frame.
Moreover, an image frame in image group 1 that is a given number of frames after image frame A3 is synchronized in acquisition time with the image frame in image group 2 that is the same number of frames after image frame B3. That is, image frame A3 is synchronized with image frame B3 in acquisition time, and the corresponding image frames acquired after them are synchronized as well.
Based on the method described in fig. 2, image frames with synchronized acquisition time between each image group can be automatically determined.
If multiple groups of matched single-camera tracks exist between any two image groups, the number of frames of the time-synchronized image frames can be calculated according to each group of matched single-camera tracks, and the average value of the number of frames of the time-synchronized image frames determined by the multiple groups of matched single-camera tracks is used as a time synchronization frame. For example, a single camera track a in the image group 1 is matched with a single camera track B in the image group 2, and it is determined that the acquisition time of the first image frame in the image group 1 is synchronized with the acquisition time of the third image frame in the image group 2 based on the single camera track a and the single camera track B, a single camera track C in the image group 1 is matched with a single camera track D in the image group 2, and it is determined that the acquisition time of the first image frame in the image group 1 is synchronized with the acquisition time of the fifth image frame in the image group 2 based on the single camera track C and the single camera track D, then an average value of the number of matching frames may be taken, and the acquisition time synchronization of the first image frame in the image group 1 and the acquisition time synchronization of the fourth image frame in the image group 2 is finally obtained.
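The selection of primary synchronization image frames in S205 can be sketched as follows for one pair of matched tracks, assuming each track is a list of (frame_index, world_position) entries:

```python
import numpy as np

def primary_sync_frames(track_a, track_b):
    """Returns the frame indices in groups A and B holding the two positions
    that realize the shortest endpoint-to-track distance."""
    candidates = []
    for end in (track_a[0], track_a[-1]):        # start and end of track A
        for idx_b, pos_b in track_b:
            d = float(np.linalg.norm(np.asarray(end[1]) - np.asarray(pos_b)))
            candidates.append((d, end[0], idx_b))
    for end in (track_b[0], track_b[-1]):        # start and end of track B
        for idx_a, pos_a in track_a:
            d = float(np.linalg.norm(np.asarray(end[1]) - np.asarray(pos_a)))
            candidates.append((d, idx_a, end[0]))
    _, frame_a, frame_b = min(candidates)        # shortest distance wins
    return frame_a, frame_b
```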
By adopting the method provided by the embodiment of the invention, aiming at each target, the fusion positions of the targets are associated according to the acquisition time sequence of the image frame set corresponding to the target, and the fusion track of the target is generated. In addition, the method provided by the embodiment of the invention provides more reliable space-time information for the track fusion of the target by synchronizing the image frames acquired by the plurality of image acquisition devices.
In the embodiment of the present invention, the number of the image capturing devices may be 2, 3, or more, and is not limited specifically herein.
In a possible implementation manner, fig. 3 is a flowchart of a location fusion method provided by an embodiment of the present invention, and as shown in fig. 3, for an application scenario in which the number of image capturing devices is 3 and more than 3, the fusing world locations of each identical object in each image frame in the image frame set to obtain a fused location of the object may include:
s301, for any two image frames in the image frame set, according to the world position of each object in the image frame set, constructing a similarity matrix of each object in the two image frames.
In this embodiment of the present invention, for any two image frames in the image frame set, the similarity matrix of each object in the two image frames is constructed according to the world position of each object in the image frame set, and specifically, the following steps B1 to B2 may be adopted:
step B1, based on the world position of each target in the image frame set, determining the speed direction of the target.
Specifically, for each target, a position difference value between a world position corresponding to the image position of the target in the current image frame and a world position corresponding to the image position of the target in the previous image frame of the current image frame, that is, a displacement of the target in the acquisition time interval of the current image frame and the previous image frame may be calculated, a ratio between the position difference value and the acquisition time difference of the two image frames is determined as a speed of the target, and a speed direction of the target is indicated based on the positive or negative of the speed.
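For example, a minimal computation under these definitions (an assumed helper; the signed components of the 2-D vector carry the speed direction):

```python
import numpy as np

def target_velocity(prev_pos, cur_pos, dt):
    """prev_pos, cur_pos: (x, y) world positions; dt: acquisition interval."""
    return (np.asarray(cur_pos, float) - np.asarray(prev_pos, float)) / dt
```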
And B2, aiming at any two image frames in the image frame set, and constructing a similarity matrix of each target in the two image frames based on the world positions and the speed directions of the targets in the two image frames.
In order to avoid missing the track of the target, in the present embodiment, the targets acquired by every two arbitrary image acquisition devices may be matched first to determine the same target in the targets acquired by the two image acquisition devices.
Specifically, for any two image frames in the image frame set, such as the image frame a and the image frame B, if the image frame a includes m objects and the image frame B includes n objects, the cosine similarity between the velocity vectors of the objects in the image frame a and the image frame B may be calculated according to the world positions of the objects in the image frame a and the image frame B and the velocity directions of the objects in the image frame a and the image frame B, and the cosine similarity is used as the similarity, so as to obtain an m × n similarity matrix.
That is, the speed vectors of the respective objects in the image frame a and the image frame B may be calculated according to the world positions of the respective objects in the image frame a and the image frame B and the speed directions of the respective objects in the image frame a and the image frame B, further, cosine similarities between the speed vectors including m objects in the image frame a and the speed vectors including n objects in the image frame B are calculated, and the similarities are arranged in an m × n matrix to obtain an m × n similarity matrix.
In the embodiment of the present invention, besides the manner mentioned in the above steps, depending on the actual application scenario, the appearance similarity, posture similarity, target type similarity, and the like between targets may be determined from features such as the targets' appearance, posture, and type, and these similarities may be superimposed on the cosine similarity between the targets' velocity vectors to obtain the corresponding m × n similarity matrix. The target type may be an animal, a vehicle, a sign board, or the like; the appearance features of a target may be the vehicle identification of a vehicle, the appearance features of an animal, and the like; and the posture features of a target may be its world position, speed, speed direction, and the like.
In the embodiment of the invention, if the target is a vehicle, the vehicle identification of the target can also be acquired, and the similarity between any two targets positioned in different image frames in the two image frames is calculated according to the vehicle identification.
S302, solving the similarity matrix according to the Hungarian algorithm; and if the calculated similarity between two targets that are located at corresponding positions in the similarity matrix and belong to different image frames is greater than a preset similarity threshold, determining that the two targets are the same target.
In the embodiment of the invention, after the similarity matrix of each target in the two image frames is obtained, the similarity matrix can be solved by using the Hungarian algorithm to obtain the one-to-one matching relation among the targets. For example, if image frame A1 in the set of image frames includes 5 objects: a1, a2, a3, a4, and a5, the image frame B1 includes 3 objects: b1, B2 and B3, wherein the similarity matrix of each target in the image frame A1 and the image frame B1 is a5 multiplied by 3 matrix, and the similarity matrix can be solved by using Hungarian algorithm to obtain a one-to-one matching relation among the targets: a1 matches b2, a2 matches b1, and a4 matches b3.
In the embodiment of the present invention, for each image group, a single-camera track of each target may be generated based on multiple world positions of each target in the image group, and specifically, the single-camera track of the target may be generated by adopting the above steps A1 to A2, which is not described herein again.
For any two image frames in the image frame set, after the similarity matrix of the targets in the two image frames is constructed according to the world position of each target in the image frame set, if a historical association exists between a pair of targets respectively located in the two image frames, the similarity between the two targets may be directly set to 1. For example, suppose image frame A1 in the image frame set includes 5 targets: a1, a2, a3, a4 and a5, and image frame B1 includes 3 targets: b1, b2 and b3. If, in the previous frame, the target in the single-camera track to which a1 belongs was matched with the target in the single-camera track to which b2 belongs, the similarity between target a1 and target b2 may be directly set to 1.
After the similarity matrix is solved, if the calculated similarity between two targets that occupy corresponding positions in the matrix and are located in different image frames is greater than the preset similarity threshold, the two targets are determined to be the same target. For example, suppose the solution yields the matches a1 with b2, a2 with b1, and a4 with b3; that is, those pairs occupy corresponding positions in the similarity matrix. It can then be further determined whether each pair's similarity exceeds the preset threshold. If the similarity between a1 and b2 and the similarity between a4 and b3 exceed the threshold but the similarity between a2 and b1 does not, then a1 and b2 are determined to be the same target, a4 and b3 are determined to be the same target, and a2 and b1 are not the same target. The preset similarity threshold may be set according to the actual application scenario and is not specifically limited here.
S303, fusing the world positions of each set of same targets in the image frame set to obtain a fused position, and taking the fused position as the fusion position of the fusion target corresponding to those same targets.
Specifically, the average of the world coordinates of the same target's world positions across the image frame set may be calculated, this average taken as the fused position, and the fused position used as the fusion position of the corresponding fusion target.
In this step, the matching results of all pairs of targets located in different image frames of the image frame set are integrated to obtain the fusion positions of the fusion targets corresponding to every set of same targets in the image frame set.
In this embodiment, each fusion target should match at most one target in the image frames acquired by any single image acquisition device; that is, a target captured by a given device may appear in only one fusion target. To guarantee this, after the world positions of the same target are fused and the result taken as the fusion position of the corresponding fusion target, the method further includes: for each fusion target, if at least two of its constituent targets come from the same image group, removing from those targets every one whose distance to the fusion target is not the minimum, and then updating the fusion position of the fusion target. In other words, the distances between the at least two targets and the fusion target are calculated, only the nearest target is retained, the others are removed, and the average of the world coordinates of the remaining targets is recomputed as the updated fusion position of the fusion target. A sketch of this rule follows.
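The sketch below illustrates step S303 together with the de-duplication rule just described: average the world positions of a matched group of same targets, then keep only the nearest member per image group and re-average. The record layout (an image-group identifier paired with world coordinates) is an illustrative assumption.

```python
import numpy as np

def fuse_position(members):
    """members: list of (image_group_id, xy world position) for one fusion target."""
    # First pass: plain average of all matched world positions.
    fused = np.mean([xy for _, xy in members], axis=0)
    kept = {}
    for group_id, xy in members:
        # Within each image group, retain only the member closest to the fused position.
        d = np.linalg.norm(xy - fused)
        if group_id not in kept or d < kept[group_id][0]:
            kept[group_id] = (d, xy)
    # Second pass: re-average over the retained members as the updated fusion position.
    return np.mean([xy for _, xy in kept.values()], axis=0)

members = [("cam1", np.array([3.0, 4.0])),
           ("cam2", np.array([3.2, 4.1])),
           ("cam2", np.array([9.0, 1.0]))]  # duplicate from cam2, far away
print(fuse_position(members))  # ~[3.1, 4.05] after the outlier is removed
```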
After the fusion targets are obtained, the features of each fusion target may be updated. Specifically, the features of a fusion target may include the world coordinates of its fusion position, its appearance feature, its target type, its vehicle identification, and the like. The world coordinates of the fusion target are the average of the world coordinates of the constituent targets across the image acquisition devices from which it is fused; the target type and vehicle identification of the fusion target adopt the type and lane number of the constituent target closest to the fusion target; and the appearance feature of the fusion target is the average of the appearance features of the constituent targets across those devices.
With the method provided by this embodiment of the invention, for each target, the fusion positions of the target are associated according to the acquisition time sequence of the corresponding image frame sets, generating a fusion track of the target. Moreover, by synchronizing the image frames acquired by multiple image acquisition devices, the method provides more reliable spatio-temporal information for track fusion. And because position association combines features such as world coordinates, vehicle identification, motion speed, and appearance, the association result is more reliable and widely applicable, not limited to specific categories of targets.
In another possible embodiment, before the fusion positions of each target are associated according to the acquisition time sequence of the corresponding image frame sets to generate the target's fusion track, a single-camera track of each target in each image group may be generated based on the target's multiple world positions. Specifically, the single-camera track may be generated using steps A1 to A2 described above, which are not repeated here. A single-camera track identifier may also be generated for each single-camera track.
Fig. 4 is a flowchart of generating a fusion track of a target according to an embodiment of the present invention. As shown in Fig. 4, the step of associating, for each target, the fusion positions of the target according to the acquisition time sequence of the corresponding image frame sets to generate the target's fusion track may specifically include:
S401, for each fusion target, if the source of the current fusion position of the fusion target and the source of a fusion position in an existing track both include positions in the same single-camera track, determining that the current fusion position of the fusion target is associated with that existing track.
Here, the existing track is formed by associating the fusion positions of targets whose acquisition times precede that of the fusion target's current fusion position; that is, targets whose corresponding image frames were acquired before the image frame corresponding to the fusion target.
If the current fusion position of the fusion target is obtained by fusing the group of image frames with the earliest acquisition time among the groups of synchronized image frames, the fusion target may be used directly as an existing track, and a unified fusion track identifier is assigned to that existing track.
Then, in acquisition time order, for each fusion target whose corresponding image frame is acquired after the first one: if the source of its current fusion position and the source of a fusion position in an existing track both include positions in the same single-camera track, the current fusion position is determined to be associated with that existing track. For example, suppose the sources of the fusion positions in existing track A include a position in single-camera track 1 of image acquisition device 1, a position in single-camera track 2 of device 2, a position in single-camera track 3 of device 3, and a position in single-camera track 4 of device 4; and the sources of the current fusion position of fusion target a include a position in single-camera track 1 of device 1, a position in single-camera track 5 of device 2, a position in single-camera track 6 of device 3, and a position in single-camera track 7 of device 4. Because both source sets include a position in single-camera track 1 of device 1, the current fusion position of fusion target a can be determined directly to be associated with existing track A. Track A can then be updated with the current fusion position of fusion target a, i.e., that position becomes a new track point of track A, and a new fusion track identifier is assigned to the updated track, as sketched below.
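A minimal sketch of this source check follows: representing each source as a (device, single-camera track) identifier pair, association holds when the two source sets intersect. The identifier encoding is an illustrative assumption.

```python
def shares_single_camera_track(fusion_sources: set, track_sources: set) -> bool:
    """Each source is a (device_id, single_camera_track_id) pair."""
    return not fusion_sources.isdisjoint(track_sources)

track_a = {("dev1", 1), ("dev2", 2), ("dev3", 3), ("dev4", 4)}
fusion_a = {("dev1", 1), ("dev2", 5), ("dev3", 6), ("dev4", 7)}
print(shares_single_camera_track(fusion_a, track_a))  # True: both contain ("dev1", 1)
```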
Specifically, in this embodiment, the source of the current fusion position of the fusion target and the source of the fusion positions in the existing track may be determined from the single-camera track identifiers and the fusion track identifiers.
S402, if the source of the current fusion position of the fusion target does not share a position in the same single-camera track with the source of the fusion positions in any existing track, and the current fusion position of the fusion target is in a preset motion state, determining that the current fusion position of the fusion target is not associated with any existing track.
Here, the preset motion state is the state in which the fusion target is driving away from the intersection.
In this embodiment, if the single-camera track identifiers of an existing track's sources are in a lost state and the track is in the motion state of driving away from the intersection, the existing track is prohibited from being associated with any fusion target; if the single-camera track identifiers of the sources are in a cancelled state and the track is driving away from the intersection, the existing track is cancelled directly, i.e., deleted.
S403, if the source of the current fusion position of the fusion target does not share a position in the same single-camera track with the source of the fusion positions in any existing track, and the current fusion position of the fusion target is not in the preset motion state, constructing an association-degree matrix between each fusion target in the image frame corresponding to the fusion target and the existing tracks.
If the source of the current fusion position of the fusion target does not share a position in the same single-camera track with the source of the fusion positions in any existing track, a follow-single-camera-track flag may be set to 0. A flag value of 0 indicates that the existing track associated with the current fusion position need not be determined from the single-camera tracks; a flag value of 1 indicates that the associated existing track must be determined by checking whether the source of the current fusion position and the source of the existing track share the same single-camera track.
Specifically, an association degree can be computed from the world-coordinate distance between each fusion target and each existing track, yielding an association-degree matrix between the fusion targets in the corresponding image frame and the existing tracks. With this matrix, local association processing and global association processing between fusion targets and existing tracks are performed. Local association: based on the association matrix, the Hungarian algorithm may be used to compute the association result between the fusion targets and the existing tracks. Global association: based on the association matrix, the Hungarian algorithm may be used to compute the association result between the fusion targets and those existing tracks whose source single-camera track identifiers are in the lost state. According to the local and global association results, each matched fusion target is assigned a new fusion track identifier and its track features are updated; an unmatched fusion target that satisfies the track-creation condition is made into a new existing track; and an existing track that matches no fusion target is set to the lost state. An unmatched fusion target satisfies the track-creation condition if it was obtained by fusing the group of image frames with the earliest acquisition time among the groups of synchronized image frames.
S404, solving the association matrix with the Hungarian algorithm; and, if the calculated association degree between a fusion target and an existing track occupying corresponding positions in the association matrix is greater than a preset association-degree threshold, determining that the current fusion position of the fusion target is associated with that existing track.
The preset association-degree threshold may be set according to the actual application and is not specifically limited here.
In this embodiment of the invention, the association matrix can be solved with the Hungarian algorithm to obtain a one-to-one matching relationship between the fusion targets in the corresponding image frame and the existing tracks. For example, if the image frame contains 3 fusion targets (c1, c2, and c3) and the existing tracks are track A and track B, the association matrix is a 3 × 2 matrix; solving it with the Hungarian algorithm yields the one-to-one matches: c1 matches existing track B, and c2 matches existing track A.
If the association degree between c1 and existing track B is greater than the preset association-degree threshold, the current fusion position of fusion target c1 is determined to be associated with track B. If the association degree between c2 and existing track A is not greater than the threshold, the current fusion position of fusion target c2 is determined not to be associated with track A.
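As an illustration of steps S403 and S404, the hedged sketch below builds an association-degree matrix from world-coordinate distances and solves it with the Hungarian algorithm; the 1/(1+d) conversion and the 0.5 threshold are illustrative assumptions, not the patent's formula.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(fusion_positions, track_positions, threshold=0.5):
    """fusion_positions: (m, 2) fusion targets; track_positions: (n, 2) latest track points."""
    d = np.linalg.norm(fusion_positions[:, None, :] - track_positions[None, :, :],
                       axis=-1)
    assoc = 1.0 / (1.0 + d)  # association degree grows as distance shrinks
    rows, cols = linear_sum_assignment(-assoc)  # maximize total association degree
    # Keep only pairs whose association degree clears the preset threshold.
    return [(i, j) for i, j in zip(rows, cols) if assoc[i, j] > threshold]

fusion = np.array([[10.0, 5.0], [40.0, 2.0], [70.0, 9.0]])  # c1, c2, c3
tracks = np.array([[39.5, 2.2], [10.3, 5.1]])               # track A, track B
print(associate(fusion, tracks))  # [(0, 1), (1, 0)]: c1 with B, c2 with A
```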
After a fusion target is determined to be associated with an existing track, the track may be updated based on the fusion target's current fusion position: the current fusion position becomes a new track point of the existing track, and a new fusion track identifier is assigned to the updated track.
S405, if all fusion positions corresponding to the fusion target have been associated, generating the fusion track of the fusion target based on all of the associated fusion positions of the fusion target.
With the method provided by this embodiment of the invention, for each target, the fusion positions are associated according to the acquisition time sequence of the corresponding image frame sets, generating the target's fusion track. The fusion track can be generated in real time, so the method suits scenarios with high real-time requirements, such as smart intersections. Synchronizing the image frames acquired by multiple image acquisition devices provides more reliable spatio-temporal information for track fusion, and combining features such as world coordinates, lane numbers, motion speed, and appearance makes the association result more reliable and widely applicable, not limited to specific categories of targets. Using single-camera tracks for track fusion makes the fused track more reliable and effectively reduces fusion track identifier switch-backs. Finally, the method needs no manually acquired relation matrix for the regions covered by the image acquisition devices, so it has lower usage cost and better universality.
The method provided by the embodiment of the invention can be applied to a real-time scene to generate the fusion track of the target in real time.
In one possible implementation, for scenarios that do not require fully real-time output, the fusion track can be rectified by delayed correction, reducing jumps in the fusion track. Fig. 5 is a schematic flowchart of a track rectification method according to an embodiment of the present invention. As shown in Fig. 5, for an application scenario using delayed correction, the track is first buffered with a delay, then smoothed, and finally the smoothed track is output. Specifically, the fusion track over a fixed window of n frames may be cached, with n set to, for example, 10 or 20 frames; the average of the world coordinates of the fusion track over the first i frames is then computed one frame at a time, for i = 0, 1, 2, ..., n; and the current smoothed fusion track is output frame by frame, rectifying the fusion track and making the output smoother. For application scenarios that do not use delayed correction, the fusion track can be output directly. A sketch of the smoothing step follows.
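A minimal sketch of this delay-and-smooth rectification is given below, assuming a per-track buffer of the last n fused positions whose running mean is emitted with a fixed delay; the buffer length and class layout are illustrative assumptions.

```python
from collections import deque
import numpy as np

class TrackSmoother:
    def __init__(self, n: int = 10):
        self.buffer = deque(maxlen=n)  # fixed-length cache of fused positions

    def push(self, xy) -> np.ndarray:
        """Add the newest fused position; return the current smoothed position."""
        self.buffer.append(np.asarray(xy, dtype=float))
        # Average of the world coordinates over the cached prefix of frames.
        return np.mean(self.buffer, axis=0)

smoother = TrackSmoother(n=10)
for xy in [(0.0, 0.0), (1.2, 0.1), (1.9, -0.2), (3.1, 0.3)]:
    print(smoother.push(xy))  # progressively smoothed track points
```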
With the method provided by this embodiment of the invention, for each target, the fusion positions are associated according to the acquisition time sequence of the corresponding image frame sets, generating the target's fusion track. The fusion track can be generated in real time, so the method suits scenarios with high real-time requirements, such as smart intersections. Synchronizing the image frames acquired by multiple image acquisition devices provides more reliable spatio-temporal information for track fusion, and combining features such as world coordinates, lane numbers, motion speed, and appearance makes the association result more reliable and widely applicable, not limited to specific categories of targets.
For application scenarios in which a track is generated for a specific target, the method provided by this embodiment of the invention likewise provides more reliable spatio-temporal information for track fusion by synchronizing the image frames acquired by multiple image acquisition devices, and realizes position association by combining features such as world coordinates, lane numbers, motion speed, and appearance, so that the association result is more reliable and the application range wider.
Corresponding to the target track generation method above, an embodiment of the present invention further provides a target track generation apparatus, described below. Fig. 6 is a schematic structural diagram of a target track generation apparatus according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes:
the position acquisition module 601 is configured to acquire, for an image group acquired by each image acquisition device, an image position where each target in each image frame in the image group is located;
a position conversion module 602, configured to convert the image position into a position in a world coordinate system according to a preset conversion relationship, so as to obtain a world position corresponding to the image position;
a position fusion module 603, configured to fuse, for each image frame set, world positions of each identical target in each image frame in the image frame set to obtain a fusion position of the target; the image frame set is an image set formed by all image frames with synchronous acquisition time;
a track generating module 604, configured to associate, for each target, a fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target, and generate a fusion track of the target.
By adopting the device provided by the embodiment of the invention, the image position of each target in each image frame in the image group is acquired aiming at the image group acquired by each image acquisition device; converting the image position into a position under a world coordinate system according to a preset conversion relation to obtain a world position corresponding to the image position; for each image frame set, fusing the world positions of the same targets in each image frame in the image frame set to obtain the fusion positions of the targets; and for each target, associating the fusion positions of the target according to the acquisition time sequence of the image frame set corresponding to the target to generate a fusion track of the target. The device provided by the embodiment of the invention can generate the tracks of a plurality of targets, and can meet the track generation requirements of a plurality of complex targets in scenes such as urban traffic and the like.
Optionally, the number of image acquisition devices is at least 3;
referring to fig. 7, the location fusion module 603 includes:
the similarity matrix determination submodule 701 is used for constructing a similarity matrix of each target in any two image frames in the image frame set according to the world position of each target in the image frame set;
the target determining submodule 702 is configured to solve the similarity matrix according to the Hungarian algorithm, and determine that two targets located in different image frames and occupying corresponding positions in the similarity matrix are the same target if the calculated similarity between them is greater than a preset similarity threshold;
a position fusion sub-module 703, configured to fuse the world positions of the same target in the image frame set to obtain a fused position, and use the fused position as the fusion position of the fusion target corresponding to the same target.
Optionally, the similarity matrix determining submodule 701 is specifically configured to determine a speed direction of each target in the image frame set based on a world position of the target; and for any two image frames in the image frame set, constructing a similarity matrix of each target in the two image frames based on the world positions and the speed directions of the targets in the two image frames.
Optionally, referring to fig. 8, the apparatus further includes:
a fusion position updating module 801, configured to, for each fusion target, if at least two targets from the same image group exist among the multiple same targets corresponding to the fusion target, remove from those targets every target whose distance to the fusion target is not the minimum, and update the fusion position of the fusion target.
Optionally, referring to fig. 8, the apparatus further includes:
a single camera trajectory generation module 802 for generating, for each image group, a single camera trajectory for each target in the image group based on the plurality of world locations of the target;
the track generating module 604 is specifically configured to: for each fusion target, if the source of the current fusion position of the fusion target and the source of a fusion position in an existing track both include positions in the same single-camera track, determine that the current fusion position of the fusion target is associated with that existing track, where the existing track is formed by associating the fusion positions of targets whose acquisition times precede that of the fusion target's current fusion position; if the source of the current fusion position does not share a position in the same single-camera track with the source of the fusion positions in any existing track, and the current fusion position is in a preset motion state, determine that the current fusion position is not associated with any existing track; if the source of the current fusion position does not share a position in the same single-camera track with the source of the fusion positions in any existing track, and the current fusion position is not in the preset motion state, construct an association-degree matrix between each fusion target in the corresponding image frame and the existing tracks; solve the association matrix with the Hungarian algorithm, and if the calculated association degree between a fusion target and an existing track occupying corresponding positions in the matrix is greater than a preset association-degree threshold, determine that the current fusion position of the fusion target is associated with that existing track; and if all fusion positions corresponding to the fusion target have been associated, generate the fusion track of the fusion target based on all of its associated fusion positions.
Optionally, referring to fig. 8, the apparatus further includes:
a track update module 803, configured to update the existing track based on the current fusion position of the fusion target.
Optionally, referring to fig. 8, the apparatus further includes:
a synchronization time determination module 804, configured to: for each image group, generate a single-camera track of each target in the image group based on the target's multiple world positions; match the single-camera tracks in any two image groups; select any group of matched single-camera tracks, and calculate, respectively, the distances from the start position and the end position of the first single-camera track in the group to each position of the second single-camera track, and the distances from the start position and the end position of the second single-camera track in the group to each position of the first single-camera track; if the second single-camera track contains a position whose distance to the start or end position of the first single-camera track is smaller than a preset distance threshold, and the first single-camera track contains a position whose distance to the start or end position of the second single-camera track is smaller than the preset distance threshold, retain the group of matched single-camera tracks, and otherwise delete the group; for each retained group of matched single-camera tracks, select the shortest distance among the distances from the start and end positions of the first single-camera track to each position of the second single-camera track and the distances from the start and end positions of the second single-camera track to each position of the first single-camera track, and determine the image frames containing the two positions corresponding to that shortest distance as primary synchronous image frames; and calculate the average of the acquisition times of each pair of primary synchronous image frames in the two image groups, taking the image frames whose acquisition times equal the average time as the final synchronous image frames of the two image groups.
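A hedged sketch of the core of this synchronization rule follows, for one pair of matched single-camera trajectories; representing a trajectory as a list of (timestamp, world position) records is an illustrative assumption.

```python
import numpy as np

def preliminary_sync_frames(traj1, traj2):
    """traj: list of (t, xy). Return (t1, t2) of the closest endpoint pairing."""
    best = None
    # Distances from the start/end positions of trajectory 1 to every point of trajectory 2.
    for end_t, end_xy in (traj1[0], traj1[-1]):
        for t, xy in traj2:
            d = np.linalg.norm(np.asarray(end_xy) - np.asarray(xy))
            if best is None or d < best[0]:
                best = (d, end_t, t)
    # Distances from the start/end positions of trajectory 2 to every point of trajectory 1.
    for end_t, end_xy in (traj2[0], traj2[-1]):
        for t, xy in traj1:
            d = np.linalg.norm(np.asarray(end_xy) - np.asarray(xy))
            if best[0] > d:
                best = (d, t, end_t)
    _, t1, t2 = best
    return t1, t2

t1, t2 = preliminary_sync_frames(
    [(0.00, (0, 0)), (0.10, (1, 0)), (0.20, (2, 0))],
    [(0.05, (2.1, 0)), (0.15, (3, 0)), (0.25, (4, 0))])
sync_time = (t1 + t2) / 2  # frames nearest this mean time become final sync frames
```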
With the apparatus provided by this embodiment of the invention, for each target, the fusion positions of the target are associated according to the acquisition time sequence of the corresponding image frame sets, generating the target's fusion track. The fusion track can be generated in real time, so the apparatus suits scenarios with high real-time requirements, such as smart intersections. Synchronizing the image frames acquired by multiple image acquisition devices provides more reliable spatio-temporal information for track fusion, and combining features such as world coordinates, lane numbers, motion speed, and appearance makes the association result more reliable and widely applicable, not limited to specific categories of targets. Using single-camera tracks for track fusion makes the fused track more reliable and effectively reduces fusion track identifier switch-backs. Finally, the apparatus needs no manually acquired relation matrix for the regions covered by the image acquisition devices, so it has lower usage cost and better universality.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 9, including a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with one another through the communication bus 904;
a memory 903 for storing computer programs;
the processor 901 is configured to implement the steps of the target trajectory generation method according to any of the above embodiments when executing the program stored in the memory 903.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Alternatively, the memory may be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned target trajectory generation methods.
In yet another embodiment of the present invention, a computer program product containing instructions is further provided, which when run on a computer causes the computer to execute the method for generating any one of the target trajectories in the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another via wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for generating a target trajectory, comprising:
aiming at an image group acquired by each image acquisition device, acquiring the image position of each target in each image frame in the image group;
converting the image position into a position under a world coordinate system according to a preset conversion relation to obtain a world position corresponding to the image position;
for each image frame set, fusing the world positions of the same targets in each image frame in the image frame set to obtain the fusion positions of the targets; the image frame set is an image set formed by all image frames with synchronous acquisition time;
and for each target, associating the fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target to generate a fusion track of the target.
2. The method of claim 1, wherein the number of image acquisition devices is at least 3;
the fusing the world positions of each identical target in each image frame in the image frame set to obtain the fused position of the target includes:
for any two image frames in the image frame set, constructing a similarity matrix of each target in the two image frames according to the world position of each target in the image frame set;
solving the similarity matrix according to a Hungarian algorithm; and, if the calculated similarity between two targets that occupy corresponding positions in the similarity matrix and are located in different image frames is greater than a preset similarity threshold, determining that the two targets are the same target;
and fusing the world positions of the same target in the image frame set to obtain a fused position, and taking the fused position as the fused position of the fused target corresponding to the same target.
3. The method of claim 2, wherein for any two image frames in the image frame set, constructing a similarity matrix for each object in the two image frames according to the world position of each object in the image frame set comprises:
determining a velocity direction of each target in the set of image frames based on the world location of the target;
and for any two image frames in the image frame set, constructing a similarity matrix of each target in the two image frames based on the world positions and the speed directions of the targets in the two image frames.
4. The method according to claim 2, wherein after the fusing the world positions of the same object in the image frame set to obtain a fused position and taking the fused position as the fused position of the fused object corresponding to the same object, the method further comprises:
for each fusion target, if at least two targets from the same image group exist among the multiple same targets corresponding to the fusion target, removing from the at least two targets each target whose distance to the fusion target is not the minimum; and updating the fusion position of the fusion target.
5. The method according to claim 2, further comprising, before the associating, for each object, the fusion position of the object according to the acquisition time sequence of the image frame set corresponding to the object to generate the fusion track of the object, the steps of:
for each image group, generating a single-camera trajectory for each target in the image group based on the plurality of world locations of the target;
for each target, associating the fusion position of the target according to the acquisition time sequence of the image frame set corresponding to the target to generate a fusion track of the target, including:
for each fusion target, if the current fusion position of the fusion target and the source of the fusion position in the existing track both comprise positions in the same single-camera track, determining that the current fusion position of the fusion target is associated with the existing track; the existing track is formed by associating the fusion position of each target with the acquisition time before the current fusion position of the fusion target;
if the current fusion position of the fusion target does not comprise a position in the same single-camera track with the source of the fusion position in any one existing track, and the current fusion position of the fusion target is in a preset motion state, determining that the current fusion position of the fusion target is not related to the existing track;
if the current fusion position of the fusion target does not comprise a position in the same single-camera track with the source of the fusion position in any existing track, and the current fusion position of the fusion target is not in a preset motion state, constructing a correlation matrix between each fusion target in the image frame corresponding to the fusion target and the existing track;
calculating the association matrix according to a Hungarian algorithm, and if the correlation between the fusion target and the existing track at the corresponding position in the association matrix is larger than a preset correlation threshold value, determining that the current fusion position of the fusion target is associated with the existing track;
and if the fusion positions corresponding to the fusion target are all associated, generating a fusion track of the fusion target based on all the associated fusion positions of the fusion target.
6. The method according to claim 1, further comprising, before said fusing, for each image frame set, the world positions of each identical object in the respective image frames in the image frame set to obtain the fused position of the object:
for each image group, generating a single-camera trajectory for each target in the image group based on the plurality of world locations of the target;
matching the single camera tracks in any two image groups;
selecting any group of matched single-camera tracks, and calculating, respectively, the distances from the start position and the end position of the first single-camera track in the group to each position of the second single-camera track; and calculating, respectively, the distances from the start position and the end position of the second single-camera track in the group to each position of the first single-camera track;
if a position exists in the second single-camera track, wherein the distance between the second single-camera track and the initial position or the ending position of the first single-camera track is smaller than a preset distance threshold value, and a position exists in the first single-camera track, wherein the distance between the first single-camera track and the initial position or the ending position of the second single-camera track is smaller than the preset distance threshold value, the group of matched single-camera tracks is reserved; otherwise, deleting the single-camera track of the group;
aiming at each group of reserved matched single-camera tracks, selecting a shortest distance from the starting position and the ending position of a first single-camera track in the group to each position of a second single-camera track and the distance from the starting position and the ending position of the second single-camera track in the group to each position of the first single-camera track, and determining image frames where two positions corresponding to the shortest distance are respectively located as primary synchronous image frames;
and calculating the average time of the acquisition time of each pair of primary synchronous image frames in any two image groups, and taking the image frame with the acquisition time of the average time in the two image groups as the final synchronous image frame of the two image groups.
7. An apparatus for generating a target trajectory, comprising:
the position acquisition module is used for acquiring the image position of each target in each image frame in the image group aiming at the image group acquired by each image acquisition device;
the position conversion module is used for converting the image position into a position under a world coordinate system according to a preset conversion relation to obtain a world position corresponding to the image position;
the position fusion module is used for fusing the world positions of the same targets in the image frames in each image frame set aiming at each image frame set to obtain the fusion positions of the targets; the image frame set is an image set formed by all image frames with synchronous acquisition time;
and the track generation module is used for associating the fusion positions of the targets according to the acquisition time sequence of the image frame set corresponding to the targets so as to generate the fusion track of the targets.
8. The apparatus of claim 7, wherein the number of image capture devices is at least 3;
the location fusion module includes:
the similarity matrix determination submodule is used for constructing a similarity matrix of each target in any two image frames according to the world position of each target in the image frame set aiming at the two image frames in the image frame set;
the target determining submodule is configured to solve the similarity matrix according to a Hungarian algorithm, and determine that two targets located in different image frames and occupying corresponding positions in the similarity matrix are the same target if the calculated similarity between them is greater than a preset similarity threshold;
and the position fusion submodule is used for fusing the world positions of the same target in the image frame set to obtain a fused position, and taking the fused position as the fused position of the fused target corresponding to the same target.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-6.
CN202211088454.XA 2021-11-17 2022-09-07 Target track generation method and device, electronic equipment and medium Pending CN115908545A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111359855X 2021-11-17
CN202111359855.XA CN114066974A (en) 2021-11-17 2021-11-17 Target track generation method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN115908545A true CN115908545A (en) 2023-04-04

Family

ID=80273030

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111359855.XA Pending CN114066974A (en) 2021-11-17 2021-11-17 Target track generation method and device, electronic equipment and medium
CN202211088454.XA Pending CN115908545A (en) 2021-11-17 2022-09-07 Target track generation method and device, electronic equipment and medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111359855.XA Pending CN114066974A (en) 2021-11-17 2021-11-17 Target track generation method and device, electronic equipment and medium

Country Status (2)

Country Link
CN (2) CN114066974A (en)
WO (1) WO2023087860A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066974A (en) * 2021-11-17 2022-02-18 上海高德威智能交通系统有限公司 Target track generation method and device, electronic equipment and medium
CN117435934B (en) * 2023-12-22 2024-07-19 中国科学院自动化研究所 Matching method, device and storage medium of moving target track based on bipartite graph

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210276A (en) * 2018-05-15 2019-09-06 腾讯科技(深圳)有限公司 A kind of motion track acquisition methods and its equipment, storage medium, terminal
CN111476827B (en) * 2019-01-24 2024-02-02 曜科智能科技(上海)有限公司 Target tracking method, system, electronic device and storage medium
CN112232279B (en) * 2020-11-04 2023-09-05 杭州海康威视数字技术股份有限公司 Personnel interval detection method and device
CN112070807B (en) * 2020-11-11 2021-02-05 湖北亿咖通科技有限公司 Multi-target tracking method and electronic device
CN112465866B (en) * 2020-11-27 2024-02-02 杭州海康威视数字技术股份有限公司 Multi-target track acquisition method, device, system and storage medium
CN114066974A (en) * 2021-11-17 2022-02-18 上海高德威智能交通系统有限公司 Target track generation method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN114066974A (en) 2022-02-18
WO2023087860A1 (en) 2023-05-25

Similar Documents

Publication Publication Date Title
CN115908545A (en) Target track generation method and device, electronic equipment and medium
CN111462200A (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN111784729B (en) Object tracking method and device, electronic equipment and storage medium
CN112154445A (en) Method and device for determining lane line in high-precision map
CN111784730B (en) Object tracking method and device, electronic equipment and storage medium
WO2024055966A1 (en) Multi-camera target detection method and apparatus
CN111275765B (en) Method and device for determining target GPS and camera
CN115376109B (en) Obstacle detection method, obstacle detection device, and storage medium
CN111754388B (en) Picture construction method and vehicle-mounted terminal
CN115004273A (en) Digital reconstruction method, device and system for traffic road
CN110969048A (en) Target tracking method and device, electronic equipment and target tracking system
CN112036359B (en) Method for obtaining topological information of lane line, electronic device and storage medium
CN112017238A (en) Method and device for determining spatial position information of linear object
CN114549595A (en) Data processing method and device, electronic equipment and storage medium
CN116912517B (en) Method and device for detecting camera view field boundary
CN110764526B (en) Unmanned aerial vehicle flight control method and device
CN113850837B (en) Video processing method and device, electronic equipment, storage medium and computer product
CN114782496A (en) Object tracking method and device, storage medium and electronic device
CN113048988B (en) Method and device for detecting change elements of scene corresponding to navigation map
CN113473091B (en) Camera association method, device, system, electronic equipment and storage medium
CN111723681B (en) Indoor road network generation method and device, storage medium and electronic equipment
CN111914048B (en) Automatic generation method for corresponding points of longitude and latitude coordinates and image coordinates
Wang et al. Vehicle Micro-Trajectory Automatic Acquisition Method Based on Multi-Sensor Fusion
CN116189121A (en) Lane marking method and device, electronic equipment and storage medium
CN118314329A (en) Target object tracking method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination