CN113326715B - Target association method and device

Target association method and device

Info

Publication number
CN113326715B
Authority
CN
China
Prior art keywords
target
perception
processed
image
perception target
Prior art date
Legal status
Active
Application number
CN202010129076.XA
Other languages
Chinese (zh)
Other versions
CN113326715A (en)
Inventor
吴迪 (Wu Di)
蔡娟 (Cai Juan)
翁超群 (Weng Chaoqun)
Current Assignee
Momenta Suzhou Technology Co Ltd
Original Assignee
Momenta Suzhou Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Momenta Suzhou Technology Co Ltd
Priority to CN202010129076.XA
Publication of CN113326715A
Application granted
Publication of CN113326715B

Classifications

    • G06V 20/56: Scenes; scene-specific elements; context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06T 7/50: Image analysis; depth or shape recovery
    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/30252: Indexing scheme for image analysis or image enhancement; vehicle exterior; vicinity of vehicle

Abstract

Embodiments of the invention disclose a target association method and device. The method comprises: obtaining semantic perception information of each perception target in each image to be processed; for each image to be processed, determining three-dimensional position information corresponding to each perception target by using device information of the image acquisition device corresponding to the image and grounding point information of each perception target; if a perception target does not correspond to a valid historical association relationship, determining the three-dimensional distance between the perception target and each to-be-processed perception target by using their respective three-dimensional position information; and determining, from the to-be-processed perception targets, an associated perception target having an association relationship with the perception target by using those three-dimensional distances. Perception targets corresponding to the same physical target in images acquired by multiple image acquisition devices are thereby associated.

Description

Target association method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a target association method and a target association device.
Background
In a vision-based automatic driving scheme, a vehicle is equipped with multiple image acquisition devices that capture images of the vehicle's surroundings. The captured images are then analyzed in depth with a visual perception algorithm to generate perception results for the perception targets they contain, where perception targets include, but are not limited to, surrounding vehicles, pedestrians, riders, lane lines, traffic obstacles, and signboards. The perception results are output to a prediction and regulation module, which uses them for path planning and decision control of the vehicle's next-stage motion.
In such a scheme, the image acquisition regions of adjacent image acquisition devices mounted on the vehicle overlap. As a result, the number of perception targets extracted by the visual perception algorithm from the images acquired by the multiple image acquisition devices is greater than the number of real physical targets in the vehicle's environment. If these perception results are output directly to the prediction and regulation module without first associating the perception targets that correspond to the same physical target, the module cannot perform optimal path planning and decision control. How to associate perception targets corresponding to the same physical target across images acquired by multiple image acquisition devices is therefore an urgent problem to be solved.
Disclosure of Invention
The invention provides a target association method and device for associating perception targets that correspond to the same physical target in images acquired by multiple image acquisition devices. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a target association method, including:
obtaining semantic perception information of each perception target in each image to be processed, wherein the images to be processed are images captured, from different angles and in the same acquisition period, by a plurality of image acquisition devices for the environment in which the target vehicle is located, and the semantic perception information of each perception target comprises grounding point information of the perception target;
for each image to be processed, determining three-dimensional position information corresponding to each perception target in the image to be processed by using device information of the image acquisition device corresponding to the image to be processed and the grounding point information of each perception target in the image to be processed;
judging, for each perception target, whether the perception target corresponds to a valid historical association relationship;
if the perception target does not correspond to a valid historical association relationship, determining the three-dimensional distance between the perception target and each to-be-processed perception target corresponding to it, by using the three-dimensional position information of the perception target and that of each to-be-processed perception target, wherein the to-be-processed perception targets corresponding to a perception target are the perception targets that correspond to image acquisition devices adjacent to the perception target's image acquisition device and do not yet correspond to an association relationship;
and determining, from its corresponding to-be-processed perception targets, an associated perception target having an association relationship with the perception target, by using the three-dimensional distances between the perception target and its corresponding to-be-processed perception targets.
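As an illustration of the flow just described (not the patent's own implementation), the following Python sketch associates perception targets across adjacent cameras by three-dimensional distance; the class fields, the threshold value, and the nearest-candidate tie-breaking rule are all assumptions made for illustration.

```python
# Minimal sketch of the first-aspect association flow. All names and the
# nearest-neighbour rule are illustrative assumptions; the patent itself only
# requires a distance threshold on 3D positions in the vehicle body frame.
import math
from dataclasses import dataclass

DIST_THRESHOLD_M = 2.0  # assumed preset distance threshold

@dataclass
class PerceptionTarget:
    camera_id: int
    position_3d: tuple        # (x, y, z) in the vehicle body frame
    history: object = None    # valid historical association, if any
    associated: object = None

def dist3d(a, b):
    return math.dist(a.position_3d, b.position_3d)

def associate(target, candidates):
    """Associate `target` with a perception target from an adjacent camera."""
    if target.history is not None:        # reuse a valid historical association
        target.associated = target.history
        return
    # keep only not-yet-associated candidates within the distance threshold
    pending = [c for c in candidates
               if c.associated is None and dist3d(target, c) < DIST_THRESHOLD_M]
    if pending:                           # assumed rule: pick the nearest one
        target.associated = min(pending, key=lambda c: dist3d(target, c))
```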
Optionally, the method further includes:
and if it is judged that the perception target corresponds to a valid historical association relationship, determining the target corresponding to that valid historical association relationship as the associated perception target having an association relationship with the perception target.
Optionally, the step of judging, for each perception target, whether the perception target corresponds to a valid historical association relationship includes:
judging, for each perception target, whether the perception target corresponds to a historical association relationship;
if it is judged that the perception target corresponds to a historical association relationship, judging whether that historical association relationship is valid, wherein if the target corresponding to the historical association relationship still exists and the time elapsed since the association relationship was determined is lower than a preset time threshold, the historical association relationship is determined to be valid; otherwise, it is determined to be invalid.
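A minimal sketch of this validity test, assuming illustrative field names (target_id, determined_at) and an arbitrary time threshold:

```python
import time

TIME_THRESHOLD_S = 1.0  # assumed preset time threshold

def history_is_valid(history, live_target_ids, now=None):
    """True iff the associated target still exists and the association was
    determined less than TIME_THRESHOLD_S ago, per the optional claim."""
    if history is None:
        return False
    now = now if now is not None else time.monotonic()
    return (history.target_id in live_target_ids
            and (now - history.determined_at) < TIME_THRESHOLD_S)
```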
Optionally, the step of determining, for each image to be processed, the three-dimensional position information corresponding to each perception target in the image to be processed by using the device information of the corresponding image acquisition device and the grounding point information of each perception target includes:
for each image to be processed, determining the three-dimensional position information corresponding to each perception target by using a preset three-dimensional position information determination algorithm, the installation height information and view vanishing point position information in the device information of the corresponding image acquisition device, and the grounding point information of each perception target, wherein the preset three-dimensional position information determination algorithm is an algorithm determined based on the imaging principle of the image acquisition device and the similar-triangle relationship.
Optionally, the step of determining the three-dimensional position information corresponding to each perception target by using the preset three-dimensional position information determination algorithm, the installation height information and view vanishing point position information in the device information of the corresponding image acquisition device, and the grounding point information of each perception target includes:
determining depth information corresponding to each perception target by using the preset three-dimensional position information determination algorithm, the installation height information and view vanishing point position information in the device information of the image acquisition device corresponding to the image to be processed, and the grounding point information of each perception target in the image to be processed;
for each perception target in the image to be processed, determining position information of the perception target in the device coordinate system of its corresponding image acquisition device, based on the depth information and grounding point information corresponding to the perception target and a preset projection matrix of that image acquisition device;
and for each perception target in the image to be processed, determining its three-dimensional position information based on the positional conversion relationship between the device coordinate system of its corresponding image acquisition device and the vehicle body coordinate system of the target vehicle, together with its position information in that device coordinate system.
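The two coordinate steps above can be illustrated with a standard pinhole back-projection followed by a rigid transform. A sketch, assuming the preset projection matrix is a 3x3 intrinsic matrix K and the device-to-body conversion is a 4x4 homogeneous transform (both representational assumptions, since the patent does not fix them):

```python
import numpy as np

def ground_point_to_body(u, v, depth, K, T_body_from_cam):
    """Back-project a grounding point (u, v) with known depth into the camera
    frame, then map it into the vehicle body frame.
    K: 3x3 intrinsic matrix; T_body_from_cam: 4x4 extrinsic transform."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = ray * (depth / ray[2])             # point in the camera frame
    p_body = T_body_from_cam @ np.append(p_cam, 1.0)
    return p_body[:3]
```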
Optionally, the step of determining, from the to-be-processed perception targets corresponding to the perception target, an associated perception target having an association relationship with the perception target by using the three-dimensional distances between them includes:
judging whether the three-dimensional distance between the perception target and each to-be-processed perception target corresponding to it is lower than a preset distance threshold;
if the three-dimensional distance between the perception target and a to-be-processed perception target is judged to be lower than the preset distance threshold, determining that to-be-processed perception target as a to-be-associated perception target corresponding to the perception target;
and determining, from the to-be-associated perception targets corresponding to the perception target, an associated perception target having an association relationship with the perception target.
In a second aspect, an embodiment of the present invention provides a target association apparatus, where the apparatus includes:
an obtaining module configured to obtain semantic perception information of each perception target in each image to be processed, wherein the images to be processed are images captured, from different angles and in the same acquisition period, by a plurality of image acquisition devices for the environment in which the target vehicle is located, and the semantic perception information of each perception target comprises grounding point information of the perception target;
a first determining module configured to determine, for each image to be processed, three-dimensional position information corresponding to each perception target in the image to be processed by using the device information of the corresponding image acquisition device, the grounding point information of each perception target, and the view vanishing point position information;
a judging module configured to judge, for each perception target, whether the perception target corresponds to a valid historical association relationship;
a second determining module configured to, if it is judged that the perception target does not correspond to a valid historical association relationship, determine the three-dimensional distance between the perception target and each to-be-processed perception target corresponding to it by using their respective three-dimensional position information, wherein the to-be-processed perception targets corresponding to a perception target are the perception targets that correspond to image acquisition devices adjacent to the perception target's image acquisition device and do not yet correspond to an association relationship;
and a third determining module configured to determine, from the corresponding to-be-processed perception targets, an associated perception target having an association relationship with the perception target by using the three-dimensional distances between the perception target and its corresponding to-be-processed perception targets.
Optionally, the apparatus further comprises:
and a fourth determining module configured to, if it is judged that the perception target corresponds to a valid historical association relationship, determine the target corresponding to that valid historical association relationship as the associated perception target having an association relationship with the perception target.
Optionally, the judging module is specifically configured to judge, for each perception target, whether the perception target corresponds to a historical association relationship;
and, if so, to judge whether that historical association relationship is valid, wherein if the target corresponding to the historical association relationship still exists and the time elapsed since the association relationship was determined is lower than a preset time threshold, the historical association relationship is determined to be valid; otherwise, it is determined to be invalid.
Optionally, the first determining module is specifically configured to determine, for each image to be processed, the three-dimensional position information corresponding to each perception target by using a preset three-dimensional position information determination algorithm, the installation height information and view vanishing point position information in the device information of the corresponding image acquisition device, and the grounding point information of each perception target, wherein the preset three-dimensional position information determination algorithm is an algorithm determined based on the imaging principle of the image acquisition device and the similar-triangle relationship.
Optionally, the first determining module is specifically configured to determine depth information corresponding to each perception target by using the preset three-dimensional position information determination algorithm, the installation height information and view vanishing point position information in the device information of the image acquisition device corresponding to the image to be processed, and the grounding point information of each perception target in the image to be processed;
to determine, for each perception target in the image to be processed, position information of the perception target in the device coordinate system of its corresponding image acquisition device, based on the depth information and grounding point information corresponding to the perception target and a preset projection matrix of that image acquisition device;
and to determine, for each perception target in the image to be processed, its three-dimensional position information based on the positional conversion relationship between the device coordinate system of its corresponding image acquisition device and the vehicle body coordinate system of the target vehicle, together with its position information in that device coordinate system.
Optionally, the third determining module is specifically configured to judge whether the three-dimensional distance between the perception target and each to-be-processed perception target corresponding to it is lower than a preset distance threshold;
if the three-dimensional distance between the perception target and a to-be-processed perception target is judged to be lower than the preset distance threshold, to determine that to-be-processed perception target as a to-be-associated perception target corresponding to the perception target;
and to determine, from the to-be-associated perception targets corresponding to the perception target, an associated perception target having an association relationship with the perception target.
As can be seen from the above, the target association method and device provided by the embodiments of the present invention can obtain semantic perception information of each perception target in each image to be processed, where the images to be processed are images captured, from different angles and in the same acquisition period, by a plurality of image acquisition devices for the environment in which the target vehicle is located, and the semantic perception information of each perception target comprises its grounding point information; determine, for each image to be processed, the three-dimensional position information corresponding to each perception target by using the device information of the corresponding image acquisition device and the grounding point information of each perception target; judge, for each perception target, whether it corresponds to a valid historical association relationship; if not, determine the three-dimensional distance between the perception target and each to-be-processed perception target corresponding to it by using their respective three-dimensional position information, where the to-be-processed perception targets are the perception targets that correspond to adjacent image acquisition devices and do not yet correspond to an association relationship; and determine, from these, an associated perception target having an association relationship with the perception target by using those three-dimensional distances.
By applying the embodiments of the invention, the three-dimensional position information corresponding to each perception target in each image to be processed can be determined from the device information of the corresponding image acquisition device and the grounding point information of each perception target. When a perception target does not correspond to a valid historical association relationship, the three-dimensional distance between it and each of its to-be-processed perception targets is computed from their three-dimensional position information, and an associated perception target having an association relationship with it is then determined from among the to-be-processed perception targets by using those distances. Perception targets corresponding to the same physical target are thereby identified, realizing the association of perception targets corresponding to the same physical target across images acquired by multiple image acquisition devices. Of course, practicing the invention does not require achieving all of the above advantages at the same time in any single product or method.
The innovation points of the embodiment of the invention comprise:
1. The device information of the image acquisition device corresponding to each image to be processed and the grounding point information of each perception target can first be used to determine the three-dimensional position information corresponding to each perception target. Then, for a perception target without a valid historical association relationship, the three-dimensional distances between it and its to-be-processed perception targets are computed from their three-dimensional position information, and the associated perception target having an association relationship with it is determined from those distances. Associated perception targets corresponding to the same physical target are thus determined across images acquired by multiple image acquisition devices.
2. To reduce the amount of calculation, and considering that the image acquisition period is short so that perception targets change little between adjacent frames, a valid historical association relationship corresponding to a perception target continues to be used whenever one is found.
3. When a perception target is judged to correspond to a historical association relationship, whether that relationship is valid is judged by checking whether its associated target still exists and whether the time elapsed since the relationship was determined is below a preset time threshold: if both hold, the historical association relationship is determined to be valid; otherwise, it is determined to be invalid.
4. The depth information corresponding to each perception target in an image to be processed is first roughly determined by a preset three-dimensional position information determination algorithm, itself determined from the imaging principle of the image acquisition device and the similar-triangle relationship, together with the installation height information and view vanishing point position information in the device information of the corresponding image acquisition device and the grounding point information of each perception target. The three-dimensional position information of each perception target is then determined from the preset projection matrix of the image acquisition device and the positional conversion relationship between its device coordinate system and the vehicle body coordinate system, providing the basis for associating perception targets corresponding to the same physical target across images acquired by multiple image acquisition devices.
5. For a perception target that does not correspond to a valid historical association relationship, the to-be-processed perception targets whose three-dimensional distance to it is below a preset distance threshold are taken as its to-be-associated perception targets, and the associated perception target having an association relationship with it is determined from among these, realizing accurate association of perception targets corresponding to the same physical target across images acquired by multiple image acquisition devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. It is to be understood that the drawings described below are merely examples of some embodiments of the invention; a person skilled in the art can obtain further figures from them without inventive effort.
Fig. 1 is a schematic flow chart of a target association method according to an embodiment of the present invention;
fig. 2 is another schematic flow chart of the target association method according to the embodiment of the present invention;
fig. 3A is a diagram illustrating an example of an imaging situation of an image acquisition device;
fig. 3B is a perspective view of an imaging situation of the image acquisition device;
figs. 3C and 3D are exemplary diagrams of association confirmation of perception targets;
fig. 4 is a schematic structural diagram of a target association apparatus according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The invention provides a target association method and device for accurately associating perception targets corresponding to the same physical target in images acquired by a plurality of image acquisition devices. Embodiments of the invention are described in detail below.
Fig. 1 is a schematic flow chart of a target association method according to an embodiment of the present invention. The method may comprise the steps of:
s101: and obtaining semantic perception information of each perception target in each image to be processed.
Here, the images to be processed are images captured, from different angles and in the same acquisition period, by a plurality of image acquisition devices for the environment in which the target vehicle is located; and the semantic perception information of each perception target comprises grounding point information of the perception target.
In the embodiment of the present invention, the method may be applied to any type of electronic device with computing capability, which may be a server or a terminal device. The electronic device may be installed in the vehicle, or in non-vehicle equipment outside the vehicle.
The target vehicle may be provided with a plurality of image acquisition devices that capture images of its surroundings from different angles. For example, the target vehicle may be provided with four image acquisition devices capturing images of its front, rear, left side, and right side, respectively.
In one case, the electronic device may obtain the semantic perception information of each perception target in each image to be processed from another device. That other device may be connected to the plurality of image acquisition devices and directly obtain, as the images to be processed, the images they capture at the same acquisition time and from different angles of the environment in which the target vehicle is located.
The other device performs semantic perception information detection on the images to be processed using a preset semantic perception information detection model, determines the semantic perception information of each perception target in each image, and then sends that information to the electronic device. Perception targets include, but are not limited to, vehicles, light poles, pedestrians, riders, lane lines, and traffic obstacles; semantic perception information includes, but is not limited to, information describing a target's shape, position, color, and category.
The preset semantic perception detection model may be a neural network model trained in advance on sample images annotated with each sample perception target, together with calibration information containing the semantic perception information of each sample perception target in the sample image. Its training process follows that of neural network models in the related art and is not repeated here.
In another implementation, the electronic device may be directly connected to the image acquisition devices, obtain the images they capture from different angles in the same acquisition period for the environment in which the target vehicle is located as the images to be processed, and then perform semantic perception information detection on them with the preset semantic perception information detection model to determine the semantic perception information of each perception target in each image.
The pixel position information on which the semantic perception information of each perception target depends should be undistorted. If the pixel position information obtained by the electronic device has not been undistorted, the electronic device first undistorts it and then continues with the subsequent target association process.
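Where OpenCV is available, this undistortion step could look like the following sketch; the helper name and the choice of cv2.undistortPoints are illustrative, not prescribed by the patent:

```python
import cv2
import numpy as np

def undistort_ground_points(points_px, K, dist_coeffs):
    """Undistort pixel coordinates of grounding points before association.
    Passing P=K makes OpenCV return undistorted *pixel* coordinates rather
    than normalized image coordinates."""
    pts = np.asarray(points_px, dtype=np.float32).reshape(-1, 1, 2)
    out = cv2.undistortPoints(pts, K, dist_coeffs, P=K)
    return out.reshape(-1, 2)
```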
In this embodiment, the semantic perception information of each perception target may include the grounding point information of the perception target, i.e., the position information of the location where the target contacts the ground. For example, the grounding point information of a vehicle is the information of the positions where its wheels are tangent to the ground, i.e., the wheel grounding point information. The grounding point information may be determined by any method in the related art capable of determining the position of a target's contact with the ground, which is not limited by the embodiments of the present invention.
When the perception target is a vehicle, a vehicle in to-be-processed image A may include at least two pieces of grounding point information. In one case, the grounding point information of a designated wheel may be selected as the vehicle's grounding point information for the target association calculation; alternatively, the calculation may be performed on at least two pieces of grounding point information. In the latter case, if any one piece of the vehicle's grounding point information is determined to have an association relationship with grounding point information from another to-be-processed image B, the vehicle to which that grounding point information in image B belongs may be considered to correspond to the same physical vehicle as the vehicle in image A.
In one implementation, when the perception target is a vehicle, the wheel grounding point information of the vehicle may be determined as follows:
detecting the image to be processed with a pre-trained vehicle detection model and determining detection information of each wheel grounding point in the image; associating the detection information of the wheel grounding points corresponding to each vehicle; and outputting, for each vehicle, the detection information of its set of wheel grounding points, where the detection information includes the position information and orientation attribute information of each wheel grounding point. The pre-trained vehicle detection model is a model trained on sample images annotated with the wheel grounding point set of each sample vehicle. The orientation attribute information indicates which wheel of the vehicle a grounding point corresponds to; for example, it includes, but is not limited to: left front wheel, left rear wheel, right rear wheel, and right front wheel.
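For illustration, the per-vehicle detection information described here (ordered wheel grounding points carrying position and orientation attribute information) might be represented as follows; all type and field names are assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class WheelPosition(Enum):      # orientation attribute information
    FRONT_LEFT = "front_left"
    REAR_LEFT = "rear_left"
    REAR_RIGHT = "rear_right"
    FRONT_RIGHT = "front_right"

@dataclass
class WheelGroundPoint:
    u: float                    # pixel column of the grounding point
    v: float                    # pixel row of the grounding point
    which: WheelPosition        # orientation attribute of the point

# One vehicle's output: its grounding points as an ordered set, e.g.
# counterclockwise: [front_left, rear_left, rear_right]
VehicleDetection = list         # list[WheelGroundPoint], in the preset order
```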
The pre-trained vehicle detection model comprises a feature extraction layer, a feature classification layer, a feature regression layer and a post-processing layer.
Correspondingly, the process of determining the detection information of each wheel grounding point in the image to be processed, associating the detection information of the wheel grounding points corresponding to each vehicle, and outputting the position information and orientation attribute information of each vehicle's wheel grounding point set may be as follows:
the electronic device performs feature extraction on the image to be processed with the feature extraction layer and determines the corresponding feature image; determines, with the feature classification layer and the feature image, the region of each suspected wheel grounding point in the feature image; determines, with the feature regression layer and the feature image, the first regression information and second regression information corresponding to each to-be-processed pixel in those regions; determines, with the post-processing layer and the first regression information of each to-be-processed pixel, the detection information of each wheel grounding point in the image; determines, with the post-processing layer, the detection information of the wheel grounding points, and the second regression information of each to-be-processed pixel, the detection information of the wheel grounding points corresponding to each vehicle; and outputs, for each vehicle, the detection information of its wheel grounding point set as an ordered point set sorted in a preset order, the preset order being counterclockwise or clockwise.
For example, when the preset order is counterclockwise and the image to be detected contains a vehicle A with three wheel grounding points (left front wheel, left rear wheel, and right rear wheel), the detection information of A's wheel grounding points is output sorted counterclockwise, e.g.: A - [position information of the left front wheel, position information of the left rear wheel, position information of the right rear wheel].
In one case, to better distinguish the wheel grounding points of different vehicles, the image to be detected may be output together with the detection information, with the wheel grounding points detected for each vehicle marked on it: detection information of grounding points belonging to the same vehicle is connected by lines, while grounding points of different vehicles are not connected to each other.
The first regression information represents the translation required to map a to-be-processed pixel to its current wheel grounding point; the second regression information represents the translation required to map it to its next wheel grounding point. The current wheel grounding point of a to-be-processed pixel is the wheel grounding point of the suspected region the pixel belongs to; the next wheel grounding point follows the current one in the preset order.
The feature image corresponding to the image to be detected has the same size as that image, and the region of each suspected wheel grounding point may be a circular region centered on the suspected grounding point predicted by the feature classification layer, with a preset length as its radius. The pixels contained in each such region are called to-be-processed pixels; ideally, each region covers the to-be-processed pixel that is the true wheel grounding point.
In one implementation, the process of determining the detection information of each wheel grounding point using the post-processing layer and the first regression information of each to-be-processed pixel is as follows: determine the first voting score of each to-be-processed pixel by translating each to-be-processed pixel according to its position information and first regression information, and incrementing by one the first voting score of the pixel at the position it lands on; determine, among the to-be-processed pixels, those whose first voting score exceeds a first score threshold as alternative pixels; determine the alternative pixels whose first voting score is higher than those of their four-neighborhood pixels as candidate wheel grounding points; and determine, from the candidate wheel grounding points, each wheel grounding point in the image to be detected and its detection information, using the candidates' detection information and a preset suppression algorithm.
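A sketch of this first-vote procedure, assuming dense per-pixel regression offsets and an arbitrary score threshold; the patent does not prescribe these data layouts:

```python
import numpy as np

FIRST_SCORE_THRESHOLD = 3  # assumed first score threshold

def vote_ground_points(pixels, offsets, h, w):
    """Accumulate first-vote scores: every to-be-processed pixel is translated
    by its first regression offset and the score of the landing pixel is
    incremented; local maxima over the 4-neighbourhood that clear the
    threshold become candidate wheel grounding points."""
    votes = np.zeros((h, w), dtype=np.int32)
    for (u, v), (du, dv) in zip(pixels, offsets):
        tu, tv = int(round(u + du)), int(round(v + dv))
        if 0 <= tv < h and 0 <= tu < w:
            votes[tv, tu] += 1
    candidates = []
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            s = votes[v, u]
            if s > FIRST_SCORE_THRESHOLD and s > max(
                    votes[v-1, u], votes[v+1, u], votes[v, u-1], votes[v, u+1]):
                candidates.append(((u, v), s))
    return candidates
```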
The preset suppression algorithm may be a Non-Maximum Suppression (NMS) algorithm. Correspondingly, the process of determining each wheel grounding point in the image to be detected and its detection information may be: sort the candidate wheel grounding points by their first voting scores; take each not-yet-deleted candidate in the sorted queue in turn as the current point; for each other not-yet-deleted candidate in the queue, compute its distance to the current point from their position information, and if that distance is smaller than a first preset value, delete it from the queue. When every remaining candidate has served as the current point, the candidates left in the queue are determined to be the wheel grounding points in the image to be detected, and their detection information is determined.
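A sketch of this greedy distance-based suppression, with candidates given as ((u, v), score) pairs as in the previous sketch; min_dist_px stands in for the first preset value:

```python
import math

def suppress_candidates(candidates, min_dist_px):
    """Greedy suppression in the spirit of NMS: walk candidates by descending
    first-vote score and delete any remaining candidate closer than
    `min_dist_px` to the current point."""
    queue = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept = []
    while queue:
        current = queue.pop(0)
        kept.append(current)
        queue = [c for c in queue
                 if math.dist(c[0], current[0]) >= min_dist_px]
    return [pt for pt, _ in kept]
```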
The detection information includes position information and orientation attribute information. The feature image determined by the feature extraction layer for the image to be detected consists of one feature map per orientation attribute. Subsequently, when the regions of suspected wheel grounding points are determined from the feature classification layer and the feature image, each such region corresponds to one piece of orientation attribute information, and each to-be-processed pixel in that region corresponds to the same orientation attribute information.
If the pixel at the position a to-be-processed pixel is translated to is still a pixel within the region of the suspected wheel grounding point that the to-be-processed pixel belongs to, then incrementing the first voting score of the pixel at that position can be understood as incrementing the first voting score of the to-be-processed pixel at the translated position.
If the translated position of a to-be-processed pixel corresponding to a wheel grounding point falls within the region of the suspected wheel grounding point of another wheel grounding point, the translated position corresponds to the orientation attribute information of that region. If it does not fall within any such region, the translated position corresponds to no orientation attribute information.
In one implementation, the process of determining the detection information of the wheel grounding points corresponding to each vehicle, using the post-processing layer, the detection information of each wheel grounding point, and the second regression information of each to-be-processed pixel, may be: for each wheel grounding point, translate each to-be-processed pixel in its suspected region according to that pixel's detection information and second regression information, and determine the translated pixel positions; judge whether each translated position belongs to the suspected region of another wheel grounding point; determine, based on the judgment result, a second voting score between this wheel grounding point and each other wheel grounding point, incrementing the score between this point and a target wheel grounding point by one whenever a translated position belongs to the target point's suspected region; determine, from the other wheel grounding points, the one with the highest second voting score with this point; and judge whether that highest score exceeds a second score threshold. If it does, the wheel grounding point with the highest second voting score is determined to be the next wheel grounding point of this one, thereby determining the detection information of the wheel grounding points corresponding to each vehicle in the image to be processed.
Since the detection information includes position information and orientation attribute information, judging whether the translated position of a to-be-processed pixel belongs to the suspected region of another wheel grounding point may proceed as follows: judge whether the translated position corresponds to orientation attribute information; if it does, determine that orientation attribute information, take the other wheel grounding point whose suspected region carries the same orientation attribute information as the target other wheel grounding point, and judge that the translated position belongs to that region; correspondingly, increment by one the second voting score between this wheel grounding point and the target other wheel grounding point. If the translated position corresponds to no orientation attribute information, the judgment result is that it belongs to the suspected region of no other wheel grounding point, and correspondingly no second voting score is incremented.
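A sketch of this second-vote linking step, assuming regions are given as sets of pixel coordinates keyed by grounding point id; these structures and the threshold value are illustrative:

```python
from collections import Counter

SECOND_SCORE_THRESHOLD = 3  # assumed second score threshold

def link_next_point(point_id, region_pixels, second_offsets, regions):
    """Second-vote linking: translate every pixel of `point_id`'s suspected
    region by its second regression offset and count which other region it
    lands in; the region with the highest count, if above the threshold,
    holds this point's 'next' wheel grounding point in the preset order.
    `regions` maps a grounding point id to its set of pixel coordinates."""
    votes = Counter()
    for (u, v), (du, dv) in zip(region_pixels, second_offsets):
        landing = (int(round(u + du)), int(round(v + dv)))
        for other_id, pixels in regions.items():
            if other_id != point_id and landing in pixels:
                votes[other_id] += 1
    if not votes:
        return None
    best, score = votes.most_common(1)[0]
    return best if score > SECOND_SCORE_THRESHOLD else None
```

Chaining each grounding point to its "next" point in this way yields, per vehicle, the ordered point set described above.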
The training process of the pre-trained vehicle detection model is similar to the detection process. During training, after the predicted detection information of the wheel grounding points of the sample vehicles in each sample image is determined with the initial vehicle detection model, updated values of the model parameters are determined using a preset optimization function, the predicted detection information, and the calibrated detection information in the calibration information corresponding to each sample image; the current parameter values of the feature extraction layer, feature classification layer, and feature regression layer of the initial model are then corrected to the determined updated values. The parameter values of these layers are adjusted cyclically until the initial vehicle detection model reaches a preset convergence condition, yielding the pre-trained vehicle detection model comprising the feature extraction layer, feature classification layer, feature regression layer, and post-processing layer.
The preset optimization function may be any model-parameter optimization function in the related art, such as gradient descent; the embodiment of the present invention is not limited in this respect. The preset convergence condition may be that, when each image in a verification set is detected with the vehicle detection model obtained after parameter adjustment, the proportion of predicted detection information consistent with the calibrated detection information exceeds a preset proportion threshold; or that the number of consistent results exceeds a preset quantity threshold; or that the number of iterations adjusting the model parameters of the feature extraction, feature classification, and feature regression layers exceeds a preset count, and so on.
S102: for each image to be processed, determining three-dimensional position information corresponding to each perception target in the image to be processed by using the device information of the image acquisition device corresponding to the image to be processed and the grounding point information of each perception target in the image to be processed.

In this step, for each image to be processed, the electronic device may determine the three-dimensional position information corresponding to each perception target in the image by using the device information of the corresponding image acquisition device and the grounding point information of each perception target, with reference to the imaging principle of the image acquisition device and the similar-triangle principle.

The device information of the image acquisition device corresponding to the image to be processed comprises the extrinsic and intrinsic parameter information of the device, wherein the extrinsic information includes but is not limited to the installation height information of the device, and the intrinsic information includes but is not limited to its focal length information.
In an implementation manner of the present invention, S102 may include the following step:

for each image to be processed, determining the three-dimensional position information corresponding to each perception target in the image to be processed by using a preset three-dimensional position information determination algorithm, the installation height information and the visual-field vanishing point position information in the device information of the image acquisition device corresponding to the image to be processed, and the grounding point information of each perception target in the image to be processed.

The preset three-dimensional position information determination algorithm is an algorithm determined based on the imaging principle of the image acquisition device and the similar-triangle relation.
The visual-field vanishing point position information in the device information of an image acquisition device may refer to: the position information of the projection, onto the imaging plane of the device, of the point farthest from the device within its acquisition area, i.e. the projected position in the images to be processed acquired by that device. Once the installation position of each image acquisition device is fixed, the visual-field vanishing point position information in its device information is determined. Wherein the ordinate of the visual-field vanishing point position information may be equal to the installation height information of the image acquisition device.
The image acquisition device corresponding to an image to be processed is: the image acquisition device that acquired that image to be processed.
Specifically, S102 may include the following steps 011–013:
011: determining the depth information corresponding to each perception target by using the preset three-dimensional position information determination algorithm, the installation height information and the visual-field vanishing point position information in the device information of the image acquisition device corresponding to the image to be processed, and the grounding point information of each perception target in the image to be processed.

012: for each perception target in the image to be processed, determining the position information of the perception target under the device coordinate system of the corresponding image acquisition device based on the depth information and grounding point information corresponding to the perception target and the preset projection matrix of the corresponding image acquisition device.

013: for each perception target in the image to be processed, determining the three-dimensional position information corresponding to the perception target based on the position conversion relation between the device coordinate system of the image acquisition device corresponding to the perception target and the vehicle-body coordinate system of the target vehicle, and the position information of the perception target under that device coordinate system.
According to the imaging principle of the image acquisition device, three points form a triangle: the installation position of the image acquisition device corresponding to the perception target, the contact position between the physical target corresponding to the perception target and the ground, and the intersection of the ground with the vertical line dropped from the device's installation position. The distance between that intersection point and the contact position can be regarded as the physical distance from the physical target to the image acquisition device, i.e. the depth information.

A second triangle is formed by three points: the translation point, in the image to be processed, corresponding to the visual-field vanishing point of the image acquisition device; the imaging position corresponding to the grounding point information of the perception target; and the installation position of the image acquisition device. The translation point is obtained by translating the vanishing point position, with its ordinate unchanged, to the same abscissa as the grounding point information of the perception target. Considering that the imaging surface of the image acquisition device is small relative to the actual physical target and to the distance between the physical target and the device, in the specific calculation the distance between the translation point and the installation position of the device can be taken as equal to the focal length information of the device.

The installation position of the image acquisition device, the grounding point information of the perception target, and the contact position between the physical target and the ground lie on the same straight line; and since the actual physical position of the visual-field vanishing point is very far from the device, the line from the translation point to the installation position of the device can be regarded as parallel to the ground. The two triangles are therefore regarded as similar.
As shown in fig. 3B, which is a schematic perspective view corresponding to fig. 3A: point 1 is the translation point, in the image to be processed, corresponding to the visual-field vanishing point of the image acquisition device corresponding to the perception target; point 2 is the installation position of the image acquisition device; point 3 is the position of the visual-field vanishing point in the image to be processed, i.e. the position represented by the visual-field vanishing point position information; point 4 is the grounding point information of the perception target, i.e. the position information of the imaging point in the image to be processed; point 5 is the contact position between the physical target corresponding to the perception target and the ground; and point 6 is the intersection of the ground with the vertical line dropped from the position of the image acquisition device. The triangle formed by points 1, 4 and 2 is similar to the triangle formed by points 2, 5 and 6.
Correspondingly, fig. 3A is a side-view illustration of the imaging situation of an image acquisition device. The widest vertical line on the left side of fig. 3A represents the imaging surface of the device in side view, i.e. the image to be processed; the "pin hole" in fig. 3A represents the position of the image acquisition device corresponding to the perception target, and the distance between the imaging surface and that position is the focal length information of the device. y represents the grounding point information of the perception target in the imaging plane, i.e. the position information of the imaging point in the image to be processed; FOEy represents the visual-field vanishing point position information of the device in the imaging plane, i.e. the position of the vanishing point in the image to be processed; and H represents the installation height information of the device. The intersection of the dotted line with the ground in fig. 3A represents the contact position between the physical target corresponding to the perception target and the ground, and the distance between that contact position and the intersection of the ground with the vertical line dropped from the device's position is the depth information corresponding to the perception target.
Accordingly, the preset three-dimensional position information determination algorithm may be expressed by the following formula (1):
Depth=H*f/(y-FOEy) (1);
wherein Depth represents the depth information corresponding to the perception target; H represents the installation height information of the image acquisition device corresponding to the perception target; y represents the ordinate value in the grounding point information of the perception target, i.e. the ordinate of its position in the image to be processed; FOEy represents the ordinate value in the visual-field vanishing point position information of the image acquisition device corresponding to the perception target; and f represents the focal length information of that device.
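As a minimal sketch, formula (1) translates directly into code; the units assumed here (H in metres, f and the ordinates in pixels, Depth in metres) are illustrative assumptions consistent with a pinhole model, not values fixed by the patent.

```python
def depth_from_ground_point(y, foe_y, h, f):
    """Eq. (1): Depth = H * f / (y - FOEy).

    y     -- ordinate of the target's grounding point in the image (pixels)
    foe_y -- ordinate of the visual-field vanishing point (pixels)
    h     -- installation height of the image acquisition device (metres)
    f     -- focal length of the device (pixels)
    """
    if y <= foe_y:
        # the grounding point must lie below the vanishing point in the image
        raise ValueError("ground point not below the vanishing point")
    return h * f / (y - foe_y)
```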
After determining the depth information corresponding to each perception target, the electronic device may determine the position information of the perception target under the device coordinate system of the corresponding image acquisition device based on that depth information, the grounding point information, and the preset projection matrix of the device. Since the installation positions of the image acquisition devices and of the target vehicle are determined, the relative position relation between each device and the target vehicle can be determined; the three-dimensional position information corresponding to each perception target in the image to be processed can then be determined based on the position conversion relation between the device coordinate system of the corresponding image acquisition device and the vehicle-body coordinate system of the target vehicle, together with the position information of the perception target under that device coordinate system. The three-dimensional position information so determined is three-dimensional position information under the body coordinate system of the target vehicle.
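The following sketch shows steps 012–013 under common pinhole conventions. Treating the preset projection matrix as a 3×3 intrinsic matrix K and the position conversion relation as a 4×4 camera-to-body transform T_body_cam is an assumption for illustration; the patent only requires that both exist.

```python
import numpy as np

def ground_point_to_body(u, v, depth, K, T_body_cam):
    """Back-project a grounding point (u, v) at the given depth into the
    device coordinate system, then express it in the vehicle-body frame."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # step 012: back-projection
    p_cam = ray * (depth / ray[2])                  # scale so z equals the depth
    p_hom = np.append(p_cam, 1.0)                   # homogeneous coordinates
    return (T_body_cam @ p_hom)[:3]                 # step 013: body-frame position
```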
S103: judging, for each perception target, whether the perception target corresponds to a valid historical association relation.

Wherein a historical association relation refers to an association relation determined before the current target association process is performed for the perception target, and may include: an association relation containing the identification information of the perception target determined based on previous frames of the images to be processed, and an association relation containing the identification information of the perception target determined based on the images to be processed already obtained.
In an implementation manner of the present invention, S103 may include the following steps 021–022:
021: for each perception target, judging whether the perception target corresponds to a historical association relation.

022: if the perception target corresponds to a historical association relation, judging whether that historical association relation is valid.

Wherein, if the target corresponding to the historical association relation of the perception target is judged to exist and the length of time since the historical association relation was determined is lower than a preset time threshold, the historical association relation corresponding to the perception target is determined to be valid; otherwise, it is determined to be invalid.
For each perception target, the electronic device judges whether a preset storage space stores an association relation containing the identification information of the perception target. If such a historical association relation is stored, the electronic device judges whether it is valid, i.e. judges whether the perception target corresponding to the other identification information contained in the relation still exists, and whether the length of time since the relation was determined exceeds the preset time threshold. If the other perception target exists and that length of time does not exceed the preset time threshold, the historical association relation corresponding to the perception target is judged to be valid.
Judging whether the perception target corresponding to the other identification information contained in the historical association relation exists may be: judging whether that perception target is contained among the perception targets corresponding to the image acquisition devices adjacent to the image acquisition device corresponding to the perception target; if so, it is judged to exist, and otherwise it is judged not to exist.
In this judging process, the determination may be made according to the semantic perception information corresponding to the perception targets and/or in combination with the driving speed of the vehicle.
Wherein the length of time since an association relation containing the identification information of the perception target was determined is: the length of time between the latest moment at which that association relation was determined and the current moment. The current moment may be the moment at which the semantic perception information of each perception target in each image to be processed is obtained, or the moment at which it is judged whether the perception target corresponds to a valid historical association relation.
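A compact sketch of the validity test in steps 021–022 follows; the record layout (other_id, determined_at) and the set of currently visible identifiers are assumptions for illustration.

```python
import time

def association_is_valid(assoc, visible_ids, time_threshold_s, now=None):
    """Valid iff the counterpart target still exists and the relation was
    determined less than time_threshold_s seconds ago."""
    now = time.time() if now is None else now
    partner_exists = assoc["other_id"] in visible_ids   # counterpart still perceived
    fresh = (now - assoc["determined_at"]) < time_threshold_s  # not stale
    return partner_exists and fresh
```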
S104: if the perception target does not correspond to a valid historical association relation, determining the three-dimensional distance between the perception target and each perception target to be processed corresponding to it by using the three-dimensional position information corresponding to the perception target and the three-dimensional position information corresponding to the perception targets to be processed.

Wherein the perception targets to be processed corresponding to the perception target are: the perception targets which correspond to the image acquisition devices adjacent to the image acquisition device corresponding to the perception target and which do not correspond to an association relation.
After determining that a perception target does not correspond to a valid historical association relation, the electronic device may determine the three-dimensional distance between the perception target and each perception target to be processed corresponding to it by using the three-dimensional position information corresponding to each of them; that is, the candidates are the perception targets, detected in the images to be processed acquired by the image acquisition devices adjacent to the device corresponding to the perception target, that do not yet correspond to an association relation.
For a perception target whose historical association relation is invalid, the invalid historical association relation may be deleted, the three-dimensional distance between the perception target and each of its perception targets to be processed may be determined using the corresponding three-dimensional position information, and the subsequent target association process may be executed.
It can be understood that the electronic device may obtain the correspondence between each perception target and each image to be processed, and the correspondence between each image to be processed and the image acquisition devices, and may thereby determine the correspondence between each perception target and the image acquisition devices.
In one implementation, the three-dimensional distance between the perception target and each of its perception targets to be processed is a Euclidean distance.
S105: determining, from the corresponding perception targets to be processed, the associated perception target having an association relation with the perception target by using the three-dimensional distances between the perception target and its perception targets to be processed.
In an implementation manner of the present invention, for a perception target that does not correspond to a valid historical association relation, the electronic device may use the three-dimensional distances between that perception target and its corresponding perception targets to be processed to select, from among them, the one with the smallest three-dimensional distance as the associated perception target having an association relation with the perception target.
In another implementation manner of the present invention, S105 may include the following steps 031–033:
031: judging whether the three-dimensional distance between the perception target and a perception target to be processed corresponding to it is lower than a preset distance threshold.

032: if the three-dimensional distance between the perception target and the perception target to be processed is judged to be lower than the preset distance threshold, determining the perception target to be processed corresponding to that three-dimensional position information as a perception target to be associated corresponding to the perception target.

033: determining, from the perception targets to be associated corresponding to the perception target, the associated perception target having an association relation with the perception target.
It can be understood that, for perception targets corresponding to different image acquisition devices but the same physical target, their three-dimensional position information should theoretically coincide; allowing for errors in the constructed three-dimensional position information, the distance between the three-dimensional positions corresponding to perception targets of the same physical target should still not be too large. Correspondingly, the electronic device judges whether the three-dimensional distance between the perception target and a perception target to be processed corresponding to it is lower than the preset distance threshold; if so, the two targets are considered to possibly have an association relation, and the perception target to be processed is determined as a perception target to be associated corresponding to the perception target. Subsequently, based on the three-dimensional distances between the perception target and its perception targets to be associated, the associated perception target having an association relation with the perception target is determined from among them.
As shown in fig. 3C, there may be 1 perception target corresponding to camera A, i.e. the target in camera A shown in fig. 3C, and 2 perception targets corresponding to camera B, i.e. target 1 and target 2 in camera B shown in fig. 3C, where camera A and camera B are adjacent cameras. The distance between the perception target corresponding to camera A and target 1 of camera B is s1, the distance to target 2 is s2, and with threshold th it holds that s1 < th < s2; thus there may be an association relation between the perception target corresponding to camera A and target 1 of camera B, and no association relation with target 2.
If the perception target corresponds to exactly one perception target to be associated, that perception target to be associated is determined as the associated perception target having an association relation with the perception target. If the perception target corresponds to several perception targets to be associated, i.e. the same perception target may have a potential association relation with several perception targets to be processed of an adjacent image acquisition device, then, considering that an actual physical target appears only once in a single image acquisition device, the perception target to be associated with the smallest three-dimensional distance may be determined as the associated perception target having an association relation with the perception target (see the sketch after the example below).
As shown in fig. 3D, the perception target corresponding to camera A, i.e. the target in camera A shown in fig. 3D, may simultaneously have potential association relations with target 1 and target 2 corresponding to camera B, i.e. s1 < th and s2 < th. But since the distance between the perception target corresponding to camera A and target 2 is smaller, the association between the perception target corresponding to camera A and target 2 of camera B may be considered the true association; the two are selected as the 2D combination of the same 3D target, i.e. the same physical target. That is, it is determined that the perception target corresponding to camera A and target 2 of camera B have an association relation, and they are associated perception targets of each other.
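A minimal sketch of the threshold-and-nearest selection of steps 031–033, under the assumption that positions are 3D coordinates in the vehicle-body frame and the distance is Euclidean (as in the implementation mentioned above); the data layout is illustrative.

```python
import numpy as np

def associate_nearest(target_pos, candidates, dist_threshold):
    """candidates: iterable of (candidate_id, xyz) for perception targets to
    be processed of an adjacent device. Returns the id of the associated
    perception target, or None if no candidate is below the threshold."""
    best_id, best_d = None, dist_threshold
    for cid, xyz in candidates:
        d = np.linalg.norm(np.asarray(target_pos) - np.asarray(xyz))
        if d < best_d:                 # keep only candidates under the threshold,
            best_id, best_d = cid, d   # and among them the nearest one
    return best_id
```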
In another implementation, after determining the perception targets to be associated corresponding to a perception target that does not correspond to a valid historical association relation, the electronic device may establish an association edge between the perception target and each of its perception targets to be associated, and then determine the associated perception target having an association relation with the perception target from among them by using a preset matching algorithm over the established association edges. The preset matching algorithm may be the Hungarian matching algorithm.
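For this graph-matching variant, the sketch below builds a cost matrix from the thresholded three-dimensional distances (the association edges) and solves it with the Hungarian algorithm; using scipy.optimize.linear_sum_assignment here is an assumption for illustration, since the patent names only "a preset matching algorithm".

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_hungarian(positions_a, positions_b, dist_threshold):
    """Globally match targets of two adjacent cameras over association edges."""
    BIG = 1e9  # effectively forbids pairs that have no association edge
    cost = np.full((len(positions_a), len(positions_b)), BIG)
    for i, pa in enumerate(positions_a):
        for j, pb in enumerate(positions_b):
            d = np.linalg.norm(np.asarray(pa) - np.asarray(pb))
            if d < dist_threshold:     # an edge exists only under the threshold
                cost[i, j] = d
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < BIG]
```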
By applying the embodiment of the present invention, the three-dimensional position information corresponding to each perception target in each image to be processed can be determined using the device information of the corresponding image acquisition device and the grounding point information of each perception target. Further, when a perception target does not correspond to a valid historical association relation, the three-dimensional distance between that perception target and each of its perception targets to be processed is determined from their three-dimensional position information, and the associated perception target having an association relation with the perception target is then determined from the perception targets to be processed based on those three-dimensional distances. In other words, the associated perception targets corresponding to the same physical target are determined, realizing the association of perception targets corresponding to the same physical target across images acquired by multiple image acquisition devices.
In another embodiment of the present invention, as shown in fig. 2, the method may include the steps of:
S201: performing semantic perception information detection on the images to be processed, acquired by a plurality of image acquisition devices at the same acquisition time for the environment where the target vehicle is located, to determine the semantic perception information of each perception target in each image to be processed.

Wherein the plurality of image acquisition devices shoot the environment where the target vehicle is located from different angles; the semantic perception information of each perception target comprises the grounding point information of the perception target.
S202: for each image to be processed, determining the three-dimensional position information corresponding to each perception target in the image to be processed by using the device information of the image acquisition device corresponding to the image to be processed and the grounding point information of each perception target in the image to be processed.

S203: judging, for each perception target, whether the perception target corresponds to a valid historical association relation;

if the perception target does not correspond to a valid historical association relation, executing S204; if it does, executing S206.

S204: determining the three-dimensional distance between the perception target and each perception target to be processed corresponding to it by using the three-dimensional position information corresponding to the perception target and the three-dimensional position information corresponding to the perception targets to be processed.

Wherein the perception targets to be processed corresponding to the perception target are: the perception targets which correspond to the image acquisition devices adjacent to the image acquisition device corresponding to the perception target and which do not correspond to an association relation.

S205: determining, from the corresponding perception targets to be processed, the associated perception target having an association relation with the perception target by using the three-dimensional distances between the perception target and its perception targets to be processed.

S206: determining the target corresponding to the valid historical association relation of the perception target as the associated perception target having an association relation with the perception target.
Wherein S201 is the same as S101 shown in fig. 1, S202 is the same as S102 shown in fig. 1, S203 is the same as S103 shown in fig. 1, S204 is the same as S104 shown in fig. 1, and S205 is the same as S105 shown in fig. 1, and thus, the description thereof is omitted.
The target corresponding to the valid historical association relation of the perception target is determined as the associated perception target having an association relation with the perception target, and the association relation is stored, the association relation containing the identification information of the perception target and of its associated perception target. The identification information of a perception target is information that can uniquely determine that perception target. Subsequently, the electronic device may modify the determination time corresponding to the association relation to the latest determination time, i.e. the time at which the perception target and its associated perception target were determined.
To reduce the amount of calculation to a certain extent, and considering that the period at which the image acquisition devices acquire images is short and the perception targets differ little between adjacent frames, the valid historical association relation corresponding to a perception target continues to be used whenever it is judged that the perception target corresponds to one.
Corresponding to the foregoing method embodiment, an embodiment of the present invention provides a target association apparatus, as shown in fig. 4, which may include:
an obtaining module 410 configured to obtain semantic perception information of each perception target in each image to be processed, wherein the images to be processed are: images acquired by a plurality of image acquisition devices shooting the environment where the target vehicle is located from different angles in the same acquisition period; the semantic perception information of each perception target comprises: the grounding point information of the perception target;

a first determining module 420 configured to determine, for each image to be processed, the three-dimensional position information corresponding to each perception target in the image to be processed by using the device information of the image acquisition device corresponding to the image to be processed and the grounding point information of each perception target in the image to be processed;

a judging module 430 configured to judge, for each perception target, whether the perception target corresponds to a valid historical association relation;

a second determining module 440 configured to, if it is judged that the perception target does not correspond to a valid historical association relation, determine the three-dimensional distance between the perception target and each perception target to be processed corresponding to it by using the three-dimensional position information corresponding to the perception target and the three-dimensional position information corresponding to the perception targets to be processed, wherein the perception targets to be processed corresponding to the perception target are: the perception targets which correspond to the image acquisition devices adjacent to the image acquisition device corresponding to the perception target and which do not correspond to an association relation;

a third determining module 450 configured to determine, from the corresponding perception targets to be processed, the associated perception target having an association relation with the perception target by using the three-dimensional distances between the perception target and its perception targets to be processed.
By applying the embodiment of the present invention, the three-dimensional position information corresponding to each perception target in each image to be processed can be determined using the device information of the corresponding image acquisition device and the grounding point information of each perception target. Further, when a perception target does not correspond to a valid historical association relation, the three-dimensional distance between that perception target and each of its perception targets to be processed is determined from their three-dimensional position information, and the associated perception target having an association relation with the perception target is then determined from the perception targets to be processed based on those three-dimensional distances. In other words, the associated perception targets corresponding to the same physical target are determined, realizing the association of perception targets corresponding to the same physical target across images acquired by multiple image acquisition devices.
In another embodiment of the present invention, the apparatus further comprises:
a fourth determining module (not shown in the figure) configured to, if it is judged that the perception target corresponds to a valid historical association relation, determine the target corresponding to that valid historical association relation as the associated perception target having an association relation with the perception target.
In another embodiment of the present invention, the judging module 430 is specifically configured to judge, for each perception target, whether the perception target corresponds to a historical association relation;

if the perception target corresponds to a historical association relation, judge whether that historical association relation is valid, wherein, if the target corresponding to the historical association relation of the perception target is judged to exist and the length of time since the historical association relation was determined is lower than a preset time threshold, the historical association relation corresponding to the perception target is determined to be valid; otherwise, it is determined to be invalid.
In another embodiment of the present invention, the first determining module 420 is specifically configured to determine, for each image to be processed, the three-dimensional position information corresponding to each perception target in the image to be processed by using a preset three-dimensional position information determination algorithm, where the preset three-dimensional position information determination algorithm is an algorithm determined based on the imaging principle of the image acquisition device and the similar-triangle relation.
In another embodiment of the present invention, the first determining module 420 is specifically configured to determine the depth information corresponding to each perception target by using the preset three-dimensional position information determination algorithm, the installation height information and the visual-field vanishing point position information in the device information of the image acquisition device corresponding to the image to be processed, and the grounding point information of each perception target in the image to be processed;

for each perception target in the image to be processed, determine the position information of the perception target under the device coordinate system of the corresponding image acquisition device based on the depth information and grounding point information corresponding to the perception target and the preset projection matrix of the corresponding image acquisition device;

and for each perception target in the image to be processed, determine the three-dimensional position information corresponding to the perception target based on the position conversion relation between the device coordinate system of the image acquisition device corresponding to the perception target and the vehicle-body coordinate system of the target vehicle, and the position information of the perception target under that device coordinate system.
In another embodiment of the present invention, the third determining module 450 is specifically configured to judge whether the three-dimensional distance between the perception target and a perception target to be processed corresponding to it is lower than a preset distance threshold;

if the three-dimensional distance between the perception target and the perception target to be processed is judged to be lower than the preset distance threshold, determine the perception target to be processed corresponding to that three-dimensional position information as a perception target to be associated corresponding to the perception target;

and determine, from the perception targets to be associated corresponding to the perception target, the associated perception target having an association relation with the perception target.
The device embodiment corresponds to the method embodiment and has the same technical effects; for a specific description, reference may be made to the method embodiment, which is not repeated here.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will understand that: modules in the devices in the embodiments may be distributed in the devices in the embodiments according to the description of the embodiments, or may be located in one or more devices different from the embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. An object association method, comprising:
obtaining semantic perception information of each perception target in each image to be processed, wherein the images to be processed are: images acquired by a plurality of image acquisition devices shooting, from different angles in the same acquisition period, the environment where a target vehicle is located; the semantic perception information of each perception target comprises: grounding point information of the perception target;
for each image to be processed, determining three-dimensional position information corresponding to each perception target in the image to be processed by using equipment information of image acquisition equipment corresponding to the image to be processed and grounding point information of each perception target in the image to be processed;
judging whether each perception target corresponds to an effective historical association relation or not;
if the perception target does not correspond to an effective historical association relation, determining the three-dimensional distance between the perception target not corresponding to an effective historical association relation and each perception target to be processed corresponding to it, by using the three-dimensional position information corresponding to the perception target not corresponding to an effective historical association relation and the three-dimensional position information corresponding to its perception targets to be processed, wherein the perception targets to be processed corresponding to the perception target not corresponding to an effective historical association relation are: perception targets which correspond to image acquisition devices adjacent to the image acquisition device corresponding to the perception target not corresponding to an effective historical association relation, and which do not correspond to an association relation;
determining an association perception target which has an association relation with a perception target which does not correspond to an effective historical association relation from the corresponding perception target to be processed by utilizing the three-dimensional distance between the perception target which does not correspond to the effective historical association relation and the corresponding perception target to be processed;
the step of determining the three-dimensional position information corresponding to each perception target in the image to be processed by using the device information of the image acquisition device corresponding to the image to be processed and the grounding point information of each perception target in the image to be processed aiming at each image to be processed comprises the following steps:
for each image to be processed, determining depth information corresponding to each perception target by using a preset three-dimensional position information determination algorithm, installation height information in equipment information of image acquisition equipment corresponding to the image to be processed, visual field vanishing point position information, focal length information of the image acquisition equipment corresponding to the image to be processed and grounding point information of each perception target in the image to be processed;
for each perception target in the image to be processed, determining position information of each perception target under an equipment coordinate system of the corresponding image acquisition equipment based on depth information and grounding point information corresponding to each perception target and a preset projection matrix of the corresponding image acquisition equipment;
for each perception target in the image to be processed, determining three-dimensional position information corresponding to each perception target in the image to be processed based on a position conversion relation between an equipment coordinate system of image acquisition equipment corresponding to each perception target and a vehicle body coordinate system of the target vehicle and position information of each perception target in the equipment coordinate system of the image acquisition equipment corresponding to each perception target;
wherein the preset three-dimensional position information determination algorithm is represented by the following formula:
Depth=H*f/(y-FOEy);
the Depth represents Depth information corresponding to a perception target, the H represents the installation height information, the y represents an ordinate value in grounding point information of the perception target, the FOEy represents an ordinate value in the visual field vanishing point position information, and the f represents the focal length information.
2. The method of claim 1, wherein the method further comprises:
and if the sensing target is judged to correspond to the effective historical association relationship, determining the target corresponding to the effective historical association relationship corresponding to the sensing target corresponding to the effective historical association relationship as the associated sensing target having the association relationship with the sensing target corresponding to the effective historical association relationship.
3. The method of claim 1, wherein the step of determining, for each perception object, whether the perception object corresponds to a valid historical association comprises:
judging whether the perception target corresponds to a historical association relation or not aiming at each perception target;
if the historical association relationship corresponding to the perception target of the corresponding historical association relationship is judged, judging whether the historical association relationship corresponding to the perception target of the corresponding historical association relationship is valid, wherein if the target corresponding to the historical association relationship corresponding to the perception target of the corresponding historical association relationship is judged to exist and the corresponding determination time length of the historical association relationship is lower than a preset time threshold value, the historical association relationship corresponding to the perception target of the corresponding historical association relationship is determined to be valid; otherwise, determining that the historical association relation corresponding to the perception target corresponding to the historical association relation is invalid.
4. The method according to any one of claims 1 to 3, wherein the step of determining, from the perception targets to be processed corresponding thereto, the associated perception target having an association relationship with the perception target not corresponding to the valid historical association relationship by using the three-dimensional distance between the perception target not corresponding to the valid historical association relationship and the perception target to be processed corresponding thereto, comprises:
judging whether the three-dimensional distance between the perception target which does not correspond to the effective historical association relation and the perception target to be processed corresponding to the perception target is lower than a preset distance threshold value or not;
if the three-dimensional distance between the perception target which does not correspond to the effective historical association relationship and the perception target to be processed corresponding to the perception target which does not correspond to the effective historical association relationship is judged to be lower than the preset distance threshold, the perception target to be processed corresponding to the three-dimensional position information is determined to be the perception target to be associated corresponding to the perception target which does not correspond to the effective historical association relationship;
and determining an associated sensing target which has an association relation with the sensing target which does not correspond to the effective historical association relation from each sensing target to be associated which corresponds to the sensing target which does not correspond to the effective historical association relation.
5. An object associating apparatus, comprising:
an obtaining module configured to obtain semantic perception information of each perception target in each image to be processed, wherein the images to be processed are: images acquired by a plurality of image acquisition devices shooting, from different angles in the same acquisition period, the environment where a target vehicle is located; the semantic perception information of each perception target comprises: grounding point information of the perception target;
the first determining module is configured to determine, for each image to be processed, three-dimensional position information corresponding to each sensing target in the image to be processed by using the device information of the image acquisition device corresponding to the image to be processed and the grounding point information of each sensing target in the image to be processed;
the judging module is configured to judge whether each sensing target corresponds to an effective historical association relation;
a second determining module configured to, if it is judged that the perception target does not correspond to a valid historical association relation, determine the three-dimensional distance between the perception target not corresponding to a valid historical association relation and each perception target to be processed corresponding to it, by using the three-dimensional position information corresponding to the perception target not corresponding to a valid historical association relation and the three-dimensional position information corresponding to its perception targets to be processed, wherein the perception targets to be processed corresponding to the perception target not corresponding to a valid historical association relation are: perception targets which correspond to image acquisition devices adjacent to the image acquisition device corresponding to the perception target not corresponding to a valid historical association relation, and which do not correspond to an association relation;
the third determining module is configured to determine, from the perception targets to be processed corresponding to the third determining module, an association perception target which has an association relationship with the perception target which does not correspond to the effective historical association relationship by using the three-dimensional distance between the perception target which does not correspond to the effective historical association relationship and the perception target to be processed corresponding to the perception target;
the first determining module is specifically configured to determine, for each image to be processed, depth information corresponding to each sensing target by using a preset three-dimensional position information determining algorithm, mounting height information in device information of image acquisition devices corresponding to the image to be processed, visual field vanishing point position information, focal length information of the image acquisition devices corresponding to the image to be processed, and grounding point information of each sensing target in the image to be processed; for each perception target in the image to be processed, determining position information of each perception target under an equipment coordinate system of the corresponding image acquisition equipment based on depth information and grounding point information corresponding to each perception target and a preset projection matrix of the corresponding image acquisition equipment; for each perception target in the image to be processed, determining three-dimensional position information corresponding to each perception target in the image to be processed based on a position conversion relation between an equipment coordinate system of image acquisition equipment corresponding to each perception target and a vehicle body coordinate system of the target vehicle and position information of each perception target in the equipment coordinate system of the image acquisition equipment corresponding to each perception target;
wherein the preset three-dimensional position information determination algorithm is expressed by the following formula:
Depth = H * f / (y - FOEy);
where Depth denotes the depth information corresponding to a perception target, H denotes the mounting height information, y denotes the vertical coordinate value in the grounding point information of the perception target, FOEy denotes the vertical coordinate value in the field-of-view vanishing point position information, and f denotes the focal length information.
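For illustration only, not part of the claims: a minimal Python sketch of the depth formula above and of the subsequent back-projection from the device coordinate system into the vehicle body coordinate system. The pinhole camera model, the function names, the projection matrix K, the transform T and all numeric values are assumptions introduced here, not taken from the patent.

import numpy as np

def ground_point_depth(H, f, y, foe_y):
    # Depth = H * f / (y - FOEy): mounting height H, focal length f,
    # grounding-point ordinate y, vanishing-point ordinate FOEy.
    return H * f / (y - foe_y)

def position_in_vehicle(ground_point, depth, K, T_device_to_vehicle):
    # Back-project the grounding point through the (assumed) projection
    # matrix K into the device coordinate system, then map the point into
    # the vehicle body coordinate system with a 4x4 homogeneous transform.
    u, v = ground_point
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_device = ray / ray[2] * depth               # device coordinates
    return (T_device_to_vehicle @ np.append(p_device, 1.0))[:3]

# Made-up example: camera mounted 1.5 m high, f = 1000 px, grounding
# point 80 px below the vanishing point gives a depth of 18.75 m.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
depth = ground_point_depth(H=1.5, f=1000.0, y=620.0, foe_y=540.0)
print(position_in_vehicle((960.0, 620.0), depth, K, np.eye(4)))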
6. The apparatus of claim 5, wherein the apparatus further comprises:
a fourth determining module, configured to, when it is judged that a perception target corresponds to a valid historical association relationship, determine the target recorded in that valid historical association relationship as the associated perception target having an association relationship with the perception target.
7. The apparatus according to claim 5, wherein the judging module is specifically configured to judge, for each perception target, whether the perception target corresponds to a historical association relationship;
and, if the perception target corresponds to a historical association relationship, judge whether that historical association relationship is valid, wherein the historical association relationship is determined to be valid if the target it is associated with still exists and the time elapsed since the relationship was determined is below a preset time threshold; otherwise, the historical association relationship corresponding to the perception target is determined to be invalid.
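For illustration only: a hedged Python sketch of the judging and association flow of claims 5 to 7 — a valid historical association relationship is reused when its target still exists and the relationship is younger than the preset time threshold; otherwise the perception target is re-associated with the nearest perception target to be processed by three-dimensional distance. The record layout, the identifiers and both threshold values are assumptions.

import time
import numpy as np

MAX_AGE_S = 0.5    # assumed stand-in for the preset time threshold
MAX_DIST_M = 2.0   # assumed gating distance for a new association

def history_is_valid(history, live_target_ids, now=None):
    # Claim 7: valid only if the associated target still exists and the
    # relationship was determined less than the time threshold ago.
    if history is None:
        return False
    now = time.time() if now is None else now
    return (history["target_id"] in live_target_ids
            and now - history["created_at"] < MAX_AGE_S)

def associate(target, candidates, history, live_target_ids):
    # Claim 6: reuse the target recorded in a valid historical relation.
    if history_is_valid(history, live_target_ids):
        return history["target_id"]
    # Claims 5/7: otherwise pick the candidate from an adjacent camera
    # with the smallest 3D distance in the vehicle body coordinate system.
    best_id, best_dist = None, MAX_DIST_M
    for cand in candidates:
        d = float(np.linalg.norm(np.asarray(target["xyz"])
                                 - np.asarray(cand["xyz"])))
        if d < best_dist:
            best_id, best_dist = cand["id"], d
    return best_id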
CN202010129076.XA 2020-02-28 2020-02-28 Target association method and device Active CN113326715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010129076.XA CN113326715B (en) 2020-02-28 2020-02-28 Target association method and device

Publications (2)

Publication Number Publication Date
CN113326715A CN113326715A (en) 2021-08-31
CN113326715B (en) 2022-06-10

Family

ID=77412694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010129076.XA Active CN113326715B (en) 2020-02-28 2020-02-28 Target association method and device

Country Status (1)

Country Link
CN (1) CN113326715B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627443B * 2022-03-14 2023-06-09 Xiaomi Automobile Technology Co., Ltd. Target detection method, target detection device, storage medium, electronic equipment and vehicle

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776619A * 2015-11-20 2017-05-31 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for determining the attribute information of destination object
CN108154146A * 2017-12-25 2018-06-12 Chen Fei A kind of car tracing method based on image identification
CN110097726A * 2018-01-30 2019-08-06 Baoding Tianhe Electronic Technology Co., Ltd. A kind of prevention regional aim monitoring method and system
CN108366342A * 2018-03-12 2018-08-03 Ningbo Yipaike Network Technology Co., Ltd. A kind of perception information correlating method
CN108550143A * 2018-04-03 2018-09-18 Chang'an University A kind of measurement method of the vehicle length, width and height size based on RGB-D cameras
CN109446942A * 2018-10-12 2019-03-08 Beijing Megvii Technology Co., Ltd. Method for tracking target, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211202

Address after: 215100 floor 23, Tiancheng Times Business Plaza, No. 58, qinglonggang Road, high speed rail new town, Xiangcheng District, Suzhou, Jiangsu Province

Applicant after: MOMENTA (SUZHOU) TECHNOLOGY Co.,Ltd.

Address before: Room 601-a32, Tiancheng information building, No. 88, South Tiancheng Road, high speed rail new town, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant before: MOMENTA (SUZHOU) TECHNOLOGY Co.,Ltd.

GR01 Patent grant