CN115880665A - Three-dimensional reconstruction method, device and equipment for traffic sign position and storage medium

Info

Publication number: CN115880665A
Application number: CN202211441838.5A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张键驰, 徐林鵾, 刘德浩, 孙力
Current assignee: Guangdong Kunpeng Space Information Technology Co ltd
Original assignee: Guangdong Kunpeng Space Information Technology Co ltd
Application filed by Guangdong Kunpeng Space Information Technology Co ltd
Priority to CN202211441838.5A; published as CN115880665A


Abstract

The embodiments of this specification disclose a three-dimensional reconstruction method, device and equipment for a traffic sign position, and a storage medium. First, the position frames of the target traffic sign, and a first number of pixels between the position frames in every two adjacent frames, are determined in N consecutive frames of traffic images. Since the smaller the first number is, the greater the probability that the two position frames corresponding to it belong to the same target traffic sign, a target associated frame combination is determined among the position frames in the N frames of traffic images based on the first number. Three-dimensional reconstruction is then performed based on the position information of the target position frames in the target associated frame combination and the shooting pose information corresponding to the target traffic images to which the target position frames belong, yielding the three-dimensional position information of the target traffic sign. By reconstructing from a plurality of two-dimensional positions of the target traffic sign together with the shooting pose information, the three-dimensional position of the target traffic sign can be determined accurately.

Description

Three-dimensional reconstruction method, device and equipment for traffic sign position and storage medium
Technical Field
The invention relates to the technical field of automatic driving, and in particular to a three-dimensional reconstruction method, device and equipment for a traffic sign position, and a storage medium.
Background
In the field of intelligent transportation, traffic signs play an important role in improving intersection traffic efficiency and safety and in the application of intelligent transportation systems. However, urban road reconstruction often causes the position data of traffic signs in a map to no longer match reality. Therefore, the position of a traffic sign can be reconstructed in three dimensions, for example based on a binocular vision method, to determine its accurate position.
In the related art, a binocular vision-based method needs to use a plurality of cameras to reconstruct the position of a traffic sign in a common visual area, and the relative position relationship among the cameras needs to be calibrated in advance. However, the accuracy of the traffic sign position determined by three-dimensional reconstruction using the binocular vision based method needs to be improved.
Disclosure of Invention
The embodiments of the present specification aim to solve at least one of the technical problems in the related art to some extent. Therefore, the embodiment of the specification provides a three-dimensional reconstruction method and device for a traffic sign position, a computer device and a storage medium.
The embodiment of the specification provides a three-dimensional reconstruction method for a traffic sign position, which comprises the following steps:
determining a position frame of the target traffic sign in the N consecutive frames of traffic images; a first number of pixels lies between the position frames in two adjacent frames of traffic images; N is a positive integer greater than or equal to 2;
determining a target associated frame combination among the position frames in the N frames of traffic images based on the first number; the target associated frame combination comprises a plurality of target position frames corresponding to the same target traffic sign;
performing three-dimensional reconstruction according to the position information of the target position frame and shooting pose information corresponding to a target traffic image to which the target position frame belongs to obtain three-dimensional position information of the target traffic sign; and the shooting pose information is the pose information of the traffic object at the shooting time of the target traffic image.
The embodiment of the specification provides a three-dimensional reconstruction device for a traffic sign position, the device comprising:
a position frame determining module, configured to determine the position frame of the target traffic sign in the N consecutive frames of traffic images; a first number of pixels lies between the position frames in two adjacent frames of traffic images; N is a positive integer greater than or equal to 2;
an associated frame combination determining module, configured to determine a target associated frame combination in a position frame in the N frames of traffic images based on the first number; the target associated frame combination comprises a plurality of target position frames corresponding to the same target traffic sign;
the three-dimensional position information determining module is used for performing three-dimensional reconstruction according to the position information of the target position frame and shooting pose information corresponding to the target traffic image to which the target position frame belongs to obtain three-dimensional position information of the target traffic sign; and the shooting pose information is the pose information of the traffic object at the shooting time of the target traffic image.
The present specification embodiment provides a computer device, including: a memory, and one or more processors communicatively connected to the memory; the memory has stored therein instructions executable by the one or more processors to cause the one or more processors to implement the steps of the method of any one of the embodiments described above.
The present specification provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the method of any one of the above embodiments.
The present specification provides a computer program product, which includes instructions that, when executed by a processor of a computer device, enable the computer device to perform the steps of the method of any one of the above embodiments.
In the above-described embodiments, first, the position frames of the target traffic sign and the first number of pixels between the position frames in two adjacent frames of traffic images are determined in the N consecutive frames of traffic images. Since the smaller the first number is, the greater the probability that the two position frames corresponding to it belong to the same target traffic sign, the target associated frame combination is determined among the position frames in the N frames of traffic images based on the first number. Three-dimensional reconstruction is then performed based on the position information of the target position frames in the target associated frame combination and the shooting pose information corresponding to the target traffic images to which the target position frames belong, to obtain the three-dimensional position information of the target traffic sign. Because the three-dimensional reconstruction is carried out on a plurality of two-dimensional positions of the target traffic sign together with the shooting pose information, the three-dimensional position of the target traffic sign can be accurately determined, which improves the accuracy of the three-dimensional reconstruction result of the traffic sign position.
Drawings
Fig. 1a is a schematic view of an application scenario of a three-dimensional reconstruction method for a traffic sign position in an embodiment of the present disclosure;
fig. 1b is a schematic flow chart of a three-dimensional reconstruction method of a traffic sign position in an embodiment of the present disclosure;
FIG. 1c is a schematic diagram of a position frame of a target traffic sign according to an embodiment of the present disclosure;
fig. 2a is a schematic flowchart of a three-dimensional reconstruction method for a traffic sign position in an embodiment of the present disclosure;
FIG. 2b is a diagram illustrating an initial associated bounding box combination determined in an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a three-dimensional reconstruction method for a traffic sign position in an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a three-dimensional reconstruction method for a traffic sign position in an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a three-dimensional reconstruction method for a traffic sign position in an embodiment of the present disclosure;
fig. 6a is a schematic flowchart of a three-dimensional reconstruction method for a traffic sign position in an embodiment of the present disclosure;
FIG. 6b is a diagram illustrating position clustering in an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a method for three-dimensional reconstruction of a traffic sign position according to an embodiment of the present disclosure;
fig. 8 is a schematic flow chart of a three-dimensional reconstruction method of a traffic sign position in an embodiment of the present disclosure;
fig. 9 is a schematic diagram of a three-dimensional reconstruction device of a traffic sign position provided in an embodiment of the present specification.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
In the related art, because urban road reconstruction often causes the position data of traffic signs in a map to no longer match reality, that position data needs to be updated. Existing three-dimensional reconstruction algorithms for traffic signs generally use the driving data of a single vehicle, so the reconstruction accuracy is tied to the driving track of that vehicle: the closer a traffic sign is to the driving track, the more accurate the reconstruction result, but a certain error remains in the depth direction.
As mentioned in the background art, in some related technologies, a binocular-vision-based algorithm is used for three-dimensional reconstruction of a traffic sign, a plurality of cameras are required to be mounted in advance on a vehicle, the plurality of cameras participate in sensing, and the relative position relationship between the plurality of cameras needs to be calibrated in advance. Furthermore, the visual angle of the common visual area of the cameras is small, the non-common visual area cannot be reconstructed, and the characteristic points are sensitive to noise and illumination. In other related technologies, the accuracy is poor under the weather conditions of heavy rain, heavy fog and the like, a collection vehicle with high cost is required to be used for reconstruction, and the feasibility of large-scale crowd-sourcing reconstruction is low.
Based on this, the embodiments of the present specification provide a three-dimensional reconstruction method of a traffic sign position, first, determine a position frame of a target traffic sign in N consecutive frames of traffic images; secondly, determining a target associated frame combination in the position frames in the N frames of traffic images based on the number of pixels between the position frames in the two adjacent frames of traffic images; and finally, performing three-dimensional reconstruction according to the position information of the target position frame corresponding to the same target traffic sign in the target association frame combination and the shooting pose information corresponding to the target traffic image to which the target position frame belongs to obtain the three-dimensional position information of the target traffic sign. Wherein N is a positive integer greater than or equal to 2; the shooting pose information is the pose information of the traffic object at the shooting time of the target traffic image.
Referring to FIG. 1a, the traffic sign may be the traffic light of FIG. 1a, and the traffic object may be the vehicle of FIG. 1a. The three-dimensional reconstruction method for the traffic sign position provided in the embodiments of the present disclosure may be applied to the vehicle 110 in FIG. 1a, or to other devices having a function of controlling the vehicle (such as the cloud server 120 and the mobile phone terminal 130 in FIG. 1a). The traffic sign may be a traffic sign standing above the ground, such as an electronic eye, a speed limit sign, a warning sign, or a road name sign, or a traffic sign on the ground, such as a lane line or a ground direction indication. The vehicle may be an autonomous vehicle, which may have either partial or full autonomous driving functions. That is, with reference to the classification standard of the Society of Automotive Engineers (SAE), the vehicle's level of automatic driving may be classified as no automation (L0), driver assistance (L1), partial automation (L2), conditional automation (L3), high automation (L4), or full automation (L5). The vehicle or other device may implement the three-dimensional reconstruction method of the traffic sign position through its components, including hardware and software. It is to be understood that the vehicle may be any one of a car, a truck, a motorcycle, a bus, an amusement ride vehicle, a playground vehicle, construction equipment (such as a construction vehicle), a tram, a golf cart, a train, a cart, and the like, and the embodiments of the present specification are not particularly limited thereto.
The embodiment of the specification provides an example of a scene of a three-dimensional reconstruction method of a traffic sign position, so as to exemplarily explain how the embodiment of the specification performs three-dimensional reconstruction on a target traffic sign position. The target traffic sign can be a traffic light of an intersection C (the number of the traffic lights of the intersection C is more than or equal to 1), the traffic object can be a vehicle, and a fisheye camera is mounted on the vehicle. In a certain period of time, a plurality of vehicles pass through the intersection C, and a plurality of traffic lights (for example, recorded as a traffic light R, a traffic light G, and a traffic light Y) of the intersection C can be photographed by the fisheye camera, so that continuous N-frame traffic images are obtained. A target detection model can be deployed on a vehicle, traffic lights in N frames of traffic images are detected through the target detection model, and position frames of the traffic lights in each frame of traffic images are output. The vehicle can upload the detected position frame, the shooting pose information of the vehicle, the internal parameters of the fisheye camera and the external parameters of the fisheye camera to the cloud end of the vehicle communication connection.
In this scenario example, the cloud may analyze the received position frames to determine a target associated frame combination G1 corresponding to the traffic light R, a target associated frame combination G2 corresponding to the traffic light G, and a target associated frame combination G3 corresponding to the traffic light Y. Then, the cloud determines three-dimensional position information P1 of the traffic light R based on position information of a target position frame in the target association frame combination G1, shooting pose information of a vehicle, internal reference of a fisheye camera, and external reference of the fisheye camera by using the three-dimensional reconstruction method of the traffic sign position mentioned in any one of the above embodiments; determining three-dimensional position information P2 of the traffic light G based on position information of a target position frame in the target association frame combination G2, shooting pose information of a vehicle, internal parameters of a fisheye camera and external parameters of the fisheye camera; and determining three-dimensional position information P3 of the traffic light Y based on the position information of the target position frame in the target associated frame combination G3, the shooting pose information of the vehicle, the internal parameters of the fisheye camera and the external parameters of the fisheye camera.
Therefore, for vehicles passing through the intersection C, part of the vehicles can three-dimensionally reconstruct three-dimensional position information of the traffic lights R, G and Y, part of the vehicles can three-dimensionally reconstruct three-dimensional position information of the traffic lights R and G, part of the vehicles can three-dimensionally reconstruct three-dimensional position information of the traffic lights G and Y, part of the vehicles can three-dimensionally reconstruct three-dimensional position information of the traffic lights R, part of the vehicles can three-dimensionally reconstruct three-dimensional position information of the traffic lights G, and part of the vehicles can three-dimensionally reconstruct three-dimensional position information of the traffic lights Y. And collecting the three-dimensional position information of the traffic lights reconstructed by the vehicles passing through the crossroad C to obtain a set M of the three-dimensional position information of the traffic lights.
In the present scene example, the set M of the three-dimensional position information of the traffic lights is subjected to clustering processing, and a position cluster C1 corresponding to the traffic light R, a position cluster C2 corresponding to the traffic light G, and a position cluster C3 corresponding to the traffic light Y are obtained. It is understood that the three-dimensional position information in the position cluster C1 corresponds to the traffic light R. And the three-dimensional position information in the position clustering cluster C2 corresponds to the traffic light G. And the three-dimensional position information in the position clustering cluster C3 corresponds to the traffic light Y.
In the present scene example, for the position cluster C1, the three-dimensional position information in the cluster is averaged to obtain the average three-dimensional position of the traffic light R. Taking this average as the initial position, it is optimized by bundle adjustment to obtain the three-dimensional position information of the traffic light R corresponding to the position cluster C1. Further, because the external parameters of the fisheye camera differ when the position of the traffic light R is reconstructed with multiple vehicles, the external parameters can be optimized simultaneously during the bundle adjustment, yielding optimized external parameters. The position clusters C2 and C3 are processed in exactly the same way to obtain the three-dimensional position information of the traffic light G and of the traffic light Y, respectively, together with the corresponding optimized external parameters.
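The cluster-then-average step above can be sketched as follows. This is a minimal illustration only: it uses a simple greedy distance threshold, not whatever clustering algorithm the embodiment actually employs, and the names (`cluster_positions`, `radius`) are hypothetical:

```python
import math

def cluster_mean(members):
    """Component-wise mean of a list of (x, y, z) positions."""
    n = len(members)
    return tuple(sum(m[i] for m in members) / n for i in range(3))

def cluster_positions(positions, radius=1.0):
    """Greedily group reconstructed 3D positions: a position joins the
    first cluster whose running mean lies within `radius` metres of it,
    otherwise it starts a new cluster.  Returns [mean, members] pairs,
    one per traffic light."""
    clusters = []
    for p in positions:
        for c in clusters:
            if math.dist(p, c[0]) < radius:
                c[1].append(p)
                c[0] = cluster_mean(c[1])  # update the running mean
                break
        else:
            clusters.append([p, [p]])
    return clusters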
In the above scenario example, first, for a single vehicle, a target associated frame combination corresponding to a traffic light is determined in N consecutive frames of traffic images. Performing three-dimensional reconstruction based on the position information of the target position frame in the target association frame combination and shooting pose information corresponding to the target traffic image to which the target position frame belongs to obtain three-dimensional position information of the traffic light; secondly, clustering is carried out by utilizing the three-dimensional position information of the traffic lights reconstructed by a plurality of vehicles, so as to cluster the three-dimensional position information corresponding to the same traffic light, optimize the traffic light position by utilizing the reconstruction results of a plurality of vehicles, and optimize the external parameters of the fisheye camera.
Referring to fig. 1b, the method for three-dimensionally reconstructing a traffic sign position according to an embodiment of the present disclosure may include:
and S110, determining a position frame of the target traffic sign in the continuous N frames of traffic images.
A first number of pixels lies between the position frames in two adjacent frames of traffic images; N is a positive integer greater than or equal to 2. When the vehicle runs on a road, the surrounding traffic signs can be photographed by an image acquisition device on the vehicle to obtain traffic images, each containing at least one traffic sign. Since the positions of at least some traffic signs may change when urban roads are rebuilt, it is necessary to determine a target traffic sign among the traffic signs and reconstruct its new position after the road modification. For example, the target traffic sign may be a traffic light, as traffic light data is crucial for vehicle driving; the electronic eye is likewise important for vehicle driving, so the target traffic sign may also be an electronic eye. It should be noted that the image acquisition device may be a fisheye camera.
In some embodiments, a target detection model may be deployed on the vehicle, the N consecutive frames of traffic images are input to the target detection model, and the model performs target detection on the target traffic sign in each traffic image to obtain its position information. The position information of the target traffic sign may be represented as a position frame, with a certain number of pixels distributed within it. Referring to FIG. 1c, the position frame may be a rectangular frame in the traffic image framing the target traffic sign (the traffic light in FIG. 1c). The position frame may be used to mark the area in the traffic image where the target traffic sign is located, for example by framing the target traffic sign with a rectangular frame; it may be the bounding box determined when the target traffic sign is detected by the target detection model. The target detection model may employ a deep learning network, for example an SSD (Single Shot MultiBox Detector) model.
In still other embodiments, the target detection model may be deployed on a cloud server communicatively connected to the vehicle: the vehicle uploads the acquired traffic images to the cloud server, the cloud server determines N consecutive traffic images from the received images, and these N frames are input to the target detection model to obtain the position frame of the target traffic sign in each of them. Deploying the target detection model in the cloud reduces the consumption of vehicle-side resources.
In this embodiment, the traffic object can acquire images of the surrounding target traffic signs during driving to obtain a series of real traffic images. Since the time interval between two adjacent frames of traffic images is very short (for example, 0.1 s or 0.2 s), the position frames in the two adjacent frames of traffic images are compared directly to determine the pixel distance between them, that is, the first number of pixels. Thus, the first number may be used to represent the pixel distance between the position frames in two adjacent frames of traffic images.
Specifically, a convolutional network model is trained based on a deep neural network; the traffic image captured by the image acquisition module is input into the trained model, and the position frame of the target traffic sign in the traffic image, together with its coordinates, is obtained in a single forward pass. Each frame of traffic image contains a number of pixel points, and the detected coordinates of a position frame can be taken as the coordinates of the pixel point at a chosen vertex of the frame. Illustratively, let P and Q be the pixel points at the corresponding vertex of the position frames in two adjacent images, with coordinates (x_p, y_p) and (x_q, y_q) respectively. The Euclidean distance

    d(P, Q) = sqrt((x_p - x_q)^2 + (y_p - y_q)^2)

is then computed as the pixel distance between the two adjacent frames and recorded as the first number.
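In code, the pixel-distance computation just described is a direct Euclidean distance between the chosen vertices. A minimal sketch; the function name is illustrative:

```python
import math

def first_number(vertex_p, vertex_q):
    """Pixel distance between the corresponding vertices of the position
    frames in two adjacent traffic images -- the "first number"."""
    (xp, yp), (xq, yq) = vertex_p, vertex_q
    return math.sqrt((xp - xq) ** 2 + (yp - yq) ** 2)
```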
S120, determining a target associated frame combination among the position frames in the N frames of traffic images based on the first number.
The target associated frame combination comprises a plurality of target position frames corresponding to the same target traffic sign. Specifically, among the N frames of traffic images, any frame of traffic image may contain M (M greater than or equal to 1) target traffic signs, so each frame may contain M position frames. However, any one of the M target traffic signs, say T, may fail to appear in some of the N frames because of objective factors during shooting (occlusion by other traffic objects, the shooting angle of the traffic object, and so on). Illustratively, if target traffic sign T appears in the first and third frames of traffic images, then T was occluded when the second frame was captured and exposed again when the third frame was captured. Illustratively, if target traffic sign T appears only in the fourth and fifth frames, then T was occluded when the first, second and third frames were captured; it became visible again from the fourth frame onward, and belongs to a newly seen target traffic sign. It can be seen that some position frames in the N traffic images correspond to the same target traffic sign. Therefore, it is necessary to find, among the position frames in the N frames of traffic images, the target position frames that may correspond to the same target traffic sign; the target position frames corresponding to the same target traffic sign form a target associated frame combination.
In some cases, since position frames that are closer in pixel distance are more likely to correspond to the same target traffic sign, and the first number represents the pixel distance between the position frames in two adjacent frames of traffic images, the target associated frame combination is determined based on the first numbers of pixels between the position frames in the N frames of traffic images. Specifically, in some embodiments, the magnitudes of the first numbers between position frames are compared, and the two position frames corresponding to a smaller first number may constitute a target associated frame combination. For example, the first numbers are sorted from large to small, the smallest first numbers at the end of the ordering are selected, and the pairs of position frames corresponding to these smallest first numbers are determined as target associated frame combinations.
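One way to realise the selection above is a greedy nearest-neighbour match between the position frames of two adjacent images, keeping only pairs whose first number is below a threshold. This is a sketch under simplifying assumptions: each position frame is summarised by a single vertex coordinate, and the threshold value is arbitrary:

```python
import math

def associate_frames(frames_t, frames_t1, max_first_number=50.0):
    """Greedily pair each position frame in image t with the nearest
    unused position frame in image t+1, provided their pixel distance
    (the "first number") is small enough.  Frames are (x, y) vertex
    tuples.  Returns (index_in_t, index_in_t1, first_number) triples."""
    pairs, used = [], set()
    for i, a in enumerate(frames_t):
        best_j, best_d = None, max_first_number
        for j, b in enumerate(frames_t1):
            if j not in used:
                d = math.dist(a, b)  # first number between the two frames
                if d < best_d:
                    best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j, best_d))
    return pairs
```

Each returned pair is a candidate target associated frame combination; chaining pairs across all N images links the same target traffic sign through the whole sequence.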
S130, performing three-dimensional reconstruction according to the position information of the target position frame and the shooting pose information corresponding to the target traffic image to which the target position frame belongs, to obtain the three-dimensional position information of the target traffic sign.
The shooting pose information is the pose information of the traffic object at the shooting time of the target traffic image. The pose information may include the position and orientation angle at which the traffic object was located when the image was captured. The three-dimensional reconstruction may be to recover the three-dimensional position of the traffic sign from several two-dimensional images of the traffic sign. Further, the three-dimensional reconstruction can also render the traffic signs, and finally, the virtual reality of the objective world is expressed in the computer equipment.
Specifically, as described above, the target associated frame combination has been determined among the position frames in the N frames of traffic images based on the first number. The target associated frame combination comprises two or more target position frames. Since the position of each target position frame in its target traffic image is known, and the shooting pose information at the time the target traffic image was captured is known, three-dimensional reconstruction can be performed based on the multi-frame triangulation principle to obtain the three-dimensional position information of the target traffic sign.
It will be appreciated that multi-frame triangulation solves for the position p of a 3D point in the world coordinate system given the 2D observations [u_i, v_i] in n (n ≥ 2) known image frames, the camera internal reference matrix K, and the camera pose [R_ciw, t_ciw] of each observation image frame (the transformation from the world coordinate system w to the camera coordinate system ci; the camera coordinate system is right-down-front, that is, the line connecting the optical center and the image center is the Z axis, the width direction of the image is the X axis, and the height direction of the image is the Y axis). Specifically, in the present embodiment, at least two target position frames corresponding to the same target traffic sign may be determined in the adjacent N frames of traffic images based on the first number. The position information of the i-th target position frame is recorded as [u_i, v_i], and the shooting pose information corresponding to the target traffic image to which the i-th target position frame belongs is recorded as P_i. Based on the position information [u_i, v_i] of each target position frame and the corresponding shooting pose information P_i, the three-dimensional position information of the target traffic sign is obtained by solving according to the multi-frame triangulation principle.
It should be noted that the three-dimensional reconstruction method for the traffic sign position may run on the traffic object, or on a cloud server communicatively connected to the traffic object. Running the method on the cloud server can reduce the consumption of vehicle-side resources.
In the three-dimensional reconstruction method for the traffic sign position, firstly, the position frames of the target traffic sign and the first numbers of pixels between position frames in two adjacent frames of traffic images are determined in N consecutive frames of traffic images. Since the smaller the first number is, the greater the possibility that the two position frames corresponding to that first number correspond to the same target traffic sign, the target associated frame combination is determined among the position frames in the N frames of traffic images based on the first number. Further, three-dimensional reconstruction is performed based on the position information of the target position frames in the target associated frame combination and the shooting pose information corresponding to the target traffic images to which the target position frames belong, to obtain the three-dimensional position information of the target traffic sign. By performing three-dimensional reconstruction on a plurality of two-dimensional positions of the target traffic sign using the shooting pose information, the three-dimensional position of the target traffic sign can be accurately determined, and the accuracy of the three-dimensional reconstruction result of the traffic sign position is improved.
In some embodiments, referring to fig. 2a, determining the target associated frame combination in the position frames in the N frames of traffic images based on the first number may include the following steps:
S210, determining, among the first numbers, a target number smaller than the pixel number threshold, and determining the two position frames corresponding to the target number as an initial associated frame combination.
Each initial associated frame combination has a confidence that the two position frames it includes correspond to the same target traffic sign.
The pixel number threshold is used for judging the initial associated frame combination. The pixel number threshold may be set in conjunction with the distance traveled by the traffic object between adjacent frames, with the area of the position frames in the adjacent frames of traffic images, or with the number of pixels within a position frame in the traffic image. The confidence represents the probability that the position frames in two adjacent frames of traffic images correspond to the same target traffic sign, and may be set in conjunction with the pixel distance between the position frames in the two adjacent frames of traffic images. Specifically, there is a first number of pixels between the position frames of adjacent frames, and the pixel number threshold is set according to the actual situation. The first number is compared with the pixel number threshold; if the first number is smaller than the pixel number threshold, the two position frames in the adjacent frames of traffic images have a high probability of corresponding to the same traffic sign in reality, and the two position frames may form an initial associated frame combination.
In an actual situation, two adjacent frames of traffic images are respectively recorded as a previous frame traffic image and a current frame traffic image. A position frame in the previous frame traffic image corresponds to at most one position frame in the current frame traffic image, and a position frame in the current frame traffic image likewise corresponds to at most one position frame in the previous frame traffic image. However, among the initial associated frame combinations, a position frame in the current frame traffic image may correspond to two or more position frames in the previous frame traffic image, and a position frame in the previous frame traffic image may likewise correspond to a plurality of position frames in the current frame traffic image, which is obviously inconsistent with the actual situation. Therefore, a confidence is set for each initial associated frame combination, the confidence representing the probability that the two position frames included in the initial associated frame combination correspond to the same target traffic sign. Further, the confidences of the initial associated frame combinations can be used to filter the initial associated frame combinations to determine the target associated frame combinations.
S220, in the initial associated frame combination, determining a target associated frame combination based on the confidence of the initial associated frame combination.
For example, referring to fig. 2b, each solid line and dashed line in fig. 2b represents that the first number of pixels between the two position frames at the two ends of the line is smaller than the pixel number threshold. In fig. 2b, A, B, C and D represent the position frames of the four target traffic signs detected in the previous frame traffic image, E, F, G, H and I represent the position frames of the five traffic signs detected in the current frame traffic image, and two position frames connected by a line in the two frames of traffic images can be regarded as an initial associated frame combination. For example, the position frames A and E constitute an initial associated frame combination, the position frames A and F constitute an initial associated frame combination, the position frames B and F constitute an initial associated frame combination, the position frames B and H constitute an initial associated frame combination, the position frames C and E constitute an initial associated frame combination, the position frames D and G constitute an initial associated frame combination, and the position frames D and I constitute an initial associated frame combination. The confidence is used for determining the target associated frame combinations among the initial associated frame combinations. With continued reference to fig. 2b, the two position frames connected by a solid line can form a target associated frame combination.
Specifically, one-to-many or many-to-one cases exist among the initial associated frame combinations, whereas in the actual situation each correspondence is one-to-one, one-to-zero or zero-to-one. Therefore, the initial associated frame combinations are screened based on their confidences to obtain the target associated frame combinations. For example, with continued reference to fig. 2b, the position frame A in the previous frame traffic image corresponds to the position frames E and F in the current frame traffic image. The confidence between the position frames A and E is compared with the confidence between the position frames A and F; if the confidence between A and E is greater, the initial associated frame combination formed by the position frames A and F is deleted, and the initial associated frame combination formed by the position frames A and E is retained. Further, the confidence between the position frames A and E is compared with the confidence between the position frames E and C; the initial associated frame combination formed by the position frames E and C is deleted, and the initial associated frame combination formed by the position frames A and E is retained. By analogy, the initial associated frame combinations formed by the position frames B and H and by the position frames D and I are deleted, and the initial associated frame combinations formed by the position frames B and F and by the position frames D and G are retained. It can be understood that the target traffic sign in the position frame C is occluded in the current frame, and the target traffic signs in the position frames H and I are new target traffic signs captured in the current frame.
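The screening described above, in which each position frame keeps at most one partner, can be sketched as a greedy selection by decreasing confidence. This is one plausible reading of the step, not the definitive implementation; the candidate tuple layout and function name are assumptions.

```python
# Sketch of the confidence-based screening: candidates are taken in order
# of decreasing confidence, and a pair is kept only while both of its
# position frames are still unmatched, enforcing a one-to-one result.
def select_target_combinations(candidates):
    """candidates: list of (confidence, prev_frame_id, curr_frame_id).
    Returns kept (prev_frame_id, curr_frame_id) pairs, best first."""
    used_prev, used_curr, kept = set(), set(), []
    for conf, p, c in sorted(candidates, reverse=True):
        if p in used_prev or c in used_curr:
            continue                      # one box already matched: delete pair
        used_prev.add(p)
        used_curr.add(c)
        kept.append((p, c))               # retain this combination
    return kept
```

Applied to the fig. 2b example, the highest-confidence partner of each box survives and the rest are deleted.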
According to the three-dimensional reconstruction method of the traffic sign position, initial associated frame combinations are first constructed from the position frames in two adjacent frames of traffic images, and the initial associated frame combinations are then filtered based on their confidences to obtain the target associated frame combinations. The target position frames corresponding to the same target traffic sign are thus determined among the detected position frames as far as possible, so that the three-dimensional position of the target traffic sign can be accurately reconstructed and accurately displayed in a map.
In some embodiments, the confidence is determined by the number of pixels between the two location bounding boxes that the initial associated bounding box combination includes.
Specifically, the two position frames in the initial associated frame combination have corresponding pixel coordinates [u1, v1] and [u2, v2], and the pixel distance between the two position frames is calculated according to the Euclidean distance:

d = √((u1 − u2)² + (v1 − v2)²)

The reciprocal of the pixel distance may be used as the confidence of the initial associated frame combination. Further, in some embodiments, an adjustment coefficient may also be set, and the confidence of the initial associated frame combination may be determined based on the adjustment coefficient and the reciprocal of the pixel distance.
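The reciprocal-distance confidence just described can be written out directly. The adjustment coefficient `k` and the small guard value `eps` are illustrative assumptions; the original text fixes neither.

```python
# Minimal sketch: confidence as the (optionally scaled) reciprocal of the
# Euclidean pixel distance between the two position-frame centres.
import math

def confidence(center_a, center_b, k=1.0, eps=1e-6):
    (u1, v1), (u2, v2) = center_a, center_b
    d = math.hypot(u1 - u2, v1 - v2)
    return k / (d + eps)    # eps guards against identical centres
```

Closer position frames receive a higher confidence, matching the intuition that nearby detections likely show the same sign.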
In the three-dimensional reconstruction method for the traffic sign position, the confidence of the initial associated frame combination is set using the number of pixels between the two position frames included in the initial associated frame combination, so that the possibility that the two position frames correspond to the same target traffic sign can be quantitatively expressed, providing a data basis for accurately reconstructing the three-dimensional position of the target traffic sign.
In some embodiments, the two position frames included in the initial associated frame combination respectively have a second number of pixels and a third number of pixels; the pixel number threshold is determined by the number of pixels in a designated position frame; and the designated position frame is designated, from the two position frames included in the initial associated frame combination, according to the comparison result of the second number and the third number.
Specifically, each position frame contains a plurality of pixels, so the two position frames of the initial associated frame combination each have a corresponding number of pixels: the number of pixels included in the position frame in the previous frame traffic image is recorded as the second number, and the number of pixels included in the position frame in the current frame traffic image is recorded as the third number. The second number is compared with the third number, and the pixel number threshold is determined based on the smaller of the two. For example, if the second number is smaller than the third number, the position frame in the previous frame traffic image is determined as the designated position frame; if the third number is smaller than the second number, the position frame in the current frame traffic image is determined as the designated position frame.
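A small sketch of the designation rule above: the box with the smaller pixel count is designated, and the threshold is derived from its pixel count. The `ratio` scaling factor is an illustrative assumption — the text only says the threshold is determined by the designated frame's pixel count, not how.

```python
# Sketch: choose the pixel-number threshold from the designated position
# frame, i.e. the box of the pair with fewer pixels (smaller area).
def pixel_number_threshold(box_prev, box_curr, ratio=0.5):
    def pixel_count(box):
        u_min, v_min, u_max, v_max = box
        return (u_max - u_min) * (v_max - v_min)
    second = pixel_count(box_prev)   # "second number" (previous frame)
    third = pixel_count(box_curr)    # "third number" (current frame)
    designated = min(second, third)  # pixel count of the designated frame
    return ratio * designated
```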
In the three-dimensional reconstruction method for the traffic sign position, the pixel number threshold is set according to the numbers of pixels in the two position frames of two adjacent frames of traffic images, and the pixel number threshold can then be used to determine whether the two position frames may correspond to the same target traffic sign, so that the position frames possibly corresponding to the same target traffic sign can be accurately found among the position frames of the two adjacent frames of traffic images.
In some embodiments, referring to fig. 3, the two position frames included in the initial associated frame combination are respectively recorded as a first position frame and a second position frame having an initial connection relationship, where the first position frame belongs to a first traffic image and the second position frame belongs to a second traffic image. In the initial associated frame combinations, determining the target associated frame combination based on the confidences of the initial associated frame combinations may include the following steps:
S310, if any position frame in the first traffic image has initial connection relationships with a plurality of position frames in the second traffic image, determining, among the position frames in the second traffic image, the position frame having a target connection relationship with that position frame according to the confidences of the initial associated frame combinations.
S320, determining the position frame and the position frame having the target connection relationship with it as a target associated frame combination.
The first traffic image may be the previous frame traffic image of two adjacent frames of traffic images, and the second traffic image may be the current frame traffic image of the two adjacent frames. The correspondences between the position frames in the first traffic image and those in the second traffic image include one-to-one, one-to-many and one-to-zero cases. Referring to fig. 2b, the position frames A, B, C and D can be considered as belonging to the first traffic image, the position frames E, F, G, H and I can be considered as belonging to the second traffic image, the position frames A, B and D are in the one-to-many case, and the position frame C is in the one-to-one case.
Specifically, in two adjacent frames of traffic images, each frame has n (n ≥ 1) position frames. A position frame in the first traffic image is taken and recorded as a position frame X, and the pixel distance between it and any position frame (recorded as a position frame Y) in the second traffic image is determined. If the number of pixels corresponding to the pixel distance between the position frame X and the position frame Y is smaller than the pixel number threshold, it can be determined that the position frame X and the position frame Y have an initial connection relationship. The above process is repeated until all position frames in the first traffic image have been visited, and initial associated frame combinations are constructed based on each pair of position frames having an initial connection relationship. Each initial associated frame combination corresponds to a confidence, and any position frame in the first traffic image corresponds to at most one position frame in the second traffic image. Therefore, when a position frame in the first traffic image has initial connection relationships with a plurality of position frames in the second traffic image, a target connection relationship is determined among the initial connection relationships corresponding to that position frame according to the confidences of the initial associated frame combinations, thereby obtaining the position frame having the target connection relationship with it. Further, the position frame and the position frame having the target connection relationship with it are determined as a target associated frame combination.
For example, with continued reference to fig. 2b, an initial connection relationship exists between the position frame A in the previous frame traffic image and the position frame E in the current frame traffic image, and an initial connection relationship also exists between the position frame A and the position frame F in the current frame traffic image. The confidence between the position frames A and E is compared with the confidence between the position frames A and F; if the confidence between A and E is greater, it is determined that a target connection relationship exists between the position frame A and the position frame E, and the position frames A and E are determined as the target position frames in a target associated frame combination.
In the three-dimensional reconstruction method of the traffic sign position, initial connection relationships are constructed among the position frames in two adjacent frames of traffic images, and the initial connection relationships are then filtered based on their corresponding confidences to obtain the target connection relationships. The target position frames corresponding to the same target traffic sign are thus determined among the detected position frames as far as possible, so that the three-dimensional position of the target traffic sign can be accurately reconstructed and accurately displayed in a map.
In some embodiments, determining, among the plurality of location frames in the second traffic image, a location frame having a target connection relationship with any one of the location frames according to the confidence of the initial associated frame combination includes: and taking the position frame in the second traffic image corresponding to the maximum confidence coefficient as a position frame having a target connection relation with any position frame.
The maximum confidence is the confidence of the two position frames with the greatest degree of association between the first traffic image and the second traffic image. Specifically, a position frame in the first traffic image may have initial connection relationships with a plurality of position frames in the second traffic image; that is, it forms a plurality of initial associated frame combinations with those position frames. Each initial associated frame combination has a confidence, and the confidences are compared to determine the maximum confidence. The maximum confidence corresponds to two position frames, wherein the corresponding position frame in the second traffic image can be identified as the position frame having the target connection relationship with the position frame in the first traffic image. Further, the two position frames are determined as a target associated frame combination.
Illustratively, there is an initial connection relationship between the position frame B1 in the first traffic image and the position frames B2, B3, B4 in the second traffic image. And comparing the confidence degrees between the position frames B1 and B2, the confidence degrees between the position frames B1 and B3 and the confidence degrees between the position frames B1 and B4, if the confidence degrees between the position frames B1 and B4 are greater than the confidence degrees between the position frames B1 and B2 and the confidence degrees between the position frames B1 and B3, determining that a target connection relation exists between the position frames B1 and B4, and determining the position frames B1 and B4 as target position frames in the target association frame combination.
In the three-dimensional reconstruction method for the traffic sign position, if a position frame in the first traffic image has initial connection relationships with a plurality of position frames in the second traffic image, the target connection relationship is determined among the plurality of initial connection relationships through the maximum confidence, so that the position frames corresponding to the same target traffic sign are determined as correctly as possible, which facilitates the reconstruction of the three-dimensional position of the target traffic sign.
In some embodiments, referring to fig. 4, the target traffic image is obtained by shooting the target traffic sign with an image acquisition device installed on the traffic object. Performing three-dimensional reconstruction according to the position information of the target position frames and the shooting pose information corresponding to the target traffic images to which the target position frames belong, to obtain the three-dimensional position information of the target traffic sign, may include the following steps:
S410, triangulating based on the position information of the target position frames and the shooting pose information to obtain an estimated three-dimensional position of the target traffic sign.
The triangularization calculation may be understood as follows: when the image acquisition device (such as a camera) installed on the traffic object observes the target traffic sign, an observation ray starting from the camera center in 3D space can be obtained according to the shooting pose and the observation vector. A plurality of shooting pose observations generate a plurality of observation rays, which in the ideal case intersect at one point in space, and the intersection point is the estimated three-dimensional position of the target traffic sign.
Specifically, target detection is performed on the captured traffic images through a convolutional network model to obtain the two-dimensional position information [u_i, v_i] of each position frame. According to each detected position frame [u_i, v_i] and the shooting pose information P_i at the time of detection, a matrix D may be constructed, and then a solution of DY = 0 may be calculated by the SVD algorithm, where Y is the homogeneous coordinate of the estimated three-dimensional position of the target traffic sign. Denoting the 3×4 projection matrix of the i-th observation as P_i, with rows P_i^(1), P_i^(2) and P_i^(3), the matrix D is stacked from the rows u_i·P_i^(3) − P_i^(1) and v_i·P_i^(3) − P_i^(2) for each observation.
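The DY = 0 triangulation step can be sketched as below. This follows the standard direct-linear-transform construction under stated assumptions (3×4 projection matrices already combining intrinsics and pose); the exact matrix layout in the original is not fully specified, and all names are illustrative.

```python
# Hedged sketch of multi-frame triangulation: stack two rows of D per
# observation from the pixel position [u_i, v_i] and the 3x4 projection
# matrix P_i, then take the SVD null-space solution of D Y = 0.
import numpy as np

def triangulate(uvs, projections):
    """uvs: list of (u, v); projections: list of 3x4 numpy arrays."""
    rows = []
    for (u, v), P in zip(uvs, projections):
        rows.append(u * P[2] - P[0])   # u * P^(3) - P^(1)
        rows.append(v * P[2] - P[1])   # v * P^(3) - P^(2)
    D = np.vstack(rows)
    _, _, vt = np.linalg.svd(D)
    Y = vt[-1]                          # homogeneous solution of D Y = 0
    return Y[:3] / Y[3]                 # de-homogenise to the 3D point
```

With two or more well-separated views the smallest singular vector recovers the 3D point up to scale, and the division by Y[3] removes the homogeneous scale.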
S420, optimizing the estimated three-dimensional position according to the internal parameters and external parameters of the image acquisition device, the estimated three-dimensional position and the shooting pose information, with minimizing the reprojection error as the criterion, to obtain the three-dimensional position information of the target traffic sign.
The image acquisition device can be a camera, the internal parameters are the focal length and the position of the center point of the camera in the image, and the external parameters are the position and the orientation of the camera in the coordinate system of the traffic object. Exemplarily, the traffic object is a vehicle, and the external parameters are the position and orientation of the camera in the vehicle coordinate system.
In some cases, the initial position of the target traffic sign calculated by triangulation is not accurate due to viewing angle, perception deviation, image noise and the like. Therefore, a bundle adjustment method is adopted to optimize the estimated three-dimensional position of the target traffic sign, with the estimated three-dimensional position as the initial position. Specifically, secondary projection is performed according to the estimated three-dimensional position of the target traffic sign obtained by the triangularization calculation and the shooting pose information, and the resulting projection position can be written as K(Ex·P)^(−1)·p, wherein p represents the estimated three-dimensional position of the target traffic sign, P represents the shooting pose information, K represents the camera internal parameters, and Ex represents the camera external parameters.
Due to viewing angle, perception deviation, image noise and the like, there is an error between the projection position and the two-dimensional position information of the position frame detected in the image captured with the shooting pose information P. A cost function is therefore constructed from the projection position of the secondary projection and the two-dimensional position information of the position frames:

cost(p) = Σ_{i=1..n} ‖ u_i − K(Ex·P_i)^(−1)·p ‖²
Minimizing the cost function makes the secondary projection coincide as closely as possible with the position frames detected in the captured images, so that the estimated three-dimensional position of the target traffic sign better fits the real position of the target traffic sign, improving the accuracy of the reconstruction of the three-dimensional position of the target traffic sign.
The three-dimensional position optimization formula is as follows:

p = argmin_p Σ_{i=1..n} ‖ u_i − K(Ex·P_i)^(−1)·p ‖²
wherein K represents the camera internal parameters; Ex represents the camera external parameters; n indicates that position frames of the target traffic sign are detected in n traffic images; u_i represents the position of the i-th position frame; p on the left side of the equation represents the optimized three-dimensional position coordinate of the target traffic sign; p on the right side of the equation represents the estimated three-dimensional position of the target traffic sign obtained by the triangularization calculation; and P_i represents the shooting pose information.
Then, the three-dimensional position coordinate p with the minimum reprojection error is solved through the Levenberg-Marquardt algorithm, namely the optimized three-dimensional position coordinate of the target traffic sign. The Levenberg-Marquardt algorithm is an iterative optimization algorithm that can be used to solve least squares problems.
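A minimal Levenberg-Marquardt refinement of a single 3D point against its reprojection error can be sketched as follows. This is an illustrative implementation under assumptions: the projection is folded into a single 3×4 matrix per view, the Jacobian is taken numerically for brevity, and the damping schedule is a common textbook choice, not the one the method necessarily uses.

```python
# Sketch: damped least-squares (Levenberg-Marquardt) refinement of the
# estimated 3D position p, minimising sum_i || u_i - proj_i(p) ||^2.
import numpy as np

def project(P, p):
    """Project 3D point p with a 3x4 matrix P to pixel coordinates."""
    x = P @ np.append(p, 1.0)
    return x[:2] / x[2]

def residuals(p, uvs, projections):
    return np.concatenate([np.asarray(uv) - project(P, p)
                           for uv, P in zip(uvs, projections)])

def refine_lm(p0, uvs, projections, iters=20, lam=1e-3):
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = residuals(p, uvs, projections)
        J = np.empty((r.size, 3))          # numerical Jacobian w.r.t. p
        for k in range(3):
            dp = np.zeros(3)
            dp[k] = 1e-6
            J[:, k] = (residuals(p + dp, uvs, projections) - r) / 1e-6
        # damped normal equations: (J^T J + lam I) step = -J^T r
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        if np.linalg.norm(residuals(p + step, uvs, projections)) < np.linalg.norm(r):
            p, lam = p + step, lam * 0.5   # accept step, relax damping
        else:
            lam *= 10.0                    # reject step, increase damping
    return p
```

Starting from the triangulated estimate, a few iterations pull the point toward the position whose reprojection best matches the detected position frames.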
According to the three-dimensional reconstruction method of the traffic sign position, the estimated three-dimensional coordinate of the target traffic sign is first obtained through the triangulation principle; then, since the estimated three-dimensional coordinate may have certain errors due to viewing angle, perception deviation, image noise and the like, the estimated coordinate is optimized through the reprojection error, which improves the accuracy of the three-dimensional reconstruction result of the target traffic sign position.
In some embodiments, referring to fig. 5, a method for three-dimensional reconstruction of a traffic sign position may include the steps of:
S510, clustering is performed based on the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of a plurality of traffic objects, to obtain a plurality of position clusters.
S520, determining the three-dimensional position information of the target traffic sign corresponding to any position cluster according to the three-dimensional position information in any position cluster.
The three-dimensional position information in one position cluster corresponds to the same target traffic sign. A position cluster is a set of three-dimensional position information of a target traffic sign that satisfies the conditions of the clustering algorithm.
In some cases, there may be several traffic objects around the target traffic sign, each capable of image capture of the target traffic sign. For each traffic object, the three-dimensional position information of the single target traffic sign can be determined by adopting the three-dimensional reconstruction method for the traffic sign position mentioned in any one of the above embodiments. Therefore, the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of a plurality of traffic objects forms a three-dimensional position information set. The three-dimensional position information set comprises three-dimensional position information of a plurality of target traffic signs. Further, the three-dimensional position information set may be clustered, the three-dimensional position information corresponding to the same target traffic sign may be categorized together, and the three-dimensional position information determined for a single traffic object may be optimized using the three-dimensional position information corresponding to the same target traffic sign.
In some embodiments, since the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of a single traffic object is known, and each traffic object can determine the three-dimensional position information of at least one target traffic sign, each traffic object can upload the three-dimensional position information of at least one target traffic sign, in order to identify which three-dimensional position information corresponds to the same target traffic sign, clustering processing can be performed based on the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of a plurality of traffic objects to obtain a plurality of position clustering clusters, and the three-dimensional position information in the position clustering clusters corresponds to the same target traffic sign. Further, for any position cluster, the position of the target traffic sign corresponding to the any position cluster can be optimized by using the three-dimensional position information in the any position cluster, so as to obtain the three-dimensional position information of the target traffic sign corresponding to the any position cluster.
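The cluster-then-fuse idea above can be sketched with a simple distance-threshold clustering followed by per-cluster averaging. The greedy threshold clustering here is a stand-in for whatever clustering algorithm the method actually uses, and the radius value is an assumption.

```python
# Hedged sketch: group 3D positions reported by many traffic objects by a
# distance threshold, then reduce each cluster (one target traffic sign)
# to a single averaged position.
import math

def cluster_positions(points, radius=1.0):
    clusters = []
    for p in points:
        for c in clusters:
            # join the first cluster whose running mean is within radius
            m = [sum(q[k] for q in c) / len(c) for k in range(3)]
            if math.dist(p, m) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def fused_positions(points, radius=1.0):
    """One averaged 3D position per cluster (per target traffic sign)."""
    return [tuple(sum(q[k] for q in c) / len(c) for k in range(3))
            for c in cluster_positions(points, radius)]
```

Reports from different vehicles that land near each other collapse into one cluster, and the cluster mean serves as the fused sign position that could then be further optimized.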
With the above three-dimensional reconstruction method for the traffic sign position, position clusters are obtained by clustering the three-dimensional position information of the target traffic sign reconstructed by multiple traffic objects, and the position of the target traffic sign can then be further optimized using the three-dimensional position information within each cluster, improving the precision of the three-dimensional reconstruction result for the target traffic sign position.
In other embodiments, since the three-dimensional position information of the target traffic sign obtained by the three-dimensional reconstruction of a single traffic object is known, each traffic object can upload its three-dimensional position information for the same target traffic sign to the cloud. After receiving this information, the cloud can optimize the position of that target traffic sign. For example, the multiple pieces of three-dimensional position information for the same target traffic sign are averaged to obtain an average three-dimensional position; the average three-dimensional position is then further optimized by bundle adjustment, based on the internal parameters and external parameters of the image acquisition device, the average three-dimensional position, and the shooting pose information, to obtain the optimized three-dimensional position information of that target traffic sign.
In some embodiments, referring to fig. 6a, the clustering based on the three-dimensional position information of the target traffic sign obtained by the three-dimensional reconstruction of a plurality of traffic objects, to obtain a plurality of position clusters, includes the following steps:
S610, determining any piece of three-dimensional position information, among the three-dimensional position information of the target traffic sign obtained by the three-dimensional reconstruction of the plurality of traffic objects, as a data object point.
S620, taking the data object point as a core point, performing clustering according to a preset neighborhood radius and a preset number threshold of target traffic signs within the preset neighborhood radius, to obtain the plurality of position clusters.
The data object point may be any piece of three-dimensional position information of the traffic sign. A core point is a point whose preset neighborhood radius contains a number of sample points greater than or equal to the preset number threshold; the preset number threshold is the minimum number of traffic sign three-dimensional positions that must lie within the preset neighborhood radius. A position cluster is the set of data object points that are density-reachable from a core point. Illustratively, clustering may be performed with DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a density-based spatial clustering algorithm. DBSCAN has two important parameters: Eps and MinPts. Eps is the neighborhood radius used when defining density, and MinPts is the threshold used when defining a core point.
Specifically, one piece of three-dimensional position information of a target traffic sign is selected from the three-dimensional reconstruction results of the multiple traffic objects as a data object point. If the number of data object points within the preset neighborhood radius of this point is greater than or equal to the preset number threshold, the selected data object point is a core point, and all data object points density-reachable from this core point are found to form one cluster. If the selected data object point is instead an edge point (a data object point whose preset neighborhood radius contains fewer traffic sign three-dimensional positions than the preset number threshold), another data object point is selected. The process is repeated until all points have been processed, yielding a plurality of position clusters. Referring to fig. 6b, fig. 6b shows the result of the clustering process, which produces 6 position clusters; the three-dimensional position information in each position cluster shown in fig. 6b corresponds to the same target traffic sign. Illustratively, the target traffic sign may be a traffic light, the preset neighborhood radius may be 1.5 meters, and the preset number threshold of target traffic signs within the preset neighborhood radius may be 20.
That is, the reconstruction result of any one traffic light is selected from the reconstruction results of all individual vehicles as a candidate core point. If at least 20 single-vehicle traffic light reconstructions lie within a 1.5 m radius of it, all of these reconstructions are considered to correspond to the same traffic light in the real world. The neighborhoods of the neighboring single-vehicle reconstructions are then expanded recursively, after which another not-yet-included single-vehicle reconstruction result is selected at random, and the process is repeated until every single-vehicle reconstruction result has been visited.
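The clustering step described above can be sketched as follows with scikit-learn's DBSCAN. The Eps and MinPts values mirror the illustrative 1.5 m radius and threshold of 20, but the reconstructed positions themselves are synthetic assumptions, not data from this disclosure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic per-vehicle reconstructions: each row is one vehicle's (x, y, z)
# reconstruction of a traffic light, in metres.
rng = np.random.default_rng(0)
light_a = rng.normal([10.0, 5.0, 6.0], 0.3, size=(30, 3))  # 30 reconstructions of light A
light_b = rng.normal([40.0, 5.0, 6.0], 0.3, size=(25, 3))  # 25 reconstructions of light B
positions = np.vstack([light_a, light_b])

# eps = preset neighbourhood radius (1.5 m); min_samples = preset number
# threshold of reconstructions within that radius (20).
labels = DBSCAN(eps=1.5, min_samples=20).fit_predict(positions)

# One position cluster per real-world light; label -1 would mark noise.
clusters = {k: positions[labels == k] for k in set(labels) if k != -1}
```

Here the 55 uploaded reconstructions fall into two position clusters, one per real-world traffic light, and each cluster can then be averaged and refined independently.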
With the above three-dimensional reconstruction method for the traffic sign position, core points are selected and the three-dimensional position information of the target traffic sign reconstructed by multiple traffic objects is clustered using the preset neighborhood radius and the preset number threshold to obtain position clusters. This matches the distribution characteristics of the traffic objects, so the clustering result is highly accurate, which in turn improves the accuracy of the three-dimensional reconstruction result for the target traffic sign position.
In some embodiments, referring to fig. 7, determining the three-dimensional position information of the target traffic sign corresponding to any position cluster according to the three-dimensional position information in that position cluster may include the following steps:
S710, averaging the three-dimensional position information in the position cluster to obtain the average three-dimensional position of the target traffic sign.
S720, taking the average three-dimensional position of the target traffic sign as an initial position and optimizing it by bundle adjustment to obtain the three-dimensional position information of the target traffic sign corresponding to the position cluster.
Bundle adjustment combines multiple images, taken from different viewing angles, that describe the same scene and, taking the projections of all points in the images as the criterion, extracts the 3D point coordinates of the scene structure, the relative motion parameters, and the optical parameters of the camera. It can be formulated as an optimization problem over the 3D structure and the viewing parameters (i.e., camera position, orientation, intrinsic calibration, and radial distortion) to obtain the best three-dimensional positions and the best parameters of the image acquisition device. Given an initial estimate, bundle adjustment refines both the three-dimensional position information of the target traffic sign and the external parameters of the image capture device by minimizing the projection error between the observed and predicted image points.
Specifically, to reduce reconstruction errors caused by the differing motion trajectories of the traffic objects, after the clustering process has grouped the reconstruction results of the same real-world target traffic sign, the three-dimensional position information in the position cluster is averaged to obtain the average three-dimensional position of the target traffic sign. This average three-dimensional position is taken as the initial position and optimized by bundle adjustment to obtain the final three-dimensional position information.
In this embodiment, with the minimum reprojection error as the criterion, the average three-dimensional position can be optimized according to the internal parameters and external parameters of the image acquisition device, the average three-dimensional position, and the shooting pose information, to obtain the three-dimensional position information of the target traffic sign corresponding to the position cluster. The Levenberg-Marquardt algorithm can likewise be used for this optimization. As described above, the average three-dimensional position is reprojected using the shooting pose, and an error exists between this reprojection and the detected position frame. A cost function is therefore constructed from the reprojection and the information of the position frames detected in the captured images, and minimizing this cost function drives the reprojections and the detected position frames toward coincidence, so that the three-dimensional position information better fits the real-world position of the traffic sign; details are not repeated here.
With the above three-dimensional reconstruction method for the traffic sign position, the average three-dimensional position of the traffic sign is computed and then optimized by bundle adjustment, so the three-dimensional reconstruction result of the target traffic sign is refined using multiple traffic objects and the reconstruction error introduced by the motion trajectories of individual traffic objects is reduced.
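The averaging step of S710 followed by Levenberg-Marquardt refinement of the reprojection error, as in S720, can be sketched with SciPy's least-squares solver. This is an illustrative toy, not the disclosed implementation: the intrinsics `K`, the shooting poses, and the cluster positions are invented, and the position frame is reduced to its centre point:

```python
import numpy as np
from scipy.optimize import least_squares

K = np.array([[800., 0., 640.],      # assumed pinhole intrinsics
              [0., 800., 360.],
              [0., 0., 1.]])

def project(X, R, t):
    """Project a world point into pixel coordinates for shooting pose (R, t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

# Hypothetical cluster of per-vehicle reconstructions of one sign.
cluster = np.array([[10.1, 5.0, 6.0], [9.9, 5.1, 6.1], [10.0, 4.9, 5.9]])
X0 = cluster.mean(axis=0)            # S710: average three-dimensional position

# Hypothetical shooting poses and the observed position-frame centres.
poses = [(np.eye(3), np.array([0., 0., 10.])),
         (np.eye(3), np.array([-2., 0., 12.]))]
obs = [project(np.array([10.0, 5.0, 6.0]), R, t) for R, t in poses]

def residuals(X):
    # reprojection error: predicted minus observed image points
    return np.concatenate([project(X, R, t) - o for (R, t), o in zip(poses, obs)])

# S720: refine the average position with Levenberg-Marquardt.
X_opt = least_squares(residuals, X0, method="lm").x
```

Starting from the cluster mean, the solver pulls the estimate onto the point whose reprojections coincide with the detected position-frame centres; a full bundle adjustment would additionally treat the external parameters as optimization variables.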
In some embodiments, the external parameters of the image capturing device are optimized by bundle adjustment to obtain optimized external parameters.
The image capturing device may be a device installed on a traffic object for capturing images of traffic signs; for example, it may be a fisheye camera. The external parameters of the image capturing device are its position and orientation in the vehicle coordinate system. These are generally assumed to be fixed, but aging of the traffic object or loosening of the device caused by hard braking can change them, so treating the external parameter values as constant is not sufficiently accurate, and the external parameters therefore need to be optimized. Specifically, the external parameters of the image acquisition device are optimized by bundle adjustment to obtain the optimized external parameters. As before, the Levenberg-Marquardt algorithm can be used for this optimization.
With the above three-dimensional reconstruction method for the traffic sign position, the effect of changes in the external parameters of the image acquisition device on the reconstruction result is taken into account, so the external parameters are optimized together with the target traffic sign position during bundle adjustment. This reduces the influence of external forces on the external parameters and further improves the accuracy of the reconstructed position of the target traffic sign.
The embodiment of the present specification further provides a method for three-dimensional reconstruction of a traffic sign position, referring to fig. 8, the method for three-dimensional reconstruction of a traffic sign position may include the following steps:
S802, determining the position frames of the target traffic sign in N consecutive frames of traffic images.
There is a first number of pixels between the position frames in two adjacent frames of traffic images; N is a positive integer greater than or equal to 2.
S804, determining two position frames, whose corresponding target number among the first numbers is smaller than the pixel number threshold, as an initial associated frame combination.
The two position frames included in the initial associated frame combination contain a second number of pixels and a third number of pixels, respectively. The pixel number threshold is determined by the number of pixels within a designated position frame, where the designated position frame is designated from the two position frames of the initial associated frame combination according to the comparison result of the second number and the third number.
S806, determining, among the initial associated frame combinations, a target associated frame combination based on the confidence of each initial associated frame combination.
The target associated frame combination includes a plurality of target position frames corresponding to the same target traffic sign. The confidence is determined by the number of pixels between the two position frames included in the initial associated frame combination. The two position frames of an initial associated frame combination are denoted as a first position frame and a second position frame having an initial connection relationship, where the first position frame belongs to a first traffic image and the second position frame belongs to a second traffic image.
Specifically, if any position frame in the first traffic image has an initial connection relationship with a plurality of position frames in the second traffic image, the position frame in the second traffic image corresponding to the maximum confidence is taken as the position frame having a target connection relationship with that position frame, and the two are determined as a target associated frame combination.
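A greedy sketch of the association logic of S804 and S806 follows. The centre-distance gap and the area-based threshold rule are assumptions made for illustration; the disclosure does not pin down the exact pixel-count measures:

```python
import math
from dataclasses import dataclass

@dataclass
class Box:
    cx: float  # position-frame centre x, in pixels
    cy: float  # position-frame centre y, in pixels
    w: float   # width in pixels
    h: float   # height in pixels

def pixel_gap(a: Box, b: Box) -> float:
    """First number: pixel distance between two position frames."""
    return math.hypot(a.cx - b.cx, a.cy - b.cy)

def associate(frame_a: list, frame_b: list, scale: float = 2.0) -> list:
    """Link each frame_a box to its closest frame_b box under a threshold
    derived from the smaller box (assumed rule for the designated frame)."""
    pairs = []
    for a in frame_a:
        best, best_gap = None, float("inf")
        for b in frame_b:
            thresh = scale * math.sqrt(min(a.w * a.h, b.w * b.h))
            gap = pixel_gap(a, b)
            if gap < thresh and gap < best_gap:  # smallest gap = highest confidence
                best, best_gap = b, gap
        if best is not None:
            pairs.append((a, best))
    return pairs
```

For example, a box centred at (100, 100) in one frame pairs with a box at (105, 102) rather than one at (400, 100) in the next frame, since the smaller pixel gap yields the higher confidence.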
S808, performing three-dimensional reconstruction according to the position information of the target position frames and the shooting pose information corresponding to the target traffic images to which they belong, to obtain the three-dimensional position information of the target traffic sign.
The shooting pose information is the pose information of the traffic object at the shooting time of the target traffic image. The target traffic image is obtained by capturing the target traffic sign with an image acquisition device installed on the traffic object.
Specifically, triangularization is performed based on the position information of the target position frames and the shooting pose information to obtain an estimated three-dimensional position of the target traffic sign. With the minimum reprojection error as the criterion, the estimated three-dimensional position is then optimized according to the internal parameters and external parameters of the image acquisition device, the estimated three-dimensional position, and the shooting pose information, to obtain the three-dimensional position information of the target traffic sign.
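The triangularization step can be sketched with a standard linear (DLT) triangulation. The intrinsics, poses, and observations here are illustrative assumptions, and the position frame is again reduced to its centre pixel:

```python
import numpy as np

K = np.array([[800., 0., 640.],   # assumed pinhole intrinsics
              [0., 800., 360.],
              [0., 0., 1.]])

def triangulate(obs, poses):
    """Linear (DLT) triangulation of one 3D point from >= 2 views.
    obs: (u, v) centre of the target position frame in each image.
    poses: (R, t) shooting pose for each corresponding image."""
    rows = []
    for (u, v), (R, t) in zip(obs, poses):
        P = K @ np.hstack([R, t.reshape(3, 1)])  # 3x4 projection matrix
        rows.append(u * P[2] - P[0])             # u * p3 - p1 = 0
        rows.append(v * P[2] - P[1])             # v * p3 - p2 = 0
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]                                   # null-space solution
    return X[:3] / X[3]                          # dehomogenise
```

The returned point is the estimated three-dimensional position that S808 then refines under the minimum-reprojection-error criterion.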
S810, determining any piece of three-dimensional position information, among the three-dimensional position information of the target traffic sign obtained by the three-dimensional reconstruction of a plurality of traffic objects, as a data object point.
S812, taking the data object points as core points, clustering according to a preset neighborhood radius and a preset number threshold of target traffic signs within that radius, to obtain a plurality of position clusters.
The three-dimensional position information in a position cluster corresponds to the same target traffic sign.
S814, averaging the three-dimensional position information in any position cluster to obtain the average three-dimensional position of the target traffic sign.
S816, taking the average three-dimensional position of the target traffic sign as an initial position and optimizing it by bundle adjustment to obtain the three-dimensional position information of the target traffic sign corresponding to the position cluster.
S818, optimizing the external parameters of the image acquisition device by bundle adjustment to obtain the optimized external parameters.
In an embodiment of the present disclosure, a three-dimensional reconstruction apparatus 900 for a traffic sign position is provided, referring to fig. 9, the three-dimensional reconstruction apparatus 900 for a traffic sign position includes: a position frame determining module 910, an associated frame combination determining module 920, and a three-dimensional position information determining module 930.
A position frame determining module 910, configured to determine a position frame of the target traffic sign in the consecutive N frames of traffic images; the position frames in two adjacent frames of traffic images have a first number of pixels; n is a positive integer greater than or equal to 2;
an associated frame combination determining module 920, configured to determine a target associated frame combination in the position frames in the N frames of traffic images based on the first number; the target associated frame combination comprises a plurality of target position frames corresponding to the same target traffic sign.
A three-dimensional position information determining module 930, configured to perform three-dimensional reconstruction according to the position information of the target position frame and shooting pose information corresponding to the target traffic image to which the target position frame belongs, to obtain three-dimensional position information of the target traffic sign; and the shooting pose information is the pose information of the traffic object at the shooting moment of the target traffic image.
The present specification provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method described in any one of the above embodiments when executing the computer program.
The present specification provides a computer readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implements the steps of the method described in any one of the above embodiments.
One embodiment of the present specification provides a computer program product comprising instructions which, when executed by a processor of a computer device, enable the computer device to perform the steps of the method of any one of the above embodiments.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

Claims (14)

1. A method of three-dimensional reconstruction of a traffic sign location, the method comprising:
determining a position frame of the target traffic sign in the continuous N frames of traffic images; the position frames in two adjacent frames of traffic images have a first number of pixels; n is a positive integer greater than or equal to 2;
determining a target associated border combination in the position borders in the N frames of traffic images based on the first number; the target associated frame combination comprises a plurality of target position frames corresponding to the same target traffic sign;
performing three-dimensional reconstruction according to the position information of the target position frame and shooting pose information corresponding to a target traffic image to which the target position frame belongs to obtain three-dimensional position information of the target traffic sign; and the shooting pose information is the pose information of the traffic object at the shooting moment of the target traffic image.
2. The method of claim 1, wherein determining a target associated bounding box combination among the location bounding boxes in the N frames of traffic images based on the first number comprises:
determining two position borders corresponding to the target number smaller than the threshold value of the number of pixels in the first number as initial associated border combinations; the initial associated frame combination has the confidence that two position frames included by the initial associated frame combination correspond to the same target traffic sign;
in the initial associated frame combination, the target associated frame combination is determined based on the confidence of the initial associated frame combination.
3. The method of claim 2, wherein the confidence level is determined by the number of pixels between two position frames included in the initial associated frame combination.
4. The method of claim 2, wherein the initial associated frame combination includes two position frames having a second number of pixels and a third number of pixels; the pixel number threshold is determined by the number of pixels in the frame of the designated position; and the specified position frame is specified in two position frames included in the initial associated frame combination according to the comparison result of the second quantity and the third quantity.
5. The method according to claim 2, wherein two position frames included in the initial associated frame combination are respectively marked as a first position frame and a second position frame having an initial connection relationship, the first position frame belongs to the first traffic image, and the second position frame belongs to the second traffic image; determining the target associated border combination based on the confidence of the initial associated border combination in the initial associated border combination comprises:
if any position frame in the first traffic image and the plurality of position frames in the second traffic image have the initial connection relationship, determining a position frame having a target connection relationship with the any position frame according to the confidence of the initial association frame combination in the plurality of position frames in the second traffic image;
and determining the any position frame and a position frame having a target connection relation with the any position frame as the target associated frame combination.
6. The method of claim 5, wherein the determining, from the confidence levels of the initial associated frame combinations, the position frame having the target connection relationship with the any position frame in the plurality of position frames in the second traffic image comprises:
and taking the position frame in the second traffic image corresponding to the maximum confidence coefficient as a position frame having a target connection relation with any one position frame.
7. The method according to claim 1, wherein the target traffic image is captured by an image capturing device installed on the traffic object for the target traffic sign; the three-dimensional reconstruction is performed according to the position information of the target position frame and shooting pose information corresponding to the target traffic image to which the target position frame belongs, so as to obtain the three-dimensional position information of the target traffic sign, and the method comprises the following steps:
performing triangularization calculation based on the position information of the target position frame and the shooting pose information to obtain an estimated three-dimensional position of the target traffic sign;
and optimizing the estimated three-dimensional position according to the internal reference, the external reference, the estimated three-dimensional position and the shooting pose information of the image acquisition device by taking the minimum reprojection error as a criterion to obtain the three-dimensional position information of the target traffic sign.
8. The method of claim 1, further comprising:
clustering processing is carried out on the basis of the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of a plurality of traffic objects, so as to obtain a plurality of position clustering clusters; the three-dimensional position information in the position clustering cluster corresponds to the same target traffic sign;
and determining the three-dimensional position information of the target traffic sign corresponding to any position cluster according to the three-dimensional position information in the position cluster.
9. The method according to claim 8, wherein the clustering based on the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of the traffic objects to obtain a plurality of position cluster clusters comprises:
determining any three-dimensional position information as a data object point from the three-dimensional position information of the target traffic sign obtained by three-dimensional reconstruction of a plurality of traffic objects;
and taking the data object points as core points, and carrying out clustering processing according to a preset neighborhood radius and a preset quantity threshold of the target traffic signs in the preset neighborhood radius to obtain the plurality of position clustering clusters.
10. The method according to claim 8, wherein the determining the three-dimensional position information of the target traffic sign corresponding to any position cluster according to the three-dimensional position information in the any position cluster comprises:
carrying out average calculation on the three-dimensional position information in any position clustering cluster to obtain the average three-dimensional position of the target traffic sign;
and taking the average three-dimensional position of the target traffic sign as an initial position and optimizing it by a bundle adjustment method to obtain the three-dimensional position information of the target traffic sign corresponding to the any position cluster.
11. The method of claim 10, further comprising:
and optimizing the external parameter of the image acquisition device by a bundle adjustment method to obtain the optimized external parameter.
12. An apparatus for three-dimensional reconstruction of a traffic sign position, the apparatus comprising:
the position frame determining module is used for determining the position frame of the target traffic sign in the continuous N frames of traffic images; the position frames in two adjacent frames of traffic images have a first number of pixels; n is a positive integer greater than or equal to 2;
the associated frame combination determining module is used for determining a target associated frame combination in position frames in the N frames of traffic images based on the first number; the target associated frame combination comprises a plurality of target position frames corresponding to the same target traffic sign;
the three-dimensional position information determining module is used for performing three-dimensional reconstruction according to the position information of the target position frame and shooting pose information corresponding to a target traffic image to which the target position frame belongs to obtain three-dimensional position information of the target traffic sign; and the shooting pose information is the pose information of the traffic object at the shooting time of the target traffic image.
13. A device for three-dimensional reconstruction of a traffic sign position, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method according to any one of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 11.
CN202211441838.5A 2022-11-17 2022-11-17 Three-dimensional reconstruction method, device and equipment for traffic sign position and storage medium Pending CN115880665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211441838.5A CN115880665A (en) 2022-11-17 2022-11-17 Three-dimensional reconstruction method, device and equipment for traffic sign position and storage medium


Publications (1)

Publication Number Publication Date
CN115880665A true CN115880665A (en) 2023-03-31

Family

ID=85760132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211441838.5A Pending CN115880665A (en) 2022-11-17 2022-11-17 Three-dimensional reconstruction method, device and equipment for traffic sign position and storage medium

Country Status (1)

Country Link
CN (1) CN115880665A (en)

Similar Documents

Publication Publication Date Title
CN110988912B (en) Road target and distance detection method, system and device for automatic driving vehicle
JP7430277B2 (en) Obstacle detection method and apparatus, computer device, and computer program
KR101534056B1 (en) Traffic signal mapping and detection
CN112912920A (en) Point cloud data conversion method and system for 2D convolutional neural network
CN112417953B (en) Road condition detection and map data updating method, device, system and equipment
Broggi et al. Terramax vision at the urban challenge 2007
CN106571046B (en) Vehicle-road cooperative driving assisting method based on road surface grid system
CN110648389A (en) 3D reconstruction method and system for city street view based on cooperation of unmanned aerial vehicle and edge vehicle
WO2021038294A1 (en) Systems and methods for identifying potential communication impediments
GB2554481A (en) Autonomous route determination
KR20160123668A (en) Device and method for recognition of obstacles and parking slots for unmanned autonomous parking
CN103770704A (en) System and method for recognizing parking space line markings for vehicle
CN111814602B (en) Intelligent vehicle environment dynamic target detection method based on vision
WO2021017211A1 (en) Vehicle positioning method and device employing visual sensing, and vehicle-mounted terminal
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
JP2022039188A (en) Position attitude calculation method and position attitude calculation program
CN117576652B (en) Road object identification method and device, storage medium and electronic equipment
CN117372991A (en) Automatic driving method and system based on multi-view multi-mode fusion
EP4250245A1 (en) System and method for determining a viewpoint of a traffic camera
CN114648639B (en) Target vehicle detection method, system and device
KR102368262B1 (en) Method for estimating traffic light arrangement information using multiple observation information
CN115880665A (en) Three-dimensional reconstruction method, device and equipment for traffic sign position and storage medium
US11544899B2 (en) System and method for generating terrain maps
Klette et al. Advance in vision-based driver assistance
Berrio et al. Semantic sensor fusion: From camera to sparse LiDAR information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination