CN113744361A - Three-dimensional high-precision map construction method and device based on trinocular vision - Google Patents

Three-dimensional high-precision map construction method and device based on trinocular vision

Info

Publication number
CN113744361A
Authority
CN
China
Prior art keywords
camera
dimensional
map
training
cameras
Prior art date
Legal status
Pending
Application number
CN202110993664.2A
Other languages
Chinese (zh)
Inventor
李必军
别韦苇
Current Assignee
Wuhan University WHU
Dongfeng Motor Group Co Ltd
Original Assignee
Wuhan University WHU
Dongfeng Motor Group Co Ltd
Application filed by Wuhan University (WHU) and Dongfeng Motor Group Co., Ltd.
Priority to CN202110993664.2A
Publication of CN113744361A

Classifications

    • G06T 11/206 Drawing of charts or graphs (2D image generation; drawing from basic elements)
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06T 15/005 General purpose rendering architectures (3D image rendering)
    • G06T 17/05 Geographic models (3D modelling)
    • G06T 5/80 Geometric correction (image enhancement or restoration)
    • G06T 7/85 Stereo camera calibration
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle
    • G06T 2207/30256 Lane; Road marking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Remote Sensing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a three-dimensional high-precision map construction method and device based on trinocular vision. The method comprises: calibrating each camera of a trinocular camera and determining the positional relationship between the cameras; and acquiring image information collected by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map from the image information to obtain a three-dimensional rendering map. The invention expands the field-of-view range through trinocular vision, and missing environmental information is repaired by combining AR with information-fusion technology, effectively solving the problem of visual blind areas in automatic driving. Moreover, because the optical axes of the three cameras are set at angles to one another, they form three binocular vision models that can be verified against each other, which solves the mismatching problem of binocular vision, eliminates the uncertainty caused by binocular matching, offers strong anti-interference performance, and makes modeling and positioning more rigorous.

Description

Three-dimensional high-precision map construction method and device based on trinocular vision
Technical Field
The present application relates to the technical field of automatic driving, and in particular to a three-dimensional high-precision map construction method and device based on trinocular vision.
Background
In recent years, with the steady development of automatic driving technology, the role of high-precision maps, particularly 3D high-precision maps, in building long-period memory and realizing beyond-visual-range perception for vehicles has become increasingly prominent. A 3D high-precision map can not only provide high-quality three-dimensional map data for automatic driving through three-dimensional geometric reconstruction of the map and simulation of road street scenes, but also provide data support for simulation.
However, the conventional vision-based high-precision map scheme relies on binocular vision; limited by the field angle of the binocular camera, it suffers from obvious visual blind areas and ambiguous (multi-semantic) matching. Schemes that construct three-dimensional high-precision maps from vehicle-mounted point cloud data or through deep learning are also in use: the former has poor anti-interference performance, while the latter cannot make an unbiased estimate of the regularities in the data in application scenarios that provide only a limited data volume, and its model design is very complex, so a large amount of manpower, material resources, and time must be invested to develop new algorithms and models.
Disclosure of Invention
In order to solve the above problems, an embodiment of the present application provides a three-dimensional high-precision map construction method and apparatus based on trinocular vision.
In a first aspect, an embodiment of the present application provides a three-dimensional high-precision map construction method based on trinocular vision, where the method includes:
calibrating each camera in the trinocular camera respectively, and determining the position relation between the cameras;
and acquiring image information acquired by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map.
Preferably, the calibrating each camera of the trinocular camera and determining the positional relationship between the cameras includes:
constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each camera of the trinocular camera;
and calculating a rotation matrix and a translation matrix based on the coordinate system corresponding to each camera, and determining the positional relationship between the cameras according to the rotation matrix and the translation matrix.
Preferably, the constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each camera of the trinocular camera includes:
controlling the trinocular camera to rotate, and acquiring calibration images shot by the trinocular camera from different directions;
extracting checkerboard corner points from the calibration images according to the Harris algorithm, constructing a checkerboard based on the corner points, and determining the checkerboard as the calibration board;
estimating internal parameters and external parameters of each camera respectively, and estimating distortion coefficients of each camera under radial distortion based on a least square method;
and after the distortion coefficient is optimized and estimated according to a maximum likelihood method, obtaining and outputting camera parameters, and completing calibration.
Preferably, the calculating a rotation matrix and a translation matrix based on the coordinate system corresponding to each of the cameras, and determining a position relationship between the cameras according to the rotation matrix and the translation matrix includes:
laying out the calibration board so that it fills the fields of view of all the cameras, wherein the cameras comprise a first camera, a second camera and a panoramic camera;
determining a coordinate system corresponding to the first camera as a first area coordinate system, determining a coordinate system corresponding to the second camera as a second area coordinate system, and determining a coordinate system corresponding to the panoramic camera as a reference coordinate system;
and calculating a rotation matrix and a translation matrix based on the first area coordinate system, the second area coordinate system and the reference coordinate system, and determining the position relation between the cameras according to the rotation matrix and the translation matrix.
Preferably, the acquiring image information acquired by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map includes:
acquiring image information acquired by the trinocular camera;
after distortion correction of the image information, extracting image feature points based on the SIFT algorithm, matching the image feature points, and determining the geometric constraint relationships between the images;
converting the geometric constraint relationships into model parameters of a fundamental matrix according to epipolar geometry;
constructing a three-dimensional reconstruction map according to the model parameters, and optimizing the three-dimensional reconstruction map based on the bundle adjustment method;
acquiring a mainstream data set, and constructing environment semantic information based on the mainstream data set, wherein the mainstream data set comprises a training set and a test set;
performing model training on the mainstream data set by learning from samples containing label noise;
classifying all high-precision map elements according to a model training result, and verifying feasibility of all the high-precision map elements;
perfecting the training set based on feasibility verification results to obtain a two-dimensional map model;
extracting elements of the image information, inputting the extracted elements into the two-dimensional map model to obtain a two-dimensional high-precision map, and repairing missing layer information of the two-dimensional high-precision map;
and obtaining a three-dimensional rendering map based on the three-dimensional reconstruction map model and the two-dimensional high-precision map.
Preferably, the performing model training on the mainstream data set by learning from samples containing label noise includes:
randomly dividing the training set and test set images into fixed-size patches based on a sliding window;
randomly selecting a plurality of training set images from the training set to construct a reverse noise sample set, and taking the remaining training images as a general training sample set;
after data enhancement processing is carried out on the training set, the reverse noise sample set and the general training sample set are input into a U-Net model for training;
and calculating the detection precision ratio between the general training sample set and the reverse noise sample set after each round of training is finished, and stopping training when the detection precision ratio reaches a preset maximum value to obtain the U-Net model parameters.
Preferably, the repairing the missing map-layer information of the two-dimensional high-precision map includes:
when it is determined that environmental information is missing, detecting the environmental information that has not been lost to obtain environmental characteristic information;
matching the environmental characteristic information with a preset environmental database, and calculating a target position relation between a detected first target and a second target with information loss based on a computer vision principle;
and displaying AR image information at the position where the environmental information is missing according to the target position relation, and repairing the missing layer information of the two-dimensional high-precision map based on the AR image information.
In a second aspect, an embodiment of the present application provides a three-dimensional high-precision map building apparatus based on trinocular vision, where the apparatus includes:
the calibration module is used for respectively calibrating each camera in the three-camera and determining the position relation between the cameras;
and the construction module is used for acquiring the image information acquired by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method as provided in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as provided in the first aspect or any one of the possible implementations of the first aspect.
The invention has the following beneficial effects: 1. The field-of-view range is expanded through trinocular vision, missing environmental information is repaired by combining AR with information-fusion technology, and the problem of visual blind areas in automatic driving is effectively solved.
2. Because the optical axes of the three cameras are set at angles to one another, they form three binocular vision models that can be verified against each other, which solves the mismatching problem of binocular vision, eliminates the uncertainty caused by binocular matching, offers strong anti-interference performance, and makes modeling and positioning more rigorous.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a three-dimensional high-precision map construction method based on trinocular vision according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a three-dimensional high-precision map building device based on trinocular vision according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides embodiments of the present application, where different embodiments may be substituted or combined, and the present application is therefore intended to include all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B, C and another embodiment includes features B, D, the present application should also be considered to include an embodiment containing one or more of all other possible combinations of A, B, C, and D, even though that embodiment may not be explicitly recited in the text below.
The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
Referring to fig. 1, fig. 1 is a schematic flow chart of a three-dimensional high-precision map construction method based on trinocular vision according to an embodiment of the present application. In an embodiment of the present application, the method includes:
s101, calibrating each camera in the trinocular camera respectively, and determining the position relation among the cameras.
In the embodiment of the application, in order to ensure the accuracy of subsequent measurements, the three cameras are calibrated respectively before the three-dimensional high-precision map is constructed, and the positional relationship among them is determined, which makes it convenient to determine the positional correspondence between subsequently captured images.
In one possible embodiment, step S101 includes:
constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each camera of the trinocular camera;
and calculating a rotation matrix and a translation matrix based on the coordinate system corresponding to each camera, and determining the position relation between the cameras according to the rotation matrix and the translation matrix.
In the embodiment of the application, a checkerboard is selected as the calibration board so that each camera can be calibrated separately. In addition, constraint relationships exist among the cameras: a rotation matrix and a translation matrix can be calculated by combining the coordinate systems corresponding to the cameras, and the positional relationship between the cameras can be represented and determined through the rotation matrix and the translation matrix.
In an embodiment, the constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each camera of the trinocular camera includes:
controlling the trinocular camera to rotate, and acquiring calibration images shot by the trinocular camera from different directions;
extracting checkerboard corner points from the calibration images according to the Harris algorithm, constructing a checkerboard based on the corner points, and determining the checkerboard as the calibration board;
estimating internal parameters and external parameters of each camera respectively, and estimating distortion coefficients of each camera under radial distortion based on a least square method;
and after the distortion coefficient is optimized and estimated according to a maximum likelihood method, obtaining and outputting camera parameters, and completing calibration.
In the embodiment of the present application, a specific process of calibrating a camera may be as follows:
First, the calibration images are read in: the orientation of the camera is adjusted under control, and calibration images of the chessboard shot from different directions are obtained.
Then the checkerboard corner points are extracted, i.e., the corner points are extracted from the calibration images using the Harris algorithm.
The specific calculation process of the Harris algorithm is as follows:
1. Calculate the gradients $I_x$ and $I_y$ of the calibration image $I(x, y)$ in the x and y directions:

$I_x = \dfrac{\partial I}{\partial x}, \qquad I_y = \dfrac{\partial I}{\partial y}$

2. Apply Gaussian weighting to the gradient products using a Gaussian window $w$ to generate the matrix $M$:

$M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

3. Calculate the Harris response value $C$ of each pixel, and set any $C$ smaller than a preset threshold $t$ to zero.
Specifically, non-maximum suppression is performed in a 5 × 5 neighborhood, and the local maximum points are the corner points in the calibration image. With $\lambda_1$ and $\lambda_2$ the two eigenvalues of $M$ and $k$ an empirical coefficient, $C$ is calculated as:

$C = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2 = \det(M) - k\,\mathrm{trace}(M)^2$
Once the checkerboard corner points are determined, the checkerboard can be constructed.
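As a minimal sketch of this corner-response computation (assuming OpenCV is available; the file name, threshold fraction, and window sizes are illustrative, not taken from the specification):

```python
import cv2
import numpy as np

# Load one calibration image as grayscale (file name is illustrative).
img = cv2.imread("calib_view_0.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Harris response C = det(M) - k * trace(M)^2 over a weighted 5x5 window.
C = cv2.cornerHarris(img, blockSize=5, ksize=3, k=0.04)

# Zero out responses below a preset threshold t (here a fraction of the max).
t = 0.01 * C.max()
C[C < t] = 0

# Non-maximum suppression in a 5x5 neighborhood: keep only local maxima,
# which are the checkerboard corner points of the calibration image.
local_max = cv2.dilate(C, np.ones((5, 5), np.uint8))
corners = np.argwhere((C == local_max) & (C > 0))  # (row, col) coordinates
print(f"{len(corners)} corner candidates found")
```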
In addition, in order to perform the subsequent calculations, it is also necessary to estimate the internal and external parameters of each camera. In the ideal distortion-free case, there are four internal parameters and six external parameters.

The four internal parameters form the intrinsic matrix

$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$

where $f_x$ and $f_y$ are the focal lengths expressed in units of pixels, and $(c_x, c_y)$ is the principal point of the calibration image, i.e., the intersection of the lens axis perpendicular to the imaging plane with the image plane.

The six external parameters are $(t_x, t_y, t_z, \theta_x, \theta_y, \theta_z)$, where $t = (t_x, t_y, t_z)^{\mathsf{T}}$ is the translation vector, $R$ is the corresponding rotation matrix, $\theta_x$ is the rotation angle about the x-axis, $\theta_y$ is the rotation angle about the y-axis, and $\theta_z$ is the rotation angle about the z-axis of the camera coordinate system.
Then, the distortion coefficient estimation is carried out, namely, the distortion coefficient under the actual existence of radial distortion is estimated by applying a least square method:
Figure DEST_PATH_IMAGE032
in the formula (I), the compound is shown in the specification,
Figure DEST_PATH_IMAGE034
is the coordinates of the points of the ideal image,
Figure DEST_PATH_IMAGE036
for this purpose the coordinates of the point in the actual image coordinate system of the calibration image,
Figure DEST_PATH_IMAGE038
is a parameter of the radial distortion of the camera,
Figure DEST_PATH_IMAGE040
the polar path from the image point to the center of the image plane,
Figure DEST_PATH_IMAGE042
is a distortion coefficient in the x-axis direction,
Figure DEST_PATH_IMAGE044
is the distortion coefficient in the y-axis direction. The least square method is to find a group
Figure DEST_PATH_IMAGE046
So that
Figure DEST_PATH_IMAGE048
And
Figure DEST_PATH_IMAGE050
the sum of the squared residuals is minimal.
After the distortion coefficient is obtained, the maximum likelihood method is adopted for optimization estimation, and the estimation precision is improved. The most probable (maximum probability) parameter value, namely the camera parameter, is reversely deduced by using the known distortion coefficient obtained by calculation, and finally the camera parameter is output, so that the calibration process of the camera can be completed.
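A minimal single-camera calibration sketch consistent with this procedure (assuming OpenCV; the board layout and file pattern are illustrative assumptions):

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner-corner layout of the checkerboard (illustrative)
board = np.zeros((pattern[0] * pattern[1], 3), np.float32)
board[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):  # views shot from different directions
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(board)
        img_points.append(corners)

# Estimates the intrinsic matrix K, the distortion coefficients (k1, k2, ...),
# and per-view extrinsics (R, t); OpenCV refines them by minimizing the
# reprojection error, which corresponds to the maximum-likelihood step above.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
```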
In one possible implementation, the calculating a rotation matrix and a translation matrix based on the coordinate system corresponding to each of the cameras, and determining a position relationship between each of the cameras according to the rotation matrix and the translation matrix includes:
laying out the calibration board so that it fills the fields of view of all the cameras, wherein the cameras comprise a first camera, a second camera and a panoramic camera;
determining a coordinate system corresponding to the first camera as a first area coordinate system, determining a coordinate system corresponding to the second camera as a second area coordinate system, and determining a coordinate system corresponding to the panoramic camera as a reference coordinate system;
and calculating a rotation matrix and a translation matrix based on the first area coordinate system, the second area coordinate system and the reference coordinate system, and determining the position relation between the cameras according to the rotation matrix and the translation matrix.
In the embodiment of the present application, the three cameras are two general cameras (i.e., a first camera and a second camera) and one panoramic camera. Since the calibration board has been determined from the checkerboard, a calibration board of appropriate size is laid out within the fields of view of the cameras so that it fills the fields of view of all three, i.e., all three cameras can shoot the calibration board at the same time. After the calibration board is laid out, the coordinate system of the panoramic camera is taken as the reference coordinate system, and the coordinate systems of the general cameras are taken as the area coordinate systems. The relative relationship of the three camera coordinate systems is described by a rotation matrix R and a translation matrix T. That is, the positional relationship between a point $P_c$ in a camera coordinate system and the same point $P_w$ in the world coordinate system is represented by the external parameters R and T:

$P_c = R\,P_w + T$
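A minimal sketch of deriving the rotation matrix R and translation T between two cameras from their individual extrinsics for the same calibration-board view (the composition rule is standard rigid-body algebra; the variable names are illustrative):

```python
import cv2
import numpy as np

def relative_pose(rvec_ref, tvec_ref, rvec_cam, tvec_cam):
    """Pose of `cam` relative to `ref` from each camera's extrinsics for the
    same board view: P_cam = R_cam P_board + t_cam and P_ref = R_ref P_board
    + t_ref imply P_cam = R_rel P_ref + t_rel with R_rel = R_cam R_ref^T."""
    R_ref, _ = cv2.Rodrigues(rvec_ref)
    R_cam, _ = cv2.Rodrigues(rvec_cam)
    R_rel = R_cam @ R_ref.T
    t_rel = tvec_cam - R_rel @ tvec_ref
    return R_rel, t_rel

# With the panoramic camera as the reference coordinate system, the two area
# cameras are located by (R1, T1) and (R2, T2) computed this way from the
# rvecs/tvecs returned by each camera's calibration of the shared board.
```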
s102, obtaining image information collected by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map.
In the embodiment of the application, the trinocular camera shoots and collects the required image information on the actual road section for which the three-dimensional high-precision map is to be built. A three-dimensional reconstruction map model can be constructed from the captured image information, a two-dimensional high-precision map can be constructed from the same image information, and the two-dimensional vector high-precision map and the three-dimensional grid high-precision map are displayed in superposition to obtain the three-dimensional rendering map.
In one possible embodiment, step S102 includes:
acquiring image information acquired by the trinocular camera;
after distortion correction of the image information, extracting image feature points based on the SIFT algorithm, matching the image feature points, and determining the geometric constraint relationships between the images;
converting the geometric constraint relationships into model parameters of a fundamental matrix according to epipolar geometry;
constructing a three-dimensional reconstruction map according to the model parameters, and optimizing the three-dimensional reconstruction map based on the bundle adjustment method;
acquiring a mainstream data set, and constructing environment semantic information based on the mainstream data set, wherein the mainstream data set comprises a training set and a test set;
performing model training on the mainstream data set by learning from samples containing label noise;
classifying all high-precision map elements according to a model training result, and verifying feasibility of all the high-precision map elements;
perfecting the training set based on feasibility verification results to obtain a two-dimensional map model;
extracting elements of the image information, inputting the extracted elements into the two-dimensional map model to obtain a two-dimensional high-precision map, and repairing missing layer information of the two-dimensional high-precision map;
and obtaining a three-dimensional rendering map based on the three-dimensional reconstruction map model and the two-dimensional high-precision map.
In the embodiment of the application, after the image information is acquired by the trinocular camera, the three-dimensional reconstruction map model and the two-dimensional high-precision map are generated respectively based on the image information.
The specific process of generating the three-dimensional reconstruction map model is as follows:
(1) Image correction: the distortion in the captured image information is corrected.
(2) Feature extraction: image features are detected using local invariance, and image feature points are extracted with the SIFT algorithm.
(3) Feature matching: image feature points corresponding to the same spatial position in the same scene are matched, and the preliminary positional relationship between the images, i.e., the geometric constraint relationship, is solved.
(4) Multi-view geometric constraint calculation: the geometric constraint relationship is converted into a model-parameter estimate of the fundamental matrix through epipolar geometry. This conversion is independent of the external scene, depends only on the internal parameters of the cameras and the relative pose between the two images, and can therefore be computed quickly from the internal parameters and the relative position information obtained in the previous steps.
(5) Three-dimensional reconstruction: the reconstruction result is optimized by the bundle adjustment method, with the three-dimensional points and the camera poses refined jointly by nonlinear optimization to obtain the camera's three-dimensional modeling parameters and the three-dimensional point coordinates; textures are then applied to the dense point cloud to obtain the three-dimensional reconstruction model.
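A minimal sketch of steps (2) to (4) of this pipeline (assuming OpenCV with SIFT; image names are illustrative):

```python
import cv2
import numpy as np

img1 = cv2.imread("view_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.png", cv2.IMREAD_GRAYSCALE)

# (2) Feature extraction with SIFT.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# (3) Feature matching with Lowe's ratio test.
good = []
for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# (4) Epipolar geometry: estimate the fundamental matrix, with RANSAC
# rejecting the mismatches.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
print("fundamental matrix F:\n", F)
```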
The generation process of the two-dimensional high-precision map is as follows:
(1) Constructing environment semantic information. Based on mainstream data sets such as Mapillary, TuSimple, VPGNet, CULane, BDD100K, and ApolloScape, environment semantic information for elements such as the ground, lane lines, traffic lights, and road signs is constructed: TuSimple and CULane are used for lane line detection; ApolloScape and BDD100K are used for ground detection.
(2) Model training. Model training is carried out by learning from samples containing label noise, so as to overcome the defect of ambiguous matching.
(3) Automatic classification and algorithm feasibility verification. All elements of the high-precision map are classified according to the training result of the model, and the feasibility of the algorithm is verified; concretely, the pre-divided test data set can be input into the model for verification.
(4) Completing the training samples. If the algorithm is feasible, other data sets are added to further enrich the training samples and optimize the trained model; if not, other mainstream data sets or autonomously collected data sets are selected for retraining.
(5) Extracting the layer elements. The elements of the road layer, traffic-light layer, traffic-facility layer, and fixed ground-feature layer are extracted, the layers are constructed to realize high-precision map production, and misclassified elements are identified and corrected.
(6) Repairing missing layer information of the two-dimensional high-precision map, so that partial layer loss does not impair the effect of the final map.
In one embodiment, the performing model training on the mainstream data set by learning from samples containing label noise includes:
randomly dividing the training set and test set images into fixed-size patches based on a sliding window;
randomly selecting a plurality of training set images from the training set to construct a reverse noise sample set, and taking the remaining training images as a general training sample set;
after data enhancement processing is carried out on the training set, the reverse noise sample set and the general training sample set are input into a U-Net model for training;
and calculating the detection precision ratio between the general training sample set and the reverse noise sample set after each round of training is finished, and stopping training when the detection precision ratio reaches a preset maximum value to obtain the U-Net model parameters.
In the embodiment of the present application, the specific training process of the model may be as follows:
the training set and test set images are randomly divided into fixed images of fixed size by a sliding window.
Randomly selecting training set images from the training set to manufacture a reverse noise sample set, wherein the rest training sets are general training samples.
And enhancing data. Namely, the images in the training set are subjected to operations such as horizontal overturning, anticlockwise rotating, random size stretching and the like, so that the richness of data in the training set is increased.
And fourthly, inputting U-Net training. And inputting the geometric information and the image merging channel into U-Net for training, detecting a general training sample and a reverse noise sample by the U-Net after each training round is finished, calculating the ratio of the detection precision of the general training sample and the reverse noise sample, stopping training when the value reaches the maximum value, and obtaining the parameters of the U-Net model.
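A minimal PyTorch-style sketch of this training loop, under stated assumptions: `unet`, the channel-merged `train_loader`, and the `precision()` helper are hypothetical placeholders, and the stopping rule follows the precision-ratio criterion above:

```python
import torch

def train_with_noise_monitor(unet, train_loader, general_set, noise_set,
                             precision, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(unet.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    best_ratio, best_state = -1.0, None
    for _ in range(epochs):
        for images, masks in train_loader:  # geometric info + image, channel-merged
            opt.zero_grad()
            loss = loss_fn(unet(images), masks)
            loss.backward()
            opt.step()
        # Detection-precision ratio between the general training samples and
        # the reverse noise samples; stop once the ratio has passed its peak.
        ratio = precision(unet, general_set) / max(precision(unet, noise_set), 1e-8)
        if ratio > best_ratio:
            best_ratio = ratio
            best_state = {k: v.clone() for k, v in unet.state_dict().items()}
        else:
            break
    unet.load_state_dict(best_state)
    return unet
```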
In an implementation manner, the repairing missing layer information of the two-dimensional high-precision map includes:
when it is determined that environmental information is missing, detecting the environmental information that has not been lost to obtain environmental characteristic information;
matching the environmental characteristic information with a preset environmental database, and calculating a target position relation between a detected first target and a second target with information loss based on a computer vision principle;
and displaying AR image information at the position where the environmental information is missing according to the target position relation, and repairing the missing layer information of the two-dimensional high-precision map based on the AR image information.
In this embodiment of the present application, a specific repair process of missing layer information may be as follows:
First, when road environment information is lost, e.g., a road target is damaged or occluded, the vehicle-mounted trinocular sensor detects the road information that has not been lost, such as unobstructed lane lines and signboards, to obtain the road's characteristic information.
Second, the detected road characteristic information is matched against the traffic environment database, and the positional relationship between the detected road target and the road target with missing information is calculated according to the principles of computer vision.
Finally, the missing road elements are displayed within the sensor-collected information using AR and information-fusion technology, realizing road environment information repair based on the deep fusion of vehicle-mounted sensing and the fine map.
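A minimal sketch of the position-relation computation and AR overlay placement, under stated assumptions: the database offset, camera pose, and intrinsics below are hypothetical placeholders standing in for values recovered by matching the detected target:

```python
import cv2
import numpy as np

# Detected, unobstructed road target located in map coordinates (placeholder).
detected_xyz = np.array([12.0, 3.5, 0.0])

# Known offset from the detected target to the missing target, taken from the
# traffic environment database (placeholder value).
offset = np.array([3.0, 0.0, 0.0])
missing_xyz = detected_xyz + offset  # target position relation

# Camera pose in map coordinates and intrinsics (placeholders from calibration).
rvec, tvec = np.zeros(3), np.array([0.0, 0.0, 5.0])
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Project the missing target into the image: this pixel is where the AR
# overlay of the repaired element would be displayed.
pix, _ = cv2.projectPoints(missing_xyz.reshape(1, 3), rvec, tvec, K, dist)
u, v = pix.ravel()
print(f"draw AR overlay at pixel ({u:.0f}, {v:.0f})")
```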
The three-dimensional high-precision map building device based on the trinocular vision provided by the embodiment of the present application will be described in detail below with reference to fig. 2. It should be noted that, the three-dimensional high-precision map building apparatus based on trinocular vision shown in fig. 2 is used for executing the method of the embodiment shown in fig. 1 of the present application, and for convenience of description, only the parts related to the embodiment of the present application are shown, and details of the specific technology are not disclosed, please refer to the embodiment shown in fig. 1 of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a three-dimensional high-precision map building apparatus based on trinocular vision according to an embodiment of the present application. As shown in fig. 2, the apparatus includes:
a calibration module 201, configured to calibrate each camera in a trinocular camera, and determine a position relationship between the cameras;
the construction module 202 is configured to acquire image information acquired by the trinocular camera, and construct a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map.
In one possible implementation, the calibration module 201 includes:
the construction unit is used for constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each camera of the trinocular camera;
the first calculation unit is used for calculating a rotation matrix and a translation matrix based on a coordinate system corresponding to each camera, and determining the position relation between the cameras according to the rotation matrix and the translation matrix.
In an embodiment, the building unit is specifically configured to:
controlling the trinocular camera to rotate, and acquiring calibration images shot by the trinocular camera from different directions;
extracting checkerboard corner points from the calibration images according to the Harris algorithm, constructing a checkerboard based on the corner points, and determining the checkerboard as the calibration board;
estimating internal parameters and external parameters of each camera respectively, and estimating distortion coefficients of each camera under radial distortion based on a least square method;
and after the distortion coefficient is optimized and estimated according to a maximum likelihood method, obtaining and outputting camera parameters, and completing calibration.
In an implementation manner, the first computing unit is specifically configured to:
laying out the calibration board so that it fills the fields of view of all the cameras, wherein the cameras comprise a first camera, a second camera and a panoramic camera;
determining a coordinate system corresponding to the first camera as a first area coordinate system, determining a coordinate system corresponding to the second camera as a second area coordinate system, and determining a coordinate system corresponding to the panoramic camera as a reference coordinate system;
and calculating a rotation matrix and a translation matrix based on the first area coordinate system, the second area coordinate system and the reference coordinate system, and determining the position relation between the cameras according to the rotation matrix and the translation matrix.
In one possible implementation, the building module 202 includes:
the acquisition unit is used for acquiring the image information acquired by the trinocular camera;
the matching unit is used for extracting image feature points based on the SIFT algorithm after distortion correction of the image information, matching the image feature points, and determining the geometric constraint relationships between the images;
the conversion unit is used for converting the geometric constraint relationships into model parameters of a fundamental matrix according to epipolar geometry;
the first construction unit is used for constructing a three-dimensional reconstruction map according to the model parameters and optimizing the three-dimensional reconstruction map based on the bundle adjustment method;
the second construction unit is used for acquiring a mainstream data set and constructing environment semantic information based on the mainstream data set, wherein the mainstream data set comprises a training set and a test set;
the training unit is used for performing model training on the mainstream data set by learning from samples containing label noise;
the classification unit is used for classifying all high-precision map elements according to a model training result and verifying the feasibility of all the high-precision map elements;
the perfecting unit is used for perfecting the training set based on the feasibility verification result to obtain a two-dimensional map model;
the extraction unit is used for extracting elements from the image information, inputting the extracted elements into the two-dimensional map model to obtain a two-dimensional high-precision map, and repairing the missing layer information of the two-dimensional high-precision map;
and the third construction unit is used for obtaining a three-dimensional rendering map based on the three-dimensional reconstruction map model and the two-dimensional high-precision map.
In one possible embodiment, the training unit is specifically configured to:
randomly dividing the training set and test set images into fixed-size patches based on a sliding window;
randomly selecting a plurality of training set images from the training set to construct a reverse noise sample set, and taking the remaining training images as a general training sample set;
after data enhancement processing is carried out on the training set, the reverse noise sample set and the general training sample set are input into a U-Net model for training;
and calculating the detection precision ratio between the general training sample set and the reverse noise sample set after each round of training is finished, and stopping training when the detection precision ratio reaches a preset maximum value to obtain the U-Net model parameters.
In an embodiment, the extraction unit is specifically configured to:
when it is determined that environmental information is missing, detecting the environmental information that has not been lost to obtain environmental characteristic information;
matching the environmental characteristic information with a preset environmental database, and calculating a target position relation between a detected first target and a second target with information loss based on a computer vision principle;
and displaying AR image information at the position where the environmental information is missing according to the target position relation, and repairing the missing layer information of the two-dimensional high-precision map based on the AR image information.
It is clear to a person skilled in the art that the solution according to the embodiments of the present application can be implemented by means of software and/or hardware. The "unit" and "module" in this specification refer to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, a Field-Programmable Gate Array (FPGA), an Integrated Circuit (IC), or the like.
Each processing unit and/or module in the embodiments of the present application may be implemented by an analog circuit that implements the functions described in the embodiments of the present application, or may be implemented by software that executes the functions described in the embodiments of the present application.
Referring to fig. 3, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, where the electronic device may be used to implement the method in the embodiment shown in fig. 1. As shown in fig. 3, the electronic device 300 may include: at least one central processor 301, at least one network interface 304, a user interface 303, a memory 305, at least one communication bus 302.
Wherein a communication bus 302 is used to enable the connection communication between these components.
The user interface 303 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 303 may further include a standard wired interface and a wireless interface.
The network interface 304 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The central processor 301 may include one or more processing cores. The central processor 301 connects the various parts of the electronic device 300 using various interfaces and lines, and performs the various functions of the electronic device 300 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 305 and calling data stored in the memory 305. Optionally, the central processor 301 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The central processor 301 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed by the display screen; and the modem handles wireless communications. It is understood that the modem may also not be integrated into the central processor 301 but implemented by a separate chip.
The memory 305 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 305 includes a non-transitory computer-readable medium. The memory 305 may be used to store instructions, programs, code sets, or instruction sets. The memory 305 may include a stored-program area and a stored-data area, wherein the stored-program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored-data area may store the data referred to in the above method embodiments. The memory 305 may alternatively be at least one storage device located remotely from the central processor 301. As shown in fig. 3, the memory 305, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and program instructions.
In the electronic device 300 shown in fig. 3, the user interface 303 is mainly used for providing an input interface for a user to obtain data input by the user; the central processor 301 may be configured to call the three-dimensional high-precision map building application program based on the trinocular vision stored in the memory 305, and specifically perform the following operations:
calibrating each camera in the trinocular camera respectively, and determining the position relation between the cameras;
and acquiring image information acquired by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some service interfaces, devices or units, and may be an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned memory comprises: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A three-dimensional high-precision map construction method based on trinocular vision is characterized by comprising the following steps:
calibrating each camera in the trinocular camera respectively, and determining the position relation between the cameras;
and acquiring image information acquired by the trinocular camera, and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map according to the image information to obtain a three-dimensional rendering map.
2. The method of claim 1, wherein the calibrating each camera of the trinocular cameras and determining the positional relationship between the cameras respectively comprises:
constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each camera of the trinocular camera;
and calculating a rotation matrix and a translation matrix based on the coordinate system corresponding to each camera, and determining the position relation between the cameras according to the rotation matrix and the translation matrix.
3. The method of claim 2, wherein the constructing a checkerboard, determining the checkerboard as a calibration board, and calibrating each of the cameras of the trinocular cameras separately comprises:
controlling the trinocular camera to rotate, and acquiring calibration images shot by the trinocular camera from different directions;
extracting checkerboard corner points from the calibration images according to the Harris algorithm, constructing a checkerboard based on the corner points, and determining the checkerboard as the calibration board;
estimating internal parameters and external parameters of each camera respectively, and estimating distortion coefficients of each camera under radial distortion based on a least square method;
and after the distortion coefficient is optimized and estimated according to a maximum likelihood method, obtaining and outputting camera parameters, and completing calibration.
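As an illustration of the per-camera calibration in claim 3, consider the OpenCV-based sketch below. It is a minimal sketch under stated assumptions: the board dimensions, square size, and the helper name `calibrate_camera` are ours, OpenCV's built-in chessboard detector stands in for the Harris-based corner extraction named in the claim, and `cv2.calibrateCamera` internally performs the least-squares fit and reprojection-error (maximum-likelihood) refinement of the intrinsics, extrinsics, and radial distortion coefficients.

```python
# Minimal sketch of claim 3: per-camera calibration from checkerboard views.
# Assumed: a 9x6 inner-corner board with 25 mm squares, images in image_paths.
import cv2
import numpy as np

def calibrate_camera(image_paths, board_size=(9, 6), square_size=0.025):
    # 3D corner coordinates of the planar board in its own frame (Z = 0).
    template = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    template[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    template *= square_size

    obj_points, img_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if not found:
            continue
        # Refine the detected corners to sub-pixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(template)
        img_points.append(corners)

    # Estimates intrinsics, per-view extrinsics, and distortion coefficients,
    # refined by minimizing the total reprojection error.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return K, dist, rvecs, tvecs
```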
4. The method of claim 2, wherein the calculating a rotation matrix and a translation matrix based on the coordinate system corresponding to each camera, and determining the positional relationship between the cameras from the rotation matrix and the translation matrix comprises:
positioning the calibration board so that it fills the field of view of every camera, the cameras comprising a first camera, a second camera, and a panoramic camera;
determining the coordinate system corresponding to the first camera as a first area coordinate system, the coordinate system corresponding to the second camera as a second area coordinate system, and the coordinate system corresponding to the panoramic camera as the reference coordinate system;
and calculating the rotation matrix and the translation matrix based on the first area coordinate system, the second area coordinate system, and the reference coordinate system, and determining the positional relationship between the cameras accordingly.
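One plausible realization of claim 4: if the panoramic (reference) camera and a local camera observe the same board pose, each camera's board-to-camera extrinsics yield the camera-to-camera rotation and translation directly. The sketch below assumes such extrinsics (e.g. from `cv2.solvePnP` or the calibration above) are available; the function name and frame conventions are ours, not the patent's.

```python
# Sketch of claim 4: relative pose between two cameras that both observe the
# same calibration board, given each camera's board-to-camera extrinsics.
import cv2
import numpy as np

def relative_pose(rvec_ref, tvec_ref, rvec_cam, tvec_cam):
    # Rodrigues vectors to 3x3 rotation matrices.
    R_ref, _ = cv2.Rodrigues(rvec_ref)
    R_cam, _ = cv2.Rodrigues(rvec_cam)
    # From X_ref = R_ref X_board + t_ref and X_cam = R_cam X_board + t_cam,
    # the camera frame relates to the reference frame as X_cam = R X_ref + t:
    R = R_cam @ R_ref.T
    t = tvec_cam.reshape(3, 1) - R @ tvec_ref.reshape(3, 1)
    return R, t
```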
5. The method of claim 1, wherein the acquiring image information collected by the trinocular camera and constructing a three-dimensional reconstruction map model and a two-dimensional high-precision map from the image information to obtain a three-dimensional rendered map comprises:
acquiring the image information collected by the trinocular camera;
after distortion correction of the image information, extracting image feature points with the SIFT algorithm, matching the feature points, and determining the geometric constraint relationship between the images;
converting the geometric constraint relationship into the model parameters of a fundamental matrix according to epipolar geometry;
constructing a three-dimensional reconstruction map from the model parameters, and optimizing the three-dimensional reconstruction map by bundle adjustment;
acquiring a mainstream dataset, and constructing environment semantic information based on it, wherein the mainstream dataset comprises a training set and a test set;
training a model on the mainstream dataset by learning from label-noise samples;
classifying all high-precision map elements according to the training result, and verifying the feasibility of each element;
refining the training set based on the feasibility verification results to obtain a two-dimensional map model;
extracting elements from the image information, inputting the extracted elements into the two-dimensional map model to obtain a two-dimensional high-precision map, and repairing missing layer information in the two-dimensional high-precision map;
and obtaining a three-dimensional rendered map based on the three-dimensional reconstruction map model and the two-dimensional high-precision map.
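The geometry portion of claim 5 (SIFT matching through the fundamental matrix) can be sketched with OpenCV as below; the ratio-test threshold, RANSAC settings, and function name are illustrative assumptions, and the subsequent incremental reconstruction and bundle adjustment are omitted.

```python
# Sketch of the geometry step in claim 5: SIFT matching between two
# distortion-corrected images, then the fundamental matrix F encoding the
# epipolar constraint x2^T F x1 = 0.
import cv2
import numpy as np

def fundamental_from_pair(img1, img2, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test on 2-nearest-neighbour descriptor matches.
    raw = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in raw if m.distance < ratio * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC rejects mismatches while estimating the model parameters of F.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    inliers = mask.ravel() == 1
    return F, pts1[inliers], pts2[inliers]
```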
6. The method of claim 5, wherein the training a model on the mainstream dataset by learning from label-noise samples comprises:
randomly cropping the training-set and test-set images into fixed-size patches using a sliding window;
randomly selecting several training-set images to construct a reverse noise sample set, and taking the remaining training images as the general training sample set;
after data enhancement of the training set, inputting the reverse noise sample set and the general training sample set into a U-Net model for training;
and calculating the detection precision ratio between the general training sample set and the reverse noise sample set after each training round, and stopping training when the ratio reaches a preset maximum, thereby obtaining the U-Net model parameters.
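The stopping rule of claim 6 might look like the sketch below. The helpers `train_one_epoch` and `detection_precision`, the PyTorch-style `state_dict()`, and the reading of the "detection precision ratio" as precision on the general set over precision on the reverse noise set are all our assumptions; the claim does not define these details.

```python
# Hypothetical sketch of claim 6's stopping criterion around a U-Net.
# train_one_epoch and detection_precision are assumed helper functions.
def train_with_noise_monitor(model, general_set, noise_set,
                             target_ratio, max_epochs=100):
    for epoch in range(max_epochs):
        train_one_epoch(model, general_set, noise_set)
        # Detection precision ratio between the two sample sets.
        ratio = detection_precision(model, general_set) / max(
            detection_precision(model, noise_set), 1e-8)
        if ratio >= target_ratio:
            break  # preset maximum reached: stop and keep the parameters
    return model.state_dict()
```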
7. The method of claim 5, wherein the repairing missing layer information in the two-dimensional high-precision map comprises:
when environmental information is determined to be missing, detecting the environmental information that is not missing to obtain environmental feature information;
matching the environmental feature information against a preset environment database, and calculating, on the basis of computer vision principles, the positional relationship between a detected first target and a second target whose information is missing;
and displaying AR image information at the position of the missing environmental information according to that positional relationship, and repairing the missing layer information of the two-dimensional high-precision map based on the AR image information.
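A rough sketch of the position computation in claim 7, under heavy assumptions: the environment database is taken to supply 3D points of the detected first target plus its offset to the missing second target, and the "computer vision principle" is taken to be a PnP pose estimate. Every name here is illustrative, not from the patent.

```python
# Hypothetical sketch of claim 7: locate the missing (second) target in the
# camera frame from a PnP pose of the detected (first) target, so the AR
# layer can be rendered at that position.
import cv2
import numpy as np

def locate_missing_target(db_points_3d, img_points_2d, K, dist, db_offset):
    # Pose of the detected target: X_cam = R X_target + t.
    ok, rvec, tvec = cv2.solvePnP(db_points_3d, img_points_2d, K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    # db_offset: the missing target's coordinates in the detected target's
    # frame, as stored in the preset environment database.
    return R @ db_offset.reshape(3, 1) + tvec.reshape(3, 1)
```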
8. A three-dimensional high-precision map construction device based on trinocular vision is characterized by comprising:
a calibration module, configured to calibrate each camera of the trinocular camera separately and determine the positional relationship between the cameras;
and a construction module, configured to acquire the image information collected by the trinocular camera and construct a three-dimensional reconstruction map model and a two-dimensional high-precision map from the image information to obtain a three-dimensional rendered map.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202110993664.2A 2021-08-27 2021-08-27 Three-dimensional high-precision map construction method and device based on trinocular vision Pending CN113744361A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110993664.2A CN113744361A (en) 2021-08-27 2021-08-27 Three-dimensional high-precision map construction method and device based on trinocular vision

Publications (1)

Publication Number Publication Date
CN113744361A 2021-12-03

Family

ID=78733335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110993664.2A Pending CN113744361A (en) 2021-08-27 2021-08-27 Three-dimensional high-precision map construction method and device based on trinocular vision

Country Status (1)

Country Link
CN (1) CN113744361A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419590A (en) * 2022-01-17 2022-04-29 北京百度网讯科技有限公司 High-precision map verification method, device, equipment and storage medium
CN114419590B (en) * 2022-01-17 2024-03-19 北京百度网讯科技有限公司 Verification method, device, equipment and storage medium of high-precision map

Similar Documents

Publication Publication Date Title
CN106683068B (en) Three-dimensional digital image acquisition method
WO2018119889A1 (en) Three-dimensional scene positioning method and device
AU2011362799B2 (en) 3D streets
CN108074267B (en) Intersection point detection device and method, camera correction system and method, and recording medium
CN113689578B (en) Human body data set generation method and device
CN112686877B (en) Binocular camera-based three-dimensional house damage model construction and measurement method and system
CN109035327B (en) Panoramic camera attitude estimation method based on deep learning
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
CN116310046B (en) Image processing method, device, computer and storage medium
CN115830135A (en) Image processing method and device and electronic equipment
CN109712197B (en) Airport runway gridding calibration method and system
CN117132737B (en) Three-dimensional building model construction method, system and equipment
CN113744361A (en) Three-dimensional high-precision map construction method and device based on trinocular vision
CN115880448B (en) Three-dimensional measurement method and device based on binocular imaging
CN115994952B (en) Calibration method and device for panoramic image system, computer equipment and storage medium
CN115620264B (en) Vehicle positioning method and device, electronic equipment and computer readable medium
CN116543109A (en) Hole filling method and system in three-dimensional reconstruction
CN116524382A (en) Bridge swivel closure accuracy inspection method system and equipment
CN112652056B (en) 3D information display method and device
CN112634439B (en) 3D information display method and device
CN113610927B (en) AVM camera parameter calibration method and device and electronic equipment
CN113870365B (en) Camera calibration method, device, equipment and storage medium
CN116310408B (en) Method and device for establishing data association between event camera and frame camera
US11055835B2 (en) Method and device for generating virtual reality data
CN115930988A (en) Visual odometer method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination