CN114018215B - Monocular distance measuring method, device, equipment and storage medium based on semantic segmentation

Info

Publication number: CN114018215B
Application number: CN202210001002.7A
Authority: CN
Inventors: 张康宁; 张海强; 李成军; 朱磊
Original assignee: Zhidao Network Technology Beijing Co Ltd
Current assignee: Zhidao Network Technology Beijing Co Ltd
Priority date: 2022-01-04
Filing date: 2022-01-04
Publication date: 2022-04-12
Anticipated expiration: 2042-01-04
Also published as: CN114018215A

Abstract

The application relates to a monocular distance measurement method, device, equipment and storage medium based on semantic segmentation. The method comprises: generating a ground grid point matrix according to the own-vehicle coordinate system; performing semantic segmentation on a front vehicle image acquired by a monocular camera, wherein the semantic segmentation types of the pixels in the front vehicle image comprise road pavement and objects of interest in the road; converting the ground grid point matrix to the front vehicle image according to a preset projection matrix, and determining a target grid grounding point set of an object of interest in the road according to the road pavement and the object of interest in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points; and obtaining the distance measurement result of the monocular camera according to the distance from the target grid grounding point set to the own vehicle. Because the ground grid point matrix is associated with the semantic information of the front vehicle image through the preset projection matrix, and a plurality of target grid grounding points of the object of interest closest to the road pavement are obtained, the distance measurement precision is improved.

Description

Monocular distance measuring method, device, equipment and storage medium based on semantic segmentation

Technical Field

The present application relates to the field of automatic driving technologies, and in particular, to a monocular distance measuring method, apparatus, device, and storage medium based on semantic segmentation.

Background

The monocular distance measurement method is widely applied to the field of automatic driving due to low hardware cost, and has important value in the aspect of distance measurement of objects.

Currently, in the related art, most monocular distance measurement methods are implemented as follows: a detection frame of the object of interest is detected in the image by using deep learning technology, the object in the detection frame is then further refined to contour points or feature points, and the distance measurement of the object is finally completed based on a prior assumption and the camera imaging principle. This approach has a defect. If the detection frame does not fit the contour of the object, more complicated refinement processing must be performed on the object in the detection frame, otherwise the distance measurement accuracy is affected. Such refinement involves a complicated flow and is time-consuming and labor-intensive, and the extracted contour points or feature points do not necessarily satisfy the prior assumption that the object is located on the ground. All of these factors have a great influence on the monocular distance measurement accuracy.

Disclosure of Invention

In order to solve or partially solve the problems in the related art, the application provides a monocular distance measurement method and device based on semantic segmentation and a storage medium, and the distance measurement precision can be improved.

The first aspect of the present application provides a monocular distance measurement method based on semantic segmentation, including:

generating a ground grid point matrix according to the own vehicle coordinate system;

performing semantic segmentation on a front vehicle image acquired by a monocular camera, wherein the type of the semantic segmentation of pixels in the front vehicle image comprises: road pavement, objects of interest in the road;

converting the ground grid point matrix to the image of the front vehicle according to a preset projection matrix, and determining a target grid ground point set of an interested object in the road according to the road surface and the interested object in the road, wherein the target grid ground point set comprises a plurality of target grid ground points;

and obtaining the distance measurement result of the monocular camera according to the distance from the target grid grounding point set to the vehicle.

Preferably, the generating a ground grid point matrix according to the own vehicle coordinate system includes:

generating a ground grid point matrix and a semantic value matrix according to a self-vehicle coordinate system, wherein the dimension of the ground grid point matrix is the same as that of the semantic value matrix;

the converting the ground grid points to the image of the preceding vehicle according to a preset projection matrix, and determining a target grid ground point set of an interested object in the road according to the road surface and the interested object in the road, includes:

converting the ground grid point matrix into a ground grid point projection matrix which is positioned in the same coordinate system with the front vehicle image according to a preset projection matrix;

acquiring pixel values of the road surface and the interested object in the road according to the ground grid point projection matrix, storing the pixel values into the semantic value matrix, and carrying out edge detection on the semantic value matrix to obtain a continuous contour point set, closest to the road surface, of the interested object in the road;

and finding a target grid grounding point set of the interested object in the road in the ground grid point matrix by taking the contour point set closest to the road surface as a mapping index.

Preferably, the length, the width and the resolution of the ground grid point matrix are adjustable.

Preferably, the performing semantic segmentation on the front vehicle image acquired by the monocular camera, wherein the type of the semantic segmentation of pixels in the front vehicle image comprises road pavement and objects of interest in the road, includes:

inputting the image of the front vehicle into a trained deep learning model;

performing forward prediction, and performing pixel-level classification on the front vehicle image through a convolutional neural network;

and semantically segmenting the front vehicle image into a road pavement and an interested object in the road according to the memory arrangement of the semantic segmentation result.

Preferably, the converting the ground grid point matrix into the ground grid point projection matrix located in the same coordinate system as the preceding vehicle image according to a preset projection matrix includes:

setting a camera coordinate system and a pixel coordinate system, wherein the front vehicle image is positioned under the pixel coordinate system;

acquiring a camera internal reference matrix and a camera external reference matrix;

converting the ground grid point matrix into a camera coordinate system according to the camera external parameter matrix to obtain a camera grid point matrix;

and converting the camera grid point matrix into a pixel coordinate system according to the camera internal reference matrix to obtain the ground grid point projection matrix.

Preferably, the performing the edge detection on the semantic value matrix includes:

setting a threshold value to carry out binarization processing on the semantic value matrix;

and detecting the outline of the semantic value matrix by using a Sobel operator.

Preferably, the obtaining of the distance measurement result of the monocular camera according to the distance from the target grid ground point set to the vehicle includes:

calculating the distance between the grounding point of a single target grid and the origin of the coordinate system of the self-vehicle according to the X-axis component, the Y-axis component and the Z-axis component, and repeating the process until the distance between the grounding point of each target grid and the origin of the coordinate system of the self-vehicle is calculated to obtain a distance set to be optimized, wherein the distance set to be optimized comprises a plurality of distances to be optimized;

calculating the variance of the distance set to be optimized, and if the variance is greater than or equal to a rated value, performing mean operation on the distance set to be optimized to obtain the object distance; and if the variance is smaller than the rated value, the object distance is equal to the distance to be optimized with the minimum value.

A second aspect of the present application provides a monocular distance measuring device based on semantic segmentation, including:

the matrix generation module is used for generating a ground grid point matrix according to the own vehicle coordinate system;

the semantic segmentation module is used for performing semantic segmentation on a front vehicle image acquired by the monocular camera, wherein the type of semantic segmentation of pixels in the front vehicle image comprises: road pavement, objects of interest in the road;

the conversion module is used for converting the ground grid point matrix to the image of the front vehicle according to a preset projection matrix and determining a target grid grounding point set of an interested object in the road according to the road surface and the interested object in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points;

and the distance measurement module is used for obtaining the distance measurement result of the monocular camera according to the distance from the target grid grounding point set to the vehicle.

A third aspect of the present application provides an electronic device comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.

A fourth aspect of the present application provides a computer-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform the method as described above.

The technical scheme provided by the application can comprise the following beneficial effects:

According to the technical scheme, a ground grid point matrix is generated according to the own-vehicle coordinate system, and semantic segmentation is performed on a front vehicle image acquired by a monocular camera, wherein the semantic segmentation types of the pixels in the front vehicle image comprise road pavement and objects of interest in the road. The ground grid point matrix is converted to the front vehicle image according to a preset projection matrix, and a target grid grounding point set of an object of interest in the road is determined according to the road pavement and the object of interest in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points. The distance measurement result of the monocular camera is then obtained according to the distance from the target grid grounding point set to the own vehicle. Because the ground grid point matrix is associated with the semantic information of the front vehicle image through the preset projection matrix, and a plurality of target grid grounding points of the object of interest closest to the road pavement are obtained, the distance measurement precision is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.

FIG. 1 is a schematic flowchart of a monocular distance measuring method based on semantic segmentation according to an embodiment of the present application;

FIG. 2 is a flow chart of a monocular distance measuring method based on semantic segmentation according to another embodiment of the present application;

FIG. 3 is a schematic diagram of a ground grid point matrix shown in an embodiment of the present application;

FIG. 4 is a schematic diagram of a semantic image of an object of interest shown in an embodiment of the present application;

FIG. 5 is a schematic diagram of a semantic value matrix before and after assignment shown in an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a monocular distance measuring device based on semantic segmentation according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a monocular distance measuring device based on semantic segmentation according to another embodiment of the present application;

FIG. 8 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.

In the related art, most monocular distance measurement methods refine the object in a detection frame to contour points or feature points, and then complete the distance measurement of the object based on a prior assumption and the camera imaging principle. Because the detection frame cannot fit the object closely, the distance measurement precision of such methods cannot be effectively guaranteed.

In order to solve the above problem, an embodiment of the present application provides a monocular distance measurement method based on semantic segmentation, which can improve distance measurement accuracy. In order to facilitate understanding of the embodiments of the present application, the technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.

Fig. 1 shows a schematic flowchart of a monocular distance measurement method based on semantic segmentation according to an embodiment of the present application.

Referring to fig. 1, a monocular distance measuring method based on semantic segmentation includes the following steps:

Step S11, generating a ground grid point matrix according to the own vehicle coordinate system.

The ground grid point matrix is a coordinate point matrix comprising a plurality of ground grid points. It may be generated by uniformly dividing the ground surface into a plurality of grids; the places where the grid lines intersect are the ground grid points. The coordinates of a ground grid point are (x, y, 0).

Step S12, performing semantic segmentation on the front vehicle image acquired by the monocular camera, wherein the type of the semantic segmentation of the pixels in the front vehicle image comprises the following steps: road pavement, objects of interest in a road.

The purpose of performing semantic segmentation on the front vehicle image is to identify the object elements in it. For example, if the front vehicle image contains four object elements, namely a road surface, a car, a pedestrian and a kitten, then the car, the pedestrian and the kitten are objects of interest in the road. After semantic segmentation, these four object elements can be identified in the current front vehicle image.

And step S13, converting the ground grid point matrix to the image of the front vehicle according to the preset projection matrix, and determining a target grid grounding point set of the interested object in the road according to the road surface and the interested object in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points.

After the ground grid point matrix is generated, it needs to be converted to the front vehicle image according to a preset projection matrix, so that a semantic relation between the ground grid point matrix and the front vehicle image is established. A target grid grounding point set of the object of interest in the road is then determined according to the road pavement and the object of interest in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points. A target grid grounding point is a coordinate point on the contour edge of the object of interest in the road that is closest to the road surface.

And step S14, obtaining the distance measurement result of the monocular camera according to the distance from the target grid grounding point set to the vehicle.

It should be noted that, once the target grid grounding point set comprising a plurality of target grid grounding points is obtained, the distance between the object of interest in the road and the own vehicle is calculated from this set, so as to obtain the distance measurement result of the monocular camera.

According to the method, a ground grid point matrix is generated according to the own-vehicle coordinate system, and semantic segmentation is performed on a front vehicle image acquired by a monocular camera, wherein the semantic segmentation types of the pixels in the front vehicle image comprise road pavement and objects of interest in the road. The ground grid point matrix is converted to the front vehicle image according to a preset projection matrix, and a target grid grounding point set of an object of interest in the road is determined according to the road pavement and the object of interest in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points. The distance measurement result of the monocular camera is then obtained according to the distance from the target grid grounding point set to the own vehicle. Because the ground grid point matrix is associated with the semantic information of the front vehicle image through the preset projection matrix, and a plurality of target grid grounding points of the object of interest closest to the road pavement are obtained, the distance measurement precision is improved.

Fig. 2 shows a schematic flowchart of a monocular distance measuring method based on semantic segmentation according to another embodiment of the present application.

Referring to fig. 2, a monocular distance measuring method based on semantic segmentation includes the following steps:

Step S21, generating a ground grid point matrix and a semantic value matrix according to the own vehicle coordinate system, wherein the dimension of the ground grid point matrix is the same as that of the semantic value matrix.

It should be noted that the ground grid point matrix is a coordinate point matrix comprising a plurality of ground grid points. It may be generated by uniformly dividing the ground surface into a plurality of grids; the places where the grid lines intersect are the ground grid points. The coordinates of a ground grid point are (x, y, 0). The semantic value matrix is a virtual matrix with the same dimension as the ground grid point matrix, that is, the two matrices correspond element by element. The difference is that the ground grid point matrix is a set of coordinate points, whereas the semantic value matrix stores the pixel value corresponding to each ground grid point.

It should be noted that the length, width and resolution of the ground grid point matrix are adjustable. For example, the ground grid point matrix may be set to a length of 50 m, a width of 10 m and a resolution of 0.1 m × 0.1 m (meaning that each small grid cell in the ground grid point matrix is 0.1 m long and 0.1 m wide), or to a length of 100 m, a width of 20 m and a resolution of 0.2 m × 0.2 m. The user can flexibly set the length, the width and the resolution according to the actual situation.
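The grid generation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `make_ground_grid`, the choice of x as the forward axis, the lateral centring on the vehicle and the grid starting at the origin are all assumptions made for the example.

```python
def make_ground_grid(length_m, width_m, resolution_m):
    """Return (grid, semantic) with identical row/column dimensions.

    grid[r][c] is an (x, y, 0.0) point in the own-vehicle frame;
    semantic[r][c] starts at 0 and is meant to receive a pixel value later.
    Assumed layout (hypothetical): x grows with the row index, y runs from
    +width/2 to -width/2 across the columns.
    """
    n_rows = int(round(length_m / resolution_m)) + 1  # samples along x
    n_cols = int(round(width_m / resolution_m)) + 1   # samples along y
    half_width = width_m / 2.0
    grid = [
        [(r * resolution_m, half_width - c * resolution_m, 0.0)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]
    semantic = [[0] * n_cols for _ in range(n_rows)]
    return grid, semantic


# The 50 m x 10 m grid at 0.1 m resolution mentioned in the text.
grid, semantic = make_ground_grid(50.0, 10.0, 0.1)
```

Changing the three arguments to `100.0, 20.0, 0.2` gives the second configuration from the text; both matrices keep matching dimensions by construction.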

Step S22, performing semantic segmentation on the front vehicle image acquired by the monocular camera, wherein the type of the semantic segmentation of the pixels in the front vehicle image comprises the following steps: road pavement, objects of interest in a road.

Inputting a front vehicle image into a trained deep learning model; performing forward prediction, and performing pixel-level classification on the images of the front vehicles through a convolutional neural network; and semantically segmenting the front vehicle image into a road pavement and an interested object in the road according to the memory arrangement of the semantic segmentation result.

First, the front vehicle image is input into a trained deep learning model, which executes forward prediction and classifies the front vehicle image at pixel level by using a convolutional neural network. The convolutional neural network comprises convolutional layers, pooling layers, upsampling layers, fully connected layers and activation functions. Finally, the front vehicle image is semantically segmented into the road pavement and the objects of interest in the road according to the memory arrangement of the semantic segmentation result.
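As a rough illustration of reading a segmentation result according to its memory arrangement, the sketch below turns a flat score buffer into a per-pixel label mask. The class-major layout (all scores of class 0, then all scores of class 1, and so on), the function name `labels_from_flat_scores` and the score values are assumptions; the source does not specify the model's output layout.

```python
def labels_from_flat_scores(scores, n_classes, height, width):
    """Turn a flat class-major score buffer into a height x width label mask.

    Assumed layout (hypothetical): scores[k * height * width + v * width + u]
    is the score of class k at pixel (u, v). Each pixel receives the index of
    its highest-scoring class.
    """
    plane = height * width
    if len(scores) != n_classes * plane:
        raise ValueError("score buffer does not match the given dimensions")
    mask = []
    for v in range(height):
        row = []
        for u in range(width):
            offset = v * width + u
            best = max(range(n_classes),
                       key=lambda k: scores[k * plane + offset])
            row.append(best)
        mask.append(row)
    return mask


# Two classes (0 = road pavement, 1 = object of interest) on a 1 x 3 image.
flat = [0.9, 0.2, 0.6,   # class 0 scores
        0.1, 0.8, 0.4]   # class 1 scores
mask = labels_from_flat_scores(flat, n_classes=2, height=1, width=3)
```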

And step S23, converting the ground grid point matrix into a ground grid point projection matrix which is positioned in the same coordinate system with the image of the front vehicle according to a preset projection matrix.

After the ground grid point matrix is generated, in order to establish a semantic relationship between the ground grid point matrix and the image of the preceding vehicle, the ground grid point matrix needs to be converted into a position which is located in the same coordinate system as the image of the preceding vehicle according to a preset projection matrix, so that the semantic relationship between the ground grid point matrix and the image of the preceding vehicle is established through the ground grid point projection matrix.

Setting a camera coordinate system and a pixel coordinate system, wherein the image of the front vehicle is positioned under the pixel coordinate system; acquiring a camera internal reference matrix and a camera external reference matrix; converting the ground grid point matrix into a camera coordinate system according to the camera external parameter matrix to obtain a camera grid point matrix; and converting the camera grid point matrix into a pixel coordinate system according to the camera internal reference matrix to obtain a ground grid point projection matrix.

First, a camera coordinate system and a pixel coordinate system are set, wherein the ground grid point matrix is located under the own vehicle coordinate system and the front vehicle image is located under the pixel coordinate system. To make the coordinate transformation easier to follow, take a ground grid point P_i(x_i, y_i, 0) as an example (the ground grid point matrix is composed of a plurality of such ground grid points). The ground grid point P_i(x_i, y_i, 0) is transformed into the camera coordinate system and then projected into the pixel coordinate system, giving the corresponding ground grid projection point p_i(u, v). The specific conversion formula is as follows:

$$Z_c \, p_i = Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left( R \, P_i + t \right) \tag{1}$$

wherein P_i is a ground grid point in the own vehicle coordinate system, p_i is the ground grid projection point in the pixel coordinate system (written in homogeneous form in formula (1)), R is the camera external parameter (rotation) matrix, t is the translation vector of the camera external parameters, K is the camera internal parameter matrix, and Z_c is the Z coordinate of P_i in the camera coordinate system. The camera internal reference matrix and the camera external reference matrix are both calibration quantities of the camera and are fixed; the user can obtain them from the camera parameters. Because the ground grid point projection matrix is obtained by converting the ground grid point matrix, and the dimension of the ground grid point matrix is the same as that of the semantic value matrix, the ground grid point projection matrix, the ground grid point matrix and the semantic value matrix have a one-to-one correspondence, that is, a mapping relation is established.
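A minimal sketch of this projection for a single point is given below, using the standard pinhole relation Z_c [u, v, 1]^T = K (R P + t). The function name `project_ground_point` and all numeric values of K, R and t are hypothetical; they describe an imaginary camera mounted 1.5 m above the vehicle origin and looking along the vehicle x axis, and are not calibration data from the source.

```python
def project_ground_point(P, K, R, t):
    """Project a 3-D point P from the own-vehicle frame to pixel (u, v).

    Computes Z_c * [u, v, 1]^T = K (R P + t) with plain nested lists.
    Returns None when the point is not in front of the camera (Z_c <= 0).
    """
    # Own-vehicle frame -> camera frame via the external reference (R, t).
    cam = [sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3)]
    z_c = cam[2]
    if z_c <= 0:
        return None
    # Camera frame -> pixel frame via the internal reference K.
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / z_c, uvw[1] / z_c


# Hypothetical intrinsics: focal length 1000 px, principal point (640, 360).
K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
# Hypothetical extrinsics: vehicle (x forward, y left, z up) to
# camera (x right, y down, z forward), camera 1.5 m above the origin.
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]
t = [0.0, 1.5, 0.0]

uv = project_ground_point((10.0, 0.0, 0.0), K, R, t)
```

Applying the function to every element of the ground grid point matrix yields a projection matrix of the same dimensions, which is what preserves the one-to-one mapping.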

It should be further noted that the processing operations described above may be accelerated by using multiple threads, CPU instruction sets or a GPU, so as to improve operation efficiency and obtain the ground grid point projection matrix as quickly as possible.

And step S24, acquiring pixel values of the road pavement and the interested object in the road according to the ground grid point projection matrix, storing the pixel values into a semantic value matrix, and carrying out edge detection on the semantic value matrix to obtain a continuous contour point set closest to the road pavement of the interested object in the road.

It should be noted that the ground grid point projection matrix and the semantic value matrix have a one-to-one correspondence, and that semantic segmentation of the front vehicle image yields semantic images of the road pavement and of the objects of interest in the road (the pixel values are stored in these semantic images). The pixel values of the semantic images can therefore be stored into the semantic value matrix through the ground grid point projection matrix. Edge detection is then performed on the semantic value matrix holding the pixel values to obtain a continuous contour point set. Connecting the contour points gives the contour line of the object of interest in the road, from which the continuous contour point set of the object of interest closest to the road pavement is identified.

The above conversion process is described in detail in a specific case as follows:

Fig. 3 shows a schematic diagram of a ground grid point matrix (5 rows × 7 columns). Each element of the ground grid point matrix shown in Fig. 3 is a three-dimensional point coordinate (x, y, z = 0) in the own vehicle coordinate system; (0, 0, 0) represents the origin of the coordinate system, and the positive directions of the x axis and the y axis are shown in Fig. 3.

The semantic image of the object of interest in the road is obtained through semantic segmentation, as shown in Fig. 4. With the internal reference K and the external reference (R, t) of the onboard camera known, each three-dimensional point in the ground grid point matrix is projected into the front vehicle image by formula (1). In formula (1), p_i is a pixel point, P_i is a ground grid point, R is the camera external parameter matrix, t is the translation vector of the camera external parameters, K is the camera internal parameter matrix, and Z_c is the Z coordinate of P_i in the camera coordinate system. Suppose there are nine points p1 (3, 1, 0), p2 (3, 0, 0), p3 (3, -1, 0), p4 (2, 1, 0), p5 (2, 0, 0), p6 (2, -1, 0), p7 (1, 1, 0), p8 (1, 0, 0), p9 (1, -1, 0).

As shown in Fig. 5, these nine points are projected into the semantic image of the object of interest in the road by using the camera internal reference and the camera external reference, and the corresponding semantic values are nine pixel points with the value 1 (the region of 1s in Fig. 5). The elements of the initial semantic value matrix are all set to 0, and the result of filling the semantic value matrix is the "assigned semantic value matrix" in Fig. 5.
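The filling step can be sketched as below, reusing the nine points from the example. The function name `fill_semantic_matrix`, the 4 × 5 segmentation mask and the `toy_project` function are hypothetical; `toy_project` is only a stand-in for formula (1) chosen so that the nine points land on the region of 1s, and it is not a real camera model.

```python
def fill_semantic_matrix(grid, mask, project):
    """Sample the segmentation mask at each projected ground grid point.

    grid[r][c] is a ground point, mask[v][u] is the label of pixel (u, v),
    and project(point) returns (u, v) or None. The result has the same
    dimensions as grid; cells that project outside the image stay 0.
    """
    height, width = len(mask), len(mask[0])
    semantic = [[0] * len(row) for row in grid]
    for r, row in enumerate(grid):
        for c, point in enumerate(row):
            uv = project(point)
            if uv is None:
                continue
            u, v = int(round(uv[0])), int(round(uv[1]))
            if 0 <= u < width and 0 <= v < height:
                semantic[r][c] = mask[v][u]
    return semantic


# The nine ground points p1..p9 from the example, laid out as a 3 x 3 grid.
grid = [
    [(3, 1, 0), (3, 0, 0), (3, -1, 0)],
    [(2, 1, 0), (2, 0, 0), (2, -1, 0)],
    [(1, 1, 0), (1, 0, 0), (1, -1, 0)],
]
# Hypothetical mask: 1 = object of interest, 0 = anything else.
mask = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def toy_project(point):
    # Toy stand-in for formula (1), for illustration only.
    x, y, _z = point
    return 2 - y, 3 - x


semantic = fill_semantic_matrix(grid, mask, toy_project)
```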

Further, in one embodiment, the edge detection on the semantic value matrix comprises: setting a threshold value to carry out binarization processing on the semantic value matrix; and detecting the outline of the semantic value matrix by using a Sobel operator. It should be noted that the user can set the threshold value according to the actual situation, after which the outline of the semantic value matrix is detected by using the Sobel operator. Compared with edge calculation methods in the related art, this edge detection processing does not need to perform gray processing on the semantic value matrix before binarization, and does not need to draw a contour by using OpenCV (Open Source Computer Vision Library), so the operation burden of edge detection is greatly reduced and a complex operation flow is simplified.
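A plain-Python sketch of the two steps follows. The function names, the replicated-border handling and the rule of keeping only object cells with a non-zero Sobel response are assumptions made for this example; the source states only that a threshold and a Sobel operator are used.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def binarize(matrix, threshold):
    """Map every value >= threshold to 1 and everything else to 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


def sobel_edges(binary):
    """Return a 0/1 matrix marking object cells with a non-zero Sobel response."""
    rows, cols = len(binary), len(binary[0])

    def at(r, c):
        # Replicate the border so that the 3 x 3 window is always defined.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return binary[r][c]

    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = sum(SOBEL_X[i][j] * at(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * at(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            if binary[r][c] and (gx != 0 or gy != 0):
                edges[r][c] = 1
    return edges


# A 5 x 5 semantic value matrix holding a 3 x 3 object.
semantic = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = sobel_edges(binarize(semantic, threshold=1))
```

In this toy case the interior cell of the object has a zero gradient and is dropped, so only the ring of boundary cells remains as contour points.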

And step S25, finding a target grid grounding point set of the interested object in the road in the ground grid point matrix by taking the contour point set closest to the road surface of the road as a mapping index.

It should be noted that, taking an automobile as the object of interest, after the contour point set of the automobile closest to the road surface (a set comprising a plurality of contour points) is obtained, the contour point set can be used as a mapping index to find the target grid grounding point set in the ground grid point matrix, because the semantic value matrix and the ground grid point matrix have a one-to-one correspondence.
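Because the two matrices share their indices, the lookup reduces to reading the ground grid point matrix at the contour positions. The sketch below assumes the contour is given as a 0/1 matrix of the same dimensions; the function name `ground_points_from_contour` and the example contour are hypothetical.

```python
def ground_points_from_contour(contour, grid):
    """Return grid[r][c] for every cell (r, c) flagged in the contour matrix."""
    if len(contour) != len(grid) or len(contour[0]) != len(grid[0]):
        raise ValueError("contour and grid must have the same dimensions")
    return [
        grid[r][c]
        for r, row in enumerate(contour)
        for c, flag in enumerate(row)
        if flag
    ]


# Ground grid point matrix built from the nine example points.
grid = [
    [(3, 1, 0), (3, 0, 0), (3, -1, 0)],
    [(2, 1, 0), (2, 0, 0), (2, -1, 0)],
    [(1, 1, 0), (1, 0, 0), (1, -1, 0)],
]
# Hypothetical contour: the row of cells nearest to the own vehicle.
contour = [
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
ground_points = ground_points_from_contour(contour, grid)
```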

Step S26: obtain the distance measurement result of the monocular camera according to the distance from the target grid grounding point set to the vehicle.

The distance between a single target grid point and the origin of the own-vehicle coordinate system is calculated from its X-axis, Y-axis and Z-axis components, and the process is repeated until the distance for every target grid point has been calculated, giving a distance set to be optimized that contains a plurality of distances to be optimized. The variance of the distance set to be optimized is then calculated: if the variance is greater than or equal to a rated value, the object distance is the mean of the distances to be optimized; if the variance is smaller than the rated value, the object distance is the minimum distance to be optimized.

Note that, for example, a contour point set {k_1, k_2, k_3, k_4, ..., k_n} of the car closest to the road surface is obtained by performing edge processing on the semantic image of the car. According to the mapping relation between the semantic value matrix and the ground grid point matrix, the target grid grounding point set {g_1, g_2, g_3, g_4, ..., g_n} is found in the ground grid point matrix. From this set, the distance d_i between a single target grid point g_i (x_i, y_i, z_i) and the origin of the own-vehicle coordinate system is calculated by the following formula:

$$d_i = \sqrt{x_i^2 + y_i^2} \qquad (2)$$

Because the target grid point g_i lies in the own-vehicle coordinate system, its Z-axis component is 0, and formula (2) is the simplified form for a zero Z-axis component. The distance between a single target grid point g_i and the origin of the own-vehicle coordinate system is obtained in this way, and the process is repeated until the distance for every target grid point g_i has been calculated, giving the distance set to be optimized {d_1, d_2, d_3, d_4, ..., d_n}. Finally, the variance of the distance set to be optimized {d_1, d_2, d_3, d_4, ..., d_n} is calculated. If the variance is greater than or equal to the rated value, the object distance is the mean of the distance set to be optimized; if the variance is smaller than the rated value, the object distance is equal to the minimum distance to be optimized d_i.
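The distance rule can be sketched as follows. The distance is computed in the ground plane, which is what the simplification for a zero Z-axis component amounts to. The text does not say whether the variance is a population or a sample variance; the population variance is assumed here, and the function name `object_distance` is hypothetical.

```python
import math
from statistics import mean, pvariance


def object_distance(target_points, rated_value):
    """Apply the variance rule to the distances of the target grid points."""
    # Z is 0 for ground grid points, so the distance reduces to hypot(x, y).
    distances = [math.hypot(x, y) for x, y, _z in target_points]
    variance = pvariance(distances)  # assumption: population variance
    if variance >= rated_value:
        # Widely spread distances: use the mean.
        return mean(distances)
    # Tightly clustered distances: use the minimum.
    return min(distances)


# Illustrative points with distances 3, 4 and 5 (variance 2/3).
points = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
spread_result = object_distance(points, rated_value=0.5)
tight_result = object_distance(points, rated_value=1.0)
```

With a rated value of 0.5 the variance exceeds the rated value and the mean is returned; with a rated value of 1.0 the minimum is returned.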

In the present application, the semantic information of interest in the image data is segmented by a deep learning method. A ground grid point matrix and a semantic value matrix, both generated in advance, are then used: the ground grid point matrix is projected into a ground grid projection point matrix in the same coordinate system as the front vehicle image, which establishes a mapping relation (that is, a semantic association) among the ground grid point matrix, the semantic value matrix and the ground grid projection point matrix. Edge detection is performed on the semantic value matrix, in which the pixel values have been stored, to obtain the continuous contour point set of the object of interest in the road that is closest to the road surface. The contour point set is used as a mapping index to match the target grid grounding point set in the ground grid point matrix, and finally the object distance is calculated from the target grid grounding point set.

Unlike the related art, the present application does not need to further extract feature points after the object contour lines are obtained, so a complex image processing flow can be omitted and the operation flow is simplified. Meanwhile, owing to the mapping relation among the ground grid point matrix, the semantic value matrix and the ground grid projection point matrix, the obtained contour line fits the contour of the object semantic image given by the semantic segmentation result, and the target grid grounding point set obtained by subsequent mapping also fits that contour, which improves the ranging precision.

Corresponding to the foregoing method embodiments, the application also provides an embodiment of a monocular distance measuring device based on semantic segmentation.

Fig. 6 shows a schematic structural diagram of a monocular distance measuring device 80 based on semantic segmentation in the embodiment of the present application.

Referring to fig. 6, a monocular distance measuring device 80 based on semantic segmentation includes: a matrix generation module 810, a semantic segmentation module 820, a conversion module 830, and a ranging module 840.

The matrix generation module 810 is configured to generate a ground grid point matrix according to the own vehicle coordinate system;

the semantic segmentation module 820 is configured to perform semantic segmentation on a preceding vehicle image acquired by the monocular camera, where the type of semantic segmentation of pixels in the preceding vehicle image includes: road pavement, objects of interest in the road;

the conversion module 830 is configured to convert the ground grid point matrix onto the preceding vehicle image according to a preset projection matrix, and determine a target grid grounding point set of an object of interest in the road according to the road surface and the object of interest in the road, where the target grid grounding point set includes a plurality of target grid grounding points;

the distance measurement module 840 is used for obtaining a distance measurement result of the monocular camera according to the distance from the target grid ground point set to the vehicle.

In the device of this embodiment, the matrix generation module 810 generates a ground grid point matrix according to the own-vehicle coordinate system. The semantic segmentation module 820 performs semantic segmentation on the preceding vehicle image acquired by the monocular camera, where the semantic segmentation types of pixels in the preceding vehicle image include the road surface and objects of interest in the road. The conversion module 830 converts the ground grid point matrix onto the preceding vehicle image according to a preset projection matrix and determines, according to the road surface and the object of interest in the road, a target grid grounding point set of the object of interest in the road, where the set includes a plurality of target grid grounding points. The ranging module 840 obtains the distance measurement result of the monocular camera according to the distance from the target grid grounding point set to the vehicle. Because semantic information is associated between the preset projection matrix and the preceding vehicle image, and a plurality of target grid grounding points of the object of interest closest to the road surface are obtained, the ranging precision is improved.

Fig. 7 is a schematic structural diagram of a monocular distance measuring device 80 based on semantic segmentation in another embodiment of the present application.

Referring to fig. 7, a monocular distance measuring device 80 based on semantic segmentation includes: a matrix generation module 810, a semantic segmentation module 820, a conversion module 830, and a ranging module 840. The conversion module 830 includes a projection unit 831, an edge detection unit 832, and a mapping unit 833.

For the functions of the semantic segmentation module 820 and the ranging module 840, refer to the description of Fig. 6; they are not repeated here.

The matrix generation module 810 is configured to generate a ground grid point matrix and a semantic value matrix according to the own vehicle coordinate system, where a dimension of the ground grid point matrix is the same as a dimension of the semantic value matrix.

The projection unit 831 is configured to convert the ground grid point matrix into a ground grid point projection matrix located in the same coordinate system as the preceding vehicle image according to a preset projection matrix.

The edge detection unit 832 is configured to obtain pixel values of the road surface and the object of interest in the road according to the ground grid point projection matrix, store the pixel values in the semantic value matrix, and perform edge detection on the semantic value matrix to obtain a continuous contour point set closest to the road surface of the object of interest in the road.

The mapping unit 833 is configured to find the target grid grounding point set of the object of interest in the road within the ground grid point matrix by using the contour point set closest to the road surface as a mapping index.
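The way the modules and units fit together can be sketched as a simple composition. The class names, method names and callable signatures below are hypothetical and serve only to show the data flow between modules 810 to 840 and units 831 to 833; the stub functions stand in for real implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ConversionModule:
    """Conversion module 830 with its three units."""
    project: Callable[[Any], Any]            # projection unit 831
    detect_edges: Callable[[Any, Any], Any]  # edge detection unit 832
    map_to_grid: Callable[[Any, Any], Any]   # mapping unit 833

    def run(self, ground_grid, semantic_image):
        projected = self.project(ground_grid)
        contour = self.detect_edges(projected, semantic_image)
        return self.map_to_grid(ground_grid, contour)


@dataclass
class MonocularRangingDevice:
    generate_matrix: Callable[[], Any]  # matrix generation module 810
    segment: Callable[[Any], Any]       # semantic segmentation module 820
    conversion: ConversionModule        # conversion module 830
    measure: Callable[[Any], float]     # ranging module 840

    def measure_distance(self, image):
        grid = self.generate_matrix()
        semantic_image = self.segment(image)
        targets = self.conversion.run(grid, semantic_image)
        return self.measure(targets)


# Stub callables used only to exercise the data flow.
device = MonocularRangingDevice(
    generate_matrix=lambda: "grid",
    segment=lambda image: "semantic",
    conversion=ConversionModule(
        project=lambda grid: "projected",
        detect_edges=lambda projected, semantic: "contour",
        map_to_grid=lambda grid, contour: [1.0, 2.0],
    ),
    measure=lambda targets: min(targets),
)
result = device.measure_distance("image")
```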

Referring to fig. 8, an electronic device 1000 includes a processor 1100 and a memory 1200.

The Processor 1100 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 1200 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by the processor 1100 or other modules of the computer. The persistent storage may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is used as the persistent storage. In other embodiments, the persistent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random access memory. The system memory may store some or all of the instructions and data that the processor requires at runtime. Further, the memory 1200 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be used. In some embodiments, the memory 1200 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted wirelessly or by wire.

The memory 1200 has stored thereon executable code which, when executed by the processor 1100, may cause the processor 1100 to perform some or all of the methods described above.

Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.

Alternatively, the present application may also be embodied as a computer-readable storage medium (or non-transitory machine-readable storage medium or machine-readable storage medium) having executable code (or a computer program or computer instruction code) stored thereon, which, when executed by a processor of an electronic device (or server, etc.), causes the processor to perform part or all of the various steps of the above-described method according to the present application.

Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A monocular distance measuring method based on semantic segmentation is characterized by comprising the following steps:

the determining a target grid grounding point set of the interested object in the road according to the road surface and the interested object in the road comprises:

obtaining a contour point set {k1, k2, k3, k4, ..., kn} of the object of interest in the road, which is closest to the road surface, by performing semantic segmentation on the preceding vehicle image;

determining the target grid ground point set {g1, g2, g3, g4, ..., gn} in the ground grid point matrix through the mapping relation between the semantic value matrix and the ground grid point matrix;

2. The monocular distance measuring method based on semantic segmentation according to claim 1, wherein the generating a ground grid point matrix according to the own vehicle coordinate system comprises:

and finding a target grid grounding point set of the interested object in the road in the ground grid point matrix by taking the contour point set closest to the road surface of the road as a mapping index.

3. The semantic segmentation based monocular distance measuring method of claim 2, wherein the length, width and resolution of the ground grid point matrix are adjustable.

4. The monocular distance measuring method based on semantic segmentation as claimed in claim 2, wherein performing semantic segmentation on the front vehicle image acquired by the monocular camera, the semantic segmentation types of pixels in the front vehicle image including the road surface and objects of interest in the road, comprises:

inputting the image of the front vehicle into a trained deep learning model;

5. The monocular distance measuring method based on semantic segmentation according to claim 2, wherein the converting the ground grid point matrix into the ground grid point projection matrix under the same coordinate system as the front vehicle image according to a preset projection matrix comprises:

6. The monocular distance measuring method based on semantic segmentation according to claim 2, wherein the edge detection of the semantic value matrix comprises:

7. The monocular distance measuring method based on semantic segmentation according to claim 2, wherein obtaining a distance measuring result of a monocular camera according to a distance from the target grid ground point set to a vehicle comprises:

8. A monocular distance measuring device based on semantic segmentation is characterized by comprising:

the conversion module is used for converting the ground grid point matrix to the image of the front vehicle according to a preset projection matrix and determining a target grid grounding point set of an interested object in the road according to the road surface and the interested object in the road, wherein the target grid grounding point set comprises a plurality of target grid grounding points; wherein the determining a target grid grounding point set of the object of interest in the road according to the road surface and the object of interest in the road comprises: obtaining a contour point set {k1, k2, k3, k4, ..., kn} of the object of interest in the road, which is closest to the road surface, by performing semantic segmentation on the preceding vehicle image; determining the target grid ground point set {g1, g2, g3, g4, ..., gn} in the ground grid point matrix through the mapping relation between the semantic value matrix and the ground grid point matrix;

9. An electronic device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1 to 7.