CN115526987A - Label element reconstruction method, system, device and medium based on monocular camera - Google Patents

Label element reconstruction method, system, device and medium based on monocular camera

Info

Publication number
CN115526987A
Authority
CN
China
Prior art keywords
image
monocular
sign
information
signage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211156746.2A
Other languages
Chinese (zh)
Inventor
杨蒙蒙
江昆
温拓朴
杨殿阁
黄晋
唐雪薇
黄健强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202211156746.2A priority Critical patent/CN115526987A/en
Publication of CN115526987A publication Critical patent/CN115526987A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method, system, device and medium for reconstructing sign elements based on a monocular camera. The method comprises: acquiring a monocular image, a GNSS signal, an IMU signal and a wheel speed signal; performing perception processing on the acquired monocular image to obtain an image-perceived map element result; obtaining six-degree-of-freedom information of the vehicle based on the GNSS signal, the IMU signal and the wheel speed signal; and calculating roadside sign elements from the image-perceived map element result and the six-degree-of-freedom vehicle information to obtain three-dimensional information of the roadside signs. The invention can reconstruct sign elements using only a low-cost monocular camera, GNSS and IMU.

Description

Label element reconstruction method, system, device and medium based on monocular camera
Technical Field
The invention relates to a method, system, device and medium for reconstructing sign elements based on a monocular camera, and belongs to the technical field of environment construction for intelligent connected vehicles.
Background
High-precision maps are an important basis for high-level automated driving: they are key inputs that describe the road environment at decimetre to centimetre level.
The traditional method mainly uses lidar for collection, but lidar is expensive and hard to deploy at scale. Prior-art approaches instead reconstruct from crowdsourced monocular camera data, which is low-cost and can be installed at scale, but they lack depth measurement, making the reconstruction of roadside elements difficult.
Disclosure of Invention
In view of the above problems, it is an object of the present invention to provide a method, system, device and medium for reconstructing signage elements based on a monocular camera, which can achieve the reconstruction of signage elements.
In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a method for reconstructing a signage component based on a monocular camera, comprising:
acquiring a monocular image, a GNSS signal, an IMU signal and a wheel speed signal;
the obtained monocular image is subjected to perception processing to obtain a map element result of image perception;
acquiring six-degree-of-freedom information of the vehicle based on the GNSS signal, the IMU signal and the wheel speed signal;
and calculating road side sign elements based on the map element result of image perception and six-degree-of-freedom information of the vehicle to obtain three-dimensional information of the road side signs.
Further, the monocular image must include a roadside sign.
Further, the obtaining of the map element result of image perception by perception processing of the acquired monocular image includes:
sign perception, which obtains mask data of the sign pixels in the monocular image;
and edge extraction, which performs edge extraction on the mask data of the roadside sign pixels and outputs a series of control points in the image representing the closed contour of the roadside sign element.
Further, signage perception, comprising:
scaling the monocular image;
setting an image segmentation model based on a convolutional neural network;
and inputting the monocular image into an image segmentation model, and performing forward calculation on the neural network through a GPU/FPGA/AI chip calculation unit carrying the image segmentation model to obtain a mask of the signboard pixels in the image.
Further, calculating road side sign elements based on the map element result of image perception and six-degree-of-freedom information of the vehicle to obtain three-dimensional information of the road side signs, and the method comprises the following steps:
predicting the positions of the sign element instances in different frame images using optical flow, according to the closed contour of the roadside sign element and the six-degree-of-freedom information of the vehicle, to obtain the closed contours of the sign at different moments;
and solving the closed outlines of the signs under the plurality of moments and the pose information of the six-degree-of-freedom vehicle through the set template information to finally obtain the three-dimensional information of the roadside signs.
Further, solving the closed outlines of the signs at multiple moments and the pose information of the six-degree-of-freedom vehicle through the set template information to finally obtain the three-dimensional information of the roadside signs, wherein the three-dimensional information comprises the following steps:
defining a label shape template;
parameterizing the signboard, wherein the parameterization comprises shape parameters and position parameters of the signboard;
defining an objective function:

E(s, p_c, R_o) = Σ_k Σ_{i=0}^{7} ‖ π(R_k · p_i^w + t_k) − p ‖²

wherein s denotes the shape parameter; π(R_k · p_i^w + t_k) represents the pixel position of the key point p_i^w projected onto the image; R_k, t_k represent the six-degree-of-freedom pose of the camera at time k; π represents the projection model of the pinhole camera; p represents the point of the closed contour nearest to the projected pixel position; and the superscript w denotes a point in the world coordinate system;
and optimizing the objective function using an L-M optimization algorithm to solve for the shape parameter and the position parameter of the sign.
Further, a sign shape template is defined, including rectangular templates and circular templates.
In a second aspect, the present invention further provides a system for reconstructing signage components based on a monocular camera, comprising:
a signal acquisition module configured to acquire a monocular image as well as a GNSS signal, an IMU signal, and a wheel speed signal;
the perception module is configured to perform perception processing on the acquired monocular image to obtain a map element result of image perception;
a positioning module configured to obtain six-degree-of-freedom information of the vehicle based on the GNSS signals, the IMU signals and the wheel speed signals;
and the roadside sign element calculation module is configured to calculate the roadside sign element based on the image-perceived map element result and the six-degree-of-freedom information of the vehicle to obtain the three-dimensional information of the roadside sign.
In a third aspect, the present invention further provides an electronic device comprising a processor and a memory storing computer program instructions, wherein the program instructions, when executed by the processor, implement the method for reconstructing sign elements based on a monocular camera.
In a fourth aspect, the present invention further provides a computer-readable storage medium having stored thereon computer program instructions, wherein the program instructions, when executed by a processor, are for implementing the monocular camera-based signage component reconstruction method.
Due to the adoption of the technical scheme, the invention has the following characteristics:
1. according to the method, a monocular image, a GNSS signal, an IMU signal and a wheel speed signal are obtained; the acquired monocular image is subjected to perception processing to obtain a map element result of image perception; acquiring six-degree-of-freedom information of the vehicle based on the GNSS signal, the IMU signal and the wheel speed signal; and calculating road side sign elements based on the map element result of image perception and six-degree-of-freedom information of the vehicle to obtain three-dimensional information of the road side signs, so that the sign elements can be reconstructed only by using low-cost monocular cameras, GNSS and IMU.
2. The invention can be applied to the reconstruction of various signs, including rectangular and circular signs.
In conclusion, the invention can be widely applied to the reconstruction of roadside signs.
Drawings
Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like parts are designated with like reference numerals throughout the drawings. In the drawings:
fig. 1 is a schematic overall flow chart of a monocular vision roadside sign element reconstruction method according to an embodiment of the present invention.
Fig. 2 is a flow chart of roadside sign object calculation according to an embodiment of the present invention.
FIG. 3 is a signage parameterization template of an embodiment of the invention: (a) Is a rectangular template with the length of h and the width of w, and (b) is a circular template with the radius of r.
Detailed Description
It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be used.
Although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms may be only used to distinguish one element, component, region, layer or section from another region, layer or section. Terms such as "first," "second," and other numerical terms when used herein do not imply a sequence or order unless clearly indicated by the context. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the example embodiments.
For convenience of description, spatially relative terms, such as "inner", "outer", "lower", "upper", and the like, may be used herein to describe one element or feature's relationship to another element or feature as illustrated in the figures. This spatially relative term is intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures.
Because the prior art uses the data of the crowdsourcing monocular camera to reconstruct and lacks the measurement of depth, the reconstruction of the roadside element is difficult to realize. The invention provides a method, a system, equipment and a medium for reconstructing a sign element based on a monocular camera, comprising the following steps: acquiring a monocular image, a GNSS signal, an IMU signal and a wheel speed signal; the acquired monocular image is subjected to perception processing to obtain a map element result of image perception; acquiring six-degree-of-freedom information of the vehicle based on the GNSS signal, the IMU signal and the wheel speed signal; and calculating road side sign elements based on the map element result of image perception and six-degree-of-freedom information of the vehicle to obtain three-dimensional information of the road side signs. Therefore, the invention can realize the reconstruction of the signage elements by using only low-cost monocular cameras, GNSS and IMU.
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The first embodiment is as follows: the method for reconstructing a signage component based on a monocular camera provided by the embodiment comprises the following steps:
s1, acquiring a monocular image, a GNSS (global navigation satellite system) signal, an IMU (inertial measurement unit) signal and a wheel speed signal.
Specifically, the monocular image needs to include a roadside sign. The GNSS signal is used to obtain the absolute position of the vehicle, while the IMU signal and the wheel speed signal are used to obtain the continuous six-degree-of-freedom pose of the vehicle.
S2, the acquired monocular image is subjected to perception processing, and a map element result of image perception is obtained.
Specifically, the input of the perception processing is the monocular image and the output is a vectorized perception result of the signs in the image; the perception processing comprises the steps of sign perception and edge extraction, wherein:
S21, the sign perception process comprises the following steps:
scaling the input monocular image; in this embodiment the image is scaled, for example, to 768 × 480, but is not limited thereto;
setting up an image segmentation model based on a convolutional neural network;
inputting the scaled image into the image segmentation model and performing the forward computation of the neural network on a GPU/FPGA/AI-chip computing unit carrying the image segmentation model, to obtain a mask of the sign pixels in the image.
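By way of illustration only (not part of the claimed implementation), the scale-then-segment data flow of S21 can be sketched as follows; the convolutional segmentation network is replaced by a simple brightness threshold, which is purely a stand-in so the sketch stays self-contained:

```python
import numpy as np

TARGET_W, TARGET_H = 768, 480  # input resolution used in this embodiment

def resize_nearest(img: np.ndarray, w: int, h: int) -> np.ndarray:
    """Nearest-neighbour resize (stand-in for the production resizer)."""
    ys = (np.arange(h) * img.shape[0] / h).astype(int)
    xs = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[ys][:, xs]

def segment_signs(img: np.ndarray) -> np.ndarray:
    """Return a binary mask of 'sign' pixels.

    The patent runs a CNN segmentation model on a GPU/FPGA/AI chip; here a
    brightness threshold stands in for the network so only the data flow
    (scale -> forward pass -> mask) is demonstrated.
    """
    scaled = resize_nearest(img, TARGET_W, TARGET_H)
    return (scaled > 128).astype(np.uint8)

# usage on a synthetic grey image containing one bright "sign" patch
frame = np.zeros((960, 1536), dtype=np.uint8)
frame[100:200, 200:400] = 255
mask = segment_signs(frame)
```

The mask has the scaled resolution (480 × 768), with 1 marking sign pixels.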
S22, edge extraction
Mask data of the roadside sign pixels is taken as input, edge extraction is performed with the Canny operator, and a series of control points in the image is output, representing the closed contour of the roadside sign element.
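As an illustrative sketch of S22: the patent applies the Canny operator, but for an already-binary mask the contour is simply every foreground pixel with at least one background 4-neighbour, which the following numpy sketch computes directly (a simplifying assumption, not the claimed operator):

```python
import numpy as np

def mask_boundary(mask: np.ndarray) -> np.ndarray:
    """Return the boundary pixels of a binary mask.

    A pixel is interior if all four of its 4-neighbours are foreground;
    the contour is the foreground minus the interior.
    """
    m = mask.astype(bool)
    interior = m.copy()
    interior[1:, :]  &= m[:-1, :]   # requires the pixel above to be foreground
    interior[:-1, :] &= m[1:, :]    # pixel below
    interior[:, 1:]  &= m[:, :-1]   # pixel to the left
    interior[:, :-1] &= m[:, 1:]    # pixel to the right
    return m & ~interior            # foreground minus interior = closed contour

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 3:9] = 1                       # a filled 6x6 "sign"
edge = mask_boundary(mask)
control_points = np.argwhere(edge)       # (row, col) control points of the contour
```

For the 6 × 6 block this yields the 20 perimeter pixels as control points.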
And S3, obtaining six-degree-of-freedom information of the vehicle.
The GNSS signal, monocular image, IMU signal and wheel speed signal are processed, and six-degree-of-freedom information of the ego vehicle, comprising three-degree-of-freedom position and three-degree-of-freedom rotation, is obtained using existing GNSS/IMU/wheel-speed multi-sensor fusion techniques. Positioning may use GNSS or RTK (real-time kinematic, carrier-phase differential) signals to obtain results of different accuracy. In addition, other sensors, such as a visual odometer, may also be fused to achieve more accurate relative positioning.
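The fusion itself is existing technology and is not specified by the patent; as a minimal sketch of one ingredient, wheel-speed plus IMU yaw-rate dead reckoning propagates a planar pose, and a real system would correct this drifting estimate with GNSS or RTK fixes (correction step not shown; the function name and planar simplification are assumptions for illustration):

```python
import math

def dead_reckon(pose, v, yaw_rate, dt):
    """Propagate a planar (x, y, yaw) pose from wheel speed v and IMU yaw rate.

    This is only the prediction ingredient of the GNSS/IMU/wheel-speed
    fusion mentioned in the patent; the full 6-DOF estimate additionally
    fuses GNSS (or RTK) position fixes, e.g. in a Kalman filter.
    """
    x, y, yaw = pose
    x += v * dt * math.cos(yaw)
    y += v * dt * math.sin(yaw)
    yaw += yaw_rate * dt
    return (x, y, yaw)

# drive straight east at 10 m/s for 1 s, integrated in 100 steps
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = dead_reckon(pose, v=10.0, yaw_rate=0.0, dt=0.01)
```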
S4, as shown in FIG. 2, calculating the road side sign element by using the image sensing result and the six-degree-of-freedom information of the vehicle to obtain the three-dimensional information of the road side sign element, wherein the three-dimensional information comprises the following steps:
S41, predicting the positions of the sign element instances in different frames by optical-flow tracking, using the input closed contour of the roadside sign element and the vehicle six-degree-of-freedom information solved in the previous step, to obtain the closed contours of the sign at different moments.
S42, solving from the closed contours of the sign at multiple moments and the six-degree-of-freedom vehicle pose information, using the set template information, to finally obtain the three-dimensional information of the roadside sign. The solving process is as follows:
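The optical-flow prediction of S41 is typically performed with a pyramidal Lucas-Kanade tracker; the following self-contained sketch illustrates only the underlying principle, a single Gauss-Newton step estimating one global translation between two synthetic frames (the synthetic blob and tolerances are illustrative assumptions):

```python
import numpy as np

def lk_translation(img0: np.ndarray, img1: np.ndarray) -> np.ndarray:
    """Estimate a single translational flow (dx, dy) between two frames.

    One Lucas-Kanade step restricted to a global translation: from
    I1 - I0 ~= -v . grad(I0), solve the 2x2 normal equations A v = b.
    Production systems track each contour point with a pyramidal LK
    implementation; this only illustrates the principle used to predict
    sign contours into the next frame.
    """
    gy, gx = np.gradient(img0.astype(float))
    gt = img1.astype(float) - img0.astype(float)
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = -np.array([np.sum(gx * gt), np.sum(gy * gt)])
    return np.linalg.solve(A, b)

# synthetic frame pair: a smooth blob shifted right by exactly 1 pixel
yy, xx = np.mgrid[0:64, 0:64]
blob = lambda cx, cy: np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 50.0)
flow = lk_translation(blob(30, 32), blob(31, 32))
```

For the 1-pixel shift the estimate comes out close to (1, 0).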
and S421, defining two typical sign shape templates.
Specifically, as shown in fig. 3 (a), the sign shape template is defined as a rectangle, and the rectangle is composed of 8 key points, which are:
(0.5w,0.25h),(0.5w,-0.25h),(0.25w,-0.5h),(-0.25w,-0.5h),
(-0.5w,-0.25h),(-0.5w,0.25h),(-0.25w,0.5h),(0.25w,0.5h)
where w represents the width and h the height; these two variables are estimated in the solving module.
As shown in fig. 3 (b), the sign shape template is further defined as a circle, likewise represented by 8 key points lying on the circle:

(r, 0), (r/√2, r/√2), (0, r), (−r/√2, r/√2),
(−r, 0), (−r/√2, −r/√2), (0, −r), (r/√2, −r/√2)

wherein r is the radius of the circle.
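The two templates of S421 can be written down directly. Note that the circular template's key-point placement is an assumption (points evenly spaced at 45-degree intervals), since fig. 3 (b) is not reproduced in the text:

```python
import math

def rect_template(w: float, h: float):
    """Eight key points of the rectangular sign template (fig. 3 (a))."""
    return [( 0.5 * w,  0.25 * h), ( 0.5 * w, -0.25 * h),
            ( 0.25 * w, -0.5 * h), (-0.25 * w, -0.5 * h),
            (-0.5 * w, -0.25 * h), (-0.5 * w,  0.25 * h),
            (-0.25 * w,  0.5 * h), ( 0.25 * w,  0.5 * h)]

def circle_template(r: float):
    """Eight key points of the circular sign template (fig. 3 (b)).

    Assumption: the points are evenly spaced at 45-degree intervals on
    the circle of radius r, as the figure itself is not reproduced.
    """
    return [(r * math.cos(k * math.pi / 4), r * math.sin(k * math.pi / 4))
            for k in range(8)]
```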
S422, parameterization of the sign.
Specifically, the parameterization of the sign comprises two parts:
one part is the shape parameter: w and h for a rectangle, r for a circle;
the other part is the position parameter: p_c and R_o denote, respectively, the centre position of the sign and the rotation matrix of the sign relative to the world coordinate system. Assume the key points corresponding to one sign are p_0, p_1, …, p_7, with centre point p_c; take the lateral edge of the sign as the x-axis, the vertically upward edge as the y-axis, and the plane normal as the z-axis. With R_o the rotation matrix of the sign coordinate system relative to the world coordinate system, the points of the sign in three-dimensional space are:

p_i^w = R_o · p_i + p_c,  i = 0, 1, …, 7

where the superscript w denotes that a point is expressed in the world coordinate system, and p_i^w represents a key point of the sign in three-dimensional space.
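The mapping of S422, p_i^w = R_o · p_i + p_c, can be sketched directly (the sign frame has x lateral, y vertically upward and z along the plane normal, so template points have z = 0):

```python
import numpy as np

def sign_points_world(template_pts, R_o, p_c):
    """Map 2-D template key points into the world frame: p_i^w = R_o p_i + p_c.

    template_pts are the key points of S421; they are lifted to 3-D with
    z = 0 in the sign frame before rotation and translation.
    """
    pts = np.array([[x, y, 0.0] for x, y in template_pts])
    return pts @ np.asarray(R_o).T + np.asarray(p_c)
```

With the identity rotation and a symmetric template, the centroid of the world points is the sign centre p_c.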
S423, defining an objective function, and obtaining the optimal signage parameter by minimizing the objective function, where the objective function is defined as:
E(s, p_c, R_o) = Σ_k Σ_{i=0}^{7} ‖ π(R_k · p_i^w + t_k) − p ‖²

wherein π(R_k · p_i^w + t_k) represents the pixel position of the key point p_i^w projected onto the image; R_k, t_k represent the six-degree-of-freedom pose of the camera at time k; π represents the projection model of the pinhole camera; and p represents the point of the closed contour nearest to the projected pixel position.
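The objective of S423 can be transcribed as follows; the intrinsic matrix K and the synthetic data in the usage are illustrative assumptions:

```python
import numpy as np

def pinhole_project(X, K):
    """Pinhole projection pi: camera-frame 3-D point -> pixel (u, v)."""
    u = K[0, 0] * X[0] / X[2] + K[0, 2]
    v = K[1, 1] * X[1] / X[2] + K[1, 2]
    return np.array([u, v])

def objective(world_pts, poses, contours, K):
    """Sum over frames k and key points i of ||pi(R_k p_i^w + t_k) - p||^2,

    where p is the contour point nearest to the projection, a direct
    transcription of the objective of S423. world_pts: (N, 3) key points;
    poses: list of (R_k, t_k); contours: per-frame (M, 2) pixel contours.
    """
    total = 0.0
    for (R, t), contour in zip(poses, contours):
        for p_w in world_pts:
            uv = pinhole_project(R @ p_w + t, K)
            d2 = np.sum((contour - uv) ** 2, axis=1)  # squared pixel distances
            total += d2.min()                          # nearest contour point
    return total
```

In the full method this scalar is minimized over the shape and position parameters by L-M iteration (S424).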
S424, optimizing the objective function with the L-M (Levenberg-Marquardt) algorithm and solving for the shape parameter and position parameter of the sign, iteratively solving for the increments δt and δθ, where δt is the increment of p_c and δθ = (δθ_1, δθ_2, δθ_3) is the increment of R_o; from these increments the rotation matrix of the sign is updated.
Since a sign is generally a plane perpendicular to the ground, its orientation is a one-dimensional rotation around the z-axis; therefore, when updating R_o, the following formula is used:
R_o ← R_o · exp([0, 0, δθ_3]^)
In this way the updated R_o is guaranteed to remain a one-dimensional rotation around the z-axis, i.e. a rotation around the normal vector of the sign plane.
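The constrained update of S424 can be sketched as follows (function names are illustrative):

```python
import numpy as np

def exp_z(delta):
    """exp([0, 0, delta]^): rotation by angle delta about the z-axis."""
    c, s = np.cos(delta), np.sin(delta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def update_orientation(R_o, delta_theta3):
    """One L-M update step  R_o <- R_o . exp([0, 0, dtheta3]^).

    Restricting the increment to the third axis keeps the sign rotating
    only about its plane normal, as required in S424.
    """
    return R_o @ exp_z(delta_theta3)

# two quarter-turns compose to a half-turn; the result stays a valid rotation
R = update_orientation(update_orientation(np.eye(3), np.pi / 2), np.pi / 2)
```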
Example two: the first embodiment provides a method for reconstructing a signage component based on a monocular camera, and correspondingly, the first embodiment provides a system for reconstructing a signage component based on a monocular camera. The system provided in this embodiment may implement the method for reconstructing a signage component based on a monocular camera in the first embodiment, and the system may be implemented by software, hardware, or a combination of software and hardware. For convenience of description, the present embodiment is described with the functions divided into various units, which are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in one or more pieces. For example, the system may comprise integrated or separate functional modules or units to perform the corresponding steps in the method of an embodiment. Since the system of the present embodiment is substantially similar to the method embodiment, the description process of the present embodiment is relatively simple, and reference may be made to part of the description of the first embodiment for relevant points.
Specifically, the system for reconstructing a signage component based on a monocular camera provided in this embodiment includes:
a signal acquisition module configured to acquire a monocular image as well as a GNSS signal, an IMU signal, and a wheel speed signal;
the perception module is configured to perform perception processing on the acquired monocular image to obtain a map element result of image perception;
a positioning module configured to obtain six-degree-of-freedom information of the vehicle based on the GNSS signals, the IMU signals and the wheel speed signals;
and the roadside sign element calculation module is configured to calculate the roadside sign elements based on the map element result sensed by the image and six-degree-of-freedom information of the vehicle to obtain three-dimensional information of the roadside sign.
In a preferred embodiment, the sensing module comprises a signage sensing module and an edge extraction module, wherein:
the label perception module comprises an image segmentation model based on a convolution neural network, wherein an input image is firstly scaled to 768 × 480, then the input image is sent into the neural network, and the neural network is subjected to forward calculation through a GPU/FPGA/AI chip calculation unit mounted on the image segmentation model to obtain a mask of label pixels in the image.
And the edge extraction module inputs mask data of the road side sign pixels, performs edge extraction through a Canny operator, and finally outputs a series of control points in the image to represent the closed contour of the road side sign element.
In a preferred embodiment, the positioning module is configured to obtain six degrees of freedom information for the vehicle.
Specifically, the positioning module takes as input the GNSS signal, the monocular image, the IMU signal and the wheel speed signal, and outputs the six-degree-of-freedom information of the ego vehicle. The positioning module is compatible with GNSS or RTK signals to obtain positioning results of different accuracy. In addition, other sensors may also be fused to obtain more accurate relative positioning.
In a preferred embodiment, specifically, as shown in fig. 2, the roadside sign element calculation module: the module inputs the closed contour of the road side sign element and the positioning information of the vehicle output by the sensing module, and outputs the three-dimensional information of the road side sign, and the module comprises the following components:
firstly, tracking calculation is carried out according to the input closed contour and the positioning information, and the closed contour of the label at different moments is obtained.
And then, inputting the closed outlines of the signs and the vehicle pose information at a plurality of moments into a solving module based on template information for solving to finally obtain the three-dimensional information of the roadside signs.
Example three: the present embodiment provides an electronic device corresponding to the method for reconstructing a signage component based on a monocular camera according to the first embodiment, where the electronic device may be an electronic device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, to execute the method according to the first embodiment.
The electronic equipment comprises a processor, a memory, a communication interface and a bus; the processor, the memory and the communication interface are connected through the bus to complete mutual communication. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The memory stores a computer program executable on the processor, and the processor, when executing the computer program, performs the method for reconstructing sign elements based on a monocular camera according to the first embodiment.
In some implementations, the logic instructions in the memory may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), an optical disk, and various other media capable of storing program codes.
In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
Example four: the monocular camera-based signage component reconstruction method of this embodiment may be embodied as a computer program product that may include a computer readable storage medium having computer readable program instructions embodied thereon for executing the monocular camera-based signage component reconstruction method of this embodiment.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination of the foregoing.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In the description herein, references to the description of "one embodiment," "some implementations," or the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of an embodiment of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A sign element reconstruction method based on a monocular camera, characterized by comprising the following steps:
acquiring a monocular image, a GNSS signal, an IMU signal and a wheel speed signal;
performing perception processing on the acquired monocular image to obtain an image-perception map element result;
acquiring six-degree-of-freedom information of the vehicle based on the GNSS signal, the IMU signal and the wheel speed signal;
and calculating roadside sign elements based on the image-perception map element result and the six-degree-of-freedom information of the vehicle, to obtain three-dimensional information of the roadside signs.
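The four claimed steps can be sketched as a minimal per-frame pipeline. All names below (`SensorFrame`, `perceive_sign_mask`, `fuse_pose`, `reconstruct_sign_elements`) are hypothetical illustrations, not terminology from the patent, and the perception and fusion bodies are trivial placeholders standing in for the claimed modules:

```python
# Minimal sketch of the claimed pipeline. Names are hypothetical; the
# perception and pose-fusion bodies are placeholders, not the patented method.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SensorFrame:
    image: List[List[int]]   # monocular grayscale image (rows of pixels)
    gnss: Tuple[float, float, float]                      # (lat, lon, alt)
    imu: Tuple[float, float, float, float, float, float]  # accel + gyro
    wheel_speed: float                                    # m/s

def perceive_sign_mask(image):
    # Placeholder perception: mark bright pixels as "sign" pixels.
    return [[1 if px > 128 else 0 for px in row] for row in image]

def fuse_pose(gnss, imu, wheel_speed):
    # Placeholder fusion: bundle inputs into a 6-DoF tuple
    # (x, y, z, roll, pitch, yaw); a real system would run a filter here.
    return (gnss[0], gnss[1], gnss[2], imu[3], imu[4], imu[5])

def reconstruct_sign_elements(frames: List[SensorFrame]):
    per_frame = []
    for f in frames:                                    # steps 1-3, per frame
        mask = perceive_sign_mask(f.image)              # image perception
        pose = fuse_pose(f.gnss, f.imu, f.wheel_speed)  # 6-DoF localization
        per_frame.append((mask, pose))
    # Step 4 (multi-frame roadside sign element calculation) would consume
    # `per_frame` to recover 3-D sign information; omitted in this sketch.
    return per_frame
```
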
2. The sign element reconstruction method based on a monocular camera of claim 1, wherein the monocular image is required to contain a roadside sign.
3. The sign element reconstruction method based on a monocular camera of claim 1, wherein performing perception processing on the acquired monocular image to obtain an image-perception map element result comprises:
sign perception: acquiring mask data of sign pixels in the monocular image;
and edge extraction: performing edge extraction on the mask data of the roadside sign pixels, and outputting a series of control points in the image that represent the closed contour of the roadside sign element.
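As an illustration of the edge-extraction step, here is a minimal sketch (a hypothetical helper, not the patent's algorithm) that turns a binary sign mask into contour control points:

```python
def mask_contour_points(mask):
    """Return boundary pixels of a binary mask as (row, col) control points.

    A pixel belongs to the contour if it is foreground and at least one of
    its 4-neighbours is background or lies outside the image. A production
    system would typically trace these into an ordered closed polygon.
    """
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                    points.append((r, c))
                    break
    return points
```
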
4. The sign element reconstruction method based on a monocular camera of claim 3, wherein the sign perception comprises:
scaling the monocular image;
setting an image segmentation model based on a convolutional neural network;
and inputting the scaled monocular image into the image segmentation model, and performing forward computation of the neural network on a GPU/FPGA/AI-chip computing unit carrying the image segmentation model, to obtain a mask of the sign pixels in the image.
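A framework-agnostic sketch of this flow (scale, run the segmentation model, recover a full-resolution mask). The nearest-neighbour scaler and the lambda "model" below are stand-ins introduced for illustration; the patent's model is a convolutional neural network executed on a GPU/FPGA/AI chip:

```python
def scale_nearest(img, out_h, out_w):
    """Nearest-neighbour rescale of a 2-D image given as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def segment_signs(img, model, in_size=(64, 64)):
    """Scale the image to the model's input size, run the model, and
    rescale the predicted binary mask back to the original resolution."""
    h, w = len(img), len(img[0])
    scaled = scale_nearest(img, *in_size)
    mask_small = model(scaled)   # model: image -> binary mask, same size
    return scale_nearest(mask_small, h, w)
```
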
5. The sign element reconstruction method based on a monocular camera of claim 3, wherein calculating roadside sign elements based on the image-perception map element result and the six-degree-of-freedom information of the vehicle to obtain three-dimensional information of the roadside signs comprises:
predicting, by optical flow, the positions of sign element instances in different frame images according to the closed contour of the roadside sign element and the six-degree-of-freedom information of the vehicle, to obtain the closed contours of the sign at different times;
and solving, based on set template information, the closed contours of the sign at the plurality of times together with the six-degree-of-freedom vehicle pose information, to finally obtain the three-dimensional information of the roadside sign.
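The optical-flow prediction step can be illustrated by propagating contour control points through a dense flow field. This is a hedged sketch: `flow_u`/`flow_v` are assumed to be precomputed per-pixel displacement fields from some dense optical-flow method, which the patent does not specify:

```python
def propagate_contour(points, flow_u, flow_v):
    """Predict where contour control points (u, v) = (col, row) move in the
    next frame by sampling dense optical-flow displacement fields at the
    nearest pixel. flow_u/flow_v are HxW lists of per-pixel displacements."""
    predicted = []
    for u, v in points:
        r, c = int(round(v)), int(round(u))   # image row/col at the point
        predicted.append((u + flow_u[r][c], v + flow_v[r][c]))
    return predicted
```
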
6. The sign element reconstruction method based on a monocular camera of claim 5, wherein solving, based on the set template information, the closed contours of the sign at the plurality of times together with the six-degree-of-freedom vehicle pose information to finally obtain the three-dimensional information of the roadside sign comprises:
defining a sign shape template;
parameterizing the sign, including shape parameters and sign position parameters;
defining an objective function:

E = Σ_k Σ_i ‖ π(R_k p_i^w + t_k) − q̂_i^k ‖²

wherein π(R_k p_i^w + t_k) represents the pixel position of the template point p_i^w projected onto the image, R_k, t_k represent the six-degree-of-freedom pose of the camera at time k, π represents the pinhole-camera projection model, q̂_i^k represents the point on the closed contour p nearest to the projected template point, and the superscript w denotes a point in the world coordinate system;
and optimizing the objective function using the Levenberg-Marquardt (L-M) algorithm, to solve for the shape parameters and the position parameters of the sign.
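To illustrate the Levenberg-Marquardt step named in the claim, here is a minimal self-contained L-M solver fitting a circular sign template's parameters (centre and radius) to 2-D contour points. This is a didactic sketch with a fixed damping factor, not the patent's multi-frame projection objective:

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def lm_fit_circle(points, cx, cy, r, iters=100, lam=1e-3):
    """Levenberg-Marquardt fit of circle parameters (cx, cy, r) to points.
    Residual per point: distance from the point to the centre, minus r."""
    p = [cx, cy, r]
    for _ in range(iters):
        res, J = [], []
        for x, y in points:
            d = math.hypot(x - p[0], y - p[1]) or 1e-12
            res.append(d - p[2])
            # Jacobian row: d(residual)/d(cx, cy, r)
            J.append([(p[0] - x) / d, (p[1] - y) / d, -1.0])
        # Damped normal equations: (J^T J + lam*I) dp = -J^T res
        A = [[sum(Jk[i] * Jk[j] for Jk in J) for j in range(3)] for i in range(3)]
        for i in range(3):
            A[i][i] += lam
        b = [-sum(Jk[i] * e for Jk, e in zip(J, res)) for i in range(3)]
        p = [pi + di for pi, di in zip(p, solve3(A, b))]
    return p
```

A real implementation would additionally adapt the damping factor per iteration and stop on convergence; both are omitted here for brevity.
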
7. The sign element reconstruction method based on a monocular camera of claim 6, wherein the defined sign shape templates include a rectangular template and a circular template.
8. A sign element reconstruction system based on a monocular camera, characterized by comprising:
a signal acquisition module configured to acquire a monocular image as well as a GNSS signal, an IMU signal, and a wheel speed signal;
a perception module configured to perform perception processing on the acquired monocular image to obtain an image-perception map element result;
a positioning module configured to obtain six degree of freedom information of the vehicle based on the GNSS signals, the IMU signals, and the wheel speed signals;
and a roadside sign element calculation module configured to calculate roadside sign elements based on the image-perception map element result and the six-degree-of-freedom information of the vehicle, to obtain the three-dimensional information of the roadside sign.
9. An electronic device storing computer program instructions, wherein the program instructions, when executed by a processor, implement the sign element reconstruction method based on a monocular camera of any one of claims 1-7.
10. A computer-readable storage medium having computer program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the sign element reconstruction method based on a monocular camera of any one of claims 1-7.
CN202211156746.2A 2022-09-22 2022-09-22 Label element reconstruction method, system, device and medium based on monocular camera Pending CN115526987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211156746.2A CN115526987A (en) 2022-09-22 2022-09-22 Label element reconstruction method, system, device and medium based on monocular camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211156746.2A CN115526987A (en) 2022-09-22 2022-09-22 Label element reconstruction method, system, device and medium based on monocular camera

Publications (1)

Publication Number Publication Date
CN115526987A true CN115526987A (en) 2022-12-27

Family

ID=84699499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211156746.2A Pending CN115526987A (en) 2022-09-22 2022-09-22 Label element reconstruction method, system, device and medium based on monocular camera

Country Status (1)

Country Link
CN (1) CN115526987A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116415353A * 2023-03-13 2023-07-11 Tsinghua University Modeling method for design requirements of perception system based on automatic driving function

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220053513A * 2019-10-16 2022-04-29 Shanghai SenseTime Lingang Intelligent Technology Co., Ltd. Image data automatic labeling method and device
CN114463504A (en) * 2022-01-25 2022-05-10 清华大学 Monocular camera-based roadside linear element reconstruction method, system and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TUOPU WEN et al.: "Roadside HD Map Object Reconstruction Using Monocular Camera", IEEE Robotics and Automation Letters, 3 July 2022 (2022-07-03), pages 7722-7729 *


Similar Documents

Publication Publication Date Title
CN110008851B (en) Method and equipment for detecting lane line
US20210149022A1 (en) Systems and methods for 3d object detection
JP5430456B2 (en) Geometric feature extraction device, geometric feature extraction method, program, three-dimensional measurement device, object recognition device
US20180131924A1 (en) Method and apparatus for generating three-dimensional (3d) road model
Senlet et al. A framework for global vehicle localization using stereo images and satellite and road maps
Jeong et al. Hdmi-loc: Exploiting high definition map image for precise localization via bitwise particle filter
KR20210137893A (en) Method and system for determining position of vehicle
CN112327326A (en) Two-dimensional map generation method, system and terminal with three-dimensional information of obstacles
CN111174722A (en) Three-dimensional contour reconstruction method and device
CN115526987A (en) Label element reconstruction method, system, device and medium based on monocular camera
JP2019028653A (en) Object detection method and object detection device
JP7337617B2 (en) Estimation device, estimation method and program
Lu et al. Lane marking-based vehicle localization using low-cost GPS and open source map
Crombez et al. Using dense point clouds as environment model for visual localization of mobile robot
Li et al. On automatic and dynamic camera calibration based on traffic visual surveillance
CN114463504A (en) Monocular camera-based roadside linear element reconstruction method, system and storage medium
Cappelle et al. Localisation in urban environment using GPS and INS aided by monocular vision system and 3D geographical model
CN113313824B (en) Three-dimensional semantic map construction method
US11733373B2 (en) Method and device for supplying radar data
CN112528918A (en) Road element identification method, map marking method and device and vehicle
Lee et al. Semi-automatic framework for traffic landmark annotation
CN111860084B (en) Image feature matching and positioning method and device and positioning system
CN111060114A (en) Method and device for generating feature map of high-precision map
Kyutoku et al. Vehicle ego-localization with a monocular camera using epipolar geometry constraints
Winkens et al. Optical truck tracking for autonomous platooning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination