CN114511682A - Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment - Google Patents


Info

Publication number
CN114511682A
Authority
CN
China
Prior art keywords
target area
joint detection
characteristic
point cloud
result
Prior art date
Legal status
Granted
Application number
CN202210407112.3A
Other languages
Chinese (zh)
Other versions
CN114511682B (en)
Inventor
陈小雪
周谷越
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210407112.3A
Publication of CN114511682A
Application granted
Publication of CN114511682B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tessellation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks


Abstract

The invention provides a three-dimensional scene reconstruction method and device based on a laser radar, and an electronic device, and relates to the technical field of computer vision. The aim is to learn an implicit expression function of an object based on a joint detection result and to reconstruct from the implicit expression function a compact object surface, realizing accurate and complete understanding of the three-dimensional scene. The method comprises the following steps: acquiring point cloud data of a target area; performing joint detection on the point cloud data by using a neural network based on an attention mechanism to obtain a joint detection result, the joint detection result comprising a target area object and a target area layout; learning based on the joint detection result to obtain an implicit expression function of the object in the target area; and performing three-dimensional scene reconstruction on the object in the target area based on the implicit expression function. The three-dimensional scene reconstruction device based on the laser radar applies the three-dimensional scene reconstruction method based on the laser radar, and the method is applied to the electronic equipment.

Description

Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment
Technical Field
The invention relates to the technical field of computer vision, in particular to a three-dimensional scene reconstruction method and device based on a laser radar and electronic equipment.
Background
Three-dimensional scenes include indoor scenes and outdoor scenes. In the prior art, the application with application number CN201910925338.0, an octree-based robot vision-guided three-dimensional object reconstruction method, adopts an octree structure as the storage structure for point cloud data, selects a region growing method as the reconstruction algorithm, and discloses independent detection and reconstruction of objects. The application with application number CN201811466786.0, an indoor scene three-dimensional reconstruction method based on RGB-D images, restores depth image holes using semantic segmentation results, provides object contour and category information for three-dimensional reconstruction, and obtains the shape and appearance of objects from prior knowledge, thereby providing more accurate data for three-dimensional reconstruction. The three-dimensional reconstruction in turn provides three-dimensional spatial information for semantic segmentation, solving the mis-segmentation caused in two-dimensional image segmentation by object overlap, illumination and the like. Multi-level camera pose estimation is adopted: sparse feature matching provides a coarse pose estimate, and dense geometric and photometric optimization then yields an accurate camera pose, providing a more accurate pose for the reconstructed model. During reconstruction, each frame is locally optimized, a key-frame mechanism is added, global optimization and loop-closure detection are established, and constraints are imposed on the spatial points corresponding to key-frame pixels.
In the prior art, reconstruction mostly relies on 2D images and multi-frame data, which leads to high hardware requirements, complex algorithms and long processing times. More importantly, prior-art reconstruction is limited to detecting objects or predicting the three-dimensional scene layout independently; the two are rarely detected jointly, which imposes limitations on time efficiency and on the network model.
Disclosure of Invention
In order to solve the technical problems, the invention provides a three-dimensional scene reconstruction method, a three-dimensional scene reconstruction device, electronic equipment and a computer-readable storage medium based on a laser radar.
The invention provides a three-dimensional scene reconstruction method based on a laser radar, which comprises the following steps:
step 1: acquiring point cloud data of a target area;
step 2: performing joint detection on the point cloud data based on a neural network of an attention mechanism to obtain a joint detection result; the joint detection result comprises a target area object and a target area layout;
step 3: learning based on the joint detection result to obtain an implicit expression function of the target area object;
step 4: reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function.
Preferably, the step 2 of performing joint detection on the point cloud data by using a neural network based on an attention mechanism to obtain a joint detection result comprises the following steps:
step 2.1: performing feature extraction on the point cloud data to obtain a feature result, the feature result comprising feature information of the global point cloud of the target area;
step 2.2: acquiring target area proposal features, and performing feature fusion on the target area proposal features and the feature result to obtain a feature fusion result; the target area proposal features comprise target area object proposal features and target area layout proposal features;
step 2.3: inputting the feature fusion result into a neural network based on an attention mechanism to perform joint detection and obtain a joint detection result; the joint detection result includes a target area object and a target area layout.
Further, the step 2.1 of performing feature extraction on the point cloud data to obtain a feature result comprises the following steps:
down-sampling and up-sampling the point cloud data based on a PointNet++ network, and preprocessing the sampling result to obtain the feature result;
the step 2.2 of acquiring target area proposal features and performing feature fusion on the target area proposal features and the feature result to obtain a feature fusion result, the target area proposal features comprising target area object proposal features and target area layout proposal features, comprises the following steps:
acquiring the target area object proposal features based on a voting algorithm, acquiring the target area layout proposal features based on a farthest point sampling (FPS) algorithm, and performing feature fusion on the target area object proposal features, the target area layout proposal features and the feature result based on a Transformer network to obtain the feature fusion result.
Preferably, the step 3 of learning based on the joint detection result to obtain an implicit expression function of the object in the target area comprises the following steps:
step 3.1: aligning the point cloud data of the target area to a standard coordinate system to form point cloud data of an object in the target area;
step 3.2: inputting the point cloud data of the target area object and the target area object proposal into a multilayer neural network for learning to obtain shape characteristics;
step 3.3: inputting the shape features into a decoding neural network to form the implicit expression function.
Preferably, the step 4 of reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function comprises the following steps:
step 4.1: obtaining surface information of the object in the target area based on the implicit expression function;
step 4.2: extracting a triangular patch of the surface information by using a Marching Cubes algorithm;
step 4.3: and restoring the triangular patch to tightly reconstruct the surface of the object in the target area.
Further, the step 4.2 of extracting the triangular patch of the surface information by using the Marching Cubes algorithm comprises the following steps:
step 4.2.1: predicting by using the decoding neural network to obtain an occupancy function;
step 4.2.2: extracting the triangular patch of the surface information by using the Marching Cubes algorithm based on the occupancy function.
Compared with the prior art, the laser radar-based three-dimensional scene reconstruction method provided by the invention has the following beneficial effects. Firstly, point cloud data of a target area are obtained; joint detection is performed on the point cloud data by using a neural network based on an attention mechanism to obtain a joint detection result, the joint detection result comprising a target area object and a target area layout; an implicit expression function of the object in the target area is learned based on the joint detection result; and three-dimensional scene reconstruction of the object in the target area is performed based on the implicit expression function. With this method, accurate and complete reconstruction of a super-resolution three-dimensional scene can be achieved quickly from single-frame laser radar data; the requirement on hardware equipment is low because the method relies mainly on algorithmic processing, which solves the prior-art problems of high hardware requirements and long processing times; and a reconstruction result of arbitrary resolution can be obtained thanks to the continuity of the implicit expression, breaking the resolution limitation of the prior art.
The invention also provides a three-dimensional scene reconstruction device based on the laser radar, which comprises:
the data acquisition module is used for acquiring point cloud data of a target area;
the joint detection module is used for carrying out joint detection on the point cloud data based on a neural network of an attention mechanism to obtain a joint detection result; the joint detection result comprises a target area object and a target area layout;
an implicit function module, configured to learn based on the joint detection result to obtain an implicit expression function of the target area object;
and the reconstruction module is used for reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function.
Preferably, the joint detection module comprises:
the feature extraction unit is used for performing feature extraction on the point cloud data to obtain a feature result, the feature result comprising feature information of the global point cloud of the target area;
the proposal feature unit is used for acquiring target area proposal features, and performing feature fusion on the target area proposal features and the feature result to obtain a feature fusion result; the target area proposal features comprise target area object proposal features and target area layout proposal features;
the joint detection unit is used for inputting the feature fusion result into a neural network based on an attention mechanism to perform joint detection and obtain a joint detection result; the joint detection result comprises a target area object and a target area layout;
the implicit function module comprises:
the transformation unit is used for aligning the point cloud data of the target area into a standard coordinate system to form point cloud data of an object in the target area;
the shape feature unit is used for inputting the point cloud data of the target area object and the target area object proposal into a multilayer neural network for learning to obtain shape features;
an implicit function unit, configured to input the shape feature into a decoding neural network to form the implicit expression function;
the reconstruction module includes:
a surface information unit for obtaining surface information of the object in the target area based on the implicit expression function;
the triangular patch unit is used for extracting a triangular patch of the surface information by using the Marching Cubes algorithm;
and the tight reconstruction unit is used for restoring the triangular patch so as to carry out tight reconstruction on the surface of the object in the target region.
Compared with the prior art, the beneficial effects of the three-dimensional scene reconstruction device based on the laser radar provided by the invention are the same as the beneficial effects of the three-dimensional scene reconstruction method based on the laser radar in the technical scheme, and the details are not repeated herein.
The invention further provides an electronic device, which includes a bus, a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the transceiver, the memory, and the processor are connected via the bus, and when the computer program is executed by the processor, the steps in any one of the above three-dimensional scene reconstruction methods based on lidar are implemented.
Compared with the prior art, the beneficial effects of the electronic device provided by the invention are the same as those of the laser radar-based three-dimensional scene reconstruction method in the technical scheme, and are not described herein again.
The present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the laser radar-based three-dimensional scene reconstruction method according to any of the above.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as the beneficial effects of the laser radar-based three-dimensional scene reconstruction method in the technical scheme, and are not repeated herein.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 shows a flowchart of a three-dimensional scene reconstruction method based on a laser radar according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a joint detection process provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a three-dimensional scene reconstruction process based on an implicit expression function according to an embodiment of the present invention;
fig. 4 shows a schematic structural diagram of a three-dimensional scene reconstruction apparatus based on a lidar according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device for executing a three-dimensional scene reconstruction method based on lidar according to an embodiment of the present invention.
Detailed Description
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The "plurality" mentioned in the present embodiment means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration and are intended to present concepts in a concrete fashion; they should not be construed as preferred or advantageous over other embodiments or designs.
Before describing the embodiments of the present application, the terms related to the embodiments of the present application will be explained as follows:
laser radar: an optical remote sensing technology measures parameters such as a distance of a target by emitting a pulse laser to the target.
Three-dimensional reconstruction: refers to a mathematical process and computer technology for recovering three-dimensional information (shape, etc.) of an object using two-dimensional projection or images.
Target detection: also called object detection or object extraction, is an image segmentation based on object geometry and statistical features. The method combines the segmentation and the identification of the target into a whole, and the accuracy and the real-time performance of the method are important capabilities of the whole system. Especially, in a complex scene, when a plurality of targets need to be processed in real time, automatic target extraction and identification are particularly important.
MarchingCubes: the MC algorithm is a classical algorithm in the rendering algorithm, which is a voxel-level reconstruction method proposed by w.lorensen et al in 1987. The MC algorithm is also called "iso surface extraction" (isosurface extraction) algorithm.
In the prior art, the application with application number CN201910925338.0, an octree-based robot vision-guided three-dimensional object reconstruction method, adopts an octree structure as the storage structure for point cloud data and selects a region growing method as the reconstruction algorithm. The reconstruction method comprises the following steps:
A. searching for a flat area in the point cloud data and constructing a seed triangle in the flat area;
B. pushing the edges of the seed triangle onto a front-edge stack, popping a front edge from the stack, and determining a candidate region for region growing with the front edge as reference;
C. setting a search field in the candidate region, determining an optimal point in the search field using a particle swarm algorithm, and constructing a new seed triangle together with the front edge;
D. repeating B and C until the front-edge stack is empty; the set of all triangles obtained in these steps is the reconstruction result.
On the premise of ensuring the reconstruction precision of the three-dimensional object, the efficiency of reconstructing the three-dimensional object is improved.
The application with application number CN201811466786.0, an indoor scene three-dimensional reconstruction method based on RGB-D images, restores depth image holes using semantic segmentation results, provides object contour and category information for three-dimensional reconstruction, and obtains the shape and appearance of objects from prior knowledge, thereby providing more accurate data for three-dimensional reconstruction. The three-dimensional reconstruction in turn provides three-dimensional spatial information for semantic segmentation, solving the mis-segmentation caused in two-dimensional image segmentation by object overlap, illumination and the like. Multi-level camera pose estimation is adopted: sparse feature matching provides a coarse pose estimate, and dense geometric and photometric optimization then yields an accurate camera pose, providing a more accurate pose for the reconstructed model. During reconstruction, each frame is locally optimized, a key-frame mechanism is added, global optimization and loop-closure detection are established, and constraints are imposed on the spatial points corresponding to key-frame pixels, which effectively suppresses error accumulation, further optimizes the camera pose, and improves the accuracy of the reconstruction result.
The octree-based robot vision-guided three-dimensional object reconstruction method relies on the octree as its data structure, so the reconstructed resolution is limited by the octree scale; the RGB-D image-based indoor scene three-dimensional reconstruction method adopts multi-frame data optimization and additionally relies on RGB information. The reconstruction methods of these two patents are complex, slow to process, and costly in hardware.
Based on this, the embodiment of the invention provides a three-dimensional scene reconstruction method and device based on a laser radar, an electronic device and a computer-readable storage medium.
The embodiment of the invention provides a three-dimensional scene reconstruction method based on a laser radar. Fig. 1 shows a flowchart of the laser radar-based three-dimensional scene reconstruction method provided by the embodiment of the invention, and as shown in fig. 1, the method includes:
step S1: and acquiring point cloud data of the target area.
It should be noted that the target area may be an indoor scene or an outdoor scene. The target area in the embodiment of the invention takes an indoor scene as an example; the main hardware relies on a laser radar mounted on top of a mobile household robot, and the laser radar is used to collect point cloud data of the indoor scene. Specifically, the laser radar collects point cloud data of room objects and point cloud data of the room layout, the layout being, for example, walls.
Step S2: performing joint detection on the point cloud data by using a neural network based on an attention mechanism to obtain a joint detection result; the joint detection result includes a target area object and a target area layout.
Fig. 2 shows a schematic diagram of a joint detection process provided by the embodiment of the present invention, and as shown in fig. 2, the step 2 includes:
step 2.1: and performing feature extraction on the point cloud data to obtain a feature result, wherein the feature result comprises feature information of the global point cloud of the target area.
It should be noted that the point cloud data is down-sampled and up-sampled based on the PointNet++ network, and the sampling result is preprocessed to obtain the feature result.
Specifically, features are first extracted from the point cloud data of room objects and room layout obtained by the laser radar scan. The PointNet++ network is a feature extraction network designed for the characteristics of point clouds. Illustratively, the point cloud data is down-sampled based on the PointNet++ network and then up-sampled through a feature propagation layer, that is, the sampled data is interpolated. It should be understood that the feature propagation layer may be part of the network structure within the PointNet++ network. After the point cloud data is sampled, augmentation preprocessing such as rotation and flipping is applied to the sampling result to obtain the feature result. The obtained feature result carries the feature information of the global point cloud of the room and serves as the input of the subsequent neural network, providing the basis for the subsequent target detection.
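To make this preprocessing concrete, the following is a minimal sketch, assuming a simple NumPy pipeline; the function name, the rotation axis and the flip probability are illustrative assumptions rather than the patent's exact implementation.

```python
import numpy as np

def augment_point_cloud(points: np.ndarray, flip_prob: float = 0.5) -> np.ndarray:
    """Randomly rotate a point cloud about the z axis and randomly mirror it.

    points: (N, 3) array of x, y, z coordinates; a new array is returned.
    """
    pts = points.copy()

    # Random rotation about the vertical (z) axis.
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    pts = pts @ rot_z.T

    # Randomly mirror the cloud across the x-z plane (negate y).
    if np.random.rand() < flip_prob:
        pts[:, 1] = -pts[:, 1]
    return pts
```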
Step 2.2: acquiring target area proposal features, and performing feature fusion on the target area proposal features and the feature result to obtain a feature fusion result; the target area proposal features include target area object proposal features and target area layout proposal features.
It should be noted that the voting mechanism derives from the Hough transform; in a target detection setting, each vote represents a vector pointing from the coordinates of a point to the centre of an object detection box. The embodiment of the invention uses a multilayer perceptron network to predict a voting vector for each point, clusters the prediction results to obtain K1 possible object centre positions, and the features obtained by clustering represent possible objects and are called object proposals.
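The voting step can be pictured with a sketch of this kind, assuming a per-point multilayer perceptron that regresses an offset from each seed point to a candidate object centre; the layer sizes and tensor shapes are assumptions, not the patent's exact network.

```python
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    """Per-point MLP that predicts a vote (offset towards an object centre)."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 + feat_dim),  # xyz offset + feature residual
        )

    def forward(self, seed_xyz: torch.Tensor, seed_feat: torch.Tensor):
        # seed_xyz: (B, N, 3) point coordinates, seed_feat: (B, N, C) point features
        out = self.mlp(seed_feat)
        vote_xyz = seed_xyz + out[..., :3]    # predicted object-centre positions
        vote_feat = seed_feat + out[..., 3:]  # refined per-point features
        return vote_xyz, vote_feat
```

The predicted centres would then be clustered into K1 groups to form the object proposals described above.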
Illustratively, farthest point sampling is a very common sampling algorithm: the first point of the point cloud is taken as the query point, and the farthest of the remaining points is selected; the selected point then becomes the new query point, and the farthest of the remaining points is selected again. Iterating in this way yields K2 points as possible wall locations, and local features are extracted at these points as the target area layout proposal features, for example wall surface proposal features.
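The farthest point sampling iteration described above translates directly into code; the sketch below is a plain NumPy version in which the choice of the first point and the value of k (K2) are assumptions.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Return indices of k points chosen by iterative farthest point sampling."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = 0  # start from the first point of the cloud, as described
    for i in range(1, k):
        # Distance from every point to the nearest already-chosen point.
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))  # take the farthest remaining point
    return chosen
```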
The embodiment of the invention concatenates the object proposal features with the wall surface proposal features and uses them as the input of the attention layer for further feature refinement. The proposal features are fused with the global point cloud feature result obtained by PointNet++, so that the feature fusion result is obtained and the features are further enhanced.
Specifically, the target area object proposal features may be acquired with the voting algorithm, and the target area layout proposal features with the farthest point sampling (FPS) algorithm. The target area object proposal features, the target area layout proposal features and the feature result are fused based on a Transformer network, that is, the feature fusion result is obtained through feature propagation. Briefly, attention is computed among the different input proposal features, the attention representing the relationships between the different features, and the proposal features are updated based on that attention.
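A minimal sketch of the attention-based fusion, assuming the object and layout proposals are concatenated into one token sequence and passed through a standard Transformer encoder; the dimensions and the use of PyTorch's nn.TransformerEncoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProposalFusion(nn.Module):
    """Self-attention over concatenated object and layout proposal features."""

    def __init__(self, feat_dim: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, obj_prop: torch.Tensor, layout_prop: torch.Tensor):
        # obj_prop: (B, K1, C) object proposals, layout_prop: (B, K2, C) layout proposals
        tokens = torch.cat([obj_prop, layout_prop], dim=1)  # (B, K1 + K2, C)
        fused = self.encoder(tokens)                         # attention between all proposals
        k1 = obj_prop.shape[1]
        return fused[:, :k1], fused[:, k1:]                  # split back into the two sets
```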
Step 2.3: inputting the feature fusion result into a neural network based on an attention mechanism to carry out joint detection so as to obtain a joint detection result; the joint detection result includes a target area object and a target area layout.
It should be noted that the feature fusion result is input into the multilayer neural network to obtain a parameterized object and wall prediction result. The object detection is expressed in the form of a three-dimensional detection frame, and the predicted parameter result comprises the central position of the detection frame, the size of the detection frame, the rotation angle and the like. The wall surface prediction result is represented by a two-dimensional rectangle in a three-dimensional space, and the parameter result comprises the center position of the rectangle, the size of the rectangle and a normal vector.
The neural network based on the attention mechanism performs supervised learning through an objective loss function, wherein the objective loss function comprises voting loss, detection frame loss, wall rectangular loss and the like. Through the training process, a joint detection result for carrying out joint detection on the indoor object and the room layout is obtained, namely the indoor object and the room layout are obtained through the joint detection. As shown in fig. 2, the input is point cloud data including an object and a wall, and the joint detection result is the marked object and the marked wall.
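A sketch of the parameterised prediction heads described in the preceding paragraphs, assuming simple per-proposal MLPs; the exact parameterisation used by the patent (for example size or angle bins) is not specified here, so the output dimensions are assumptions.

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Prediction heads for object detection boxes and wall rectangles."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Object head: box centre (3) + box size (3) + rotation angle (1)
        self.obj_head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 3 + 3 + 1))
        # Wall head: rectangle centre (3) + rectangle size (2) + normal vector (3)
        self.wall_head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                       nn.Linear(128, 3 + 2 + 3))

    def forward(self, obj_feat: torch.Tensor, wall_feat: torch.Tensor):
        # obj_feat: (B, K1, C) fused object proposals, wall_feat: (B, K2, C) fused layout proposals
        return self.obj_head(obj_feat), self.wall_head(wall_feat)
```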
Step S3: and learning based on the joint detection result to obtain an implicit expression function of the object in the target area.
It should be noted that fig. 3 shows a schematic diagram of a three-dimensional scene reconstruction process based on an implicit expression function provided in the embodiment of the present invention, and as shown in fig. 3, the step 3 includes:
step 3.1: and aligning the point cloud data of the target area to a standard coordinate system to form the point cloud data of the object in the target area.
It should be noted that the object proposals are first screened to remove objects whose confidence is below a threshold. Points are then sampled from the point cloud data of the target area around the centre of each three-dimensional detection box: points within a fixed radius of the box centre are selected and clustered, a process realised by a clustering layer. After clustering, K groups of object point clouds are obtained. The embodiment of the invention aligns these point clouds into a standard coordinate system: all points are normalised, the centre of each group of point clouds is translated to the origin, and a rotation matrix aligns the point clouds with the coordinate axes, forming the point cloud data of the objects in the target area. This removes the variance that spatial translation or rotation would otherwise introduce into the shape.
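A minimal sketch of this clustering and alignment step, assuming a single yaw angle per detection box; the radius and the unit-sphere scale normalisation are assumptions.

```python
import numpy as np

def to_canonical_frame(scene_points: np.ndarray,
                       box_center: np.ndarray,
                       box_yaw: float,
                       radius: float = 1.0) -> np.ndarray:
    """Collect points near a detected box and align them to a canonical frame."""
    # Keep points within a fixed radius of the predicted box centre.
    mask = np.linalg.norm(scene_points - box_center, axis=1) < radius
    pts = scene_points[mask] - box_center           # translate the centre to the origin

    # Rotate by the negative heading angle so the box axes align with x/y.
    c, s = np.cos(-box_yaw), np.sin(-box_yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pts = pts @ rot_z.T

    # Normalise the scale so the object roughly fits in a unit sphere.
    scale = np.max(np.linalg.norm(pts, axis=1)) + 1e-8
    return pts / scale
```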
Step 3.2: and inputting the point cloud data of the target area object and the target area object proposal into a multilayer neural network for learning to obtain shape characteristics.
It should be noted that each group of point clouds aligned to the standard coordinate system, together with the proposal features corresponding to that point cloud, is input to the subsequent multilayer neural network as the proposal for shape generation, yielding the shape features.
Specifically, the input is processed by a shape generation network. First, background points in the point cloud may be removed by a noise-reduction network. The denoised points are connected with the earlier object proposal through a skip connection layer to enhance the point cloud. Features are then further extracted from the enhanced point cloud through a PointNet++ network to obtain the shape features.
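A sketch of this shape-feature path, using a simple shared-MLP/max-pool encoder as a stand-in for the PointNet++ extractor mentioned above; the skip connection that tiles the proposal feature onto each point follows the description, while all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ShapeEncoder(nn.Module):
    """Skip-connect proposal features onto object points, then pool a shape feature."""

    def __init__(self, prop_dim: int = 256, shape_dim: int = 256):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3 + prop_dim, 128), nn.ReLU(),
            nn.Linear(128, shape_dim), nn.ReLU(),
        )

    def forward(self, obj_points: torch.Tensor, prop_feat: torch.Tensor):
        # obj_points: (B, N, 3) denoised, aligned points; prop_feat: (B, C) proposal feature
        n = obj_points.shape[1]
        tiled = prop_feat.unsqueeze(1).expand(-1, n, -1)        # skip connection per point
        per_point = self.point_mlp(torch.cat([obj_points, tiled], dim=-1))
        return per_point.max(dim=1).values                       # (B, shape_dim) global shape feature
```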
Step 3.3: and inputting the shape features into a decoding neural network to form an implicit expression function.
The shape features are input into a decoding neural network, and an implicit expression function of the object is obtained through learning.
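The decoding neural network that realises the implicit expression function can be sketched as an occupancy decoder mapping a query coordinate and the shape feature to an occupancy value; the layer sizes are assumptions, and the inside/outside convention simply follows whatever labels the network is trained on.

```python
import torch
import torch.nn as nn

class OccupancyDecoder(nn.Module):
    """Implicit function f(x, z) -> occupancy value for query points x and shape feature z."""

    def __init__(self, shape_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + shape_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query_xyz: torch.Tensor, shape_feat: torch.Tensor):
        # query_xyz: (B, M, 3) query coordinates, shape_feat: (B, C) per-object shape feature
        m = query_xyz.shape[1]
        z = shape_feat.unsqueeze(1).expand(-1, m, -1)
        logits = self.net(torch.cat([query_xyz, z], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)  # occupancy value in [0, 1] per query point
```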
Step S4: and performing three-dimensional scene reconstruction on the object in the target area based on the implicit expression function.
In addition, the step 4 includes:
Step 4.1: obtaining surface information of the object in the target area based on the implicit expression function.
Step 4.2: extracting the triangular patch of the surface information by using the Marching Cubes algorithm.
It should be noted that the occupancy function is obtained by prediction with the decoding neural network, and based on the occupancy function the triangular patches of the object surface information are extracted using the Marching Cubes algorithm.
In particular, the shape of the object may be expressed using an occupancy function. The occupancy function takes as input the coordinates of a point in three-dimensional space; if the return value is 0 the point is considered to be inside the object, and if the return value is 1 the point is considered to be outside the object. The embodiment of the invention obtains the occupancy function by prediction with the decoding neural network. Considering the sparsity of the point cloud, there are many possibilities for the shape of the object, so a probabilistic generative model may be used to predict the occupancy function. Specifically, the point cloud and the proposal features can be input and encoded into a single variable.
After the occupancy function is obtained, the triangular patches of the object surface can be extracted by the Marching Cubes algorithm. The Marching Cubes algorithm is a classical algorithm for extracting an isosurface from a three-dimensional discrete data field. Its main idea is to approximate the isosurface in the three-dimensional discrete data field by linear interpolation, specifically as follows: each grid cell in the three-dimensional discrete data field is regarded as a voxel, and each vertex of the voxel has a corresponding scalar value. If the value at a voxel vertex is greater than or equal to the isosurface value, that vertex is defined to lie outside the isosurface and is labelled "0"; conversely, if the value at the voxel vertex is smaller than the isosurface value, the vertex is defined to lie inside the isosurface and is labelled "1".
Step 4.3: and restoring the triangular patch to tightly reconstruct the surface of the object in the target area.
It should be noted that, since each voxel unit has 8 vertices, there are 2^8 = 256 possible cases, and the triangular patches of the object surface can be restored through the preset restoration results for these cases. Finally, a compact reconstruction of the object surface is obtained, realizing accurate and complete indoor scene reconstruction.
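Putting steps 4.1 to 4.3 together, a minimal sketch of the surface extraction evaluates the occupancy function on a regular grid and hands the volume to an off-the-shelf Marching Cubes implementation (here skimage.measure.marching_cubes); the grid resolution and iso-level are assumptions, and because the implicit function is continuous the resolution can be chosen freely.

```python
import numpy as np
from skimage import measure

def extract_mesh(occupancy_fn, resolution: int = 64, iso_level: float = 0.5):
    """Sample an occupancy function on a grid and extract a triangle mesh."""
    # Regular grid of query points in the canonical cube [-1, 1]^3.
    lin = np.linspace(-1.0, 1.0, resolution)
    xs, ys, zs = np.meshgrid(lin, lin, lin, indexing="ij")
    queries = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

    # occupancy_fn: callable mapping (M, 3) coordinates to (M,) occupancy values.
    volume = occupancy_fn(queries).reshape(resolution, resolution, resolution)

    # Marching Cubes turns the scalar field into triangular patches of the surface.
    verts, faces, normals, _ = measure.marching_cubes(volume, level=iso_level)

    # Map vertex indices back to coordinates in [-1, 1].
    verts = verts / (resolution - 1) * 2.0 - 1.0
    return verts, faces, normals
```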
Compared with the prior art, the three-dimensional scene reconstruction method based on the laser radar provided by the embodiment of the invention has the following beneficial effects as shown in table 1:
1. The point cloud data obtained by laser radar scanning is used to reconstruct indoor or outdoor scenes, and can be applied in scenarios with high scene-understanding requirements, such as sweeping robots and blind-guiding robots.
2. Based on a deep learning method, information in large batches of scene data is learned, so that prior knowledge from big data is exploited on top of the geometric principles used in the reconstruction process, yielding detection and reconstruction results better than the prior art.
3. Accurate and complete reconstruction of a super-resolution three-dimensional scene can be achieved quickly from single-frame laser radar data, without reliance on additional hardware support and using only algorithmic processing, which solves the prior-art problems of high hardware requirements and long processing times.
4. A reconstruction result of arbitrary resolution is obtained based on the continuity of the implicit expression, breaking the resolution limitation of the prior art.
5. The three-dimensional scene can be understood accurately and completely, providing rich perception information for multifunctional household robots, with broad application prospects.
TABLE 1
Comparison of the octree-based robot vision-guided three-dimensional object reconstruction method, the RGB-D image-based indoor scene three-dimensional reconstruction method, and the invention (values given in that order):
    • Reliance on 2D images: limited / limited / arbitrary
    • Use of multi-frame data: yes / no / no
    • Equipment installation complexity: high / low / low
    • Algorithm complexity: high / high / low
    • Processing speed: slow / slow / fast
    • Equipment cost: high / high / low
The embodiment of the invention also provides a three-dimensional scene reconstruction device based on the laser radar, and fig. 4 shows a structural schematic diagram of the three-dimensional scene reconstruction device based on the laser radar provided by the embodiment of the invention. As shown in fig. 4, the apparatus includes:
the data acquisition module 1 is used for acquiring point cloud data of a target area.
The joint detection module 2 is used for carrying out joint detection on the point cloud data based on a neural network of an attention mechanism to obtain a joint detection result; the joint detection result includes a target area object and a target area layout.
The joint detection module 2 comprises: the feature extraction unit 21, configured to perform feature extraction on the point cloud data to obtain a feature result, where the feature result includes feature information of the global point cloud of the target area; the proposal feature unit 22, configured to acquire target area proposal features and perform feature fusion on the target area proposal features and the feature result to obtain a feature fusion result, where the target area proposal features include target area object proposal features and target area layout proposal features; and the joint detection unit 23, configured to input the feature fusion result into a neural network based on an attention mechanism to perform joint detection and obtain a joint detection result, where the joint detection result includes a target area object and a target area layout.
And the implicit function module 3 is used for learning based on the joint detection result to obtain an implicit expression function of the target area object.
The implicit function module 3 comprises: and a transformation unit 31, configured to align the point cloud data of the target area into a standard coordinate system, and form point cloud data of the target area object. And the shape feature unit 32 is used for inputting the point cloud data of the target area object and the target area object proposal into the multilayer neural network for learning to obtain the shape feature. And the implicit function unit 33 is used for inputting the shape features into the decoding neural network to form an implicit expression function.
And the reconstruction module 4 is used for reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function.
The reconstruction module 4 comprises: a surface information unit 41 for obtaining surface information of the object in the target area based on the implicit expression function. And a triangular patch unit 42, configured to extract a triangular patch of the surface information by using a Marching Cubes algorithm. And a tight reconstruction unit 43, configured to restore the triangular patch to perform tight reconstruction on the surface of the object in the target region.
Compared with the prior art, the beneficial effects of the three-dimensional scene reconstruction device based on the laser radar provided by the embodiment of the invention are the same as the beneficial effects of the three-dimensional scene reconstruction method based on the laser radar in the technical scheme, and the detailed description is omitted here.
In addition, an embodiment of the present invention further provides an electronic device, which includes a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and executable on the processor, where the transceiver, the memory, and the processor are respectively connected via the bus, and when the computer program is executed by the processor, each process of the above-mentioned three-dimensional scene reconstruction method based on a laser radar is implemented, and the same technical effect can be achieved, and therefore, in order to avoid repetition, details are not repeated here.
Specifically, referring to fig. 5, an embodiment of the present invention further provides an electronic device, which includes a bus 1110, a processor 1120, a transceiver 1130, a bus interface 1140, a memory 1150, and a user interface 1160.
In an embodiment of the present invention, the electronic device further includes: a computer program stored on the memory 1150 and executable on the processor 1120, the computer program, when executed by the processor 1120, implements the processes of one of the above-described lidar-based three-dimensional scene reconstruction method embodiments.
A transceiver 1130 for receiving and transmitting data under the control of the processor 1120.
In embodiments of the invention in which a bus architecture (represented by bus 1110) is used, bus 1110 may include any number of interconnected buses and bridges, with bus 1110 connecting various circuits including one or more processors, represented by processor 1120, and memory, represented by memory 1150.
Bus 1110 represents one or more of any of several types of bus structures, including a memory bus and memory controller, a peripheral bus, an Accelerated Graphics Port (AGP), a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include: an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) bus, and a Peripheral Component Interconnect (PCI) bus.
Processor 1120 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits in hardware or instructions in software in a processor. The processor described above includes: general purpose processors, Central Processing Units (CPUs), Network Processors (NPs), Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Complex Programmable Logic Devices (CPLDs), Programmable Logic Arrays (PLAs), Micro Control Units (MCUs) or other Programmable Logic devices, discrete gates, transistor Logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in embodiments of the present invention may be implemented or performed. For example, the processor may be a single core processor or a multi-core processor, which may be integrated on a single chip or located on multiple different chips.
Processor 1120 may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a Random Access Memory (RAM), a flash Memory (flash Memory), a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), a register, and other readable storage media known in the art. The readable storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
The bus 1110 may also connect various other circuits such as peripherals, voltage regulators, or power management circuits to provide an interface between the bus 1110 and the transceiver 1130, as is well known in the art. Therefore, the embodiments of the present invention will not be further described.
The transceiver 1130 may be one element or may be multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. For example: the transceiver 1130 receives external data from other devices, and the transceiver 1130 transmits data processed by the processor 1120 to other devices. Depending on the nature of the computer system, a user interface 1160 may also be provided, such as: touch screen, physical keyboard, display, mouse, speaker, microphone, trackball, joystick, stylus.
It is to be appreciated that in an embodiment of the invention, the memory 1150 may further include memory located remotely relative to the processor 1120, and such remotely located memory may be coupled to the server via a network. One or more portions of such a network may be an ad hoc network, an intranet, an extranet, a Virtual Private Network (VPN), a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Wireless Wide Area Network (WWAN), a Metropolitan Area Network (MAN), the Internet, a Public Switched Telephone Network (PSTN), a Plain Old Telephone Service (POTS) network, a cellular telephone network, a wireless fidelity (Wi-Fi) network, or a combination of two or more of the above. For example, the cellular telephone network and the wireless network may be a Global System for Mobile Communications (GSM) system, a Code Division Multiple Access (CDMA) system, a Worldwide Interoperability for Microwave Access (WiMAX) system, a General Packet Radio Service (GPRS) system, a Wideband Code Division Multiple Access (WCDMA) system, a Long Term Evolution (LTE) system, an LTE Frequency Division Duplex (FDD) system, an LTE Time Division Duplex (TDD) system, a Long Term Evolution Advanced (LTE-A) system, a Universal Mobile Telecommunications System (UMTS) system, an enhanced Mobile Broadband (eMBB) system, a massive Machine Type Communication (mMTC) system, an Ultra-Reliable Low-Latency Communication (URLLC) system, or the like.
It is to be understood that the memory 1150 in embodiments of the present invention can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. Wherein the nonvolatile memory includes: Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or Flash Memory.
The volatile memory includes: Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as: Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DRRAM). The memory 1150 of the electronic device described in the embodiments of the invention includes, but is not limited to, the above and any other suitable types of memory.
In an embodiment of the present invention, memory 1150 stores the following elements of operating system 1151 and application programs 1152: an executable module, a data structure, or a subset thereof, or an expanded set thereof.
Specifically, the operating system 1151 includes various system programs such as: a framework layer, a core library layer, a driver layer, etc. for implementing various basic services and processing hardware-based tasks. Applications 1152 include various applications such as: media Player (Media Player), Browser (Browser), for implementing various application services. A program implementing a method of an embodiment of the invention may be included in application program 1152. The application programs 1152 include: applets, objects, components, logic, data structures, and other computer system executable instructions that perform particular tasks or implement particular abstract data types.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned three-dimensional scene reconstruction method based on a laser radar, and can achieve the same technical effect, and is not described herein again to avoid repetition.
The computer-readable storage medium includes: permanent and non-permanent, removable and non-removable media may be tangible devices that retain and store instructions for use by an instruction execution apparatus. The computer-readable storage medium includes: electronic memory devices, magnetic memory devices, optical memory devices, electromagnetic memory devices, semiconductor memory devices, and any suitable combination of the foregoing. The computer-readable storage medium includes: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), non-volatile random access memory (NVRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape cartridge storage, magnetic tape disk storage or other magnetic storage devices, memory sticks, mechanically encoded devices (e.g., punched cards or raised structures in a groove having instructions recorded thereon), or any other non-transmission medium useful for storing information that may be accessed by a computing device. As defined in embodiments of the present invention, the computer-readable storage medium does not include transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses traveling through a fiber optic cable), or electrical signals transmitted through a wire.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to solve the problem to be solved by the embodiment of the invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be substantially or partially contributed by the prior art, or all or part of the technical solutions may be embodied in a software product stored in a storage medium and including instructions for causing a computer device (including a personal computer, a server, a data center, or other network devices) to execute all or part of the steps of the methods of the embodiments of the present invention. And the storage medium includes various media that can store the program code as listed in the foregoing.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and the present invention shall be covered by the claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A three-dimensional scene reconstruction method based on laser radar is characterized by comprising the following steps:
step 1: acquiring point cloud data of a target area;
step 2: performing joint detection on the point cloud data based on a neural network of an attention mechanism to obtain a joint detection result; the joint detection result comprises a target area object and a target area layout;
step 3: learning based on the joint detection result to obtain an implicit expression function of the object in the target area;
step 4: reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function.
2. The lidar based three-dimensional scene reconstruction method according to claim 1, wherein the step 2: performing joint detection on the point cloud data by using a neural network based on an attention mechanism to obtain a joint detection result, wherein the joint detection result comprises the following steps:
step 2.1: extracting the characteristics of the point cloud data to obtain a characteristic result, wherein the characteristic result comprises characteristic information of the global point cloud of the target area;
step 2.2: acquiring a target area proposal characteristic, and performing characteristic fusion on the target area proposal characteristic and the characteristic result to obtain a characteristic fusion result; the target area proposing characteristics comprise target area object proposing characteristics and target area layout proposing characteristics;
step 2.3: inputting the feature fusion result into a neural network based on an attention mechanism to carry out joint detection so as to obtain a joint detection result; the joint detection result includes a target area object and a target area layout.
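As a concrete reading of claim 2 (and of the Transformer-based fusion named in claim 3), the sketch below fuses object proposal features and layout proposal features with self-attention and decodes the two joint-detection outputs. The feature dimension, proposal counts and output parameterizations are assumptions for illustration, not the patented network.

```python
# Minimal sketch of attention-based joint detection over fused proposal features.
# Dimensions and output heads are assumptions; only the overall structure follows the claim.
import torch
import torch.nn as nn

class JointDetectionHead(nn.Module):
    def __init__(self, d_model: int = 256, n_obj: int = 128, n_layout: int = 32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.obj_head = nn.Linear(d_model, 8)     # e.g. box centre, size, heading, score
        self.layout_head = nn.Linear(d_model, 7)  # e.g. wall-quad parameters and score
        self.n_obj = n_obj

    def forward(self, obj_feats: torch.Tensor, layout_feats: torch.Tensor):
        # obj_feats: (B, n_obj, d); layout_feats: (B, n_layout, d) from the point backbone
        tokens = torch.cat([obj_feats, layout_feats], dim=1)
        fused = self.encoder(tokens)              # joint self-attention over both proposal sets
        objects = self.obj_head(fused[:, :self.n_obj])
        layout = self.layout_head(fused[:, self.n_obj:])
        return objects, layout

# usage with random tensors standing in for backbone features
head = JointDetectionHead()
objects, layout = head(torch.randn(2, 128, 256), torch.randn(2, 32, 256))
```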
3. The lidar based three-dimensional scene reconstruction method according to claim 2, wherein step 2.1, performing feature extraction on the point cloud data to obtain a feature result, comprises:
carrying out down-sampling and up-sampling on the point cloud data based on a PointNet++ network, and preprocessing a sampling result to obtain the feature result;
and step 2.2, acquiring a target area proposal feature and performing feature fusion on the target area proposal feature and the feature result to obtain a feature fusion result, the target area proposal feature comprising a target area object proposal feature and a target area layout proposal feature, comprises:
acquiring the object proposal feature of the target area based on a vote algorithm, acquiring the layout proposal feature of the target area based on an fps algorithm, and performing feature fusion on the object proposal feature of the target area, the layout proposal feature of the target area and the feature result based on a Transformer network to obtain the feature fusion result.
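The "fps algorithm" named for the layout proposals is commonly read in point-cloud work as farthest point sampling; under that assumption, a plain numpy sketch (not the patented implementation) is:

```python
# Farthest point sampling: pick k indices that spread out over the point cloud.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """points: (N, 3) array; returns indices of k well-spread points."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = 0                                   # deterministic start; often chosen at random
    for i in range(1, k):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)                  # distance to the nearest already-chosen point
        chosen[i] = int(np.argmax(dist))            # pick the farthest remaining point
    return chosen

idx = farthest_point_sampling(np.random.rand(2048, 3).astype(np.float32), 32)
```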
4. The lidar based three-dimensional scene reconstruction method according to claim 1, wherein step 3, learning based on the joint detection result to obtain an implicit expression function of the object in the target area, comprises the following steps:
step 3.1: aligning the point cloud data of the target area to a standard coordinate system to form point cloud data of an object in the target area;
step 3.2: inputting the point cloud data of the target area object and the target area object proposal into a multilayer neural network for learning to obtain shape features;
step 3.3: inputting the shape features into a decoding neural network to form the implicit expression function.
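A minimal, occupancy-style reading of steps 3.1-3.3 is sketched below: the object's points, assumed already aligned to a standard (canonical) coordinate system per step 3.1, are encoded into a shape feature by a small point-wise MLP, and a decoding network maps a query point plus that feature to an occupancy value, which plays the role of the implicit expression function. All layer sizes and the sigmoid occupancy output are assumptions for illustration, not the patented design.

```python
# Sketch of a shape encoder (step 3.2) and an implicit occupancy decoder (step 3.3).
import torch
import torch.nn as nn

class ShapeEncoder(nn.Module):
    """Pool a point-wise MLP over the object's aligned points to get a shape feature."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 3) in the object's canonical frame -> (B, dim) shape feature
        return self.mlp(pts).max(dim=1).values

class OccupancyDecoder(nn.Module):
    """The implicit expression function: occupancy at arbitrary 3D query points."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 3, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, queries: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # queries: (B, M, 3); code: (B, dim) -> occupancy probabilities (B, M, 1)
        code = code.unsqueeze(1).expand(-1, queries.shape[1], -1)
        return torch.sigmoid(self.net(torch.cat([queries, code], dim=-1)))

enc, dec = ShapeEncoder(), OccupancyDecoder()
occupancy = dec(torch.rand(2, 1000, 3), enc(torch.rand(2, 4096, 3)))
```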
5. The lidar based three-dimensional scene reconstruction method according to claim 1, wherein step 4, reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function, comprises the following steps:
step 4.1: obtaining surface information of the object in the target area based on the implicit expression function;
step 4.2: extracting a triangular patch of the surface information by using a Marching Cubes algorithm;
step 4.3: and restoring the triangular patch to tightly reconstruct the surface of the object in the target area.
6. The lidar based three-dimensional scene reconstruction method according to claim 5, wherein step 4.2, extracting the triangular patch of the surface information by using the Marching Cubes algorithm, comprises the following steps:
step 4.2.1: predicting by using a decoding neural network to obtain an occupancy function;
step 4.2.2: based on the occupancy function, extracting the triangular patch of the surface information by using the Marching Cubes algorithm.
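Put together, steps 4.1-4.3 and 4.2.1-4.2.2 amount to sampling the predicted occupancy function on a regular grid and pulling a triangle mesh out of it. The sketch below uses the hypothetical OccupancyDecoder from the note under claim 4 and scikit-image's Marching Cubes as a stand-in for the patent's own extraction step; the grid bounds and resolution are assumptions.

```python
# Extract a triangle mesh from a learned occupancy function via Marching Cubes.
import numpy as np
import torch
from skimage.measure import marching_cubes

def extract_mesh(decoder, shape_code, resolution: int = 64, level: float = 0.5):
    # Build a regular grid of query points in an assumed [-0.5, 0.5]^3 canonical box.
    lin = torch.linspace(-0.5, 0.5, resolution)
    grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)  # (R, R, R, 3)
    queries = grid.reshape(1, -1, 3)
    with torch.no_grad():
        occ = decoder(queries, shape_code).reshape(resolution, resolution, resolution)
    # Marching Cubes gives vertices, triangular faces and normals of the level-set surface.
    verts, faces, normals, _ = marching_cubes(occ.numpy(), level=level)
    verts = verts / (resolution - 1) - 0.5   # voxel units back to the canonical frame
    return verts, faces, normals
```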
7. A three-dimensional scene reconstruction device based on laser radar is characterized by comprising:
the data acquisition module is used for acquiring point cloud data of a target area;
the joint detection module is used for carrying out joint detection on the point cloud data by using a neural network based on an attention mechanism to obtain a joint detection result; the joint detection result comprises a target area object and a target area layout;
an implicit function module, configured to learn based on the joint detection result to obtain an implicit expression function of the target area object;
and the reconstruction module is used for reconstructing a three-dimensional scene of the object in the target area based on the implicit expression function.
8. The lidar-based three-dimensional scene reconstruction apparatus according to claim 7, wherein the joint detection module comprises:
the feature extraction unit is used for performing feature extraction on the point cloud data to obtain a feature result, and the feature result comprises feature information of the global point cloud of the target area;
the proposal feature unit is used for acquiring a target area proposal feature, and performing feature fusion on the target area proposal feature and the feature result to obtain a feature fusion result; the target area proposal feature comprises a target area object proposal feature and a target area layout proposal feature;
the joint detection unit is used for inputting the feature fusion result into a neural network based on an attention mechanism so as to carry out joint detection and obtain a joint detection result; the joint detection result comprises a target area object and a target area layout;
the implicit function module comprises:
the transformation unit is used for aligning the point cloud data of the target area into a standard coordinate system to form point cloud data of an object in the target area;
the shape feature unit is used for inputting the point cloud data of the target area object and the target area object proposal into a multilayer neural network for learning to obtain shape features;
an implicit function unit, configured to input the shape feature into a decoding neural network to form the implicit expression function;
the reconstruction module includes:
a surface information unit for obtaining surface information of the object in the target area based on the implicit expression function;
the triangular patch unit is used for extracting a triangular patch of the surface information by using the Marching Cubes algorithm;
and the tight reconstruction unit is used for restoring the triangular patch so as to perform tight reconstruction on the surface of the object in the target area.
9. An electronic device comprising a bus, a transceiver, a memory, a processor and a computer program stored on the memory and executable on the processor, the transceiver, the memory and the processor being connected via the bus, characterized in that the computer program, when executed by the processor, implements the steps in a lidar based three-dimensional scene reconstruction method according to any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for lidar based three-dimensional scene reconstruction according to any one of claims 1 to 6.
CN202210407112.3A 2022-04-19 2022-04-19 Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment Active CN114511682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210407112.3A CN114511682B (en) 2022-04-19 2022-04-19 Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210407112.3A CN114511682B (en) 2022-04-19 2022-04-19 Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment

Publications (2)

Publication Number Publication Date
CN114511682A true CN114511682A (en) 2022-05-17
CN114511682B (en) 2022-07-15

Family

ID=81555124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210407112.3A Active CN114511682B (en) 2022-04-19 2022-04-19 Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment

Country Status (1)

Country Link
CN (1) CN114511682B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020034903A1 (en) * 2018-08-17 2020-02-20 北京京东尚科信息技术有限公司 Smart navigation method and system based on topological map
CN110414577A (en) * 2019-07-16 2019-11-05 电子科技大学 A kind of laser radar point cloud multiple target Objects recognition method based on deep learning
CN111815776A (en) * 2020-02-04 2020-10-23 山东水利技师学院 Three-dimensional building fine geometric reconstruction method integrating airborne and vehicle-mounted three-dimensional laser point clouds and streetscape images
CN112215101A (en) * 2020-09-27 2021-01-12 武汉科技大学 Attention mechanism-based three-dimensional target identification method and system
CN112991537A (en) * 2021-04-29 2021-06-18 深圳大学 City scene reconstruction method and device, computer equipment and storage medium
CN113256640A (en) * 2021-05-31 2021-08-13 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet
CN113838191A (en) * 2021-09-27 2021-12-24 上海应用技术大学 Three-dimensional reconstruction method based on attention mechanism and monocular multi-view
CN114120110A (en) * 2021-11-22 2022-03-01 中国科学院紫金山天文台 Multi-granularity calculation method for airborne laser point cloud classification of hybrid scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAOXUE CHEN et al.: "PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds", https://arxiv.org/abs/2109.05566 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115659830A (en) * 2022-11-07 2023-01-31 北京科技大学 Design method for technological condition for improving quality performance of laser selective melting parts
CN116310141A (en) * 2023-04-19 2023-06-23 深锶科技(北京)有限公司 3D digital person reconstruction method and device based on implicit field probability distribution prediction

Also Published As

Publication number Publication date
CN114511682B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN110458939B (en) Indoor scene modeling method based on visual angle generation
CN114511682B (en) Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment
US8467628B2 (en) Method and system for fast dense stereoscopic ranging
US8610712B2 (en) Object selection in stereo image pairs
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
US20200410688A1 (en) Image Segmentation Method, Image Segmentation Apparatus, Image Segmentation Device
Aanæs et al. Estimation of deformable structure and motion
US20220138977A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
WO2024083121A1 (en) Data processing method and apparatus
CN110827341A (en) Picture depth estimation method and device and storage medium
CN115346018A (en) Three-dimensional model reconstruction method and device and electronic equipment
Li et al. FDnCNN-based image denoising for multi-label localization measurement
Samavati et al. Deep learning-based 3D reconstruction: a survey
US11430146B2 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for EQUI-rectangular projection stereo matching
US20240005541A1 (en) Image depth prediction method and electronic device
CN114077892A (en) Human body skeleton sequence extraction and training method, device and storage medium
CN112070877A (en) Point cloud processing method, device, equipment and computer readable storage medium
CN114445451A (en) Planar image tracking method, terminal and storage medium
Long et al. Radar fusion monocular depth estimation based on dual attention
CN113096104A (en) Training method and device of target segmentation model and target segmentation method and device
CN117132744B (en) Virtual scene construction method, device, medium and electronic equipment
WO2024069727A1 (en) Training apparatus, training method, and non-transitory computer-readable storage medium
CN117058472B (en) 3D target detection method, device and equipment based on self-attention mechanism
Xu et al. Depth estimation algorithm based on data-driven approach and depth cues for stereo conversion in three-dimensional displays

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant