CN113591556A

CN113591556A - Three-dimensional point cloud semantic analysis method based on neural network three-body model

Info

Publication number: CN113591556A
Application number: CN202110688525.9A
Authority: CN
Inventors: 胡奇; 王春阳; 段锦; 翟朗; 田嘉政
Original assignee: Changchun University of Science and Technology
Current assignee: Changchun University of Science and Technology
Priority date: 2021-06-22
Filing date: 2021-06-22
Publication date: 2021-11-02

Abstract

The invention provides a three-dimensional point cloud semantic analysis method based on a neural network three-body model, comprising the following steps. First, a learner for semantic analysis of laser 3D point cloud data is constructed. Its construction consists of two main parts, namely local spatial encoding and the introduction of an attention mechanism. Then, a memory and an interpreter are constructed: tensor units are identified in the mesoscopic system and entity parameters are extracted through the mesoscopic system; as the mesoscopic system rises level by level in the model, the information of the other particles in the mesoscopic system is taken as the background environment and the description of the tensor units is continuously updated; finally, for each mesoscopic system, the spatial relationship between the whole group of mesoscopic systems and that individual mesoscopic system is predicted. The invention thus provides a three-dimensional point cloud semantic analysis strategy based on a mesoscopic-system neural network three-body model, which can be applied to semantic analysis of dynamic laser 3D point clouds in large scenes and offers a degree of generalizability as data dimensionality continues to increase.

Description

Three-dimensional point cloud semantic analysis method based on neural network three-body model

Technical Field

The invention belongs to the field of three-dimensional point cloud semantic analysis, and relates in particular to a three-dimensional point cloud semantic analysis method based on a neural network three-body model.

Background

In recent years, demand for deep-learning-based target recognition and tracking in large-scene laser three-dimensional (3D) point clouds has grown steadily in fields such as industrial inspection and intelligent operations. However, because laser point clouds are irregular, unstructured and unordered, a conventional convolutional neural network (CNN) cannot be applied to such data directly. Moreover, because deep learning differs fundamentally from human cognition, black-box models suffer from serious problems such as a lack of interpretability.

Therefore, to meet the requirements of complex systems and to move beyond the original concept of parallel dimensions, the invention draws on the ideas of "dimension raising" and "cross-boundary" integration and introduces the concept and characteristics of the mesoscopic system. On this basis, it carries out a systematic study of a 3D point cloud semantic analysis method based on a three-body neural network model consisting of a learner, a memory and an interpreter.

Disclosure of Invention

To solve the above technical problems, the invention provides a three-dimensional point cloud semantic analysis method based on a neural network three-body model, which comprises the following steps:

Step 1: local spatial encoding: encode the spatial geometric information of the 3D point cloud so that the network can better learn the spatial geometric structure from the relative positions of, and distances between, the points;

Step 2: introduce an attention mechanism: the neighborhood point feature sets that encode the relative position and distance information of each point are output and then learned and aggregated automatically by the attention mechanism; in addition, because the input point cloud is down-sampled continuously and substantially, the more efficient nearest-neighbor interpolation is used in the up-sampling stage of the decoder to further improve execution efficiency;

Step 3: through continuous learning and iteration, the mesoscopic system keeps enlarging the receptive field of each point and promotes feature propagation among neighboring points, thereby completing the evolution from low-level to high-level mesoscopic systems and completing the construction of the learner;

Step 4: construct the memory and the interpreter: identify tensor units in the mesoscopic system and extract entity parameters through the mesoscopic system; as the mesoscopic system rises level by level in the model, take the information of the other particles in the mesoscopic system as the background environment and continuously update the description of the tensor units; finally, for each mesoscopic system, predict the spatial relationship between the whole group of mesoscopic systems and that individual mesoscopic system;

Step 5: during this iteration, the high-level mesoscopic systems should acquire a certain interpretive capability. Using an interpreter therefore carries the implicit assumption that one of the candidate interpretations is correct. To identify the correct interpretation, an objective function is chosen that maximizes the log-likelihood of the poses observed on the low-level mesoscopic systems under a mixture model generated by the high-level mesoscopic systems;

Step 6: when back-propagation is performed through the interpreter, the network learns how to instantiate the high-level mesoscopic systems. Because a high-level mesoscopic system may not explain every element of the data well, a parse tree is established so that the best-explained elements receive the largest derivatives and can therefore be learned and optimized.
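Step 1 does not prescribe a specific encoding. The following is a minimal sketch, assuming a RandLA-Net-style scheme that the patent does not itself name: each point's k nearest neighbours are found by brute force (a point counts as its own nearest neighbour), and each neighbour is described by the centre point, the neighbour, their offset and their Euclidean distance, giving 10 values per neighbour. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def knn(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (brute force, includes i itself)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(points[i], points[j]))
    return order[:k]


def local_spatial_encoding(points: Sequence[Point], k: int) -> List[List[List[float]]]:
    """For every point p_i, encode each neighbour p_j as [p_i, p_j, p_i - p_j, ||p_i - p_j||]."""
    encoded = []
    for i, p in enumerate(points):
        rows = []
        for j in knn(points, i, k):
            q = points[j]
            offset = [a - b for a, b in zip(p, q)]  # relative position
            rows.append(list(p) + list(q) + offset + [math.dist(p, q)])  # + distance
        encoded.append(rows)
    return encoded
```

A practical implementation would replace the O(N^2) neighbour search with a spatial index such as a k-d tree; brute force keeps the sketch dependency-free.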

Preferably, in step 2, learning and aggregation are performed automatically through the attention mechanism: a shared function is designed to learn, for each point, attention weights that automatically select important features, and the final feature is the weighted sum of the neighborhood feature point set.
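A minimal sketch of this attentive aggregation, assuming (the patent does not specify it) that the "shared function" is a single linear map W applied identically to every neighbour feature, followed by a softmax over the neighbours computed separately for each feature channel. The aggregated feature is then the attention-weighted sum of the neighbour features. W is a hypothetical hand-set matrix here; in a network it would be learned.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(neigh_feats: Sequence[Sequence[float]],
                   W: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate K neighbour features (each of length D) into one D-dim feature.

    W is a D x D matrix acting as the shared scoring function g(f) = W f,
    applied identically to every neighbour.
    """
    d = len(neigh_feats[0])
    # Shared function: score every channel of every neighbour with the same W.
    scores = [[sum(W[c][t] * f[t] for t in range(d)) for c in range(d)] for f in neigh_feats]
    pooled = []
    for c in range(d):
        # Attention weights over neighbours, separately for channel c.
        weights = softmax([s[c] for s in scores])
        # Weighted sum of the neighbours' channel-c values.
        pooled.append(sum(w * f[c] for w, f in zip(weights, neigh_feats)))
    return pooled
```

With an all-zero W every neighbour receives the same weight and the result reduces to mean pooling; a learned W lets the scores favour the informative neighbours instead.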

Preferably, in step 4, the spatial relationship between the whole group of mesoscopic systems and an individual mesoscopic system is predicted for each mesoscopic system as follows: each instantiated high-level mesoscopic system predicts a pose for each low-level mesoscopic system extracted from the image, and during pose prediction an objective function is chosen that maximizes the log-likelihood of the poses observed on the low-level mesoscopic systems under the mixture model generated by the high-level mesoscopic systems.
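The objective is described only as maximizing the log-likelihood of the poses observed on the low-level mesoscopic systems under a mixture model generated by the high-level ones. A minimal sketch, assuming (this is not specified in the source) that each pose is flattened to a vector v_i and each high-level system j contributes an isotropic Gaussian component with mean mu_j, standard deviation sigma_j and mixing weight pi_j, so that the objective is L = sum_i log sum_j pi_j N(v_i | mu_j, sigma_j^2 I). The function names are hypothetical.

```python
import math
from typing import Sequence


def log_gaussian(x: Sequence[float], mu: Sequence[float], sigma: float) -> float:
    """log N(x | mu, sigma^2 I) for an isotropic Gaussian."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mu))
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)


def logsumexp(xs: Sequence[float]) -> float:
    """Stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def mixture_log_likelihood(observed_poses, means, sigmas, mix_weights) -> float:
    """Sum over observed low-level poses of log sum_j pi_j N(pose | mu_j, sigma_j^2 I)."""
    total = 0.0
    for v in observed_poses:
        comps = [math.log(pi) + log_gaussian(v, mu, s)
                 for mu, s, pi in zip(means, sigmas, mix_weights)]
        total += logsumexp(comps)
    return total
```

Maximizing this quantity pulls the high-level predictions (the means) toward the poses actually observed at the level below.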

Preferably, the mesoscopic particle is a neuron structure in the neural network consisting mainly of a logic unit, a matrix unit and a vector unit. The logic unit indicates whether the entity is present in the current image, wherever it lies within the image region covered by the set; the matrix unit represents the spatial relationship between the entity and the observer, or between an intrinsic coordinate system embedded in the entity and the observer; and the vector unit represents the information not covered by the logic unit and the matrix unit.
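A minimal data-structure sketch of one mesoscopic particle. The class and field names, the logistic (sigmoid) activation on the logic unit, and the 4x4 homogeneous form of the matrix unit are assumptions for illustration; the patent fixes neither the matrix size nor the length of the vector unit. Here the matrix unit maps points from the entity's intrinsic coordinate system into the observer's frame.

```python
import math
from dataclasses import dataclass, field
from typing import List

Matrix4 = List[List[float]]


@dataclass
class MesoscopicParticle:
    """Hypothetical container for one particle: logic, matrix and vector units."""
    presence_logit: float  # logic unit (pre-activation)
    pose: Matrix4          # matrix unit: entity frame -> observer frame, 4x4 homogeneous
    attributes: List[float] = field(default_factory=list)  # vector unit: all other information

    def presence(self) -> float:
        """Probability that the entity is present, via a logistic function."""
        return 1.0 / (1.0 + math.exp(-self.presence_logit))

    def to_observer(self, p_entity: List[float]) -> List[float]:
        """Map a 3D point from the entity's intrinsic frame into the observer's frame."""
        h = list(p_entity) + [1.0]  # homogeneous coordinates
        out = [sum(self.pose[r][c] * h[c] for c in range(4)) for r in range(4)]
        return out[:3]
```

Keeping presence separate from pose means the presence probability can stay unchanged while the entity moves, with the movement captured entirely by the matrix unit.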

Compared with the prior art, the invention has the following beneficial effects: it provides a three-dimensional point cloud semantic analysis strategy based on a mesoscopic-system neural network three-body model, which can be applied to semantic analysis of dynamic laser 3D point clouds in large scenes and offers a degree of generalizability as data dimensionality continues to increase.

Drawings

FIG. 1 is an analytical roadmap for the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

Embodiment:

As shown in FIG. 1, the invention provides a three-dimensional point cloud semantic analysis method based on a neural network three-body model, which comprises the following steps:

(1) Construct the learner for semantic analysis of laser 3D point cloud data:

The construction of the learner consists of two main parts. The first is local spatial encoding, which encodes the spatial geometric information of the 3D point cloud so that the network can better learn the spatial geometric structure from the relative positions of, and distances between, the points. The second is the introduction of an attention mechanism: the neighborhood point feature sets encoding the relative positions and distances of the points are output and then learned and aggregated automatically by the attention mechanism. A shared function is designed so that, for each point, attention weights that automatically select important features are learned, and the final feature is the weighted sum of the neighborhood feature set. Because the input point cloud is down-sampled continuously and substantially, continuous learning and iteration drive the mesoscopic system to keep enlarging the receptive field of each point and to promote feature propagation among neighboring points, thereby completing the evolution from low-level to high-level mesoscopic systems. Finally, the more efficient nearest-neighbor interpolation is used in the up-sampling stage of the decoder to further improve execution efficiency;
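The paragraph names nearest-neighbour interpolation for the decoder's up-sampling stage but not the down-sampling scheme. A minimal sketch, assuming random down-sampling (a choice borrowed from RandLA-Net, not stated in the source): in the up-sampling step, each dense point simply copies the feature of its nearest point in the sparse set, which avoids the weighted averaging needed by inverse-distance or trilinear interpolation. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def random_downsample(points: Sequence[Point], ratio: int, seed: int = 0) -> List[int]:
    """Keep a random 1/ratio subset of point indices (assumed sampling scheme)."""
    rng = random.Random(seed)
    keep = max(1, len(points) // ratio)
    return sorted(rng.sample(range(len(points)), keep))


def nearest_neighbor_upsample(dense: Sequence[Point], sparse: Sequence[Point],
                              sparse_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Give each dense point a copy of the feature of its nearest sparse point."""
    out = []
    for p in dense:
        nearest = min(range(len(sparse)), key=lambda j: math.dist(p, sparse[j]))
        out.append(list(sparse_feats[nearest]))
    return out
```

Because each down-sampled point now summarizes a larger neighbourhood, repeating encoding and aggregation on the sparser set is what enlarges each point's receptive field layer by layer.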

(2) Construct the memory and the interpreter of the three-body neural network model:

First, tensor units are identified in the mesoscopic system, and entity parameters are extracted through the mesoscopic system. As the mesoscopic system rises level by level in the model, the information of the other particles in the mesoscopic system serves as the background environment and the description of the tensor units is continuously updated. Finally, each instantiated high-level mesoscopic system predicts a pose for each low-level mesoscopic system extracted from the image. During this iteration, the high-level mesoscopic systems should acquire a certain interpretive capability, so using an interpreter carries the implicit assumption that one of the candidate interpretations is correct; in general, however, it is not known which one. For this reason, an objective function is chosen that maximizes the log-likelihood of the poses observed on the low-level mesoscopic systems under the mixture model generated by the high-level mesoscopic systems. When back-propagation is performed through the interpreter, the network learns how to instantiate the high-level mesoscopic systems. Because a high-level mesoscopic system may not explain every element of the data well, a parse tree is established so that the best-explained elements receive the largest derivatives and can thus be learned and optimized.
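One way to read the back-propagation step under the mixture-model objective named in this paragraph (our interpretation, not spelled out in the source): with L = log sum_j pi_j N(v | mu_j, sigma^2 I), the derivative with respect to mu_j is r_j (v - mu_j) / sigma^2, where r_j is the posterior responsibility of high-level system j for the low-level pose v. The gradient for each low-level element is therefore routed mainly to the high-level system that explains it best, and assigning every element to its most responsible parent gives one level of a parse tree. A sketch assuming a shared isotropic sigma and flattened pose vectors; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def responsibilities(v: Sequence[float], means: Sequence[Sequence[float]],
                     mix_weights: Sequence[float], sigma: float) -> List[float]:
    """Posterior r_j = pi_j N(v|mu_j) / sum_k pi_k N(v|mu_k).

    With a shared sigma, the Gaussian normalizing constant is identical for
    every component and cancels, so it is omitted.
    """
    logs = [math.log(pi) - sum((a - b) ** 2 for a, b in zip(v, mu)) / (2 * sigma ** 2)
            for mu, pi in zip(means, mix_weights)]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]


def grad_wrt_means(v, means, mix_weights, sigma) -> List[List[float]]:
    """d/d mu_j of log sum_k pi_k N(v|mu_k, sigma^2 I) = r_j (v - mu_j) / sigma^2."""
    r = responsibilities(v, means, mix_weights, sigma)
    return [[r_j * (a - b) / sigma ** 2 for a, b in zip(v, mu)] for r_j, mu in zip(r, means)]


def assign_parents(low_poses, means, mix_weights, sigma) -> List[int]:
    """One parse-tree level: each low-level element picks its most responsible parent."""
    out = []
    for v in low_poses:
        r = responsibilities(v, means, mix_weights, sigma)
        out.append(max(range(len(r)), key=r.__getitem__))
    return out
```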

Specifically, the mesoscopic particle is a neuron structure in the neural network consisting mainly of a logic unit, a matrix unit and a vector unit. The logic unit indicates whether the entity is present in the current image, wherever it lies within the image region covered by the set; the matrix unit represents the spatial relationship between the entity and the observer, or between an intrinsic coordinate system embedded in the entity and the observer; and the vector unit represents the information not covered by the logic unit and the matrix unit.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A three-dimensional point cloud semantic analysis method based on a neural network three-body model is characterized by comprising the following steps:

2. The three-dimensional point cloud semantic analysis method based on the neural network three-body model according to claim 1, wherein in step 2, learning and aggregation are performed automatically through an attention mechanism, a shared function is designed to learn, for each point, attention weights that automatically select important features, and the finally obtained feature is the weighted sum of the neighborhood feature point set.

3. The method according to claim 1, wherein in step 4, predicting, for each mesoscopic system, the spatial relationship between the whole group of mesoscopic systems and the individual mesoscopic system comprises: each instantiated high-level mesoscopic system predicting a pose for each low-level mesoscopic system extracted from the image, and, in the course of pose prediction, selecting an objective function that maximizes the log-likelihood of the poses observed on the low-level mesoscopic systems under the mixture model generated by the high-level mesoscopic systems.

4. The method according to claim 1, wherein the mesoscopic particle is a neuron structure in the neural network and mainly comprises three parts, namely a logic unit, a matrix unit and a vector unit; the logic unit is used to indicate whether the entity is present in the current image, wherever it lies within the image region covered by the set; the matrix unit is used to indicate the spatial relationship between the entity and the observer, or between the intrinsic coordinate system embedded in the entity and the observer; and the vector unit is used to represent the information other than that of the logic unit and the matrix unit.