CN115049786B - Task-oriented point cloud data downsampling method and system - Google Patents

Task-oriented point cloud data downsampling method and system

Info

Publication number
CN115049786B
CN115049786B (application CN202210689275.5A)
Authority
CN
China
Prior art keywords
point cloud
task
downsampling
network
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210689275.5A
Other languages
Chinese (zh)
Other versions
CN115049786A (en)
Inventor
金一
王旭
岑翼刚
刘柏甫
王涛
李浥东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202210689275.5A priority Critical patent/CN115049786B/en
Publication of CN115049786A publication Critical patent/CN115049786A/en
Application granted granted Critical
Publication of CN115049786B publication Critical patent/CN115049786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a task-oriented point cloud data downsampling method and system, belonging to the technical field of point cloud data processing. The method adjusts the resource-intensive structures in a transformer network: position embedding is removed, the input data embedding layer structure is simplified, the mapping matrix operations of the self-attention mechanism are deleted, and a scaling strategy is introduced in the feedforward neural network layer. Based on a sampling loss function, the coverage of the downsampled point cloud and its focus on key areas are expanded, and the generated point cloud is encouraged to be a proper subset of the original point cloud. The downsampling module is then combined with the task network, and the weight parameters of the downsampling network are updated jointly by the sampling loss and the task loss. The invention reduces the consumption of computation and storage resources; the designed sampling loss function yields proper-subset point cloud data with more uniform point distribution and more comprehensive key-point coverage; and the universal downsampling module is combined with a three-dimensional classification task network, achieving an effective balance between task-network performance optimization and resource-overhead minimization.

Description

Task-oriented point cloud data downsampling method and system
Technical Field
The invention relates to the technical field of point cloud data processing, in particular to a task-oriented point cloud data downsampling method and system based on a transformer neural network.
Background
In recent years, with the continuous decline in the price of three-dimensional sensors, sensor modules such as lidar have been widely applied in fields close to people's daily lives, such as rail transit, intelligent transportation, unmanned systems, three-dimensional vision robots, augmented reality, smart cities, and point cloud data processing systems. Meanwhile, with the continuous development of deep learning technology, the point cloud data acquired by three-dimensional sensors is widely used in traffic scenarios such as intelligent rail transit and smart urban transportation, providing data support for intelligent, standardized, and digitized infrastructure and for planning and design goals such as 'safe travel'. In general, to obtain refined modeling data in a practical scene, a large amount of dense point cloud data must often be collected on the surface of an object to improve modeling accuracy. However, as three-dimensional point cloud processing equipment moves toward miniaturized, handheld devices, processing large-scale dense point clouds on mobile devices or terminals becomes an important challenge. On the other hand, although large-scale point cloud data provides complete global scene information, for specific tasks such as three-dimensional object classification, three-dimensional scene segmentation, and three-dimensional point cloud calibration, processing large-scale point cloud data increases resource overhead, and the scale and density of the points do not grow in positive correlation with task performance or accuracy; that is, the performance of a task model does not keep increasing as the number of points grows. To solve the above problems, three-dimensional downsampling techniques have been proposed.
Existing downsampling methods fall mainly into traditional methods and deep learning methods. Traditional downsampling is represented by farthest point sampling, random sampling, and voxelization. Although traditional methods can complete the point cloud downsampling task, they are data-driven and do not fully consider the deep geometric features of the point cloud data or the network requirements of downstream tasks, so they often yield suboptimal sampling results. Meanwhile, traditional methods usually need to downsample the input original point cloud several times and select the best result in order to guarantee satisfactory task-network output accuracy. It should be emphasized that this repeated downsampling multiplies resource overhead, which runs contrary to the goal of reducing resource overhead through point cloud downsampling. The performance of traditional methods therefore remains to be optimized.
Downsampling methods based on deep learning have also been proposed in recent years. The currently popular deep learning downsampling methods can be divided into two categories: (1) specific downsampling layers, and (2) generic downsampling modules. Specifically, the specific-downsampling-layer approach designs an embedded point cloud downsampling network layer and combines it with a task-specific neural network, continuously filtering out redundant point cloud information while performing feature learning. While such methods can effectively reduce the point cloud scale, they are unfriendly to predefined network structures whose model structure is fixed, because (1) for a predefined network structure, any slight structural change may degrade output performance; and (2) for a predefined neural network with a complex structure and high accuracy, the resource overhead of retraining is huge.
The task-guided generic downsampling module approach designs a downsampling module independent of the task network, which can be combined with any task network that needs downsampling without changing the task network's structure. It should be noted that existing generic task-oriented downsampling modules all adopt a PointNet-like deep learning framework; despite the low resource overhead of such structures, these networks process the points in the point cloud individually and ignore the correlations and geometric relationships between points, leaving model performance to be further optimized. The success of the transformer network in machine vision tasks provides a new idea for three-dimensional point cloud data processing. Mainstream transformer networks strengthen model depth and width by stacking multiple transformer modules and introducing a multi-head attention network into each module, using the resulting rich learnable parameters to fit visual tasks, such as three-dimensional classification and three-dimensional segmentation, and achieving high output performance. However, conventional transformer network models are structurally complex and carry significant computation and memory overhead, whereas the purpose of the point cloud downsampling task is precisely to save computation and storage overhead. The existing transformer networks are therefore difficult to apply directly to the point cloud downsampling task: the resource consumption of the downsampling network itself may exceed the resource savings brought by the point cloud reduction, which runs contrary to the goal of saving resources.
In summary, the point cloud downsampling field has not yet incorporated the recently proposed transformer network framework well into the design of deep models, and there is no effective method for slimming the existing resource-intensive transformer network structure so as to reduce resource usage while minimizing the performance degradation caused by reducing the point cloud scale.
Disclosure of Invention
The invention aims to provide a task-oriented point cloud data downsampling method and system, so as to solve at least one of the technical problems described in the background art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In one aspect, the present invention provides a task-oriented point cloud data downsampling method, including:
adjusting the resource-intensive structures in a transformer network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer; the scaling strategy is used to increase or decrease the scale of the feedforward neural network layer according to the requirements of the actual task, i.e., scaling up enlarges the feedforward neural network and scaling down shrinks it;
based on the sampling loss function, expanding the coverage of the downsampled point cloud and its focus on key areas, and encouraging the generated point cloud to be a proper subset of the original point cloud;
and combining the downsampling module with the task network, and updating the weight parameters of the downsampling network jointly with the sampling loss and the task loss.
Optionally, adjusting the resource-intensive structures in the transformer network, removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer includes:
constructing a lightweight input data embedding layer;
deleting the mapping steps of the query and key vectors in the autocorrelation attention mechanism, preserving only the dot-product operation on the input data, and constructing a lightweight autocorrelation attention mechanism;
and constructing a scalable feedforward neural network: according to the size of the dataset and the complexity of the scene, the network scale is expanded based on prior knowledge. The feedforward neural network is a multi-layer perceptron (MLP), and scaling it means adjusting the number of MLP layers and the number of neurons per layer. After the structure is adjusted, testing and fine-tuning are performed until the network converges to a suitable accuracy.
Optionally, the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the rejection loss function, and L_soft denotes the nonlinear mapping loss.
Optionally, the chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|)·Σ_{q∈Q} min_{p∈P} ‖q − p‖² + (1/|P|)·Σ_{p∈P} min_{q∈Q} ‖p − q‖²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
Optionally, the rejection loss function is:
L_repl(Q) = (1/(|Q|·K))·Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q′ − q‖),
where η(r) = max(0, h² − r²) is a function ensuring that q keeps a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of nearest neighbors of q.
Optionally, the nonlinear mapping loss is constructed as follows:
q is expressed as a soft projection point z using weights w averaged over the k nearest neighbors of q; the specific mathematical formula is:
z = Σ_{p_i∈N_k(q)} w_i·p_i;
next, the Gumbel-Softmax trick is used to optimize the weights w, formulated as:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient controlling the distribution shape of the weights w; as t tends to 0, the points z can be approximated as a proper subset of the input point cloud;
finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection; the specific mathematical formula is:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t used to introduce a nonlinear relationship.
In a second aspect, the present invention provides a task-oriented point cloud data downsampling system, comprising:
the converter module is used for adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
the sampling loss module is used for expanding the coverage range of the downsampling point cloud and the attention capability of a key area based on the sampling loss function and promoting the generation of the point cloud as a proper subset of the original point cloud;
and the task guiding module is used for combining the downsampling module with the task network and updating the weight parameters of the downsampling network by utilizing the sampling loss and the task loss together.
In a third aspect, the present invention provides a computer device comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform a task oriented point cloud data downsampling method as described above.
In a fourth aspect, the present invention provides an electronic device comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform a task oriented point cloud data downsampling method as described above.
In a fifth aspect, the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a task oriented point cloud data downsampling method as described above.
Term interpretation:
Transformer: the transformer is a deep learning framework proposed in 2017 by the Google machine translation team in the paper 'Attention Is All You Need'. In deep learning, the transformer has an encoder-decoder structure comprising three main modules: an input data embedding module (input embedding), a position encoding module (positional encoding), and a self-attention module (self-attention).
Point cloud data: point cloud data in a rail transit system is a set of vectors in a three-dimensional coordinate system acquired by three-dimensional acquisition equipment such as lidar or stereo cameras; each point contains three-dimensional coordinates, and some points also contain information such as color, depth, and reflection intensity.
Downsampling: point cloud data acquired in a rail transit system is often large in scale; for example, a single point cloud frame can contain hundreds of thousands to millions of points, while constraints such as time and energy consumption make it difficult for existing embedded devices to operate directly on data of this scale. Meanwhile, affected by weather, road bumps, illumination changes, and the like, point cloud data often contains a large number of noise points, which may seriously degrade data accuracy and thereby reduce the accuracy of unmanned-driving analysis systems that depend on large-scale data. Therefore, a practical point cloud data processing system often includes a point cloud downsampling operation, i.e., the removal of noise points and redundant points from the point cloud data.
The beneficial effects of the invention are as follows: the resource-intensive structures in the transformer network receive lightweight adjustments, reducing the consumption of computation and storage resources as much as possible; a sampling loss function is designed to obtain proper-subset point cloud data with more uniform point distribution and more comprehensive key-point coverage; and finally, the generic downsampling module is combined with the three-dimensional classification task network, achieving an effective balance between task-network performance optimization and resource-overhead minimization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a process flow diagram of a task oriented point cloud data downsampling method based on a lightweight transformer neural network according to an embodiment of the invention.
Fig. 2 is a specific instantiation structure diagram of a task oriented lightweight transformer network model according to an embodiment of the present invention.
Fig. 3 is a specific exemplary structural diagram of a lightweight auto-correlation attention model according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an exemplary task model for constructing a task oriented object according to an embodiment of the present invention.
Fig. 5 is a point cloud plot of a portion of a training sample and corresponding downsampled point cloud plot according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality. The embodiments described below by way of the drawings are exemplary only and should not be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
In order that the invention may be readily understood, a further description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings and are not to be construed as limiting embodiments of the invention.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of examples and that the elements of the drawings are not necessarily required to practice the invention.
Example 1
Embodiment 1 provides a task-oriented point cloud data downsampling system, comprising:
the transformer module, configured to adjust the resource-intensive structures in a transformer network, removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer;
the sampling loss module, configured to expand the coverage of the downsampled point cloud and its focus on key areas based on the sampling loss function, and to encourage the generated point cloud to be a proper subset of the original point cloud;
and the task guidance module, configured to combine the downsampling module with the task network and to update the weight parameters of the downsampling network jointly with the sampling loss and the task loss.
In embodiment 1, a task-oriented point cloud data downsampling method is implemented using the system, comprising:
adjusting the resource-intensive structures in a transformer network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer; the scaling strategy is used to increase or decrease the scale of the feedforward neural network layer according to the requirements of the actual task, i.e., scaling up enlarges the feedforward neural network and scaling down shrinks it;
based on the sampling loss function, expanding the coverage of the downsampled point cloud and its focus on key areas, and encouraging the generated point cloud to be a proper subset of the original point cloud;
and combining the downsampling module with the task network, and updating the weight parameters of the downsampling network jointly with the sampling loss and the task loss.
Adjusting the resource-intensive structures in the transformer network, removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer comprises the following steps:
constructing a lightweight input data embedding layer;
deleting the mapping steps of the query and key vectors in the autocorrelation attention mechanism, preserving only the dot-product operation on the input data, and constructing a lightweight autocorrelation attention mechanism;
and constructing a scalable feedforward neural network, comprising: according to the size of the dataset and the complexity of the scene, the network scale is expanded based on prior knowledge. The feedforward neural network is a multi-layer perceptron (MLP), and scaling it means adjusting the number of MLP layers and the number of neurons per layer. After the structure is adjusted, testing and fine-tuning are performed until the network converges to a suitable accuracy.
The mathematical formula for the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the rejection loss function, and L_soft denotes the nonlinear mapping loss.
The chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|)·Σ_{q∈Q} min_{p∈P} ‖q − p‖² + (1/|P|)·Σ_{p∈P} min_{q∈Q} ‖p − q‖²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
The rejection loss function is:
L_repl(Q) = (1/(|Q|·K))·Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q′ − q‖),
where η(r) = max(0, h² − r²) is a function ensuring that q keeps a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of nearest neighbors of q.
The nonlinear mapping loss is constructed as follows:
q is expressed as a soft projection point z using weights w averaged over the k nearest neighbors of q; the specific mathematical formula is:
z = Σ_{p_i∈N_k(q)} w_i·p_i;
next, the Gumbel-Softmax trick is used to optimize the weights w, formulated as:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient controlling the distribution shape of the weights w; as t tends to 0, the points z can be approximated as a proper subset of the input point cloud;
finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection; the specific mathematical formula is:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t used to introduce a nonlinear relationship.
Example 2
Embodiment 2 provides a task-oriented point cloud data downsampling method based on a lightweight transformer neural network: the existing transformer network structure is redesigned to simplify the model structure while ensuring, as far as possible, sufficiently strong learning capability.
As shown in fig. 1, the process flow of the method specifically includes the following steps:
step S1: and constructing a lightweight converter model, wherein the model is mainly used for carrying out lightweight adjustment on all modules in a traditional converter network, so that the cost of calculation and storage resources is reduced as much as possible while the learning capacity of the model is ensured. The specific structure is shown in fig. 2.
Step S1-1: building the lightweight input data embedding layer
First, a point cloud dataset for the three-dimensional point cloud classification task is collected by a lidar device and divided into a training set and a test set. The input data embedding layer maps the input point cloud data into a high-dimensional feature space, preparing for subsequent feature extraction. Given point cloud data comprising N points, where each point contains three-dimensional coordinate information, and in contrast to the traditional input data embedding layer built from multiple shared linear layers, a single shared linear layer is used to map the raw data into the high-dimensional feature space, yielding the output F_o ∈ R^{N×d_o}, where F_o denotes the output features and d_o denotes the output feature dimension.
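For illustration only, a minimal PyTorch sketch of such a single shared linear embedding layer follows; the class name, the choice d_o = 128, and the use of nn.Linear are assumptions for exposition, not the patent's implementation:

```python
import torch
import torch.nn as nn

class LightweightEmbedding(nn.Module):
    """One shared linear layer mapping each point's (x, y, z) coordinates to a
    d_o-dimensional feature; the same weights are applied to every point."""
    def __init__(self, d_o: int = 128):  # d_o is an illustrative choice
        super().__init__()
        self.shared_linear = nn.Linear(3, d_o)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> F_o: (B, N, d_o)
        return self.shared_linear(points)
```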
Step S1-2: building lightweight autocorrelation attention module
Self-attention was originally designed for feature extraction in natural language processing. Since word order is one of the important carriers of textual meaning in natural language processing tasks, the input sequence must be converted into vectors by mapping matrices before the attention score is calculated. The traditional multi-head attention model consists of several single-head attention modules performing feature extraction in parallel. The traditional single-head dot-product attention function is formally expressed as:
SA(P) = FC_out(Atten(FC_Q(P), FC_K(P), FC_V(P))),
Atten(Q, K, V) = softmax(Q·K^T/√D)·V, with Q, K ∈ R^{N×(D/a)} and V ∈ R^{N×D},
where P denotes the input point cloud, FC(·) denotes a linear transformation through a projection matrix, Q, K, and V denote the vector representations of the input point cloud after linear transformation, softmax is the activation function, Q·K^T is scaled by √D to improve network stability, and D is the dimension of the Q and K vectors. Note that a is a scaling factor that keeps the computational cost of multi-head attention close to that of the single-head mechanism.
In contrast, a point cloud is unordered: the positions of two points can be interchanged without affecting the point cloud representation. This design therefore deletes the mapping steps of the query vector (Q) and key vector (K) in the traditional self-attention mechanism and preserves only the dot-product operation on the input data. Theoretically, this deletion is better suited to computing the attention score matrix of point cloud data because it satisfies permutation invariance, i.e., a_ij = a_ji, where a denotes the attention score between two points and i and j denote any two different points in the point cloud. In addition, to further reduce the computation and storage overhead of the self-attention mechanism, this embodiment also removes the operation on the value vector (V). In summary, the new calculation operations are all autocorrelation operations on the input data, so this embodiment names this layer the autocorrelation attention layer.
Formally, the autocorrelation attention function is expressed as:
SA(X) = FC_out(softmax(X·X^T/√D)·X),
where X is the output feature of the lightweight input data embedding layer, FC_out(·) denotes a linear transformation through a projection matrix, softmax denotes the normalization function, and D denotes the feature dimension of X. A specific example structure is shown in fig. 3.
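For illustration, a minimal PyTorch sketch of this autocorrelation attention layer follows; the class name and tensor shapes are assumptions, and only the formula above is taken from the text:

```python
import torch
import torch.nn as nn

class AutoCorrelationAttention(nn.Module):
    """Sketch of SA(X) = FC_out(softmax(X·X^T/√D)·X): no Q/K/V projections,
    only the input's own dot products followed by one output projection."""
    def __init__(self, d: int):
        super().__init__()
        self.fc_out = nn.Linear(d, d)
        self.scale = d ** 0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D); the raw score matrix x @ x^T is symmetric (a_ij = a_ji)
        attn = torch.softmax(x @ x.transpose(1, 2) / self.scale, dim=-1)
        return self.fc_out(attn @ x)
```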
Step S1-3: construction of scalable feedforward neural network
To compensate for the reduction in learnable network parameters caused by the lightweight autocorrelation attention module of S1-2, the invention designs a scalable feedforward neural network. Its main feature is that the scale and depth of the feedforward neural network are adjusted dynamically according to the required task output performance, thereby strengthening the learning capability of the network.
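As a sketch of this scaling idea, the hidden-size list below controls both the depth and the width of the feedforward network; the concrete sizes are hypothetical and would in practice be chosen from prior knowledge of the dataset and scene:

```python
import torch
import torch.nn as nn

class ScalableFFN(nn.Module):
    """Feedforward (MLP) block whose depth and width come from a config list,
    so the network can be scaled up or down to match the task's accuracy needs."""
    def __init__(self, d_model: int, hidden_sizes):
        super().__init__()
        layers, prev = [], d_model
        for h in hidden_sizes:                   # e.g. [256] vs. [512, 512]
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        layers.append(nn.Linear(prev, d_model))  # project back to the model width
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```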
Step S2: construction of sampling loss function
Through the operation of step S1, this embodiment obtains high-dimensional point cloud feature data containing rich geometric information. The loss function evaluates the degree to which the model's predictions disagree with the true values; the better the loss function, the higher the performance of the trained model. Therefore, this embodiment designs a sampling loss function comprising a chamfer distance loss function, a rejection loss function, and a nonlinear soft mapping loss.
Step S2-1: chamfer loss function
To ensure that the points generated by the point cloud downsampling network form a proper subset of the original data, the invention first introduces the chamfer loss function L_CD(Q, P):
L_CD(Q, P) = (1/|Q|)·Σ_{q∈Q} min_{p∈P} ‖q − p‖² + (1/|P|)·Σ_{p∈P} min_{q∈Q} ‖p − q‖²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
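A brute-force sketch of this chamfer term for batched point clouds; torch.cdist and the mean reduction are implementation choices, not mandated by the patent:

```python
import torch

def chamfer_loss(Q: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """L_CD(Q, P) for batched clouds: Q (B, M, 3) generated, P (B, N, 3) original.
    Mean squared nearest-neighbour distance in both directions."""
    d = torch.cdist(Q, P)  # (B, M, N) pairwise Euclidean distances
    return d.min(dim=2).values.pow(2).mean() + d.min(dim=1).values.pow(2).mean()
```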
Step S2-2: rejection loss function
The main limitation of the chamfer loss function of step S2-1 is that it ignores the uniform distribution of points, making it difficult for the simplified point set to effectively cover the entire surface and the critical areas of the object. To alleviate this problem, the invention introduces a rejection loss function to encourage uniformity of the generated points and coverage of critical areas. The specific mathematical formula is L_repl(Q):
L_repl(Q) = (1/(|Q|·K))·Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q′ − q‖),
where η(r) = max(0, h² − r²) is a function ensuring that q keeps a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of nearest neighbors of q.
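A sketch of this rejection term under the definitions above; the values of k and h are illustrative, and the mean reduction stands in for the normalization over |Q| and K:

```python
import torch

def rejection_loss(Q: torch.Tensor, k: int = 4, h: float = 0.05) -> torch.Tensor:
    """eta(r) = max(0, h^2 - r^2) over each generated point's k nearest
    neighbours within Q (B, M, 3); k and h are illustrative values."""
    d = torch.cdist(Q, Q)                                       # (B, M, M)
    knn = d.topk(k + 1, dim=-1, largest=False).values[..., 1:]  # drop self distance (0)
    return torch.clamp(h ** 2 - knn ** 2, min=0).mean()
```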
Step S2-3: nonlinear soft mapping loss
There is no guarantee that the set of key points generated by the task-oriented downsampling module is a proper subset of the original point cloud; in that case, the generated point set inevitably loses geometric information. Some studies introduce an additional matching operation, such as nearest neighbor search, mapping each generated point to its nearest point in the original point cloud. However, the matching step limits further improvement of the downsampling model's performance, because conventional matching operations are not differentiable, i.e., they cannot be optimized through network training. An improved matching algorithm is therefore needed.
To solve the above problem, this embodiment proposes a nonlinear soft projection method to realize differentiable matching. First, q is expressed as a soft projection point z using weights w averaged over the k nearest neighbors of q; the specific mathematical formula is:
z = Σ_{p_i∈N_k(q)} w_i·p_i.
Next, the Gumbel-Softmax trick is used to optimize the weights w, formulated as:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient controlling the distribution shape of the weights w. Evidently, as t tends to 0, the point z can be approximated as a proper subset of the input point cloud.
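The soft projection and its temperature-controlled weighting might be sketched as follows; the exact weighting form is only partly specified above, so the squared-distance softmax is an assumption:

```python
import torch

def soft_project(S: torch.Tensor, P: torch.Tensor, t: torch.Tensor, k: int = 7) -> torch.Tensor:
    """Replaces each sampled point in S (B, M, 3) by a softmax-weighted average
    of its k nearest points in the original cloud P (B, N, 3); t is the learnable
    temperature. As t -> 0 the weights peak and z approaches a proper subset of P."""
    d2 = torch.cdist(S, P).pow(2)                           # squared distances (B, M, N)
    nn_d2, idx = d2.topk(k, dim=-1, largest=False)          # k nearest originals per point
    w = torch.softmax(-nn_d2 / t.clamp_min(1e-6), dim=-1)   # temperature-sharpened weights
    neigh = torch.gather(P.unsqueeze(1).expand(-1, S.size(1), -1, -1), 2,
                         idx.unsqueeze(-1).expand(-1, -1, -1, 3))
    return (w.unsqueeze(-1) * neigh).sum(dim=2)             # soft projection points z
```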
Finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection; the specific mathematical formula is:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t used to introduce a nonlinear relationship. To sum up, the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters.
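Putting the three terms together gives a sketch of the total sampling loss; since T(·) is left unspecified, the quadratic placeholder below is purely an assumption:

```python
import torch

def sampling_loss(Q: torch.Tensor, P: torch.Tensor, t: torch.Tensor,
                  alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    """L_sampling = L_CD + alpha * L_repl + beta * L_soft, reusing the sketches
    above. T(t) = t^2 is a hypothetical stand-in for the unspecified T(.)."""
    return chamfer_loss(Q, P) + alpha * rejection_loss(Q) + beta * t.pow(2).sum()
```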
Step S3: constructing a task-oriented target task model
The lightweight transformer neural network is a plug-and-play downsampling module; combined with a three-dimensional classification task network, it forms an end-to-end task-oriented point cloud downsampling model.
An example of the overall network structure of this embodiment is shown in fig. 4. First, step S1 builds the lightweight transformer model for extracting high-dimensional global geometric features of the point cloud. Second, step S2 designs the sampling loss function to optimize the updating of the weight parameters during network training. Finally, the simplified point cloud is input into the task network, and the sampling loss and the task loss jointly optimize the weight updates of the downsampling network. All loss functions are gathered together for minimization:
argmin L_sampling(P, Q) + δ·L_task(Q),
where δ is a balance parameter. A portion of the training samples and the corresponding downsampled point clouds are shown in fig. 5, where the gray points are the original point cloud and the bold points are the downsampled point cloud.
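A schematic training step for this joint objective; sampler, task_net, task_loss_fn, and sampler.t are assumed names for the downsampling network, the classification network, its loss, and the learnable temperature, and the loss sketches above are reused:

```python
import torch

def train_step(sampler, task_net, task_loss_fn, optimizer, points, labels, delta=1.0):
    """One joint update: the sampling loss and the task loss together drive the
    downsampling network, while the pretrained task network stays fixed."""
    Q = soft_project(sampler(points), points, t=sampler.t)  # proper-subset key points
    loss = (sampling_loss(Q, points, t=sampler.t)
            + delta * task_loss_fn(task_net(Q), labels))    # argmin L_sampling + δ·L_task
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```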
In summary, embodiment 2 constructs a lightweight transformer network model consisting of five modules: (1) the lightweight input data embedding layer; (2) the lightweight autocorrelation attention layer; (3) the scalable feedforward neural network; (4) layer normalization; and (5) skip connections. The lightweight input embedding layer maps the point cloud data into a high-dimensional feature space to prepare for subsequent deep feature learning; the lightweight autocorrelation layer extracts refined global feature information; the scalable feedforward neural network introduces a scaling mechanism into the traditional feedforward neural network, increasing the width and depth of the downsampling network under limited resource overhead and raising the number of learnable parameters, thereby improving the learning capability of the downsampling network; and the layer normalization and skip connections optimize the network training process, preventing gradient explosion and overfitting. A sampling loss function is constructed from three losses, the chamfer distance loss, the rejection loss, and the nonlinear soft mapping loss, which act together to improve the training performance of the downsampling network. A task-oriented target task model is constructed: the lightweight transformer neural network proposed in this design is a plug-and-play universal sampling module that can, in principle, be combined with any task network or model requiring point cloud downsampling. Finally, the lightweight transformer network, the sampling loss function, and the task neural network are cascaded to form an end-to-end task-oriented point cloud downsampling model.
Example 3
Embodiment 3 of the present invention provides an electronic device, comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to execute the task-oriented point cloud data downsampling method based on a transformer neural network, the method comprising the following steps:
adjusting the resource-intensive structures in a transformer network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer;
based on the sampling loss function, expanding the coverage of the downsampled point cloud and its focus on key areas, and encouraging the generated point cloud to be a proper subset of the original point cloud;
and combining the downsampling module with the task network, and updating the weight parameters of the downsampling network jointly with the sampling loss and the task loss.
Example 4
Embodiment 4 of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the task-oriented point cloud data downsampling method based on a transformer neural network, the method comprising the following steps:
adjusting the resource-intensive structures in a transformer network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer;
based on the sampling loss function, expanding the coverage of the downsampled point cloud and its focus on key areas, and encouraging the generated point cloud to be a proper subset of the original point cloud;
and combining the downsampling module with the task network, and updating the weight parameters of the downsampling network jointly with the sampling loss and the task loss.
Example 5
Embodiment 5 of the present invention provides a computer device, comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to execute the task-oriented point cloud data downsampling method based on a transformer neural network, the method comprising the following steps:
adjusting the resource-intensive structures in a transformer network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer;
based on the sampling loss function, expanding the coverage of the downsampled point cloud and its focus on key areas, and encouraging the generated point cloud to be a proper subset of the original point cloud;
and combining the downsampling module with the task network, and updating the weight parameters of the downsampling network jointly with the sampling loss and the task loss.
In summary, the point cloud downsampling method based on a lightweight transformer neural network provided by the embodiments of the invention can be used in rail transit, intelligent transportation, unmanned systems, three-dimensional vision robots, augmented reality, smart cities, and point cloud data processing systems. The method comprises: constructing a lightweight autocorrelation attention mechanism to extract refined global geometric information of the point cloud; combining hardware resource requirements with the global geometric information and using the scalable feedforward neural network to generate downsampled point cloud data of a specified scale; optimizing the generated point cloud with the sampling loss function designed by the invention, ensuring that the generated point cloud is a proper subset of the original point cloud and accelerating model convergence; and finally cascading the target task network to complete the specific target task. Using the strong global feature extraction capability of the transformer network and a brand-new lightweight framework, refined downsampling of the original point cloud input is completed. Specifically, the lightweight autocorrelation attention mechanism optimizes the extraction of point cloud geometric feature information while compressing the model's computation and parameter requirements; under limited resource overhead, the lightweight scalable feedforward neural network adjusts the depth and width of the network to strengthen its learning capability; to improve the performance of the point cloud downsampling task, a new sampling loss function is designed so that the lightweight neural network obtains downsampled point cloud data with more uniform point distribution and more comprehensive key-point coverage; and by combining the above modules, a task-oriented plug-and-play point cloud downsampling model is designed, which can be combined with a three-dimensional classification task neural network, keeps the original geometric information of the point cloud as complete as possible while reducing its scale, and achieves an effective balance between performance optimization and resource-overhead minimization for the three-dimensional classification task network.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it should be understood that various changes and modifications could be made by one skilled in the art without the need for inventive faculty, which would fall within the scope of the invention.

Claims (6)

1. A task-oriented point cloud data downsampling method, characterized by comprising the following steps:
adjusting the resource-intensive structures in a transformer network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer;
based on the sampling loss function, expanding the coverage of the downsampled point cloud and its focus on key areas, and encouraging the generated point cloud to be a proper subset of the original point cloud;
combining the downsampling module with a task network, and updating the weight parameters of the downsampling network jointly with the sampling loss and the task loss; the transformer network is a plug-and-play downsampling module and is combined with the three-dimensional classification task network to form an end-to-end task-oriented point cloud downsampling network;
wherein:
the mathematical formula for the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the rejection loss function, and L_soft denotes the nonlinear mapping loss;
the chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|)·Σ_{q∈Q} min_{p∈P} ‖q − p‖² + (1/|P|)·Σ_{p∈P} min_{q∈Q} ‖p − q‖²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud;
the rejection loss function is:
L_repl(Q) = (1/(|Q|·K))·Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q′ − q‖),
where η(r) = max(0, h² − r²) is a function ensuring that q keeps a preset distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of nearest neighbors of q;
the nonlinear mapping loss includes:
q is expressed as a soft projection point z using weights w averaged over the k nearest neighbors of q; the specific mathematical formula is:
z = Σ_{p_i∈N_k(q)} w_i·p_i;
next, the Gumbel-Softmax trick is used to optimize the weights w, formulated as:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient controlling the distribution shape of the weights w; as t tends to 0, the points z are approximated as a proper subset of the input point cloud;
finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection; the specific mathematical formula is:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t used to introduce a nonlinear relationship.
2. The task-oriented point cloud data downsampling method of claim 1, wherein adjusting the resource-intensive structures in the transformer network, removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer comprises:
constructing a lightweight input data embedding layer;
deleting the mapping steps of the query and key vectors in the autocorrelation attention mechanism, preserving only the dot-product operation on the input data, and constructing a lightweight autocorrelation attention mechanism;
and constructing a scalable feedforward neural network.
3. A task oriented point cloud data downsampling system, comprising:
the converter module is used for adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
the sampling loss module is used for expanding the coverage range of the downsampling point cloud and the attention capability of a key area based on the sampling loss function and promoting the generation of the point cloud as a proper subset of the original point cloud;
the task guiding module is used for combining the downsampling module with the task network and updating the weight parameters of the downsampling network by utilizing the sampling loss and the task loss together; the converter network is a plug-and-play downsampling module and is combined with the three-dimensional classification task network to form an end-to-end task-oriented point cloud downsampling network;
wherein:
the mathematical formula for the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the rejection loss function, and L_soft denotes the nonlinear mapping loss;
the chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|)·Σ_{q∈Q} min_{p∈P} ‖q − p‖² + (1/|P|)·Σ_{p∈P} min_{q∈Q} ‖p − q‖²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud;
the rejection loss function is:
L_repl(Q) = (1/(|Q|·K))·Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q′ − q‖),
where η(r) = max(0, h² − r²) is a function ensuring that q keeps a preset distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of nearest neighbors of q;
the nonlinear mapping loss includes:
q is expressed as a soft projection point z using weights w averaged over the k nearest neighbors of q; the specific mathematical formula is:
z = Σ_{p_i∈N_k(q)} w_i·p_i;
next, the Gumbel-Softmax trick is used to optimize the weights w, formulated as:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient controlling the distribution shape of the weights w; as t tends to 0, the points z are approximated as a proper subset of the input point cloud;
finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection; the specific mathematical formula is:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t used to introduce a nonlinear relationship.
4. A computer readable storage medium storing a computer program, which when executed by a processor implements the task oriented point cloud data downsampling method of any of claims 1-2.
5. A computer device comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform the task oriented point cloud data downsampling method of any of claims 1-2.
6. An electronic device comprising a memory and a processor, the processor and the memory in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform the task oriented point cloud data downsampling method of any of claims 1-2.
CN202210689275.5A 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system Active CN115049786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210689275.5A CN115049786B (en) 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210689275.5A CN115049786B (en) 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system

Publications (2)

Publication Number Publication Date
CN115049786A CN115049786A (en) 2022-09-13
CN115049786B true CN115049786B (en) 2023-07-18

Family

ID=83160762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210689275.5A Active CN115049786B (en) 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system

Country Status (1)

Country Link
CN (1) CN115049786B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029022A (en) * 2022-12-23 2023-04-28 内蒙古自治区交通建设工程质量监测鉴定站(内蒙古自治区交通运输科学发展研究院) Three-dimensional visualization temperature field construction method for tunnel and related equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3674870A1 (en) * 2018-12-29 2020-07-01 Dassault Systèmes Learning a neural network for inference of editable feature trees
CN113870160B (en) * 2021-09-10 2024-02-27 北京交通大学 Point cloud data processing method based on transformer neural network
CN114445280B (en) * 2022-01-21 2024-03-29 太原科技大学 Point cloud downsampling method based on attention mechanism

Also Published As

Publication number Publication date
CN115049786A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN110060475B (en) Multi-intersection signal lamp cooperative control method based on deep reinforcement learning
CN113887610B Pollen image classification method based on cross-attention distillation transformer
Mousavi et al. Traffic light control using deep policy‐gradient and value‐function‐based reinforcement learning
CN112435282B (en) Real-time binocular stereo matching method based on self-adaptive candidate parallax prediction network
CN111860340B (en) Efficient K-nearest neighbor search algorithm for unmanned three-dimensional laser radar point cloud
CN111178316A (en) High-resolution remote sensing image land cover classification method based on automatic search of depth architecture
CN113449736B (en) Photogrammetry point cloud semantic segmentation method based on deep learning
CN112347970A (en) Remote sensing image ground object identification method based on graph convolution neural network
CN109389246B (en) Neural network-based vehicle destination area range prediction method
CN115049786B (en) Task-oriented point cloud data downsampling method and system
CN114120115A (en) Point cloud target detection method for fusing point features and grid features
CN113052254A (en) Multi-attention ghost residual fusion classification model and classification method thereof
CN110633706B (en) Semantic segmentation method based on pyramid network
Liu et al. Data augmentation technology driven by image style transfer in self-driving car based on end-to-end learning
Zheng et al. CLMIP: cross-layer manifold invariance based pruning method of deep convolutional neural network for real-time road type recognition
Sun et al. RobNet: real-time road-object 3D point cloud segmentation based on SqueezeNet and cyclic CRF
CN117237623B (en) Semantic segmentation method and system for remote sensing image of unmanned aerial vehicle
CN117576149A (en) Single-target tracking method based on attention mechanism
CN114937153B (en) Visual characteristic processing system and method based on neural network in weak texture environment
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
CN115620238A (en) Park pedestrian attribute identification method based on multivariate information fusion
Yu et al. An incremental learning based convolutional neural network model for large-scale and short-term traffic flow
CN117036698B (en) Semantic segmentation method based on dual feature knowledge distillation
CN117875533B (en) Mining safety escape path planning method and system
Chen et al. Semantic segmentation using generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant