CN113723208B - Three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network

Three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network

Info

Publication number
CN113723208B
CN113723208B (application CN202110895887.5A)
Authority
CN
China
Prior art keywords
point
coordinate system
neural network
model
mesh
Prior art date
Legal status: Active
Application number
CN202110895887.5A
Other languages: Chinese (zh)
Other versions: CN113723208A (en)
Inventor
Lin Zhouchen (林宙辰)
Dong Yiming (董一鸣)
He Lingshen (何翎申)
Wang Yisen (王奕森)
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202110895887.5A
Publication of CN113723208A
Application granted
Publication of CN113723208B
Legal status: Active


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B07: SEPARATING SOLIDS FROM SOLIDS; SORTING
    • B07C: POSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
    • B07C5/00: Sorting according to a characteristic or feature of the articles or material being sorted, e.g. by control effected by devices which detect or measure such characteristic or feature; Sorting by manually actuated devices, e.g. switches
    • B07C5/34: Sorting according to other particular properties
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205: Re-meshing


Abstract

The invention discloses a three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network. A gauge-equivariant Transformer is created, the global coordinate system is projected onto local coordinate systems, and rotation invariance of the model is achieved on the basis of gauge equivariance, so that visual analysis tasks such as classification and recognition of three-dimensional objects can be carried out effectively. A three-dimensional object shape recognition model, GET, based on the gauge-equivariant Transformer neural network is constructed and trained; its input is a 3D object represented as a two-dimensional manifold embedded in three-dimensional space, and its output is the predicted class of the 3D object. The method can efficiently classify and recognize object shapes in 3D image data, improving both the accuracy and the efficiency of object shape classification.

Description

Three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network
Technical Field
The invention belongs to the technical fields of pattern recognition, machine learning, neural networks, deep learning, artificial intelligence and computer graphics. It relates to object shape classification methods, and in particular to a three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network.
Background
In recent years, Transformers have come to dominate algorithms in the field of natural language processing. A significant advantage of the attention mechanism is that it emphasizes the most relevant parts of a given context. Owing to this excellent performance, much current work applies Transformers to other fields of machine learning, such as computer vision and image processing.
Manifold learning is a machine learning technique that applies traditional neural network models to complex, non-Euclidean data structures. Some existing works represent the surface data in a three-dimensional image by two-dimensional projection or by voxel grid points; the drawback of these methods is their excessive computational cost. Other works define convolution directly on the surface, which is more robust to deformations of surfaces in three-dimensional images. The main difficulty of this approach, however, is that the neighborhood of each point on the surface has no canonical coordinate system, so the parameterization of neighboring points cannot be unified, which degrades the performance of the neural network model.
To resolve the ambiguity of the neighborhood coordinate system, equivariant deep learning techniques have been proposed. The success of the widely used convolutional neural network (CNN) is largely due to its translation equivariance, which has motivated researchers to extend this property to other operations such as rotation. Cohen et al. applied gauge equivariance to manifold learning; de Haan et al. proposed a gauge-equivariant neural network based on anisotropic convolution, modifying the anisotropic convolution kernels of a graph convolutional network to satisfy the gauge equivariance condition, so that gauge-equivariant CNNs on manifolds can be applied directly and successfully to meshes. They designed a new activation function satisfying the gauge equivariance condition, the regular nonlinearity, which realizes gauge equivariance via the Fourier transform. However, introducing Fourier transforms incurs additional computational cost; moreover, the three-dimensional shape recognition and classification method proposed by de Haan et al. is based on convolution, which amounts to applying the same attention to all points in a neighborhood, ignoring content-based attention weights, and therefore has lower recognition accuracy.
Disclosure of Invention
To overcome the shortcomings of existing gauge-equivariant object recognition techniques with respect to rotation invariance and the introduction of attention mechanisms, the invention provides a novel three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network. The model is named GET (Gauge Equivariant Transformer) and is used to efficiently perform visual analysis tasks such as classification and recognition of object shapes in 3D image data.
Guided by the mathematics of equivariance, the invention designs the input processing and the Transformer layers of the model so that the whole three-dimensional object shape recognition model GET is simultaneously invariant to spatial rotations and to gauge changes. The input of the whole model is a 3D object represented by a two-dimensional manifold structure in three-dimensional space, and the output is the predicted class of the 3D object.
The technical scheme provided by the invention is as follows:
A three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network, which designs the gauge-equivariant Transformer of the model, designs a method of projecting the global coordinate system onto local coordinate systems, and combines it with gauge equivariance to realize rotation invariance of the model; it is used to efficiently carry out visual analysis such as classification and recognition of three-dimensional objects, and comprises the following steps:
Firstly), meshing the 3D object represented by a manifold structure to generate mesh data;
in practice, 3D object data may be acquired with a 3D camera or directly with existing 3D object data sets. The 3D object mesh data is a discrete data representation of a 3D object consisting of a set of points, edges, faces. For a manifold containing infinite points, we use the furthest point sampling (Farthest Point Sampling) algorithm to obtain a set of a given number of points and form a triangular mesh (triangulated mesh) from these points.
Secondly), preprocessing the mesh data;
21) Normalization
Sum the areas of all triangles of the triangular mesh to obtain the area of the whole mesh, then scale the edges and faces of the mesh so that the total area is normalized to 1.
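The area normalization of step 21) can be sketched as follows (illustrative Python; since area scales quadratically with edge length, the vertices are scaled by the inverse square root of the total area):

```python
import math

def triangle_area(a, b, c):
    # area = half the norm of the cross product of two edge vectors
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy*vz - uz*vy, uz*vx - ux*vz, ux*vy - uy*vx
    return 0.5 * math.sqrt(cx*cx + cy*cy + cz*cz)

def normalize_mesh(vertices, faces):
    """Scale vertex coordinates so the total surface area becomes 1."""
    total = sum(triangle_area(*(vertices[i] for i in f)) for f in faces)
    s = 1.0 / math.sqrt(total)          # length scale, since area ~ length^2
    return [[s * x for x in v] for v in vertices]
```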
22) Determining a neighborhood
A straight-line (Euclidean) threshold and a geodesic threshold are set in advance. According to the position of each point in the mesh in space, for each point i the set of points whose straight-line distance to i is below the straight-line threshold is first found; the geodesic distances from i to these candidate points are then computed with the vector heat method; finally, the points whose geodesic distance is below the geodesic threshold are retained as the neighborhood of point i, denoted n_i.
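The two-stage neighborhood search of step 22) can be sketched as follows. The vector heat method itself is not reproduced here, so `geodesic_dist` is a placeholder supplied by the caller (an assumption of this sketch, not the patent's routine):

```python
import math

def neighborhoods(points, euclid_thresh, geodesic_thresh, geodesic_dist):
    """Two-stage neighborhood search: a cheap Euclidean prefilter,
    then the more expensive geodesic distance on the survivors."""
    nbrs = []
    for i, p in enumerate(points):
        cand = [j for j, q in enumerate(points)
                if j != i and math.dist(p, q) < euclid_thresh]
        nbrs.append([j for j in cand
                     if geodesic_dist(i, j) < geodesic_thresh])
    return nbrs
```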
23) Selecting a local coordinate system
For each point in the 3D object mesh, the tangent plane at the point is computed, and a pair of orthogonal directions in the tangent plane is chosen arbitrarily as the x-axis and y-axis of the local coordinate system. The normal vector of the tangent plane is taken as the z-axis of the local coordinate system.
24) Computing the logarithm maps and connections
Using the vector heat method and the chosen local coordinate systems, compute, for each point i in the mesh and each point j in its neighborhood, the local (logarithm-map) coordinates of j and the connection g_{j→i} from j to i, where g_{j→i} is a two-dimensional rotation matrix.
25) Constructing input features
For each point in the mesh, the coordinates of the point in the global coordinate system are projected onto the local coordinate system selected in step 23), and the projected coordinate values are used as the input of the model.
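Steps 23) and 25) can be sketched as follows (illustrative Python; the tangent frame is chosen arbitrarily, which is precisely the gauge freedom the model must be equivariant to, and the helper vector is an assumption of this sketch):

```python
import math

def normalize(x):
    s = math.sqrt(sum(c * c for c in x))
    return [c / s for c in x]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def local_frame(n):
    """Build an arbitrary orthonormal tangent frame (u, v) for a unit
    normal n; any rotation of (u, v) in the tangent plane is an equally
    valid gauge."""
    helper = [1.0, 0.0, 0.0]
    if abs(sum(a * b for a, b in zip(n, helper))) > 0.9:
        helper = [0.0, 1.0, 0.0]     # avoid a helper nearly parallel to n
    u = normalize(cross(n, helper))
    v = cross(n, u)
    return u, v

def input_feature(x, u, v, n):
    # step 25): global coordinates of the point expressed in the local frame
    d = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return (d(x, u), d(x, v), d(x, n))
```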
Third), dividing the data set into training samples and test samples
Fourth), constructing a gauge-equivariant Transformer;
A Transformer consists of three parts: a key function, a query function and a value function, where key and query are the components of the attention score. The gauge-equivariant Transformer is realized by designing a gauge-invariant attention score and a gauge-equivariant value function.
41) Constructing the architecture of the gauge-equivariant Transformer
Let the input feature field $f$ of the Transformer have dimension $C_{in}$, with group representation denoted $\rho_{in}$, and let the output feature field $f_{out}$ have dimension $C_{out}$, with group representation denoted $\rho_{out}$. The output of the gauge-equivariant Transformer at a point $p$ under gauge $w$, written $f_{out}^{w}(p)$, is defined as:

$$f_{out}^{w}(p)=\mathrm{MHSA}(f^{w})(p)=W_{M}\Big(\big\Vert_{h}\,\mathrm{SA}_{h}(f^{w})(p)\Big),$$

where MHSA is the multi-head attention function, SA is the single-head attention function, $W_{M}$ is a linear transformation matrix, and $\Vert$ is the vector concatenation operator. At head $h$, the output of the SA function is:

$$\mathrm{SA}_{h}(f^{w})(p)=\sum_{u}\alpha_{h}(p,q_{u})\,V_{u}\big(f'^{\,w}(q_{u})\big),$$

where $u$ ranges over the logarithm-map coordinates of the neighborhood of $p$, the point $q_{u}=\exp_{p}w_{p}(u)$, $f'^{\,w}(q_{u})$ is the feature vector at $q_{u}$ parallel-transported to the point $p$ under gauge $w$, and $V_{u}$ is the value function, which encodes the relative position $u$ with a matrix $W_{V}(u)\in\mathbb{R}^{C_{out}\times C_{in}}$:

$$V_{u}\big(f'^{\,w}(q_{u})\big)=W_{V}(u)\,f'^{\,w}(q_{u}).$$

$\alpha_{h}$ is the attention score; for center point $p$ and neighborhood point $q_{u}$ at head $h$ it is the score of 44), normalized over the neighborhood:

$$\alpha_{h}(p,q_{u})=\operatorname{softmax}_{u}\,S\big(K_{h}(f'^{\,w}(q_{u})),\,Q_{h}(f^{w}(p))\big).$$

The constructions of the value function and of the attention score are given in 43) and 44), respectively.
42) Expanded group representation
The group $C_{N}$ consists of the planar rotations whose angles are $\theta_{k}=\frac{2\pi k}{N}$ (where $k$ is an integer between 0 and $N-1$); the regular representation (Regular representation) is a particular group representation of $C_{N}$, defined as in the book Linear Representation of Finite Group. If $\Theta_{k}$ denotes the rotation matrix with rotation angle $\theta_{k}=\frac{2\pi k}{N}$, then $C_{N}$ can be written as $\{\Theta_{0},\Theta_{1},\ldots,\Theta_{N-1}\}$. For an integer $k$ between 0 and $N-1$, the regular representation $\rho_{reg}(\Theta_{k})$ is the $N\times N$ permutation matrix that cyclically shifts all components of a vector by $k$ units.

The regular representation can be decomposed into irreducible representations,

$$\rho_{reg}(\Theta_{k})=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta_{k})\Big)A,$$

where $A$ is an $N\times N$ invertible matrix. When $N$ is an odd number, the irreducible representations are:

$$\psi_{0}(\Theta)=1,\qquad \psi_{i}(\Theta)=\begin{pmatrix}\cos(i\theta)&-\sin(i\theta)\\ \sin(i\theta)&\cos(i\theta)\end{pmatrix},\quad i=1,\ldots,\tfrac{N-1}{2},$$

where $\theta\in[0,2\pi)$ is the rotation angle corresponding to the matrix $\Theta$.

Further, the invention extends these irreducible representations to the two-dimensional rotation group SO(2), applying the formulas above with $\theta$ the angle of an arbitrary rotation $\Theta\in SO(2)$. This yields the expanded group representation

$$\rho(\Theta)=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta)\Big)A,$$

which realizes the property that any vector in space can be parallel-transported without losing rotation angle information.
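The regular representation of $C_N$ can be checked numerically. The sketch below (illustrative, pure Python) builds the cyclic-shift permutation matrices and verifies both the shifting behavior and the homomorphism property $\rho_{reg}(\Theta_a)\rho_{reg}(\Theta_b)=\rho_{reg}(\Theta_{(a+b)\bmod N})$:

```python
def regular_rep(k, N):
    """rho_reg(Theta_k): the N x N permutation matrix of the regular
    representation of C_N; it cyclically shifts vector components by k."""
    return [[1.0 if (i - k) % N == j else 0.0 for j in range(N)]
            for i in range(N)]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(B))]
```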
43) Constructing the gauge-equivariant value function, which encodes the relative position u with a matrix
The value function is defined as left-multiplication of the parallel-transported feature vector by the encoding matrix $W_{V}(u)$. The necessary and sufficient condition for gauge equivariance of the value function is

$$W_{V}(\Theta^{-1}u)=\rho_{out}(\Theta^{-1})\,W_{V}(u)\,\rho_{in}(\Theta).$$

This equation is solved by Taylor-expanding $W_{V}$:

$$W_{V}(u)=W_{0}+u_{1}W_{1}+u_{2}W_{2}+u_{1}^{2}W_{3}+u_{1}u_{2}W_{4}+u_{2}^{2}W_{5}+\cdots$$

Substituting this expansion into the necessary and sufficient condition for gauge equivariance yields a linear system of equations:

$$W_{0}=\rho_{out}(\Theta^{-1})\,W_{0}\,\rho_{in}(\Theta),$$
$$\cos(\theta)W_{1}-\sin(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{1}\,\rho_{in}(\Theta),$$
$$\sin(\theta)W_{1}+\cos(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{2}\,\rho_{in}(\Theta),$$
$$\cdots$$

There are infinitely many equations in this linear system; the number of equations can be limited by truncating the Taylor expansion. When truncated at order 2, the system contains only 6 equations and can be solved efficiently with existing program libraries. After solving, a set of basis solutions $\{W^{(i)}\}_{i=1}^{m}$ is obtained, where $m$ is the dimension of the solution space. Each $W^{(i)}$ consists of six parts, i.e. $W^{(i)}=(W_{0}^{(i)},W_{1}^{(i)},\ldots,W_{5}^{(i)})$, so that the equivariant encoding matrix $W^{(i)}(u)$ has the form:

$$W^{(i)}(u)=W_{0}^{(i)}+u_{1}W_{1}^{(i)}+u_{2}W_{2}^{(i)}+u_{1}^{2}W_{3}^{(i)}+u_{1}u_{2}W_{4}^{(i)}+u_{2}^{2}W_{5}^{(i)}.$$

Any linear combination $\sum_{i}c_{i}W^{(i)}$ of these $W^{(i)}$ still satisfies the necessary and sufficient condition for gauge equivariance of the value function. During training, the coefficients $c_{i}$ are learnable parameters.
44) Constructing the gauge-invariant attention score
In implementation, the manifold is generally discretized into a mesh structure. The key and query functions adopt the architecture of the Graph Attention Network described in the literature (Velickovic et al.), i.e. $K(f)=W_{K}f$ and $Q(f)=W_{Q}f$ with linear transformation matrices $W_{K}$ and $W_{Q}$. The score function (score function) adopts the architecture $S(K(\cdot),Q(\cdot))=P(\mathrm{ReLU}(K(\cdot)+Q(\cdot)))$. Here ReLU is the component-wise activation function and $P$ is the average pooling function. The linear transformation matrices $W_{K}$ and $W_{Q}$ must also satisfy the first equation of the linear system in 43). After activation and pooling, the computed attention score is gauge-invariant.
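The score function $S(K,Q)=P(\mathrm{ReLU}(K+Q))$ can be sketched directly. Under the regular representation a gauge change cyclically shifts feature components, and the average pooling $P$ is unaffected by such a shift, which is the invariance claimed above (illustrative sketch):

```python
def attention_score(k_feat, q_feat):
    """S(K, Q) = P(ReLU(K + Q)): component-wise ReLU of K + Q followed
    by average pooling P over the components. Averaging over all
    components makes the score invariant to the cyclic component shift
    induced by a gauge change under the regular representation."""
    vals = [max(ki + qi, 0.0) for ki, qi in zip(k_feat, q_feat)]
    return sum(vals) / len(vals)
```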
45) Realizing rotation invariance
Rotation invariance is achieved by projecting the coordinates of the points on the manifold from the global coordinate system onto the local coordinate systems. Let $x_{p}$ be the coordinate value of the point $p$ in the global coordinate system, $n_{p}$ the normal vector at $p$, and let the gauge $w_{p}$ at $p$ be determined by the two coordinate directions $u_{p}$ and $v_{p}$. The coordinate values obtained after projection onto the local coordinate system are $X_{p}=(\langle x_{p},u_{p}\rangle,\langle x_{p},v_{p}\rangle,\langle x_{p},n_{p}\rangle)$; such an $X_{p}$ is invariant to rotations of the global coordinate system.
Fifth), constructing the three-dimensional object shape recognition model GET based on the gauge-equivariant Transformer neural network
An existing convolutional neural network architecture, such as ResNet, can be used for the construction, with the convolutional layers replaced by the Transformer constructed in step four). After the output of the last Transformer layer, a group pooling layer (described in Cohen et al.) is added so that the gauge-equivariant output is pooled into a gauge-invariant output; finally, the prediction score of each class is obtained through a global average pooling layer and a fully connected layer.
The Transformer layers constructed in the invention are initialized with the initialization method described by Weiler (in the paper Learning Steerable Filters for Rotation Equivariant CNNs), and the fully connected layer is initialized with the Xavier initialization method. The entire network structure may be implemented using PyTorch.
In the specific implementation of the invention, the model is trained with back-propagation and the Adam algorithm. The training process iterates for 50 epochs with a batch size of 1, i.e. one mesh per batch. The initial learning rate is 0.01; the learning rate for epochs 31 to 40 is 0.001, and for epochs 41 to 50 it is 0.0005.
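The learning-rate schedule described above can be written as a small helper (illustrative; epochs are taken as 1-indexed):

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the text: 0.01 for epochs 1-30,
    0.001 for epochs 31-40, 0.0005 for epochs 41-50."""
    if epoch <= 30:
        return 0.01
    if epoch <= 40:
        return 0.001
    return 0.0005
```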
Sixth), recognizing the 3D object mesh sample data to be tested using the GET model constructed and trained in step five), obtaining the predicted shape recognition label.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a novel three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network, creating a three-dimensional object shape recognition model GET that is invariant to spatial rotations and to gauge changes. The input of the model is a 3D object represented by a two-dimensional manifold structure in three-dimensional space, and the output is the predicted class of the 3D object. The method can effectively carry out visual analysis such as classification and recognition of object shapes in 3D image data, improving both the accuracy and the efficiency of object shape classification.
Drawings
FIG. 1 is a block diagram of a 3D object shape classification model GET network constructed in accordance with an embodiment of the invention.
Detailed Description
The invention is further described below by way of examples with reference to the accompanying drawings, which in no way limit the scope of the invention.
The invention provides a novel three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network and constructs a 3D object shape classification model GET, whose network structure, shown in FIG. 1, comprises: an input layer, an attention mechanism layer, a group pooling layer, a global average pooling layer and a fully connected layer. The input layer converts the data from the original 3D coordinate system into the projected input through steps 23), 24) and 25); the attention mechanism layer is the Transformer layer constructed in step 4; the group pooling layer, first proposed by Weiler et al. in General E(2)-Equivariant Steerable CNNs, takes the maximum value of the components of the feature vector under a given gauge in the feature field, yielding a gauge-invariant one-dimensional feature field. The global average pooling layer averages the values of all feature vectors in the feature field under a given gauge to generate a global feature vector. The fully connected layer is the fully connected layer of a deep neural network. The following example recognizes and classifies three-dimensional object shapes with the gauge-equivariant Transformer neural network for 3D objects of a 3D object dataset (the SHREC11 dataset; see the document "Shape Retrieval on Non-rigid 3D Watertight Meshes"). The method comprises the following steps:
step 1: gridding the data in the 3D object data set to generate 3D object mesh data; a set of a given number of points is obtained using a furthest point sampling algorithm and a triangular mesh is formed from the points.
Step 2: preprocessing 3D object mesh data;
21) Normalization
Sum the areas of all triangles of the triangular mesh to obtain the area of the whole mesh, then scale the edges and faces of the triangular mesh accordingly so that the total area is normalized to 1.
22) Determining a neighborhood
A straight-line (Euclidean) threshold and a geodesic threshold are set in advance. According to the position of each point in the 3D object mesh data in space, for each point i the set of points whose straight-line distance to i is below the straight-line threshold is first found; the geodesic distances from i to these candidate points are then computed with the vector heat method; finally, the points whose geodesic distance is below the geodesic threshold are retained as the neighborhood of point i, denoted n_i.
23) Selecting a local coordinate system
For each point in the mesh, the tangent plane at the point is computed, and a pair of orthogonal directions in the tangent plane is chosen arbitrarily as the x-axis and y-axis of the local coordinate system. The normal vector of the tangent plane is taken as the z-axis of the local coordinate system.
24) Computing the logarithm maps and connections
Using the vector heat method and the chosen local coordinate systems, compute, for each point i in the 3D object mesh data and each point j in its neighborhood, the local (logarithm-map) coordinates of j and the connection g_{j→i} from j to i, where g_{j→i} is a two-dimensional rotation matrix.
25) Constructing input features
For each point in the 3D object mesh data, the coordinates of the point under the global coordinate system are projected onto the local coordinate system selected in step 23), and the projected coordinate values are used as the input of the model.
Step 3: dividing the dataset into a training sample and a test sample;
the present example uses a data set, namely a 3D object shape data set SHREC11, which is obtained by computer graphics modeling of objective entities from well known databases, such as PSB, mcGill, etc., and comprises 30 classes of 3D object shapes, each 3D object shape having 20 samples, and we divide the data set into a training set and a test set, each class of which is 10 3D object shape samples.
Step 4: construction specification-alike transformers
One transducer consists of three parts, a key function, a query function and a value function. Wherein key and query are components of attention score (attention score). We implement canonical alike transformers by designing invariant saturation score and alike value functions.
41 Overall architecture)
Suppose the dimension of the transducer input feature field is C in The group is denoted as ρ in The dimension of the output feature domain is C out The group is denoted as ρ out The output of the transformation of the specification and the like at the point p under the specification w is defined as follows:
wherein MHSA is a multi-head attention function, SA is a single-head attention function, W M Is a linear transformation matrix, || is a vector concatenation operation operator. At the head h, the output of the SA function is:
wherein ,point q u =exp p w p (u),f w ′(q u ) For point q u The eigenvectors at that point are shifted in parallel to the value of point p under specification w, V u As a value function, it uses a matrix for the relative position u>Encoding is performed with the following expression:
alpha is an attribute score expressed as follows:
we propose the procedure of constructing the value function and the attention score in 43) and 44), respectively.
42) Expanded group representation
The group $C_{N}$ consists of the planar rotations whose angles are $\theta_{k}=\frac{2\pi k}{N}$ (where $k$ is an integer between 0 and $N-1$); the regular expression (Regular representation) is a particular group representation of $C_{N}$. The precise definition and specific description of the regular representation is given in Linear Representation of Finite Group. If $\Theta_{k}$ denotes the rotation matrix with rotation angle $\theta_{k}=\frac{2\pi k}{N}$, then $C_{N}$ can be written as $\{\Theta_{0},\Theta_{1},\ldots,\Theta_{N-1}\}$. For an integer $k$ between 0 and $N-1$, the regular representation $\rho_{reg}(\Theta_{k})$ is the $N\times N$ permutation matrix that cyclically shifts all components of a vector by $k$ units.

The regular representation can be decomposed into irreducible representations,

$$\rho_{reg}(\Theta_{k})=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta_{k})\Big)A,$$

where $A$ is an $N\times N$ invertible matrix. When $N$ is an odd number, the irreducible representations are:

$$\psi_{0}(\Theta)=1,\qquad \psi_{i}(\Theta)=\begin{pmatrix}\cos(i\theta)&-\sin(i\theta)\\ \sin(i\theta)&\cos(i\theta)\end{pmatrix},\quad i=1,\ldots,\tfrac{N-1}{2},$$

where $\theta\in[0,2\pi)$ is the rotation angle corresponding to the matrix $\Theta$. We extend these irreducible representations to the two-dimensional rotation group SO(2), applying the same formulas with $\theta$ the angle of an arbitrary rotation $\Theta\in SO(2)$. Thus we obtain the expanded group representation

$$\rho(\Theta)=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta)\Big)A,$$

which realizes the property that any vector in space can be parallel-transported without losing rotation angle information.
43) Gauge-equivariant value function
We define the value function as left-multiplication of the parallel-transported feature vector by the encoding matrix $W_{V}(u)$. The necessary and sufficient condition for gauge equivariance of the value function is

$$W_{V}(\Theta^{-1}u)=\rho_{out}(\Theta^{-1})\,W_{V}(u)\,\rho_{in}(\Theta).$$

We solve this equation by Taylor-expanding $W_{V}$:

$$W_{V}(u)=W_{0}+u_{1}W_{1}+u_{2}W_{2}+u_{1}^{2}W_{3}+u_{1}u_{2}W_{4}+u_{2}^{2}W_{5}+\cdots$$

Substituting this expansion into the necessary and sufficient condition for gauge equivariance yields a linear system of equations:

$$W_{0}=\rho_{out}(\Theta^{-1})\,W_{0}\,\rho_{in}(\Theta),$$
$$\cos(\theta)W_{1}-\sin(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{1}\,\rho_{in}(\Theta),$$
$$\sin(\theta)W_{1}+\cos(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{2}\,\rho_{in}(\Theta),$$
$$\cdots$$

There are infinitely many equations in this linear system; we can truncate the Taylor expansion to limit the number of equations. When truncated at order 2, the system contains only 6 equations and can be solved efficiently with existing program libraries. After solving, we obtain a set of basis solutions $\{W^{(i)}\}_{i=1}^{m}$, where $m$ is the dimension of the solution space. Each $W^{(i)}$ consists of six parts, i.e. $W^{(i)}=(W_{0}^{(i)},W_{1}^{(i)},\ldots,W_{5}^{(i)})$, so that the equivariant encoding matrix $W^{(i)}(u)$ has the form:

$$W^{(i)}(u)=W_{0}^{(i)}+u_{1}W_{1}^{(i)}+u_{2}W_{2}^{(i)}+u_{1}^{2}W_{3}^{(i)}+u_{1}u_{2}W_{4}^{(i)}+u_{2}^{2}W_{5}^{(i)}.$$

Any linear combination $\sum_{i}c_{i}W^{(i)}$ of these $W^{(i)}$ still satisfies the equivariance condition. During training, the coefficients $c_{i}$ may be designed as learnable parameters.
44) Gauge-invariant attention score
In the implementation, the manifold is generally discretized into a mesh structure. Here we choose key and query functions with the same architecture as the Graph Attention Network proposed by Velickovic et al., i.e. $K(f)=W_{K}f$ and $Q(f)=W_{Q}f$ with linear transformation matrices $W_{K}$ and $W_{Q}$. The score function (score function) likewise follows the Graph Attention Network, i.e. $S(K(\cdot),Q(\cdot))=P(\mathrm{ReLU}(K(\cdot)+Q(\cdot)))$. Here ReLU is the component-wise activation function and $P$ is the average pooling function. The linear transformation matrices $W_{K}$ and $W_{Q}$ must also satisfy the first equation of the linear system in 43). After activation and pooling, the computed attention score is gauge-invariant.
45) Realizing rotation invariance
Rotation invariance is achieved by projecting the coordinates of the points on the manifold from the global coordinate system onto the local coordinate systems. Let $x_{p}$ be the coordinate value of the point $p$ in the global coordinate system, $n_{p}$ the normal vector at $p$, and let the gauge $w_{p}$ at $p$ be determined by the two coordinate directions $u_{p}$ and $v_{p}$. The coordinate values obtained after projection onto the local coordinate system are $X_{p}=(\langle x_{p},u_{p}\rangle,\langle x_{p},v_{p}\rangle,\langle x_{p},n_{p}\rangle)$; such an $X_{p}$ is invariant to rotations of the global coordinate system.
Step 5: three-dimensional object shape recognition model GET for constructing neural network based on standard and other transformation scales
An existing convolutional neural network architecture, such as ResNet, may be used in the construction to change the convolutional layer therein to the Transformer introduced in step 4. And adding a group pooling layer proposed by Cohen et al after the output of the last convolution layer so as to pool the standard-unchanged output into a standard-unchanged output, and finally obtaining the prediction score of each category through a global average pooling layer and a full connection layer.
The transducer layer in the present invention uses the initialization method set forth in the paper Learning Steerable Filters for Rotation Equivariant CNNs by Weiler, and the full connectivity layer uses the Xavier initialization method. The entire network structure may be implemented using a pytorch.
In the specific implementation of the invention, a back propagation algorithm is adopted, and the model is trained by an Adam algorithm. The training process iterates 50 rounds with a batch size of 1, i.e., one batch per mesh. The initial learning rate was 0.01, the 31 to 40 rounds of learning rate was 0.001, and the 41 to 50 rounds of learning rate was 0.0005.
Step 6: The test mesh samples are recognized using the GET model constructed and trained in step 5, obtaining the predicted shape recognition labels. Table 1 compares the image classification accuracy of the model GET of the invention and other models on the SHREC dataset:
TABLE 1. Image classification accuracy of the model GET of the invention and other models on the SHREC dataset

Model     SO(3) invariance    Accuracy    Number of parameters
MDGCNN    No                  82.2%       ---
MeshCNN   Yes                 91.0%       ---
HSN       No                  96.6%       78k
GET       Yes                 99.2%       11k
The table above shows that the model proposed in this patent achieves better performance with a smaller number of parameters, which benefits from the model's rotation invariance and gauge equivariance. The convolution kernel in MDGCNN is scalar-valued, which greatly limits its performance. The previously best gauge-equivariant network, HSN, has no rotation invariance; compared with our model, its performance is lower and it requires more parameters.
It should be noted that the purpose of the disclosed embodiments is to aid further understanding of the present invention, but those skilled in the art will appreciate that: various alternatives and modifications are possible without departing from the scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments, but rather the scope of the invention is defined by the appended claims.

Claims (9)

1. A three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network, which creates a gauge-equivariant Transformer, projects the global coordinate system onto local coordinate systems to achieve rotation invariance on top of gauge equivariance, and is used for efficient classification, recognition, and visual analysis of three-dimensional objects; the method comprises the following steps:
1) Meshing the 3D object data represented by a manifold structure to generate 3D object mesh data;
2) Preprocessing the 3D object mesh data, comprising: normalization; neighborhood determination; local coordinate system selection; computation of the logarithm map and the connection; and construction of the model input features, in which the coordinates of each point in the mesh under the global coordinate system are projected onto its local coordinate system to serve as the model input features;
3) Dividing the 3D object mesh data set into training samples and test samples;
4) Constructing a gauge-equivariant Transformer;
The Transformer comprises a key function, a query function, and a value function, where the attention score is computed from keys and queries; the gauge-equivariant Transformer is realized by designing a gauge-invariant attention score and a gauge-equivariant value function, as follows:
41) Constructing the gauge-equivariant Transformer architecture;
Let the dimension of the Transformer input feature field f be C_in with group representation ρ_in, and let the output feature field f_out have dimension C_out with group representation ρ_out. Under a gauge w, the output f_out^w(p) of the gauge-equivariant Transformer at a point p is defined as:

f_out^w(p) = MHSA(f)^w(p) = W_M ( ‖_{h=1}^{H} SA_h(f)^w(p) ),

where MHSA is the multi-head self-attention function, SA is the single-head self-attention function, W_M is a linear transformation matrix, and ‖ is the vector concatenation operator. At head h, the output of the SA function is:

SA_h(f)^w(p) = Σ_u α_h(p, q_u) V_u( f'_w(q_u) ),

where the point q_u = exp_p(w_p(u)), f'_w(q_u) is the feature vector at point q_u parallel-transported to point p under gauge w, and V_u is the value function, which encodes the relative position u with a matrix W_V(u):

V_u( f'_w(q_u) ) = W_V(u) f'_w(q_u),

and α_h is the attention score; with center point p and neighborhood point q_u, its expression at head h is:

α_h(p, q_u) = softmax_u( S( K(f'_w(q_u)), Q(f^w(p)) ) ).
42) Expanding the regular representation;
C_N is the group formed by all planar rotations with radian values 2πk/N, where k is an integer between 0 and N−1; the regular representation is a special group representation of C_N. If Θ_k denotes the rotation matrix with rotation angle 2πk/N, then C_N can be expressed as {Θ_0, Θ_1, …, Θ_{N−1}}. For integer k, the regular representation ρ_reg(Θ_k) is an N×N permutation matrix that cyclically shifts all components of a vector by k units;
ρ_reg(Θ_k) can be decomposed as ρ_reg(Θ_k) = A ψ(Θ_k) A^{-1}, where ψ(Θ_k) is the direct sum of the irreducible representations of C_N and A is an N×N invertible matrix. When N is an odd number, the irreducible representations are:

ψ(Θ) = diag( ψ_0(θ), ψ_1(θ), …, ψ_{(N−1)/2}(θ) ),  with ψ_0(θ) = 1 and, for 1 ≤ j ≤ (N−1)/2,

ψ_j(θ) = [ cos(jθ), −sin(jθ) ; sin(jθ), cos(jθ) ],

where θ ∈ [0, 2π) is the rotation angle corresponding to the matrix Θ, namely:

Θ = [ cos(θ), −sin(θ) ; sin(θ), cos(θ) ],

and θ = 2πk/N for Θ = Θ_k.
Further, the irreducible representations are extended to the two-dimensional rotation group SO(2) by letting the angle θ range over the whole interval [0, 2π), i.e., the extended group representation is:

ρ̃(Θ) = A ψ(Θ) A^{-1},  Θ ∈ SO(2),

thereby any vector in the feature space can be parallel-transported without losing rotation angle information;
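As a sanity check on the extension above (an illustration, not the patent's code), each 2×2 irreducible block ψ_j(θ) is a rotation by jθ, so it satisfies ψ_j(θ_1) ψ_j(θ_2) = ψ_j(θ_1 + θ_2) for arbitrary real angles, not only the discrete angles 2πk/N:

```python
import math

def irrep_block(j, theta):
    # 2x2 irreducible representation psi_j of SO(2): rotation by j * theta.
    c, s = math.cos(j * theta), math.sin(j * theta)
    return [[c, -s], [s, c]]

def matmul2(A, B):
    # Product of two 2x2 matrices.
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]
```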
43) Constructing a gauge-equivariant value function that encodes the relative position u with a matrix;
The value function is defined as left-multiplication of the parallel-transported feature vector by an encoding matrix W_V(u); the necessary and sufficient condition for the value function to be gauge-equivariant is W_V(Θ^{-1}u) = ρ_out(Θ^{-1}) W_V(u) ρ_in(Θ). To solve this equation, W_V is Taylor-expanded, namely:

W_V(u) = W_0 + u_1 W_1 + u_2 W_2 + (higher-order terms).

Substituting this expansion into the necessary and sufficient condition for gauge equivariance and matching coefficients yields a system of linear equations, expressed (to first order) as:
W_0 = ρ_out(Θ^{-1}) W_0 ρ_in(Θ),
cos(θ) W_1 − sin(θ) W_2 = ρ_out(Θ^{-1}) W_1 ρ_in(Θ),
sin(θ) W_1 + cos(θ) W_2 = ρ_out(Θ^{-1}) W_2 ρ_in(Θ),
the number of equations in the linear system of equations may be limited by truncating the term of the taylor expansion; solving the linear equation set to obtain a set of basesWhere m is the dimension of the solution space; each +.>Comprises->Coding matrix W to be constant (i) Expressed as:
W (i) is a linear combination of Σc i W (i) Still meets the sufficient and necessary conditions of Value function specification and the like; during training process, c i Is a learnable parameter;
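The zeroth-order constraint can be solved numerically by vectorization. The sketch below is an illustration under a simplifying assumption: ρ_in = ρ_out is taken to be the plain 2×2 rotation representation (not the regular representation actually used in the network). Using vec(A W B) = (Bᵀ ⊗ A) vec(W), each group element of C_8 contributes a block of linear equations, and the null space of the stacked system is the space of admissible W_0:

```python
import numpy as np

def rot(theta):
    # 2x2 rotation matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Constraint: W0 = rho_out(Theta^-1) W0 rho_in(Theta) for every Theta in C_N.
# vec(A W B) = (B^T kron A) vec(W), so each group element contributes
# ((rho_in(Theta)^T kron rho_out(Theta^-1)) - I) vec(W0) = 0.
N = 8
blocks = []
for k in range(N):
    theta = 2 * np.pi * k / N
    blocks.append(np.kron(rot(theta).T, rot(-theta)) - np.eye(4))
M = np.vstack(blocks)

# Null space of M = space of admissible W0. With the rotation representation
# on both sides, these are the matrices commuting with all 2D rotations,
# spanned by the identity and the 90-degree rotation: dimension m = 2.
_, s, Vt = np.linalg.svd(M)
null_dim = int(np.sum(s < 1e-10))
basis = Vt[len(s) - null_dim:]
```

For this choice of representations the solution space has dimension 2; with the regular or extended representations of the claims, the same vectorization recipe applies, only with larger matrices.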
44) Constructing a gauge-invariant attention score;
The key function and the query function are linear maps: K(·) = W_K(·) and Q(·) = W_Q(·), where W_K and W_Q are linear transformation matrices.
The scoring function takes S(K(·), Q(·)) = P(ReLU(K(·) + Q(·))), where ReLU is a component-wise activation function and P is an average pooling function.
Since a gauge change acts on the features by permuting their components, after the component-wise activation and the pooling the computed attention score is gauge-invariant;
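A minimal sketch of this invariance (illustrative values, not the patent's code): a gauge change cyclically shifts the components of both key and query, ReLU acts component-wise, and the average is permutation-invariant, so the score is unchanged:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def score(key, query):
    # S(K, Q) = P(ReLU(K + Q)) with P = average pooling over components.
    s = relu([k + q for k, q in zip(key, query)])
    return sum(s) / len(s)

def cyclic_shift(v, k):
    # Gauge change on regular-representation features: cyclic shift.
    n = len(v)
    return [v[(i - k) % n] for i in range(n)]
```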
45) Achieving rotation invariance by projecting the coordinates of points on the manifold from the global coordinate system onto local coordinate systems;
Let x_p be the coordinate vector of point p in the global coordinate system and n_p the normal vector at point p; the gauge w_p at point p is determined by two tangent coordinate directions u_p and v_p. The coordinate values in the local coordinate system obtained after projection are expressed as:

X_p = ( ⟨x_p, u_p⟩, ⟨x_p, v_p⟩, ⟨x_p, n_p⟩ ),

and X_p is invariant to rotations of the global coordinate system;
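The invariance can be checked directly (an illustrative sketch with a rotation about the z-axis; a global rotation acts on both the point coordinates and the local frame, and dot products are preserved):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(x, u, v, n):
    # X_p = (<x,u>, <x,v>, <x,n>): coordinates of x in the local frame (u, v, n).
    return (dot(x, u), dot(x, v), dot(x, n))

def rot_z(p, theta):
    # Rotate a 3D point about the z-axis by theta.
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])
```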
5) Constructing and training the three-dimensional object shape recognition model GET, a neural network based on the gauge-equivariant Transformer; the input of the model is a 3D object represented by a two-dimensional manifold embedded in three-dimensional space, and the output is the predicted class of the 3D object;
adopting the Transformer constructed in step 4) in place of the convolutional layers of a convolutional neural network model; adding a group pooling layer after the output of the last Transformer layer to pool the gauge-equivariant output into a gauge-invariant output, and obtaining the prediction score of each category through a global average pooling layer and a fully connected layer;
initializing the constructed Transformer layers, and training the model to obtain the trained GET model;
6) Recognizing the 3D object mesh sample data to be recognized with the GET model constructed and trained in step 5) to obtain the predicted 3D object shape label, thereby realizing three-dimensional object shape recognition based on the gauge-equivariant Transformer neural network.
2. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein in step 1) the 3D object data are obtained with a 3D camera or taken directly from an existing 3D object data set; a set of points is obtained with the farthest point sampling algorithm, and a triangular mesh is formed from these points; the discrete data of the 3D object, consisting of a set of points, edges, and faces, are used to generate the 3D object mesh data.
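A minimal sketch of the greedy farthest point sampling mentioned above (illustrative; the starting index and distance metric are assumptions, since the claim does not fix them):

```python
import math

def farthest_point_sampling(points, m):
    # Greedy FPS: start from the first point, then repeatedly add the point
    # farthest (in Euclidean distance) from the already-selected set.
    chosen = [0]
    d = [math.dist(points[0], p) for p in points]
    for _ in range(m - 1):
        nxt = max(range(len(points)), key=lambda i: d[i])
        chosen.append(nxt)
        # Keep, for each point, its distance to the nearest chosen point.
        for i, p in enumerate(points):
            d[i] = min(d[i], math.dist(points[nxt], p))
    return chosen
```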
3. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein step 2) preprocesses the 3D object mesh data through the following steps:
21) Normalization: summing the areas of all triangles of the triangular mesh to obtain the total mesh area, and then scaling the edges and faces of the mesh by this area so that the total area is normalized to 1;
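A minimal sketch of this normalization (illustrative; since areas scale quadratically with length, scaling coordinates by 1/√(total area) makes the total area 1):

```python
import math

def tri_area(a, b, c):
    # Triangle area = half the norm of the cross product of two edge vectors.
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cx = (ab[1] * ac[2] - ab[2] * ac[1],
          ab[2] * ac[0] - ab[0] * ac[2],
          ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(cx[0] ** 2 + cx[1] ** 2 + cx[2] ** 2)

def normalize_area(verts, faces):
    # Scale all vertex coordinates so the total mesh area becomes 1.
    total = sum(tri_area(verts[i], verts[j], verts[k]) for i, j, k in faces)
    s = 1.0 / math.sqrt(total)
    return [[s * c for c in v] for v in verts]
```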
22) Neighborhood determination:
a geodesic threshold and a straight-line threshold are set in advance; according to the spatial position of each point in the mesh data, the set of points whose straight-line distance to the point is smaller than the set straight-line threshold is found; for each point i in the mesh, the geodesic distances to these candidate points are computed, and the points whose geodesic distance is smaller than the corresponding geodesic threshold are selected as the neighborhood of point i, denoted n_i;
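The following sketch illustrates this two-stage filtering. It approximates geodesic distances by Dijkstra shortest paths over the mesh edge graph, which is a simplification (the patent computes geodesic distances with the vector heat method instead); `adj` is an assumed vertex-adjacency structure:

```python
import heapq
import math

def neighborhood(i, verts, adj, line_thr, geo_thr):
    # Candidates: points within the straight-line threshold of point i.
    candidates = {j for j in range(len(verts))
                  if j != i and math.dist(verts[i], verts[j]) < line_thr}
    # Dijkstra over mesh edges as a stand-in for true geodesic distance.
    dist = {i: 0.0}
    pq = [(0.0, i)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf) or d > geo_thr:
            continue
        for w in adj[u]:
            nd = d + math.dist(verts[u], verts[w])
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(pq, (nd, w))
    # Keep candidates whose (approximate) geodesic distance is below threshold.
    return {j for j in candidates if dist.get(j, math.inf) < geo_thr}
```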
23) Local coordinate system selection: for each point of the 3D object mesh, the tangent plane at the point is computed; two orthogonal directions in the tangent plane are chosen arbitrarily as the x-axis and y-axis of the local coordinate system, and the normal vector of the tangent plane is taken as the z-axis of the local coordinate system;
24) Computing the logarithm map and the connection:
according to the local coordinate systems, the local coordinates log_i(j) of each point j in the neighborhood of each point i in the mesh are computed, together with the connection g_{j→i} from point j to point i, where g_{j→i} is a two-dimensional rotation matrix;
25) Constructing the input features of the model:
for each point in the mesh data, the coordinates of the point in the global coordinate system are projected onto the local coordinate system of step 24), and the projected coordinate values are taken as the input features of the model.
4. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 3, wherein in step 22) the geodesic distances between each point i in the mesh and the points whose straight-line distance to i is smaller than the set straight-line threshold are computed with the vector heat method.
5. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 3, wherein in step 24) the local coordinates log_i(j) of each point j in the neighborhood of each point i in the mesh, and the corresponding connection g_{j→i} from point j to point i, are computed with the vector heat method.
6. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein step 5) constructs the three-dimensional object shape recognition model GET based on the gauge-equivariant Transformer, and the convolutional neural network used is ResNet.
7. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein in step 5) the Transformer layers are initialized with the Weiler initialization method; the fully connected layer is initialized with the Xavier initialization method; and the network model is implemented in PyTorch.
8. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 7, wherein the model is trained with the back-propagation algorithm using the Adam optimizer.
9. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 8, wherein the training process runs for 50 epochs with a batch size of 1, i.e., one mesh per batch; the initial learning rate is 0.01, reduced to 0.001 for epochs 31 to 40 and to 0.0005 for epochs 41 to 50.
CN202110895887.5A 2021-08-05 2021-08-05 Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network Active CN113723208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110895887.5A CN113723208B (en) 2021-08-05 2021-08-05 Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network


Publications (2)

Publication Number Publication Date
CN113723208A CN113723208A (en) 2021-11-30
CN113723208B true CN113723208B (en) 2023-10-20

Family

ID=78674890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110895887.5A Active CN113723208B (en) 2021-08-05 2021-08-05 Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network

Country Status (1)

Country Link
CN (1) CN113723208B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984174B * 2022-12-02 2024-03-22 Henan Transport Development Research Institute Co., Ltd. Pavement distress identification method based on a rotation-equivariant detector

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102102471B1 (en) * 2019-08-19 2020-04-20 유징테크주식회사 System for shape recognition based image processing
CN112801280A (en) * 2021-03-11 2021-05-14 东南大学 One-dimensional convolution position coding method of visual depth self-adaptive neural network
CN112990315A (en) * 2021-03-17 2021-06-18 北京大学 3D shape image classification method of equal-variation 3D convolution network based on partial differential operator
CN113139470A (en) * 2021-04-25 2021-07-20 安徽工业大学 Glass identification method based on Transformer
CN113159232A (en) * 2021-05-21 2021-07-23 西南大学 Three-dimensional target classification and segmentation method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Object Recognition Algorithm Based on Deep Convolutional Neural Networks; Huang Bin, Lu Jinjin, Wang Jianhua, Wu Xingming, Chen Weihai; Journal of Computer Applications, 36(12): 3333-3340 *
Application of Deep Learning in 3D Object Reconstruction from a Single Image; Chen Jia, Zhang Yuqi, Song Peng, Wei Yantao, Wang Yu; Acta Automatica Sinica, (04): 23-34 *


Similar Documents

Publication Publication Date Title
Lézoray et al. Image processing and analysis with graphs
Tam et al. Registration of 3D point clouds and meshes: A survey from rigid to nonrigid
Caelli et al. An eigenspace projection clustering method for inexact graph matching
Aubry et al. Pose-consistent 3d shape segmentation based on a quantum mechanical feature descriptor
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
JP7075654B2 (en) 3D CAD model partial search method and 3D CAD model search method
Chen et al. Mesh convolution: a novel feature extraction method for 3d nonrigid object classification
Hu et al. Curve skeleton extraction from 3D point clouds through hybrid feature point shifting and clustering
Lee et al. Connectivity-based convolutional neural network for classifying point clouds
CN113723208B (en) Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network
Zaied et al. A power tool for Content-based image retrieval using multiresolution wavelet network modeling and Dynamic histograms
Li et al. A non-rigid 3D model retrieval method based on scale-invariant heat kernel signature features
Akimaliev et al. Improving skeletal shape abstraction using multiple optimal solutions
Bespalov and et al. Scale-space representation and classification of 3d models
Jiang et al. Robust 3d face alignment with efficient fully convolutional neural networks
Cao et al. Gaussian-curvature-derived invariants for isometry
Pastor et al. Surface approximation of 3D objects from irregularly sampled clouds of 3D points using spherical wavelets
CN110945499B (en) Method and system for real-time three-dimensional space search and point cloud registration by applying dimension shuffling transformation
Niu et al. Two-dimensional shape retrieval using the distribution of extrema of laplacian eigenfunctions
Aouat et al. Indexing binary images using quad-tree decomposition
Jensen et al. Deep Active Latent Surfaces for Medical Geometries
Limberger et al. Curvature-based spectral signatures for non-rigid shape retrieval
Moumoun et al. 3d hierarchical segmentation using the markers for the watershed transformation
Chen et al. An integrated graph Laplacian downsample (IGLD)-based method for DEM generalization
CN111862328B (en) Three-dimensional grid segmentation result labeling method based on small samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant