CN113723208B - Three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network

Three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network

Info

Publication number
CN113723208B
CN113723208B (application CN202110895887.5A)
Authority
CN
China
Prior art keywords
point
coordinate system
neural network
model
mesh
Prior art date
Legal status: Active
Application number
CN202110895887.5A
Other languages: Chinese (zh)
Other versions: CN113723208A (en)
Inventor
Lin Zhouchen (林宙辰)
Dong Yiming (董一鸣)
He Lingshen (何翎申)
Wang Yisen (王奕森)
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202110895887.5A
Publication of CN113723208A
Application granted
Publication of CN113723208B
Legal status: Active


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B07: SEPARATING SOLIDS FROM SOLIDS; SORTING
    • B07C: POSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
    • B07C5/00: Sorting according to a characteristic or feature of the articles or material being sorted, e.g. by control effected by devices which detect or measure such characteristic or feature; Sorting by manually actuated devices, e.g. switches
    • B07C5/34: Sorting according to other particular properties
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205: Re-meshing


Abstract

The invention discloses a three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network. A gauge-equivariant Transformer is created, the global coordinate system is projected onto local coordinate systems, and rotation invariance of the model is achieved on the basis of gauge equivariance, so that visual analysis tasks such as classification and recognition of three-dimensional objects can be carried out effectively. A three-dimensional object shape recognition model, GET, based on the gauge-equivariant Transformer neural network is constructed and trained; its input is a 3D object represented as a two-dimensional manifold embedded in three-dimensional space, and its output is the predicted class of the 3D object. The method can efficiently classify and recognize object shapes in 3D image data, improving both the accuracy and the efficiency of object shape classification.

Description

Three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network
Technical Field
The invention belongs to the technical fields of pattern recognition, machine learning, neural networks, deep learning, artificial intelligence and computer graphics. It relates to object shape classification methods, and in particular to a three-dimensional object shape classification method based on a gauge-equivariant Transformer neural network.
Background
In recent years, Transformers have come to dominate algorithms in the field of natural language processing. A significant advantage of the attention mechanism is that it emphasizes the most relevant parts of a given context. Owing to this excellent performance, much current work applies Transformers to other fields of machine learning, such as computer vision and image processing.
Manifold learning is a machine learning technique that applies traditional neural network models to complex, non-Euclidean data structures. Some existing works represent the surface data in a three-dimensional image by two-dimensional projection or by voxel grid points; the drawback of these methods is their excessive computational cost. Other works define convolution directly on the surface, which is more robust to deformations of surfaces in three-dimensional images. The main difficulty of this approach, however, is that the neighborhood of each point on the surface has no canonical coordinate system, so the parameterization of neighboring points cannot be unified, which degrades the performance of the neural network model.
To resolve the ambiguity of the neighborhood coordinate system, equivariant deep learning techniques have been proposed. The success of the widely used convolutional neural network (CNN) is largely due to its translation equivariance, which has motivated researchers to extend this property to other operations such as rotation. Cohen et al. applied gauge equivariance to manifold learning; de Haan et al. proposed a gauge-equivariant neural network based on anisotropic convolution, modifying the anisotropic convolution kernels of a graph convolutional network to satisfy the gauge equivariance condition, so that gauge-equivariant CNNs on manifolds can be applied directly and successfully to meshes. They designed a new activation function satisfying the gauge equivariance condition, the regular nonlinearity, which realizes gauge equivariance via the Fourier transform. However, introducing Fourier transforms incurs additional computational cost; moreover, the three-dimensional shape recognition and classification method proposed by de Haan et al. is based on convolution, which amounts to applying the same attention to all points in a neighborhood, ignoring content-based attention weights, and therefore has lower recognition accuracy.
Disclosure of Invention
To overcome the shortcomings of existing gauge-equivariant object recognition techniques with respect to rotation invariance and the introduction of attention mechanisms, the invention provides a novel three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network. The model is named GET (Gauge Equivariant Transformer) and is used to efficiently perform visual analysis tasks such as classification and recognition of object shapes in 3D image data.
Guided by the mathematics of equivariance, the invention designs the input processing and the Transformer layers of the model so that the whole three-dimensional object shape recognition model GET is simultaneously invariant to spatial rotations and to gauge changes. The input of the whole model is a 3D object represented by a two-dimensional manifold structure in three-dimensional space, and the output is the predicted class of the 3D object.
The technical scheme provided by the invention is as follows:
A three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network, which designs the gauge-equivariant Transformer of the model, designs a method of projecting the global coordinate system onto local coordinate systems, and combines it with gauge equivariance to realize rotation invariance of the model; it is used to efficiently carry out visual analysis such as classification and recognition of three-dimensional objects, and comprises the following steps:
Firstly), meshing the 3D object represented by a manifold structure to generate mesh data;
in practice, 3D object data may be acquired with a 3D camera or directly with existing 3D object data sets. The 3D object mesh data is a discrete data representation of a 3D object consisting of a set of points, edges, faces. For a manifold containing infinite points, we use the furthest point sampling (Farthest Point Sampling) algorithm to obtain a set of a given number of points and form a triangular mesh (triangulated mesh) from these points.
Secondly), preprocessing the mesh data;
21) Normalization
Sum the areas of all triangles of the triangular mesh to obtain the area of the whole mesh, then scale the edges and faces of the mesh so that the total area is normalized to 1.
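The area normalization of step 21) can be sketched as follows (illustrative Python; since area scales quadratically with edge length, the vertices are scaled by the inverse square root of the total area):

```python
import math

def triangle_area(a, b, c):
    # area = half the norm of the cross product of two edge vectors
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy*vz - uz*vy, uz*vx - ux*vz, ux*vy - uy*vx
    return 0.5 * math.sqrt(cx*cx + cy*cy + cz*cz)

def normalize_mesh(vertices, faces):
    """Scale vertex coordinates so the total surface area becomes 1."""
    total = sum(triangle_area(*(vertices[i] for i in f)) for f in faces)
    s = 1.0 / math.sqrt(total)          # length scale, since area ~ length^2
    return [[s * x for x in v] for v in vertices]
```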
22) Determining a neighborhood
A straight-line (Euclidean) threshold and a geodesic threshold are set in advance. According to the position of each point in the mesh in space, for each point i the set of points whose straight-line distance to i is below the straight-line threshold is first found; the geodesic distances from i to these candidate points are then computed with the vector heat method; finally, the points whose geodesic distance is below the geodesic threshold are retained as the neighborhood of point i, denoted n_i.
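The two-stage neighborhood search of step 22) can be sketched as follows. The vector heat method itself is not reproduced here, so `geodesic_dist` is a placeholder supplied by the caller (an assumption of this sketch, not the patent's routine):

```python
import math

def neighborhoods(points, euclid_thresh, geodesic_thresh, geodesic_dist):
    """Two-stage neighborhood search: a cheap Euclidean prefilter,
    then the more expensive geodesic distance on the survivors."""
    nbrs = []
    for i, p in enumerate(points):
        cand = [j for j, q in enumerate(points)
                if j != i and math.dist(p, q) < euclid_thresh]
        nbrs.append([j for j in cand
                     if geodesic_dist(i, j) < geodesic_thresh])
    return nbrs
```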
23) Selecting a local coordinate system
For each point in the 3D object mesh, the tangent plane at the point is computed, and a pair of orthogonal directions in the tangent plane is chosen arbitrarily as the x-axis and y-axis of the local coordinate system. The normal vector of the tangent plane is taken as the z-axis of the local coordinate system.
24) Computing the logarithm maps and connections
Using the vector heat method and the chosen local coordinate systems, compute, for each point i in the mesh and each point j in its neighborhood, the local (logarithm-map) coordinates of j and the connection g_{j→i} from j to i, where g_{j→i} is a two-dimensional rotation matrix.
25) Constructing input features
For each point in the mesh, the coordinates of the point in the global coordinate system are projected onto the local coordinate system selected in step 23), and the projected coordinate values are used as the input of the model.
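Steps 23) and 25) can be sketched as follows (illustrative Python; the tangent frame is chosen arbitrarily, which is precisely the gauge freedom the model must be equivariant to, and the helper vector is an assumption of this sketch):

```python
import math

def normalize(x):
    s = math.sqrt(sum(c * c for c in x))
    return [c / s for c in x]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def local_frame(n):
    """Build an arbitrary orthonormal tangent frame (u, v) for a unit
    normal n; any rotation of (u, v) in the tangent plane is an equally
    valid gauge."""
    helper = [1.0, 0.0, 0.0]
    if abs(sum(a * b for a, b in zip(n, helper))) > 0.9:
        helper = [0.0, 1.0, 0.0]     # avoid a helper nearly parallel to n
    u = normalize(cross(n, helper))
    v = cross(n, u)
    return u, v

def input_feature(x, u, v, n):
    # step 25): global coordinates of the point expressed in the local frame
    d = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return (d(x, u), d(x, v), d(x, n))
```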
Third), dividing the data set into training samples and test samples
Fourth), constructing a gauge-equivariant Transformer;
A Transformer consists of three parts: a key function, a query function and a value function, where key and query are the components of the attention score. The gauge-equivariant Transformer is realized by designing a gauge-invariant attention score and a gauge-equivariant value function.
41) Constructing the architecture of the gauge-equivariant Transformer
Let the input feature field $f$ of the Transformer have dimension $C_{in}$, with group representation denoted $\rho_{in}$, and let the output feature field $f_{out}$ have dimension $C_{out}$, with group representation denoted $\rho_{out}$. The output of the gauge-equivariant Transformer at a point $p$ under gauge $w$, written $f_{out}^{w}(p)$, is defined as:

$$f_{out}^{w}(p)=\mathrm{MHSA}(f^{w})(p)=W_{M}\Big(\big\Vert_{h}\,\mathrm{SA}_{h}(f^{w})(p)\Big),$$

where MHSA is the multi-head attention function, SA is the single-head attention function, $W_{M}$ is a linear transformation matrix, and $\Vert$ is the vector concatenation operator. At head $h$, the output of the SA function is:

$$\mathrm{SA}_{h}(f^{w})(p)=\sum_{u}\alpha_{h}(p,q_{u})\,V_{u}\big(f'^{\,w}(q_{u})\big),$$

where $u$ ranges over the logarithm-map coordinates of the neighborhood of $p$, the point $q_{u}=\exp_{p}w_{p}(u)$, $f'^{\,w}(q_{u})$ is the feature vector at $q_{u}$ parallel-transported to the point $p$ under gauge $w$, and $V_{u}$ is the value function, which encodes the relative position $u$ with a matrix $W_{V}(u)\in\mathbb{R}^{C_{out}\times C_{in}}$:

$$V_{u}\big(f'^{\,w}(q_{u})\big)=W_{V}(u)\,f'^{\,w}(q_{u}).$$

$\alpha_{h}$ is the attention score; for center point $p$ and neighborhood point $q_{u}$ at head $h$ it is the score of 44), normalized over the neighborhood:

$$\alpha_{h}(p,q_{u})=\operatorname{softmax}_{u}\,S\big(K_{h}(f'^{\,w}(q_{u})),\,Q_{h}(f^{w}(p))\big).$$

The constructions of the value function and of the attention score are given in 43) and 44), respectively.
42) Expanded group representation
The group $C_{N}$ consists of the planar rotations whose angles are $\theta_{k}=\frac{2\pi k}{N}$ (where $k$ is an integer between 0 and $N-1$); the regular representation (Regular representation) is a particular group representation of $C_{N}$, defined as in the book Linear Representation of Finite Group. If $\Theta_{k}$ denotes the rotation matrix with rotation angle $\theta_{k}=\frac{2\pi k}{N}$, then $C_{N}$ can be written as $\{\Theta_{0},\Theta_{1},\ldots,\Theta_{N-1}\}$. For an integer $k$ between 0 and $N-1$, the regular representation $\rho_{reg}(\Theta_{k})$ is the $N\times N$ permutation matrix that cyclically shifts all components of a vector by $k$ units.

The regular representation can be decomposed into irreducible representations,

$$\rho_{reg}(\Theta_{k})=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta_{k})\Big)A,$$

where $A$ is an $N\times N$ invertible matrix. When $N$ is an odd number, the irreducible representations are:

$$\psi_{0}(\Theta)=1,\qquad \psi_{i}(\Theta)=\begin{pmatrix}\cos(i\theta)&-\sin(i\theta)\\ \sin(i\theta)&\cos(i\theta)\end{pmatrix},\quad i=1,\ldots,\tfrac{N-1}{2},$$

where $\theta\in[0,2\pi)$ is the rotation angle corresponding to the matrix $\Theta$.

Further, the invention extends these irreducible representations to the two-dimensional rotation group SO(2), applying the formulas above with $\theta$ the angle of an arbitrary rotation $\Theta\in SO(2)$. This yields the expanded group representation

$$\rho(\Theta)=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta)\Big)A,$$

which realizes the property that any vector in space can be parallel-transported without losing rotation angle information.
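The regular representation of $C_N$ can be checked numerically. The sketch below (illustrative, pure Python) builds the cyclic-shift permutation matrices and verifies both the shifting behavior and the homomorphism property $\rho_{reg}(\Theta_a)\rho_{reg}(\Theta_b)=\rho_{reg}(\Theta_{(a+b)\bmod N})$:

```python
def regular_rep(k, N):
    """rho_reg(Theta_k): the N x N permutation matrix of the regular
    representation of C_N; it cyclically shifts vector components by k."""
    return [[1.0 if (i - k) % N == j else 0.0 for j in range(N)]
            for i in range(N)]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(B))]
```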
43) Constructing the gauge-equivariant value function, which encodes the relative position u with a matrix
The value function is defined as left-multiplication of the parallel-transported feature vector by the encoding matrix $W_{V}(u)$. The necessary and sufficient condition for gauge equivariance of the value function is

$$W_{V}(\Theta^{-1}u)=\rho_{out}(\Theta^{-1})\,W_{V}(u)\,\rho_{in}(\Theta).$$

This equation is solved by Taylor-expanding $W_{V}$:

$$W_{V}(u)=W_{0}+u_{1}W_{1}+u_{2}W_{2}+u_{1}^{2}W_{3}+u_{1}u_{2}W_{4}+u_{2}^{2}W_{5}+\cdots$$

Substituting this expansion into the necessary and sufficient condition for gauge equivariance yields a linear system of equations:

$$W_{0}=\rho_{out}(\Theta^{-1})\,W_{0}\,\rho_{in}(\Theta),$$
$$\cos(\theta)W_{1}-\sin(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{1}\,\rho_{in}(\Theta),$$
$$\sin(\theta)W_{1}+\cos(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{2}\,\rho_{in}(\Theta),$$
$$\cdots$$

There are infinitely many equations in this linear system; the number of equations can be limited by truncating the Taylor expansion. When truncated at order 2, the system contains only 6 equations and can be solved efficiently with existing program libraries. After solving, a set of basis solutions $\{W^{(i)}\}_{i=1}^{m}$ is obtained, where $m$ is the dimension of the solution space. Each $W^{(i)}$ consists of six parts, i.e. $W^{(i)}=(W_{0}^{(i)},W_{1}^{(i)},\ldots,W_{5}^{(i)})$, so that the equivariant encoding matrix $W^{(i)}(u)$ has the form:

$$W^{(i)}(u)=W_{0}^{(i)}+u_{1}W_{1}^{(i)}+u_{2}W_{2}^{(i)}+u_{1}^{2}W_{3}^{(i)}+u_{1}u_{2}W_{4}^{(i)}+u_{2}^{2}W_{5}^{(i)}.$$

Any linear combination $\sum_{i}c_{i}W^{(i)}$ of these $W^{(i)}$ still satisfies the necessary and sufficient condition for gauge equivariance of the value function. During training, the coefficients $c_{i}$ are learnable parameters.
44) Constructing the gauge-invariant attention score
In implementation, the manifold is generally discretized into a mesh structure. The key and query functions adopt the architecture of the Graph Attention Network described in the literature (Velickovic et al.), i.e. $K(f)=W_{K}f$ and $Q(f)=W_{Q}f$ with linear transformation matrices $W_{K}$ and $W_{Q}$. The score function (score function) adopts the architecture $S(K(\cdot),Q(\cdot))=P(\mathrm{ReLU}(K(\cdot)+Q(\cdot)))$. Here ReLU is the component-wise activation function and $P$ is the average pooling function. The linear transformation matrices $W_{K}$ and $W_{Q}$ must also satisfy the first equation of the linear system in 43). After activation and pooling, the computed attention score is gauge-invariant.
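The score function $S(K,Q)=P(\mathrm{ReLU}(K+Q))$ can be sketched directly. Under the regular representation a gauge change cyclically shifts feature components, and the average pooling $P$ is unaffected by such a shift, which is the invariance claimed above (illustrative sketch):

```python
def attention_score(k_feat, q_feat):
    """S(K, Q) = P(ReLU(K + Q)): component-wise ReLU of K + Q followed
    by average pooling P over the components. Averaging over all
    components makes the score invariant to the cyclic component shift
    induced by a gauge change under the regular representation."""
    vals = [max(ki + qi, 0.0) for ki, qi in zip(k_feat, q_feat)]
    return sum(vals) / len(vals)
```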
45) Realizing rotation invariance
Rotation invariance is achieved by projecting the coordinates of the points on the manifold from the global coordinate system onto the local coordinate systems. Let $x_{p}$ be the coordinate value of the point $p$ in the global coordinate system, $n_{p}$ the normal vector at $p$, and let the gauge $w_{p}$ at $p$ be determined by the two coordinate directions $u_{p}$ and $v_{p}$. The coordinate values obtained after projection onto the local coordinate system are $X_{p}=(\langle x_{p},u_{p}\rangle,\langle x_{p},v_{p}\rangle,\langle x_{p},n_{p}\rangle)$; such an $X_{p}$ is invariant to rotations of the global coordinate system.
Fifth), constructing the three-dimensional object shape recognition model GET based on the gauge-equivariant Transformer neural network
An existing convolutional neural network architecture, such as ResNet, can be used for the construction, with the convolutional layers replaced by the Transformer constructed in step four). After the output of the last Transformer layer, a group pooling layer (described in Cohen et al.) is added so that the gauge-equivariant output is pooled into a gauge-invariant output; finally, the prediction score of each class is obtained through a global average pooling layer and a fully connected layer.
The Transformer layers constructed in the invention are initialized with the initialization method described by Weiler (in the paper Learning Steerable Filters for Rotation Equivariant CNNs), and the fully connected layer is initialized with the Xavier initialization method. The entire network structure may be implemented using PyTorch.
In the specific implementation of the invention, the model is trained with back-propagation and the Adam algorithm. The training process iterates for 50 epochs with a batch size of 1, i.e. one mesh per batch. The initial learning rate is 0.01; the learning rate for epochs 31 to 40 is 0.001, and for epochs 41 to 50 it is 0.0005.
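The learning-rate schedule described above can be written as a small helper (illustrative; epochs are taken as 1-indexed):

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the text: 0.01 for epochs 1-30,
    0.001 for epochs 31-40, 0.0005 for epochs 41-50."""
    if epoch <= 30:
        return 0.01
    if epoch <= 40:
        return 0.001
    return 0.0005
```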
Sixth), recognizing the 3D object mesh sample data to be tested using the GET model constructed and trained in step five), obtaining the predicted shape recognition label.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a novel three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network, creating a three-dimensional object shape recognition model GET that is invariant to spatial rotations and to gauge changes. The input of the model is a 3D object represented by a two-dimensional manifold structure in three-dimensional space, and the output is the predicted class of the 3D object. The method can effectively carry out visual analysis such as classification and recognition of object shapes in 3D image data, improving both the accuracy and the efficiency of object shape classification.
Drawings
FIG. 1 is a block diagram of a 3D object shape classification model GET network constructed in accordance with an embodiment of the invention.
Detailed Description
The invention is further described below by way of examples with reference to the accompanying drawings, which in no way limit the scope of the invention.
The invention provides a novel three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network and constructs a 3D object shape classification model GET, whose network structure, shown in FIG. 1, comprises: an input layer, an attention mechanism layer, a group pooling layer, a global average pooling layer and a fully connected layer. The input layer converts the data from the original 3D coordinate system into the projected input through steps 23), 24) and 25); the attention mechanism layer is the Transformer layer constructed in step 4; the group pooling layer, first proposed by Weiler et al. in General E(2)-Equivariant Steerable CNNs, takes the maximum value of the components of the feature vector under a given gauge in the feature field, yielding a gauge-invariant one-dimensional feature field. The global average pooling layer averages the values of all feature vectors in the feature field under a given gauge to generate a global feature vector. The fully connected layer is the fully connected layer of a deep neural network. The following example recognizes and classifies three-dimensional object shapes with the gauge-equivariant Transformer neural network for 3D objects of a 3D object dataset (the SHREC11 dataset; see the document "Shape Retrieval on Non-rigid 3D Watertight Meshes"). The method comprises the following steps:
step 1: gridding the data in the 3D object data set to generate 3D object mesh data; a set of a given number of points is obtained using a furthest point sampling algorithm and a triangular mesh is formed from the points.
Step 2: preprocessing 3D object mesh data;
21) Normalization
Sum the areas of all triangles of the triangular mesh to obtain the area of the whole mesh, then scale the edges and faces of the triangular mesh accordingly so that the total area is normalized to 1.
22) Determining a neighborhood
A straight-line (Euclidean) threshold and a geodesic threshold are set in advance. According to the position of each point in the 3D object mesh data in space, for each point i the set of points whose straight-line distance to i is below the straight-line threshold is first found; the geodesic distances from i to these candidate points are then computed with the vector heat method; finally, the points whose geodesic distance is below the geodesic threshold are retained as the neighborhood of point i, denoted n_i.
23) Selecting a local coordinate system
For each point in the mesh, the tangent plane at the point is computed, and a pair of orthogonal directions in the tangent plane is chosen arbitrarily as the x-axis and y-axis of the local coordinate system. The normal vector of the tangent plane is taken as the z-axis of the local coordinate system.
24) Computing the logarithm maps and connections
Using the vector heat method and the chosen local coordinate systems, compute, for each point i in the 3D object mesh data and each point j in its neighborhood, the local (logarithm-map) coordinates of j and the connection g_{j→i} from j to i, where g_{j→i} is a two-dimensional rotation matrix.
25) Constructing input features
For each point in the 3D object mesh data, the coordinates of the point under the global coordinate system are projected onto the local coordinate system selected in step 23), and the projected coordinate values are used as the input of the model.
Step 3: dividing the dataset into a training sample and a test sample;
the present example uses a data set, namely a 3D object shape data set SHREC11, which is obtained by computer graphics modeling of objective entities from well known databases, such as PSB, mcGill, etc., and comprises 30 classes of 3D object shapes, each 3D object shape having 20 samples, and we divide the data set into a training set and a test set, each class of which is 10 3D object shape samples.
Step 4: construction specification-alike transformers
One transducer consists of three parts, a key function, a query function and a value function. Wherein key and query are components of attention score (attention score). We implement canonical alike transformers by designing invariant saturation score and alike value functions.
41 Overall architecture)
Suppose the dimension of the transducer input feature field is C in The group is denoted as ρ in The dimension of the output feature domain is C out The group is denoted as ρ out The output of the transformation of the specification and the like at the point p under the specification w is defined as follows:
wherein MHSA is a multi-head attention function, SA is a single-head attention function, W M Is a linear transformation matrix, || is a vector concatenation operation operator. At the head h, the output of the SA function is:
wherein ,point q u =exp p w p (u),f w ′(q u ) For point q u The eigenvectors at that point are shifted in parallel to the value of point p under specification w, V u As a value function, it uses a matrix for the relative position u>Encoding is performed with the following expression:
alpha is an attribute score expressed as follows:
we propose the procedure of constructing the value function and the attention score in 43) and 44), respectively.
42) Expanded group representation
The group $C_{N}$ consists of the planar rotations whose angles are $\theta_{k}=\frac{2\pi k}{N}$ (where $k$ is an integer between 0 and $N-1$); the regular expression (Regular representation) is a particular group representation of $C_{N}$. The precise definition and specific description of the regular representation is given in Linear Representation of Finite Group. If $\Theta_{k}$ denotes the rotation matrix with rotation angle $\theta_{k}=\frac{2\pi k}{N}$, then $C_{N}$ can be written as $\{\Theta_{0},\Theta_{1},\ldots,\Theta_{N-1}\}$. For an integer $k$ between 0 and $N-1$, the regular representation $\rho_{reg}(\Theta_{k})$ is the $N\times N$ permutation matrix that cyclically shifts all components of a vector by $k$ units.

The regular representation can be decomposed into irreducible representations,

$$\rho_{reg}(\Theta_{k})=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta_{k})\Big)A,$$

where $A$ is an $N\times N$ invertible matrix. When $N$ is an odd number, the irreducible representations are:

$$\psi_{0}(\Theta)=1,\qquad \psi_{i}(\Theta)=\begin{pmatrix}\cos(i\theta)&-\sin(i\theta)\\ \sin(i\theta)&\cos(i\theta)\end{pmatrix},\quad i=1,\ldots,\tfrac{N-1}{2},$$

where $\theta\in[0,2\pi)$ is the rotation angle corresponding to the matrix $\Theta$. We extend these irreducible representations to the two-dimensional rotation group SO(2), applying the same formulas with $\theta$ the angle of an arbitrary rotation $\Theta\in SO(2)$. Thus we obtain the expanded group representation

$$\rho(\Theta)=A^{-1}\Big(\bigoplus_{i}\psi_{i}(\Theta)\Big)A,$$

which realizes the property that any vector in space can be parallel-transported without losing rotation angle information.
43) Gauge-equivariant value function
We define the value function as left-multiplication of the parallel-transported feature vector by the encoding matrix $W_{V}(u)$. The necessary and sufficient condition for gauge equivariance of the value function is

$$W_{V}(\Theta^{-1}u)=\rho_{out}(\Theta^{-1})\,W_{V}(u)\,\rho_{in}(\Theta).$$

We solve this equation by Taylor-expanding $W_{V}$:

$$W_{V}(u)=W_{0}+u_{1}W_{1}+u_{2}W_{2}+u_{1}^{2}W_{3}+u_{1}u_{2}W_{4}+u_{2}^{2}W_{5}+\cdots$$

Substituting this expansion into the necessary and sufficient condition for gauge equivariance yields a linear system of equations:

$$W_{0}=\rho_{out}(\Theta^{-1})\,W_{0}\,\rho_{in}(\Theta),$$
$$\cos(\theta)W_{1}-\sin(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{1}\,\rho_{in}(\Theta),$$
$$\sin(\theta)W_{1}+\cos(\theta)W_{2}=\rho_{out}(\Theta^{-1})\,W_{2}\,\rho_{in}(\Theta),$$
$$\cdots$$

There are infinitely many equations in this linear system; we can truncate the Taylor expansion to limit the number of equations. When truncated at order 2, the system contains only 6 equations and can be solved efficiently with existing program libraries. After solving, we obtain a set of basis solutions $\{W^{(i)}\}_{i=1}^{m}$, where $m$ is the dimension of the solution space. Each $W^{(i)}$ consists of six parts, i.e. $W^{(i)}=(W_{0}^{(i)},W_{1}^{(i)},\ldots,W_{5}^{(i)})$, so that the equivariant encoding matrix $W^{(i)}(u)$ has the form:

$$W^{(i)}(u)=W_{0}^{(i)}+u_{1}W_{1}^{(i)}+u_{2}W_{2}^{(i)}+u_{1}^{2}W_{3}^{(i)}+u_{1}u_{2}W_{4}^{(i)}+u_{2}^{2}W_{5}^{(i)}.$$

Any linear combination $\sum_{i}c_{i}W^{(i)}$ of these $W^{(i)}$ still satisfies the equivariance condition. During training, the coefficients $c_{i}$ may be designed as learnable parameters.
44) Gauge-invariant attention score
In the implementation, the manifold is generally discretized into a mesh structure. Here we choose key and query functions with the same architecture as the Graph Attention Network proposed by Velickovic et al., i.e. $K(f)=W_{K}f$ and $Q(f)=W_{Q}f$ with linear transformation matrices $W_{K}$ and $W_{Q}$. The score function (score function) likewise follows the Graph Attention Network, i.e. $S(K(\cdot),Q(\cdot))=P(\mathrm{ReLU}(K(\cdot)+Q(\cdot)))$. Here ReLU is the component-wise activation function and $P$ is the average pooling function. The linear transformation matrices $W_{K}$ and $W_{Q}$ must also satisfy the first equation of the linear system in 43). After activation and pooling, the computed attention score is gauge-invariant.
45) Realizing rotation invariance
Rotation invariance is achieved by projecting the coordinates of the points on the manifold from the global coordinate system onto the local coordinate systems. Let $x_{p}$ be the coordinate value of the point $p$ in the global coordinate system, $n_{p}$ the normal vector at $p$, and let the gauge $w_{p}$ at $p$ be determined by the two coordinate directions $u_{p}$ and $v_{p}$. The coordinate values obtained after projection onto the local coordinate system are $X_{p}=(\langle x_{p},u_{p}\rangle,\langle x_{p},v_{p}\rangle,\langle x_{p},n_{p}\rangle)$; such an $X_{p}$ is invariant to rotations of the global coordinate system.
Step 5: three-dimensional object shape recognition model GET for constructing neural network based on standard and other transformation scales
An existing convolutional neural network architecture, such as ResNet, may be used in the construction to change the convolutional layer therein to the Transformer introduced in step 4. And adding a group pooling layer proposed by Cohen et al after the output of the last convolution layer so as to pool the standard-unchanged output into a standard-unchanged output, and finally obtaining the prediction score of each category through a global average pooling layer and a full connection layer.
The transducer layer in the present invention uses the initialization method set forth in the paper Learning Steerable Filters for Rotation Equivariant CNNs by Weiler, and the full connectivity layer uses the Xavier initialization method. The entire network structure may be implemented using a pytorch.
In the specific implementation of the invention, a back propagation algorithm is adopted, and the model is trained by an Adam algorithm. The training process iterates 50 rounds with a batch size of 1, i.e., one batch per mesh. The initial learning rate was 0.01, the 31 to 40 rounds of learning rate was 0.001, and the 41 to 50 rounds of learning rate was 0.0005.
Step 6: The test mesh samples are recognized using the GET model constructed and trained in step 5, obtaining the predicted shape recognition labels. Table 1 compares the image classification accuracy of the model GET of the invention and other models on the SHREC dataset:
TABLE 1. Image classification accuracy of the model GET of the invention and other models on the SHREC dataset

Model     SO(3) invariance    Accuracy    Number of parameters
MDGCNN    No                  82.2%       ---
MeshCNN   Yes                 91.0%       ---
HSN       No                  96.6%       78k
GET       Yes                 99.2%       11k
The table above shows that the model proposed in this patent achieves better performance with a smaller number of parameters, which benefits from the model's rotation invariance and gauge equivariance. The convolution kernel in MDGCNN is scalar-valued, which greatly limits its performance. The previously best gauge-equivariant network, HSN, has no rotation invariance; compared with our model, its performance is lower and it requires more parameters.
It should be noted that the purpose of the disclosed embodiments is to aid further understanding of the present invention, but those skilled in the art will appreciate that: various alternatives and modifications are possible without departing from the scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments, but rather the scope of the invention is defined by the appended claims.

Claims (9)

1. A three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network, which creates a gauge-equivariant Transformer, projects the global coordinate system onto local coordinate systems to achieve rotation invariance on top of gauge equivariance, and is used for efficient classification, recognition, and visual analysis of three-dimensional objects; the method comprises the following steps:
1) Meshing the 3D object data represented by a manifold structure to generate 3D object mesh data;
2) Preprocessing the 3D object mesh data, comprising: normalization; neighborhood determination; local coordinate system selection; computation of the logarithm map and the connection; and construction of the model input features, in which the coordinates of each point in the mesh under the global coordinate system are projected onto its local coordinate system to serve as the model input features;
3) Dividing the 3D object mesh data set into training samples and test samples;
4) Constructing a gauge-equivariant Transformer;
The Transformer comprises a key function, a query function, and a value function, where the attention score is computed from keys and queries; the gauge-equivariant Transformer is realized by designing a gauge-invariant attention score and a gauge-equivariant value function, as follows:
41) Constructing the gauge-equivariant Transformer architecture;
Let the dimension of the Transformer input feature field f be C_in with group representation ρ_in, and let the output feature field f_out have dimension C_out with group representation ρ_out. Under a gauge w, the output f_out^w(p) of the gauge-equivariant Transformer at a point p is defined as:

f_out^w(p) = MHSA(f)^w(p) = W_M ( ‖_{h=1}^{H} SA_h(f)^w(p) ),

where MHSA is the multi-head self-attention function, SA is the single-head self-attention function, W_M is a linear transformation matrix, and ‖ is the vector concatenation operator. At head h, the output of the SA function is:

SA_h(f)^w(p) = Σ_u α_h(p, q_u) V_u( f'_w(q_u) ),

where the point q_u = exp_p(w_p(u)), f'_w(q_u) is the feature vector at point q_u parallel-transported to point p under gauge w, and V_u is the value function, which encodes the relative position u with a matrix W_V(u):

V_u( f'_w(q_u) ) = W_V(u) f'_w(q_u),

and α_h is the attention score; with center point p and neighborhood point q_u, its expression at head h is:

α_h(p, q_u) = softmax_u( S( K(f'_w(q_u)), Q(f^w(p)) ) ).
42) Expanding the regular representation;
C_N is the group formed by all planar rotations with radian values 2πk/N, where k is an integer between 0 and N−1; the regular representation is a special group representation of C_N. If Θ_k denotes the rotation matrix with rotation angle 2πk/N, then C_N can be expressed as {Θ_0, Θ_1, …, Θ_{N−1}}. For integer k, the regular representation ρ_reg(Θ_k) is an N×N permutation matrix that cyclically shifts all components of a vector by k units;
ρ_reg(Θ_k) can be decomposed as ρ_reg(Θ_k) = A ψ(Θ_k) A^{-1}, where ψ(Θ_k) is the direct sum of the irreducible representations of C_N and A is an N×N invertible matrix. When N is an odd number, the irreducible representations are:

ψ(Θ) = diag( ψ_0(θ), ψ_1(θ), …, ψ_{(N−1)/2}(θ) ),  with ψ_0(θ) = 1 and, for 1 ≤ j ≤ (N−1)/2,

ψ_j(θ) = [ cos(jθ), −sin(jθ) ; sin(jθ), cos(jθ) ],

where θ ∈ [0, 2π) is the rotation angle corresponding to the matrix Θ, namely:

Θ = [ cos(θ), −sin(θ) ; sin(θ), cos(θ) ],

and θ = 2πk/N for Θ = Θ_k.
Further, the irreducible representations are extended to the two-dimensional rotation group SO(2) by letting the angle θ range over the whole interval [0, 2π), i.e., the extended group representation is:

ρ̃(Θ) = A ψ(Θ) A^{-1},  Θ ∈ SO(2),

thereby any vector in the feature space can be parallel-transported without losing rotation angle information;
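As a sanity check on the extension above (an illustration, not the patent's code), each 2×2 irreducible block ψ_j(θ) is a rotation by jθ, so it satisfies ψ_j(θ_1) ψ_j(θ_2) = ψ_j(θ_1 + θ_2) for arbitrary real angles, not only the discrete angles 2πk/N:

```python
import math

def irrep_block(j, theta):
    # 2x2 irreducible representation psi_j of SO(2): rotation by j * theta.
    c, s = math.cos(j * theta), math.sin(j * theta)
    return [[c, -s], [s, c]]

def matmul2(A, B):
    # Product of two 2x2 matrices.
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]
```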
43) Constructing a gauge-equivariant value function that encodes the relative position u with a matrix;
The value function is defined as left-multiplication of the parallel-transported feature vector by an encoding matrix W_V(u); the necessary and sufficient condition for the value function to be gauge-equivariant is W_V(Θ^{-1}u) = ρ_out(Θ^{-1}) W_V(u) ρ_in(Θ). To solve this equation, W_V is Taylor-expanded, namely:

W_V(u) = W_0 + u_1 W_1 + u_2 W_2 + (higher-order terms).

Substituting this expansion into the necessary and sufficient condition for gauge equivariance and matching coefficients yields a system of linear equations, expressed (to first order) as:
W_0 = ρ_out(Θ^{-1}) W_0 ρ_in(Θ),
cos(θ) W_1 − sin(θ) W_2 = ρ_out(Θ^{-1}) W_1 ρ_in(Θ),
sin(θ) W_1 + cos(θ) W_2 = ρ_out(Θ^{-1}) W_2 ρ_in(Θ),
the number of equations in the linear system of equations may be limited by truncating the term of the taylor expansion; solving the linear equation set to obtain a set of basesWhere m is the dimension of the solution space; each +.>Comprises->Coding matrix W to be constant (i) Expressed as:
W (i) is a linear combination of Σc i W (i) Still meets the sufficient and necessary conditions of Value function specification and the like; during training process, c i Is a learnable parameter;
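The zeroth-order constraint can be solved numerically by vectorization. The sketch below is an illustration under a simplifying assumption: ρ_in = ρ_out is taken to be the plain 2×2 rotation representation (not the regular representation actually used in the network). Using vec(A W B) = (Bᵀ ⊗ A) vec(W), each group element of C_8 contributes a block of linear equations, and the null space of the stacked system is the space of admissible W_0:

```python
import numpy as np

def rot(theta):
    # 2x2 rotation matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Constraint: W0 = rho_out(Theta^-1) W0 rho_in(Theta) for every Theta in C_N.
# vec(A W B) = (B^T kron A) vec(W), so each group element contributes
# ((rho_in(Theta)^T kron rho_out(Theta^-1)) - I) vec(W0) = 0.
N = 8
blocks = []
for k in range(N):
    theta = 2 * np.pi * k / N
    blocks.append(np.kron(rot(theta).T, rot(-theta)) - np.eye(4))
M = np.vstack(blocks)

# Null space of M = space of admissible W0. With the rotation representation
# on both sides, these are the matrices commuting with all 2D rotations,
# spanned by the identity and the 90-degree rotation: dimension m = 2.
_, s, Vt = np.linalg.svd(M)
null_dim = int(np.sum(s < 1e-10))
basis = Vt[len(s) - null_dim:]
```

For this choice of representations the solution space has dimension 2; with the regular or extended representations of the claims, the same vectorization recipe applies, only with larger matrices.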
44) Constructing a gauge-invariant attention score;
The key function and the query function are linear maps: K(·) = W_K(·) and Q(·) = W_Q(·), where W_K and W_Q are linear transformation matrices.
The scoring function takes S(K(·), Q(·)) = P(ReLU(K(·) + Q(·))), where ReLU is a component-wise activation function and P is an average pooling function.
Since a gauge change acts on the features by permuting their components, after the component-wise activation and the pooling the computed attention score is gauge-invariant;
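A minimal sketch of this invariance (illustrative values, not the patent's code): a gauge change cyclically shifts the components of both key and query, ReLU acts component-wise, and the average is permutation-invariant, so the score is unchanged:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def score(key, query):
    # S(K, Q) = P(ReLU(K + Q)) with P = average pooling over components.
    s = relu([k + q for k, q in zip(key, query)])
    return sum(s) / len(s)

def cyclic_shift(v, k):
    # Gauge change on regular-representation features: cyclic shift.
    n = len(v)
    return [v[(i - k) % n] for i in range(n)]
```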
45) Achieving rotation invariance by projecting the coordinates of points on the manifold from the global coordinate system onto local coordinate systems;
Let x_p be the coordinate vector of point p in the global coordinate system and n_p the normal vector at point p; the gauge w_p at point p is determined by two tangent coordinate directions u_p and v_p. The coordinate values in the local coordinate system obtained after projection are expressed as:

X_p = ( ⟨x_p, u_p⟩, ⟨x_p, v_p⟩, ⟨x_p, n_p⟩ ),

and X_p is invariant to rotations of the global coordinate system;
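The invariance can be checked directly (an illustrative sketch with a rotation about the z-axis; a global rotation acts on both the point coordinates and the local frame, and dot products are preserved):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(x, u, v, n):
    # X_p = (<x,u>, <x,v>, <x,n>): coordinates of x in the local frame (u, v, n).
    return (dot(x, u), dot(x, v), dot(x, n))

def rot_z(p, theta):
    # Rotate a 3D point about the z-axis by theta.
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])
```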
5) Constructing and training the three-dimensional object shape recognition model GET, a neural network based on the gauge-equivariant Transformer; the input of the model is a 3D object represented by a two-dimensional manifold embedded in three-dimensional space, and the output is the predicted class of the 3D object;
adopting the Transformer constructed in step 4) in place of the convolutional layers of a convolutional neural network model; adding a group pooling layer after the output of the last Transformer layer to pool the gauge-equivariant output into a gauge-invariant output, and obtaining the prediction score of each category through a global average pooling layer and a fully connected layer;
initializing the constructed Transformer layers, and training the model to obtain the trained GET model;
6) Recognizing the 3D object mesh sample data to be recognized with the GET model constructed and trained in step 5) to obtain the predicted 3D object shape label, thereby realizing three-dimensional object shape recognition based on the gauge-equivariant Transformer neural network.
2. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein in step 1) the 3D object data are obtained with a 3D camera or taken directly from an existing 3D object data set; a set of points is obtained with the farthest point sampling algorithm, and a triangular mesh is formed from these points; the discrete data of the 3D object, consisting of a set of points, edges, and faces, are used to generate the 3D object mesh data.
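A minimal sketch of the greedy farthest point sampling mentioned above (illustrative; the starting index and distance metric are assumptions, since the claim does not fix them):

```python
import math

def farthest_point_sampling(points, m):
    # Greedy FPS: start from the first point, then repeatedly add the point
    # farthest (in Euclidean distance) from the already-selected set.
    chosen = [0]
    d = [math.dist(points[0], p) for p in points]
    for _ in range(m - 1):
        nxt = max(range(len(points)), key=lambda i: d[i])
        chosen.append(nxt)
        # Keep, for each point, its distance to the nearest chosen point.
        for i, p in enumerate(points):
            d[i] = min(d[i], math.dist(points[nxt], p))
    return chosen
```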
3. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein step 2) preprocesses the 3D object mesh data through the following steps:
21) Normalization: summing the areas of all triangles of the triangular mesh to obtain the total mesh area, and then scaling the edges and faces of the mesh by this area so that the total area is normalized to 1;
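A minimal sketch of this normalization (illustrative; since areas scale quadratically with length, scaling coordinates by 1/√(total area) makes the total area 1):

```python
import math

def tri_area(a, b, c):
    # Triangle area = half the norm of the cross product of two edge vectors.
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cx = (ab[1] * ac[2] - ab[2] * ac[1],
          ab[2] * ac[0] - ab[0] * ac[2],
          ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(cx[0] ** 2 + cx[1] ** 2 + cx[2] ** 2)

def normalize_area(verts, faces):
    # Scale all vertex coordinates so the total mesh area becomes 1.
    total = sum(tri_area(verts[i], verts[j], verts[k]) for i, j, k in faces)
    s = 1.0 / math.sqrt(total)
    return [[s * c for c in v] for v in verts]
```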
22) Neighborhood determination:
a geodesic threshold and a straight-line threshold are set in advance; according to the spatial position of each point in the mesh data, the set of points whose straight-line distance to the point is smaller than the set straight-line threshold is found; for each point i in the mesh, the geodesic distances to these candidate points are computed, and the points whose geodesic distance is smaller than the corresponding geodesic threshold are selected as the neighborhood of point i, denoted n_i;
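The following sketch illustrates this two-stage filtering. It approximates geodesic distances by Dijkstra shortest paths over the mesh edge graph, which is a simplification (the patent computes geodesic distances with the vector heat method instead); `adj` is an assumed vertex-adjacency structure:

```python
import heapq
import math

def neighborhood(i, verts, adj, line_thr, geo_thr):
    # Candidates: points within the straight-line threshold of point i.
    candidates = {j for j in range(len(verts))
                  if j != i and math.dist(verts[i], verts[j]) < line_thr}
    # Dijkstra over mesh edges as a stand-in for true geodesic distance.
    dist = {i: 0.0}
    pq = [(0.0, i)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf) or d > geo_thr:
            continue
        for w in adj[u]:
            nd = d + math.dist(verts[u], verts[w])
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(pq, (nd, w))
    # Keep candidates whose (approximate) geodesic distance is below threshold.
    return {j for j in candidates if dist.get(j, math.inf) < geo_thr}
```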
23) Local coordinate system selection: for each point of the 3D object mesh, the tangent plane at the point is computed; two orthogonal directions in the tangent plane are chosen arbitrarily as the x-axis and y-axis of the local coordinate system, and the normal vector of the tangent plane is taken as the z-axis of the local coordinate system;
24) Computing the logarithm map and the connection:
according to the local coordinate systems, the local coordinates log_i(j) of each point j in the neighborhood of each point i in the mesh are computed, together with the connection g_{j→i} from point j to point i, where g_{j→i} is a two-dimensional rotation matrix;
25) Constructing the input features of the model:
for each point in the mesh data, the coordinates of the point in the global coordinate system are projected onto the local coordinate system of step 24), and the projected coordinate values are taken as the input features of the model.
4. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 3, wherein in step 22) the geodesic distances between each point i in the mesh and the points whose straight-line distance to i is smaller than the set straight-line threshold are computed with the vector heat method.
5. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 3, wherein in step 24) the local coordinates log_i(j) of each point j in the neighborhood of each point i in the mesh, and the corresponding connection g_{j→i} from point j to point i, are computed with the vector heat method.
6. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein step 5) constructs the three-dimensional object shape recognition model GET based on the gauge-equivariant Transformer, and the convolutional neural network used is ResNet.
7. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 1, wherein in step 5) the Transformer layers are initialized with the Weiler initialization method; the fully connected layer is initialized with the Xavier initialization method; and the network model is implemented in PyTorch.
8. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 7, wherein the model is trained with the back-propagation algorithm using the Adam optimizer.
9. The three-dimensional object shape recognition method based on a gauge-equivariant Transformer neural network according to claim 8, wherein the training process runs for 50 epochs with a batch size of 1, i.e., one mesh per batch; the initial learning rate is 0.01, reduced to 0.001 for epochs 31 to 40 and to 0.0005 for epochs 41 to 50.
CN202110895887.5A 2021-08-05 2021-08-05 Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network Active CN113723208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110895887.5A CN113723208B (en) 2021-08-05 2021-08-05 Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network


Publications (2)

Publication Number Publication Date
CN113723208A CN113723208A (en) 2021-11-30
CN113723208B true CN113723208B (en) 2023-10-20

Family

ID=78674890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110895887.5A Active CN113723208B (en) 2021-08-05 2021-08-05 Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network

Country Status (1)

Country Link
CN (1) CN113723208B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984174B * 2022-12-02 2024-03-22 Henan Transport Development Research Institute Co., Ltd. Pavement distress identification method based on a rotation-equivariant detector

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102102471B1 (en) * 2019-08-19 2020-04-20 유징테크주식회사 System for shape recognition based image processing
CN112801280A (en) * 2021-03-11 2021-05-14 东南大学 One-dimensional convolution position coding method of visual depth self-adaptive neural network
CN112990315A (en) * 2021-03-17 2021-06-18 北京大学 3D shape image classification method of equal-variation 3D convolution network based on partial differential operator
CN113139470A (en) * 2021-04-25 2021-07-20 安徽工业大学 Glass identification method based on Transformer
CN113159232A (en) * 2021-05-21 2021-07-23 西南大学 Three-dimensional target classification and segmentation method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Object Recognition Algorithm Based on Deep Convolutional Neural Networks; Huang Bin, Lu Jinjin, Wang Jianhua, Wu Xingming, Chen Weihai; Journal of Computer Applications, 36(12): 3333-3340 *
Application of Deep Learning in 3D Object Reconstruction from a Single Image; Chen Jia, Zhang Yuqi, Song Peng, Wei Yantao, Wang Yu; Acta Automatica Sinica, (04): 23-34 *


Similar Documents

Publication Publication Date Title
Lézoray et al. Image processing and analysis with graphs
Tam et al. Registration of 3D point clouds and meshes: A survey from rigid to nonrigid
Caelli et al. An eigenspace projection clustering method for inexact graph matching
Aubry et al. Pose-consistent 3d shape segmentation based on a quantum mechanical feature descriptor
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
JP7075654B2 (en) 3D CAD model partial search method and 3D CAD model search method
Chen et al. Mesh convolution: a novel feature extraction method for 3d nonrigid object classification
Hu et al. Curve skeleton extraction from 3D point clouds through hybrid feature point shifting and clustering
Lee et al. Connectivity-based convolutional neural network for classifying point clouds
CN113723208B (en) Three-dimensional object shape classification method based on canonical and other transformation conversion sub-neural network
Zaied et al. A power tool for Content-based image retrieval using multiresolution wavelet network modeling and Dynamic histograms
Li et al. A non-rigid 3D model retrieval method based on scale-invariant heat kernel signature features
Akimaliev et al. Improving skeletal shape abstraction using multiple optimal solutions
Bespalov and et al. Scale-space representation and classification of 3d models
Jiang et al. Robust 3d face alignment with efficient fully convolutional neural networks
Cao et al. Gaussian-curvature-derived invariants for isometry
Pastor et al. Surface approximation of 3D objects from irregularly sampled clouds of 3D points using spherical wavelets
CN110945499B (en) Method and system for real-time three-dimensional space search and point cloud registration by applying dimension shuffling transformation
Niu et al. Two-dimensional shape retrieval using the distribution of extrema of laplacian eigenfunctions
Aouat et al. Indexing binary images using quad-tree decomposition
Jensen et al. Deep Active Latent Surfaces for Medical Geometries
Limberger et al. Curvature-based spectral signatures for non-rigid shape retrieval
Moumoun et al. 3d hierarchical segmentation using the markers for the watershed transformation
Chen et al. An integrated graph Laplacian downsample (IGLD)-based method for DEM generalization
CN111862328B (en) Three-dimensional grid segmentation result labeling method based on small samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant