CN116467948A - Digital twin model mechanism and appearance combined parameter learning method


Info

Publication number
CN116467948A
CN116467948A
Authority
CN
China
Prior art keywords
dimensional
current state
representation
grid
sequence
Prior art date
Legal status
Pending
Application number
CN202310438448.0A
Other languages
Chinese (zh)
Inventor
牛羽佳
肖罡
黄晋
赵斯杰
张蔚
刘小兰
杨钦文
万可谦
Current Assignee
Jiangxi Kejun Industrial Co ltd
Original Assignee
Jiangxi Kejun Industrial Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangxi Kejun Industrial Co ltd filed Critical Jiangxi Kejun Industrial Co ltd
Priority to CN202310438448.0A
Publication of CN116467948A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects
    • G06T15/506 Illumination models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a digital twin model mechanism and appearance joint parameter learning method, which comprises the following steps: obtaining real images, reconstructing a three-dimensional scene from the real images, rendering the three-dimensional scene with a multi-layer perceptron to obtain a composite image, computing the three-dimensional scene representation that minimizes the difference between the composite image and the real images, and generating the current-state three-dimensional mesh from the scene representation; acquiring the mechanism data of the target device's current state and learning its latent mechanism representation to obtain the mechanism data sequence of the device's next state; sorting and preprocessing the vertices of the current-state mesh and inputting them into a neural network model to obtain the vertex distribution of the next-state mesh; and generating the current-state mesh face distribution from the latent mechanism representation and the current-state mesh, then sorting and preprocessing it and inputting it into a neural network model to obtain the next-state mesh face distribution. The invention avoids frequent three-dimensional modeling for the different states of a device.

Description

Digital twin model mechanism and appearance combined parameter learning method
Technical Field
The invention relates to the field of digital twins, in particular to a digital twin model mechanism and appearance joint parameter learning method.
Background
A digital twin is a virtual model intended to accurately reflect a physical object: data about the object are acquired through sensors, and data, algorithms and decision analysis are combined for simulation, i.e. a virtual mapping of the physical object. Through digital twinning, problems can be discovered before they occur: changes to the physical object are monitored in the virtual model, multi-dimensional data are subjected to complex processing and anomaly analysis based on artificial intelligence, potential risks are predicted, and the related equipment can be planned or maintained reasonably and effectively.
Although digital twin technology has developed rapidly in recent years, most digital twin scenes are still built by manually designing three-dimensional models from supplied CAD drawings, and one or more three-dimensional models must be provided for the same equipment model in each of its states. This requires frequent modeling, which is inefficient and highly repetitive.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems of the prior art, the invention provides a digital twin model mechanism and appearance joint parameter learning method that uses machine learning on historical data and current operating data to predict the mechanism changes and the three-dimensional model structure changes of a device's virtual model in future states, thereby avoiding frequent three-dimensional modeling for the device's different states while linking the model's mechanism changes to the three-dimensional model structure.
In order to solve the above technical problem, the technical scheme provided by the invention is as follows:
A digital twin model mechanism and appearance joint parameter learning method comprises the following steps:
S1) obtaining real images of the same scene of the target device under different viewing angles, reconstructing a three-dimensional scene from the real images, rendering the three-dimensional scene with a multi-layer perceptron to obtain a composite image, computing the three-dimensional scene representation that minimizes the difference between the composite image and the real images, and generating the current-state three-dimensional mesh from the three-dimensional scene representation;
S2) acquiring the mechanism data of the target device's current state and learning its latent mechanism representation to obtain the mechanism data sequence of the device's next state; sorting and preprocessing the vertices of the current-state three-dimensional mesh and inputting them into a Transformer neural network model to obtain the vertex distribution of the next-state mesh; generating the current-state mesh face distribution from the current-state three-dimensional mesh of the target device, then sorting and preprocessing it and inputting it, together with the latent representation of the current-state mechanism data sequence, into a Transformer neural network model to obtain the next-state mesh face distribution.
Further, reconstructing the three-dimensional scene from the real images in step S1 comprises: generating a ray through each pixel of the real image, where the origin of the ray is the position of the corresponding pixel on the image plane and the direction of the ray is from the corresponding pixel toward the camera aperture.
Further, in step S1, when the real images of the same scene of the target device are obtained under different viewing angles, the corresponding camera-to-world matrices are obtained as well, and the direction of each ray is expressed as:

$\tilde{p}_w = C_{ex}^{-1}\,\tilde{p}_c, \qquad d = R_{ex}\,p_c$

where $C_{ex}^{-1}$ is the camera-to-world matrix corresponding to the real image, $R_{ex}$ is the upper-left 3×3 block of $C_{ex}^{-1}$, $p_c$ is the pixel's position in the camera coordinate frame, and $\tilde{p}_c$ and $\tilde{p}_w$ are the homogeneous representations of the point in the camera coordinate frame and the world coordinate frame, respectively.
Further, rendering the three-dimensional scene with the multi-layer perceptron to obtain a composite image in step S1 comprises: extracting three-dimensional sample points from each ray, inputting the spatial coordinates and direction of each sample point into the multi-layer perceptron to obtain the color and volume density of each sample point, and integrating the colors and volume densities of the sample points to obtain the composite image.
Further, the three-dimensional sample points are drawn according to:

$t_i \sim U\!\left[t_n + \frac{i-1}{N}(t_f - t_n),\; t_n + \frac{i}{N}(t_f - t_n)\right], \quad i = 1, \dots, N$

where $U$ denotes uniform sampling, $t_f$ and $t_n$ are the farthest and nearest points on the current ray, $N$ is the number of divisions of the current ray, and $t_i$ is the $i$-th three-dimensional sample point on the current ray.
Further, the colors and volume densities of the three-dimensional sample points are integrated according to:

$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t))\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right)$

where $t_f$ and $t_n$ are respectively the farthest and nearest points on the current ray, $\sigma(r(t))$ is the volume density of the current ray at sample point $t$, $c(r(t))$ is the color of the current ray at sample point $t$, and $T(t)$ indicates how far the ray penetrates the three-dimensional space before reaching the sample point.
Further, computing the three-dimensional scene representation that minimizes the difference between the composite image and the real image in step S1 comprises: minimizing a photometric consistency loss between the composite image and the real image to optimize the three-dimensional scene representation;
In step S1, generating the current-state three-dimensional mesh from the three-dimensional scene representation comprises: generating the current-state three-dimensional mesh from the optimized three-dimensional scene representation with the Marching Cubes algorithm.
Further, acquiring the mechanism data of the target device's current state and learning the latent mechanism representation in step S2 comprises: inputting the mechanism data sequence of the current state into a sequence-to-sequence network with attention, which, using a GRU, predicts and learns the latent representation of the device's mechanism data sequence from the current-state sequence and the mechanism data sequences of all historical states before the current state, and outputs the mechanism data sequence of the device's next state.
Further, sorting and preprocessing the vertices of the current-state three-dimensional mesh and inputting them into the Transformer neural network model to obtain the vertex distribution of the next-state mesh in step S2 comprises:
sorting all vertices of the current-state three-dimensional mesh first by z coordinate from low to high, sorting vertices with equal z coordinates by y coordinate from low to high, and sorting vertices with equal z and y coordinates by x coordinate from low to high, yielding a vertex sequence;
embedding the corresponding coordinate information, owning-point information, value information and latent mechanism representation information for each element of the vertex sequence to obtain a high-dimensional vector $Vector_v$ for each element, then inputting $Vector_v$ into the Transformer neural network model to generate the next-state $Vector_v$.
Further, sorting and preprocessing the current-state mesh face distribution and inputting it, together with the latent representation of the current-state mechanism data sequence of the target device, into the Transformer neural network model to obtain the next-state mesh face distribution in step S2 comprises:
sorting the current-state mesh faces by vertex index from low to high, yielding a face sequence;
embedding the corresponding owning-face information, within-face position information and vertex information for each element of the face sequence to obtain a high-dimensional vector $Vector_f$ for each element, then inputting $Vector_f$, together with the latent representation of the current-state mechanism data sequence of the target device, into the Transformer neural network model to generate the next-state $Vector_f$.
Compared with the prior art, the invention has the advantages that:
according to the invention, the mechanism and the three-dimensional structure of the equipment digital twin model are jointly learned by analyzing the historical data and the operation data, so that the mechanism data of different states of the equipment model can be characterized, and meanwhile, the equipment digital twin model is used as a generating model, has the capability of generating the three-dimensional structure of the equipment digital twin model in the state, and is beneficial to automatic model construction and strengthening of the connection with the equipment mechanism data.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a multi-layer perceptron.
FIG. 3 is a schematic diagram of a mechanism sequence model according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a vertex sequence model according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a mesh surface sequence model according to an embodiment of the present invention.
Detailed Description
The invention is further described below in connection with the drawings and the specific preferred embodiments, but the scope of protection of the invention is not limited thereby.
This embodiment proposes a digital twin model mechanism and appearance joint parameter learning method that predicts the future state of a device's virtual model (its mechanism changes and the changes to the device's three-dimensional model structure) from historical data and current operating data. It thereby avoids frequent three-dimensional modeling for the device's different states while linking the model's mechanism changes to the three-dimensional model structure. As shown in FIG. 1, the method comprises the following steps:
S1) reconstructing a three-dimensional scene representation from multi-angle two-dimensional images of the same scene: obtaining real images of the same scene of the target device under different viewing angles, reconstructing a three-dimensional scene from the real images, rendering the three-dimensional scene with a multi-layer perceptron to obtain a composite image, computing the three-dimensional scene representation that minimizes the difference between the composite image and the real images, and generating the current-state three-dimensional mesh from the three-dimensional scene representation;
S2) modeling with the current-state mechanism data and the three-dimensional mesh of the reconstructed representation: acquiring the mechanism data of the target device's current state and learning its latent mechanism representation to obtain the mechanism data sequence of the device's next state; sorting and preprocessing the vertices of the current-state three-dimensional mesh and inputting them into a Transformer neural network model to obtain the vertex distribution of the next-state mesh; generating the current-state mesh face distribution from the current-state three-dimensional mesh of the target device, then sorting and preprocessing it and inputting it, together with the latent representation of the current-state mechanism data sequence, into a Transformer neural network model to obtain the next-state mesh face distribution.
The target device in this embodiment is a bogie of a rail transit subway train. Its continuous state data are the mechanism data, comprising the structural parameters, performance parameters and material parameters at a given moment, and the real images of the same scene under different viewing angles are images of the bogie taken from different viewpoints.
Step S1 of this embodiment first acquires two-dimensional images from different viewing angles. Rather than analyzing the two-dimensional information directly, it innovatively restores it into a three-dimensional scene for processing via inverse mapping; the reconstructed three-dimensional information is then sampled along rays, and, to overcome the limited learning capacity of traditional methods, the sampling results are fed into a multi-layer perceptron for feature learning. Step S1 improves on the traditional practice of representing mechanism and appearance with only a two-dimensional digital model by adopting 3D model generation, creating more reliable synthetic data for downstream generation tasks. Like a two-dimensional digital model, the three-dimensional digital model supports data acquisition, data operation, data organization and data analysis, but the spatial information of the three-dimensional model is more direct, its data analysis capability is stronger, and its great advantages in visualization yield better performance in downstream tasks. Optimizing the 3D scene representation with a multi-layer perceptron also greatly improves learning capacity compared with traditional linear methods and single-layer perceptrons, and the optimized 3D scene representation achieves better simulation and prediction in downstream digital twinning. Step S1 is described in detail below and specifically comprises:
S1.1) obtaining images of the current device scene from different viewing angles and the corresponding camera-to-world matrices. Step S1 of this embodiment reconstructs the three-dimensional scene from the real images by inverse mapping, generating the 3D scene from the 2D images; therefore, the corresponding camera-to-world matrix $C_{ex}^{-1}$ is acquired together with the real images of the same scene of the target device under different viewing angles.
S1.2) inverse mapping: generating the 3D scene from the 2D images. In this embodiment, reconstructing the three-dimensional scene from the real images comprises: using the ray casting and volume rendering techniques of computer graphics, a ray is generated through each pixel of the real image, expressed as:
r(t) = o + td (1)
where o is the origin of the ray, d is the direction of the ray, and t is a scalar parameter indicating how far one must travel from the origin o along the direction vector d to reach a particular point on the ray. As t varies, different points on the ray are obtained: when t is 0, r(t) equals the origin o; when t is positive, r(t) moves along the ray in the direction of d; and when t is negative, r(t) moves in the opposite direction. In this embodiment, the origin of the ray is the position of the corresponding pixel on the image plane, and the direction of the ray is from the corresponding pixel toward the camera aperture. The direction of the ray in step S1 is therefore expressed as:

$\tilde{p}_w = C_{ex}^{-1}\,\tilde{p}_c, \qquad d = R_{ex}\,p_c$ (2)

where $C_{ex}^{-1}$ is the camera-to-world matrix corresponding to the real image, whose translation column $t_{ex}$ is the origin of the ray, $R_{ex}$ is the upper-left 3×3 block of $C_{ex}^{-1}$, $p_c$ is the pixel's position in the camera coordinate frame, and $\tilde{p}_c$ and $\tilde{p}_w$ are the homogeneous representations of the point in the camera coordinate frame and the world coordinate frame, respectively.
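As a concrete illustration of this inverse mapping, the following is a minimal NumPy sketch that builds one ray per pixel from a camera-to-world matrix; the pinhole-camera convention, the function name, and the focal-length parameter are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def generate_rays(height, width, focal, c2w):
    """Build per-pixel ray origins and directions from a 4x4
    camera-to-world matrix c2w (the C_ex^{-1} of Eq. (2))."""
    # Pixel grid in camera coordinates (pinhole model, looking down -z).
    i, j = np.meshgrid(np.arange(width, dtype=np.float32),
                       np.arange(height, dtype=np.float32), indexing="xy")
    dirs_cam = np.stack([(i - width * 0.5) / focal,
                         -(j - height * 0.5) / focal,
                         -np.ones_like(i)], axis=-1)        # (H, W, 3)
    # Rotate pixel directions into the world frame: d = R_ex * p_c.
    rays_d = dirs_cam @ c2w[:3, :3].T                       # (H, W, 3)
    # All rays share the camera origin t_ex, the translation of C_ex^{-1}.
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d
```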
Next, rendering the three-dimensional scene using a multi-layer perceptron to obtain a composite image, comprising:
s1.3) sampling the radiation. Extracting a three-dimensional sample point from each ray, wherein the expression of the three-dimensional sample point is as follows:
wherein U refers to uniform sampling, t f And t n The most distant and nearest points on the current ray, N represents the number of divisions of the current ray, t i Representing the current ray ith three-dimensional sample point.
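A minimal sketch of this stratified sampling, one uniform draw per bin of the ray segment; the function name and the random-generator argument are illustrative assumptions.

```python
import numpy as np

def stratified_samples(t_n, t_f, n, rng=None):
    """Draw one uniform sample t_i from each of n equal bins of
    [t_n, t_f], implementing Eq. (3)."""
    rng = rng or np.random.default_rng()
    edges = np.linspace(t_n, t_f, n + 1)            # bin boundaries
    lower, upper = edges[:-1], edges[1:]
    return lower + (upper - lower) * rng.random(n)  # t_i ~ U[lower_i, upper_i]
```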
S1.4) feeding the sampled three-dimensional points into the multi-layer perceptron to learn their associated mappings. As shown in FIG. 2, in this embodiment the input to the multi-layer perceptron comprises the spatial coordinates (x, y, z) and the viewing direction of each three-dimensional sample point, and the multi-layer perceptron predicts and outputs the color RGB and the volume density σ of each sample point, specifically as follows:
assuming that there is a ray sampling point p, p is represented by a three-dimensional coordinate (x, y, z), its coordinates are encoded and initialized to a D-dimensional vector p'. P' is fully connected through 4 layers to obtain a characteristic f 1 The method comprises the following steps:
f 1 =w 4 (w 3 (w 2 (w 1 p′+b 1 )+b 2 )+b 3 )+b 4 (4)
wherein the weight parameter w 1 ,w 2 ,w 3 ,w 4 Are all randomly initialized, updated as the network trains;
will f 1 Splicing with p' and inputting into full connection of 3 layers to obtain feature f 2 The method comprises the following steps:
feature f 2 Outputting a density value between 0 and 1 through a full connection with a sigmoid activation function, wherein the density value comprises the following components:
σ=sigmoid(w 8 f 2 +b 8 ) (6)
feature f 2 In combination with the ray direction, the RGB4 color values of the ray sample point p are predicted. Assuming that the direction of the ray isThe encoded representation is initialized to a D-dimensional vector q, which has:
s1.5) rendering the three-dimensional scene using the color and volume densities predicted by the multi-layer perceptron. Integrating the color and the volume density of each three-dimensional sample point in the three-dimensional scene through ray sampling to obtain a composite image containing the color of the specific point, wherein the expression for integrating the color and the volume density of each three-dimensional sample point is as follows:
$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t))\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right)$ (8)

where $t_f$ and $t_n$ are respectively the farthest and nearest points on the current ray, $\sigma(r(t))$ is the volume density of the current ray at sample point $t$, $c(r(t))$ is the color of the current ray at sample point $t$, and $T(t)$ indicates how far the ray penetrates the three-dimensional space before reaching the sample point.
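In practice the integral of Eq. (8) is approximated by numerical quadrature over the sampled points; the following is a minimal sketch under that assumption (per-ray tensors, names illustrative).

```python
import torch

def render_ray(rgb, sigma, t_vals):
    """Quadrature approximation of Eq. (8) along one ray.
    rgb: (N, 3) colors, sigma: (N,) densities, t_vals: (N,) sample depths."""
    deltas = t_vals[1:] - t_vals[:-1]
    deltas = torch.cat([deltas, deltas[-1:]])         # pad the last interval
    alpha = 1.0 - torch.exp(-sigma * deltas)          # per-segment opacity
    # Transmittance T(t): how far the ray penetrates before each segment.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = trans * alpha
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)   # composited pixel color
```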
Next, computing the three-dimensional scene representation that minimizes the difference between the composite image and the real image comprises:
S1.6) minimizing the photometric consistency loss to optimize the three-dimensional scene representation. The photometric consistency loss measures the difference between the composite image and the real image and is typically used for image generation and visual reconstruction tasks; minimizing it makes the composite image as close as possible to the real image. Given a composite image I_synth and a real image I_real, the loss can be computed with a pixel-level difference metric such as the mean squared error (MSE):

L_photometric = (1/N) ∑ (I_synth(i) − I_real(i))² (9)

where L_photometric is the photometric consistency loss, N is the total number of pixels in the image, i is the pixel index, I_synth(i) is the value of the i-th pixel of the composite image, and I_real(i) is the value of the i-th pixel of the real image. The formula sums the squared differences of corresponding pixel values between the composite and real images and normalizes by the total number of pixels.
To minimize the gap between the composite image and the real image, the photometric consistency loss is minimized:

minimize L_photometric (10)

By optimizing this loss function, the composite image is brought photometrically closer to the real image;
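Eqs. (9)-(10) reduce to a one-line mean-squared-error objective; a minimal sketch (function name illustrative):

```python
import torch

def photometric_loss(i_synth, i_real):
    """Pixel-level MSE of Eq. (9); minimizing it implements Eq. (10)."""
    return torch.mean((i_synth - i_real) ** 2)
```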
generating a current state three-dimensional grid according to the scene three-dimensional representation, comprising: the optimized three-dimensional representation of the scene is used for generating the three-dimensional grid of the current state through the Marching cube algorithm, and the specific implementation process is not repeated as the three-dimensional representation is a conventional means used by a person skilled in the art.
Step S2 of this embodiment first models the mechanism data; to learn jointly from both the historical data and the current parameters, an attention module is introduced and the mechanism data are learned as a sequence. Next, the three-dimensional mesh is built by vertex modeling: the obtained vertex information is flattened and embedded, each piece of vertex information is decomposed into four aspects that are embedded separately, and the embeddings are input into a Transformer module to predict the position of the next vertex. A mesh face model is then built and input into a Transformer module to predict the face positions. For mechanism modeling, step S2 introduces an attention mechanism and feeds the device's mechanism data as a sequence into a network with attention, realizing joint learning of current and historical data. For the mesh modeling of the 3D image, vertex modeling is combined with face modeling to generate the 3D mesh data and thereby predict the point and face information of the device's next state. Step S2 is described in detail below and specifically comprises:
the method for acquiring the mechanism data in the current state of the target equipment and learning the mechanism potential representation comprises the following steps:
s2.1) modeling the mechanism data and establishing a sequence-to-sequence model. In this embodiment, in order to model structural parameters, performance parameters, and material parameters in the current state of the bogie as a sequence model, the current state of the target device is calculatedThe structural parameter, performance parameter and material parameter are used as the mechanism data sequence in the current state to input the sequence to the sequence network with attention, the sequence to sequence network with attention is shown in figure 3, and the potential representation C of the mechanism data sequence of the target equipment is predicted and learned for the mechanism data sequence in the current state and the mechanism data sequences of all historical states before the current state through GRU algorithm Mechanism And outputting a mechanism data sequence in the next state of the target equipment, wherein the calculation and deduction process is as follows:
and processing the mechanism data from the forward direction and the reverse direction respectively by adopting the bidirectional GRU, and extracting the characteristic representation of the mechanism data.
Wherein the forward GRU has the formula ofThe formula of reverse GRU is->
Obtaining final hidden states as feature vectors for potential representation of the mechanism data
The attention-based sequence model takes not only the current features but also historical information as input to the prediction network. By integrating historical and current parameters and features, it remedies the inaccurate predictions of traditional methods that rely only on local information.
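A minimal PyTorch sketch of the bidirectional GRU encoder of S2.1; the feature and hidden sizes are illustrative assumptions, and the attention readout over the encoder states is omitted for brevity.

```python
import torch
import torch.nn as nn

class MechanismEncoder(nn.Module):
    """Encode the mechanism data sequence with a bidirectional GRU and
    return the concatenated final hidden states as C_mech."""
    def __init__(self, in_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                  # x: (batch, T, in_dim)
        _, h_n = self.gru(x)               # h_n: (2, batch, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # C_mech: (batch, 2*hidden)
```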
Next, sorting and preprocessing the vertices of the current-state three-dimensional mesh and inputting them into the Transformer neural network model to obtain the vertex distribution of the next-state mesh comprises:
s2.2) building a vertex model. The vertex model refers to the fact that the 3D model built in step S1 is represented as a strictly ordered sequence of vertices, which enables them to apply an attention-based sequence modeling method to generate a 3D mesh, and the built three-dimensional model has each vertex of the model corresponding to a unique coordinate representation (x, y, z), and in order to facilitate the input of vertex information into the network, the vertex information is preprocessed as follows: all vertexes of the three-dimensional grid in the current state are firstly ordered from low to high according to z coordinates, the vertexes with the same z coordinate values are ordered from low to high according to y coordinates, and the vertexes with the same z coordinate values and y coordinate values are ordered from low to high according to x coordinates, so that a flattened vertex sequence is obtained;
as shown in fig. 4, after the flattened vertex sequence is obtained, the node information is encoded into a vector of a specified size by the embedding operation, and the corresponding coordinate information, belonging point information, value information, and mechanism potential representation information (the feature vector C obtained after the mechanism information has been extracted) are embedded for each element in the vertex sequence Mechanism ) Obtaining a high-dimensional Vector corresponding to each element in the vertex sequence v Then the high-dimensional Vector is used for v Inputting a high-dimensional Vector of a next state generated in a transducer neural network model v Specifically: maximizing the product of conditional vertex distributions using a transformer network is:the flattened vertex sequence for the next state is denoted v seq . Wherein the element is denoted as v n ,n=1,2,...,N v . Vector from high-dimensional Vector of next state v The vertex distribution of the three-dimensional grid of the next state can be obtained.
Next, the method comprises:
S2.3) building the mesh face model, i.e. generating the distribution of the mesh face sequence conditioned on the vertex distribution of the three-dimensional mesh and the latent mechanism representation $C_{mech}$ obtained in step S2.1).
First, the current-state mesh face distribution is generated from the current-state three-dimensional mesh of the target device; it is then sorted, preprocessed and input, together with the latent representation of the current-state mechanism data sequence, into the Transformer neural network model to obtain the next-state mesh face distribution, as follows:
the sequence of the current state grid surface is ordered according to the sequence from low to high of the vertex indexes, and a surface sequence is obtained;
as shown in FIG. 5, each element in the surface sequence is embedded with corresponding surface information, in-surface position information and vertex information to obtain a high-dimensional Vector corresponding to each element in the surface sequence f Then the high-dimensional Vector is used for f Potential representation C of a mechanism data sequence in the current state of a target device Mechanism Inputting a high-dimensional Vector of a next state generated in a transducer neural network model f Specifically, using a transducer network to maximize the product of conditional mesh plane distributions, there are:wherein θ represents a weight parameter; v represents vertex information, namely, vertex high-dimensional vector representation obtained in the last step; three-dimensional meshes are typically composed of a collection of triangles, one polygon being composed of many triangular faces, the number of triangular faces required for different polygons being different. Let the ith polygon be denoted +.>Is made up of N f Each triangular surface is formed by adding a spacer s between each triangular surface. As with the vertex sequence, we concatenate faces into a flattened sequence, using f seq And (3) representing. f (f) n As f seq In the presence of an element of the group, wherein subscript n=1,.. f
Vector from high-dimensional Vector of next state f The grid surface distribution of the next state can be obtained.
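The face ordering and flattening of S2.3 can be sketched as follows; the use of −1 for the spacer s and the canonicalization of each face's indices are assumptions for illustration.

```python
def flatten_faces(faces, separator=-1):
    """Order faces by vertex index and flatten them into f_seq, inserting
    the spacer s (here -1) between consecutive faces.
    faces: iterable of vertex-index tuples."""
    canon = sorted(tuple(sorted(face)) for face in faces)
    seq = []
    for face in canon:
        seq.extend(face)
        seq.append(separator)
    return seq

# e.g. flatten_faces([(2, 1, 3), (0, 1, 2)]) -> [0, 1, 2, -1, 1, 2, 3, -1]
```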
In summary, by analyzing historical data and operating data and jointly learning the mechanism and three-dimensional structure of the device's digital twin model, the invention can characterize the mechanism data of the device model's different states; at the same time, the model serves as a generative model capable of producing the device's three-dimensional structure in a given state, which facilitates automatic model construction and strengthens the link to the device's mechanism data.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the invention has been described with reference to preferred embodiments, they are not intended to be limiting; therefore, any simple modification or equivalent variation of the above embodiments according to the technical substance of the present invention shall fall within the scope of the technical solution of the present invention.

Claims (10)

1. A digital twin model mechanism and appearance joint parameter learning method, characterized by comprising the following steps:
S1) obtaining real images of the same scene of the target device under different viewing angles, reconstructing a three-dimensional scene from the real images, rendering the three-dimensional scene with a multi-layer perceptron to obtain a composite image, computing the three-dimensional scene representation that minimizes the difference between the composite image and the real images, and generating the current-state three-dimensional mesh from the three-dimensional scene representation;
S2) acquiring the mechanism data of the target device's current state and learning its latent mechanism representation to obtain the mechanism data sequence of the device's next state; sorting and preprocessing the vertices of the current-state three-dimensional mesh and inputting them into a Transformer neural network model to obtain the vertex distribution of the next-state mesh; generating the current-state mesh face distribution from the current-state three-dimensional mesh of the target device, then sorting and preprocessing it and inputting it, together with the latent representation of the current-state mechanism data sequence, into a Transformer neural network model to obtain the next-state mesh face distribution.
2. The digital twin model mechanism and appearance joint parameter learning method according to claim 1, characterized in that reconstructing the three-dimensional scene from the real images in step S1 comprises: generating a ray through each pixel of the real image, where the origin of the ray is the position of the corresponding pixel on the image plane and the direction of the ray is from the corresponding pixel toward the camera aperture.
3. The digital twin model mechanism and appearance joint parameter learning method according to claim 2, characterized in that, in step S1, when the real images of the same scene of the target device are obtained under different viewing angles, the corresponding camera-to-world matrices are obtained as well, and the direction of each ray is expressed as:

$\tilde{p}_w = C_{ex}^{-1}\,\tilde{p}_c, \qquad d = R_{ex}\,p_c$

where $C_{ex}^{-1}$ is the camera-to-world matrix corresponding to the real image, $R_{ex}$ is the upper-left 3×3 block of $C_{ex}^{-1}$, $p_c$ is the pixel's position in the camera coordinate frame, and $\tilde{p}_c$ and $\tilde{p}_w$ are the homogeneous representations of the point in the camera coordinate frame and the world coordinate frame, respectively.
4. The digital twin model mechanism and appearance joint parameter learning method according to claim 1, characterized in that rendering the three-dimensional scene with the multi-layer perceptron to obtain a composite image in step S1 comprises: extracting three-dimensional sample points from each ray, inputting the spatial coordinates and direction of each sample point into the multi-layer perceptron to obtain the color and volume density of each sample point, and integrating the colors and volume densities of the sample points to obtain the composite image.
5. The digital twin model mechanism and appearance joint parameter learning method according to claim 4, characterized in that the three-dimensional sample points are drawn according to:

$t_i \sim U\!\left[t_n + \frac{i-1}{N}(t_f - t_n),\; t_n + \frac{i}{N}(t_f - t_n)\right], \quad i = 1, \dots, N$

where $U$ denotes uniform sampling, $t_f$ and $t_n$ are the farthest and nearest points on the current ray, $N$ is the number of divisions of the current ray, and $t_i$ is the $i$-th three-dimensional sample point on the current ray.
6. The digital twin model mechanism and appearance joint parameter learning method according to claim 4, characterized in that the colors and volume densities of the three-dimensional sample points are integrated according to:

$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t))\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right)$

where $t_f$ and $t_n$ are respectively the farthest and nearest points on the current ray, $\sigma(r(t))$ is the volume density of the current ray at sample point $t$, $c(r(t))$ is the color of the current ray at sample point $t$, and $T(t)$ indicates how far the ray penetrates the three-dimensional space before reaching the sample point.
7. The digital twin model mechanism and appearance joint parameter learning method according to claim 1, characterized in that computing the three-dimensional scene representation that minimizes the difference between the composite image and the real image in step S1 comprises: minimizing a photometric consistency loss between the composite image and the real image to optimize the three-dimensional scene representation;
and that generating the current-state three-dimensional mesh from the three-dimensional scene representation in step S1 comprises: generating the current-state three-dimensional mesh from the optimized three-dimensional scene representation with the Marching Cubes algorithm.
8. The digital twin model mechanism and appearance joint parameter learning method according to claim 1, characterized in that acquiring the mechanism data of the target device's current state and learning the latent mechanism representation in step S2 comprises: inputting the mechanism data sequence of the current state into a sequence-to-sequence network with attention, which, using a GRU, predicts and learns the latent representation of the device's mechanism data sequence from the current-state sequence and the mechanism data sequences of all historical states before the current state, and outputs the mechanism data sequence of the device's next state.
9. The digital twin model mechanism and appearance joint parameter learning method according to claim 1, characterized in that sorting and preprocessing the vertices of the current-state three-dimensional mesh and inputting them into the Transformer neural network model to obtain the vertex distribution of the next-state mesh in step S2 comprises:
sorting all vertices of the current-state three-dimensional mesh first by z coordinate from low to high, sorting vertices with equal z coordinates by y coordinate from low to high, and sorting vertices with equal z and y coordinates by x coordinate from low to high, yielding a vertex sequence;
embedding the corresponding coordinate information, owning-point information, value information and latent mechanism representation information for each element of the vertex sequence to obtain a high-dimensional vector $Vector_v$ for each element, then inputting $Vector_v$ into the Transformer neural network model to generate the next-state $Vector_v$.
10. The digital twin model mechanism and appearance joint parameter learning method according to claim 1, characterized in that sorting and preprocessing the current-state mesh face distribution and inputting it, together with the latent representation of the current-state mechanism data sequence of the target device, into the Transformer neural network model to obtain the next-state mesh face distribution in step S2 comprises:
sorting the current-state mesh faces by vertex index from low to high, yielding a face sequence;
embedding the corresponding owning-face information, within-face position information and vertex information for each element of the face sequence to obtain a high-dimensional vector $Vector_f$ for each element, then inputting $Vector_f$, together with the latent representation of the current-state mechanism data sequence of the target device, into the Transformer neural network model to generate the next-state $Vector_f$.
CN202310438448.0A 2023-04-20 2023-04-20 Digital twin model mechanism and appearance combined parameter learning method Pending CN116467948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310438448.0A CN116467948A (en) 2023-04-20 2023-04-20 Digital twin model mechanism and appearance combined parameter learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310438448.0A CN116467948A (en) 2023-04-20 2023-04-20 Digital twin model mechanism and appearance combined parameter learning method

Publications (1)

Publication Number Publication Date
CN116467948A true CN116467948A (en) 2023-07-21

Family

ID=87183966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310438448.0A Pending CN116467948A (en) 2023-04-20 2023-04-20 Digital twin model mechanism and appearance combined parameter learning method

Country Status (1)

Country Link
CN (1) CN116467948A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056874A (en) * 2023-08-17 2023-11-14 国网四川省电力公司营销服务中心 Unsupervised electricity larceny detection method based on deep twin autoregressive network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369670A (en) * 2020-03-13 2020-07-03 江西科骏实业有限公司 Method for real-time construction of practical training digital twin model
CN112506476A (en) * 2020-11-06 2021-03-16 温州大学 Method and device for quickly constructing digital twin workshop system
US20210209340A1 (en) * 2019-09-03 2021-07-08 Zhejiang University Methods for obtaining normal vector, geometry and material of three-dimensional objects based on neural network
CN113822993A (en) * 2021-11-23 2021-12-21 之江实验室 Digital twinning method and system based on 3D model matching
CN114119849A (en) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Three-dimensional scene rendering method, device and storage medium
CN115713507A (en) * 2022-11-16 2023-02-24 华中科技大学 Concrete 3D printing forming quality detection method and device based on digital twinning


Similar Documents

Publication Publication Date Title
He et al. InSituNet: Deep image synthesis for parameter space exploration of ensemble simulations
Han et al. View inter-prediction gan: Unsupervised representation learning for 3d shapes by learning global shape memories to support local view predictions
Xu et al. Disn: Deep implicit surface network for high-quality single-view 3d reconstruction
CN109410307B (en) Scene point cloud semantic segmentation method
US20240119697A1 (en) Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes
CN116958492B (en) VR editing method for reconstructing three-dimensional base scene rendering based on NeRf
CN113822993A (en) Digital twinning method and system based on 3D model matching
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN115082254A (en) Lean control digital twin system of transformer substation
CN116467948A (en) Digital twin model mechanism and appearance combined parameter learning method
CN116721210A (en) Real-time efficient three-dimensional reconstruction method and device based on neurosigned distance field
CN116205962A (en) Monocular depth estimation method and system based on complete context information
CN117315169A (en) Live-action three-dimensional model reconstruction method and system based on deep learning multi-view dense matching
CN117635863A (en) Mountain area three-dimensional reconstruction method based on Point-NeRF
CN116883565A (en) Digital twin scene implicit and explicit model fusion rendering method and application
CN115565039A (en) Monocular input dynamic scene new view synthesis method based on self-attention mechanism
CN116993926B (en) Single-view human body three-dimensional reconstruction method
CN117456128A (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN117094925A (en) Pig body point cloud completion method based on point agent enhancement and layer-by-layer up-sampling
CN118351277A (en) 3D Gaussian Splatting-based three-dimensional scene arbitrary angle model segmentation method
US12051151B2 (en) System and method for reconstruction of an animatable three-dimensional human head model from an image using an implicit representation network
CN117115786B (en) Depth estimation model training method for joint segmentation tracking and application method
CN118154770A (en) Single tree image three-dimensional reconstruction method and device based on nerve radiation field
CN114187506A (en) Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination