CN116680988A

CN116680988A - Porous medium permeability prediction method based on Transformer network

Info

Publication number: CN116680988A
Application number: CN202310737138.9A
Authority: CN
Inventors: 蒋建国; 孟胤全; 吴吉春; 王栋
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2023-06-21
Filing date: 2023-06-21
Publication date: 2023-09-01

Abstract

The invention discloses a method for predicting the permeability of a porous medium based on a Transformer network, and belongs to the technical field of pore-scale numerical simulation. The method simulates the flow field with a pore-scale simulation method and calculates the permeability of the three-dimensional porous medium; treats the three-dimensional structure image of the porous medium as a spatial sequence of two-dimensional slice images and extracts the physical attributes of each slice; adds the corresponding physical parameter matrix to the three-dimensional structure image, constructs a sample set with the permeability of the porous medium as the label value of each image, and determines a training set, a validation set and a test set; and constructs and trains a PhyCNN-Transformer neural network model from a convolutional neural network (CNN) and a Transformer network, thereby achieving accurate prediction of the permeability of the porous medium. By converting the three-dimensional image regression problem into a two-dimensional image sequence regression problem, the invention deepens the model's understanding of the structural characteristics of the porous medium and captures its essential features, thereby improving the prediction performance and generalization capability of the model.

Description

Porous medium permeability prediction method based on Transformer network

Technical Field

The invention relates to the technical field of pore scale numerical simulation, in particular to a porous medium permeability prediction method based on a Transformer network.

Background

Permeability is a key parameter that measures how easily a fluid passes through a porous medium, and it plays a fundamental controlling role in building mathematical models of seepage and contaminant transport in porous media. Obtaining permeability accurately helps to understand the detailed process of mass transfer in the porous medium more comprehensively, reduces uncertainty in applications and improves their efficiency, so that the dynamic characteristics of a geological body can be evaluated and predicted accurately and quantitatively.

The traditional permeability measurement method is a Darcy pressure gradient method, but the method has the problems of long test period and large environmental disturbance.

Over the past twenty years, with the rapid development of computational fluid dynamics, pore-scale numerical simulation has been widely applied to simulate flow fields in porous media and to calculate macroscopic parameters such as permeability. The main approaches are direct simulation methods, such as the lattice Boltzmann method and classical computational fluid dynamics (CFD), and the pore network model method. Direct simulation methods compute on a three-dimensional image of the pore space, so they must handle complex boundary conditions, and the nonlinearity of the Navier-Stokes equations makes them computationally expensive, which limits the size of porous medium that can be computed. The pore network model method simplifies the geometry of the pore space while retaining the pore structure features essential to mass transport; it has a shorter calculation time and can simulate larger porous rock cores, but its results are generally less accurate than those of direct simulation.

With the rapid development of computer technology and deep learning, using deep neural network models to predict the permeability of porous media has become a trend. The basic idea is to use a neural network architecture to establish a direct mapping between the digital image of the porous medium and its permeability value. The most common architecture is the convolutional neural network (CNN). According to Darcy's law, the permeability of a porous medium is determined by its pore space structure. The digital image of the porous medium is used as the input of a CNN model; through convolution, pooling, residual connections and similar operations, the network compresses the extracted high-dimensional deep feature maps into feature vectors, and a fully connected layer then establishes the mapping to the permeability value.

As an alternative to traditional numerical simulation, the neural network model has a short calculation time and high accuracy, and is a mainstream direction for rapidly predicting the permeability or other physical parameters of porous media in the future. However, training neural network models, particularly deep three-dimensional convolutional neural networks, places heavy demands on the memory of the graphics processing unit (GPU), and the GPU memory size often limits the predictive performance and training efficiency of the model.

Disclosure of Invention

To address the defects of the prior art, the invention provides a method for predicting the permeability of a porous medium based on a Transformer network. It establishes a PhyCNN-Transformer hybrid neural network and converts the three-dimensional image regression problem into a two-dimensional image sequence regression problem, which alleviates, to a certain extent, the problem that GPU memory size limits the predictive performance and training efficiency of the model.

To solve the above technical problems, the invention provides the following technical scheme: a method for predicting the permeability of a porous medium based on a Transformer network, comprising the following steps:

S1, simulating a flow field by using a pore-scale simulation method, and calculating the permeability of a three-dimensional porous medium;

S2, regarding the three-dimensional structure image of the porous medium as a spatial sequence of two-dimensional slice images, and extracting the physical attributes of each two-dimensional slice image;

S3, adding a corresponding physical parameter matrix to the three-dimensional structure image of the porous medium to construct a sample set, and determining a training set, a validation set and a test set;

S4, constructing a PhyCNN-Transformer neural network model based on a convolutional neural network (CNN) and a Transformer network; the CNN here extracts image features that contain physical parameter information, which distinguishes it from an ordinary CNN, and is therefore denoted PhyCNN;

training the PhyCNN-Transformer model on the training set and optimizing its parameters by gradient descent; the validation set is used to evaluate the predictive ability of the PhyCNN-Transformer model during training; the test set is used to examine the final predictive performance of the PhyCNN-Transformer model.

According to the technical scheme, the pore scale simulation method comprises a pore network model PNM and a lattice Boltzmann method LBM.

According to the technical scheme, the three-dimensional structure image segmentation step of the porous medium comprises the following steps:

the three-dimensional structure image of the porous medium is cut into consecutive two-dimensional images along any one axis of a Cartesian coordinate system (generally the direction of water flow); the length of the porous medium along the cutting axis is the sequence length, and the thickness of each cut image is equal to the pixel size.

The axis of the Cartesian coordinate system along which the cutting is performed is called the cutting axis.
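The slicing step can be illustrated with a minimal sketch. It assumes the volume is stored as nested Python lists indexed `[x][y][z]`; the function name `slice_volume` and the toy volume are hypothetical and not taken from the patent.

```python
def slice_volume(volume, axis=0):
    """Split a 3D voxel grid (nested lists indexed [x][y][z]) into 2D slices.

    Each slice is one voxel thick, so the number of slices (the sequence
    length) equals the grid size along the cutting axis.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[[volume[x][y][z] for z in range(nz)] for y in range(ny)] for x in range(nx)]
    if axis == 1:
        return [[[volume[x][y][z] for z in range(nz)] for x in range(nx)] for y in range(ny)]
    if axis == 2:
        return [[[volume[x][y][z] for y in range(ny)] for x in range(nx)] for z in range(nz)]
    raise ValueError("axis must be 0, 1 or 2")

# Toy 2 x 3 x 4 volume whose voxel value encodes its x index.
volume = [[[x for _ in range(4)] for _ in range(3)] for x in range(2)]
slices_x = slice_volume(volume, axis=0)  # 2 slices of size 3 x 4
slices_z = slice_volume(volume, axis=2)  # 4 slices of size 2 x 3
```

Cutting along a different axis only changes which index becomes the sequence index; the voxel values themselves are untouched.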

According to the technical scheme, the physical properties of the two-dimensional slice image comprise porosity and specific surface area parameters;

the physical attribute extraction method is as follows: in the two-dimensional slice grayscale image of the porous medium, pixels with value 1 represent the solid phase and pixels with value 0 represent the pore structure; the pixels of the slice image are counted with a Python program, and the porosity and specific surface area are calculated from the counts.

The porosity is defined as the ratio of the area occupied by pores to the slice area, and is represented by the ratio of the number of pore pixels to the total number of pixels in the slice image; the specific surface area is defined as the ratio of the total perimeter of all pores in the slice to the pore area, and is represented by the ratio of the number of pore-edge pixels to the total number of pore pixels.
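A minimal sketch of the pixel counting described above. The patent does not say how a pore-edge pixel is identified, so the sketch assumes it is a pore pixel with at least one 4-connected solid neighbour (image borders are not treated as solid). The function name `slice_properties` is hypothetical.

```python
def slice_properties(img):
    """Return (porosity, specific_surface_area) of a binary slice.

    img is a list of rows; 0 marks a pore pixel and 1 marks a solid pixel.
    """
    rows, cols = len(img), len(img[0])
    pore = edge = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 0:
                continue
            pore += 1
            # Assumption: an edge pixel is a pore pixel touching solid on any of its 4 sides.
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(0 <= i < rows and 0 <= j < cols and img[i][j] == 1 for i, j in neighbours):
                edge += 1
    porosity = pore / (rows * cols)
    ssa = edge / pore if pore else 0.0  # guard against a slice with no pores
    return porosity, ssa

# A 3 x 3 pore surrounded by a one-pixel solid frame.
demo = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
porosity, ssa = slice_properties(demo)
```

In the toy slice, 9 of 25 pixels are pore and every pore pixel except the central one touches solid, so the two ratios are 9/25 and 8/9.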

According to the technical scheme, the corresponding physical parameter matrix is added to the three-dimensional structure image of the porous medium as follows: a two-dimensional parameter matrix of the same size as the two-dimensional slice image is established, in which the elements of the upper half are the porosity value of the slice and the elements of the lower half are the specific surface area value of the slice; the parameter matrix is added as a single-channel image to the original grayscale image of the three-dimensional structure of the porous medium, so that the three-dimensional structure image contains physical parameter information.
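The construction of the parameter matrix can be sketched as follows. The helper names `parameter_matrix` and `add_physics_channel` are hypothetical, and the handling of an odd number of rows (the extra row goes to the lower half) is an assumption, since the text only covers the even case.

```python
def parameter_matrix(height, width, porosity, ssa):
    """Upper half filled with the porosity, lower half with the specific surface area."""
    half = height // 2  # assumption: with an odd height the extra row joins the lower half
    return [[porosity if r < half else ssa for _ in range(width)] for r in range(height)]

def add_physics_channel(img, porosity, ssa):
    """Stack the slice and its parameter matrix into a two-channel image."""
    param = parameter_matrix(len(img), len(img[0]), porosity, ssa)
    return [img, param]  # channel 0: pore structure, channel 1: physical parameters

slice_img = [[0, 1], [1, 0], [0, 0], [1, 1]]
two_channel = add_physics_channel(slice_img, porosity=0.5, ssa=0.75)
```

Applying this to every slice turns the single-channel slice sequence into a sequence of two-channel images of unchanged height and width.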

According to the technical scheme, the permeability of the porous medium calculated with the pore network model (PNM) or the lattice Boltzmann method (LBM) is used as the label value of the corresponding image, which forms the sample set; the sample set is divided into a training set, a validation set and a test set in a fixed ratio, for example 6:2:2 or a similar ratio.
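A 6:2:2 split can be sketched with the standard library alone. The random shuffling and the fixed seed are assumptions (the patent does not state how samples are assigned), and `split_indices` is a hypothetical helper name.

```python
import random

def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and cut them into train / validation / test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    # The test set takes whatever remains, so the three parts always cover all samples.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(6250)
print(len(train_idx), len(val_idx), len(test_idx))
```

Passing a different `ratios` tuple gives the "similar ratios" the text allows for.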

According to the technical scheme, the PhyCNN-Transformer neural network model consists of a convolutional neural network (CNN) and a Transformer network, specifically:

the convolutional neural network CNN has 4 two-dimensional convolutional layers of different sizes, and a single first fully connected layer immediately follows the 4 convolutional layers;

the first fully connected layer of the convolutional neural network CNN is connected to the encoding layers of the Transformer network;

the Transformer network has 6 identical coding layers, each coding layer having 2 sublayers; the first sub-layer is a multi-head self-attention layer, the second sub-layer is a second full-connection layer, and residual connection and regularization operations are included between the multi-head self-attention layer and the second full-connection layer;

the convolution layer is used for generating a multi-channel feature map for each two-dimensional image in the sequence; the first full-connection layer is used for compressing the multi-channel feature map into a feature vector with a fixed dimension; the size and the number of the two-dimensional convolution kernels are fixed.

The convolutional neural network CNN extracts image features containing physical parameter information, which distinguishes it from an ordinary CNN; it is therefore denoted PhyCNN.

The Transformer network is a sequence-to-sequence (Seq2Seq) model based on a multi-head self-attention mechanism. The self-attention mechanism relates different positions in the sequence to one another and computes a new representation of the sequence. Multi-head self-attention allows the model to attend to information from different representation subspaces at different positions, thereby enhancing the representational capacity of the model.

According to the technical scheme, the regression prediction of the permeability comprises the following specific steps:

inputting a porous medium three-dimensional structure containing physical parameter information and represented by a two-dimensional image sequence into a two-dimensional convolution layer of a CNN model, and increasing the nonlinear characterization capability of a network by using a Batch Norm and ReLU activation function to generate a multi-channel feature map for each two-dimensional image in the sequence;

the multi-channel feature map is input to the fully connected layer, compressing it into a fixed-dimension feature vector. Repeating this operation for each image in the sequence, thereby forming a sequence of feature vectors;

determining positional information for each item in the sequence using sinusoidal positional codes;

adding the position codes to the feature vector sequence, inputting the result into the multi-head self-attention layer of the Transformer network, relating different positions in the feature vector sequence to one another, and re-characterizing the sequence;

averaging the re-characterized feature vector sequence over the sequence-length dimension, and inputting the result into a fully connected layer to carry out regression prediction of the permeability of the three-dimensional porous medium.
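The final step, averaging over the sequence length and then applying a fully connected layer, can be sketched in plain Python. The weights and bias below are made-up numbers for illustration only; in the described method they would be learned, and `mean_pool` and `linear` are hypothetical names.

```python
def mean_pool(sequence):
    """Average a sequence of equal-length feature vectors over the sequence axis."""
    length = len(sequence)
    return [sum(vec[d] for vec in sequence) / length for d in range(len(sequence[0]))]

def linear(vec, weights, bias):
    """Single-output fully connected layer: weighted sum of the features plus a bias."""
    return sum(w * v for w, v in zip(weights, vec)) + bias

# Three re-characterized feature vectors of dimension 2 (toy values).
sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(sequence)
# Hypothetical weights and bias standing in for trained parameters.
permeability_pred = linear(pooled, weights=[0.5, -0.25], bias=0.1)
```

Averaging makes the regression head independent of the sequence length, since the pooled vector always has the feature dimension.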

Since the self-attention mechanism ignores the position information of each item in the sequence, it is necessary to add a position code to the input sequence;

according to the technical scheme, the sinusoidal position codes are:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

wherein 2i and 2i+1 denote the 2i-th and (2i+1)-th elements of the feature vector embedding, pos is the global position of the feature vector in the sequence, PE denotes the position code, and $d_{model}$ denotes the number of features expected by the model, i.e. the dimension of the feature vectors in the sequence.
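A minimal sketch of sinusoidal position codes in plain Python. It assumes the standard Transformer form with base 10000, in which even elements use the sine and odd elements the cosine of the same angle; the function name is hypothetical.

```python
import math

def sinusoidal_position_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position codes."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for even in range(0, d_model, 2):  # `even` plays the role of 2i
            angle = pos / (10000 ** (even / d_model))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[pos][even + 1] = math.cos(angle)
    return pe

# 256 positions would match a 256-slice sequence; d_model = 8 is an arbitrary toy size.
pe = sinusoidal_position_encoding(seq_len=256, d_model=8)
```

The table is added element-wise to the feature vector sequence, so it must have the same sequence length and feature dimension as that sequence.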

According to the technical scheme, the re-characterization step of the porous medium feature vector sequence comprises the following steps:

in order to select information related to the permeability of the porous medium from the input sequence of feature vectors containing position codes, the multi-head self-attention mechanism introduces query vectors and key vectors of dimension $d_k$ and value vectors of dimension $d_v$;

the correlation between each feature vector in the sequence (represented by a key-value pair) and the query vector is calculated with a scoring function. The key vector is used to calculate the attention distribution, and the value vector is used to calculate the aggregated information;

the dot products of the query vector with all key vectors are calculated and divided by $\sqrt{d_k}$, and the weights of the value vectors are then calculated with a softmax function; in matrix form:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein Q is the query vector matrix, K is the key vector matrix, V is the value vector matrix, $d_k = d_{model}/h$ is the dimension of Q and K in each of the h subspaces (heads), and $K^{T}$ is the transpose of the key vector matrix;

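multi-head self-attention projects the query matrix Q, the key matrix K and the value matrix V into $d_k$, $d_k$ and $d_v$ dimensions, respectively, with a different linear projection for each of the h subspaces (heads); the outputs of the subspaces (heads) are concatenated and projected again to obtain the final output of the multi-head self-attention mechanism:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, \ldots, head_h)\,W^{O}, \qquad head_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

wherein Attention denotes the scaled dot-product attention with softmax weighting described above; the projections are all parameter matrices: $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{model} \times d_v}$ and $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ are the weight matrices of the linear transformations, $i \in [1, h]$, h is the number of projections, i.e. the number of heads (subspaces) of the multi-head self-attention, $d_{model}$ is the number of features expected by the model, i.e. the dimension of the feature vectors in the sequence, and $head_i$ is the i-th head of the multi-head self-attention mechanism.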
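The re-characterization steps above can be sketched in plain Python, with matrices stored as lists of rows. This is an illustrative toy and not the patent's implementation: the weight matrices are hand-picked rather than learned, and all function names are hypothetical.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def multi_head_self_attention(X, heads, W_o):
    """X is (seq_len, d_model); heads holds one (W_q, W_k, W_v) triple per head."""
    outputs = [attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v))
               for W_q, W_k, W_v in heads]
    # Concatenate the per-head outputs position by position, then project with W_o.
    concat = [[value for out in outputs for value in out[t]] for t in range(len(X))]
    return matmul(concat, W_o)

# Toy case: sequence length 2, d_model = 2, h = 2 heads, d_k = d_v = 1.
X = [[1.0, 0.0], [0.0, 1.0]]
head_a = ([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]])
head_b = ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]])
W_o = [[1.0, 0.0], [0.0, 1.0]]  # identity output projection, chosen for readability
Y = multi_head_self_attention(X, [head_a, head_b], W_o)
```

Because this is self-attention, the queries, keys and values are all projections of the same input sequence `X`, and the output keeps the shape (sequence length, $d_{model}$).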

Compared with the prior art, the invention has the following beneficial effects: the PhyCNN-Transformer model provided by the invention converts the three-dimensional image regression problem into a regression problem on a two-dimensional image sequence and, for the task of porous medium permeability prediction, adds two-dimensional physical parameter information to the digital image of the porous medium, so that the model gains a deeper understanding of the structural characteristics of the porous medium and captures its essential features, improving prediction accuracy and generalization capability. Compared with an ordinary 3D CNN model, the PhyCNN-Transformer model greatly reduces the number of trainable parameters of the neural network and improves the training speed, prediction performance and generalization capability of the model.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

FIG. 1 is the framework structure of the PhyCNN-Transformer model;

FIG. 2 is the network structure of the Transformer;

FIG. 3 is the three-dimensional structure of a porous medium sample;

FIG. 4 is the data distribution of the porous media;

FIG. 5 is a graph of the loss values of the PhyCNN-Transformer model at different batch sizes during training;

FIG. 6 is the prediction result of the PhyCNN-Transformer model on the validation set at different batch sizes;

FIG. 7 is the prediction result of the optimal PhyCNN-Transformer model;

FIG. 8 is the prediction result of the 3D CNN model.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1-8, the present invention provides a method for predicting permeability of a porous medium based on a Transformer network, which comprises the following steps:

s1, simulating a flow field by using a PNM or LBM pore scale simulation method, and calculating the permeability of a three-dimensional porous medium;

S2, regarding the three-dimensional structure image of the porous medium as a spatial sequence of two-dimensional slice images, and extracting the physical attributes of each two-dimensional slice image;

wherein the physical attributes of the two-dimensional slice image include porosity and specific surface area; the three-dimensional structure image of the porous medium is segmented as follows: the image is cut into consecutive two-dimensional images along any one axis of a Cartesian coordinate system (generally the direction of water flow), the length of the porous medium along the cutting axis is the sequence length, and the thickness of each cut image is equal to the pixel size.

S3, adding a corresponding physical parameter matrix to the three-dimensional structure image of the porous medium, constructing a sample set with the permeability of the porous medium as the label value of each image, and determining a training set, a validation set and a test set. The physical parameter matrix is added as follows: a two-dimensional parameter matrix of the same size as the two-dimensional slice image is established, in which the elements of the upper half are the porosity value of the slice and the elements of the lower half are the specific surface area value of the slice; the parameter matrix is added as a single-channel image to the original grayscale image of the three-dimensional structure of the porous medium, so that the three-dimensional structure image contains physical parameter information.

The sample set may be divided into a training set, a validation set, and a test set in a 6:2:2 or other similar ratio.

S4, constructing a PhyCNN-Transformer neural network model based on a convolutional neural network (CNN) and a Transformer network, specifically: inputting the three-dimensional structure of the porous medium, which contains physical parameter information and is represented by a two-dimensional image sequence, into the two-dimensional convolutional layers of the CNN model, and using Batch Norm and the ReLU activation function to increase the nonlinear representation capability of the network, so as to generate a multi-channel feature map for each two-dimensional image in the sequence;

determining the position information of each item in the sequence using sinusoidal position codes; the sinusoidal position codes are:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

wherein pos is the global position of the feature vector in the sequence, and 2i and 2i+1 index the elements of the feature vector;

adding the position codes to the feature vector sequence, inputting the result into the multi-head self-attention layer of the Transformer network, relating different positions in the feature vector sequence to one another, and re-characterizing the sequence; the re-characterization of the porous medium feature vector sequence is:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, \ldots, head_h)\,W^{O}, \qquad head_i = \mathrm{softmax}\left(\frac{QW_i^{Q}\,(KW_i^{K})^{T}}{\sqrt{d_k}}\right)VW_i^{V}$$

wherein the projections are all parameter matrices: $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{model} \times d_v}$ and $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ are the weight matrices of the linear transformations, $i \in [1, h]$, h is the number of projections, i.e. the number of heads (subspaces) of the multi-head self-attention, $d_{model}$ is the number of features expected by the model, i.e. the dimension of the feature vectors in the sequence, and $head_i$ is the i-th head of the multi-head self-attention mechanism.

Averaging the characteristic vector sequences in the dimension of sequence length after characterization, inputting the characteristic vector sequences into a full-connection layer, and carrying out regression prediction on the permeability of the three-dimensional porous medium;

the PhyCNN-Transformer neural network model consists of a convolutional neural network (CNN) and a Transformer network, specifically: the convolutional neural network CNN has 4 two-dimensional convolutional layers of different sizes, and a single first fully connected layer immediately follows the 4 convolutional layers; the first fully connected layer of the CNN model is connected to the encoding layers of the Transformer network; the Transformer network has 6 identical encoding layers, each with 2 sublayers; the first sublayer is a multi-head self-attention layer, the second sublayer is a second fully connected layer, and residual connection and regularization operations are included between the multi-head self-attention layer and the second fully connected layer; the convolutional layers generate a multi-channel feature map for each two-dimensional image in the sequence; the first fully connected layer compresses the multi-channel feature map into a feature vector of fixed dimension.

Training the PhyCNN-Transformer neural network on the training data set and optimizing its parameters by gradient backpropagation; evaluating the model on the validation set by calculating regression performance indices such as the root mean square error (RMSE) and the coefficient of determination (R²); adjusting the model according to the evaluation result, for example the network structure, hyperparameters such as the batch size and the learning rate, or the use of regularization techniques; and predicting the permeability of the test set (new porous medium images) with the trained PhyCNN-Transformer neural network.
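The two regression indices named above can be computed as in the following sketch, which uses their textbook definitions; the function names and the sample values are illustrative only.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between label values and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up label values and predictions for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is how the validation results at different batch sizes can be ranked.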

Example 1

Randomly selecting 6250 semi-real three-dimensional porous media (shown in FIG. 3) generated by Arash Rabbani, together with the corresponding permeability values calculated by PNM; each three-dimensional porous medium image has a volume of 256³ voxels and a voxel size of 5 μm. Permeability values are expressed in units of pixels squared, and their data distribution in darcy units is shown in FIG. 4;

treating each three-dimensional porous medium as 256 consecutive two-dimensional grayscale images along the x-axis direction, each of size 256 × 256, in which pixels with value 0 represent the pore structure and pixels with value 1 represent the solid skeleton; the porosity and specific surface area of each two-dimensional slice image are calculated with a Python program and assembled into a two-dimensional parameter matrix, which is added to the image so that it becomes a two-channel image containing physical information, and is input into the PhyCNN-Transformer network;

constructing the PhyCNN-Transformer neural network architecture on the PyTorch platform, with network hyperparameter settings such as the CNN convolution kernel sizes and numbers shown in Table 1;

TABLE 1

3500 (56%) samples in the sample set were selected as training sets, 1500 (24%) samples as validation sets, 1250 (20%) samples as test sets.

The PhyCNN-Transformer neural network obtains a good prediction result after 100 epochs of training. FIG. 5 shows the loss curves of the PhyCNN-Transformer model for the validation set at different batch sizes. The optimal PhyCNN-Transformer model is determined from the R² and RMSE on the validation set shown in FIG. 6. The prediction results of the optimal PhyCNN-Transformer model on the validation set and the test set are shown in FIG. 7.

Comparative example 1

The same training set, validation set and test set as for the PhyCNN-Transformer were used.

A three-dimensional convolutional neural network (3D CNN) model is built in PyTorch, which compresses a three-dimensional porous medium digital image of 256³ voxels into a 256-dimensional feature vector. First, an encoder is constructed from 4 3D convolutional layers, batch normalization (Batch Norm) layers and ReLU activation functions to extract features from the digital image; then the extracted features are compressed into a 256-dimensional feature vector by a linear layer; finally, the feature vector is input into 5 fully connected layers (including the input and output layers) to predict the permeability.

The number of epochs used to train the 3D CNN model is increased to 150, and the initial learning rate is 1e-3; at epochs 20, 50, 80 and 110, the learning rate is set to 0.1, 0.05, 0.025 and 0.01 times the initial learning rate, respectively;

when the batch size is 8, the R² of the 3D CNN on the validation set is 0.9844 and the RMSE is 0.0765;

the R² of the trained 3D CNN on the test set is 0.9670 and the RMSE is 0.1479;

comparing the prediction results of the 3D CNN model and the PhyCNN-Transformer model on the validation set and the test set shows that the predictions of the 3D CNN model are relatively poor.
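The step learning-rate schedule described for the 3D CNN in this comparative example can be sketched as a plain function. It assumes each factor applies from its milestone epoch onward and that epochs are counted from 0; the patent specifies neither, and the function name is hypothetical.

```python
def learning_rate(epoch, initial_lr=1e-3):
    """Learning rate at a given epoch under the milestone schedule of the comparative example."""
    # (milestone epoch, factor relative to the initial learning rate), latest first.
    milestones = ((110, 0.01), (80, 0.025), (50, 0.05), (20, 0.1))
    for start, factor in milestones:
        if epoch >= start:
            return initial_lr * factor
    return initial_lr

schedule = [learning_rate(e) for e in range(150)]
```

Note that the factors are relative to the initial rate and are not cumulative, so the rate at epoch 110 is 1e-5 and not a product of the earlier factors.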

In addition, the 3D CNN has 20.2M trainable parameters and the PhyCNN-Transformer has 8.59M, a reduction of 57.5%, which greatly reduces the GPU memory required. In terms of training speed, on the same GPU device the PhyCNN-Transformer model takes 29.0 minutes per epoch on average and the 3D CNN model takes 29.6 minutes per epoch on average. Moreover, the loss curve of the 3D CNN model needs more training epochs to converge, so its total training time is far longer than that of the PhyCNN-Transformer.
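The parameter reduction can be checked directly from the two reported counts. The total training times below are only a rough estimate: they assume every epoch takes the reported average and that each model runs for its configured number of epochs (100 for the PhyCNN-Transformer in Example 1, 150 for the 3D CNN); the patent does not report total times.

```latex
\frac{20.2 - 8.59}{20.2} \approx 0.575
\qquad
100 \times 29.0\ \text{min} = 2900\ \text{min} \approx 48.3\ \text{h}
\qquad
150 \times 29.6\ \text{min} = 4440\ \text{min} = 74.0\ \text{h}
```

Under these assumptions the per-epoch cost is nearly the same for both models, and the difference in total time comes mainly from the larger number of epochs used for the 3D CNN.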

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Finally, it should be noted that: the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto, but it is to be understood that modifications and equivalents of some of the technical features described in the foregoing embodiments may be made by those skilled in the art, although the present invention has been described in detail with reference to the foregoing embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A porous medium permeability prediction method based on a Transformer network, characterized by comprising the following steps:

simulating a flow field by using a pore scale simulation method, and calculating the permeability of the three-dimensional porous medium;

the three-dimensional structure image of the porous medium is regarded as a space sequence consisting of two-dimensional slice images, and the physical attribute of each two-dimensional slice image is extracted;

adding a corresponding physical parameter matrix into the three-dimensional structure image of the porous medium, constructing a sample set by taking the permeability of the porous medium as a label value, and determining a training set, a verification set and a test set;

constructing a PhyCNN-Transformer neural network model based on a convolutional neural network (Convolutional Neural Network, CNN) and a Transformer network;

2. The method for predicting the permeability of the porous medium based on the Transformer network according to claim 1, wherein the pore-scale simulation method includes a pore network model (Pore Network Model, PNM) and a lattice Boltzmann method (Lattice Boltzmann Method, LBM).

3. The method for predicting the permeability of a porous medium based on a Transformer network according to claim 1, wherein the three-dimensional structure image segmentation step of the porous medium is as follows:

and cutting the three-dimensional structure image of the porous medium into continuous two-dimensional images along any axis direction of a Cartesian space coordinate system, wherein the length of the porous medium on a cutting axis is the sequence length, and the thickness of the cut image is equal to the pixel size.

4. The method for predicting the permeability of the porous medium based on the Transformer network according to claim 1, wherein the method comprises the following steps:

the physical properties of the two-dimensional slice image comprise porosity and specific surface area parameters;

5. The method for predicting the permeability of the porous medium based on the Transformer network according to claim 1, wherein the corresponding physical parameter matrix is added to the three-dimensional structure image of the porous medium as follows: a two-dimensional parameter matrix of the same size as the two-dimensional slice image is established, in which the elements of the upper half are the porosity value of the slice and the elements of the lower half are the specific surface area value of the slice; the parameter matrix is added as a single-channel image to the original grayscale image of the three-dimensional structure of the porous medium, so that the three-dimensional structure image of the porous medium contains physical parameter information.

6. The method for predicting the permeability of the porous medium based on the Transformer network according to claim 1, wherein the PhyCNN-Transformer neural network model consists of a convolutional neural network CNN and a Transformer network, specifically:

the convolutional neural network CNN has 4 two-dimensional convolutional layers, and a single first fully connected layer immediately follows the 4 convolutional layers;

the convolution layer is used for generating a multi-channel feature map for each two-dimensional image in the sequence;

the first full connection layer is used for compressing the multi-channel feature map into a feature vector with a fixed dimension.

7. The method for predicting the permeability of a porous medium based on a Transformer network according to claim 1, wherein the regression prediction of the permeability comprises the following specific steps:

inputting the multi-channel feature map into a full connection layer, so as to compress the multi-channel feature map into a feature vector with fixed dimension; repeating this operation for each image in the sequence, thereby forming a sequence of feature vectors;

8. The method for predicting the permeability of the porous medium based on the Transformer network according to claim 7, wherein the sinusoidal position codes are:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

wherein 2i and 2i+1 denote the 2i-th and (2i+1)-th elements of the feature vector embedding, pos is the global position of the feature vector in the sequence, PE denotes the position code, and $d_{model}$ denotes the number of features expected by the model, i.e. the dimension of the feature vectors in the sequence.

9. The method for predicting the permeability of a porous medium based on a Transformer network according to claim 7, wherein the step of re-characterizing the feature vector sequence is:

the multi-head self-attention mechanism introduces query vectors and key vectors of dimension $d_k$ and value vectors of dimension $d_v$;

calculating the correlation between each feature vector in the sequence and the query vector by using a scoring function; the key vector is used for calculating the attention distribution, the value vector is used for calculating the aggregation information, and the characteristic vector adopts a key value representation;

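the outputs of the h subspaces (heads) are concatenated and projected again to obtain the output of the multi-head self-attention mechanism:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, \ldots, head_h)\,W^{O}, \qquad head_i = \mathrm{softmax}\left(\frac{QW_i^{Q}\,(KW_i^{K})^{T}}{\sqrt{d_k}}\right)VW_i^{V}$$

wherein the projections are all parameter matrices: $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{model} \times d_v}$ and $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ are the weight matrices of the linear transformations, $i \in [1, h]$, h is the number of projections, i.e. the number of heads (subspaces) of the multi-head self-attention, $d_{model}$ is the number of features expected by the model, i.e. the dimension of the feature vectors in the sequence, and $head_i$ is the i-th head of the multi-head self-attention mechanism.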