CN116467930A - Transformer-based structured data general modeling method - Google Patents


Info

Publication number
CN116467930A
CN116467930A (application number CN202310239904.9A)
Authority
CN
China
Prior art keywords
features
neural network
mlp
layer
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310239904.9A
Other languages
Chinese (zh)
Inventor
郭颖
熊媛媛
李喜武
刁克红
孙广源
梁浩然
梁荣华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Original Assignee
Zhejiang University of Technology ZJUT
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT, Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd filed Critical Zhejiang University of Technology ZJUT
Priority to CN202310239904.9A
Publication of CN116467930A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a Transformer-based general modeling method for structured data. Irrelevant features are first removed from the original data, different embedding methods are then applied to the category features and the numerical features, and the embedded feature vectors are spliced and input into a Transformer+ neural network (an improved Transformer) and an MLP+ neural network, where the Transformer+ neural network is formed by adding a Leaky Gate before the original Transformer and an MLP+ neural network after it. Finally, different weights are assigned to the output values of the two modules. The invention is applicable to both binary and multi-class classification problems.

Description

Transformer-based structured data general modeling method
Technical Field
The invention belongs to the field of structured data processing, and particularly relates to a Transformer-based general modeling method for structured data.
Background
Tabular data is the most common form of data and is ubiquitous in a variety of applications, such as medical diagnosis based on medical records, predictive analysis in finance, and network security. At present, tree-based ensemble methods such as gradient boosted decision trees (GBDT) are generally used and work well on tabular data: they learn continuous numerical features effectively, can automatically select and combine useful numerical features, and build decision trees efficiently by computing information gain. However, because category features are usually converted into high-dimensional sparse one-hot codes, GBDT obtains very little information gain when processing such data and cannot learn these features effectively.
In recent years, Transformer-based methods have achieved great success in computer vision and natural language processing. In computer vision, the receptive field is limited by the size of the convolution kernel, so a network often needs many stacked layers before it can attend to the whole feature map; in natural language processing, RNNs and LSTMs accumulate information across time steps, so the longer the distance between two positions, the less likely their dependency is captured effectively. Self-attention in the Transformer, by contrast, captures global attention information. It also directly improves the parallelism of computation, which is a main reason Transformers are so widely used.
The multi-layer perceptron (MLP) is perhaps the simplest and most versatile neural network. MLPs typically learn parametric embeddings to encode categorical features, but because of their relatively shallow architecture and context-free embeddings they are not robust to missing and noisy data; most importantly, in most cases MLPs do not perform as well as tree-based models.
In view of the foregoing, learning tabular data effectively while overcoming the above problems is an urgent issue for applying deep learning in the tabular domain.
Disclosure of Invention
In order to overcome the shortcomings of existing tree-based ensemble methods in tabular prediction, the invention provides a Transformer-based general modeling method for structured data.
In order to solve the technical problems, the invention is realized by adopting the following technical scheme:
A Transformer-based general modeling method for structured data comprises the following steps:
(1) Feature processing of the input public data set: after the original data is obtained, irrelevant features are removed, category features in the data are encoded into a recognizable numeric form, and the numerical features are scaled by a standardization operation;
(2) Word Embedding of the processed feature vectors: before the data passes through the encoder of the Transformer+ neural network, word Embedding projects the high-dimensional discrete data of the numerical features and the category features into a low-dimensional dense d-dimensional space;
(3) Inputting the word-embedding vectors obtained in the previous step into the two branches of the model: the model is divided into a Transformer+ neural network branch and an MLP+ neural network branch. The feature vectors of the training data after word Embedding are input into the Transformer+ neural network for learning to obtain its raw output, and the same input is fed into the MLP+ neural network for modeling and learning to obtain a trained MLP+ neural network. The Transformer+ neural network and the MLP+ neural network are fused into one classification model, so that the two raw outputs are weighted and summed to form the overall output value of the model, and the overall prediction result of the classification model is then obtained through an activation function;
(4) Training is guided by Focal Loss as the objective function: the classification model is trained with the preprocessed training data, the training process is guided by Focal Loss as the objective function, and the optimal parameters are searched to obtain a trained classification model;
(5) Receiving other tabular data for prediction: the tabular data to be classified is preprocessed and input into the trained classification model for classification prediction. An illustrative end-to-end sketch of these five steps follows.
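By way of orientation only, the following is a minimal sketch of how these five steps might be wired together in PyTorch. The names run_pipeline, model, loss_fn and the loader layout are illustrative assumptions, not part of the invention; the concrete modules are sketched under the corresponding steps below.

import torch

def run_pipeline(model, train_loader, new_loader, loss_fn, epochs=50):
    # all names here are assumptions for exposition, not the patent's code
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):                          # step (4): Focal-Loss-guided training
        for x_cat, x_num, y in train_loader:         # steps (1)-(2) are done upstream
            opt.zero_grad()
            loss = loss_fn(model(x_cat, x_num), y)   # step (3): two-branch model forward
            loss.backward()
            opt.step()
    model.eval()                                     # step (5): predict on new tabular data
    with torch.no_grad():
        return torch.cat([model(x_cat, x_num) for x_cat, x_num, _ in new_loader])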
Further, in the step (1), the method for processing the input features includes the following steps:
(1-1) Removing useless features: feature recognition is carried out on each data set according to prior knowledge, and useless features are eliminated;
(1-2) Processing continuous features: the continuous features are standardized with a standard scaler, which scales the numerical features;
(1-3) Processing category features: the category features are encoded into numeric form by a label encoder (LabelEncoder); one-hot encoding is not used, to avoid the extra computational cost that sparse coding would incur. A preprocessing sketch follows.
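A minimal preprocessing sketch for step (1), assuming the data arrives as a pandas DataFrame and that the useless, category, and numerical columns are known in advance; the function name and arguments are illustrative assumptions.

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

def preprocess(df: pd.DataFrame, useless_cols, cat_cols, num_cols):
    df = df.drop(columns=useless_cols)                 # (1-1) remove useless features
    scaler = StandardScaler()
    df[num_cols] = scaler.fit_transform(df[num_cols])  # (1-2) scale numerical features
    for col in cat_cols:                               # (1-3) integer-encode categories
        # LabelEncoder gives compact integer codes; no one-hot, avoiding sparsity
        df[col] = LabelEncoder().fit_transform(df[col])
    return df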
Further, in the step (2), word Embedding is a technique that maps feature vectors to low-dimensional space vectors, converting discrete feature vectors into continuous vector representations. Ordinary word Embedding is applied to the category features, while a separate fully connected layer is used for the numerical features, each with a ReLU nonlinearity, so that each 1-dimensional input is projected into d-dimensional space; the embeddings of the category features and the numerical features are then concatenated along the first (feature) dimension, as in the sketch below.
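The following is a sketch of such an embedding layer in PyTorch, assuming integer-coded category features and float numerical features; the dimensions and the class name FeatureEmbedding are assumptions.

import torch
import torch.nn as nn

class FeatureEmbedding(nn.Module):
    def __init__(self, cat_cardinalities, n_num_features, d=32):
        super().__init__()
        # one embedding table per categorical feature
        self.cat_embeds = nn.ModuleList(
            [nn.Embedding(card, d) for card in cat_cardinalities])
        # one fully connected layer with ReLU per numerical feature
        self.num_embeds = nn.ModuleList(
            [nn.Sequential(nn.Linear(1, d), nn.ReLU())
             for _ in range(n_num_features)])

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_num) floats
        cat = [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeds)]
        num = [emb(x_num[:, i:i + 1]) for i, emb in enumerate(self.num_embeds)]
        # concatenate along the feature (first non-batch) dimension
        return torch.stack(cat + num, dim=1)    # (batch, n_features, d)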
Further, in the step (3), the neural network model includes the following parts:
(3-1) The Transformer+ neural network improves on the original Transformer by adding a Leaky Gate before the original Transformer encoder and an MLP+ neural network after it; the Leaky Gate is a combination of two simple elements, namely an element-wise linear transformation and a LeakyReLU activation function;
(3-2) The MLP+ neural network improves on the multi-layer perceptron MLP: starting from an MLP sub-block, ordinary Batch Normalization (Batch Norm) is replaced with Ghost Batch Normalization (GBN), a linear skip layer is added on the right side of the sub-block (the skip layer is just a fully connected linear layer followed by a LeakyReLU activation function), and finally a Leaky Gate is added before both the MLP sub-block and the linear skip layer. GBN allows training with large batches of data, and a major motivation for using it in the present invention is to speed up training. A sketch of both building blocks follows.
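A sketch of the two building blocks described in (3-1) and (3-2); the ghost-batch size, hidden widths, and class names are assumptions rather than values fixed by the invention.

import torch
import torch.nn as nn

class LeakyGate(nn.Module):
    # element-wise linear transform followed by LeakyReLU, as in (3-1)
    def __init__(self, n_features):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(n_features))
        self.bias = nn.Parameter(torch.zeros(n_features))
        self.act = nn.LeakyReLU()

    def forward(self, x):
        return self.act(x * self.weight + self.bias)

class GhostBatchNorm(nn.Module):
    # batch norm applied to small "ghost" chunks of the batch (GBN)
    def __init__(self, n_features, ghost_size=128):
        super().__init__()
        self.ghost_size = ghost_size
        self.bn = nn.BatchNorm1d(n_features)

    def forward(self, x):
        chunks = x.split(self.ghost_size, dim=0)
        return torch.cat([self.bn(c) for c in chunks], dim=0)

class MLPPlusBlock(nn.Module):
    # MLP sub-block with GBN plus a linear skip layer and LeakyReLU, as in (3-2)
    def __init__(self, d_in, d_out):
        super().__init__()
        self.gate = LeakyGate(d_in)               # Leaky Gate before both paths
        self.main = nn.Sequential(
            nn.Linear(d_in, d_out), GhostBatchNorm(d_out), nn.LeakyReLU())
        self.skip = nn.Sequential(nn.Linear(d_in, d_out), nn.LeakyReLU())

    def forward(self, x):
        x = self.gate(x)
        return self.main(x) + self.skip(x)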
The invention provides a Transformer-based general modeling method for structured data in which the Transformer processes category features and numerical features simultaneously. While fully retaining the performance of the Transformer model, the Transformer and the multi-layer perceptron MLP are fused into a single model, rather than producing separate category predictions followed by weighted voting, so the model can be optimized by a loss function in end-to-end training, effectively strengthening its recognition ability. Compared with the prior art, the invention has the following positive effects:
1. The invention proposes a data processing method in which category features and numerical features enter the Transformer together, which means that no information about the correlation between category features and numerical features is lost.
2. The invention provides a Transformer-based general modeling method for structured data that effectively fuses a simpler MLP neural network with a more complex attention-based Transformer neural network, thereby learning both category features and numerical features.
3. The invention evaluates the proposed model on seven public data sets, such as fault, blastchar and shrutime, and the experimental results show that the method of the invention outperforms other state-of-the-art methods in binary classification scenarios.
Drawings
FIG. 1 is an overall framework of the method of the present invention.
FIG. 2 is the processing flow of the MLP+ neural network.
Detailed description of the preferred embodiments
The technical scheme of the invention will be clearly and completely described below with reference to the drawings in the embodiments of the application. The specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Fig. 1 is the overall architecture diagram of the invention, a Transformer-based general modeling method for structured data, which specifically includes the following steps:
Step (1): input feature processing;
in the data layer, public data sets such as an add, a blast, a spambise and the like are used, some data sets only have numerical characteristics, some data sets contain both numerical characteristics and category characteristics, and meanwhile, the data is divided into a training set and a testing set. For different data sets, we use a priori knowledge to cull out a portion of the useless features. Since most class features are in the form of strings, they are encoded into a digital (1, 2, 3) form that the model can recognize; for numerical features, scaling is performed by taking a normalization operation.
For the original data (comprising a training set and a test set), unnecessary features are removed, the category features are numerically encoded, and the numerical features are standardized, giving a data set D = {(x_i, y_i)}, y_i ∈ [0, classnum), i = 1, 2, 3, ..., N, where x_i is the feature vector of each sample, y_i is the label corresponding to x_i, classnum is the number of classes, and N is the number of samples. Different types of features are distinguished, dividing the data into category features x_cat and numerical features x_cont.
Step (2): embedding category features and numerical features;
the embedding layer E embeds each feature into d-dimensional space, and in order to effectively process table data, the invention discriminates discrete type features and continuous numerical features. The invention obtains a new category characteristic by word Embedding technologyAn embedded representation, a new embedded representation of the numerical feature is obtained by using the fully connected layer,is a single sample with class or numerical features, the embedding layer e uses different embedding functions for different types of features for a given +.>Obtain->Then splice in the feature dimension, E Φ (X) is the result of all features being represented by the embedding.
E Φ (x)={e Φ1 (x 1 ),...,e ΦN (x N )} (1)
Step (3): inputting the embedded feature vectors into the model;
(3-1) The feature vector output in the previous step first enters a Leaky Gate, a combination of two simple elements: an element-wise linear transformation followed by a LeakyReLU activation function, which lets any positive value pass unchanged and compresses any negative value to almost zero. In other words, if w_i and b_i are the linear-layer parameters of the i-th column, the Leaky Gate of the i-th column is:

LeakyGate_i(x_i) = LeakyReLU(w_i · x_i + b_i)   (2)
the Leaky Gate is intended to act as a simple filter with different behavior for each column, with or without masking or passing depending on each individual value.
The first Transformer layer takes the output of the Leaky Gate as input and passes its output to the second Transformer layer, and so on. As shown in FIG. 1, the output of the last Transformer layer is input directly to the MLP+ neural network (the improved multi-layer perceptron MLP, shown in FIG. 2), giving the output value y_Transformer+ of the model, where θ_1, θ_2 and θ_3 are the model parameters of the Leaky Gate, the Transformer, and the MLP+ neural network, respectively:

y_Transformer+(x) = M(f_transformer(G_Θ(E_Φ(x); θ_1); θ_2); θ_3)   (3)
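As a sketch of equation (3), the branch below chains the Leaky Gate, a stack of standard Transformer encoder layers, and an MLP+ head. It reuses the LeakyGate and MLPPlusBlock sketches given earlier; the depth, width, and head layout are assumptions.

import torch
import torch.nn as nn

class TransformerPlusBranch(nn.Module):
    def __init__(self, n_features, d=32, n_heads=4, n_layers=2, n_out=1):
        super().__init__()
        # the gate's (d,)-shaped parameters broadcast over the feature axis,
        # a simplification of the per-column gate in equation (2)
        self.gate = LeakyGate(d)                          # G_Θ(·; θ_1)
        layer = nn.TransformerEncoderLayer(
            d_model=d, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # f_transformer(·; θ_2)
        self.head = nn.Sequential(                        # M(·; θ_3): an MLP+ stand-in
            nn.Flatten(), MLPPlusBlock(n_features * d, 64), nn.Linear(64, n_out))

    def forward(self, e):
        # e = E_Φ(x), shape (batch, n_features, d)
        h = self.encoder(self.gate(e))
        return self.head(h)                               # raw output y_Transformer+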
(3-2) Similarly, the feature vector output in the previous step is input into the MLP+ neural network (the right branch of FIG. 1) to obtain the output value y_MLP+ of the model:

y_MLP+ = M(E_Φ(x); θ_1)   (4)
Step (4): fusing the left and right branches;
specifically, to combine the improved converter and the improved multi-layer perceptron MLP to obtain predictions of the overall model and perform end-to-end training, the present invention assigns different weights w to the output values of the two modules 1 And w 2 (the two weights can be obtained by back propagation training learning), and the prediction probability of the final model outputAs in equation (5), σ represents the activation function (two categories are sigmoid, multiple categories are softmax).
Step (5): training the classification model based on Focal Loss;
the preprocessed data is utilized to train the model, focal Loss is used as a Loss function to guide the training process, so that the model can pay more attention to few types of samples which are difficult to classify, and deviation caused by the majority of types is reduced.
According to equation (5), the loss of the model can be expressed as equation (6), where L(·,·) denotes the loss function and y is the true label of sample x:

loss = L(ŷ, y)   (6)
To address the problem of class imbalance, the invention adopts the idea of cost sensitivity and introduces Focal Loss as the loss function of the model. Focal Loss was originally used to address class imbalance in object detection tasks and is an improvement over the conventional cross-entropy loss; the invention brings it into the tabular classification field. For the binary classification problem, Focal Loss can be expressed in the form of equation (7), where ŷ_i is the probability prediction defined in equation (5), y_i is the label of the input sample, α is a balance factor, and γ ≥ 0 is called the focusing parameter:

FL(ŷ_i, y_i) = -α (1 - ŷ_i)^γ y_i log(ŷ_i) - (1 - α) ŷ_i^γ (1 - y_i) log(1 - ŷ_i)   (7)
For the multi-class problem, the one-vs-rest idea extends equation (7) to equation (8), where y is the one-hot representation of the class labels and ŷ is the probability output of shape (m, n) (m is the number of samples and n the number of classes):

FL = -(1/m) Σ_{i=1..m} Σ_{j=1..n} α (1 - ŷ_ij)^γ y_ij log(ŷ_ij)   (8)
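Below is a sketch of the Focal Loss of equations (7) and (8); the defaults α = 0.25 and γ = 2 come from the focal-loss literature and are not values fixed by the patent.

import torch

def focal_loss_binary(y_hat, y, alpha=0.25, gamma=2.0, eps=1e-8):
    # equation (7): y in {0,1}, y_hat = predicted probability of class 1
    pos = -alpha * (1 - y_hat).pow(gamma) * y * torch.log(y_hat + eps)
    neg = -(1 - alpha) * y_hat.pow(gamma) * (1 - y) * torch.log(1 - y_hat + eps)
    return (pos + neg).mean()

def focal_loss_multiclass(y_hat, y_onehot, alpha=0.25, gamma=2.0, eps=1e-8):
    # equation (8): one-vs-rest extension; y_hat and y_onehot have shape (m, n)
    loss = -alpha * (1 - y_hat).pow(gamma) * y_onehot * torch.log(y_hat + eps)
    return loss.sum(dim=1).mean()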
Based on the loss functions defined in equations (7) and (8), end-to-end model training can be performed with gradient-based optimization, and the model with minimum loss is selected.
Example 2
The invention provides a commodity recommendation method based on the Transformer structured-data general modeling method.
FIG. 1 is a diagram of the overall architecture of the present invention, and the method comprises the following specific steps:
and (3) inputting feature processing.
In a recommendation-system application scenario, taking a commodity recommendation system as an example, the Transformer-based general modeling method for structured data classifies users according to their behavior and recommends goods of the corresponding type. At the data level, the online_shoppers public data set is used, which contains both numerical features and category features; the data is divided into a training set and a test set. For this data set, prior knowledge is used to cull a portion of the useless features. Since most category features are strings, they are encoded into a numeric form (1, 2, 3, ...) that the model can recognize; the numerical features are scaled by a standardization operation.
For the original data (comprising a training set and a test set), unnecessary features are removed, the category features are numerically encoded, and the numerical features are standardized, giving a data set D = {(x_i, y_i)}, y_i ∈ [0, classnum), i = 1, 2, 3, ..., N, where x_i is the feature vector of each sample, y_i is the label corresponding to x_i, classnum is the number of classes, and N is the number of samples. Different types of features are distinguished, dividing the data into category features x_cat and numerical features x_cont.
Step (2): embedding category features and numerical features.
The embedding layer E embeds each feature into a d-dimensional space. To process tabular data effectively, the invention distinguishes discrete category features from continuous numerical features: a new embedded representation of the category features is obtained by the word Embedding technique, and a new embedded representation of the numerical features is obtained through a fully connected layer. Let x_i = [f_i^(1), f_i^(2), ..., f_i^(n)] be a single sample with category or numerical features. The embedding layer e uses a different embedding function for each type of feature: for a given f_i^(j) it produces e_Φj(f_i^(j)), and the results are then spliced along the feature dimension. E_Φ(x) is the embedded representation of all features:

E_Φ(x) = {e_Φ1(x_1), ..., e_ΦN(x_N)}   (1)
and (3) inputting the characteristic vector embedded into the model.
(3-1) The feature vector output in the previous step first enters a Leaky Gate, a combination of two simple elements: an element-wise linear transformation followed by a LeakyReLU activation function, which lets any positive value pass unchanged and compresses any negative value to almost zero. In other words, if w_i and b_i are the linear-layer parameters of the i-th column, the Leaky Gate of the i-th column is:

LeakyGate_i(x_i) = LeakyReLU(w_i · x_i + b_i)   (2)
the Leaky Gate is intended to act as a simple filter with different behavior for each column, with or without masking or passing depending on each individual value.
The first Transformer layer takes the output of the Leaky Gate as input and passes its output to the second Transformer layer, and so on. As shown in FIG. 1, the output of the last Transformer layer is input directly to the MLP+ neural network (the improved multi-layer perceptron MLP, shown in FIG. 2), giving the output value y_Transformer+ of the neural network, where θ_1, θ_2 and θ_3 are the model parameters of the Leaky Gate, the Transformer, and the MLP+ neural network, respectively:

y_Transformer+(x) = M(f_transformer(G_Θ(E_Φ(x); θ_1); θ_2); θ_3)   (3)
(3-2) Similarly, the feature vector output in the previous step is input into the MLP+ neural network (the right branch of FIG. 1) to obtain the output value y_MLP+ of the neural network:

y_MLP+ = M(E_Φ(x); θ_1)   (4)
Step (4): fusing the left and right branches.
Specifically, to combine the improved Transformer and the improved multi-layer perceptron MLP so as to obtain predictions from the overall model and perform end-to-end training, the invention assigns different weights w_1 and w_2 to the output values of the two modules (both weights are learned by back-propagation training). The prediction probability ŷ output by the final model is given by equation (5), where σ denotes the activation function (sigmoid for binary classification, softmax for multi-class classification):

ŷ = σ(w_1 · y_Transformer+(x) + w_2 · y_MLP+(x))   (5)
Step (5): training the classification model based on Focal Loss.
The preprocessed data is used to train the model, with Focal Loss guiding the training process as the loss function, so that the model pays more attention to the minority-class samples that are difficult to classify and the bias caused by majority classes is reduced.
According to equation (5), the loss of the model can be expressed as equation (6), where L(·,·) denotes the loss function and y is the true label of sample x:

loss = L(ŷ, y)   (6)
To address the problem of class imbalance, the invention adopts the idea of cost sensitivity and introduces Focal Loss as the loss function of the model. Focal Loss was originally used to address class imbalance in object detection tasks and is an improvement over the conventional cross-entropy loss; the invention brings it into the tabular classification field. For the binary classification problem, Focal Loss can be expressed in the form of equation (7), where ŷ_i is the probability prediction defined in equation (5), y_i is the label of the input sample, α is a balance factor, and γ ≥ 0 is called the focusing parameter:

FL(ŷ_i, y_i) = -α (1 - ŷ_i)^γ y_i log(ŷ_i) - (1 - α) ŷ_i^γ (1 - y_i) log(1 - ŷ_i)   (7)
For the multi-class problem, the one-vs-rest idea extends equation (7) to equation (8), where y is the one-hot representation of the class labels and ŷ is the probability output of shape (m, n) (m is the number of samples and n the number of classes):

FL = -(1/m) Σ_{i=1..m} Σ_{j=1..n} α (1 - ŷ_ij)^γ y_ij log(ŷ_ij)   (8)
Based on the loss functions defined in equations (7) and (8), end-to-end model training can be performed with gradient-based optimization, and the model with minimum loss is selected.
Step (6): inputting the user features into the model to realize commodity recommendation.
When the commodity recommendation system acquires new user behavior, or behavior is added or modified on the basis of the original behavior, the system inputs the newly constructed user behavior into the model to obtain a new classification result and then recommends the corresponding goods, as in the sketch below.
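A minimal inference sketch for this step, assuming the trained fused model plus the scaler and label encoders fitted during training; all names are illustrative assumptions.

import torch

def recommend(model, scaler, encoders, user_row, cat_cols, num_cols):
    # user_row: dict mapping feature names to the raw values of one user's behavior
    num_vals = [[user_row[c] for c in num_cols]]
    x_num = torch.tensor(scaler.transform(num_vals), dtype=torch.float)
    x_cat = torch.tensor(
        [[int(encoders[c].transform([user_row[c]])[0]) for c in cat_cols]])
    model.eval()
    with torch.no_grad():
        probs = model(x_cat, x_num)      # prediction probability of equation (5)
    return int(probs.argmax(dim=-1))     # predicted class -> goods type to recommend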
The numerical-feature and category-feature embedding module collects the embedded feature vectors of new user behavior for later model input.
The embedded-feature-vector input module feeds the new feature vectors into the model for parameter adjustment.
The Focal-Loss-based classification model training module trains a new model after the parameters change.
It will be appreciated by persons skilled in the art that the foregoing describes preferred embodiments of the invention and is not intended to limit it to the specific embodiments described; those skilled in the art may modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their elements. Modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within its scope.

Claims (7)

1. A Transformer-based general modeling method for structured data, characterized by comprising the following steps:
(1) Feature processing of the input public data set: after the original data is obtained, irrelevant features are removed, category features in the data are encoded into a recognizable numeric form, and the numerical features are scaled by a standardization operation;
(2) Word Embedding of the processed feature vectors: before the data passes through the encoder of the Transformer+ neural network, word Embedding projects the high-dimensional discrete data of the numerical features and the category features into a low-dimensional dense d-dimensional space;
(3) Inputting the word-embedding vectors obtained in step (2) into the two branches of the model: the model is divided into a Transformer+ neural network branch and an MLP+ neural network branch; the feature vectors of the training data after word Embedding are input into the Transformer+ neural network for learning to obtain its raw output, and the same input is fed into the MLP+ neural network for modeling and learning to obtain a trained MLP+ neural network; the Transformer+ neural network and the MLP+ neural network are fused into one classification model, so that the two raw outputs are weighted and summed to form the overall output value of the model, and the overall prediction result of the classification model is then obtained through an activation function;
(4) Training is guided by Focal Loss as the objective function: the classification model is trained with the preprocessed training data, the training process is guided by Focal Loss as the objective function, and the optimal parameters are searched to obtain a trained classification model;
(5) Receiving other tabular data for prediction: the tabular data to be classified is preprocessed and input into the trained classification model for classification prediction.
2. The method of claim 1, wherein the method of input feature processing of step (1) comprises the steps of:
(1-1) Removing useless features: feature recognition is carried out on each data set according to prior knowledge, and useless features are eliminated;
(1-2) Processing continuous features: the continuous features are standardized with a standard scaler, which scales the numerical features;
(1-3) Processing category features: the category features are encoded into numeric form by a label encoder (LabelEncoder); one-hot encoding is not used, to avoid the extra computational cost that sparse coding would incur.
3. The method of claim 1, wherein the word Embedding of step (2) is a technique that maps feature vectors to low-dimensional space vectors, converting discrete feature vectors into continuous vector representations; ordinary word Embedding is applied to the category features, while a separate fully connected layer with a ReLU nonlinearity is used for each numerical feature, projecting the 1-dimensional input into d-dimensional space; and the embeddings of the category features and the numerical features are concatenated along the first dimension.
4. A method as claimed in claim 3, wherein the concatenation of the category-feature and numerical-feature embeddings along the first dimension specifically comprises: the embedding layer E embeds each feature into a d-dimensional space; to process tabular data effectively, discrete category features are distinguished from continuous numerical features; a new embedded representation of the category features is obtained by the word Embedding technique, and a new embedded representation of the numerical features is obtained through a fully connected layer; x_i = [f_i^(1), f_i^(2), ..., f_i^(n)] is a single sample with category or numerical features; the embedding layer e uses a different embedding function for each type of feature, producing e_Φj(f_i^(j)) for a given f_i^(j), and the results are then spliced along the feature dimension; E_Φ(x) is the embedded representation of all features:

E_Φ(x) = {e_Φ1(x_1), ..., e_ΦN(x_N)}   (1)
5. The method of claim 1, wherein the model of step (3) comprises the following parts:
(3-1) the Transformer+ neural network improves on the Transformer by adding a Leaky Gate before the Transformer encoder and an MLP+ neural network after it, the Leaky Gate being a combination of two simple elements, namely an element-wise linear transformation and a LeakyReLU activation function;
(3-2) the MLP+ neural network improves on the multi-layer perceptron MLP: starting from an MLP sub-block, ordinary Batch Normalization (Batch Norm) is replaced with Ghost Batch Normalization (GBN), a linear skip layer is added on the right side of the sub-block, the skip layer being just a fully connected linear layer followed by a LeakyReLU activation function, and finally a Leaky Gate is added before both the MLP sub-block and the linear skip layer.
6. The method of claim 5, wherein the steps (3-1) and (3-2) specifically comprise: (3-1) the feature vector output in the previous step first enters a Leaky Gate, a combination of two simple elements: an element-wise linear transformation followed by a LeakyReLU activation function, which lets any positive value pass unchanged and compresses any negative value to almost zero; in other words, if w_i and b_i are the linear-layer parameters of the i-th column, the Leaky Gate of the i-th column is:

LeakyGate_i(x_i) = LeakyReLU(w_i · x_i + b_i)   (2)
the Leaky Gate is intended to act as a simple filter with a different behavior for each column, masking or passing each individual value;
the first Transformer layer takes the output of the Leaky Gate as input and passes its output to the second Transformer layer, and so on; as shown in FIG. 1, the output of the last Transformer layer is input directly to the MLP+ neural network (the improved multi-layer perceptron MLP, shown in FIG. 2) to obtain the model output value y_Transformer+, where θ_1, θ_2 and θ_3 are the model parameters of the Leaky Gate, the Transformer, and the MLP+ neural network:

y_Transformer+(x) = M(f_transformer(G_Θ(E_Φ(x); θ_1); θ_2); θ_3)   (3)
(3-2) similarly, the feature vector output in the previous step is input into the MLP+ neural network (the right branch of FIG. 1) to obtain the model output value y_MLP+:

y_MLP+ = M(E_Φ(x); θ_1)   (4)
7. The method of claim 1, wherein the step (4) specifically comprises: to combine the improved Transformer and the improved multi-layer perceptron MLP so as to obtain predictions from the overall model and perform end-to-end training, the output values of the two modules are assigned different weights w_1 and w_2 (both weights are learned by back-propagation training), and the prediction probability ŷ output by the final model is given by equation (5), where σ denotes the activation function, sigmoid for binary classification and softmax for multi-class classification:

ŷ = σ(w_1 · y_Transformer+(x) + w_2 · y_MLP+(x))   (5)
CN202310239904.9A 2023-03-07 2023-03-07 Transformer-based structured data general modeling method Pending CN116467930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310239904.9A CN116467930A (en) 2023-03-07 2023-03-07 Transformer-based structured data general modeling method


Publications (1)

Publication Number Publication Date
CN116467930A true CN116467930A (en) 2023-07-21

Family

ID=87183209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310239904.9A Pending CN116467930A (en) 2023-03-07 2023-03-07 Transformer-based structured data general modeling method

Country Status (1)

Country Link
CN (1) CN116467930A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663516A (en) * 2023-07-28 2023-08-29 深圳须弥云图空间科技有限公司 Table machine learning model training method and device, electronic equipment and storage medium
CN116663516B (en) * 2023-07-28 2024-02-20 深圳须弥云图空间科技有限公司 Table machine learning model training method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination