CN116340455A - Method for extracting design standard entity relation of high-speed train bogie - Google Patents

Method for extracting design standard entity relation of high-speed train bogie

Info

Publication number
CN116340455A
CN116340455A CN202310333688.4A
Authority
CN
China
Prior art keywords
model
vector
attention
sentence
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310333688.4A
Other languages
Chinese (zh)
Inventor
龚天翼
蒋妍
贺凌峰
杜佳雯
尹新宇
王淑营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202310333688.4A priority Critical patent/CN116340455A/en
Publication of CN116340455A publication Critical patent/CN116340455A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/313Selection or weighting of terms for indexing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for extracting design-standard entity relations for high-speed train bogies. Based on the existing BERT-BiLSTM model, an improved RoBERTa-wwm-BiSRU model built on a dual attention mechanism and optimized for the bogie data set is proposed, forming the BRENT model. The dual attention mechanism is combined with the RoBERTa-wwm-BiSRU model, and the model loss function is optimized for the structure of the bogie data set. Adding the two attention layers improves the model's ability to understand context semantics, and the improved RoBERTa-wwm model improves word segmentation and encoding, raising entity-relation extraction performance on small data sets; the model is further optimized for the bogie data set and its loss function is improved, increasing the accuracy of model prediction.

Description

Method for extracting design standard entity relation of high-speed train bogie
Technical Field
The invention relates to a design standard entity relation extraction method, in particular to a high-speed train bogie design standard entity relation extraction method.
Background
The design, manufacture, operation and maintenance of high-speed train bogies involve a great deal of data and knowledge scattered across different business systems, including technical standards, design parameters, process parameters, fault information and the like. Complex relationships exist among these data, such as relationships between manufacturing parameters and fault information, or between design parameters and technical standards. By extracting the relations among entities, useful information can be distilled from large amounts of data, the associations among the information revealed, and an important reference basis provided for the design, manufacture, operation and maintenance of high-speed train bogies. Relation extraction in the field of high-speed train bogies is therefore very important. Most current research focuses on BERT-LSTM entity-relation extraction algorithms, whose principle is mainly to use BERT, or a language model improved from BERT, to extract text features and obtain a word-granularity vector matrix, and then use a long short-term memory network to capture the relations between words in the input sentence and its context, finally outputting a relation label. Current research improves models by reducing the parameters of the BERT model, adding attention, or using bidirectional long short-term memory networks. However, these improved algorithms need to be trained on massive data sets, and no entity-relation extraction method has been proposed specifically for the small data volumes found in the high-speed train bogie field.
Effective use and analysis of the data generated during the design and manufacture of high-speed train bogies can provide important value for optimizing bogie performance. However, the multi-source heterogeneity of the data accumulated at each stage and the lack of a proprietary corpus result in data that lack integrity and consistency, which makes entity-relation extraction inaccurate and prevents the value of the data from being effectively realized.
Disclosure of Invention
The invention provides the following technical scheme:
A method for extracting design-standard entity relations of a high-speed train bogie comprises steps S1-S2, wherein:
S1: extracting text data, performing data enhancement using synonym replacement and Chinese-English-Chinese translation, and producing a data set;
S2: constructing a network model and training an entity relation extraction model.
Step S2 includes steps S21-S25:
S21: constructing an improved RoBERTa-wwm pre-training model;
S22: constructing a self-attention layer;
S23: constructing a bidirectional SRU network module;
S24: constructing a multi-head attention layer;
S25: improving the loss function of the model.
In step S21, an improved RoBERTa-wwm pre-training model is constructed, wherein the backbone network of the RoBERTa model uses a Transformer encoder and the GELU nonlinear activation function; the word embedding size is denoted E, the number of encoder layers L, and the hidden layer size H; the hidden layer size H is set to 1024, the number of attention heads to 16 and the number of encoder layers to 24, with a batch size of 2K for each training step; RoBERTa applies mixed character- and word-level encoding to the input sentence in order to learn high-frequency byte combinations.
In step S22, the self-attention mechanism includes three parts: the output word vectors of RoBERTa-wwm, the Tanh function and the Softmax activation function; the calculation process of the adopted self-attention network structure is given by formulas (3) to (5):

M = tanh(H) (3)

α = softmax(w^T M) (4)

r = Hα^T (5)

wherein, in formula (3), the word vector matrix H is the output of the first-layer RoBERTa-wwm, with dimensions n × Sw, where n is the dimension of the word vectors and Sw is the length of the word vector sequence; in formula (4), α is the weight vector obtained through the softmax activation function, w is a trainable parameter vector, and T denotes the transpose; in formula (5), r is the output sentence weight vector;
the matrix H first passes through the Tanh activation function to obtain the matrix M, then through a softmax layer to obtain the weight vector α, whose weights sum to 1; the word vector matrix H is multiplied by the normalized weights to obtain the corresponding weighted global time-series feature vector;
in step S23, a bidirectional SRU network is used, in which the forward SRU network and the backward SRU network are connected so that the model can acquire both forward and backward semantic information;
in step S24, the multi-head attention is constructed as follows: given an input sequence X = [x_1, x_2, ..., x_n], where n is the sequence length and each x_i is a vector representation, the multi-head attention mechanism maps the input sequence into h subspaces, each subspace having its own weight matrices W^Q, W^K, W^V used to compute the query, key and value vectors respectively; for each subspace i, the query vector Q_i = XW_i^Q, the key vector K_i = XW_i^K and the value vector V_i = XW_i^V are computed, where W_i^Q, W_i^K and W_i^V are d_model × d_k matrices, d_model is the dimension of the representation vectors in the model and d_k is the dimension of the key vectors; the attention score of subspace i is then obtained as the scaled dot product of the query and key vectors:

S_i = Q_i K_i^T / √d_k

next, the attention score S_i is normalized by a Softmax function to obtain the attention weight W_i = softmax(S_i), which represents the contribution of each position to subspace i; finally, the value vector V_i is weighted by the attention weight W_i to obtain the output of head i:

O_i = W_i V_i

and the outputs O_1, O_2, ..., O_h of the h subspaces are concatenated to form the final output vector O = [O_1, O_2, ..., O_h], where [O_1, O_2, ..., O_h] denotes the concatenation of the vectors;
s25: and extracting a data set aiming at the constructed high-speed train bogie relation, and improving the loss function of the whole BRENT model to enable the pre-training model to better adapt to the characteristics of the high-speed train bogie data set, wherein the improved loss function is shown in a formula (6):
Figure BDA0004155673630000033
wherein: y is i Representing the actual value of the current,
Figure BDA0004155673630000034
representing the predicted value, c represents the length of the sentence sequence, so that after improvement, the model can pay more attention to the effect of the original variable, namely, the original variable and the predicted variable jointly influence the predicted result.
Step S1 specifically comprises steps S11-S14;
step S11 is to extract data, including the following steps: and collecting a high-speed train bogie knowledge text, and acquiring the text as a data source through a network.
Step S12 is data enhancement; the method comprises the following steps:
(1) Data enhancement using synonym replacement: a random number of words in a sentence are replaced with their synonyms, and a new sentence is constructed by changing those words; (2) back-translation: the Chinese text is translated into English and then back into Chinese, enriching the corpus data; (3) text entity extraction using an ALBERT-BiLSTM+CNN-CRF model, in which BiLSTM and CNN extract features in parallel, after which the extracted entities are labeled.
Step S13: and (3) carrying out relationship labeling on the data generated in the step (S12) by using a manual labeling mode on the data set after the entity extraction is completed, and labeling the data as a supervised learning data set format to finally form the data set required by the invention.
Step S14: and controlling the sentence length, and if the length of the input sentence is less than 20 words, adding 0 complement at the two ends of the sentence to ensure that the length of the sentence reaches 20 words.
Compared with the prior art, the invention has the beneficial effects that:
(1) The inventors found in experiments that the traditional BERT-BiLSTM model relies on large data sets for training and has a huge number of training parameters. In this regard, the invention discloses a method for extracting design-standard entity relations of a high-speed train bogie, comprising a RoBERTa-wwm-Self-Attention-BiSRU-Multi-Head-Attention model based on a dual attention mechanism and RoBERTa improvements, used to extract relations between bogie design-standard data entities. The model first uses the RoBERTa-wwm pre-training model to convert sentence semantics into text vectors, then adjusts dynamically according to the characteristics of the bogie-field design-standard data set to obtain dynamic word vectors; the word vectors are then fed into the self-attention layer to extract weighted semantic information, and the weighted word vectors flow through the BiSRU layer, which fully exploits GPU parallel computation and obtains context semantic information; finally, the multi-head attention layer allows the model to attend simultaneously to information at different positions and of different aspects, improving its expressive power. The model fused with the dual attention mechanism can fully extract context semantics and information at different positions while improving computation speed, greatly improving its ability to capture entity relations.
(2) The inventors found in experiments that the original BERT model masks part of the text with a character-level mask mechanism. This masking works well on English data sets, but on Chinese data sets it splits Chinese words into individual characters and loses semantic information. The invention therefore improves the masking by using whole-word masking: during mask training, the input sentence is first segmented with the word segmentation model of Harbin Institute of Technology, and masking is then applied to whole words, avoiding the under-fitting caused by semantic loss while the data set is expanded.
(3) The inventors found in experiments that, for entity-relation extraction tasks in the bogie field, the prior art is weak at extracting complex relations from a bogie relation-extraction data set built in a pipeline manner. A dual attention mechanism is therefore added to extract and fuse context semantics and information at different positions, and an improved loss function is additionally used so that the original information is taken into account during evaluation, improving the model's prediction performance.
(4) The inventors found in experiments that pure convolutional neural networks such as TextCNN encode syntactic information insufficiently, while RNNs or LSTMs cannot be computed in parallel on the GPU, leading to low computational efficiency. The invention therefore uses the BiSRU to model the dependencies between words: the bidirectional network extracts context semantics and combines positional, lexical, syntactic and semantic information, after which the multi-head attention mechanism focuses on learning the feature information inside the sequence. At the same time, the SRU network solves the problem that the model cannot run in parallel on the GPU, and the introduction of parallel computation greatly improves the model's computational efficiency.
(5) The inventors control the length of the model's input sentences, limiting labeled entity-relation sentences to 20 words. According to our statistics on the constructed high-speed train bogie relation-extraction data set, the average length of an entity-relation sentence is about 16 words; using short sentences effectively avoids features disappearing in the Bi-SRU layer. If a sentence is shorter than 20 words, zero padding of equal length is applied at both ends to bring it up to 20 words.
(6) The inventors found in experiments that design-standard corpus in the bogie design field is insufficient. The invention therefore proposes to enlarge the existing corpus with synonym replacement and Chinese-English-Chinese translation, preventing the poor model fit caused by too little corpus and improving the accuracy of bogie entity extraction.
Description of the drawings:
FIG. 1 is a flow chart of the overall structure of a model network;
FIG. 2 is a schematic diagram of a model calculation flow;
FIG. 3 is a schematic diagram of the first-layer self-attention mechanism;
FIG. 4 is a schematic diagram of a bidirectional SRU network architecture;
FIG. 5 is a schematic diagram of a second-layer multi-head attention mechanism.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention, as claimed, but is merely representative of some embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, under the condition of no conflict, the embodiments of the present invention and the features and technical solutions in the embodiments may be combined with each other.
The invention discloses a method for extracting design-standard entity relations of a high-speed train bogie. Based on the existing BERT-BiLSTM model, an improved RoBERTa-wwm-BiSRU model built on a dual attention mechanism and optimized for the bogie data set is proposed, forming a RoBERTa-wwm + Self-Attention + BiSRU + Multi-Head Attention model (BRENT). The model innovatively combines the dual attention mechanism with the RoBERTa-wwm-BiSRU model and optimizes the model loss function for the structure of the bogie data set. The model first pre-trains with the RoBERTa-wwm model, converting sentence semantics into text vectors through whole-word masking and the optimized BERT model; dynamic word vectors of the sentences are then obtained by dynamic adjustment according to the characteristics of the bogie-field design-standard data set; finally the word vectors are fed to the self-attention layer, the bidirectional SRU layer and the multi-head attention layer to extract feature information. Adding the two attention layers improves the model's ability to understand context semantics, and the improved RoBERTa-wwm model improves word segmentation and encoding, raising entity-relation extraction performance on the small data set; the model is optimized for the bogie data set and the loss function is improved, increasing the accuracy of model prediction.
The invention comprises the following steps:
s1: text data are extracted, data enhancement is carried out by using synonym substitution and Chinese-English-Chinese translation, and a data set is manufactured.
S2: and constructing a network model, and training an entity relation extraction model.
Preferably, step S1 includes: text data is extracted, data enhancement is performed, and a data set is manufactured. And acquiring unlabeled corpus in the appointed field, and acquiring data required by training. After the data is enhanced in various modes, the relationship between the entities in each sentence is marked. The invention extracts named entity first and then marks the entity relation data set to form the data set used by the invention. Step S1 specifically includes steps S11-S14.
Preferably, step S11 extracts the data, as follows: high-speed train bogie knowledge texts are collected, and a large amount of text is acquired from the network as the data source. The quality of the obtained texts is uneven; after manual inspection and cleaning, various text data related to the required high-speed rail bogie are obtained.
Preferably, step S12 is data enhancement. The method comprises the following steps:
(1) Synonym replacement is used for data enhancement: a random number of words in a sentence are replaced with their synonyms, and a new sentence is constructed by changing those words, enlarging the amount of data; (2) back-translation: the Chinese text is translated into English and then back into Chinese, enriching the corpus data; (3) text entity extraction is performed with an ALBERT-BiLSTM+CNN-CRF model, which uses BiLSTM and CNN to extract features in parallel and can fuse coarse-grained and fine-grained sentence features, improving the accuracy of entity extraction; the extracted entities are then labeled.
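As an illustration of the synonym-replacement step, a minimal Python sketch follows (assumptions: the synonym table and example terms below are hypothetical and would in practice come from a domain lexicon; the Chinese-English-Chinese step would call a machine-translation model or service and is not shown):

```python
import random

# Hypothetical synonym table for illustration only; in practice it would be
# built from a domain lexicon or a general Chinese synonym resource.
SYNONYMS = {
    "高速列车": ["高铁列车"],
    "转向架": ["走行部"],
    "标准": ["规范"],
}

def synonym_replace(tokens, num_replace=1):
    """Replace up to num_replace randomly chosen tokens with one of their synonyms."""
    candidates = [i for i, tok in enumerate(tokens) if tok in SYNONYMS]
    random.shuffle(candidates)
    new_tokens = list(tokens)
    for i in candidates[:num_replace]:
        new_tokens[i] = random.choice(SYNONYMS[new_tokens[i]])
    return new_tokens

# Example: one augmented variant of a segmented sentence.
sentence = ["高速列车", "转向架", "的", "设计", "应", "符合", "相关", "标准"]
print(synonym_replace(sentence, num_replace=2))
```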
Preferably, step S13: and (3) carrying out relationship labeling on the data generated in the step (S12) by using a manual labeling mode on the data set after the entity extraction is completed, and labeling the data as a supervised learning data set format to finally form the data set required by the invention.
Preferably, step S14: and controlling the sentence length, and if the length of the input sentence is less than 20 words, adding 0 complement at the two ends of the sentence to ensure that the length of the sentence reaches 20 words. Specifically, assuming that the sentence length is n, the number of both terminal complements 0 is represented by formula (1) if n is even, and the number of both terminal complements zero is represented by formula (2) if n is odd.
Figure BDA0004155673630000061
Figure BDA0004155673630000062
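A short sketch of the symmetric zero-padding described by formulas (1)-(2); which end receives the extra zero when the deficit is odd is not stated in the text, so placing it at the right end is an assumption:

```python
def pad_sentence(tokens, target_len=20, pad_token="0"):
    """Pad a tokenised sentence symmetrically to target_len with zeros."""
    n = len(tokens)
    if n >= target_len:
        return tokens[:target_len]
    deficit = target_len - n
    left = deficit // 2            # assumption: the extra zero (odd deficit) goes right
    right = deficit - left
    return [pad_token] * left + tokens + [pad_token] * right

# A 16-token sentence receives two zeros at each end.
print(pad_sentence(["转向架"] * 16))
```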
Preferably, in step S2, the entity relationship extraction model training is performed based on the data generated in S1 and the trained language model. In the training process of the entity relation extraction model, the parameters of the language word segmentation coding model are also regarded as the parameters of the upper-layer entity relation extraction model to be trained together. In the step S2, a BRENT entity relation extraction model network is used, an entity relation extraction model is trained on the basis of manually marked entity relation extraction data, and then parameters are updated through gradient descent, and the training process is repeated, so that a loss function can be continuously reduced, and a better and better prediction result is obtained.
Step S2 includes steps S21-S25:
S21: constructing an improved RoBERTa-wwm pre-training model;
S22: constructing a self-attention layer;
S23: constructing a bidirectional SRU network module;
S24: constructing a multi-head attention layer;
S25: improving the loss function of the model.
Preferably, S21: an improved Roberta-wwm pre-training model was constructed. The backbone network of the RoBERTa model uses a transducer encoder and a gel nonlinear activation function. The embedded size of the appointed word is E, the number of encoder layers is L, and the size of the hidden layer is H. The invention sets the hidden layer size H as 1024, the number of attention heads as 16, the coding layer size as set bit 24, the size of the divided data set for each training is 2K, and the parameter is set to indicate that the RoBERTa model can have better performance when being trained by using larger training parameters, thereby playing better word segmentation capability on the extracted data set of the self-built bogie design standard relation. Meanwhile, roBERTa performs mixed coding on the input sentence by using a character and word level mixed coding mode, and is used for learning high-frequency byte combinations. Meanwhile, in order to cope with Chinese corpus, the RoBERTa model uses a full-word mask training technology (wwm) and uses a word segmentation model (Language Technology Platform, LTP) of Harbin university to segment the center Wen Yugou, so that the phenomenon of semantic loss caused by segmentation of Chinese words when training by using a mask technology can be avoided.
Preferably, S22: a self-focusing mechanism layer is built. As shown in fig. 3, the self-attention mechanism consists of three parts: the output word vector of RoBERTa-wwm, the Tanh function and the Softmax activation function. The invention adopts a self-attention mechanism to process the output of RoBERTa-wwm, so that the model attention is more focused on the entity to be extracted, and the calculation process of the network structure of the self-attention mechanism adopted by the invention is shown in the formulas (3) to (5):
M=tanh(H) (3)
Figure BDA0004155673630000071
r=Hα T (5)
wherein: in the formula (3): the H-word vector matrix is the output of the first layer RoBERTa-wwm, with dimensions n×Sw, where n is the dimension of the word vector matrix and Sw is the length of the word vector sequence;
in the formula (4): alpha is a word vector matrix activated by a softmax activation function, and T represents the transpose of the matrix;
in formula (5): r is the output sentence weight vector;
the H matrix firstly obtains a word vector matrix M through a Tanh activation function, then obtains a weight matrix through a softmax layer, at the moment, the sum of all weights in alpha is added to be 1, and the H word vector matrix is multiplied by a normalized weight matrix to obtain a corresponding weighted global time sequence feature vector.
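A minimal PyTorch sketch of the self-attention layer of formulas (3)-(5); the trainable vector w is introduced here to make formula (4) concrete and is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class SentenceAttention(nn.Module):
    """Attention over encoder outputs: M = tanh(H), alpha = softmax(w^T M), r = H alpha^T."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(hidden_dim) * 0.01)  # assumed trainable vector

    def forward(self, H):                           # H: (batch, seq_len, hidden_dim)
        M = torch.tanh(H)                           # formula (3)
        alpha = torch.softmax(M @ self.w, dim=-1)   # formula (4), weights sum to 1
        r = torch.einsum("bsh,bs->bh", H, alpha)    # formula (5): weighted sentence vector
        return r, alpha

# Usage on the RoBERTa-wwm output (hidden size 1024, sentence length 20):
attn = SentenceAttention(1024)
r, alpha = attn(torch.randn(2, 20, 1024))
print(r.shape, alpha.shape)                         # (2, 1024) and (2, 20)
```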
Preferably, S23: and constructing a bidirectional SRU network module. The method comprises the following steps: the bidirectional SRU network is used, and the forward SRU network and the backward SRU network are connected, so that the model can acquire both forward semantic information and backward semantic information.
The invention uses the improved network SRU of LSTM, which can solve the parallel computing problem, thereby improving the operation efficiency. SRU is prior art and will not be described in detail herein. The invention uses the bidirectional SRU network, as shown in figure 4, the forward SRU and the backward SRU are connected, so that the model can acquire both the forward semantic information and the backward semantic information, and the problem of semantic loss caused by incapability of acquiring the forward semantic information due to a single network which frequently occurs in the single SRU network can be solved. Meanwhile, the word number of the input sentence is limited to 20 words, so that the phenomenon that information is lost in transmission due to overlong sentences can be avoided.
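A simplified PyTorch sketch of a bidirectional SRU, written for clarity rather than speed: the input projections are computed in parallel over the whole sequence and only the element-wise state recurrence is sequential. The highway projection used when input and hidden sizes differ is a simplifying assumption; a production implementation would use the optimized SRU library.

```python
import torch
import torch.nn as nn

class SimpleSRU(nn.Module):
    """One-direction SRU (Lei et al., 2017). All input projections are computed in
    parallel over the sequence; only the light-weight state recurrence is sequential."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W  = nn.Linear(input_dim, hidden_dim, bias=False)   # candidate x~_t
        self.Wf = nn.Linear(input_dim, hidden_dim)                # forget gate f_t
        self.Wr = nn.Linear(input_dim, hidden_dim)                # reset gate r_t
        self.proj = nn.Linear(input_dim, hidden_dim, bias=False)  # highway projection

    def forward(self, x):                         # x: (batch, seq_len, input_dim)
        xt = self.W(x)
        f = torch.sigmoid(self.Wf(x))
        r = torch.sigmoid(self.Wr(x))
        c = torch.zeros(x.size(0), xt.size(-1), device=x.device)
        outputs = []
        for t in range(x.size(1)):
            c = f[:, t] * c + (1 - f[:, t]) * xt[:, t]                        # internal state
            h = r[:, t] * torch.tanh(c) + (1 - r[:, t]) * self.proj(x[:, t])  # highway output
            outputs.append(h)
        return torch.stack(outputs, dim=1)        # (batch, seq_len, hidden_dim)

class BiSRU(nn.Module):
    """Forward and backward SRU passes concatenated, as in fig. 4."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.fwd = SimpleSRU(input_dim, hidden_dim)
        self.bwd = SimpleSRU(input_dim, hidden_dim)

    def forward(self, x):
        h_fwd = self.fwd(x)
        h_bwd = torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])
        return torch.cat([h_fwd, h_bwd], dim=-1)   # (batch, seq_len, 2 * hidden_dim)

# 512 hidden units per direction, as in the experiments:
bisru = BiSRU(input_dim=1024, hidden_dim=512)
out = bisru(torch.randn(2, 20, 1024))              # (2, 20, 1024)
```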
Preferably, S24: a second layer of attentiveness mechanisms, namely a multi-headed attentiveness mechanism layer, is constructed.
After the calculation of the bidirectional SRU network, a sequence of word representations is output; positional semantic information is lost during this calculation. The multi-head attention mechanism captures different aspects of the input from different angles by mapping the input sequence into different representation spaces, enabling the model to better understand the input sequence.
As shown in fig. 5, the multi-head attention is constructed as follows: given an input sequence X = [x_1, x_2, ..., x_n], where n is the sequence length and each x_i is a vector representation, the multi-head attention mechanism maps the input sequence into h subspaces, each subspace having its own weight matrices W^Q, W^K, W^V used to compute the query, key and value vectors respectively. For each subspace i, the query vector Q_i = XW_i^Q, the key vector K_i = XW_i^K and the value vector V_i = XW_i^V are computed, where W_i^Q, W_i^K and W_i^V are d_model × d_k matrices, d_model being the dimension of the representation vectors in the model and d_k the dimension of the key vectors. The attention score of subspace i is then obtained as the scaled dot product of the query and key vectors:

S_i = Q_i K_i^T / √d_k

Next, the attention score S_i is normalized by a Softmax function to obtain the attention weight W_i = softmax(S_i), which represents the contribution of each position to subspace i. Then the value vector V_i is weighted by the attention weight W_i to obtain the output of head i:

O_i = W_i V_i

Finally, the outputs O_1, O_2, ..., O_h of the h subspaces are concatenated to form the final output vector O = [O_1, O_2, ..., O_h], where [O_1, O_2, ..., O_h] denotes the concatenation of the vectors.
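A compact PyTorch sketch of the multi-head attention layer described above; the final output projection W_o is a common addition that the text does not state explicitly and is therefore an assumption:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Scaled dot-product multi-head self-attention: S_i = Q_i K_i^T / sqrt(d_k)."""
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.Wq = nn.Linear(d_model, d_model)
        self.Wk = nn.Linear(d_model, d_model)
        self.Wv = nn.Linear(d_model, d_model)
        self.Wo = nn.Linear(d_model, d_model)      # output projection (assumption)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        # Project and split into h heads of width d_k.
        q = self.Wq(x).view(b, n, self.h, self.d_k).transpose(1, 2)
        k = self.Wk(x).view(b, n, self.h, self.d_k).transpose(1, 2)
        v = self.Wv(x).view(b, n, self.h, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / (self.d_k ** 0.5)    # S_i
        weights = torch.softmax(scores, dim=-1)                 # W_i = softmax(S_i)
        o = weights @ v                                         # O_i = W_i V_i
        o = o.transpose(1, 2).reshape(b, n, self.h * self.d_k)  # concatenate heads
        return self.Wo(o)

# Applied to the BiSRU output (2 * 512 = 1024 features, 16 heads as in S21):
mha = MultiHeadAttention(d_model=1024, num_heads=16)
out = mha(torch.randn(2, 20, 1024))                # (2, 20, 1024)
```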
S25: and extracting a data set aiming at the constructed high-speed train bogie relation, and improving the loss function of the whole BRENT model to enable the pre-training model to better adapt to the characteristics of the high-speed train bogie data set, wherein the improved loss function is shown in a formula (6):
Figure BDA0004155673630000083
wherein: y is i Representing the actual value of the current,
Figure BDA0004155673630000084
representing the predicted value, c represents the length of the sentence sequence, so that after improvement, the model can pay more attention to the effect of the original variable, namely, the original variable and the predicted variable jointly influence the predicted result.
In the text classification task, the precision P, recall R and F1 value are generally used as evaluation indicators; the larger P, R and F1 are, the better the performance. They are defined by formulas (7)-(9):

P = TP / (TP + FP) (7)

R = TP / (TP + FN) (8)

F1 = 2 × P × R / (P + R) (9)

In formulas (7)-(9): TP is the number of samples belonging to class C that are correctly classified into class C; FP is the number of samples not belonging to class C that are wrongly classified into class C; FN is the number of samples belonging to class C that are wrongly classified outside class C; TN is the number of samples not belonging to class C that are correctly classified outside class C. P is the percentage of correctly identified samples among all identified samples, R is the percentage of correctly identified samples among all samples, and F1 is the harmonic mean of P and R.
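A small helper computing the three indicators from the counts of formulas (7)-(9); the numbers in the example are illustrative only, not the patent's experimental data:

```python
def precision_recall_f1(tp, fp, fn):
    """P, R and F1 from counts, per formulas (7)-(9)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

print(precision_recall_f1(tp=120, fp=19, fn=18))   # illustrative counts
```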
To test the merits of the method proposed in the present invention against the RoBERTa-wwm-Bi-SRU method, the RoBERTa-wwm-Self-Attention-Bi-SRU method and the RoBERTa-wwm-Bi-SRU-Multi-Head-Attention method, the following comparative tests were carried out.
In the test, the initial learning rate is 0.005, the batch corpus size is 25,000, the number of SRU hidden units in each of the forward and reverse directions is 512, and the number of training rounds is 110,000. The whole model uses the improved loss function as the loss calculation formula, the BiLSTM model improved with the dual attention mechanism adopts a Softmax function as the classifier, and the optimizer is a parameter-tuned Adam optimizer. To address the over-fitting and under-fitting that may occur during model training, the dual attention mechanism is introduced and the improved RoBERTa model based on the whole-word masking technique is used; large model parameters are used for training in the improved RoBERTa model and the BatchSize value is increased, improving prediction precision. Tests on the self-built bogie data set give a precision P of 86.31%, a recall R of 86.90% and an F1 value of 86.60%.
The comparison data for models built from the same modules in different combinations are as follows; RoBERTa-wwm, Self-Attention, BiSRU and Multi-Head Attention in the table are the different modules composing the model, and P, R and F1 are the model evaluation indicators. The table shows that every module mentioned in the patent contributes to the model:

[Ablation comparison table, given as an image in the original.]
The comparison data against the three models composed of different modules, under the same conditions, are as follows:

[Comparison table, given as an image in the original.]
from the results of the comparative experiments, the proposed method of the present invention is superior to the prior art method.
The above embodiments are only for illustrating the present invention and not for limiting the technical solutions described herein. Although the present invention has been described in detail with reference to the above embodiments, it is not limited to those specific embodiments; any modifications or equivalent substitutions of the present invention, and all technical solutions and modifications thereof that do not depart from the spirit and scope of the invention, are intended to be included in the scope of the appended claims.

Claims (6)

1. A method for extracting design-standard entity relations of a high-speed train bogie, characterized by comprising steps S1-S2, wherein:
S1: extracting text data, performing data enhancement using synonym replacement and Chinese-English-Chinese translation, and producing a data set;
S2: constructing a network model and training an entity relation extraction model;
step S2 includes steps S21-S25:
S21: constructing an improved RoBERTa-wwm pre-training model;
S22: constructing a self-attention layer;
S23: constructing a bidirectional SRU network module;
S24: constructing a multi-head attention layer;
S25: improving the loss function of the model;
in step S22, the self-attention mechanism includes three parts: the output word vectors of RoBERTa-wwm, the Tanh function and the Softmax activation function; the calculation process of the adopted self-attention network structure is given by formulas (3) to (5):

M = tanh(H) (3)

α = softmax(w^T M) (4)

r = Hα^T (5)

wherein, in formula (3), the word vector matrix H is the output of the first-layer RoBERTa-wwm, with dimensions n × Sw, where n is the dimension of the word vectors and Sw is the length of the word vector sequence; in formula (4), α is the weight vector obtained through the softmax activation function, w is a trainable parameter vector, and T denotes the transpose; in formula (5), r is the output sentence weight vector; the matrix H first passes through the Tanh activation function to obtain the matrix M, then through a softmax layer to obtain the weight vector α, whose weights sum to 1, and the word vector matrix H is multiplied by the normalized weights to obtain the corresponding weighted global time-series feature vector;
in step S23, a bidirectional SRU network is used, in which the forward SRU network and the backward SRU network are connected so that the model can acquire both forward and backward semantic information;
in step S24, the multi-head attention is constructed as follows: given an input sequence X = [x_1, x_2, ..., x_n], where n is the sequence length and each x_i is a vector representation, the multi-head attention mechanism maps the input sequence into h subspaces, each subspace having its own weight matrices W^Q, W^K, W^V used to compute the query, key and value vectors respectively; for each subspace i, the query vector Q_i = XW_i^Q, the key vector K_i = XW_i^K and the value vector V_i = XW_i^V are computed, where W_i^Q, W_i^K and W_i^V are d_model × d_k matrices, d_model is the dimension of the representation vectors in the model and d_k is the dimension of the key vectors; the attention score of subspace i is then obtained as the scaled dot product of the query and key vectors:

S_i = Q_i K_i^T / √d_k

next, the attention score S_i is normalized by a Softmax function to obtain the attention weight W_i = softmax(S_i), representing the contribution of each position to subspace i; finally, the value vector V_i is weighted by the attention weight W_i to obtain the output of head i:

O_i = W_i V_i

and the outputs O_1, O_2, ..., O_h of the h subspaces are concatenated to form the final output vector O = [O_1, O_2, ..., O_h], where [O_1, O_2, ..., O_h] denotes the concatenation of the vectors;
s25: and extracting a data set aiming at the constructed high-speed train bogie relation, and improving the loss function of the whole BRENT model to enable the pre-training model to better adapt to the characteristics of the high-speed train bogie data set, wherein the improved loss function is shown in a formula (6):
Figure FDA0004155673610000023
wherein: y is i Representing the actual value of the current,
Figure FDA0004155673610000024
representing the predicted value, c represents the length of the sentence sequence, so that after improvement, the model can pay more attention to the effect of the original variable, namely, the original variable and the predicted variable jointly influence the predicted result.
2. A method for extracting standard entity relation of bogie design of high-speed train according to claim 1, wherein: step S1 specifically comprises steps S11-S14; step S11 is to extract data, including the following steps: and collecting a high-speed train bogie knowledge text, and acquiring the text as a data source through a network.
3. A method for extracting standard entity relation of bogie design of high-speed train according to claim 2, wherein: step S12 is data enhancement; the method comprises the following steps:
(1) Data enhancement using synonym replacement: a random number of words in a sentence are replaced with their synonyms, and a new sentence is constructed by changing those words; (2) back-translation: the Chinese text is translated into English and then back into Chinese, enriching the corpus data; (3) text entity extraction using an ALBERT-BiLSTM+CNN-CRF model, in which BiLSTM and CNN extract features in parallel, after which the extracted entities are labeled.
4. A method for extracting standard entity relationship of high-speed train bogie design as claimed in claim 3, wherein: step S13: and (3) carrying out relationship labeling on the data generated in the step (S12) by using a manual labeling mode on the data set after the entity extraction is completed, and labeling the data as a supervised learning data set format to finally form the data set required by the invention.
5. The method for extracting the standard entity relation of the bogie design of the high-speed train according to claim 4, wherein the method comprises the following steps: step S14: and controlling the sentence length, and if the length of the input sentence is less than 20 words, adding 0 complement at the two ends of the sentence to ensure that the length of the sentence reaches 20 words.
6. The method for extracting the design-standard entity relation of the high-speed train bogie according to claim 5, wherein: in step S21, an improved RoBERTa-wwm pre-training model is constructed, wherein the backbone network of the RoBERTa model uses a Transformer encoder and the GELU nonlinear activation function; the word embedding size is denoted E, the number of encoder layers L, and the hidden layer size H; the hidden layer size H is set to 1024, the number of attention heads to 16 and the number of encoder layers to 24, with a batch size of 2K per training step; RoBERTa applies mixed character- and word-level encoding to the input sentence in order to learn high-frequency byte combinations.
CN202310333688.4A 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie Pending CN116340455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310333688.4A CN116340455A (en) 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310333688.4A CN116340455A (en) 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie

Publications (1)

Publication Number Publication Date
CN116340455A true CN116340455A (en) 2023-06-27

Family

ID=86885536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310333688.4A Pending CN116340455A (en) 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie

Country Status (1)

Country Link
CN (1) CN116340455A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151222A (en) * 2023-09-15 2023-12-01 大连理工大学 Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium
CN117151222B (en) * 2023-09-15 2024-05-24 大连理工大学 Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
US11941522B2 (en) Address information feature extraction method based on deep neural network model
CN110348016B (en) Text abstract generation method based on sentence correlation attention mechanism
CN111897949B (en) Guided text abstract generation method based on Transformer
CN110209801B (en) Text abstract automatic generation method based on self-attention network
CN107967262B (en) A kind of neural network illiteracy Chinese machine translation method
CN109635124B (en) Remote supervision relation extraction method combined with background knowledge
CN110688394B (en) NL generation SQL method for novel power supply urban rail train big data operation and maintenance
CN110134946B (en) Machine reading understanding method for complex data
CN110929030A (en) Text abstract and emotion classification combined training method
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN109992775A (en) A kind of text snippet generation method based on high-level semantics
CN111209749A (en) Method for applying deep learning to Chinese word segmentation
CN113204674B (en) Video-paragraph retrieval method and system based on local-overall graph inference network
CN114880461A (en) Chinese news text summarization method combining contrast learning and pre-training technology
CN112926337B (en) End-to-end aspect level emotion analysis method combined with reconstructed syntax information
CN112580373A (en) High-quality Mongolian unsupervised neural machine translation method
CN113111152A (en) Depression detection method based on knowledge distillation and emotion integration model
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
Liu et al. Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning
CN111008277B (en) Automatic text summarization method
CN112395891A (en) Chinese-Mongolian translation method combining Bert language model and fine-grained compression
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN116340455A (en) Method for extracting design standard entity relation of high-speed train bogie
CN116029295A (en) Electric power text entity extraction method, defect positioning method and fault diagnosis method
CN113157914B (en) Document abstract extraction method and system based on multilayer recurrent neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination