CN116340455A - Method for extracting design standard entity relation of high-speed train bogie - Google Patents

Method for extracting design standard entity relation of high-speed train bogie

Info

Publication number
CN116340455A
CN116340455A CN202310333688.4A
Authority
CN
China
Prior art keywords
model
vector
attention
sentence
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310333688.4A
Other languages
Chinese (zh)
Inventor
龚天翼
蒋妍
贺凌峰
杜佳雯
尹新宇
王淑营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202310333688.4A priority Critical patent/CN116340455A/en
Publication of CN116340455A publication Critical patent/CN116340455A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/313Selection or weighting of terms for indexing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for extracting design-standard entity relations for high-speed train bogies. Based on the existing BERT-BiLSTM model, an improved RoBERTa-wwm-BiSRU model built on a dual attention mechanism and optimized for the bogie data set is proposed, forming the BRENT model. The dual attention mechanism is combined with the RoBERTa-wwm-BiSRU model, and the model loss function is optimized for the structure of the bogie data set. Adding the two attention layers improves the model's ability to understand context semantics, and the improved RoBERTa-wwm model improves word segmentation and encoding, raising entity-relation extraction performance on small data sets; the model is further optimized for the bogie data set and its loss function is improved, increasing the accuracy of model prediction.

Description

Method for extracting design standard entity relation of high-speed train bogie
Technical Field
The invention relates to a design standard entity relation extraction method, in particular to a high-speed train bogie design standard entity relation extraction method.
Background
The design, manufacture, operation and maintenance of high-speed train bogies involve a great deal of data and knowledge scattered across different business systems, including technical standards, design parameters, process parameters, fault information and the like. Complex relationships exist among these data, such as relationships between manufacturing parameters and fault information, or between design parameters and technical standards. By extracting the relations among entities, useful information can be distilled from large amounts of data, the associations among the information revealed, and an important reference basis provided for the design, manufacture, operation and maintenance of high-speed train bogies. Relation extraction in the field of high-speed train bogies is therefore very important. Most current research focuses on BERT-LSTM entity-relation extraction algorithms, whose principle is mainly to use BERT, or a language model improved from BERT, to extract text features and obtain a word-granularity vector matrix, and then use a long short-term memory network to capture the relations between words in the input sentence and its context, finally outputting a relation label. Current research improves models by reducing the parameters of the BERT model, adding attention, or using bidirectional long short-term memory networks. However, these improved algorithms need to be trained on massive data sets, and no entity-relation extraction method has been proposed specifically for the small data volumes found in the high-speed train bogie field.
Effective use and analysis of the data generated during the design and manufacture of high-speed train bogies can provide important value for optimizing bogie performance. However, the multi-source heterogeneity of the data accumulated at each stage and the lack of a proprietary corpus result in data that lack integrity and consistency, which makes entity-relation extraction inaccurate and prevents the value of the data from being effectively realized.
Disclosure of Invention
The invention provides the following technical scheme:
A method for extracting design-standard entity relations of a high-speed train bogie comprises steps S1-S2, wherein:
S1: extracting text data, performing data enhancement using synonym replacement and Chinese-English-Chinese translation, and producing a data set;
S2: constructing a network model and training an entity relation extraction model.
Step S2 includes steps S21-S25:
S21: constructing an improved RoBERTa-wwm pre-training model;
S22: constructing a self-attention layer;
S23: constructing a bidirectional SRU network module;
S24: constructing a multi-head attention layer;
S25: improving the loss function of the model.
In step S21, an improved RoBERTa-wwm pre-training model is constructed, wherein the backbone network of the RoBERTa model uses a Transformer encoder and the GELU nonlinear activation function; the word embedding size is denoted E, the number of encoder layers L, and the hidden layer size H; the hidden layer size H is set to 1024, the number of attention heads to 16 and the number of encoder layers to 24, with a batch size of 2K for each training step; RoBERTa applies mixed character- and word-level encoding to the input sentence in order to learn high-frequency byte combinations.
In step S22, the self-attention mechanism includes three parts: the output word vectors of RoBERTa-wwm, the Tanh function and the Softmax activation function; the calculation process of the adopted self-attention network structure is given by formulas (3) to (5):

M = tanh(H) (3)

α = softmax(w^T M) (4)

r = Hα^T (5)

wherein, in formula (3), the word vector matrix H is the output of the first-layer RoBERTa-wwm, with dimensions n × Sw, where n is the dimension of the word vectors and Sw is the length of the word vector sequence; in formula (4), α is the weight vector obtained through the softmax activation function, w is a trainable parameter vector, and T denotes the transpose; in formula (5), r is the output sentence weight vector;
the matrix H first passes through the Tanh activation function to obtain the matrix M, then through a softmax layer to obtain the weight vector α, whose weights sum to 1; the word vector matrix H is multiplied by the normalized weights to obtain the corresponding weighted global time-series feature vector;
in step S23, a bidirectional SRU network is used, in which the forward SRU network and the backward SRU network are connected so that the model can acquire both forward and backward semantic information;
in step S24, the multi-head attention is constructed as follows: given an input sequence X = [x_1, x_2, ..., x_n], where n is the sequence length and each x_i is a vector representation, the multi-head attention mechanism maps the input sequence into h subspaces, each subspace having its own weight matrices W^Q, W^K, W^V used to compute the query, key and value vectors respectively; for each subspace i, the query vector Q_i = XW_i^Q, the key vector K_i = XW_i^K and the value vector V_i = XW_i^V are computed, where W_i^Q, W_i^K and W_i^V are d_model × d_k matrices, d_model is the dimension of the representation vectors in the model and d_k is the dimension of the key vectors; the attention score of subspace i is then obtained as the scaled dot product of the query and key vectors:

S_i = Q_i K_i^T / √d_k

next, the attention score S_i is normalized by a Softmax function to obtain the attention weight W_i = softmax(S_i), which represents the contribution of each position to subspace i; finally, the value vector V_i is weighted by the attention weight W_i to obtain the output of head i:

O_i = W_i V_i

and the outputs O_1, O_2, ..., O_h of the h subspaces are concatenated to form the final output vector O = [O_1, O_2, ..., O_h], where [O_1, O_2, ..., O_h] denotes the concatenation of the vectors;
s25: and extracting a data set aiming at the constructed high-speed train bogie relation, and improving the loss function of the whole BRENT model to enable the pre-training model to better adapt to the characteristics of the high-speed train bogie data set, wherein the improved loss function is shown in a formula (6):
Figure BDA0004155673630000033
wherein: y is i Representing the actual value of the current,
Figure BDA0004155673630000034
representing the predicted value, c represents the length of the sentence sequence, so that after improvement, the model can pay more attention to the effect of the original variable, namely, the original variable and the predicted variable jointly influence the predicted result.
Step S1 specifically comprises steps S11-S14;
step S11 is to extract data, including the following steps: and collecting a high-speed train bogie knowledge text, and acquiring the text as a data source through a network.
Step S12 is data enhancement; the method comprises the following steps:
(1) Data enhancement using synonym replacement: a random number of words in a sentence are replaced with their synonyms, and a new sentence is constructed by changing those words; (2) back-translation: the Chinese text is translated into English and then back into Chinese, enriching the corpus data; (3) text entity extraction using an ALBERT-BiLSTM+CNN-CRF model, in which BiLSTM and CNN extract features in parallel, after which the extracted entities are labeled.
Step S13: and (3) carrying out relationship labeling on the data generated in the step (S12) by using a manual labeling mode on the data set after the entity extraction is completed, and labeling the data as a supervised learning data set format to finally form the data set required by the invention.
Step S14: and controlling the sentence length, and if the length of the input sentence is less than 20 words, adding 0 complement at the two ends of the sentence to ensure that the length of the sentence reaches 20 words.
Compared with the prior art, the invention has the beneficial effects that:
(1) The inventors found in experiments that the traditional BERT-BiLSTM model relies on large data sets for training and has a huge number of training parameters. In this regard, the invention discloses a method for extracting design-standard entity relations of a high-speed train bogie, comprising a RoBERTa-wwm-Self-Attention-BiSRU-Multi-Head-Attention model based on a dual attention mechanism and RoBERTa improvements, used to extract relations between bogie design-standard data entities. The model first uses the RoBERTa-wwm pre-training model to convert sentence semantics into text vectors, then adjusts dynamically according to the characteristics of the bogie-field design-standard data set to obtain dynamic word vectors; the word vectors are then fed into the self-attention layer to extract weighted semantic information, and the weighted word vectors flow through the BiSRU layer, which fully exploits GPU parallel computation and obtains context semantic information; finally, the multi-head attention layer allows the model to attend simultaneously to information at different positions and of different aspects, improving its expressive power. The model fused with the dual attention mechanism can fully extract context semantics and information at different positions while improving computation speed, greatly improving its ability to capture entity relations.
(2) The inventors found in experiments that the original BERT model masks part of the text with a character-level mask mechanism. This masking works well on English data sets, but on Chinese data sets it splits Chinese words into individual characters and loses semantic information. The invention therefore improves the masking by using whole-word masking: during mask training, the input sentence is first segmented with the word segmentation model of Harbin Institute of Technology, and masking is then applied to whole words, avoiding the under-fitting caused by semantic loss while the data set is expanded.
(3) The inventors found in experiments that, for entity-relation extraction tasks in the bogie field, the prior art is weak at extracting complex relations from a bogie relation-extraction data set built in a pipeline manner. A dual attention mechanism is therefore added to extract and fuse context semantics and information at different positions, and an improved loss function is additionally used so that the original information is taken into account during evaluation, improving the model's prediction performance.
(4) The inventors found in experiments that pure convolutional neural networks such as TextCNN encode syntactic information insufficiently, while RNNs or LSTMs cannot be computed in parallel on the GPU, leading to low computational efficiency. The invention therefore uses the BiSRU to model the dependencies between words: the bidirectional network extracts context semantics and combines positional, lexical, syntactic and semantic information, after which the multi-head attention mechanism focuses on learning the feature information inside the sequence. At the same time, the SRU network solves the problem that the model cannot run in parallel on the GPU, and the introduction of parallel computation greatly improves the model's computational efficiency.
(5) The inventors control the length of the model's input sentences, limiting labeled entity-relation sentences to 20 words. According to our statistics on the constructed high-speed train bogie relation-extraction data set, the average length of an entity-relation sentence is about 16 words; using short sentences effectively avoids features disappearing in the Bi-SRU layer. If a sentence is shorter than 20 words, zero padding of equal length is applied at both ends to bring it up to 20 words.
(6) The inventors found in experiments that design-standard corpus in the bogie design field is insufficient. The invention therefore proposes to enlarge the existing corpus with synonym replacement and Chinese-English-Chinese translation, preventing the poor model fit caused by too little corpus and improving the accuracy of bogie entity extraction.
Description of the drawings:
FIG. 1 is a flow chart of the overall structure of a model network;
FIG. 2 is a schematic diagram of a model calculation flow;
FIG. 3 is a schematic diagram of the first-layer self-attention mechanism;
FIG. 4 is a schematic diagram of a bidirectional SRU network architecture;
FIG. 5 is a schematic diagram of a second-layer multi-head attention mechanism.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention, as claimed, but is merely representative of some embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, under the condition of no conflict, the embodiments of the present invention and the features and technical solutions in the embodiments may be combined with each other.
The invention discloses a method for extracting design-standard entity relations of a high-speed train bogie. Based on the existing BERT-BiLSTM model, an improved RoBERTa-wwm-BiSRU model built on a dual attention mechanism and optimized for the bogie data set is proposed, forming a RoBERTa-wwm + Self-Attention + BiSRU + Multi-Head Attention model (BRENT). The model innovatively combines the dual attention mechanism with the RoBERTa-wwm-BiSRU model and optimizes the model loss function for the structure of the bogie data set. The model first pre-trains with the RoBERTa-wwm model, converting sentence semantics into text vectors through whole-word masking and the optimized BERT model; dynamic word vectors of the sentences are then obtained by dynamic adjustment according to the characteristics of the bogie-field design-standard data set; finally the word vectors are fed to the self-attention layer, the bidirectional SRU layer and the multi-head attention layer to extract feature information. Adding the two attention layers improves the model's ability to understand context semantics, and the improved RoBERTa-wwm model improves word segmentation and encoding, raising entity-relation extraction performance on the small data set; the model is optimized for the bogie data set and the loss function is improved, increasing the accuracy of model prediction.
The invention comprises the following steps:
s1: text data are extracted, data enhancement is carried out by using synonym substitution and Chinese-English-Chinese translation, and a data set is manufactured.
S2: and constructing a network model, and training an entity relation extraction model.
Preferably, step S1 includes: text data is extracted, data enhancement is performed, and a data set is manufactured. And acquiring unlabeled corpus in the appointed field, and acquiring data required by training. After the data is enhanced in various modes, the relationship between the entities in each sentence is marked. The invention extracts named entity first and then marks the entity relation data set to form the data set used by the invention. Step S1 specifically includes steps S11-S14.
Preferably, step S11 extracts the data, as follows: high-speed train bogie knowledge texts are collected, and a large amount of text is acquired from the network as the data source. The quality of the obtained texts is uneven; after manual inspection and cleaning, various text data related to the required high-speed rail bogie are obtained.
Preferably, step S12 is data enhancement. The method comprises the following steps:
(1) Synonym replacement is used for data enhancement: a random number of words in a sentence are replaced with their synonyms, and a new sentence is constructed by changing those words, enlarging the amount of data; (2) back-translation: the Chinese text is translated into English and then back into Chinese, enriching the corpus data; (3) text entity extraction is performed with an ALBERT-BiLSTM+CNN-CRF model, which uses BiLSTM and CNN to extract features in parallel and can fuse coarse-grained and fine-grained sentence features, improving the accuracy of entity extraction; the extracted entities are then labeled.
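As an illustration of the synonym-replacement step, a minimal Python sketch follows (assumptions: the synonym table and example terms below are hypothetical and would in practice come from a domain lexicon; the Chinese-English-Chinese step would call a machine-translation model or service and is not shown):

```python
import random

# Hypothetical synonym table for illustration only; in practice it would be
# built from a domain lexicon or a general Chinese synonym resource.
SYNONYMS = {
    "高速列车": ["高铁列车"],
    "转向架": ["走行部"],
    "标准": ["规范"],
}

def synonym_replace(tokens, num_replace=1):
    """Replace up to num_replace randomly chosen tokens with one of their synonyms."""
    candidates = [i for i, tok in enumerate(tokens) if tok in SYNONYMS]
    random.shuffle(candidates)
    new_tokens = list(tokens)
    for i in candidates[:num_replace]:
        new_tokens[i] = random.choice(SYNONYMS[new_tokens[i]])
    return new_tokens

# Example: one augmented variant of a segmented sentence.
sentence = ["高速列车", "转向架", "的", "设计", "应", "符合", "相关", "标准"]
print(synonym_replace(sentence, num_replace=2))
```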
Preferably, step S13: and (3) carrying out relationship labeling on the data generated in the step (S12) by using a manual labeling mode on the data set after the entity extraction is completed, and labeling the data as a supervised learning data set format to finally form the data set required by the invention.
Preferably, step S14: and controlling the sentence length, and if the length of the input sentence is less than 20 words, adding 0 complement at the two ends of the sentence to ensure that the length of the sentence reaches 20 words. Specifically, assuming that the sentence length is n, the number of both terminal complements 0 is represented by formula (1) if n is even, and the number of both terminal complements zero is represented by formula (2) if n is odd.
Figure BDA0004155673630000061
Figure BDA0004155673630000062
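A short sketch of the symmetric zero-padding described by formulas (1)-(2); which end receives the extra zero when the deficit is odd is not stated in the text, so placing it at the right end is an assumption:

```python
def pad_sentence(tokens, target_len=20, pad_token="0"):
    """Pad a tokenised sentence symmetrically to target_len with zeros."""
    n = len(tokens)
    if n >= target_len:
        return tokens[:target_len]
    deficit = target_len - n
    left = deficit // 2            # assumption: the extra zero (odd deficit) goes right
    right = deficit - left
    return [pad_token] * left + tokens + [pad_token] * right

# A 16-token sentence receives two zeros at each end.
print(pad_sentence(["转向架"] * 16))
```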
Preferably, in step S2, the entity relationship extraction model training is performed based on the data generated in S1 and the trained language model. In the training process of the entity relation extraction model, the parameters of the language word segmentation coding model are also regarded as the parameters of the upper-layer entity relation extraction model to be trained together. In the step S2, a BRENT entity relation extraction model network is used, an entity relation extraction model is trained on the basis of manually marked entity relation extraction data, and then parameters are updated through gradient descent, and the training process is repeated, so that a loss function can be continuously reduced, and a better and better prediction result is obtained.
Step S2 includes steps S21-S25:
S21: constructing an improved RoBERTa-wwm pre-training model;
S22: constructing a self-attention layer;
S23: constructing a bidirectional SRU network module;
S24: constructing a multi-head attention layer;
S25: improving the loss function of the model.
Preferably, S21: an improved Roberta-wwm pre-training model was constructed. The backbone network of the RoBERTa model uses a transducer encoder and a gel nonlinear activation function. The embedded size of the appointed word is E, the number of encoder layers is L, and the size of the hidden layer is H. The invention sets the hidden layer size H as 1024, the number of attention heads as 16, the coding layer size as set bit 24, the size of the divided data set for each training is 2K, and the parameter is set to indicate that the RoBERTa model can have better performance when being trained by using larger training parameters, thereby playing better word segmentation capability on the extracted data set of the self-built bogie design standard relation. Meanwhile, roBERTa performs mixed coding on the input sentence by using a character and word level mixed coding mode, and is used for learning high-frequency byte combinations. Meanwhile, in order to cope with Chinese corpus, the RoBERTa model uses a full-word mask training technology (wwm) and uses a word segmentation model (Language Technology Platform, LTP) of Harbin university to segment the center Wen Yugou, so that the phenomenon of semantic loss caused by segmentation of Chinese words when training by using a mask technology can be avoided.
Preferably, S22: a self-focusing mechanism layer is built. As shown in fig. 3, the self-attention mechanism consists of three parts: the output word vector of RoBERTa-wwm, the Tanh function and the Softmax activation function. The invention adopts a self-attention mechanism to process the output of RoBERTa-wwm, so that the model attention is more focused on the entity to be extracted, and the calculation process of the network structure of the self-attention mechanism adopted by the invention is shown in the formulas (3) to (5):
M=tanh(H) (3)
Figure BDA0004155673630000071
r=Hα T (5)
wherein: in the formula (3): the H-word vector matrix is the output of the first layer RoBERTa-wwm, with dimensions n×Sw, where n is the dimension of the word vector matrix and Sw is the length of the word vector sequence;
in the formula (4): alpha is a word vector matrix activated by a softmax activation function, and T represents the transpose of the matrix;
in formula (5): r is the output sentence weight vector;
the H matrix firstly obtains a word vector matrix M through a Tanh activation function, then obtains a weight matrix through a softmax layer, at the moment, the sum of all weights in alpha is added to be 1, and the H word vector matrix is multiplied by a normalized weight matrix to obtain a corresponding weighted global time sequence feature vector.
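A minimal PyTorch sketch of the self-attention layer of formulas (3)-(5); the trainable vector w is introduced here to make formula (4) concrete and is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class SentenceAttention(nn.Module):
    """Attention over encoder outputs: M = tanh(H), alpha = softmax(w^T M), r = H alpha^T."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(hidden_dim) * 0.01)  # assumed trainable vector

    def forward(self, H):                           # H: (batch, seq_len, hidden_dim)
        M = torch.tanh(H)                           # formula (3)
        alpha = torch.softmax(M @ self.w, dim=-1)   # formula (4), weights sum to 1
        r = torch.einsum("bsh,bs->bh", H, alpha)    # formula (5): weighted sentence vector
        return r, alpha

# Usage on the RoBERTa-wwm output (hidden size 1024, sentence length 20):
attn = SentenceAttention(1024)
r, alpha = attn(torch.randn(2, 20, 1024))
print(r.shape, alpha.shape)                         # (2, 1024) and (2, 20)
```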
Preferably, S23: and constructing a bidirectional SRU network module. The method comprises the following steps: the bidirectional SRU network is used, and the forward SRU network and the backward SRU network are connected, so that the model can acquire both forward semantic information and backward semantic information.
The invention uses the improved network SRU of LSTM, which can solve the parallel computing problem, thereby improving the operation efficiency. SRU is prior art and will not be described in detail herein. The invention uses the bidirectional SRU network, as shown in figure 4, the forward SRU and the backward SRU are connected, so that the model can acquire both the forward semantic information and the backward semantic information, and the problem of semantic loss caused by incapability of acquiring the forward semantic information due to a single network which frequently occurs in the single SRU network can be solved. Meanwhile, the word number of the input sentence is limited to 20 words, so that the phenomenon that information is lost in transmission due to overlong sentences can be avoided.
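A simplified PyTorch sketch of a bidirectional SRU, written for clarity rather than speed: the input projections are computed in parallel over the whole sequence and only the element-wise state recurrence is sequential. The highway projection used when input and hidden sizes differ is a simplifying assumption; a production implementation would use the optimized SRU library.

```python
import torch
import torch.nn as nn

class SimpleSRU(nn.Module):
    """One-direction SRU (Lei et al., 2017). All input projections are computed in
    parallel over the sequence; only the light-weight state recurrence is sequential."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W  = nn.Linear(input_dim, hidden_dim, bias=False)   # candidate x~_t
        self.Wf = nn.Linear(input_dim, hidden_dim)                # forget gate f_t
        self.Wr = nn.Linear(input_dim, hidden_dim)                # reset gate r_t
        self.proj = nn.Linear(input_dim, hidden_dim, bias=False)  # highway projection

    def forward(self, x):                         # x: (batch, seq_len, input_dim)
        xt = self.W(x)
        f = torch.sigmoid(self.Wf(x))
        r = torch.sigmoid(self.Wr(x))
        c = torch.zeros(x.size(0), xt.size(-1), device=x.device)
        outputs = []
        for t in range(x.size(1)):
            c = f[:, t] * c + (1 - f[:, t]) * xt[:, t]                        # internal state
            h = r[:, t] * torch.tanh(c) + (1 - r[:, t]) * self.proj(x[:, t])  # highway output
            outputs.append(h)
        return torch.stack(outputs, dim=1)        # (batch, seq_len, hidden_dim)

class BiSRU(nn.Module):
    """Forward and backward SRU passes concatenated, as in fig. 4."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.fwd = SimpleSRU(input_dim, hidden_dim)
        self.bwd = SimpleSRU(input_dim, hidden_dim)

    def forward(self, x):
        h_fwd = self.fwd(x)
        h_bwd = torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])
        return torch.cat([h_fwd, h_bwd], dim=-1)   # (batch, seq_len, 2 * hidden_dim)

# 512 hidden units per direction, as in the experiments:
bisru = BiSRU(input_dim=1024, hidden_dim=512)
out = bisru(torch.randn(2, 20, 1024))              # (2, 20, 1024)
```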
Preferably, S24: a second layer of attentiveness mechanisms, namely a multi-headed attentiveness mechanism layer, is constructed.
After the calculation of the bidirectional SRU network, a sequence of word representations is output; positional semantic information is lost during this calculation. The multi-head attention mechanism captures different aspects of the input from different angles by mapping the input sequence into different representation spaces, enabling the model to better understand the input sequence.
As shown in fig. 5, the multi-head attention is constructed as follows: given an input sequence X = [x_1, x_2, ..., x_n], where n is the sequence length and each x_i is a vector representation, the multi-head attention mechanism maps the input sequence into h subspaces, each subspace having its own weight matrices W^Q, W^K, W^V used to compute the query, key and value vectors respectively. For each subspace i, the query vector Q_i = XW_i^Q, the key vector K_i = XW_i^K and the value vector V_i = XW_i^V are computed, where W_i^Q, W_i^K and W_i^V are d_model × d_k matrices, d_model being the dimension of the representation vectors in the model and d_k the dimension of the key vectors. The attention score of subspace i is then obtained as the scaled dot product of the query and key vectors:

S_i = Q_i K_i^T / √d_k

Next, the attention score S_i is normalized by a Softmax function to obtain the attention weight W_i = softmax(S_i), which represents the contribution of each position to subspace i. Then the value vector V_i is weighted by the attention weight W_i to obtain the output of head i:

O_i = W_i V_i

Finally, the outputs O_1, O_2, ..., O_h of the h subspaces are concatenated to form the final output vector O = [O_1, O_2, ..., O_h], where [O_1, O_2, ..., O_h] denotes the concatenation of the vectors.
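A compact PyTorch sketch of the multi-head attention layer described above; the final output projection W_o is a common addition that the text does not state explicitly and is therefore an assumption:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Scaled dot-product multi-head self-attention: S_i = Q_i K_i^T / sqrt(d_k)."""
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.Wq = nn.Linear(d_model, d_model)
        self.Wk = nn.Linear(d_model, d_model)
        self.Wv = nn.Linear(d_model, d_model)
        self.Wo = nn.Linear(d_model, d_model)      # output projection (assumption)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        # Project and split into h heads of width d_k.
        q = self.Wq(x).view(b, n, self.h, self.d_k).transpose(1, 2)
        k = self.Wk(x).view(b, n, self.h, self.d_k).transpose(1, 2)
        v = self.Wv(x).view(b, n, self.h, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / (self.d_k ** 0.5)    # S_i
        weights = torch.softmax(scores, dim=-1)                 # W_i = softmax(S_i)
        o = weights @ v                                         # O_i = W_i V_i
        o = o.transpose(1, 2).reshape(b, n, self.h * self.d_k)  # concatenate heads
        return self.Wo(o)

# Applied to the BiSRU output (2 * 512 = 1024 features, 16 heads as in S21):
mha = MultiHeadAttention(d_model=1024, num_heads=16)
out = mha(torch.randn(2, 20, 1024))                # (2, 20, 1024)
```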
S25: and extracting a data set aiming at the constructed high-speed train bogie relation, and improving the loss function of the whole BRENT model to enable the pre-training model to better adapt to the characteristics of the high-speed train bogie data set, wherein the improved loss function is shown in a formula (6):
Figure BDA0004155673630000083
wherein: y is i Representing the actual value of the current,
Figure BDA0004155673630000084
representing the predicted value, c represents the length of the sentence sequence, so that after improvement, the model can pay more attention to the effect of the original variable, namely, the original variable and the predicted variable jointly influence the predicted result.
In the text classification task, the precision P, recall R and F1 value are generally used as evaluation indicators; the larger P, R and F1 are, the better the performance. They are defined by formulas (7)-(9):

P = TP / (TP + FP) (7)

R = TP / (TP + FN) (8)

F1 = 2 × P × R / (P + R) (9)

In formulas (7)-(9): TP is the number of samples belonging to class C that are correctly classified into class C; FP is the number of samples not belonging to class C that are wrongly classified into class C; FN is the number of samples belonging to class C that are wrongly classified outside class C; TN is the number of samples not belonging to class C that are correctly classified outside class C. P is the percentage of correctly identified samples among all identified samples, R is the percentage of correctly identified samples among all samples, and F1 is the harmonic mean of P and R.
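A small helper computing the three indicators from the counts of formulas (7)-(9); the numbers in the example are illustrative only, not the patent's experimental data:

```python
def precision_recall_f1(tp, fp, fn):
    """P, R and F1 from counts, per formulas (7)-(9)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

print(precision_recall_f1(tp=120, fp=19, fn=18))   # illustrative counts
```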
To test the merits of the method proposed in the present invention against the RoBERTa-wwm-Bi-SRU method, the RoBERTa-wwm-Self-Attention-Bi-SRU method and the RoBERTa-wwm-Bi-SRU-Multi-Head-Attention method, the following comparative tests were carried out.
In the test, the initial learning rate is 0.005, the batch corpus size is 25,000, the number of SRU hidden units in each of the forward and reverse directions is 512, and the number of training rounds is 110,000. The whole model uses the improved loss function as the loss calculation formula, the BiLSTM model improved with the dual attention mechanism adopts a Softmax function as the classifier, and the optimizer is a parameter-tuned Adam optimizer. To address the over-fitting and under-fitting that may occur during model training, the dual attention mechanism is introduced and the improved RoBERTa model based on the whole-word masking technique is used; large model parameters are used for training in the improved RoBERTa model and the BatchSize value is increased, improving prediction precision. Tests on the self-built bogie data set give a precision P of 86.31%, a recall R of 86.90% and an F1 value of 86.60%.
The comparison data for models built from the same modules in different combinations are as follows; RoBERTa-wwm, Self-Attention, BiSRU and Multi-Head Attention in the table are the different modules composing the model, and P, R and F1 are the model evaluation indicators. The table shows that every module mentioned in the patent contributes to the model:

[Ablation comparison table, given as an image in the original.]
The comparison data against the three models composed of different modules, under the same conditions, are as follows:

[Comparison table, given as an image in the original.]
from the results of the comparative experiments, the proposed method of the present invention is superior to the prior art method.
The above embodiments are only for illustrating the present invention and not for limiting the technical solutions described herein. Although the present invention has been described in detail with reference to the above embodiments, it is not limited to those specific embodiments; any modifications or equivalent substitutions of the present invention, and all technical solutions and modifications thereof that do not depart from the spirit and scope of the invention, are intended to be included in the scope of the appended claims.

Claims (6)

1. A method for extracting design-standard entity relations of a high-speed train bogie, characterized by comprising steps S1-S2, wherein:
S1: extracting text data, performing data enhancement using synonym replacement and Chinese-English-Chinese translation, and producing a data set;
S2: constructing a network model and training an entity relation extraction model;
step S2 includes steps S21-S25:
S21: constructing an improved RoBERTa-wwm pre-training model;
S22: constructing a self-attention layer;
S23: constructing a bidirectional SRU network module;
S24: constructing a multi-head attention layer;
S25: improving the loss function of the model;
in step S22, the self-attention mechanism includes three parts: the output word vectors of RoBERTa-wwm, the Tanh function and the Softmax activation function; the calculation process of the adopted self-attention network structure is given by formulas (3) to (5):

M = tanh(H) (3)

α = softmax(w^T M) (4)

r = Hα^T (5)

wherein, in formula (3), the word vector matrix H is the output of the first-layer RoBERTa-wwm, with dimensions n × Sw, where n is the dimension of the word vectors and Sw is the length of the word vector sequence; in formula (4), α is the weight vector obtained through the softmax activation function, w is a trainable parameter vector, and T denotes the transpose; in formula (5), r is the output sentence weight vector; the matrix H first passes through the Tanh activation function to obtain the matrix M, then through a softmax layer to obtain the weight vector α, whose weights sum to 1, and the word vector matrix H is multiplied by the normalized weights to obtain the corresponding weighted global time-series feature vector;
in step S23, a bidirectional SRU network is used, in which the forward SRU network and the backward SRU network are connected so that the model can acquire both forward and backward semantic information;
in step S24, the multi-head attention is constructed as follows: given an input sequence X = [x_1, x_2, ..., x_n], where n is the sequence length and each x_i is a vector representation, the multi-head attention mechanism maps the input sequence into h subspaces, each subspace having its own weight matrices W^Q, W^K, W^V used to compute the query, key and value vectors respectively; for each subspace i, the query vector Q_i = XW_i^Q, the key vector K_i = XW_i^K and the value vector V_i = XW_i^V are computed, where W_i^Q, W_i^K and W_i^V are d_model × d_k matrices, d_model is the dimension of the representation vectors in the model and d_k is the dimension of the key vectors; the attention score of subspace i is then obtained as the scaled dot product of the query and key vectors:

S_i = Q_i K_i^T / √d_k

next, the attention score S_i is normalized by a Softmax function to obtain the attention weight W_i = softmax(S_i), representing the contribution of each position to subspace i; finally, the value vector V_i is weighted by the attention weight W_i to obtain the output of head i:

O_i = W_i V_i

and the outputs O_1, O_2, ..., O_h of the h subspaces are concatenated to form the final output vector O = [O_1, O_2, ..., O_h], where [O_1, O_2, ..., O_h] denotes the concatenation of the vectors;
s25: and extracting a data set aiming at the constructed high-speed train bogie relation, and improving the loss function of the whole BRENT model to enable the pre-training model to better adapt to the characteristics of the high-speed train bogie data set, wherein the improved loss function is shown in a formula (6):
Figure FDA0004155673610000023
wherein: y is i Representing the actual value of the current,
Figure FDA0004155673610000024
representing the predicted value, c represents the length of the sentence sequence, so that after improvement, the model can pay more attention to the effect of the original variable, namely, the original variable and the predicted variable jointly influence the predicted result.
2. A method for extracting standard entity relation of bogie design of high-speed train according to claim 1, wherein: step S1 specifically comprises steps S11-S14; step S11 is to extract data, including the following steps: and collecting a high-speed train bogie knowledge text, and acquiring the text as a data source through a network.
3. A method for extracting standard entity relation of bogie design of high-speed train according to claim 2, wherein: step S12 is data enhancement; the method comprises the following steps:
(1) Data enhancement using synonym replacement: a random number of words in a sentence are replaced with their synonyms, and a new sentence is constructed by changing those words; (2) back-translation: the Chinese text is translated into English and then back into Chinese, enriching the corpus data; (3) text entity extraction using an ALBERT-BiLSTM+CNN-CRF model, in which BiLSTM and CNN extract features in parallel, after which the extracted entities are labeled.
4. A method for extracting standard entity relationship of high-speed train bogie design as claimed in claim 3, wherein: step S13: and (3) carrying out relationship labeling on the data generated in the step (S12) by using a manual labeling mode on the data set after the entity extraction is completed, and labeling the data as a supervised learning data set format to finally form the data set required by the invention.
5. The method for extracting the standard entity relation of the bogie design of the high-speed train according to claim 4, wherein the method comprises the following steps: step S14: and controlling the sentence length, and if the length of the input sentence is less than 20 words, adding 0 complement at the two ends of the sentence to ensure that the length of the sentence reaches 20 words.
6. The method for extracting the design-standard entity relation of the high-speed train bogie according to claim 5, wherein: in step S21, an improved RoBERTa-wwm pre-training model is constructed, wherein the backbone network of the RoBERTa model uses a Transformer encoder and the GELU nonlinear activation function; the word embedding size is denoted E, the number of encoder layers L, and the hidden layer size H; the hidden layer size H is set to 1024, the number of attention heads to 16 and the number of encoder layers to 24, with a batch size of 2K per training step; RoBERTa applies mixed character- and word-level encoding to the input sentence in order to learn high-frequency byte combinations.
CN202310333688.4A 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie Pending CN116340455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310333688.4A CN116340455A (en) 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310333688.4A CN116340455A (en) 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie

Publications (1)

Publication Number Publication Date
CN116340455A true CN116340455A (en) 2023-06-27

Family

ID=86885536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310333688.4A Pending CN116340455A (en) 2023-03-31 2023-03-31 Method for extracting design standard entity relation of high-speed train bogie

Country Status (1)

Country Link
CN (1) CN116340455A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151222A (en) * 2023-09-15 2023-12-01 大连理工大学 Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium
CN117151222B (en) * 2023-09-15 2024-05-24 大连理工大学 Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
US11941522B2 (en) Address information feature extraction method based on deep neural network model
CN110348016B (en) Text abstract generation method based on sentence correlation attention mechanism
CN111897949B (en) Guided text abstract generation method based on Transformer
CN110209801B (en) Text abstract automatic generation method based on self-attention network
CN107967262B (en) A kind of neural network illiteracy Chinese machine translation method
CN109635124B (en) Remote supervision relation extraction method combined with background knowledge
CN110688394B (en) NL generation SQL method for novel power supply urban rail train big data operation and maintenance
CN110134946B (en) Machine reading understanding method for complex data
CN110929030A (en) Text abstract and emotion classification combined training method
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN109992775A (en) A kind of text snippet generation method based on high-level semantics
CN111209749A (en) Method for applying deep learning to Chinese word segmentation
CN113204674B (en) Video-paragraph retrieval method and system based on local-overall graph inference network
CN114880461A (en) Chinese news text summarization method combining contrast learning and pre-training technology
CN112926337B (en) End-to-end aspect level emotion analysis method combined with reconstructed syntax information
CN112580373A (en) High-quality Mongolian unsupervised neural machine translation method
CN113111152A (en) Depression detection method based on knowledge distillation and emotion integration model
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
Liu et al. Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning
CN111008277B (en) Automatic text summarization method
CN112395891A (en) Chinese-Mongolian translation method combining Bert language model and fine-grained compression
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN116340455A (en) Method for extracting design standard entity relation of high-speed train bogie
CN116029295A (en) Electric power text entity extraction method, defect positioning method and fault diagnosis method
CN113157914B (en) Document abstract extraction method and system based on multilayer recurrent neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination