CN113239700A - Text semantic matching device, system, method and storage medium for improving BERT - Google Patents

Text semantic matching device, system, method and storage medium for improving BERT

Info

Publication number
CN113239700A
CN113239700A (application CN202110459186.7A)
Authority
CN
China
Prior art keywords
text
bert
vector
model
semantic matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110459186.7A
Other languages
Chinese (zh)
Inventor
王庆岩
顾金铭
殷楠楠
谢金宝
梁欣涛
沈涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202110459186.7A priority Critical patent/CN113239700A/en
Publication of CN113239700A publication Critical patent/CN113239700A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A text semantic matching device, system, method and storage medium for improving BERT, and in particular a device, system, method and storage medium for text semantic matching that combine BERT, word granularity, relative position coding and attention pooling, belonging to the field of natural language processing. The invention aims to solve three problems of the BERT model: its training time is long, its absolute position coding cannot indicate the relative positions between words in a sentence, and its output text representation does not fully exploit the text representation sequence produced by the model. The invention processes the text by establishing a word embedding mechanism in the input representation layer, a relative position coding mechanism in the encoding layer and an attention pooling mechanism in the output layer, and then completes the subsequent text semantic matching. The method and device not only improve the accuracy of text matching and more accurately reflect the different positions in a sentence and the information between those positions, but also use attention pooling to obtain a dimension-reduced text representation that contains more semantic information.

Description

Text semantic matching device, system, method and storage medium for improving BERT
Technical Field
The invention discloses a text semantic matching method for improving BERT, and in particular a matching device, system, method and storage medium for text semantic matching that combine BERT, word granularity, relative position coding and attention pooling, belonging to the field of natural language processing.
Background
Text semantic matching is one of the basic tasks in the field of natural language processing (NLP); it aims at modeling the semantics of two texts and classifying the relationship between them. Research on text semantic matching can be applied to natural language processing tasks such as automatic question answering, machine translation, dialogue systems and paraphrase identification, all of which can, to some extent, be abstracted as text matching tasks.
The primary problem faced by the text semantic matching task is text representation, i.e. mapping the words in a text into word vector representations so that a computer can process the text. In recent years, with the development of large-scale pre-training, text representation technology has advanced greatly, and a variety of pre-trained models based on large-scale text prediction have emerged in rapid succession, such as ELMo, OpenAI GPT, BERT and XLNet. Since the great success of the BERT pre-trained model, improvements based on it have been proposed continuously, such as RoBERTa and ALBERT.
Although these models have achieved good results, the three previous methods for reducing the dimension of the output are [CLS] vector extraction, average pooling and maximum pooling; all three use the three-dimensional text representation sequence output by the model in a one-sided way. The proposed method therefore integrates the relationship between the [CLS] vector and the remaining vectors to obtain a text representation that reflects the text semantics more accurately.
Pooling the output text sequence extracted from the text by the pre-trained model to generate a text representation is an important step of a text semantic matching model. Collobert et al. proposed a global max pooling method that generates a semantically matched text representation from the maximum of each element across the vectors in the text representation sequence. Conneau et al. combined a bi-directional long short-term memory (Bi-LSTM) network with global max pooling and global average pooling respectively to encode sentence-level semantic information, and found by comparison that the structure combining Bi-LSTM with global max pooling gives the best sentence-level semantic encoding. Kim generated a text representation sequence based on the word2vec embedding model and combined a convolutional neural network (CNN) with global max pooling for text classification. Hu et al. combined a CNN with global max pooling to propose a text semantic matching model that requires no prior knowledge. BERT extracts the vector of the special character [CLS] as the semantic matching text representation. All of the above methods use only a part of the output text sequence and do not combine the special [CLS] vector in BERT with the other sequence vectors; the present invention adopts attention pooling to solve this problem.
Disclosure of Invention
In text matching tasks the BERT model achieves good performance, but it still has the problems that its training time is long, that absolute position coding cannot indicate the relative positions between words in a sentence, and that the output text representation does not fully exploit the text representation sequence output by the BERT model. The invention therefore proposes AP _ REP _ WordBERT, a text matching model that improves BERT with word embedding, Attention Pooling (AP) and Relative Position coding (REP). The technical scheme of the invention is as follows:
the first scheme is as follows: the text semantic matching system for improving BERT comprises a data preprocessing subsystem and a BERT model subsystem; the data preprocessing subsystem is responsible for organizing the acquired text and passing it to the BERT model subsystem, the BERT model subsystem is used for building the model and producing its output, and finally the output layer subsystem refines the model output and outputs the matching result.
Specifically, the data preprocessing subsystem comprises a text acquisition module, a splicing module and a word segmentation module; the BERT model subsystem comprises an input representation layer, an encoding layer and an output layer; the output layer includes an attention pooling module and a classifier.
Scheme II: the method is realized on the basis of the system by establishing a word embedding mechanism in the transmission layer, a relative position coding mechanism of the coding layer and processing a text by the attention mechanism after pooling of the output layer, so as to complete subsequent text semantic matching; the method comprises the following specific steps:
step one, inputting a text by the text acquisition module and inserting a special element vector to complete the initialization operation of a text matching task;
splicing the main vectors by the splicing module by using a self-attention mechanism;
thirdly, the word segmentation module utilizes a word embedding mechanism to segment the text vector according to word granularity and serves as a final word segmentation result;
fourthly, coding the text by using a relative position coding mechanism and outputting the relative position learned by the model;
step five, using the special element vector inserted in the step one to perform attention pooling calculation with other output vector sequences in the output text sequence;
and step six, performing function calculation by using the classifier to complete text semantic matching.
Further, in the step one, the text matching task specifically includes two parts:
the first part: the text pair is spliced; a special symbol [CLS] is added in front of the first sentence of the text pair, a special symbol [SEP] is added at the end of the first sentence, the second sentence is then appended, a special symbol [SEP] is added at the end of the second sentence, and the spliced sentence is segmented according to character granularity;
the second part: the word vector, segment vector and position vector of each word are summed to form the vector representation that is finally fed into the BERT model.
Further, in the second step, the self-attention mechanism specifically comprises the following steps:
step two-one, performing similarity calculation between the query set Q of the current word and each key K to obtain weights;
step two-two, normalizing these weights with a Softmax function;
step two-three, weighting and summing the weights with the corresponding values V to obtain the final attention result.
Further, in step three, the word embedding mechanism specifically comprises the following steps:
step three-one, adding the Chinese words in the text into the original word list;
step three-two, inputting a sentence; the sentence is first segmented once with the jieba word segmentation tool to obtain a word sequence w_i, w_i ∈ [w_1, w_2, ..., w_l];
step three-three, traversing w_i: if w_i is in the word list it is kept; otherwise it is re-segmented once with BERT's built-in word segmentation function;
step three-four, the segmentation results of every w_i are spliced together in order as the final word segmentation result.
Further, in step four, the relative position coding means adding two groups of vectors representing relationships between words in the self-attention mechanism and taking the vectors as parameters to participate in training, and the specific steps are as follows:
step four-one, the two groups of vectors representing relationships between words interact with the word vectors;
step four-two, the attention scores are calculated;
step four-three, the weighted output vectors are obtained.
Further, in step five, the relative position coding depends on a coding mode in which positions are represented by two-dimensional coordinates; by converting the multi-dimensional position vectors into relative positions expressed by two-dimensional coordinates, the relative position coding is shared in the self-attention mechanism of every layer, and what is expressed in the relative position coding of any layer is the relative information between positions.
Further, in step six, the classifier is used as a text semantic matching model for the multilayer perceptron, and the classifier consists of a forward propagation neural network, a Softmax normalization function and an Argmax maximum index function:
the forward propagation neural network has two hidden layers in total, all neurons of the first hidden layer are fully connected with a semantic matching representation vector v, and the v is mapped into a high-dimensional semantic space to analyze semantic matching information contained in the high-dimensional semantic space; fully connecting the neurons in the second hidden layer with all the neurons in the first hidden layer, and respectively outputting activation values corresponding to labels 0 with different representative semantics and labels 1 with the same representative semantics to obtain a two-dimensional activation vector;
the Softmax normalization function is used for normalizing the two-dimensional activation vector obtained from the forward propagation network so that the elements of the vector sum to 1, yielding a two-dimensional prediction vector; this vector is the model's prediction of the synonymy relation between the two input sentences to be matched, and its two elements correspond to the prediction probabilities of label 0 and label 1 respectively and are used to calculate the model loss function;
and the Argmax maximum-index function compares the probability values of the two elements in the two-dimensional probability vector and returns the index corresponding to the element with the maximum probability; this index is the final prediction label of the text semantic matching model.
The third scheme is as follows: the text semantic matching equipment for improving the BERT comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the text semantic matching system and the text semantic matching method for improving the BERT when executing the computer program.
The invention has the beneficial effects that:
the model AP _ REP _ WordBERT provided by the invention mainly makes the following improvements: first, in the choice of pre-trained model, a pre-trained model with words as the segmentation granularity is adopted, which improves the accuracy of text matching and accelerates training; second, the absolute position coding of the BERT model is removed and relative position coding is adopted, so that the different positions in a sentence and the information between them are reflected more accurately and the positional information within the text is expressed more explicitly; finally, in the text output stage, attention pooling is adopted to obtain a dimension-reduced text representation that contains more semantic information.
Drawings
FIG. 1 is a block diagram of an AP _ REP _ WordBERT model;
FIG. 2 is a block diagram of the BERT model;
FIG. 3 is a schematic diagram of the encoding of a Transformer model;
FIG. 4 is a block diagram of an attention mechanism;
FIG. 5 is a schematic diagram of a BERT input generation;
FIG. 6 is a diagram of relative position encoded vector encoding;
FIG. 7 is an attention pooling block diagram;
FIG. 8 is a diagram of a classifier structure;
FIG. 9 is a schematic diagram of the proportion of portions of a data set;
FIG. 10 is a graph comparing accuracy of BERT models;
FIG. 11 is a run-time comparison graph with the BERT model;
FIG. 12 is a comparison of different pooling schemes;
FIG. 13 is a graph comparing accuracy of derived models with BERT;
fig. 14 is a comparison graph of different learning rates.
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Detailed Description
The first embodiment is as follows:
the model provided by the embodiment is mainly improved based on a BERT model, firstly, the AP _ REP _ WordBERT model expands a word list through a jieba word segmentation tool on the basis of the BERT pre-training model, and words are changed into basic units of word segmentation by taking the words as the basic units of word segmentation in the word segmentation stage. The improvement not only improves the accuracy of text representation, but also saves memory and improves the training speed of the model; secondly, a position coding mode is improved, most of the pre-training models adopt the same position coding mode, namely an absolute position coding mode, the position coding mode only explicitly expresses information of different positions of the whole sentence and does not reflect position information between words between sentences, for example, the absolute position coding mode can only reflect that the position 1 and the position 2 are different positions, but can not reflect that the distance between the position 1 and the position 2 is shorter than the distance between the position 1 and the position 4, and the relative position coding solves the problem, so that the finally generated text is represented more closely to text semantics; and finally, reducing the dimension of the text representation sequence output by the BERT model by adopting an attention pooling method. Because the text representation sequence output by the pre-training model is a three-dimensional vector, the text representation sequence needs to be reduced into a two-dimensional vector and then is sent to a classifier for classification judgment.
The overall structure of the text semantic matching model provided by the embodiment is shown in fig. 1, the input of the model is two texts to be matched, and in order to utilize the interaction information between the two texts in the encoding process, the two texts to be matched are spliced into a text sequence and then used as the input of the AP _ REP _ WordBERT model.
The model improves the BERT model in three places: first, the word segmentation part is changed from character segmentation to word segmentation; second, relative position coding is added and absolute position coding is removed; third, attention pooling is adopted to obtain the text representation. These three aspects are discussed below:
1.1 BERT model:
the BERT model mainly consists of three parts: an input presentation layer, an encoding layer, and an output layer for data pre-processing. The block diagram of the BERT model is shown in FIG. 2.
1.1.1 input representation layer for data preprocessing:
in the text matching task, the input representation layer of BERT mainly completes two parts of work. The first part: the text pair is spliced; a special symbol [CLS] is added before the first sentence of the text pair, a special symbol [SEP] is added at the end of the first sentence, the second sentence is then appended, a special symbol [SEP] is added at the end of the second sentence, and the spliced sentence is segmented according to character granularity. The second part: the word vector, segment vector and position vector of each word are summed to form the vector representation that is finally fed into the BERT model.
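As an illustration of this preprocessing step, the following minimal sketch (in Python; the helper function, example sentences and character-level tokenization are illustrative assumptions, not the patent's implementation) builds the spliced token sequence together with its segment and position ids:

```python
# Minimal sketch of the input-representation step (illustrative assumptions only).
def build_bert_input(tokens_a, tokens_b):
    """Splice a text pair as [CLS] A [SEP] B [SEP] and build segment/position ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment ids: 0 for the first sentence (with [CLS] and its [SEP]), 1 for the second.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

# Character-granularity segmentation of two hypothetical Chinese sentences.
tokens, seg, pos = build_bert_input(list("今天天气很好"), list("今天天气不错"))
# Inside the model, the input vector of each token is the sum of its token
# embedding, segment embedding and position embedding.
```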
1.1.2 coding layer:
the Transformer model is an essential component of the BERT model coding layer and is denoted Trm in fig. 1 and 2. The Transformer model is divided into a coding part and a decoding part, and because the nature of the model is a classification model, only the coding part of the Transformer is applied, and the internal structure of the coding part is shown in fig. 3:
as can be seen from FIG. 3, apart from the input and output, the Transformer consists of three parts: a multi-head attention mechanism, a feed-forward network, and layer normalization applied with a residual connection, where Nx denotes the number of times the parts within the box are repeated. The multi-head attention mechanism is the main component for extracting text information; it is realized by running the attention mechanism several times and finally splicing the result of each attention head together as the final result of the multi-head attention. The attention calculation process used by the Transformer model is shown in FIG. 4:
the calculation of the attention mechanism is divided into three steps. The first step is to calculate the similarity between the query set Q of the current word and each key K_i to obtain a weight; a common similarity function is the dot product, as shown in formula (1):

F(Q, K_i) = Q^T K_i    (1)

The second step is to normalize these weights with the Softmax function, as shown in formula (2):

a_i = Softmax(F(Q, K_i)) = exp(F(Q, K_i)) / Σ_j exp(F(Q, K_j))    (2)

Finally, the normalized weights a_i are used to weight and sum the corresponding values V (Value), giving the final attention result (Attention Value), where Q is the query of the current word, K are the keys other than the current word and V are the values other than the current word, as shown in formula (3):

Attention(Q, K, V) = Σ_i a_i V_i    (3)
the Q, K and V calculations used in the Transformer model all use the same input sequence, so this attention mechanism is called the self-attention mechanism in the Transformer.
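For concreteness, the following NumPy sketch implements the plain self-attention of equations (1) to (3); the scaling by the square root of the dimension and the matrix shapes are common Transformer conventions assumed here rather than details taken from the patent:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Plain self-attention: dot-product similarity, Softmax normalisation,
    weighted sum of values. X has shape (n, d_x); Wq/Wk/Wv map to d_z."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # Q, K, V from the same input sequence
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # F(Q, K_i), eq. (1) with scaling
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # Softmax, eq. (2)
    return weights @ V                               # weighted sum, eq. (3)
```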
1.1.3 output layer:
and after the input vectors have been encoded by the Transformer, the output text sequence is processed according to the specific downstream task. For the text matching task, the [CLS] vector, which best expresses the meaning of the whole input, is taken as the final text representation of the sentence pair and fed into a Softmax layer for classification.
In this embodiment, the model first improves the sentence segmentation of the input representation layer, changing segmentation at character granularity into segmentation at word granularity; second, it removes absolute position coding and introduces relative position coding in the encoding layer; finally, it introduces attention pooling in the output layer to process the output text representation, finally obtaining a text representation that fits the text semantics more closely.
2.1 segmenting text by word granularity:
At present, almost all pre-trained models segment text on a character basis, because character-based segmentation has the following advantages: fewer parameters, no dependence on a word segmentation algorithm, and essentially no out-of-vocabulary words. For these reasons most models segment sentences at character granularity. Although character-granularity segmentation has the above advantages, it also has several drawbacks. First, although the number of parameters is smaller than for word-granularity segmentation, the segmentation granularity is so small that more parameters are updated in each step than with word-granularity segmentation. For example, take the sentence "The mouse in 'cat and mouse' manages to escape every time." Segmented at character granularity, it is split character by character; segmented at word granularity (taking the jieba word segmentation tool as an example) it becomes "cat and mouse / in / mouse / every time / all / can / successfully / escape". Character-granularity segmentation updates 15 character vectors, whereas word-granularity segmentation updates only 10 word vectors; the update is therefore faster and each update occupies less memory. Second, although character segmentation does not depend on a word segmentation algorithm, it can also create ambiguity. In the example above, "mouse" is part of a title in "cat and mouse" but, combined with other characters, is an animal name elsewhere in the sentence. If characters are used as the segmentation granularity, both uses receive the same uniform processing when fed into the computer; this problem does not arise when words are used as the segmentation granularity.
Word-granularity segmentation does not share the advantages of character-granularity segmentation listed above, but each of its drawbacks can be addressed. First, taking words as the granularity increases the number of parameters, which inevitably tends to cause overfitting; however, overfitting can be greatly alleviated by pre-training, so the problem is not serious in practice. Second, regarding the dependence on a word segmentation algorithm, the model only keeps a subset of the most common words, so the results produced by different word segmentation tools are almost the same and the differences are small. Third, segmentation boundary errors are hard to avoid, but text semantic matching, unlike sequence labeling tasks, does not require strict boundary segmentation. Fourth, most single characters are also added to the word list by the model, so excessive unknown words will not appear.
Because the model is improved on the basis of the BERT model, the original word segmentation mode is inevitably improved, and the word segmentation mode of the model is as follows:
adding the Chinese words into the original word list;
inputting a sentence; the sentence is first segmented once with the jieba word segmentation tool to obtain a word sequence w_i, w_i ∈ [w_1, w_2, ..., w_l];
traversing w_i: if w_i is in the word list it is kept; otherwise it is re-segmented once with BERT's built-in word segmentation function;
the segmentation results of every w_i are spliced together in order as the final word segmentation result.
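The segmentation procedure above can be sketched as follows; the code assumes the jieba library and the Hugging Face transformers tokenizer as a stand-in for "BERT's built-in word segmentation function", and the extended word list is a hypothetical set, so this is an illustration rather than the patent's exact implementation:

```python
import jieba
from transformers import BertTokenizer  # assumed tooling; any WordPiece tokenizer with a vocab works

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
# extended_vocab would be BERT's original word list plus the added Chinese words.
extended_vocab = set(tokenizer.get_vocab())

def word_level_tokenize(sentence):
    """Word-granularity segmentation: jieba first, then fall back to BERT's own
    tokenizer for any word that is not in the (extended) word list."""
    pieces = []
    for w in jieba.lcut(sentence):               # first pass: jieba word segmentation
        if w in extended_vocab:
            pieces.append(w)                     # keep the whole word
        else:
            pieces.extend(tokenizer.tokenize(w)) # re-segment with BERT's tokenizer
    return pieces
```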
3.1 coding:
3.1.1 relative position coding:
all BERTs and their derivatives encode words almost exclusively in the manner of fig. 5, which indicates that each word is summed from a word vector, a segment vector, and a position vector. The position vector is mainly generated by absolute position coding, but the coding mode can only indicate the relation of different positions of each word in a sentence and cannot indicate the relative relation between different positions, so that the model introduces relative position coding to improve the BERT model.
The model eliminates absolute position codes in BERT, as shown in an orange part of FIG. 5, adds two groups of vectors representing relations between words into a self-attention mechanism by improving the calculation mode of the self-attention mechanism, and takes part in the training process as parameters.
Let the input sequence be x = (x_1, ..., x_n), where each x_i ∈ R^{d_x}. The self-attention mechanism generates an output sequence z = (z_1, ..., z_n) of the same length as the input, where each z_i ∈ R^{d_z}. Since the BERT model adopts a self-attention mechanism, equations (1) to (3) are refined further; the calculation of the self-attention mechanism requires the following three steps.

First, x_i and x_j interact via formula (4):

e_ij = (x_i W^Q)(x_j W^K)^T / sqrt(d_z)    (4)

Next, the attention score is calculated by equation (5):

α_ij = exp(e_ij) / Σ_{k=1}^{n} exp(e_ik)    (5)

Finally, the output is obtained by weighting via formula (6):

z_i = Σ_{j=1}^{n} α_ij (x_j W^V)    (6)

Here W^Q, W^K, W^V ∈ R^{d_x × d_z} are parameter matrices that are not shared across the self-attention mechanisms of the different layers. x W^Q, x W^K and x W^V give Q, K and V respectively, e_ij is the similarity of x_i and x_j, and α_ij is the attention score.
In order to express the relationship between different positions in the same sentence, a group of trainable parameters expressing relative positions is added to the calculation of the attention score and of the final output, and these parameters are shared in the self-attention mechanism of every layer. The specific steps are as follows:

First, when words at different positions interact, the interaction is still carried out as a dot product, but a first parameter a_ij^K representing the relative position is added during the interaction, as shown in formula (7):

e_ij = (x_i W^Q)(x_j W^K + a_ij^K)^T / sqrt(d_z)    (7)

Next, the attention score is calculated with the Softmax formula in the same way as in the original self-attention calculation (equation (5)).

Finally, a second parameter a_ij^V representing the relative position is added when calculating the output, as shown in equation (8):

z_i = Σ_{j=1}^{n} α_ij (x_j W^V + a_ij^V)    (8)
3.1.2 feasibility analysis of relative position coding:
suppose the input word vectors are [x_1, x_2, ..., x_i] and the vectors produced by absolute position coding are [p_1, p_2, ..., p_i]. Feeding the absolute position codes and the word vectors into the self-attention mechanism gives the following operations, as shown in equations (9) to (13):

q_i = (x_i + p_i) W^Q    (9)

k_j = (x_j + p_j) W^K    (10)

v_j = (x_j + p_j) W^V    (11)

a_ij = Softmax(q_i k_j^T)    (12)

o_i = Σ_j a_ij v_j    (13)

where q_i is the query at position i, k_j is the key at position j, v_j is the value at position j, a_ij is the similarity between the words at positions i and j, and o_i is the output vector at position i.

Unfolding formula (12) gives formula (14):

a_ij = Softmax( (x_i W^Q + p_i W^Q)(x_j W^K + p_j W^K)^T )    (14)
After introducing relative position coding, the term p_i W^Q in the first bracket is removed and the term p_j W^K in the second bracket is replaced by a binary (two-coordinate) relative position vector R_{i,j}^K, as shown in equation (15):

a_ij = Softmax( x_i W^Q (x_j W^K + R_{i,j}^K)^T )    (15)

Similarly, expanding formula (13) yields formula (16):

o_i = Σ_j a_ij (x_j W^V + p_j W^V)    (16)

Replacing p_j W^V in the formula with R_{i,j}^V gives formula (17):

o_i = Σ_j a_ij (x_j W^V + R_{i,j}^V)    (17)

It can be seen that relative position coding depends on a coding mode in which positions are represented by the two-dimensional coordinates (i, j), and that through the vectors R_{i,j}^K and R_{i,j}^V it reduces to a relative position that depends only on i - j. It is for this reason that the relative position coding is shared in the self-attention mechanisms of the different layers, and that what is expressed in the relative position coding of any layer is the relative information between positions.
Although the two relative position parameters a_ij^K and a_ij^V can capture the relative position information between input elements of a text sequence, as shown in FIG. 6, the maximum relative position is limited to a range |k|, since an exact relative position is not useful beyond a certain distance. The model therefore adopts clipping, which not only reduces the number of training parameters but also allows the model to generalize to sequence lengths not seen during training. The relative positions are thus taken from the k words before and the k words after the current word, and k = 4 in this model.

The clipping is shown in equations (18) to (20):

a_ij^K = w^K_{clip(j-i, k)}    (18)

a_ij^V = w^V_{clip(j-i, k)}    (19)

clip(x, k) = max(-k, min(k, x))    (20)

Finally, the model learns the relative position representations w^K = (w^K_{-k}, ..., w^K_{k}) and w^V = (w^V_{-k}, ..., w^V_{k}), where w_i^K, w_i^V ∈ R^{d_z}.
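As a concrete illustration of equations (7), (8) and (18) to (20), the following NumPy sketch adds clipped relative-position terms to self-attention; the array shapes, function names and the scaling by sqrt(d_z) are assumptions made for the example, not details taken from the patent:

```python
import numpy as np

def clip(x, k):
    """Equation (20): restrict a relative distance to the range [-k, k]."""
    return max(-k, min(k, x))

def rel_pos_self_attention(X, Wq, Wk, Wv, wK, wV, k=4):
    """Self-attention with relative-position terms (eqs. (7)-(8)).
    wK and wV are the learned tables w^K, w^V of shape (2k+1, d_z)."""
    n, dz = X.shape[0], Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # a^K_{ij} = w^K[clip(j-i,k)+k], a^V_{ij} = w^V[clip(j-i,k)+k]   (eqs. (18)-(19))
    idx = np.array([[clip(j - i, k) + k for j in range(n)] for i in range(n)])
    aK, aV = wK[idx], wV[idx]                                        # shapes (n, n, d_z)
    e = np.einsum('id,ijd->ij', Q, K[None, :, :] + aK) / np.sqrt(dz) # eq. (7)
    alpha = np.exp(e - e.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)                       # Softmax, eq. (5)
    return np.einsum('ij,ijd->id', alpha, V[None, :, :] + aV)        # eq. (8)
```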
4.1 attention pooling:
The structure of attention pooling is shown in FIG. 7.

In the attention pooling method, the special element [CLS] inserted when preprocessing the text performs an attention calculation with the remaining output vectors of the output text sequence, giving a semantic matching text representation v_Att as the representation corresponding to the text representation sequence E.

The attention pooling calculation is shown in equation (21):

v_Att = Attention(e_[CLS], K_E, V_E)    (21)

where e_[CLS] is the vector corresponding to the special element [CLS], and K_E, V_E are formed from the rest of the text representation sequence excluding e_[CLS].

The attention calculation formula is shown in formula (22):

Attention(Q, K, V) = Softmax(Q K^T / sqrt(d)) V    (22)

where v_Att is the result after attention pooling, K_E, V_E ∈ R^{(n-1)×d} (with d the hidden dimension), and n is the input sequence length. As can be seen from the description above, this calculation is similar to the attention mechanism described earlier. Current applications of the attention mechanism mainly take the form of self-attention, i.e. the Q, K and V matrices are all generated from the same input sequence X; here, by contrast, the Q matrix is generated from the [CLS] vector, while the K and V matrices are generated from the text representation sequence excluding the [CLS] vector. In the BERT model, the [CLS] vector is taken as the representation that best expresses the overall meaning of the text, but the remaining vectors also carry the sentence information contained at their specific positions; this processing therefore integrates the whole with the parts, so that the output text representation conforms better to the real text semantics.
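A minimal NumPy sketch of this attention pooling step, assuming the output sequence E has shape (n, d) with its first row equal to e_[CLS]:

```python
import numpy as np

def attention_pooling(E):
    """Attention pooling (eq. (21)): the [CLS] vector queries the remaining
    output vectors; the Softmax-weighted sum is the matching representation v_Att."""
    e_cls, rest = E[0], E[1:]                      # e_[CLS] and the rest of the sequence
    scores = rest @ e_cls / np.sqrt(E.shape[1])    # dot-product similarity (eq. (22))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # Softmax normalisation
    return weights @ rest                          # v_Att
```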
5.1 classifier:
the multi-layer perceptron is used as a classifier of a text semantic matching model, and the classifier consists of a forward propagation neural network, a Softmax normalization function and an Argmax maximum index function. The structure of the classifier is shown in fig. 8.
Wherein the forward propagating neural network has two hidden layers in common. All neurons of the first hidden layer are fully connected with the semantic matching representation vector v, and the v is mapped to a high-dimensional semantic space to analyze semantic matching information contained in the high-dimensional semantic space. And fully connecting the neurons in the second hidden layer with all the neurons in the first hidden layer, and respectively outputting the corresponding activation values of a label 0 (with different semantics) and a label 1 (with the same semantics) to obtain a two-dimensional activation vector.
And the Softmax normalization function is used for normalizing the two-dimensional activation vector obtained by the forward propagation network to enable the sum of all elements in the vector to be 1, so that a two-dimensional prediction vector is obtained. The vector, namely the prediction of the synonymy relation between two input sentences to be matched by the text semantic matching model, and two elements in the vector respectively correspond to the prediction probabilities of a label 0 and a label 1 and are used for model loss function calculation.
The Argmax maximum index function compares the probability values of the two elements in the two-dimensional probability vector and returns the index corresponding to the maximum probability value element in the vector (the index starts from 0, if the value of the first element is larger than that of the second element, the index 0 is returned, otherwise, the index 1 is returned). The index is the final predicted label y of the text semantic matching model.
The activation functions used by the classifier all adopt Gaussian Error Linear Units (GELU), and the calculation formula of the GELU is as follows:
Figure BDA0003041737040000114
the calculation formulas of the classifier are as follows:

f_1 = GELU(W_1 v + b_1)    (24)

f_2 = GELU(W_2 f_1 + b_2)    (25)

p = Softmax(f_2)    (26)

y = Argmax(p)    (27)

where f_1 is the output of the first hidden layer of the forward propagation network; W_1 and b_1 are the weights and bias of the first hidden layer; f_2 ∈ R^2 is the output of the second layer of the forward propagation network; W_2 and b_2 ∈ R^2 are the weights and bias of the second layer; and p is the probability vector obtained by the Softmax normalization function.
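The classifier of equations (23) to (27) can be sketched as follows; the parameter names and the tanh approximation of GELU are assumptions of the example rather than details from the patent:

```python
import numpy as np

def gelu(x):
    """GELU activation (tanh approximation), eq. (23)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def classify(v, W1, b1, W2, b2):
    """Two-hidden-layer classifier of eqs. (24)-(27): forward network,
    Softmax normalisation, Argmax. W2/b2 project to the two labels
    (0: different semantics, 1: same semantics)."""
    f1 = gelu(v @ W1 + b1)                        # eq. (24): first hidden layer
    f2 = gelu(f1 @ W2 + b2)                       # eq. (25): two activation values
    p = np.exp(f2 - f2.max()); p /= p.sum()       # eq. (26): Softmax -> prediction vector
    y = int(np.argmax(p))                         # eq. (27): final predicted label
    return p, y
```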
6.1 data set and Pre-trained model:
the dataset used is the large-scale Chinese question matching corpus (LCQMC), which is used to determine the semantic relationship between two Chinese question sentences. The LCQMC dataset is divided into a training set, a validation set and a test set, containing 260,068 samples in total: 238,766 training samples, 8,802 validation samples and 12,500 test samples; the proportions are shown in FIG. 9.
Each sample consists of a pair of chinese question sentences and corresponding tags. The labels are divided into two types of 0 and 1, the label 0 represents that the semantics of the two Chinese question sentences are different, the label 1 represents that the semantics of the two Chinese question sentences are the same, and the sample number ratio of the labels 0 to 1 is 1: 1.34. The data set presentation is shown in table 1:
TABLE 1 Sentence pairs in the dataset
The adopted pre-training model is a pursuit-science pre-training model, the pre-training method of the model is to carry out continuous pre-training on the basis of BERT-Chinese of Haohang open source, and the pre-training task is MLM. And in the initialization stage, each word is divided into words by using a Bert self-contained Tokenizer file, and then the average of word embedding is used as the initialization of word embedding. The model is trained by using a single 24G RTX for 100 ten thousand steps (about 10 days), the sequence length is 512, the learning rate is 5e-6, the batch size is 16, the accumulated gradient is 16 steps, and the training is equivalent to that the batch size is about 6 ten thousand steps trained by 256; the corpus is approximately 30 or more G of general-purpose corpus.
The GPU used for all experiments was NVIDIA GTX1080Ti (11G).
6.2 evaluation index
The accuracy is used as the evaluation index to verify the matching performance. By comparing the model's predictions on the validation and test sets with the real labels, True (T) is defined as the number of correct model predictions, False (F) as the number of incorrect model predictions, and Number (N) as the total number of samples predicted by the model. The calculation formula of the accuracy is shown in formula (28):

Accuracy = T / N    (28)

In general, the larger the accuracy, the better the performance of the model.
6.3 objective function:
text matching is a classification problem, and a sparse categorical cross-entropy loss function is used as the objective function to optimize the model, as shown in formula (29):

J = -(1/D) Σ_{i=1}^{D} Σ_{c=1}^{C} y_{i,c} log(ŷ_{i,c}) + λ‖θ‖²    (29)

where D is the size of the training set, C is the number of classes of the dataset (here C = 2), ŷ is the predicted class probability, y is the actual class of the data, and λ‖θ‖² is the L2 regularization term.
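The objective of equation (29) can be written out as a small NumPy sketch; the regularization coefficient and the set of regularized parameters are illustrative assumptions:

```python
import numpy as np

def objective(y_true, y_prob, params, lam=1e-4):
    """Sparse cross entropy over the training data plus an L2 term, eq. (29).
    y_true: integer labels (0/1); y_prob: predicted probabilities of shape (D, 2)."""
    D = len(y_true)
    ce = -np.mean(np.log(y_prob[np.arange(D), y_true] + 1e-12))   # cross entropy
    l2 = lam * sum(np.sum(w ** 2) for w in params)                # λ‖θ‖²
    return ce + l2
```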
6.4 parameter setting:
model training uses the Adam optimizer to optimize and update all parameters; the embedding dimension of BERT is 768, the biases are initialized to 0, the learning rate is set to 2e-5, Dropout is set to 0.1, the batch size is 32, the sequence length is 512, the L2 regularization coefficient a is 10, and the activation function is ReLU.
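Collected as a configuration sketch (the key names are assumptions made for illustration, not taken from the original implementation), the settings above read:

```python
# Hyperparameters listed in section 6.4 (key names are illustrative assumptions).
train_config = {
    "optimizer": "Adam",
    "embedding_size": 768,     # BERT embedding dimension
    "bias_init": 0.0,
    "learning_rate": 2e-5,
    "dropout": 0.1,
    "batch_size": 32,
    "max_seq_length": 512,
    "l2_coefficient": 10,      # L2 regularization coefficient a
    "activation": "ReLU",
}
```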
6.5 comparative experiment:
6.5.1 comparison with the BERT model:
as shown in FIG. 10 and FIG. 11, the three improvements proposed in this embodiment are compared with the BERT model in terms of both accuracy and running time. The four models compared are: the BERT model, the WordBERT model (a model that performs segmentation at word granularity), the REP_WordBERT model (adding relative position coding on the basis of WordBERT) and the AP_REP_WordBERT model (adding attention pooling on the basis of REP_WordBERT).
As can be seen from FIG. 10, the accuracy of the final model is improved by 2.04% compared with the BERT model. In terms of running time, under the current laboratory conditions the server memory overflowed for the BERT model when the batch size was 32, so it was trained with a batch size of 16; the other three models did not have this problem, so the model proposed in this embodiment is more memory-efficient. To compare running times, the BERT model's running time trained with batch size 16 was multiplied by 2 and taken as its running time at batch size 32. As can be seen from FIG. 11, the running time is also greatly improved.
6.5.2 pooling comparison:
at present, the mainstream methods for reducing the dimension of the BERT output sequence are global max pooling (Max Pooling), average pooling (Average Pooling) and extracting the [CLS] vector; this work proposes attention pooling, and in this part the four pooling modes are compared on the basis of the REP_WordBERT model. The experimental results are shown in FIG. 12.
It can be seen from the figure that the pooling method described in this embodiment has certain advantages over the other pooling methods.
6.5.3 comparison with the derivative pre-trained model of BERT:
the models compared in this part all take the BERT model as their main structure and improve it in certain respects.
The ERNIE model greatly enhances general semantic representation capability by jointly modelling lexical structure, syntactic structure and semantic information in the training data;
the BERT-wwm model masks whole words with [MASK] in the Chinese pre-training phase, whereas BERT masks individual characters;
the RoBERTa model uses a longer pre-training time, a larger batch size and more training data, and dynamically adjusts the masking mechanism in the pre-training stage;
the ALBERT-xlarge model provides two methods capable of greatly reducing the parameter quantity of the model, so that the model structure of the ALBERT can be expanded to the xlarge version.
The accuracy figures for these models come from their official releases, except that the BERT model and AP_REP_WordBERT were run on the laboratory server. FIG. 13 shows that, likewise starting from an improved BERT model, the proposed model achieves higher accuracy on the LCQMC dataset than the other BERT-derived models.
6.5.4 super parameter tuning:
a controlled-variable method is adopted, and the optimal learning rate of the AP_REP_WordBERT model is selected through multiple comparative experiments; the experimental results are shown in FIG. 14.
As can be seen from the figure, the accuracy is highest when the learning rate is 2×10^-5. Analysing the experimental results: when the learning rate is too small, training is too slow, and for the same number of training epochs the network has not converged to an optimal value; when the learning rate is too large, the network may fail to converge, resulting in a decrease in accuracy.
The text semantic matching experiment result based on the LCQMC data set shows that the accuracy of the AP _ REP _ WordBERT model is improved by 2.04% compared with the BERT model, and the speed is 1.4 times of that of the BERT model; compared with other BERT derivative models, the method has a certain improvement.
According to the above method example, the functional modules may be divided according to the block diagram shown in fig. 1 of the specification, for example, the functional modules may be divided corresponding to the functions, or two or more functions may be integrated into one processing module; the integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Specifically, the system includes a processor, a memory, a bus, and a communication device; the memory is used for storing computer execution instructions, the processor is connected with the memory through the bus, the processor executes the computer execution instructions stored in the memory, and the communication equipment is responsible for being connected with an external network and carrying out a data receiving and sending process; the processor is connected with the memory, and the memory comprises database software;
specifically, the database software is a database of SQL Server 2005 or a later version and is stored in a computer-readable storage medium; the processor and the memory contain instructions for causing the personal computer, server or network device to perform all or part of the steps of the method; the processor used may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof; the storage medium includes a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
Specifically, the software system is loaded on a Central Processing Unit (CPU), a general purpose Processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, DSPs, and microprocessors, among others. The communication device for communication between the relevant person and the user may utilize a transceiver, a transceiver circuit, a communication interface, or the like.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (10)

1. The text semantic matching system for improving BERT is characterized in that: the system comprises a data preprocessing subsystem and a BERT subsystem; the data preprocessing subsystem is responsible for arranging the acquired text and transmitting the text to the BERT model subsystem, the BERT model subsystem is used for establishing a model and outputting the model, and finally the output layer subsystem is used for improving the model and outputting a matching result.
2. The BERT-improved text semantic matching system of claim 1, wherein: the data preprocessing subsystem comprises a text acquisition module, a splicing module and a word segmentation module; the BERT model subsystem comprises an input representation layer, an encoding layer and an output layer; the output layer includes an attention pooling module and a classifier.
3. The text semantic matching method for improving BERT, which differs from existing BERT-based text semantic matching and is implemented on the basis of the system of any one of claims 1-2, characterized in that: the method processes the text by establishing a word embedding mechanism in the input representation layer, a relative position coding mechanism in the encoding layer and an attention pooling mechanism in the output layer, and then completes the subsequent text semantic matching; the specific steps are as follows:
step one, inputting a text by the text acquisition module and inserting a special element vector to complete the initialization operation of a text matching task;
splicing the main vectors by the splicing module by using a self-attention mechanism;
thirdly, the word segmentation module utilizes a word embedding mechanism to segment the text vector according to word granularity and serves as a final word segmentation result;
fourthly, coding the text by using a relative position coding mechanism and outputting the relative position learned by the model;
step five, using the special element vector inserted in the step one to perform attention pooling calculation with other output vector sequences in the output text sequence;
and step six, performing function calculation by using the classifier to complete text semantic matching.
4. The method of claim 3 for text semantic matching for improved BERT, wherein: in step one, the text matching task specifically includes two parts:
the first part: the text pair is spliced; a special symbol [CLS] is added in front of the first sentence of the text pair, a special symbol [SEP] is added at the end of the first sentence, the second sentence is then appended, a special symbol [SEP] is added at the end of the second sentence, and the spliced sentence is segmented according to character granularity;
the second part: the word vector, segment vector and position vector of each word are summed to form the vector representation that is finally fed into the BERT model.
5. The method of claim 3 for text semantic matching for improved BERT, wherein: in the second step, the self-attention mechanism comprises the following specific steps:
step two-one, performing similarity calculation between the query set Q of the current word and each key K to obtain weights;
step two-two, normalizing these weights with a Softmax function;
step two-three, weighting and summing the weights with the corresponding values V to obtain the final attention result.
6. The method of claim 3 for text semantic matching for improved BERT, wherein: in step three, the word embedding mechanism specifically comprises the following steps:
step three-one, adding the Chinese words in the text into the original word list;
step three-two, inputting a sentence; the sentence is first segmented once with the jieba word segmentation tool to obtain a word sequence w_i, w_i ∈ [w_1, w_2, ..., w_l];
step three-three, traversing w_i: if w_i is in the word list it is kept; otherwise it is re-segmented once with BERT's built-in word segmentation function;
step three-four, the segmentation results of every w_i are spliced together in order as the final word segmentation result.
7. The method of claim 3 for text semantic matching for improved BERT, wherein: in the fourth step, the relative position coding means adding two groups of vectors representing the relationship between words in the self-attention mechanism and taking the vectors as parameters to participate in training, and the specific steps are as follows:
step four-one, the two groups of vectors representing relationships between words interact with the word vectors;
step four-two, the attention scores are calculated;
step four-three, the weighted output vectors are obtained.
8. The method of text semantic matching for improved BERT according to claim 7, wherein: in step five, the relative position coding depends on a coding mode in which positions are represented by two-dimensional coordinates; by converting the multi-dimensional position vectors into relative positions expressed by two-dimensional coordinates, the relative position coding is shared in the self-attention mechanism of every layer, and what is expressed in the relative position coding of any layer is the relative information between positions.
9. The method of claim 3 for text semantic matching for improved BERT, wherein: in the sixth step, the classifier is used as a text semantic matching model for the multilayer perceptron, and the classifier consists of a forward propagation neural network, a Softmax normalization function and an Argmax maximum index function:
the forward propagation neural network has two hidden layers in total, all neurons of the first hidden layer are fully connected with a semantic matching representation vector v, and the v is mapped into a high-dimensional semantic space to analyze semantic matching information contained in the high-dimensional semantic space; fully connecting the neurons in the second hidden layer with all the neurons in the first hidden layer, and respectively outputting activation values corresponding to labels 0 with different representative semantics and labels 1 with the same representative semantics to obtain a two-dimensional activation vector;
the Softmax normalization function is used for normalizing the two-dimensional activation vector obtained from the forward propagation network so that the elements of the vector sum to 1, yielding a two-dimensional prediction vector; this vector is the model's prediction of the synonymy relation between the two input sentences to be matched, and its two elements correspond to the prediction probabilities of label 0 and label 1 respectively and are used to calculate the model loss function;
and the Argmax maximum-index function compares the probability values of the two elements in the two-dimensional probability vector and returns the index corresponding to the element with the maximum probability; this index is the final prediction label of the text semantic matching model.
10. Text semantic matching equipment for improved BERT, characterized in that: the equipment comprises a memory and a processor, the memory storing a computer program and the processor, when executing the computer program, implementing the text semantic matching system and method for improved BERT according to any one of claims 1 to 9.
CN202110459186.7A 2021-04-27 2021-04-27 Text semantic matching device, system, method and storage medium for improving BERT Pending CN113239700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110459186.7A CN113239700A (en) 2021-04-27 2021-04-27 Text semantic matching device, system, method and storage medium for improving BERT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110459186.7A CN113239700A (en) 2021-04-27 2021-04-27 Text semantic matching device, system, method and storage medium for improving BERT

Publications (1)

Publication Number Publication Date
CN113239700A true CN113239700A (en) 2021-08-10

Family

ID=77129382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110459186.7A Pending CN113239700A (en) 2021-04-27 2021-04-27 Text semantic matching device, system, method and storage medium for improving BERT

Country Status (1)

Country Link
CN (1) CN113239700A (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414481A (en) * 2020-03-19 2020-07-14 哈尔滨理工大学 Chinese semantic matching method based on pinyin and BERT embedding
CN111414481B (en) * 2020-03-19 2023-09-26 哈尔滨理工大学 Chinese semantic matching method based on pinyin and BERT embedding
CN113609855A (en) * 2021-08-12 2021-11-05 上海金仕达软件科技有限公司 Information extraction method and device
CN113641793A (en) * 2021-08-16 2021-11-12 国网安徽省电力有限公司电力科学研究院 Retrieval system for long text matching optimization aiming at power standard
CN113641793B (en) * 2021-08-16 2024-05-07 国网安徽省电力有限公司电力科学研究院 Retrieval system for long text matching optimization aiming at electric power standard
CN113656547A (en) * 2021-08-17 2021-11-16 平安科技(深圳)有限公司 Text matching method, device, equipment and storage medium
CN113656547B (en) * 2021-08-17 2023-06-30 平安科技(深圳)有限公司 Text matching method, device, equipment and storage medium
WO2023020522A1 (en) * 2021-08-18 2023-02-23 京东方科技集团股份有限公司 Methods for natural language processing and training natural language processing model, and device
CN113779987A (en) * 2021-08-23 2021-12-10 科大国创云网科技有限公司 Event co-reference disambiguation method and system based on self-attention enhanced semantics
CN113688621A (en) * 2021-09-01 2021-11-23 四川大学 Text matching method and device for texts with different lengths under different granularities
CN113688621B (en) * 2021-09-01 2023-04-07 四川大学 Text matching method and device for texts with different lengths under different granularities
CN113792541B (en) * 2021-09-24 2023-08-11 福州大学 Aspect-level emotion analysis method introducing mutual information regularizer
CN113792541A (en) * 2021-09-24 2021-12-14 福州大学 Aspect-level emotion analysis method introducing mutual information regularizer
CN114154496A (en) * 2022-02-08 2022-03-08 成都四方伟业软件股份有限公司 Coal prison classification scheme comparison method and device based on deep learning BERT model
CN114818698A (en) * 2022-04-28 2022-07-29 华中师范大学 Mixed word embedding method of natural language text and mathematical language text
CN114818698B (en) * 2022-04-28 2024-04-16 华中师范大学 Mixed word embedding method for natural language text and mathematical language text
CN114742035A (en) * 2022-05-19 2022-07-12 北京百度网讯科技有限公司 Text processing method and network model training method based on attention mechanism optimization
CN114818644A (en) * 2022-06-27 2022-07-29 北京云迹科技股份有限公司 Text template generation method, device, equipment and storage medium
CN114818644B (en) * 2022-06-27 2022-10-04 北京云迹科技股份有限公司 Text template generation method, device, equipment and storage medium
CN115329883A (en) * 2022-08-22 2022-11-11 桂林电子科技大学 Semantic similarity processing method, device and system and storage medium
WO2024046316A1 (en) * 2022-09-01 2024-03-07 国网智能电网研究院有限公司 Power domain model pre-training method and apparatus, and fine-tuning method and apparatus, device, storage medium and computer program product
CN115617990B (en) * 2022-09-28 2023-09-05 浙江大学 Power equipment defect short text classification method and system based on deep learning algorithm
CN115617990A (en) * 2022-09-28 2023-01-17 浙江大学 Electric power equipment defect short text classification method and system based on deep learning algorithm
CN115357719A (en) * 2022-10-20 2022-11-18 国网天津市电力公司培训中心 Power audit text classification method and device based on improved BERT model
CN115357718A (en) * 2022-10-20 2022-11-18 佛山科学技术学院 Method, system, device and storage medium for discovering repeated materials of theme integration service
CN115811630A (en) * 2023-02-09 2023-03-17 成都航空职业技术学院 Education informatization method based on artificial intelligence
CN117573813A (en) * 2024-01-17 2024-02-20 清华大学 Method, system, equipment and medium for positioning and detecting internal knowledge of large language model
CN117573813B (en) * 2024-01-17 2024-03-19 清华大学 Method, system, equipment and medium for positioning and detecting internal knowledge of large language model

Similar Documents

Publication Publication Date Title
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
Young et al. Recent trends in deep learning based natural language processing
CN108733792B (en) Entity relation extraction method
US20220180073A1 (en) Linguistically rich cross-lingual text event embeddings
WO2022057776A1 (en) Model compression method and apparatus
CN111930942B (en) Text classification method, language model training method, device and equipment
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN111221944B (en) Text intention recognition method, device, equipment and storage medium
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
KR20220114495A (en) Interaction layer neural network for search, retrieval, and ranking
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
Bokka et al. Deep Learning for Natural Language Processing: Solve your natural language processing problems with smart deep neural networks
CN111241232A (en) Business service processing method and device, service platform and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
CN112269874A (en) Text classification method and system
Liu et al. A hybrid neural network BERT-cap based on pre-trained language model and capsule network for user intent classification
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
WO2021129411A1 (en) Text processing method and device
Seilsepour et al. Self-supervised sentiment classification based on semantic similarity measures and contextual embedding using metaheuristic optimizer
CN111581365B (en) Predicate extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination