CN116029354A - Text pair-oriented Chinese language model pre-training method - Google Patents
Text pair-oriented Chinese language model pre-training method
- Publication number
- CN116029354A (application CN202210950700.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- word
- emb
- vector
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
Abstract
The invention provides a text pair-oriented Chinese language model pre-training method, which comprises the following steps: inputting text pairs, each text pair comprising a text A and a text B arranged in pairs; in text A, randomly selecting n characters and replacing each selected character with a mask character to obtain a masked text A1; performing word segmentation on text B and shuffling the segmented words to obtain an out-of-order text B1; splicing text A1, text B1 and text B to obtain a spliced text; and, after encoding the spliced text, applying a masked-word prediction task and a word-order recovery task to obtain a total loss function. The method can learn the language and word-order information in a text pair more fully, thereby improving the effect of the pre-trained model.
Description
Technical Field
The invention belongs to the technical field of computer natural language processing, and particularly relates to a text pair-oriented Chinese language model pre-training method.
Background
The advent of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era. Many industrial applications now adopt the paradigm of fine-tuning a PTM on downstream task data, and have achieved results surpassing previous approaches.
Many NLP tasks take the form of text pairs, such as text semantic matching and question-answer pair (QA) matching. For such tasks, academia has proposed a number of pre-training objectives for training the corresponding PTMs. In summary, two approaches dominate. The first is the language-model approach, whose core is to mask certain words and then try to recover the masked words during training. The second is to predict the relationship of a text pair, for example by shuffling the order of consecutive sentences in an article and training the model to judge whether the pair is consecutive in the original text. The knowledge learned by these pre-training tasks is generally simple, carries a relatively limited amount of information, and easily introduces noise. How to let text-pair pre-training tasks learn the language and other information in a text pair more fully, and thereby improve their efficiency, is therefore a problem that needs to be solved.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a text pair-oriented Chinese language model pre-training method, which can effectively solve the problems.
The technical scheme adopted by the invention is as follows:
the invention provides a text pair-oriented Chinese language model pre-training method, which comprises the following steps:
step 1, inputting text pairs; the text pair comprises a text A and a text B which are arranged in pairs;
step 2, randomly selecting n characters in text A, and masking each randomly selected character with a mask character to obtain a masked text, denoted text A1;
performing word segmentation on text B, and shuffling the segmented words to obtain an out-of-order text, denoted text B1;
step 3, dividing text A1, text B1 and text B character by character, to obtain the corresponding character sequences of text A1, text B1 and text B;
splicing text A1, text B1 and text B to obtain a spliced text [CLS] A1 [SEP] B1 [SOS] B [EOS]; wherein [CLS], [SEP], [SOS] and [EOS] are a first separator, a second separator, a third separator and a fourth separator, respectively;
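As an illustration of steps 2 and 3, the sketch below masks n randomly chosen characters of text A, shuffles the segmented words of text B, and splices the three parts with the four separators. It is only a sketch of the described procedure, not the patented implementation; the segmenter jieba and the helper name build_spliced_text are assumptions of this illustration (the patent does not name a segmentation tool).

```python
import random

import jieba  # assumption: any Chinese word segmenter would do; the patent names none


def build_spliced_text(text_a: str, text_b: str, n: int, seed: int = 0) -> str:
    """Apply steps 2 and 3: mask n characters of A, shuffle the words of B, splice."""
    rng = random.Random(seed)
    # Step 2: replace n randomly selected characters of text A with [MASK] -> text A1.
    chars = list(text_a)
    for idx in rng.sample(range(len(chars)), k=min(n, len(chars))):
        chars[idx] = "[MASK]"
    text_a1 = "".join(chars)
    # Step 2 (continued): segment text B into words and shuffle them -> text B1.
    words = list(jieba.cut(text_b))
    rng.shuffle(words)
    text_b1 = "".join(words)
    # Step 3: splice A1, B1 and B with the four separators.
    return f"[CLS]{text_a1}[SEP]{text_b1}[SOS]{text_b}[EOS]"


# Illustrative text pair based on the museum example in the description.
print(build_spliced_text("我爱历史博物馆", "博物馆太阳升", n=2))
```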
step 4, feeding the spliced text into an encoding module to obtain the encoding vector matrix V_A1 of text A1, the encoding vector matrix V_B1 of text B1 and the encoding vector matrix V_B of text B, respectively;
Wherein:
step 4.1, the encoding vector matrix V_A1 of text A1 is obtained as follows:
1) The character code Emb_A1(i) of the i-th character in text A1 is the sum of the character vector Emb_A1char(i) of the i-th character in text A1, the position code Emb_A1pos(i) of the i-th character in text A1 and the type code Emb_type(A1) of text A1, as follows:
Emb_A1(i) = Emb_A1char(i) + Emb_A1pos(i) + Emb_type(A1)
2) The character codes of all characters in text A1 form the encoding vector matrix V_A1 of text A1;
step 4.2, the encoding vector matrix V_B1 of text B1 is obtained as follows:
1) The character code Emb_B1(i) of the i-th character in text B1 is the sum of the character vector Emb_B1char(i) of the i-th character in text B1, the position code Emb_B1pos(i) of the i-th character in text B1 and the type code Emb_type(B1) of text B1, as follows:
Emb_B1(i) = Emb_B1char(i) + Emb_B1pos(i) + Emb_type(B1)
2) The character codes of all characters in text B1 form the encoding vector matrix V_B1 of text B1;
step 4.3, the encoding vector matrix V_B of text B is obtained as follows:
1) The character code Emb_B(i) of the i-th character in text B is the sum of the character vector Emb_Bchar(i) of the i-th character in text B, the position code Emb_Bpos(i) of the i-th character in text B and the type code Emb_type(B) of text B, as follows:
Emb_B(i) = Emb_Bchar(i) + Emb_Bpos(i) + Emb_type(B)
2) The character codes of all characters in text B form the encoding vector matrix V_B of text B;
Wherein:
the character vector Emb_A1char(i) of the i-th character in text A1, the character vector Emb_B1char(i) of the i-th character in text B1 and the character vector Emb_Bchar(i) of the i-th character in text B are all obtained by looking up the dictionary;
the position code Emb_A1pos(i) of the i-th character in text A1, the position code Emb_B1pos(i) of the i-th character in text B1 and the position code Emb_Bpos(i) of the i-th character in text B refer to the position code of each character within the spliced text;
the type code Emb_type(A1) of text A1, the type code Emb_type(B1) of text B1 and the type code Emb_type(B) of text B correspond to three different text types;
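A minimal sketch of the encoding of step 4, assuming PyTorch and illustrative sizes (vocabulary size, maximum length, embedding dimension); the class name SegmentEmbedding and the concrete numbers are not taken from the patent. It computes each character code as the sum of its character vector, its position code within the spliced text, and the type code of its segment.

```python
import torch
import torch.nn as nn


class SegmentEmbedding(nn.Module):
    """Emb(i) = Emb_char(i) + Emb_pos(i) + Emb_type(segment), as in steps 4.1-4.3."""

    def __init__(self, vocab_size: int, max_len: int, dim: int, num_types: int = 3):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, dim)  # character vectors looked up in the dictionary
        self.pos_emb = nn.Embedding(max_len, dim)      # position within the spliced text
        self.type_emb = nn.Embedding(num_types, dim)   # one type id each for A1, B1 and B

    def forward(self, char_ids, positions, type_id):
        return self.char_emb(char_ids) + self.pos_emb(positions) + self.type_emb(type_id)


emb = SegmentEmbedding(vocab_size=21128, max_len=512, dim=768)
a1_ids = torch.randint(0, 21128, (1, 8))       # hypothetical character ids of text A1
a1_pos = torch.arange(8).unsqueeze(0)          # positions of A1 within the spliced text
v_a1 = emb(a1_ids, a1_pos, torch.tensor([0]))  # -> V_A1, one row per character of A1
```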
step 5, inputting the encoding vector matrix V_A1 of text A1, the encoding vector matrix V_B1 of text B1 and the encoding vector matrix V_B of text B into a pre-training task layer, and calculating the total loss function Loss_total as follows:
step 5.1, the pre-training task layer comprises a masked-word prediction task and a word-order recovery task;
step 5.2, the masked-word prediction task yields a first loss function Loss_1(x, θ) computed by the following formula:
Loss_1(x, θ) = E_{a ∈ len(A1)} [ -log P(x_a | V_A1, V_B1) ]
Wherein:
P(x_a | V_A1, V_B1) means the following: the vector of the masked character x_a to be predicted is read from the encoding vector matrix V_A1 of text A1; the read vector of the masked character x_a is spliced with the encoding vector matrix V_B1 of text B1 to obtain a spliced vector; the spliced vector is multiplied with the dictionary matrix to obtain a probability matrix; the maximum probability value in the probability matrix is P(x_a | V_A1, V_B1); the dictionary matrix is the matrix formed by the character vectors of every character in the dictionary;
-log P(x_a | V_A1, V_B1) represents the cross-entropy calculation, namely: standard cross entropy is applied to P(x_a | V_A1, V_B1) to obtain the loss value of the masked character x_a;
E() represents the averaging calculation;
a ∈ len(A1) indicates that the a-th character in text A1 is masked;
thus, each masked character yields a prediction loss value; the loss values of all masked characters are then summed and divided by the number of masked characters to obtain the average loss value;
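A minimal sketch of the masked-word prediction loss of step 5.2 (PyTorch; a simplification of the described mechanism): the hidden vector of each masked position of V_A1 is multiplied with the dictionary matrix to score every character, standard cross entropy gives the loss of each masked character, and the losses are averaged. The helper name and tensor sizes are illustrative only.

```python
import torch
import torch.nn.functional as F


def masked_word_loss(v_a1, dictionary_matrix, masked_positions, target_ids):
    """Loss_1: mean of -log P(x_a | ...) over the masked positions a of text A1.

    v_a1: (len_A1, dim) encoding matrix V_A1; dictionary_matrix: (vocab, dim);
    masked_positions: indices a of the masked characters; target_ids: their true character ids.
    """
    hidden = v_a1[masked_positions]             # vectors of the masked characters x_a
    logits = hidden @ dictionary_matrix.t()     # multiply with the dictionary matrix -> one score per character
    return F.cross_entropy(logits, target_ids)  # standard cross entropy, averaged over the masked characters


# Toy usage with random tensors.
loss1 = masked_word_loss(torch.randn(8, 768), torch.randn(21128, 768),
                         torch.tensor([3, 4]), torch.tensor([100, 200]))
```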
step 5.3, the word-order recovery task yields a second loss function Loss_2(x, θ) computed by the following formula:
Loss_2(x, θ) = E_{b = 1, …, c} [ -log P(x_b | x_(b-1:0), V_A1, V_B1) ]
Wherein:
b denotes the position of the character being predicted in text B; text B has c characters in total, so b = 1, 2, …, c;
for the character vector x_b predicted at position b of text B, the loss value -log P(x_b | x_(b-1:0), V_A1, V_B1) is obtained as follows:
1) Input V_A1, V_B1 and x_(b-1:0);
Wherein:
x_(b-1:0) means: the vector formed by splicing the separator vector x_0 preceding text B, the character vector x_1 of the 1st character of text B, …, and the character vector x_(b-1) of the (b-1)-th character of text B;
2) V_A1, V_B1 and x_(b-1:0) are spliced to obtain the context vector of the character vector x_b to be predicted;
3) P(x_b | x_(b-1:0), V_A1, V_B1) means: a sequence-to-sequence (seq2seq) model comprising an encoding end and a decoding end is used; the context vector is input at the encoding end, and the decoding end outputs the predicted probability value of the predicted character vector x_b;
4) -log P(x_b | x_(b-1:0), V_A1, V_B1): standard cross entropy is applied to P(x_b | x_(b-1:0), V_A1, V_B1) to obtain the loss value of the character vector x_b;
each predicted character vector in text B yields a loss value; these loss values are averaged to obtain the second loss function Loss_2(x, θ);
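A minimal sketch of the word-order recovery loss of step 5.3 (PyTorch). A single Transformer decoder layer stands in for the decoding end of the seq2seq model, which the patent does not specify at this level of detail; a causal mask ensures that position b only sees x_(b-1:0), while cross-attention supplies V_A1 and V_B1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrderRecoveryHead(nn.Module):
    """Loss_2: predict the characters of text B one by one, conditioned on V_A1, V_B1 and x_(b-1:0)."""

    def __init__(self, dim: int, vocab: int):
        super().__init__()
        self.decoder = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, b_inputs, ctx, b_targets):
        # b_inputs:  embeddings of [SOS], x_1, ..., x_{c-1}   (1, c, dim), teacher forcing
        # ctx:       V_A1 and V_B1 spliced together           (1, len_A1 + len_B1, dim)
        # b_targets: true character ids x_1, ..., x_c         (1, c)
        c = b_inputs.size(1)
        causal = torch.triu(torch.full((c, c), float("-inf")), diagonal=1)  # position b sees only x_(b-1:0)
        dec = self.decoder(b_inputs, ctx, tgt_mask=causal)                  # cross-attends to A1 and B1
        logits = self.out(dec)
        return F.cross_entropy(logits.view(-1, logits.size(-1)), b_targets.view(-1))


head = OrderRecoveryHead(dim=768, vocab=21128)
loss2 = head(torch.randn(1, 5, 768), torch.randn(1, 16, 768), torch.randint(0, 21128, (1, 5)))
```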
step 5.4, the first loss function Loss_1(x, θ) and the second loss function Loss_2(x, θ) are combined to obtain the total loss function Loss_total;
step 6, judging whether training has reached the maximum number of iterations; if not, obtaining the gradient from the total loss function Loss_total, back-propagating it, updating the model parameters θ, and returning to step 4; if so, stopping training to obtain the pre-trained language model.
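A minimal training-loop sketch for steps 5.4 and 6, assuming PyTorch and a model and data loader that return the two losses. The patent does not state how Loss_1 and Loss_2 are combined, so a plain sum is used here as an illustrative choice, as is the AdamW optimizer.

```python
import torch


def pretrain(model, data_loader, max_iterations: int, lr: float = 1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    step = 0
    while step < max_iterations:             # step 6: stop at the maximum iteration count
        for batch in data_loader:
            loss1, loss2 = model(batch)      # masked-word loss and word-order recovery loss
            loss_total = loss1 + loss2       # step 5.4: combine the two losses (combination rule assumed)
            optimizer.zero_grad()
            loss_total.backward()            # obtain the gradient from Loss_total and back-propagate it
            optimizer.step()                 # update the model parameters theta
            step += 1
            if step >= max_iterations:
                break
    return model                             # the pre-trained language model
```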
Preferably, in step 2, each randomly selected character is replaced with the mask character [MASK], resulting in the masked text.
Preferably, text A is a question text and text B is the corresponding answer text.
The text pair-oriented Chinese language model pre-training method provided by the invention has the following advantages:
it can learn the language and word-order information in a text pair more fully, thereby improving the effect of the pre-trained model.
Drawings
FIG. 1 is a flow chart of a text-pair-oriented Chinese language model pre-training method provided by the invention.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects solved by the invention more clear, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a text pair-oriented Chinese language model pre-training method, which can learn the language and word-order information in a text pair more fully, thereby improving the effect of the pre-trained model.
Referring to fig. 1, the invention provides a text-pair-oriented Chinese language model pre-training method, which comprises the following steps:
step 1, inputting text pairs; the text pair comprises a text A and a text B which are arranged in pairs;
In the invention, sources of the text-pair corpus include:
firstly, open-source similarity corpora on the Internet;
secondly, open-source question-answer pair corpora on the Internet; for example, text A is a question text and text B is the answer text;
thirdly, desensitized user queries from a search engine together with the corresponding user click information (including the article title, abstract, body text and the like, spliced together).
Step 2, randomly selecting n words in the text A, wherein each randomly selected word adopts shielding characters to carry out shielding treatment to obtain a shielded text, and the shielded text is expressed as a text A1;
in this step, for each randomly selected word, the masking character [ MASK ] is used to replace the corresponding word, resulting in a masked text.
Word segmentation is carried out on the text B, and each word after word segmentation is disordered, so that a text with disordered sequence is obtained and is expressed as a text B1;
for example, text A is the query of the user in the search engine and text B is the click browsing information of the user.
Assume text A is "I love the history museum" (我爱历史博物馆); masking the two characters "历" and "史" gives text A1: "I love [MASK][MASK] museum" (我爱[MASK][MASK]博物馆).
Assume text B is "museum sun rises" (博物馆太阳升); after word segmentation, text B consists of the words "museum" (博物馆), "sun" (太阳) and "rises" (升), and shuffling their order gives text B1: "sun museum rises" (太阳博物馆升).
Step 3, dividing the text A1, the text B1 and the text B according to the word, correspondingly obtaining the text A1, the text B1 and the text B;
splicing the text A1, the text B1 and the text B to obtain a spliced text [ CLS ] A1[ SEP ] B1[ SOS ] B [ EOS ]; wherein: [ CLS ], [ SEP ], [ SOS ] and [ EOS ] are respectively: a first separator, a second separator, a third separator, and a fourth separator;
step 4, feeding the spliced text into an encoding module to obtain the encoding vector matrix V_A1 of text A1, the encoding vector matrix V_B1 of text B1 and the encoding vector matrix V_B of text B, respectively;
Wherein:
step 4.1, the encoding vector matrix V_A1 of text A1 is obtained as follows:
1) The character code Emb_A1(i) of the i-th character in text A1 is the sum of the character vector Emb_A1char(i) of the i-th character in text A1, the position code Emb_A1pos(i) of the i-th character in text A1 and the type code Emb_type(A1) of text A1, as follows:
Emb_A1(i) = Emb_A1char(i) + Emb_A1pos(i) + Emb_type(A1)
2) The character codes of all characters in text A1 form the encoding vector matrix V_A1 of text A1;
step 4.2, the encoding vector matrix V_B1 of text B1 is obtained as follows:
1) The character code Emb_B1(i) of the i-th character in text B1 is the sum of the character vector Emb_B1char(i) of the i-th character in text B1, the position code Emb_B1pos(i) of the i-th character in text B1 and the type code Emb_type(B1) of text B1, as follows:
Emb_B1(i) = Emb_B1char(i) + Emb_B1pos(i) + Emb_type(B1)
2) The character codes of all characters in text B1 form the encoding vector matrix V_B1 of text B1;
step 4.3, the encoding vector matrix V_B of text B is obtained as follows:
1) The character code Emb_B(i) of the i-th character in text B is the sum of the character vector Emb_Bchar(i) of the i-th character in text B, the position code Emb_Bpos(i) of the i-th character in text B and the type code Emb_type(B) of text B, as follows:
Emb_B(i) = Emb_Bchar(i) + Emb_Bpos(i) + Emb_type(B)
2) The character codes of all characters in text B form the encoding vector matrix V_B of text B;
Wherein:
the character vector Emb_A1char(i) of the i-th character in text A1, the character vector Emb_B1char(i) of the i-th character in text B1 and the character vector Emb_Bchar(i) of the i-th character in text B are all obtained by looking up the dictionary;
the position code Emb_A1pos(i) of the i-th character in text A1, the position code Emb_B1pos(i) of the i-th character in text B1 and the position code Emb_Bpos(i) of the i-th character in text B refer to the position code of each character within the spliced text;
the type code Emb_type(A1) of text A1, the type code Emb_type(B1) of text B1 and the type code Emb_type(B) of text B correspond to three different text types;
step 5, inputting the encoding vector matrix V_A1 of text A1, the encoding vector matrix V_B1 of text B1 and the encoding vector matrix V_B of text B into a pre-training task layer, and calculating the total loss function Loss_total as follows:
step 5.1, the pre-training task layer comprises a masked-word prediction task and a word-order recovery task;
step 5.2, the masked-word prediction task yields a first loss function Loss_1(x, θ) computed by the following formula:
Loss_1(x, θ) = E_{a ∈ len(A1)} [ -log P(x_a | V_A1, V_B1) ]
Wherein:
P(x_a | V_A1, V_B1) means the following: the vector of the masked character x_a to be predicted is read from the encoding vector matrix V_A1 of text A1; the read vector of the masked character x_a is spliced with the encoding vector matrix V_B1 of text B1 to obtain a spliced vector; the spliced vector is multiplied with the dictionary matrix to obtain a probability matrix; the maximum probability value in the probability matrix is P(x_a | V_A1, V_B1); the dictionary matrix is the matrix formed by the character vectors of every character in the dictionary;
-log P(x_a | V_A1, V_B1) represents the cross-entropy calculation, namely: standard cross entropy is applied to P(x_a | V_A1, V_B1) to obtain the loss value of the masked character x_a;
E() represents the averaging calculation;
a ∈ len(A1) indicates that the a-th character in text A1 is masked;
thus, each masked character yields a prediction loss value; the loss values of all masked characters are then summed and divided by the number of masked characters to obtain the average loss value.
This task mainly predicts the masked characters of text A1, taking text A1 and text B1 as input; since text A1 and text B1 form a text pair and are mutually visible, this improves the understanding of text A1 and increases the diversity of the information available about text A1.
step 5.3, the word-order recovery task yields a second loss function Loss_2(x, θ) computed by the following formula:
Loss_2(x, θ) = E_{b = 1, …, c} [ -log P(x_b | x_(b-1:0), V_A1, V_B1) ]
Wherein:
b denotes the position of the character being predicted in text B; text B has c characters in total, so b = 1, 2, …, c;
for the character vector x_b predicted at position b of text B, the loss value -log P(x_b | x_(b-1:0), V_A1, V_B1) is obtained as follows:
1) Input V_A1, V_B1 and x_(b-1:0);
Wherein:
x_(b-1:0) means: the vector formed by splicing the separator vector x_0 preceding text B, the character vector x_1 of the 1st character of text B, …, and the character vector x_(b-1) of the (b-1)-th character of text B;
2) V_A1, V_B1 and x_(b-1:0) are spliced to obtain the context vector of the character vector x_b to be predicted;
3) P(x_b | x_(b-1:0), V_A1, V_B1) means: a sequence-to-sequence (seq2seq) model comprising an encoding end and a decoding end is used; the context vector is input at the encoding end, and the decoding end outputs the predicted probability value of the predicted character vector x_b;
4) -log P(x_b | x_(b-1:0), V_A1, V_B1): standard cross entropy is applied to P(x_b | x_(b-1:0), V_A1, V_B1) to obtain the loss value of the character vector x_b;
each predicted character vector in text B yields a loss value; these loss values are averaged to obtain the second loss function Loss_2(x, θ).
This step mainly restores the word order of text B and is implemented in a generative manner, as exemplified below:
For example, text A1 is "I love [MASK][MASK] museum" (我爱[MASK][MASK]博物馆), text B1 is "sun museum rises" (太阳博物馆升), and text B is "museum sun rises" (博物馆太阳升).
Splicing text A1, text B1 and text B gives the spliced text:
[CLS] 我爱[MASK][MASK]博物馆 [SEP] 太阳博物馆升 [SOS] 博物馆太阳升 [EOS]
When predicting the two [MASK] characters masked out in text A1, only the part "[CLS] 我爱[MASK][MASK]博物馆 [SEP] 太阳博物馆升" is visible.
When restoring the order of text B, one character of "博物馆太阳升 [EOS]" is predicted at a time.
First, b = 1: when predicting "博" (the first character of "museum"), the part "[CLS] 我爱[MASK][MASK]博物馆 [SEP] 太阳博物馆升 [SOS]" is mutually visible, namely: x_(b-1:0) = x_0 = [SOS];
b = 2: when predicting "物" (the second character of "museum"), the part "[CLS] 我爱[MASK][MASK]博物馆 [SEP] 太阳博物馆升 [SOS] 博" is mutually visible, namely: x_(b-1:0) = [SOS] 博;
b = 3: when predicting "馆" (the third character of "museum"), the part "[CLS] 我爱[MASK][MASK]博物馆 [SEP] 太阳博物馆升 [SOS] 博物" is mutually visible;
and so on.
Thus, when recovering text B, each prediction of a character of B takes as additional input the character vectors of B that have already been predicted; that is, the next character of B is predicted each time conditioned on V_A1, V_B1 and the characters of B predicted so far.
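The visibility rules of this example can be summarized as a single Boolean attention mask. The sketch below is a formalization of the rules illustrated above, not part of the patent text: the positions of A1 and B1 (with their separators) see each other but never text B, while each position of B sees A1, B1 and only the B positions before it.

```python
import torch


def visibility_mask(ctx_len: int, b_len: int) -> torch.Tensor:
    """ctx_len: positions in "[CLS] A1 [SEP] B1 [SOS]"; b_len: positions in "B [EOS]".

    Returns a (total, total) Boolean matrix where True means "may attend".
    """
    total = ctx_len + b_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    mask[:ctx_len, :ctx_len] = True   # A1 and B1 (with their separators) are mutually visible
    mask[ctx_len:, :ctx_len] = True   # every position of B sees A1 and B1
    mask[ctx_len:, ctx_len:] = torch.tril(torch.ones(b_len, b_len, dtype=torch.bool))  # causal within B
    return mask


print(visibility_mask(ctx_len=5, b_len=4).int())
```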
step 5.4, the first loss function Loss_1(x, θ) and the second loss function Loss_2(x, θ) are combined to obtain the total loss function Loss_total;
step 6, judging whether training has reached the maximum number of iterations; if not, obtaining the gradient from the total loss function Loss_total, back-propagating it, updating the model parameters θ, and returning to step 4; if so, stopping training to obtain the pre-trained language model.
The invention provides a text pair-oriented Chinese language model pre-training method that applies a generative approach on top of encoded text-pair information: when the order of text B is restored, the semantic relation between text A1 and text B1 is learned (the information of A1 is encoded into the order restoration of text B), and at the same time the restoration of the order of text B can be regarded as a higher-level form of language modeling.
The text pair-oriented Chinese language model pre-training method of the invention pre-trains on text pairs with a masked language model task and a generative word-order recovery task, so that the language and word-order information in a text pair can be learned more fully and the effect of the pre-trained model is improved.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of the present invention.
Claims (3)
1. A text pair-oriented Chinese language model pre-training method is characterized by comprising the following steps:
step 1, inputting text pairs; the text pair comprises a text A and a text B which are arranged in pairs;
step 2, randomly selecting n characters in text A, and masking each randomly selected character with a mask character to obtain a masked text, denoted text A1;
performing word segmentation on text B, and shuffling the segmented words to obtain an out-of-order text, denoted text B1;
step 3, dividing text A1, text B1 and text B character by character, to obtain the corresponding character sequences of text A1, text B1 and text B;
splicing text A1, text B1 and text B to obtain a spliced text [CLS] A1 [SEP] B1 [SOS] B [EOS]; wherein [CLS], [SEP], [SOS] and [EOS] are a first separator, a second separator, a third separator and a fourth separator, respectively;
step 4, feeding the spliced text into an encoding module to obtain the encoding vector matrix V_A1 of text A1, the encoding vector matrix V_B1 of text B1 and the encoding vector matrix V_B of text B, respectively;
Wherein:
step 4.1, the encoding vector matrix V_A1 of text A1 is obtained as follows:
1) The character code Emb_A1(i) of the i-th character in text A1 is the sum of the character vector Emb_A1char(i) of the i-th character in text A1, the position code Emb_A1pos(i) of the i-th character in text A1 and the type code Emb_type(A1) of text A1, as follows:
Emb_A1(i) = Emb_A1char(i) + Emb_A1pos(i) + Emb_type(A1)
2) The character codes of all characters in text A1 form the encoding vector matrix V_A1 of text A1;
step 4.2, the encoding vector matrix V_B1 of text B1 is obtained as follows:
1) The character code Emb_B1(i) of the i-th character in text B1 is the sum of the character vector Emb_B1char(i) of the i-th character in text B1, the position code Emb_B1pos(i) of the i-th character in text B1 and the type code Emb_type(B1) of text B1, as follows:
Emb_B1(i) = Emb_B1char(i) + Emb_B1pos(i) + Emb_type(B1)
2) The character codes of all characters in text B1 form the encoding vector matrix V_B1 of text B1;
step 4.3, the encoding vector matrix V_B of text B is obtained as follows:
1) The character code Emb_B(i) of the i-th character in text B is the sum of the character vector Emb_Bchar(i) of the i-th character in text B, the position code Emb_Bpos(i) of the i-th character in text B and the type code Emb_type(B) of text B, as follows:
Emb_B(i) = Emb_Bchar(i) + Emb_Bpos(i) + Emb_type(B)
2) The character codes of all characters in text B form the encoding vector matrix V_B of text B;
Wherein:
the character vector Emb_A1char(i) of the i-th character in text A1, the character vector Emb_B1char(i) of the i-th character in text B1 and the character vector Emb_Bchar(i) of the i-th character in text B are all obtained by looking up the dictionary;
the position code Emb_A1pos(i) of the i-th character in text A1, the position code Emb_B1pos(i) of the i-th character in text B1 and the position code Emb_Bpos(i) of the i-th character in text B refer to the position code of each character within the spliced text;
the type code Emb_type(A1) of text A1, the type code Emb_type(B1) of text B1 and the type code Emb_type(B) of text B correspond to three different text types;
step 5, inputting the encoding vector matrix V_A1 of text A1, the encoding vector matrix V_B1 of text B1 and the encoding vector matrix V_B of text B into a pre-training task layer, and calculating the total loss function Loss_total as follows:
step 5.1, the pre-training task layer comprises a masked-word prediction task and a word-order recovery task;
step 5.2, the masked-word prediction task yields a first loss function Loss_1(x, θ) computed by the following formula:
Loss_1(x, θ) = E_{a ∈ len(A1)} [ -log P(x_a | V_A1, V_B1) ]
Wherein:
P(x_a | V_A1, V_B1) means the following: the vector of the masked character x_a to be predicted is read from the encoding vector matrix V_A1 of text A1; the read vector of the masked character x_a is spliced with the encoding vector matrix V_B1 of text B1 to obtain a spliced vector; the spliced vector is multiplied with the dictionary matrix to obtain a probability matrix; the maximum probability value in the probability matrix is P(x_a | V_A1, V_B1); the dictionary matrix is the matrix formed by the character vectors of every character in the dictionary;
-log P(x_a | V_A1, V_B1) represents the cross-entropy calculation, namely: standard cross entropy is applied to P(x_a | V_A1, V_B1) to obtain the loss value of the masked character x_a;
E() represents the averaging calculation;
a ∈ len(A1) indicates that the a-th character in text A1 is masked;
thus, each masked character yields a prediction loss value; the loss values of all masked characters are then summed and divided by the number of masked characters to obtain the average loss value;
step 5.3, the word-order recovery task yields a second loss function Loss_2(x, θ) computed by the following formula:
Loss_2(x, θ) = E_{b = 1, …, c} [ -log P(x_b | x_(b-1:0), V_A1, V_B1) ]
Wherein:
b denotes the position of the character being predicted in text B; text B has c characters in total, so b = 1, 2, …, c;
for the character vector x_b predicted at position b of text B, the loss value -log P(x_b | x_(b-1:0), V_A1, V_B1) is obtained as follows:
1) Input V_A1, V_B1 and x_(b-1:0);
Wherein:
x_(b-1:0) means: the vector formed by splicing the separator vector x_0 preceding text B, the character vector x_1 of the 1st character of text B, …, and the character vector x_(b-1) of the (b-1)-th character of text B;
2) V_A1, V_B1 and x_(b-1:0) are spliced to obtain the context vector of the character vector x_b to be predicted;
3) P(x_b | x_(b-1:0), V_A1, V_B1) means: a sequence-to-sequence (seq2seq) model comprising an encoding end and a decoding end is used; the context vector is input at the encoding end, and the decoding end outputs the predicted probability value of the predicted character vector x_b;
4) -log P(x_b | x_(b-1:0), V_A1, V_B1): standard cross entropy is applied to P(x_b | x_(b-1:0), V_A1, V_B1) to obtain the loss value of the character vector x_b;
each predicted character vector in text B yields a loss value; these loss values are averaged to obtain the second loss function Loss_2(x, θ);
step 5.4, the first loss function Loss_1(x, θ) and the second loss function Loss_2(x, θ) are combined to obtain the total loss function Loss_total;
step 6, judging whether training has reached the maximum number of iterations; if not, obtaining the gradient from the total loss function Loss_total, back-propagating it, updating the model parameters θ, and returning to step 4; if so, stopping training to obtain the pre-trained language model.
2. The text pair-oriented Chinese language model pre-training method of claim 1, wherein in step 2, each randomly selected character is replaced with the mask character [MASK], resulting in the masked text.
3. The text pair-oriented Chinese language model pre-training method of claim 1, wherein text A is a question text and text B is the answer text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210950700.1A CN116029354B (en) | 2022-08-09 | 2022-08-09 | Text pair-oriented Chinese language model pre-training method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210950700.1A CN116029354B (en) | 2022-08-09 | 2022-08-09 | Text pair-oriented Chinese language model pre-training method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116029354A true CN116029354A (en) | 2023-04-28 |
CN116029354B CN116029354B (en) | 2023-08-01 |
Family
ID=86076489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210950700.1A Active CN116029354B (en) | 2022-08-09 | 2022-08-09 | Text pair-oriented Chinese language model pre-training method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116029354B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110569500A (en) * | 2019-07-23 | 2019-12-13 | 平安国际智慧城市科技股份有限公司 | Text semantic recognition method and device, computer equipment and storage medium |
CN112487786A (en) * | 2019-08-22 | 2021-03-12 | 创新工场(广州)人工智能研究有限公司 | Natural language model pre-training method based on disorder rearrangement and electronic equipment |
KR20220044406A (en) * | 2020-10-01 | 2022-04-08 | 네이버 주식회사 | Method and system for controlling distributions of attributes in language models for text generation |
CN112632997A (en) * | 2020-12-14 | 2021-04-09 | 河北工程大学 | Chinese entity identification method based on BERT and Word2Vec vector fusion |
CN113343683A (en) * | 2021-06-18 | 2021-09-03 | 山东大学 | Chinese new word discovery method and device integrating self-encoder and countertraining |
Non-Patent Citations (1)
Title |
---|
Hu Yihuai (胡益淮), "An extractive multi-level semantic fusion model based on XLNET" (基于XLNET的抽取式多级语义融合模型), Communications Technology (通信技术), no. 07 *
Also Published As
Publication number | Publication date |
---|---|
CN116029354B (en) | 2023-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110134771B (en) | Implementation method of multi-attention-machine-based fusion network question-answering system | |
Kowsher et al. | Bangla-bert: transformer-based efficient model for transfer learning and language understanding | |
Zhao et al. | Attention-Based Convolutional Neural Networks for Sentence Classification. | |
CN115392259A (en) | Microblog text sentiment analysis method and system based on confrontation training fusion BERT | |
CN115658890A (en) | Chinese comment classification method based on topic-enhanced emotion-shared attention BERT model | |
Li et al. | Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data | |
CN114281982B (en) | Book propaganda abstract generation method and system adopting multi-mode fusion technology | |
Tang et al. | Full attention-based bi-GRU neural network for news text classification | |
CN115129819A (en) | Text abstract model production method and device, equipment and medium thereof | |
CN111428518B (en) | Low-frequency word translation method and device | |
Yang et al. | Generation-based parallel particle swarm optimization for adversarial text attacks | |
Waghela et al. | Saliency attention and semantic similarity-driven adversarial perturbation | |
Jahan et al. | A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs | |
Behere et al. | Text summarization and classification of conversation data between service chatbot and customer | |
CN116029354B (en) | Text pair-oriented Chinese language model pre-training method | |
CN115688703B (en) | Text error correction method, storage medium and device in specific field | |
Mastronardo et al. | Enhancing a text summarization system with ELMo | |
CN115309898A (en) | Word granularity Chinese semantic approximate countermeasure sample generation method based on knowledge enhanced BERT | |
Wei | Research on internet text sentiment classification based on BERT and CNN-BiGRU | |
Zhu | A Simple Survey of Pre-trained Language Models | |
Sattari et al. | Improving image captioning with local attention mechanism | |
Liu et al. | Raw-to-end name entity recognition in social media | |
Croce et al. | Grammatical Feature Engineering for Fine-grained IR Tasks. | |
Xu et al. | Incorporating forward and backward instances in a bi-lstm-cnn model for relation classification | |
CN110990385A (en) | Software for automatically generating news headlines based on Sequence2Sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |