CN110046248A - Model training method, text classification method and apparatus for text analysis - Google Patents

Model training method, text classification method and apparatus for text analysis

Info

Publication number
CN110046248A
Authority
CN
China
Prior art keywords
word
sentence
vector
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910176632.6A
Other languages
Chinese (zh)
Other versions
CN110046248B (en)
Inventor
蒋亮
张家兴
温祖杰
梁忠平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910176632.6A priority Critical patent/CN110046248B/en
Publication of CN110046248A publication Critical patent/CN110046248A/en
Application granted granted Critical
Publication of CN110046248B publication Critical patent/CN110046248B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of this specification provides a model training method, a text classification method and an apparatus for text analysis. The method includes: first, using a first bidirectional Transformer model, obtaining, for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word; then, using the first bidirectional Transformer model, obtaining, for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word; then, according to the position of each word in the first training sentence, splicing the forward vector of the word preceding the position and the backward vector of the word following the position together as the target word vector of the position; and then, using a first language model on the target word vector of each position, training the first bidirectional Transformer model and the first language model, so that the model not only runs fast but also has guaranteed robustness.

Description

Model training method, text classification method and apparatus for text analysis
Technical field
One or more embodiments of this specification relate to the field of computers, and more particularly to a model training method, a text classification method and an apparatus for text analysis.
Background art
The Transformer model is a neural network model proposed in 2017 by Ashish Vaswani et al. of Google. It can be used for deep modeling of sequence data as an alternative to the long short-term memory (LSTM) model, and is characterized by a fast running speed.
However, the Transformer model processes a sequence in only one direction: at each position it considers only the information of the preceding positions and ignores the information of the following positions, which greatly limits the robustness of the model.
Accordingly, an improved scheme is desired that, when deep modeling is performed on sequence data, can exploit the fast running speed of the Transformer model while also guaranteeing the robustness of the model.
Summary of the invention
One or more embodiments of this specification describe a model training method, a text classification method and an apparatus for text analysis, which, when deep modeling is performed on sequence data, can exploit the fast running speed of the Transformer model while guaranteeing the robustness of the model.
According to a first aspect, a model training method for text analysis is provided, the method including:
using a first bidirectional Transformer model, obtaining, for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the first training sentence;
using the first bidirectional Transformer model, obtaining, for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence;
according to the position of each word in the first training sentence, splicing the forward vector of the word preceding the position and the backward vector of the word following the position together as the target word vector of the position;
using a first language model, predicting, for the target word vector of each position in the first training sentence, a first probability of the word at the position;
training the first bidirectional Transformer model and the first language model by minimizing a first loss function related to the first probability, to obtain a trained second bidirectional Transformer model and a trained second language model.
In a possible implementation, the obtaining, using the first bidirectional Transformer model and for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence includes:
using the first bidirectional Transformer model and a self-attention mechanism, extracting, for each word in the first training sentence, a plurality of pieces of important information from different perspectives based on the initial word vector of the word and the following context of the word in the first training sentence;
splicing the vectors corresponding to the pieces of important information to obtain the backward vector of the word.
According to a second aspect, a model training method for text analysis is provided, the method including:
using the second bidirectional Transformer model trained by the method of the first aspect, obtaining, for each word in a second training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the second training sentence;
using the second bidirectional Transformer model, obtaining, for each word in the second training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the second training sentence;
according to the position of each word in the second training sentence, splicing the forward vector of the word preceding the position and the backward vector of the word following the position together as the target word vector of the position;
using the second language model trained by the method of the first aspect, predicting, for the target word vector of each position in the second training sentence, a first probability of the word at the position; and generating a representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence;
using a multi-classification model, predicting, based on the representation vector of the sentence corresponding to the second training sentence, a second probability of the label of the second training sentence;
training the second bidirectional Transformer model, the second language model and the multi-classification model by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-classification model, wherein the first loss function is related to the first probability and the second loss function is related to the second probability.
In a possible implementation, the obtaining, using the second bidirectional Transformer model and for each word in the second training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the second training sentence includes:
using the second bidirectional Transformer model and a self-attention mechanism, extracting, for each word in the second training sentence, a plurality of pieces of important information from different perspectives based on the initial word vector of the word and the following context of the word in the second training sentence;
splicing the vectors corresponding to the pieces of important information to obtain the backward vector of the word.
In a possible implementation, the generating a representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence includes:
taking the mean of the target word vectors of the positions in the second training sentence, and using the mean as the representation vector of the sentence corresponding to the second training sentence.
In a possible implementation, the training the second bidirectional Transformer model, the second language model and the multi-classification model by minimizing the sum of the first loss function and the second loss function includes:
minimizing the sum of the first loss function and the second loss function by gradient descent, to determine the model parameters of the second bidirectional Transformer model, the second language model and the multi-classification model.
According to a third aspect, a text classification method is provided, the method including:
using the third bidirectional Transformer model trained by the method of the second aspect, obtaining, for each word in a sentence to be classified, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the sentence to be classified;
using the third bidirectional Transformer model, obtaining, for each word in the sentence to be classified, a backward vector of the word based on the initial word vector of the word and the following context of the word in the sentence to be classified;
according to the position of each word in the sentence to be classified, splicing the forward vector of the word preceding the position and the backward vector of the word following the position together as the target word vector of the position;
generating a representation vector of the sentence corresponding to the sentence to be classified according to the target word vectors of the positions in the sentence to be classified;
using the second multi-classification model trained by the method of the second aspect, performing text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified.
According to a fourth aspect, a model training apparatus for text analysis is provided, the apparatus including:
a forward vector generation unit configured to obtain, using a first bidirectional Transformer model and for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the first training sentence;
a backward vector generation unit configured to obtain, using the first bidirectional Transformer model and for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence;
a word vector generation unit configured to splice, according to the position of each word in the first training sentence, the forward vector of the word preceding the position obtained by the forward vector generation unit and the backward vector of the word following the position obtained by the backward vector generation unit together as the target word vector of the position;
a prediction unit configured to predict, using a first language model and for the target word vector of each position in the first training sentence obtained by the word vector generation unit, a first probability of the word at the position;
a model training unit configured to train the first bidirectional Transformer model and the first language model by minimizing a first loss function related to the first probability obtained by the prediction unit, to obtain a trained second bidirectional Transformer model and a trained second language model.
According to a fifth aspect, a model training apparatus for text analysis is provided, the apparatus including:
a forward vector generation unit configured to obtain, using the second bidirectional Transformer model trained by the method of the first aspect and for each word in a second training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the second training sentence;
a backward vector generation unit configured to obtain, using the second bidirectional Transformer model and for each word in the second training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the second training sentence;
a word vector generation unit configured to splice, according to the position of each word in the second training sentence, the forward vector of the word preceding the position obtained by the forward vector generation unit and the backward vector of the word following the position obtained by the backward vector generation unit together as the target word vector of the position;
a first prediction unit configured to predict, using the second language model trained by the method of the first aspect and for the target word vector of each position in the second training sentence, a first probability of the word at the position;
a sentence vector generation unit configured to generate a representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence obtained by the word vector generation unit;
a second prediction unit configured to predict, using a multi-classification model and based on the representation vector of the sentence corresponding to the second training sentence obtained by the sentence vector generation unit, a second probability of the label of the second training sentence;
a model training unit configured to train the second bidirectional Transformer model, the second language model and the multi-classification model by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-classification model, wherein the first loss function is related to the first probability and the second loss function is related to the second probability.
According to a sixth aspect, a text classification apparatus is provided, the apparatus including:
a forward vector generation unit configured to obtain, using the third bidirectional Transformer model trained by the method of the second aspect and for each word in a sentence to be classified, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the sentence to be classified;
a backward vector generation unit configured to obtain, using the third bidirectional Transformer model and for each word in the sentence to be classified, a backward vector of the word based on the initial word vector of the word and the following context of the word in the sentence to be classified;
a word vector generation unit configured to splice, according to the position of each word in the sentence to be classified, the forward vector of the word preceding the position obtained by the forward vector generation unit and the backward vector of the word following the position obtained by the backward vector generation unit together as the target word vector of the position;
a sentence vector generation unit configured to generate a representation vector of the sentence corresponding to the sentence to be classified according to the target word vectors of the positions in the sentence to be classified obtained by the word vector generation unit;
a text classification unit configured to perform, using the second multi-classification model trained by the method of the second aspect, text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified obtained by the sentence vector generation unit.
According to a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect, the second aspect or the third aspect.
According to an eighth aspect, a computing device is provided, including a memory and a processor, the memory storing executable code, and the processor implementing the method of the first aspect, the second aspect or the third aspect when executing the executable code.
With the method and apparatus provided by the embodiments of this specification, in one aspect, a first bidirectional Transformer model is first used to obtain, for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the first training sentence; the first bidirectional Transformer model is then used to obtain, for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence; next, according to the position of each word in the first training sentence, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position; a first language model is then used to predict, for the target word vector of each position in the first training sentence, a first probability of the word at the position; finally, the first bidirectional Transformer model and the first language model are trained by minimizing a first loss function related to the first probability, to obtain a trained second bidirectional Transformer model and a trained second language model. In the embodiments of this specification, unlike the common unidirectional Transformer model, the bidirectional Transformer model fully considers the context on both sides of each word rather than only the preceding context, so that when deep modeling is performed on sequence data, the fast running speed of the Transformer model can be exploited while the robustness of the model is guaranteed.
In another aspect, the second bidirectional Transformer model trained by the method of the first aspect is first used to obtain, for each word in a second training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the second training sentence; the second bidirectional Transformer model is then used to obtain, for each word in the second training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the second training sentence; next, according to the position of each word in the second training sentence, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position; the second language model trained by the method of the first aspect is then used to predict, for the target word vector of each position in the second training sentence, a first probability of the word at the position, and a representation vector of the sentence corresponding to the second training sentence is generated according to the target word vectors of the positions in the second training sentence; a multi-classification model is then used to predict, based on the representation vector of the sentence corresponding to the second training sentence, a second probability of the label of the second training sentence; finally, the second bidirectional Transformer model, the second language model and the multi-classification model are trained by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-classification model, the first loss function being related to the first probability and the second loss function being related to the second probability. In the embodiments of this specification, not only can the fast running speed of the Transformer model be exploited and the robustness of the model be guaranteed when deep modeling is performed on sequence data, but also, on the basis of the training of the bidirectional Transformer model and the language model, the bidirectional Transformer model, the language model and the multi-classification model are further jointly trained, which achieves a better model training effect.
In yet another aspect, the third bidirectional Transformer model trained by the method of the second aspect is first used to obtain, for each word in a sentence to be classified, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the sentence to be classified; the third bidirectional Transformer model is then used to obtain, for each word in the sentence to be classified, a backward vector of the word based on the initial word vector of the word and the following context of the word in the sentence to be classified; next, according to the position of each word in the sentence to be classified, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position; a representation vector of the sentence corresponding to the sentence to be classified is then generated according to the target word vectors of the positions in the sentence to be classified; finally, the second multi-classification model trained by the method of the second aspect is used to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified. In the embodiments of this specification, when deep modeling is performed on sequence data, the fast running speed of the Transformer model can be exploited and the robustness of the model is guaranteed, and the bidirectional Transformer model and the multi-classification model obtained through the two-stage training are conducive to obtaining a better text classification result.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;
Fig. 2 shows a flowchart of a model training method for text analysis according to one embodiment;
Fig. 3 shows a flowchart of a model training method for text analysis according to another embodiment;
Fig. 4 shows a flowchart of a text classification method according to another embodiment;
Fig. 5 is a schematic diagram of the internal structure of the unidirectional Transformer model provided by an embodiment of this specification;
Fig. 6 shows a schematic block diagram of a model training apparatus for text analysis according to one embodiment;
Fig. 7 shows a schematic block diagram of a model training apparatus for text analysis according to another embodiment;
Fig. 8 shows a schematic block diagram of a text classification apparatus according to another embodiment.
Detailed description of embodiments
The solutions provided in this specification are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The scenario involves text classification and the training of the models used for text analysis. Referring to Fig. 1, the scenario involves three kinds of models: a bidirectional Transformer model, a language model, and a multi-classification model (also called a multi-classifier). When the models are trained, multiple models can be trained jointly to form multi-task learning.
Text classification: text classification is the task of classifying a text input by a user into one or more of several pre-defined classes.
Language model: a language model judges whether a sentence is correct natural language by computing the probability that the sentence occurs in natural language, and plays an important role in tasks such as information retrieval, machine translation and speech recognition. A neural language model is a kind of language model that uses a neural network to model the probability of occurrence of each sentence. By learning from a large corpus, a neural language model can learn the inherent laws and knowledge of the language.
Multi-task learning: multi-task learning is a field of machine learning research that aims to learn several related tasks jointly in the same model or framework, so that knowledge is transferred between the tasks and each task is improved.
As shown in Fig. 1, the training of the models is divided into two stages: a pre-training stage and a fine-tuning stage.
In the pre-training stage, for a sentence S = {w_1, w_2, ..., w_N} consisting of N words, the bidirectional Transformer model first converts S into N vectors {v_1, v_2, ..., v_N}, where each vector is the output vector of one word and is produced with the full context of that word taken into account. The output vector v_i of each word is then used by the language model to predict the word w_i at the current position, and the bidirectional Transformer model and the language model are trained on the basis of the prediction results.
In the fine-tuning stage, annotated text data are used. A sentence S = {w_1, w_2, ..., w_N} is likewise converted by the bidirectional Transformer model into vectors {v_1, v_2, ..., v_N}, and the mean of the output vectors {v_1, v_2, ..., v_N} of all the words is taken as the representation vector of the sentence. The representation vector of the sentence is used as the input of the multi-classifier to perform text classification, while the output vector of each word is used as the input of the language model to predict the current word, so that in the fine-tuning stage the classification task and the language-model prediction task form multi-task learning, which can improve the generalization ability of the multi-classification model.
In the prediction stage, for a sentence input by a user, the mean of the output vectors of the bidirectional Transformer is taken to obtain the representation vector of the sentence, and the representation vector of the sentence is input into the multi-classifier for classification.
In the embodiments of this specification, the bidirectional Transformer model and the language model are first pre-trained, with the bidirectional Transformer model fully considering the context on both sides of each word rather than only the preceding context; the pre-trained Transformer model is then fine-tuned on the text classification task, so as to improve the robustness of the model.
Fig. 2 shows a flowchart of a model training method for text analysis according to one embodiment; the method may correspond to the pre-training stage mentioned in the application scenario shown in Fig. 1. As shown in Fig. 2, the model training method for text analysis in this embodiment includes the following steps:
First, in step 21, a first bidirectional Transformer model is used to obtain, for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the first training sentence. It can be understood that the process of obtaining the forward vector of each word in step 21 is similar to that of the unidirectional Transformer model.
Then, in step 22, the first bidirectional Transformer model is used to obtain, for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence. It can be understood that the following context of the word is used in the process of obtaining the backward vector of each word in step 22.
In one example, the first bidirectional Transformer model and a self-attention mechanism are used to extract, for each word in the first training sentence, a plurality of pieces of important information from different perspectives based on the initial word vector of the word and the following context of the word in the first training sentence; the vectors corresponding to the pieces of important information are spliced to obtain the backward vector of the word.
Then, in step 23, according to the position of each word in the first training sentence, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position. It can be understood that the target word vector of each position reflects both the context before the position and the context after the position, which gives good robustness.
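The following is a minimal sketch of the splicing in step 23, written in Python with numpy. The specification does not state how the two sentence boundaries are padded, so the zero vectors used here are an assumption, and all function and variable names are illustrative.

    import numpy as np

    def target_word_vectors(fwd, bwd):
        # fwd[i] / bwd[i]: forward / backward vector of the i-th word (N x d each).
        # The target vector of position i splices the forward vector of word i-1
        # with the backward vector of word i+1; boundaries use zero vectors here.
        n, d = fwd.shape
        pad = np.zeros((1, d))
        prev_fwd = np.vstack([pad, fwd[:-1]])   # forward vector of the previous word
        next_bwd = np.vstack([bwd[1:], pad])    # backward vector of the next word
        return np.concatenate([prev_fwd, next_bwd], axis=1)  # N x 2d

    # toy usage: a 4-word sentence with 3-dimensional vectors
    print(target_word_vectors(np.random.randn(4, 3), np.random.randn(4, 3)).shape)  # (4, 6)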
Next, in step 24, a first language model is used to predict, for the target word vector of each position in the first training sentence, a first probability of the word at the position. It can be understood that the bidirectional Transformer model is used to obtain the target word vector of each position in the first training sentence, and the language model then predicts the probability of the word at the position from the target word vector.
Finally, in step 25, the first bidirectional Transformer model and the first language model are trained by minimizing a first loss function related to the first probability, to obtain a trained second bidirectional Transformer model and a trained second language model. It can be understood that the training sentences used in this model training need no manual annotation, which makes it convenient to train the model on a large unlabeled corpus.
With the method provided by this embodiment of the specification, a first bidirectional Transformer model is first used to obtain, for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the first training sentence; the first bidirectional Transformer model is then used to obtain, for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence; next, according to the position of each word in the first training sentence, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position; a first language model is then used to predict, for the target word vector of each position in the first training sentence, a first probability of the word at the position; finally, the first bidirectional Transformer model and the first language model are trained by minimizing a first loss function related to the first probability, to obtain a trained second bidirectional Transformer model and a trained second language model. In this embodiment of the specification, unlike the common unidirectional Transformer model, the bidirectional Transformer model fully considers the context on both sides of each word rather than only the preceding context, so that when deep modeling is performed on sequence data, the fast running speed of the Transformer model can be exploited while the robustness of the model is guaranteed.
Fig. 3 shows a flowchart of a model training method for text analysis according to another embodiment; the method may correspond to the fine-tuning stage mentioned in the application scenario shown in Fig. 1. As shown in Fig. 3, the model training method in this embodiment includes the following steps:
First, in step 31, the second bidirectional Transformer model trained by the method described with reference to Fig. 2 is used to obtain, for each word in a second training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the second training sentence. It can be understood that the process of obtaining the forward vector of each word in step 31 is similar to that of the unidirectional Transformer model.
Then, in step 32, the second bidirectional Transformer model is used to obtain, for each word in the second training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the second training sentence. It can be understood that the following context of the word is used in the process of obtaining the backward vector of each word in step 32.
In one example, the second bidirectional Transformer model and a self-attention mechanism are used to extract, for each word in the second training sentence, a plurality of pieces of important information from different perspectives based on the initial word vector of the word and the following context of the word in the second training sentence; the vectors corresponding to the pieces of important information are spliced to obtain the backward vector of the word.
Then, in step 33, according to the position of each word in the second training sentence, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position. It can be understood that the target word vector of each position reflects both the context before the position and the context after the position, which gives good robustness.
Next, in step 34, the second language model trained by the method described with reference to Fig. 2 is used to predict, for the target word vector of each position in the second training sentence, a first probability of the word at the position. It can be understood that the bidirectional Transformer model is used to obtain the target word vector of each position in the second training sentence, and the language model then predicts the probability of the word at the position from the target word vector.
Next, in step 35, a representation vector of the sentence corresponding to the second training sentence is generated according to the target word vectors of the positions in the second training sentence. It can be understood that the representation vector of the sentence combines the target word vectors of multiple positions rather than only the target word vector of a single position.
In one example, the mean of the target word vectors of the positions in the second training sentence is taken, and the mean is used as the representation vector of the sentence corresponding to the second training sentence.
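As a small sketch of this averaging (numpy; the shape N x d is illustrative):

    import numpy as np

    def sentence_vector(target_vectors):
        # target_vectors: N x d matrix of target word vectors; the mean over the
        # positions is the representation vector of the sentence
        return target_vectors.mean(axis=0)

    print(sentence_vector(np.array([[1.0, 2.0], [3.0, 4.0]])))  # [2. 3.]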
Next, in step 36, a multi-classification model is used to predict, based on the representation vector of the sentence corresponding to the second training sentence, a second probability of the label of the second training sentence. It can be understood that the label is a pre-annotated text classification category.
Finally, in step 37, the second bidirectional Transformer model, the second language model and the multi-classification model are trained by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-classification model; the first loss function is related to the first probability, and the second loss function is related to the second probability. It can be understood that the training sentences used in this stage need manual annotation, which makes it convenient to further train the model on an annotated corpus.
In one example, the sum of the first loss function and the second loss function is minimized by gradient descent to determine the model parameters of the second bidirectional Transformer model, the second language model and the multi-classification model.
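A minimal sketch of one such joint gradient-descent update is given below, assuming PyTorch and treating the encoder output as a plain set of target word vectors; the module names, dimensions and learning rate are illustrative assumptions, not values taken from this specification.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d, vocab, n_labels, n_words = 8, 100, 5, 6
    target_vecs = torch.randn(n_words, d, requires_grad=True)   # stand-in for the encoder output
    lm_head = nn.Linear(d, vocab)        # stand-in for the second language model
    classifier = nn.Linear(d, n_labels)  # stand-in for the multi-classification model

    words = torch.randint(0, vocab, (n_words,))   # the words of the training sentence
    label = torch.tensor([2])                     # its annotated class label

    params = [target_vecs] + list(lm_head.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=1e-5)

    loss_lm = F.cross_entropy(lm_head(target_vecs), words)    # first loss function
    sentence = target_vecs.mean(dim=0, keepdim=True)          # representation vector of the sentence
    loss_cls = F.cross_entropy(classifier(sentence), label)   # second loss function
    (loss_lm + loss_cls).backward()                           # minimize the sum of the two losses
    opt.step()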
With the method provided by this embodiment of the specification, the second bidirectional Transformer model trained by the method described with reference to Fig. 2 is first used to obtain, for each word in a second training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the second training sentence; the second bidirectional Transformer model is then used to obtain, for each word in the second training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the second training sentence; next, according to the position of each word in the second training sentence, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position; the second language model trained by the method described with reference to Fig. 2 is then used to predict, for the target word vector of each position in the second training sentence, a first probability of the word at the position, and a representation vector of the sentence corresponding to the second training sentence is generated according to the target word vectors of the positions in the second training sentence; a multi-classification model is then used to predict, based on the representation vector of the sentence corresponding to the second training sentence, a second probability of the label of the second training sentence; finally, the second bidirectional Transformer model, the second language model and the multi-classification model are trained by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-classification model, the first loss function being related to the first probability and the second loss function being related to the second probability. In this embodiment of the specification, not only can the fast running speed of the Transformer model be exploited and the robustness of the model be guaranteed when deep modeling is performed on sequence data, but also, on the basis of the training of the bidirectional Transformer model and the language model, the bidirectional Transformer model, the language model and the multi-classification model are further jointly trained, which achieves a better model training effect.
Fig. 4 shows a flowchart of a text classification method according to another embodiment; the method may correspond to the prediction stage mentioned in the application scenario shown in Fig. 1. As shown in Fig. 4, the text classification method in this embodiment includes the following steps:
First, in step 41, the third bidirectional Transformer model trained by the method described with reference to Fig. 3 is used to obtain, for each word in a sentence to be classified, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the sentence to be classified. It can be understood that the process of obtaining the forward vector of each word in step 41 is similar to that of the unidirectional Transformer model.
Then, in step 42, the third bidirectional Transformer model is used to obtain, for each word in the sentence to be classified, a backward vector of the word based on the initial word vector of the word and the following context of the word in the sentence to be classified. It can be understood that the following context of the word is used in the process of obtaining the backward vector of each word in step 42.
Then, in step 43, according to the position of each word in the sentence to be classified, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position. It can be understood that the target word vector of each position reflects both the context before the position and the context after the position, which gives good robustness.
Next, in step 44, a representation vector of the sentence corresponding to the sentence to be classified is generated according to the target word vectors of the positions in the sentence to be classified. It can be understood that the representation vector of the sentence combines the target word vectors of multiple positions rather than only the target word vector of a single position.
Finally, in step 45, the second multi-classification model trained by the method described with reference to Fig. 3 is used to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified. It can be understood that the multi-classification model predicts, based on the representation vector of the sentence, the probability that the sentence to be classified belongs to each class, and the class with the highest probability is taken as the text classification result.
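Steps 44 and 45 can be illustrated with the following sketch (numpy; the weight matrix W_c and all names are illustrative stand-ins for the trained second multi-classification model):

    import numpy as np

    def classify_sentence(target_word_vectors, W_c):
        # average the target word vectors into the sentence representation,
        # score each class, and return the most probable class
        sentence_vec = target_word_vectors.mean(axis=0)
        logits = W_c @ sentence_vec
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(np.argmax(probs)), probs

    pred, probs = classify_sentence(np.random.randn(6, 4), np.random.randn(3, 4))
    print(pred, probs)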
With the method provided by this embodiment of the specification, the third bidirectional Transformer model trained by the method described with reference to Fig. 3 is first used to obtain, for each word in a sentence to be classified, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the sentence to be classified; the third bidirectional Transformer model is then used to obtain, for each word in the sentence to be classified, a backward vector of the word based on the initial word vector of the word and the following context of the word in the sentence to be classified; next, according to the position of each word in the sentence to be classified, the forward vector of the word preceding the position and the backward vector of the word following the position are spliced together as the target word vector of the position; a representation vector of the sentence corresponding to the sentence to be classified is then generated according to the target word vectors of the positions in the sentence to be classified; finally, the second multi-classification model trained by the method described with reference to Fig. 3 is used to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified. In this embodiment of the specification, when deep modeling is performed on sequence data, the fast running speed of the Transformer model can be exploited and the robustness of the model is guaranteed, and the bidirectional Transformer model and the multi-classification model obtained through the two-stage training are conducive to obtaining a better text classification result.
The three kinds of models involved in the foregoing embodiments are described in detail below: the bidirectional Transformer model (bidirectional Transformer for short), the language model and the multi-classification model (also called the multi-classifier).
The bidirectional Transformer:
The working principle of the traditional Transformer (i.e. the unidirectional Transformer) is introduced first, and is then extended to the bidirectional Transformer.
1. The unidirectional Transformer
The Transformer was proposed by Ashish Vaswani et al. of Google for converting a text sequence into vectors. It overcomes the drawback of the LSTM, which must process a text word by word: each word obtains its preceding-context information by applying an attention mechanism over the context above it, so the output vectors of all the words can be computed in parallel.
Fig. 5 is a schematic diagram of the internal structure of the unidirectional Transformer model provided by an embodiment of this specification. Referring to Fig. 5:
The input of the Transformer module is a sequence of vectors X = {x_1, x_2, ..., x_N}, where x_i is the representation vector of the i-th position. X first passes through a multi-head self-attention module, which makes each word interact with every word in its preceding context, so that the important information related to its preceding context is added to each word. The multi-head self-attention module consists of multiple self-attention modules with identical structure, and each self-attention module is computed as follows:
First, a fully connected (feed-forward) layer converts each word x_i into two vectors k_i and t_i:
k_i = tanh(W_q x_i + b)
t_i = tanh(W_v x_i + b)
W_q and W_v are trainable parameters of the model. k_i is used to compute the importance of all the preceding words {x_1, ..., x_{i-1}} for x_i, while t_i stores the information of x_i and provides it to the other words.
The vector c_i obtained in this way is the important information extracted from the preceding context that is useful for x_i. The multi-head self-attention mechanism uses several of the above attention modules, each of which extracts important information for x_i from the preceding words {x_1, ..., x_{i-1}} from a different perspective. Finally, the vectors extracted for each word by all the attention modules are spliced into d_i, the output vector of the multi-head self-attention module for that word.
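The text above does not reproduce the formula by which c_i is computed from k_i and the t_j of the preceding words, so the softmax over dot products used in the following sketch is an assumption; the sketch only illustrates the one-directional, multi-head construction just described, with illustrative names and dimensions.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def one_way_self_attention(X, Wq, Wv, b):
        K = np.tanh(X @ Wq.T + b)   # k_i: scores the importance of the words above x_i
        T = np.tanh(X @ Wv.T + b)   # t_i: information x_i offers to the other words
        n = X.shape[0]
        C = np.zeros_like(K)        # c_1 stays zero: the first word has no preceding context
        for i in range(1, n):
            weights = softmax(K[i] @ T[:i].T)   # assumed weighting of x_1 .. x_{i-1}
            C[i] = weights @ T[:i]              # c_i: information gathered from above
        return C

    def multi_head(X, heads):
        # splice the vectors extracted by all attention modules into d_i
        return np.concatenate([one_way_self_attention(X, *h) for h in heads], axis=1)

    d = 4
    X = np.random.randn(5, d)
    heads = [(np.random.randn(d, d), np.random.randn(d, d), np.zeros(d)) for _ in range(2)]
    print(multi_head(X, heads).shape)   # (5, 8)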
The output vector d_i of each word then passes through a normalization layer (layer normalization), a fully connected (feed-forward) layer and another normalization layer to obtain the output vector l_i, which is the vector of x_i after conversion by the Transformer module. The computation is as follows:
l_i = LayerNorm(LayerNorm(x_i + d_i) + W · LayerNorm(x_i + d_i))
where W is a trainable parameter, and LayerNorm normalizes one layer of the neural network so that the information flow between layers is more stable. In the LayerNorm computation, μ is the mean of all the neurons in one layer of the neural network, and σ is the standard deviation of all the neurons in that layer.
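The LayerNorm formula itself is not reproduced above; the sketch below assumes the usual normalization by the stated mean μ and standard deviation σ, with a small epsilon added for numerical stability (the learned gain and bias of common implementations are omitted):

    import numpy as np

    def layer_norm(x, eps=1e-6):
        mu = x.mean()        # mean of all neurons in the layer
        sigma = x.std()      # standard deviation of all neurons in the layer
        return (x - mu) / (sigma + eps)

    print(layer_norm(np.array([1.0, 2.0, 3.0, 4.0])))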
Transformer modules can be stacked in multiple layers, with the output of one Transformer layer serving as the input of the next, to form a multi-layer Transformer network.
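A minimal sketch of this stacking, with each layer represented as a callable (the stand-in layer used here is illustrative):

    def transformer_network(X, layers):
        for layer in layers:
            X = layer(X)   # the output of one layer is the input of the next
        return X

    print(transformer_network([1.0, 2.0], [lambda X: [v + 1.0 for v in X]] * 3))   # [4.0, 5.0]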
2. The bidirectional Transformer
The bidirectional Transformer is an extension of the unidirectional Transformer. The unidirectional Transformer considers only the preceding context in its attention mechanism and ignores the following context, yet the following context of each word is also useful to the word itself. The bidirectional Transformer therefore models the sentence from both directions, the preceding and the following context, which increases the expressive power of the model.
In its computation, a forward part and a backward part respectively extract the important information for x_i from the preceding context and from the following context. After the two parts are obtained with the bidirectional Transformer, each of them, as in the unidirectional Transformer, passes through the normalization layer and the fully connected layer to give the two output vectors of the bidirectional Transformer for x_i. After a multi-layer bidirectional Transformer, the sentence S = {w_1, w_2, ..., w_N} is finally converted into two groups of vectors: a forward group and a backward group.
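One way to realize the two directions, sketched below, is to run the same one-directional attention once over the sentence and once over the reversed sentence; this wiring, and the stand-in callables, are assumptions of the sketch rather than details given in the text.

    import numpy as np

    def bidirectional_layer(X, attend_fwd, attend_bwd, transform):
        d_fwd = attend_fwd(X)                 # important information from the words above
        d_bwd = attend_bwd(X[::-1])[::-1]     # important information from the words below
        return transform(X, d_fwd), transform(X, d_bwd)   # two output vectors per position

    # toy usage; real callables would be the multi-head attention and the
    # LayerNorm / feed-forward stage of the earlier sketches
    attend = lambda X: np.tanh(X)
    transform = lambda X, D: X + D
    fwd_out, bwd_out = bidirectional_layer(np.random.randn(4, 3), attend, attend, transform)
    print(fwd_out.shape, bwd_out.shape)   # (4, 3) (4, 3)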
The language model:
A sentence S = {w_1, w_2, ..., w_N} is converted by the multi-layer bidirectional Transformer into two groups of vectors. The model can be pre-trained with the language-model task: because the data of the language-model task need no annotation, a large amount of data can easily be obtained to pre-train the model thoroughly. The goal of the language-model task is to predict each word w_i from its context {w_1, ..., w_{i-1}, w_{i+1}, ..., w_N}, so that the model learns the inherent laws of natural language; if a model can correctly predict every word from its context, it has learned the inherent laws of natural language well. The bidirectional Transformer language model is computed as follows:
First, the forward vector of the (i-1)-th word and the backward vector of the (i+1)-th word are spliced together to form v_i.
Then v_i is used to predict the probability of the i-th word w_i, where W_LM is a trainable parameter of the language model and W_j^LM denotes the j-th row of W_LM.
The loss function L_LM of the language model is the mean of the cross-entropy loss over all the words in the sentence.
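A sketch of this loss (numpy; zero vectors at the sentence boundaries, the softmax parameterization of the word probabilities, and all shapes and names are assumptions for illustration):

    import numpy as np

    def lm_loss(fwd, bwd, W_lm, word_ids):
        n, d = fwd.shape
        v = np.concatenate([np.vstack([np.zeros((1, d)), fwd[:-1]]),   # forward vector of word i-1
                            np.vstack([bwd[1:], np.zeros((1, d))])],   # backward vector of word i+1
                           axis=1)
        logits = v @ W_lm.T                                   # one score per vocabulary word
        logits -= logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), word_ids].mean()      # mean cross-entropy over the words

    # toy usage: a 5-word sentence, 3-dimensional vectors, a vocabulary of 10 words
    print(lm_loss(np.random.randn(5, 3), np.random.randn(5, 3),
                  np.random.randn(10, 6), np.array([1, 4, 2, 7, 0])))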
The goal of the language model is to minimize L_LM. Let W denote the set of all trainable parameters of the bidirectional Transformer, and W_LM the set of all trainable parameters of the language model. W and W_LM are optimized iteratively by gradient descent.
During pre-training the model is optimized iteratively until L_LM falls below a preset threshold β (typically 0.1, 0.01, or the like); at that point training is complete and the model has learned the inherent laws of natural language. The learning rate γ_1 is usually a real number on the order of 0.0001.
The multi-classifier:
In the fine-tuning stage, the pre-trained bidirectional Transformer model is fine-tuned with data carrying true labels. Because the bidirectional Transformer has been pre-trained and has already grasped knowledge of natural language, pre-training followed by fine-tuning with the classifier achieves a better result than training a randomly initialized model directly on the labeled data.
The fine-tuning process includes two parts: one part is the language-model part, identical to the pre-training process; the other part classifies each input sentence (S = {w_1, w_2, ..., w_N}, l), where l is the label of the sentence. The classification process is as follows:
First, the bidirectional Transformer converts S = {w_1, w_2, ..., w_N} into vectors [v_1, ..., v_N], where each vector is the representation vector of the word at the corresponding position. The mean of the representation vectors of all the words is then taken as the representation vector of the whole sentence.
A Softmax classifier is then applied to the sentence representation vector to compute the probability that S belongs to each label, where W_c is the set of trainable parameters of the multi-classifier and W_k^c denotes the k-th row of W_c. The loss function of the classifier is the cross-entropy of each sample with respect to its true label l, i.e.
L_C = -log p_C(l | S)
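A sketch of this classifier loss (numpy; the shape of W_c and the names are illustrative):

    import numpy as np

    def classifier_loss(sentence_vec, W_c, true_label):
        logits = W_c @ sentence_vec
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                     # p_C(l_k | S) for every label
        return -np.log(probs[true_label])        # L_C = -log p_C(l | S)

    print(classifier_loss(np.random.randn(4), np.random.randn(3, 4), 1))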
In the fine-tuning process, the goal of the model is to minimize the sum of the loss function of the language model and the loss function of the classifier, L = L_LM + L_C; all parameters of the model are optimized iteratively by gradient descent.
The fine-tuning learning rate γ_2 is usually one order of magnitude smaller than γ_1, around 0.00001.
In the prediction stage, the bidirectional Transformer only needs to convert the sentence S into its representation vector; the multi-classifier then computes the probability that S belongs to each label l_k, and the label with the highest probability is output.
l = argmax_k p_C(l_k | S)
In this way, the bidirectional Transformer is used to improve the text classification model.
It should be noted that the multi-task classifier and the task discriminator are not limited to the softmax classifier; any model capable of classification, such as a support vector machine, logistic regression or a multi-layer neural network, can serve as the multi-task classifier or the task discriminator.
According to an embodiment of another aspect, a model training apparatus for text analysis is further provided. The apparatus is configured to perform the model training method for text analysis provided by the embodiments of this specification, for example the model training method for text analysis shown in Fig. 2. Fig. 6 shows a schematic block diagram of the model training apparatus for text analysis according to one embodiment. As shown in Fig. 6, the apparatus 600 includes:
a forward vector generation unit 61 configured to obtain, using a first bidirectional Transformer model and for each word in a first training sentence, a forward vector of the word based on the initial word vector of the word and the preceding context of the word in the first training sentence;
a backward vector generation unit 62 configured to obtain, using the first bidirectional Transformer model and for each word in the first training sentence, a backward vector of the word based on the initial word vector of the word and the following context of the word in the first training sentence;
a word vector generation unit 63 configured to splice, according to the position of each word in the first training sentence, the forward vector of the word preceding the position obtained by the forward vector generation unit 61 and the backward vector of the word following the position obtained by the backward vector generation unit 62 together as the target word vector of the position;
a prediction unit 64 configured to predict, using a first language model and for the target word vector of each position in the first training sentence obtained by the word vector generation unit 63, a first probability of the word at the position;
a model training unit 65 configured to train the first bidirectional Transformer model and the first language model by minimizing a first loss function related to the first probability obtained by the prediction unit 64, to obtain a trained second bidirectional Transformer model and a trained second language model.
Optionally, in one embodiment, the reverse vector generation unit 62 is specifically configured to:
use the first bidirectional Transformer model to extract, for each word in the first training sentence, multiple pieces of important information from different angles through a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the first training sentence;
splice the vectors corresponding to the pieces of important information to obtain the reverse vector corresponding to the word.
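The splicing of pieces of important information extracted from different angles corresponds naturally to the head-wise concatenation performed by a multi-head self-attention layer restricted to each word's following context. The sketch below is one possible reading under that assumption; the class name, sizes and masking convention are illustrative and not taken from the specification.

import torch
import torch.nn as nn

class ReverseSelfAttention(nn.Module):
    """Multi-head self-attention over each word's following context; the per-head
    outputs are concatenated to form the reverse vector of the word."""
    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, word_vecs: torch.Tensor) -> torch.Tensor:
        # word_vecs: [batch, N, hidden] initial word vectors of the training sentence
        n = word_vecs.size(1)
        # True entries are blocked: each position may attend only to itself and to
        # later positions, i.e. to its following context.
        block_earlier = torch.tril(
            torch.ones(n, n, dtype=torch.bool, device=word_vecs.device), diagonal=-1)
        out, _ = self.attn(word_vecs, word_vecs, word_vecs, attn_mask=block_earlier)
        # nn.MultiheadAttention concatenates the head outputs internally, which plays
        # the role of the splicing step described above.
        return out   # reverse vector for every word in the sentence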
With the apparatus provided by this embodiment of the specification, the forward vector generation unit 61 first uses the first bidirectional Transformer model to obtain, for each word in the first training sentence, the forward vector corresponding to the word based on the initial word vector of the word and its preceding-context information in the first training sentence. The reverse vector generation unit 62 then uses the first bidirectional Transformer model to obtain, for each word, the reverse vector corresponding to the word based on the initial word vector of the word and its following-context information. Next, the word vector generation unit 63 splices, for each position in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position to form the target word vector corresponding to that position. The prediction unit 64 then uses the first language model to predict, from the target word vector of each position, the first probability of the word at that position. Finally, the model training unit 65 trains the first bidirectional Transformer model and the first language model by minimizing the first loss function related to the first probability, obtaining the trained second bidirectional Transformer model and second language model. Unlike a common unidirectional Transformer model, the bidirectional Transformer model fully considers the context on both sides of each word rather than only the preceding context, so that when modeling sequence data in depth it can exploit the fast running speed of the Transformer model while guaranteeing the robustness of the model.
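To make the splicing step concrete, the following is a minimal sketch of how the target word vector of each position could be assembled from the forward vector of the preceding word and the reverse vector of the following word. The zero padding at the sentence boundaries and all names are assumptions rather than details given in the specification.

import torch

def build_target_vectors(fwd: torch.Tensor, bwd: torch.Tensor) -> torch.Tensor:
    """fwd, bwd: [batch, N, hidden] forward / reverse vectors of the N words.
    Returns [batch, N, 2*hidden]: for position i, the forward vector of word i-1
    spliced with the reverse vector of word i+1."""
    batch, n, hidden = fwd.shape
    pad = fwd.new_zeros(batch, 1, hidden)                 # boundary positions lack a neighbour
    prev_fwd = torch.cat([pad, fwd[:, :-1, :]], dim=1)    # forward vector of the previous word
    next_bwd = torch.cat([bwd[:, 1:, :], pad], dim=1)     # reverse vector of the next word
    return torch.cat([prev_fwd, next_bwd], dim=-1)        # target word vector per position

Built this way, the target vector of position i draws only on the words before and after it, so the first language model that predicts the word at that position never sees the word itself, and the resulting first loss can be minimized by gradient descent as the model training unit 65 does.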
According to an embodiment of another aspect, a model training apparatus for text analysis is further provided. The apparatus is configured to perform the model training method for text analysis provided by the embodiments of this specification, for example the method shown in Fig. 3. Fig. 7 shows a schematic block diagram of a model training apparatus for text analysis according to one embodiment. As shown in Fig. 7, the apparatus 700 includes:
a forward vector generation unit 71, configured to use the second bidirectional Transformer model trained by the method of Fig. 2 to obtain, for each word in a second training sentence, the forward vector corresponding to the word based on the initial word vector of the word and its preceding-context information in the second training sentence;
a reverse vector generation unit 72, configured to use the second bidirectional Transformer model to obtain, for each word in the second training sentence, the reverse vector corresponding to the word based on the initial word vector of the word and its following-context information in the second training sentence;
a word vector generation unit 73, configured to splice, according to each position in the second training sentence, the forward vector of the word preceding the position obtained by the forward vector generation unit 71 and the reverse vector of the word following the position obtained by the reverse vector generation unit 72, as the target word vector corresponding to the position;
a first prediction unit 74, configured to use the second language model trained by the method of Fig. 2 to predict, for the target word vector corresponding to each position in the second training sentence, the first probability of the word at that position;
a sentence vector generation unit 75, configured to generate the representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence obtained by the word vector generation unit 73;
a second prediction unit 76, configured to use a multi-class classification model to predict, based on the representation vector of the sentence obtained by the sentence vector generation unit 75, the second probability of the label corresponding to the second training sentence;
a model training unit 77, configured to train the second bidirectional Transformer model, the second language model and the multi-class classification model by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-class classification model, wherein the first loss function is related to the first probability and the second loss function is related to the second probability.
Optionally, in one embodiment, the reverse vector generation unit 72 is specifically configured to:
use the second bidirectional Transformer model to extract, for each word in the second training sentence, multiple pieces of important information from different angles through a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the second training sentence;
splice the vectors corresponding to the pieces of important information to obtain the reverse vector corresponding to the word.
Optionally, in one embodiment, the sentence vector generation unit 75 is specifically configured to take the mean of the target word vectors of the positions in the second training sentence and use that mean as the representation vector of the sentence corresponding to the second training sentence.
Optionally, in one embodiment, the model training unit 77 is specifically configured to minimize the sum of the first loss function and the second loss function by gradient descent, to determine the model parameters of the second bidirectional Transformer model, the second language model and the multi-class classification model.
With the apparatus provided by this embodiment of the specification, the forward vector generation unit 71 first uses the second bidirectional Transformer model trained by the method of Fig. 2 to obtain, for each word in the second training sentence, the forward vector of the word based on its initial word vector and its preceding-context information. The reverse vector generation unit 72 then uses the same model to obtain the reverse vector of each word based on its initial word vector and its following-context information. Next, the word vector generation unit 73 splices, for each position, the forward vector of the word preceding the position and the reverse vector of the word following the position into the target word vector of that position. The first prediction unit 74 uses the second language model trained by the method of Fig. 2 to predict the first probability of the word at each position, while the sentence vector generation unit 75 generates the representation vector of the second training sentence from the target word vectors of its positions. The second prediction unit 76 then uses the multi-class classification model to predict, from that representation vector, the second probability of the label corresponding to the second training sentence. Finally, the model training unit 77 trains the second bidirectional Transformer model, the second language model and the multi-class classification model by minimizing the sum of the first loss function (related to the first probability) and the second loss function (related to the second probability), obtaining the third bidirectional Transformer model, the third language model and the second multi-class classification model. In this embodiment of the specification, deep modeling of sequence data can exploit the fast running speed of the Transformer model while guaranteeing the robustness of the model; moreover, on top of the first-stage training of the bidirectional Transformer model and the language model, the bidirectional Transformer model, the language model and the multi-class classification model are further trained jointly, achieving a better training result.
According to an embodiment of another aspect, a text classification apparatus is further provided. The apparatus is configured to perform the text classification method provided by the embodiments of this specification, for example the method shown in Fig. 4. Fig. 8 shows a schematic block diagram of a text classification apparatus according to one embodiment. As shown in Fig. 8, the apparatus 800 includes:
a forward vector generation unit 81, configured to use the third bidirectional Transformer model trained by the method of Fig. 3 to obtain, for each word in a sentence to be classified, the forward vector corresponding to the word based on the initial word vector of the word and its preceding-context information in the sentence to be classified;
a reverse vector generation unit 82, configured to use the third bidirectional Transformer model to obtain, for each word in the sentence to be classified, the reverse vector corresponding to the word based on the initial word vector of the word and its following-context information in the sentence to be classified;
a word vector generation unit 83, configured to splice, according to each position in the sentence to be classified, the forward vector of the word preceding the position obtained by the forward vector generation unit 81 and the reverse vector of the word following the position obtained by the reverse vector generation unit 82, as the target word vector corresponding to the position;
a sentence vector generation unit 84, configured to generate the representation vector of the sentence corresponding to the sentence to be classified according to the target word vectors of the positions obtained by the word vector generation unit 83;
a text classification unit 85, configured to use the second multi-class classification model trained by the method of Fig. 3 to perform text classification on the sentence to be classified, based on the representation vector of the sentence obtained by the sentence vector generation unit 84.
With the apparatus provided by this embodiment of the specification, the forward vector generation unit 81 first uses the third bidirectional Transformer model trained by the method of Fig. 3 to obtain, for each word in the sentence to be classified, the forward vector of the word based on its initial word vector and its preceding-context information. The reverse vector generation unit 82 then uses the same model to obtain the reverse vector of each word based on its initial word vector and its following-context information. Next, the word vector generation unit 83 splices, for each position, the forward vector of the word preceding the position and the reverse vector of the word following the position into the target word vector of that position. The sentence vector generation unit 84 then generates the representation vector of the sentence from the target word vectors of its positions, and finally the text classification unit 85 uses the second multi-class classification model trained by the method of Fig. 3 to classify the sentence based on that representation vector. In this embodiment of the specification, deep modeling of sequence data exploits the fast running speed of the Transformer model while guaranteeing the robustness of the model, and the bidirectional Transformer model and the multi-class classification model obtained through two stages of training help to produce better text classification results.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the methods described in conjunction with Figs. 2 to 4.
According to an embodiment of yet another aspect, a computing device is further provided, comprising a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the methods described in conjunction with Figs. 2 to 4 are implemented.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain in detail the objectives, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent replacement, improvement and the like made on the basis of the technical solutions of the present invention shall be included within the scope of protection of the present invention.

Claims (16)

1. A model training method for text analysis, the method comprising:
using a first bidirectional Transformer model to obtain, for each word in a first training sentence, a forward vector corresponding to the word based on an initial word vector of the word and preceding-context information of the word in the first training sentence;
using the first bidirectional Transformer model to obtain, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and following-context information of the word in the first training sentence;
splicing, according to each position in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position to form a target word vector corresponding to the position;
using a first language model to predict, for the target word vector corresponding to each position in the first training sentence, a first probability of the word at that position;
training the first bidirectional Transformer model and the first language model by minimizing a first loss function related to the first probability, to obtain a trained second bidirectional Transformer model and a trained second language model.
2. The method of claim 1, wherein using the first bidirectional Transformer model to obtain, for each word in the first training sentence, the reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the first training sentence comprises:
using the first bidirectional Transformer model to extract, for each word in the first training sentence, multiple pieces of important information from different angles through a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the first training sentence;
splicing the vectors corresponding to the pieces of important information to obtain the reverse vector corresponding to the word.
3. A model training method for text analysis, the method comprising:
using the second bidirectional Transformer model trained by the method of claim 1 to obtain, for each word in a second training sentence, a forward vector corresponding to the word based on an initial word vector of the word and preceding-context information of the word in the second training sentence;
using the second bidirectional Transformer model to obtain, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and following-context information of the word in the second training sentence;
splicing, according to each position in the second training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position to form a target word vector corresponding to the position;
using the second language model trained by the method of claim 1 to predict, for the target word vector corresponding to each position in the second training sentence, a first probability of the word at that position, and generating a representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence;
using a multi-class classification model to predict, based on the representation vector of the sentence corresponding to the second training sentence, a second probability of a label corresponding to the second training sentence;
training the second bidirectional Transformer model, the second language model and the multi-class classification model by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-class classification model, wherein the first loss function is related to the first probability and the second loss function is related to the second probability.
4. The method of claim 3, wherein using the second bidirectional Transformer model to obtain, for each word in the second training sentence, the reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the second training sentence comprises:
using the second bidirectional Transformer model to extract, for each word in the second training sentence, multiple pieces of important information from different angles through a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the second training sentence;
splicing the vectors corresponding to the pieces of important information to obtain the reverse vector corresponding to the word.
5. The method of claim 3, wherein generating the representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence comprises:
taking the mean of the target word vectors of the positions in the second training sentence, and using the mean as the representation vector of the sentence corresponding to the second training sentence.
6. The method of claim 3, wherein training the second bidirectional Transformer model, the second language model and the multi-class classification model by minimizing the sum of the first loss function and the second loss function comprises:
minimizing the sum of the first loss function and the second loss function by gradient descent, to determine model parameters of the second bidirectional Transformer model, the second language model and the multi-class classification model.
7. A text classification method, the method comprising:
using the third bidirectional Transformer model trained by the method of claim 3 to obtain, for each word in a sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and preceding-context information of the word in the sentence to be classified;
using the third bidirectional Transformer model to obtain, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and following-context information of the word in the sentence to be classified;
splicing, according to each position in the sentence to be classified, the forward vector of the word preceding the position and the reverse vector of the word following the position to form a target word vector corresponding to the position;
generating a representation vector of the sentence corresponding to the sentence to be classified according to the target word vectors of the positions in the sentence to be classified;
performing text classification on the sentence to be classified using the second multi-class classification model trained by the method of claim 3, based on the representation vector of the sentence corresponding to the sentence to be classified.
8. A model training apparatus for text analysis, the apparatus comprising:
a forward vector generation unit, configured to use a first bidirectional Transformer model to obtain, for each word in a first training sentence, a forward vector corresponding to the word based on an initial word vector of the word and preceding-context information of the word in the first training sentence;
a reverse vector generation unit, configured to use the first bidirectional Transformer model to obtain, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and following-context information of the word in the first training sentence;
a word vector generation unit, configured to splice, according to each position in the first training sentence, the forward vector of the word preceding the position obtained by the forward vector generation unit and the reverse vector of the word following the position obtained by the reverse vector generation unit, as a target word vector corresponding to the position;
a prediction unit, configured to use a first language model to predict, for the target word vector corresponding to each position in the first training sentence obtained by the word vector generation unit, a first probability of the word at that position;
a model training unit, configured to train the first bidirectional Transformer model and the first language model by minimizing a first loss function related to the first probability obtained by the prediction unit, to obtain a trained second bidirectional Transformer model and a trained second language model.
9. The apparatus of claim 8, wherein the reverse vector generation unit is specifically configured to:
use the first bidirectional Transformer model to extract, for each word in the first training sentence, multiple pieces of important information from different angles through a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the first training sentence;
splice the vectors corresponding to the pieces of important information to obtain the reverse vector corresponding to the word.
10. A model training apparatus for text analysis, the apparatus comprising:
a forward vector generation unit, configured to use the second bidirectional Transformer model trained by the method of claim 1 to obtain, for each word in a second training sentence, a forward vector corresponding to the word based on an initial word vector of the word and preceding-context information of the word in the second training sentence;
a reverse vector generation unit, configured to use the second bidirectional Transformer model to obtain, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and following-context information of the word in the second training sentence;
a word vector generation unit, configured to splice, according to each position in the second training sentence, the forward vector of the word preceding the position obtained by the forward vector generation unit and the reverse vector of the word following the position obtained by the reverse vector generation unit, as a target word vector corresponding to the position;
a first prediction unit, configured to use the second language model trained by the method of claim 1 to predict, for the target word vector corresponding to each position in the second training sentence, a first probability of the word at that position;
a sentence vector generation unit, configured to generate a representation vector of the sentence corresponding to the second training sentence according to the target word vectors of the positions in the second training sentence obtained by the word vector generation unit;
a second prediction unit, configured to use a multi-class classification model to predict, based on the representation vector of the sentence obtained by the sentence vector generation unit, a second probability of a label corresponding to the second training sentence;
a model training unit, configured to train the second bidirectional Transformer model, the second language model and the multi-class classification model by minimizing the sum of a first loss function and a second loss function, to obtain a third bidirectional Transformer model, a third language model and a second multi-class classification model, wherein the first loss function is related to the first probability and the second loss function is related to the second probability.
11. The apparatus of claim 10, wherein the reverse vector generation unit is specifically configured to:
use the second bidirectional Transformer model to extract, for each word in the second training sentence, multiple pieces of important information from different angles through a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the second training sentence;
splice the vectors corresponding to the pieces of important information to obtain the reverse vector corresponding to the word.
12. The apparatus of claim 10, wherein the sentence vector generation unit is specifically configured to take the mean of the target word vectors of the positions in the second training sentence and use the mean as the representation vector of the sentence corresponding to the second training sentence.
13. The apparatus of claim 10, wherein the model training unit is specifically configured to minimize the sum of the first loss function and the second loss function by gradient descent, to determine model parameters of the second bidirectional Transformer model, the second language model and the multi-class classification model.
14. A text classification apparatus, the apparatus comprising:
a forward vector generation unit, configured to use the third bidirectional Transformer model trained by the method of claim 3 to obtain, for each word in a sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and preceding-context information of the word in the sentence to be classified;
a reverse vector generation unit, configured to use the third bidirectional Transformer model to obtain, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and following-context information of the word in the sentence to be classified;
a word vector generation unit, configured to splice, according to each position in the sentence to be classified, the forward vector of the word preceding the position obtained by the forward vector generation unit and the reverse vector of the word following the position obtained by the reverse vector generation unit, as a target word vector corresponding to the position;
a sentence vector generation unit, configured to generate a representation vector of the sentence corresponding to the sentence to be classified according to the target word vectors of the positions obtained by the word vector generation unit;
a text classification unit, configured to use the second multi-class classification model trained by the method of claim 3 to perform text classification on the sentence to be classified, based on the representation vector of the sentence obtained by the sentence vector generation unit.
15. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-7.
16. A computing device comprising a memory and a processor, wherein the memory stores executable code and, when the processor executes the executable code, the method of any one of claims 1-7 is implemented.
GR01 Patent grant