CN110489555A - A language-model pre-training method incorporating word-like information - Google Patents
A language-model pre-training method incorporating word-like information
- Publication number
- CN110489555A CN110489555A CN201910775453.4A CN201910775453A CN110489555A CN 110489555 A CN110489555 A CN 110489555A CN 201910775453 A CN201910775453 A CN 201910775453A CN 110489555 A CN110489555 A CN 110489555A
- Authority
- CN
- China
- Prior art keywords
- training
- word
- character string
- model
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The present invention relates to the field of language-processing technology, and more particularly to a language-model pre-training method incorporating word-like information, comprising the following steps: S1, providing a pre-training model and a pre-training text; S2, extracting character strings to form a vocabulary; S3, extracting two sentences as the training sentence and splitting the training sentence into a character sequence; S4, matching the character strings from step S2 against the characters in the character sequence, and marking the character strings that match characters in the sequence; S5, masking or replacing a preset proportion of the characters in the character sequence, and feeding the masked or replaced training sentence and the marked character strings simultaneously into the pre-training model to train and optimize it; S6, repeating steps S2-S5 until the pre-training model reaches the set optimization condition, yielding the optimized pre-training model. The language-model pre-training method and pre-training model incorporating word-like information provided by the present invention perform better on multiple downstream tasks.
Description
[technical field]
The present invention relates to the field of language-processing technology, and more particularly to a language-model pre-training method incorporating word-like information.
[background technique]
Current state-of-the-art pre-trained language models fall into two classes: autoregressive language models (Autoregressive Models) and autoencoding language models (Autoencoding Models). GPT and GPT-2 are well-performing autoregressive models; the training objective of an autoregressive model is to correctly predict the next word from the preceding context. BERT is the representative autoencoding model; its training objective is to correctly infer masked or replaced words from the surrounding context. Each class has its own advantages and disadvantages. An autoregressive model can condition only on the preceding context and therefore cannot handle tasks that require combining both left and right context. An autoencoding model, on the other hand, can exploit bidirectional context, but during its pre-training a [mask] token is inserted into the training corpus to replace the target word that is to be predicted, and this [mask] token never appears during fine-tuning for a specific task. This mismatch between the pre-training and fine-tuning inputs degrades the overall performance of the model. Recently, XLNet was proposed to solve both of these problems at once, so that a pre-trained language model can exploit bidirectional context without introducing the [mask] token.
However, the language models above underuse the coarser-grained information carried by the words, phrases and entities that occur in the pre-training and fine-tuning corpora. Such information is especially important for Chinese tasks. Unlike English, Chinese has no spaces or other explicit word boundaries, which makes it harder for a model to learn the holistic meaning of two-character or multi-character words from a sequence of individual characters.
Recently, the BERT-wwm model was proposed as an optimization of BERT for Chinese addressing the above problem. BERT-wwm differs from BERT only in the preprocessing of the training corpus. When BERT applies its masking operation to the pre-training corpus, 15% of the individual characters are replaced with [mask] and the remaining characters are kept; BERT-wwm instead first segments the raw corpus with a word-segmentation tool and then applies the same masking operation at the level of whole words. Slightly earlier, Baidu's ERNIE also improved BERT in this direction. ERNIE uses a multi-level masking strategy comprising character-level, phrase-level and entity-level masking. To reach this multi-level masking target, Baidu used Baidu Baike, Baidu Tieba and question-answering data in addition to Chinese Wikipedia. Although ERNIE thus used more training data and learned more knowledge, its performance on downstream tasks at the time was comparable to BERT-wwm.
However, learning word-boundary information through a multi-level masking strategy still has problems. The effectiveness of the masking strategy depends on information external to the text itself: BERT-wwm depends on the output provided by a word segmenter, and ERNIE depends on external knowledge. In practice, relying on such additional information has the following drawbacks. First, the quality of the information cannot be guaranteed; for example, the effectiveness of BERT-wwm depends on the quality of the Chinese word segmentation. Second, high-quality information requires large amounts of collection and annotation, which adds cost to pre-training the language model. Third, masking whole words alone still under-uses word-level information, because words may carry extended meanings unrelated to their literal characters, as with loanwords such as "Romania", idioms such as "the old man at the frontier loses his horse", and two-part allegorical sayings such as "the nephew carries a lantern".
To address this problem, this patent proposes, on the basis of existing language models, a new method that incorporates word-like information into the pre-training and fine-tuning of a language model.
[summary of the invention]
To remedy the defects of existing language models, namely low prediction accuracy and high cost, the present invention provides a language-model pre-training method incorporating word-like information.
To solve the above technical problem, the present invention provides a language-model pre-training method incorporating word-like information, comprising the following steps: S1, providing a pre-training model and a pre-training text; S2, extracting character strings from the pre-training text to form a vocabulary; S3, extracting two sentences from the pre-training text as the training sentence and splitting the training sentence into a character sequence; S4, matching the character strings from step S2 against the characters in the character sequence, and marking the character strings that match characters in the sequence; S5, masking or replacing a preset proportion of the characters in the character sequence, and feeding the masked or replaced training sentence and the marked character strings simultaneously into the pre-training model to train and optimize it; and S6, repeating steps S2-S5 until the pre-training model reaches the set optimization condition, yielding the optimized pre-training model.
Preferably, in step S2, the character strings are obtained with a word-extraction algorithm or extracted manually.
Preferably, in step S3, a [sep] token is appended to the end of each of the two extracted training sentences, and a [cls] token is added at the beginning of the first sentence. In step S4, the character strings are marked using their position information and/or length information.
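Assuming the bracketed token names used in the text, this sentence-formatting step can be sketched as follows (a minimal illustration, not the patent's actual implementation):

```python
def format_training_pair(sent_a, sent_b):
    """Build the model input for one training pair: a [cls] token at the
    beginning of the first sentence and a [sep] token appended to the end
    of each sentence, as described for step S3."""
    return ["[cls]"] + list(sent_a) + ["[sep]"] + list(sent_b) + ["[sep]"]

tokens = format_training_pair("人工智能", "实验室很犀利")
```

The resulting list interleaves the special tokens with the single-character sequence that later steps mask and match against the vocabulary.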
Preferably, in step S6, each time step S2 is executed, two sentences are extracted from the pre-training file as the training sentence, until all sentences in the pre-training text have been extracted. The two sentences extracted each time are either adjacent or non-adjacent; when extraction is complete, adjacent pairs and non-adjacent pairs each account for 40-70% of all pairs, the two proportions summing to 100%.
Preferably, step S5 specifically comprises the following steps: S51, establishing an objective function for the pre-training model; S52, selecting 15% of the characters in the character sequence and masking or replacing them; S53, feeding the masked or replaced training sentence and the marked character strings simultaneously into the pre-training model; S54, predicting the masked or replaced words with the pre-training model to obtain vector representations of the masked or replaced words; and S55, computing the objective function from these vector representations and optimizing the pre-training model.
Preferably, the language-model pre-training method incorporating word-like information further comprises step S7: using the vocabulary formed in step S2, performing task fine-tuning on the optimized pre-training model obtained in step S6.
Preferably, step S7 specifically comprises the following steps: S71, providing a fine-tuning task text; S72, splitting the fine-tuning task text into a character sequence; S73, matching the character strings from step S2 against the characters in the character sequence of step S72 and marking the matching character strings; and S74, feeding the character sequence and the marked character strings simultaneously into the optimized pre-training model and fine-tuning it.
Preferably, in step S74, the optimized pre-training model is further optimized by optimizing the objective function through a fully connected layer or a CRF network.
Preferably, the pre-training model comprises an embedding layer, a character-level encoder, a word-level encoder, and multiple attention encoders. The embedding layer receives the masked or replaced training sentence from step S5 and the marked character strings from step S4; it converts each character into a corresponding character embedding vector, converts each character string into a corresponding string embedding vector, and adds the corresponding position encoding to every character embedding vector and string embedding vector. The character-level encoder receives the character embedding vectors with their position encodings and computes vector representations of the unmasked characters. The word-level encoder receives the string embedding vectors with their position encodings and computes the string vector representations. The representations of the unmasked characters and the string representations are fed simultaneously into the multiple attention encoders to obtain the vector representations of the masked or replaced words.
Preferably, the pre-training model further comprises a Linear network layer and a Softmax network layer; the character and string representations output by the attention encoders are fed into the Linear network layer and the Softmax network layer for further training and fine-tuning of the pre-training model.
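As a rough illustration of this data flow, and not the patent's actual architecture, the sketch below builds a joint attention memory from made-up character-level and string-level representations and lets a masked position attend over it. The vector dimensions, the random toy data, and the single-head dot-product attention are all illustrative assumptions.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Single-head dot-product attention: the query attends over every key
    # and returns the weighted sum of the value vectors.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Toy data flow: character representations (from the character-level encoder)
# and string representations (from the word-level encoder) form one joint
# memory that an attention encoder reads when reconstructing a masked word.
random.seed(0)
dim = 4
char_reprs = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]
string_reprs = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
memory = char_reprs + string_reprs
mask_query = [0.1] * dim  # stand-in representation of a [mask] position
mask_vector = attend(mask_query, memory, memory)
```

Because the string representations sit in the same memory as the character representations, the masked position can draw on whole-word information, which is the point of pairing the word-level encoder with the character-level encoder.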
Compared with the prior art, the language-model pre-training method and pre-training model incorporating word-like information provided by the present invention have the following beneficial effects:
A pre-training model and a pre-training text are first provided; character strings are extracted from the pre-training text to form a vocabulary; then two sentences are extracted from the pre-training text as the training sentence and split into a character sequence. By matching the character strings against the characters in the character sequence and marking the strings that match, the model can predict masked or replaced words using not only the character vectors of the character sequence but also the information carried by the matched character strings. The marking typically uses the strings' position and length information, so the correlations between character strings can be exploited to predict the masked or replaced words, improving the accuracy of the pre-training model on masked or replaced words. The optimized pre-training model therefore performs better across multiple tasks, for example downstream tasks such as Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis, natural-language inference, sentence classification, machine reading comprehension and article classification.
In the pre-training model provided by the present invention, comprising the embedding layer, character-level encoder, word-level encoder and multiple attention encoders, the word-level encoder receives the string embedding vectors with their position encodings and computes the string representations, and the attention encoders receive the representations of the unmasked characters together with the string representations to compute the vector representations of the masked or replaced words. Pairing a word-level encoder with the character-level encoder means that, when the attention encoders compute the vector representations of the masked or replaced words, those representations capture the masked or replaced words better, improving prediction accuracy. The optimized pre-training model thus performs better across multiple tasks, for example downstream tasks such as Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis and document classification. Moreover, because the pre-training model provided by the present invention incorporates word-boundary information, it has stronger generative ability and can be applied to tasks such as keyword generation, article continuation and article summarization, producing higher-quality sentences when performing these tasks.
[Description of the drawings]
Fig. 1 is a flow diagram of the language-model pre-training method incorporating word-like information in the first embodiment of the present invention;
Fig. 2 is a module diagram of the pre-training model used by the method of the first embodiment;
Fig. 3 is a diagram of matching the character sequence against the character strings in step S4 of the method of the first embodiment;
Fig. 4 is a detailed flow chart of step S5 of the method of the first embodiment;
Fig. 5 is a diagram of the inputs to the pre-training model corresponding to steps S53 and S54 of the method of the first embodiment;
Fig. 6 is a flow diagram of a variant of the method of the first embodiment;
Fig. 7 is a detailed flow chart of step S7 of the method of the first embodiment;
Fig. 8 is a module diagram of the electronic device provided in the second embodiment of the present invention;
Fig. 9 is a structural diagram of a computer system suitable for implementing a server of an embodiment of the present invention.
Description of symbols:
11, embeding layer;12, character level encoder;13, word level encoder;14, attention encoder;60, electronics is set
It is standby;601, memory;602, processor;800, computer system;801, central processing unit (CPU);802, memory
(ROM);803,RAM;804, bus;805, I/O interface;806, importation;807, output par, c;808, storage section;
809, communications portion;810, driver;811, detachable media.
[specific embodiment]
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the present invention and are not intended to limit it.
Referring to Fig. 1, the first embodiment of the present invention provides a language-model pre-training method incorporating word-like information, comprising the following steps:
S1, provide a pre-training model and a pre-training text;
S2, extract character strings from the pre-training text to form a vocabulary;
S3, extract two sentences from the pre-training text as the training sentence and split the training sentence into a character sequence;
S4, match the character strings from step S2 against the characters in the character sequence, and mark the character strings that match characters in the sequence;
S5, mask or replace a preset proportion of the characters in the character sequence, and feed the masked or replaced training sentence and the marked character strings simultaneously into the pre-training model to train and optimize it;
S6, repeat steps S2-S5 until the pre-training model reaches the set optimization condition, yielding the optimized pre-training model.
In step S1, the pre-training text is selected from plain-text sources such as Wikipedia, news corpora, medical question-answering corpora and financial report data.
Referring to Fig. 2, in step S1 the pre-training model is obtained by improving an existing autoencoding language model, including but not limited to the BERT (Bidirectional Encoder Representations from Transformers) language model. The pre-training model comprises module units such as the embedding layer 11, the character-level encoder 12, the word-level encoder 13 and multiple attention encoders 14; the xN in Fig. 2 indicates that further attention encoders 14 are omitted from the figure.
The embedding layer 11 receives the masked or replaced training sentence from step S5 and the marked character strings from step S4; it converts each character into a corresponding character embedding vector, converts each character string into a corresponding string embedding vector, and adds the corresponding position encoding to every character embedding vector and string embedding vector.
The character-level encoder 12 receives the character embedding vectors with their position encodings and computes vector representations of the unmasked characters.
The word-level encoder 13 receives the string embedding vectors with their position encodings and computes the string vector representations.
The multiple attention encoders 14 receive the representations of the unmasked characters together with the string representations and compute the vector representations of the masked or replaced words.
The pre-training model further includes a Linear network layer 15 and a Softmax network layer 16; the vector representations of the masked or replaced words output by the attention encoders 14 are fed into the Linear network layer 15 and the Softmax network layer 16 to complete the optimization and fine-tuning tasks of the pre-training model.
In step S2, character strings are extracted from the pre-training text to form a vocabulary. In this step the character strings can be obtained with a word-extraction algorithm, or extracted manually, to form the vocabulary. A word-extraction algorithm is generally also called a segmentation algorithm or a string-matching segmentation algorithm: following a certain strategy, such an algorithm matches the string to be analyzed against the words of a sufficiently large pre-built dictionary; if an entry is found, the match succeeds and that word is identified. Optionally, the extraction algorithm includes but is not limited to Accessor Variety. It should of course be understood that in this step word extraction can also be performed on the pre-training text manually, with the extracted character strings placed into the vocabulary. When the pre-training text contains relatively rare character strings such as two-part allegorical sayings, idioms or common sayings, adding them to the vocabulary through manual extraction enriches the content of the vocabulary and improves the optimization of the pre-training model.
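As a toy illustration of what such a word-extraction step might look like (the patent names Accessor Variety, which is not implemented here), the sketch below simply keeps recurring substrings as candidate vocabulary entries; the frequency threshold, length cap and example sentence are all assumptions.

```python
from collections import Counter

def extract_strings(text, max_len=4, min_count=2):
    """Very simplified stand-in for a word-extraction algorithm: collect
    every substring of length 2..max_len and keep those that recur at
    least min_count times. Only the interface matters here:
    text in, vocabulary of character strings out."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {s for s, c in counts.items() if c >= min_count}

vocab = extract_strings("人工智能改变世界，人工智能实验室研究人工智能。")
# "人工智能" occurs three times, so it and its substrings enter the vocabulary
```

A real extractor would score boundary statistics rather than raw counts, but the output, a set of multi-character strings, plugs into step S4 either way.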
In step S3, two sentences are extracted from the pre-training text as the training sentence, and the training sentence is split into a character sequence, that is, into a sequence whose minimum unit is the single character. The training sentence can be split into a character sequence with a split function. Step S3 further includes the following operation: a [sep] token is appended to the end of each of the two extracted training sentences, and a [cls] token is added at the beginning of the first sentence.
Referring to Fig. 3, in step S4 the character strings from step S2 are matched against the characters in the character sequence, and the character strings that match are marked. In this embodiment the character sequence corresponds to "人工智能实验室很犀利" ("the artificial-intelligence laboratory is very sharp"), i.e. the single characters 人, 工, 智, 能, 实, 验, 室, 很, 犀, 利, and the vocabulary contains the corresponding strings "人工" (artificial), "智能" (intelligence), "实验" (experiment), "实验室" (laboratory) and "犀利" (sharp); the result of matching the characters of the sequence against the strings is shown in Fig. 3. After the two are matched, each character string is marked with its position information and/or length information within the corresponding training sentence.
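A minimal sketch of this matching step, assuming the Fig. 3 example above; the (string, start, length) triple used as the mark is an assumed encoding of the position/length information:

```python
def match_strings(chars, vocab):
    """Mark every vocabulary string that occurs in the character sequence,
    recording (string, start position, length). Overlapping matches are
    all kept, as in Fig. 3, where both "实验" and "实验室" match."""
    marks = []
    for start in range(len(chars)):
        for end in range(start + 2, len(chars) + 1):
            s = "".join(chars[start:end])
            if s in vocab:
                marks.append((s, start, end - start))
    return marks

chars = list("人工智能实验室很犀利")
vocab = {"人工", "智能", "实验", "实验室", "犀利"}
marks = match_strings(chars, vocab)
# marks: [("人工", 0, 2), ("智能", 2, 2), ("实验", 4, 2),
#         ("实验室", 4, 3), ("犀利", 8, 2)]
```

A production system would bound the inner loop by the longest vocabulary entry or use a trie, but the quadratic scan is enough to show the marking scheme.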
In step S5, a preset proportion of the characters in the character sequence is masked or replaced, and the masked or replaced training sentence and the marked character strings are fed simultaneously into the pre-training model to train and optimize it. In this step, the preset proportion is normally the percentage of the sentence's characters that are masked or replaced, in the range 10-30%; in this embodiment the proportion selected is 15%.
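A minimal sketch of the masking step under the 15% setting of this embodiment; the sampling scheme and seed are assumptions, and the replacement variant (substituting another character instead of the [mask] token) is omitted for brevity:

```python
import random

def mask_characters(chars, ratio=0.15, seed=0):
    """Mask a preset proportion of the characters (15% in the embodiment)
    by substituting the [mask] token at randomly chosen positions."""
    rng = random.Random(seed)
    n = max(1, round(len(chars) * ratio))
    positions = set(rng.sample(range(len(chars)), n))
    masked = [("[mask]" if i in positions else c) for i, c in enumerate(chars)]
    return masked, positions

masked, positions = mask_characters(list("人工智能实验室很犀利"))
```

The returned positions are what the objective function of step S51 scores: the model must reconstruct exactly the characters hidden here.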
Referring to Fig. 4, step S5 specifically comprises the following steps:
S51, establish an objective function for the pre-training model;
S52, select 15% of the characters in the character sequence and mask or replace them;
S53, feed the masked or replaced training sentence and the marked character strings simultaneously into the pre-training model;
S54, predict the masked or replaced words with the pre-training model to obtain vector representations of the masked or replaced words; and
S55, compute the objective function from these vector representations and optimize the pre-training model.
Referring to Fig. 5, in step S53 the extracted training sentence is "人工智能实验室很犀利" ("the artificial-intelligence laboratory is very sharp"). The characters "工" and "犀" in it are replaced with [mask], and the masked training sentence is fed together with the marked character strings into the embedding layer 11 of the pre-training model. The embedding layer 11 converts each character into a corresponding character embedding vector, converts each character string into a corresponding string embedding vector, and adds the corresponding position encoding to every character embedding vector and string embedding vector. It will be appreciated that a position encoder is provided in the embedding layer 11; the position encoder adds the corresponding position encoding to every character embedding vector and string embedding vector, and the position encodings indicate where each character string occurs in the training sentence.
As shown in Fig. 5, c1-c10 correspond to the characters of the masked sequence 人, [mask], 智, 能, 实, 验, 室, 很, [mask], 利, and w1-w5 correspond to the embedding vectors of the character strings "人工" (artificial), "智能" (intelligence), "实验" (experiment), "实验室" (laboratory) and "犀利" (sharp).
Further, the character embedding vectors with their position encodings are fed into the character-level encoder 12, which computes vector representations of the unmasked characters; the string embedding vectors with their position encodings are fed into the word-level encoder 13, which computes the string vector representations.
In step S54, the representations of the unmasked characters and the string representations are fed simultaneously into the attention encoders 14 to obtain the vector representations of the [cls] token and of the masked or replaced characters. In step S54, the string representations carry position and length information as marks, and the character strings have been matched against the characters, so the attention encoders 14 compute the vector representations of the masked or replaced words with higher accuracy. The predicted words are therefore more accurate, and the model performs better across multiple tasks, for example downstream tasks such as Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis, natural-language inference, sentence classification, machine reading comprehension and article classification.
After step S55 is executed, the pre-training model has been optimized to some degree. Step S6 is then executed: steps S2-S5 are repeated until the pre-training model reaches the set optimization condition, yielding the optimized pre-training model. In this step, the set optimization condition corresponds to convergence of the objective function. Each time step S2 is executed, two sentences are extracted from the pre-training file as the training sentence, until all sentences in the pre-training text have been extracted. Each extracted pair of sentences is either adjacent or non-adjacent; when extraction is complete, adjacent pairs and non-adjacent pairs each account for 40-70% of all pairs, the two proportions summing to 100%. In this embodiment each kind accounts for 50%.
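One hedged reading of this pair-extraction rule is sketched below: walk over consecutive positions and, with the configured probability, keep the adjacent sentence as the second member or swap in a random non-adjacent one, so both kinds of pair appear near the 50/50 split of the embodiment. The sampling details (seed, how "non-adjacent" is drawn) are assumptions, not the patent's specification.

```python
import random

def sample_sentence_pairs(sentences, adjacent_ratio=0.5, seed=0):
    """Build (first sentence, second sentence, is_adjacent) training pairs
    with roughly adjacent_ratio of the pairs taken from adjacent text."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < adjacent_ratio:
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            # any sentence other than position i and its neighbours
            others = [s for j, s in enumerate(sentences) if abs(j - i) > 1]
            pairs.append((sentences[i], rng.choice(others), False))
    return pairs

pairs = sample_sentence_pairs([f"sent{k}" for k in range(200)])
adjacent_share = sum(1 for _, _, adj in pairs if adj) / len(pairs)
```

With enough pairs the adjacent share lands inside the 40-70% band the text prescribes.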
Referring to Fig. 6, the language-model pre-training method incorporating word-like information further comprises step S7: using the vocabulary formed in step S2, performing task fine-tuning on the optimized pre-training model obtained in step S6.
Referring to Fig. 7, step S7 specifically comprises the following steps:
S71, provide a fine-tuning task text;
S72, split the fine-tuning task text into a character sequence;
S73, match the character strings from step S2 against the characters in the character sequence of step S72 and mark the matching character strings;
S74, feed the character sequence and the marked character strings simultaneously into the optimized pre-training model and fine-tune it.
In step S71, the fine-tuning task text is likewise selected from plain-text sources such as Wikipedia, news corpora, medical question-answering corpora and financial report data, but the fine-tuning task text must not be identical to the pre-training text of step S1.
In step S74, the optimized pre-training model is further optimized by optimizing the objective function through a fully connected layer or a CRF network.
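The fully connected option can be sketched as a per-position linear projection followed by softmax; the toy dimensions, weights and the B/I/O label reading are assumptions, and the CRF alternative, which additionally models label transitions, is not implemented here.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def tagging_head(hidden_states, weights, bias):
    """Hypothetical fully connected fine-tuning head for step S74:
    project each position's output vector to per-label scores and
    normalize them with softmax."""
    out = []
    for h in hidden_states:
        scores = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(weights, bias)]
        out.append(softmax(scores))
    return out

# Toy: 3 positions with 2-dim states, 3 labels (read as B/I/O for word
# segmentation, one of the downstream tasks named in the text).
hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
probs = tagging_head(hidden, W, b)
```

During fine-tuning the cross-entropy between these per-position distributions and the task labels would serve as the objective function being optimized.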
Referring to Fig. 8, the second embodiment of the present invention provides an electronic device 60 comprising a memory 601 and a processor 602. A computer program is stored in the memory 601 and is arranged, when run, to perform the language-model pre-training method incorporating word-like information of the first embodiment; the processor 602 is arranged to execute, through the computer program, the language-model pre-training method incorporating word-like information of the first embodiment.
Referring now to Fig. 9, it shows the structural diagram of a computer system 800 suitable for implementing a terminal device/server of the embodiments of the present application. The terminal device/server shown in Fig. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 9, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random-access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, ROM 802 and RAM 803 are connected to each other through a bus 804; an input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse and the like; an output section 807 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom can be installed into the storage section 808 as needed.
According to the disclosed embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments disclosed by the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described herein may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on a management-side computer, partly on the management-side computer, as a stand-alone software package, partly on the management-side computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the management-side computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, program segment or portion of code, which contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should likewise be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Compared with the prior art, the language model pre-training method and pre-training model combining similar-word information provided by the present invention have the following beneficial effects:
A pre-training model and a pre-training text are first provided; character strings are extracted from the pre-training text to form a vocabulary; two sentences are then extracted from the pre-training text as training sentences and simultaneously divided into a single-character sequence. By matching the character strings against the characters in the single-character sequence and marking the character strings that match, the method can exploit string-level information rather than relying solely on the character-vector information of the single-character sequence when predicting covered or replaced characters. Since the marking is generally done with the position and length information of the strings, the correlations between character strings can be used to predict the covered or replaced characters, which improves the accuracy of the pre-training model on such characters. The optimized pre-training model therefore performs better across many tasks, for example downstream tasks such as Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis, natural language inference, sentence classification, machine reading comprehension and article classification.
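The matching-and-marking step that these effects rest on can be sketched in a few lines of Python. The vocabulary and sentence below are invented examples; in the actual method the vocabulary is formed from the pre-training text as in step S2.

```python
def mark_strings(chars, vocab):
    """Mark every vocabulary string that occurs in the character sequence.

    Returns a list of (string, start, length) tuples -- the position and
    length information used to annotate each matched string."""
    marks = []
    text = "".join(chars)
    for s in vocab:
        start = text.find(s)
        while start != -1:
            marks.append((s, start, len(s)))
            start = text.find(s, start + 1)  # find further occurrences
    return marks

# Toy single-character sequence and vocabulary ("language model pre-training")
chars = list("语言模型预训练")
vocab = ["语言", "模型", "预训练"]
print(mark_strings(chars, vocab))
# → [('语言', 0, 2), ('模型', 2, 2), ('预训练', 4, 3)]
```

Each mark pairs a matched string with its position and length, which is exactly the annotation that is fed into the pre-training model alongside the masked sentence.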
The pre-training model provided by the present invention includes an embedding layer, a character-level encoder, a word-level encoder and a plurality of attention encoders. The word-level encoder receives the string embedding vectors together with their corresponding position encodings and computes term-vector expressions, while the attention encoders receive the word-vector expressions of the uncovered or unreplaced characters and the term-vector expressions simultaneously to obtain vector expressions of the covered or replaced characters. Pairing a word-level encoder with the character-level encoder means that, when the attention encoders compute the vector expressions of covered or replaced characters, the resulting vectors represent those characters better, improving prediction accuracy. The optimized pre-training model accordingly performs better on many downstream tasks, such as Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis and document classification. Moreover, because the pre-training model provided by the present invention incorporates word-boundary information, it has stronger generative capability and can be applied to tasks such as keyword generation, article continuation and article summarization, producing sentences of higher quality when performing these tasks.
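The way the attention encoders consume both streams at once can be illustrated with a toy scaled dot-product attention in NumPy. This is a hypothetical sketch: the dimensions, random values and single-head, single-query form are simplifications for illustration, not the patent's actual architecture.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: the query attends over a memory that
    concatenates character-level and word-level representations."""
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # softmax over memory slots
    return w @ values

rng = np.random.default_rng(1)
d = 8
char_vecs = rng.normal(size=(6, d))   # word-vector expressions of unmasked characters
word_vecs = rng.normal(size=(2, d))   # term-vector expressions of matched strings
memory = np.concatenate([char_vecs, word_vecs])  # both streams fed in together
masked_query = rng.normal(size=(1, d))           # stand-in for a covered position
out = attention(masked_query, memory, memory)
print(out.shape)  # (1, 8): the vector expression of the covered character
```

Because the memory contains the term-vector expressions as well as the character vectors, the output for a covered position can draw on string-level context rather than on individual characters alone.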
The foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A language model pre-training method combining similar-word information, characterized in that it comprises the following steps:
S1, providing a pre-training model and a pre-training text;
S2, extracting character strings from the pre-training text to form a vocabulary;
S3, extracting two sentences from the pre-training text as training sentences while dividing the training sentences into a single-character sequence;
S4, matching the character strings of step S2 against the characters in the single-character sequence, and marking the character strings that match characters in the single-character sequence;
S5, covering or replacing a preset proportion of the characters in the single-character sequence, and inputting the covered-or-replaced training sentences and the marked character strings simultaneously into the pre-training model so as to train and optimize the pre-training model;
S6, repeating steps S2-S5 until the pre-training model reaches a set optimization condition, so as to obtain an optimized pre-training model.
2. The language model pre-training method combining similar-word information according to claim 1, characterized in that in step S2 the character strings are obtained by a derivation algorithm or are extracted manually.
3. The language model pre-training method combining similar-word information according to claim 1, characterized in that in step S3, [sep] is appended to the end of each of the two extracted training sentences and [cls] is prepended to the beginning of the first sentence; and in step S4, the character strings are marked using their position information and/or length information.
4. The language model pre-training method combining similar-word information according to claim 1, characterized in that in step S6, each time step S2 is executed, two sentences are extracted pair by pair from the pre-training text as training sentences until all sentences in the pre-training text have been extracted; the two sentences extracted each time are either adjacent or non-adjacent, and when extraction is complete the proportions of adjacent sentence pairs and non-adjacent sentence pairs each lie in the range of 40-70% and sum to 100%.
5. The language model pre-training method combining similar-word information according to claim 1, characterized in that step S5 specifically comprises the following steps:
S51, establishing an objective function for the pre-training model;
S52, covering or replacing 15% of the characters in the single-character sequence;
S53, inputting the covered-or-replaced training sentences and the marked character strings simultaneously into the pre-training model;
S54, predicting the covered or replaced characters with the pre-training model to obtain vector expressions representing the covered or replaced characters; and
S55, computing the objective function from the vector expressions and optimizing the pre-training model.
6. The language model pre-training method combining similar-word information according to claim 1, characterized in that the method further comprises the following step: step S7, performing task fine-tuning on the optimized pre-training model obtained in step S6 in combination with the vocabulary formed in step S2.
7. The language model pre-training method combining similar-word information according to claim 6, characterized in that step S7 specifically comprises the following steps:
S71, providing a fine-tuning task text;
S72, dividing the fine-tuning task text into a single-character sequence;
S73, matching the character strings of step S2 against the characters in the single-character sequence of step S72 and marking the matched character strings;
S74, inputting the single-character sequence and the marked character strings simultaneously into the optimized pre-training model so as to fine-tune the pre-training model.
8. The language model pre-training method combining similar-word information according to claim 7, characterized in that in step S74, the optimization of the optimized pre-training model is realized by optimizing the objective function through a fully connected layer or a CRF network.
9. The language model pre-training method combining similar-word information according to any one of claims 1-8, characterized in that the pre-training model comprises an embedding layer, a character-level encoder, a word-level encoder and a plurality of attention encoders; wherein
the embedding layer receives the training sentences covered or replaced in step S5 and the character strings marked in step S4, converts each character into a corresponding character embedding vector, converts each character string into a corresponding string embedding vector, and adds the corresponding position encoding to each character embedding vector and each string embedding vector;
the character-level encoder receives the character embedding vectors together with their corresponding position encodings and computes word-vector expressions of the characters that are not covered or replaced;
the word-level encoder receives the string embedding vectors together with their corresponding position encodings and computes term-vector expressions;
the attention encoders, of which there are several, receive the word-vector expressions of the uncovered or unreplaced characters and the term-vector expressions simultaneously to obtain vector expressions of the covered or replaced characters.
10. The language model pre-training method combining similar-word information according to claim 9, characterized in that the pre-training model further comprises a Linear network layer and a Softmax network layer, and the word-vector expressions and the term-vector expressions, after being output by the attention encoders, are input into the Linear network layer and the Softmax network layer so as to further train and fine-tune the pre-training model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910775453.4A CN110489555B (en) | 2019-08-21 | 2019-08-21 | Language model pre-training method combined with similar word information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110489555A true CN110489555A (en) | 2019-11-22 |
CN110489555B CN110489555B (en) | 2022-03-08 |
Family
ID=68552689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910775453.4A Active CN110489555B (en) | 2019-08-21 | 2019-08-21 | Language model pre-training method combined with similar word information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489555B (en) |
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111008531A (en) * | 2019-12-06 | 2020-04-14 | 北京金山数字娱乐科技有限公司 | Training method and device for sentence word selection model and sentence word selection method and device |
CN111144115A (en) * | 2019-12-23 | 2020-05-12 | 北京百度网讯科技有限公司 | Pre-training language model obtaining method and device, electronic equipment and storage medium |
CN111222337A (en) * | 2020-01-08 | 2020-06-02 | 山东旗帜信息有限公司 | Training method and device for entity recognition model |
CN111259663A (en) * | 2020-01-14 | 2020-06-09 | 北京百度网讯科技有限公司 | Information processing method and device |
CN111401077A (en) * | 2020-06-02 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111460832A (en) * | 2020-03-27 | 2020-07-28 | 北京百度网讯科技有限公司 | Object coding method, device, system, equipment and computer storage medium |
CN111522944A (en) * | 2020-04-10 | 2020-08-11 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
CN111581383A (en) * | 2020-04-30 | 2020-08-25 | 上海电力大学 | Chinese text classification method based on ERNIE-BiGRU |
CN111737383A (en) * | 2020-05-21 | 2020-10-02 | 百度在线网络技术(北京)有限公司 | Method for extracting spatial relation of geographic position points and method and device for training extraction model |
CN111798986A (en) * | 2020-07-07 | 2020-10-20 | 云知声智能科技股份有限公司 | Data enhancement method and equipment |
CN111814448A (en) * | 2020-07-03 | 2020-10-23 | 苏州思必驰信息科技有限公司 | Method and device for quantizing pre-training language model |
CN112016300A (en) * | 2020-09-09 | 2020-12-01 | 平安科技(深圳)有限公司 | Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium |
CN112307212A (en) * | 2020-11-11 | 2021-02-02 | 上海昌投网络科技有限公司 | Public opinion delivery monitoring method for advertisement delivery |
CN112329391A (en) * | 2020-11-02 | 2021-02-05 | 上海明略人工智能(集团)有限公司 | Target encoder generation method, target encoder generation device, electronic equipment and computer readable medium |
CN112329392A (en) * | 2020-11-05 | 2021-02-05 | 上海明略人工智能(集团)有限公司 | Target encoder construction method and device for bidirectional encoding |
CN112635013A (en) * | 2020-11-30 | 2021-04-09 | 泰康保险集团股份有限公司 | Medical image information processing method and device, electronic equipment and storage medium |
CN112749251A (en) * | 2020-03-09 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Text processing method and device, computer equipment and storage medium |
CN113011176A (en) * | 2021-03-10 | 2021-06-22 | 云从科技集团股份有限公司 | Language model training and language reasoning method, device and computer storage medium thereof |
CN113032559A (en) * | 2021-03-15 | 2021-06-25 | 新疆大学 | Language model fine-tuning method for low-resource adhesion language text classification |
CN113033192A (en) * | 2019-12-09 | 2021-06-25 | 株式会社理光 | Training method and device for sequence labels and computer readable storage medium |
CN113032560A (en) * | 2021-03-16 | 2021-06-25 | 北京达佳互联信息技术有限公司 | Sentence classification model training method, sentence processing method and equipment |
WO2021169288A1 (en) * | 2020-02-26 | 2021-09-02 | 平安科技(深圳)有限公司 | Semantic understanding model training method and apparatus, computer device, and storage medium |
CN113360751A (en) * | 2020-03-06 | 2021-09-07 | 百度在线网络技术(北京)有限公司 | Intention recognition method, apparatus, device and medium |
CN113468877A (en) * | 2021-07-09 | 2021-10-01 | 浙江大学 | Language model fine-tuning method and device, computing equipment and storage medium |
CN113486141A (en) * | 2021-07-29 | 2021-10-08 | 宁波薄言信息技术有限公司 | Text, resume and financing bulletin extraction method based on SegaBert pre-training model |
CN113496122A (en) * | 2020-04-08 | 2021-10-12 | 中移(上海)信息通信科技有限公司 | Named entity identification method, device, equipment and medium |
CN113591475A (en) * | 2021-08-03 | 2021-11-02 | 美的集团(上海)有限公司 | Unsupervised interpretable word segmentation method and device and electronic equipment |
CN113656763A (en) * | 2020-04-24 | 2021-11-16 | 支付宝(杭州)信息技术有限公司 | Method and device for determining small program feature vector and electronic equipment |
CN113779185A (en) * | 2020-06-10 | 2021-12-10 | 武汉Tcl集团工业研究院有限公司 | Natural language model generation method and computer equipment |
CN113836297A (en) * | 2021-07-23 | 2021-12-24 | 北京三快在线科技有限公司 | Training method and device for text emotion analysis model |
CN113887245A (en) * | 2021-12-02 | 2022-01-04 | 腾讯科技(深圳)有限公司 | Model training method and related device |
CN113961669A (en) * | 2021-10-26 | 2022-01-21 | 杭州中软安人网络通信股份有限公司 | Training method of pre-training language model, storage medium and server |
WO2022022421A1 (en) * | 2020-07-29 | 2022-02-03 | 北京字节跳动网络技术有限公司 | Language representation model system, pre-training method and apparatus, device and medium |
CN114186043A (en) * | 2021-12-10 | 2022-03-15 | 北京三快在线科技有限公司 | Pre-training method, device, equipment and storage medium |
CN114444488A (en) * | 2022-01-26 | 2022-05-06 | 中国科学技术大学 | Reading understanding method, system, device and storage medium for few-sample machine |
CN114792097A (en) * | 2022-05-14 | 2022-07-26 | 北京百度网讯科技有限公司 | Method and device for determining prompt vector of pre-training model and electronic equipment |
CN115017915A (en) * | 2022-05-30 | 2022-09-06 | 北京三快在线科技有限公司 | Model training and task executing method and device |
WO2022222854A1 (en) * | 2021-04-18 | 2022-10-27 | 华为技术有限公司 | Data processing method and related device |
CN117235233A (en) * | 2023-10-24 | 2023-12-15 | 之江实验室 | Automatic financial report question-answering method and device based on large model |
CN113033192B (en) * | 2019-12-09 | 2024-04-26 | 株式会社理光 | Training method and device for sequence annotation and computer readable storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160027433A1 (en) * | 2014-07-24 | 2016-01-28 | International Business Machines Corporation | Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods |
CN108228758A (en) * | 2017-12-22 | 2018-06-29 | 北京奇艺世纪科技有限公司 | A kind of file classification method and device |
CN108415896A (en) * | 2017-02-09 | 2018-08-17 | 北京京东尚科信息技术有限公司 | Deep learning model training method, segmenting method, training system and Words partition system |
CN109086267A (en) * | 2018-07-11 | 2018-12-25 | 南京邮电大学 | A kind of Chinese word cutting method based on deep learning |
CN109933795A (en) * | 2019-03-19 | 2019-06-25 | 上海交通大学 | Based on context-emotion term vector text emotion analysis system |
CN110032644A (en) * | 2019-04-03 | 2019-07-19 | 人立方智能科技有限公司 | Language model pre-training method |
CN110083831A (en) * | 2019-04-16 | 2019-08-02 | 武汉大学 | A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF |
Also Published As
Publication number | Publication date |
---|---|
CN110489555B (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110489555A (en) | A kind of language model pre-training method of combination class word information | |
CN107943847B (en) | Business connection extracting method, device and storage medium | |
CN110232114A (en) | Sentence intension recognizing method, device and computer readable storage medium | |
CN109145294B (en) | Text entity identification method and device, electronic equipment and storage medium | |
CN106844349B (en) | Comment spam recognition methods based on coorinated training | |
CN111062217B (en) | Language information processing method and device, storage medium and electronic equipment | |
CN110020438A (en) | Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence | |
CN107330011A (en) | The recognition methods of the name entity of many strategy fusions and device | |
CN112101041B (en) | Entity relationship extraction method, device, equipment and medium based on semantic similarity | |
CN110222188A (en) | A kind of the company's bulletin processing method and server-side of multi-task learning | |
CN110489750A (en) | Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF | |
CN110196982A (en) | Hyponymy abstracting method, device and computer equipment | |
CN115357719B (en) | Power audit text classification method and device based on improved BERT model | |
CN110334186A (en) | Data query method, apparatus, computer equipment and computer readable storage medium | |
CN113919366A (en) | Semantic matching method and device for power transformer knowledge question answering | |
CN111738018A (en) | Intention understanding method, device, equipment and storage medium | |
Yolchuyeva et al. | Self-attention networks for intent detection | |
CN113095063A (en) | Two-stage emotion migration method and system based on masking language model | |
Yuan et al. | Personalized sentence generation using generative adversarial networks with author-specific word usage | |
CN110287396A (en) | Text matching technique and device | |
CN115510188A (en) | Text keyword association method, device, equipment and storage medium | |
CN113761875B (en) | Event extraction method and device, electronic equipment and storage medium | |
CN114239555A (en) | Training method of keyword extraction model and related device | |
Jiang et al. | Construction of segmentation and part of speech annotation model in ancient chinese | |
Liu | Supervised ensemble learning for Vietnamese tokenization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||