CN110489555A - Language model pre-training method combining word-like information - Google Patents

Language model pre-training method combining word-like information

Info

Publication number
CN110489555A
Authority
CN
China
Prior art keywords
training
word
character string
model
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910775453.4A
Other languages
Chinese (zh)
Other versions
CN110489555B (en)
Inventor
白佳欣
宋彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd
Original Assignee
Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd filed Critical Innovation Workshop (guangzhou) Artificial Intelligence Research Co Ltd
Priority to CN201910775453.4A priority Critical patent/CN110489555B/en
Publication of CN110489555A publication Critical patent/CN110489555A/en
Application granted granted Critical
Publication of CN110489555B publication Critical patent/CN110489555B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Abstract

The present invention relates to the field of language processing technology, and in particular to a language model pre-training method combining word-like information, which comprises the following steps: S1, providing a pre-training model and a pre-training text; S2, extracting character strings to form a vocabulary; S3, extracting two sentences as training sentences and splitting the training sentences into single-character sequences; S4, matching the character strings from step S2 against the characters in the single-character sequences, and marking the character strings that match characters in the single-character sequences; S5, masking or replacing a preset proportion of characters selected from the single-character sequences, and feeding the masked or replaced training sentences together with the marked character strings into the pre-training model to train and optimize it; S6, repeating steps S2-S5 until the pre-training model reaches a set optimization condition, thereby obtaining an optimized pre-training model. The language model pre-training method and pre-training model combining word-like information provided by the invention achieve better performance on multiple downstream tasks.

Description

Language model pre-training method combining word-like information
[Technical Field]
The present invention relates to the field of language processing technology, and in particular to a language model pre-training method combining word-like information.
[Background Art]
Current state-of-the-art pre-trained language models fall into two classes: autoregressive language models and autoencoding language models. GPT and GPT-2 are autoregressive language models with strong performance. The training objective of an autoregressive model is to correctly predict the next word from the preceding context. BERT is the representative autoencoding language model; its training objective is to correctly infer masked or replaced words from the surrounding context. Each type of model has its own advantages and disadvantages. An autoregressive model can only condition on the preceding context and therefore cannot handle tasks that require combining left and right context. An autoencoding language model, on the other hand, can use context in both directions, but during pre-training a [mask] token is added to the training corpus to replace the original target word so that the prediction target is hidden, while this [mask] token never appears during fine-tuning on specific tasks. As a result, the inputs of the pre-trained language model during pre-training and fine-tuning do not match, which degrades the overall performance of the model. Recently, XLNet was proposed to solve both of these problems at once, allowing a pre-trained language model to use bidirectional context without introducing the [mask] token.
However, the above language models do not make full use of coarser-grained units such as words, phrases, and entities that appear in the pre-training and fine-tuning corpora. Such information is especially important for Chinese tasks. Unlike English, Chinese has no explicit word boundaries such as spaces, which makes it harder for a model to learn the overall meaning of two-character or multi-character words from a sequence of single characters.
Recently, the BERT-wwm model was proposed as an optimization of BERT for this problem on Chinese. BERT-wwm differs from BERT only in the pre-processing of the training corpus. When BERT applies the masking operation to the pre-training corpus, 15% of the single characters are replaced with [mask] and the remaining characters are kept. BERT-wwm instead first segments the raw corpus with a word segmentation tool and then performs the same masking operation on whole words. Slightly earlier, ERNIE released by Baidu was also an improvement of BERT for this problem. ERNIE uses a multi-level masking strategy, which includes character-level masking, phrase-level masking, and entity-level masking. To achieve multi-level masking, Baidu additionally used Baidu Baike, Baidu Tieba, and question-answering data on top of Chinese Wikipedia data. Although ERNIE used more training data and learned more knowledge, its performance on downstream tasks at the time was comparable to BERT-wwm.
However, learning word boundary information through multi-level masking strategies still has problems. First, the effectiveness of the masking strategy depends on information beyond the text itself: BERT-wwm depends on the output of a word segmenter, and ERNIE depends on external knowledge. In practice, using such additional information has the following drawbacks. First, the quality of the information cannot be guaranteed; for example, the effectiveness of BERT-wwm depends on the quality of Chinese word segmentation. Second, high-quality information requires large-scale collection and annotation, which adds extra cost to pre-training a language model. Third, masking only whole words still under-uses word information, because a word may carry an extended meaning unrelated to its literal characters, for example transliterations such as "Romania", idioms such as "the old frontiersman loses his horse", and two-part allegorical sayings.
To address this problem, this patent proposes a new method that incorporates word-like information into the pre-training and fine-tuning of a language model, built on top of existing language models.
[Summary of the Invention]
To overcome the low prediction accuracy and high cost of existing language models, the present invention provides a language model pre-training method combining word-like information.
To solve the above technical problem, the present invention provides a language model pre-training method combining word-like information, which comprises the following steps: S1, providing a pre-training model and a pre-training text; S2, extracting character strings from the pre-training text to form a vocabulary; S3, extracting two sentences from the pre-training text as training sentences and splitting the training sentences into single-character sequences; S4, matching the character strings obtained in step S2 against the characters in the single-character sequences, and marking the character strings that match characters in the single-character sequences; S5, masking or replacing a preset proportion of characters selected from the single-character sequences, and feeding the masked or replaced training sentences together with the marked character strings into the pre-training model to train and optimize the pre-training model; S6, repeating steps S2-S5 until the pre-training model reaches a set optimization condition, thereby obtaining an optimized pre-training model.
Preferably, in step S2, the character strings are obtained by a word extraction algorithm or extracted manually.
Preferably, in step S3, a [sep] marker is appended to the end of each of the two extracted training sentences, and a [cls] marker is added to the beginning of the first sentence; in step S4, the character strings are marked using their position information and/or length information.
Preferably, in step S6, each time step S2 is executed, two sentences are extracted from the pre-training text as training sentences, one pair at a time, until all sentences in the pre-training text have been extracted. The two sentences extracted each time are either adjacent or non-adjacent; when the extraction is complete, adjacent sentence pairs and non-adjacent sentence pairs each account for 40-70% of the pairs, the two proportions summing to 100%.
Preferably, step S5 specifically comprises the following steps: S51, establishing an objective function for the pre-training model; S52, masking or replacing 15% of the characters selected from the single-character sequences; S53, feeding the masked or replaced training sentences and the marked character strings into the pre-training model at the same time; S54, predicting the masked or replaced characters with the pre-training model to obtain vector representations of the masked or replaced characters; and S55, computing the objective function from the vector representations and optimizing the pre-training model.
Preferably, the language model pre-training method combining word-like information further comprises the following step: step S7, fine-tuning the optimized pre-training model obtained in step S6 on a task, using the vocabulary formed in step S2.
Preferably, step S7 specifically comprises the following steps: S71, providing a fine-tuning task text; S72, splitting the fine-tuning task text into single-character sequences; S73, matching the character strings from step S2 against the characters in the single-character sequences from step S72 and marking the matched character strings; S74, feeding the single-character sequences and the marked character strings into the optimized pre-training model at the same time to fine-tune the pre-training model.
Preferably, in step S74, the optimized pre-training model is further optimized by optimizing the objective function through a fully connected layer or a CRF network.
Preferably, the pre-training model comprises an embedding layer, a character-level encoder, a word-level encoder, and multiple attention encoders. The embedding layer receives the masked or replaced training sentences from step S5 and the marked character strings from step S4; it converts each single character into a corresponding character embedding vector and each character string into a corresponding string embedding vector, and adds the corresponding position encoding to each character embedding vector and each string embedding vector. The character-level encoder takes the character embedding vectors and their position encodings as input and computes character vector representations of the characters that are not masked or replaced. The word-level encoder takes the string embedding vectors and their position encodings as input and computes word vector representations. The attention encoders, of which there are several, take the character vector representations of the unmasked characters and the word vector representations as joint input to obtain vector representations of the masked or replaced characters.
Preferably, the pre-training model further comprises a Linear network layer and a Softmax network layer; the character vector representations and the word vector representations output by the attention encoders are fed into the Linear network layer and the Softmax network layer for further training and fine-tuning of the pre-training model.
Compared with the prior art, the language model pre-training method and pre-training model combining word-like information provided by the present invention have the following beneficial effects:
First, a pre-training model and a pre-training text are provided, character strings are extracted from the pre-training text to form a vocabulary, and two sentences are extracted from the pre-training text as training sentences and split into single-character sequences. By matching the character strings against the characters in the single-character sequences and marking the character strings that match, the model can predict the masked or replaced characters using the information carried by the character strings rather than only the character vectors of the single-character sequence. The character strings are usually marked with their position and length information, so the associations between character strings can be exploited to predict the masked or replaced characters. This improves the accuracy of the pre-training model on masked or replaced characters, and the optimized pre-training model performs better on multiple downstream tasks, for example Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis, natural language inference, sentence classification, machine reading comprehension, and article classification.
In the pre-training model provided by the present invention, which comprises an embedding layer, a character-level encoder, a word-level encoder, and multiple attention encoders, the word-level encoder takes the string embedding vectors and their position encodings as input and computes word vector representations, and the attention encoders take the character vector representations of the unmasked characters and the word vector representations as joint input to obtain vector representations of the masked or replaced characters. Pairing a word-level encoder with the character-level encoder means that, when the attention encoders compute the vector representations of the masked or replaced characters, those representations capture the masked or replaced characters more faithfully, which improves prediction accuracy. The optimized pre-training model therefore performs better on multiple tasks, for example Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis, and document classification. Moreover, because the pre-training model provided by the present invention incorporates word boundary information, it has stronger generation ability and can be applied to tasks such as keyword generation, article continuation, and article summarization; the sentences it generates in these tasks are of higher quality.
[Brief Description of the Drawings]
Fig. 1 is a flow diagram of the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 2 is a block diagram of the pre-training model used by the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 3 is a schematic diagram of matching the single-character sequence against the character strings in step S4 of the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 4 is a detailed flow chart of step S5 of the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 5 is a schematic diagram of the inputs to the pre-training model and the corresponding operations in steps S53 and S54 of the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 6 is a flow diagram of a variant of the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 7 is a detailed flow chart of step S7 of the language model pre-training method combining word-like information in the first embodiment of the present invention;
Fig. 8 is a block diagram of the electronic device provided in the second embodiment of the present invention;
Fig. 9 is a structural diagram of a computer system suitable for implementing the server of the embodiments of the present invention.
Description of reference numerals:
11, embedding layer; 12, character-level encoder; 13, word-level encoder; 14, attention encoder; 60, electronic device; 601, memory; 602, processor; 800, computer system; 801, central processing unit (CPU); 802, read-only memory (ROM); 803, RAM; 804, bus; 805, I/O interface; 806, input portion; 807, output portion; 808, storage portion; 809, communication portion; 810, driver; 811, removable medium.
[Detailed Description of the Embodiments]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only serve to illustrate the present invention and are not intended to limit it.
Referring to Fig. 1, the first embodiment of the present invention provides a language model pre-training method combining word-like information, which comprises the following steps:
S1, providing a pre-training model and a pre-training text;
S2, extracting character strings from the pre-training text to form a vocabulary;
S3, extracting two sentences from the pre-training text as training sentences and splitting the training sentences into single-character sequences;
S4, matching the character strings from step S2 against the characters in the single-character sequences, and marking the character strings that match characters in the single-character sequences;
S5, masking or replacing a preset proportion of characters selected from the single-character sequences, and feeding the masked or replaced training sentences together with the marked character strings into the pre-training model at the same time to train and optimize the pre-training model;
S6, repeating steps S2-S5 until the pre-training model reaches the set optimization condition, thereby obtaining the optimized pre-training model.
In step S1, the pre-training text is selected from plain-text sources such as Wikipedia, news corpora, medical question-answering corpora, and financial report data.
Referring to Fig. 2, in step S1 the pre-training model is obtained by improving an existing autoencoding language model, including but not limited to the BERT (Bidirectional Encoder Representations from Transformers) language model. The pre-training model comprises modules such as an embedding layer 11, a character-level encoder 12, a word-level encoder 13, and multiple attention encoders 14; the xN in Fig. 2 indicates that several attention encoders 14 are omitted from the drawing.
The embedding layer 11 receives the masked or replaced training sentences from step S5 and the marked character strings from step S4; it converts each single character into a corresponding character embedding vector and each character string into a corresponding string embedding vector, and adds the corresponding position encoding to each character embedding vector and each string embedding vector.
The character-level encoder 12 takes the character embedding vectors and their position encodings as input and computes character vector representations of the characters that are not masked or replaced.
The word-level encoder 13 takes the string embedding vectors and their position encodings as input and computes word vector representations.
The attention encoders 14, of which there are several, take the character vector representations of the unmasked characters and the word vector representations as joint input to obtain vector representations of the masked or replaced characters.
The pre-training model further comprises a Linear network layer 15 and a Softmax network layer 16. The vector representations of the masked or replaced characters output by the attention encoders 14 are fed into the Linear network layer 15 and the Softmax network layer 16 to complete the optimization of the pre-training model and the fine-tuning tasks.
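By way of illustration only, a minimal Python (PyTorch) sketch of such a prediction layer is given below; the class name, the hidden size, the character vocabulary size, and the folding of the Softmax into a cross-entropy loss are assumptions introduced here and do not define the embodiment:

import torch.nn as nn
import torch.nn.functional as F

class MaskedCharPredictionHead(nn.Module):
    """Linear layer over the vector representations of the masked or replaced
    characters; the Softmax is applied implicitly inside the cross-entropy loss."""
    def __init__(self, dim=768, char_vocab_size=21128):
        super().__init__()
        self.linear = nn.Linear(dim, char_vocab_size)

    def forward(self, masked_repr, target_ids):
        # masked_repr: (num_masked, dim); target_ids: (num_masked,)
        logits = self.linear(masked_repr)
        return F.cross_entropy(logits, target_ids)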
In step S2, character strings are extracted from the pre-training text to form a vocabulary. In this step, the character strings can be obtained by a word extraction algorithm or by manual extraction. In general, a word extraction algorithm is also called a segmentation algorithm or a string-matching segmentation algorithm; such an algorithm matches candidate strings against the entries of a sufficiently large pre-built dictionary according to some strategy, and if an entry is found the match succeeds, that is, the word is recognized. Optionally, the word extraction algorithm includes, but is not limited to, Accessor Variety. It should of course be understood that in this step words can also be extracted from the pre-training text manually and the resulting character strings placed into the vocabulary. When the pre-training text contains rarely used character strings such as two-part allegorical sayings, idioms, or common sayings, adding these character strings to the vocabulary by manual extraction enriches the vocabulary and improves the optimization effect of the pre-training model.
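For illustration only, forming the vocabulary of step S2 might be sketched in Python as follows; the simple frequency threshold stands in for whichever extraction criterion (for example Accessor Variety) is actually used, and the function name and parameters are assumptions introduced here:

from collections import Counter

def extract_string_vocab(texts, max_len=4, min_count=5, manual_entries=()):
    """Collect frequent multi-character strings from the pre-training text
    and add manually supplied strings (idioms, sayings, etc.)."""
    counts = Counter()
    for sentence in texts:
        for n in range(2, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    vocab = {s for s, c in counts.items() if c >= min_count}
    vocab.update(manual_entries)
    return vocab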
In step S3, two sentences are extracted from the pre-training text as training sentences and split into single-character sequences. In this step, splitting a training sentence into a single-character sequence means dividing the sentence with the single character as the smallest unit; the training sentence can be split into a single-character sequence by a split function. Step S3 further includes the following operation: a [sep] marker is appended to the end of each of the two extracted training sentences, and a [cls] marker is added to the beginning of the first sentence.
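For illustration only, the splitting and marker insertion of step S3 might be sketched as follows; the function name is an assumption introduced here:

def build_character_sequence(sentence_a, sentence_b):
    """Split two training sentences into one single-character sequence,
    appending [sep] after each sentence and [cls] before the first."""
    return (["[cls]"] + list(sentence_a) + ["[sep]"]
            + list(sentence_b) + ["[sep]"])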
Referring to Fig. 3, in step S4 the character strings from step S2 are matched against the characters in the single-character sequence, and the character strings that match characters in the single-character sequence are marked. In this embodiment, the single-character sequence is "人, 工, 智, 能, 实, 验, 室, 真, 犀, 利" (character by character, "the artificial intelligence laboratory is really sharp"), and the vocabulary contains the character strings "人工" (artificial), "智能" (intelligence), "实验" (experiment), "实验室" (laboratory), and "犀利" (sharp); the result of matching the characters in the single-character sequence against the character strings is shown in Fig. 3. After the matching, the character strings are marked using their position information and/or length information within the corresponding training sentence.
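For illustration only, the matching and marking of step S4 might be sketched as follows, using the example of Fig. 3; the exhaustive enumeration of candidate substrings is an assumption, the embodiment only requiring that each matched string be recorded together with its position and length:

def mark_matching_strings(chars, vocab, max_len=4):
    """Match the vocabulary strings against the single-character sequence and
    record each hit as (string, start position, length)."""
    marks = []
    for i in range(len(chars)):
        for n in range(2, max_len + 1):
            candidate = "".join(chars[i:i + n])
            if candidate in vocab:
                marks.append((candidate, i, n))
    return marks

chars = list("人工智能实验室真犀利")
vocab = {"人工", "智能", "实验", "实验室", "犀利"}
print(mark_matching_strings(chars, vocab))
# [('人工', 0, 2), ('智能', 2, 2), ('实验', 4, 2), ('实验室', 4, 3), ('犀利', 8, 2)]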
In step S5, the characters selected in a preset proportion from the single-character sequence are masked or replaced, and the masked or replaced training sentences together with the marked character strings are fed into the pre-training model at the same time to train and optimize the pre-training model. In this step, the preset proportion is the percentage of masked or replaced characters relative to the number of characters in the whole sentence; its range is 10-30%, and in this embodiment the chosen proportion is 15%.
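For illustration only, the selection and masking of step S5 might be sketched as follows; whether selected characters are masked or replaced with other characters is left to the training recipe, and this sketch only masks:

import random

def mask_characters(chars, ratio=0.15, mask_token="[mask]"):
    """Mask a preset proportion of the ordinary characters; [cls]/[sep]
    markers are never selected."""
    candidates = [i for i, c in enumerate(chars) if c not in ("[cls]", "[sep]")]
    k = max(1, round(len(candidates) * ratio))
    chosen = random.sample(candidates, k)
    masked = list(chars)
    for i in chosen:
        masked[i] = mask_token
    return masked, sorted(chosen)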
Referring to Fig. 4, step S5 specifically comprises the following steps:
S51, establishing an objective function for the pre-training model;
S52, masking or replacing 15% of the characters selected from the single-character sequence;
S53, feeding the masked or replaced training sentences and the marked character strings into the pre-training model at the same time;
S54, predicting the masked or replaced characters with the pre-training model to obtain vector representations of the masked or replaced characters; and
S55, computing the objective function from the vector representations and optimizing the pre-training model.
Referring to Fig. 5, in step S53 the extracted training sentence is "人工智能实验室真犀利" ("the artificial intelligence laboratory is really sharp"); the characters "工" and "犀" in it are replaced with [mask], and the masked or replaced training sentence is fed into the embedding layer 11 of the pre-training model together with the marked character strings. The embedding layer 11 converts each single character into a corresponding character embedding vector and each character string into a corresponding string embedding vector, and adds the corresponding position encoding to each character embedding vector and each string embedding vector. It can be understood that a position encoder is provided in the embedding layer 11; the position encoder adds the corresponding position encoding to each character embedding vector and each string embedding vector. The position encoding indicates the position at which each character string occurs in the training sentence.
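By way of illustration only, the embedding layer 11 might be sketched in Python (PyTorch) as follows; the class name, the hidden size, and the use of a learned position embedding as the position encoder are assumptions introduced here:

import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    """Embed single characters and marked character strings, and add the
    position encoding of the place where each occurs in the training sentence."""
    def __init__(self, num_chars, num_strings, dim=768, max_len=512):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, dim)
        self.string_emb = nn.Embedding(num_strings, dim)
        self.pos_emb = nn.Embedding(max_len, dim)

    def forward(self, char_ids, string_ids, string_positions):
        # char_ids: (batch, seq_len); string_ids, string_positions: (batch, n_strings)
        positions = torch.arange(char_ids.size(1), device=char_ids.device)
        char_vectors = self.char_emb(char_ids) + self.pos_emb(positions)
        string_vectors = self.string_emb(string_ids) + self.pos_emb(string_positions)
        return char_vectors, string_vectors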
As shown in Fig. 5, c1, c2, c3, c4, c5, c6, c7, c8, c9, and c10 correspond to "人", "[mask]" (originally "工"), "智", "能", "实", "验", "室", "真", "[mask]" (originally "犀"), and "利" in the single-character sequence; w1, w2, w3, w4, and w5 correspond to the embedding vectors of the character strings "人工" (artificial), "智能" (intelligence), "实验" (experiment), "实验室" (laboratory), and "犀利" (sharp).
Further, the character embedding vectors and their corresponding position encodings are input to the character-level encoder 12, which computes character vector representations of the characters that are not masked or replaced; the string embedding vectors and their corresponding position encodings are input to the word-level encoder 13, which computes word vector representations.
In step S54, the character vector representations of the unmasked characters and the word vector representations are fed into the attention encoders 14 at the same time to obtain the vector representations of the "[cls]" marker and of the masked or replaced characters. In step S54, the word vector representations carry position information and length information as marking information, and the character strings have been matched against the characters, so when the attention encoders 14 compute the vector representations of the masked or replaced characters the accuracy is higher and the predicted characters are more accurate. As a result, the model performs better on multiple tasks, for example Chinese word segmentation, part-of-speech tagging, entity recognition, sentiment analysis, natural language inference, sentence classification, machine reading comprehension, and article classification.
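For illustration only, one attention encoder 14 operating jointly on the character and word representations might be sketched as follows; the self-attention-over-concatenation design, the layer sizes, and the normalization arrangement are assumptions, since the embodiment does not fix these details:

import torch
import torch.nn as nn

class JointAttentionEncoder(nn.Module):
    """Self-attention over the concatenation of character vector representations
    and word vector representations, so the prediction of a masked character
    can draw on the matched strings."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, char_repr, word_repr):
        joint = torch.cat([char_repr, word_repr], dim=1)
        attended, _ = self.attn(joint, joint, joint)
        joint = self.norm1(joint + attended)
        joint = self.norm2(joint + self.ffn(joint))
        return joint[:, :char_repr.size(1)]  # keep the character positions, incl. [cls] and [mask]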
After step S55 has been executed, the pre-training model has been optimized to some extent. Step S6 is then executed: steps S2-S5 are repeated until the pre-training model reaches the set optimization condition, thereby obtaining the optimized pre-training model. In this step, the set optimization condition is that the objective function has converged. Each time step S2 is executed, two sentences are extracted from the pre-training text as training sentences, one pair at a time, until all sentences in the pre-training text have been extracted; the two sentences extracted each time are either adjacent or non-adjacent. When the extraction is complete, adjacent sentence pairs and non-adjacent sentence pairs each account for 40-70% of the pairs, the two proportions summing to 100%. In this embodiment, each accounts for 50%.
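For illustration only, the sampling of adjacent and non-adjacent sentence pairs described above might be sketched as follows; the helper name and the rejection-sampling loop are assumptions introduced here (the loop presumes a corpus of more than a few sentences):

import random

def sample_sentence_pair(sentences, adjacent_ratio=0.5):
    """Draw a training pair: adjacent with probability adjacent_ratio (50% in
    this embodiment, within the claimed 40-70% range), otherwise non-adjacent."""
    if random.random() < adjacent_ratio:
        i = random.randrange(len(sentences) - 1)
        return sentences[i], sentences[i + 1], True
    while True:
        i, j = random.sample(range(len(sentences)), 2)
        if abs(i - j) > 1:
            return sentences[i], sentences[j], False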
Referring to Fig. 6, the language model pre-training method combining word-like information further comprises the following step: step S7, fine-tuning the optimized pre-training model obtained in step S6 on a task, using the vocabulary formed in step S2.
Referring to Fig. 7, step S7 specifically comprises the following steps:
S71, providing a fine-tuning task text;
S72, splitting the fine-tuning task text into single-character sequences;
S73, matching the character strings from step S2 against the characters in the single-character sequences from step S72 and marking the matched character strings;
S74, feeding the single-character sequences and the marked character strings into the optimized pre-training model at the same time to fine-tune the pre-training model.
In step S71, the fine-tuning task text is also selected from plain-text sources such as Wikipedia, news corpora, medical question-answering corpora, and financial report data, but the fine-tuning task text cannot be identical to the pre-training text of step S1.
In step S74, the optimized pre-training model is further optimized by optimizing the objective function through a fully connected layer or a CRF network.
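For illustration only, the fully connected fine-tuning head of step S74 might be sketched as follows; a CRF layer from a third-party package could replace the per-character cross-entropy shown here, as the embodiment allows, and the label count is an illustrative assumption:

import torch.nn as nn
import torch.nn.functional as F

class TaggingHead(nn.Module):
    """Fully connected layer over the character representations with a
    per-character cross-entropy objective for fine-tuning tasks."""
    def __init__(self, dim=768, num_labels=4):
        super().__init__()
        self.proj = nn.Linear(dim, num_labels)

    def forward(self, char_repr, labels=None):
        logits = self.proj(char_repr)            # (batch, seq_len, num_labels)
        if labels is None:
            return logits
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               labels.reshape(-1))
        return logits, loss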
Referring to Fig. 8, the second embodiment of the present invention provides an electronic device 60 comprising a memory 601 and a processor 602. A computer program is stored in the memory 601, and the computer program is configured to execute, when run, the language model pre-training method combining word-like information described in the first embodiment.
The processor 602 is configured to execute, through the computer program, the language model pre-training method combining word-like information described in the first embodiment.
Referring now to Fig. 9, it shows a structural diagram of a computer system 800 suitable for implementing a terminal device/server of the embodiments of the present application. The terminal device/server shown in Fig. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 9, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A driver 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 810 as needed, so that a computer program read from it can be installed into the storage portion 808 as needed.
In particular, according to the disclosed embodiments of the present invention, the process described above with reference to the flow chart may be implemented as a computer software program. For example, the disclosed embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 809 and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on a management-side computer, partly on a management-side computer, as a stand-alone software package, partly on a management-side computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the management-side computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flow charts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and combinations of boxes in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A language model pre-training method combining word-like information, characterized in that it comprises the following steps:
S1, providing a pre-training model and a pre-training text;
S2, extracting character strings from the pre-training text to form a vocabulary;
S3, extracting two sentences from the pre-training text as training sentences and splitting the training sentences into single-character sequences;
S4, matching the character strings from step S2 against the characters in the single-character sequences, and marking the character strings that match characters in the single-character sequences;
S5, masking or replacing a preset proportion of characters selected from the single-character sequences, and feeding the masked or replaced training sentences together with the marked character strings into the pre-training model at the same time to train and optimize the pre-training model;
S6, repeating steps S2-S5 until the pre-training model reaches a set optimization condition, thereby obtaining an optimized pre-training model.
2. The language model pre-training method combining word-like information according to claim 1, characterized in that: in step S2, the character strings are obtained by a word extraction algorithm or extracted manually.
3. The language model pre-training method combining word-like information according to claim 1, characterized in that: in step S3, a [sep] marker is appended to the end of each of the two extracted training sentences, and a [cls] marker is added to the beginning of the first sentence; in step S4, the character strings are marked using their position information and/or length information.
4. The language model pre-training method combining word-like information according to claim 1, characterized in that: in step S6, each time step S2 is executed, two sentences are extracted from the pre-training text as training sentences, one pair at a time, until all sentences in the pre-training text have been extracted; the two sentences extracted each time are either adjacent or non-adjacent, and when the extraction is complete, adjacent sentence pairs and non-adjacent sentence pairs each account for 40-70% of the pairs, the two proportions summing to 100%.
5. The language model pre-training method combining word-like information according to claim 1, characterized in that step S5 specifically comprises the following steps:
S51, establishing an objective function for the pre-training model;
S52, masking or replacing 15% of the characters selected from the single-character sequences;
S53, feeding the masked or replaced training sentences and the marked character strings into the pre-training model at the same time;
S54, predicting the masked or replaced characters with the pre-training model to obtain vector representations of the masked or replaced characters; and
S55, computing the objective function from the vector representations and optimizing the pre-training model.
6. The language model pre-training method combining word-like information according to claim 1, characterized in that the language model pre-training method combining word-like information further comprises the following step: step S7, fine-tuning the optimized pre-training model obtained in step S6 on a task, using the vocabulary formed in step S2.
7. The language model pre-training method combining word-like information according to claim 6, characterized in that step S7 specifically comprises the following steps:
S71, providing a fine-tuning task text;
S72, splitting the fine-tuning task text into single-character sequences;
S73, matching the character strings from step S2 against the characters in the single-character sequences from step S72 and marking the matched character strings;
S74, feeding the single-character sequences and the marked character strings into the optimized pre-training model at the same time to fine-tune the pre-training model.
8. The language model pre-training method combining word-like information according to claim 7, characterized in that: in step S74, the optimized pre-training model is further optimized by optimizing the objective function through a fully connected layer or a CRF network.
9. The language model pre-training method combining word-like information according to any one of claims 1-8, characterized in that the pre-training model comprises an embedding layer, a character-level encoder, a word-level encoder, and multiple attention encoders; wherein,
the embedding layer receives the masked or replaced training sentences from step S5 and the marked character strings from step S4, converts each single character into a corresponding character embedding vector and each character string into a corresponding string embedding vector, and adds the corresponding position encoding to each character embedding vector and each string embedding vector;
the character-level encoder takes the character embedding vectors and their position encodings as input and computes character vector representations of the characters that are not masked or replaced;
the word-level encoder takes the string embedding vectors and their position encodings as input and computes word vector representations;
the attention encoders are multiple, and take the character vector representations of the unmasked characters and the word vector representations as joint input to obtain vector representations of the masked or replaced characters.
10. The language model pre-training method combining word-like information according to claim 9, characterized in that the pre-training model further comprises a Linear network layer and a Softmax network layer, and the character vector representations and the word vector representations output by the attention encoders are fed into the Linear network layer and the Softmax network layer for further training and fine-tuning of the pre-training model.
CN201910775453.4A 2019-08-21 2019-08-21 Language model pre-training method combined with similar word information Active CN110489555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910775453.4A CN110489555B (en) 2019-08-21 2019-08-21 Language model pre-training method combined with similar word information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910775453.4A CN110489555B (en) 2019-08-21 2019-08-21 Language model pre-training method combined with similar word information

Publications (2)

Publication Number Publication Date
CN110489555A true CN110489555A (en) 2019-11-22
CN110489555B CN110489555B (en) 2022-03-08

Family

ID=68552689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910775453.4A Active CN110489555B (en) 2019-08-21 2019-08-21 Language model pre-training method combined with similar word information

Country Status (1)

Country Link
CN (1) CN110489555B (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008531A (en) * 2019-12-06 2020-04-14 北京金山数字娱乐科技有限公司 Training method and device for sentence word selection model and sentence word selection method and device
CN111144115A (en) * 2019-12-23 2020-05-12 北京百度网讯科技有限公司 Pre-training language model obtaining method and device, electronic equipment and storage medium
CN111222337A (en) * 2020-01-08 2020-06-02 山东旗帜信息有限公司 Training method and device for entity recognition model
CN111259663A (en) * 2020-01-14 2020-06-09 北京百度网讯科技有限公司 Information processing method and device
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111460832A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Object coding method, device, system, equipment and computer storage medium
CN111522944A (en) * 2020-04-10 2020-08-11 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information
CN111581383A (en) * 2020-04-30 2020-08-25 上海电力大学 Chinese text classification method based on ERNIE-BiGRU
CN111737383A (en) * 2020-05-21 2020-10-02 百度在线网络技术(北京)有限公司 Method for extracting spatial relation of geographic position points and method and device for training extraction model
CN111798986A (en) * 2020-07-07 2020-10-20 云知声智能科技股份有限公司 Data enhancement method and equipment
CN111814448A (en) * 2020-07-03 2020-10-23 苏州思必驰信息科技有限公司 Method and device for quantizing pre-training language model
CN112016300A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium
CN112307212A (en) * 2020-11-11 2021-02-02 上海昌投网络科技有限公司 Public opinion delivery monitoring method for advertisement delivery
CN112329391A (en) * 2020-11-02 2021-02-05 上海明略人工智能(集团)有限公司 Target encoder generation method, target encoder generation device, electronic equipment and computer readable medium
CN112329392A (en) * 2020-11-05 2021-02-05 上海明略人工智能(集团)有限公司 Target encoder construction method and device for bidirectional encoding
CN112635013A (en) * 2020-11-30 2021-04-09 泰康保险集团股份有限公司 Medical image information processing method and device, electronic equipment and storage medium
CN112749251A (en) * 2020-03-09 2021-05-04 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN113011176A (en) * 2021-03-10 2021-06-22 云从科技集团股份有限公司 Language model training and language reasoning method, device and computer storage medium thereof
CN113032559A (en) * 2021-03-15 2021-06-25 新疆大学 Language model fine-tuning method for low-resource adhesion language text classification
CN113033192A (en) * 2019-12-09 2021-06-25 株式会社理光 Training method and device for sequence labels and computer readable storage medium
CN113032560A (en) * 2021-03-16 2021-06-25 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
WO2021169288A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Semantic understanding model training method and apparatus, computer device, and storage medium
CN113360751A (en) * 2020-03-06 2021-09-07 百度在线网络技术(北京)有限公司 Intention recognition method, apparatus, device and medium
CN113468877A (en) * 2021-07-09 2021-10-01 浙江大学 Language model fine-tuning method and device, computing equipment and storage medium
CN113486141A (en) * 2021-07-29 2021-10-08 宁波薄言信息技术有限公司 Text, resume and financing bulletin extraction method based on SegaBert pre-training model
CN113496122A (en) * 2020-04-08 2021-10-12 中移(上海)信息通信科技有限公司 Named entity identification method, device, equipment and medium
CN113591475A (en) * 2021-08-03 2021-11-02 美的集团(上海)有限公司 Unsupervised interpretable word segmentation method and device and electronic equipment
CN113656763A (en) * 2020-04-24 2021-11-16 支付宝(杭州)信息技术有限公司 Method and device for determining small program feature vector and electronic equipment
CN113779185A (en) * 2020-06-10 2021-12-10 武汉Tcl集团工业研究院有限公司 Natural language model generation method and computer equipment
CN113836297A (en) * 2021-07-23 2021-12-24 北京三快在线科技有限公司 Training method and device for text emotion analysis model
CN113887245A (en) * 2021-12-02 2022-01-04 腾讯科技(深圳)有限公司 Model training method and related device
CN113961669A (en) * 2021-10-26 2022-01-21 杭州中软安人网络通信股份有限公司 Training method of pre-training language model, storage medium and server
WO2022022421A1 (en) * 2020-07-29 2022-02-03 北京字节跳动网络技术有限公司 Language representation model system, pre-training method and apparatus, device and medium
CN114186043A (en) * 2021-12-10 2022-03-15 北京三快在线科技有限公司 Pre-training method, device, equipment and storage medium
CN114444488A (en) * 2022-01-26 2022-05-06 中国科学技术大学 Reading understanding method, system, device and storage medium for few-sample machine
CN114792097A (en) * 2022-05-14 2022-07-26 北京百度网讯科技有限公司 Method and device for determining prompt vector of pre-training model and electronic equipment
CN115017915A (en) * 2022-05-30 2022-09-06 北京三快在线科技有限公司 Model training and task executing method and device
WO2022222854A1 (en) * 2021-04-18 2022-10-27 华为技术有限公司 Data processing method and related device
CN117235233A (en) * 2023-10-24 2023-12-15 之江实验室 Automatic financial report question-answering method and device based on large model
CN113033192B (en) * 2019-12-09 2024-04-26 株式会社理光 Training method and device for sequence annotation and computer readable storage medium


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160027433A1 (en) * 2014-07-24 2016-01-28 International Business Machines Corporation Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods
CN108415896A (en) * 2017-02-09 2018-08-17 北京京东尚科信息技术有限公司 Deep learning model training method, segmenting method, training system and Words partition system
CN108228758A (en) * 2017-12-22 2018-06-29 北京奇艺世纪科技有限公司 A kind of file classification method and device
CN109086267A (en) * 2018-07-11 2018-12-25 南京邮电大学 A kind of Chinese word cutting method based on deep learning
CN109933795A (en) * 2019-03-19 2019-06-25 上海交通大学 Based on context-emotion term vector text emotion analysis system
CN110032644A (en) * 2019-04-03 2019-07-19 人立方智能科技有限公司 Language model pre-training method
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008531B (en) * 2019-12-06 2023-05-26 北京金山数字娱乐科技有限公司 Training method and device for sentence selection model, sentence selection method and device
CN111008531A (en) * 2019-12-06 2020-04-14 北京金山数字娱乐科技有限公司 Training method and device for sentence word selection model and sentence word selection method and device
CN113033192A (en) * 2019-12-09 2021-06-25 株式会社理光 Training method and device for sequence labels and computer readable storage medium
CN113033192B (en) * 2019-12-09 2024-04-26 株式会社理光 Training method and device for sequence annotation and computer readable storage medium
CN111144115A (en) * 2019-12-23 2020-05-12 北京百度网讯科技有限公司 Pre-training language model obtaining method and device, electronic equipment and storage medium
CN111144115B (en) * 2019-12-23 2023-10-20 北京百度网讯科技有限公司 Pre-training language model acquisition method, device, electronic equipment and storage medium
CN111222337A (en) * 2020-01-08 2020-06-02 山东旗帜信息有限公司 Training method and device for entity recognition model
CN111259663A (en) * 2020-01-14 2020-06-09 北京百度网讯科技有限公司 Information processing method and device
CN111259663B (en) * 2020-01-14 2023-05-26 北京百度网讯科技有限公司 Information processing method and device
US11775776B2 (en) 2020-01-14 2023-10-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing information
WO2021169288A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Semantic understanding model training method and apparatus, computer device, and storage medium
CN113360751A (en) * 2020-03-06 2021-09-07 百度在线网络技术(北京)有限公司 Intention recognition method, apparatus, device and medium
CN112749251A (en) * 2020-03-09 2021-05-04 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN112749251B (en) * 2020-03-09 2023-10-31 腾讯科技(深圳)有限公司 Text processing method, device, computer equipment and storage medium
CN111460832A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Object coding method, device, system, equipment and computer storage medium
CN111460832B (en) * 2020-03-27 2023-11-24 北京百度网讯科技有限公司 Method, device, system, equipment and computer storage medium for object coding
CN113496122A (en) * 2020-04-08 2021-10-12 中移(上海)信息通信科技有限公司 Named entity identification method, device, equipment and medium
CN111522944B (en) * 2020-04-10 2023-11-14 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information
CN111522944A (en) * 2020-04-10 2020-08-11 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information
CN113656763B (en) * 2020-04-24 2024-01-09 支付宝(中国)网络技术有限公司 Method and device for determining feature vector of applet and electronic equipment
CN113656763A (en) * 2020-04-24 2021-11-16 支付宝(杭州)信息技术有限公司 Method and device for determining small program feature vector and electronic equipment
CN111581383A (en) * 2020-04-30 2020-08-25 上海电力大学 Chinese text classification method based on ERNIE-BiGRU
CN111737383B (en) * 2020-05-21 2021-11-23 百度在线网络技术(北京)有限公司 Method for extracting spatial relation of geographic position points and method and device for training extraction model
CN111737383A (en) * 2020-05-21 2020-10-02 百度在线网络技术(北京)有限公司 Method for extracting spatial relation of geographic position points and method and device for training extraction model
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111401077B (en) * 2020-06-02 2020-09-18 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN113779185A (en) * 2020-06-10 2021-12-10 武汉Tcl集团工业研究院有限公司 Natural language model generation method and computer equipment
CN113779185B (en) * 2020-06-10 2023-12-29 武汉Tcl集团工业研究院有限公司 Natural language model generation method and computer equipment
CN111814448A (en) * 2020-07-03 2020-10-23 苏州思必驰信息科技有限公司 Method and device for quantizing pre-training language model
CN111814448B (en) * 2020-07-03 2024-01-16 思必驰科技股份有限公司 Pre-training language model quantization method and device
CN111798986B (en) * 2020-07-07 2023-11-03 云知声智能科技股份有限公司 Data enhancement method and device
CN111798986A (en) * 2020-07-07 2020-10-20 云知声智能科技股份有限公司 Data enhancement method and equipment
WO2022022421A1 (en) * 2020-07-29 2022-02-03 北京字节跳动网络技术有限公司 Language representation model system, pre-training method and apparatus, device and medium
CN112016300A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium
CN112329391A (en) * 2020-11-02 2021-02-05 上海明略人工智能(集团)有限公司 Target encoder generation method, target encoder generation device, electronic equipment and computer readable medium
CN112329392A (en) * 2020-11-05 2021-02-05 上海明略人工智能(集团)有限公司 Target encoder construction method and device for bidirectional encoding
CN112329392B (en) * 2020-11-05 2023-12-22 上海明略人工智能(集团)有限公司 Method and device for constructing target encoder of bidirectional encoding
CN112307212A (en) * 2020-11-11 2021-02-02 上海昌投网络科技有限公司 Public opinion monitoring method for advertisement delivery
CN112635013A (en) * 2020-11-30 2021-04-09 泰康保险集团股份有限公司 Medical image information processing method and device, electronic equipment and storage medium
CN112635013B (en) * 2020-11-30 2023-10-27 泰康保险集团股份有限公司 Medical image information processing method and device, electronic equipment and storage medium
CN113011176A (en) * 2021-03-10 2021-06-22 云从科技集团股份有限公司 Language model training and language reasoning method, device and computer storage medium thereof
CN113032559A (en) * 2021-03-15 2021-06-25 新疆大学 Language model fine-tuning method for low-resource agglutinative language text classification
CN113032560B (en) * 2021-03-16 2023-10-27 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
CN113032560A (en) * 2021-03-16 2021-06-25 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
WO2022222854A1 (en) * 2021-04-18 2022-10-27 华为技术有限公司 Data processing method and related device
CN113468877A (en) * 2021-07-09 2021-10-01 浙江大学 Language model fine-tuning method and device, computing equipment and storage medium
CN113836297A (en) * 2021-07-23 2021-12-24 北京三快在线科技有限公司 Training method and device for text emotion analysis model
CN113836297B (en) * 2021-07-23 2023-04-14 北京三快在线科技有限公司 Training method and device for text emotion analysis model
CN113486141A (en) * 2021-07-29 2021-10-08 宁波薄言信息技术有限公司 Text, resume and financing bulletin extraction method based on SegaBert pre-training model
CN113591475A (en) * 2021-08-03 2021-11-02 美的集团(上海)有限公司 Unsupervised interpretable word segmentation method and device and electronic equipment
CN113961669A (en) * 2021-10-26 2022-01-21 杭州中软安人网络通信股份有限公司 Training method of pre-training language model, storage medium and server
CN113887245B (en) * 2021-12-02 2022-03-25 腾讯科技(深圳)有限公司 Model training method and related device
CN113887245A (en) * 2021-12-02 2022-01-04 腾讯科技(深圳)有限公司 Model training method and related device
CN114186043A (en) * 2021-12-10 2022-03-15 北京三快在线科技有限公司 Pre-training method, device, equipment and storage medium
CN114444488A (en) * 2022-01-26 2022-05-06 中国科学技术大学 Reading understanding method, system, device and storage medium for few-sample machine
CN114792097A (en) * 2022-05-14 2022-07-26 北京百度网讯科技有限公司 Method and device for determining prompt vector of pre-training model and electronic equipment
CN115017915A (en) * 2022-05-30 2022-09-06 北京三快在线科技有限公司 Model training and task executing method and device
CN117235233A (en) * 2023-10-24 2023-12-15 之江实验室 Automatic financial report question-answering method and device based on large model

Also Published As

Publication number Publication date
CN110489555B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN110489555A (en) A kind of language model pre-training method of combination class word information
CN107943847B (en) 2019-06-25 Business relation extraction method, device and storage medium
CN110232114A (en) Sentence intention recognition method, device and computer readable storage medium
CN109145294B (en) Text entity identification method and device, electronic equipment and storage medium
CN106844349B (en) Comment spam recognition method based on coordinated training
CN111062217B (en) Language information processing method and device, storage medium and electronic equipment
CN110020438A (en) Enterprise or organization Chinese entity disambiguation method and device based on recognition sequence
CN107330011A (en) Named entity recognition method and device based on multi-strategy fusion
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN110222188A (en) A kind of company announcement processing method and server side based on multi-task learning
CN110489750A (en) Burmese word segmentation and part-of-speech tagging method and device based on bidirectional LSTM-CRF
CN110196982A (en) Hyponymy extraction method, device and computer equipment
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN110334186A (en) Data query method, apparatus, computer equipment and computer readable storage medium
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN111738018A (en) Intention understanding method, device, equipment and storage medium
Yolchuyeva et al. Self-attention networks for intent detection
CN113095063A (en) Two-stage emotion transfer method and system based on masked language model
Yuan et al. Personalized sentence generation using generative adversarial networks with author-specific word usage
CN110287396A (en) Text matching technique and device
CN115510188A (en) Text keyword association method, device, equipment and storage medium
CN113761875B (en) Event extraction method and device, electronic equipment and storage medium
CN114239555A (en) Training method of keyword extraction model and related device
Jiang et al. Construction of segmentation and part-of-speech annotation model in ancient Chinese
Liu Supervised ensemble learning for Vietnamese tokenization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant