CN109800435A - Training method and device for a language model - Google Patents
Training method and device for a language model
- Publication number
- CN109800435A CN109800435A CN201910086877.XA CN201910086877A CN109800435A CN 109800435 A CN109800435 A CN 109800435A CN 201910086877 A CN201910086877 A CN 201910086877A CN 109800435 A CN109800435 A CN 109800435A
- Authority
- CN
- China
- Prior art keywords
- vector
- participle
- word
- target
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
This application discloses a training method and device for a language model. The method includes: after obtaining language model training data that include a large number of sample texts, performing word segmentation on the sample texts to obtain a segmentation label for each resulting word, where the segmentation label records the position of each character of the word within that word; the language model is then trained according to the segmentation labels of the words in the sample texts. Because the segmentation labels of the sample texts are used as training data, the data used to train the language model are more comprehensive, which reduces the perplexity (PPL) of the language model and improves its effectiveness.
Description
Technical field
This application relates to the field of computer technology, and in particular to a training method and device for a language model.
Background technique
A language model estimates the probability of a piece of text, that is, the probability that a given character sequence belongs to a natural language. Language models are used in many natural language processing applications, such as speech recognition, machine translation, part-of-speech tagging, syntactic analysis, and information retrieval.
However, the effectiveness of existing language models is not ideal, so how to improve it is a technical problem that currently needs to be solved.
Summary of the invention
The main purpose of the embodiments of the present application is to provide a training method and device for a language model that improve the effectiveness of the language model.
An embodiment of the present application provides a training method for a language model, comprising:
obtaining training data for the language model, the training data comprising a large number of sample texts;
performing word segmentation on each sample text to obtain a segmentation label for each resulting word, where the segmentation label records the position of each character of the word within that word;
training the language model according to the segmentation labels of the words in the sample text.
Optionally, training the language model according to the segmentation labels of the words in the sample text comprises:
taking each character in the sample text as a target character;
generating a label vector for each target character in the sample text, where the label vector encodes the information about that character contained in the segmentation label of the word it belongs to;
training the language model according to the label vectors of the target characters in the sample text.
Optionally, training the language model according to the label vectors of the target characters in the sample text comprises:
obtaining a character vector for each target character in the sample text;
fusing the character vector and the label vector of each target character to obtain a first fused vector;
training the language model according to the first fused vectors of the target characters in the sample text.
Optionally, fusing the character vector and the label vector of a target character comprises:
inserting the label vector of the target character at a preset position in its character vector;
or replacing vector elements of the character vector of the target character with its label vector.
Optionally, training the language model according to the segmentation labels of the words in the sample text comprises:
taking each word in the sample text as a target word;
generating a label vector for each target word in the sample text, where the label vector encodes the information in the segmentation label of that word;
training the language model according to the label vectors of the target words in the sample text.
Optionally, training the language model according to the label vectors of the target words in the sample text comprises:
obtaining a word vector for each target word in the sample text;
fusing the word vector and the label vector of each target word to obtain a second fused vector;
training the language model according to the second fused vectors of the target words in the sample text.
Optionally, fusing the word vector and the label vector of a target word comprises:
inserting the label vector of the target word at a preset position in its word vector;
or replacing vector elements of the word vector of the target word with its label vector.
Optionally, the segmentation label further includes the probability that the corresponding word is an actual word.
An embodiment of the present application also provides a training device for a language model, comprising:
a training data acquisition unit, configured to obtain training data for the language model, the training data comprising a large number of sample texts;
a segmentation label acquisition unit, configured to perform word segmentation on each sample text and obtain a segmentation label for each resulting word, where the segmentation label records the position of each character of the word within that word;
a language model training unit, configured to train the language model according to the segmentation labels of the words in the sample text.
Optionally, the language model training unit comprises:
a target character acquisition subunit, configured to take each character in the sample text as a target character;
a first vector generation subunit, configured to generate a label vector for each target character in the sample text, where the label vector encodes the information about that character contained in the segmentation label of the word it belongs to;
a first language model training subunit, configured to train the language model according to the label vectors of the target characters in the sample text.
Optionally, the first language model training subunit comprises:
a character vector acquisition subunit, configured to obtain a character vector for each target character in the sample text;
a first vector fusion subunit, configured to fuse the character vector and the label vector of each target character to obtain a first fused vector;
a first model training subunit, configured to train the language model according to the first fused vectors of the target characters in the sample text.
Optionally, the first vector fusion subunit is specifically configured to:
insert the label vector of a target character at a preset position in its character vector;
or replace vector elements of the character vector of the target character with its label vector.
Optionally, the language model training unit comprises:
a target word acquisition subunit, configured to take each word in the sample text as a target word;
a second vector generation subunit, configured to generate a label vector for each target word in the sample text, where the label vector encodes the information in the segmentation label of that word;
a second language model training subunit, configured to train the language model according to the label vectors of the target words in the sample text.
Optionally, the second language model training subunit comprises:
a word vector acquisition subunit, configured to obtain a word vector for each target word in the sample text;
a second vector fusion subunit, configured to fuse the word vector and the label vector of each target word to obtain a second fused vector;
a second model training subunit, configured to train the language model according to the second fused vectors of the target words in the sample text.
Optionally, the second vector fusion subunit is specifically configured to:
insert the label vector of a target word at a preset position in its word vector;
or replace vector elements of the word vector of the target word with its label vector.
Optionally, the segmentation label further includes the probability that the corresponding word is an actual word.
An embodiment of the present application also provides training equipment for a language model, comprising a processor, a memory, and a system bus, wherein:
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs comprising instructions that, when executed by the processor, cause the processor to perform any implementation of the training method for a language model described above.
An embodiment of the present application also provides a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to perform any implementation of the training method for a language model described above.
An embodiment of the present application also provides a computer program product that, when run on a terminal device, causes the terminal device to perform any implementation of the training method for a language model described above.
With the training method and device for a language model provided by the embodiments of the present application, after language model training data including a large number of sample texts are obtained, word segmentation is performed on the sample texts to obtain a segmentation label for each resulting word, where the segmentation label records the position of each character of the word within that word; the language model is then trained according to the segmentation labels of the words in the sample texts. Because the segmentation labels of the sample texts are used as training data, the data used to train the language model are more comprehensive, which reduces the PPL of the language model and improves its effectiveness.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a training method for a language model provided by an embodiment of the present application;
Fig. 2 is a first flow diagram of training a language model according to the segmentation labels of the words in a sample text, provided by an embodiment of the present application;
Fig. 3 is a second flow diagram of training a language model according to the segmentation labels of the words in a sample text, provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of the composition of a training device for a language model provided by an embodiment of the present application.
Specific embodiment
Currently, in the field of natural language processing, neural networks are widely used to train language models for language processing tasks such as speech recognition, machine translation, part-of-speech tagging, syntactic analysis, and information retrieval, because neural network training requires no manual participation and generally yields good training results.
However, the training process of such a language model is opaque, and it is difficult to explain the various phenomena that may occur during training. Moreover, existing neural-network-based training of language models often ignores useful intermediate information, such as the segmentation labels generated during word segmentation, so the training results of the model are not as good as they could be.
Specifically, when a segmentation method (such as a dictionary-based method) is used to segment a text, attention is usually paid only to the words in the segmentation result: the character vectors of the characters in each word, or the word vectors of the words, which capture the basic semantics of the text, are fed into the language model as input data for training. Other useful information generated during segmentation is usually ignored, such as the position of each character within the word it belongs to, i.e., the segmentation label information. Thus, in the existing neural-network-based training of language models, only features capturing the basic semantics of the text (the character vectors of the characters or the word vectors of the words) are used as input data; the input data are not comprehensive enough, and the training results of the language model are not ideal.
To overcome these drawbacks, an embodiment of the present application provides a training method for a language model. After training data including a large number of sample texts are obtained, word segmentation is performed on the sample texts, and the segmentation label of each resulting word is obtained during segmentation, where the segmentation label records the position of each character of the word within that word; the language model is then trained according to the segmentation labels of the words in the sample texts. Compared with existing methods that train the language model only on features capturing the basic semantics of the text (the character vectors of the characters or the word vectors of the words), the present application also uses the segmentation labels generated during segmentation as input data: this intermediate information is converted into vector form and added to the training input, so that the input data used to train the language model are more comprehensive. This reduces the perplexity (PPL) of the language model and improves its effectiveness.
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in the present application without creative effort fall within the protection scope of the present application.
First embodiment
Referring to Fig. 1, which shows a flow diagram of the training method for a language model provided in this embodiment, the method includes the following steps:
S101: obtain training data for the language model, where the training data include a large number of sample texts.
In this embodiment, any text used for training the language model is defined as a sample text. The language of the sample texts is not limited: a sample text may be a Chinese text, an English text, etc. The length of the sample texts is not limited either: a sample text may be a sentence or a whole document. Nor is the source of the sample texts limited: a sample text may, for example, be the recognized transcript of an interview, a meeting, a speech, or a debate.
In this embodiment, the collected sample texts together constitute the training data used to train the language model, and the training is carried out in the subsequent steps.
S102: for each sample text, perform word segmentation to obtain a segmentation label for each resulting word, where the segmentation label records the position of each character of the word within that word.
In this embodiment, after the training data are obtained in step S101, any existing or future segmentation method can be used to segment the sample texts in the training data. For example, a minimum-entropy algorithm can first be used to build a special-purpose dictionary from a document corpus (i.e., linguistic data), and the dictionary is then used to segment the sample texts. However, some words in a sample text may be missing from the dictionary, so segmentation that relies on the dictionary alone may not produce an accurate result. In that case, a statistical method based on a hidden Markov model (HMM) can further segment the sample text, so that an accurate segmentation of the sample text is achieved, and the segmentation label of each word is obtained during segmentation.
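The dictionary-based segmentation step mentioned above can be sketched with a simple forward maximum-matching routine. This is a minimal illustration under stated assumptions, not the segmenter used in the patent: the `max_match` name, the greedy longest-first strategy, and the toy vocabulary are all chosen here for demonstration.

```python
def max_match(text, vocab):
    """Greedy forward maximum matching: take the longest dictionary word
    starting at each position, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

vocab = {"来到", "清华大学"}
print(max_match("我来到清华大学", vocab))  # → ['我', '来到', '清华大学']
```

Out-of-vocabulary characters such as 我 fall through to single-character words, which is exactly the gap the HMM-based fallback described above is meant to close.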
Specifically, the sample text is fed as input into the HMM-based segmentation model, which outputs the segmentation label of each word in the sample text. The segmentation label records the position of each character within the word it belongs to, expressed as one of B (begin), M (middle), E (end), and S (single). B indicates that the character is at the start of its word, i.e., the word-initial position; M indicates that the character is in the middle of its word, i.e., a word-internal position; E indicates that the character is at the end of its word, i.e., the word-final position; and S indicates that the character is a word by itself, i.e., a single-character word, in which case it is simultaneously at the word-initial, word-internal, and word-final positions.
For example, suppose the sample text 我来到清华大学 ("I came to Tsinghua University") is fed as input into the HMM-based segmentation model. The model outputs the combined segmentation labels "SBEBMME", one label per character, each giving the position of the corresponding character within its word: the first label "S" indicates that 我 ("I") is a single-character word; the "B" after "S" gives the position of the character 来 within the word 来到 ("came"), namely the word-initial position; and so on. In this way, the segmentation labels of the sample text and its segmentation result 我/来到/清华大学 are obtained.
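Given a segmentation, the B/M/E/S labelling described above can be reproduced in a few lines; `bmes_tags` is a hypothetical helper written for illustration, not a name from the patent.

```python
def bmes_tags(words):
    """Map each character of a segmented sentence to its B/M/E/S position tag."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")                    # single-character word
        else:
            tags.append("B")                    # word-initial character
            tags.extend("M" * (len(word) - 2))  # word-internal characters
            tags.append("E")                    # word-final character
    return "".join(tags)

print(bmes_tags(["我", "来到", "清华大学"]))  # → SBEBMME
```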
It should be noted that, in an optional implementation, the segmentation label of each word further includes the probability that the word is an actual word (i.e., how likely it is to be a true word).
In this implementation, the segmentation label thus contains not only the position of each character within the word it belongs to, but also a probability, denoted alpha, that the word is an actual word. Specifically, in step S102, when an existing or future segmentation method is used to segment the sample text (for example, an HMM-based segmentation model and/or a dictionary-based segmentation method), the probability that each word is an actual word can be obtained during segmentation. For example, after the sample text 我来到清华大学 is fed into the HMM-based segmentation model, the segmentation label of the word 我 ("I") output by the model includes not only "S" but also the probability alpha that 我 is an actual word; if the value of alpha is 90%, then 我 is considered to be an actual word with probability 90%.
S103: train the language model according to the segmentation labels of the words in the sample text.
In this embodiment, after the segmentation label of each word in the sample text is obtained in step S102, the language model can be trained according to the segmentation labels. Specifically, the segmentation labels and the segmentation result of the sample text are fed into the language model together as input data: the segmentation label of each word in the sample text is converted into a label vector, which, together with the character vectors of the characters (or the word vectors of the words) in the sample text, forms the input data used to train the language model.
By using the segmentation labels of the words of a large number of sample texts as input data for training, the PPL of the language model can be effectively reduced. Experimental data show that the PPL can be reduced from 22 to 12, i.e., by about 45%, which improves the effectiveness of the language model.
PPL (perplexity) is an index that measures how good a language model is; it is estimated mainly from the probability that a sentence belonging to the natural language occurs, given the probability of each word. It is computed as follows:

$$\mathrm{PPL}(s) = p(w_1 w_2 \cdots w_n)^{-\frac{1}{n}} = \sqrt[n]{\prod_{i=1}^{n} \frac{1}{p(w_i \mid w_1 \cdots w_{i-1})}} \qquad (1)$$

where n is the length of the sentence, i.e., the number of words; p(w_i) is the probability of the i-th word — for example, the probability of the first word is p(w_1 | w_0), where w_0 denotes the start of the sentence and can be represented by a placeholder; and PPL(s) is the estimated perplexity of a sentence s belonging to the natural language. From formula (1), the smaller PPL(s) is, the larger the values of p(w_i) are, so the more probable the natural-language sentence s is, and hence the better the language model.
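The perplexity of formula (1) can be computed directly from the per-word conditional probabilities; `perplexity` is a hypothetical helper, and the uniform probabilities below are made-up illustrative values.

```python
import math

def perplexity(cond_probs):
    """Perplexity of a sentence from its per-word conditional
    probabilities p(w_i | w_1 ... w_{i-1}), via the log domain."""
    n = len(cond_probs)
    log_prob = sum(math.log(p) for p in cond_probs)
    return math.exp(-log_prob / n)

# A four-word sentence whose words each receive probability 0.25
# has perplexity 4: the model is as uncertain as a uniform
# choice among four words at every step.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Working in the log domain avoids numerical underflow for long sentences, which is why the product in formula (1) is evaluated as a sum of logarithms here.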
It should be noted that step S103 has two specific implementations, which are introduced in the second embodiment and the third embodiment, respectively.
In summary, with the training method for a language model provided in this embodiment, after language model training data including a large number of sample texts are obtained, word segmentation is performed on the sample texts to obtain the segmentation label of each resulting word, where the segmentation label records the position of each character of the word within that word, and the language model is then trained according to the segmentation labels of the words in the sample texts. Because the segmentation labels of the sample texts are used as training data, the data used to train the language model are more comprehensive, which reduces the PPL of the language model and improves its effectiveness.
Second embodiment
Normally, each sample text in the training data is used in turn to perform one round of training on the language model through step S103, and the final language model is obtained after multiple rounds. It should be noted that this embodiment introduces a specific implementation of step S103, illustrated with the sample text of the current round (and each character in it); training with the other sample texts is similar and is not described again one by one.
Referring to Fig. 2, which shows a first flow diagram of training the language model according to the segmentation labels of the words in the sample text, provided in this embodiment, the process includes the following steps:
S201: take each character in the sample text as a target character.
In this embodiment, in order to train the language model according to the segmentation labels of the words in the sample text (where a segmentation label records the position of each character within the word it belongs to), each character in the sample text is first defined as a target character; each target character is then processed in the subsequent steps, and the language model is trained according to the processing results.
S202: generate a label vector for each target character in the sample text, where the label vector encodes the information about that character contained in the segmentation label of the word it belongs to.
In this embodiment, after the segmentation label of each word in the sample text is obtained in step S102, since the segmentation label records the position of each target character within the word it belongs to, the label vector of each target character can be generated from that position information. Specifically, the position information of each target character can be converted into a four-dimensional vector, which serves as its label vector.
In one implementation, the label vector of a target character encodes the position, within its word, recorded for that character in the segmentation label of the word it belongs to.
For example, again taking the sample text 我来到清华大学 as an example, the combined segmentation labels are "SBEBMME", one per target character. The four-dimensional label vector of "B" may be [1,0,0,0], that of "M" may be [0,1,0,0], that of "E" may be [0,0,1,0], and that of "S" may be [0,0,0,1].
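The one-hot label vectors listed above can be expressed as a simple lookup table; `LABEL_VECTORS` is an illustrative name chosen here, not one used in the patent.

```python
# One-hot label vectors for the four position tags.
LABEL_VECTORS = {
    "B": [1, 0, 0, 0],  # word-initial character
    "M": [0, 1, 0, 0],  # word-internal character
    "E": [0, 0, 1, 0],  # word-final character
    "S": [0, 0, 0, 1],  # single-character word
}

# Label vectors for the combined tags "SBEBMME" of the example sentence.
vectors = [LABEL_VECTORS[tag] for tag in "SBEBMME"]
print(vectors[0])  # → [0, 0, 0, 1]
```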
In another implementation, the label vector of a target character encodes not only the position of that character within the word it belongs to, but also the probability that the word it belongs to is an actual word. Specifically, after the position information of the target character is converted into the four-dimensional 0/1 vector described above, the element equal to 1 is multiplied by the probability alpha that the word containing the target character is an actual word (i.e., 1*alpha), and the resulting four-dimensional vector is used as the label vector of the target character.
For example, building on the example above, the segmentation label of the target character 我 in the sample text 我来到清华大学 includes not only "S", which records its position, but also the probability alpha that 我 is an actual word. Suppose the value of alpha is 90%, meaning that 我 is an actual word with probability 90%. After its position information is converted into the four-dimensional vector [0,0,0,1], the element equal to 1 is multiplied by the probability alpha (90%) that the single-character word 我 is an actual word, i.e., 1*90%, yielding the four-dimensional vector [0,0,0,0.9], which is used as the label vector of the target character 我.
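Scaling the hot element by the word-membership probability alpha, as in the example above, can be sketched as follows; `scaled_label_vector` and `TAG_INDEX` are hypothetical helpers written for illustration.

```python
TAG_INDEX = {"B": 0, "M": 1, "E": 2, "S": 3}

def scaled_label_vector(tag, alpha):
    """One-hot B/M/E/S label vector whose hot element is scaled by
    alpha, the probability that the containing word is an actual word."""
    vec = [0.0, 0.0, 0.0, 0.0]
    vec[TAG_INDEX[tag]] = 1.0 * alpha
    return vec

# The label vector of 我 (tag "S") with alpha = 90%.
print(scaled_label_vector("S", 0.9))  # → [0.0, 0.0, 0.0, 0.9]
```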
It should be noted that, in this embodiment, the position information of each target character in the sample text may also be converted into a multi-dimensional vector of another form. For example, the label vector of the position "B" may be the four-dimensional vector [1,0,0,0], or each element of that vector may be expanded to four dimensions, giving the 16-dimensional vector [1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0].
S203: train the language model according to the label vectors of the target characters in the sample text.
In this embodiment, after the label vectors of the target characters in the sample text are generated in step S202, the label vector of each target character can be processed and the language model trained according to the processing results. Specifically, in one implementation of this embodiment, step S203 may include the following steps A1-A3:
Step A1: obtain the character vector of each target character in the sample text.
In this implementation, in order to train the language model according to the label vectors of the target characters in the sample text, the character vector of each target character is obtained first; for example, before training, a 128-dimensional character vector may be generated for each target character using any existing or future vector generation method.
Step A2: Perform vector fusion on the word vector and label vector of the same target word in the sample text to obtain a first fusion vector.
In this implementation, after the word vector of each target word in the sample text is obtained by step A1 and the label vector of each target word is generated by step S202, the word vector and label vector of the same target word may be fused to obtain the first fusion vector, and the training of the language model is then realized by the subsequent step A3. The word vector and label vector of the same target word may be fused in either of the following two ways:
In the first vector fusion mode, the label vector of a target word is inserted at a preset position in the word vector of the same target word. For example, assume the word vector of a target word is 128-dimensional, denoted [a1,a2,...,a128], and the label vector corresponding to that target word is [1,0,0,0]. The label vector [1,0,0,0] can then be inserted at a preset position in the word vector of the target word to form a 132-dimensional vector; for instance, if it is inserted at the start of the word vector, the fused vector is [1,0,0,0,a1,a2,...,a128].
It should be noted that the "preset position" in the first vector fusion mode above may be set according to the actual situation, and this application does not limit it. Moreover, during each round of training, when vector fusion is performed on each target word according to the first vector fusion mode, the "preset position" should be kept consistent. For example, for every target word, the label vector may be inserted at the start of the word vector of that target word, or at the end of the word vector of that target word, so as to guarantee the format consistency of the training data.
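The first vector fusion mode amounts to splicing the label vector into the word vector at a fixed preset position. A minimal sketch with plain Python lists (the function name and list representation are illustrative assumptions, not the patent's own notation):

```python
def fuse_by_insertion(word_vec, label_vec, pos=0):
    """First fusion mode: insert the label vector at a fixed preset position
    `pos` of the word vector; `pos` must be the same for every target word
    within a training round."""
    return word_vec[:pos] + label_vec + word_vec[pos:]

word_vec = [0.1] * 128        # stands in for [a1, a2, ..., a128]
label_vec = [1, 0, 0, 0]      # label vector of the position tag "B"
fused = fuse_by_insertion(word_vec, label_vec, pos=0)
print(len(fused))             # 132
print(fused[:4])              # [1, 0, 0, 0]
```

Inserting at `pos=0` reproduces the 132-dimensional start-position example above; `pos=128` would append at the end instead.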
In the second vector fusion mode, the label vector of a target word replaces vector elements in the word vector of the same target word. For example, again assume the word vector of a target word is 128-dimensional, still denoted [a1,a2,...,a128], and the label vector corresponding to the target word is [1,0,0,0]. The label vector [1,0,0,0] can then replace vector elements in the word vector of the target word so that the resulting vector is still 128-dimensional; for instance, if the first four elements of the word vector are replaced, the fused vector is [1,0,0,0,a5,a6,...,a128].
It should be noted that the positions of the vector elements replaced in the second vector fusion mode above may be set according to the actual situation, and this application does not limit them. Moreover, during each round of training, when vector fusion is performed on each target word according to the second vector fusion mode, the positions of the replaced elements should be kept consistent. For example, for every target word, the label vector may replace the first four elements of the word vector of that target word, or the last four elements of the word vector of that target word, so as to guarantee the format consistency of the training data.
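The second vector fusion mode overwrites a fixed slice of the word vector instead of lengthening it. A minimal sketch under the same illustrative assumptions as before:

```python
def fuse_by_replacement(word_vec, label_vec, pos=0):
    """Second fusion mode: overwrite len(label_vec) elements of the word
    vector starting at the fixed position `pos`; the fused vector keeps the
    word vector's original dimensionality."""
    fused = list(word_vec)
    fused[pos:pos + len(label_vec)] = label_vec
    return fused

fused = fuse_by_replacement([0.1] * 128, [1, 0, 0, 0], pos=0)
print(len(fused))  # 128
print(fused[:5])   # [1, 0, 0, 0, 0.1]
```

With `pos=0` this reproduces the [1,0,0,0,a5,a6,...,a128] example; `pos=124` would replace the last four elements instead.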
Step A3: Train the language model according to the first fusion vector of each target word in the sample text.
In this implementation, after the word vector and label vector of the same target word in the sample text are fused by step A2 to obtain the first fusion vector, the first fusion vector of each target word may be fed to the language model as input data, and the model parameters are updated based on the model output, thereby completing the current round of training of the language model.
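The round of training in step A3 can be outlined as below. Since the patent does not fix a network architecture or update rule, the model here is deliberately a trivial stub: a dot-product "output" and a gradient-descent-like parameter nudge stand in for a real language model and its loss.

```python
def training_round(fused_vectors, params, learning_rate=0.01):
    """One round of training on the first fusion vectors. The 'model' is a
    stub: its output is a dot product with the parameter vector, and the
    update is a gradient-descent-like nudge; a real language model update
    would take the place of both."""
    for vec in fused_vectors:
        output = sum(p * x for p, x in zip(params, vec))
        params = [p - learning_rate * output * x for p, x in zip(params, vec)]
    return params

fused = [[1, 0, 0, 0] + [0.1] * 128,   # fusion vector of one target word
         [0, 0, 0, 1] + [0.2] * 128]   # fusion vector of another
params = training_round(fused, [0.05] * 132)
print(len(params))  # 132
```

The point of the sketch is only the data flow: each 132-dimensional fusion vector is consumed as input, and the parameters are updated from the model output.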
In summary, this embodiment fuses the word vector of each word in the sample text with its corresponding label vector to generate the first fusion vector. Training the language model with the first fusion vectors makes the data used for training more comprehensive, thereby improving the effect of the language model.
Third embodiment
Under normal conditions, each sample text in the training data may be used to perform one round of training of the language model through the above step S103, and the final language model is obtained after multiple rounds of training. It should be noted that this embodiment introduces a specific implementation of the above step S103, illustrated with the training that uses the sample text of the current round (each word in the sample text); the training with the other sample texts is similar and will not be repeated one by one.
Referring to Fig. 3, it shows the second schematic flow diagram, provided in this embodiment, of training the language model according to the participle label of each participle in the sample text. The process includes the following steps:
S301: Take each participle in the sample text as a target participle.
In this embodiment, in order to train the language model according to the participle label of each participle in the sample text (the participle label includes the location information of each word in the corresponding participle), each participle in the sample text may first be defined as a target participle; each target participle is then processed by the subsequent steps, and the training of the language model is realized according to the processing results.
S302: Generate the label vector of each target participle in the sample text, wherein the label vector characterizes the information of the participle label of the corresponding target participle.
In this embodiment, after the participle label of each target participle in the sample text is obtained by step S102, the label vector of each target participle may be generated according to its participle label. For example, the participle label corresponding to each target participle may be converted into a multi-dimensional vector and used as its label vector.
In one implementation, the label vector of each target participle characterizes the location information of each word carried in the participle label of the corresponding target participle.
For example, still taking the sample text "I came to Tsinghua University" as an example, the set of participle labels corresponding to the participles is "SBEBMME", and the segmentation result is "I / came to / Tsinghua University", where the participle labels take three forms: BE, BME and S. Based on the four-dimensional representation of each word's label vector in the second embodiment, the label vector corresponding to "BE" can be made [1,0,0,0,0,0,1,0], the label vector corresponding to "BME" can be made [1,0,0,0,0,1,0,0,0,0,1,0], and the label vector corresponding to "S" can be made [0,0,0,1]. In this way, the label vector of each target participle in the sample text is obtained.
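Under this four-dimensional per-character representation, a participle's label vector is simply the concatenation of one 4-dimensional one-hot vector per character tag; an optional alpha scaling covers the probability-carrying variant described next. A sketch, assuming the B/M/E/S one-hot order (the function name is illustrative):

```python
TAGS = ["B", "M", "E", "S"]  # assumed one-hot order, 4 dims per character

def participle_label_to_vector(label, alpha=1.0):
    """Concatenate one 4-dim one-hot per character tag of the participle
    label; the 1s may be scaled by the probability alpha that the participle
    is an actual participle."""
    vec = []
    for tag in label:
        one_hot = [0.0] * 4
        one_hot[TAGS.index(tag)] = 1.0 * alpha
        vec.extend(one_hot)
    return vec

print(participle_label_to_vector("S"))    # [0.0, 0.0, 0.0, 1.0]
print(participle_label_to_vector("BE"))   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
print(len(participle_label_to_vector("BME")))  # 12
```

This reproduces the 4-, 8- and 12-dimensional label vectors of the "S", "BE" and "BME" examples above.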
In another implementation, the label vector of each target participle characterizes not only the location information of each word carried in the participle label of the corresponding target participle, but also carries the probability that the corresponding target participle belongs to an actual participle. Specifically, after the participle label of each target participle is converted into the above multi-dimensional vector composed of 1s and 0s, the vector elements whose value is 1 may be multiplied by the probability alpha that the corresponding target participle belongs to a participle (i.e., 1*alpha), and the multiplied multi-dimensional vector is used as the label vector of the corresponding target participle.
For example, following the example above, the participle label of the target participle "I" in the sample text "I came to Tsinghua University" includes not only the tag "S" that characterizes its location information, but may also include the probability alpha that the target participle "I" belongs to an actual participle. Assuming the value of alpha is 95%, the probability that the target participle "I" belongs to an actual participle is 95%. Thus, after its participle label is converted into the four-dimensional vector [0,0,0,1], the vector element whose value is 1 may be multiplied by the probability alpha (95%) that the target participle "I" belongs to an actual participle, namely 1*95%, to obtain the four-dimensional vector [0,0,0,0.95], which is used as the label vector corresponding to the target participle "I". Similarly, for the 8-dimensional vector corresponding to "BE" and the 12-dimensional vector corresponding to "BME", the "1"s in the respective vectors are multiplied by the corresponding alpha to obtain the label vectors.
It should be noted that, in this embodiment, the participle label corresponding to each target participle in the sample text may also be converted into a multi-dimensional vector of another dimensionality. For example, the label vector corresponding to the participle label "BE" may be expressed as the eight-dimensional vector [1,0,0,0,0,0,1,0], or each dimension value in the eight-dimensional vector [1,0,0,0,0,0,1,0] may be expanded to 4 dimensions, giving the 32-dimensional vector [1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0].
S303: Train the language model according to the label vector of each target participle in the sample text.
In this embodiment, after the label vector of each target participle in the sample text is generated by step S302, data processing may be performed on the label vectors of the target participles, and the language model may be trained according to the processing results. Specifically, in one implementation of this embodiment, step S303 may include the following steps B1-B3:
Step B1: Obtain the term vector of each target participle in the sample text.
In this implementation, in order to train the language model according to the label vector of each target participle in the sample text, the term vector of each target participle may first be obtained. For example, before language model training, a vector generation method available now or in the future may be used to generate a 256-dimensional term vector corresponding to each target participle (assuming the target participle is composed of two words).
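The patent deliberately leaves the term-vector generation method open ("existing or future"). Purely for illustration, a deterministic stand-in that maps any participle to a fixed 256-dimensional vector might look like this; it is not a real embedding method and only shows the shape of the data step B1 produces:

```python
import hashlib

def toy_term_vector(term, dim=256):
    """Deterministic stand-in for a term-vector generation method: hash the
    participle and map the digest bytes to floats in [0, 1]. Not a real
    embedding; it only mimics the fixed-dimension output of step B1."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]

vec = toy_term_vector("Tsinghua University")
print(len(vec))  # 256
```

In practice this placeholder would be replaced by a trained embedding lookup; only the 256-dimensional output shape matters for the fusion in step B2.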
Step B2: Perform vector fusion on the term vector and label vector of the same target participle in the sample text to obtain a second fusion vector.
In this implementation, after the term vector of each target participle in the sample text is obtained by step B1 and the label vector of each target participle is generated by step S302, the term vector and label vector of the same target participle may be fused to obtain the second fusion vector, and the training of the language model is then realized by the subsequent step B3. The term vector and label vector of the same target participle may be fused in either of the following two ways:
In the first vector fusion mode, the label vector of a target participle is inserted at a preset position in the term vector of the same target participle. For example, assume the term vector of a target participle is 256-dimensional, denoted [b1,b2,...,b256], and the label vector corresponding to that target participle is the label vector [1,0,0,0,0,0,1,0] corresponding to "BE". The label vector [1,0,0,0,0,0,1,0] can then be inserted at a preset position in the term vector of the target participle to form a 264-dimensional vector; for instance, if it is inserted at the start of the term vector, the fused vector is [1,0,0,0,0,0,1,0,b1,b2,...,b256].
It should be noted that the "preset position" in the first vector fusion mode above may be set according to the actual situation, and this application does not limit it. Moreover, during each round of training, when vector fusion is performed on each target participle according to the first vector fusion mode, the "preset position" should be kept consistent. For example, for every target participle, the label vector may be inserted at the start of the term vector of that target participle, or at the end of the term vector of that target participle, so as to guarantee the format consistency of the training data.
In the second vector fusion mode, the label vector of a target participle replaces vector elements in the term vector of the same target participle. For example, again assume the term vector of a target participle is 256-dimensional, still denoted [b1,b2,...,b256], and the label vector corresponding to the target participle is still the label vector [1,0,0,0,0,0,1,0] corresponding to "BE". The label vector [1,0,0,0,0,0,1,0] can then replace vector elements in the term vector of the target participle so that the resulting vector is still 256-dimensional; for instance, if the first eight elements of the term vector are replaced, the fused vector is [1,0,0,0,0,0,1,0,b9,b10,...,b256].
It should be noted that the positions of the vector elements replaced in the second vector fusion mode above may be set according to the actual situation, and this application does not limit them. Moreover, during each round of training, when vector fusion is performed on each target participle according to the second vector fusion mode, the positions of the replaced elements should be kept consistent. For example, for every target participle, the label vector may replace the first eight elements of the term vector of that target participle, or the last eight elements of the term vector of that target participle, so as to guarantee the format consistency of the training data.
Step B3: Train the language model according to the second fusion vector of each target participle in the sample text.
In this implementation, after the term vector and label vector of the same target participle in the sample text are fused by step B2 to obtain the second fusion vector, the second fusion vector of each target participle may be fed to the language model as input data, and the model parameters are updated based on the model output, thereby completing the current round of training of the language model.
In summary, this embodiment fuses the term vector of each participle in the sample text with its corresponding label vector to generate the second fusion vector. Training the language model with the second fusion vectors makes the data used for training more comprehensive, thereby improving the effect of the language model.
Fourth embodiment
It should be noted that this embodiment introduces the model effect after model training is performed with the input data provided by the prior art and with the input data provided by this application. The experimental data set may be an official-document corpus, that is, a corpus drawn from documents issued externally by government units; for example, the corpus may be the PTB (Penn Treebank Dataset) text data set, which is currently the most widely used data set in language model research.
One, the existing process of training a language model using a neural network
In the existing process of training a language model with a neural network, only the features that characterize the basic semantic information of the text (the word vector of each word, or the term vector of each term, in the text) are used as input data for the experimental training of the language model. The specific experimental process is as follows:
(1) Initialize the word vector of each word in the official-document corpus. For example, a vector generation method available now or in the future may be used to generate a 128-dimensional word vector for each word in the corpus, denoted [c1,c2,...,c128].
(2) Use the 128-dimensional word vector [c1,c2,...,c128] corresponding to each word as input data to train the language model.
(3) After training is complete, compute the PPL of the language model.
Specifically, the above formula (1) may be used to compute the PPL of the existing language model trained with the neural network: PPL=22.1146914958953.
Two, splicing the 4-dimensional label vector of each word behind its corresponding 128-dimensional word vector for language model training
After the 4-dimensional label vector of each word in the sample text is generated by step S202 of the second embodiment above, the 4-dimensional label vector of each word may be spliced behind its corresponding 128-dimensional word vector to obtain a spliced vector, and the spliced vector is used as input data for the experimental training of the language model. The specific experimental process is as follows:
(1) Perform word segmentation on the official-document corpus to obtain the location information of each word in its corresponding participle.
(2) According to the location information of each word in its corresponding participle, generate the 4-dimensional label vector of each word, denoted [d1,d2,d3,d4].
(3) Initialize the word vector of each word in the official-document corpus. For example, a vector generation method available now or in the future may be used to generate a 128-dimensional word vector for each word in the corpus, still denoted [c1,c2,...,c128].
(4) Splice the 4-dimensional label vector [d1,d2,d3,d4] of each word behind its 128-dimensional word vector [c1,c2,...,c128] to obtain the spliced vector [c1,c2,...,c128,d1,d2,d3,d4].
(5) Use the spliced vector [c1,c2,...,c128,d1,d2,d3,d4] corresponding to each word as input data to train the language model.
(6) After training is complete, compute the PPL of the language model.
Specifically, the above formula (1) may be used to compute the PPL of the language model trained in this way: PPL=10.0910344230461.
Three, inserting the 4-dimensional label vector of each word into the middle of its corresponding 128-dimensional word vector for language model training
After the 4-dimensional label vector of each word in the sample text is generated by step S202 of the second embodiment above, the 4-dimensional label vector of each word may be inserted into the middle of its corresponding 128-dimensional word vector to obtain an inserted vector, and the inserted vector is used as input data for the experimental training of the language model. The specific experimental process is as follows:
(1) Perform word segmentation on the official-document corpus to obtain the location information of each word in its corresponding participle.
(2) According to the location information of each word in its corresponding participle, generate the 4-dimensional label vector of each word, still denoted [d1,d2,d3,d4].
(3) Initialize the word vector of each word in the official-document corpus. For example, a vector generation method available now or in the future may be used to generate a 128-dimensional word vector for each word in the corpus, still denoted [c1,c2,...,c128].
(4) Insert the 4-dimensional label vector [d1,d2,d3,d4] of each word into the middle of its 128-dimensional word vector [c1,c2,...,c128]; the resulting vector is [c1,c2,...,c64,d1,d2,d3,d4,c65,...,c128].
(5) Use the resulting vector [c1,c2,...,c64,d1,d2,d3,d4,c65,...,c128] corresponding to each word as input data to train the language model.
(6) After training is complete, compute the PPL of the language model.
Specifically, the above formula (1) may be used to compute the PPL of the language model trained in this way: PPL=11.669197821116088.
Four, inserting the 16-dimensional label vector of each word into the middle of its corresponding 128-dimensional word vector for language model training
After the 16-dimensional label vector of each word in the sample text is generated by step S202 of the second embodiment above, the 16-dimensional label vector of each word may be inserted into the middle of its corresponding 128-dimensional word vector to obtain an inserted vector, and the inserted vector is used as input data for the experimental training of the language model. The concrete process is as follows:
(1) Perform word segmentation on the official-document corpus to obtain the location information of each word in its corresponding participle.
(2) According to the location information of each word in its corresponding participle, generate the 16-dimensional label vector of each word, denoted [e1,e2,...,e16].
(3) Initialize the word vector of each word in the official-document corpus. For example, a vector generation method available now or in the future may be used to generate a 128-dimensional word vector for each word in the corpus, still denoted [c1,c2,...,c128].
(4) Insert the 16-dimensional label vector [e1,e2,...,e16] of each word into the middle of its 128-dimensional word vector [c1,c2,...,c128]; the resulting vector is [c1,c2,...,c64,e1,e2,...,e16,c65,...,c128].
(5) Use the resulting vector [c1,c2,...,c64,e1,e2,...,e16,c65,...,c128] corresponding to each word as input data to train the language model.
(6) After training is complete, compute the PPL of the language model.
Specifically, the above formula (1) may be used to compute the PPL of the language model trained in this way: PPL=12.28524435023514.
As can be seen from the four different PPL values obtained in the above four experimental processes, the training method of the embodiments of this application can greatly reduce the PPL of the language model: the PPL is reduced from 22.1146914958953 to 10.0910344230461, 11.669197821116088 and 12.28524435023514 respectively, i.e., by about 45% or more, thereby improving the effect of the language model.
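The quoted reduction can be checked directly from the reported PPL values; the snippet below merely reproduces that arithmetic.

```python
baseline = 22.1146914958953
improved = [10.0910344230461, 11.669197821116088, 12.28524435023514]

for ppl in improved:
    reduction = (baseline - ppl) / baseline
    print(f"PPL {ppl:.2f}: reduced by {reduction:.1%}")
# reductions: 54.4%, 47.2%, 44.4%
```

The smallest improvement (the 16-dimensional label variant) corresponds to roughly a 44% reduction and the largest (the splicing variant) to roughly 54%, consistent with the "about 45% or more" figure above.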
Fifth embodiment
This embodiment introduces a training device for a language model; for the related content, refer to the method embodiments above.
Referring to Fig. 4, it is a schematic diagram of the composition of a training device for a language model provided in this embodiment. The device 400 includes:
a training data acquiring unit 401, configured to obtain training data of the language model, the training data including a large number of sample texts;
a participle label obtaining unit 402, configured to perform word segmentation on the sample text and obtain the participle label of each participle, the participle label including the location information of each word in the corresponding participle;
a language model training unit 403, configured to train the language model according to the participle label of each participle in the sample text.
In one implementation of this embodiment, the language model training unit 403 includes:
a target word obtaining subunit, configured to take each word in the sample text as a target word;
a first vector generating subunit, configured to generate the label vector of each target word in the sample text, the label vector characterizing the relevant information of the corresponding target word included in the participle label of the participle to which the corresponding target word belongs;
a first language model training subunit, configured to train the language model according to the label vector of each target word in the sample text.
In one implementation of this embodiment, the first language model training subunit includes:
a word vector obtaining subunit, configured to obtain the word vector of each target word in the sample text;
a first vector fusion subunit, configured to perform vector fusion on the word vector and label vector of the same target word in the sample text to obtain a first fusion vector;
a first model training subunit, configured to train the language model according to the first fusion vector of each target word in the sample text.
In one implementation of this embodiment, the first vector fusion subunit is specifically configured to:
insert the label vector of the same target word in the sample text at a preset position in the word vector of the same target word;
or, replace vector elements in the word vector of the same target word with the label vector of the same target word in the sample text.
In one implementation of this embodiment, the language model training unit 403 includes:
a target participle obtaining subunit, configured to take each participle in the sample text as a target participle;
a second vector generating subunit, configured to generate the label vector of each target participle in the sample text, the label vector characterizing the information of the participle label of the corresponding target participle;
a second language model training subunit, configured to train the language model according to the label vector of each target participle in the sample text.
In one implementation of this embodiment, the second language model training subunit includes:
a term vector obtaining subunit, configured to obtain the term vector of each target participle in the sample text;
a second vector fusion subunit, configured to perform vector fusion on the term vector and label vector of the same target participle in the sample text to obtain a second fusion vector;
a second model training subunit, configured to train the language model according to the second fusion vector of each target participle in the sample text.
In one implementation of this embodiment, the second vector fusion subunit is specifically configured to:
insert the label vector of the same target participle in the sample text at a preset position in the term vector of the same target participle;
or, replace vector elements in the term vector of the same target participle with the label vector of the same target participle in the sample text.
In one implementation of this embodiment, the participle label further includes the probability that the corresponding participle belongs to a participle.
Further, an embodiment of this application also provides a training equipment for a language model, including: a processor, a memory, and a system bus;
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to execute any implementation of the above training method for a language model.
Further, an embodiment of this application also provides a computer-readable storage medium in which instructions are stored; when the instructions run on a terminal device, the terminal device executes any implementation of the above training method for a language model.
Further, an embodiment of this application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the above training method for a language model.
From the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be realized by means of software plus a necessary general hardware platform. Based on this understanding, the technical solution of this application, or the part that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product can be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the methods described in each embodiment of this application or in certain parts of the embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments can be referred to each other. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and the relevant parts can be found in the description of the method.
It should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or equipment including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or equipment. In the absence of more restrictions, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or equipment that includes the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principle defined herein can be realized in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (19)
1. A method for training a language model, characterized by comprising:
obtaining training data of the language model, the training data comprising a large number of sample texts;
performing word segmentation on the sample texts to obtain a segmentation label for each word segment, wherein the segmentation label comprises position information of each character of the corresponding segment within that segment;
and training the language model according to the segmentation labels of the segments in the sample texts.
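As an illustration of the segmentation label of claim 1, the per-character position information can be sketched with a BMES-style tagging scheme (B = begin, M = middle, E = end, S = single-character segment). The scheme and the function below are illustrative assumptions, not the patent's specified implementation; the segmentation itself is assumed given:

```python
def position_labels(segments):
    """Tag each character with its position inside its word segment:
    S = single-character segment, B = begin, M = middle, E = end."""
    labels = []
    for seg in segments:
        if len(seg) == 1:
            labels.append((seg, "S"))
        else:
            labels.append((seg[0], "B"))
            for ch in seg[1:-1]:
                labels.append((ch, "M"))
            labels.append((seg[-1], "E"))
    return labels

# A sample sentence already split into word segments.
print(position_labels(["语言", "模型", "的", "训练"]))
```

Each character thus carries a label recording where it sits in its segment, which is the position information the claim feeds into training.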
2. The method according to claim 1, wherein training the language model according to the segmentation labels of the segments in the sample texts comprises:
taking each character in the sample texts as a target character;
generating a label vector for each target character in the sample texts, the label vector characterizing the information about the corresponding target character contained in the segmentation label of the segment to which the target character belongs;
and training the language model according to the label vectors of the target characters in the sample texts.
3. The method according to claim 2, wherein training the language model according to the label vectors of the target characters in the sample texts comprises:
obtaining a character vector for each target character in the sample texts;
performing vector fusion on the character vector and the label vector of each target character to obtain a first fused vector;
and training the language model according to the first fused vectors of the target characters in the sample texts.
4. The method according to claim 3, wherein performing vector fusion on the character vector and the label vector of the same target character comprises:
inserting the label vector of the target character at a preset position of the character vector of that target character;
or, replacing vector elements in the character vector of the target character with the label vector of that target character.
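The two fusion strategies of claim 4, inserting the label vector at a preset position of the character vector or overwriting some of its elements, might be sketched as follows; the preset position and the vector dimensions are illustrative assumptions:

```python
import numpy as np

def fuse_by_insert(char_vec, label_vec, pos=0):
    """First fused vector via insertion: the result grows by len(label_vec)."""
    return np.concatenate([char_vec[:pos], label_vec, char_vec[pos:]])

def fuse_by_replace(char_vec, label_vec, pos=0):
    """First fused vector via replacement: elements are overwritten in place,
    so the dimensionality of the character vector is preserved."""
    fused = char_vec.copy()
    fused[pos:pos + len(label_vec)] = label_vec
    return fused

char_vec = np.arange(8, dtype=float)   # stand-in for a learned character vector
label_vec = np.array([1.0, 0.0])       # stand-in for the label vector
print(fuse_by_insert(char_vec, label_vec).shape)   # insertion grows the vector
print(fuse_by_replace(char_vec, label_vec).shape)  # replacement keeps the size
```

The design trade-off is visible in the shapes: insertion preserves every element of the character vector at the cost of a larger input dimension, while replacement keeps the model's input size fixed at the cost of discarding the overwritten elements.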
5. The method according to claim 1, wherein training the language model according to the segmentation labels of the segments in the sample texts comprises:
taking each segment in the sample texts as a target segment;
generating a label vector for each target segment in the sample texts, the label vector characterizing the information of the segmentation label of the corresponding target segment;
and training the language model according to the label vectors of the target segments in the sample texts.
6. The method according to claim 5, wherein training the language model according to the label vectors of the target segments in the sample texts comprises:
obtaining a word vector for each target segment in the sample texts;
performing vector fusion on the word vector and the label vector of each target segment to obtain a second fused vector;
and training the language model according to the second fused vectors of the target segments in the sample texts.
7. The method according to claim 6, wherein performing vector fusion on the word vector and the label vector of the same target segment comprises:
inserting the label vector of the target segment at a preset position of the word vector of that target segment;
or, replacing vector elements in the word vector of the target segment with the label vector of that target segment.
8. The method according to any one of claims 1 to 7, wherein the segmentation label further comprises the probability that the corresponding segment constitutes a word.
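Claim 8 adds to the segmentation label the probability that the segment is a word. One hypothetical encoding of a target character's label vector concatenates a one-hot position code with that probability; the layout is an assumption for illustration only:

```python
import numpy as np

POSITIONS = ("B", "M", "E", "S")  # begin / middle / end / single-character

def label_vector(position, word_prob):
    """One-hot encoding of the character's position in its segment,
    followed by the probability that the segment constitutes a word."""
    vec = np.zeros(len(POSITIONS) + 1)
    vec[POSITIONS.index(position)] = 1.0
    vec[-1] = word_prob
    return vec

print(label_vector("B", 0.93))
```

Such a vector would then be fused with the character vector as in claims 3 and 4.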
9. A device for training a language model, characterized by comprising:
a training data obtaining unit, configured to obtain training data of the language model, the training data comprising a large number of sample texts;
a segmentation label obtaining unit, configured to perform word segmentation on the sample texts to obtain a segmentation label for each word segment, the segmentation label comprising position information of each character of the corresponding segment within that segment;
and a language model training unit, configured to train the language model according to the segmentation labels of the segments in the sample texts.
10. The device according to claim 9, wherein the language model training unit comprises:
a target character obtaining subunit, configured to take each character in the sample texts as a target character;
a first vector generating subunit, configured to generate a label vector for each target character in the sample texts, the label vector characterizing the information about the corresponding target character contained in the segmentation label of the segment to which the target character belongs;
and a first language model training subunit, configured to train the language model according to the label vectors of the target characters in the sample texts.
11. The device according to claim 10, wherein the first language model training subunit comprises:
a character vector obtaining subunit, configured to obtain a character vector for each target character in the sample texts;
a first vector fusion subunit, configured to perform vector fusion on the character vector and the label vector of each target character to obtain a first fused vector;
and a first model training subunit, configured to train the language model according to the first fused vectors of the target characters in the sample texts.
12. The device according to claim 11, wherein the first vector fusion subunit is specifically configured to:
insert the label vector of a target character at a preset position of the character vector of that target character;
or, replace vector elements in the character vector of a target character with the label vector of that target character.
13. The device according to claim 9, wherein the language model training unit comprises:
a target segment obtaining subunit, configured to take each segment in the sample texts as a target segment;
a second vector generating subunit, configured to generate a label vector for each target segment in the sample texts, the label vector characterizing the information of the segmentation label of the corresponding target segment;
and a second language model training subunit, configured to train the language model according to the label vectors of the target segments in the sample texts.
14. The device according to claim 13, wherein the second language model training subunit comprises:
a word vector obtaining subunit, configured to obtain a word vector for each target segment in the sample texts;
a second vector fusion subunit, configured to perform vector fusion on the word vector and the label vector of each target segment to obtain a second fused vector;
and a second model training subunit, configured to train the language model according to the second fused vectors of the target segments in the sample texts.
15. The device according to claim 14, wherein the second vector fusion subunit is specifically configured to:
insert the label vector of a target segment at a preset position of the word vector of that target segment;
or, replace vector elements in the word vector of a target segment with the label vector of that target segment.
16. The device according to any one of claims 9 to 15, wherein the segmentation label further comprises the probability that the corresponding segment constitutes a word.
17. Equipment for training a language model, characterized by comprising: a processor, a memory, and a system bus;
the processor and the memory being connected through the system bus;
and the memory being configured to store one or more programs, the one or more programs comprising instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 8.
18. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1 to 8.
19. A computer program product, characterized in that, when the computer program product is run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910086877.XA CN109800435B (en) | 2019-01-29 | 2019-01-29 | Training method and device for language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109800435A true CN109800435A (en) | 2019-05-24 |
CN109800435B CN109800435B (en) | 2023-06-20 |
Family
ID=66559308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910086877.XA Active CN109800435B (en) | 2019-01-29 | 2019-01-29 | Training method and device for language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109800435B (en) |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060015326A1 (en) * | 2004-07-14 | 2006-01-19 | International Business Machines Corporation | Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building |
US20070198511A1 (en) * | 2006-02-23 | 2007-08-23 | Samsung Electronics Co., Ltd. | Method, medium, and system retrieving a media file based on extracted partial keyword |
CN102411563A (en) * | 2010-09-26 | 2012-04-11 | 阿里巴巴集团控股有限公司 | Method, device and system for identifying target words |
CN104375989A (en) * | 2014-12-01 | 2015-02-25 | 国家电网公司 | Natural language text keyword association network construction system |
CN105159949A (en) * | 2015-08-12 | 2015-12-16 | 北京京东尚科信息技术有限公司 | Chinese address word segmentation method and system |
CN105516499A (en) * | 2015-12-14 | 2016-04-20 | 北京奇虎科技有限公司 | Method and device for classifying short messages, communication terminal and server |
WO2016177069A1 (en) * | 2015-07-20 | 2016-11-10 | 中兴通讯股份有限公司 | Management method, device, spam short message monitoring system and computer storage medium |
CN106156004A (en) * | 2016-07-04 | 2016-11-23 | 中国传媒大学 | The sentiment analysis system and method for film comment information based on term vector |
CN106294863A (en) * | 2016-08-23 | 2017-01-04 | 电子科技大学 | A kind of abstract method for mass text fast understanding |
CN107305549A (en) * | 2016-04-18 | 2017-10-31 | 北京搜狗科技发展有限公司 | Language data processing method, device and the device for language data processing |
CN107423288A (en) * | 2017-07-05 | 2017-12-01 | 达而观信息科技(上海)有限公司 | A kind of Chinese automatic word-cut and method based on unsupervised learning |
CN107622044A (en) * | 2016-07-13 | 2018-01-23 | 阿里巴巴集团控股有限公司 | Segmenting method, device and the equipment of character string |
CN108121700A (en) * | 2017-12-21 | 2018-06-05 | 北京奇艺世纪科技有限公司 | A kind of keyword extracting method, device and electronic equipment |
CN108170674A (en) * | 2017-12-27 | 2018-06-15 | 东软集团股份有限公司 | Part-of-speech tagging method and apparatus, program product and storage medium |
CN108287820A (en) * | 2018-01-12 | 2018-07-17 | 北京神州泰岳软件股份有限公司 | A kind of generation method and device of text representation |
CN108334492A (en) * | 2017-12-05 | 2018-07-27 | 腾讯科技(深圳)有限公司 | Text participle, instant message treating method and apparatus |
CN108595428A (en) * | 2018-04-25 | 2018-09-28 | 杭州闪捷信息科技股份有限公司 | The method segmented based on bidirectional circulating neural network |
CN109192213A (en) * | 2018-08-21 | 2019-01-11 | 平安科技(深圳)有限公司 | The real-time transfer method of court's trial voice, device, computer equipment and storage medium |
CN109271493A (en) * | 2018-11-26 | 2019-01-25 | 腾讯科技(深圳)有限公司 | A kind of language text processing method, device and storage medium |
Non-Patent Citations (2)
Title |
---|
"Joint Learning of Character and Word Embeddings", 《IJCAI 2015》, 25 July 2015 (2015-07-25), pages 1236-1240 * |
Zhang Jing et al.: "Unsupervised New Word Recognition for Chinese Social Media Corpora", 《Journal of Chinese Information Processing》, vol. 32, no. 3, 15 March 2018 (2018-03-15), pages 17-25 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110413999A (en) * | 2019-07-17 | 2019-11-05 | 新华三大数据技术有限公司 | Entity relation extraction method, model training method and relevant apparatus |
CN110851596A (en) * | 2019-10-11 | 2020-02-28 | 平安科技(深圳)有限公司 | Text classification method and device and computer readable storage medium |
CN110851596B (en) * | 2019-10-11 | 2023-06-27 | 平安科技(深圳)有限公司 | Text classification method, apparatus and computer readable storage medium |
CN111008528A (en) * | 2019-12-05 | 2020-04-14 | 北京知道智慧信息技术有限公司 | Text processing method and device, electronic equipment and readable storage medium |
CN116612750A (en) * | 2023-05-23 | 2023-08-18 | 苏州科帕特信息科技有限公司 | Automatic training method for language model |
Also Published As
Publication number | Publication date |
---|---|
CN109800435B (en) | 2023-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jin et al. | A novel lexicalized HMM-based learning framework for web opinion mining | |
CN109800435A (en) | A kind of training method and device of language model | |
WO2023065617A1 (en) | Cross-modal retrieval system and method based on pre-training model and recall and ranking | |
CN110765791B (en) | Automatic post-editing method and device for machine translation | |
CN110634487A (en) | Bilingual mixed speech recognition method, device, equipment and storage medium | |
CN110781394A (en) | Personalized commodity description generation method based on multi-source crowd-sourcing data | |
CN113987169A (en) | Text abstract generation method, device and equipment based on semantic block and storage medium | |
CN112100332A (en) | Word embedding expression learning method and device and text recall method and device | |
CN113961685A (en) | Information extraction method and device | |
US11727915B1 (en) | Method and terminal for generating simulated voice of virtual teacher | |
CN111860237A (en) | Video emotion fragment identification method and device | |
CN112861540A (en) | Broadcast television news keyword automatic extraction method based on deep learning | |
CN113705315A (en) | Video processing method, device, equipment and storage medium | |
CN112188311A (en) | Method and apparatus for determining video material of news | |
CN117171303A (en) | Joint multi-mode aspect-level emotion analysis method based on self-adaptive attention fusion | |
CN114281948A (en) | Summary determination method and related equipment thereof | |
CN111950281B (en) | Demand entity co-reference detection method and device based on deep learning and context semantics | |
CN113887244A (en) | Text processing method and device | |
CN116306506A (en) | Intelligent mail template method based on content identification | |
CN114154489A (en) | Triple extraction method, device, equipment and storage medium | |
CN114676699A (en) | Entity emotion analysis method and device, computer equipment and storage medium | |
CN113673222A (en) | Social media text fine-grained emotion analysis method based on bidirectional collaborative network | |
CN110826313A (en) | Information extraction method, electronic equipment and computer readable storage medium | |
CN111801673A (en) | Application program introduction method, mobile terminal and server | |
CN113704488B (en) | Content generation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |