CN109800435A - Training method and device for a language model - Google Patents

Training method and device for a language model

Info

Publication number
CN109800435A
Authority
CN
China
Prior art keywords
vector
segmented word
character
target
label
Legal status
Granted
Application number
CN201910086877.XA
Other languages
Chinese (zh)
Other versions
CN109800435B (en)
Inventor
李长亮
徐智涛
齐济
Current Assignee
Beijing Kingsoft Software Co Ltd
Beijing Jinshan Digital Entertainment Technology Co Ltd
Original Assignee
Beijing Kingsoft Software Co Ltd
Beijing Jinshan Digital Entertainment Technology Co Ltd
Application filed by Beijing Kingsoft Software Co Ltd and Beijing Jinshan Digital Entertainment Technology Co Ltd
Priority to CN201910086877.XA
Publication of CN109800435A
Application granted
Publication of CN109800435B
Status: Active
Anticipated expiration


Landscapes

  • Machine Translation (AREA)

Abstract

This application discloses a training method and device for a language model. The method includes: after language model training data comprising a large number of sample texts is obtained, word segmentation is performed on the sample texts to obtain the segmentation label of each segmented word, where the segmentation label includes the position of each character within the corresponding word; the language model is then trained according to the segmentation labels of the words in the sample texts. By using the segmentation labels of the sample texts as training data, the data used to train the language model becomes more comprehensive, which reduces the perplexity (PPL) of the language model and improves its performance.

Description

Training method and device for a language model
Technical field
This application relates to the field of computer technology, and in particular to a training method and device for a language model.
Background
A language model estimates the probability of a piece of text, that is, the probability that a given character sequence belongs to a natural language. Language models are used in many natural language processing applications, such as speech recognition, machine translation, part-of-speech tagging, syntactic analysis, and information retrieval.
However, the performance of existing language models is not ideal, so how to improve it is an urgent technical problem to be solved.
Summary of the invention
The main purpose of the embodiments of the present application is to provide a training method and device for a language model that can improve the performance of the model.
An embodiment of the present application provides a training method for a language model, comprising:
obtaining training data for the language model, the training data comprising a large number of sample texts;
performing word segmentation on the sample texts to obtain the segmentation label of each segmented word, the segmentation label comprising the position of each character within the corresponding word;
training the language model according to the segmentation labels of the words in the sample texts.
Optionally, training the language model according to the segmentation labels of the words in the sample texts comprises:
taking each character in the sample text as a target character;
generating the label vector of each target character in the sample text, the label vector characterizing the information about the target character contained in the segmentation label of the word to which it belongs;
training the language model according to the label vectors of the target characters in the sample text.
Optionally, training the language model according to the label vectors of the target characters comprises:
obtaining the character vector of each target character in the sample text;
fusing the character vector and the label vector of each target character to obtain a first fused vector;
training the language model according to the first fused vectors of the target characters in the sample text.
Optionally, fusing the character vector and the label vector of a target character comprises:
inserting the label vector of the target character at a preset position in its character vector;
or replacing vector elements of the character vector with the label vector of the target character.
Optionally, training the language model according to the segmentation labels of the words in the sample texts comprises:
taking each segmented word in the sample text as a target word;
generating the label vector of each target word in the sample text, the label vector characterizing the information in the segmentation label of the target word;
training the language model according to the label vectors of the target words in the sample text.
Optionally, training the language model according to the label vectors of the target words comprises:
obtaining the word vector of each target word in the sample text;
fusing the word vector and the label vector of each target word to obtain a second fused vector;
training the language model according to the second fused vectors of the target words in the sample text.
Optionally, fusing the word vector and the label vector of a target word comprises:
inserting the label vector of the target word at a preset position in its word vector;
or replacing vector elements of the word vector with the label vector of the target word.
Optionally, the segmentation label further includes the probability that the corresponding segment is a true word.
An embodiment of the present application further provides a training device for a language model, comprising:
a training data acquisition unit for obtaining training data for the language model, the training data comprising a large number of sample texts;
a segmentation label acquisition unit for performing word segmentation on the sample texts to obtain the segmentation label of each segmented word, the segmentation label comprising the position of each character within the corresponding word;
a language model training unit for training the language model according to the segmentation labels of the words in the sample texts.
Optionally, the language model training unit includes:
a target character acquisition subunit for taking each character in the sample text as a target character;
a first vector generation subunit for generating the label vector of each target character in the sample text, the label vector characterizing the information about the target character contained in the segmentation label of the word to which it belongs;
a first language model training subunit for training the language model according to the label vectors of the target characters in the sample text.
Optionally, the first language model training subunit includes:
a character vector acquisition subunit for obtaining the character vector of each target character in the sample text;
a first vector fusion subunit for fusing the character vector and the label vector of each target character to obtain a first fused vector;
a first model training subunit for training the language model according to the first fused vectors of the target characters in the sample text.
Optionally, the first vector fusion subunit is specifically configured to:
insert the label vector of a target character at a preset position in its character vector;
or replace vector elements of the character vector with the label vector of the target character.
Optionally, the language model training unit includes:
a target word acquisition subunit for taking each segmented word in the sample text as a target word;
a second vector generation subunit for generating the label vector of each target word in the sample text, the label vector characterizing the information in the segmentation label of the target word;
a second language model training subunit for training the language model according to the label vectors of the target words in the sample text.
Optionally, the second language model training subunit includes:
a word vector acquisition subunit for obtaining the word vector of each target word in the sample text;
a second vector fusion subunit for fusing the word vector and the label vector of each target word to obtain a second fused vector;
a second model training subunit for training the language model according to the second fused vectors of the target words in the sample text.
Optionally, the second vector fusion subunit is specifically configured to:
insert the label vector of a target word at a preset position in its word vector;
or replace vector elements of the word vector with the label vector of the target word.
Optionally, the segmentation label further includes the probability that the corresponding segment is a true word.
An embodiment of the present application further provides training equipment for a language model, comprising a processor, a memory, and a system bus;
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs comprising instructions that, when executed by the processor, cause the processor to perform any implementation of the above training method for a language model.
An embodiment of the present application further provides a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to perform any implementation of the above training method for a language model.
An embodiment of the present application further provides a computer program product that, when run on a terminal device, causes the terminal device to perform any implementation of the above training method for a language model.
With the training method and device for a language model provided by the embodiments of the present application, after language model training data comprising a large number of sample texts is obtained, word segmentation is performed on the sample texts to obtain the segmentation label of each segmented word, where the segmentation label includes the position of each character within the corresponding word; the language model is then trained according to the segmentation labels of the words in these sample texts. By using the segmentation labels of the sample texts as training data, the data used to train the language model becomes more comprehensive, which reduces the PPL of the language model and improves its performance.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a training method for a language model provided by an embodiment of the present application;
Fig. 2 is the first flow diagram of training the language model according to the segmentation labels of the words in a sample text, provided by an embodiment of the present application;
Fig. 3 is the second flow diagram of training the language model according to the segmentation labels of the words in a sample text, provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of the composition of a training device for a language model provided by an embodiment of the present application.
Detailed description of embodiments
Currently, in the field of natural language processing, neural networks are widely used to train language models for language processing tasks such as speech recognition, machine translation, part-of-speech tagging, syntactic analysis, and information retrieval. This is because, when a neural network is used to train a language model, the whole process requires no manual intervention and good training results can be obtained.
However, the training process of such a language model is opaque: it is difficult to explain the phenomena that may occur during training. Moreover, the existing neural-network training of language models often ignores useful information, such as the intermediate segmentation labels produced during word segmentation, which leaves the training result less than ideal.
Specifically, when a segmentation method (for example, a dictionary-based method) is used to segment a text, people usually care only about which words the segmentation produces. If the result is acceptable, the character vectors of the characters in each word, or the word vectors of the words, features that characterize the basic semantic information of the text, are used as input data to train the language model, while other useful information generated during segmentation, such as the position of each character within its word, i.e., the segmentation label information, is ignored. Thus, in the existing neural-network training of language models, only features characterizing basic semantic information (character vectors or word vectors) are used as input data, the input is not comprehensive enough, and the resulting language model is not ideal.
To address these drawbacks, an embodiment of the present application provides a training method for a language model: after training data comprising a large number of sample texts is obtained, word segmentation is performed on the sample texts, and the segmentation label of each segmented word is obtained during this processing, where the segmentation label includes the position of each character within the corresponding word; the language model is then trained according to these segmentation labels. Compared with the existing method that uses only features characterizing basic semantic information (character vectors or word vectors), the present application uses the segmentation labels generated during word segmentation as additional input data for training: this intermediate information is converted into vector form and added to the training input, so that the input data used to train the language model is more comprehensive, the perplexity (PPL) of the language model is reduced, and its performance is improved.
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
First embodiment
Referring to Fig. 1, which is a flow diagram of the training method for a language model provided in this embodiment, the method includes the following steps:
S101: obtain training data for the language model, where the training data includes a large number of sample texts.
In this embodiment, any text used for language model training is defined as a sample text. The language of the sample text is not limited: it may be, for example, a Chinese text or an English text. Its length is not limited either: it may be a sentence or a full document. Its source is also not limited: it may be, for example, text transcribed in application scenarios such as interviews, meetings, speeches, or debates.
The sample texts obtained can then constitute the training data used to train the language model through the subsequent steps.
S102: for each sample text, perform word segmentation on the text to obtain the segmentation label of each segmented word, where the segmentation label includes the position of each character within the corresponding word.
In this embodiment, after the training data of the language model is obtained in step S101, an existing or future segmentation method can be used to segment the sample texts in the training data. For example, a minimum-entropy algorithm can first be used to build a dedicated dictionary from an official document corpus (i.e., linguistic data), and the dictionary is then used to segment the sample texts. However, since some words in a sample text may be absent from the dictionary, relying on the dictionary alone may not yield an accurate segmentation; in that case, a statistical method based on a hidden Markov model (HMM) can further segment the sample text to achieve accurate segmentation, and the segmentation label of each word is obtained during the process.
Specifically, the sample text is fed as input into the HMM-based segmentation model, which outputs the segmentation label of each word in the text. The segmentation label includes the position of each character within the corresponding word, expressed as one of B (begin), M (middle), E (end), or S (single). B indicates that the character is at the start of its word, i.e., the word-initial position; M indicates the middle, i.e., a word-internal position; E indicates the end, i.e., the word-final position; S indicates that the character is a word by itself, i.e., a single-character word, so that it is simultaneously at the word-initial, word-internal, and word-final positions of its word.
For example, suppose the sample text "我来到清华大学" ("I came to Tsinghua University") is fed as input into the HMM-based segmentation model. The model outputs the tag sequence "SBEBMME", one tag per character, indicating each character's position in its word: the tag "S" says that "我" ("I") is a single-character word; the "B" after the "S" says that the character "来" is at the start of the word "来到" ("came to"); and so on. In this way, we obtain the segmentation labels of the sample text and the segmentation result "我 / 来到 / 清华大学".
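As an illustration, the following is a minimal Python sketch of this tagging convention; it assumes the segmenter has already produced the word list (the HMM segmentation model itself is not reproduced here):

```python
# Minimal sketch: derive the per-character BMES tags from an
# already-segmented sentence. Any tokenizer producing a word list
# would do in place of the HMM segmenter.
def bmes_tags(words):
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")                    # single-character word
        else:
            tags.append("B")                    # word-initial character
            tags.extend("M" * (len(word) - 2))  # word-internal characters
            tags.append("E")                    # word-final character
    return tags

print("".join(bmes_tags(["我", "来到", "清华大学"])))  # -> SBEBMME
```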
It should be noted that, in an optional implementation, the segmentation label of each word further includes the probability that the corresponding segment is a true word (i.e., how likely it is to be an actual word).
In this implementation, the segmentation label may include not only the position of each character within the corresponding word but also the probability that the segment is a true word, denoted alpha. Specifically, in step S102, when an existing or future segmentation method is used, for example, the HMM-based segmentation model and/or a dictionary-based method, the probability that each segment is an actual word can be obtained during segmentation. For example, after the sample text "我来到清华大学" is fed into the HMM-based segmentation model, the segmentation label of the segment "我" output by the model may include not only "S" but also the probability alpha that "我" is an actual word; if the value of alpha is 90%, then the probability that "我" is a true word is 90%, i.e., "我" has a 90% chance of being an actual word.
S103: train the language model according to the segmentation labels of the words in the sample text.
In this embodiment, after the segmentation labels of the words in the sample text are obtained in step S102, the language model can be trained according to them. Specifically, the segmentation labels and the segmentation result of the sample text can together serve as input data: the segmentation label of each word is converted into a label vector, which, together with the character vector of each character or the word vector of each word in the sample text, is fed into the language model for training.
By introducing the segmentation labels of the words of a large number of sample texts as input data for training, the PPL of the language model can be effectively reduced. Experimental data show that the PPL can drop from 22 to 12, i.e., a reduction of about 45%, improving the performance of the language model.
Here, PPL is a metric for measuring the quality of a language model, estimated from the probability that each word occurs as part of a sentence of the natural language. It is computed as follows:

PPL(s) = [ p(w1) × p(w2) × ... × p(wn) ]^(-1/n)    (1)

where n is the length of the sentence s, i.e., the number of words; p(wi) is the probability of the i-th word given its history, e.g., the probability of the first word is p(w1 | w0), where w0 is the sentence-start token, which can be represented by a placeholder; and PPL(s) estimates from the per-word probabilities how plausible the sentence s is as natural language. From formula (1), the smaller PPL(s) is, the larger the p(wi) values are, so the more probable the sentence s is as natural language, and thus the better the language model.
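For illustration, a minimal sketch of formula (1) in Python; the probabilities below are made up purely for the example and would in practice come from the trained language model:

```python
import math

# Perplexity from per-word probabilities: (prod p_i)^(-1/n),
# computed in log space for numerical stability.
def ppl(word_probs):
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)  # log of the product
    return math.exp(-log_sum / n)

print(ppl([0.2, 0.1, 0.25]))  # lower PPL = better model fit
```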
It should be noted that step S103 has two specific implementations, which are introduced in the second and third embodiments respectively.
In summary, with the training method provided in this embodiment, after language model training data containing a large number of sample texts is obtained, the sample texts are segmented to obtain the segmentation label of each segmented word, where the segmentation label includes the position of each character within the corresponding word; the language model is then trained according to the segmentation labels of the words in these sample texts. Using the segmentation labels of the sample texts as training data makes the data used to train the language model more comprehensive, reducing the PPL of the model and improving its performance.
Second embodiment
Usually, each sample text in the training data is used in turn, via step S103 above, for one round of training of the language model, and the final language model is obtained after multiple rounds. It should be noted that this embodiment introduces a specific implementation of step S103, illustrated on the sample text used in the current round (and each character in it); training with the other sample texts is similar and is not repeated one by one.
Referring to Fig. 2, which shows the first flow diagram of training the language model according to the segmentation labels of the words in a sample text, the process includes the following steps:
S201: take each character in the sample text as a target character.
In this embodiment, in order to train the language model according to the segmentation labels of the words in the sample text (where a segmentation label includes the position of each character within the corresponding word), each character in the sample text is first defined as a target character; each target character is then processed in the subsequent steps, and the language model is trained according to the processing results.
S202: generate the label vector of each target character in the sample text, where the label vector characterizes the information about the target character contained in the segmentation label of the word to which it belongs.
In this embodiment, after the segmentation label of each word in the sample text is obtained in step S102, since the segmentation label includes the position of each target character within the corresponding word, the label vector of each target character can be generated from its position within the word to which it belongs. Specifically, the position of each target character can be converted into a four-dimensional vector, which serves as its label vector.
In one implementation, the label vector of each target character characterizes the position, within the word to which it belongs, recorded for it in that word's segmentation label.
For example, again taking the sample text "我来到清华大学" as an example, the tag sequence of the words is "SBEBMME", one tag per target character. The four-dimensional label vector corresponding to "B" can be [1,0,0,0], that for "M" can be [0,1,0,0], that for "E" can be [0,0,1,0], and that for "S" can be [0,0,0,1].
In another implementation, the label vector of each target character characterizes not only the position of the target character within the word to which it belongs but also carries the probability that this word is a true word. Specifically, after the position of the target character is converted into the above four-dimensional 0/1 vector, the vector element equal to 1 is multiplied by the probability alpha that the word containing the target character is a true word (i.e., 1*alpha), and the resulting four-dimensional vector serves as the label vector of the target character.
For example, based on the example above, the segmentation label of the target character "我" in the sample text "我来到清华大学" includes not only "S", which characterizes its position, but also the probability alpha that "我" is an actual word. Suppose the value of alpha is 90%, meaning the probability that "我" is a true word is 90%. After its position is converted into the four-dimensional vector [0,0,0,1], the element equal to 1 is multiplied by the probability alpha (90%) that the word corresponding to "我" (the single-character word "我") is a true word, i.e., 1*90%, yielding the four-dimensional vector [0,0,0,0.9], which serves as the label vector of the target character "我".
It should be noted that, in this embodiment, the position of each target character in the sample text can also be converted into a multi-dimensional vector of another dimensionality. For example, the label vector for the position "B" can be the four-dimensional vector [1,0,0,0], or each of its dimensions can be expanded to 4 dimensions, giving the 16-dimensional vector [1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0].
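A minimal sketch of these label-vector conventions, covering the four-dimensional one-hot vector, the optional alpha scaling, and the optional dimension expansion (numpy is assumed as a convenience):

```python
import numpy as np

# S202: a 4-dim one-hot per BMES tag, with the '1' element optionally
# scaled by the word probability alpha, and each dimension optionally
# expanded (e.g. expand=4 gives the 16-dim form).
TAG_INDEX = {"B": 0, "M": 1, "E": 2, "S": 3}

def label_vector(tag, alpha=1.0, expand=1):
    vec = np.zeros(4)
    vec[TAG_INDEX[tag]] = alpha          # 1 * alpha; alpha=1.0 if unused
    return np.repeat(vec, expand)

print(label_vector("S", alpha=0.9))      # -> [0. 0. 0. 0.9]
print(label_vector("B", expand=4))       # 16-dim expansion of [1,0,0,0]
```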
S203: train the language model according to the label vectors of the target characters in the sample text.
In this embodiment, after the label vector of each target character in the sample text is generated in step S202, the label vectors can be processed and the language model trained according to the processing results. Specifically, in one implementation of this embodiment, step S203 may include the following steps A1-A3:
Step A1: obtain the character vector of each target character in the sample text.
In this implementation, in order to train the language model according to the label vectors of the target characters, the character vector of each target character in the sample text is first obtained. For example, before the language model training, an existing or future vector generation method can be used to generate a 128-dimensional character vector for each target character.
Step A2: fuse the character vector and the label vector of each target character in the sample text to obtain a first fused vector.
In this implementation, after the character vector of each target character is obtained in step A1 and the label vector of each target character is generated in step S202, the character vector and the label vector of the same target character can be fused to obtain the first fused vector, and the language model is then trained in step A3. The character vector and the label vector of the same target character can be fused in either of the following two ways:
In the first fusion method, the label vector of a target character is inserted at a preset position in the character vector of the same target character. For example, suppose the character vector is 128-dimensional, written [a1, a2, ..., a128], and the corresponding label vector is [1,0,0,0]. The label vector [1,0,0,0] can be inserted at a preset position in the character vector, forming a 132-dimensional vector; for example, if it is inserted at the start of the character vector, the fused vector is [1, 0, 0, 0, a1, a2, ..., a128].
It should be noted that the "preset position" in the first fusion method can be set according to the actual situation; the present application does not limit it. However, within each round of training, when each target character is fused according to the first fusion method, the preset position must be consistent: for example, for every target character, the label vector is inserted at the start of its character vector, or, alternatively, at the end of its character vector, so as to guarantee the format consistency of the training data.
In the second fusion method, the label vector of a target character replaces vector elements in the character vector of the same target character. For example, again suppose the character vector is 128-dimensional, still written [a1, a2, ..., a128], and the corresponding label vector is [1,0,0,0]. The label vector [1,0,0,0] can replace vector elements in the character vector, so that the result is still a 128-dimensional vector; for example, the first four dimensions of the character vector can be replaced, giving the fused vector [1, 0, 0, 0, a5, a6, ..., a128].
It should be noted that the position of the replaced vector elements in the second fusion method can be set according to the actual situation; the present application does not limit it. Again, within each round of training, when each target character is fused according to the second fusion method, the position of the replaced elements must be consistent: for example, for every target character, the label vector replaces the first four dimensions of its character vector, or, alternatively, the last four dimensions, so as to guarantee the format consistency of the training data.
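A minimal sketch of the two fusion methods of step A2, assuming a 128-dimensional character vector and a 4-dimensional label vector as in the examples above; pos=0 stands in for the "start" preset position, which is a free design choice as long as it is consistent within a round:

```python
import numpy as np

def fuse_insert(char_vec, label_vec, pos=0):
    # First method: insert the label vector at a preset position,
    # giving a 132-dim vector when pos=0 (insertion at the start).
    return np.concatenate([char_vec[:pos], label_vec, char_vec[pos:]])

def fuse_replace(char_vec, label_vec, pos=0):
    # Second method: overwrite elements at a preset position,
    # keeping the vector 128-dimensional.
    fused = char_vec.copy()
    fused[pos:pos + len(label_vec)] = label_vec
    return fused

char_vec = np.random.randn(128)             # stand-in for a trained vector
label_vec = np.array([1.0, 0.0, 0.0, 0.0])  # tag 'B'
print(fuse_insert(char_vec, label_vec).shape)   # (132,)
print(fuse_replace(char_vec, label_vec).shape)  # (128,)
```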
Step A3: train the language model according to the first fused vectors of the target characters in the sample text.
In this implementation, after the character vector and the label vector of each target character are fused in step A2 to obtain the first fused vectors, the first fused vectors of the target characters in the sample text are fed into the language model as input data, and the model parameters are updated based on the model's output, completing this round of training of the language model.
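The patent does not specify a model architecture, so purely for illustration, here is a sketch in which a small LSTM next-character model stands in, taking the 132-dimensional fused vectors of the first fusion method as input; all shapes, the dummy batch, and the vocabulary size are assumptions:

```python
import torch
import torch.nn as nn

# Illustrative stand-in only: inputs are the 132-dim fused vectors
# from step A2; targets are the ids of the next character.
class FusedInputLM(nn.Module):
    def __init__(self, input_dim=132, hidden_dim=256, vocab_size=5000):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, fused_seq):              # (batch, seq_len, 132)
        h, _ = self.lstm(fused_seq)
        return self.out(h)                     # next-character logits

model = FusedInputLM()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

fused_seq = torch.randn(8, 20, 132)            # dummy fused vectors
targets = torch.randint(0, 5000, (8, 20))      # dummy next-character ids
optimizer.zero_grad()
loss = loss_fn(model(fused_seq).reshape(-1, 5000), targets.reshape(-1))
loss.backward()
optimizer.step()                               # one parameter update
```

On held-out text, exp(loss), i.e., the exponential of the mean cross-entropy, corresponds to the PPL of formula (1).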
In summary, this embodiment fuses the character vector of each character in the sample text with its corresponding label vector to generate the first fused vector; using the first fused vectors to train the language model makes the data used in training more comprehensive, thereby improving the performance of the language model.
Third embodiment
Usually, each sample text in the training data is used in turn, via step S103 above, for one round of training of the language model, and the final language model is obtained after multiple rounds. It should be noted that this embodiment introduces another specific implementation of step S103, illustrated on the sample text used in the current round (and each segmented word in it); training with the other sample texts is similar and is not repeated one by one.
Referring to Fig. 3, which shows the second flow diagram of training the language model according to the segmentation labels of the words in a sample text, the process includes the following steps:
S301: take each segmented word in the sample text as a target word.
In this embodiment, in order to train the language model according to the segmentation labels of the words in the sample text (where a segmentation label includes the position of each character within the corresponding word), each segmented word in the sample text is first defined as a target word; each target word is then processed in the subsequent steps, and the language model is trained according to the processing results.
S302: generate the label vector of each target word in the sample text, where the label vector characterizes the information in the segmentation label of the corresponding target word.
In this embodiment, after the segmentation label of each target word in the sample text is obtained in step S102, the label vector of each target word can be generated from its segmentation label; for example, the segmentation label of each target word can be converted into a multi-dimensional vector, which serves as its label vector.
In one implementation, the label vector of each target word characterizes the position of each of its characters as recorded in its segmentation label.
For example, again taking the sample text "我来到清华大学" as an example, the tag sequence of the words is "SBEBMME" and the segmentation result is "我 / 来到 / 清华大学", so the segmentation labels take three forms: "BE", "BMME", and "S". Based on the four-dimensional per-character representation of the label vector in the second embodiment, the label vector corresponding to "BE" can be [1,0,0,0,0,0,1,0], that for "BMME" can be [1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0], and that for "S" can be [0,0,0,1]. In this way, the label vector of each target word in the sample text is obtained.
In another implementation, the label vector of each target word characterizes not only the positions of the characters recorded in its segmentation label but also carries the probability that the target word is a true word. Specifically, after the segmentation label of each target word is converted into the above 0/1 multi-dimensional vector, each vector element equal to 1 is multiplied by the probability alpha that the target word is a true word (i.e., 1*alpha), and the resulting multi-dimensional vector serves as the label vector of the target word.
For example, based on the example above, the segmentation label of the target word "我" in the sample text "我来到清华大学" includes not only "S", which characterizes its position, but also the probability alpha that "我" is an actual word. Suppose the value of alpha is 95%, meaning the probability that the target word "我" is a true word is 95%. After its segmentation label is converted into the four-dimensional vector [0,0,0,1], the element equal to 1 is multiplied by the probability alpha (95%) that "我" is a true word, i.e., 1*95%, yielding the four-dimensional vector [0,0,0,0.95], which serves as the label vector of the target word "我". Similarly, for the 8-dimensional vector corresponding to "BE" and the 16-dimensional vector corresponding to "BMME", the "1" elements in the respective vectors are multiplied by the corresponding alpha to obtain the label vectors.
It should be noted that, in this embodiment, the segmentation label of each target word in the sample text can also be converted into a multi-dimensional vector of another dimensionality. For example, the label vector corresponding to the segmentation label "BE" can be the eight-dimensional vector [1,0,0,0,0,0,1,0], or each of its dimensions can be expanded to 4 dimensions, giving the 32-dimensional vector [1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0].
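A minimal sketch of S302 under the same conventions: a word-level label vector is the concatenation of the per-character four-dimensional one-hots, optionally scaled by alpha and optionally expanded:

```python
import numpy as np

TAG_INDEX = {"B": 0, "M": 1, "E": 2, "S": 3}

def word_label_vector(tags, alpha=1.0, expand=1):
    # One 4-dim slot per character tag, '1' elements scaled by alpha.
    vec = np.zeros(4 * len(tags))
    for i, tag in enumerate(tags):
        vec[4 * i + TAG_INDEX[tag]] = alpha
    return np.repeat(vec, expand)

print(word_label_vector("S", alpha=0.95))  # -> [0. 0. 0. 0.95]
print(word_label_vector("BE"))             # 8-dim vector for a 2-char word
print(word_label_vector("BMME").shape)     # (16,)
```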
S303: train the language model according to the label vectors of the target words in the sample text.
In this embodiment, after the label vector of each target word in the sample text is generated in step S302, the label vectors can be processed and the language model trained according to the processing results. Specifically, in one implementation of this embodiment, step S303 may include the following steps B1-B3:
Step B1: obtain the word vector of each target word in the sample text.
In this implementation, in order to train the language model according to the label vectors of the target words, the word vector of each target word in the sample text is first obtained. For example, before the language model training, an existing or future vector generation method can be used to generate a 256-dimensional word vector for each target word (assuming the target word consists of two characters).
Step B2: fuse the word vector and the label vector of each target word in the sample text to obtain a second fused vector.
In this implementation, after the word vector of each target word is obtained in step B1 and the label vector of each target word is generated in step S302, the word vector and the label vector of the same target word can be fused to obtain the second fused vector, and the language model is then trained in step B3. The word vector and the label vector of the same target word can be fused in either of the following two ways:
In the first fusion method, the label vector of a target word is inserted at a preset position in the word vector of the same target word. For example, suppose the word vector is 256-dimensional, written [b1, b2, ..., b256], and the corresponding label vector is that of "BE", [1,0,0,0,0,0,1,0]. The label vector [1,0,0,0,0,0,1,0] can be inserted at a preset position in the word vector, forming a 264-dimensional vector; for example, if it is inserted at the start of the word vector, the fused vector is [1, 0, 0, 0, 0, 0, 1, 0, b1, b2, ..., b256].
It should be noted that the "preset position" in the first fusion method can be set according to the actual situation; the present application does not limit it. Within each round of training, when each target word is fused according to the first fusion method, the preset position must be consistent: for example, for every target word, the label vector is inserted at the start of its word vector, or, alternatively, at the end of its word vector, so as to guarantee the format consistency of the training data.
In the second fusion method, the label vector of a target word replaces vector elements in the word vector of the same target word. For example, again suppose the word vector is 256-dimensional, still written [b1, b2, ..., b256], and the corresponding label vector is still that of "BE", [1,0,0,0,0,0,1,0]. The label vector [1,0,0,0,0,0,1,0] can replace vector elements in the word vector, so that the result is still a 256-dimensional vector; for example, the first eight dimensions of the word vector can be replaced, giving the fused vector [1, 0, 0, 0, 0, 0, 1, 0, b9, b10, ..., b256].
It should be noted that the position of the replaced vector elements in the second fusion method can be set according to the actual situation; the present application does not limit it. Within each round of training, when each target word is fused according to the second fusion method, the position of the replaced elements must be consistent: for example, for every target word, the label vector replaces the first eight dimensions of its word vector, or, alternatively, the last eight dimensions, so as to guarantee the format consistency of the training data.
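The word-level fusion of step B2 mirrors step A2; a minimal self-contained sketch with the dimensions used in the examples above (a 256-dim word vector and the 8-dim "BE" label vector):

```python
import numpy as np

word_vec = np.random.randn(256)                          # stand-in word vector
label_vec = np.array([1., 0., 0., 0., 0., 0., 1., 0.])   # 'BE'

fused_insert = np.concatenate([label_vec, word_vec])     # first method: 264-dim
fused_replace = word_vec.copy()
fused_replace[:len(label_vec)] = label_vec               # second method: still 256-dim
print(fused_insert.shape, fused_replace.shape)           # (264,) (256,)
```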
Step B3: train the language model according to the second fused vectors of the target words in the sample text.
In this implementation, after the word vector and the label vector of each target word are fused in step B2 to obtain the second fused vectors, the second fused vectors of the target words in the sample text are fed into the language model as input data, and the model parameters are updated based on the model's output, completing this round of training of the language model.
In summary, this embodiment fuses the word vector of each segmented word in the sample text with its corresponding label vector to generate the second fused vector; using the second fused vectors to train the language model makes the data used in training more comprehensive, thereby improving the performance of the language model.
Fourth embodiment
It should be noted that this embodiment compares the performance of a model trained with the input data of the prior art against a model trained with the input data provided by the present application. The experimental dataset can be an official document corpus, i.e., a corpus drawn from documents issued publicly by government bodies; for example, it can be the PTB (Penn Treebank) text dataset, currently the most widely used dataset in language model research.
1. Existing language model training using a neural network
In the existing neural-network training of a language model, only the features characterizing the basic semantic information of the text (the character vectors of the characters or the word vectors of the words) are used as input data for an experimental training of the language model. The procedure is as follows:
(1) Initialize the character vector of each character in the official document corpus. For example, an existing or future vector generation method can be used to generate a 128-dimensional character vector for each character in the corpus, written [c1, c2, ..., c128].
(2) Use the 128-dimensional character vectors [c1, c2, ..., c128] as input data to train the language model.
(3) After training, compute the PPL of the language model.
Specifically, using formula (1), the PPL of the existing neural-network-trained language model is: PPL = 22.1146914958953.
2. Appending the 4-dimensional label vector of each character after its 128-dimensional character vector for language model training
After the 4-dimensional label vector of each character in the sample text is generated as in step S202 of the second embodiment, the 4-dimensional label vector of each character can be appended after its 128-dimensional character vector to obtain the concatenated vector, and the concatenated vector is used as input data for an experimental training of the language model. The procedure is as follows:
(1) Segment the official document corpus and obtain the position of each character in the corpus within its word.
(2) According to the position of each character within its word, generate the 4-dimensional label vector of each character, written [d1, d2, d3, d4].
(3) Initialize the character vector of each character in the corpus. For example, an existing or future vector generation method can be used to generate a 128-dimensional character vector for each character, still written [c1, c2, ..., c128].
(4) Append the 4-dimensional label vector [d1, d2, d3, d4] of each character after its 128-dimensional character vector [c1, c2, ..., c128], obtaining the concatenated vector [c1, c2, ..., c128, d1, d2, d3, d4].
(5) Use the concatenated vectors [c1, c2, ..., c128, d1, d2, d3, d4] as input data to train the language model.
(6) After training, compute the PPL of the language model.
Specifically, using formula (1), the PPL of the language model trained in this way is: PPL = 10.0910344230461.
3. Inserting the 4-dimensional label vector of each character into the middle of its 128-dimensional character vector for language model training
After the 4-dimensional label vector of each character in the sample text is generated as in step S202 of the second embodiment, the 4-dimensional label vector of each character can be inserted into the middle of its 128-dimensional character vector to obtain the inserted vector, and the inserted vector is used as input data for an experimental training of the language model. The procedure is as follows:
(1) Segment the official document corpus and obtain the position of each character in the corpus within its word.
(2) According to the position of each character within its word, generate the 4-dimensional label vector of each character, still written [d1, d2, d3, d4].
(3) Initialize the character vector of each character in the corpus. For example, an existing or future vector generation method can be used to generate a 128-dimensional character vector for each character, still written [c1, c2, ..., c128].
(4) Insert the 4-dimensional label vector [d1, d2, d3, d4] of each character into the middle of its 128-dimensional character vector [c1, c2, ..., c128], obtaining the inserted vector [c1, c2, ..., c64, d1, d2, d3, d4, c65, ..., c128].
(5) Use the inserted vectors [c1, c2, ..., c64, d1, d2, d3, d4, c65, ..., c128] as input data to train the language model.
(6) After training, compute the PPL of the language model.
Specifically, using formula (1), the PPL of the language model trained in this way is: PPL = 11.669197821116088.
4. Inserting the 16-dimensional label vector of each character into the middle of its 128-dimensional character vector for language model training
After the 16-dimensional label vector of each character in the sample text is generated as in step S202 of the second embodiment, the 16-dimensional label vector of each character can be inserted into the middle of its 128-dimensional character vector to obtain the inserted vector, and the inserted vector is used as input data for an experimental training of the language model. The procedure is as follows:
(1) Segment the official document corpus and obtain the position of each character in the corpus within its word.
(2) According to the position of each character within its word, generate the 16-dimensional label vector of each character, written [e1, e2, ..., e16].
(3) Initialize the character vector of each character in the corpus. For example, an existing or future vector generation method can be used to generate a 128-dimensional character vector for each character, still written [c1, c2, ..., c128].
(4) Insert the 16-dimensional label vector [e1, e2, ..., e16] of each character into the middle of its 128-dimensional character vector [c1, c2, ..., c128], obtaining the inserted vector [c1, c2, ..., c64, e1, e2, ..., e16, c65, ..., c128].
(5) Use the inserted vectors [c1, c2, ..., c64, e1, e2, ..., e16, c65, ..., c128] as input data to train the language model.
(6) After training, compute the PPL of the language model.
Specifically, using formula (1), the PPL of the language model trained in this way is: PPL = 12.28524435023514.
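For reference, a minimal sketch of the three input constructions compared in experiments 2-4, using random stand-ins for the trained character vectors:

```python
import numpy as np

c = np.random.randn(128)           # 128-dim character vector
d = np.array([1., 0., 0., 0.])     # 4-dim label vector, tag 'B'
e = np.repeat(d, 4)                # 16-dim expanded label vector

appended = np.concatenate([c, d])              # experiment 2: 132-dim
mid4 = np.concatenate([c[:64], d, c[64:]])     # experiment 3: 132-dim
mid16 = np.concatenate([c[:64], e, c[64:]])    # experiment 4: 144-dim
print(appended.shape, mid4.shape, mid16.shape)
```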
From the four PPL values obtained in the four experiments above, it can be seen that the training method of the embodiments of the present application greatly reduces the PPL of the language model: the PPL drops from 22.1146914958953 to 10.0910344230461, 11.669197821116088, or 12.28524435023514, a reduction of around 45% or more, thereby improving the performance of the language model.
Fifth embodiment
This embodiment introduces a training device of a language model; for related content, reference may be made to the above method embodiments.
Referring to Fig. 4, which is a schematic composition diagram of the training device of a language model provided in this embodiment, the device 400 includes:
a training data acquiring unit 401, configured to acquire training data of the language model, the training data including a large amount of sample text;
a participle tag acquiring unit 402, configured to perform word segmentation processing on the sample text to obtain the participle label of each participle, the participle label including the position information of each word in the corresponding participle;
a language model training unit 403, configured to train the language model according to the participle label of each participle in the sample text.
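Purely as an illustration of how units 401-403 could map onto software, the hypothetical Python skeleton below mirrors the composition of device 400; all class, method and attribute names are assumptions, and the segmenter and language model are left abstract.

    class LanguageModelTrainingDevice:
        """Hypothetical skeleton mirroring device 400 and units 401-403."""

        def acquire_training_data(self, source):
            # Unit 401: acquire training data containing many sample texts.
            return list(source)

        def acquire_participle_labels(self, sample_texts, segmenter):
            # Unit 402: segment each sample text and record, for each word,
            # its position information within the corresponding participle.
            labels = []
            for text in sample_texts:
                for participle in segmenter(text):
                    labels.append([(word, i, len(participle))
                                   for i, word in enumerate(participle)])
            return labels

        def train_language_model(self, language_model, labels):
            # Unit 403: train the language model according to the labels.
            language_model.fit(labels)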
In one implementation of this embodiment, the language model training unit 403 includes:
a target word acquiring subunit, configured to take each word in the sample text as a target word;
a first vector generating subunit, configured to generate the label vector of each target word in the sample text, the label vector characterizing the information, relating to the corresponding target word, contained in the participle label of the participle to which the target word belongs;
a first language model training subunit, configured to train the language model according to the label vector of each target word in the sample text.
In one implementation of this embodiment, the first language model training subunit includes:
a word vector acquiring subunit, configured to acquire the word vector of each target word in the sample text;
a first vector fusion subunit, configured to perform vector fusion on the word vector and the label vector of the same target word in the sample text to obtain a first fusion vector;
a first model training subunit, configured to train the language model according to the first fusion vector of each target word in the sample text.
In one implementation of this embodiment, the first vector fusion subunit is specifically configured to:
insert the label vector of the same target word in the sample text at a preset position of the word vector of that target word;
or, replace vector elements in the word vector of the same target word with the label vector of that target word.
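A minimal sketch of the two fusion strategies just described, insertion at a preset position versus replacement of vector elements, is given below; choosing the midpoint as the preset position is an assumption carried over from the experiments above.

    import numpy as np

    def fuse_by_insertion(word_vec, label_vec, position=None):
        """Insert label_vec at a preset position of word_vec
        (the midpoint is assumed when no position is given)."""
        if position is None:
            position = len(word_vec) // 2
        return np.concatenate([word_vec[:position], label_vec,
                               word_vec[position:]])

    def fuse_by_replacement(word_vec, label_vec, position=0):
        """Replace vector elements of word_vec with label_vec starting at a
        preset position; the overall dimensionality is preserved."""
        fused = word_vec.copy()
        fused[position:position + len(label_vec)] = label_vec
        return fused

    c = np.arange(128, dtype=np.float32)   # stand-in 128-dim word vector
    d = np.ones(4, dtype=np.float32)       # stand-in 4-dim label vector
    assert fuse_by_insertion(c, d).shape == (132,)
    assert fuse_by_replacement(c, d).shape == (128,)

Note the trade-off: insertion grows the input dimension, while replacement keeps it fixed at the cost of overwriting part of the word vector.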
In one implementation of this embodiment, the language model training unit 403 includes:
a target participle acquiring subunit, configured to take each participle in the sample text as a target participle;
a second vector generating subunit, configured to generate the label vector of each target participle in the sample text, the label vector characterizing the information of the participle label of the corresponding target participle;
a second language model training subunit, configured to train the language model according to the label vector of each target participle in the sample text.
In one implementation of this embodiment, the second language model training subunit includes:
a term vector acquiring subunit, configured to acquire the term vector of each target participle in the sample text;
a second vector fusion subunit, configured to perform vector fusion on the term vector and the label vector of the same target participle in the sample text to obtain a second fusion vector;
a second model training subunit, configured to train the language model according to the second fusion vector of each target participle in the sample text.
In one implementation of this embodiment, the second vector fusion subunit is specifically configured to:
insert the label vector of the same target participle in the sample text at a preset position of the participle vector of that target participle;
or, replace vector elements in the participle vector of that target participle with the label vector of the same target participle.
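At participle granularity the fusion is analogous; the sketch below is only an assumption-laden illustration in which the label vector carries the participle's length together with the optional probability noted next, inserted at the midpoint of a stand-in term vector.

    import numpy as np

    TERM_DIM = 128  # assumed term-vector dimension, by analogy with the word level

    def participle_label(participle: str, prob: float) -> np.ndarray:
        """Hypothetical segment-level label vector: the participle's length
        and the probability that it is a valid participle."""
        return np.array([float(len(participle)), prob], dtype=np.float32)

    def fuse_participle(term_vec, label_vec):
        # Insert at the midpoint, mirroring the word-level preset position.
        half = len(term_vec) // 2
        return np.concatenate([term_vec[:half], label_vec, term_vec[half:]])

    t = np.zeros(TERM_DIM, dtype=np.float32)   # stand-in trained term vector
    fused = fuse_participle(t, participle_label("北京", 0.98))
    assert fused.shape == (TERM_DIM + 2,)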
In one implementation of this embodiment, the participle label further includes the probability that the corresponding participle is a valid participle.
Further, an embodiment of the present application also provides training equipment for a language model, comprising: a processor, a memory and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to execute any implementation of the above training method of a language model.
Further, an embodiment of the present application also provides a computer-readable storage medium having instructions stored therein which, when run on a terminal device, cause the terminal device to execute any implementation of the above training method of a language model.
Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the above training method of a language model.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform. Based on such an understanding, the part of the technical solution of the present application that in essence contributes to the prior art can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network communication device such as a media gateway, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner, each embodiment focusing on its differences from the other embodiments, and the same or similar parts of the embodiments may refer to each other. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for relevant details reference may be made to the description of the method.
It should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device including that element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (19)

1. A training method of a language model, characterized by comprising:
acquiring training data of the language model, the training data comprising a large amount of sample text;
performing word segmentation processing on the sample text to obtain a participle label of each participle, the participle label comprising position information of each word in the corresponding participle;
training the language model according to the participle label of each participle in the sample text.
2. The method according to claim 1, characterized in that the training the language model according to the participle label of each participle in the sample text comprises:
taking each word in the sample text as a target word;
generating a label vector of each target word in the sample text, the label vector characterizing the information, relating to the corresponding target word, contained in the participle label of the participle to which the target word belongs;
training the language model according to the label vector of each target word in the sample text.
3. The method according to claim 2, characterized in that the training the language model according to the label vector of each target word in the sample text comprises:
acquiring a word vector of each target word in the sample text;
performing vector fusion on the word vector and the label vector of the same target word in the sample text to obtain a first fusion vector;
training the language model according to the first fusion vector of each target word in the sample text.
4. The method according to claim 3, characterized in that the performing vector fusion on the word vector and the label vector of the same target word in the sample text comprises:
inserting the label vector of the same target word in the sample text at a preset position of the word vector of the same target word;
or, replacing vector elements in the word vector of the same target word with the label vector of the same target word in the sample text.
5. The method according to claim 1, characterized in that the training the language model according to the participle label of each participle in the sample text comprises:
taking each participle in the sample text as a target participle;
generating a label vector of each target participle in the sample text, the label vector characterizing the information of the participle label of the corresponding target participle;
training the language model according to the label vector of each target participle in the sample text.
6. The method according to claim 5, characterized in that the training the language model according to the label vector of each target participle in the sample text comprises:
acquiring a term vector of each target participle in the sample text;
performing vector fusion on the term vector and the label vector of the same target participle in the sample text to obtain a second fusion vector;
training the language model according to the second fusion vector of each target participle in the sample text.
7. The method according to claim 6, characterized in that the performing vector fusion on the term vector and the label vector of the same target participle in the sample text comprises:
inserting the label vector of the same target participle in the sample text at a preset position of the participle vector of the same target participle;
or, replacing vector elements in the participle vector of the same target participle with the label vector of the same target participle in the sample text.
8. The method according to any one of claims 1 to 7, characterized in that the participle label further comprises the probability that the corresponding participle is a valid participle.
9. A training device of a language model, characterized by comprising:
a training data acquiring unit, configured to acquire training data of the language model, the training data comprising a large amount of sample text;
a participle tag acquiring unit, configured to perform word segmentation processing on the sample text to obtain a participle label of each participle, the participle label comprising position information of each word in the corresponding participle;
a language model training unit, configured to train the language model according to the participle label of each participle in the sample text.
10. The device according to claim 9, characterized in that the language model training unit comprises:
a target word acquiring subunit, configured to take each word in the sample text as a target word;
a first vector generating subunit, configured to generate a label vector of each target word in the sample text, the label vector characterizing the information, relating to the corresponding target word, contained in the participle label of the participle to which the target word belongs;
a first language model training subunit, configured to train the language model according to the label vector of each target word in the sample text.
11. The device according to claim 10, characterized in that the first language model training subunit comprises:
a word vector acquiring subunit, configured to acquire a word vector of each target word in the sample text;
a first vector fusion subunit, configured to perform vector fusion on the word vector and the label vector of the same target word in the sample text to obtain a first fusion vector;
a first model training subunit, configured to train the language model according to the first fusion vector of each target word in the sample text.
12. The device according to claim 11, characterized in that the first vector fusion subunit is specifically configured to:
insert the label vector of the same target word in the sample text at a preset position of the word vector of the same target word;
or, replace vector elements in the word vector of the same target word with the label vector of the same target word in the sample text.
13. The device according to claim 9, characterized in that the language model training unit comprises:
a target participle acquiring subunit, configured to take each participle in the sample text as a target participle;
a second vector generating subunit, configured to generate a label vector of each target participle in the sample text, the label vector characterizing the information of the participle label of the corresponding target participle;
a second language model training subunit, configured to train the language model according to the label vector of each target participle in the sample text.
14. The device according to claim 13, characterized in that the second language model training subunit comprises:
a term vector acquiring subunit, configured to acquire a term vector of each target participle in the sample text;
a second vector fusion subunit, configured to perform vector fusion on the term vector and the label vector of the same target participle in the sample text to obtain a second fusion vector;
a second model training subunit, configured to train the language model according to the second fusion vector of each target participle in the sample text.
15. The device according to claim 14, characterized in that the second vector fusion subunit is specifically configured to:
insert the label vector of the same target participle in the sample text at a preset position of the participle vector of the same target participle;
or, replace vector elements in the participle vector of the same target participle with the label vector of the same target participle in the sample text.
16. The device according to any one of claims 9 to 15, characterized in that the participle label further comprises the probability that the corresponding participle is a valid participle.
17. Training equipment of a language model, characterized by comprising: a processor, a memory and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1-8.
18. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1-8.
19. A computer program product, characterized in that, when the computer program product is run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1-8.
CN201910086877.XA 2019-01-29 2019-01-29 Training method and device for language model Active CN109800435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910086877.XA CN109800435B (en) 2019-01-29 2019-01-29 Training method and device for language model

Publications (2)

Publication Number Publication Date
CN109800435A true CN109800435A (en) 2019-05-24
CN109800435B CN109800435B (en) 2023-06-20

Family

ID=66559308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910086877.XA Active CN109800435B (en) 2019-01-29 2019-01-29 Training method and device for language model

Country Status (1)

Country Link
CN (1) CN109800435B (en)

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015326A1 (en) * 2004-07-14 2006-01-19 International Business Machines Corporation Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building
US20070198511A1 (en) * 2006-02-23 2007-08-23 Samsung Electronics Co., Ltd. Method, medium, and system retrieving a media file based on extracted partial keyword
CN102411563A (en) * 2010-09-26 2012-04-11 阿里巴巴集团控股有限公司 Method, device and system for identifying target words
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
WO2016177069A1 (en) * 2015-07-20 2016-11-10 中兴通讯股份有限公司 Management method, device, spam short message monitoring system and computer storage medium
CN105159949A (en) * 2015-08-12 2015-12-16 北京京东尚科信息技术有限公司 Chinese address word segmentation method and system
CN105516499A (en) * 2015-12-14 2016-04-20 北京奇虎科技有限公司 Method and device for classifying short messages, communication terminal and server
CN107305549A (en) * 2016-04-18 2017-10-31 北京搜狗科技发展有限公司 Language data processing method, device and the device for language data processing
CN106156004A (en) * 2016-07-04 2016-11-23 中国传媒大学 The sentiment analysis system and method for film comment information based on term vector
CN107622044A (en) * 2016-07-13 2018-01-23 阿里巴巴集团控股有限公司 Segmenting method, device and the equipment of character string
CN106294863A (en) * 2016-08-23 2017-01-04 电子科技大学 A kind of abstract method for mass text fast understanding
CN107423288A (en) * 2017-07-05 2017-12-01 达而观信息科技(上海)有限公司 A kind of Chinese automatic word-cut and method based on unsupervised learning
CN108334492A (en) * 2017-12-05 2018-07-27 腾讯科技(深圳)有限公司 Text participle, instant message treating method and apparatus
CN108121700A (en) * 2017-12-21 2018-06-05 北京奇艺世纪科技有限公司 A kind of keyword extracting method, device and electronic equipment
CN108170674A (en) * 2017-12-27 2018-06-15 东软集团股份有限公司 Part-of-speech tagging method and apparatus, program product and storage medium
CN108287820A (en) * 2018-01-12 2018-07-17 北京神州泰岳软件股份有限公司 A kind of generation method and device of text representation
CN108595428A (en) * 2018-04-25 2018-09-28 杭州闪捷信息科技股份有限公司 The method segmented based on bidirectional circulating neural network
CN109192213A (en) * 2018-08-21 2019-01-11 平安科技(深圳)有限公司 The real-time transfer method of court's trial voice, device, computer equipment and storage medium
CN109271493A (en) * 2018-11-26 2019-01-25 腾讯科技(深圳)有限公司 A kind of language text processing method, device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Joint Learning of Character and Word Embeddings", IJCAI 2015, 25 July 2015 (2015-07-25), pages 1236-1240 *
Zhang Jing et al., "Research on Unsupervised New Word Recognition for Chinese Social Media Corpora", Journal of Chinese Information Processing, vol. 32, no. 3, 15 March 2018 (2018-03-15), pages 17-25 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413999A (en) * 2019-07-17 2019-11-05 新华三大数据技术有限公司 Entity relation extraction method, model training method and relevant apparatus
CN110851596A (en) * 2019-10-11 2020-02-28 平安科技(深圳)有限公司 Text classification method and device and computer readable storage medium
CN110851596B (en) * 2019-10-11 2023-06-27 平安科技(深圳)有限公司 Text classification method, apparatus and computer readable storage medium
CN111008528A (en) * 2019-12-05 2020-04-14 北京知道智慧信息技术有限公司 Text processing method and device, electronic equipment and readable storage medium
CN116612750A (en) * 2023-05-23 2023-08-18 苏州科帕特信息科技有限公司 Automatic training method for language model

Also Published As

Publication number Publication date
CN109800435B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Jin et al. A novel lexicalized HMM-based learning framework for web opinion mining
CN109800435A (en) A kind of training method and device of language model
CN110765791B (en) Automatic post-editing method and device for machine translation
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN110634487A (en) Bilingual mixed speech recognition method, device, equipment and storage medium
CN111860237B (en) Video emotion fragment identification method and device
CN113961685A (en) Information extraction method and device
CN110781394A (en) Personalized commodity description generation method based on multi-source crowd-sourcing data
CN113987169A (en) Text abstract generation method, device and equipment based on semantic block and storage medium
US11727915B1 (en) Method and terminal for generating simulated voice of virtual teacher
CN112861540A (en) Broadcast television news keyword automatic extraction method based on deep learning
CN112188311A (en) Method and apparatus for determining video material of news
CN111950281B (en) Demand entity co-reference detection method and device based on deep learning and context semantics
CN113705315A (en) Video processing method, device, equipment and storage medium
CN117171303A (en) Joint multi-mode aspect-level emotion analysis method based on self-adaptive attention fusion
CN116977992A (en) Text information identification method, apparatus, computer device and storage medium
CN108694165B (en) Cross-domain dual emotion analysis method for product comments
CN116306506A (en) Intelligent mail template method based on content identification
CN114842301A (en) Semi-supervised training method of image annotation model
CN114154489A (en) Triple extraction method, device, equipment and storage medium
CN114676699A (en) Entity emotion analysis method and device, computer equipment and storage medium
Liu et al. AIP: A named entity recognition method combining glyphs and sounds
CN114281948A (en) Summary determination method and related equipment thereof
CN113673222A (en) Social media text fine-grained emotion analysis method based on bidirectional collaborative network
CN111801673A (en) Application program introduction method, mobile terminal and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant