CN113011176A - Language model training and language reasoning method, device and computer storage medium thereof - Google Patents

Language model training and language reasoning method, device and computer storage medium thereof Download PDF

Info

Publication number
CN113011176A
CN113011176A
Authority
CN
China
Prior art keywords
language model
training
corpus sample
task
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110271045.2A
Other languages
Chinese (zh)
Inventor
杜晓薇
郝东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuncong Technology Group Co Ltd
Original Assignee
Yuncong Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yuncong Technology Group Co Ltd
Priority to CN202110271045.2A
Publication of CN113011176A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a language model training method, a language reasoning method and device, and a computer storage medium. The method includes: determining, according to each training corpus sample, the ground-truth label of each functional task of the sample at each granularity level; constructing a language model, feeding each training corpus sample into the language model for prediction, and outputting each functional task prediction label of each training corpus sample at each granularity level; and training the language model according to the ground-truth labels and the prediction labels of the functional tasks of each training corpus sample at each granularity level. The trained model can simultaneously output reasoning results for different granularity levels and therefore achieves higher reasoning efficiency.

Description

Language model training and language reasoning method, device and computer storage medium thereof
Technical Field
The embodiments of the present application relate to the technical field of language recognition, and in particular to a language model training method, a language reasoning method, a language model training device, and a computer storage medium.
Background
Natural language processing based on deep learning is inseparable from language models. Currently popular language models fall into three main families. Auto-encoding language models, such as BERT and RoBERTa, contain only the encoder part of the Transformer; their training can be summarized as masking a portion of the input sentence and predicting the true value of the masked portion. Autoregressive language models, such as GPT, contain only the decoder part of the Transformer; their training can be summarized as predicting the next word from the preceding context. Generative language models, such as BART, contain both the encoder and the decoder of the Transformer; they are trained by feeding the source text into the encoder and generating the desired target text with the decoder.
All of the above training methods use the attention mechanism to perform language recognition based only on the contextual information of the corpus itself; they cannot learn other features of the corpus well, for example proper nouns, parts of speech, grammar, semantic roles, and coreference relations.
Furthermore, existing language models pay no particular attention to the granularity level of the training task. BERT's training tasks, for example, are based on subwords and sentences, while SpanBERT's are based on spans; that is, in existing language models the granularity levels of different tasks do not form a unified system.
Moreover, for a new application scenario, the common practice is to write scenario-specific tasks on top of an existing language model and fine-tune it to obtain a model suited to that scenario.
In addition, if an application scenario involves multiple problems, it is common to design a separate language model for each task and combine the models into a pipeline; this approach, however, sacrifices the reasoning efficiency of the overall system.
Disclosure of Invention
In view of the foregoing, the present application provides a language model training method, a language reasoning method, a language model training device, and a computer storage medium, which can simultaneously output the prediction results of the functional tasks of a corpus sample at different granularity levels, achieving higher reasoning efficiency and more accurate reasoning results.
A first aspect of the present application provides a language model training method, which includes: determining, according to each training corpus sample, the ground-truth label of each functional task of each training corpus sample at each granularity level; constructing a language model, feeding each training corpus sample into the language model for prediction, and outputting each functional task prediction label of each training corpus sample at each granularity level; and training the language model according to the ground-truth labels and the prediction labels of the functional tasks of each training corpus sample at each granularity level.
A second aspect of the present application provides a computer storage medium having stored therein instructions for performing the steps of the language model training method of the first aspect.
A third aspect of the present application provides a language reasoning method, including: using the language model trained by the language model training method of the first aspect to perform reasoning on a target corpus sample, so as to obtain the functional task reasoning labels of the target corpus sample at each granularity level; and merging the functional task reasoning labels of the target corpus sample at each granularity level to obtain the reasoning result of the target corpus sample.
A fourth aspect of the present application provides a computer storage medium having stored therein instructions for performing the steps of the language reasoning method of the third aspect.
A fifth aspect of the present application provides a language model training device, which includes: a training sample acquisition module, configured to determine, according to each training corpus sample, the ground-truth label of each functional task of each training corpus sample at each granularity level; and a model training module, configured to construct a language model, feed each training corpus sample into the language model for prediction, output each functional task prediction label of each training corpus sample at each granularity level, and train the language model according to the ground-truth labels and the prediction labels of the functional tasks at each granularity level.
A sixth aspect of the present application provides a language reasoning device, comprising: a target sample acquisition module, configured to acquire a target corpus sample; a language reasoning module, configured to perform reasoning on the target corpus sample using the language model trained by the language model training device of the fifth aspect, so as to obtain the functional task reasoning labels of the target corpus sample at each granularity level; and a merging module, configured to merge the functional task reasoning labels of the target corpus sample at each granularity level to obtain the reasoning result of the target corpus sample.
In summary, by introducing multiple functional tasks at different granularity levels, the language model can simultaneously output functional task predictions at the sentence, subword, word, and span levels, which yields higher reasoning efficiency; it can also learn feature information such as proper nouns, parts of speech, grammar, semantic roles, and coreference relations during training, improving the accuracy of language prediction.
In addition, the language model trained by the technical solution of the embodiments can be directly applied to different task scenarios, or applied to related task scenarios after fine-tuning; a model undergoing such fine-tuning converges quickly, which improves the efficiency of reasoning execution.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can derive other drawings from them.
FIG. 1 is a schematic flow chart illustrating a language model training method according to a first embodiment of the present application;
FIG. 2 is a flowchart illustrating a language model training method according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a language model training method of the present application performing training based on different levels of granularity;
FIG. 4 is a flowchart illustrating a language inference method according to a fourth embodiment of the present application;
FIG. 5 is a schematic flow chart of a language inference method according to a fifth embodiment of the present application;
FIGS. 6 and 7 are schematic diagrams illustrating an architecture of a language model training device according to a seventh embodiment of the present application;
fig. 8 is a schematic structural diagram of a language inference device according to an eighth embodiment of the present application.
Element number
600, a language model training device; 602, a training sample acquisition module; 604, a model training module; 606, a language model; 6062, a predictor; 60622, predictor sub-models; 6064, a discriminator; 6066, an evaluator; 800, a language reasoning device; 802, a target sample acquisition module; 804, a language reasoning module; 806, a merging module.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments shall fall within the protection scope of the embodiments of the present application.
As noted in the background section of the present application, current language models are trained only on tasks at a single granularity level; tasks at different granularity levels do not form a system within the same language model. Moreover, current language models reason only from the contextual information of the corpus and cannot learn feature information such as proper nouns, parts of speech, grammar, semantic roles, and coreference relations.
Moreover, a current language model cannot be applied directly to a specific application scenario; if the scenario involves solving several problems, a corresponding language model must be allocated to each of them, forming a pipeline of multiple language models, and this configuration harms reasoning efficiency.
In view of the above technical problems of the current language model, the present application provides a language model training method, a language reasoning method, a language model training device, a language reasoning device, and a computer storage medium, which can output reasoning results corresponding to different granularity levels simultaneously, not only improve reasoning efficiency, but also can be applied to different task scenarios.
The following further describes specific implementations of the embodiments of the present application with reference to the drawings of the embodiments of the present application.
First embodiment
Fig. 1 is a flowchart illustrating a language model training method according to a first embodiment of the present application. As shown in the figure, the language model training method of the present embodiment mainly includes:
Step S102: determining, according to each training corpus sample, the ground-truth label of each functional task of the training corpus sample at each granularity level.
Optionally, the granularity level includes at least one of a sentence level, a root word (subword) level, a word level, and a span level.
It should be noted that the granularity level is not limited to the above, and may also include chapter level, etc., and those skilled in the art may adjust the granularity level according to the needs, and the present application does not limit the granularity level.
In this embodiment, the ground-truth labels of the functional tasks can be added manually or automatically by machine.
For example, a trained language model for a given functional task may serve as a reference (teacher) model to perform reasoning on unlabeled corpus samples; the teacher's predictions are then used as the ground-truth labels of that functional task at the corresponding granularity level.
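The teacher-labeling step above can be sketched in a few lines of Python. This is an illustrative toy, not part of the patent disclosure: `teacher_predict` stands in for a real trained reference model, and its capitalization rule is purely hypothetical.

```python
def teacher_predict(tokens):
    """Hypothetical teacher model: tags capitalized tokens as entities.
    A real system would run a trained NER model here instead of this toy rule."""
    return ["B-ENT" if t[0].isupper() else "O" for t in tokens]

def pseudo_label(unlabeled_samples):
    """Attach the teacher's predictions to each unlabeled corpus sample,
    to be used as functional task labels at the corresponding granularity level."""
    return [(tokens, teacher_predict(tokens)) for tokens in unlabeled_samples]

samples = [["Alice", "met", "Bob"], ["the", "cat", "sat"]]
labeled = pseudo_label(samples)
# labeled[0] == (["Alice", "met", "Bob"], ["B-ENT", "O", "B-ENT"])
```

The resulting (sample, label) pairs can then be mixed with manually labeled data when training the predictor.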
Step S104: constructing a language model, feeding each training corpus sample into the language model for prediction, and outputting each functional task prediction label of each training corpus sample at each granularity level.
Optionally, the language model may include a predictor.
Specifically, the predictor may be provided to perform prediction on each input corpus sample, and obtain each function task prediction label corresponding to each granularity level for each corpus sample.
Step S106: training the language model according to the ground-truth labels and the prediction labels of the functional tasks of each training corpus sample at each granularity level.
Optionally, the language model may include a discriminator.
Specifically, the discriminator may obtain the prediction loss of each functional task prediction label according to the ground-truth labels and the prediction labels at each granularity level, and the predictor is trained repeatedly based on this loss until training is complete.
In summary, the language model trained by the language model training method of this embodiment can simultaneously output prediction labels for the functional tasks of the training corpus samples at different granularity levels, helping the model understand corpus information at different granularity levels and achieving higher reasoning efficiency.
Moreover, the language model trained by the method of this embodiment can be directly applied to different task scenarios, or applied to related task scenarios after fine-tuning, and it converges faster during fine-tuning.
Second embodiment
Fig. 2 is a flowchart illustrating a language model training method according to a second embodiment of the present application. The present embodiment mainly shows a specific implementation of the step S104, and as shown in the figure, the language model training method of the present embodiment mainly includes:
step S202, inputting each corpus sample.
Optionally, the segmentation process may be performed on each corpus sample according to the text length supported by the language model, so as to obtain at least one text fragment corresponding to each corpus sample.
In this embodiment, if a plurality of text segments are generated corresponding to one corpus sample, each text segment is input into the language model for prediction.
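The segmentation step above can be sketched as follows; this is illustrative only, and the simple fixed-window policy over tokens is an assumption not specified by the patent.

```python
def split_corpus(tokens, max_len):
    """Split a token sequence into fragments no longer than max_len,
    so that each fragment fits the text length the language model supports."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

fragments = split_corpus(list("abcdefgh"), 3)
# fragments == [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h']]
# each fragment would then be fed into the language model separately
```

A production system might instead split on sentence boundaries and use overlapping windows; the list-slicing version here only shows the shape of the operation.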
Optionally, preprocessing may be performed on each input corpus sample based on a preset preprocessing rule to improve the prediction accuracy of the language model on each corpus sample.
Optionally, the preset preprocessing rules may include, but are not limited to, at least one of corpus denoising and normalization.
For example, denoising and normalization can be performed on special characters, capitalization, and simplified versus traditional Chinese characters in the corpus sample.
Optionally, the language model may comprise a base language model, such as a BERT or RoBERTa model.
Optionally, the MLM (masked language modeling) task of the base language model itself may be used to perform degradation processing on a corpus sample to obtain a degraded portion of the sample, and the predictor of the language model then predicts the original content of that degraded portion.
In this embodiment, the degradation processing performed by the base language model includes, but is not limited to, masking, replacement, order inversion, and order scrambling of the corpus sample.
Optionally, the granularity level at which degradation is applied may be one of the sentence, subword, word, and span levels.
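The degradation operations listed above can be sketched as follows. This is an illustrative simplification: the masking ratio and the single-operation interface are assumptions, and production MLM pipelines are considerably more involved.

```python
import random

def degrade(tokens, mode, mask_token="[MASK]", rng=None):
    """Apply one degradation operation; returns (corrupted tokens, targets)."""
    rng = rng or random.Random(0)
    toks = list(tokens)
    if mode == "mask":  # hide roughly 15% of the tokens (assumed ratio)
        idx = rng.sample(range(len(toks)), max(1, len(toks) // 7))
        targets = {i: toks[i] for i in idx}  # the true values to be predicted
        for i in idx:
            toks[i] = mask_token
        return toks, targets
    if mode == "reverse":  # order inversion
        return toks[::-1], list(tokens)
    if mode == "shuffle":  # order scrambling
        rng.shuffle(toks)
        return toks, list(tokens)
    raise ValueError(f"unknown degradation mode: {mode}")
```

The predictor's job is then to recover the returned targets from the corrupted token sequence.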
Step S204: for each corpus sample, performing encoding at each granularity level to obtain the syntax unit encodings of the corpus sample at each granularity level.
In this embodiment, the language model may include a plurality of embedding modules, and each of the embedding modules may perform encoding on the corpus samples based on a corresponding granularity level to obtain syntax unit encodings corresponding to different granularity levels (please refer to fig. 3).
In one embodiment, a subword or character embedding module may be provided to perform encoding on the corpus sample at the subword or character level to obtain at least one subword embedding or character embedding for the corpus sample, where subwords apply to English corpus samples and single characters to Chinese corpus samples; these encodings can be taken directly from the output of the base language model.
Alternatively, a linear transformation or FFN transformation or the like may be performed on the output result of the base language model.
In one embodiment, a word embedding module may perform encoding on the corpus sample at the word level to obtain at least one word embedding for the corpus sample.
Optionally, a word embedding may be obtained by summing, averaging, or pooling the subword or character embeddings; by applying GRU and attention operations to them; or by taking the difference or the concatenation of the first and last subword or character embeddings.
Optionally, the word embeddings obtained by these methods can be combined arbitrarily, for example by concatenation or addition.
Moreover, other features, such as the length of the word and the position of the word in the sentence, can also be introduced when the word embedding module encodes the corpus sample.
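Two of the composition strategies above, mean pooling and first-last concatenation, can be sketched with plain Python lists standing in for embedding vectors (illustrative only; real embeddings would be tensors produced by the base language model):

```python
def mean_pool(subword_embs):
    """Word embedding as the element-wise average of its subword embeddings."""
    dim = len(subword_embs[0])
    n = len(subword_embs)
    return [sum(e[d] for e in subword_embs) / n for d in range(dim)]

def first_last_concat(subword_embs):
    """Word embedding as the concatenation of the first and last subword embeddings."""
    return list(subword_embs[0]) + list(subword_embs[-1])

subwords = [[1.0, 2.0], [3.0, 4.0]]   # e.g. embeddings for "play" + "##ing"
# mean_pool(subwords)         -> [2.0, 3.0]
# first_last_concat(subwords) -> [1.0, 2.0, 3.0, 4.0]
```

Note that mean pooling keeps the embedding dimension fixed, while first-last concatenation doubles it; the downstream prediction heads must be sized accordingly.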
In an embodiment, a span embedding module may be provided to perform an encoding process on the corpus sample based on a span level to obtain at least one span embedding corresponding to the corpus sample.
Specifically, the constituent units of a span embedding can be drawn directly from subword or character embeddings, or from word embeddings.
Optionally, a span embedding can be obtained by summing, averaging, or pooling its constituent units; by applying GRU and attention processing to them; or by taking the difference and the concatenation of the first and last constituent units.
Optionally, these acquisition modes can be combined arbitrarily, and the span embeddings they produce can be concatenated or added.
Optionally, other features, such as the length of the span, the position of the span in the sentence, etc., can also be introduced in the process of performing encoding on the corpus sample by using the span embedding module.
In one embodiment, a sentence embedding module may be provided to perform an encoding process on the corpus samples based on the sentence level to obtain at least one sentence embedding corresponding to the corpus samples.
In particular, the sentence embedding module is used to encode one or more sentences or a text fragment (depending on whether the input of the language model is one or more sentences or a piece of text).
Optionally, the sentence embedding module may apply a linear or FFN transformation to the [CLS] encoding of the language model to obtain the sentence embedding.
Alternatively, the digital conversion process may also be performed for each syntax unit encoding of the corpus sample corresponding to each granularity level.
Alternatively, each syntax unit coding corresponding to a different level of granularity may be represented using a mapping relationship.
For example, if a triplet (span_id, char_start_ind, char_end_ind) represents a span-level syntax unit encoding, then the triplet (1, 2, 7) indicates that characters 2 through 7 constitute the 1st span-level syntax unit.
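The triplet representation in the example above can be made concrete with a short sketch; the inclusive-end convention follows the example in the text, and the helper names are illustrative.

```python
def spans_to_triplets(spans):
    """Encode character spans as (span_id, char_start_ind, char_end_ind) triplets,
    numbering spans from 1 as in the example above."""
    return [(i + 1, start, end) for i, (start, end) in enumerate(spans)]

def decode_triplet(text, triplet):
    """Recover the characters covered by a triplet (end index inclusive)."""
    span_id, start, end = triplet
    return text[start:end + 1]

triplets = spans_to_triplets([(2, 7)])
# triplets[0] == (1, 2, 7): characters 2 through 7 form the 1st span
```

Such integer triplets give each granularity level a compact, hashable key, which is convenient when mapping syntax units to their embeddings and labels.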
Step S206, according to each granularity level corresponding to each grammar unit code, at least one function task corresponding to each granularity level is determined.
In this embodiment, the functional tasks at the subword or character level may include named-entity tasks, part-of-speech tasks, and the like.
Specifically, functional tasks at the subword or character level are, for example, the named entity recognition task and the part-of-speech tagging task. These can be implemented by linear classification over subword or character embeddings, or by a conditional random field (CRF).
In this embodiment, the functional tasks at the word level may include named-entity tasks, part-of-speech tasks, grammar tasks, and the like.
Specifically, functional tasks at the word level are, for example, the named entity recognition task, the part-of-speech tagging task, and the syntactic dependency tree task. The dependency tree task includes finding the corresponding head word for each word and classifying the relation between the word and its head.
In this embodiment, the functional tasks at the span level may include syntax tasks, semantic tasks, coreference resolution tasks, and the like.
Specifically, functional tasks at the span level are, for example, the syntactic constituency tree task, the semantic role labeling task, and the coreference resolution task.
After all span embeddings are obtained, linear classification over them, with different label types, completes the constituency tree analysis and semantic role labeling tasks. The coreference resolution task seeks the set of span embeddings that refer to the same thing.
Optionally, candidate mentions, i.e., pronouns, demonstrative descriptions, and the like, can be screened from the span embeddings, and by pairing two candidates it can be determined whether they refer to the same thing. For example, two span embeddings can be concatenated, passed through a linear or FFN transformation, and then classified.
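The pairwise decision just described (concatenate two span embeddings, apply a linear transformation, then classify) can be sketched as a binary scorer. The weights below are placeholders; in the trained model they would be learned parameters, and a real head would typically be an FFN rather than a single linear layer.

```python
import math

def score_pair(span_a, span_b, weights, bias=0.0):
    """Coreference score for two span embeddings: concatenate them,
    apply a linear layer, and squash to (0, 1) with a sigmoid."""
    x = list(span_a) + list(span_b)
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def corefer(span_a, span_b, weights, threshold=0.5):
    """Decide whether two candidate spans refer to the same thing."""
    return score_pair(span_a, span_b, weights) > threshold

# with all-zero (untrained) weights the score is exactly 0.5,
# i.e. the classifier is maximally uncertain
```

Clustering all pairs that score above the threshold then yields the sets of spans referring to the same entity.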
In this embodiment, the functional tasks at the sentence level are, for example, text classification, textual entailment, and sentence-continuity judgment. All of these are implemented by linear classification over sentence embeddings.
In this embodiment, a sentence-level classification task may also be selectively performed: it may classify the category of a single sentence embedding, classify a pair of sentence embeddings to judge whether they are semantically consistent or continuous, or even classify multiple sentence embeddings.
In this embodiment, functional tasks at the chapter level can be implemented by segmenting the chapter into multiple text fragments.
Step S208: performing prediction on the syntax unit encodings at each granularity level according to the determined functional tasks, so as to obtain the functional task prediction labels of the training corpus sample at each granularity level.
Optionally, the predictor includes a plurality of predictor sub-models, each configured to perform prediction on the syntax unit encodings at a granularity level for a functional task, obtaining the corresponding functional task prediction labels of the training corpus sample.
Optionally, the predictor sub-models may be combined in any way, as actually needed, so that prediction is performed for one or more specified functional tasks.
The specification of the one or more functional tasks may be configured manually or selected randomly.
Step S210: obtaining the prediction loss of each predictor sub-model according to the ground-truth labels and the prediction labels of the functional tasks of each training corpus sample at each granularity level.
Specifically, the prediction loss of each predictor sub-model can be obtained by the discriminator according to the ground-truth labels and the prediction labels of the functional tasks of each training corpus sample at each granularity level.
Optionally, the prediction loss may include at least one of a cross-entropy loss or a maximum likelihood estimation loss.
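A cross-entropy loss over several functional task heads can be sketched as follows. Summing the per-task losses is an assumption made for illustration; the patent does not specify how losses from different tasks are combined, and a real system might weight them.

```python
import math

def cross_entropy(pred_probs, gold_index):
    """Cross-entropy between a predicted distribution and a one-hot gold label."""
    return -math.log(pred_probs[gold_index])

def multi_task_loss(task_outputs):
    """Combine per-task losses; task_outputs maps task name -> (probs, gold index)."""
    return sum(cross_entropy(probs, gold) for probs, gold in task_outputs.values())

loss = multi_task_loss({
    "ner": ([0.7, 0.2, 0.1], 0),   # named entity recognition head
    "pos": ([0.1, 0.8, 0.1], 1),   # part-of-speech tagging head
})
```

The discriminator would compute such a loss for each predictor sub-model and repeat training until the loss satisfies the convergence condition of step S212.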
Optionally, the ground-truth label of each functional task of each training corpus sample at each granularity level can be obtained from reference sub-models.
Specifically, each training corpus sample may be input into each reference sub-model, so that the reference sub-model predicts the sample for each functional task at each granularity level and outputs the corresponding functional task reference labels; these reference labels are then used as ground-truth labels to train the predictor sub-models.
In this embodiment, a reference sub-model may be obtained by adjusting the language model for a specific functional task once training has reached a certain stage, so that the model achieves better prediction performance on that task and thus forms the reference sub-model corresponding to it.
In other embodiments, the reference sub-model may be another well-performing trained model corresponding to a specific functional task.
Optionally, the functional task reference label output by a reference sub-model may be either a single definite class (a hard label) or a probability distribution over the classes (a soft label).
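The hard versus soft distinction can be sketched as follows. The temperature parameter is an assumption borrowed from standard knowledge-distillation practice, not something the patent states.

```python
import math

def hard_label(teacher_probs):
    """Hard label: the index of the single most probable class."""
    return max(range(len(teacher_probs)), key=teacher_probs.__getitem__)

def soft_label(teacher_probs, temperature=1.0):
    """Soft label: the teacher's distribution, optionally smoothed by a
    temperature (T > 1 flattens it); at T = 1 it is returned unchanged."""
    scaled = [math.exp(math.log(max(p, 1e-12)) / temperature) for p in teacher_probs]
    total = sum(scaled)
    return [s / total for s in scaled]

probs = [0.1, 0.7, 0.2]
# hard_label(probs) -> 1
# soft_label(probs) -> approximately [0.1, 0.7, 0.2] at T = 1
```

Training against soft labels lets the student sub-model see the teacher's relative confidence across classes, not just its top choice.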
Step S212: determining whether the prediction loss of each predictor sub-model satisfies a preset convergence condition; if so, step S214 is performed, otherwise the process returns to step S208.
Optionally, the prediction loss is judged to satisfy the preset convergence condition when it is determined to have converged to a stable value.
Optionally, the language model further includes an evaluator; according to the preset evaluation index corresponding to each functional task, the evaluator determines that the prediction loss satisfies the preset convergence condition when the prediction loss of the predictor sub-model meets that evaluation index.
Step S214, the predictor training is completed, and the process is ended.
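The loop of steps S208 to S214 amounts to repeating a training pass until the preset convergence condition holds. A minimal sketch follows; `step_fn` and `converged` are placeholder names for one training pass and the convergence test, not names from this application:

```python
def train_until_converged(step_fn, converged, max_steps=10_000):
    """Repeat predictor training until the prediction loss satisfies
    the preset convergence condition (steps S208-S214)."""
    loss = None
    for _ in range(max_steps):
        loss = step_fn()        # one training pass returns the prediction loss
        if converged(loss):     # e.g. the loss has converged to a stable value
            break
    return loss

# Toy usage: the loss halves every step; treat < 0.01 as converged.
state = {"loss": 1.0}
def one_step():
    state["loss"] *= 0.5
    return state["loss"]

final_loss = train_until_converged(one_step, lambda l: l < 0.01)
```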
In summary, the language model training method of this embodiment encodes each corpus sample at different granularity levels to obtain the syntax unit codes corresponding to each granularity level, and then executes the corresponding functional tasks on those codes to obtain the functional task prediction labels of the corpus sample at each granularity level. By introducing functional tasks based on different granularity levels, this embodiment helps the language model understand corpus information at each level and lets it learn feature information such as proper nouns, parts of speech, grammar, semantic roles, and reference relations simultaneously during training, improving the accuracy of its prediction results.
Moreover, the language model trained by this embodiment can be applied directly to relevant task scenarios, or applied after fine-tuning for a specific task scenario. Because the trained model already has good prediction performance for such scenarios, it can converge faster even when fine-tuning is required.
Third embodiment
A third embodiment of the present application provides a computer storage medium, in which instructions for executing the steps of the language model training method according to the first embodiment or the second embodiment are stored.
Fourth embodiment
Fig. 4 shows a flow chart of a language inference method according to a fourth embodiment of the present application. As shown in the figure, the language inference method of the present embodiment mainly includes:
Step S402: perform inference on a target corpus sample using a language model trained by the language model training method described above, obtaining the functional task inference labels of the target corpus sample corresponding to each granularity level.
Step S404: merge the functional task inference labels of the target corpus sample corresponding to each granularity level to obtain the inference result of the target corpus sample.
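The merging of step S404 can be illustrated with a short sketch; the dictionary layout mapping a granularity level to its task labels is an assumption for illustration, not the data format of this application:

```python
def merge_task_labels(labels_by_level):
    # Merge the functional task inference labels from every granularity
    # level into a single inference result for the corpus sample.
    merged = {}
    for level, task_labels in labels_by_level.items():
        for task, label in task_labels.items():
            merged[f"{level}/{task}"] = label
    return merged

result = merge_task_labels({
    "word":     {"part_of_speech": ["NN", "VB"]},
    "sentence": {"semantic_role": "agent-predicate"},
})
```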
Fifth embodiment
Fig. 5 shows a flow chart of a language inference method according to a fifth embodiment of the present application. As shown in the figure, the language inference method of the present embodiment mainly includes:
Step S502: perform segmentation on the target corpus sample according to the text length supported by the language model, obtaining at least one target text segment corresponding to the target corpus sample.
Step S504: input each target text segment into the language model, so that the language model performs inference on each target text segment, obtaining the segment inference result corresponding to each target text segment.
Specifically, the language model may perform inference on each target text segment to obtain the functional task inference labels of that segment at each granularity level, and then combine those labels into the segment inference result for the segment.
Step S506: combine the segment inference results corresponding to all target text segments to obtain the inference result of the target corpus sample.
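Steps S502 to S506 can be sketched end to end; `max_len` stands for the model's supported text length and `infer_segment` for the model's per-segment inference, both hypothetical names:

```python
def split_segments(text, max_len):
    # Step S502: segment the target corpus sample by the supported length.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

def infer_sample(text, max_len, infer_segment):
    # Steps S504-S506: infer per segment, then combine the segment results.
    return [infer_segment(seg) for seg in split_segments(text, max_len)]

segments = split_segments("abcdefgh", 3)      # -> ["abc", "def", "gh"]
combined = infer_sample("abcdefgh", 3, len)   # -> [3, 3, 2]
```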
In summary, the language inference methods of the fourth and fifth embodiments of the present application perform inference using the language models trained in the first and second embodiments; they can be applied to inference in different task scenarios and offer higher inference efficiency.
Sixth embodiment
A sixth embodiment of the present application provides a computer storage medium, in which instructions for executing the steps of the language inference method according to the fourth embodiment or the fifth embodiment are stored.
Seventh embodiment
Fig. 6 is a schematic diagram illustrating an architecture of a language model training apparatus according to a seventh embodiment of the present application. As shown in the figure, the language model training apparatus 600 of the present embodiment mainly includes: a training sample acquisition module 602 and a model training module 604.
The training sample obtaining module 602 is configured to determine, according to each training corpus sample, each functional task label corresponding to each granularity level of each training corpus sample.
Optionally, the training sample obtaining module 602 further performs segmentation processing on each of the training corpus samples according to the text length supported by the language model 606, so as to obtain at least one text fragment corresponding to each of the training corpus samples.
Optionally, the training sample obtaining module 602 further preprocesses each training corpus sample based on a preset preprocessing rule, where the preset preprocessing rule includes denoising and normalizing the corpus.
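The application does not fix concrete denoising or normalization rules, so the following is only one plausible sketch; NFKC normalization and control-character removal are assumptions:

```python
import re
import unicodedata

def preprocess(sample: str) -> str:
    # Denoise: drop non-whitespace control characters, collapse whitespace runs.
    sample = "".join(ch for ch in sample
                     if ch.isspace() or not unicodedata.category(ch).startswith("C"))
    sample = re.sub(r"\s+", " ", sample).strip()
    # Normalize: NFKC unifies e.g. full-width and half-width character forms.
    return unicodedata.normalize("NFKC", sample)

cleaned = preprocess("Ｈｅｌｌｏ\x00  ｗｏｒｌｄ")   # -> "Hello world"
```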
The model training module 604 is configured to construct a language model 606, use each corpus sample as an input for the language model 606 to predict, output each functional task prediction label corresponding to each granularity level of each corpus sample, and train the language model 606 according to each functional task label corresponding to each granularity level of each corpus sample and each functional task prediction label.
Optionally, the model training module 604 further performs prediction on each text segment to obtain a prediction label of each functional task corresponding to each granularity level for each text segment.
Optionally, the language model 606 includes a predictor 6062 and a discriminator 6064 (refer to fig. 7). The model training module 604 further trains the predictor 6062 with each corpus sample as input and the functional task prediction labels of each corpus sample corresponding to each granularity level as output; obtains, via the discriminator 6064, the prediction loss of each functional task prediction label according to the functional task labels and functional task prediction labels of each training corpus sample corresponding to each granularity level; and repeats the step of training the predictor 6062 based on the prediction loss until the training of the predictor 6062 is complete.
Optionally, the model training module 604 further performs degradation processing on the corpus sample to obtain a degraded portion of the corpus sample, and provides the degraded portion to the predictor 6062 for prediction; the degradation processing includes at least one of a masking process, a replacement process, a reverse-order process, and a shuffle process.
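The four degradation modes might be implemented as below; the sampling strategy (a single position, a fixed seed, replacement drawn from the sample itself) is an assumption for illustration only:

```python
import random

def degrade(tokens, mode, mask_token="[MASK]", seed=0):
    # Degrade a tokenized corpus sample so the predictor must reconstruct
    # the degraded portion during training.
    rng = random.Random(seed)
    tokens = list(tokens)
    if mode == "mask":                  # masking process
        tokens[rng.randrange(len(tokens))] = mask_token
    elif mode == "replace":             # replacement process (swap in another token)
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i] = tokens[j]
    elif mode == "reverse":             # reverse-order process
        tokens.reverse()
    elif mode == "shuffle":             # order-scrambling process
        rng.shuffle(tokens)
    else:
        raise ValueError(f"unknown degradation mode: {mode}")
    return tokens
```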
Optionally, the model training module 604 further performs, for each corpus sample, coding processing on the corpus sample according to each granularity level to obtain each syntax unit code of the corpus sample corresponding to each granularity level; determining at least one function task corresponding to each granularity level according to each granularity level corresponding to each grammar unit code, and respectively executing prediction aiming at each grammar unit code of each granularity level based on each determined function task to obtain each function task prediction label of the training corpus sample corresponding to each granularity level.
Optionally, the granularity level includes at least one of a sentence level, a root level, a word level, and a span level.
Optionally, the functional task includes at least one of a proper name word class task, a part of speech class task, a grammar class task, a semantic class task, and a reference resolution class task.
Optionally, the prediction loss comprises at least one of a cross-entropy loss or a maximum likelihood estimation loss.
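Both loss options are standard formulas and can be written out directly; this is generic reference code, not the application's implementation:

```python
import math

def cross_entropy(pred_probs, true_class, eps=1e-12):
    # Cross-entropy between the predicted distribution for a functional
    # task and the hard functional task label.
    return -math.log(pred_probs[true_class] + eps)

def negative_log_likelihood(label_probs, eps=1e-12):
    # Maximum likelihood estimation loss: the negative log-likelihood the
    # model assigns to the ground-truth labels of a sequence.
    return -sum(math.log(p + eps) for p in label_probs)

loss = cross_entropy([0.25, 0.5, 0.25], true_class=1)   # ln 2, about 0.693
```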
Optionally, the model training module 604 is further configured to repeat the training of the predictor 6062 based on the predicted loss if the predicted loss does not satisfy a preset convergence condition; if the predicted loss satisfies the predetermined convergence condition, the predictor 6062 completes training.
Optionally, the preset convergence condition comprises when the predicted loss converges to a stable value.
Optionally, the predictor 6062 further includes a plurality of predictor models 60622, and the model training module 604 is further configured to provide each of the predictor models 60622 to perform prediction for each of the syntax unit codes of each of the granularity levels based on each of the functional tasks, so as to obtain each of the functional task prediction labels of the training corpus samples corresponding to each of the granularity levels.
Optionally, the language model 606 further includes an evaluator 6066 (refer to fig. 7), and the model training module 604 is further configured to obtain, by using the discriminator 6064, each prediction loss of each predictor model 60622 according to each functional task label and each functional task prediction label corresponding to each granularity level of each training corpus sample; and when the prediction loss of the predictor model 60622 meets the preset evaluation indexes by using the evaluator 6066 according to the preset evaluation indexes corresponding to the functional tasks, judging that the training of the predictor model 60622 is finished.
Optionally, the model training module 604 is further configured to input each corpus sample into each reference submodel, so that each reference submodel performs prediction on the corpus sample at each granularity level for each functional task, obtaining the functional task reference labels of the corpus sample corresponding to each granularity level; these reference labels are then used as the functional task labels to train each predictor model 60622. The reference submodel may be obtained by adapting the language model 606 to the corresponding functional task.
In addition, the language model training device 600 of the embodiment of the present invention can also be used to implement other steps in the foregoing language model training method embodiments, and has the beneficial effects of the corresponding method step embodiments, which are not described herein again.
Eighth embodiment
Fig. 8 shows an architecture diagram of a language inference device according to an eighth embodiment of the present application. As shown, the language inference apparatus 800 of the present embodiment mainly includes a target sample obtaining module 802, a language inference module 804 and a merging module 806.
The target sample obtaining module 802 is configured to obtain a target corpus sample.
Optionally, the target sample obtaining module 802 performs segmentation processing on the target corpus sample based on the text length supported by the language model, so as to obtain at least one target text segment corresponding to the target corpus sample.
The language inference module 804 is configured to perform inference on the target corpus sample by using the language model trained by the language model training apparatus according to the seventh embodiment, so as to obtain inference labels of each functional task corresponding to each granularity level of the target corpus sample.
Optionally, the language inference module 804 is further configured to provide the language model to perform inference on each target text segment, obtain inference results of each segment corresponding to each target text segment, and combine the inference results of each segment corresponding to each target text segment to obtain the inference result of the target corpus sample.
The merging module 806 is configured to merge the functional task inference labels corresponding to the granularity levels of the target corpus sample, and obtain an inference result of the target corpus sample.
In addition, the language inference device 800 according to the embodiment of the present invention can also be used to implement other steps in the foregoing language inference method embodiments, and has the beneficial effects of the corresponding method step embodiments, which are not described herein again.
In summary, the language model training method, language inference method, devices, and computer storage medium provided in the embodiments of the present application introduce functional tasks at multiple granularity levels, so that the language model can simultaneously output the prediction results of each functional task of a corpus sample at different granularity levels. The language model can therefore understand corpus information at different granularity levels, and can learn feature information including proper nouns, parts of speech, grammar, semantic roles, and reference relations during training, improving the accuracy of its prediction results. The trained language model can be applied to different task scenarios and offers higher inference efficiency.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present application, and are not limited thereto; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (19)

1. A method for training a language model, comprising:
determining functional task labels of the training corpus samples corresponding to the granularity levels according to the training corpus samples;
constructing a language model, taking each corpus sample as input to predict the language model, and outputting each functional task prediction label of each corpus sample corresponding to each granularity level; and
and training the language model according to the functional task labels and the functional task prediction labels of the training corpus samples corresponding to the granularity levels.
2. The language model training method of claim 1, wherein the method further comprises:
and performing segmentation processing on each corpus sample according to the text length supported by the language model to obtain at least one text segment corresponding to each corpus sample, so that the language model performs prediction on each text segment to obtain each functional task prediction label corresponding to each granularity level of each text segment.
3. The language model training method of claim 2, wherein the method further comprises:
performing preprocessing on each training corpus sample based on a preset preprocessing rule;
wherein the preset preprocessing rule comprises: denoising and normalizing the corpus.
4. The method of claim 1, wherein the language model comprises a predictor and a discriminator; wherein:
the step of taking each corpus sample as an input for the language model to predict, and outputting each functional task prediction label of each corpus sample corresponding to each granularity level specifically includes:
taking each corpus sample as input, taking each functional task prediction label corresponding to each granularity level of each corpus sample as output, and training the predictor;
obtaining, by the discriminator, the prediction loss of each functional task prediction label according to each functional task label and each functional task prediction label of each training corpus sample corresponding to each granularity level;
repeating the step of training the predictor based on the prediction loss until the predictor training is complete.
5. The language model training method of claim 4, wherein the method further comprises:
performing degradation processing on the corpus sample to obtain a degraded portion of the corpus sample, and providing the degraded portion to the predictor for prediction; wherein:
the degradation processing includes performing at least one of a masking process, a replacement process, a reverse-order process, and a shuffle process on the corpus sample.
6. The method according to claim 4, wherein the step of taking each corpus sample as an input for the language model to predict and outputting each functional task prediction label of each corpus sample corresponding to each granularity level comprises:
for each corpus sample, performing coding processing on the corpus sample according to each granularity level to obtain each grammar unit code of the corpus sample corresponding to each granularity level;
determining at least one function task corresponding to each granularity level according to each granularity level corresponding to each grammar unit code, and respectively executing prediction aiming at each grammar unit code of each granularity level based on each determined function task to obtain each function task prediction label of the training corpus sample corresponding to each granularity level.
7. The language model training method of claim 6, wherein the level of granularity comprises at least one of a sentence level, a root level, a word level, and a span level.
8. The language model training method of claim 7, wherein the functional tasks comprise at least one of a proper name part-of-speech task, a grammar task, a semantics task, and a reference resolution task.
9. The language model training method of claim 4, wherein the prediction loss comprises at least one of a cross-entropy loss or a maximum likelihood estimation loss.
10. The method of claim 9, wherein the step of training the predictor based on the prediction loss is repeated until the predictor training is complete comprises:
if the prediction loss does not meet a preset convergence condition, repeating the step of training the predictor based on the prediction loss; and if the prediction loss meets the preset convergence condition, the predictor training is complete.
11. The method of language model training according to claim 10, further comprising:
the preset convergence condition includes when the predicted loss converges to a stable value.
12. The method of claim 4, wherein the predictor further comprises a plurality of predictor models for performing prediction for each of the syntax unit codes of each of the granularity levels based on each of the functional tasks to obtain each of the functional task prediction labels of the corpus sample corresponding to each of the granularity levels.
13. The method of language model training as recited in claim 12, wherein the language model further comprises an evaluator, the method further comprising:
obtaining each prediction loss of each prediction submodel according to each functional task label and each functional task prediction label of each training corpus sample corresponding to each granularity level by using the discriminator;
and judging that the training of the predictor model is finished when the prediction loss of the predictor model meets the preset evaluation indexes by utilizing the evaluator according to the preset evaluation indexes corresponding to the functional tasks.
14. The method of language model training according to claim 12, the method further comprising:
inputting each corpus sample into each reference submodel, so that each reference submodel predicts each corpus sample of each granularity level based on each functional task, and obtains each functional task reference label of each corpus sample corresponding to each granularity level;
taking each function task reference label as each function task label to train each predictor model;
wherein the reference submodel is obtainable by adapting the language model based on the functional task.
15. A method of language inference, comprising:
reasoning is carried out on a target corpus sample by using the language model trained by the language model training method according to any one of claims 1 to 14, and functional task reasoning labels of the target corpus sample corresponding to various granularity levels are obtained;
and combining the functional task reasoning labels of the target corpus sample corresponding to the granularity levels to obtain a reasoning result of the target corpus sample.
16. The language inference method of claim 15, wherein the method further comprises:
performing segmentation processing on the target corpus sample based on the text length supported by the language model to obtain at least one target text segment corresponding to the target corpus sample;
providing the language model to carry out reasoning aiming at each target text segment to obtain a reasoning result of each segment corresponding to each target text segment; and
and combining the reasoning results of all the segments corresponding to all the target text segments to obtain the reasoning result of the target corpus sample.
17. A computer storage medium having stored therein instructions for performing the steps of the language model training method according to any one of claims 1 to 14, or instructions for performing the steps of the language inference method according to any one of claims 15 to 16.
18. A language model training device, comprising:
a training sample obtaining module, configured to determine, according to each training corpus sample, each functional task label of each training corpus sample corresponding to each granularity level;
and the model training module is used for constructing a language model, taking each corpus sample as input to predict the language model, outputting each functional task prediction label corresponding to each granularity level of each corpus sample, and training the language model according to each functional task label corresponding to each granularity level of each corpus sample and each functional task prediction label.
19. A language inference apparatus, comprising:
the target sample obtaining module is used for obtaining a target corpus sample;
a language reasoning module, configured to perform reasoning on the target corpus sample by using the language model trained by the language model training apparatus according to claim 18, to obtain functional task reasoning labels of the target corpus sample corresponding to each granularity level;
and the merging module is used for merging the functional task reasoning labels of the target corpus sample corresponding to the granularity levels to obtain a reasoning result of the target corpus sample.
CN202110271045.2A 2021-03-10 2021-03-10 Language model training and language reasoning method, device and computer storage medium thereof Pending CN113011176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110271045.2A CN113011176A (en) 2021-03-10 2021-03-10 Language model training and language reasoning method, device and computer storage medium thereof


Publications (1)

Publication Number Publication Date
CN113011176A (en) 2021-06-22

Family

ID=76406285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110271045.2A Pending CN113011176A (en) 2021-03-10 2021-03-10 Language model training and language reasoning method, device and computer storage medium thereof

Country Status (1)

Country Link
CN (1) CN113011176A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101377769A (en) * 2007-08-29 2009-03-04 中国科学院自动化研究所 Method for representing multiple graininess of text message
CN110489555A (en) * 2019-08-21 2019-11-22 创新工场(广州)人工智能研究有限公司 A kind of language model pre-training method of combination class word information
CN110929767A (en) * 2019-10-24 2020-03-27 云从科技集团股份有限公司 Font processing method, system, device and medium
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111611808A (en) * 2020-05-22 2020-09-01 北京百度网讯科技有限公司 Method and apparatus for generating natural language model
CN111859951A (en) * 2020-06-19 2020-10-30 北京百度网讯科技有限公司 Language model training method and device, electronic equipment and readable storage medium
CN111859987A (en) * 2020-07-28 2020-10-30 网易(杭州)网络有限公司 Text processing method, and training method and device of target task model
CN111897908A (en) * 2020-05-12 2020-11-06 中国科学院计算技术研究所 Event extraction method and system fusing dependency information and pre-training language model
CN112101484A (en) * 2020-11-10 2020-12-18 中国科学院自动化研究所 Incremental event identification method, system and device based on knowledge consolidation
CN112395891A (en) * 2020-12-03 2021-02-23 内蒙古工业大学 Chinese-Mongolian translation method combining Bert language model and fine-grained compression


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINSONG ZHANG: "AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization", 《COMPUTER AND LANGUAGE》 *
墨客无言: "Literature reading: AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization", 《HTTPS://BLOG.CSDN.NET/CHARMVE/ARTICLE/DETAILS/108439749》 *
数据拾光者: "Fun facts from the advertising industry: BERT knowledge distillation from theory to practice", 《ZHIHU.COM/P/258098344》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210622