CN115080736A - Model adjusting method and device of discriminant language model - Google Patents

Model adjusting method and device of discriminant language model

Info

Publication number
CN115080736A
CN115080736A
Authority
CN
China
Prior art keywords
language model
model
discriminant
training
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210567681.4A
Other languages
Chinese (zh)
Inventor
刘知远
孙茂松
王建勇
姚远
董博文
张正彦
谢若冰
林乐宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd filed Critical Tsinghua University
Priority to CN202210567681.4A priority Critical patent/CN115080736A/en
Publication of CN115080736A publication Critical patent/CN115080736A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a model adjusting method and device for a discriminant language model, wherein the method comprises the following steps: acquiring a pre-trained discriminant language model and a training data set of a downstream task; and in response to a task request, adjusting the pre-trained language model according to the type of the task request and the training data set; wherein the discriminant language model is obtained by training on text samples. By adjusting the model parameters of the discriminant language model for different downstream tasks, the difference between the model's pre-training stage and the downstream task is eliminated, and the overall effect of the model is improved.

Description

Model adjusting method and device of discriminant language model
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a device for adjusting a model of a discriminant language model.
Background
Text Classification and Question Answering are important tasks in the field of natural language processing and are widely applied in daily life: text classification can be used to classify news by type, judge the sentiment of comments, and check whether a text is grammatical, while question answering can be used for knowledge question-answering robots and the like. For both tasks, the current state-of-the-art approach is the large-scale pre-trained language model (PLM).
Pre-training a language model refers to dividing model training into two stages: pre-training and model fine-tuning. Pre-training is performed on a large-scale corpus; in this stage, the model learns a universal representation of text, i.e., the "commonality" of knowledge. Model fine-tuning is performed on the downstream task data set; in this stage, the model needs to adapt to the task quickly. Model fine-tuning refers to tuning (Fine-Tune) based on a given pre-trained model. Compared with training a model from scratch, fine-tuning saves a great deal of computing resources and time. Separating pre-training from model fine-tuning greatly reduces training cost while giving the model strong knowledge potential. The pre-training stage is usually completed by companies and organizations such as Google and Facebook, and the resulting models are downloaded for use, so the effect of the model fine-tuning stage directly determines the quality of the final model.
The key limitation of pre-trained language models at present is the formal difference between the pre-training task and downstream tasks, and how to eliminate this difference is the key point and difficulty in improving the model fine-tuning effect.
Disclosure of Invention
The invention provides a model fine-tuning method and device for a discriminant language model, which address the defect in the prior art that the discriminant language model has a formal difference between the pre-training task stage and the downstream task stage, eliminate the difference between the pre-training stage and the downstream task stage of the discriminant language model, and improve the effect of model fine-tuning.
The invention provides a model adjusting method of a discriminant language model, which comprises the following steps:
acquiring a pre-trained discriminant language model and a training data set of a downstream task;
responding to a task request, and adjusting the pre-trained language model according to the type of the task request and the training data set;
wherein the discriminant language model is obtained by training a text sample.
According to the model adjusting method of the discriminant language model, the pre-trained language model is adjusted based on the Prompt-Tuning paradigm.
According to the model adjusting method of the discriminant language model provided by the invention, in response to a task request, the pre-trained language model is adjusted according to the type of the task request and the training data set, and the method specifically comprises the following steps:
acquiring an input text in the training data set;
if the type of the task request is a text classification task, then:
encoding the input text into a token group comprising a plurality of tokens, and inserting a category token after the tokens;
extracting a vector corresponding to the class token, and performing inner product processing on the vector and a shared vector with the same length to obtain a probability value corresponding to the vector;
and adjusting the pre-trained language model based on the probability value.
According to the model adjusting method of the discriminant language model provided by the invention, the vector and the shared vector with the same length are subjected to inner product processing, and then the method further comprises the following steps:
and activating the obtained inner product processing result through a Sigmoid function to obtain a probability value corresponding to the vector.
According to the model adjusting method for the discriminant language model provided by the invention, the adjusting of the pre-trained language model based on the probability value specifically comprises the following steps:
setting the probability value as p, and taking 1-p as the true probability value;
calculating the model loss according to the true probability value;
fine-tuning the pre-trained language model based on the model loss.
According to the model adjusting method of the discriminant language model provided by the invention, in response to a task request, the pre-trained language model is adjusted according to the type of the task request and the training data set, and the method specifically comprises the following steps:
obtaining a model discrimination head of the pre-trained discriminant language model in a pre-training stage;
if the type of the task request is a question-answering task, the following steps are carried out:
and adjusting the pre-trained language model based on the training data set and the model discrimination head.
According to the model adjusting method of the discriminant language model provided by the invention,
the pre-training process of the discriminant language model comprises the following steps:
acquiring the text sample and an initial discriminant language model, and encoding the text sample into corresponding tokens; the initial discriminant language model comprises a generator and a discriminator;
generating a replacing token by the generator to replace a plurality of tokens corresponding to the text sample;
judging the token corresponding to the replaced text sample through the judger to obtain a judgment result;
and inputting the token corresponding to the replaced text sample and the discrimination result into the initial discriminant language model for training to obtain the pre-trained discriminant language model.
The invention also provides a model adjusting device of the discriminant language model, which comprises: an acquisition module, used for acquiring a pre-trained discriminant language model and a training data set of a downstream task;
the adjusting module is used for responding to a task request and adjusting the pre-trained language model according to the type of the task request and the training data set;
wherein the discriminant language model is obtained by training a text sample.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the model adjusting method of the discriminant language model.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of model tuning of a discriminative language model as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method for model adjustment of a discriminative language model as described in any of the above.
According to the model adjusting method and device for the discriminant language model, provided by the invention, the discriminant language model is subjected to model parameter adjustment at different downstream task stages, so that the difference between the model pre-training stage and the downstream task of the discriminant language model is eliminated, and the overall effect of the model is improved. When the downstream task is a text classification task or a question and answer task, the accuracy of the model in the text classification or question and answer task is improved by adjusting the model of the discriminant language model in the downstream task stage.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method for adjusting a discriminant language model according to the present invention;
FIG. 2 is a schematic diagram of a discriminative language model framework provided by the present invention;
FIG. 3 is a schematic diagram of a pre-training phase for a discriminative language model provided by the present invention;
FIG. 4 is a diagram illustrating Fine Tuning based on Fine-Tuning for a discriminant language model according to the present invention;
FIG. 5 is a schematic diagram of a Prompt-Tuning based Tuning for a discriminative language model according to the present invention;
FIG. 6 is an experimental result of the discriminant language model provided by the present invention during a text classification task;
FIG. 7 is an experimental result of the discriminant language model provided by the present invention during question answering tasks;
FIG. 8 is the experimental results of the ablation experiments performed by the Prompt-Tuning framework for discriminant language models provided in the present invention;
FIG. 9 is a graph of the stability test results of the Prompt-Tuning framework for the discriminant language model according to the present invention;
FIG. 10 is a schematic structural diagram of a model adjustment apparatus for a discriminant language model according to the present invention;
fig. 11 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The model fine-tuning method with the best effect at present is Prompt-Tuning, which eliminates the difference between the pre-training stage and the downstream task mainly by constructing prompts, thereby improving the model fine-tuning effect. Constructing a task-specific prompt to tune a pre-trained language model (PLM) is a promising method for text classification tasks. Previous studies have shown that in low-data scenarios, Prompt-Tuning has a significant advantage over Fine-Tuning approaches that use an additional classifier.
The core idea of Prompt-Tuning is to insert text fragments (i.e., templates) into the input and convert the classification problem into a masked language modeling problem, in which the key step is to construct a mapping between the word space and the label space using a verbalizer.
However, existing Prompt-Tuning work targets generative language models (GLM, such as BERT, RoBERTa, etc.) rather than discriminant language models (DLM, such as ELECTRA).
The pre-training process of a generative language model replaces part of the original tokens with [MASK] tokens to corrupt the input and trains the model to regenerate the original tokens, which has the disadvantage of defining the task only on the small subset of tokens that are replaced. The pre-training process of a discriminant language model instead uses a generator network to produce replacement tokens and trains the model to judge whether each token has been replaced, so the task is defined on all of the input. Compared with generative language models, discriminant language models have better performance and higher computational efficiency; however, an unstable fine-tuning process has greatly limited their application.
In summary, text classification and question answering are important tasks in the field of natural language processing. This work explores the feasibility and performance of a Prompt-Tuning design for discriminant language models, solves problems such as unstable fine-tuning, and makes it possible to better utilize pre-trained language models to complete these two tasks.
The following describes a model adjustment method of a discriminant language model according to the present invention with reference to fig. 1.
In machine learning, supervised learning can be divided into two types of models: discriminant models and generative models. Briefly, a discriminant model models a conditional distribution, while a generative model models a joint distribution. Discriminant language models are particularly attractive for fine-tuning because they are typically competitive with, and more computationally efficient than, generative language models.
The invention provides a model adjusting method aiming at a discriminant language model.
Fig. 1 is a schematic flowchart of a method for adjusting a model of a discriminant language model according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step 110, a pre-trained discriminant language model and a training data set of a downstream task are obtained. The discriminant language model first needs to be pre-trained, and its pre-training process comprises the following steps: acquiring a text sample and an initial discriminant language model, and encoding the text sample into corresponding tokens, the initial discriminant language model comprising a generator and a discriminator; generating replacement tokens with the generator to replace several of the tokens corresponding to the text sample; judging the tokens of the replaced text sample with the discriminator to obtain a discrimination result; and inputting the tokens of the replaced text sample and the discrimination result into the initial discriminant language model for training, to obtain the pre-trained discriminant language model.
Specifically, as shown in fig. 2, the pre-training process of the Discriminant Language Model (DLM) is as follows:
the input sentence is first encoded into a plurality of tokens, DLM generates a plurality of tokens to replace the original input through a generator (as shown in fig. 2, "the" and "linked" are replaced), and then trains a discriminator to judge whether the tokens are replaced or not (the token is not changed before and after the replacement, so that the token is judged not to be replaced, the "linked" is changed into "ate", so that the token is judged to be replaced, and the other tokens are judged not to be replaced).
Step 120, responding to the task request, and adjusting the pre-trained language model according to the type of the task request and the training data set. Optionally, the task request types for the discriminant language model include a text classification task and a question-and-answer task. Adjusting the discriminant language model in the downstream task stage eliminates the difference from the pre-training stage and improves the overall effect of the model.
Wherein, the discriminant language model is obtained by training text samples. According to the method, the model parameters of the discriminant language model are adjusted at different downstream task stages, so that the difference between the discriminant language model at the model pre-training stage and the downstream task is eliminated, and the overall effect of the model is improved. When the downstream task is a text classification task or a question and answer task, the accuracy of the model in the text classification or question and answer task is improved by adjusting the model of the discriminant language model in the downstream task stage.
Further, in the method for adjusting a discriminant language model according to the present invention, the pre-trained language model is adjusted based on the Prompt-Tuning paradigm.
As pre-trained language models grow in number and size, the hardware requirements, data requirements, and actual cost of performing Fine-Tune on them keep increasing. In addition, the rich variety of downstream tasks complicates the design of the pre-training and fine-tuning stages, so researchers hope to explore a more compact, lightweight, universal, and efficient method; Prompt-Tuning is an attempt in this direction.
The present invention proposes DPT, the first Prompt-Tuning framework for discriminant language models, which reformulates NLP tasks as discriminant language modeling problems. Experimental results on text classification and question-answering tasks show that, compared with traditional model fine-tuning methods, DPT achieves significantly higher performance while alleviating the unstable training of large-scale pre-trained models in both full-data and low-resource scenarios.
In step 120, in response to the task request, the pre-trained language model is adjusted according to the type of the task request and the training data set, and the method specifically includes the following steps:
acquiring an input text in a training data set;
if the type of the task request is a text classification task, then:
encoding the input text into a token group comprising a plurality of tokens, and inserting a category token after the tokens;
extracting a vector corresponding to the category token, and performing inner product processing on the vector and a shared vector with the same length to obtain a probability value corresponding to the vector;
and adjusting the pre-trained language model based on the probability value. Adjusting the discriminant language model for the text classification task eliminates the difference between the pre-training stage and the text classification task stage.
Further, after the inner product of the vector and the shared vector of the same length is computed, the method further comprises: activating the obtained inner product result through a Sigmoid function to obtain the probability value corresponding to the vector. The Sigmoid function is a common S-shaped function, also known as the sigmoidal growth curve. In information science, because it is monotonically increasing and its inverse function is also monotonically increasing, the Sigmoid function is often used as the activation function of a neural network, mapping variables into the range (0, 1). When used for classification, it not only predicts the class but also gives an approximate probability estimate, which is useful for many tasks that need probabilities to assist decision-making. Moreover, the log-odds function is a convex function that is differentiable to arbitrary order, with good mathematical properties, so many existing numerical optimization algorithms can be used directly to find the optimal solution.
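As a small illustration of the activation step, a plain-Python Sigmoid maps any real inner-product score into (0, 1):

```python
import math

def sigmoid(x):
    """Map a real-valued inner-product score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# A score of 0 maps to 0.5; larger scores map closer to 1.
p = sigmoid(0.0)  # -> 0.5
```

The symmetry sigmoid(-x) = 1 - sigmoid(x) is what makes "1 - p" a natural complementary probability in the next step.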
The rationale is that during pre-training, discriminant language models typically need to generate large inner products for replacement tokens (i.e., incorrect tokens) and small inner products for original tokens (i.e., correct tokens). Therefore, in the method for adjusting a model of a discriminant language model according to the present invention, the method for adjusting a pre-trained language model based on a probability value further includes the following steps:
setting the probability value as p, and taking 1-p as the true probability value;
calculating the model loss according to the true probability value;
based on the model loss, the pre-trained language model is fine-tuned.
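The steps above can be sketched as follows. The choice of binary cross-entropy is an assumption made for illustration; the text only states that the model loss is computed from the true probability value 1-p.

```python
import math

def dpt_classification_loss(p, is_true_class):
    """p: discriminator's probability that the class token was 'replaced'.
    The probability that the class is correct is taken as 1 - p.
    Binary cross-entropy is an assumed (illustrative) loss choice."""
    p_true = 1.0 - p
    eps = 1e-12  # numerical guard against log(0)
    if is_true_class:
        return -math.log(p_true + eps)
    return -math.log(1.0 - p_true + eps)
```

Low "replaced" probability on the correct class yields low loss, matching the pre-training convention that original tokens get small inner products.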
Specifically, as shown in fig. 3, 4 and 5, fig. 3(a) is a schematic diagram of pre-training the Discriminant Language Model (DLM) using the DLM head, fig. 4(b) is a schematic diagram of conventional fine-tuning using a new Classification (CLS) head based on Fine-Tuning, and fig. 5(c) is a schematic diagram of the DPT model tuning method of the present invention based on Prompt-Tuning, which reformulates the NLP task as a discriminant language modeling problem.
In fig. 3, an input sentence is first encoded into tokens, DLM generates tokens to replace the original input through a generator, and then trains a discriminator to determine whether the tokens have been replaced, for example, "feel" is original, "make" is replaced, and "on" is original.
In FIG. 4, the discriminant language model is fine-tuned based on Fine-Tuning. Specifically, as in the pre-training stage of the discriminant language model, the input to the pre-trained model is first encoded into a number of tokens, which are typically wrapped with [CLS] and [SEP]. For the text classification task, the conventional method is to take the vector corresponding to [CLS], multiply it by a matrix of dimension (vector dimension x number of classes, set as N) to obtain a new N-dimensional vector, and then apply Softmax normalization to obtain the probability of each class. This method uses only the vector corresponding to [CLS] and clearly differs significantly from the pre-training stage.
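The conventional [CLS]-head classification described above can be sketched in plain Python; random vectors stand in for the encoder's [CLS] output and the randomly initialized projection matrix.

```python
import math
import random

def softmax(scores):
    """Normalize N logits into a probability distribution."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cls_head(h_cls, W):
    """Conventional fine-tuning head: multiply the [CLS] vector (length d)
    by a randomly initialized d x N matrix, then softmax over N classes."""
    logits = [sum(h * w for h, w in zip(h_cls, col)) for col in zip(*W)]
    return softmax(logits)

rng = random.Random(0)
d, n_classes = 8, 3
h_cls = [rng.gauss(0, 1) for _ in range(d)]                      # stand-in [CLS] vector
W = [[rng.gauss(0, 1) for _ in range(n_classes)] for _ in range(d)]  # trained from scratch
probs = cls_head(h_cls, W)
```

Note that all d x N entries of `W` start random, which is exactly the dependence on labeled data that DPT's reused head avoids.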
In particular, DPT fills the input text into a template containing the candidate answers and, based on the reused DLM head, distinguishes whether each candidate answer is correct (i.e., original) or incorrect (i.e., replaced). The invention (DPT) inserts tokens after the input in the form "class: class A, class B." and then determines whether "class A", "class B", etc. have been "replaced":
and (3) taking out vectors corresponding to the class A, the class B and the like, performing inner product on the N vectors and a shared vector with the same length, and activating by using a Sigmoid function to obtain N probabilities. DPT eliminates the difference between the model fine-tuning and pre-training phases and keeps the space-time cost unchanged. Meanwhile, the dependency of the model on the labeled data is reduced because the quantity of parameters for final probability calculation is changed from the matrix concentrated on random initialization to the N vectors from the model. In addition, because during pre-training, discriminant language models typically require large inner products for replacement tokens (i.e., incorrect tokens) and small inner products for original tokens (i.e., correct tokens), differences between downstream task datasets and pre-training corpora are further eliminated.
And performing a text classification task and a question and answer task by using the discriminant language model after model adjustment, wherein the text classification can be applied to classification of news types, judgment of emotional colors of comments, judgment of whether the text conforms to grammar and the like.
When the discriminant language model after the model adjustment method is used for performing a text classification task, the method comprises the following steps:
acquiring text data to be classified;
and inputting the text data to be classified into the pre-trained and model-adjusted discriminant language model, which classifies the text data to obtain a classification result. Compared with a discriminant language model without model adjustment, or one adjusted with the existing Fine-Tuning method, the discriminant language model using this method has a higher recognition rate and better classification effect.
Specifically, the pre-training phase of the discriminant language model is to make the model obtain a basic semantic understanding function. Because the discriminant language model has a certain difference between the pre-training stage and the text classification task stage, the discriminant language model needs to be adjusted based on the text classification task and the data set corresponding to the text classification task after the discriminant language model is pre-trained.
In the invention, the model of the pre-trained discriminant language model is adjusted based on the Prompt-Tuning paradigm, so that the difference between the text classification stage and the pre-training stage of the model is eliminated, and the stability and the accuracy of the adjusted discriminant language model are improved.
In the method for adjusting a model of a discriminant language model according to the present invention, step 120, in response to a task request, the method adjusts a pre-trained language model according to the type of the task request and a training data set, and specifically includes:
obtaining the model discrimination head of the pre-trained discriminant language model from the pre-training stage;
if the type of the task request is a question-answering task, then:
and adjusting the pre-trained language model based on the training data set and the model discrimination head. The difference between the discriminant language model in the pre-training stage and the text recognition task stage is eliminated by adjusting the discriminant language model in the question-answering task.
And performing a text classification task and a question and answer task by using the discriminant language model after model adjustment, wherein the text classification can be applied to a knowledge question and answer robot and the like.
When the discriminant language model after the model adjustment method is used for performing a question answering task, the method comprises the following steps:
acquiring text data of a task to be asked and answered;
and inputting the text data into the pre-trained and model-adjusted discriminant language model, which discriminates over the text data to obtain the result of the question-answering task. Compared with a discriminant language model without model adjustment, or one adjusted with the existing Fine-Tuning method, the discriminant language model using this method has higher recognition efficiency and more accurate question-answering results.
Specifically, the pre-training phase of the discriminant language model is to make the model obtain a basic semantic understanding function. Because the discriminant language model has a certain difference between the pre-training stage and the question-answering task stage, the model adjustment needs to be performed on the pre-trained discriminant language model based on the question-answering task and the data set corresponding to the question-answering task after the discriminant language model is pre-trained.
In the invention, model adjustment is carried out on the pre-trained discriminant language model based on the Prompt-Tuning paradigm, and the difference between the question answering task stage and the pre-training stage of the model is eliminated, so that the stability and the accuracy of the adjusted discriminant language model are improved.
Specifically, for the general multi-span question-answering task, the most advanced current method is to formulate the task as a sequence labeling problem: given a text and a question, mark the parts of the text that belong to the answer. In the conventional model fine-tuning method, the discrimination head used in the pre-training stage for judging whether a token is replaced is discarded, and a randomly initialized discrimination head is trained from scratch. The invention proposes to use the discrimination head from the pre-training stage as the initialization for fine-tuning, which greatly reduces the dependence of the model fine-tuning process on labeled data.
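The sequence-labeling view of multi-span question answering can be sketched as follows; the thresholding rule and the helper name `extract_answer_spans` are illustrative assumptions, not the patent's exact procedure.

```python
def extract_answer_spans(token_answer_probs, threshold=0.5):
    """Sequence-labeling sketch for multi-span QA: a token belongs to an
    answer if the discrimination head scores it above the threshold.
    Returns (start, end) index pairs with end exclusive. The thresholding
    rule is an assumption for illustration."""
    spans, start = [], None
    for i, p in enumerate(token_answer_probs):
        if p > threshold and start is None:
            start = i                      # open a new answer span
        elif p <= threshold and start is not None:
            spans.append((start, i))       # close the current span
            start = None
    if start is not None:                  # span runs to the end of the text
        spans.append((start, len(token_answer_probs)))
    return spans

spans = extract_answer_spans([0.1, 0.9, 0.8, 0.2, 0.7])  # -> [(1, 3), (4, 5)]
```

Reusing the pre-trained discrimination head means these per-token scores start from meaningful values instead of a random initialization.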
The core points of the invention are as follows: (1) a primary attempt of designing a Prompt-Tuning method for a discriminant pre-training model is made, the difference between a pre-training stage and model fine Tuning is eliminated, and deep potential relation between pre-training corpora and downstream tasks is excavated; (2) the dependence on the labeled data is reduced, and the stability of the model is improved.
Fig. 6 shows the experimental results on the text classification task. The full-set setting uses 100% of the data; the low-resource setting uses 10% of the data. FT denotes Fine-Tuning, and DPT denotes the Prompt-Tuning method of the present invention. BERT, RoBERTa, and ELECTRA correspond to different discriminant language models, and SST-2, SST-5, TREC, and AGNews refer to different data sets. The results show the text classification performance of different discriminant language models under different data sets and different model adjustment methods.
FIG. 7 shows experimental results of the question-answering task performance of different discriminant language models under different data sets by using different model adjustment methods.
As shown in figures 6 and 7, the experimental results indicate that, on multiple data sets of the text classification task and the question-answering task, the method achieves the best results under various configurations, with a significant improvement in effect compared with the results before Prompt-Tuning is applied.
FIG. 8 shows model performance on different data sets based on different model tuning methods in a low-resource environment.
FIG. 9(a) is a graph showing the model performance distribution on each data set for discriminant language models using Fine-Tuning and the DPT of the present invention under the full-set setting (100% of the data);
FIG. 9(b) is a graph showing the model performance distribution on each data set for discriminant language models using Fine-Tuning and the DPT of the present invention under the low-resource setting (10% of the data).
The ablation experiment shown in fig. 8 demonstrates the improvement in model performance contributed by the refinement of "using 1-p as the probability" in the text classification task. The stability experiment shown in fig. 9 shows that the method effectively alleviates the problem of unstable training of discriminant language models, providing strong support for the wide application of discriminant language models.
The following describes a model adjustment apparatus for a discriminant language model according to the present invention; the model adjustment apparatus described below and the model adjustment method described above may be referred to in correspondence with each other.
As shown in fig. 10, a model adjusting apparatus for a discriminant language model according to an embodiment of the present invention includes the following modules: an acquisition module 1010 and an adjustment module 1020.
The acquisition module 1010 is configured to acquire a pre-trained discriminant language model and a training data set of a downstream task. The adjustment module 1020 is configured to adjust the pre-trained language model in response to a task request according to the type of the task request and the training data set, the discriminant language model being obtained by training on text samples. With the model adjustment apparatus of the discriminant language model, model parameters of the discriminant language model are adjusted for different downstream task stages, eliminating the gap between the model's pre-training stage and the downstream task and improving the overall effect of the model. When the downstream task is a text classification task or a question-answering task, adjusting the discriminant language model at the downstream task stage improves the model's accuracy on that task.
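The division of labor between the two modules can be rendered as a minimal Python sketch (class names, task-type strings, and return values are illustrative assumptions, not the patented implementation):

```python
class AcquisitionModule:
    """Corresponds to module 1010: fetch the pre-trained discriminant
    language model and the training data set of the downstream task."""
    def acquire(self, load_model, load_dataset):
        # load_model / load_dataset are caller-supplied loaders.
        return load_model(), load_dataset()

class AdjustmentModule:
    """Corresponds to module 1020: dispatch model adjustment on the
    type of the incoming task request."""
    def adjust(self, model, task_type, dataset):
        if task_type == "text_classification":
            return ("prompt_tuning_classification", model, dataset)
        if task_type == "question_answering":
            return ("prompt_tuning_qa", model, dataset)
        raise ValueError(f"unsupported task type: {task_type}")
```

The dispatch mirrors the claimed behavior: the same pre-trained model is routed to a classification-style or question-answering-style Prompt-Tuning procedure depending on the task request.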
The technical problems to be solved by the invention are as follows: exploring the feasibility of fine-tuning discriminant language models with Prompt-Tuning, solving the problem of unstable discriminant model training, and improving model performance on text classification and question-answering tasks.
The invention provides a model adjustment apparatus and method for a discriminant language model; the technical solution of the invention has the following advantages:
1. Compared with existing models using Prompt-Tuning, a discriminant language model that applies the task definition to all inputs is used for the first time, achieving the best results on text classification and question-answering tasks.
2. The Prompt-Tuning method eliminates the gap between the pre-training stage and the model fine-tuning stage and mines the deep latent relation between the pre-training corpus and the downstream task data set, greatly improving model performance compared with the conventional model fine-tuning method.
3. By effectively reusing the parameters of the pre-trained model, dependence on labeled data is reduced, thereby solving the problem of unstable training of large-scale discriminant language models in low-resource scenarios.
Fig. 11 illustrates a physical structure diagram of an electronic device. As shown in fig. 11, the electronic device may include: a processor (processor) 1110, a communication interface (Communications Interface) 1120, a memory (memory) 1130, and a communication bus 1140, wherein the processor 1110, the communication interface 1120, and the memory 1130 communicate with each other via the communication bus 1140. The processor 1110 may invoke logic instructions in the memory 1130 to perform a model adjustment method for a discriminant language model, the method comprising the following steps: acquiring a pre-trained discriminant language model and a training data set of a downstream task; responding to a task request, and adjusting the pre-trained language model according to the type of the task request and the training data set; wherein the discriminant language model is obtained by training on text samples.
In addition, the logic instructions in the memory 1130 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, the present invention further provides a computer program product, the computer program product comprising a computer program which can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the model adjustment method for a discriminant language model provided by the above methods, the method comprising the following steps: acquiring a pre-trained discriminant language model and a training data set of a downstream task; responding to a task request, and adjusting the pre-trained language model according to the type of the task request and the training data set; wherein the discriminant language model is obtained by training on text samples.
In still another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements the model adjustment method for a discriminant language model provided by the above methods, the method comprising the following steps: acquiring a pre-trained discriminant language model and a training data set of a downstream task; responding to a task request, and adjusting the pre-trained language model according to the type of the task request and the training data set; wherein the discriminant language model is obtained by training on text samples.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for model tuning of a discriminant language model, the method comprising:
acquiring a pre-trained discriminant language model and a training data set of a downstream task;
responding to a task request, and adjusting the pre-trained language model according to the type of the task request and the training data set;
wherein the discriminant language model is obtained by training a text sample.
2. The method of claim 1, wherein the pre-trained language model is adjusted based on a Prompt-Tuning paradigm.
3. The method for model tuning of a discriminative language model according to claim 2, wherein tuning the pre-trained language model in response to a task request according to the type of the task request and the training data set specifically comprises:
acquiring an input text in the training data set;
if the type of the task request is a text classification task, then:
encoding the input text into a token group comprising a plurality of tokens, and inserting a category token after the tokens;
extracting a vector corresponding to the class token, and performing inner product processing on the vector and a shared vector with the same length to obtain a probability value corresponding to the vector;
and adjusting the pre-trained language model based on the probability value.
4. The method of claim 3, wherein the vector is subjected to inner product processing with a shared vector having the same length, and then further comprising:
and activating the obtained inner product processing result through a Sigmoid function to obtain a probability value corresponding to the vector.
5. The method for model tuning of a discriminative language model according to claim 4, wherein the tuning of the pre-trained language model based on the probability value specifically comprises:
setting the probability value as p, and taking 1-p as a real probability value;
calculating model loss according to the real probability value;
fine-tuning the pre-trained language model based on the model loss.
6. The method of claim 2, wherein the adjusting the pre-trained language model according to the type of the task request and the training data set in response to the task request comprises:
obtaining a model discrimination head of the pre-trained discriminant language model in a pre-training stage;
if the type of the task request is a question-answering task, the following steps are carried out:
and adjusting the pre-trained language model based on the training data set and the model discrimination head.
7. The method of claim 2, wherein the pre-training process of the discriminant language model comprises:
acquiring the text sample and an initial discriminant language model, and encoding each character of the text sample into a corresponding token; the initial discriminant language model comprises a generator and a discriminator;
generating a replacing token by the generator to replace a plurality of tokens corresponding to the text sample;
discriminating, through the discriminator, the tokens corresponding to the replaced text sample to obtain a discrimination result;
and inputting the token corresponding to the replaced text sample and the discrimination result into the initial discriminant language model for training to obtain the pre-trained discriminant language model.
8. An apparatus for model adjustment of a discriminant language model, the apparatus comprising: an acquisition module, configured to acquire a pre-trained discriminant language model and a training data set of a downstream task;
an adjustment module, configured to adjust the pre-trained language model in response to a task request according to the type of the task request and the training data set;
wherein the discriminant language model is obtained by training a text sample.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for model adjustment of a discriminative language model according to any of claims 1 to 7 when executing the program.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements a model adjustment method for a discriminant language model according to any one of claims 1 to 7.
11. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method of model adjustment of a discriminative language model as claimed in any one of claims 1 to 7.
CN202210567681.4A 2022-05-23 2022-05-23 Model adjusting method and device of discriminant language model Pending CN115080736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210567681.4A CN115080736A (en) 2022-05-23 2022-05-23 Model adjusting method and device of discriminant language model


Publications (1)

Publication Number Publication Date
CN115080736A true CN115080736A (en) 2022-09-20

Family

ID=83248495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210567681.4A Pending CN115080736A (en) 2022-05-23 2022-05-23 Model adjusting method and device of discriminant language model

Country Status (1)

Country Link
CN (1) CN115080736A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438176A (en) * 2022-11-08 2022-12-06 阿里巴巴达摩院(杭州)科技有限公司 Method and equipment for generating downstream task model and executing task



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination