CN113420123A - Language model training method, NLP task processing method and device - Google Patents

Language model training method, NLP task processing method and device

Info

Publication number
CN113420123A
Authority
CN
China
Prior art keywords
text
language model
task
training
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110705729.9A
Other languages
Chinese (zh)
Inventor
张学君
张震
王晗
李鹏
刘建
石瑾
刘睿霖
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
National Computer Network and Information Security Management Center
Original Assignee
Institute of Acoustics CAS
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, National Computer Network and Information Security Management Center filed Critical Institute of Acoustics CAS
Priority to CN202110705729.9A priority Critical patent/CN113420123A/en
Publication of CN113420123A publication Critical patent/CN113420123A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a language model training method, an NLP task processing method and a device, including: acquiring a training sample set, where the training sample set includes a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task; copying the language model to obtain a teacher language model and using the language model itself as a student language model; inputting the second task labels into the teacher language model to generate a plurality of second training texts corresponding to the old tasks and a second text label for each second training text; and inputting the first task label, the second task label, the first training texts and the second training texts into the student language model to generate a first predicted text, a first prediction result, a second predicted text and a second prediction result, and training the student language model accordingly. The embodiments of the application can solve the problem in the related art that storage resources are heavily occupied.

Description

Language model training method, NLP task processing method and device
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method for training a language model, a method and an apparatus for NLP task processing.
Background
With the development of Natural Language Processing (NLP) technology, language models have been widely applied in many fields of daily life to accomplish different kinds of natural language processing tasks, such as text classification, emotion classification, and semantic role labeling in dialog systems.
At present, after a conventional neural network learns a new task, its performance on previously learned (old) tasks drops sharply; that is, the network's performance degrades as the distribution of the learned data changes, which is known as the catastrophic forgetting problem. In the related art, in order to avoid catastrophic forgetting, multiple sub-neural-network models can be added to a main neural network model so that different sub-models learn different tasks, and already-trained tasks can be rehearsed by storing their training samples and prediction results.
However, in these implementations, constructing sub-networks makes the model grow with the number of sub-networks, so the storage resources occupied by the neural network keep increasing; likewise, saving the training samples and prediction results of already-trained tasks requires a large amount of storage.
Disclosure of Invention
The embodiments of the application provide a language model training method, an NLP task processing method and an NLP task processing device, which can generate the training texts of already-trained tasks through a teacher language model, thereby solving the problem of heavy storage-resource occupation in the related art.
In a first aspect, an embodiment of the present application provides a method for training a language model, where the method includes:
acquiring a training sample set; the training sample set comprises a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task, wherein the old task is an NLP task with a trained language model, and the new task is an NLP task without the trained language model;
copying the language model to obtain a teacher language model, and taking the language model as a student language model;
inputting the second task labels into the teacher language model, and generating a plurality of second training texts corresponding to the old tasks and a second text label of each second training text;
inputting the first task label, the second task label, the first training text and the second training text into a student language model to generate a first prediction text corresponding to the first task label, a first prediction result corresponding to the first training text, a second prediction text corresponding to the second task label and a second prediction result corresponding to the second training text;
and training the student language model according to the first training text and the first predicted text, the second training text and the second predicted text, the first predicted result and the first text label, and the second predicted result and the second text label.
In one possible implementation manner, the first training text and the second training text are both question and answer format texts, and the question and answer format texts include question prompt information and questions generated according to the question prompt information.
In one possible implementation, training a student language model according to a first training text, a first predicted text, a second training text, and a second predicted text includes:
performing knowledge distillation on the student language model according to the second text label and the second prediction result;
and training the student language model according to the loss between the first training text and the first prediction text, the loss between the second training text and the second prediction text, the loss between the first prediction result and the first text label and the loss between the second prediction result and the second text label.
In one possible implementation, knowledge distillation is performed on the student language model according to the second text label and the second prediction result, and the knowledge distillation comprises the following steps:
calculating JS divergence of the teacher language model and the student language model according to the second text label and the second prediction result;
calculating the earth mover's distance for transferring the features of each layer in the teacher language model to the corresponding layer in the student language model;
calculating the loss between the teacher language model and the student language model according to the JS divergence and the earth mover's distance;
and updating the student language model according to the loss between the teacher language model and the student language model.
In one possible implementation, inputting the second task tag into the teacher language model to generate a plurality of second predicted texts corresponding to the new task, including:
and inputting the second task labels into the teacher language model, and generating a plurality of second prediction texts according to the number and the preset proportion of the second training texts in the second training text set.
In a second aspect, an embodiment of the present application provides an NLP task processing method, including:
acquiring a text of the NLP task;
inputting the text into a student language model as in the first aspect or any possible implementation manner of the first aspect, and generating a prediction result of the NLP task.
In one possible implementation, the text includes question prompt information and questions for the NLP task; inputting text into a student language model as in the first aspect or any possible implementation manner of the first aspect, and generating a prediction result of the NLP task, including:
and inputting the question prompt information and the question into a student language model to generate an answer corresponding to the question.
In a third aspect, an embodiment of the present application provides a training apparatus for a language model, including:
the acquisition module is used for acquiring a training sample set; the training sample set comprises a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task, wherein the old task is an NLP task with a trained language model, and the new task is an NLP task without the trained language model;
the copying module is used for copying the language model to obtain a teacher language model and taking the language model as a student language model;
the generating module is used for inputting the second task labels into the teacher language model and generating a plurality of second training texts corresponding to the old tasks and a second text label of each second training text;
the generating module is further used for inputting the first task label, the second task label, the first training text and the second training text into the student language model, and generating a first prediction text corresponding to the first task label, a first prediction result corresponding to the first training text, a second prediction text corresponding to the second task label and a second prediction result corresponding to the second training text;
and the training module is used for training the student language model according to the first training text and the first prediction text, the second training text and the second prediction text, the first prediction result and the first text label, and the second prediction result and the second text label.
In one possible implementation manner, the first training text and the second training text are both question and answer format texts, and the question and answer format texts include question prompt information and questions generated according to the question prompt information.
In one possible implementation, the training module is configured to:
performing knowledge distillation on the student language model according to the second text label and the second prediction result;
and training the student language model according to the loss between the first training text and the first prediction text, the loss between the second training text and the second prediction text, the loss between the first prediction result and the first text label and the loss between the second prediction result and the second text label.
In one possible implementation, the training module is configured to:
calculating JS divergence of the teacher language model and the student language model according to the second text label and the second prediction result;
calculating the earth mover's distance for transferring the features of each layer in the teacher language model to the corresponding layer in the student language model;
calculating the loss between the teacher language model and the student language model according to the JS divergence and the earth mover's distance;
and updating the student language model according to the loss between the teacher language model and the student language model.
In a possible implementation manner, the generating module is configured to input the second task label into the teacher language model, and generate a plurality of second prediction texts according to the number and the preset proportion of the second training texts in the second training text set.
In a fourth aspect, an embodiment of the present application provides an NLP task processing apparatus, including:
the acquisition module is used for acquiring a text of the NLP task;
the generating module is used for inputting the text into a student language model trained as in the first aspect or any possible implementation manner of the first aspect, and generating a prediction result of the NLP task.
In one possible implementation, the text includes question prompt information and questions for the NLP task; the generation module is to: and inputting the question prompt information and the question into a student language model to generate an answer corresponding to the question.
In a fifth aspect, embodiments of the present application provide a computer device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the method as provided in the first aspect or any one of the possible implementations of the first aspect, or implements the method as provided in the second aspect or any one of the possible implementations of the second aspect.
In a sixth aspect, embodiments of the present application provide a computer storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method provided in the first aspect or any one of the possible implementations of the first aspect, or implement the method provided in the second aspect or any one of the possible implementations of the second aspect.
According to the language model training method, the NLP task processing method and the device provided by the embodiments of the application, the second task label of an old task is input into the teacher language model that has been trained on the old task, so as to generate a second training text corresponding to the old task and a text label corresponding to the second training text. The new task is an NLP task on which the language model has not been trained, and the old task is an NLP task on which the language model has already been trained. The teacher language model is obtained by copying the language model, and the student language model is the language model itself. The first training text and first task label of the new task, together with the second training text and second task label, are then input into the student language model to generate a first predicted text corresponding to the first task label and a second predicted text corresponding to the second task label. The student language model is trained according to the first training text, the first predicted text, the second training text, the second predicted text, the first text label and the second text label. In this way, the training texts of the tasks on which the teacher language model has already been trained can be regenerated rather than stored, and the student language model, which is identical to the teacher language model, is trained on them together with the training texts of the task the model has not yet learned. A model that can execute both the trained and the untrained tasks is thus obtained, the catastrophic forgetting problem of the model is avoided, and the performance of the model is improved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method for training a language model according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart illustrating an NLP task processing method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a training apparatus for a language model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram illustrating an NLP task processing apparatus according to an embodiment of the present application;
fig. 5 shows a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, the words "exemplary," "for example," and "for instance" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "for example," or "for instance" is not to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of these words is intended to present relevant concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "and/or" is only one kind of association relationship describing an associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, B exists alone, and A and B exist at the same time. In addition, the term "plurality" means two or more unless otherwise specified. For example, the plurality of systems refers to two or more systems, and the plurality of screen terminals refers to two or more screen terminals.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
With the development of Natural Language Processing (NLP) technology, language models have been widely applied in many fields of daily life to accomplish different kinds of natural language processing tasks, such as text classification, emotion classification, and semantic role labeling in dialog systems.
At present, the catastrophic forgetting problem of neural networks degrades their performance. In the related art, in order to avoid catastrophic forgetting, multiple sub-neural-network models can be added to a main neural network model so that different sub-models learn different tasks, and already-trained tasks can be rehearsed by storing their training samples and prediction results.
However, in these implementations, constructing sub-networks makes the model grow with the number of sub-networks, so the storage resources occupied by the neural network keep increasing; likewise, saving the training samples and prediction results of already-trained tasks requires a large amount of storage.
Based on this, the embodiments of the application provide a language model training method, an NLP task processing method and an NLP task processing device, in which the training texts of already-trained tasks are generated by a teacher language model, thereby avoiding the heavy storage-resource consumption of saving training sample data.
Fig. 1 is a schematic flowchart of a method for training a language model according to an embodiment of the present application. As shown in fig. 1, the method for training a language model provided in the embodiment of the present application may include S101-S104.
S101: acquiring a training sample set; the training sample set comprises a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task, wherein the old task is an NLP task with a trained language model, and the new task is an NLP task without the trained language model.
NLP tasks may involve many subjects, such as mathematics, poetry, music, and so on. In order to distinguish the categories of different NLP tasks, a task label can be set for each NLP task, and the task label characterizes the task category. Here, an old task is an NLP task on which the language model has already been trained, and a new task is an NLP task on which the language model has not been trained. For example, the new task is a mathematics task and the old task is a poetry task. The first training text of the new task may be "1+1 equals 2".
Here, there may be a plurality of old tasks.
In some embodiments, in order to improve the performance of the model, the first training texts may be formatted after they are obtained. Specifically, the first training texts may be unified into question-answer format texts by a regularization method according to a question-answer format. Here, a question-answer format text includes question prompt information and a question generated based on the question prompt information. For example, if the text of the new task is "add, 1+1 equals 2", it may be unified into "C#1+1, Q# equals how much, A#2", where the first training text of the new task is "C#1+1, Q# equals how much" and the first text label of the first training text is "A#2".
Here, the first training text may also include the first text label; for example, if the text of the new task is "add, 1+1 equals 2", it may be unified into "C#1+1, Q# equals how much, A#2", and the first training text may be determined to be "C#1+1, Q# equals how much, A#2".
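As an illustration of the formatting step above, the following Python sketch unifies a raw sample such as "add, 1+1 equals 2" into the C#/Q#/A# question-answer format. The regular expression, the field separators and the question wording "equals how much" are assumptions made for illustration only; the embodiment does not fix a concrete implementation.

import re

def to_qa_format(raw_text):
    """Unify a raw sample such as "add, 1+1 equals 2" into the
    question-answer format described above."""
    # Assumed pattern: "<task name>, <question prompt> equals <answer>"
    match = re.match(r"^[^,]+,\s*(?P<prompt>.+?)\s+equals\s+(?P<answer>.+)$", raw_text)
    if match is None:
        raise ValueError("sample does not match the assumed pattern: %r" % raw_text)
    prompt = match.group("prompt").strip()
    answer = match.group("answer").strip()
    first_training_text = "C#%s, Q# equals how much" % prompt   # question prompt + question
    first_text_label = "A#%s" % answer                           # ground-truth answer
    return first_training_text, first_text_label

print(to_qa_format("add, 1+1 equals 2"))
# ('C#1+1, Q# equals how much', 'A#2')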
In some embodiments, the language model is the GPT-2 model.
S102: and copying the language model to obtain a teacher language model, and taking the language model as a student language model.
In order to realize the self-distillation of the language model and avoid that the language model cannot execute old tasks due to catastrophic forgetting, the language model can be copied, the copied language model is used as a teacher language model, and then the language model is used as a student language model.
Here, in order to avoid the occupation of storage resources, the teacher language model may be deleted after the student language model is trained.
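A minimal sketch of S102 is given below. It assumes the GPT-2 implementation and the "gpt2" checkpoint name from the Hugging Face transformers library, which are illustrative assumptions; the embodiment only states that the language model is GPT-2. The teacher is a frozen copy that can be deleted once the student has been trained.

import copy
import torch
from transformers import GPT2LMHeadModel

# The current language model becomes the student; a deep copy becomes the teacher.
student = GPT2LMHeadModel.from_pretrained("gpt2")   # assumed checkpoint name
teacher = copy.deepcopy(student)
teacher.eval()
for param in teacher.parameters():
    param.requires_grad_(False)      # the teacher is only used for generation and distillation

# ... train the student on the new task and on the teacher-generated pseudo data ...

# Delete the teacher after training to avoid occupying storage resources.
del teacher
if torch.cuda.is_available():
    torch.cuda.empty_cache()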
S103: and inputting the second task labels into the teacher language model, and generating a plurality of second training texts corresponding to the old tasks and a second text label of each second training text.
In order to prevent the student language model from catastrophically forgetting during training on the new task and thus losing the ability to execute the old tasks, the training texts of the new task and of the old tasks can be combined during training. The second task label of an old task is input into the teacher language model to obtain pseudo data corresponding to that old task; the pseudo data can serve as training samples of the student language model for the old task, namely the second training texts and the second text labels. A second training text is a text randomly generated by the teacher language model conditioned on the second task label of the old task.
As the number of tasks grows, the number of training samples also grows, and the storage space they occupy becomes larger and larger. In the embodiments of the application, the training texts for the old tasks of the student language model are instead produced by having the teacher language model, which has already been trained on the old tasks, generate pseudo data, so no storage space needs to be allocated for a large number of old-task training samples, and the problem of heavy storage-resource occupation is avoided.
In some embodiments, the text format of the second training text is the question-answer format. For example, if the second task label is math, the generated second training texts may be "1+1=2" and "1×1=1".
In order to ensure the performance of the model, the second training texts may also be unified in format; for example, "1+1=2" is unified into "C#1+1, Q# equals how much" and "A#2", "1×1=1" is unified into "C#1×1, Q# equals how much" and "A#1", and so on. The second text label represents the true value corresponding to the question; for example, the "X" in "A#X" represents the true value corresponding to the second training text.
In some embodiments, to improve the training efficiency of the model, the number of second training texts may be determined according to the number of first training texts. Specifically, the product of the number of first training texts and a preset proportion is calculated to obtain the number of second training texts required.
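The pseudo-data generation of S103, together with the sample-count rule just described, could be sketched as follows. The use of the Hugging Face transformers generation API and the sampling settings are assumptions; the embodiment only specifies that the teacher language model generates texts conditioned on the second task label, in a quantity equal to the number of first training texts multiplied by a preset proportion.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")          # assumed checkpoint name
teacher = GPT2LMHeadModel.from_pretrained("gpt2")
teacher.eval()

def generate_pseudo_data(second_task_label, first_training_texts,
                         preset_proportion=1.0, max_length=64):
    """Generate second training texts for an old task by prompting the teacher
    with that old task's label (the second task label)."""
    # Number of second training texts = number of first training texts * preset proportion.
    num_samples = max(1, int(len(first_training_texts) * preset_proportion))
    prompt_ids = tokenizer(second_task_label, return_tensors="pt").input_ids
    with torch.no_grad():
        outputs = teacher.generate(
            prompt_ids,
            do_sample=True,                      # random generation of pseudo samples
            max_length=max_length,
            num_return_sequences=num_samples,
            pad_token_id=tokenizer.eos_token_id,
        )
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in outputs]

# e.g. pseudo_math = generate_pseudo_data("math", first_training_texts, preset_proportion=0.5)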
S104: and inputting the first task label, the second task label, the first training text and the second training text into a student language model to generate a first prediction text corresponding to the first task label, a first prediction result corresponding to the first training text, a second prediction text corresponding to the second task label and a second prediction result corresponding to the second training text.
Here, the second predicted text refers to a predicted text corresponding to the second task tag generated by the student language model based on the second task tag.
Specifically, the first task label is input into the student language model to generate the first predicted text, and the second task label is input into the student language model to generate the second predicted text; that is, the student language model executes the language generation task. The first training text is input into the student language model to obtain the first prediction result corresponding to the first training text. For example, if the first training text is "C# Quiet Night Thought, Q# who is the author", the first prediction result can be "A# Li Bai". The second training text is input into the student language model to obtain the second prediction result corresponding to the second training text. That is, the student language model performs the question-answering task.
S105: and training the student language model according to the first training text and the first predicted text, the second training text and the second predicted text, the first predicted result and the first text label, and the second predicted result and the second text label.
In some embodiments, in order to ensure that the student language model retains the teacher language model's performance on the old tasks, knowledge distillation needs to be performed on the student language model so that the language model completes self-distillation. Specifically, in S105, knowledge distillation is first performed on the student language model according to the second prediction result and the second text label; then the language model is trained according to the loss between the first predicted text and the first training text, the loss between the first prediction result and the first text label, the loss between the second predicted text and the second training text, and the loss between the second prediction result and the second text label.
Here, in the knowledge distillation of the student language model, first, the JS (Jensen-Shannon) divergence between the teacher language model and the student language model is calculated according to the second text label and the second prediction result, and the earth mover's distance for transferring the features of each layer in the teacher language model to the corresponding layer in the student language model is calculated; next, the loss between the teacher language model and the student language model is calculated according to the JS divergence and the earth mover's distance; finally, the student language model is updated according to the loss between the teacher language model and the student language model.
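The distillation loss described above can be sketched in PyTorch as follows. The JS divergence is computed between the teacher's and the student's output distributions; for the per-layer feature term, the exact earth mover's distance formulation is not given in the embodiment, so a simple mean-squared transport cost is used here purely as a placeholder, and the weights alpha and beta are assumptions.

import torch
import torch.nn.functional as F

def js_divergence(student_logits, teacher_logits, eps=1e-12):
    """Jensen-Shannon divergence between the student and teacher output
    distributions over the vocabulary."""
    p = F.softmax(student_logits, dim=-1)
    q = F.softmax(teacher_logits, dim=-1)
    m = 0.5 * (p + q)
    kl_pm = torch.sum(p * (torch.log(p + eps) - torch.log(m + eps)), dim=-1)
    kl_qm = torch.sum(q * (torch.log(q + eps) - torch.log(m + eps)), dim=-1)
    return (0.5 * (kl_pm + kl_qm)).mean()

def layer_transfer_cost(student_hidden_states, teacher_hidden_states):
    """Placeholder for the earth mover's distance of transferring each teacher
    layer's features to the corresponding student layer."""
    costs = [F.mse_loss(s, t) for s, t in zip(student_hidden_states, teacher_hidden_states)]
    return torch.stack(costs).mean()

def distillation_loss(student_outputs, teacher_outputs, alpha=1.0, beta=1.0):
    # Both models are assumed to be called with output_hidden_states=True.
    return (alpha * js_divergence(student_outputs.logits, teacher_outputs.logits)
            + beta * layer_transfer_cost(student_outputs.hidden_states,
                                         teacher_outputs.hidden_states))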
In this way, the teacher language model is obtained by copying the language model and knowledge distillation is performed on the student model, which realizes self-distillation of the language model, that is, incremental learning of the language model, and broadens the range of tasks the language model can execute.
According to the language model training method provided by the embodiment of the application, the second task label of an old task is input into the teacher language model that has been trained on the old task, so as to generate a second training text corresponding to the old task and a text label corresponding to the second training text. The new task is an NLP task on which the language model has not been trained, and the old task is an NLP task on which the language model has already been trained. The teacher language model is obtained by copying the language model, and the student language model is the language model itself. The first training text and first task label of the new task, together with the second training text and second task label, are then input into the student language model to generate a first predicted text corresponding to the first task label and a second predicted text corresponding to the second task label. The student language model is trained according to the first training text, the first predicted text, the second training text, the second predicted text, the first text label and the second text label. In this way, the training texts of the tasks on which the teacher language model has already been trained can be regenerated rather than stored, and the student language model, which is identical to the teacher language model, is trained on them together with the training texts of the task the model has not yet learned. A model that can execute both the trained and the untrained tasks is thus obtained, the catastrophic forgetting problem of the model is avoided, and the performance of the model is improved.
Based on the language model in the above embodiment, the embodiment of the present application further provides an NLP task processing method. Fig. 2 is a schematic flowchart of an NLP task processing method provided in an embodiment of the present application, and as shown in fig. 2, the NLP task processing method provided in the embodiment of the present application may include S201 to S202.
S201: and acquiring the text of the NLP task.
In some embodiments, the NLP task text is in question and answer format, and the NLP task text includes question prompt information and a question.
S202: and inputting the text into a language model to generate a prediction result of the NLP task.
Here, the language model is a language model trained by the embodiment corresponding to fig. 1.
In some embodiments, the question prompt information and the question are input into a language model, and an answer corresponding to the question is generated through greedy decoding.
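A possible inference sketch for S201-S202 is shown below, again assuming a GPT-2 student from the transformers library and the C#/Q#/A# text format used in the training examples; the concatenation details and the model path are assumptions.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")               # assumed checkpoint name
student = GPT2LMHeadModel.from_pretrained("./trained_student")  # assumed path to the trained student

def answer_question(question_prompt, question, max_length=64):
    """Feed the question prompt information and the question to the student
    language model and decode the answer greedily."""
    text = "C#%s, Q# %s, A#" % (question_prompt, question)
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = student.generate(
        input_ids,
        do_sample=False,                       # greedy decoding
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,
    )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return decoded[len(text):].strip()          # keep only the generated answer

# e.g. answer_question("1+1", "equals how much")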
According to the NLP task processing method provided by the embodiment of the application, the NLP task is input into the language model trained in the embodiment corresponding to fig. 1 to obtain a prediction result, which improves the universality of task processing.
Based on the training method of the language model in the above embodiment, the embodiment of the present application further provides a training device of the language model. Fig. 3 is a schematic structural diagram of a training apparatus 300 for a language model according to an embodiment of the present disclosure, and as shown in fig. 3, the training apparatus 300 for a language model may include an obtaining module 301, a copying module 302, a generating module 303, and a training module 304.
An obtaining module 301, configured to obtain a training sample set; the training sample set comprises a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task, wherein the old task is an NLP task with a trained language model, and the new task is an NLP task without the trained language model;
the copying module 302 is used for copying the language model to obtain a teacher language model and taking the language model as a student language model;
the generating module 303 is configured to input the second task label into the teacher language model, and generate a plurality of second training texts corresponding to the old task and a second text label of each second training text;
the generating module 303 is further configured to input the first task label, the second task label, the first training text, and the second training text into a student language model, and generate a first predicted text corresponding to the first task label, a first predicted result corresponding to the first training text, a second predicted text corresponding to the second task label, and a second predicted result corresponding to the second training text;
the training module 304 is configured to train the student language model according to the first training text and the first predicted text, the second training text and the second predicted text, the first predicted result and the first text label, and the second predicted result and the second text label.
In one possible implementation manner, the first training text and the second training text are both question and answer format texts, and the question and answer format texts include question prompt information and questions generated according to the question prompt information.
In one possible implementation, the training module 304 is configured to:
performing knowledge distillation on the student language model according to the second text label and the second prediction result;
and training the student language model according to the loss between the first training text and the first prediction text, the loss between the second training text and the second prediction text, the loss between the first prediction result and the first text label and the loss between the second prediction result and the second text label.
In one possible implementation, the training module 304 is configured to:
calculating JS divergence of the teacher language model and the student language model according to the second text label and the second prediction result;
calculating the earth mover's distance for transferring the features of each layer in the teacher language model to the corresponding layer in the student language model;
calculating the loss between the teacher language model and the student language model according to the JS divergence and the earth mover's distance;
and updating the student language model according to the loss between the teacher language model and the student language model.
In a possible implementation manner, the generating module 303 is configured to input the second task label into the teacher language model, and generate a plurality of second prediction texts according to the number and the preset ratio of the second training texts in the second training text set.
The training device for language models provided in the embodiment of the present application can perform the steps of the method in the embodiment corresponding to fig. 1, and can achieve the same technical effect, and in order to avoid repetition, detailed description is not provided here.
According to the training device for the language model provided by the embodiment of the application, the second task label of an old task is input into the teacher language model that has been trained on the old task, so as to generate a second training text corresponding to the old task and a text label corresponding to the second training text. The new task is an NLP task on which the language model has not been trained, and the old task is an NLP task on which the language model has already been trained. The teacher language model is obtained by copying the language model, and the student language model is the language model itself. The first training text and first task label of the new task, together with the second training text and second task label, are then input into the student language model to generate a first predicted text corresponding to the first task label and a second predicted text corresponding to the second task label. The student language model is trained according to the first training text, the first predicted text, the second training text, the second predicted text, the first text label and the second text label. In this way, the training texts of the tasks on which the teacher language model has already been trained can be regenerated rather than stored, and the student language model, which is identical to the teacher language model, is trained on them together with the training texts of the task the model has not yet learned. A model that can execute both the trained and the untrained tasks is thus obtained, the catastrophic forgetting problem of the model is avoided, and the performance of the model is improved.
Based on the NLP task processing method in the above embodiment, the present application embodiment further provides an NLP task processing device. Fig. 4 is a schematic structural diagram of an NLP task processing device 400 according to an embodiment of the present disclosure, and as shown in fig. 4, the NLP task processing device 400 according to the embodiment of the present disclosure may include an obtaining module 401 and a generating module 402.
An obtaining module 401, configured to obtain a text of an NLP task;
A generating module 402, configured to input the text into a student language model trained by the method in the embodiment corresponding to fig. 1, and generate a prediction result of the NLP task.
In one possible implementation, the text includes question prompt information and questions for the NLP task; the generation module 402 is configured to: and inputting the question prompt information and the question into a student language model to generate an answer corresponding to the question.
The NLP task processing device provided in the embodiment of the present application can perform the steps of the method in the embodiment corresponding to fig. 2, and can achieve the same technical effect; to avoid repetition, details are not described here again.
The NLP task processing device provided in the embodiment of the present application obtains a prediction result by inputting an NLP task to a language model trained in the embodiment corresponding to fig. 1, and improves the universality of task processing.
A computer device provided in an embodiment of the present application is described below.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 5, the computer device provided in the embodiment of the present application may be used to implement the method for training a language model or the NLP task processing method described in the foregoing method embodiment.
The computer device may comprise a processor 501 and a memory 502 in which computer program instructions are stored.
Specifically, the processor 501 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 502 may include mass storage for data or instructions. By way of example, and not limitation, memory 502 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 502 may include removable or non-removable (or fixed) media, where appropriate. The memory 502 may be internal or external to the integrated gateway disaster recovery device, where appropriate. In a particular embodiment, the memory 502 is a non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to methods in accordance with the present application.
The processor 501 reads and executes the computer program instructions stored in the memory 502 to implement the training method of the language model or the NLP task processing method in any of the above embodiments.
In one example, the computer device can also include a communication interface 505 and a bus 510. As shown in fig. 5, the processor 501, the memory 502, and the communication interface 505 are connected via the bus 510 to complete communication therebetween.
The communication interface 505 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 510 includes hardware, software, or both to couple the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a Hypertransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an infiniband interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a video electronics standards association local (VLB) bus, or other suitable bus or a combination of two or more of these. Bus 510 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In addition, in combination with the above embodiments, the embodiments of the present application may be implemented by providing a computer storage medium. The computer storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement a method of training a language model or a method of NLP task processing in any of the above embodiments.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (9)

1. A method for training a language model, comprising:
acquiring a training sample set; the training sample set comprises a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task, wherein the old task is an NLP task with a trained language model, and the new task is an NLP task without the trained language model;
copying the language model to obtain a teacher language model, and taking the language model as a student language model;
inputting the second task labels into the teacher language model, and generating a plurality of second training texts corresponding to the old tasks and a second text label of each second training text;
inputting the first task label, the second task label, the first training text and the second training text into the student language model, and generating a first predicted text corresponding to the first task label, a first predicted result corresponding to the first training text, a second predicted text corresponding to the second task label and a second predicted result corresponding to the second training text;
and training the student language model according to the first training text and the first predicted text, the second training text and the second predicted text, the first predicted result and the first text label, and the second predicted result and the second text label.
2. The method according to claim 1, wherein the first training text and the second training text are question and answer formatted texts, and the question and answer formatted texts comprise question prompt information and questions generated according to the question prompt information.
3. The method of claim 2, wherein training the student language model based on the first training text, the first predictive text, the second training text, and the second predictive text comprises:
performing knowledge distillation on the student language model according to the second text label and the second prediction result;
and training the student language model according to the loss between the first training text and the first predicted text, the loss between the second training text and the second predicted text, the loss between the first predicted result and the first text label and the loss between the second predicted result and the second text label.
4. The method of claim 3, wherein knowledge distillation of the student language model based on the second text label and the second prediction comprises:
calculating JS divergence of the teacher language model and the student language model according to the second text label and the second prediction result;
calculating the earth mover's distance of transferring the features of each layer in the teacher language model to the corresponding layer in the student language model;
calculating the loss between the teacher language model and the student language model according to the JS divergence and the earth mover's distance;
updating the student language model based on the loss between the teacher language model and the student language model.
5. The method of claim 1, wherein the inputting the second task label into a teacher language model and generating a plurality of second predicted texts corresponding to the new task comprises:
and inputting the second task labels into a teacher language model, and generating a plurality of second prediction texts according to the number and preset proportion of the second training texts in the second training text set.
6. An NLP task processing method, comprising:
acquiring a text of the NLP task;
inputting the text into the student language model of any one of claims 1-5, generating a prediction result of the NLP task.
7. The method of claim 6, wherein the text includes question prompt information and questions for the NLP task; the inputting the text into the student language model of any one of claims 1-5, generating a predicted result of the NLP task, comprising:
and inputting the question prompt information and the question into the student language model to generate an answer corresponding to the question.
8. An apparatus for training a language model, comprising:
the acquisition module is used for acquiring a training sample set; the training sample set comprises a first task label of a new task, a plurality of first training texts of the new task, a first text label of each first training text, and a second task label of each old task in at least one old task, wherein the old task is an NLP task with a trained language model, and the new task is an NLP task without the trained language model;
the copying module is used for copying the language model to obtain a teacher language model and taking the language model as a student language model;
the generating module is used for inputting the second task labels into the teacher language model, and generating a plurality of second training texts corresponding to the old task and a second text label of each second training text;
the generating module is further configured to input the first task label, the second task label, the first training text, and the second training text into the student language model, and generate a first predicted text corresponding to the first task label, a first predicted result corresponding to the first training text, a second predicted text corresponding to the second task label, and a second predicted result corresponding to the second training text;
and the training module is used for training the student language model according to the first training text and the first predicted text, the second training text and the second predicted text, the first predicted result and the first text label, and the second predicted result and the second text label.
9. An NLP task processing apparatus, comprising:
the acquisition module is used for acquiring a text of the NLP task;
a generating module for inputting the text into the student language model according to any one of claims 1 to 5, and generating a prediction result of the NLP task.
CN202110705729.9A 2021-06-24 2021-06-24 Language model training method, NLP task processing method and device Pending CN113420123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110705729.9A CN113420123A (en) 2021-06-24 2021-06-24 Language model training method, NLP task processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110705729.9A CN113420123A (en) 2021-06-24 2021-06-24 Language model training method, NLP task processing method and device

Publications (1)

Publication Number Publication Date
CN113420123A true CN113420123A (en) 2021-09-21

Family

ID=77717665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110705729.9A Pending CN113420123A (en) 2021-06-24 2021-06-24 Language model training method, NLP task processing method and device

Country Status (1)

Country Link
CN (1) CN113420123A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134506A1 (en) * 2018-10-29 2020-04-30 Fujitsu Limited Model training method, data identification method and data identification device
US20200334520A1 (en) * 2019-04-19 2020-10-22 Microsoft Technology Licensing, Llc Multi-task machine learning architectures and training procedures
CN112487182A (en) * 2019-09-12 2021-03-12 华为技术有限公司 Training method of text processing model, and text processing method and device
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111506702A (en) * 2020-03-25 2020-08-07 北京万里红科技股份有限公司 Knowledge distillation-based language model training method, text classification method and device
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111767711A (en) * 2020-09-02 2020-10-13 之江实验室 Compression method and platform of pre-training language model based on knowledge distillation
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance
CN112613273A (en) * 2020-12-16 2021-04-06 上海交通大学 Compression method and system of multi-language BERT sequence labeling model
CN112966712A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Language model training method and device, electronic equipment and computer readable medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐聪; 李擎; 张德政; 陈鹏; 崔家瑞: "Research Progress of Deep Reinforcement Learning in the Field of Text Generation" (文本生成领域的深度强化学习研究进展), Chinese Journal of Engineering (工程科学学报), no. 04, 31 March 2020 (2020-03-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116401359A (en) * 2023-06-09 2023-07-07 深圳前海环融联易信息科技服务有限公司 Document extraction method and device, medium and equipment
CN117350407A (en) * 2023-11-20 2024-01-05 北京中关村科金技术有限公司 Model processing method, device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN110147456B (en) Image classification method and device, readable storage medium and terminal equipment
CN111737476B (en) Text processing method and device, computer readable storage medium and electronic equipment
CN113420123A (en) Language model training method, NLP task processing method and device
CN108090218B (en) Dialog system generation method and device based on deep reinforcement learning
CN116629275B (en) Intelligent decision support system and method based on big data
CN110209782B (en) Question-answering model and answer sentence generation method and device, medium and electronic equipment
CN108038541B (en) CTR (China train redundancy) estimation method, device, equipment and computer readable medium
CN116501898A (en) Financial text event extraction method and device suitable for few samples and biased data
CN114511038A (en) False news detection method and device, electronic equipment and readable storage medium
CN114443483A (en) Test method and device of artificial intelligence system, electronic equipment and medium
CN113095045A (en) Chinese mathematics application problem data enhancement method based on reverse operation
CN114548192A (en) Sample data processing method and device, electronic equipment and medium
CN111026849B (en) Data processing method and device
CN110929516A (en) Text emotion analysis method and device, electronic equipment and readable storage medium
CN110825866A (en) Automatic question-answering system and device based on deep network and text similarity
CN110852042A (en) Character type conversion method and device
CN114547308A (en) Text processing method and device, electronic equipment and storage medium
CN114647977A (en) Training method and device for optical fiber transmission signal prediction model
CN113222050A (en) Image classification method and device, readable medium and electronic equipment
CN112907409A (en) Application problem solving method, device, medium and electronic equipment
CN113870846B (en) Speech recognition method, device and storage medium based on artificial intelligence
CN114330512B (en) Data processing method, device, electronic equipment and computer readable storage medium
CN117493688B (en) Teaching information recommendation method and device based on user portrait and electronic equipment
CN114077654A (en) Question answering method, device, equipment and storage medium
US20240135188A1 (en) Semi-supervised framework for efficient time-series ordinal classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination