CN116340779A - Training method and device for next-generation universal basic model and electronic equipment - Google Patents

Training method and device for next-generation universal basic model and electronic equipment

Info

Publication number
CN116340779A
Authority
CN
China
Prior art keywords
training
model
data
basic model
generation
Prior art date
Legal status
Pending
Application number
CN202310620027.XA
Other languages
Chinese (zh)
Inventor
王业全
李响
姜鑫
孟绪颖
孙爱欣
Current Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute filed Critical Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202310620027.XA
Publication of CN116340779A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method and device for a next-generation universal basic model, and an electronic device, in the technical field of natural language processing. In the language training stage, the model is trained on the original data so that the next-generation universal basic model can generate the corresponding unified data from the input original data; in the teacher training stage, the model is trained on the unified data so that the next-generation universal basic model can judge the correctness of propositions; the two stages are iterated alternately to obtain the trained next-generation universal basic model. By using both original language data and task-aware data during training, the method teaches the model task knowledge while preserving its role as a language model. A user can handle multiple tasks with the single trained model, without additional fine-tuning for each task; the modeling cost is low, the model generalizes well, and business performance can be improved.

Description

Training method and device for next-generation universal basic model and electronic equipment
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular to a training method and apparatus for a next-generation universal basic model, and an electronic device.
Background
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language.
At present, widely studied and used natural language processing models include small models and large models. A small model is a language model trained separately for each task on task-specific data. Since a dedicated model must be built for each task, the modeling cost is high and the model generalizes poorly. A large model is a basic model: an AI model trained on large-scale, broadly sourced data sets using neural networks and self-supervised learning. Training a basic model comprises two steps. The first step is pre-training, whose purpose is to train the model on as much self-supervised data as possible; pre-training may use self-supervised learning techniques (e.g., autoregressive language modeling and auto-encoding), and monolingual, multilingual, and multimodal models can all be trained this way. The second step is fine-tuning, which adapts the network to a specific task; the training data may be text, text-image pairs, or text-video pairs. After fine-tuning, the model supports technologies such as classification, sequence labeling, structured prediction, and sequence generation, and applications such as summarization, machine translation, image retrieval, and video annotation. The basic model is therefore mainly trained under the paradigm of "pre-training followed by fine-tuning". Although models obtained under this paradigm have been remarkably successful on many natural language processing tasks, they learn only linguistic knowledge in the language-model training stage and cannot learn knowledge related to actual tasks, so fine-tuning is required for each task and the modeling cost is very high. For example, after fine-tuning a large model on 100 tasks, 100 dedicated large models are obtained, and each dedicated large model is less general.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides the following technical scheme.
The first aspect of the present invention provides a training method for a next-generation universal base model, comprising:
acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
alternately iterating language training and teacher training on the next-generation universal basic model to be trained to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
Preferably, converting the original data for the different original labels in the classification problem target space includes: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
Preferably, the specific template specifies the conversion rule from the original data to the unified data, including a task description of the original data, the context of the original data, and a mapping rule from the original label to the proposition.
Preferably, the original data includes a natural language processing data set corresponding to at least one classification task, the at least one classification task including topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
Preferably, the natural language processing data set corresponding to the at least one classification task is integrated by means of building a template.
Preferably, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
A second aspect of the present invention provides a training apparatus for a next generation universal base model, comprising:
the training data acquisition module is used for acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
the model training module is used for performing alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
A third aspect of the present invention provides a classification method using a next-generation universal base model, comprising:
inputting the original data to be classified into a next-generation universal basic model, and outputting a classification result of the original data to be classified according to the original label corresponding to the proposition with the highest correctness probability;
the next-generation universal basic model is obtained by training in advance by adopting the training method in the first aspect.
A fourth aspect of the present invention provides a memory storing a plurality of instructions for implementing the training method of the next-generation generic base model according to the first aspect, or the classification method using the next-generation generic base model according to the third aspect.
A fifth aspect of the present invention provides an electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method of training the next generation generic base model as described in the first aspect or the method of classifying using the next generation generic base model as described in the third aspect.
The beneficial effects of the invention are as follows: to address the technical problems of large models obtained under the prior-art "pre-training followed by fine-tuning" paradigm, the invention provides a method for building a next-generation universal basic model. Because the learned task knowledge reduces the search and optimization space, language knowledge and task knowledge reinforce each other during the alternating, cyclic learning process, which speeds up training of the next-generation universal basic model. In addition, the proposed technology can use a single model, trained on one's own data or even open-source data, to handle multiple specific tasks in a service, without additional fine-tuning for each task. The invention therefore reduces modeling cost, yields a model with strong generalization, and can improve business performance.
Drawings
FIG. 1 is a schematic flow chart of a training method of a next generation universal basic model according to the present invention;
FIG. 2 is a schematic diagram of a unified data acquisition mode according to the present invention;
FIG. 3 is a schematic diagram of the training process (alternately iterating language training and teacher training) of the next generation universal base model according to the present invention;
FIG. 4 is a schematic diagram of the language training and teacher training modes according to the present invention;
FIG. 5 is a functional structural diagram of a training device of the next-generation universal basic model according to the present invention.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
The method provided by the invention can be implemented in a terminal environment, and the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. The memory stores at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments below.
The processor may include one or more processing cores. The processor connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and by invoking the data stored in the memory.
The memory may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets.
The display screen is used for displaying a user interface of each application program.
In addition, it will be appreciated by those skilled in the art that the structure of the terminal described above is not limiting, and that the terminal may include more or fewer components, combine certain components, or use a different arrangement of components. For example, the terminal may further include components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, and a power supply, which are not described herein.
Example 1
As shown in FIG. 1, an embodiment of the present invention provides a training method for a next-generation universal basic model, including: S101, acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space; S102, performing alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model; wherein, in the language training stage, the next-generation universal basic model is trained with the original data so that it can generate the corresponding unified data based on the input original data; and in the teacher training stage, the next-generation universal basic model is trained with the unified data so that it can judge the correctness of propositions.
Converting the original data for the different original labels in the classification problem target space further comprises the following step: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
In a preferred embodiment of the present invention, the specific templates specify the conversion rules from original data to unified data, including a task description of the original data, the context of the original data, and mapping rules from original labels to propositions.
The original data comprises a natural language processing data set corresponding to at least one classification task, wherein the at least one classification task comprises topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
Further, the natural language processing data sets corresponding to the classification tasks can be integrated by building templates. Take text topic classification data as an example: the original data set typically consists of a number of <text, label> tuples, where the label indicates the category of the text, while the final unified data consists of <proposition, correctness label> tuples. The "template" describes the rule for converting original data into the proposition of the unified data. A typical template is: "[background] [text] [proposition core]". Here "[background]" stands for, e.g., "The following is a piece of news text:"; "[text]" is replaced by the text in the actual data, that is, the 'text' field of the original data, such as "In the NBA playoffs, the Clippers lead the Suns 2-1..."; and "[proposition core]" is generated from the label of each piece of original data, e.g., for an original label of "Sports" the [proposition core] becomes "The topic of this news is 'Sports'." The proposition part of the resulting unified data is then "The following is a piece of news text: In the NBA playoffs, the Clippers lead the Suns 2-1... The topic of this news is 'Sports'." As for the "correctness label": the true original label yields a correct proposition, i.e. the proposition in the unified data is correct and the correctness label is 1; a wrong label produced by negative sampling yields an incorrect proposition, i.e. the proposition in the unified data is wrong and the correctness label is 0.
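To make the template-based conversion concrete, the following Python sketch turns one <text, label> example into <proposition, correctness label> pairs, marking the proposition built from the gold label as correct (1) and propositions built from negatively sampled labels as wrong (0). The template wording, the label set, and the single-negative default are illustrative assumptions, not the patent's exact templates.

```python
import random

# Hypothetical template for a news-topic classification data set.
TEMPLATE = "The following is a piece of news text: {text} The topic of this news is '{label}'."
LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def to_unified_data(text, gold_label, num_negatives=1):
    """Convert one <text, label> example into <proposition, correctness label> pairs."""
    unified = [(TEMPLATE.format(text=text, label=gold_label), 1)]    # true proposition
    wrong_labels = [l for l in LABELS if l != gold_label]
    for wrong in random.sample(wrong_labels, num_negatives):         # negative sampling
        unified.append((TEMPLATE.format(text=text, label=wrong), 0)) # wrong proposition
    return unified

# Example using the Clippers/Suns sentence from the description.
for proposition, correct in to_unified_data(
        "In the NBA playoffs, the Clippers lead the Suns 2-1.", "Sports"):
    print(correct, proposition)
```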
In another preferred embodiment of the invention, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character, such as a '[cls]' special token, is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
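As an illustration of this teacher-stage head, here is a PyTorch sketch that reads the hidden state at the trailing '[cls]' position, maps it to the true/false two-dimensional space, and applies softmax. The backbone interface (a causal Transformer returning hidden states of shape (batch, seq_len, hidden)) and the class and argument names are assumptions for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

class PropositionCorrectnessHead(nn.Module):
    """Judge proposition correctness from the hidden state at the appended [cls] position."""

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                        # assumed: returns (batch, seq_len, hidden)
        self.to_true_false = nn.Linear(hidden_size, 2)  # map to the true/false two-dimensional space

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)               # (batch, seq_len, hidden)
        cls_state = hidden[:, -1, :]                    # [cls] is appended at the end of the proposition
        logits = self.to_true_false(cls_state)
        return torch.softmax(logits, dim=-1)            # (batch, 2): correctness probability of the proposition
```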
In the invention, in order to give one language model the ability to solve multiple tasks simultaneously, the various tasks are unified in form. Based on the assumption that all classification problems can be converted into true-or-false judgment problems, the invention provides a modeling approach that solves various classification problems uniformly, namely proposition correctness judgment. Accordingly, a variety of NLU (Natural Language Understanding) data sets are integrated to build the Unified Data used for training the model.
FIG. 2 shows the unified modeling approach of proposition correctness judgment and the process of converting data into unified data. Taking a text topic classification task as an example: under the guidance of a specific template (Template), the original data is converted into a proposition described in natural language for each label (0 - "World", 1 - "Sports", 2 - "Business", 3 - "Sci/Tech") in the target space of the classification problem. In this way, one piece of original data is converted into N propositions (N being the size of the target classification space), of which only one is a true proposition. Finally, in the teacher training stage, the propositions and their corresponding true/false labels are input into the model to train its ability to judge proposition correctness. In the inference stage, the original data to be classified is likewise input into the model, and the classification result of the original data to be classified can be output according to the original label corresponding to the proposition with the highest correctness probability, thereby solving the classification problem.
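The following sketch illustrates this inference stage: one proposition is built per candidate label, each is scored by the correctness head, and the label whose proposition receives the highest correctness probability is returned. The tokenizer interface and the argument names are assumptions; `head`, `labels`, and `template` can be the head and the LABELS/TEMPLATE from the sketches above.

```python
import torch

def classify(head, tokenizer, text, labels, template):
    """Return the label whose proposition the model judges most likely to be true.

    `head` is the PropositionCorrectnessHead sketched above; `tokenizer` is assumed
    to return a (1, seq_len) tensor of token ids ending in the [cls] token."""
    best_label, best_prob = None, -1.0
    for label in labels:
        proposition = template.format(text=text, label=label)
        input_ids = tokenizer(proposition)
        with torch.no_grad():
            prob_true = head(input_ids)[0, 1].item()   # probability that the proposition is correct
        if prob_true > best_prob:
            best_label, best_prob = label, prob_true
    return best_label
```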
Under this task-form conversion framework, any NLP data set that is essentially a classification task can be integrated by constructing a template for that data set, continuously expanding the scale of the unified data and further improving the generalization and robustness of the final trained model on each task.
Based on this unified form of classification tasks, the training process of the next-generation universal basic model (FreeLM) can be as shown in FIG. 3. Training of FreeLM is divided into two stages: the language training stage and the teacher training stage, corresponding respectively to the two training objectives of "predicting the next word" and "judging proposition correctness". The two stages are alternately iterated, which ensures that the model retains both the text generation capability of a language model and the ability to judge proposition correctness. FreeLM can be trained in the manner shown in FIG. 4. The language training stage (Language Iterative Stage) of the model (a Transformer) is no different from conventional autoregressive language model training, which predicts the k-th word given the preceding k-1 words. Taking the language training stage (a) in FIG. 4 as an example, the model receives "The cat sat on the mat" as input; through the attention masking mechanism, the model predicts the current word conditioned only on the word sequence before it, e.g. receiving "The cat" and predicting "sat", or receiving "The cat sat on the" and predicting "mat". Retaining the language-model training objective preserves the model's language generation capability. The teacher training stage (Teacher Iterative Stage) takes the constructed unified data (propositions) as input and learns with the proposition correctness labels as supervision. A '[cls]' special token is appended to the end of the proposition text, the output of the model at the corresponding position is selected as the feature representation, which is mapped to a True/False two-dimensional space through a linear layer, and the correctness probability of the proposition is computed through a softmax activation function. Taking the teacher training stage (b) in FIG. 4 as an example, the input first states that the current task is natural language inference, the two given sentences are "The cat sat on the mat" and "The cat did not sit on the mat", and the proposition is "Here, the premise entails the latter hypothesis". The two sentences are clearly contradictory, so there is no entailment; the proposition is therefore wrong, and the model is trained to judge it as wrong.
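To make the alternation of the two objectives concrete, here is a rough PyTorch sketch of the training loop: even steps run the language stage (autoregressive next-word prediction on original data) and odd steps run the teacher stage (proposition correctness judgment on unified data). The 1:1 alternation ratio, the batch formats, and the `lm_logits` helper on the backbone are assumptions for illustration; they are not specified by the patent.

```python
import itertools
import torch
import torch.nn.functional as F

def train_freelm(head, lm_batches, unified_batches, optimizer, num_steps):
    """Alternate language training and teacher training, step by step."""
    lm_iter = itertools.cycle(lm_batches)             # batches of token ids from original data
    teacher_iter = itertools.cycle(unified_batches)   # (proposition ids, 0/1 correctness labels)
    for step in range(num_steps):
        optimizer.zero_grad()
        if step % 2 == 0:
            # Language stage: next-word prediction under a causal attention mask.
            input_ids = next(lm_iter)                             # (batch, seq_len)
            logits = head.backbone.lm_logits(input_ids)           # assumed helper: (batch, seq_len, vocab)
            loss = F.cross_entropy(logits[:, :-1].flatten(0, 1),  # predict token k from tokens < k
                                   input_ids[:, 1:].flatten())
        else:
            # Teacher stage: judge proposition correctness from the [cls] position.
            input_ids, correctness = next(teacher_iter)
            probs = head(input_ids)                               # (batch, 2) softmax probabilities
            loss = F.nll_loss(torch.log(probs), correctness)
        loss.backward()
        optimizer.step()
```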
The large model constructed by the method provided by the invention has the following beneficial effects: (1) language knowledge and task knowledge are learned through alternating, cyclic iteration within one model, achieving the Fine-Tuning-Free goal and reducing modeling cost; (2) on natural language understanding tasks, the method is substantially superior to the GPT (Generative Pre-trained Transformer) series, including InstructGPT, leaves a very large space for expanding the training data set, and its capability can keep growing as the model size and the training set grow; (3) on natural language generation tasks, the next-generation universal basic model constructed by the invention shows an advantage in understanding in practice; more importantly, it is fully compatible with capabilities such as instruction learning and in-context learning, so the language generation capability can be further enhanced; (4) the solution provided by the present invention can be adapted to basic models in almost all fields, including but not limited to vision, speech, music, writing, biology, and medicine.
Example 2
As shown in FIG. 5, an embodiment of the present invention provides a training apparatus for a next-generation universal basic model, including: a training data acquisition module 501, configured to acquire training data, where the training data includes original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space; and a model training module 502, configured to perform alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model.
Wherein, in the language training stage, the next-generation universal basic model is trained with the original data so that it can generate the corresponding unified data based on the input original data; and in the teacher training stage, the next-generation universal basic model is trained with the unified data so that it can judge the correctness of propositions.
Further, in the training data acquisition module, converting the original data for the different original labels in the classification problem target space further includes: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
The specific template specifies the conversion rule from the original data to the unified data, including a task description of the original data, the context of the original data, and a mapping rule from the original label to the proposition.
Further, the original data includes a natural language processing data set corresponding to at least one classification task, the at least one classification task including topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
Furthermore, the natural language processing data set corresponding to the at least one classification task is integrated by means of a template construction.
Further, in the model training module, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
The apparatus may be implemented according to the training method of the next-generation universal basic model provided in the first embodiment; for the specific implementation, reference may be made to the first embodiment, which is not repeated here.
Example 3
The embodiment of the invention provides a classification method using a next-generation universal basic model, comprising the following steps: inputting the original data to be classified into the next-generation universal basic model, and outputting the classification result of the original data to be classified according to the original label corresponding to the proposition with the highest correctness probability; the next-generation universal basic model is trained in advance using the training method of the first embodiment.
For details of the training method of the next-generation universal basic model, reference may be made to the description of the first embodiment, which is not repeated here.
The embodiment of the invention also provides a memory, which stores a plurality of instructions for realizing the training method of the next-generation universal basic model according to the first embodiment or the classification method using the next-generation universal basic model according to the third embodiment.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory connected with the processor, wherein the memory stores a plurality of instructions which can be loaded and executed by the processor so that the processor can execute the training method of the next-generation general basic model as described in the first embodiment or the classification method using the next-generation general basic model as described in the third embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method for training a next generation generic base model, comprising:
acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
alternately iterating language training and teacher training on the next-generation universal basic model to be trained to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
2. The method for training a next-generation generic base model according to claim 1, wherein converting the original data for the different original labels in the classification problem target space further comprises: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
3. The training method of the next-generation generic basic model according to claim 2, wherein the specific templates specify the conversion rules from the original data to the unified data, including the task description of the original data, the context of the original data, and the mapping rules from the original labels to the propositions.
4. The training method of the next-generation generic basic model of claim 1, wherein the original data comprises a natural language processing dataset corresponding to at least one classification task, the at least one classification task comprising topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
5. The method of claim 4, wherein the natural language processing data set corresponding to the at least one classification task is integrated by way of a template.
6. The training method of the next-generation universal basic model according to claim 1, wherein, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
7. A training device for a next generation generic basic model, comprising:
the training data acquisition module is used for acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
the model training module is used for performing alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
8. A classification method using a next-generation universal base model, comprising:
inputting the original data to be classified into a next-generation universal basic model, and outputting a classification result of the original data to be classified according to the original label corresponding to the proposition with the highest correctness probability;
wherein the next-generation universal base model is trained beforehand using the training method according to any one of claims 1 to 6.
9. A memory, characterized in that a plurality of instructions for realizing the training method of the next-generation general basic model according to any one of claims 1 to 6 or the classification method using the next-generation general basic model according to claim 8 are stored.
10. An electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method of training the next generation generic basis model as set forth in any one of claims 1-6 or the method of classifying using the next generation generic basis model as set forth in claim 8.
CN202310620027.XA 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment Pending CN116340779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310620027.XA CN116340779A (en) 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310620027.XA CN116340779A (en) 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment

Publications (1)

Publication Number Publication Date
CN116340779A 2023-06-27

Family

ID=86879096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310620027.XA Pending CN116340779A (en) 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment

Country Status (1)

Country Link
CN (1) CN116340779A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220067274A1 (en) * 2020-09-02 2022-03-03 Zhejiang Lab Compression method and platform of pre-training language model based on knowledge distillation
CN114528383A (en) * 2021-12-29 2022-05-24 阿里云计算有限公司 Pre-training language model processing method based on comparative learning and intelligent question-answering system
CN114818902A (en) * 2022-04-21 2022-07-29 浪潮云信息技术股份公司 Text classification method and system based on knowledge distillation
CN115526332A (en) * 2022-08-17 2022-12-27 阿里巴巴(中国)有限公司 Student model training method and text classification system based on pre-training language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANG LI et al.: "FreeLM: Fine-Tuning-Free Language Model", http://arxiv.org/pdf/2305.01616.pdf, pages 1-11 *

Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN113987209B (en) Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment
CN110490213B (en) Image recognition method, device and storage medium
CN107273355B (en) Chinese word vector generation method based on word and phrase joint training
CN110807332A (en) Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111708869B (en) Processing method and device for man-machine conversation
CN110795945A (en) Semantic understanding model training method, semantic understanding device and storage medium
CN110263325A (en) Chinese automatic word-cut
CN111898636B (en) Data processing method and device
CN113987147A (en) Sample processing method and device
CN111782769A (en) Intelligent knowledge graph question-answering method based on relation prediction
CN116820429B (en) Training method and device of code processing model, electronic equipment and storage medium
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN115034201A (en) Augmenting textual data for sentence classification using weakly supervised multi-reward reinforcement learning
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN111597815A (en) Multi-embedded named entity identification method, device, equipment and storage medium
CN111475645B (en) Knowledge point labeling method, knowledge point labeling device and computer readable storage medium
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN111710428A (en) Biomedical text representation method for modeling global and local context interaction
CN115062617A (en) Task processing method, device, equipment and medium based on prompt learning
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
CN113326367B (en) Task type dialogue method and system based on end-to-end text generation
CN117236335B (en) Two-stage named entity recognition method based on prompt learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230627