CN116629235B - Large-scale pre-training language model fine tuning method and device, electronic equipment and medium - Google Patents

Large-scale pre-training language model fine tuning method and device, electronic equipment and medium

Info

Publication number
CN116629235B
CN116629235B (application CN202310913374.1A)
Authority
CN
China
Prior art keywords
task
model
language model
fine tuning
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310913374.1A
Other languages
Chinese (zh)
Other versions
CN116629235A (en)
Inventor
暴宇健
汪骞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310913374.1A
Publication of CN116629235A
Application granted
Publication of CN116629235B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a method, an apparatus, an electronic device, and a medium for fine-tuning a large-scale pre-trained language model. The method comprises the following steps: obtaining a pre-trained large-scale language model; reformulating the input data of each task with a task instruction template to obtain an input text and an output text for fine-tuning training, the output text serving as the correct answer corresponding to the input text; feeding the input text into the pre-trained large-scale language model for prediction to obtain the model's prediction result; and calculating a loss function from the prediction result and the correct answer corresponding to the input text, then updating the parameters of the pre-trained large-scale language model with the loss function until the model converges. The method and apparatus greatly improve the efficiency of model deployment, reduce the computing resources and cost of deployment, and effectively improve the model's performance and accuracy on zero-shot tasks.

Description

Large-scale pre-training language model fine tuning method and device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a method and apparatus for fine-tuning a large-scale pre-trained language model, an electronic device, and a medium.
Background
In recent years, large-scale pre-trained language models (e.g., BERT, GPT, etc.) have achieved remarkable success in the field of Natural Language Processing (NLP) and have become a popular research direction. These models are pre-trained on large amounts of unlabeled text data and then fine-tuned with small amounts of labeled data to suit a particular task. However, for zero-shot tasks such as reading comprehension and question answering, the performance of these pre-trained models still leaves room for improvement.
A zero-shot task is one the model must complete without any task-specific annotated data. In this setting, a pre-trained model typically needs a large amount of annotated data for fine-tuning in order to perform well. In practice, however, annotated data is often scarce, expensive, or time-consuming to collect. Developing a fine-tuning method that lets a large-scale pre-trained model perform well on zero-shot tasks has therefore become an important research topic.
The current prior-art approach is to fine-tune pre-trained models through transfer learning. The basic idea of transfer learning is to take the language-representation capabilities learned during pre-training and transfer them to new tasks. Although this approach exploits the pre-trained model effectively, it has significant drawbacks. First, the fine-tuning process requires additional annotated data, and collecting and labeling such data tends to consume huge resources. Second, a fine-tuned model is usually suited only to a single task or a few specific tasks and is difficult to adapt broadly, which increases the cost of deploying and using the model.
Disclosure of Invention
In view of this, embodiments of the present application provide a method, an apparatus, an electronic device, and a medium for fine-tuning a large-scale pre-trained language model, so as to solve the prior-art problems of expensive and time-consuming annotation, low model-deployment efficiency, high usage cost, and poor model accuracy.
In a first aspect, an embodiment of the present application provides a method for fine-tuning a large-scale pre-trained language model, including: acquiring a pre-trained large-scale language model and taking it as a reference model; reformulating the input data of the corresponding task with a preset task instruction template to obtain an input text and an output text for fine-tuning the pre-trained large-scale language model, the output text serving as the correct answer corresponding to the input text; inputting the input text into the pre-trained large-scale language model for prediction to obtain the model's prediction result; and calculating a loss function from the prediction result and the correct answer corresponding to the input text, then updating the parameters of the pre-trained large-scale language model with the loss function until the model converges.
In a second aspect, an embodiment of the present application provides an apparatus for fine-tuning a large-scale pre-trained language model, including: an acquisition module configured to acquire a pre-trained large-scale language model and take it as a reference model; a reformulation module configured to reformulate the input data of the corresponding task with a preset task instruction template to obtain an input text and an output text for fine-tuning the pre-trained large-scale language model, the output text serving as the correct answer corresponding to the input text; a prediction module configured to input the input text into the pre-trained large-scale language model for prediction to obtain the model's prediction result; and an updating module configured to calculate a loss function from the prediction result and the correct answer corresponding to the input text and update the parameters of the pre-trained large-scale language model with the loss function until the model converges.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
A pre-trained large-scale language model is obtained and taken as a reference model; the input data of each task is reformulated with a preset task instruction template to obtain an input text and an output text for fine-tuning, the output text serving as the correct answer corresponding to the input text; the input text is fed into the model for prediction; and a loss function is calculated from the prediction result and the correct answer and used to update the model's parameters until it converges. Because no additional data needs to be annotated for new tasks, model deployment becomes far more efficient, the computing resources and cost of deployment drop, and the model's performance and accuracy on zero-shot tasks improve markedly.
Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed for the embodiments or the prior-art description are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for fine tuning a large-scale pre-training language model according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a large-scale pre-trained language model fine-tuning apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In recent years, large-scale pre-trained language models (e.g., BERT, GPT, etc.) have become a popular research direction in natural language processing, with significant success in many applications. These models are typically pre-trained on large amounts of unlabeled text data and then fine-tuned with small amounts of labeled data to suit a particular task.
However, on zero-shot tasks (e.g., reading comprehension, question answering, etc.), the performance of these pre-trained models is still not ideal, and a large amount of annotated data is often required for fine-tuning to reach good performance. In practice, annotated data is often scarce, expensive, or time-consuming to collect. A fine-tuning method for large-scale pre-trained models aimed at zero-shot tasks is therefore extremely important.
The existing technical scheme fine-tunes the model with transfer learning, that is, adapting a pre-trained model to a new task. The advantage of this approach is that it fully exploits the language-representation capability learned during pre-training, but it still suffers from some drawbacks:
Additional annotated data is required for fine-tuning, and such data is often scarce, expensive, and time-consuming to collect and label. If the quality of the annotated data is poor, the model's performance can be seriously affected.
Moreover, the fine-tuned model suits only a single task or a few specific tasks and cannot easily be applied to other tasks, so a single model cannot handle multiple tasks at once, which increases the cost of deploying and using models.
In view of these problems in the prior art, the present application provides a new instruction fine-tuning method for large-scale pre-trained language models aimed at zero-shot tasks. The aim is that, after training, a single model works adequately across many tasks, which greatly reduces the cost of deploying models. The method needs no additional annotated data for new tasks, effectively improves performance on zero-shot tasks, and can be applied more easily to a variety of other tasks, greatly reducing deployment and usage costs.
FIG. 1 is a flow chart of a method for fine-tuning a large-scale pre-trained language model according to an embodiment of the present application. The method of FIG. 1 may be performed by a server. As shown in FIG. 1, the method may specifically include:
s101, acquiring a pre-trained large-scale language model, and taking the pre-trained large-scale language model as a reference model;
s102, reforming input data of a corresponding task by using a preset task instruction template to obtain an input text and an output text for fine tuning a pre-trained large-scale language model, and taking the output text as a correct answer corresponding to the input text;
s103, inputting the input text into the pre-trained large-scale language model for prediction, and obtaining a prediction result output by the pre-trained large-scale language model;
and S104, calculating a loss function based on a prediction result and a correct answer corresponding to the input text, and updating parameters of the pre-trained large-scale language model by using the loss function until the pre-trained large-scale language model converges.
The basic idea of the present application is to convert any task into a question-and-answer task so that it can be handled by a pre-trained large-scale language model. First, a pre-trained large language model (LLM) is selected as the reference model. Such a model needs to be large, typically with billions of parameters or more, and has already been pre-trained on vast amounts of unlabeled text data, learning rich language representations.
The pre-trained model is then instruction fine-tuned using task instruction templates. The core of instruction fine-tuning is to tell the pre-trained model what task it is expected to perform by giving it an instruction. This instruction is prepended, as part of the input sequence, to the input that would otherwise be fed to the model.
In practical applications, the generation model may be any of several mature pre-trained large-scale language models, such as GPT-2, GPT-3, BART-Large, or T5-Large. The specific model architecture does not limit the technical scheme of the present application.
In some embodiments, reformulating the input data of the corresponding task with a preset task instruction template includes: determining the task instruction template corresponding to the task, extracting the task's reformulation instruction from the template, and reformulating the task's input data with that instruction to obtain fine-tuning training data; the reformulation instruction characterizes the task type the model is to predict and the task the model is to complete.
Specifically, an embodiment of the present application first obtains a pre-trained large-scale language model, such as GPT-3 or BART-Large, and uses it as the basis. Such models have typically been pre-trained on large-scale unlabeled text data and have captured rich language patterns and knowledge.
In one example, the task instruction template corresponding to a particular task is determined. For text-summary generation, the template might be: "Please summarize the following paragraph: {input}". Here "{input}" is a placeholder representing the task's input data, and the reformulation instruction extracted from the template is "Please summarize the following paragraph: ". A minimal sketch of this step follows.
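As a hedged sketch of filling such a template (the template string and the `build_input` helper are illustrative assumptions, not names from the patent):

```python
# Hypothetical sketch: fill a task-instruction template's {input} placeholder.
SUMMARIZATION_TEMPLATE = "Please summarize the following paragraph: {input}"

def build_input(template: str, raw_input: str) -> str:
    # Substitute the task's raw input data for the {input} placeholder.
    return template.format(input=raw_input)

print(build_input(SUMMARIZATION_TEMPLATE,
                  "Large-scale pre-trained language models have ..."))
```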
In some embodiments, reformulating the task's input data with the reformulation instruction to obtain fine-tuning training data includes: adding the reformulation instruction at the head of the input sequence corresponding to the input data, thereby reformulating the input data into fine-tuning training data; the fine-tuning training data comprises the input text and output text used for model fine-tuning.
Specifically, the embodiment reformulates the task input data with the reformulation instruction: prepending the instruction to the input data converts the original task into a question-and-answer task. The purpose of the reformulation instruction is to explicitly direct the model toward the predicted task type and the task it needs to accomplish.
In one example, for an emotion classification task, the reformulation instruction may be "Please answer the sentiment of the following sentence: ". Prepending this instruction to the original input yields fine-tuning training data such as: "Please answer the sentiment of the following sentence: This movie is truly wonderful! A: positive, B: negative, C: neutral". At prediction time the model should output the corresponding text, e.g. "C neutral".
Similarly, for a natural language inference task, the reformulation instruction may be "What relationship exists between the following two sentences? ". Prepending it to the original input yields training data such as: "What relationship exists between the following two sentences? Premise: If you need this book, it may already be too late unless you are about to take an SAT or GRE test. Hypothesis: It is never too late unless you are about to take an exam. A: entailment, B: contradiction, C: unrelated". In this case the model should output "C unrelated".
With this scheme, reformulation of the input data is achieved by adding the reformulation instruction at the head of the input, so that the model can understand and adapt to many different task types. The fine-tuning training data comprises the input and output texts used for fine-tuning, which lets the model's parameters be adjusted effectively during training so that it adapts better to a variety of tasks.
It should be noted that, during model training, the more task types are added and the larger the total amount of data, the better the final model performs. Tasks that can be added generally include text classification, natural language inference, reading comprehension, open-domain question answering, translation, summary generation, and so on; a sketch of mixing such tasks into one dataset follows.
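A hedged sketch of constructing such a mixed multi-task fine-tuning set (every template string, task name, and example below is an illustrative assumption, not text from the patent):

```python
import random

# Hypothetical instruction templates, one per task type; wording is illustrative.
TASK_TEMPLATES = {
    "sentiment": "Please answer the sentiment of the following sentence: {input}",
    "nli": "What relationship exists between the following two sentences? {input}",
    "summarization": "Please summarize the following paragraph: {input}",
}

def make_example(task: str, raw_input: str, answer: str) -> dict:
    # Each fine-tuning example pairs a reformulated input text with its answer text.
    return {"input": TASK_TEMPLATES[task].format(input=raw_input),
            "output": answer}

dataset = [
    make_example("sentiment",
                 "This movie is truly wonderful! A: positive, B: negative, C: neutral",
                 "C neutral"),
    make_example("nli",
                 "Premise: ... Hypothesis: ... A: entailment, B: contradiction, C: unrelated",
                 "C unrelated"),
    # ...more task types and examples; per the text above, more variety helps
]
random.shuffle(dataset)  # interleave tasks so each training batch mixes task types
```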
In some embodiments, before reformulating the task's input data with the reformulation instruction, the method further includes: optimizing the task instruction templates to obtain multiple templates with the same meaning but different wordings; evaluating the effect of each template on a validation set; and selecting, according to the evaluation results, the final template for each task to be used for reformulating that task's input data.
Specifically, during training, attention should be paid to the instruction wording for each task: several phrasings with the same meaning but different expression can be tried, and the best-performing wording is chosen for model training according to its effect on the validation set. For example, "Please answer the sentiment of the following sentence" may be replaced by "Please judge the sentiment of the following sentence"; the instruction that performs best on the validation set is then selected for training, as in the sketch after this paragraph.
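A small sketch of that selection step; the toy `validation_set` and the `generate_text` stub standing in for real model inference are assumptions for illustration:

```python
# Hypothetical template selection: score paraphrased templates on a
# validation set and keep the best-performing wording.
candidate_templates = [
    "Please answer the sentiment of the following sentence: {input}",
    "Please judge the sentiment of the following sentence: {input}",
]

validation_set = [("This movie is truly wonderful! A: positive, B: negative, C: neutral",
                   "C neutral")]  # toy data

def generate_text(prompt: str) -> str:
    # Stand-in for real model inference (see the training sketch further below).
    return "C neutral"

def template_accuracy(template: str) -> float:
    correct = sum(int(generate_text(template.format(input=x)).strip() == y)
                  for x, y in validation_set)
    return correct / len(validation_set)

best_template = max(candidate_templates, key=template_accuracy)
```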
Further, all input texts and output texts are aligned so that each input text is matched with a correct output answer. For example, in emotion classification the input may be "Please answer the sentiment of the following sentence: This movie is truly wonderful! A: positive, B: negative, C: neutral", and the correct output answer may be "C neutral".
In some embodiments, calculating the loss function based on the prediction result and the correct answer corresponding to the input text includes: during fine-tuning training of the pre-trained large-scale language model, computing the loss between the model's prediction for the input text and the output text corresponding to that input text in the fine-tuning training data, thereby obtaining the loss function.
Specifically, the input text is first fed into a pre-trained large-scale language generation model, such as GPT-2, GPT-3, BART-Large, or T5-Large. The pre-trained model processes the input text and outputs a prediction result, i.e., the output the model generates from the input text, which represents its understanding of and prediction for the task.
Further, the predicted outcome of the model is compared with the correct answer. The correct answer is the output text preset for each input text in the fine tuning training data. By comparing the predicted result of the model with the correct answer, the difference between them can be calculated, which is the loss.
In one example, embodiments of the present application may use a cross entropy loss function to quantify the difference between model predictions and correct answers. The cross entropy loss function can effectively reflect the prediction performance of the model, and the smaller the value of the cross entropy loss function is, the closer the prediction result of the model is to a correct answer.
Finally, the model's parameters are updated according to the calculated loss function. The purpose of updating the parameters is to adjust the model so that its future predictions come closer to the correct answers. This process repeats until performance reaches a satisfactory level, i.e., training converges; a runnable sketch of such a fine-tuning loop follows.
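As a runnable sketch of this loop using the Hugging Face Transformers library with T5 (one of the backbones named in the text), where the checkpoint name, hyperparameters, and toy example are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One toy instruction-reformulated example; real training mixes many tasks.
examples = [("Please answer the sentiment of the following sentence: "
             "This movie is truly wonderful! A: positive, B: negative, C: neutral",
             "C neutral")]

model.train()
for epoch in range(3):  # in practice: iterate until the loss converges
    for input_text, answer in examples:
        inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
        labels = tokenizer(answer, return_tensors="pt").input_ids
        # The forward pass computes token-level cross entropy against the answer.
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```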
Thus, embodiments of the present application describe a method for improving model performance by comparing model predictions to correct answers, calculating a loss function, and then updating model parameters based on the loss function. The method can effectively utilize the pre-trained large-scale language model, and improve the performance of the model on various tasks.
In some embodiments, the method further comprises: selecting a corresponding loss function according to the task of the pre-trained large-scale language model after fine-tuning training, the loss function being cross entropy loss, CTC loss, or CRF loss.
Specifically, besides feeding the input text into the pre-trained large-scale language generation model and obtaining a prediction result, an appropriate loss function should be selected according to the specific nature of the task, so that model performance is measured more accurately. For example, cross entropy loss (Cross Entropy Loss), CTC loss (Connectionist Temporal Classification Loss), or CRF loss (Conditional Random Field Loss) may be selected. Each of these loss functions has its own characteristics and suits different types of tasks.
In one example, for classification and generation tasks, cross entropy loss may be selected. Cross entropy effectively measures the difference between the model's predicted probability distribution and the true distribution.
For sequence labeling tasks, such as speech recognition or handwriting recognition, embodiments of the present application may select CTC losses. CTC loss can address the problem of mismatch in input and output sequence lengths, enabling such tasks to be handled efficiently.
For complex structured prediction tasks, such as named entity recognition or part-of-speech tagging, CRF loss may be chosen. The CRF loss takes global information about the output sequence into account and can handle the complexity of such tasks.
By selecting a loss function suited to the task, the embodiments can measure model performance more accurately and guide training better, improving performance across tasks while still exploiting the pre-trained large-scale language model; a small sketch of such a selection follows.
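An illustrative mapping from task family to loss function, as described above; note that CrossEntropyLoss and CTCLoss ship with PyTorch, while a CRF loss would need a third-party layer (e.g. the pytorch-crf package), which is an assumption here:

```python
import torch.nn as nn

def pick_loss(task_type: str):
    # Classification and generation: cross entropy over the output vocabulary.
    if task_type in ("classification", "generation"):
        return nn.CrossEntropyLoss()
    # Unaligned sequence labeling (speech, handwriting): CTC loss.
    if task_type in ("speech_recognition", "handwriting_recognition"):
        return nn.CTCLoss()
    # Structured prediction (NER, POS tagging): a CRF layer is needed.
    if task_type in ("ner", "pos_tagging"):
        raise NotImplementedError("use a CRF layer, e.g. from the pytorch-crf package")
    raise ValueError(f"unknown task type: {task_type}")
```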
In some embodiments, the method further includes: when predicting on a zero-shot task with the fine-tuned pre-trained large-scale language model, rewriting the task's service input data according to the preset reformulation instruction corresponding to that zero-shot task, and feeding the rewritten input into the model for prediction.
Specifically, once training is complete, the model can be used for various new zero-shot tasks. Note that before feeding data to the model, an instruction must be given that tells the model what task to accomplish. For example, to have the model perform text summarization, the text to be summarized can be embedded in an instruction and rewritten as: "Please summarize the following paragraph: In recent years, large-scale pre-trained language models (e.g., BERT, GPT, etc.) have become a popular research direction in the field of natural language processing, with significant success in many applications. These models are typically pre-trained with large amounts of unlabeled text data, and then fine-tuned with small amounts of labeled data to suit a particular task." Based on this instruction, the model generates a summary of the input paragraph. The process is exactly the same as during training, except that this time the instruction asks for a summary rather than classification or some other task.
Notably, before submitting a task to the model, it must first be converted into the format seen during training. That is, an appropriate instruction must be provided for each task, explicitly indicating not only the task type (e.g., emotion classification, natural language inference, or text summarization) but also what output the model needs to generate, as in the inference sketch below.
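A sketch of zero-shot inference along these lines, reusing the assumed T5 setup from the training sketch (checkpoint name and generation settings are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")  # stand-in for the fine-tuned model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Prepend the task instruction so the input matches the training format.
instruction = "Please summarize the following paragraph: "
passage = ("In recent years, large-scale pre-trained language models have become "
           "a popular research direction in natural language processing ...")

inputs = tokenizer(instruction + passage, return_tensors="pt", truncation=True)
model.eval()
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```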
The instruction fine-tuning method for large-scale pre-trained language models provided by the embodiments of the present application lets a single pre-trained model handle many different types of tasks without additional task-specific training or fine-tuning. This is particularly valuable where annotated data is scarce or unavailable. Since one general model can handle a variety of tasks, deployment cost drops significantly: no dedicated model needs to be trained and deployed per task, which greatly improves deployment efficiency while reducing computing resources and cost. The method effectively improves performance on new zero-shot tasks and generalizes well, mainly because it uses multi-task training data for instruction fine-tuning rather than large amounts of annotation for specific downstream tasks, which can be very expensive and time-consuming to obtain. The training method is simple to operate, the training data is easy to construct, and training is easy to conduct in practice. Task types and quantities can keep being added to the dataset, iteratively improving model accuracy and showing high scalability.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
FIG. 2 is a schematic structural diagram of a large-scale pre-trained language model fine-tuning apparatus according to an embodiment of the present application. As shown in FIG. 2, the apparatus includes:
an acquisition module 201 configured to acquire a pre-trained large-scale language model and take it as a reference model;
a reformulation module 202 configured to reformulate the input data of the corresponding task with a preset task instruction template to obtain an input text and an output text for fine-tuning the pre-trained large-scale language model, and to take the output text as the correct answer corresponding to the input text;
a prediction module 203 configured to input the input text into the pre-trained large-scale language model for prediction, obtaining the prediction result output by the model;
and an updating module 204 configured to calculate a loss function based on the prediction result and the correct answer corresponding to the input text, and to update the parameters of the pre-trained large-scale language model with the loss function until the model converges.
In some embodiments, the reformulation module 202 of FIG. 2 determines the task instruction template corresponding to a task, extracts the task's reformulation instruction from the template, and reformulates the task's input data with that instruction to obtain fine-tuning training data; the reformulation instruction characterizes the task type the model is to predict and the task the model is to complete.
In some embodiments, the reformulation module 202 of FIG. 2 adds the reformulation instruction at the head of the input sequence corresponding to the input data, reformulating the input data into fine-tuning training data; the fine-tuning training data comprises the input text and output text used for model fine-tuning.
In some embodiments, before reformulating the task's input data with the reformulation instruction, the reformulation module 202 of FIG. 2 optimizes the task instruction templates to obtain multiple templates with the same meaning but different wordings, evaluates the effect of each template on a validation set, and selects, according to the evaluation results, the final template for each task to be used for reformulating that task's input data.
In some embodiments, during fine-tuning training of the pre-trained large-scale language model, the updating module 204 of FIG. 2 computes the loss between the model's prediction for the input text and the output text corresponding to that input text in the fine-tuning training data, obtaining the loss function.
In some embodiments, the updating module 204 of FIG. 2 selects a corresponding loss function according to the task of the fine-tuned pre-trained large-scale language model, the loss function being cross entropy loss, CTC loss, or CRF loss.
In some embodiments, when predicting on a zero-shot task with the fine-tuned pre-trained large-scale language model, the rewrite module 205 of FIG. 2 rewrites the task's service input data according to the preset reformulation instruction corresponding to that zero-shot task and feeds the rewritten input into the model for prediction.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 3 is a schematic structural diagram of the electronic device 3 provided in the embodiment of the present application. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: a processor 301, a memory 302 and a computer program 303 stored in the memory 302 and executable on the processor 301. The steps of the various method embodiments described above are implemented when the processor 301 executes the computer program 303. Alternatively, the processor 301, when executing the computer program 303, performs the functions of the modules/units in the above-described apparatus embodiments.
Illustratively, the computer program 303 may be partitioned into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to complete the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 303 in the electronic device 3.
The electronic device 3 may be an electronic device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The electronic device 3 may include, but is not limited to, a processor 301 and a memory 302. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may also include an input-output device, a network access device, a bus, etc.
The processor 301 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 302 may be an internal storage unit of the electronic device 3, for example, a hard disk or a memory of the electronic device 3. The memory 302 may also be an external storage device of the electronic device 3, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 3. Further, the memory 302 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 302 is used to store computer programs and other programs and data required by the electronic device. The memory 302 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The descriptions of the foregoing embodiments each have their own emphasis; for parts not detailed or described in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions of actual implementations, multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated modules/units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the respective method embodiments described above. The computer program may comprise computer program code in source form, object form, an executable file, some intermediate form, and so on. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer readable media exclude electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (7)

1. A method for fine-tuning a large-scale pre-trained language model, comprising:
acquiring a pre-trained large-scale language model, and taking the pre-trained large-scale language model as a reference model;
reformulating input data of a corresponding task with a preset task instruction template to obtain an input text and an output text for fine-tuning the pre-trained large-scale language model, and taking the output text as a correct answer corresponding to the input text;
inputting the input text into the pre-trained large-scale language model for prediction, and obtaining a prediction result output by the pre-trained large-scale language model;
calculating a loss function based on the prediction result and the correct answer corresponding to the input text, and updating parameters of the pre-trained large-scale language model with the loss function until the pre-trained large-scale language model converges;
wherein reformulating the input data of the corresponding task with the preset task instruction template comprises:
determining a task instruction template corresponding to a task, extracting a reformulation instruction of the task from the task instruction template, and reformulating the input data of the task with the reformulation instruction to obtain fine-tuning training data; the reformulation instruction characterizes the task type the model is to predict and the task the model is to complete;
wherein reformulating the input data of the task with the reformulation instruction to obtain the fine-tuning training data comprises:
adding the reformulation instruction at a head position of an input sequence corresponding to the input data so as to reformulate the input data into the fine-tuning training data; the fine-tuning training data comprises the input text and the output text for model fine-tuning;
and wherein, before reformulating the input data of the task with the reformulation instruction, the method further comprises:
optimizing the task instruction templates so as to obtain multiple task instruction templates with the same meaning and different wordings; and evaluating an effect of each task instruction template with a validation set, and selecting, according to evaluation results, a final task instruction template corresponding to each task, the final task instruction template being used for reformulating the input data of the task.
2. The method of claim 1, wherein calculating the loss function based on the prediction result and the correct answer corresponding to the input text comprises:
in a process of performing fine-tuning training on the pre-trained large-scale language model, computing a loss between the prediction result output by the pre-trained large-scale language model for the input text and the output text corresponding to the input text in the fine-tuning training data, to obtain the loss function.
3. The method according to claim 2, wherein the method further comprises:
selecting a corresponding loss function according to the task corresponding to the pre-trained large-scale language model after the fine-tuning training, wherein the loss function adopts cross entropy loss, CTC loss, or CRF loss.
4. The method according to claim 1, wherein the method further comprises:
when a zero-shot task is predicted with the fine-tuned pre-trained large-scale language model, rewriting service input data of the zero-shot task according to a preset reformulation instruction corresponding to the zero-shot task, and taking the rewritten service input data as model input for prediction.
5. An apparatus for fine-tuning a large-scale pre-trained language model, comprising:
an acquisition module configured to acquire a pre-trained large-scale language model and take the pre-trained large-scale language model as a reference model;
a reformulation module configured to reformulate input data of a corresponding task with a preset task instruction template to obtain an input text and an output text for fine-tuning the pre-trained large-scale language model, and to take the output text as a correct answer corresponding to the input text;
a prediction module configured to input the input text into the pre-trained large-scale language model for prediction, to obtain a prediction result output by the pre-trained large-scale language model;
and an updating module configured to calculate a loss function based on the prediction result and the correct answer corresponding to the input text, and to update parameters of the pre-trained large-scale language model with the loss function until the pre-trained large-scale language model converges;
wherein the reformulation module is further configured to determine a task instruction template corresponding to a task, extract a reformulation instruction of the task from the task instruction template, and reformulate the input data of the task with the reformulation instruction to obtain fine-tuning training data; the reformulation instruction characterizes the task type the model is to predict and the task the model is to complete;
the reformulation module is further configured to add the reformulation instruction at a head position of an input sequence corresponding to the input data so as to reformulate the input data into the fine-tuning training data; the fine-tuning training data comprises the input text and the output text for model fine-tuning;
and the reformulation module is further configured to, before reformulating the input data of the task with the reformulation instruction, optimize the task instruction templates to obtain multiple task instruction templates with the same meaning and different wordings, evaluate an effect of each task instruction template with a validation set, and select, according to evaluation results, a final task instruction template corresponding to each task, the final task instruction template being used for reformulating the input data of the task.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.
CN202310913374.1A 2023-07-25 2023-07-25 Large-scale pre-training language model fine tuning method and device, electronic equipment and medium Active CN116629235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310913374.1A CN116629235B (en) 2023-07-25 2023-07-25 Large-scale pre-training language model fine tuning method and device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN116629235A (en) 2023-08-22
CN116629235B (en) 2024-01-05

Family

ID=87597660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310913374.1A Active CN116629235B (en) 2023-07-25 2023-07-25 Large-scale pre-training language model fine tuning method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN116629235B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994694B (en) * 2023-09-27 2024-01-09 之江实验室 Patient medical record data screening method, device and medium based on information extraction
CN117033667B (en) * 2023-10-07 2024-01-09 之江实验室 Knowledge graph construction method and device, storage medium and electronic equipment
CN117332247B (en) * 2023-12-01 2024-02-23 苏州大学 Big data transaction and quality assessment method and system using big language model as medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062617A (en) * 2022-07-01 2022-09-16 中国工商银行股份有限公司 Task processing method, device, equipment and medium based on prompt learning
CN115391527A (en) * 2022-08-23 2022-11-25 中国电信股份有限公司 Intention recognition method based on prompt learning, question answering method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001190A (en) * 2020-07-20 2020-11-27 北京百度网讯科技有限公司 Training method, device and equipment of natural language processing model and storage medium


Also Published As

Publication number Publication date
CN116629235A (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN116629235B (en) Large-scale pre-training language model fine tuning method and device, electronic equipment and medium
CN110287961A (en) Chinese word cutting method, electronic device and readable storage medium storing program for executing
CN111708869B (en) Processing method and device for man-machine conversation
CN112287089B (en) Classification model training and automatic question-answering method and device for automatic question-answering system
WO2023197613A1 (en) Small sample fine-turning method and system and related apparatus
Pramanik et al. Text normalization using memory augmented neural networks
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN112084752B (en) Sentence marking method, device, equipment and storage medium based on natural language
CN115391527A (en) Intention recognition method based on prompt learning, question answering method and device
CN113434683A (en) Text classification method, device, medium and electronic equipment
WO2023042045A1 (en) Convolution attention network for multi-label clinical document classification
WO2021001517A1 (en) Question answering systems
KR102608867B1 (en) Method for industry text increment, apparatus thereof, and computer program stored in medium
CN116956835B (en) Document generation method based on pre-training language model
CN111026908B (en) Song label determining method, device, computer equipment and storage medium
CN112287667A (en) Text generation method and equipment
CN111951785B (en) Voice recognition method and device and terminal equipment
CN114490922A (en) Natural language understanding model training method and device
CN117193823A (en) Code workload assessment method, system and equipment for software demand change
CN116738956A (en) Prompt template generation method and device, computer equipment and storage medium
CN113010687B (en) Exercise label prediction method and device, storage medium and computer equipment
CN115358231A (en) Natural language pre-training model training method, device, equipment and storage medium
KR20220073644A (en) Question answering system by using constraints and information provision method thereof
CN115512374A (en) Deep learning feature extraction and classification method and device for table text
CN116029261A (en) Chinese text grammar error correction method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant