CN115018043A - Model training method and device, computer readable storage medium and computer equipment

Info

Publication number: CN115018043A
Application number: CN202210461145.6A
Authority: CN (China)
Prior art keywords: model, training, target, prefix, parameters
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 张珍茹, 徐海洋, 谭传奇, 黄松芳
Current assignee: Alibaba China Co Ltd
Original assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd, priority to CN202210461145.6A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods


Abstract

The invention discloses a model training method and device, a computer-readable storage medium, and a computer device. The method comprises the following steps: acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, the prefix parameters being hyper-parameters of the initial task model used to adapt the initial task model to different tasks; training the initial task model based on training sample data to obtain target weights corresponding to the prefix parameters in the initial task model, wherein the training sample data correspond to a target task; and determining a target task model based on the target weights and the initial task model. The method solves the technical problem that a pre-training model with an added prefix is over-fitted or under-fitted because the prefix is too long or too short.

Description

Model training method and device, computer readable storage medium and computer equipment
Technical Field
The invention relates to the field of artificial intelligence, in particular to a model training method, a model training device, a computer readable storage medium and computer equipment.
Background
In order to apply a pre-trained model to downstream tasks, conventional fine-tuning methods typically adjust all parameters of the model, so a separate fully fine-tuned model usually has to be saved for each downstream task. With the popularity of large-scale pre-training models and the diversity of downstream tasks, both the difficulty of fine-tuning and the difficulty of storing the resulting parameters increase. Prefix-tuning, a form of lightweight fine-tuning, generally freezes the pre-training model and trains only a small number of extra parameters, the prefix, to adapt to a downstream task; this not only avoids forgetting the prior knowledge of the pre-training model but also improves the time efficiency of training and the space efficiency of parameter storage. However, because the prefix parameters are set manually and their number is fixed, fine-tuning a model with such prefixes cannot fully adapt it to different tasks.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a model training method, a model training device, a computer readable storage medium and computer equipment, which are used for at least solving the technical problem that a pre-training model added with a prefix is over-fitted or under-fitted due to the fact that the prefix is too long or too short.
According to an aspect of an embodiment of the present invention, there is provided a model training method, including: acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to adapt to different tasks; training the initial task model based on training sample data to obtain target weight corresponding to the prefix parameter in the initial task model, wherein the training sample data corresponds to a target task; and determining a target task model based on the target weight and the initial task model.
Optionally, the training the initial task model based on training sample data to obtain a target weight corresponding to the prefix parameter in the initial task model includes: determining an undetermined weight corresponding to the prefix parameter based on the pre-training model; and performing optimization training on the undetermined weight and the prefix parameter of the initial task model based on the training sample data to obtain a target prefix parameter and a target weight corresponding to the target prefix parameter.
Optionally, the determining a pending weight corresponding to the prefix parameter based on the pre-training model includes: in the case where the pre-trained model includes multiple network layers, the pending weights include: intra-layer undetermined weights; determining the intra-layer undetermined weight corresponding to any one of the prefix parameters, including: determining a target network layer corresponding to any one prefix parameter, wherein the target network layer is one of a plurality of network layers included in the pre-training model; obtaining model parameters of a network layer which is positioned above the target network layer in the pre-training model; determining the intra-layer pending weights based on the model parameters of the previous network layer.
Optionally, the determining the intra-layer pending weight based on the model parameter of the previous network layer includes: acquiring the learning characteristic of the last network layer according to the model parameter of the last network layer; and constructing the in-layer undetermined weight according to the learning characteristics.
Optionally, the constructing the intra-layer undetermined weight according to the learning feature includes: generating a undetermined factor; and converting the undetermined factor and the learning characteristic into a probability value interval through a logistic function to obtain the undetermined weight in the layer.
Optionally, the determining, based on the pre-training model, a pending weight corresponding to the prefix parameter includes: in the case where the pre-trained model includes multiple network layers, the pending weights include: inter-layer undetermined weights; determining the inter-layer undetermined weight corresponding to any one of the prefix parameters, including: determining an inter-layer undetermined weight corresponding to any one of the plurality of network layers; determining a target network layer corresponding to any one of the prefix parameters, wherein the target network layer is one of the network layers; and determining the interlayer undetermined weight corresponding to the target network layer as the interlayer undetermined weight corresponding to the any prefix parameter.
Optionally, the method further includes: and sending the target weight and the prefix parameter to a first target device storing the pre-training model, wherein the first target device is used for generating the target task model according to the target weight, the prefix parameter and the pre-training model.
Optionally, the pre-training model comprises any one of: pre-training a language model and pre-training an image recognition model.
According to another aspect of the embodiments of the present invention, there is also provided a model training method, including: transmitting training sample data to second target equipment, wherein the training sample data corresponds to a target task; receiving target weight and prefix parameters returned by second target equipment, wherein the prefix parameters are a group of hyper-parameters of an initial task model and are used for enabling the initial task model to adapt to different tasks, and the target weight is weight which is obtained by training the initial task model based on training sample data and corresponds to the prefix parameters; and determining a target task model based on the target weight, the prefix parameters and a pre-training model prestored in the local area, wherein the pre-training model is the same as the pre-training model in the initial task model.
According to another aspect of the embodiments of the present invention, there is also provided a model training apparatus, including: the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an initial task model, the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to be adapted to different tasks; the training module is used for training the initial task model based on training sample data to obtain target weight corresponding to the prefix parameter in the initial task model, wherein the training sample data corresponds to a target task; and the first determining module is used for determining a target task model based on the target weight and the initial task model.
According to another aspect of the embodiments of the present invention, there is also provided a model training apparatus, including: the sending module is used for sending training sample data to second target equipment, wherein the training sample data corresponds to a target task; the receiving module is used for receiving target weight and prefix parameters returned by second target equipment, wherein the prefix parameters are a group of hyper-parameters of an initial task model and are used for enabling the initial task model to adapt to different tasks, and the target weight is the weight which is obtained by training the initial task model based on training sample data and corresponds to the prefix parameters; and the second determination module is used for determining a target task model based on the target weight, the prefix parameters and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the above-mentioned model training methods.
According to another aspect of the embodiments of the present invention, there is also provided a computer device, including: a memory and a processor, the memory storing a computer program; the processor is configured to execute the computer program stored in the memory, and when the computer program runs, the processor is enabled to execute any one of the model training methods described above.
In the embodiment of the invention, weights are matched to the prefix parameters in the initial task model: through model training, different prefix parameters are matched with different target weight values, and the influence of each prefix parameter on the model is scaled appropriately, achieving the purpose of dynamically adjusting the effective length of the prefix parameters of the pre-training model. This achieves the technical effect of improving the performance of the initial task model when it processes a specific task, and solves the technical problem that a pre-training model with an added prefix is over-fitted or under-fitted because the prefix is too long or too short.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 shows a block diagram of a hardware architecture of a computer terminal for implementing a model training method;
FIG. 2 is a flow chart of a first method of model training according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a dynamic prefix fine-tuning model provided in accordance with an alternative embodiment of the present invention;
FIG. 4 is a flowchart of a second method of model training according to an embodiment of the present invention;
FIG. 5 is a block diagram of a first model training apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of a second model training apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
Pre-training model: an initial model trained on a large data set. When a pre-training model is applied to a target task, it needs to be fine-tuned according to the characteristics of the target task so that the adjusted model adapts to that task.
Lightweight fine-tuning: a method that adjusts a model using only a small number of parameters while the parameters of the pre-training model are kept frozen, so that the adjusted model adapts to a specific target task and achieves a better effect.
Prefix-tuning: a lightweight fine-tuning method that adds a continuous task-specific vector sequence, called a prefix, to the input. Unlike prompt-tuning, the prefix consists entirely of free parameters and does not correspond to real tokens; it behaves like a sequence of virtual tokens, and only the prefix is optimized when the model is optimized. By placing a number of prefixes in front of the input text data or in front of each layer of the pre-training model, the prior knowledge of the pre-training model can be better exploited, so that the model performs well in low-resource scenarios or with only a small number of training parameters.
Prefix parameters: the prefixes in the prefix fine-tuning method, consisting of free parameters. When the model is optimized, prefix fine-tuning optimizes only the prefix parameters and freezes the model parameters of the pre-training model.
Example 1
There is also provided, in accordance with an embodiment of the present invention, a method embodiment of model training. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer-executable instructions, and that, while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one here.
The method provided by embodiment 1 of the present application can be executed on a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal for implementing the model training method. As shown in fig. 1, the computer terminal 10 may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the BUS), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration from that shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10. As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the model training method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implementing the model training method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with the user interface of the computer terminal 10.
Under the above operating environment, the present application provides a first model training method as shown in fig. 2. FIG. 2 is a flowchart of a first model training method according to an embodiment of the present invention. As shown in fig. 2, the method comprises the steps of:
step S202, an initial task model is obtained, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to adapt to different tasks.
In this embodiment, the pre-training model is an initial model trained with a larger data set, and in the subsequent fine tuning for the target task, the parameters of the pre-training model are not adjusted any more, and the pre-training model is "frozen". In this step, an initial task model formed by a set of Prefix parameters and a pre-training model is a Prefix fine tuning model (Prefix-tuning), and after the Prefix parameters are fine tuned, the model obtained by combining the Prefix parameters with the pre-training model can be adapted to a downstream task.
It should be noted that the prefix parameters serve as hyper-parameters of the initial task model: they are set directly by a worker as inputs and their number is fixed, whereas the model parameters in the pre-training model are obtained by training on the large data set. In addition, the values of the prefix parameters can also be obtained through training optimization; for example, after the pre-training model is frozen and its model parameters are kept unchanged, the prefix parameters are trained using a small-scale sample set to obtain their specific values.
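By way of illustration only (this sketch is not part of the disclosed embodiments), one possible way to assemble such an initial task model in Python/PyTorch is shown below; the class name InitialTaskModel, the tensor dimensions, and the initialization scheme are assumptions introduced for the example.

```python
import torch
import torch.nn as nn

class InitialTaskModel(nn.Module):
    """Illustrative sketch: a frozen pre-training model plus one block of
    trainable prefix parameters per network layer (the hyper-parameters that
    adapt the model to different tasks)."""

    def __init__(self, pretrained: nn.Module, num_layers: int,
                 prefix_len: int = 16, hidden_size: int = 768):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():
            p.requires_grad = False  # the pre-training model is "frozen"
        # free parameters that do not correspond to real tokens (virtual tokens)
        self.prefixes = nn.ParameterList([
            nn.Parameter(0.02 * torch.randn(prefix_len, hidden_size))
            for _ in range(num_layers)
        ])

    def trainable_parameters(self):
        # during prefix fine-tuning only the prefix parameters are optimised
        return list(self.prefixes)
```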
Step S204, training the initial task model based on training sample data to obtain target weight corresponding to the prefix parameter in the initial task model, wherein the training sample data corresponds to the target task.
The prefix parameters are a fixed number of hyper-parameters set by a worker, while different pre-training models have different learning capabilities and require different amounts of parameters; if the number of prefix parameters is too large or too small, the whole model may be over-fitted or under-fitted and the effect of processing the target task is poor. In the prior art, however, the number of prefix parameters cannot be changed, so this technical problem cannot be solved. In this step, the prefix parameters are matched with target weights. If part of the prefix parameters are given lower target weights, those prefix parameters are suppressed and their influence on the execution of the downstream target task is reduced; this is equivalent to reducing the overall length of the prefix corresponding to the pre-training model and adjusting the number of effective prefix parameters, so that the adjusted target task model is better suited to executing the target task. Conversely, other prefix parameters may be amplified by their target weights.
Step S206, based on the target weight and the initial task model, a target task model is determined. It should be noted that the target task model and the initial task model both include pre-training models, and parameters of the pre-training models are kept consistent, and the target task model adds target weight to prefix parameters in the models, so that the initial task model is further optimized in a direction adapted to execute the target task, and the model performance for processing the target task is improved.
Through the steps, the mode of matching weights for the prefix parameters in the initial task model is adopted, different prefix parameters are matched with different target weight values through model training, the influence of each prefix parameter on the model is properly scaled, and the purpose of dynamically adjusting the length of the prefix parameters of the pre-trained model is achieved, so that the technical effect of improving the initial task model to improve the performance of the initial task model when the initial task model processes a specific task is achieved, and the technical problem that the model is over-fitted or under-fitted due to the fact that the prefix is too long or too short in the pre-trained model with the prefix is solved.
As an alternative embodiment, in order to obtain the value of the target weight, the value may be obtained by continuously optimizing in a model training manner. Firstly, the undetermined weight corresponding to the prefix parameter can be determined based on a pre-training model; and then, optimizing and training the undetermined weight and the prefix parameter of the initial task model based on training sample data to obtain a target prefix parameter and a target weight corresponding to the target prefix parameter.
The undetermined weight determined based on the pre-training model can be an initial value of the weight, and it can be initialized according to the structure of the pre-training model, its model parameters, and some features the pre-training model has already learned. In this optional embodiment, the process of optimizing the weights of the prefix parameters and the process of optimizing the prefix parameters themselves may be performed together during machine-learning-based training, or only the weights may be optimized, in which case the undetermined weights of the initial task model are optimized based on the training sample data to obtain the target weights. It should be noted that the values of the target weights do not need to be normalized. If, through training optimization, the target weights of some prefix parameters are close to 0, the influence of those prefix parameters on the model is effectively "eliminated"; although the prefix parameters are not deleted, this in effect reduces the number of prefix parameters, so that the target task model can achieve better performance.
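A minimal sketch of this optional training step is given below, reusing the InitialTaskModel sketch above; the data loader, the compute_task_loss callable, and the pending_weights module are hypothetical names introduced only for illustration and are not defined by the disclosure.

```python
import torch

def optimise_prefix_and_weights(model, pending_weights, train_loader,
                                compute_task_loss, epochs: int = 3, lr: float = 1e-4):
    """Sketch: the frozen pre-training model is left untouched; only the prefix
    parameters and their undetermined (pending) weights are optimised on the
    task-specific training sample data, yielding the target prefix parameters
    and the corresponding target weights."""
    params = list(model.trainable_parameters()) + list(pending_weights.parameters())
    optimiser = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for inputs, labels in train_loader:
            loss = compute_task_loss(model, pending_weights, inputs, labels)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return model, pending_weights  # the pending weights now hold the target weights
```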
As an alternative embodiment, determining the pending weights corresponding to the prefix parameters based on the pre-training model may include the following steps: in the case where the pre-trained model includes multiple network layers, the pending weights include: intra-layer undetermined weights; determining the intra-layer undetermined weight corresponding to any one of the prefix parameters, including: determining a target network layer corresponding to any prefix parameter, wherein the target network layer is one of a plurality of network layers included in a pre-training model; obtaining model parameters of a network layer which is positioned above a target network layer in a pre-training model; and determining the undetermined weight in the layer based on the model parameter of the last network layer.
Optionally, the pre-training model may be a deep neural network model including a plurality of network layers, each layer of the network layers corresponds to some prefix parameters, and the intra-layer undetermined weight is an intra-layer weight allocated to a plurality of prefix parameters corresponding to each network layer. By distributing the in-layer undetermined weight to the prefix parameter of each network layer and optimizing the in-layer undetermined weight into the in-layer target weight in a model training mode, the purpose of endowing different network layers with different numbers of prefix parameters can be achieved, and the effect of dynamically adjusting the learning capability of the prefix is achieved. For the pre-training model, knowledge and learning ability contained in each network layer are different, and prefix lengths required by different tasks are also different, so that the mode of giving the same number of prefix parameters to each network layer in the prior art cannot be completely adapted to different network layers and different tasks of the model. By adding the in-layer undetermined weights to the prefix parameters of different network layers and training the in-layer target weights based on the in-layer undetermined weights, part of the prefix parameters corresponding to each network layer can be inhibited or amplified, which is equivalent to reducing or increasing the number of the prefix parameters corresponding to the network layer, so that the dynamic adjustment of the number arrangement of the prefix parameters of the whole lightweight fine-tuning model is realized, and the model can be better adapted to a target task.
As an alternative embodiment, determining the intra-layer pending weights based on the model parameters of the previous network layer may include the following steps: acquiring the learning characteristics of the previous network layer according to the model parameters of the previous network layer; and according to the learning characteristics, constructing the in-layer undetermined weight.
By acquiring the model parameters of the previous network layer of the pre-training model and then specifying the intra-layer undetermined weight of the current network layer according to those model parameters, the initial construction of the parameters of the next layer can be assisted by the information, knowledge, and training results of the previous layer. Optionally, the learning feature of the previous network layer may include a classification feature (CLS), or other tokens of the previous layer may be used to assist in building the intra-layer undetermined weight.
As an alternative embodiment, constructing the intra-layer undetermined weight according to the learning features may include generating an undetermined factor, and converting the undetermined factor and the learning features into a probability value through a logistic function to obtain the intra-layer undetermined weight. The undetermined factor represents the parameter within the intra-layer undetermined weight that must be learned automatically during model training. The undetermined factor and the learning features of the previous network layer are converted by the logistic function into a probability value in the interval [0,1], and this probability value is taken as the value of the intra-layer undetermined weight. The intra-layer undetermined weights and the initial task model can then be combined into a model to be trained; training with the training sample data yields the value of the undetermined factor, from which the target value of the intra-layer undetermined weight corresponding to each prefix parameter can be determined.
Optionally, the intra-layer undetermined weight may be constructed according to the following formula:
α_i = sigmoid(w · [CLS]_{i-1})

where α_i denotes the intra-layer undetermined weight corresponding to the target network layer, i denotes the layer number of the target network layer, w denotes the undetermined factor, [CLS]_{i-1} denotes the classification feature learned by the previous (i-1-th) network layer, and sigmoid denotes the logistic function.
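The sketch below shows one possible reading of this formula, assuming the undetermined factor w maps the [CLS] feature of the previous layer to one weight per prefix parameter of the current layer; the class name and tensor shapes are illustrative assumptions rather than part of the disclosure.

```python
import torch
import torch.nn as nn

class IntraLayerWeight(nn.Module):
    """Sketch of α_i = sigmoid(w · [CLS]_{i-1}): the intra-layer undetermined
    weight of layer i is computed from the classification feature of layer i-1
    through a learnable undetermined factor w and the logistic (sigmoid) function."""

    def __init__(self, hidden_size: int, prefix_len: int):
        super().__init__()
        # undetermined factor w, learned automatically during model training
        self.w = nn.Parameter(torch.zeros(prefix_len, hidden_size))

    def forward(self, cls_prev: torch.Tensor) -> torch.Tensor:
        # cls_prev: [CLS] feature of the previous network layer, shape (hidden_size,)
        # returns one weight in (0, 1) per prefix parameter of the current layer
        return torch.sigmoid(self.w @ cls_prev)
```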
As an optional embodiment, the pending weight corresponding to the prefix parameter may also be determined in the following manner: under the condition that the pre-training model comprises a plurality of network layers, the undetermined weight comprises an inter-layer undetermined weight; determining the inter-layer undetermined weight corresponding to any one of the prefix parameters, including: determining an inter-layer undetermined weight corresponding to any one of a plurality of network layers; determining a target network layer corresponding to any one of the prefix parameters, wherein the target network layer is one of a plurality of network layers; and determining the inter-layer undetermined weight corresponding to the target network layer as the inter-layer undetermined weight corresponding to any prefix parameter.
Relative to the intra-layer undetermined weight, the inter-layer undetermined weight can be regarded as a coarse-grained weight coefficient. Each network layer of the pre-training model can correspond to one inter-layer undetermined weight, and every prefix parameter corresponding to that network layer is suppressed or amplified by the same inter-layer undetermined weight; this coefficient adjusts the weight ratio between all prefix parameters of the network layer and the model parameters of the pre-training model in that layer. The inter-layer undetermined weight can likewise be learned automatically through model training, that is, a target value corresponding to the inter-layer undetermined weight is obtained through learning, and that target value is the target weight.
As an optional embodiment, in the case that the pre-training model includes a plurality of network layers, the undetermined weight may include two parts, namely an inter-layer undetermined weight and an intra-layer undetermined weight, and the corresponding target weight may also include two parts, namely an inter-layer target weight and an intra-layer target weight. Through model training and optimization, the inter-layer undetermined weight is optimized into the inter-layer target weight, and the intra-layer undetermined weight is optimized into the intra-layer target weight. The inter-layer target weight may be called a coarse-grained parameter and the intra-layer target weight a fine-grained parameter. For the prefix parameters of the initial task model, the original prefix parameters are multiplied by the fine-grained parameter and the coarse-grained parameter respectively to obtain the target prefix parameters in the target task model. For example, the parameters of the i-th layer of the target task model may be expressed as follows:

h_i' ← [h_i ; λ_i · α_i · Δh_i]

where h_i denotes the sum of the model parameters of the i-th layer of the pre-training model in the initial task model and the prefix parameters corresponding to the i-th layer, Δh_i denotes the prefix parameters corresponding to the i-th layer of the pre-training model, λ_i denotes the inter-layer target weight (coarse-grained parameter) corresponding to the i-th layer, α_i denotes the intra-layer target weight (fine-grained parameter) corresponding to the i-th layer, and h_i' denotes the parameters of the i-th layer of the target task model after the weights of the prefix parameters have been introduced.
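As an illustration of this layer update (a sketch under assumptions, not a definitive implementation), the code below scales the prefix parameters by the coarse-grained and fine-grained weights and concatenates the result with the part contributed by the initial task model, following the bracket order of the formula; the function name and tensor shapes are assumptions made for the example.

```python
import torch

def layer_parameters_with_weights(h_i: torch.Tensor, delta_h_i: torch.Tensor,
                                  lambda_i: torch.Tensor, alpha_i: torch.Tensor) -> torch.Tensor:
    """Sketch of h_i' ← [h_i ; λ_i · α_i · Δh_i].

    h_i       -- layer-i part contributed by the initial task model, shape (seq_len, hidden_size)
    delta_h_i -- prefix parameters of layer i, shape (prefix_len, hidden_size)
    lambda_i  -- inter-layer target weight (coarse-grained parameter), scalar tensor
    alpha_i   -- intra-layer target weights (fine-grained parameters), shape (prefix_len,)
    """
    # a weight close to 0 suppresses its prefix vector, effectively shortening the
    # prefix of this layer; a larger weight amplifies it
    scaled_prefix = lambda_i * alpha_i.unsqueeze(-1) * delta_h_i
    return torch.cat([h_i, scaled_prefix], dim=0)
```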
As an optional embodiment, after the target task model is obtained, the target weight and the prefix parameter may also be sent to a first target device in which a pre-training model is stored, where the first target device is configured to generate the target task model according to the target weight, the prefix parameter, and the pre-training model. In this embodiment, a pre-training model may be stored in advance in the first target device for executing the downstream task, and the target task model may be generated locally only by downloading the target weight and the prefix parameter from the server, so that the transmission pressure of the model data and the storage pressure of the model data at the first target device end are greatly reduced by transmitting and storing the very lightweight prefix parameter data and the target weight data.
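A minimal sketch of this deployment step is shown below, assuming the components from the earlier sketches; torch.save/torch.load are only one possible serialization, and all field and function names are illustrative.

```python
import torch

def export_prefix_artifact(model, inter_layer_weights, intra_layer_weights, path: str):
    """Sketch: only the lightweight prefix parameters and target weights are sent to
    the first target device, which already stores the frozen pre-training model."""
    torch.save({
        "prefix_parameters": [p.detach().cpu() for p in model.prefixes],
        "inter_layer_weights": [w.detach().cpu() for w in inter_layer_weights],
        "intra_layer_weights": [a.detach().cpu() for a in intra_layer_weights],
    }, path)

def load_prefix_artifact(path: str) -> dict:
    # on the first target device: combine the received data with the locally stored
    # pre-training model (e.g. via layer_parameters_with_weights above)
    return torch.load(path)
```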
As an alternative embodiment, the pre-training model comprises any one of: a pre-training language model and a pre-training image recognition model. The pre-training language model may be used with a prefix-tuning model, which adds trainable continuous tokens, called a prefix, to the input and to the beginning of each layer of the pre-training model; during training the model parameters of the pre-training model are fixed and only the prefix parameters are updated. The method can also be applied to a Transformer-based language model to improve parameter utilization, reduce parameter redundancy, and reduce overfitting, which helps a pre-training model trained on a public data set to perform better on downstream tasks. For example, when the pre-training model is a language model and the target task is an inference task, a prefix-tuning model that infers the association relationship between a question and an answer achieves an accuracy of 66%; by adding target weights to the prefix parameters, the accuracy of the inference task can be increased to 68%, which is a clear improvement in the model's performance on a specific downstream task.
Fig. 3 is a schematic structural diagram of a dynamic prefix fine-tuning model according to an alternative embodiment of the present invention. As shown in fig. 3, Prefix-tuning denotes a prefix fine-tuning model, which is composed of a fixed number of prefix parameters of fixed length and a frozen pre-training model. Dynamic Prefix-tuning denotes a dynamic prefix fine-tuning model, which is composed of prefix parameters, the target weights corresponding to the prefix parameters, and a frozen pre-training model. The pre-training model in fig. 3 includes a plurality of network layers, namely layer 1 to layer N in the figure, and each network layer corresponds to the model parameters of the pre-training model and a layer of prefix parameters. In the legend, the fine-grained parameter is the intra-layer target weight or the intra-layer undetermined weight, and the coarse-grained parameter is the inter-layer target weight or the inter-layer undetermined weight, depending on whether the dynamic prefix fine-tuning model has been trained. As can be seen from the figure, each coarse-grained parameter covers all the prefix parameters in its layer, that is, all the prefix parameters in the layer correspond to the same coarse-grained parameter, which adjusts the weight ratio between the prefix parameters of the layer as a whole and the model parameters of the layer; the fine-grained parameters give a weight value to each prefix parameter in the layer and can suppress or amplify individual prefix parameters, thereby dynamically adjusting the number of effective prefix parameters across different layers, so that the resulting dynamic prefix fine-tuning model is better adapted to the target task. The dynamic prefix fine-tuning model in fig. 3 is a specific embodiment of the target task model in the above method.
Example 2
The application also provides a second model training method as shown in fig. 4, and fig. 4 is a flowchart of the second model training method according to the embodiment of the invention. As shown in fig. 4, the method includes the steps of:
step S402, transmitting training sample data to a second target device, wherein the training sample data corresponds to the target task.
Step S404, receiving a target weight and a prefix parameter returned by the second target device, wherein the prefix parameter is a group of hyper-parameters of the initial task model and is used for enabling the initial task model to adapt to different tasks, and the target weight is a weight which is obtained by training the initial task model based on training sample data and corresponds to the prefix parameter.
Step S406, determining a target task model based on the target weight, the prefix parameter and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
Through the above steps, a method applicable to a terminal device is provided: through data interaction between the terminal device and the second target device, the target task model is obtained through rapid training and generated on the terminal device. The second target device in this embodiment may be a server, and the terminal device is used to execute a downstream target task and therefore needs a target task model capable of executing that task. The terminal device sends training sample data corresponding to the target task to the second target device. The second target device trains an initial task model whose model type is a prefix fine-tuning model and adds target weights to the prefix parameters in the model to obtain a target task model adapted to the target task. The second target device then needs to return only the prefix parameters and the target weights to the terminal device; both kinds of data are lightweight, so transmission is fast and little storage pressure is placed on the terminal device. The terminal device generates the target task model locally based on the pre-training model it has stored in advance, combined with the received prefix parameters and target weights. This completes rapid creation of the model during transmission, improves the speed of model generation and its adaptability to the target task, and improves the performance of the model by adding weights to the prefix parameters.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the model training method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 3
According to an embodiment of the present invention, there is also provided an apparatus for implementing the first model training method, fig. 5 is a block diagram of a first model training apparatus according to an embodiment of the present invention, and as shown in fig. 5, the first model training apparatus 50 includes: an acquisition module 52, a training module 54, and a first determination module 56, which describe various parts of the apparatus below:
an obtaining module 52, configured to obtain an initial task model, where the initial task model includes a pre-training model and a set of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used to adapt the initial task model to different tasks;
a training module 54, connected to the obtaining module 52, configured to train the initial task model based on training sample data, to obtain a target weight corresponding to the prefix parameter in the initial task model, where the training sample data corresponds to the target task;
a first determining module 56, connected to the training module 54, is configured to determine a target task model based on the target weights and the initial task model.
It should be noted here that the acquiring module 52, the training module 54 and the first determining module 56 correspond to steps S202 to S206 in embodiment 1, and a plurality of modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 4
According to an embodiment of the present invention, there is further provided an apparatus for implementing the second model training method, fig. 6 is a block diagram of a second model training apparatus according to an embodiment of the present invention, and as shown in fig. 6, the second model training apparatus 60 includes: a sending module 62, a receiving module 64 and a second determining module 66, which are described below for the parts of the second model training device 60:
a sending module 62, configured to send training sample data to the second target device, where the training sample data corresponds to the target task;
a receiving module 64, connected to the sending module 62, configured to receive a target weight and a prefix parameter returned by the second target device, where the prefix parameter is a set of hyper-parameters of the initial task model, and is used to adapt the initial task model to different tasks, and the target weight is a weight corresponding to the prefix parameter and obtained by training the initial task model based on training sample data;
and a second determining module 66, connected to the receiving module 64, for determining the target task model based on the target weight, the prefix parameter and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
It should be noted here that the sending module 62, the receiving module 64 and the second determining module 66 correspond to steps S402 to S406 in embodiment 2, and a plurality of modules are the same as the corresponding steps in implementation examples and application scenarios, but are not limited to the disclosure in embodiment 2. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 5
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the model training method for the application program: acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to be adapted to different tasks; training the initial task model based on training sample data to obtain target weights corresponding to the prefix parameters in the initial task model, wherein the training sample data correspond to the target tasks; and determining a target task model based on the target weight and the initial task model.
Alternatively, fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 7, the computer terminal may include: one or more (only one shown) processors, memories, etc.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the model training method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the model training method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to be adapted to different tasks; training the initial task model based on training sample data to obtain target weights corresponding to the prefix parameters in the initial task model, wherein the training sample data correspond to the target tasks; and determining a target task model based on the target weight and the initial task model.
Optionally, the processor may further execute the program code of the following steps: training the initial task model based on training sample data to obtain target weights corresponding to the prefix parameters in the initial task model, wherein the target weights comprise: determining undetermined weights corresponding to the prefix parameters based on a pre-training model; and performing optimization training on the undetermined weight and the prefix parameter of the initial task model based on training sample data to obtain a target prefix parameter and a target weight corresponding to the target prefix parameter.
Optionally, the processor may further execute the program code of the following steps: determining the undetermined weight corresponding to the prefix parameter based on a pre-training model, comprising: in the case where the pre-trained model includes multiple network layers, the pending weights include: intra-layer undetermined weights; determining the intra-layer undetermined weight corresponding to any one of the prefix parameters, including: determining a target network layer corresponding to any one prefix parameter, wherein the target network layer is one of a plurality of network layers included in a pre-training model; obtaining model parameters of a network layer which is positioned above a target network layer in a pre-training model; and determining the undetermined weight in the layer based on the model parameter of the last network layer.
Optionally, the processor may further execute the program code of the following steps: determining the undetermined weight in the layer based on the model parameter of the previous network layer, comprising: acquiring the learning characteristics of the previous network layer according to the model parameters of the previous network layer; and according to the learning characteristics, constructing the in-layer undetermined weight.
Optionally, the processor may further execute the program code of the following steps: according to the learning characteristics, the in-layer undetermined weight is constructed, and the method comprises the following steps: generating a undetermined factor; and converting the undetermined factors and the learning characteristics into probability value intervals through a logistic function to obtain the undetermined weight in the layer.
Optionally, the processor may further execute the program code of the following steps: determining the undetermined weight corresponding to the prefix parameter based on a pre-training model, comprising: in the case where the pre-trained model includes multiple network layers, the pending weights include: inter-layer undetermined weights; determining the inter-layer undetermined weight corresponding to any one of the prefix parameters, including: determining an inter-layer undetermined weight corresponding to any one of a plurality of network layers; determining a target network layer corresponding to any one of the prefix parameters, wherein the target network layer is one of a plurality of network layers; and determining the inter-layer undetermined weight corresponding to the target network layer as the inter-layer undetermined weight corresponding to any prefix parameter.
Optionally, the processor may further execute the program code of the following steps: and sending the target weight and the prefix parameter to first target equipment in which a pre-training model is stored, wherein the first target equipment is used for generating a target task model according to the target weight, the prefix parameter and the pre-training model.
Optionally, the processor may further execute the program code of the following steps: the pre-training model includes any one of: pre-training a language model and pre-training an image recognition model.
Optionally, the processor may further execute the program code of the following steps: transmitting training sample data to second target equipment, wherein the training sample data corresponds to the target task; receiving target weight and prefix parameter returned by the second target equipment, wherein the prefix parameter is a group of hyper-parameters of the initial task model and is used for enabling the initial task model to adapt to different tasks, and the target weight is weight which is obtained by training the initial task model based on training sample data and corresponds to the prefix parameter; and determining a target task model based on the target weight, the prefix parameters and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
It can be understood by those skilled in the art that the structure shown in fig. 7 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 7 does not limit the structure of the electronic device described above. For example, the computer terminal 70 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 7, or have a different configuration from that shown in fig. 7.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 6
Embodiments of the present invention also provide a computer-readable storage medium. Optionally, in this embodiment, the computer-readable storage medium may be used to store the program code executed by the model training method provided in embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to adapt to different tasks; training the initial task model based on training sample data to obtain target weights corresponding to the prefix parameters in the initial task model, wherein the training sample data corresponds to the target task; and determining a target task model based on the target weights and the initial task model.
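By way of illustration only (not part of the claimed embodiments), the following minimal Python sketch shows what such an initial task model can look like: a frozen pre-trained model plus a group of trainable prefix parameters, each with an associated weight to be learned. The library (PyTorch), the sizes and the names prefix_len, num_layers, hidden_dim, pretrained, prefix and prefix_weight are assumptions introduced for the example.

    import torch

    # Hypothetical sizes for the example.
    prefix_len, num_layers, hidden_dim = 16, 12, 768

    # Stand-in for a pre-trained model; its parameters stay fixed during training.
    pretrained = torch.nn.TransformerEncoder(
        torch.nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=12, batch_first=True),
        num_layers=num_layers,
    )
    for p in pretrained.parameters():
        p.requires_grad = False

    # A group of prefix parameters (one prefix block per network layer) and the
    # weights corresponding to them, which are the quantities to be trained.
    prefix = torch.nn.Parameter(0.02 * torch.randn(num_layers, prefix_len, hidden_dim))
    prefix_weight = torch.nn.Parameter(torch.zeros(num_layers, prefix_len))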
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: training the initial task model based on the training sample data to obtain a target weight corresponding to the prefix parameter in the initial task model includes: determining an undetermined weight corresponding to the prefix parameter based on the pre-training model; and performing optimization training on the undetermined weight and the prefix parameter of the initial task model based on the training sample data to obtain a target prefix parameter and a target weight corresponding to the target prefix parameter.
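Continuing the sketch above (illustrative only; forward_with_prefix is a hypothetical stand-in for the real forward pass of the initial task model), the optimization training updates only the prefix parameters and their undetermined weights, while the pre-trained model stays frozen.

    # Only the prefix parameters and their weights receive gradients.
    optimizer = torch.optim.AdamW([prefix, prefix_weight], lr=1e-3)

    def forward_with_prefix(x, prefix, prefix_weight):
        # Placeholder forward pass: scale each prefix vector by its squashed weight
        # and combine with the input; a real model would feed this into `pretrained`.
        scaled_prefix = torch.sigmoid(prefix_weight).unsqueeze(-1) * prefix
        return x.mean() + scaled_prefix.sum()

    x = torch.randn(4, 32, hidden_dim)     # dummy training sample data for the target task
    loss = forward_with_prefix(x, prefix, prefix_weight)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # After enough such steps, prefix_weight holds the target weights for the target task.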
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining the undetermined weight corresponding to the prefix parameter based on the pre-training model comprises: in the case where the pre-training model includes a plurality of network layers, the undetermined weights include intra-layer undetermined weights; determining the intra-layer undetermined weight corresponding to any one of the prefix parameters includes: determining a target network layer corresponding to the prefix parameter, wherein the target network layer is one of the plurality of network layers included in the pre-training model; obtaining model parameters of the network layer located above the target network layer in the pre-training model; and determining the intra-layer undetermined weight based on the model parameters of the previous network layer.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining the intra-layer undetermined weight based on the model parameters of the previous network layer comprises: acquiring learning characteristics of the previous network layer according to the model parameters of the previous network layer; and constructing the intra-layer undetermined weight according to the learning characteristics.
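The following continuation of the sketch (illustrative only) shows one way a learning characteristic of the previous network layer could be extracted from that layer's model parameters; using the mean of the parameters as the characteristic is an assumption made for the example, not something fixed by this embodiment.

    def layer_learning_characteristic(prev_layer):
        # Summarize the previous layer's (frozen) model parameters in one scalar.
        params = torch.cat([p.detach().flatten() for p in prev_layer.parameters()])
        return params.mean()

    # Example: the learning characteristic of the layer above target layer 1.
    prev_characteristic = layer_learning_characteristic(pretrained.layers[0])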
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: constructing the intra-layer undetermined weight according to the learning characteristics includes: generating an undetermined factor; and converting the undetermined factor and the learning characteristics into a probability value interval through a logistic function to obtain the intra-layer undetermined weight.
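Continuing the sketch (illustrative only), the undetermined factor is a free scalar per prefix vector, and the logistic (sigmoid) function maps the factor together with the learning characteristic into the probability interval (0, 1); combining the two by addition is an assumption made for the example.

    # One undetermined factor per prefix vector in the target layer.
    undetermined_factor = torch.nn.Parameter(torch.zeros(prefix_len))

    # Logistic conversion: the result lies in (0, 1) and serves as the
    # intra-layer undetermined weight.
    intra_layer_weight = torch.sigmoid(undetermined_factor + prev_characteristic)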
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining the undetermined weight corresponding to the prefix parameter based on the pre-training model comprises: in the case where the pre-training model includes a plurality of network layers, the undetermined weights include inter-layer undetermined weights; determining the inter-layer undetermined weight corresponding to any one of the prefix parameters includes: determining an inter-layer undetermined weight corresponding to any one of the plurality of network layers; determining a target network layer corresponding to the prefix parameter, wherein the target network layer is one of the plurality of network layers; and determining the inter-layer undetermined weight corresponding to the target network layer as the inter-layer undetermined weight corresponding to the prefix parameter.
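For the inter-layer case, a minimal continuation of the sketch (illustrative only) keeps one learnable weight per network layer; every prefix parameter whose target network layer is layer k simply takes the k-th weight. Squashing with a sigmoid mirrors the intra-layer case and is an assumption made for the example.

    # One inter-layer factor per network layer of the pre-trained model.
    inter_layer_factor = torch.nn.Parameter(torch.zeros(num_layers))

    def inter_layer_weight(target_layer_index):
        # A prefix parameter inherits the undetermined weight of its target layer.
        return torch.sigmoid(inter_layer_factor[target_layer_index])

    w3 = inter_layer_weight(3)   # shared by all prefix vectors whose target layer is layer 3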
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the method further comprises: sending the target weight and the prefix parameter to a first target device in which the pre-training model is stored, wherein the first target device is used for generating the target task model according to the target weight, the prefix parameter and the pre-training model.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the pre-training model includes any one of: a pre-trained language model and a pre-trained image recognition model.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: sending training sample data to a second target device, wherein the training sample data corresponds to the target task; receiving a target weight and a prefix parameter returned by the second target device, wherein the prefix parameter is a group of hyper-parameters of the initial task model and is used for enabling the initial task model to adapt to different tasks, and the target weight is a weight corresponding to the prefix parameter and obtained by training the initial task model based on the training sample data; and determining a target task model based on the target weight, the prefix parameter and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
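A minimal sketch of this two-device flow follows (illustrative only; remote_train, build_target_task_model and the returned shapes are hypothetical). Only the small artefacts, the prefix parameters and their target weights, cross the network, while the pre-trained model remains stored locally.

    def remote_train(training_samples):
        # Stand-in for the second target device: it trains the initial task model
        # on the received samples and returns only the learned prefix artefacts.
        prefix_parameters = torch.randn(prefix_len, hidden_dim)
        target_weight = torch.sigmoid(torch.randn(prefix_len))
        return target_weight, prefix_parameters

    def build_target_task_model(local_pretrained, training_samples):
        target_weight, prefix_parameters = remote_train(training_samples)
        weighted_prefix = target_weight.unsqueeze(-1) * prefix_parameters
        # The locally stored pre-trained model plus the weighted prefix together
        # form the target task model.
        return local_pretrained, weighted_prefix

    target_task_model = build_target_task_model(pretrained, training_samples=None)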
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A method of model training, comprising:
acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to adapt to different tasks;
training the initial task model based on training sample data to obtain a target weight corresponding to the prefix parameter in the initial task model, wherein the training sample data corresponds to a target task;
and determining a target task model based on the target weight and the initial task model.
2. The method of claim 1, wherein the training the initial task model based on training sample data to obtain the target weight corresponding to the prefix parameter in the initial task model comprises:
determining an undetermined weight corresponding to the prefix parameter based on the pre-training model;
and carrying out optimization training on the undetermined weight and the prefix parameter of the initial task model based on the training sample data to obtain a target prefix parameter and a target weight corresponding to the target prefix parameter.
3. The method of claim 2, wherein determining the undetermined weights corresponding to the prefix parameters based on the pre-training model comprises:
in the case where the pre-training model includes a plurality of network layers, the undetermined weights comprise: intra-layer undetermined weights;
determining the intra-layer undetermined weight corresponding to any one of the prefix parameters, including:
determining a target network layer corresponding to any one prefix parameter, wherein the target network layer is one of the plurality of network layers included in the pre-training model;
obtaining model parameters of the network layer located above the target network layer in the pre-training model;
and determining the intra-layer undetermined weight based on the model parameters of the previous network layer.
4. The method of claim 3, wherein said determining the intra-layer undetermined weight based on the model parameters of the previous network layer comprises:
acquiring the learning characteristics of the previous network layer according to the model parameters of the previous network layer;
and constructing the intra-layer undetermined weight according to the learning characteristics.
5. The method of claim 4, wherein said constructing the intra-layer undetermined weight according to the learning characteristics comprises:
generating an undetermined factor;
and converting the undetermined factor and the learning characteristics into a probability value interval through a logistic function to obtain the intra-layer undetermined weight.
6. The method of claim 2, wherein determining the undetermined weights corresponding to the prefix parameters based on the pre-training model comprises:
in the case where the pre-training model includes a plurality of network layers, the undetermined weights comprise: inter-layer undetermined weights;
determining the inter-layer undetermined weight corresponding to any one of the prefix parameters, including:
determining an inter-layer undetermined weight corresponding to any one of the plurality of network layers;
determining a target network layer corresponding to any one of the prefix parameters, wherein the target network layer is one of the plurality of network layers;
and determining the inter-layer undetermined weight corresponding to the target network layer as the inter-layer undetermined weight corresponding to said any one prefix parameter.
7. The method of claim 1, further comprising: and sending the target weight and the prefix parameter to a first target device in which the pre-training model is stored, wherein the first target device is used for generating the target task model according to the target weight, the prefix parameter and the pre-training model.
8. The method of any of claims 1 to 7, wherein the pre-trained model comprises any of: pre-training a language model and pre-training an image recognition model.
9. A method of model training, comprising:
transmitting training sample data to a second target device, wherein the training sample data corresponds to a target task;
receiving a target weight and a prefix parameter returned by the second target device, wherein the prefix parameter is a group of hyper-parameters of an initial task model and is used for enabling the initial task model to adapt to different tasks, and the target weight is a weight corresponding to the prefix parameter and obtained by training the initial task model based on the training sample data;
and determining a target task model based on the target weight, the prefix parameter and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
10. A model training apparatus, comprising:
an acquisition module, used for acquiring an initial task model, wherein the initial task model comprises a pre-training model and a group of prefix parameters, and the prefix parameters are hyper-parameters of the initial task model and are used for enabling the initial task model to adapt to different tasks;
a training module, used for training the initial task model based on training sample data to obtain a target weight corresponding to the prefix parameter in the initial task model, wherein the training sample data corresponds to a target task;
and a first determining module, used for determining a target task model based on the target weight and the initial task model.
11. A model training apparatus, comprising:
a sending module, used for sending training sample data to a second target device, wherein the training sample data corresponds to a target task;
a receiving module, used for receiving a target weight and a prefix parameter returned by the second target device, wherein the prefix parameter is a group of hyper-parameters of an initial task model and is used for enabling the initial task model to adapt to different tasks, and the target weight is a weight corresponding to the prefix parameter and obtained by training the initial task model based on the training sample data;
and a second determining module, used for determining a target task model based on the target weight, the prefix parameter and a pre-training model pre-stored locally, wherein the pre-training model is the same as the pre-training model in the initial task model.
12. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the model training method of any one of claims 1 to 9.
13. A computer device, comprising: a memory and a processor, wherein:
the memory stores a computer program; and
the processor is configured to execute the computer program stored in the memory, and the computer program, when executed, causes the processor to perform the model training method of any one of claims 1 to 9.
CN202210461145.6A 2022-04-28 2022-04-28 Model training method and device, computer readable storage medium and computer equipment Pending CN115018043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210461145.6A CN115018043A (en) 2022-04-28 2022-04-28 Model training method and device, computer readable storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210461145.6A CN115018043A (en) 2022-04-28 2022-04-28 Model training method and device, computer readable storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN115018043A true CN115018043A (en) 2022-09-06

Family

ID=83067662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210461145.6A Pending CN115018043A (en) 2022-04-28 2022-04-28 Model training method and device, computer readable storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN115018043A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023231458A1 (en) * 2022-05-30 2023-12-07 华为云计算技术有限公司 Model training method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination